I have been thinking about an alternative to Rack and WSGI. I mean the protocol specifications, not the libraries rack.rb and wsgiref.py. Please note that this may not be well organized, because I just wrote down my ideas as they came.
I expect to revise this article several times. Feel free to leave a comment if you have any thoughts.
Ruby's Rack and Python's WSGI are abstract specifications for HTTP requests and responses.
For example, in Rack:
class RackApp
  def call(env)      # env is a Hash object that represents the request
    status = 200     # status code
    headers = {"Content-Type"=>"text/plain"}   # headers
    body = "Hello"   # body
    return status, headers, [body]   # these three represent the response
  end
end
In this way, Rack and WSGI abstract the HTTP request into a single hash (`env`) and the response into a status, headers, and body.
This allows a web application to run on any application server that supports Rack or WSGI (WEBrick, Unicorn, Puma, uWSGI, waitress). For example, you can use WEBrick or waitress, which are easy to set up, during development, and switch to fast servers such as Unicorn, Puma, or uWSGI in production.
Rack and WSGI are also designed to make it easy to add functionality using the so-called decorator pattern, without changing the web application itself. For example:
# Original Rack application
app = RackApp.new
# For example, add session functionality
require 'rack/session/cookie'
app = Rack::Session::Cookie.new(app,
        :key => 'rack.session', :path => '/',
        :expire_after => 3600,
        :secret => '54vYjDUSB0z7NO0ck8ZeylJN0rAX3C')
# For example, show detailed errors only in the development environment
if ENV['RACK_ENV'] == "development"
  require 'rack/showexceptions'
  app = Rack::ShowExceptions.new(app)
end
Wrapper objects that add functionality to the original web application in this way are called "middleware" in Rack and WSGI. In the example above, `Rack::Session::Cookie` and `Rack::ShowExceptions` are middleware.
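To make the decorator pattern concrete, here is a minimal sketch of a hand-written middleware. It assumes only the Rack calling convention shown above (`call(env)` returning `[status, headers, body]`). The class names `HelloApp` and `PoweredBy` and the `X-Powered-By` header are made up for illustration, and no rack gem is needed to run it.

```ruby
# A tiny Rack-style application: call(env) returns [status, headers, body]
class HelloApp
  def call(env)
    [200, {"Content-Type" => "text/plain"}, ["Hello"]]
  end
end

# Hypothetical middleware: wraps another app (decorator pattern) and
# adds a response header without modifying the wrapped app.
class PoweredBy
  def initialize(app, name)
    @app  = app
    @name = name
  end

  def call(env)
    status, headers, body = @app.call(env)
    # merge returns a new hash, so the inner app's hash is not mutated
    [status, headers.merge("X-Powered-By" => @name), body]
  end
end

app = PoweredBy.new(HelloApp.new, "MyServer")
status, headers, body = app.call({"REQUEST_METHOD" => "GET", "PATH_INFO" => "/"})
puts status                  # prints 200
puts headers["X-Powered-By"] # prints MyServer
```

Because the wrapper exposes the same `call(env)` interface as the app it wraps, middleware can be stacked as deeply as needed.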
WSGI is the specification that Rack was modeled on; Rack would not exist without WSGI.
When WSGI first appeared, a similar specification already existed: Java Servlet. However, the Servlet specification was quite complicated and difficult to implement [^1]. Also, because of that complexity, behavior differed slightly from one application server to another, so in the end everyone checked behavior by running Tomcat, the reference implementation, instead of reading the specification.
That's why WSGI came out as something very simple: it shares the basic idea of Servlet, but its specification is completely different.
[^1]: Java and IBM are good at making things unnecessarily complicated.
Let's look at some concrete code. Below is a WSGI sample.
class WSGIApp(object):
    # environ is a hash (dictionary) object representing the request
    def __call__(self, environ, start_response):
        status = "200 OK"        # a string, not a number
        headers = [              # a list of (key, value) pairs, not a hash
            ('Content-Type', 'text/plain'),
        ]
        start_response(status, headers)   # start the response
        return [b"Hello World"]           # return the body
If you look at this, you can see that it is quite different from Rack.

- The status is an integer (ex: `200`) in Rack, but a string (ex: `"200 OK"`) in WSGI. This matters when you use your own status code. For example, if you want to use a nonstandard status such as "509 Bandwidth Limit Exceeded", WSGI has no problem with it, but in Rack you can specify "509" while (by specification) you cannot specify the reason phrase "Bandwidth Limit Exceeded".
- The status and response headers are `str` (that is, a byte string in Python 2 and a Unicode string in Python 3). However, the response body is always (a list of) byte strings.

Now, in my opinion, the biggest problem with WSGI is probably the callback function `start_response()`. Because of it, beginners must first understand "functions that receive functions (higher-order functions)" before they can understand WSGI, which is a high hurdle [^2].
[^2]: Experts who say "higher-order functions are easy to understand" fundamentally lack the ability to see where beginners stumble, so please go back to the world of functional languages, where there are no beginners. A great player does not necessarily make a great manager; someone who is good at every sport is not suited to teaching people who are hopeless at sports.
Because of `start_response()`, calling a WSGI application yourself is also cumbersome. This is really troublesome.
# You have to prepare something like this every time...
class StartResponse(object):
    def __call__(self, status, headers):
        self.status = status
        self.headers = headers
# ...or you can't call a WSGI application
app = WSGIApp()
environ = {'REQUEST_METHOD': 'GET', ...(snip)... }
start_response = StartResponse()
body = app(environ, start_response)
print(start_response.status)
print(start_response.headers)
(Actually, a specification called Web3 (PEP 444), which improved on this point of WSGI (PEP 333), was proposed in the past. Web3 abolished the callback function and, like Rack, had the application return the status, headers, and body. I personally had high hopes for it, but it was not adopted in the end. What a pity.)
Another minor annoyance in WSGI is that the response headers are a list of key-value pairs instead of a hash (dictionary), so you have to search the list every time you set a header.
# For example, if you have response headers like this
resp_headers = [
    ('Content-Type', "text/html"),
    ('Content-Disposition', "attachment;filename=index.html"),
    ('Content-Encoding', "gzip"),
]
content = b"..."   # the response body (placeholder)
# you have to search the list one entry at a time to set a value
key = 'Content-Length'
val = str(len(content))
for i, (k, v) in enumerate(resp_headers):
    if k == key:   # or k.lower() == key.lower()
        break
else:
    i = -1
if i >= 0:   # overwrite if it exists
    resp_headers[i] = (key, val)
else:        # otherwise, append
    resp_headers.append((key, val))
This is a hassle. You could define a dedicated utility function, but it would have been better to use a hash (dictionary) from the start.
# With a hash (dictionary) object...
resp_headers = {
    'Content-Type': "text/html",
    'Content-Disposition': "attachment;filename=index.html",
    'Content-Encoding': "gzip",
}
# ...setting a value is very easy!
# (This assumes that the case of header names is unified.)
resp_headers['Content-Length'] = str(len(content))
Rack (Ruby) is a specification designed with reference to WSGI (Python). Rack is very similar to WSGI, but has been improved to be simpler.
class RackApp
  def call(env)    # env is a hash object that represents the request
    status = 200
    headers = {
      'Content-Type' => 'text/plain;charset=utf-8',
    }
    body = "Hello World"
    return status, headers, [body]   # these three represent the response
  end
end
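The specific differences are as follows:

- The status is an integer (ex: `200`) instead of a string.
- There is no `start_response()` callback; the application simply returns the status, headers, and body.
- The response headers are a hash instead of a list of key-value pairs. (However, if both `'Content-Type'` and `'content-type'` appear, it can be a problem, so a convention that unifies the case of header names is needed.)

Now, in Rack, the response headers are represented by a hash. In that case, what about headers that can appear multiple times, such as `Set-Cookie`?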
The Rack specification contains the following description:
The values of the header must be Strings, consisting of lines (for multiple header values, e.g. multiple Set-Cookie values) separated by "\n".
In other words, if a header value is a multi-line string, the header is treated as having appeared multiple times.
But is this a good specification? It means the server has to check every response header value for newline characters, which hurts performance.
headers.each do |k, v|
  v.split(/\n/).each do |s|   #← double loop ;-(
    puts "#{k}: #{s}"
  end
end
Rather than that, a specification where "a header that appears multiple times has an Array as its value" seems better.
headers.each do |k, v|
  if v.is_a?(Array)   #← this is better
    v.each {|s| puts "#{k}: #{s}" }
  else
    puts "#{k}: #{v}"
  end
end
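Alternatively, only the Set-Cookie header could be treated specially. In practice, Set-Cookie is the only response header that needs to appear multiple times [^3] (other headers can be combined into a single comma-separated value, but Set-Cookie cannot), so this specification is not bad either.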
set_cookie = "Set-Cookie"
headers.each do |k, v|
  if k == set_cookie   # ← only Set-Cookie is treated specially
    v.split(/\n/).each {|s| puts "#{k}: #{s}" }
  else
    puts "#{k}: #{v}"
  end
end
[^3]: I think the Via header can also appear multiple times, but it is outside the scope of Rack and WSGI, so only Set-Cookie needs to be considered.
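The two alternatives above can also be combined. The following minimal sketch assumes a hypothetical convention in which a multi-valued header may be given as an Array, while a newline-separated String is still accepted only for Set-Cookie. The method name `header_lines` is made up for illustration.

```ruby
# Hypothetical helper: turn a response-header hash into "Name: value" lines.
#   Array value          -> one line per element (the proposed convention)
#   Set-Cookie (String)  -> split on "\n" (the current Rack convention)
#   anything else        -> exactly one line, no newline scan needed
def header_lines(headers)
  lines = []
  headers.each do |k, v|
    if v.is_a?(Array)
      v.each {|s| lines << "#{k}: #{s}" }
    elsif k.casecmp("Set-Cookie") == 0
      v.split("\n").each {|s| lines << "#{k}: #{s}" }
    else
      lines << "#{k}: #{v}"
    end
  end
  lines
end

headers = {
  "Content-Type" => "text/html",
  "Set-Cookie"   => "a=1\nb=2",
  "Vary"         => ["Accept", "Accept-Encoding"],
}
puts header_lines(headers)
```

Only Set-Cookie values are ever split, so ordinary headers are written out without scanning their values for newlines.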
Another issue is the `close()` method of the response body.
The Rack and WSGI specifications say that if the response body object has a `close()` method, the application server calls `close()` when the response to the client is complete. This rule mainly assumes that the response body is a File object.
def call(env)
  filename = "logo.png"
  headers = {'Content-Type'   => "image/png",
             'Content-Length' => File.size(filename).to_s}
  # Open the file
  body = File.open(filename, 'rb')
  # The application server sends the opened file and, when the
  # response is complete, calls close() on it automatically
  return [200, headers, body]
end
But I think this can be handled simply by closing the file at the end of the `each()` method.
class AutoClose
  def initialize(file)
    @file = file
  end
  def each
    # This is not efficient because it reads line by line
    #@file.each do |line|
    #  yield line
    #end
    # It is more efficient to read in larger chunks
    while (s = @file.read(8192))
      yield s
    end
  ensure          # after reading the whole file, or on error,
    @file.close   # close it automatically
  end
end
The rule of calling `close()` if it exists is needed for the case where the response body's `each()` method is never called. Personally, I think a general cleanup mechanism like `teardown()` in xUnit should have been considered, rather than a specification that only thinks about File objects (although I don't have a good idea either).
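One situation where `each()` may never be called is a HEAD request, where no body is sent. The following minimal sketch shows how a server might honor the `close()` rule in both cases; `send_body` and its `head_request:` flag are hypothetical, not the API of any real server, and a `StringIO` stands in for a File.

```ruby
require 'stringio'

# Hypothetical server-side helper: sends a Rack-style body to `out`,
# then calls body.close if the body has a close method.
def send_body(body, out, head_request: false)
  # For a HEAD request the body is skipped, so each() is never called...
  body.each {|chunk| out << chunk } unless head_request
ensure
  # ...but close() still runs, which is what the close() rule guarantees.
  body.close if body.respond_to?(:close)
end

out = "".dup
body = StringIO.new("Hello\n")
send_body(body, out)
p out           # prints "Hello\n"
p body.closed?  # prints true

head_body = StringIO.new("never sent")
send_body(head_body, "".dup, head_request: true)
p head_body.closed?  # prints true, although each() was never called
```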
In both Rack and WSGI, an HTTP request is represented as a hash (dictionary) object, which the Rack and WSGI specifications call the Environment.
Let's see what this looks like.
# Filename: sample1.ru
require 'rack'
class SampleApp
  # Inspect the Environment data
  def call(env)
    status = 200
    headers = {'Content-Type' => "text/plain;charset=utf-8"}
    body = env.map {|k, v| "%-25s: %s\n" % [k.inspect, v.inspect] }.join()
    return status, headers, [body]
  end
end
app = SampleApp.new
run app
When I ran this with `rackup sample1.ru -E production -s puma -p 9292` and accessed http://localhost:9292/index?x=1 in a browser, I got the following result, for example. These are the contents of the Environment.
"rack.version" : [1, 3]
"rack.errors" : #<IO:<STDERR>>
"rack.multithread" : true
"rack.multiprocess" : false
"rack.run_once" : false
"SCRIPT_NAME" : ""
"QUERY_STRING" : "x=1"
"SERVER_PROTOCOL" : "HTTP/1.1"
"SERVER_SOFTWARE" : "2.15.3"
"GATEWAY_INTERFACE" : "CGI/1.2"
"REQUEST_METHOD" : "GET"
"REQUEST_PATH" : "/index"
"REQUEST_URI" : "/index?x=1"
"HTTP_VERSION" : "HTTP/1.1"
"HTTP_HOST" : "localhost:9292"
"HTTP_CACHE_CONTROL" : "max-age=0"
"HTTP_COOKIE" : "_ga=GA1.1.1305719166.1445760613"
"HTTP_CONNECTION" : "keep-alive"
"HTTP_ACCEPT" : "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
"HTTP_USER_AGENT" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"
"HTTP_ACCEPT_LANGUAGE" : "ja-jp"
"HTTP_ACCEPT_ENCODING" : "gzip, deflate"
"HTTP_DNT" : "1"
"SERVER_NAME" : "localhost"
"SERVER_PORT" : "9292"
"PATH_INFO" : "/index"
"REMOTE_ADDR" : "::1"
"puma.socket" : #<TCPSocket:fd 14>
"rack.hijack?" : true
"rack.hijack" : #<Puma::Client:0x3fd60649ac48 @ready=true>
"rack.input" : #<Puma::NullIO:0x007fac0c896060>
"rack.url_scheme" : "http"
"rack.after_reply" : []
(`rack.hijack` is a new feature introduced in Rack 1.5.)
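This Environment contains three types of data:

- request headers (`HTTP_HOST`, `HTTP_USER_AGENT`, `HTTP_COOKIE`, ...)
- request information other than headers (`REQUEST_METHOD`, `PATH_INFO`, `QUERY_STRING`, `REMOTE_ADDR`, `rack.input`, ...)
- server information (`SERVER_NAME`, `SERVER_PORT`, `rack.version`, `rack.multithread`, ...)

The Environment mixes all of these into one hash. Personally, I don't like this kind of specification; I would like at least the request headers to be separated from the rest.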
The reason for this design is that it is based on the CGI specification. Young people today probably don't know CGI, but it was used very widely in the past. WSGI borrowed the CGI specification to define the Environment, and Rack inherited it. That's why it may look strange to anyone who doesn't know CGI, who might ask, "Why is the User-Agent header renamed to HTTP_USER_AGENT? Why not just use the string User-Agent as is?"
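The renaming rule itself comes from CGI meta-variables (RFC 3875), and a minimal sketch of it looks like this. The method name `cgi_key` is hypothetical; the point is only that a server building an Environment has to perform some conversion like this for each request header.

```ruby
# Sketch of CGI-style naming: "User-Agent" becomes "HTTP_USER_AGENT".
# Content-Type and Content-Length are exceptions in CGI and become
# CONTENT_TYPE / CONTENT_LENGTH without the HTTP_ prefix.
def cgi_key(header_name)
  key = header_name.upcase.tr("-", "_")
  case key
  when "CONTENT_TYPE", "CONTENT_LENGTH" then key
  else "HTTP_#{key}"
  end
end

p cgi_key("User-Agent")       # prints "HTTP_USER_AGENT"
p cgi_key("Accept-Language")  # prints "HTTP_ACCEPT_LANGUAGE"
p cgi_key("Content-Type")     # prints "CONTENT_TYPE"
```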
As we have seen, an Environment object is a hash with dozens of elements.
From a performance standpoint, creating a hash with dozens of elements is undesirable in Ruby and Python, because it is quite expensive. For example, with Keight.rb, a framework 100 times faster than Ruby on Rails, **creating the Environment object can take longer than processing the request itself**.
Let's actually check it with a benchmark script.
# -*- coding: utf-8 -*-
require 'rack'
require 'keight'
require 'benchmark/ips'
# Create an action class (a controller in MVC)
class API < K8::Action
  mapping '/hello', :GET=>:say_hello
  def say_hello()
    return "<h1>Hello, World!</h1>"
  end
end
# Create a Rack application and assign the action class to it
mapping = [
  ['/api', API],
]
rack_app = K8::RackApplication.new(mapping)
# Example of execution
expected = [
  200,
  {"Content-Length"=>"22", "Content-Type"=>"text/html; charset=utf-8"},
  ["<h1>Hello, World!</h1>"]
]
actual = rack_app.call(Rack::MockRequest.env_for("/api/hello"))
actual == expected or raise "assertion failed"
# Environment object that represents GET /api/hello
env = Rack::MockRequest.env_for("/api/hello")
# Benchmark
Benchmark.ips do |x|
  x.config(:time => 5, :warmup => 1)
  # Just create a new Environment object (make a copy)
  x.report("just copy env") do |n|
    i = 0
    while (i += 1) <= n
      env.dup()
    end
  end
  # Create a new Environment object for each request and handle it
  x.report("Keight (copy env)") do |n|
    i = 0
    while (i += 1) <= n
      actual = rack_app.call(env.dup)
    end
    actual == expected or raise "assertion failed"
  end
  # Reuse the same Environment object to handle requests
  x.report("Keight (reuse env)") do |n|
    i = 0
    while (i += 1) <= n
      actual = rack_app.call(env)
    end
    actual == expected or raise "assertion failed"
  end
  x.compare!
end
When I ran this, I got the following results, for example (Ruby 2.3, Keight.rb 0.2, OSX El Capitan):
Calculating -------------------------------------
just copy env 12.910k i/100ms
Keight (copy env) 5.523k i/100ms
Keight (reuse env) 12.390k i/100ms
-------------------------------------------------
just copy env 147.818k (± 8.0%) i/s - 735.870k
Keight (copy env) 76.103k (± 4.4%) i/s - 381.087k
Keight (reuse env) 183.065k (± 4.8%) i/s - 916.860k
Comparison:
Keight (reuse env): 183064.5 i/s
just copy env: 147818.2 i/s - 1.24x slower
Keight (copy env): 76102.8 i/s - 2.41x slower
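From the last three lines, we can see the following:

- Just copying the Environment (about 148k i/s) is slower than Keight handling a whole request with a reused Environment (about 183k i/s). In other words, creating the Environment costs more than the framework's own request processing.
- When a new Environment is created for each request, throughput drops to about 76k i/s, 2.41 times slower than reusing it.

In this situation, making the framework even faster will not make the application much faster. To break through this impasse, it seems better to improve the Rack specification itself.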
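To make the comparison concrete, the i/s figures above can be turned into time per iteration. This is only arithmetic on the numbers reported above, not a new measurement, and it assumes the three cases differ only in the work their labels describe.

```ruby
# Iterations per second, copied from the benchmark output above
copy_env_ips     = 147_818.2  # "just copy env": env.dup only
reuse_env_ips    = 183_064.5  # "Keight (reuse env)": framework only
copy_and_app_ips =  76_102.8  # "Keight (copy env)": env.dup + framework

# Convert iterations per second into microseconds per iteration
usec = ->(ips) { 1_000_000.0 / ips }
t_copy  = usec.(copy_env_ips)      # about 6.8 usec
t_app   = usec.(reuse_env_ips)     # about 5.5 usec
t_total = usec.(copy_and_app_ips)  # about 13.1 usec

printf("copy env: %.2f usec, framework: %.2f usec, both: %.2f usec\n",
       t_copy, t_app, t_total)
# Copying the env alone costs more than the framework's own work, and
# the two together roughly add up to the combined case.
```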
(TODO)
Well, let's finally get into the main subject.
To solve the problems I've described so far, I've come up with an alternative to the current Rack and WSGI. Call it "my idea of the ultimate Rack."
The new specification remains an abstraction of HTTP requests and responses. So I'll focus on how to abstract these two.
Also, the current Rack and WSGI partially inherit the CGI specification. However, CGI is an old-fashioned specification that assumes data is passed via environment variables. It doesn't suit today's needs, so I'll forget about the CGI specification here.
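An HTTP request can be divided into the following elements:

- request method
- request path
- request headers
- query parameters
- I/O (input, error output, socket)
- other request information
- server information
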
The request method can be an uppercase String or a Symbol. A Symbol seems better in terms of performance.
meth = :GET
The request path can be a String. Rack has to consider SCRIPT_NAME as well as PATH_INFO, but since hardly anyone uses SCRIPT_NAME nowadays, I'll consider only the equivalent of PATH_INFO.
path = "/index.html"
The request headers can be a hash. I don't want conversions like User-Agent → HTTP_USER_AGENT; however, HTTP/2 uses lowercase header names, so I'll probably follow that.
headers = {
  "host"       => "www.example.com",
  "user-agent" => "Mozilla/5.0 ....(snip)....",
  ....(snip)....,
}
The query parameter is either `nil` or a string: if the URL has no `?`, it is `nil`, and if it has one, it is a string (possibly an empty one).
query = "x=1"
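The nil-or-string rule above is easy to implement. A minimal sketch, with `split_target` a hypothetical helper name:

```ruby
# Hypothetical helper: split a request target into [path, query].
#   no "?"        -> query is nil
#   "?" present   -> query is a String (possibly empty)
def split_target(target)
  path, sep, query = target.partition("?")
  [path, sep.empty? ? nil : query]
end

p split_target("/index")      # prints ["/index", nil]
p split_target("/index?")     # prints ["/index", ""]
p split_target("/index?x=1")  # prints ["/index", "x=1"]
```

Under CGI (RFC 3875), QUERY_STRING is an empty string when the URL has no query component, so `/index` and `/index?` cannot be told apart; using `nil` keeps that distinction.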
Everything I/O-related (`rack.input`, `rack.errors`, and `rack.hijack` or `puma.socket`) should go in one array. These are just the equivalents of stdin, stderr, and stdout, aren't they? The socket could perhaps double as `rack.input`, but I'm not familiar with that, so I'll keep them separate here.
ios = [
  StringIO.new(),   # rack.input
  $stderr,          # rack.errors
  puma_socket,      # rack.hijack or puma.socket
]
Other request information changes with every request, so it should be a hash.
options = {
  http:     "1.1",   # HTTP_VERSION
  client:   "::1",   # REMOTE_ADDR
  protocol: "http",  # rack.url_scheme
}
Finally, the server information does not change unless the application server changes, so it can be created once as a hash and reused.
server = {
  name: "localhost".freeze,   # SERVER_NAME
  port: "9292".freeze,        # SERVER_PORT
  'rack.version': [1, 3].freeze,
  'rack.multithread': true,
  'rack.multiprocess': false,
  'rack.run_once': false,
}.freeze
Consider a Rack application that receives these.
class RackApp
  def call(meth, path, headers, query, ios, options, server)
    input, errors, socket = ios
    ...
  end
end
Wow, that's seven arguments. That's a bit ugly, isn't it? The first three (`meth`, `path`, and `headers`) are the core of the request, so let's keep them as separate arguments and fold `query` and `ios` into `options`.
options = {
  query:  "x=1",          # QUERY_STRING
  #
  input:  StringIO.new,   # rack.input
  error:  $stderr,        # rack.errors
  socket: puma_socket,    # rack.hijack or puma.socket
  #
  http:     "1.1",        # HTTP_VERSION
  client:   "::1",        # REMOTE_ADDR
  protocol: "http",       # rack.url_scheme
}
This will reduce the number of arguments from seven to five.
class RackApp
  def call(meth, path, headers, options, server)
    query  = options[:query]
    input  = options[:input]
    error  = options[:error]
    socket = options[:socket]   # or :output ?
    ...
  end
end
Well, I think it's okay to use this.
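With this signature, an application can be called directly, which is exactly what was painful with WSGI's `start_response()`. A minimal sketch follows; the option keys follow the `options` hash above, while the class name `EchoApp`, the response text, and the trimmed-down `server` hash are made up for illustration.

```ruby
require 'stringio'

# Illustrative application written against the proposed signature
# call(meth, path, headers, options, server).
class EchoApp
  def call(meth, path, headers, options, server)
    query = options[:query]
    agent = headers["user-agent"]
    body  = "#{meth} #{path}?#{query} from #{agent} via #{server[:name]}"
    return 200, {"content-type" => "text/plain"}, [body]
  end
end

# Calling the app needs neither a start_response callback nor a
# CGI-style env hash with dozens of entries.
server  = {name: "localhost", port: "9292"}.freeze  # created once, reused
options = {query: "x=1", input: StringIO.new, error: $stderr}
status, resp_headers, body = EchoApp.new.call(
  :GET, "/index", {"user-agent" => "curl"}, options, server)
p status  # prints 200
p body    # prints ["GET /index?x=1 from curl via localhost"]
```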
The HTTP response can still be represented by the status, headers, and body.
def call(meth, path, headers, options, server)
  status = 200
  headers = {"content-type"=>"application/json"}
  body = '{"message":"Hello!"}'
  return status, headers, body
end
However, I think the Content-Type header could be treated specially. In current Rack applications, there are many cases where the response headers contain nothing but Content-Type, such as `{"Content-Type"=>"text/html"}` or `{"Content-Type"=>"application/json"}`. So if only Content-Type is treated specially and made a separate value, things become a little simpler.
def call(meth, path, headers, options, server)
  # Rather than this...
  return 200, {"Content-Type"=>"text/plain"}, ["Hello"]
  # ...this is more concise
  return 200, "text/plain", {}, ["Hello"]
end
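A server or adapter could still pass such a response to existing Rack-based code by folding the separate Content-Type back into the headers. The following is a minimal sketch under that assumption: `to_rack_response` is a hypothetical name, and treating a `nil` content type as "no Content-Type header" (for example, for a 304 response) is my own assumption about how the proposal might handle such responses.

```ruby
# Hypothetical bridge: converts a response in the proposed form
#   [status, content_type, headers, body]
# into a classic Rack-style [status, headers, body] triple.
# A nil content_type (e.g. for 304 Not Modified) adds no header.
def to_rack_response(status, content_type, headers, body)
  if content_type
    headers = headers.merge("Content-Type" => content_type)
  end
  [status, headers, body]
end

status, headers, body = to_rack_response(200, "text/plain", {}, ["Hello"])
p status                   # prints 200
p headers["Content-Type"]  # prints "text/plain"
```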
There are some other issues as well.

- Response body: Rack requires the body to respond to `each()`, so even a single string has to be wrapped in an array. However, since most responses return a string as the body, wrapping it in an array every time is wasteful. If possible, the body should be "a string, or an object whose `each()` yields strings".
- Cleanup: What is really desirable is a feature equivalent to `teardown()`. It's a pity that I can't think of a concrete specification [^4].

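Under the "string, or an object whose `each()` yields strings" rule for the body, a server needs only one extra type check. A minimal sketch, with `write_body` a hypothetical helper name:

```ruby
require 'stringio'

# Hypothetical server-side helper: accepts a body that is either a
# String or any object whose each() yields Strings, and appends it to out.
def write_body(body, out)
  if body.is_a?(String)
    out << body                # common case: no wrapping array needed
  else
    body.each {|s| out << s }  # Array, StringIO, streaming object, ...
  end
  out
end

p write_body("Hello", "".dup)                # prints "Hello"
p write_body(["Hel", "lo"], "".dup)          # prints "Hello"
p write_body(StringIO.new("Hello"), "".dup)  # prints "Hello"
```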
[^4]: I thought `rack.after_reply` might be that, but it seems to be a Puma-specific feature.
(TODO)
(TODO)
(TODO)
I would like to hear the opinions of experts.
Just recently, a question about HTTP/2 support came up on the Rack mailing list, and there was some talk about Rack 2 in connection with it, which prompted me to think through various things.