Asynchronous Servers in Python
Nicholas Piël | December 22, 2009
A lot has already been written about the C10K problem, and it is generally accepted that the only viable way to handle LOTS of concurrent connections is to handle them asynchronously. This also means that for massively concurrent problems, such as lots of parallel Comet connections, the GIL in Python is a non-issue, since all concurrent connections are handled in a single thread.
In this post I am going to look at a selection of asynchronous servers implemented in Python.
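To make the single-thread argument concrete, here is a small sketch using Python 3's standard asyncio module. It postdates this post and is not one of the frameworks compared here. Ten thousand simulated connections that mostly wait on I/O are interleaved by one event loop, and every one of them reports the same OS thread id. The names `fake_connection` and `run_many` are illustrative, and `asyncio.sleep` stands in for real socket I/O.

```python
import asyncio
import threading
import time


async def fake_connection(i: int, delay: float) -> int:
    # Simulate a connection that spends its time waiting on I/O.
    await asyncio.sleep(delay)
    # Report which OS thread ran this coroutine.
    return threading.get_ident()


async def run_many(n: int, delay: float) -> set:
    # Start all n "connections" at once and wait for every one to finish.
    results = await asyncio.gather(*(fake_connection(i, delay) for i in range(n)))
    return set(results)


if __name__ == "__main__":
    start = time.perf_counter()
    thread_ids = asyncio.run(run_many(10_000, 0.5))
    elapsed = time.perf_counter() - start
    # The waits overlap, so this takes roughly 0.5 s rather than 5000 s.
    print(f"{len(thread_ids)} thread(s), {elapsed:.2f} s")
```

Because the waits overlap, the run takes roughly one delay rather than n delays. CPU-bound work inside a coroutine would still block the whole loop.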
Asynchronous Server Specs
Since Python is really rich in (asynchronous) frameworks, I collected a few and looked at the following features:
- What license does the framework have?
- Does it provide documentation?
- Does the documentation contain examples?
- Is it used in production somewhere?
- Does it have some sort of community (mailing list, IRC, etc.)?
- Is there any recent activity?
- Does it have a blog (from the owner)?
- Does it have a Twitter account?
- Where can I find the repository?
- Does it have a thread pool?
- Does it support WSGI?
- Does it provide access to a TCP socket?
- Does it have any Comet features?
- Is it using epoll?
- Does it have automated tests?
- What kind of server is it? (greenlets, callbacks, generators, etc.)
This gave me the following table.
| Name | License | Docs | Examples | Production use | Community | Active | Blog | Twitter | Repository | Thread pool | WSGI | Socket | Comet | epoll | Tests | Style |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Twisted | MIT | Yes | Yes | Yes | Huge | Yes | Lots | No | Trac | Yes | Yes | Yes | No | Yes | Yes | Callback |
| Tornado | Apache | Yes | Yes | FriendFeed | Yes | Yes | Facebook | Yes | GitHub | No | Limited | Yes | No | Yes | No | Async |
| Orbited | MIT | Yes | Yes | Yes | Yes | Yes | Yes | No | Trac | No | No | Yes | Yes | Yes | Yes | Callback |
| DieselWeb | BSD | Yes | Yes | STalk | Yes | Yes | Yes | Yes | Bitbucket | No | Limited | Yes | Yes | Yes | No | Generator |
| MultiTask | MIT | Some | No | No | No | No | Yes | No | Bazaar | No | No | No | No | No | No | Generator |
| Chiral | GPL2 | API | No | No | IRC | No | No | No | Trac | No | Yes | Yes | Yes | Yes | Yes | Coroutine |
| Eventlet | MIT | Yes | Yes | Second Life | Yes | Yes | Yes | No | Bitbucket | Yes | Yes | Yes | No | Yes | Yes | Greenlet |
| FriendlyFlow | GPL2 | Some | One | No | No | No | No | Yes | Google Code | No | No | Yes | No | No | Yes | Generator |
| Weightless | GPL2 | Yes | No | Yes | No | No | No | Yes | SourceForge | No | No | Yes | No | No | Yes | Generator |
| Fibra | MIT | No | No | No | No | No | Yes | No | Google Code | No | No | Yes | No | No | No | Generator |
| Concurrence | MIT | Yes | Yes | Hyves | Yes | Yes | No | No | GitHub | No | Yes | Yes | No | Yes | Yes | Tasklet |
| Circuits | MIT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Trac | No | Yes | Yes | No | No | Yes | Async |
| Gevent | MIT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Bitbucket | No | Yes | Yes | No | Yes | Yes | Greenlet |
| Cogen | MIT | Yes | Yes | Yes | No | Yes | Yes | Yes | Google Code | No | Yes | Yes | No | Yes | Yes | Generator |
This is quite a list, and I have probably still missed a few. The main reason for using a framework instead of implementing something yourself is that you hope to accelerate your own development by standing on the shoulders of other developers. I therefore think it is important that there is documentation, some sort of developer community (a mailing list, for example) and that the project is still active. If we take this as a requirement, we are left with the following solutions:
- Orbited / Twisted (callbacks)
- Tornado (async)
- Dieselweb (generator)
- Eventlet (greenlet)
- Concurrence (stackless)
- Circuits (async)
- Gevent (greenlet)
- Cogen (generator)
To quickly summarize this list: Twisted has been the de facto standard for async programming in Python. It has an immense community and a wealth of tools, protocols and features. It has grown big and, some say, complex. This is also one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of its own async approach, called Tornado, which also uses callbacks, and recent benchmarks show that it outperforms Twisted.
A commonly heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach (imho) is to use light-weight threads. This can be achieved with a different Python implementation, Stackless (as Concurrence does), or with the Greenlet extension for regular CPython (as Eventlet and Gevent do). Another approach is to simulate these light-weight threads with Python generators, as Dieselweb and Cogen do.
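The generator idea can be illustrated with a toy round-robin scheduler. Each "light-weight thread" is a plain generator, and every `yield` hands control back to the scheduler, which resumes the next one. This is a sketch under simplifying assumptions, not how Dieselweb or Cogen are implemented: real frameworks resume a generator when the socket it yielded on becomes ready instead of blindly cycling. `scheduler` and `worker` are hypothetical names.

```python
from collections import deque


def scheduler(tasks):
    """Run generator-based tasks round-robin until all are exhausted."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
        except StopIteration:
            continue            # task finished: drop it
        ready.append(task)      # otherwise put it at the back of the queue


def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                   # voluntarily give up control


if __name__ == "__main__":
    log = []
    scheduler([worker("a", 3, log), worker("b", 2, log)])
    print(log)  # ['a0', 'b0', 'a1', 'b1', 'a2']
```

The interleaved log shows both tasks making progress on a single thread, with no locks, because a task can only be suspended at a `yield`.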
This should already show that while all these frameworks provide asynchronous concurrency, each of them does it in its own way. I invite you to look at these frameworks, as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL, Eventlet has a neat thread pool, Tornado can pre-fork over CPUs, Gevent offloads HTTP header parsing and DNS lookups to libevent, Cogen has sendfile support, and Twisted probably already has a factory doing exactly what you are planning to do next.
The Ping Pong Benchmark
In this benchmark I am going to focus on how well each framework listens on a socket and writes to incoming connections. The client pings the server by opening a connection; the server responds with ‘Pong!’ and closes the socket. This should be really simple, but it is a pain to build something that does this in an asynchronous, non-blocking way from scratch, and that is exactly why we are looking at these frameworks. It is all about making our lives easier.
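For contrast, this is roughly what the from-scratch version involves using only Python 3's standard `selectors` module. It is a sketch for illustration, not one of the benchmarked servers. One thread drains the accept queue until it would block, reads the request, buffers the reply and writes it once the socket is writable. `make_listener` and `serve` are hypothetical names. It deliberately differs from the framework examples below in two ways: the reply body matches its Content-Length (those examples send "Pong!\r\n", two bytes more than announced), and it reads the request before replying, because on Linux closing a socket that still holds unread data typically makes the kernel send a RST instead of a normal FIN. Request handling is simplified: the first chunk received is treated as the whole request.

```python
import selectors
import socket

# HTTP-wrapped reply; the body is exactly the 5 bytes announced ("Pong!").
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def make_listener(host="127.0.0.1", port=0, backlog=500):
    """Create a non-blocking listening socket (port=0 picks a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    srv.setblocking(False)
    return srv


def _drop(sel, sock):
    sel.unregister(sock)
    sock.close()


def serve(srv, max_connections=None):
    """Single-threaded event loop; returns after max_connections replies."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    sel.register(srv, selectors.EVENT_READ, data=None)
    served = 0
    while max_connections is None or served < max_connections:
        for key, _ in sel.select():
            sock = key.fileobj
            if key.data is None:
                # Listening socket is readable: accept until it would block.
                while True:
                    try:
                        conn, _ = sock.accept()
                    except BlockingIOError:
                        break
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="request")
            elif isinstance(key.data, bytearray):
                # Writable: send what fits, close once the buffer is empty.
                buf = key.data
                try:
                    del buf[:sock.send(buf)]
                except ConnectionError:
                    _drop(sel, sock)
                    continue
                if not buf:
                    _drop(sel, sock)
                    served += 1
            else:
                # Readable client: consume the request (simplified).
                try:
                    chunk = sock.recv(4096)
                except ConnectionError:
                    chunk = b""
                if not chunk:
                    _drop(sel, sock)
                    continue
                # Buffer the reply and wait until the socket is writable.
                sel.modify(sock, selectors.EVENT_WRITE, data=bytearray(RESPONSE))
    # Close any client sockets still registered, then the selector.
    for key in list(sel.get_map().values()):
        if key.fileobj is not srv:
            key.fileobj.close()
    sel.close()
```

To line it up with the httperf command used in this benchmark, it could be started with `serve(make_listener(port=10000))`. Even this toy version has to track per-connection state and readiness by hand, which is the bookkeeping the frameworks take off your hands.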
For this benchmark I am going to use httperf, a high-performance load generator that speaks HTTP. If we want httperf to play along in our ping-pong benchmark, it has to understand the ‘Pong!’ response. We can do this by mimicking an HTTP server and having our server respond with:
HTTP/1.0 200 OK
Content-Length: 5

Pong!
instead of just ‘Pong!’. Also, since most default server configurations are not set up to handle a large number of concurrent connections, we need to make a few adjustments:
- Raise httperf's per-process file limit by recompiling it after some adjustments (by default httperf caps its open files at FD_SETSIZE).
- Raise the per-user file limit: set ‘ulimit -n 10000’ on both server and client.
- Raise the kernel limit on file handles: ‘echo "128000" > /proc/sys/fs/file-max’.
- Increase the kernel's network input backlog: ‘sysctl -w net.core.netdev_max_backlog=2500’.
- Raise the maximum listen backlog: ‘sysctl -w net.core.somaxconn=250000’.
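The per-process open-file limit that ‘ulimit -n’ controls can also be inspected, and raised up to the hard limit, from inside a Python server process with the standard `resource` module (Unix only). This is a minimal sketch: `raise_nofile_limit` is a hypothetical helper, and its default target of 10000 simply mirrors the ‘ulimit -n 10000’ above. The kernel-wide sysctl settings in the list still have to be changed as root.

```python
import resource


def raise_nofile_limit(target=10000):
    """Raise this process's soft open-file limit toward `target`.

    Only the current process is affected; an unprivileged process may
    raise its soft limit up to, but not beyond, the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # platform-specific caps: keep the existing limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_limit())
```

Calling this at server start-up is a cheap way to fail less mysteriously when the shell's ‘ulimit’ was not raised.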
With these settings my Debian Lenny system was ready to hammer the different servers at rates far beyond the capacity of the frameworks. I used the following command:
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
I then increased the rate in steps of 100, from 400 up to 9000 requests per second, with a total of 40,000 connections (one request each) at every step.
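The rate sweep can be scripted. This sketch only builds the command lines, one per rate (87 of them for 400 to 9000 in steps of 100). Running them and parsing httperf's output are left out, and the function name `httperf_commands` and its defaults are assumptions taken from the command above.

```python
def httperf_commands(start=400, stop=9000, step=100, port=10000, conns=40000):
    """Build one httperf command line per request rate in the sweep."""
    cmds = []
    for rate in range(start, stop + 1, step):
        cmds.append(
            "httperf --hog --timeout=60 --client=0/1 --server=localhost "
            f"--port={port} --uri=/ --rate={rate} --send-buffer=4096 "
            f"--recv-buffer=16384 --num-conns={conns} --num-calls=1"
        )
    return cmds


if __name__ == "__main__":
    cmds = httperf_commands()
    print(len(cmds))   # 87
    print(cmds[0])
```

Each string could then be executed with, for example, `subprocess.run(cmd.split(), capture_output=True, text=True)`. Note that the low rates dominate the wall-clock time: 40,000 connections at 400 per second take 100 seconds on their own.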
Code
What follows is the server-side implementation in each of the frameworks, which should show the different approaches they take.
Twisted
Gentlemen start your reactor!
from twisted.internet import epollreactor
epollreactor.install()  # must run before twisted.internet.reactor is imported

from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

class Pong(Protocol):
    def connectionMade(self):
        # Queue the reply; loseConnection() closes the socket once it has been written
        self.transport.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
        self.transport.loseConnection()

# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()
Tornado
Tornado does not hide the raw socket interface, which makes this example lengthier than the others.
import errno
import functools
import socket
from tornado import ioloop, iostream

def connection_ready(sock, fd, events):
    # Accept every pending connection until the listening socket would block
    while True:
        try:
            connection, address = sock.accept()
        except socket.error, e:
            if e[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        # Close the stream once the reply has been written
        stream.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n", stream.close)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("", 8010))
    sock.listen(5000)

    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print "exited cleanly"
Dieselweb
While this example is beautifully small, I do not really enjoy the generator approach, which sprinkles ‘yield’ all over the place.
from diesel import Application, Service
def server_pong(addr):
    yield "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
app = Application()
app.add_service(Service(server_pong, 8020))
app.run()
Circuits
I think the Circuits code is the most beautiful of them all; very elegant.
from circuits.net.sockets import TCPServer
class PongServer(TCPServer):
    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)

PongServer(('localhost', 8050)).run()
Eventlet
Eventlet uses a greenlet approach.
from eventlet import api
def handle_socket(sock):
    sock.makefile('w').write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)
Gevent
Gevent is presented as a rewrite of Eventlet focusing on performance.
import gevent
from gevent import socket
def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)
Concurrence
Concurrence uses the tasklet approach; it can run on top of Greenlet or under Stackless Python. In this benchmark there was no noticeable performance difference between the two engines.
from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket
def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    writer.flush()
    stream.close()

def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()
    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)

if __name__ == '__main__':
    dispatch(server)
Cogen
Cogen uses the generator approach as well.
import sys
from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine
@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv)>1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write("HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n")
        yield fh.flush()
        conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()
Results
The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer, Tornado with 7400 requests per second, and the worst, Circuits with 1400 requests per second (which was not using epoll here). This connection rate was sustained for at least 40,000 requests. We can see that when the hammering continues beyond the rate a server can handle, its performance drops. This is caused by connection errors or timeouts.
The second graph shows the response time. It is clearly visible that once the maximum connection rate has been reached, the overall response time starts to increase.
The last graph shows the number of errors, i.e. requests for which httperf did not detect a 200 reply. We can see a correlation between the performance of a server and the errors it returns at a given request rate: the better-performing servers return fewer errors overall. There is, however, one exception. Cogen was able to answer ALL its requests successfully no matter how hard it was hammered, which is why it is not visible in this graph. This is interesting: at 9000 requests per second it was still able to answer all requests. However, the average connection time (from socket open till socket close) was about 7 seconds, meaning that Cogen was serving about 28,000 concurrent connections: at reduced performance, but without dropping them.
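The step from "average connection time" to "concurrent connections" is Little's law, L = λ·W: in a steady state, the average number of open connections equals the rate at which connections complete times how long each one stays open. Below is a small sketch for sanity-checking such figures; the function names are illustrative. With W of about 7 seconds, 28,000 open connections corresponds to roughly 4,000 completed connections per second, whereas a full 9,000 per second would correspond to about 63,000. Which effective rate the 28,000 figure was derived from is not stated in the text above.

```python
def concurrency(rate_per_s: float, avg_time_s: float) -> float:
    """Little's law: average number of open connections, L = lambda * W."""
    return rate_per_s * avg_time_s


def implied_rate(open_connections: float, avg_time_s: float) -> float:
    """The same law rearranged: lambda = L / W."""
    return open_connections / avg_time_s


if __name__ == "__main__":
    print(concurrency(9000, 7))     # 63000
    print(implied_rate(28000, 7))   # 4000.0
```

Little's law only holds for a system in (approximate) steady state, so during an overload ramp these numbers are rough averages rather than exact values.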
Notes
This post should make it clear that Python has a rich set of options for asynchronous programming. All tested frameworks show great performance; even Circuits' result of 1300 requests per second isn't too bad. Tornado really blew me away with 7400 requests per second. But if I had to choose a favorite I would probably go with Gevent; I am really digging its greenlet style.
The clean greenlet / Stackless style is really cool, especially since Stackless Python is nowadays keeping up with CPython. There was some talk on a mailing list about running Gevent on Stackless. The Concurrence framework already runs on Stackless and can thus already be a great option if you need specific Stackless features such as tasklet pickling.
I want to make clear that this test only shows how these frameworks perform at a relatively simple task. The results may change when more is going on in the background. However, I feel that this benchmark is a good indicator of how each framework handles a socket connection.
In the coming days I plan to investigate this further. I will also check out how these Python frameworks stack up against their equivalents in other languages, e.g. Ape, CometD and NodeJS. Stay tuned!

Nice job on this… The Circuits implementation looks really clean, but it looks like you pay for the simplicity in performance.
Really nice benchmarks! Thanks for your work. It is very valuable.
Nice comparison!
I however found it difficult to read the graphs and tell which line corresponds to which framework. The indicators in the legend are too small.
When you hover with your mouse over the lines or the legend the javascript gods will tell you what is what.
Cheers!
Congrats, great benchmark!
Some remarks (me being the author of Concurrence):
…
You list the Concurrence license as ‘Hyves’, but I think it should be MIT (at least that is the intention…, why did you think otherwise?).
Also, in your matrix you did not put an entry for ‘automated tests’. I think that should also count when considering frameworks, e.g. how mature their development process is. For instance, Twisted has a very large and comprehensive test suite, and this is also something I strive for with Concurrence.
It would also be nice if you mentioned memory usage for the various frameworks when having many connections at the same time. The stable version of Concurrence you tested for instance has a problem with using more memory than needed (current trunk version has that fixed).
Hi Henk,
Thanks for the remarks and for opening up Concurrence! I was not able to find any license on the Concurrence website or in the repo; I only found this License on GitHub. As I am not that versed in the license business, I just named it Hyves to be safe. But since you say it is supposed to be MIT, I will update the matrix. Btw, the matrix does have a column ‘test’, but I agree that this doesn't really say much.
Benchmarking a lot of different frameworks is hard and I cut some corners here and there. Monitoring memory usage is one of them; maybe I'll do this in the upcoming cross-language benchmark.
Henk, I’m checking out Concurrence now. I like that the buffer management per client is clearly coded. I read the commit history and saw “buffer sharing” reduces memory. What does that really mean though?
Thank you, great post but… where is Kamaelia ?
http://www.kamaelia.org
Best Regards
Damn,
Yes that is one I can seriously not leave out. Will add that.
Sorry about that.
[...] Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python (nichol.as) [...]
event driven != asynch*
nice graphs.
*posix
Event driven programming with greenlet is great. I’ve written about how to build a webserver using python’s BaseHTTPServer and coroutines at my blog.
http://erik.gorset.no/2009/12/building-comet-enabled-http-server-in.html
thanks, great work!
Was surprised to see Tornado do so well.
What about the good old asyncore based medusa?
http://svn.zope.org/Zope/trunk/src/ZServer/medusa/
When I find some time I will add Kamaelia and Medusa to this benchmark.
Have to get some xmas presents first
[...] Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python (tags: python concurrency benchmark server network webdev library blog) [...]
Absolutely excellent post. It’s extremely gratifying to see so much interest in async/coroutine network servers. Since I started using Twisted in 2000 there has been little understanding or acceptance of this technique, so it’s extremely gratifying to see all the interest it has been getting lately.
I agree that benching medusa and kamaelia would be excellent.
One little nit, you did not specify the listen backlog parameter in either the gevent or eventlet cases. Specifying a large backlog might help reduce the error rate.
You are correct, I raised the backlog for all frameworks to 500 (well beyond what a normal Linux distro allows: 128).
You didn’t post your server specs. How much RAM, how many CPUs (and what speed), and how much memory? Please also post the memory consumption (RSS/VSZ) of each server.
Also, it would have been useful to rate for each module the quality of documentation (Twisted’s, for example, exists but is abysmal), whether it supports generic TCP connections easily, and whether it supports SSL and peer certificate authentication.
The benchmark was run on a pretty old MacBook Pro (2.2 GHz Core Duo) with 4 GB of RAM running Debian Lenny.
I will look at the memory consumption in the upcoming cross-language benchmark.
Concerning your other ‘requests’: a qualitative comparison of documentation is really hard. I have been focusing mainly on the framework acting as some sort of Comet / WebSocket server, and in that situation I don't think SSL support is an issue, as I imagine that in a production setting these daemons would be sitting behind a proxy anyway.
The reason I asked about generic TCP socket handling is that the title of your post is “Socket Benchmark of Asynchronous Servers in Python,” not “Socket Benchmark of Asynchronous HTTP Servers in Python.” There is a significant difference between the two.
Also, SMP scalability is very important, which is why I asked about your server specs. Knowledge of a rate on 1 CPU is interesting (assuming there is no latency in request processing), but not very interesting if it doesn't scale somewhat linearly with the number of CPUs in the system.
SSL is important, for Comet or otherwise. Proxies obscure remote IP addresses and are best avoided if you care about DOS attack mitigation.
Hey there
Great post: I linked to it from my related question on stackoverflow (http://stackoverflow.com/questions/1824418/a-clean-lightweight-alternative-to-pythons-twisted) as I thought it might be useful for anyone landing there (seems to be a popular one from the number of votes).
One point to note is that some of these frameworks are not fully portable. I looked at a lot of them when trying to decide what to use as a replacement for tornado (not portable) and found this to be an issue. It might be worth adding an extra column to the table at the top.
Thanks for taking the time to do these tests though.
Jamie
Yes, well, I am mainly interested in Linux (and thus epoll), but I might add that to this or the upcoming cross-language benchmark.
In this case all frameworks that use libevent (e.g. Gevent) should work on Windows.
Thanks for adding a link from stackoverflow.com
Making sure the backlog is the same everywhere would be nice. For example, you’re specifying 5000 in the Tornado code, whereas the Twisted default value is 50. Passing backlog=5000 to reactor.listenTCP would fix that.
Thomas, I already noticed that the backlog for Twisted was 50, so I adjusted it manually.
I did not know I could pass the backlog argument to listenTCP, thanks.
Excellent!
Good to see Twisted drop down.
Sad to see eventlet showing more errors than gevent.
what about basic asyncore ? )
Great roundup! The Eventlet benchmark could benefit from a little tweaking. I used the following code and achieved significantly better performance.
from eventlet.green import socket
from eventlet import api, hubs
hubs.use_hub("pyevent")

def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = socket.socket()
server.bind(('localhost', 8030))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)

The major difference here is that makefile() is not called, which avoids the cost of creating its file object. This alone makes a big difference, and has the side benefit of being more readable and closer in spirit to the other tests.
Also, this code uses pyevent for event dispatching. Pyevent is disabled by default within Eventlet because it is not compatible with threads, but in this sort of test, it provides better performance.
Oh wow that is bad formatting. Sorry, I hope you get the idea. Thanks again for doing this great roundup.
I fixed the markup, you can enclose code between
tags, at least on this blog. However, most other WordPress blogs support the [/code] tag, just FYI. I will try the sock.sendall approach + pyevent somewhere after Xmas.
Thanks for your suggestion!
Looking forward to the update!
[...] Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python (tags: python asynchronous benchmark concurrency server performance network twisted) [...]
I’d like to point out that gevent's WSGI server is built on top of the libevent-http module. As such it should be more efficient than most pure-Python alternatives.
[...] Benchmark of Asynchronous Servers in Python [...]
Great post, thanks a lot! Also looking forward to any updates or followups.
By the way, Chiral is GPLv2:
http://chiral.j4cbo.com/trac/changeset?new=51%40%2F&old=50%40%2F
P.S. it would also be appreciated if anybody who follows up with a comparison across languages includes Erlang’s OTP as well.
Good post! Thanks
Actually, it appears as though Circuits does have epoll support. You only need to provide the right class from circuits.net.pollers as a kwarg to circuits.net.Server():
http://trac.softcircuit.com.au/circuits/browser/circuits/net/sockets.py#L394
Thanks for the info!
I feel this is largely misleading for someone who doesn't understand how the frameworks work, as their approaches and server architectures differ wildly:
For example, tornado would fork in two processes and that’s the only reason it won the benchmark. It’s a nice feature but one could argue that this should be handled elsewhere (loadbalancer/clustering).
gevent is largely written in C (well, pyrex apparently) and that’s the reason it’s the second fastest (it would clearly win if tornado would run in single process mode).
Also, the server code approaches differ, some start coroutines/greenlets/whateverlets on a new connection and some don’t. This is a very important difference and the results can differ much depending on this aspect.
The error rates worry me – I think it’s a good exercise to find out why they happen.
I would add portability and test code coverage to the comparison table.
The benchmarked tornado does NOT prefork and thus uses only one process. This functionality has only been added very recently to the trunk.
I don’t really think it is an issue how a certain framework has been implemented. The end user only cares about its ease of use and its performance, not whether the heavy lifting is being done by an external library (such as libevent) or an optimized inner loop.
I agree with you that there is some inconsistency in the functionality of the different implementations. A more thorough test could make those differences irrelevant. I.e., instead of a single ping-pong, make the server respond to multiple ‘ping!’ requests from a single client, each fired at a certain interval.
True that.
However, I noticed that you use a smaller backlog and response for the cogen sample. Also, if you spawn a coroutine for each new connection and avoid makefile you can get a bit more response rate out of it. E.g.:
import sys
from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine

@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv)>1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(5000)
    while 1:
        conn, addr = yield srv.accept()
        m.add(handler, args=(conn,))

@coroutine
def handler(conn):
    yield conn.send("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()
Would using asynchronous code in Twisted make a difference? I just noticed that in diesel you are using non-blocking code via the generator syntax, whereas in the Twisted code you are using blocking code instead of returning a deferred or using a generator. Does this make a difference in the tests?
Just to note, twisted also can use the generator syntax.
Twisted example is asynchronous.
transport.write() call merely buffers the data. Actual sending happens when the descriptor is ready for writing.
I tried Circuits (on a Linode 360 running Linux 2.6) because its component architecture looked interesting and the example code was clean.
$ sysctl net.core.somaxconn net.core.netdev_max_backlog
net.core.somaxconn = 250000
net.core.netdev_max_backlog = 2500
$ cat /proc/sys/fs/file-max
1001000
$ ulimit -n
1001000
I changed port to 10000 in Circuits example code to match your httperf command-line.
I also followed the instructions for installing a custom httperf. Be careful to use the proper httperf if you already had it installed (e.g. by default it installs to /usr/local/bin/).
I also made sure that Circuits was using epoll:
PongServer(('localhost', 10000), poller=EPoll, backlog=500).run()
$ httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 3
Total: connections 40000 requests 33288 replies 33216 test-duration 204.995 s
Connection rate: 195.1 conn/s (5.1 ms/conn, <=16450 concurrent connections)
Connection time [ms]: min 0.1 avg 12921.3 max 45608.6 median 0.5 stddev 20278.0
Connection time [ms]: connect 12973.1
Connection length [replies/conn]: 1.000
Request rate: 162.4 req/s (6.2 ms/req)
Request size [B]: 62.0
Reply rate [replies/s]: min 0.0 avg 162.0 max 400.0 stddev 195.6 (41 samples)
Reply time [ms]: response 16.2 transfer 0.0
Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
Reply status: 1xx=0 2xx=33216 3xx=0 4xx=0 5xx=0
CPU time [s]: user 37.28 system 167.49 (user 18.2% system 81.7% total 99.9%)
Net I/O: 16.6 KB/s (0.1*10^6 bps)
Errors: total 6784 client-timo 6784 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
No errors (all 2xx).
Sorry, you also need: from circuits.net.pollers import EPoll
I also tried the same test on a monster 2x quad core xeon 5520
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
Maximum connect burst length: 1
Total: connections 40000 requests 40000 replies 40000 test-duration 100.001 s
Connection rate: 400.0 conn/s (2.5 ms/conn, …
Your comment system ate my last comment and spit it back out in 2 parts.
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=4000 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 40
Total: connections 40000 requests 39975 replies 20859 test-duration 114.926 s
Connection rate: 348.0 conn/s (2.9 ms/conn, <=26959 concurrent connections)
Connection time [ms]: min 233.0 avg 6634.8 max 72470.5 median 3327.5 stddev 9396.5
Connection time [ms]: connect 4989.3
Connection length [replies/conn]: 1.000
Request rate: 347.8 req/s (2.9 ms/req)
Request size [B]: 62.0
Reply rate [replies/s]: min 0.0 avg 189.6 max 1384.3 stddev 392.4 (22 samples)
Reply time [ms]: response 3230.8 transfer 0.0
Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
Reply status: 1xx=0 2xx=20859 3xx=0 4xx=0 5xx=0
CPU time [s]: user 2.43 system 112.48 (user 2.1% system 97.9% total 100.0%)
Net I/O: 28.7 KB/s (0.2*10^6 bps)
Errors: total 19141 client-timo 583 socket-timo 0 connrefused 0 connreset 18558
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
I take that back – there are a number of timeouts and connreset errors. Interesting.
Thanks for your remarks though.
The client-timeout errors are caused by the timeout (60 seconds in your case) set on the httperf command line. You can make most of these go away by increasing the timeout, but it is interesting to see the difference between the frameworks. I am not really sure what causes the connection reset errors.
I am still planning to post an update to this benchmark, but I will need some extra machines for that.
ps,
I am curious how the maximum / optimum request rates of both machines differ.
I tried backlog=50000 which resulted in 0 connection reset by peer errors. Why? I do not see the correlation.
Errors: total 27903 client-timo 27903 socket-timo 0 connrefused 0 connreset 0
However, there’s a very high number of client-timeout errors in the same results, even after increasing the httperf timeout to 3 minutes (--timeout=180), which is of course ridiculous.
—
Some interesting notes from the httperf man page:
“client-timo: The number of times a session, connection, or call failed due to a client timeout (as specified by the --timeout and --think-timeout) options.”
“--timeout=X
Specifies the amount of time X that httperf is willing to wait for a server reaction. The timeout is specified in seconds and can be a fractional number (e.g., --timeout 3.5). This timeout value is used when establishing a TCP connection, when sending a request, when waiting for a reply, and when receiving a reply. If during any of those activities a request fails to make forward progress within the alloted time, httperf considers the request to have died, closes the associated connection or session and increases the client-timo error count. The actual timeout value used when waiting for a reply is the sum of this timeout and the think-timeout (see option --think-timeout). By default, the timeout value is infinity.”
—
Also,
“Since the machine that httperf runs on has only a finite set of resource available, it can not sustain arbitrarily high HTTP loads. For example, one limiting factor is that there are only roughly 60,000 TCP port numbers that can be in use at any given time. Since on most UNIX systems it takes one minute for a TCP connection to be fully closed (leave the TIME_WAIT state), the maximum rate a client can sustain is at most 1,000 requests per second.”
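The man page's limit is simple arithmetic: divide the available port numbers by the time each closed connection holds its port in TIME_WAIT. A small sketch using the man page's round figures (about 60,000 ports, a one-minute TIME_WAIT); the function name is hypothetical:

```python
def max_client_rate(port_numbers: int, time_wait_seconds: float) -> float:
    """Upper bound on new connections per second from one client host.

    Each closed connection keeps its local port in TIME_WAIT for
    time_wait_seconds, so at most port_numbers connections can be
    opened per TIME_WAIT period.
    """
    return port_numbers / time_wait_seconds


# Figures from the httperf man page: ~60,000 ports, one-minute TIME_WAIT.
print(max_client_rate(60_000, 60))  # 1000.0
```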
—
Also, it’s general practice to not benchmark from the same host as the server being benchmarked.
Nice, concurrence (trunk) “just works”.
—
buffer size (default): 8KiB read + 8KiB write, default backlog of 255:
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 1
Total: connections 40000 requests 40000 replies 40000 test-duration 39.997 s
Connection rate: 1000.1 conn/s (1.0 ms/conn, [...]
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 1
Total: connections 40000 requests 40000 replies 40000 test-duration 39.997 s
Connection rate: 1000.1 conn/s (1.0 ms/conn, [...]
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 6
Total: connections 40000 requests 40000 replies 40000 test-duration 13.004 s
Connection rate: 3076.0 conn/s (0.3 ms/conn, <=931 concurrent connections)
Connection time [ms]: min 0.4 avg 150.7 max 3065.4 median 65.5 stddev 515.1
Connection time [ms]: connect 93.0
Connection length [replies/conn]: 1.000
Request rate: 3076.0 req/s (0.3 ms/req)
Request size [B]: 62.0
Reply rate [replies/s]: min 3898.7 avg 3905.5 max 3912.4 stddev 9.7 (2 samples)
Reply time [ms]: response 57.6 transfer 0.0
Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
Reply status: 1xx=0 2xx=40000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 1.03 system 11.98 (user 7.9% system 92.1% total 100.0%)
Net I/O: 315.4 KB/s (2.6*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
—
Looks good to me.
Grrr, your comment system is completely fuxored.
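The httperf invocation used for this run can be generated by a small function, so that every framework is measured with exactly the same options and only the port or rate changes between runs. A minimal sketch; the function name and which options are exposed as parameters are assumptions, while the option values are copied from the command line above:

```python
def httperf_command(port: int, rate: int, num_conns: int = 40000,
                    timeout: int = 60, server: str = "localhost") -> list[str]:
    """Build the httperf argument list for one pong-server benchmark run."""
    return [
        "httperf", "--hog", f"--timeout={timeout}", "--client=0/1",
        f"--server={server}", f"--port={port}", "--uri=/",
        f"--rate={rate}", "--send-buffer=4096", "--recv-buffer=16384",
        f"--num-conns={num_conns}", "--num-calls=1",
    ]


print(" ".join(httperf_command(port=10000, rate=1000)))
```

The returned list can be passed directly to `subprocess.run` to execute the benchmark.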
Hi all,
James here (prologic) the developer of circuits.
Brian has brought this very interesting blog post to my attention, and I must say I’m a bit disappointed with circuits’ results; however, I am not surprised. I haven’t quite been able to perfect the EPoll component in circuits.net.pollers, and I believe the performance penalty you see in the results here is due to a defect. I’m having another look to see if I can rectify this in the development branch (http://hg.softcircuit.com.au/projects/circuits-dev/).
Thanks for the nice comment about circuits having a clean implementation, and for saying that the circuits version of pongserver’s “code is the most beautiful of them all.”
cheers
James
You need to look at circuits again. James’ latest patch makes the epoll support level-triggered instead of edge-triggered. There are no errors now (all replies are 200).
http://gist.github.com/280103
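The difference between the two epoll modes can be seen with Python's own `select.epoll` (Linux only). This is a standalone sketch, not circuits' poller code, and `ready_counts` is a hypothetical helper: a socket with unread data is polled twice. Level-triggered mode reports it both times, while edge-triggered mode (`EPOLLET`) reports it only once, when the data arrives. A server using edge-triggered epoll must therefore keep reading or writing until the call would block, or it never hears about that socket again.

```python
import select
import socket


def ready_counts(edge_triggered: bool) -> list[int]:
    """Poll a socket twice without reading from it; return events per poll."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        mask = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
        ep.register(reader.fileno(), mask)
        writer.send(b"ping")  # data is now waiting, and we never read it
        # Two polls with a 0.1 s timeout each; nothing is consumed in between.
        return [len(ep.poll(0.1)) for _ in range(2)]
    finally:
        ep.close()
        reader.close()
        writer.close()


print("level-triggered:", ready_counts(edge_triggered=False))  # [1, 1]
print("edge-triggered: ", ready_counts(edge_triggered=True))   # [1, 0]
```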
Which line in the graph is circuits, which is tornado and which is gevent? I see three shades of blue!
Thanks for doing this! I’d be interested in the Medusa results — that’s what I’m currently using for UpLib (http://uplib.parc.com/). I keep wondering if I should shift to something else, but so far Medusa seems to keep working just fine.
The current Medusa is at http://www.amk.ca/python/code/medusa.html.
I can’t thank you enough for writing this awesome blog post. I recently gave a talk about Python concurrency, http://jjinux.blogspot.com/2009/12/python-concurrency.html, but I think I like your blog post even better since it shows the hard data and real code.
[...] For more performance info, James Abley pointed me to a very complete benchmark of available Python asynchronous webservers. It looks like Tornado is a real monster of [...]
Hi there, just poking you to see whether you’re going to update these with the corrected code that all the maintainers of these libraries have posted. You said you’d get to it “somewhere after xmas”… it’s definitely after xmas!
Nice post.
The Kamaelia project is missing.
[...] The superior performance is not the only benefit of tight integration with libevent. Other benefits are [...]
I rewrote your test code with gevent.http and it got 70% faster. gevent is soooooooo cool!
This is excellent research. I’m currently looking for an async framework and I hadn’t even heard of gevent. Thank you for putting this together and publishing it.
Great, informative article!
FYI: Syncless, a high-performance asynchronous client and server library using Stackless Python, was born after this comparison. See Syncless at http://code.google.com/p/syncless/
I’ve also written a feature comparison of event-driven and coroutine-based non-blocking I/O libraries for Python. Read it at: http://ptspts.blogspot.com/2010/05/feature-comparison-of-python-non.html
Just as a quick comparison, I created a pong server based around pyev. Here is the program:

```python
import socket
import signal
import pyev  # Python bindings for libev


class Client(object):
    """Handles one accepted connection: sends the pong reply, then closes."""

    def start(self, loop, sock):
        self.msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
        self.sock = sock
        self.write = self.write_msg
        # Call self.write once the socket becomes writable.
        self.writewatcher = pyev.Io(self.sock, pyev.EV_WRITE, loop, self.write)
        self.writewatcher.start()

    def write_msg(self, watcher, events):
        self.sock.send(self.msg)
        self.writewatcher.stop()
        self.sock.close()


class Server(object):
    def start(self, loop):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(('localhost', 10000))
        self.sock.listen(500)
        # Call self.accept whenever the listening socket is readable.
        self.acceptwatcher = pyev.Io(self.sock, pyev.EV_READ, loop, self.accept)
        self.acceptwatcher.start()

    def accept(self, watcher, events):
        (sock, addr) = self.sock.accept()
        client = Client()
        client.start(watcher.loop, sock)


def interrupt(watcher, events):
    # Ctrl-C (SIGINT) stops the event loop.
    watcher.loop.unloop()


loop = pyev.default_loop()
sigwatch = pyev.Signal(signal.SIGINT, loop, interrupt)
sigwatch.start()
srv = Server()
srv.start(loop)
loop.loop()
```

Here are the results using rate=10000:
Note that the sockets are in fact blocking: I never set them to non-blocking, and I have no doubt I could do even better with non-blocking sockets.
If this benchmark is saying anything, it is saying that nothing in the Python world comes even close to pyev, nothing.
Just to really push things, I tried 20k per second…
And finally, here is how the Twisted example, taken exactly as it appears in this blog post, does on my machine at a rate of 10k…
It basically fails to even finish…
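The pyev server above leaves its sockets in blocking mode. For comparison, here is a minimal sketch of the same pong server with non-blocking sockets, written against the standard library's `selectors` module rather than pyev (a swapped-in technique; `DefaultSelector` uses epoll on Linux, and no benchmark numbers are claimed for this version). Unlike the pyev server, it reads the request headers before replying. The function names are hypothetical.

```python
import selectors
import socket
from typing import Optional

# Same 43-byte reply the benchmark reports: 38 header bytes + "Pong!".
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


def make_listener(host: str = "localhost", port: int = 10000,
                  backlog: int = 500) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    sock.setblocking(False)  # accept() raises BlockingIOError instead of waiting
    return sock


def serve(listener: socket.socket, max_replies: Optional[int] = None) -> None:
    """Reply RESPONSE to each request; return after max_replies (None: forever)."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, data=None)
    replies = 0

    def drop(conn: socket.socket) -> None:
        sel.unregister(conn)
        conn.close()

    while max_replies is None or replies < max_replies:
        for key, events in sel.select():
            if key.data is None:  # listening socket: a connection is waiting
                try:
                    conn, _ = listener.accept()
                except BlockingIOError:
                    continue  # someone else accepted it first
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ,
                             data={"request": b"", "reply": RESPONSE})
                continue
            conn, state = key.fileobj, key.data
            try:
                if events & selectors.EVENT_READ:
                    chunk = conn.recv(4096)
                    if not chunk:
                        drop(conn)  # client went away mid-request
                        continue
                    state["request"] += chunk
                    if b"\r\n\r\n" in state["request"]:  # headers complete
                        sel.modify(conn, selectors.EVENT_WRITE, data=state)
                else:
                    sent = conn.send(state["reply"])  # may be a partial send
                    state["reply"] = state["reply"][sent:]
                    if not state["reply"]:  # HTTP/1.0: close after the reply
                        drop(conn)
                        replies += 1
            except BlockingIOError:
                continue  # spurious wakeup; wait for the next event
            except OSError:
                drop(conn)  # e.g. connection reset by the client
    sel.close()
```

Calling `serve(make_listener())` runs it forever on port 10000, the port used by the other pong servers.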
[...] a minimalist framework in Python! You can read more about it in their main website. Also check this article which compares the asynchronous servers in python. Tornado walks its [...]
[...] Asynchronous Servers in Python [...]
Your benchmarks are tainted because you are not using the same backlog value in the listen call. Likewise, some are simply using the default value provided by the framework. As an example, Twisted is using 5000 while gevent is using 500. Who knows what is being used for the other frameworks.
This deviation in value has implications for all of your tests, but especially so for the error and connection rate tests. Please standardize on a single listen value and re-run your tests.
Ignoring the above error, your summary of async/network/event frameworks available for Python is very interesting and informative.
[...] over a websocket connection. This requirement of websocket connections is the reason why I benchmarked various event-driven servers about a year ago. The benchmark results showed that the servers which [...]
[...] of the project. Intrigued, I also found this benchmark and this one. In them, Tornado performed quite well. [...]