Asynchronous Servers in Python
Nicholas Piël | December 22, 2009
A lot has already been written about the C10K problem, and it is well known that the only viable option to handle LOTS of concurrent connections is to handle them asynchronously. This also means that for massively concurrent problems, such as lots of parallel Comet connections, the GIL in Python is a non-issue, because we handle the concurrent connections in a single thread.
In this post I am going to look at a selection of asynchronous servers implemented in Python.
Asynchronous Server Specs
Since Python is really rich with (asynchronous) frameworks, I collected a few and looked at the following features:
- What license does the framework have?
- Does it provide documentation?
- Does the documentation contain examples?
- Is it used in production somewhere?
- Does it have some sort of community (mailing list, IRC, etc.)?
- Is there any recent activity?
- Does it have a blog (from the owner)?
- Does it have a Twitter account?
- Where can I find the repository?
- Does it have a thread pool?
- Does it provide access to a TCP socket?
- Does it have any Comet features?
- Is it using epoll?
- What kind of server is it (greenlets, callbacks, generators, etc.)?
This gave me the following table.
This is quite a list, and I probably still missed a few. The main reason for using a framework instead of implementing something yourself is the hope of accelerating your own development process by standing on the shoulders of other developers. I therefore think it is important that there is documentation, some sort of developer community (a mailing list, for example) and that the project is still active. If we take this as a requirement, we are left with the following solutions:
- Orbited / Twisted (callbacks)
- Tornado (async)
- Dieselweb (generator)
- Eventlet (greenlet)
- Concurrence (stackless)
- Circuits (async)
- Gevent (greenlet)
- Cogen (generator)
To quickly summarize this list: Twisted has been the de facto standard for asynchronous programming in Python. It has an immense community and a wealth of tools, protocols and features. It has grown big, and some say it has become too complex, which is also one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of its asynchronous server, Tornado, which also uses callbacks, and recent benchmarks show that it outperforms Twisted.
A commonly heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach (IMHO) is to use light-weight threads. This can be achieved either with a different Python implementation, Stackless Python (which Concurrence uses), or with Greenlet, an extension module for regular CPython (which Eventlet and Gevent use). Another approach is to simulate these light-weight threads with Python generators, as Dieselweb and Cogen do.
This should already show that while all these frameworks provide asynchronous concurrency, each does so in its own way. I invite you to look at these frameworks, as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL, Eventlet has a neat thread pool, Tornado can pre-fork over CPUs, Gevent offloads HTTP header parsing and DNS lookups to libevent, Cogen has sendfile support, and Twisted probably already has a factory doing exactly what you are planning to do next.
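To make the generator idea concrete, here is a minimal sketch (Python 3, standard library only, not taken from any of the frameworks) of generators acting as light-weight threads: each `yield` hands control back to a round-robin scheduler, which then resumes the next task. The names `task` and `run` are hypothetical. Real generator-based frameworks such as Cogen and Dieselweb additionally park a generator until its socket is ready (via epoll or similar), which this sketch deliberately leaves out.

```python
from collections import deque


def task(name, steps, log):
    # A "light-weight thread": each yield gives control back to the scheduler.
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def run(tasks):
    # Round-robin scheduler: resume each generator in turn until all finish.
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it from the queue
        ready.append(current)


log = []
run([task("a", 3, log), task("b", 2, log)])
print(log)
```

The two tasks interleave even though everything runs in a single thread, which is the same property the generator-based servers rely on.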
The Ping Pong Benchmark
In this benchmark I am going to focus on how well each framework performs at listening on a socket and writing to incoming connections. The client pings the socket by opening it; the server responds with a 'Pong!' and closes the socket. This should be really simple, but it is a pain to build something that does this in an asynchronous, non-blocking way from scratch, and that is exactly why we are looking at these frameworks. It is all about making our lives easier.
OK, for this benchmark I am going to use httperf, a high-performance tool that understands the HTTP protocol. If we want httperf to play along in our Ping-Pong benchmark, we have to make it understand the 'Pong!' response. We can do this by mimicking an HTTP server and having our server respond with:
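To show what "from scratch" involves, here is a minimal sketch of a non-blocking Ping-Pong server built directly on Python 3's standard `selectors` module, whose `DefaultSelector` uses epoll on Linux. It is an illustration under stated assumptions, not one of the benchmarked servers: the helper names (`make_listener`, `serve`, `ping`, `demo`) are hypothetical, the `max_conns` argument exists only so the demo can stop, and it already sends the HTTP-wrapped 'Pong!' reply explained below.

```python
import selectors
import socket
import threading

# The HTTP-wrapped pong that the benchmarked servers send.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"


def make_listener(host="127.0.0.1", port=0, backlog=500):
    """Create a non-blocking listening socket (port 0 = let the OS choose)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    sock.setblocking(False)
    return sock


def serve(listener, max_conns=None):
    """Single-threaded event loop; returns after max_conns replies (None = forever)."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    sel.register(listener, selectors.EVENT_READ, data=None)
    served = 0
    while max_conns is None or served < max_conns:
        for key, _events in sel.select():
            if key.data is None:
                # Listening socket is readable: accept every pending connection.
                while True:
                    try:
                        conn, _addr = listener.accept()
                    except BlockingIOError:
                        break
                    conn.setblocking(False)
                    # data holds the bytes still to be sent on this connection.
                    sel.register(conn, selectors.EVENT_WRITE, data=[RESPONSE])
            else:
                # Client socket is writable: send as much of the pong as fits.
                conn, pending = key.fileobj, key.data
                try:
                    pending[0] = pending[0][conn.send(pending[0]):]
                except BlockingIOError:
                    continue  # not writable after all; wait for the next event
                except OSError:
                    pending[0] = b""  # peer went away; drop the connection
                if not pending[0]:
                    sel.unregister(conn)
                    conn.close()
                    served += 1
    sel.close()


def ping(host, port):
    """Blocking test client: connect, read until the server closes, return the bytes."""
    with socket.create_connection((host, port), timeout=5) as conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the socket: the pong is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def demo(n=3):
    """Run the server in a background thread and ping it n times."""
    listener = make_listener()
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve, args=(listener, n))
    worker.start()
    replies = [ping("127.0.0.1", port) for _ in range(n)]
    worker.join(timeout=5)
    listener.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Even this stripped-down version has to track partially sent buffers, registration and cleanup by hand; the frameworks below hide exactly this bookkeeping.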
```
HTTP/1.0 200 OK
Content-Length: 5

Pong!
```
instead of just 'Pong!'. Also, since most default server configurations are not set up to handle a large number of concurrent requests, we need to make a few adjustments:
- Raise the per-process file limit by compiling httperf after some adjustments.
- Raise the per-user file limit: set 'ulimit -n 10000' on both server and client.
- Raise the kernel limit on file handles: 'echo "128000" > /proc/sys/fs/file-max'.
- Increase the network device input backlog: 'sysctl -w net.core.netdev_max_backlog=2500'.
- Raise the maximum listen backlog: 'sysctl -w net.core.somaxconn=250000'.
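The per-user file limit can also be raised from inside the server process, up to the hard limit, with Python's standard `resource` module (Unix only). This is a minimal sketch. The function name `raise_nofile_limit` and the default target of 10000 (mirroring the `ulimit -n` value above) are illustrative assumptions, and the kernel-wide `sysctl` settings still have to be made separately.

```python
import resource


def raise_nofile_limit(target=10000):
    """Raise this process's soft open-file limit toward `target`.

    An unprivileged process may only raise its soft limit up to the hard
    limit; raising the hard limit itself needs elevated privileges.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        soft = wanted
    return soft


if __name__ == "__main__":
    print("open-file soft limit:", raise_nofile_limit())
```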
With these settings, my Debian Lenny system was ready to hammer the different servers at rates far beyond the capacity of the frameworks. I used the following command:
httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
I then increased the rate in steps of 100, from 400 up to 9000 requests per second, sending a total of 40,000 requests at each step.
What follows is the implementation of the server side in each framework. It should show the different approaches the frameworks take.
Gentlemen, start your reactor! First up is Twisted.
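The sweep can be scripted. Below is a minimal sketch in Python 3 that builds one httperf invocation per rate, using exactly the flags from the command above. The function names are hypothetical, the port has to be set to whichever server is under test, and parsing httperf's output is left out.

```python
import shlex
import subprocess


def httperf_commands(port, start=400, stop=9000, step=100, num_conns=40000):
    """Build one httperf argument list per request rate in the sweep."""
    commands = []
    for rate in range(start, stop + 1, step):  # stop is inclusive
        commands.append([
            "httperf", "--hog", "--timeout=60", "--client=0/1",
            "--server=localhost", f"--port={port}", "--uri=/",
            f"--rate={rate}", "--send-buffer=4096", "--recv-buffer=16384",
            f"--num-conns={num_conns}", "--num-calls=1",
        ])
    return commands


def run_sweep(port, dry_run=True):
    """Print (dry run) or execute each httperf invocation in turn."""
    for cmd in httperf_commands(port):
        if dry_run:
            print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        else:
            subprocess.run(cmd, check=True)  # stop the sweep if httperf fails


if __name__ == "__main__":
    run_sweep(port=8000)  # e.g. the Twisted example's port; dry run by default
```

Building argument lists instead of shell strings avoids quoting problems when the commands are actually executed.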
```python
# Install the epoll reactor before anything imports the default reactor
from twisted.internet import epollreactor
epollreactor.install()

from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

class Pong(Protocol):
    def connectionMade(self):
        # Reply as soon as the client connects, then hang up
        self.transport.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
        self.transport.loseConnection()

# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()
```
Tornado does not hide the raw socket interface, which makes this example lengthier than the others.
```python
import errno
import functools
import socket
from tornado import ioloop, iostream

def connection_ready(sock, fd, events):
    # Accept every pending connection until the socket would block
    while True:
        try:
            connection, address = sock.accept()
        except socket.error as e:
            # The errno lives in e.args[0], not in the exception object itself
            if e.args[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        # Close the stream once the response has been written
        stream.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n", stream.close)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("", 8010))
    sock.listen(5000)

    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print("exited cleanly")
```
The Dieselweb example is beautifully small, but I do not really enjoy the generator approach, which sprinkles 'yield' all over the place.
```python
from diesel import Application, Service

def server_pong(addr):
    yield "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

app = Application()
app.add_service(Service(server_pong, 8020))
app.run()
```
I think the Circuits code is the most beautiful of them all; it is very elegant.
```python
from circuits.net.sockets import TCPServer

class PongServer(TCPServer):
    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)

PongServer(('localhost', 8050)).run()
```
Eventlet uses a greenlet approach.
```python
from eventlet import api

def handle_socket(sock):
    sock.makefile('w').write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)
```
Gevent is presented as a rewrite of Eventlet focusing on performance.
```python
import gevent
from gevent import socket

def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()

server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)
```
Concurrence uses the tasklet approach and can run under both Greenlet and Stackless Python. In this benchmark there was no real performance difference between the two engines.
```python
from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket

def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    writer.flush()
    stream.close()

def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()
    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)

if __name__ == '__main__':
    dispatch(server)
```
Cogen uses the generator approach as well.
```python
import sys
from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine

@coroutine
def server():
    srv = sockets.Socket()
    # Port from the first command-line argument, defaulting to 1200
    adr = ('0.0.0.0', len(sys.argv) > 1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write("HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n")
        yield fh.flush()
        conn.close()

m = schedulers.Scheduler()
m.add(server)
m.run()
```
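For readers on a modern Python, the same Ping-Pong server can be written with the standard library's `asyncio` (Python 3.7+). It did not exist when this benchmark was run and is not part of the results below. This is a minimal sketch: `handle_ping`, `ping` and `main` are hypothetical names, and the demo binds to a random local port instead of a fixed one.

```python
import asyncio

# The HTTP-wrapped pong that the benchmarked servers send.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"


async def handle_ping(reader, writer):
    # Opening the connection is the ping: reply at once and close.
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ping(host, port):
    # Client side: connect and read until the server closes the socket.
    reader, writer = await asyncio.open_connection(host, port)
    data = await reader.read()  # read() with no size argument reads until EOF
    writer.close()
    await writer.wait_closed()
    return data


async def main(n_clients=100):
    # Port 0 lets the OS pick a free port for this self-contained demo.
    server = await asyncio.start_server(handle_ping, "127.0.0.1", 0, backlog=500)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All pings run concurrently in one thread on one event loop.
        return await asyncio.gather(*(ping("127.0.0.1", port) for _ in range(n_clients)))


if __name__ == "__main__":
    replies = asyncio.run(main())
    print(len(replies), all(r == RESPONSE for r in replies))
```

Like Tornado and Twisted, this is callback-driven under the hood, but the `async`/`await` syntax lets the handler read like the greenlet-style examples.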
The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer, Tornado, with 7400 requests per second, and the worst, Circuits, with 1400 requests per second (Circuits doesn't use epoll). This connection rate was sustained for at least 40,000 requests. We can see that when the hammering continues beyond the rate a server can handle, its performance drops. This is caused by connection errors or timeouts.
The second graph shows the response time. It is clearly visible that once the maximum connection rate has been reached, the overall response time starts to increase.
The last graph shows the number of errors, i.e. requests for which httperf did not detect a 200 response. We can see a correlation between the performance of a server and the errors it returns at a given request rate: the better-performing servers return fewer errors overall. There is, however, one exception. Cogen was able to answer ALL its requests successfully, no matter how hard it was hammered, so it does not appear in this graph. This is interesting: at 9000 requests per second it was still able to answer every request. However, the average connection time (from socket open until socket close) was about 7 seconds, meaning that Cogen was serving about 28000 concurrent connections at somewhat reduced performance, but without dropping them.
This post should make it clear that Python has a rich set of options for asynchronous programming. All tested frameworks show great performance; even Circuits' result of 1300 requests per second isn't too bad. Tornado really blew me away with its performance at 7400 requests per second. But if I had to choose a favorite, I would probably go with Gevent; I am really digging its greenlet style.
The clean Greenlet / Stackless style is really cool, especially since Stackless Python nowadays keeps up with CPython. There was some talk on a mailing list about Gevent running on Stackless. The Concurrence framework already runs on Stackless and can thus already be a great option if you are looking for specific features of Stackless Python, such as tasklet pickling.
I want to make clear that this test only shows how these frameworks perform at a relatively simple task. The results may change when more work is going on in the background. However, I feel that this benchmark is a great indicator of how each framework handles a socket connection.
In the coming days I plan to investigate this some more. I will also check out how these Python frameworks stack up against their equivalents in other languages, e.g. Ape, CometD and NodeJS. Stay tuned!