Phillip Pearson - Second p0st

tech notes and web hackery from the guy that brought you bzero, python community server, the blogging ecosystem, the new zealand coffee review and the internet topic exchange

2008-8-4

Queuing with Thrift

Thrift is awesome. I just wrote a toy in-memory queue server with it in Python, and right off the bat it happily handles ~2200 requests/sec on my laptop (running inside a Colinux VM, which is limited to only using one CPU core).

I've thrown the code up on github, if anyone's interested: simple-thrift-queue.
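
For a rough idea of what the core of a toy in-memory queue server can look like, here is a minimal sketch in Python 3. It is not the code from simple-thrift-queue: the class name InMemoryQueueHandler and its put/get methods are hypothetical, and the Thrift wiring (IDL file, generated processor, server loop) is left out because it depends on generated code not shown here.

```python
import threading
from collections import deque


class InMemoryQueueHandler:
    """Hypothetical handler: named FIFO queues held in process memory."""

    def __init__(self):
        self._queues = {}              # queue name -> deque of messages
        self._lock = threading.Lock()  # guards access from concurrent callers

    def put(self, queue_name, message):
        """Append a message to the named queue, creating it if needed."""
        with self._lock:
            self._queues.setdefault(queue_name, deque()).append(message)

    def get(self, queue_name):
        """Return the oldest message, or None if the queue is empty or unknown."""
        with self._lock:
            q = self._queues.get(queue_name)
            return q.popleft() if q else None
```

Nothing here is persisted, so a restart loses every queued message, which is fine for a toy.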


Initial queueing experiments

I've been playing with RabbitMQ over the last couple of days. I've found that the reports of AMQP being a pain in the ass to get started with are right on the mark.

py-amqplib was quite good, although when I use basic_consume and a callback, it seems to use a lot of CPU and I can only handle about 170 msgs/sec (which remains fairly constant as I add processes - i.e. two processes can do ~85 msgs/sec each). With basic_get it never gets far above 20 msgs/sec but doesn't use much CPU - I guess it spends most of its time waiting for data.

I gave the QPid Python library a go after that and couldn't get basic_consume to work, but basic_get was all right. Forking off 20 worker processes and getting them to poll with basic_get, it would handle about 300 msgs/sec easily (with the generator and RabbitMQ, on the same machine, each using 20-30% CPU). Each worker sat at around 3% CPU. Trying again with py-amqplib, the worker processes used about 9% CPU each (with the same throughput).

So if you're looking for higher performance but don't mind battling through the complete lack of documentation, QPid appears to be the way to go. Here's some working receiver code:

import qpid, time

# Connect to the broker, loading the AMQP 0-8 spec from a local path.
conn = qpid.client.Client('localhost', 5672, qpid.spec.load('qpid/specs/amqp.0-8.xml'), vhost='/')
print conn.start({"LOGIN": "your login name here", "PASSWORD": "your password here"})
ch = conn.channel(1)
print ch.channel_open()
# The result of access_request is not used below; a ticket of 0 is passed instead.
r = ch.access_request('/data', active=True, read=True, write=True)
ticket = 0
# Declare a non-durable direct exchange and queue, then bind them together.
ch.exchange_declare(ticket, "tempexch", "direct", durable=False, auto_delete=False)
ch.queue_declare(queue="tempqueue", durable=False, exclusive=False, auto_delete=False)
ch.queue_bind(queue="tempqueue", exchange="tempexch", routing_key="tempqueue")

# Poll for messages, sleeping for a second whenever the queue is empty.
while 1:
    msg = ch.basic_get(queue="tempqueue")
    c = msg.content
    if c is not None:
        # handle message now
        print c.body
        ch.basic_ack(msg.delivery_tag)
    else:
        time.sleep(1)
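
To compare clients on messages per second, it helps to pull the polling loop out into something that counts what it handles. The following is a minimal Python 3 sketch of that idea, not part of the original test code: poll_worker, fetch, handle, idle_sleep and max_idle_polls are all made-up names, and fetch stands in for whatever wraps basic_get in a given client library.

```python
import time
from collections import deque


def poll_worker(fetch, handle, idle_sleep=1.0, max_idle_polls=None):
    """Poll fetch() until it has returned None max_idle_polls times in a row.

    With max_idle_polls=None the loop never exits on its own.
    Returns (messages_handled, elapsed_seconds).
    """
    handled = 0
    idle = 0
    start = time.time()
    while max_idle_polls is None or idle < max_idle_polls:
        msg = fetch()
        if msg is None:
            idle += 1
            time.sleep(idle_sleep)  # queue empty: back off instead of spinning
        else:
            idle = 0
            handle(msg)
            handled += 1
    return handled, time.time() - start


if __name__ == "__main__":
    # Stand-in for a broker: a local deque drained by the worker.
    pending = deque(range(1000))
    count, elapsed = poll_worker(
        lambda: pending.popleft() if pending else None,
        lambda m: None,
        idle_sleep=0.01,
        max_idle_polls=1,
    )
    print("handled %d messages in %.3f s" % (count, elapsed))
```

The elapsed time includes the final idle sleep, so a rate computed from it slightly understates the throughput on short runs.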


That said, 300 messages per second is not very fast; I think I've read somewhere that RabbitMQ performance for a non-durable queue (with client and server in C) should be somewhere in the tens of thousands of messages per second per CPU (~400k per second on a 16-core box). I assume I'm doing something wrong somewhere...

Update: Ben Hood contacted me and pointed to RabbitMQ's SimplePerformanceTests page, which apparently shows typical performance several orders of magnitude higher than my results here. I'll give these tests a go and hopefully see what's wrong with my own attempt.

Building Thrift on Debian

Having a go at building Facebook's Thrift cross-language serializer/deserializer engine.

Be aware that a successful run of the configure script doesn't mean that it will build for sure -- in my case I didn't have pkg-config or byacc installed. Not having pkg-config results in a syntax error about MONO in configure, and not having byacc results in a compiler error about yywrap. These instructions work for me on Debian.
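
Since configure won't catch these two missing tools, a quick check before building can save a confusing error later. This is a small Python 3 sketch, assuming the executables are named pkg-config and byacc like the packages; missing_tools is a made-up helper name.

```python
import shutil


def missing_tools(tools=("pkg-config", "byacc")):
    """Return the names from `tools` that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before running configure: " + ", ".join(missing))
    else:
        print("pkg-config and byacc found")
```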
