Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2008-1-13

What PHP gets right

Ian Bicking does a stunning job of explaining something I've been pondering for a while. It's especially relevant right now, after the conversation DreamHost started about Rails deployment on shared hosting.

His main point is that PHP has a CGI-like model: everything gets loaded on every request. Amazingly, this turns out to be fairly practical, because all the core stuff is written in C, so you can get pretty decent performance out of a system like this. mod_php keeps the core of the language and all C extensions in memory, so you don't have to keep reloading the interpreter like you do for 'real' CGI, but (at least without APC/eAccelerator installed) it doesn't keep any compiled PHP code or data around between requests.
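
To make this concrete, here's a toy CGI-style script in Python (the handler and its output are mine, purely for illustration): run under plain CGI, the whole file is re-executed on every request, which is the model PHP makes bearable by keeping the interpreter and its C extensions resident.

```python
#!/usr/bin/env python
# CGI-style handler: like a PHP page, this whole file is loaded,
# compiled and thrown away on every single request.
import os

counter = 0  # module-level "state" -- rebuilt from scratch each hit

def handle():
    global counter
    counter += 1  # always prints 1: nothing survives between requests
    print("Content-Type: text/plain")
    print()
    print("hit %d from %s" % (counter, os.environ.get("REMOTE_ADDR", "?")))

if __name__ == "__main__":
    handle()
```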

To get decent performance from web apps in any other language, you usually end up needing to put something together that preloads the app and serves lots of requests -- mod_perl for Perl, an SCGI server for Python (Django, TurboGears), Mongrel for Ruby, Tomcat/Jetty/Resin for Java. This works well in the end but is limiting for the shared host, as the process that preloads everything generally can't be shared between customers, or they'll step on each other's toes. Perl/Python/Ruby people love VPSes for this reason: you get a fairly isolated environment in which you can install and run what you like. With PHP, by contrast, you can throw a bunch of apps together on one Apache install and generally it'll all work out OK.
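
For contrast, here's a minimal sketch of the preloaded model using Python's standard-library wsgiref server; load_app is a made-up stand-in for the expensive framework import that a long-running process pays once, not per request.

```python
from wsgiref.simple_server import make_server

def load_app():
    # Stand-in for importing Django/TurboGears and all your models:
    # slow, but it happens once per process instead of once per request.
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello from a preloaded app\n"]
    return app

if __name__ == "__main__":
    application = load_app()  # the one-time startup cost
    make_server("127.0.0.1", 8000, application).serve_forever()
```

That persistent process is exactly the thing a shared host can't cheaply hand out to every customer.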

This isn't to say that PHP is necessarily *better* than any other language, but it's certainly easier to deploy. I used to do a lot of web stuff in Python, and I'm having a lot of fun working with Rails these days, but for pretty much anything experimental, I use PHP, because I can just throw a few scripts up on my web server and forget about them, as opposed to having to set something up to keep a separate process alive on the server.

Another interesting thought: ASP.NET seems to have the deployment/process model sorted out quite nicely as well. I'm not that interested in actually developing with it, but Microsoft has obviously put some thought into making it usable for shared hosts, by enabling a decent level of isolation between customers.

How could one do something like this for a system like Rails or Django? Maybe nginx with some magic to autostart Mongrels as required? If behind Apache, maybe something like mod_fastcgi but with a separate pool handler, so you could keep a number of per-customer pools of Ruby processes around? It feels pretty feasible but I imagine that the technology isn't there yet. The best thing I can think of off the top of my head is something like this:

- Perlbal in front, with a manager process that reconfigures it as appropriate when new Mongrels come online. This is feasible as Perlbal is reconfigurable without a restart.

- A UNIX user for each customer, with a very lightweight (read: doesn't use much memory) manager process that starts and stops Mongrels as required by load (i.e. if Perlbal starts to build up a backlog of requests for one customer's app, it starts more, and if the pool isn't seeing much activity, it kills them off) and sends messages to the Perlbal manager as appropriate.

This doesn't feel like something that would be *too* painful to prototype, if you do all the manager processes in Python or Ruby to start with.
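
To illustrate, a first cut at the per-customer manager might look something like this in Python. Everything concrete here is an assumption: the port range, the backlog probe and tell_perlbal are placeholders (a real version would speak to Perlbal's management socket), though mongrel_rails is invoked with its usual start flags.

```python
import subprocess
import time

MIN_POOL, MAX_POOL = 1, 4
APP_DIR = "/home/customer/app"  # hypothetical path to this customer's Rails app
mongrels = {}                   # port -> subprocess.Popen

def tell_perlbal(action, port):
    # Placeholder: a real manager would talk to Perlbal's management
    # port here, adding or removing the backend without a restart.
    print("perlbal: %s 127.0.0.1:%d" % (action, port))

def start_mongrel(port):
    mongrels[port] = subprocess.Popen(
        ["mongrel_rails", "start", "-p", str(port), "-c", APP_DIR])
    tell_perlbal("add", port)

def stop_mongrel(port):
    mongrels.pop(port).terminate()
    tell_perlbal("remove", port)

def backlog():
    # Placeholder: ask Perlbal how many requests it's queueing
    # for this customer's pool.
    return 0

def main():
    port = 9000
    start_mongrel(port)
    while True:
        if backlog() > 5 and len(mongrels) < MAX_POOL:
            port += 1
            start_mongrel(port)
        elif backlog() == 0 and len(mongrels) > MIN_POOL:
            stop_mongrel(max(mongrels))  # retire the newest Mongrel first
        time.sleep(10)

if __name__ == "__main__":
    main()
```

Run one of these under each customer's UNIX account and the isolation comes for free from the OS.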