Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2003-2-3

E-mail experiment

I haven't had a lot of time to reply to e-mail recently, and a few messages about my web projects have been waiting in my drafts folder for some time. As a bit of an experiment, I'm going to take a leaf from McCusker's book and reply to the fairly public-sounding messages out here on this weblog.

If I've quoted you here and you'd like me to paraphrase you rather than cutting and pasting verbatim, drop me a line and I'll sort it out. I'll do my best to only publish things I don't expect people to mind me publishing.

Today we have a message from Michael Fagan:

I was thinking that perhaps most ITE channels should be multilingual, with checkboxes for what language posts you want to view. That way there wouldn't need to be several channels for the same topic.

Of course, then either the user would have to input what language their post was in, or it would have to be detected. Automatic detection would be better, as I don't think TrackBack pings could include more information.

That's an interesting idea. I'm surprised how many non-English channels have appeared. The French and Spanish bloggers are out in force!

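I wonder what would be the best way to implement it. As you say, TrackBack pings don't include a lot of information. I wonder if peeking at the character set of the HTTP request would help, although English, French and Spanish can all be sent as ISO-8859-1, so it probably wouldn't tell those three apart.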

On the other hand, I bet there has been a bit of research into identifying the language of bits of text. A simple (and probably quite effective) method would be to count how many words in the text appear in an English dictionary, then in a French one, then a Spanish one, and so on, and pick the language with the most matches. That wouldn't be hard to code, either.
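
Here's a rough sketch of that word-counting idea in Python. The tiny word lists and the guess_language name are made up for illustration; a real version would load proper dictionaries, and would probably need to handle ties and very short posts more carefully.

```python
import re

# Hypothetical, deliberately tiny word lists. A real version would load
# full dictionaries (or at least lists of common words) per language.
WORDLISTS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it", "with", "for"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une", "que", "dans"},
    "spanish": {"el", "la", "los", "y", "es", "de", "un", "una", "que", "en"},
}

def guess_language(text):
    """Return the language whose word list matches the most words, or None."""
    # Split into runs of letters, lowercased; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in WORDLISTS.items()
    }
    best = max(scores, key=scores.get)
    # If nothing matched at all, admit we don't know.
    return best if scores[best] > 0 else None
```

Calling guess_language() on the excerpt of an incoming ping would give a language tag to store alongside the post, which is all the checkbox filter would need.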

Update: Google leads me to TextCat, a GPL tool that can recognise text in sixty-odd languages. Passing a few sentences in English, French and Spanish through the online demo yielded good results for me. So I guess that answers Michael's question. Now, to find the time to hack it in there ... :-)

Tuning InterBase

While most things in InterBase (well, Firebird) are very quick, it looks like opening database connections is painfully slow. Last night I got a ridiculous speedup in my web app (from 1 hit/sec to 100 hits/sec) by caching the database connection between hits. Luckily, mod_python makes that very easy - any global variables that are assigned while importing modules are remembered between requests, as far as I can tell separately in each Apache child process, so each child ends up with its own cached connection.

Instead of this:

class Database:
    pass

def connect():
    # Opens a brand new connection on every hit (slow).
    return Database()


do this:

class Database:
    pass

# Created once, when the module is first imported, then reused.
_data = Database()

def connect():
    return _data
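
A variation on the same trick, in case you'd rather not open the connection while the module is still being imported: open it the first time connect() is called and keep it from then on. This is only a sketch. Database is still a stand-in for the real connection class, the instances counter exists purely to show how many times it gets created, and the lock is an assumption on my part that more than one thread might call connect() at once.

```python
import threading

class Database:
    """Stand-in for a real, slow-to-open database connection."""
    instances = 0  # counts how many connections have been opened

    def __init__(self):
        Database.instances += 1

_data = None              # cached connection, filled in on first use
_lock = threading.Lock()  # guards the first-time creation

def connect():
    """Return the cached connection, opening it on the first call only."""
    global _data
    with _lock:
        if _data is None:
            _data = Database()
        return _data
```

Whether one connection can safely be shared between threads depends on the database driver, so check that before relying on this anywhere threaded.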


All things going well, we should be seeing a public installation of this app sometime in the next few weeks. Stay tuned.