Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2005-1-8

Something I'd like to see: likesearchd

This would be really handy: a server, written in C or C++, that does really quick substring searches over lots of data. MySQL seems to be very slow at this, probably because a LIKE pattern with a leading wildcard can't use an index, so every row gets scanned, and the data is all spread out over many memory pages / disk blocks:

SELECT some columns FROM sometable WHERE foo LIKE '%bar%';

For a table with 100K rows, that can take many seconds. But if I dump out the foo column with SELECT ... INTO OUTFILE and write a C program that iterates over all the data and checks for 'bar' using strstr(), the program can do the same search many times per second.
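To make the strstr() idea concrete, here is a minimal sketch of the scanning loop. It assumes the dumped column has been read into memory as one NUL-terminated buffer with one value per line; the function name like_search and that layout are my own assumptions for illustration, not an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scan a NUL-terminated buffer holding one column value per line
 * (a hypothetical dump layout) and record the 0-based line numbers
 * of the lines that contain needle.
 * Returns the total number of matching lines; at most max_hits of
 * them are stored in hits. */
size_t like_search(const char *data, const char *needle,
                   size_t *hits, size_t max_hits)
{
    /* An empty needle or one spanning lines makes no sense here. */
    assert(needle[0] != '\0' && strchr(needle, '\n') == NULL);

    size_t found = 0, line = 0;
    const char *pos = data;
    const char *match;

    while ((match = strstr(pos, needle)) != NULL) {
        /* Count the newlines skipped to reach this match. */
        for (const char *p = pos; p < match; p++)
            if (*p == '\n')
                line++;

        if (found < max_hits)
            hits[found] = line;
        found++;

        /* Jump to the end of this line so each row is reported once. */
        const char *eol = strchr(match, '\n');
        if (eol == NULL)
            break;
        pos = eol + 1;
        line++;
    }
    return found;
}
```

One caveat: strstr() is case-sensitive, whereas LIKE is typically case-insensitive under MySQL's default collations, so the two won't always return the same rows. The line numbers would also still need mapping back to row ids, e.g. by dumping the id column in the same order.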

Incredibly, MySQL even seems slow at something like this, on a table with a key set on (lastname, firstname):

SELECT lastname,firstname FROM people ORDER BY lastname,firstname LIMIT 10000,50;

That query can take over a second, so it would be much faster to get the C program to find the ids of the 50 rows that follow the first 10000 and run this one 50 times:

SELECT lastname,firstname FROM people WHERE id=(id received from C program);
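The C side of that could look roughly like the sketch below: keep (lastname, firstname, id) in an array sorted once up front, then answer "offset 10000, limit 50" with a plain array slice. The struct and function names are hypothetical, and strcmp() ordering is only an approximation of whatever collation MySQL uses for ORDER BY.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of the columns needed for paging. */
struct person {
    const char *lastname;
    const char *firstname;
    long id;
};

/* Order by lastname, then firstname, using plain byte comparison. */
static int cmp_person(const void *a, const void *b)
{
    const struct person *pa = a, *pb = b;
    int c = strcmp(pa->lastname, pb->lastname);
    return c != 0 ? c : strcmp(pa->firstname, pb->firstname);
}

/* Sort once after loading; later page lookups are then just indexing. */
void sort_people(struct person *people, size_t n)
{
    qsort(people, n, sizeof *people, cmp_person);
}

/* Copy the ids for the page starting at offset (0-based) into ids_out.
 * Returns how many ids were written, which is less than limit when the
 * page runs off the end of the array. */
size_t page_ids(const struct person *sorted, size_t n,
                size_t offset, size_t limit, long *ids_out)
{
    size_t count = 0;
    for (size_t i = offset; i < n && count < limit; i++)
        ids_out[count++] = sorted[i].id;
    return count;
}
```

Calling page_ids(people, n, 10000, 50, ids) would give the ids to plug into the query above. The array has to be rebuilt or patched whenever the table changes, which is part of why this feels like the wrong place to solve it.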

This all feels very wrong to me: there should be some way to do this better from inside MySQL. But I can't find it. Anyone know of a better way?