Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).


WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics

Hmm, something that would be cool to go to, if I can scrape together the cash and annual leave days: WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, Chiba, Japan.

Do I still do enough interesting stuff in the blogosphere to have something to write about, though?

Blog spam sucks

It's kind of depressing to see the volume of spam in the comments to Dave Sifry's post announcing today's weblog anti-spam summit. Sigh.

A related cry for help: if anyone's got a blacklist of annoying news spam sites like, or anything in the HarWester Network, or would like me to start a public one, please ping me or leave a comment. I'm just starting to wake up to the need to get rid of things like that from the Topic Exchange.

Weblog URL stemming

Reading Leonard Richardson's paper about his Ultra Gleeper recommendation engine, I notice that he's run into the problem of stemming weblog URLs.

I managed to write a reasonable stemmer back when I was running the Blogging Ecosystem; if I remember, I'll dig it out when I've got some free time and improve it to do a better job of matching more modern[1] URLs. It's a function that would be handy to have in an open source library.

Update: I've done this and put up a first cut of the code; there will be more posts on this blog under the 'urlstemmer' topic as things proceed.
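
For anyone wondering what "stemming" means here: the idea is to take a link to an individual post and reduce it to the URL of the weblog it belongs to, so that links to different posts on the same blog can be counted together. The following is only a rough sketch of that idea, not the actual urlstemmer code; the function name stem_weblog_url and the two patterns are assumptions made up for illustration, based only on the URL styles mentioned in the footnote below.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns, covering only the URL styles mentioned in this post.
# A real stemmer would need many more, plus special cases for hosted services.
_POST_PATTERNS = [
    # MT-style archive links: /archives/12345.html
    re.compile(r"/archives(/.*)?$"),
    # Dated links, with or without a post title: /2003/2/2.html, /2005/01/my-post/
    re.compile(r"/\d{4}/\d{1,2}(/.*)?$"),
]


def stem_weblog_url(url):
    """Reduce a post URL to a guess at its weblog's base URL.

    Returns a scheme-less string such as "example.com/blog/".
    """
    parts = urlsplit(url)
    path = parts.path

    # Cut the path at the first thing that looks like a per-post suffix.
    for pattern in _POST_PATTERNS:
        match = pattern.search(path)
        if match:
            path = path[:match.start()]
            break

    # Normalise the host so www.example.com and example.com compare equal.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    return host + path.rstrip("/") + "/"
```

With these assumptions, "http://www.example.com/blog/2003/2/2.html" and "http://example.com/blog/archives/12345.html" both stem to "example.com/blog/". Dropping the scheme and the leading "www." is a design choice for this sketch: it makes more links match, at the cost of merging the odd pair of sites that really are different.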


1. Back in 2002 and 2003, when the ecosystem was operating, people tended to use either simple MT-style archive links ("/archives/12345.html") or dated ones ("/2003/2/2.html"), whereas now it's quite popular to put your post title in the URL.

