Reading Leonard Richardson's paper about his Ultra Gleeper recommendation engine, I notice that he's run into the problem of stemming weblog URLs.
I managed to write a reasonable stemmer back when I was running the Blogging Ecosystem. If I remember, when I've got some free time I'll dig it out and improve it so that it does a better job of matching more modern[1] URLs. It's a function that would be handy to have in an open source library.
Update: I've done this and put up a first cut of the code; there will be more posts on this blog under the 'urlstemmer' topic as things proceed.
----
1. Back in 2002 and 2003, when the ecosystem was operating, people tended to use either simple MT-style archive links ("/archives/12345.html") or dated ones ("/2003/2/2.html"), whereas now it's quite popular to put your post title in the URL.
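
To make those three URL shapes concrete, here is a minimal Python sketch of what a stemmer for them might look like. It is not the Blogging Ecosystem code. It assumes that "stemming" means reducing a post's permalink to the base URL of the weblog it belongs to, and the function name `stem_url` and the regular expressions are hypothetical, built only from the examples in the footnote.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical patterns, derived only from the URL shapes named in the footnote.
# Each one matches the post-specific tail of a permalink path.
_ARCHIVE_PATTERNS = [
    # MT-style archive link: /archives/12345.html
    re.compile(r"/archives/\d+\.html?$"),
    # Dated link: /2003/2/2.html
    re.compile(r"/\d{4}/\d{1,2}/\d{1,2}\.html?$"),
    # Post title in the URL: /2005/03/my-post-title/ (day segment optional)
    re.compile(r"/\d{4}/\d{1,2}(?:/\d{1,2})?/[^/]+/?$"),
]


def stem_url(url):
    """Reduce a post permalink to a guess at the weblog's base URL.

    URLs that match none of the patterns keep their path unchanged;
    the query string and fragment are always dropped.
    """
    parts = urlsplit(url)
    path = parts.path
    for pattern in _ARCHIVE_PATTERNS:
        # Replace the post-specific tail with a single slash.
        stemmed, count = pattern.subn("/", path)
        if count:
            path = stemmed
            break
    # Host names are case-insensitive, so normalise them to lower case.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


print(stem_url("http://example.com/archives/12345.html"))
print(stem_url("http://example.com/blog/2003/2/2.html"))
print(stem_url("http://Example.com/2005/03/my-post-title/"))
```

All three calls should collapse to the weblog's root (`http://example.com/` or `http://example.com/blog/`). A real stemmer would need many more cases than this, which is why title-style URLs make the problem harder than it was in 2002.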