Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2003-06-02

Lucene note: build your index all at once

It looks like Lucene doesn't like searching through indices that have been built in lots of little chunks. I get proper results if I build an index of 300 blog posts like this:

    - open index
    - for post in posts:
        - add post to index
    - close index

... but if I do it like this, it seems to stop indexing them after the first couple of hundred:

    - for post in posts:
        - open index
        - add post to index
        - close index

Update: It looks like the reason for this behaviour was that I was passing a true value as the create parameter to IndexWriter's constructor in the test. I'm seeing the same behaviour in my application code, though, which sets it to false. Odd.
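To make the effect of that create flag concrete, here is a toy model in Python. It is not Lucene's API: ToyIndex and ToyWriter are made-up stand-ins, and the only behaviour they copy is that opening a writer with create set to true starts from an empty index. It doesn't reproduce the exact symptom above (in this model only the last post survives), just why reopening with create=true inside the loop loses documents.

```python
class ToyIndex:
    """Hypothetical stand-in for an on-disk index; not Lucene's API."""

    def __init__(self):
        self.docs = []


class ToyWriter:
    """Hypothetical writer that mimics only the create-flag semantics."""

    def __init__(self, index, create):
        # create=True starts from an empty index, discarding what was there.
        if create:
            index.docs = []
        self.index = index

    def add(self, doc):
        self.index.docs.append(doc)

    def close(self):
        # Nothing to flush in this in-memory model.
        pass


def build_all_at_once(posts, create):
    # open index, add every post, close index
    index = ToyIndex()
    writer = ToyWriter(index, create)
    for post in posts:
        writer.add(post)
    writer.close()
    return index


def build_one_at_a_time(posts, create):
    # open index, add one post, close index, repeated per post
    index = ToyIndex()
    for post in posts:
        writer = ToyWriter(index, create)
        writer.add(post)
        writer.close()
    return index


posts = [f"post {i}" for i in range(300)]
print(len(build_all_at_once(posts, create=True).docs))     # 300
print(len(build_one_at_a_time(posts, create=True).docs))   # 1
print(len(build_one_at_a_time(posts, create=False).docs))  # 300
```

So in the chunked loop, the flag wants to be false for every open after the index first exists.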

Update 2 (2003-06-12): Finally figured out the application problem! It was a combination of two things: different analyzers being used for indexing and for search, and a broken custom query object that was throwing away some of the hits.
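A rough Python sketch of how an analyzer mismatch alone can lose hits. Everything here is invented for illustration (the two analyzer functions, the tiny dictionary index, the sample posts); real Lucene analyzers do much more, but the failure has the same shape: the terms produced at search time don't match the terms stored at index time.

```python
def lowercase_analyzer(text):
    # Hypothetical analyzer: split on whitespace and lowercase each term.
    return [term.lower() for term in text.split()]


def keep_case_analyzer(text):
    # Hypothetical analyzer: split on whitespace, leave case alone.
    return text.split()


def build_index(posts, analyzer):
    # Map each term to the set of post ids that contain it.
    index = {}
    for post_id, text in enumerate(posts):
        for term in analyzer(text):
            index.setdefault(term, set()).add(post_id)
    return index


def search(index, query, analyzer):
    # Return the ids of posts containing every term of the query.
    hits = None
    for term in analyzer(query):
        matches = index.get(term, set())
        hits = matches if hits is None else hits & matches
    return sorted(hits or [])


posts = ["Lucene indexing notes", "Coffee review"]
index = build_index(posts, lowercase_analyzer)

print(search(index, "Lucene", lowercase_analyzer))  # [0]
print(search(index, "Lucene", keep_case_analyzer))  # []
```

The second search comes back empty because the index only holds "lucene", while the query term stays "Lucene".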

Downhill: Finding paths between blogs

Downhill uses the Blogging Ecosystem data to find the shortest path between two given weblogs. Nice.
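I don't know how Downhill does it internally, but one common way to find a shortest path in link data is a breadth-first search. Here's a minimal Python sketch under that assumption; the link dictionary and the example.com-style blog names are made up, and links are treated as one-way (blog A linking to blog B).

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over a dict of blog -> list of linked blogs.

    Returns the list of blogs from start to goal, or None if no path exists.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # blog -> the blog we reached it from
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        for neighbour in links.get(blog, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = blog
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Made-up link data for illustration only.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example", "e.example"],
    "d.example": ["e.example"],
}

print(shortest_path(links, "a.example", "e.example"))
# ['a.example', 'c.example', 'e.example']
print(shortest_path(links, "e.example", "a.example"))
# None
```

Breadth-first search visits blogs in order of distance from the start, so the first time it reaches the goal it has used the fewest possible links.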