Phillip Pearson - Second p0st

tech notes and web hackery from the guy that brought you bzero, python community server, the blogging ecosystem, the new zealand coffee review and the internet topic exchange

2003-06-02

Lucene note: build your index all at once

It looks like Lucene doesn't like searching through indices that have been built in lots of little chunks. I get proper results if I build an index of 300 blog posts like this:

    - open index
    - for post in posts:
        - add post to index
    - close index

... but if I do it like this, it seems to stop indexing them after the first couple of hundred:

    - for post in posts:
        - open index
        - add post to index
        - close index

Update: It looks like the cause was that my test passed true as the create parameter to IndexWriter's constructor, which makes it start a new index each time it is opened rather than add to the existing one. I'm still seeing the same behaviour in my application code, though, which sets it to false. Odd.
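To make the effect of the create flag concrete, here is a toy in-memory model in Python. It is not Lucene's API: ToyIndexWriter, index_all_at_once and index_one_at_a_time are made-up names, and the only behaviour modelled is "create=True throws away whatever was in the index". In this toy only the last post survives the open-per-post loop, so it doesn't reproduce the exact symptoms I saw, just the general idea.

```python
class ToyIndexWriter:
    """Toy stand-in for an index writer. Hypothetical; NOT Lucene's API."""

    def __init__(self, store, create):
        self.store = store
        if create:
            # create=True starts a fresh index, discarding existing documents
            self.store.clear()

    def add(self, doc):
        self.store.append(doc)

    def close(self):
        pass  # nothing to flush in this in-memory toy


def index_all_at_once(posts):
    """Open once, add every post, close once."""
    store = []
    writer = ToyIndexWriter(store, create=True)
    for post in posts:
        writer.add(post)
    writer.close()
    return store


def index_one_at_a_time(posts, create):
    """Open, add one post, close; repeated for every post."""
    store = []
    for post in posts:
        writer = ToyIndexWriter(store, create=create)
        writer.add(post)
        writer.close()
    return store


posts = [f"post {i}" for i in range(300)]
print(len(index_all_at_once(posts)))                  # 300
print(len(index_one_at_a_time(posts, create=True)))   # 1
print(len(index_one_at_a_time(posts, create=False)))  # 300
```

With create set to false the per-post loop ends up with the same 300 documents as the single-pass build, which is why the flag was the first thing to rule out.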

Update 2 (2003-06-12): Finally figured out the application problem! It was a combination of two things: different analyzers being used for indexing and for searching, and a broken custom query object that was blowing away some of the hits.
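The analyzer half of that is easy to show with a toy inverted index in Python. Again, this is only a sketch and not Lucene: lowercase_analyzer, whitespace_analyzer, build_index and search are hypothetical names, and the two analyzers are stand-ins I picked to differ only in case handling.

```python
import re


def lowercase_analyzer(text):
    """Split on non-word characters and lowercase every term."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def whitespace_analyzer(text):
    """Split on whitespace only, leaving case untouched."""
    return text.split()


def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents containing every analyzed query term."""
    terms = analyzer(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


docs = ["Lucene indexing notes", "Coffee review: flat white"]
index = build_index(docs, lowercase_analyzer)

print(search(index, "Lucene", lowercase_analyzer))   # {0}
print(search(index, "Lucene", whitespace_analyzer))  # set()
```

The index only contains "lucene", so a query analyzed without lowercasing looks up "Lucene" and silently finds nothing. No error, just missing hits.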

Downhill: Finding paths between blogs

Downhill uses the Blogging Ecosystem data to find the shortest path between two given weblogs. Nice.
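I don't know how Downhill does it internally, but the obvious way to find a shortest path in a graph of blog links is a breadth-first search. A minimal Python sketch, assuming the links are available as a dict mapping each blog to the blogs it links to (the shortest_path function and the example.com-style blog names are made up for illustration):

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over a dict of blog -> list of linked blogs.

    Returns the list of blogs from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # blog -> the blog we reached it from
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        for neighbour in links.get(blog, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = blog
            if neighbour == goal:
                # Walk the back-pointers from goal to start, then reverse
                path = []
                node = goal
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical link data, treated as directed (who links to whom)
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example", "e.example"],
    "d.example": ["e.example"],
}

print(shortest_path(links, "a.example", "e.example"))
# ['a.example', 'c.example', 'e.example']
```

This treats links as one-way, so a path from A to B doesn't imply one from B to A. Whether Downhill follows links in one direction or both is something I haven't checked.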