Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).


Candidate for best-named-blog

Pod Bay Door.

Brings back memories ... Space Quest I, anyone?

Making stories easier in bzero

New command for bzero 0.13 (not released yet): 'bzero story (blogname) (storypath)'

Makes it easy to make a new story. Just as you might type:

    bzero post foobar

to make a new post to your blog foobar, now you can type:

    bzero story foobar bar_baz_boz

to make a new story that will be saved as http://.../2002/12/17/bar_baz_boz.html. (Almost, but not quite, TBL-style URLs.)
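
For anyone curious how a story path turns into that dated URL, here's a rough sketch. This is not bzero's actual code; the function name story_url_path is made up for illustration, and I'm assuming the date in the URL is simply the day the story is created.

```python
from datetime import date


def story_url_path(story_path: str, day: date) -> str:
    """Build the dated path for a story (hypothetical helper, not part of bzero).

    Assumes the layout shown above: year/month/day/storypath.html,
    with month and day zero-padded to two digits.
    """
    return f"{day.year:04d}/{day.month:02d}/{day.day:02d}/{story_path}.html"


if __name__ == "__main__":
    # The example from this post: a story created on 17 December 2002.
    print(story_url_path("bar_baz_boz", date(2002, 12, 17)))
```

The host part of the URL is left out on purpose, since it depends on where the blog lives.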

(That's how I did the 'search engine' story in the last post.)

Search engines

Thinking about search engines today. Does Python Community Server need one?

I just installed mnoGoSearch on FreeBSD and wrote up my notes on it (follow the link for a longer description).

Here are my notes on building a search engine right into PyCS: Building a search engine in Python.

More on this later if I go ahead with it.

Update: Thanks to Paul Hardwick for pointing out Opilio, a search engine for Frontier that is being ported to Python. Unfortunately there doesn't seem to be anything to download yet.

The Opi FAQ also mentions ht://dig, which I had completely forgotten about. It's a standalone search engine in C, like mnoGoSearch.

Robert Barksdale mentions ZCatalog, a search engine for Zope that also appears to run on its own. This sounds the most interesting so far.

A note: Some of my recent work on PyCS has been sponsored by RealWorld Systems, so they can get the features they want on a community server I'm hosting for them. This is what got me to think about this whole search engine issue. Thanks, guys!

Hmm ...

Time to make the HTML parsing for the ecosystem a little cleverer!

The ecosystem obviously hasn't reloaded recently. When it does, expect Goatee's stats page to become rather longer.

It's tricky to handle this sort of stuff. Basically you need an HTML parser that's robust enough to handle the crap HTML that most people put in their pages, without being lenient enough to allow links that aren't really there. That's something I haven't (yet) taken the time to do. It may become necessary sometime, though ...
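
To give an idea of what I mean, here's a minimal sketch using the HTML parser from Python's standard library. It is not what the ecosystem currently does. The names LinkCollector and extract_links are made up for this example, and I'm assuming that "links that aren't really there" means things like anchors sitting inside HTML comments or script text, which a naive pattern match would count but a browser would never show.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from real <a> tags only (illustrative sketch)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and copes with
        # unquoted attribute values and unclosed tags.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, href))

    # Comments arrive via handle_comment() and script/style bodies are
    # delivered as plain data, so anchors hidden there never reach
    # handle_starttag() and are not counted.


def extract_links(html: str, base_url: str) -> list:
    """Return the absolute URLs of all links found in the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = (
        '<p><a href="/a">one</a> '
        '<!-- <a href="http://spam.example/">hidden</a> --> '
        "<A HREF=http://example.org/b>two</a>"
    )
    for link in extract_links(page, "http://example.com/blog/"):
        print(link)
```

This only covers the easy cases. It does nothing about links that are technically present but invisible to a reader, which is a harder problem.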

BTW, an aside to the person behind Goatee (?): this may adversely affect your Google ranking. Google is apparently quite harsh on link spamming.

Personally, I think it's an interesting experiment. It always pays to have deficiencies like this brought out into the open, because knowledge of what can happen is essential to building 'strong' systems that can cope with malicious attacks.