Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2002-7-29

First run done

Right, we have some ecosystem results here: 639 blogs so far. A lot of the UserLand-hosted ones (editthispage.com etc.) won't have been counted, though, because my crawler got blocked for going too fast on the first run. Next I'll get it to group pages by IP address and fetch at most one page a minute from any single IP, which should let me get the UserLand ones OK. I wonder what the 'nasty crawler' threshold is on their server.
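Here's a minimal sketch of that per-IP throttling, assuming hostnames are resolved with `socket.gethostbyname` and that "one page a minute" means at least 60 seconds between the *start* of two fetches to the same IP. The `crawl` function and its `fetch`, `clock` and `sleep` parameters are hypothetical names for illustration, not the crawler's actual code. Pages on different IPs are fetched as fast as they come up; pages sharing an IP wait their turn on a heap keyed by the earliest time that IP may be hit again.

```python
import heapq
import socket
import time
from collections import deque
from urllib.parse import urlsplit


def crawl(urls, resolve=socket.gethostbyname, fetch=print,
          min_interval=60.0, clock=time.monotonic, sleep=time.sleep):
    """Fetch every URL, never starting two fetches to the same IP
    less than min_interval seconds apart (hypothetical sketch)."""
    # Group URLs into one queue per IP address.
    queues = {}
    for url in urls:
        ip = resolve(urlsplit(url).hostname)
        queues.setdefault(ip, deque()).append(url)

    # Heap entries: (earliest allowed start time, tie-breaker, ip).
    start = clock()
    heap = [(start, n, ip) for n, ip in enumerate(queues)]
    heapq.heapify(heap)

    while heap:
        ready_at, n, ip = heapq.heappop(heap)
        wait = ready_at - clock()
        if wait > 0:
            sleep(wait)  # nothing is allowed yet, so wait for this IP
        started = clock()
        fetch(queues[ip].popleft())
        if queues[ip]:
            # This IP's next page may not start until min_interval has passed.
            heapq.heappush(heap, (started + min_interval, n, ip))
```

Because the next slot for an IP is computed from when its previous fetch actually started, a slow fetch elsewhere can only push later fetches back, never squeeze two hits on one server closer together. Setting `min_interval=300` would give the five-minute wait mentioned in the earlier post.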

(Tech note: I cache pages, so I'll only ever fetch any given page once. That shouldn't stress any single site, but servers that host thousands of blogs will notice quite a few hits. The per-IP throttling should fix that.)
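A page cache along those lines can be very small. This sketch assumes one file per URL on disk, named by the SHA-1 hex digest of the URL; the `PageCache` class, the directory layout and the `fetch` callable are all hypothetical, since the note doesn't say how the cache is really stored.

```python
import hashlib
import os


class PageCache:
    """Disk cache so each URL is fetched at most once (hypothetical sketch)."""

    def __init__(self, directory, fetch):
        self.directory = directory
        self.fetch = fetch  # callable: url -> page body as bytes
        os.makedirs(directory, exist_ok=True)

    def _path(self, url):
        # Hash the URL so any URL maps to a safe, fixed-length filename.
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def get(self, url):
        path = self._path(url)
        if os.path.exists(path):
            # Cache hit: read from disk and never touch the network.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: fetch once and store the body for next time.
        body = self.fetch(url)
        with open(path, "wb") as f:
            f.write(body)
        return body
```

For real use, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`. Because the cache lives on disk, a second crawl run reuses everything the first run already downloaded.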

Blogging ecosystem experiments

Dave linked to The Truth Laid Bear's Blogging Ecosystem yesterday, and it looks like its author is planning to automate it. What a coincidence - I got halfway through coding something like this a while back.

After a bit of hacking last night, I'm now running a crawl of Dave Winer's blogroll and a snapshot from weblogs.com. It's taking a while because I seem to have got myself blacklisted from the UserLand servers for hitting them too hard. Oops. I've set it to wait 5 minutes between fetches; hopefully that's OK. If not, I'll just have to leave those blogs out for the moment.
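For the totals themselves, here's a sketch assuming the ranking works like the original Blogging Ecosystem: each blog scores one point for every *other* crawled blog that links to it. The matching rule (a link counts for a blog if it starts with that blog's URL, trailing slash included, longest match first) and the function names are hypothetical, chosen for illustration rather than taken from the actual crawler.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def count_inbound_links(pages):
    """pages: dict of blog URL -> front-page HTML.
    Returns a Counter of blog URL -> number of other crawled blogs linking to it."""
    # Try longer blog URLs first so the most specific blog wins a match.
    known = sorted(pages, key=len, reverse=True)
    counts = Counter({url: 0 for url in pages})
    for source, html in pages.items():
        targets = set()  # a blog counts at most once per linking blog
        for link in extract_links(html):
            for blog in known:
                if link.startswith(blog):
                    targets.add(blog)
                    break
        targets.discard(source)  # links to yourself don't count
        for blog in targets:
            counts[blog] += 1
    return counts
```

`count_inbound_links(pages).most_common(20)` would then give a top-20 list. Relative links are ignored here, which is usually fine because blogroll links point to other sites.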

You can see the running totals here. Not too pretty yet, and they won't be complete for several hours.