Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).


... and back again ;-)

Checking back at and merging all the blogs that appeared today but not when I did the original mass import yielded a surprising increase - about 50% - so now the ecosystem comprises 1452 blogs (and related bits and pieces that, embarrassingly enough, top the 'most linked' list, one by a long margin).

hooray for shell script --> lynx -source $1 | ./ | ./
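The stages after `lynx -source` aren't shown above. Here is a rough sketch, assuming one of them takes a page's HTML on stdin and emits the blog links it finds, of how that stage might look in Python. `extract_links`, `main` and `LinkExtractor` are hypothetical names, and this isn't the crawler's actual code:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, in document order, without duplicates."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF=...> is caught too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, if we were given one.
                url = urljoin(self.base_url, value.strip())
                # Keep only web links (drops mailto:, javascript:, unresolved relatives).
                if urlparse(url).scheme in ("http", "https") and url not in self._seen:
                    self._seen.add(url)
                    self.links.append(url)


def extract_links(html, base_url=""):
    """Return the unique absolute http(s) links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def main(stream, out, base_url=""):
    """Pipeline stage: read HTML from stream, write one link per line to out."""
    for link in extract_links(stream.read(), base_url):
        out.write(link + "\n")
```

Saved as a script that finishes with `main(sys.stdin, sys.stdout, sys.argv[1])`, it could sit straight after `lynx -source $1 |`. Passing the page URL along means relative links resolve to full addresses.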

I just did another grab from, so in an hour or so (when the crawler's finished its job) the ecosystem should be nearer the 2000-blog mark.

Anyone like to hazard a guess at exactly how many blogs there are out there? Blogdex indexes 13391 and csmonitor guesses over 200,000. Hmm. Wouldn't it be cool to analyse the links in the whole lot?

phil@icicle:~/crawler$ df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda7              2245196   2076032    100732  96% /

phil@icicle:~/crawler$ ll *.db
-rwxr--r-- 1 phil phil 69338227 Aug 2 00:17 crawl.db


Time to get a bigger hard disk for dev.myelin.
A reblog?

Doc Searls demands a reblog! A quick Google search reveals that about 7020 pages link to him. But I'm only picking up 91. Right.

This calls for some blogroll tuning. Let's see what we come up with.

Update: I added in a few of the top links from the Google search, but most were UserLand sites, and the crawler still appears to be blocked from accessing them. As the cache is probably getting a bit out of date, I refetched everything from (check the cache for your page if the links don't look right). Doc appears to be up to 96 links, which puts him in 8th place (up from 9th this morning).
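How the 'most linked' places fall out of the crawl isn't spelled out here. The following is one plausible sketch, assuming the crawler ends up with a map from each crawled blog to the links found on its page. The `outlinks` structure, the `normalise` rule and all the function names are hypothetical:

```python
from collections import Counter


def normalise(url):
    """Hypothetical, crude normalisation: lower-case and drop trailing slashes,
    so 'http://Example.com/' and 'http://example.com' count as one blog.
    (It also lower-cases paths, which real sites may treat as distinct.)"""
    return url.strip().lower().rstrip("/")


def inbound_counts(outlinks):
    """outlinks maps each crawled blog to the URLs found on its page.
    Each source blog counts at most once per target; self-links are ignored."""
    counts = Counter()
    for source, targets in outlinks.items():
        src = normalise(source)
        for target in {normalise(t) for t in targets}:
            if target != src:
                counts[target] += 1
    return counts


def ranking(outlinks):
    """Return [(url, inbound_count), ...], most linked first, ties alphabetical."""
    counts = inbound_counts(outlinks)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def place(outlinks, url):
    """1-based position of url in the ranking, or None if nothing links to it."""
    target = normalise(url)
    for position, (candidate, _count) in enumerate(ranking(outlinks), start=1):
        if candidate == target:
            return position
    return None
```

Counting distinct linking blogs rather than raw links stops one page that mentions Doc twenty times from counting twenty times. The normalisation is the part most likely to need tuning: without it, trailing slashes and capitalisation split one blog into several entries and drag its count down.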

What next?
