... and back again ;-)
Checking weblogs.com again and merging in all the blogs that appeared today, but weren't there when I did the original mass import, gave a surprising increase of about 50%. The ecosystem now comprises 1452 blogs (plus related bits and pieces which, embarrassingly enough, top the 'most linked' list, one of them by a long margin).
Hooray for shell scripts:
lynx -source $1 | ./opml.pl | ./removedupes.pl
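The contents of opml.pl and removedupes.pl aren't shown here, so what follows is only a minimal Python sketch of what such a pipeline plausibly does. It assumes the input has the weblogs.com changes.xml shape (a weblogUpdates root holding weblog elements with name, url and when attributes). The function names and sample data are hypothetical, and the fetch step (lynx -source) is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed changes.xml shape.
SAMPLE = """<weblogUpdates version="1" count="3">
  <weblog name="Blog A" url="http://a.example.com/" when="12"/>
  <weblog name="Blog B" url="http://b.example.com/" when="40"/>
  <weblog name="Blog A" url="http://a.example.com/" when="95"/>
</weblogUpdates>"""


def extract_urls(xml_text):
    """Rough stand-in for opml.pl: pull the url attribute off every <weblog>."""
    root = ET.fromstring(xml_text)
    return [w.get("url") for w in root.iter("weblog") if w.get("url")]


def remove_dupes(urls):
    """Rough stand-in for removedupes.pl: keep the first occurrence of each URL."""
    seen = set()
    out = []
    for u in urls:
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out


def merge_new(known, urls):
    """URLs not already in the known set, i.e. blogs missed by the earlier import."""
    return [u for u in remove_dupes(urls) if u not in known]


if __name__ == "__main__":
    known = {"http://b.example.com/"}  # pretend this came from the original import
    print(merge_new(known, extract_urls(SAMPLE)))
```

merge_new also covers the merging step: anything it returns is a blog that wasn't in the earlier import, so its length over the old count gives the percentage increase.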
I just did another grab from weblogs.com/changes.xml, so in an hour or so (when the crawler's finished its job) the ecosystem should be nearer the 2000-blog mark.
Anyone like to hazard a guess at exactly how many blogs there are out there? Blogdex indexes 13391 and csmonitor guesses over 200,000. Hmm. Wouldn't it be cool to analyse the links in the whole lot?
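At heart, a 'most linked' list is a count of inbound links per target. Here is a minimal sketch, assuming the crawler can hand back (source blog, target URL) pairs. How the real crawler stores links in crawl.db isn't shown, so the input format and the most_linked name are hypothetical.

```python
from collections import Counter


def most_linked(links, top=10):
    """Rank targets by how many distinct blogs link to them.

    links: iterable of (source_blog, target_url) pairs from a crawl.
    Repeated pairs count once, and self-links are ignored, so a blog
    can't boost itself or a favourite target just by linking often.
    """
    pairs = {(s, t) for s, t in links if s != t}
    inbound = Counter(t for _, t in pairs)
    # Sort by count (descending), then by URL so ties come out in a stable order.
    ranked = sorted(inbound.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


if __name__ == "__main__":
    crawl = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "a"), ("b", "b")]
    print(most_linked(crawl))
```

Counting distinct linking blogs rather than raw links is a design choice; on the scale of the whole blog population the set of pairs would also need to live on disk rather than in memory.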
phil@icicle:~/crawler$ df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/hda7 2245196 2076032 100732 96% /
phil@icicle:~/crawler$ ll *.db
-rwxr--r-- 1 phil phil 69338227 Aug 2 00:17 crawl.db
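The numbers above explain the "Heh". df works out Use% as used / (used + available), rounded up, which is why it shows 96% rather than the roughly 92% you'd get from used / total. The 68432-block gap between total and used + available is space df doesn't count as available (on ext2 this is typically the root-reserved area, though the filesystem type isn't shown). A quick check of the arithmetic, with all figures copied from the output above:

```python
# Figures from the df and ls output (df reports 1K blocks, ls reports bytes).
TOTAL_KB = 2245196
USED_KB = 2076032
AVAIL_KB = 100732
CRAWL_DB_BYTES = 69338227


def df_use_percent(used, avail):
    """Use% the way df reports it: used / (used + avail), rounded up (integer ceil)."""
    return -(-used * 100 // (used + avail))


def copies_that_fit(avail_kb, file_bytes):
    """How many more files of this size fit in the available space."""
    return (avail_kb * 1024) // file_bytes


if __name__ == "__main__":
    print(df_use_percent(USED_KB, AVAIL_KB))
    print(TOTAL_KB - USED_KB - AVAIL_KB)
    print(copies_that_fit(AVAIL_KB, CRAWL_DB_BYTES))
```

That's roughly 98 MiB free against a 66 MiB database, so there's room for just one more crawl.db-sized file.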
Heh.
Time to get a bigger hard disk for dev.myelin.
... more like this: [Blogging Ecosystem, Weblogs.Com]