Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).

2002-8-19

New ecosystem application

I just released another ecosystem analysis script: nearest neighbours, which starts from one blog in the ecosystem and figures out how many other blogs it can reach by following links.

As with all the other scripts, it's in Python, and full source code is available. It works on the ecosystem dataset, which is about a 1 megabyte download and contains all the link information gathered in the process of creating the ecosystem. The dataset is freely available for anyone to use in their research, as long as they credit the source.
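
To give a feel for the "start from one blog and follow links" idea, here's a minimal sketch using a breadth-first search. It is not the released script: the dataset's real format isn't shown here, so the `links` dictionary, the `reachable_blogs` function and the example hostnames are all made up for illustration, and breadth-first search is just one reasonable way to do the traversal.

```python
from collections import deque


def reachable_blogs(links, start):
    """Return {blog: hops} for every blog reachable from `start`.

    `links` is an assumed format: a dict mapping each blog to the list of
    blogs it links to. The real ecosystem dataset may be laid out differently.
    """
    distance = {start: 0}          # hops from the starting blog
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        for target in links.get(blog, ()):
            if target not in distance:      # first visit is the shortest path
                distance[target] = distance[blog] + 1
                queue.append(target)
    return distance


# Hypothetical toy data, not taken from the ecosystem dataset.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "e.example": ["a.example"],   # links in, but nothing links to it
}

reach = reachable_blogs(links, "a.example")
print(len(reach) - 1, "other blogs reachable from a.example")
```

Links are followed in one direction only, so a blog that links to the starting blog but isn't linked from anywhere (like `e.example` above) never shows up in the result.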

Aha, found them :)

You can blame LiveJournal for the aforementioned half-million blogs.

Searching for dark blogs

MSNBC's Steven Levy has an article up (found via Dave) about blogging in general.

He points out: "the various computer-generated lists that purport to probe what's happening on Planet Blog don't go beyond the 10,000 or so most popular ones".

That's a very good point. Cameron Marlow suspects there are over half a million blogs out there -- but where are they? My ecosystem spider picks up 100 or so new ones each day from weblogs.com, but at that rate it'll be over a decade before it's indexing half a million. Cameron's own Blogdex indexes somewhere between 10,000 and 20,000; that's over double what I'm doing but still nowhere near the magic 500,000 figure.
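
The "over a decade" estimate is easy to check with a quick back-of-the-envelope calculation. This is only a sketch using the round numbers above (100 new blogs a day, 500,000 total); the function name and the `already_indexed` parameter are my own additions, and it assumes the discovery rate stays constant.

```python
def years_to_index(target, per_day, already_indexed=0):
    """Years needed to reach `target` blogs at a constant `per_day` rate."""
    days = (target - already_indexed) / per_day
    return days / 365.25


# 500,000 blogs at 100 new blogs a day, starting from nothing.
estimate = years_to_index(500000, 100)
print(round(estimate, 1), "years")
```

That works out to 5,000 days, or a bit under 14 years. Subtracting the few thousand blogs already indexed barely changes the answer.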

It could be a worthwhile project to go out there and find all those "dark blogs". Anybody up to it? Blogger has a directory of blogs, which would be a good place to start.