Here are some scripts that work out some more statistics on the ecosystem data.
To run any of them, you'll first need to download the link data and blog.py. Don't forget to untar the data (tar -vzxf linkData.tar.gz) before running the scripts. As always, you need a copy of Python.
Starting from one blog in the ecosystem, this script follows links to figure out how many people you can get to (and how many steps it takes to get to them). It also saves to 'unreachable.txt' a list of the blogs that cannot be reached by following links from the starting point.
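The idea behind this script can be sketched as a breadth-first search. This is only an illustration, not the script itself: it assumes the link data has already been loaded into a plain dictionary mapping each blog to the list of blogs it links to, and the names `links`, `reachable_from` and `unreachable_blogs` are made up for this sketch (they are not part of blog.py).

```python
from collections import deque


def reachable_from(links, start):
    """Breadth-first search from `start`.

    `links` is an assumed in-memory structure: {blog: [blogs it links to]}.
    Returns {blog: number of links followed to reach it}.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        for target in links.get(blog, ()):
            if target not in distance:
                # First time we see this blog, so this is the shortest route.
                distance[target] = distance[blog] + 1
                queue.append(target)
    return distance


def unreachable_blogs(links, distance):
    """Every blog that appears in the data but was never reached."""
    all_blogs = set(links)
    for targets in links.values():
        all_blogs.update(targets)
    return sorted(all_blogs - set(distance))


# Tiny made-up ecosystem, for illustration only.
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],
    "e": ["a"],
}
distance = reachable_from(links, "a")
missing = unreachable_blogs(links, distance)
print(len(distance) - 1, "blogs reachable;", len(missing), "unreachable")
```

In this toy data, "e" links into the rest of the ecosystem but nothing links to it, so it ends up in the unreachable list. That list could then be written out one blog per line.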
This is kind of a debugging script, written to work out the kinks in the next script to be posted. It works out how common certain levels of linkedness are. There are far too many people in the ecosystem with no incoming links!
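As a rough sketch of what "levels of linkedness" could mean, the snippet below counts incoming links per blog and then tallies how many blogs share each count. It assumes the same hypothetical dictionary layout (blog mapped to the list of blogs it links to); the function names are invented for this example and the real script may count things differently.

```python
from collections import Counter


def incoming_link_counts(links):
    """Count distinct incoming links for every blog in `links`.

    `links` is an assumed structure: {blog: [blogs it links to]}.
    """
    counts = {}
    for source, targets in links.items():
        counts.setdefault(source, 0)  # blogs with no incoming links still appear
        for target in set(targets):   # ignore repeated links to the same blog
            counts[target] = counts.get(target, 0) + 1
    return counts


def linkedness_histogram(counts):
    """Map each incoming-link count to the number of blogs that have it."""
    return Counter(counts.values())


# Tiny made-up ecosystem, for illustration only.
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],
    "e": ["a"],
}
counts = incoming_link_counts(links)
histogram = linkedness_histogram(counts)
for level in sorted(histogram):
    print(level, "incoming links:", histogram[level], "blogs")
```

The entry for zero incoming links is the one to watch: those are the blogs that nobody in the data links to.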
Inspired by Steven Dulaney's post, here is a script that works out the mean number of links going out from each page and finds the number of 'degrees of separation' between two randomly selected blogs.
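One way to approximate both numbers is sketched below. It is a guess at the approach rather than the script's actual code: it assumes links are followed only in the direction they point, that the data is held in a dictionary mapping each blog to the blogs it links to, and that a pair with no connecting path is reported as `None`. All function names are hypothetical.

```python
import random
from collections import deque


def mean_outgoing_links(links):
    """Average number of outgoing links per blog."""
    return sum(len(targets) for targets in links.values()) / len(links)


def degrees_of_separation(links, source, target):
    """Fewest links to follow from `source` to `target`, or None if no path."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        blog, hops = queue.popleft()
        for neighbour in links.get(blog, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def sample_separations(links, pairs, rng):
    """Measure the separation for `pairs` randomly chosen (source, target) pairs."""
    blogs = sorted(links)
    results = []
    for _ in range(pairs):
        source, target = rng.sample(blogs, 2)  # two distinct blogs
        results.append(degrees_of_separation(links, source, target))
    return results


# Tiny made-up ecosystem, for illustration only.
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],
    "e": ["a"],
}
print("mean outgoing links:", mean_outgoing_links(links))
print("sampled separations:", sample_separations(links, 5, random.Random()))
```

Sampling random pairs keeps the run time manageable on a large dataset, at the cost of giving an estimate rather than an exact figure.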