Q & A
It's time to answer some of the questions and requests (including some I've received by email) about the ecosystem:
My link counts are wrong!
Cool - tell me and I'll fix it. I know that the counts aren't perfect; they're certainly different from TTLB's ecosystem at times. I'm continually ironing out bugs, but I need to have examples of things not working before I can find out why. So drop me a line and I'll sort it out for you!
Looks like you are missing the forward links for people using Blogrolling.com
Yes - I was :)
Not now though! The crawler will now search through your blog and look for blogrolling.com links, then follow them and pull out all the links from there. As such, people like Jim S have jumped rather markedly in the 'most prolific linkers' column!
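For the curious, here's a rough sketch of that idea in Python. It is not the crawler's actual code: the class and function names, the example URLs and the assumption that a blogroll comes back as text containing ordinary `href="..."` attributes are all mine, purely for illustration. The `fetch` argument is whatever function you use to download a URL and return its text.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (tag, url) pairs from <a href=...> and <script src=...>."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if (tag == "a" and name == "href") or (tag == "script" and name == "src"):
                self.found.append((tag, value))


def is_blogrolling(url):
    """True if the URL points at blogrolling.com or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "blogrolling.com" or host.endswith(".blogrolling.com")


# Assumption: the blogroll response contains plain href="http://..." attributes.
HREF_RE = re.compile(r"""href=["'](https?://[^"']+)["']""", re.IGNORECASE)


def forward_links(page_html, fetch):
    """Return the page's outbound links, expanding any blogrolling.com references."""
    collector = LinkCollector()
    collector.feed(page_html)
    links = []
    for tag, url in collector.found:
        if is_blogrolling(url):
            # Follow the blogroll and pull out the links it contains.
            links.extend(HREF_RE.findall(fetch(url)))
        elif tag == "a":
            links.append(url)
    return links


# Hypothetical page and blogroll response, so the example runs offline.
sample_page = (
    '<a href="http://one.example/">One</a>'
    '<script src="http://rpc.blogrolling.com/display.php?r=abc"></script>'
)


def fake_fetch(url):
    return "document.write('<a href=\"http://two.example/\">Two</a>');"


print(forward_links(sample_page, fake_fetch))
```

A real crawler would swap `fake_fetch` for something that actually downloads the URL, and would want to handle timeouts and odd encodings too.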
What is gzip?
gzip is a compression program - it does more or less the same thing as WinZip, except it's not tied to Windows. It's embedded in a lot of web servers and browsers, and acts to compress the web pages as they get sent across the 'net. Basically it makes pages load faster by reducing the amount of data sent.
Right now, the ecosystem main page is 290,817 bytes long, but it compresses down to 34,347 (about an 88% reduction). This means that it takes about 2 seconds for the web server to send it out rather than about 20 ;-)
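If you want to try this yourself, here's a minimal sketch using Python's standard `gzip` module. The sample page is made up (a long table of near-identical rows, which is roughly what the ecosystem page looks like), and the helper names are my own; the last line just redoes the arithmetic for the byte counts quoted above.

```python
import gzip


def reduction_percent(original_size, compressed_size):
    """Percentage by which compression shrank the data."""
    return 100.0 * (1 - compressed_size / original_size)


def gzip_size(data):
    """Size in bytes of the data after gzip compression."""
    return len(gzip.compress(data))


# Hypothetical stand-in for a long HTML page full of repetitive table rows.
row = b'<tr><td>1</td><td><a href="http://blog.example/">A blog</a></td><td>42</td></tr>\n'
page = b"<html><body><table>\n" + row * 2000 + b"</table></body></html>\n"

compressed = gzip_size(page)
print(len(page), "bytes ->", compressed, "bytes")
print("reduction: %.1f%%" % reduction_percent(len(page), compressed))

# The same calculation for the byte counts quoted in the post.
print("ecosystem page: %.1f%%" % reduction_percent(290817, 34347))
```

Repetitive HTML like this compresses far better than a typical page, so don't read too much into the sample's ratio; the point is only that the calculation is a one-liner.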
What is the format of the ecosystem file?
I'm using MetaKit to store the data, but I'm moving to just using the cache files (click on the 'c' next to a blog name in the ecosystem main page). The only stuff that ends up on disk is:
- the list of blogs
- a copy of all the web pages the crawler downloads (the cache pages)
- the ecosystem main page and stats pages
Unfortunately that means I don't have a nice compact summary of the data that people can download and play with. However, if anybody would like me to, I can produce something in XML or some other format that's easier to parse.
Either that or I can just give you the source code for the crawler and you can build it yourself ;-)
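To give an idea of what an XML summary might look like, here's a sketch using Python's `xml.etree.ElementTree`. Nothing here exists yet: the element names (`ecosystem`, `blog`, `link`), the function name and the sample data are all invented for illustration, and a real export could well end up looking different.

```python
import xml.etree.ElementTree as ET


def ecosystem_to_xml(links_by_blog):
    """Serialise a {blog_url: [outbound_link, ...]} mapping as an XML string.

    The schema is hypothetical: one <blog url="..."> element per blog,
    containing one <link href="..."/> element per distinct outbound link.
    """
    root = ET.Element("ecosystem")
    for blog_url in sorted(links_by_blog):
        blog = ET.SubElement(root, "blog", url=blog_url)
        for target in sorted(set(links_by_blog[blog_url])):
            ET.SubElement(blog, "link", href=target)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data; note the duplicate link is only written once.
sample = {
    "http://a.example/": ["http://b.example/", "http://c.example/", "http://b.example/"],
    "http://b.example/": ["http://a.example/"],
}

xml_text = ecosystem_to_xml(sample)
print(xml_text)

# Reading it back: count the outbound links recorded for each blog.
for blog in ET.fromstring(xml_text).findall("blog"):
    print(blog.get("url"), len(blog.findall("link")))
```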
It would be interesting to see a ranking number on the lists as they are rather long now.
Done - everything is ranked now :)
Note that I'm only showing the top 500 in each list, as the HTML page was getting too long and my web server was having trouble. I'll put the other ones back onto some other pages when I have some free time.
Also we need some better graph analysis to show clusters of related sites. Maybe I'll post some ideas I had on this on my site. I have only been meaning to write the paper for ten years.
Good idea - I'll be watching your site ;-)
That's been one of my goals since the start, but I haven't had the time to implement it.
Apparently TTLB has something up his sleeve, but he's not telling me yet!