Phillip Pearson - Second p0st

tech notes and web hackery from the guy that brought you bzero, python community server, the blogging ecosystem, the new zealand coffee review and the internet topic exchange

2003-5-4

Weirdness in the referrer log

The Googlebot is usually fairly well-behaved, but today I saw this:

/pycs_search/htdig-pycs-snapshot-20030402.tar.gz - 245 hits (32276480 bytes)
      23: 64.68.84.42; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      20: 64.68.84.51; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      18: 64.68.84.31; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      17: 64.68.85.13; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      15: 64.68.84.76; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      15: 64.68.84.39; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      15: 64.68.84.143; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      13: 64.68.84.149; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      12: 64.68.84.16; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      12: 64.68.84.137; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      11: 64.68.85.6; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      11: 64.68.84.15; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      9: 64.68.84.6; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      9: 64.68.84.153; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      9: 64.68.84.144; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      7: 64.68.84.46; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      7: 64.68.84.132; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      7: 64.68.84.131; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      5: 64.68.85.9; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      4: 64.68.84.49; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      4: 64.68.84.134; Googlebot/2.1 (+http://www.googlebot.com/bot.html)
      2: 64.68.84.43; Googlebot/2.1 (+http://www.googlebot.com/bot.html)


It looks like a whole heap of different Googlebot instances have been downloading the file in 128K chunks. The file is about 5 MB, so I guess they've got around six copies of it in the cache right now. I didn't realise they indexed .tar.gz files ;-)
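A quick sanity check on those numbers, as a rough Python sketch. The hit counts and byte total are copied from the log summary above; the exact file size is an assumption (the post only says "about 5 MB", taken here as 5 * 1024 * 1024 bytes), and the function names are made up for illustration.

```python
# Figures copied from the log summary above.
TOTAL_HITS = 245
TOTAL_BYTES = 32276480

# Per-IP hit counts, in the order they appear in the log.
HITS_PER_IP = [23, 20, 18, 17, 15, 15, 15, 13, 12, 12, 11,
               11, 9, 9, 9, 7, 7, 7, 5, 4, 4, 2]

# Assumption: "about 5 MB" is treated as exactly 5 * 1024 * 1024 bytes.
APPROX_FILE_BYTES = 5 * 1024 * 1024


def average_bytes_per_hit(total_bytes, hits):
    """Average number of bytes served per request."""
    return total_bytes / hits


def copies_fetched(total_bytes, file_bytes):
    """How many full copies of the file the total traffic adds up to."""
    return total_bytes / file_bytes


if __name__ == "__main__":
    avg = average_bytes_per_hit(TOTAL_BYTES, TOTAL_HITS)
    copies = copies_fetched(TOTAL_BYTES, APPROX_FILE_BYTES)
    print("distinct IPs:", len(HITS_PER_IP))
    print("hits across all IPs:", sum(HITS_PER_IP))
    print("average bytes per hit: %.0f (128K = %d)" % (avg, 128 * 1024))
    print("approximate copies fetched: %.2f" % copies)
```

The average comes out just above 128 * 1024 bytes per hit, which fits the 128K chunk guess, and the total is a little over six times the assumed file size.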