9 July 2014

IP Addresses of 5000 Robots Spidering the Web

A sends:

5000 Robots

Attached is a comma-separated values (CSV) file listing, in IP order, over 5000 addresses currently spidering the web.

You can see the usual "legit" suspects - Google, Bing, Yahoo, etc.

Worth noting, though, is the number of robots now operating from the Amazon Cloud, many of which use the open-source Gocrawl package (whose user agent defaults to "Googlebot").

The CSV fields are: IP in dotted notation, IP as integer, CNET as integer, BNET as integer, Host (as returned by the PHP function "gethostbyaddr") and User Agent.
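
As a rough illustration of how a file with these fields might be read, here is a minimal Python sketch. Several things in it are assumptions rather than facts about the attached file: the column names are invented, the file is assumed to have no header row, the sample rows are made up using documentation-range addresses, and CNET and BNET are guessed to be the IP integer shifted right by 8 and 16 bits (the post does not define them). The check for hosts ending in googlebot.com or google.com is likewise only one plausible way to spot a "Googlebot" user agent coming from somewhere other than Google.

```python
import csv
import io
import ipaddress

# Hypothetical column names, in the order the post lists the fields.
FIELDS = ["ip", "ip_int", "cnet", "bnet", "host", "user_agent"]

# Assumption: hosts under these domains are treated as Google's own crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def parse_rows(text):
    """Parse CSV text (assumed to have no header row) into dicts keyed by FIELDS."""
    return [dict(zip(FIELDS, row)) for row in csv.reader(io.StringIO(text)) if row]


def claims_googlebot_elsewhere(row):
    """True if the user agent mentions Googlebot but the host is not under a Google domain."""
    says_googlebot = "googlebot" in row["user_agent"].lower()
    return says_googlebot and not row["host"].lower().endswith(GOOGLE_SUFFIXES)


# Made-up sample rows, not data from the attached file.
# The CNET/BNET values assume ip_int >> 8 and ip_int >> 16 respectively.
SAMPLE = (
    '192.0.2.10,3221225994,12582914,49152,crawl-192-0-2-10.googlebot.com,"Googlebot/2.1"\n'
    '198.51.100.7,3325256711,12989284,50739,ec2-198-51-100-7.compute-1.amazonaws.com,"Googlebot"\n'
)

rows = parse_rows(SAMPLE)
suspects = [r["ip"] for r in rows if claims_googlebot_elsewhere(r)]
print(suspects)
```

With the real file, the SAMPLE string would be replaced by the file's contents; the hostname test only flags candidates for a closer look and proves nothing by itself.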