9 July 2014
IP Addresses of 5000 Robots Spidering the Web
Attached is a comma-separated values (CSV) file listing, in IP order, over
5000 addresses currently spidering the web.
You can see the usual "legit" suspects - Google, Bing, Yahoo, etc.
Also worth noting is the number of robots now operating from the Amazon
cloud, many of which use the open-source Gocrawl package (whose user agent
defaults to "Googlebot").
The CSV fields are: IP in dotted notation, IP as an integer, CNET as an
integer, BNET as an integer, Host (as returned by the PHP function
"gethostbyaddr"), and User Agent.
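
For anyone wanting to load the file, here is a rough Python sketch of how the
fields could be read and cross-checked. It rests on assumptions the post does
not confirm: that the file has no header row, that the columns appear in the
order listed above, and that CNET and BNET are the address with the last one
or two octets zeroed (the /24 and /16 networks). If the file instead stores
the shifted prefix, the two mask functions would need adjusting. The function
names are made up for illustration.

```python
import csv
import io
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def cnet(ip_int: int) -> int:
    # Assumption: CNET is the address with the last octet zeroed (/24).
    return ip_int & 0xFFFFFF00


def bnet(ip_int: int) -> int:
    # Assumption: BNET is the address with the last two octets zeroed (/16).
    return ip_int & 0xFFFF0000


def parse_rows(text: str):
    """Yield one dict per CSV row, assuming six columns and no header."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        ip, ip_as_int, c, b, host, agent = row
        yield {
            "ip": ip,
            "ip_int": int(ip_as_int),
            "cnet": int(c),
            "bnet": int(b),
            "host": host,
            "user_agent": agent,
            # True when the integer column agrees with the dotted address.
            "consistent": ip_to_int(ip) == int(ip_as_int),
        }
```

Grouping rows by the "cnet" or "bnet" value is then a quick way to see how
many robots share a network block, which is presumably why those columns are
included.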