25 June 2013. The PRC Cryptome-scraping bot has returned today after being
away three days.
184.108.40.206 - - [25/Jun/2013:00:00:00 -0400] "HEAD /isp-spy/ebay-paypal-spy.pdf
HTTP/1.0" 200 - "http://cryptome.org/isp-spy/online-spying.htm"
The run ended 2 hours 56 minutes and 90MB of log entries later:
220.127.116.11 - - [25/Jun/2013:02:56:13 -0400] "HEAD /dmca/cr19no99e.txt
HTTP/1.0" 200 - "http://cryptome.org/dmca/dmca-index.htm" "Wget/1.12 (linux-gnu)"
24 June 2013. The PRC Bot has gone away; it last came 22 June 2013, 02:30AM.
It first came January 15, 2013, 10:59AM, and had come daily since then.
14 June 2013. The scraping occurs about two hours a day, not every two hours.
13 June 2013
PRC Bot Continues to Scrape Cryptome
Relevant to the Edward Snowden disclosures about NSA hacking China and many
others: For several months the IP address 18.104.22.168, traceable to central
Beijing, has accessed Cryptome with HEAD and GET requests every day, commencing
at 00:00 and running non-stop for over 2 hours. Start and finish log entries:
22.214.171.124 - - [13/Jun/2013:00:00:00 -0400]
"HEAD /2013/03/shabak-spy.pdf HTTP/1.0" 200 -
126.96.36.199 - - [13/Jun/2013:02:26:22 -0400]
"HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 -
The daily log entries for this IP address total over 50MB, predominantly HEAD
requests but periodically GET requests, and occasionally the bot scrapes the
entire site of 70K files. Few other bots are as persistent and predictable.
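
The figures above (start time, finish time, mix of HEAD and GET requests) can
be pulled from an access log with a short script. What follows is a minimal
sketch, not the method used to produce the numbers in this notice. It assumes
each log entry sits on a single line in Apache common or combined log format,
that month abbreviations are in English, and it uses the placeholder address
192.0.2.10 in place of the bot's address. The names LOG_RE, summarize and
SAMPLE are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the leading fields of an Apache common/combined log line:
# client IP, timestamp, request method, path, status, response size.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines, ip):
    """Count requests by method and find the first and last hit for one IP."""
    methods = Counter()
    first = last = None
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m.group("ip") != ip:
            continue  # skip unparseable lines and other clients
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        methods[m.group("method")] += 1
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    duration = (last - first) if first is not None else None
    return {
        "requests": sum(methods.values()),
        "methods": dict(methods),
        "first": first,
        "last": last,
        "duration": duration,
    }


# The two 13 June entries quoted above, joined onto single lines, with a
# placeholder IP (192.0.2.10 is a documentation address, not the bot's).
SAMPLE = [
    '192.0.2.10 - - [13/Jun/2013:00:00:00 -0400] '
    '"HEAD /2013/03/shabak-spy.pdf HTTP/1.0" 200 -',
    '192.0.2.10 - - [13/Jun/2013:02:26:22 -0400] '
    '"HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 -',
]

report = summarize(SAMPLE, "192.0.2.10")
print(report["requests"], report["methods"], report["duration"])
```

For a full day's log, pass an open file object instead of SAMPLE; the function
reads it line by line, so a 50MB or 90MB file does not need to fit in memory.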
It is possible the bot is cloaking its origin with the PRC IP address, perhaps
hiding a version of the NSA bot (2) from 1999 or that of another official or
commercial spy. Not much is available on Google about this bot's address and
attack.
We have been unable to block the attack with a variety of .htaccess blocking
mechanisms which have been effective against other bots.
Advice on how to do so would be appreciated. Edward Snowden, hello.