
25 June 2013. The PRC Cryptome-scraping bot returned today after a three-day absence. First log entry:

- - [25/Jun/2013:00:00:00 -0400] "HEAD /isp-spy/ebay-paypal-spy.pdf HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"

It finished 2 hours 56 minutes and 90MB of log entries later:

- - [25/Jun/2013:02:56:13 -0400] "HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"

24 June 2013. The PRC bot has gone away; its last visit was 22 June 2013 at 02:30 AM. It first appeared on 15 January 2013 at 10:59 AM and had come daily since then.

14 June 2013. The scraping runs for about two hours each day; it does not recur every two hours.

13 June 2013

PRC Bot Continues to Scrape Cryptome


Relevant to the Edward Snowden disclosures about NSA hacking of China and many others: for several months an IP address traceable to central Beijing has hit Cryptome with HEAD and GET requests every day, starting at 00:00 and running non-stop for over two hours. Today's first and last log entries:

- - [13/Jun/2013:00:00:00 -0400] "HEAD /2013/03/shabak-spy.pdf HTTP/1.0" 200 - "Wget/1.12 (linux-gnu)"

...

- - [13/Jun/2013:02:26:22 -0400] "HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"

This IP address generates over 50MB of log entries a day, predominantly HEAD requests with periodic GET requests, and occasionally it scrapes the entire site of 70K files. Few other bots are as persistent and predictable.
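A minimal sketch of how figures like these (daily log volume, HEAD versus GET counts, start and finish times) can be pulled from a server log, assuming the server writes the Apache "combined" log format that the entries quoted here appear to follow. The host 192.0.2.10 is a hypothetical documentation address standing in for the bot's IP, and `summarize` is an illustrative helper, not a tool Cryptome is known to use.

```python
import re
from collections import Counter
from datetime import datetime

# Apache "combined" format (assumed): host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def summarize(lines, host):
    """Summarize one client's activity: request methods, bytes of log, time span."""
    methods = Counter()
    log_bytes = 0
    first = last = None
    for line in lines:
        line = line.rstrip("\n")
        m = LOG_RE.match(line)
        if not m or m.group("host") != host:
            continue  # skip other clients and unparseable lines
        methods[m.group("method")] += 1
        log_bytes += len(line.encode("utf-8")) + 1  # +1 for the line's newline
        t = datetime.strptime(m.group("time"), TIME_FMT)
        first = t if first is None else min(first, t)
        last = t if last is None else max(last, t)
    span = (last - first) if first is not None else None
    return methods, log_bytes, span


# Hypothetical sample modeled on the 13 June entries (IP is a placeholder).
sample = [
    '192.0.2.10 - - [13/Jun/2013:00:00:00 -0400] "HEAD /2013/03/shabak-spy.pdf HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"',
    '192.0.2.10 - - [13/Jun/2013:02:26:22 -0400] "HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"',
]
methods, log_bytes, span = summarize(sample, "192.0.2.10")
print(methods)  # Counter({'HEAD': 2})
print(span)     # 2:26:22
```

Passed an open log file instead of the sample list, the same call would total a full day's entries for one address.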

The bot may be cloaking its true origin behind the PRC IP address, perhaps concealing a version of the NSA bot (1) (2) from 1999 or one run by another official or commercial spy. Little turns up on Google about this bot's address or its attack.

We have been unable to block the attack with a variety of .htaccess mechanisms that have worked against other bots.

Advice on how to do so would be appreciated. Edward Snowden, hello.
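One way to check whether a block is taking effect at all: Apache normally answers a request refused by an .htaccess access rule with status 403 (Forbidden), while the bot's log entries quoted above show 200, meaning the files were served. The sketch below, again assuming combined-format logs, tallies status codes for requests whose self-reported User-Agent begins with "Wget/"; the prefix and the helper name `status_counts` are illustrative assumptions. Because the client supplies its own User-Agent string, a bot can change it at will.

```python
import re
from collections import Counter

# Captures the status code and the final quoted field (User-Agent) of a combined-format line.
STATUS_AGENT_RE = re.compile(r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')


def status_counts(lines, agent_prefix="Wget/"):
    """Tally HTTP status codes for requests whose User-Agent starts with agent_prefix."""
    counts = Counter()
    for line in lines:
        m = STATUS_AGENT_RE.search(line)
        if m and m.group("agent").startswith(agent_prefix):
            counts[m.group("status")] += 1
    return counts


# Hypothetical sample modeled on the 25 June entries (IP is a placeholder).
sample = [
    '192.0.2.10 - - [25/Jun/2013:00:00:00 -0400] "HEAD /isp-spy/ebay-paypal-spy.pdf HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"',
    '192.0.2.10 - - [25/Jun/2013:02:56:13 -0400] "HEAD /dmca/cr19no99e.txt HTTP/1.0" 200 - "" "Wget/1.12 (linux-gnu)"',
]
print(status_counts(sample))  # Counter({'200': 2})
```

If a rule were matching the bot, its entries would be expected to shift from 200 to 403; all 200s suggest the rule is not matching, for instance because the address or User-Agent differs from what the rule tests.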