16 February 2002. Add Sun (which points to a 1998 article on robots.txt revelations: http://www.eiffel.com/private/meyer/robots.html )
15 February 2002. Add sites:
The Internet Movie Database, Defense Intelligence Agency, Argonne National Laboratory, Princeton University, American Airlines (informative, backdoor?), Disney, Electronic Data Systems (EDS) (careers/closer_look?), Centers for Disease Control, US Agency for International Development, Department of Commerce (China?), Food and Drug Administration (area51?), Department of Health and Human Services, National Science Foundation (infonerdish).
14 February 2002. Thanks to S.
As noted by S, a site's file tree may be partially examined by calling up its robots.txt file, if the site uses that method to tell robots which files and directories to stay out of. To access a site's robots.txt:
http://www.site.name/robots.txt
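The lookup above can be sketched in a few lines of Python. This is only an illustration, not the method used to gather the files on this page; the function names robots_url, fetch_robots and disallowed_paths are made up for the example, and the parsing assumes the usual one-directive-per-line layout with # comments.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Return the robots.txt URL for a host name or a full URL (hypothetical helper)."""
    if "://" not in site:
        site = "http://" + site          # assume plain http when no scheme is given
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site: str, timeout: float = 10.0) -> str:
    """Download the robots.txt body as text (needs network access)."""
    with urlopen(robots_url(site), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def disallowed_paths(robots_txt: str) -> List[str]:
    """List the non-empty Disallow paths in a robots.txt body, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths
```

For example, disallowed_paths(fetch_robots("www.site.name")) would return the directories that the site asks robots to skip, which is the partial file tree described above.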
White House (informative), European Union, United Nations (odd), NSA, DSD, FBI, Army, Air Force, House of Representatives, Department of Justice, US Courts, Treasury Department, IRS, Los Alamos National Laboratory (informative), Lawrence Livermore National Laboratory (jed's killers?), Verisign (most informative), New York Times, Morgan Stanley (informative), Citibank, Yale (Napster?), Stanford, MIT, Federation of American Scientists, Safeweb, Anonymizer (informative), EFF, Cryptome.
If you locate sites with really scandalous directories -- sex, porn, crime, payoffs, codewords, classified material, security holes, backdoors -- archive the evidence and let us know; send to: jya@pipeline.com
http://www.whitehouse.gov/robots.txt
# robots.txt for http://www.whitehouse.gov/ User-agent: * Disallow: /cgi-bin Disallow: /search Disallow: /query.html Disallow: /help Disallow: /afac/index.htm/text Disallow: /afac/text Disallow: /appointments/text Disallow: /cea/text Disallow: /ceq/text Disallow: /contact/text Disallow: /dpc/text Disallow: /email/text Disallow: /energy/text Disallow: /espanol/text Disallow: /firstlady/images/text Disallow: /firstlady/news-speeches/releases/print/text Disallow: /firstlady/news-speeches/releases/text Disallow: /firstlady/news-speeches/speeches/print/text Disallow: /firstlady/news-speeches/speeches/text Disallow: /firstlady/news-speeches/text Disallow: /firstlady/photoessay/text Disallow: /firstlady/text Disallow: /fsbr/text Disallow: /government/handbook/text Disallow: /government/images/text Disallow: /government/text Disallow: /greeting/text Disallow: /history/art/images/text Disallow: /history/art/text Disallow: /history/eeobtour/images/text Disallow: /history/eeobtour/text Disallow: /history/firstladies/text Disallow: /history/presidents/text Disallow: /history/text Disallow: /history/tours/print/text Disallow: /history/tours/text Disallow: /history/whtour/images/text Disallow: /history/whtour/text Disallow: /holiday/text Disallow: /homeland/text Disallow: /infocus/defense/text Disallow: /infocus/economy/text Disallow: /infocus/education/states/text Disallow: /infocus/education/text Disallow: /infocus/energy/text Disallow: /infocus/environment/text Disallow: /infocus/faith-based/text Disallow: /infocus/medicare/text Disallow: /infocus/social-security/text Disallow: /infocus/tax-relief/text Disallow: /infocus/text Disallow: /kids/abc/text Disallow: /kids/album/text Disallow: /kids/barney/text Disallow: /kids/connection/text Disallow: /kids/contact/text Disallow: /kids/dreamteam/text Disallow: /kids/firstlady/text Disallow: /kids/guide/print/text Disallow: /kids/guide/text Disallow: /kids/holiday/text Disallow: /kids/india/text Disallow: /kids/mrscheney/text 
Disallow: /kids/ofelia/text Disallow: /kids/president/text Disallow: /kids/quiz/text Disallow: /kids/spotty/text Disallow: /kids/teeball/text Disallow: /kids/teeball2/text Disallow: /kids/teeball3/text Disallow: /kids/text Disallow: /kids/timeline/text Disallow: /kids/tour/text Disallow: /kids/vicepresident/text Disallow: /library/omb/text Disallow: /mrscheney/news/text Disallow: /mrscheney/text Disallow: /national-anthem/text Disallow: /nec/text Disallow: /news/briefings/print/text Disallow: /news/briefings/text Disallow: /news/freedominitiative/text Disallow: /news/images/text Disallow: /news/nominations/text Disallow: /news/orders/text Disallow: /news/press/radio/text Disallow: /news/press/text Disallow: /news/print/releases/text Disallow: /news/print/text Disallow: /news/proclamations/text Disallow: /news/radio/print/text Disallow: /news/radio/text Disallow: /news/releases/2001/01/images/print/text Disallow: /news/releases/2001/01/images/text Disallow: /news/releases/2001/01/print/text Disallow: /news/releases/2001/01/text Disallow: /news/releases/2001/02/images/print/text Disallow: /news/releases/2001/02/images/text Disallow: /news/releases/2001/02/print/text Disallow: /news/releases/2001/02/text Disallow: /news/releases/2001/03/images/print/text Disallow: /news/releases/2001/03/images/text Disallow: /news/releases/2001/03/print/text Disallow: /news/releases/2001/03/text Disallow: /news/releases/2001/04/images/print/text Disallow: /news/releases/2001/04/images/text Disallow: /news/releases/2001/04/print/text Disallow: /news/releases/2001/04/text Disallow: /news/releases/2001/05/images/print/text Disallow: /news/releases/2001/05/images/text Disallow: /news/releases/2001/05/print/text Disallow: /news/releases/2001/05/text Disallow: /news/releases/2001/06/images/print/text Disallow: /news/releases/2001/06/images/text Disallow: /news/releases/2001/06/print/text Disallow: /news/releases/2001/06/text Disallow: /news/releases/2001/07/images/print/text Disallow: 
/news/releases/2001/07/images/text Disallow: /news/releases/2001/07/print/text Disallow: /news/releases/2001/07/text Disallow: /news/releases/2001/08/images/print/text Disallow: /news/releases/2001/08/images/text Disallow: /news/releases/2001/08/print/text Disallow: /news/releases/2001/08/text Disallow: /news/releases/2001/09/images/print/text Disallow: /news/releases/2001/09/images/text Disallow: /news/releases/2001/09/print/text Disallow: /news/releases/2001/09/text Disallow: /news/releases/2001/10/images/print/text Disallow: /news/releases/2001/10/images/text Disallow: /news/releases/2001/10/print/text Disallow: /news/releases/2001/10/text Disallow: /news/releases/2001/11/images/print/text Disallow: /news/releases/2001/11/images/text Disallow: /news/releases/2001/11/print/text Disallow: /news/releases/2001/11/text Disallow: /news/releases/2001/12/images/print/text Disallow: /news/releases/2001/12/images/text Disallow: /news/releases/2001/12/print/text Disallow: /news/releases/2001/12/text Disallow: /news/releases/2002/01/images/print/text Disallow: /news/releases/2002/01/images/text Disallow: /news/releases/2002/01/print/text Disallow: /news/releases/2002/01/text Disallow: /news/releases/print/text Disallow: /news/releases/text Disallow: /news/reports/text Disallow: /news/text Disallow: /news/usbudget/blueprint/text Disallow: /news/usbudget/states/print/text Disallow: /news/usbudget/states/text Disallow: /nsc/text Disallow: /oa/foia/text Disallow: /oa/jobs/text Disallow: /oa/oapo/text Disallow: /oa/text Disallow: /omb/budget/fy2002/text Disallow: /omb/budget/text Disallow: /omb/bulletins/text Disallow: /omb/circulars/a001/text Disallow: /omb/circulars/a016/text Disallow: /omb/circulars/a019/text Disallow: /omb/circulars/a021/text Disallow: /omb/circulars/a025/text Disallow: /omb/circulars/a034/text Disallow: /omb/circulars/a045/text Disallow: /omb/circulars/a050/text Disallow: /omb/circulars/a076/text Disallow: /omb/circulars/a087/text Disallow: 
/omb/circulars/a089/text Disallow: /omb/circulars/a094/text Disallow: /omb/circulars/a097/text Disallow: /omb/circulars/a102/text Disallow: /omb/circulars/a11/text Disallow: /omb/circulars/a110/text Disallow: /omb/circulars/a119/text Disallow: /omb/circulars/a122/text Disallow: /omb/circulars/a123/text Disallow: /omb/circulars/a126/text Disallow: /omb/circulars/a127/text Disallow: /omb/circulars/a129/text Disallow: /omb/circulars/a130/text Disallow: /omb/circulars/a131/text Disallow: /omb/circulars/a133/text Disallow: /omb/circulars/a133_compliance/00/text Disallow: /omb/circulars/a133_compliance/text Disallow: /omb/circulars/a134/text Disallow: /omb/circulars/a135/text Disallow: /omb/circulars/text Disallow: /omb/credit.bak/text Disallow: /omb/credit/text Disallow: /omb/fedreg/text Disallow: /omb/financial/text Disallow: /omb/foia/text Disallow: /omb/gils/text Disallow: /omb/grants/text Disallow: /omb/inforeg/text Disallow: /omb/legislative/7day/text Disallow: /omb/legislative/paygo/text Disallow: /omb/legislative/sap/105-1/text Disallow: /omb/legislative/sap/105-2/text Disallow: /omb/legislative/sap/106-1/text Disallow: /omb/legislative/sap/106-2/text Disallow: /omb/legislative/sap/107-1/appropriations/text Disallow: /omb/legislative/sap/107-1/number/text Disallow: /omb/legislative/sap/107-1/subcommittee/text Disallow: /omb/legislative/sap/107-1/text Disallow: /omb/legislative/sap/107-2/text Disallow: /omb/legislative/sap/1997/text Disallow: /omb/legislative/sap/1998/text Disallow: /omb/legislative/sap/1999/text Disallow: /omb/legislative/sap/2000/text Disallow: /omb/legislative/sap/text Disallow: /omb/legislative/testimony/text Disallow: /omb/legislative/text Disallow: /omb/memoranda/text Disallow: /omb/mgmt-gpra/text Disallow: /omb/organization/text Disallow: /omb/procurement/text Disallow: /omb/pubpress/text Disallow: /omb/recruitment/text Disallow: /omb/reports/text Disallow: /omb/text Disallow: /omb/whatsnew/text Disallow: /onap/text Disallow: /pfiab/text 
Disallow: /president/100days/text Disallow: /president/american-flag/text Disallow: /president/attack-response/text Disallow: /president/domestic-gallery/text Disallow: /president/gallery/photoessay/text Disallow: /president/gallery/text Disallow: /president/heartland-tour-gallery/text Disallow: /president/holiday/cards/text Disallow: /president/holiday/cheer/text Disallow: /president/holiday/deck-halls/text Disallow: /president/holiday/decorations/text Disallow: /president/holiday/hanukkah/text Disallow: /president/holiday/tree/text Disallow: /president/holiday/whtree/text Disallow: /president/images/text Disallow: /president/independence-day/text Disallow: /president/international-gallery/text Disallow: /president/intl-gallery2/text Disallow: /president/intl-gallery3/text Disallow: /president/intl-gallery4/text Disallow: /president/intl-gallery5/text Disallow: /president/intl-gallery6/text Disallow: /president/presidential-homes/text Disallow: /president/putin-visit/text Disallow: /president/statedinner-mexico-200109/text Disallow: /president/statedinnerprep-mexico-200109/text Disallow: /president/statevisitday2-mexico-200109/TEMP/text Disallow: /president/statevisitday2-mexico-200109/text Disallow: /president/tee-ball-01/text Disallow: /president/tee-ball-02/text Disallow: /president/tee-ball-03/text Disallow: /president/text Disallow: /president/world-leaders/text Disallow: /response/diplomatic/text Disallow: /response/military/text Disallow: /response/text Disallow: /text Disallow: /vicepresident/images/text Disallow: /vicepresident/news-speeches/speeches/print/text Disallow: /vicepresident/news-speeches/speeches/text Disallow: /vicepresident/news-speeches/text Disallow: /vicepresident/photoessay/text Disallow: /vicepresident/text Disallow: /whmo/text User-agent: whsearch Disallow: /cgi-bin Disallow: /search Disallow: /query.html Disallow: /help Disallow: /sitemap.html Disallow: /privacy.html Disallow: /accessibility.html
http://www.europa.eu.int/robots.txt
# robots.txt for EUROPA httpd-80 production server
#
# created by Rudi Mosselmans on 8/10/96
#
User-agent: * # match any robot name
Disallow: /cgi-bin/ # don't allow robots into cgi-bin
Disallow: /comm/agriculture/rica/dwh/ # prevent robots from overrunning SAS
Disallow: /comm/commissioners/liikanen/_ # Albert Rouben 20010918
Disallow: /comm/commissioners/liikanen/bin # Albert Rouben 20010918
# robots.txt for http://www.cyberschoolbus.org/ User-agent: * Disallow:
User-agent: * Disallow: /images/ Disallow: /templates/ Disallow: *.gif Disallow: notice.html Disallow: statistics.html
Australian Defense Signals Directorate
http://www.dsd.gov.au/robots.txt
User-agent: * Disallow: /cgi-bin/sources
Federal Bureau of Investigation
User-Agent: Disallow:
http://www.army.mil/robots.txt
User-agent: * Disallow: /cgi-bin/ Disallow: /reports/ Disallow: /summary/ Disallow: /old_design_ahp/ Disallow: /beta/ Disallow: /Documentation/ Disallow: /old_design_ahp/ Disallow: /images/ Disallow: /logs/ Disallow: /monthly/ Disallow: /photos/ Disallow: /vetinfo/ Disallow: /100days/
#robots.txt to skip over CGI directories User-agent: * Disallow: /cgi-bin/ Disallow: /passcgi/ Disallow: /tmp/
http://www.house.gov/robots.txt
# # No robots allowed in the following directories ! # User-agent: * Disallow: /htbin Disallow: /docs/ARCHIVE Disallow: /docs/apps Disallow: /docs/moved_sites Disallow: /docs/temp Disallow: /docs/test Disallow: /docs/sites/bin Disallow: /docs/sites/etc Disallow: /docs/sites/dev Disallow: /docs/sites/usr Disallow: /docs/sites/other/webassistance
http://www.usdoj.gov/robots.txt
User-agent: * Disallow: /Admin/ Disallow: /cgi-bin/ Disallow: /help/ Disallow: /img/ Disallow: /gif/ Disallow: /ins/ Disallow: /gopherdata/ Disallow: /ojp/ Disallow: /wusage/ Disallow: /archive/ Disallow: /opa/pr/support/ User-agent: Netscape-Compass-Robot/Archive Disallow:
http://www.uscourts.gov/robots.txt
User-Agent: Disallow:
http://www.treasury.gov/robots.txt
User-agent: * Disallow: /cgi-bin/ Disallow: /getstats/ Disallow: /home-temp/ Disallow: /logs/ Disallow: /new.junk/ Disallow: /public/ Disallow: /statbot/ Disallow: /templates/ Disallow: /webcache/ Disallow: /test/
User-agent: * Disallow: /foo/foobar.html Disallow: /barfoo
Los Alamos National Laboratory
http://www.lanl.gov/robots.txt
User-agent: * Disallow: /tools/hypermail/ Disallow: /projects/etcap/bib/ Disallow: /orgs/cic/cic1/testsite Disallow: /orgs/im/im1/testsite Disallow: /projects/asci/statusreports/ Disallow: /projects/asci/ascijobs/ Disallow: /projects/asci/OLD_ARCHIVE/ Disallow: /projects/asci/bluemtn/OLD_BLUE_ARCHIVE/ Disallow: /projects/sme/OLD_ARCHIVE/ Disallow: /www-team/ Disallow: /projects/wwwug/OLD_STUFF Disallow: /projects/asci/DCE/OLD Disallow: /orgs/citpo
Lawrence Livermore National Laboratory
http://www.llnl.gov/robots.txt
# robots.txt file for www.llnl.gov
User-agent: * # all web crawlers and searchers
Disallow: /tmp/ # temp files
#Disallow: /www/llnl-bin/ # stay out of binaries
#Disallow: /www/llnl_only # stay out of internal
#Disallow: /www/llnl_only-bin # stay out of internal binaries
#Disallow: /www/review # stay out of unreviewed pages
# This is how Lee thinks this should look
Disallow: cgi-bin/ # stay out of binaries
Disallow: llnl-bin/ # stay out of binaries
Disallow: /llnl-bin/ # stay out of binaries
#Disallow: /llnl_only/ # stay out of internal # disallowed by httpd server
Disallow: /llnl_only-bin/ # stay out of internal binaries
Disallow: /development/ # Stay out of development
Disallow: /development-bin/ # stay out of development-bin dirs
Disallow: /review/ # stay out of unreviewed pages
Disallow: /stats/ # stay out of statistics pages
Disallow: /llnl_only/stats/ # stay out of statistics pages
Disallow: /llnl/lists/historyarc # Stay out of list-of-lists history
Disallow: /historyarc/ # Stay out of list-of-lists history
Disallow: atp/comprehensive2-95.html # jed's killers
Disallow: atp/www-servers.html
Disallow: atp/telecom-media.html
Disallow: /atp/crackdown/ # wrong stuff
Disallow: llnl_only/tid/lof/test # Library of Future test files
Disallow: llnl/lists/ # memory fault problems
Disallow: /www/IPandC/opportunities93 # obsolete pages
Disallow: /www/tid/lof/documents # lof pages, index them manually
_________________
[Thanks to SP.]
"jed's killers" may have to do with security -
"I am a person who likes challenges, physical and mental. My university degrees are bachelors in Mathematics and Physics, and a Masters in Mathematics from the Davis campus of the University of California (where I REALLY enjoyed school and packed in every class I possibly could). I was a hacker coming out of school. I dropped into a wonderful environment at the Lawrence Livermore Laboratory where I was able to hack the early ARPA network (circa 1973) implementing network protocols and breaking into computer systems on the net as part of a "tiger" team. I was the technical liaison for LLNL to the ARPA network during the middle 1970s. Although my given name is James, there were too many kids that answered to "Jim" in high school, so I changed my preferred nickname to "Jed", my initials." - http://www.webstart.com/jed/jed-personal.html
http://verisign.com/robots.txt
User-Agent: * Disallow: /about/ Disallow: /aol/ Disallow: /att/ Disallow: /authentic/ Disallow: /aventail/ Disallow: /b2b/ Disallow: /cd/ Disallow: /cdrom/ Disallow: /checkpoint/ Disallow: /client/ Disallow: /clientauth/ Disallow: /contact/ Disallow: /cps/ Disallow: /criticalpath/ Disallow: /cus/ Disallow: /demos/ Disallow: /developers/ Disallow: /dm/ Disallow: /domain/ Disallow: /ebiz/ Disallow: /employment/ Disallow: /error/ Disallow: /events/ Disallow: /exchange/ Disallow: /feature/ Disallow: /gov/ Disallow: /government/ Disallow: /graphics/ Disallow: /idcenter/ Disallow: /images/ Disallow: /installshield/ Disallow: /investor/ Disallow: /its/ Disallow: /japan/ Disallow: /learn/ Disallow: /library/ Disallow: /link/ Disallow: /lobby/ Disallow: /mcsp/ Disallow: /microsoft/ Disallow: /netscape/ Disallow: /microsoft/ Disallow: /msmail/ Disallow: /netsure/ Disallow: /newballgame/ Disallow: /nike/ Disallow: /nowsafe/ Disallow: /nsi/ Disallow: /nspremsvcs/ Disallow: /offer/ Disallow: /onsite/ Disallow: /partner/ Disallow: /payment/ Disallow: /press/ Disallow: /product/ Disallow: /rpa/ Disallow: /rpa-kr/ Disallow: /rsa2000/ Disallow: /rsc/ Disallow: /securemail/ Disallow: /server/ Disallow: /servicecenter/ Disallow: /services/ Disallow: /set/ Disallow: /sia/ Disallow: /signio/ Disallow: /site/ Disallow: /smime/ Disallow: /solutions/ Disallow: /spectrum/ Disallow: /spt/ Disallow: /supporyt/ Disallow: /trial/ Disallow: /transarc/ Disallow: /update/ Disallow: /valid/ Disallow: /vpnseminar/ Disallow: /vselp/ Disallow: /webtrust/ Disallow: /westgroup/ Disallow: /whitepaper/ Disallow: /win2000/ Disallow: /wireless/ Disallow: /y2k/
http://www.nytimes.com/robots.txt
# robots.txt, nytimes.com 1/18/2001 # User-agent: * Disallow: /96 Disallow: /97 Disallow: /98 Disallow: /99 Disallow: /00 Disallow: /01 Disallow: /1996 Disallow: /1997 Disallow: /1998 Disallow: /1999 Disallow: /2000 Disallow: /2001 Disallow: /library Disallow: /aponline Disallow: /reuters Disallow: /cnet Disallow: /partners Disallow: /archives Disallow: /indexes Disallow: /events Disallow: /features Disallow: /reference Disallow: /specials Disallow: /services Disallow: /thestreet Disallow: /weather Disallow: /RealMedia
http://www.morganstanley.com/robots.txt
User-agent: * Disallow: /institutional/investmentmanagement/10 Disallow: /institutional/investmentmanagement/20 Disallow: /institutional/investmentmanagement/30 Disallow: /institutional/investmentmanagement/40 Disallow: /institutional/investmentmanagement/50 Disallow: /institutional/investmentmanagement/hnavs Disallow: /institutional/investmentmanagement/products Disallow: /institutional/investmentmanagement/clbuttons Disallow: /institutional/investmentmanagement/img Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/parser.pl Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/siteSearch.pl Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/productSearch.pl Disallow: /institutional/investmentmanagement/70/71 Disallow: /institutional/investmentmanagement/70/72 Disallow: /institutional/investmentmanagement/70/73 Disallow: /institutional/investmentmanagement/70/74 Disallow: /institutional/investmentmanagement/70/75
http://www.citibank.com/robots.txt
# robots.txt for http://www.citicorp.com/
User-agent: *
Disallow: /cgi-bin/ # scripts
Disallow: /usage/ # WWW usage statistics
Disallow: /statistics/ # UNIX statistics
Disallow: /wwwstat/ # More WWW stats
Disallow: /accesswatch/ # Even More WWW stats
Disallow: /branches/AP # Asian/Pacific Branches (too many...)
Disallow: /branches/EU # European Branches (too many...)
Disallow: /branches/LA # Latin American Branches (too many...)
Disallow: /branches/NA # North American Branches (too many...)
http://www.yale.edu/robots.txt
User-agent: * Disallow: /engineering/ Disallow: /webmaster/stats/ Disallow: /webmaster/logs/ Disallow: /napster/
http://www.stanford.edu/leland/robots.txt
# Robot Policy file as per Robot Exclusion standard 17-jun-94 User-agent: Lycos Disallow: / User-agent: Lycos_Spider_(T-Rex)/1.0 Disallow: / User-agent: Lycos_Spider_(T-Rex)/3.0 Disallow: /
Massachusetts Institute of Technology
# robots.txt for http://www.mit.edu/ User-agent: * Disallow: /cgi/ Disallow: /comment Disallow: /finger Disallow: /machine Disallow: /zlocate Disallow: /zwrite
Federation of American Scientists
User-agent: * Disallow: /eye/kosovo User-agent: ia_archiver Disallow: /irp/overhead/ User-agent: ia_archiver Disallow: /irp/facilities/
# exclude help system from robots
User-agent: *
Disallow: /manual/
Disallow: /doc/
Disallow: /gif/
# but allow htdig to index our doc-tree
User-agent: susedig
Disallow:
http://anonymizer.com/robots.txt
User-agent: * Disallow: /china/ Disallow: /documents/ Disallow: /errors/ Disallow: /images/ Disallow: /includes/ Disallow: /india/ Disallow: /japan/ Disallow: /styles/ Disallow: /cgi-bin/
Electronic Frontier Foundation
User-agent: * # applies to all robots Disallow: /temp # disallow indexing of these pages Disallow: /test Disallow: /Temp Disallow: /Test Disallow: /tmp Disallow: /Tmp Disallow: /templates Disallow: /Templates Disallow: /internal Disallow: /Internal Disallow: /staff Disallow: /Staff Disallow: /old Disallow: /Old Disallow: /duh Disallow: /homes/mech/Temp Disallow: /homes/mech/A-G Disallow: /~mech/Temp Disallow: /~mech/A-G Disallow: /˜mech/Temp Disallow: /˜mech/A-G Disallow: /˜mech/Temp Disallow: /˜mech/A-G Disallow: /˜mech/Temp Disallow: /˜mech/A-G Disallow: /%7Emech/Temp Disallow: /%7emech/A-G
http://cryptome.org/robots.txt
# go away User-agent: * Disallow: /
Added 15 February 2002
[Thanks to BH.]
LOL I like the message at the end of this one. :)
# robots.txt for http://us.imdb.com/ & mirror sites
User-agent: * Disallow: /MyMovies Disallow: /register Disallow: /tiger_redirect Disallow: /Title/ASIN* Disallow: /M/ Disallow: /Ballot/ Disallow: /Icons/ Disallow: /Movies/ Disallow: /harvest_me Disallow: /Tsearch Disallow: /Nsearch Disallow: /Credits Disallow: /Details Disallow: /More Disallow: /Bio Disallow: /List Disallow: /GName Disallow: /SName Disallow: /FName Disallow: /AName Disallow: /RName Disallow: /PName Disallow: /VName Disallow: /Movies Disallow: /Companies Disallow: /Mlinks Disallow: /Guests Disallow: /Quotes Disallow: /OnThisDay Disallow: /BusinessThisDay Disallow: /Goofs Disallow: /Trivia Disallow: /Goofs Disallow: /Soundtracks Disallow: /CrazyCredits Disallow: /AlternateVersions Disallow: /Recommendations Disallow: /AddRecommendation Disallow: /Reviews Disallow: /Tawards Disallow: /Ratings Disallow: /Awards Disallow: /Sales Disallow: /SearchBios Disallow: /BTrivia Disallow: /BQuotes Disallow: /BWorks Disallow: /BPublicity Disallow: /SearchQuotes Disallow: /BAgent Disallow: /Business Disallow: /Taglines Disallow: /ReleaseDates Disallow: /Locations Disallow: /Technical Disallow: /Laserdisc Disallow: /DVD Disallow: /Laserdisc Disallow: /Literature Disallow: /Trailers Disallow: /NUrls Disallow: /TUrls Disallow: /Ratings Disallow: /OnTV Disallow: /Ontv Disallow: /Psales Disallow: /Pawards Disallow: /Posters Disallow: /Showing Disallow: /Quiz Disallow: /BornInYear Disallow: /DiedInYear Disallow: /MarriedInYear Disallow: /ExciteTitle Disallow: /TitleBrowse Disallow: /Vote Disallow: /WorkedWith Disallow: /Character Disallow: /SearchTrivia Disallow: /SearchLiterature Disallow: /SearchGoofs Disallow: /SearchTechnical Disallow: /SearchRatios Disallow: /SearchBusiness Disallow: /SearchLaserdisc Disallow: /SearchDVD Disallow: /SearchAwards Disallow: /SearchSongs Disallow: /SearchVersions Disallow: /SearchCrazy Disallow: /SearchPlots Disallow: /SearchPlotWriters Disallow: /SearchTaglines Disallow: /ShowAll Disallow: /LocationTree Disallow: /JointVentures 
Disallow: /pick_n_mix
Disallow: /prepare_data
Disallow: /name_pick_n_mix
Disallow: /MetaSearch
Disallow: /HelpPage
Disallow: /ActorSearch
Disallow: /ActressSearch
Disallow: /BornWhere
Disallow: /Overlap
Disallow: /ReleasedInYear
Disallow: /DiedWhere
Disallow: /Maltin
Disallow: /CommentsShow
Disallow: /CommentsEnter
Disallow: /CommentsAuthor
Disallow: /CommentsIndex
Disallow: /Showtimes
Disallow: /Find
Disallow: /Lookup
Disallow: /boards
# stop reading here! :-)
# Stay out of these directories they contain temporary files, or will
# cause unwanted DATABASE QUERY system load or BOTH.
# Robots that cause a denial of service will be charged $0.01 per unwanted
# request to this site. Follow the guidelines + robots.txt convention and
# we can all live happily together.
#
# If you have any questions or need permission to work inside of the
# protected URLs, please contact www @ imdb.com
# Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC) ... we are watching you!
# robots.txt
#
# purpose: Exclude robots from directories.
#
# description: A 'robots.txt' file consists of 'records'
# which are composed of a 'User-agent' line followed by
# some number of 'Disallow' directives. There should NOT
# be any blank lines between the 'User-agent' and 'Disallow'
# directives. There may be multiple 'records' for different
# named 'User-agents'.
#
# modification history
# ----------------------------------------------------------
# 20000619 Created
#
# ----------------------------------------------------------
User-agent: *
Disallow: /Business
Disallow: /Careers
Disallow: /Errors
Disallow: /Graphics
Disallow: /History
Disallow: /Jmic
Disallow: /Public
Disallow: /This
Disallow: /bin
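The record layout described in that file's comments ('User-agent' line, then 'Disallow' directives, blank lines only between records) can be illustrated with a small parser. This is a rough sketch under those stated rules, not any crawler's actual code; the name parse_records is invented for the example, and treating a new User-agent line after Disallow lines as the start of a new record is an added leniency.

```python
def parse_records(robots_txt):
    """Split a robots.txt body into (user_agents, disallows) records.

    Blank lines end a record; comment-only lines are skipped without
    ending one. Hypothetical helper written for illustration only.
    """
    records = []
    agents, rules = [], []

    def flush():
        nonlocal agents, rules
        if agents:
            records.append((agents, rules))
        agents, rules = [], []

    for raw in robots_txt.splitlines():
        if not raw.strip():                  # truly blank line: record boundary
            flush()
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:                         # comment-only line: ignore
            continue
        field, sep, value = line.partition(":")
        if not sep:
            continue                         # not a "Field: value" line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:                        # Disallow lines already seen: new record
                flush()
            agents.append(value)
        elif field == "disallow" and agents:
            rules.append(value)              # an empty value means "nothing disallowed"
    flush()
    return records
```

Run on the file above, this would yield a single record for User-agent * with nine Disallow paths; files with several named robots produce one record per group.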
# robots.txt for www.anl.gov User-agent: * Disallow: /x500 Disallow: /ECT/et Disallow: /ECT/templates Disallow: /pjbmosaic Disallow: /ECT/djs Disallow: /OPA/galley/captions/ Disallow: /CPP/exploit.html Disallow: /MERTC/
http://www.princeton.edu/robots.txt
# General purpose
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
Disallow: /~mablount/unix/ # an infinite virtual URL space
Disallow: /~mablount/mablount/ # an infinite virtual URL space
Disallow: /~mkporwit/pub_links/ # an infinite virtual URL space
Disallow: /~sdesiano/pub_links/ # an infinite virtual URL space
Disallow: /~gmsierra/pub/ # an infinite virtual URL space
Disallow: /~lhjensen/pub_links/ # an infinite virtual URL space
Disallow: /~euphorb/Issues/ # an infinite virtual URL space
Disallow: /~mablount/Submissions/ # an infinite virtual URL space
Disallow: /~mablount/Staff/ # an infinite virtual URL space
Disallow: /webinator/
#Disallow: /Princeton/GG/
Disallow: /~financial/ # Under development
Disallow: /dev/ # CIT Web Services Group development area
http://www.aa.com/robots.txt
User-agent: * Disallow: /404 Disallow: /aa_home Disallow: /aad/* Disallow: /adjust_profile.tmpl Disallow: /away Disallow: /back.html Disallow: /backdoor.html Disallow: /backto.html Disallow: /bookmark.tmpl Disallow: /bookmarks Disallow: /bottom.html Disallow: /citibank Disallow: /corporate Disallow: /countries.txt Disallow: /default.html Disallow: /directURL Disallow: /editorials Disallow: /entry.tmpl Disallow: /error Disallow: /example.html Disallow: /footerlong.html Disallow: /footershort.html Disallow: /frameset.html Disallow: /global-metanav.html Disallow: /global-metanav.html.secure Disallow: /graphics.html Disallow: /images Disallow: /index-error.tmpl Disallow: /index-guest.tmpl Disallow: /index-member.tmpl Disallow: /index.tmpl Disallow: /index_sitedown.html Disallow: /intro.html Disallow: /list.txt Disallow: /main-bookmark.tmpl Disallow: /main-guest.tmpl Disallow: /main-member.tmpl Disallow: /media Disallow: /message.html Disallow: /message.html.down Disallow: /meta Disallow: /nav-member.tmpl Disallow: /navguest.tmpl Disallow: /npd Disallow: /onlinelogo.gif Disallow: /pands Disallow: /profile8.tmpl Disallow: /promo Disallow: /prototypes Disallow: /redirect.links.tmpl Disallow: /redirect.tmpl Disallow: /redirect2.tmpl Disallow: /redirect3.tmpl Disallow: /redirect4.tmpl Disallow: /secure Disallow: /secure2 Disallow: /shop.html Disallow: /sitetour Disallow: /specials Disallow: /specialsfarecontent.tmpl Disallow: /specialsfaremore.tmpl Disallow: /targeting Disallow: /TAusage.html.tmpl Disallow: /ticonderoga Disallow: /travelp/* Disallow: /upgrade.html Disallow: /wait.html Disallow: /wait.new.html # # USWeb additions Disallow: /usw-metrics
http://disney.go.com/robots.txt
# # robot.txt file for http://disney.go.com/ # User-agent: * Disallow: /cgi-bin Disallow: /Mail Disallow: /Help Disallow: /Search Disallow: /Sign-in Disallow: /ActiveDesktop Disallow: /Legal Disallow: /Ads/ Disallow: /DesignOnline Disallow: /DisneyCareers Disallow: /DisneyStore Disallow: /DisneyWorld/Local Disallow: /TravelAgents/Booking Disallow: /DisneyVacationClub/VideoOffer Disallow: /globalmedia/chrome Disallow: /dmail User-agent: Ultraseek Disallow: /cgi-bin Disallow: /Mail Disallow: /Search Disallow: /Sign-in Disallow: /ActiveDesktop Disallow: /Legal Disallow: /Ads/ Disallow: /DesignOnline Disallow: /DisneyWorld/Local Disallow: /TravelAgents/Booking Disallow: /DisneyVacationClub/VideoOffer Disallow: /globalmedia/chrome Disallow: /dmail
User-agent: * Disallow: /ads/ Disallow: /database/ Disallow: /excite/ Disallow: /images/ Disallow: /administrative/common_includes/ Disallow: /administrative/common_images/ Disallow: /administrative/maintenance/ Disallow: /administrative/portal/ Disallow: /administrative/stock_info/ Disallow: /administrative/swish/ Disallow: /administrative/excite/ Disallow: /administrative/ir_info/ Disallow: /administrative/images/ Disallow: /administrative/news/ Disallow: /administrative/stock_info/ Disallow: /administrative/search/ Disallow: /95annual_text/ Disallow: /careers/closer_look/ Disallow: /gmec/ Disallow: /general/workforce_mgmt/client/ Disallow: /case_studies/security/ Disallow: /general/career_resource/
# Ignore FrontPage files
User-agent: *
Disallow: /_borders
Disallow: /_derived
Disallow: /_fpclass
Disallow: /_overlay
Disallow: /_private
Disallow: /_themes
Disallow: /_vti_bin
Disallow: /_vti_cnf
Disallow: /_vti_log
Disallow: /_vti_map
Disallow: /_vti_pvt
Disallow: /_vti_txt
# Rover is a bad dog
User-agent: Roverbot
Disallow: /
# EmailSiphon is a hunter/gatherer which extracts email addresses for spam-mailers to use
User-agent: EmailSiphon
Disallow: /
US Agency for International Development
http://www.usaid.gov/robots.txt
User-agent: *
Disallow: /~*/
Disallow: /info_technology/xweb/
Disallow: /economic_growth/egad/
Disallow: /pop_health/dev/
Disallow: /regions/eni/partners/
Disallow: /test/
Disallow: /info_technology/itt/ptt
Disallow: /iss/
Disallow: /regions/afr/mission/test/rcsa/
Disallow: /test/senegal/
Disallow: /info_technology/xweb/stats/
Disallow: /stats/
User agent: *
disallow: /TopicalIndex
User agent: *
disallow: /ops
User agent: *
disallow: /china
User agent: *
disallow: /_BORDERS
User agent: *
disallow: /_CUSUDI
User agent: *
disallow: /_DISC1
User agent: *
disallow: /_DISCU~1
User agent: *
disallow: /_PRIVATE
User agent: *
disallow: /_SDISC1
User agent: *
disallow: /_VTI_BIN
User agent: *
disallow: /_VTI_CNF
User agent: *
disallow: /_VTI_LOG
User agent: *
disallow: /_VTI_PVT
User agent: *
disallow: /_VTI_TXT
User agent: *
disallow: /CRC
User agent: *
disallow: /digeconomy
User agent: *
disallow: /directory
user agent: *
disallow: /ecommerce
user agent: *
disallow: /images
user agent: *
disallow: /pics
user agent: *
disallow: /listserv
user agent: *
disallow: /logos
user agent: *
disallow: /ops
user agent: *
disallow: /samples
User agent: *
disallow: /script3
user agent: *
disallow: /search
user agent: *
disallow: /srchadmin
user agent: *
disallow: /test
user agent: *
disallow: /tperl
#robots.txt file for http://www.fda.gov
User-agent: *
Disallow: /scripts/
Disallow: /data/
Disallow: /binn/
Disallow: /cder/test/
Disallow: /opacom/area51/
Disallow: /oha/
Disallow: /oc/ofacs/staffmanuals/3130_3.htm
Disallow: /oc/clips/
#Disallow: /cdrh/ftparea/cdrh/MDR/coll/mdr/mdrcoll/
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 23:00EDT-05:00EDT #index this site between 11PM - 5AM EDT
Concurrent-hits: 2 # limit concurrent active URLS to 2 for each index server
Department of Health and Human Services
# robots.txt for http://www.hhs.gov
# robots.txt for http://www.os.dhhs.gov
# robots.txt for http://www.dhhs.gov
# robots.txt for http://os.dhhs.gov:80
user-agent: * # directed to all spiders, not just Scooter
Disallow: /CGI/
Disallow: /cgi-bin/
Disallow: /analog/
Disallow: /analyze/
Disallow: /images/
Disallow: /template/
Disallow: /templates/
Disallow: /sources/
Disallow: /response/
Disallow: /search/
Disallow: /wwwstats/
Disallow: /HTMLgen/
Disallow: /htmlgen/
Disallow: /HTMLGen/
Disallow: /gay/
Disallow: /hrsa/
Disallow: /osophs/
Disallow: /aoa/
Disallow: /epsa/
Disallow: /su_docs/
Disallow: /access_stats
Disallow: /progorg/ohr/forum/EAPopen/
Disallow: /progorg/ohr/forum/EAPpro/
Disallow: /cio/
Disallow: /news/archive/
Disallow: /masters/
Hit-rate: 30 # wait 30 seconds before starting a new URL request default=30
Visiting-hours: 01:00EST-05:00EST # index this site between 1AM - 5AM EST
Concurrent-hits: 2 # limit concurrent active URLs to 2 for each index server
# robots.txt for http://www.nsf.gov/
# see <http://web.nexor.co.uk/mak/doc/robots/norobots.html> for an explanation.
# Change history:

User-agent: MOMspider # The Multi-Owner Maintenance Spider
Disallow: /cgi-bin/ # Script files
Disallow: /stats/ # Big statistics files
Disallow: /pubsys/data/ # ODS database files
Disallow: /search97cgi/vtopic/ # Disable index search engine
Disallow: /home/ebulletin/archive/ # skip Ebulletin archive
Disallow: /sbe/srs/start.htm

User-agent: vspider # Verity spider
Disallow: /cgi-bin/ # Script files
Disallow: /stats/ # Big statistics files
Disallow: /home/nsforg/ # Breaks Verity spider - temporary
Disallow: /awards/ # Award abstracts
Disallow: /pubsys/data/ # ODS database files
Disallow: /search97cgi/vtopic/ # Disable index search engine
Disallow: /seind98/topdemo.htm # temp
Disallow: /nsf99338/topdemo.htm # temp
Disallow: /home/ebulletin/archive/ # skip Ebulletin archive
Disallow: /sbe/srs/start.htm
Disallow: /web/ # authoring guide

User-agent: * # All other spiders should avoid
Disallow: /cgi-bin/ # Script files
Disallow: /stats/ # Big statistics files
Disallow: /pubsys/data/ # ODS database files
Disallow: /search97cgi/vtopic/ # Disable index search engine
Disallow: /seind98/topdemo.htm # temp
Disallow: /nsf99338/topdemo.htm # temp
Disallow: /home/ebulletin/archive/ # skip Ebulletin archive
Disallow: /sbe/srs/start.htm
Disallow: /web/ #authoring guide
[Thanks to JM.]
16 February 2002
# /robots.txt for www.sun.com
#--------------------------------------------------------------------------
# Mon Feb 2 11:59:27 PST 1998, Fred Elliott
# A NOTE TO THOSE WHO'D BOTHER TO LOOK AT THIS FILE:
#
# Bertrand Meyer's excellent "comp.risks" posting about the potential
# for misusing "robots.txt" files
# (http://www.eiffel.com/private/meyer/robots.html) includes a snapshot
# of the contents of this file here on www.sun.com.
#
# In the article, Bertrand speculates that the directories listed below
# contain proprietary information. Well, they don't. They do, though,
# contain information that we'd prefer people register for before they
# download it.
#
# The purpose of the "robots.txt" file is to keep these directories
# from being indexed so that the average user doesn't stumble across them
# while performing searches, and those that should be accessing these
# directories will do so through the URL that requires them to register.
# Of course, having the contents of this file advertised in "comp.risks"
# diminishes its purpose. Thanks Bertrand. ;-)
#
# If you do actually go to the trouble of figuring out how to download
# the files without registering, what you'll end up with is 1 or 2MB of
# stuff that is meaningless to you unless you have purchased an
# Ultra AX board from Sun. So, please do purchase an Ultra AX board,
# but then you might as well use the URL you'll be given along with it.
#--------------------------------------------------------------------------
#
# Thu Jan 30 16:58:19 PST 1997, Fred Elliott
# o Created this file to prevent indexing of one
# SME directory.
User-agent: *
Disallow: /sparc/SPARCengineUltraAX/download/
Disallow: /microelectronics/SPARCengineUltraAX/download/
Disallow: /javachip/SPARCengineUltraAX/download/
Disallow: /javachips/SPARCengineUltraAX/download/
Disallow: /joeroebuck/

# Java Systems files
Disallow: /javastation/remotewindowing/citrix/JICAEng.zip
Disallow: /javastation/remotewindowing/citrix/JavaEnt.tar
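
Sun's note concedes the point of this page: a robots.txt file names the very paths a site would rather keep out of search indexes. As a minimal sketch in Python of pulling those paths out of a file's text, something like the following would do. The function name `disallowed_paths` and the shortened `sample` string are illustrative assumptions, not part of any site's tooling, and the parsing is deliberately simpler than a full robots.txt parser.

```python
def disallowed_paths(robots_txt):
    """Return a dict mapping each user-agent to its list of Disallow paths."""
    groups = {}
    agents = []        # user-agents the current record applies to
    in_rules = False   # True once the current record has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue                          # blank or unrecognised line
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            if in_rules:                      # a new record starts here
                agents = []
                in_rules = False
            agents.append(value)
            groups.setdefault(value, [])
        elif field == "disallow" and agents:
            in_rules = True
            if value:                         # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append(value)
    return groups


# Hypothetical, shortened sample based on the Sun file quoted above.
sample = """# /robots.txt for www.sun.com
User-agent: *
Disallow: /sparc/SPARCengineUltraAX/download/
Disallow: /joeroebuck/ # trailing comments are ignored
"""

paths = disallowed_paths(sample)
for agent, entries in paths.items():
    for entry in entries:
        print(agent, entry)
```

Lines whose field name is not exactly `User-agent` or `Disallow` (ignoring case) are skipped by this sketch, so records written as `User agent:` without the hyphen would yield nothing.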