
RE: internet archive (WAS: The Economist and e-Archiving)



Yes, any and all exclusions would need to go into the robots.txt file. The
effect on the Wayback Machine IS retroactive (though not instantaneous, of
course). If the Economist wanted to block access to a particular file,
they would just add an exclusion to their robots file. The next time the
Internet Archive robot visited the Economist site, it would pick up the
robots file and use it to update the site exclusions.
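
For instance (a sketch only: "ia_archiver" is the user-agent the Archive's
crawler has identified itself as, and the path is purely illustrative), an
entry like this in http://www.economist.com/robots.txt would block the
Archive from a directory:

    User-agent: ia_archiver
    Disallow: /printedition/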

A quick look at the Economist site shows that a lot of it is in fact
exposed to robots: Google shows 41,000 pages indexed, including /countries,
/cities, /books, /world and /science:

http://www.google.com/search?hl=en&ie=ISO-8859-1&q=site%3Aeconomist.com+economist&btnG=Google+Search
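
If anyone wants to check what their robots file actually permits, here is a
quick sketch using Python's standard robots.txt parser (the user-agent and
the path are just examples, not what the Economist or the Archive actually
use):

    # Check whether a crawler would be allowed to fetch a given page,
    # according to the site's live robots.txt.
    try:
        import robotparser                 # Python 2 module name
    except ImportError:
        from urllib import robotparser     # later Pythons: urllib.robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://www.economist.com/robots.txt")
    rp.read()

    # True means the rules allow this user-agent to fetch the URL.
    print(rp.can_fetch("ia_archiver", "http://www.economist.com/countries/"))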


At 12:15 AM -0400 7/3/03, informania@supanet.com wrote:
If I understand this correctly, this means that any and all exclusions
would need to be put into the robots.txt file.  Surely this can't be done
retroactively? In relation to this thread, the question is, how can such a
file be used to get the Wayback Machine or other archiving services to
retroactively delete or deny access to content?

It looks from the example you have sent as if the Economist refuses to
allow indexing of any materials in what appear to be all the subdirectories
on their site - if that's the case, does that mean the whole of the
Economist would not be picked up by the Wayback Machine?


Chris Zielinski
STP, CSI/EGB/WHO
Avenue Appia, CH-1211
Geneva, Switzerland
Tel (Mobile): 0044797-10-45354