
RE: internet archive (WAS: The Economist and e-Archiving)



Eric Hellman wrote,

<In other words, if you put a robots.txt file on your server that excludes
indexing of any files with path starting with "/content/", then they will
remove from the archive any files from your server with path starting with
"/content/".>

So someone writing something that they think might get censored after
publication should obligingly pin a robots.txt file (= a "Kick me" sign)
on their own server so that the censorship can be carried out at
archive.org? I don't think so!

Granted, in the case of The Economist, the newspaper might decide to add
such an exclusion covering all of its articles (still very doubtful,
though), but outside mass-market publications of that kind I can't see
this happening. Consequently, in practice, retrospective deletion from
the Wayback Machine remains difficult if not impossible.

Chris Zielinski

-----------

Just joined the list to say that the Wayback Machine is doing a good job
of obeying robots.txt. One of my sites had been archived for more than
three years. I wanted them to stop, so I excluded ia_archiver in my
robots.txt. All the files were removed from their servers within a
pretty short time. All the Wayback Machine shows now is:

We're sorry, access to <URL snipped> has been blocked by the site owner
via robots.txt.
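
For anyone wanting to do the same, the exclusion is just the standard
robots.txt pair, something like the following (ia_archiver is the
Internet Archive's crawler name; Disallow: / blocks the whole site, and
narrower paths can be listed instead):

    User-agent: ia_archiver
    Disallow: /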

Janet Smith
http://www.zomilla.org