
RE: internet archive (WAS: The Economist and e-Archiving)



You can have only one robots.txt file per Internet host, and it has to
sit at the root of the site. The absence of a robots.txt is interpreted as
"it's ok to index this site" or "the idiots that manage this site don't
know about robot exclusion", which robots treat as equivalent assertions.
For The Economist, which seems to know what it's doing, the file looks
like this:

#
# Economist.com robots.txt
#
# Created MS 29 May 2001 Full disallow
# Amended MS 27 Jul 2001 Allow directories
#
User-agent: *
Disallow: /about
Disallow: /admin
Disallow: /background
Disallow: /bookshop
Disallow: /briefings
Disallow: /campusoffer
Disallow: /cart
Disallow: /CFDOCS
Disallow: /CFIDE
Disallow: /checkout
Disallow: /classes
Disallow: /cm
Disallow: /community
Disallow: /Copy of markets
Disallow: /countries_old
Disallow: /deal
Disallow: /editorial
Disallow: /email
Disallow: /events
Disallow: /globalagenda
Disallow: /help
Disallow: /images
Disallow: /library
Disallow: /maporama
Disallow: /markets
Disallow: /mba-direct
Disallow: /me
Disallow: /members
Disallow: /mobile
Disallow: /newswires
Disallow: /partners
Disallow: /perl
Disallow: /printedition
Disallow: /search
Disallow: /shop
Disallow: /shop_old
Disallow: /specialdeal
Disallow: /specialdeal1
Disallow: /studentoffer
Disallow: /subcenter
Disallow: /subscriptions
Disallow: /surveys
Disallow: /test.txt
Disallow: /tfs
Disallow: /travel2.economist
Disallow: /voucher
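
To make the behaviour concrete, here is a rough sketch using Python's
standard urllib.robotparser module. It checks a few paths against a
shortened copy of the rules above, and against an empty file to show that
"no rules" is read as "everything is ok". The helper names (make_parser,
allowed) and the "ExampleBot" user agent are made up for illustration, and
a real robot may interpret the file slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the Economist rules quoted above (illustration only).
ECONOMIST_RULES = """\
User-agent: *
Disallow: /about
Disallow: /printedition
Disallow: /search
"""


def make_parser(text):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, path, agent="ExampleBot"):
    """True if the (hypothetical) robot 'agent' may fetch 'path'."""
    return parser.can_fetch(agent, path)


economist = make_parser(ECONOMIST_RULES)
no_rules = make_parser("")  # stands in for a host with no usable rules

for path in ("/printedition/2003", "/world/story"):
    print(path, allowed(economist, path), allowed(no_rules, path))
```

Disallow lines are matched as path prefixes, so "Disallow: /printedition"
covers everything underneath that directory as well.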


At 5:13 PM -0400 6/27/03, informania@supanet.com wrote:
Eric Hellman wrote,

<In other words, if you put a robots.txt file on your server that excludes
indexing of any files with path starting with "/content/", then they will
remove from the archive any files from your server with path starting with
"/content/".>
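
The rule Eric describes amounts to a prefix filter over whatever has
already been archived. A minimal sketch, assuming a hypothetical list of
archived paths; this is not the Internet Archive's actual code, and the
function name surviving_paths is invented for the example.

```python
def surviving_paths(archived_paths, disallowed_prefixes):
    """Return the archived paths that no Disallow prefix covers."""
    return [
        path
        for path in archived_paths
        if not any(path.startswith(prefix) for prefix in disallowed_prefixes)
    ]


# Hypothetical archive contents, for illustration only.
archived = ["/index.html", "/content/2003/story.html", "/contents.html"]
kept = surviving_paths(archived, ["/content/"])
print(kept)
```

Note that "/contents.html" is kept, because it does not start with the
full prefix "/content/".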

So someone writing something that they think might get censored after
publication should handily add a robots.txt file (="Kick me") at the front
of their work so that the censorship can be accomplished on archive.org? I
don't think so!

Granted that, in the case of The Economist, the newspaper might take the
decision to add such a file to all of their articles (still seems very
doubtful though), but other than such mass-market publications, I can't
see this happening. Consequently, in practice, retrospective deletions
from The Wayback Machine remain difficult if not impossible.

Chris Zielinski