RE: internet archive (WAS: The Economist and e-Archiving)
- To: liblicense-l@lists.yale.edu
- Subject: RE: internet archive (WAS: The Economist and e-Archiving)
- From: Eric Hellman <eric@openly.com>
- Date: Sun, 29 Jun 2003 23:10:53 EDT
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
You can only have one robots.txt file per internet host. Lack of a robots.txt is interpreted either as "it's OK to index this site" or as "the idiots who manage this site don't know about robot exclusion"; robots treat the two as equivalent assertions. For The Economist, which seems to know what it's doing, the file looks like this:

#
# Economist.com robots.txt
#
# Created MS 29 May 2001 Full disallow
# Amended MS 27 Jul 2001 Allow directories
#
User-agent: *
Disallow: /about
Disallow: /admin
Disallow: /background
Disallow: /bookshop
Disallow: /briefings
Disallow: /campusoffer
Disallow: /cart
Disallow: /CFDOCS
Disallow: /CFIDE
Disallow: /checkout
Disallow: /classes
Disallow: /cm
Disallow: /community
Disallow: /Copy of markets
Disallow: /countries_old
Disallow: /deal
Disallow: /editorial
Disallow: /email
Disallow: /events
Disallow: /globalagenda
Disallow: /help
Disallow: /images
Disallow: /library
Disallow: /maporama
Disallow: /markets
Disallow: /mba-direct
Disallow: /me
Disallow: /members
Disallow: /mobile
Disallow: /newswires
Disallow: /partners
Disallow: /perl
Disallow: /printedition
Disallow: /search
Disallow: /shop
Disallow: /shop_old
Disallow: /specialdeal
Disallow: /specialdeal1
Disallow: /studentoffer
Disallow: /subcenter
Disallow: /subscriptions
Disallow: /surveys
Disallow: /test.txt
Disallow: /tfs
Disallow: /travel2.economist
Disallow: /voucher

At 5:13 PM -0400 6/27/03, informania@supanet.com wrote:
Eric Hellman wrote, <In other words, if you put a robots.txt file on your server that excludes indexing of any files with path starting with "/content/", then they will remove from the archive any files from your server with path starting with "/content/".>

So someone writing something that they think might get censored after publication should obligingly add a robots.txt file (= "Kick me") at the front of their work so that the censorship can be carried out on archive.org? I don't think so!

Granted, in the case of The Economist, the newspaper might take the decision to add such a file covering all of its articles (that still seems very doubtful, though); but other than such mass-market publications, I can't see this happening. Consequently, in practice, retrospective deletions from the Wayback Machine remain difficult if not impossible.

Chris Zielinski
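For anyone who wants to see the mechanics, here is a minimal sketch of the exclusion described above and of how a compliant robot reads it, using Python's standard urllib.robotparser module. The host name and the "/content/" path are illustrative assumptions, not taken from any real site:

import urllib.robotparser

# The two rules under discussion: one robots.txt for the whole host,
# telling every robot to stay out of anything under /content/.
rules = [
    "User-agent: *",
    "Disallow: /content/",
]

rp = urllib.robotparser.RobotFileParser()
# A real robot would fetch this from the one fixed location,
# http://<host>/robots.txt; here the rules are parsed directly.
rp.parse(rules)

# Anything under /content/ is off limits to every robot ...
print(rp.can_fetch("*", "http://www.example.com/content/piece.html"))  # False
# ... while the rest of the host may still be crawled and indexed.
print(rp.can_fetch("*", "http://www.example.com/index.html"))          # True

Run as-is, this prints False for the /content/ page and True for the rest of the site, which is the signal Eric describes the archive as honoring when it removes pages retrospectively.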