RE: internet archive (WAS: The Economist and e-Archiving)
- To: liblicense-l@lists.yale.edu
- Subject: RE: internet archive (WAS: The Economist and e-Archiving)
- From: informania@supanet.com
- Date: Thu, 3 Jul 2003 00:15:02 EDT
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
If I understand this correctly, this means that any and all exclusions would need to be put into the robots.txt file. Surely this can't be done retroactively? In relation to this thread, the question is: how can such a file be used to get the Wayback Machine or other archiving services to retroactively delete or deny access to content? From the example you have sent, it looks as though the Economist disallows indexing of materials in what appear to be all the subdirectories on its site. If that's the case, does that mean the whole of the Economist would not be picked up by the Wayback Machine?

Chris Zielinski
STP, CSI/EGB/WHO
Avenue Appia, CH-1211 Geneva, Switzerland
Tel (Mobile): 0044797-10-45354

-----Original Message-----
From: owner-liblicense-l@lists.yale.edu
Sent: Monday, 30 June 2003 05:11
To: liblicense-l@lists.yale.edu
Subject: RE: internet archive (WAS: The Economist and e-Archiving)

You can only have one robots.txt file per internet host. The lack of a robots.txt is interpreted either as "it's OK to index this site" or as "the idiots who manage this site don't know about robot exclusion"; robots treat the two as equivalent assertions. For the Economist, which seems to know what it's doing, the file looks like this:

#
# Economist.com robots.txt
#
# Created MS 29 May 2001 Full disallow
# Amended MS 27 Jul 2001 Allow directories
#
User-agent: *
Disallow: /about
Disallow: /admin
Disallow: /background
Disallow: /bookshop
Disallow: /briefings
Disallow: /campusoffer
Disallow: /cart
Disallow: /CFDOCS
Disallow: /CFIDE
Disallow: /checkout
Disallow: /classes
Disallow: /cm
Disallow: /community
Disallow: /Copy of markets
Disallow: /countries_old
Disallow: /deal
Disallow: /editorial
Disallow: /email
Disallow: /events
Disallow: /globalagenda
Disallow: /help
Disallow: /images
Disallow: /library
Disallow: /maporama
Disallow: /markets
Disallow: /mba-direct
Disallow: /me
Disallow: /members
Disallow: /mobile
Disallow: /newswires
Disallow: /partners
Disallow: /perl
Disallow: /printedition
Disallow: /search
Disallow: /shop
Disallow: /shop_old
Disallow: /specialdeal
Disallow: /specialdeal1
Disallow: /studentoffer
Disallow: /subcenter
Disallow: /subscriptions
Disallow: /surveys
Disallow: /test.txt
Disallow: /tfs
Disallow: /travel2.economist
Disallow: /voucher

At 5:13 PM -0400 6/27/03, informania@supanet.com wrote:

>Eric Hellman wrote,
>
><In other words, if you put a robots.txt file on your server that excludes
>indexing of any files with path starting with "/content/", then they will
>remove from the archive any files from your server with path starting with
>"/content/".>
>
>So someone writing something that they think might get censored after
>publication should handily add a robots.txt file (="Kick me") at the front
>of their work so that the censorship can be accomplished on archive.org? I
>don't think so!
>
>Granted that, in the case of The Economist, the newspaper might take the
>decision to add such a file to all of their articles (still seems very
>doubtful though), but other than such mass-market publications, I can't
>see this happening. Consequently, in practice, retrospective deletions
>from the Wayback Machine remain difficult if not impossible.
>
>Chris Zielinski
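A concrete illustration of the mechanism under discussion: the Internet Archive's crawler identifies itself with the user-agent ia_archiver, and, per the policy Eric Hellman described, the archive removes from public access any already-collected pages that a site's current robots.txt excludes. A publisher who wanted out of the Wayback Machine alone, without turning away other robots, could in principle serve a file along these lines (a hypothetical sketch, not any publisher's actual file):

    # Hypothetical robots.txt: exclude only the Internet Archive's crawler.
    User-agent: ia_archiver
    Disallow: /

    # All other robots may index the whole site
    # (an empty Disallow value means "allow everything").
    User-agent: *
    Disallow:

And for anyone who wants to see how a compliant robot evaluates such rules, here is a minimal sketch using Python's standard robotparser module (the two Disallow lines are abbreviated from the Economist file quoted above; nothing here is specific to the Wayback Machine):

    from urllib import robotparser

    # A two-rule excerpt of the Economist rules quoted above.
    rules = [
        "User-agent: *",
        "Disallow: /printedition",
        "Disallow: /search",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)  # feed the rules to the parser

    # can_fetch(user_agent, url) answers: may this robot fetch this URL?
    print(rp.can_fetch("ia_archiver", "/printedition/index.html"))  # False
    print(rp.can_fetch("ia_archiver", "/index.html"))               # True

Because the lookup is a simple prefix match against the Disallow paths, a "Disallow: /printedition" line shuts out everything beneath that directory, which is why a file like the Economist's blankets effectively the whole site.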