
Re: FW: Crawling publishers' sites



Scott Mellon scott.mellon@nrc.ca writes:

 >or ethics involved in sending a web crawler to visit and
 >index the sites of publishers for whom we have site licences...

I think the key is to respect the 'robots.txt' file. Longstanding
custom requires that sites wishing not to be crawled say so in the
'robots.txt' on the site. See "A Standard for Robot Exclusion" 
at http://info.webcrawler.com/mak/projects/robots/robots.html
and no doubt many other places.
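
As a rough illustration of what "respecting robots.txt" could look like
in a crawler, here is a minimal sketch using Python's standard
urllib.robotparser module. The sample robots.txt content, the
"ExampleLibraryCrawler" user-agent name, the publisher.example URLs and
the make_checker helper are all made up for the example; a real crawler
would fetch each site's own robots.txt rather than use a hard-coded
string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for what a publisher's
# site might serve at /robots.txt. Not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt: str, user_agent: str):
    """Return a function that says whether user_agent may fetch a URL.

    The rules are parsed from the given robots.txt text, so no network
    access is needed for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# "ExampleLibraryCrawler" is an assumed name for our own crawler.
allowed = make_checker(SAMPLE_ROBOTS_TXT, "ExampleLibraryCrawler")

for url in (
    "http://publisher.example/journals/index.html",
    "http://publisher.example/private/drafts.html",
):
    print(url, "->", "fetch" if allowed(url) else "skip")
```

With the sample rules above, the first URL should be reported as
fetchable and the second skipped, since it falls under the disallowed
/private/ path. The crawler would call the checker before every request
and simply not fetch anything the site has excluded.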

A publisher has no valid cause for complaint if you follow those
guidelines. They are well established.

Daniel Feenberg
National Bureau of Economic Research
feenberg@nber.org