Re: FW: Crawling publishers' sites
- To: liblicense-l@lists.yale.edu
- Subject: Re: FW: Crawling publishers' sites
- From: Mark Jordan <mjordan@sfu.ca>
- Date: Mon, 19 Apr 1999 18:37:13 EDT
- Reply-To: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
Hi Scott,

I don't know about the legality of crawling publishers' sites, but doing so may not work out very well, because they likely prohibit crawlers from indexing their sites (using the robots.txt method, for example) or keep their "goods" behind some kind of password. While some crawlers, such as ht://Dig, can be configured to use passwords to access restricted directories, I'm not sure how these crawlers handle large numbers of passwords. Also, the different kinds of passwords in use (basic web server passwords, passwords entered via a form in a web page, etc.) might complicate things.

On the other hand, ht://Dig can index Adobe PDF files as well as standard HTML files, so it might be a good tool for doing what you describe. Their site is http://www.htdig.org/.

Mark

Mark Jordan
Librarian / Analyst, Systems Division
W.A.C. Bennett Library, Simon Fraser University
Burnaby, BC, V5A 1S6, Canada
Email mjordan@sfu.ca / Phone (604) 291 5753 / Fax (604) 291 3023
________________________________________

On Thu, 15 Apr 1999, Mellon, Scott wrote:

> I would be interested in hearing about experiences or thoughts subscribers
> may have on the legality or ethics involved in sending a web crawler to
> visit and index the sites of publishers for whom we have site licences.
>
> The resulting database would be made available on our Intranet only;
> i.e. only for the use of those for whom we have licenced access to the
> publishers.
>
> Scott Mellon
> CISTI Advanced Services
> Ottawa, Canada K1A 0S2
> Tel: (613)993-0994, Fax / Docufax: (613) 952-8246
> mailto:scott.mellon@nrc.ca
> http://www.nrc.ca/cisti
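The robots.txt method Mark mentions works like this: before fetching a page, a cooperating crawler reads the site's robots.txt file and skips any path disallowed for its user agent. The Python sketch below makes that check with the standard library's urllib.robotparser. It parses an in-memory file instead of fetching one over the network, and the robots.txt contents, the host publisher.example.com, the agent names ("MyCrawler", "htdig") and the may_crawl helper are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher's site (illustration only):
# every agent is kept out of /journals/ and /fulltext/, and an agent
# identifying itself as "htdig" is kept out of the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /journals/
Disallow: /fulltext/

User-agent: htdig
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("MyCrawler", "https://publisher.example.com/about.html"),
        ("MyCrawler", "https://publisher.example.com/journals/vol1.pdf"),
        ("htdig", "https://publisher.example.com/about.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if may_crawl(SAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent:10} {url}: {verdict}")
```

Note that robots.txt is advisory: it records the site owner's wishes for cooperating crawlers but does not itself block access, so pages behind a password would still need the separate authentication handling Mark describes.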