Re: mining and rights
- To: liblicense-l@lists.yale.edu
- Subject: Re: mining and rights
- From: Phil Davis <pmd8@cornell.edu>
- Date: Thu, 1 Jun 2006 22:16:36 EDT
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
I've never seen a licensing agreement that states *how* an information resource can be used. Textual analysis is use, whether it is performed by someone doing keyword searches or by a machine doing sequence-similarity matching. That said, there are some unwritten rules about what constitutes *use* and what distinguishes it from *abuse*. Without understanding the intent of the user, it is impossible to distinguish systematic downloading for the purposes of textual analysis from systematic downloading for the purposes of stealing a publisher's content. Security software cannot tell the intent of data mining apart from the intent to steal -- both look like systematic downloading, and most publishers are pretty quick to stop this form of use. The Spider Activity Reports from Blackwell are a good example of this.
While I think the future is wide open for new tools that enable a researcher to perform analysis on large literature collections, we may need to count downloads generated by data-mining software separately from those generated by ordinary human searching and browsing. A single individual using data-mining software could make COUNTER usage reports essentially incomprehensible to a librarian.
--Phil Davis
At 05:20 PM 5/30/2006, you wrote:
Joe Esposito's inquiry about the licensing issues raised by wanting to use large databases of journal articles for data mining (I would be very interested to hear comment from publishers) connects with something in an interview with Cliff Lynch in the May/June Educause Review. Excerpts:

"We now have about fifty years of investment in text analysis and text mining. The intelligence community is still spending heavily on these technologies, and industry is getting very interested for lots of reasons. For example, I'm told that the pharmaceutical industry is very interested in computational mining of the biomedical literature base. This is an important part of what is at stake in these massive digitization programs. Are we going to be able simply to read the digitized works, or are we going to be able to compute on them at scale as well? (Presumably, Google will be able to compute on everything it digitizes, even the in-copyright works. Almost nobody seems to have figured this out yet! What an amazing and unique resource. It's not clear what the academy broadly will be able to compute on.) The answer will make a big difference for the future of scholarship. This move to computation on text corpora is going to have vast implications that we haven't even thought about yet -- implications for copyright, implications for publishers, implications for research groups. In fact, it may represent the point of ultimate meltdown for copyright as we know it today."

Leaving aside the undoubted and substantial potential, are there any indications that mining issues are affecting the way publishers grant or withhold access to material?

Jim O'Donnell
Georgetown U.