
Re: mining and rights



I've never seen a licensing agreement that states *how* an information resource can be used. Textual analysis is use, whether it is performed by someone doing keyword searches or by a machine doing sequence similarity matching. That said, there are some unwritten rules about what constitutes *use* and distinguishes it from *abuse*. Without understanding the intent of the user, it is impossible to distinguish systematic downloading for the purposes of textual analysis from systematic downloading for the purposes of stealing a publisher's content. Security software cannot distinguish the intent of data mining from stealing -- they both look like systematic downloading, and most publishers are pretty quick to stop this form of use. The Spider Activity Reports from Blackwell are a good example of this.

While I think the future is wide open for new tools that enable a researcher to perform analysis on large literature collections, we may need to count downloads that emanate from data mining software separately from those generated by ordinary human searching and browsing. A single individual using data mining software may make COUNTER usage reports essentially incomprehensible to a librarian.

--Phil Davis


At 05:20 PM 5/30/2006, you wrote:
Joe Esposito's inquiry -- I would be very interested to hear comment
from publishers -- about the licensing issues raised by wanting to
use large databases of journal articles for data mining connects
with something in an interview with Cliff Lynch in the May/June
Educause Review.  Excerpts:

 "We now have about fifty years of investment in text analysis
 and text mining.  The intelligence community is still spending
 heavily on these technologies, and industry is getting very
 interested for lots of reasons.  For example, I'm told that the
 pharmaceutical industry is very interested in computational
 mining of the biomedical literature base.  This is an important
 part of what is at stake in these massive digitization programs.
 Are we going to be able simply to read the digitized works, or
 are we going to be able to compute on them at scale as well?
 (Presumably, Google will be able to compute on everything it
 digitizes, even the in-copyright works.  Almost nobody seems to
 have figured this out yet!  What an amazing and unique resource.
 It's not clear what the academy broadly will be able to compute
 on.)  The answer will make a big difference for the future of
 scholarship.  This move to computation on text corpora is going
 to have vast implications that we haven't even thought about yet
 -- implications for copyright, implications for publishers,
 implications for research groups.  In fact, it may represent the
 point of ultimate meltdown for copyright as we know it today."

Leaving aside the undoubted substantial potential -- are there any
indications that mining issues are affecting the way publishers are
granting or withholding access to material?

Jim O'Donnell
Georgetown U.