
Comment by Peter Brantley



I recently posted a query concerning data-mining to this list. I happened to share it with Peter Brantley of the California Digital Library, who replied in his characteristically thoughtful way. His remarks are pasted in below, with his permission (with the original informal style of a personal email). Note in particular the comment about the Open Text Mining Initiative that is being promulgated by the Nature Group.

Joe Esposito

___

I think this is a very intelligent question, and certainly one that is being asked. it's not yet a problem at the CDL, and hasn't been discussed there; but I have discussed flavors of this with others.

I think there are various ways in which digitized texts could produce transformative additional IP.

there is the text mining means you describe, in which services or users are able to elucidate or uncover meanings, linkages, and patterns that were previously undisclosed. These in turn could be published or leveraged for revenue in various ways (companies like MarkLogic build their businesses off this kind of work).

there is the value-add that social software techniques could produce, through the production of lists, perhaps pointing deep into texts, or at small portions of texts; the IP inherent in annotation and tagging (who owns these?); and additions to expert ontologies that might be used within text mining to further value. (Just a few examples).

there are also virtual texts, in which users able to search across a range of material might be able to produce new and useful derivatives, such as "The 100 Best Salpicon Recipes" - what portion of that IP could be claimed by the original publisher? is that akin to the relationship of a movie to a screenplay?

I would note that one of the innovations that Nature Publishing has recently provided is the Open Text Mining Initiative, which explicitly provides a mechanism for publishers to produce machine readable files that facilitate text mining and indexing without rendering the text to human readership and without forsaking the lion's share of the IP. I think OTMI will potentially be very successful, and I think approaches like it will be embraced for at least an interim period of time.
