RE: Libraries criticized for role in Google Book Search (long)
- To: liblicense-l@lists.yale.edu
- Subject: RE: Libraries criticized for role in Google Book Search (long)
- From: richards1000@comcast.net
- Date: Fri, 16 Jan 2009 19:56:34 EST
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
Bernie:

Here are my thoughts:

Overall, I think these comments don't reflect the agreements and facts, or fail to accept that libraries operate with limited resources.

Respecting the comments that participating libraries "are just giving away access to one company that is cornering the market on on-line access," and have fostered the "centralizing and commercializing [of] knowledge under a single corporate umbrella," I disagree. The participating libraries did not "give away" access to Google; they received what they perceived to be a valuable consideration, in the form of digital copies of those books (PDF plus OCR plus work-level and structural metadata), accompanied by what they perceived to be fair usage rights under the circumstances. (See final paragraph respecting the circumstances of bargaining.)

Nor is Google "cornering the market on on-line access" to all of these titles. Respecting the public domain works, in many instances digital copies are already available on the Internet from sources such as the participants in the Open Content Alliance; and under their individual contracts with Google (which, I believe, will continue to govern the digitized public domain titles after the settlement becomes effective), participating libraries may make their digital copies available to their own patrons and to nonpatrons, through such third parties as HathiTrust. Respecting the in-copyright but out of print titles, vendors other than Google, such as netLibrary, ebrary, and many others, have digitized thousands of such titles, which presently compete with Google's digitized copies. In addition, the Google Book settlement, a nonexclusive agreement, enables participating libraries to negotiate new digitization agreements with the copyright owners and vendors other than Google, and facilitates such new transactions by permitting the Book Rights Registry to be used in deals with vendors other than Google.
Although Google may have a temporary advantage respecting older in-copyright and out of print titles, the settlement lowers entry barriers to that market.

Finally, the notion that the Google Book endeavor either "centraliz[es]" or "commercializ[es]" knowledge merits some comment. First, "knowledge" is not the subject of the original Google contracts or the settlement, because copyright and other property rights in information at issue here attach at the level of expression, not of knowledge. All the materials at issue here are readily accessible to academic users and the public in print or digital format by avenues unrelated to Google. The reservoir of human knowledge is not diminished by one drop by virtue of these agreements. Second, by lowering barriers to entry to the older in-copyright and out of print market, the settlement arguably will foster competition in that market, which may increase the dissemination of those works. If that dissemination leads to greater knowledge, then the settlement may nurture, rather than constrain, the growth of knowledge.

Respecting the claim that participating libraries acted "without concern for user confidentiality," I think the documents read otherwise. Respecting the University of Michigan's (UM) and University of Texas's (UT) original contracts with Google, personally identifiable information of patrons may be protected by the phrase "customer lists" in section 6.1, or, if not, then the parties may well have thought that no personally identifiable information of individual patrons would be disclosed to Google during the digitization process or downstream. In the settlement agreement, the parties appear to promise, through the phrase "about any customers" in section 15.1, to keep personally identifiable information of patrons confidential by means of the confidentiality agreements referred to in section 15.2, and the auditors will keep such information confidential under a nondisclosure agreement pursuant to section 8.2(c)(i).
Respecting the claim that participating libraries acted "without concern for ... preservation . . . or long-term sustainability," I think that's inaccurate.

Respecting preservation format, the Library of Congress deems PDF a preferred digital preservation format for "[t]ext with page-layout rendering," see http://www.digitalpreservation.gov/formats/content/text_preferences.shtml . The UM and UT original contracts with Google require Google to give the libraries OCR, page images, and metadata (work-level and structural); that is, PDF files with embedded text and structural metadata (connecting text and images). I believe those PDF files are consistent with the Library of Congress digital preservation standard. (Note that the LC standard appears to permit PDF without structural tags, but Google provided structural tags with the library digital copies; see, e.g., http://babel.hathitrust.org/cgi/pt?id=mdp.39015055053659.) (The settlement does not appear to specify the digital formats that Google will give Fully Participating Libraries.)

Respecting preservation environment, the UM and UT original contracts with Google enabled those libraries to transfer their digital copies to third parties, and UM has transferred them to HathiTrust, for, among other purposes, preservation. HathiTrust appears to be pursuing a preservation strategy that complies with present standards. See http://www.hathitrust.org/objectives . What's more, the settlement agreement permits each Fully Participating Library to "reproduce and make technical adaptations to ... its [library digital copies] as reasonably necessary to preserve, maintain, manage, and keep [them] technologically current." Section 7.2(b)(i).

Respecting the claim that participating libraries acted "without concern for ... image quality," I think the documents read otherwise.
The UM and UT original contracts with Google expressly give the libraries the right to engage in quality control of the images by sampling them on a regular basis.

Respecting the claim that participating libraries acted "without concern for ... search prowess," that's not how the agreements read or the end-products appear. To the extent that "search prowess" depends upon both OCR and structural metadata, the UM and UT original contracts with Google provided for both. To the extent that "search prowess" depends upon the quality of the search engine applied to the copies that Google retained, I think little needs to be said about the quality of Google's current full text search service. To the extent that "search prowess" depends upon the quality of the search engines applied to the library-retained copies, the UM and UT original contracts with Google permit access through those libraries' own search services, as well as through services of third parties. For example, HathiTrust plans to develop advanced search tools for retrieval of Google library digital copies transferred to it, including "[r]obust discovery mechanisms like full-text cross-repository searching." See http://www.hathitrust.org/objectives. The settlement permits each Fully Participating Library to "develop or obtain and . . . deploy finding tools that allow its users to identify pertinent Books within its [library digital copies] or generate information from" the same, section 7.2(b)(iv), including search tools to be used in data mining. Section 7.2(b)(vi).

Respecting the claim that participating libraries acted "without concern for . . . metadata standards," again this seems inaccurate. As noted above, the original UM and UT contracts with Google required Google to provide work-level and structural metadata with the library digital copies, and this metadata appears to conform to the Library of Congress's digital preservation standards.
As one can see by viewing the library digital copies in HathiTrust, those copies are linked to full MARC 21 bibliographic records (MARC 21 being an international metadata standard; see http://www.loc.gov/marc/annmarc21.html), and feature PDF structural metadata (both structural tags identifying document segments and metadata linking text and images), PDF being a national digital preservation standard (see http://www.digitalpreservation.gov/formats/content/text_preferences.shtml).

Respecting the assertion that the participating libraries "chose the expedient way rather than the best way to build and extend their collections," this seems too harsh a view of research libraries with limited cash resources. Authorities seem to say that the "best" way to digitize text files, if cost is no issue, is to generate, for each document, both an XML version and a PDF/A version that contains embedded text with structural tags, because, among other reasons, between them they preserve both logical structure and original layout; see http://www.digitalpreservation.gov/formats/content/text_preferences.shtml. But creating two separate files for each document is costly, and arguably beyond the means of many research institutions. LC also appears to say that PDF/A or one of the other PDF subtypes alone, without XML, meets its digital preservation standards, even if the PDF file lacks structural tags. Though I can't tell whether the Google participant library digital copies are in PDF/A or another PDF subtype, I can see that they are PDF and that they have structural tags, and so they appear to exceed LC's baseline digital preservation standard. So if "best" is defined to mean meeting national standards given limited resources, the participating libraries arguably satisfied that definition respecting building their digital collections.
In terms of extending libraries' collections, if one has unlimited resources and can fund all digitization oneself, the best way to use digital resources to extend one's public domain collections may be to impose no access or distribution restrictions on the digital copies. However, where research libraries' cash resources are limited, "best" should arguably be defined in terms of the most favorable bargain a library, acting in the interests of its parent institution and patrons, can strike with a capable digitization outsourcer willing to accept noncash consideration. A deliverable conforming to standards but bearing some usage restrictions may well satisfy that definition. Respecting in-copyright materials, since rights holders will practically always insist on usage restrictions as a condition of digitization no matter what the library offers, there's no basis for faulting the Google library participants for accepting such restrictions on digital copies of copyrighted works.

-- Rob Richards

The preceding comments are not offered as legal advice and do not constitute legal advice.

--
Robert C. Richards, Jr., J.D.*, M.A., M.S.L.I.S.
Philadelphia, PA
E-mail: richards1000@comcast.net
* Member, New York Bar, Retired Status