RE: Recent Google Announcements
- To: "'liblicense-l@lists.yale.edu'" <liblicense-l@lists.yale.edu>
- Subject: RE: Recent Google Announcements
- From: "Stuivenga, Will" <wstuivenga@secstate.wa.gov>
- Date: Tue, 1 Feb 2005 17:00:56 EST
- Reply-to: liblicense-l@lists.yale.edu
- Sender: owner-liblicense-l@lists.yale.edu
We used to say the same kinds of things (see the quoted comments from David Dillard, below) about search engines, pre-Google: that is, semantically based relevance-ranking algorithms didn't cut it. Back in the mid-nineties, when I taught library seminars on search engines, I used to tell my classes that 3 or 4 hits (out of the first 10) that appeared even slightly related to the intended query topic was about par for the course. Think back to your early use of AltaVista, Lycos, WebCrawler, InfoSeek, etc.

What made Google work was ranking based on link frequency. This was a way of putting human intelligence back into the equation: people creating web pages decide which other pages to link to. Google tapped into this human ranking in an automated way and increased the actual relevancy of its results significantly, to the point where search engines actually became useful.

At first glance, it would seem much harder to do the same for masses of text scanned from masses of books, because books don't have links. But surely the Google people are already thinking about and working on this problem. Some approaches that occur to me, with very little time or thought put into it, include the number of libraries holding a title (Google is already potentially obtaining this type of information from OCLC WorldCat) and/or citation analysis using the indexes and bibliographies of the books. Perhaps they can tap into sales and recommendation data from Amazon and other online booksellers.

My attitude is one of healthy skepticism, but I'm certainly not prepared to reject out of hand Google's endeavors in this regard as having little or no potential. Michael Gorman, in his December 17 commentary in the _LA Times_, "Google and God's Mind," basically writes off Google's entire book-scanning endeavor as useless. I think this attitude is premature and quite likely unjustified.
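The link-frequency idea can be sketched as a toy PageRank-style computation. This is purely illustrative: the four-page link graph and the damping factor below are invented for the example, and this is in no way Google's actual algorithm or data.

```python
# Toy PageRank-style ranking: rank flows along hyperlinks, so pages
# that many other pages link to accumulate more rank.
# (Hypothetical example graph; "c" is linked to by three pages.)
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85  # conventional illustrative value, not a real parameter of any product
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the ranks settle
    new = {p: (1 - damping) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += damping * rank[p] / len(outs)
    rank = new

best = max(rank, key=rank.get)  # the most linked-to page wins
```

In this toy graph, "c" ends up ranked highest because three of the four pages link to it; that is the "human intelligence" signal the text describes, read off automatically.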
This is despite my agreeing with most of what he and Walt Crawford wrote in their classic _Future Libraries: Dreams, Madness and Reality_. We'll just have to wait and see how well Google's book scanning and other recent endeavors actually work. The very idea of having 70,000 or more books available for searching in full text boggles the mind, and could very well be one of the first steps toward the complete digitization of human knowledge that has been such a perennial assumption of the science fiction genre. At the very least, it would seem to signal another likely paradigm shift.

It is my understanding (from discussions with vendors) that even the large general periodical database aggregators (like ProQuest, EBSCO, Thomson Gale) are talking to Google and the other search engines. My guess is that one of the next big Google announcements will be a complete index of one of those databases added to Google, or more likely to Google Scholar, with links to the content, but requiring local library authentication or payment to obtain access to more than a citation.

So, while I certainly wouldn't declare metasearch to be dead (as Joseph Esposito asked in his original query), it could very well become subsumed by Google at some later point, a few years down the road. Maybe even faster than we think. And then again, maybe not. Only time will tell, and one thing is for sure: we live in interesting times!

Will

Will Stuivenga <wstuivenga@secstate.wa.gov>
Project Manager, Statewide Database Licensing (SDL)
Washington State Library Division, Office of the Secretary of State
360.704.5217  fax: 360.586.7575
http://www.statelib.wa.gov/library/libraries/projects/sdl/

-----Original Message-----
From: David P. Dillard [mailto:jwne@astro.ocis.temple.edu]
Sent: Monday, January 31, 2005 3:40 PM
To: LIBLICENCE DISCUSSION GROUP
Subject: Re: Recent Google Announcements

<snip>

The very limitations of Google's searching capabilities in such a huge full-text vat of information would seem to necessitate tremendous searching skill and work to find content pertinent to a client's need in this bottomless galaxy of information that is in the process of being created. Hence, from all of the material in the book and other online collections that is full text and also public domain, there may be projects from databanks with heartier software enabling multi-step searching, variable-distance proximity searching, and other sophisticated search capabilities, creating more powerful and searchable databases for this kind of content. Indeed, the owners of FirstSearch and WorldCat already have a relationship with Google, and in particular with the Google Scholar program. This could be, and may be, built upon. Google may also develop a more powerful search interface as well.

<snip>

A huge collection of full-text materials requires much more precise searching capability than a finite index of content that has only subject headings and citations. Finding the coincidence of two words in a book of several thousand pages will be close to useless for all but the most specialized word combinations. If search tools are weak or erratic in a universe of words, the searches conducted in them will usually produce low-relevancy results and huge numbers of hits. It is one thing to produce a tremendous body of content. It is quite another to create software that makes it possible to find specific information in that resource effectively and reliably.

<snip>

Sincerely,
David Dillard
Temple University
(215) 204-4584
jwne@astro.temple.edu
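The variable-distance proximity searching Dillard mentions can be sketched in a few lines. This is a hypothetical illustration of the general technique (the function, its name, and the sample sentence are invented here; no vendor's actual software is being described):

```python
import re

def proximity_match(text, word_a, word_b, max_distance):
    """Return True if word_a and word_b occur within max_distance
    words of each other (in either order) anywhere in text."""
    # Tokenize crudely into lowercase word positions.
    words = re.findall(r"[a-z']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    # Any pair of occurrences close enough counts as a match.
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

# Invented sample text: "library" and "database" are 5 word-positions apart.
page = "The library licensed the full text database from the vendor."
```

The point of the "variable distance" is the `max_distance` knob: a tight window (say, 3 words) demands the terms appear nearly together, while a loose one admits the low-relevancy coincidences Dillard warns about in a book of several thousand pages.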