
RE: Recent Google Announcements



We used to say the same kinds of things (see the quoted comments from
David Dillard, below) about search engines, pre-Google. That is,
relevance ranking based purely on the text of the pages themselves
didn't cut it. Back in the mid-nineties, when I taught library seminars
on search engines, I used to tell my classes that 3 or 4 hits (out of
the first 10) that appeared even slightly related to the intended query
topic were about par for the course. Think back to your early use of
AltaVista, Lycos, WebCrawler, InfoSeek, etc.

What made Google work was ranking based on link analysis: the algorithm
now known as PageRank. This was a way of putting human intelligence
back into the equation. People creating web pages decide which other
pages to link to. Google tapped into this human judgment in an
automated way and increased the relevance of its results significantly,
to the point where search engines actually became useful.
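
For the curious, the core of that link-analysis idea can be sketched in
a few lines of Python (PageRank, in simplified form). The 0.85 damping
factor is the value from the original PageRank paper; the little link
graph and the iteration count are illustrative assumptions of mine, not
anything from Google's production system.

    # Toy power-iteration PageRank: a page ranks highly when highly
    # ranked pages link to it, so rank is a proxy for human judgment.
    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outlinks in links.items():
                if not outlinks:  # dangling page: spread rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / n
                else:
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
            rank = new_rank
        return rank

    # "c" receives the most links and so ends up ranked highest.
    print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"],
                    "d": ["c"]}))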

At first glance, it would seem much harder to do the same for masses of
text scanned from masses of books. Books don't have links. But surely
the Google people are already thinking about and working on this
problem. A few approaches that occur to me, with very little time or
thought put into them, include the number of libraries holding a title
(Google is potentially already obtaining this type of information from
OCLC WorldCat) and/or citation analysis using the indexes and
bibliographies of the books themselves. Perhaps they can also tap into
sales and recommendation data from Amazon and other online booksellers.
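
To make that concrete, here is a purely speculative sketch of how such
link-free signals might be combined into a single book-relevance score.
Every signal name, weight, and number below is my own invention for
illustration; Google has announced nothing of the sort.

    from math import log1p

    def book_score(holdings, citations, sales_rank):
        """holdings: libraries owning the title (a la WorldCat);
        citations: mentions in other books' bibliographies;
        sales_rank: bookseller rank, where 1 is the best seller."""
        return (0.5 * log1p(holdings)   # log damps the huge spread
                + 0.3 * log1p(citations)
                + 0.2 / log1p(sales_rank + 1))  # low rank = big seller

    # Invented figures: a widely held, often-cited title outscores an
    # obscure one even if the obscure one momentarily sells better.
    print(book_score(holdings=4200, citations=310, sales_rank=9500))
    print(book_score(holdings=150, citations=12, sales_rank=800))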

My attitude is one of healthy skepticism, but I'm certainly not
prepared to reject Google's endeavors in this regard out of hand as
having little or no potential. Michael Gorman, in his December 17
commentary in the _LA Times_, "Google and God's Mind," basically writes
off Google's entire book scanning endeavor as useless. I think this
attitude is premature and quite likely unjustified, despite my agreeing
with most of what he and Walt Crawford wrote in their classic _Future
Libraries: Dreams, Madness and Reality_.

We'll just have to wait and see how well Google's book scanning and other
recent endeavors actually work. The very idea of having 70,000 or more
books available for searching in full text boggles the mind, and could
very well be one of the first steps toward the complete digitization of
human knowledge that has been such a perennial assumption of the science
fiction genre. At the very least, it would seem to signal another likely
paradigm shift.

It is my understanding (from discussions with vendors) that even the
large general periodical database aggregators (like ProQuest, EBSCO,
and Thomson Gale) are talking to Google and the other search engines.
My guess is that one of the next big Google announcements will be the
complete index of one of those databases added to Google, or more
likely to Google Scholar, with links to the content, but requiring
local library authentication or payment to obtain access to more than a
citation.

So, while I certainly wouldn't declare metasearch to be dead (the
question Joseph Esposito asked in his original query), it could very
well become subsumed by Google a few years down the road. Maybe even
faster than we think. And then again, maybe not. Only time will tell,
and one thing is for sure: we live in interesting times!

Will

Will Stuivenga <wstuivenga@secstate.wa.gov>
Project Manager, Statewide Database Licensing (SDL)
Washington State Library Division,
Office of the Secretary of State
360.704.5217 fax: 360.586.7575
http://www.statelib.wa.gov/library/libraries/projects/sdl/


-----Original Message-----
From: David P. Dillard [mailto:jwne@astro.ocis.temple.edu] 
Sent: Monday, January 31, 2005 3:40 PM
To: LIBLICENSE DISCUSSION GROUP
Subject: Re: Recent Google Announcements

<snip>

The very limitations of Google's searching capabilities in such a huge,
bottomless vat of full-text information would seem to necessitate
tremendous searching skill and work to find content pertinent to a
client's need. Hence, for all of the material in the book and other
online collections that is full text and also public domain, there may
be projects from database vendors with more robust software, enabling
multi-step searching, variable-distance proximity searching, and other
sophisticated search capabilities, to create more powerful and more
searchable databases from this kind of content. Indeed, the owners of
FirstSearch and WorldCat already have a relationship with Google, and
in particular with the Google Scholar program. This could be built
upon, and Google may also develop a more powerful search interface.

<snip>

A huge collection of full-text materials requires much more precise
searching capability than a finite index of content that has only
subject headings and citations. Finding the mere coincidence of two
words in a book of several thousand pages will be close to useless for
all but the most specialized word combinations. If search tools are
weak or erratic in a universe of words, the searches conducted in them
will usually produce low-relevance results and huge numbers of hits. It
is one thing to produce a tremendous body of content. It is quite
another to create reliable, powerful software that makes it possible to
find specific information in that resource effectively.
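
A rough sketch of variable-distance proximity matching may make the
point concrete: require the two words to fall within N words of each
other, not merely somewhere in the same multi-thousand-page book. The
tokenizer and the window size here are assumptions for illustration
only.

    import re

    def proximity_hits(text, term1, term2, max_distance=10):
        """Return word-position pairs where the terms co-occur."""
        words = re.findall(r"[a-z']+", text.lower())
        pos1 = [i for i, w in enumerate(words) if w == term1]
        pos2 = [i for i, w in enumerate(words) if w == term2]
        return [(i, j) for i in pos1 for j in pos2
                if abs(i - j) <= max_distance]

    text = ("The library licensed the database. Pages later, the full "
            "text of the database was searched from the library "
            "catalog.")
    print(proximity_hits(text, "library", "database", max_distance=5))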

<snip>

Sincerely,
David Dillard
Temple University
(215) 204 - 4584
jwne@astro.temple.edu