Google to scan books from major libraries

Dec. 14, 2004, 4:58 AM UTC / Source: The Associated Press

Google Inc. is trying to establish an online reading room for five major libraries by scanning stacks of hard-to-find books into its widely used Internet search engine.

The ambitious initiative announced late Monday gives Mountain View, Calif.-based Google the right to index material from the New York Public Library as well as libraries at four universities: Harvard, Stanford, Michigan and Oxford in England.

The Michigan and Stanford libraries are the only two so far to agree to submit all their material to Google’s scanners.

The New York library is allowing Google to include a small portion of its books no longer covered by copyright while Harvard is confining its participation to 40,000 volumes so it can gauge how well the process works. Oxford wants Google to scan all its books originally published before 1901.

Scanning books so they can be read through computers isn’t new. Both Google and Amazon.com already have programs that offer online glimpses of new books, while an assortment of other sites have for several years provided digital access to some material in libraries scattered around the country.

But Google’s latest commitment could have the biggest impact yet, given the breadth of material that the company hopes to put into its search engine, which has become renowned for its processing speed, ease of use and accuracy.

'This is the day the world changes'
“It’s a significant opportunity to bring our material to the rest of the world,” said Paul LeClerc, president of the New York Public Library. “It could solve an old problem: If people can’t get to us, how can we get to them?”

Librarians are also excited about the prospect of creating a digital record for the reams of valuable material written long before computers were conceived.

“This is the day the world changes,” said John Wilkin, a University of Michigan librarian working with Google. “It will be disruptive because some people will worry that this is the beginning of the end of libraries. But this is something we have to do to revitalize the profession and make it more meaningful.”

The project gives Google’s search engine another potential drawing card as it faces stiffening competition from Yahoo Inc. and Microsoft Corp.’s MSN. Attracting visitor traffic is crucial to Google’s financial health because the company depends on revenue generated by people clicking on advertising links posted next to the main body of search results.

Scanning the library books figures to be a daunting task, even for a cutting-edge company such as Google, whose online index of 8 billion Web pages already has revolutionized the way people look for information.

Work will take years
Michigan’s library alone contains 7 million volumes, about 132 miles of books. Google hopes to get the job done at Michigan within six years, Wilkin said.

Harvard’s library is even larger with 15 million volumes. Virtually all of that material will be off limits until Google shows it can scan the material without losing or damaging anything, said Harvard professor Sidney Verba, who also is director of the university’s library.

“The librarians at Harvard are very punctilious about protecting their great treasures,” Verba said.

The project also poses other prickly issues, such as how to convert material written in foreign languages and how to protect copyrighted books.

As it does with new books already included in its search engine, Google will only allow its users to view the bibliographies or other snippets of copyrighted books scanned from the libraries. The search engine will provide unrestricted access to all material in the public domain — work no longer covered by copyrights.

The books scanned from libraries will be included in the same Google index that spans the Web.

By throwing everything into the same pot, Google risks burying the library book results far below the Web documents containing the same search terms, reducing the usefulness of the feature, said Danny Sullivan, editor of Search Engine Watch, an industry newsletter.