Hey Guys, I just wanted to brag a little before you read this essay of mine. I guess the Google Book Search Project is a pretty hot topic and another expert on the subject cited my work in one of his essays. Be sure to check it out here: http://www.fahad.com/blog/2006_01_01_fahadinc_archive.html
Joseph Huttner
December 7, 2005
Professor Lindell
Google Book Search Project
In the 3rd Century BC, the Ptolemaic rulers of Egypt constructed the Library of Alexandria in an attempt to contain all the knowledge of the ancient world. During its Golden Age, the Library’s collection grew to more than 700,000 titles and became the “center of technology, literature and science in the ancient world, producing works like Euclid’s Geometry and the invention of Archimedes’s Screw” (Swerlick, 1). Recently, Google Inc. announced plans for Google Book Search, an initiative to create a modern, electronic version of the Library of Alexandria by digitizing millions of books and scholarly papers and making their contents searchable over the Internet. To be effective, the project requires many advances in book digitization and language translation. If successful, Google Book Search will spur new demand for electronic paper and other types of readable media, and cause a monumental shift in the way people find and apply information.
The first step in creating Google Book Search is to digitize millions of
volumes, that is, to convert paper pages into searchable electronic text.
Google must perform three basic functions for digitization to occur: document
analysis, optical character recognition, and contextual processing. In document
analysis, individual images are extracted from the book’s pages via
high-resolution scanning (Encyclopedia of Computer Science, 1326). Google recently
deployed a fleet of new machines from Kirtas Technologies, the APT
BookScan 1200, each capable of scanning up to 1200 pages per hour (Kirtas
Technologies). The machine uses a 16-megapixel digital camera
to photograph each page, then transfers the image to local storage. Next, a
robotic arm gently turns the page of the book, and the process is repeated
for each additional page. The digital images are then ‘cleaned’
to remove smudges and other errors, cropped, and centered. At this point, the
image can be posted online, but “searching through the text is impossible”
(Graham, 1).
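To get a feel for what 1200 pages per hour means at library scale, here is a rough back-of-the-envelope sketch. Only the 1200 pages per hour figure comes from the source above; the book length, the number of books, and the number of scanners are assumptions chosen purely for illustration, and the function name is my own.

```python
# Back-of-the-envelope scanning throughput.
# Only PAGES_PER_HOUR comes from the essay; every other number used
# below is an assumption for illustration.
PAGES_PER_HOUR = 1200  # rated top speed of one scanner


def hours_to_scan(books, pages_per_book, scanners=1):
    """Return the hours needed to scan `books` volumes on `scanners` machines."""
    if scanners < 1:
        raise ValueError("need at least one scanner")
    total_pages = books * pages_per_book
    return total_pages / (PAGES_PER_HOUR * scanners)


if __name__ == "__main__":
    # Assumed: a single 300-page book on one scanner.
    print(hours_to_scan(1, 300), "hours")
    # Assumed: one million 300-page books on 100 scanners running nonstop.
    hours = hours_to_scan(1_000_000, 300, scanners=100)
    print(round(hours / 24, 1), "days")
```

Under these assumptions a single book takes a quarter of an hour, which suggests that the sheer number of volumes, not the speed of any one machine, is what makes the project so large. Real throughput would be lower once loading, cleaning, and cropping are counted.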
Using optical character recognition, or OCR, the digital images can be converted
into searchable text. The first part of Google’s OCR software recognizes
and isolates individual characters from the text image based on their shape.
Then contextual processing is applied to correct “misclassification made
by the recognition algorithm or to limit recognition choices” (Encyclopedia
of Computer Science, 1327). Basically, contextual processing uses English grammar
and style rules to change a misrecognized word into the correct word,
much like the spelling and grammar checker in Microsoft Word. At this point,
the book can be uploaded to Google’s servers, where its contents can be searched
via the Internet.
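The idea behind contextual processing can be illustrated with a very small sketch. This is not Google’s software, whose details are not described in the sources above; it is a toy dictionary lookup that assumes a tiny made-up vocabulary and uses Python’s standard difflib module to swap a misread word for its closest known word. The names VOCABULARY, correct_word, and correct_line are hypothetical.

```python
import difflib

# Tiny stand-in vocabulary (an assumption). A real system would use a full
# dictionary plus grammar and word-frequency information.
VOCABULARY = ["library", "lightning", "search", "google", "book", "the"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest vocabulary word, or the word unchanged if none is close.

    Casing is discarded for matched words to keep the sketch short.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(line):
    """Apply correct_word to every whitespace-separated word in a line."""
    return " ".join(correct_word(w) for w in line.split())


if __name__ == "__main__":
    # "1ibrary" imitates an OCR engine confusing the letter l with the digit 1.
    print(correct_line("the 1ibrary"))
```

A lookup like this only fixes words that are close to something in the dictionary. Unknown words are left alone, which is why real systems also lean on grammar and context.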
But there are barriers that prevent people with Internet connectivity from
using Google Book Search: most scanned titles are written in English, a
language spoken by only one-third of the world’s population (World Language,
1). Suppose a German researcher needs to document the history of lightning research.
He learns that Benjamin Franklin demonstrated the electrical nature of lightning
and attempts to use Google Book Search to find relevant texts describing this
discovery. Sadly, none of Franklin’s digitized biographies have been translated
into German. If the man were an American scientist, he would have access to many more
books, leading to better research, which is hardly a fair scenario. Situations
like these have caused fear that the Google Book Search project will enhance
the dominance of the English language and of Anglo-Saxon ways of thinking, perhaps
resulting in a trend towards a single universal language. There are tentative movements
within the European Union (specifically France) to create a European book program
with its own search engines, but a lack of capital and technology makes this
unlikely (Reuters, 1). The only solution is to develop new software
that can flawlessly translate books into different languages. Only then will
scientists and everyday people across the world truly have access to the same
information. Technological progress will then come faster and more efficiently,
with fewer people repeating and testing the same ideas.
If the Google Book Search project succeeds in digitizing books and translating
them into all the world’s languages, there will be societal impacts far
beyond anything humans can presently imagine. For comparison, when the Internet
was invented, few foresaw an interconnected world capable of instant communication.
With regard to Google Book Search, libraries may become more computer-based
and people will research their papers differently. But beyond this point, few people
can say with certainty how underdeveloped countries may become educated, or
what the larger effects on the educational divide between the rich and the poor will be.
The only way to project the effects of Google Book Search is to study the
technologies that led up to the project, and extrapolate the effects of these
technologies into the future. Interestingly, in 1945, Vannevar Bush described
plans to build a machine called the Memex, which closely resembles a centralized
form of Google Book Search. Bush wanted to alleviate the problem
of people “being bogged down by the very mass of new knowledge”
by organizing “all human knowledge…into a complete planetary memory
for all of mankind” (Nyce, 60). The Memex was never built, but in design
it was a desk with a series of levers and gears that could hold millions of
books, articles, and journals using microfilm. A person would conduct research
using different documents contained within the Memex, and create “trails”
which linked together different parts of documents. These trails could be saved,
viewed, and edited by the user, creating indexed information that was organized
for easy access in the future.
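Bush’s Memex was a mechanical, microfilm-based design, so it had no software at all. Still, the trail idea maps neatly onto a modern data structure, and the sketch below shows one way it might look. Every class and field name here (Passage, Trail, document, page, note) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Passage:
    """A location inside a stored document (hypothetical fields)."""
    document: str
    page: int
    note: str = ""


@dataclass
class Trail:
    """An ordered, named chain of passages that the user can save and edit."""
    name: str
    links: list = field(default_factory=list)

    def add(self, passage):
        """Append a passage to the end of the trail."""
        self.links.append(passage)

    def remove(self, index):
        """Delete the passage at the given position."""
        del self.links[index]

    def documents(self):
        """Return the documents visited, in order, without repeats."""
        seen = []
        for passage in self.links:
            if passage.document not in seen:
                seen.append(passage.document)
        return seen


if __name__ == "__main__":
    trail = Trail("example research trail")
    trail.add(Passage("Document A", 12, "first mention"))
    trail.add(Passage("Document B", 40))
    trail.add(Passage("Document A", 57, "follow-up"))
    print(trail.documents())
```

The important point is that a trail stores links between places in documents rather than copies of the documents themselves, which is what lets the same passage sit on many trails at once.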
Google Book Search operates like the Memex, following the commands of the human
user and navigating to books that are relevant to the user’s research.
These books can be printed and bookmarked to make future retrieval easier, similar
to indexing. There are two distinct advantages of the Google Book Search over
the Memex: greater speed and increased mobility (decentralization). If we extend
these trends into the future, we will likely be able to download all the world’s
books instantly and have them available on some type of local storage. Additionally,
wireless Internet connectivity will be available anywhere in the world and Google
Book Search will be accessible from any location. Lastly, new books will be
published both digitally and on paper, making them instantly searchable.
Over the long term, libraries, publishers and booksellers will develop new ways
that digital books can be stored, packaged and delivered (Crawley, 1). We will
likely see a fusion of print and electronic media and enter the age of electronic
scrolls or e-paper: “cheap, plastic screens that will look and feel much
like paper; they will unroll from a pen-sized holder or cell phone and download
the day’s newspaper (or a new novel) through a wireless Internet connection”
(Crawley, 1). Though e-paper still has some technical problems, the current
pace of research suggests that the basic challenges of e-paper will be resolved
in the next few years (Crawley, 2). E-paper promises a much more pleasing display
than current LCD technology and “should win converts from those who currently
view reading electronic text as an eye-straining chore” (Crawley, 2).
With downloadable e-books posting steady sales increases in the last five years,
“there’s no reason why E-Ink’s stated goal of ‘radio
paper’ – a rewriteable medium with the look and much of the feel
of real paper – should not find a ready audience with casual and dedicated
readers alike” (Crawley, 3).
The transformation of books from parchment scrolls at the Library of Alexandria
to electronic scrolls within Google servers can be accomplished if the Google
Book Search is implemented correctly. The digitization of paper books must be
done efficiently and accurately, and flawless language translation programs
must be developed if the entire world’s population is to have equal access
to digital books. Our progression from the Memex to Google Book Search suggests
that in the future, all books will be electronically accessible at any instant,
ready for reading and researching. New types of readable media will be able
to download books, yet be gentle on human eyes. As people rediscover books that
were once buried in the depths of libraries, we may experience a surge of technological
advancements and developments in research across many fields. If the founders of
the Library of Alexandria were alive today, they would be happy to find the seeds
of a second Golden Age sprouting rather quickly. They could even read their own
biographies…after all, they are only a mouse click away.
Bibliography
“APT BookScan 1200.” Kirtas Technologies. 7 December 2005 <http://www.kirtas-tech.com/APT1200.html>
“A World Language.” Oracle ThinkQuest. 7 December 2005 <http://library.thinkquest.org/18802/langeng.htm>
“Bush, Vannevar.” Encyclopædia Britannica. 2005. Encyclopædia Britannica Online. 6 Dec. 2005 <http://search.eb.com/eb/article-214012>.
Crawley, Devin. “Chasing E-Paper.” Quill & Quire Magazine. April 2005. 7 December 2005 <http://enlightenedlibrarian.com/Other%20Writing/>
Crawley, Devin. “The Infinite Library.” University of Toronto Magazine. 2005. 7 December 2005 <http://www.magazine.utoronto.ca/05autumn/library.asp>
Farrell, Nick. “Google Book Plan Angers the French.” The Inquirer. 7 December 2005 <http://www.theinquirer.net/?article=21369>
Graham, Jefferson. “Google Library Plan ‘a Huge Help.’” USA Today. 7 December 2005 <http://www.usatoday.com/money/industries/technology/2004-12-14-google-usat_x.htm>
Nyce, James M. and Paul Kahn. From Memex to Hypertext – Vannevar Bush and the Mind’s Machine. San Diego: Academic Press Inc, 1991.
Ralston, Anthony et al. “Optical Character Recognition.” The Encyclopedia of Computer Science. 2000: Nature Publishing Group.
Reuters. “Google book plan sparks French war of words.” C-Net. 21 February 2005 <http://news.com.com/Google+book+plan+sparks+French+war+of+words/2100-1024_3-5584569.html>
Swerlick, Andrew. “Creating our Library of Alexandria: Google Print falls under ‘fair use.’” The Emory Wheel Online. 1 November 2005. 7 December 2005 <http://www.emorywheel.com/vnews/display.v/ART/2005/11/01/436688a3e66eb>