Wednesday, March 5, 2008

Search > Google

If you have ever wondered how Google got started, how it became one of the biggest success stories on the Internet, and how it might continue to dominate search technology, check out this brief history of how search technology has evolved since the 1950s and the role Google played in it.

The history of document search dates back to the 1950s.

Search engines existed in those ancient times, but their primary use was to search a static collection of documents. In the early 60s, the research community gathered new data by digitizing abstracts of articles, enabling rapid progress in the field in the 60s and 70s. But by the late 80s, progress in this area had slowed down considerably.

In order to stimulate research in information retrieval, the National Institute of Standards and Technology (NIST) launched the Text Retrieval Conference (TREC) in 1992. TREC introduced new data in the form of full-text documents and used human judges to classify whether or not particular documents were relevant to a set of queries. They released a sample of this data to researchers, who used it to train and improve their systems to find the documents relevant to a new set of queries and compare their results to TREC's human judgments and other researchers' algorithms.
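To make the idea of scoring a system against human judgments concrete, here is a minimal sketch in Python. The excerpt does not say which measures TREC used; precision and recall are standard information retrieval measures and are used here only as an illustration. The function name, document IDs, and judgments are all made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Compare a system's results with human relevance judgments.

    retrieved: document IDs the system returned for a query (hypothetical)
    relevant:  document IDs the human judges marked relevant (hypothetical)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set

    # Precision: what fraction of the returned documents were relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up data: the system returned four documents, the judges marked three.
system_results = ["d1", "d2", "d3", "d4"]
human_judgments = {"d1", "d3", "d7"}
precision, recall = precision_recall(system_results, human_judgments)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two research groups running different algorithms over the same queries could compare numbers like these directly, which is the kind of shared yardstick a common data set makes possible.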

The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field. The yearly TREC conference fostered collaboration, innovation, and a measured dose of competition (and bragging rights) that led to better information retrieval.

New ideas spread rapidly, and the algorithms improved. But with each new improvement, it became harder and harder to improve on last year's techniques, and progress eventually slowed down again.

And then came the web.

In the web's early days, researchers used industry-standard algorithms based on the TREC research to find documents on the web. But the need for better search was apparent - now not just for researchers, but also for everyday users - and the web gave us lots of new data in the form of links that offered the possibility of new advances. There were developments on two fronts. On the commercial side, a few companies started offering web search engines, but no one was quite sure what business models would work. On the academic side, the National Science Foundation started a "Digital Library Project" which made grants to several universities.

Two Stanford grad students in computer science named Larry Page and Sergey Brin worked on this project. Their insight was to recognize that existing search algorithms could be dramatically improved by using the special linking structure of web documents. Thus PageRank was born.
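The excerpt does not spell out how PageRank uses links, so the following is only a rough sketch of the commonly published, simplified formulation, not Google's actual implementation. Each page repeatedly passes a share of its score to the pages it links to. The function name, the damping value of 0.85, the iteration count, and the tiny four-page "web" are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated score passing (illustrative only).

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are assumed values, not taken from the excerpt.
    """
    # Collect every page that appears as a source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the score spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A made-up four-page web: three pages link to "c", nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, "c" ends up with the highest score because the most pages link to it, and "d" ends up with the lowest because nothing links to it. That is the kind of signal a purely text-based ranking would miss.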

Larry and Sergey initially tried to license their algorithm to some of the newly formed web search engines, but none were interested. Since they couldn't sell their algorithm, they decided to start a search engine themselves. The rest of the story is well known.

This is an excerpt from "Why Data Matters" by Hal Varian, Chief Economist of Google.


