
How Search Engines Work

2004-03-21
 

Components of a Search Engine

All search engines do three things: they collect documents from the Internet, analyze the documents collected, and serve Web users' search requests. Each of these major tasks is performed by a corresponding software component of the search engine.

  • Crawler or Spider: a bot that collects documents on the Web.
  • Indexer: software that analyzes the collected documents and generates searchable indexes for them.
  • Query Server: a system that responds to user queries and returns relevant documents.

Crawler

Writing a simple crawler is not particularly challenging, and may even be trivial. To crawl billions of pages effectively, however, a crawler has to make two difficult decisions:

  • What Pages to Crawl: Each search engine uses different criteria to determine which pages to crawl. Google will not include a page unless at least one already indexed page links to it.
  • Frequency of Updating: Google updates pages with higher PageRank values more frequently, and updates the home page of a site on a daily basis.
Compared with the other parts of a search engine, the crawler is the least sophisticated piece. What surprised me is that other search engines are clearly far behind Google in the completeness of their document collections and the timeliness of their updates. This is more likely an issue of philosophy than of technology. Other search engines apparently do not consider the completeness and timeliness of crawling important, even though this is one of the reasons they have lost market share to Google.
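
The two crawler decisions above (which pages to admit, and how often to revisit them) can be illustrated with a small scheduler. This is only a sketch under stated assumptions: the names `CrawlScheduler`, `revisit_interval` and `importance` are hypothetical, the 1-to-30-day interval range is invented for illustration, and nothing here is meant to describe Google's actual crawler.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    next_visit: float                          # hour at which the page is due
    url: str = field(compare=False)
    importance: float = field(compare=False)   # hypothetical score in [0, 1]


def revisit_interval(importance, is_home_page):
    """Hours to wait before re-crawling a page (illustrative policy only)."""
    if is_home_page:
        return 24.0                            # home pages: once a day
    # Assumed range: 1 day for the most important pages, 30 days for the least.
    return 24.0 * (1 + 29 * (1.0 - importance))


class CrawlScheduler:
    def __init__(self):
        self._queue = []     # min-heap ordered by next_visit
        self._known = set()  # URLs already admitted

    def discover(self, url, importance, linked_from_indexed, now=0.0):
        """Admit a page only if an already indexed page links to it."""
        if not linked_from_indexed or url in self._known:
            return False
        self._known.add(url)
        heapq.heappush(self._queue, CrawlTask(now, url, importance))
        return True

    def next_due(self, now):
        """Return the next URL that is due at time `now`, and reschedule it."""
        if not self._queue or self._queue[0].next_visit > now:
            return None
        task = heapq.heappop(self._queue)
        # Crude home-page test: "http://example.com/" has only two slashes.
        is_home = task.url.rstrip("/").count("/") <= 2
        delay = revisit_interval(task.importance, is_home)
        heapq.heappush(self._queue,
                       CrawlTask(now + delay, task.url, task.importance))
        return task.url
```

A caller would feed newly found links to `discover` and poll `next_due` with the current time; a page that no indexed page links to is never queued.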

Indexer

This is the heart of a search engine. The indexer performs two major tasks. First, it generates a list of page characteristics that summarize a document. Second, it produces a weighted, searchable keyword list from those characteristics. There are many startups in the search engine industry besides the titans Google, Yahoo and MSN. There will be a conceptual breakthrough in document indexing that takes search technology to the next level, and a new way of characterizing documents may eventually take a startup from unknown to major player. Google's success in the search industry is clearly the result of leveraging the PageRank concept.
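
As a rough illustration of the indexer's second task, the sketch below builds a weighted keyword list as an inverted index. It assumes the simplest possible weight, the term frequency within each document; the function names `tokenize` and `build_index` are hypothetical, and a real indexer would combine many more characteristics.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to {doc_id: weight}.

    The weight is the keyword's share of the words in that document
    (plain term frequency, an assumption made for this sketch).
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        words = tokenize(text)
        for word, count in Counter(words).items():
            index[word][doc_id] = count / len(words)
    return index
```

Looking up a keyword in the result returns every document that contains it together with its weight, which is the structure a query server needs in order to rank matches.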

Google characterizes a document by over one hundred factors. Not all of these factors are currently used for ranking search results. However, a complete set of factors is the key to flexibility in tuning the ranking algorithm as the market changes. The characteristics of a document fall into four groups:

  • Page Content,
  • Back Links,
  • Page Traffic and
  • User Feedback.
The details of ranking factors are discussed in The Factors that Impact the Search Engine Result Ranking.
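
The point about tuning flexibility can be shown with a toy scoring function. It assumes, purely for illustration, that each of the four groups has been reduced to a single number and that the ranking score is a weighted sum; the names `PageFactors` and `score` and all weights are hypothetical. A factor that is recorded but not currently used simply carries a weight of zero until the algorithm is retuned.

```python
from dataclasses import dataclass


@dataclass
class PageFactors:
    # One illustrative number per group; a real engine stores many factors.
    page_content: float
    back_links: float
    page_traffic: float
    user_feedback: float


def score(factors, weights):
    """Weighted sum of the recorded factors (an assumed ranking formula)."""
    return sum(getattr(factors, name) * weight
               for name, weight in weights.items())


page = PageFactors(page_content=0.5, back_links=1.0,
                   page_traffic=0.25, user_feedback=0.0)

# Traffic and feedback are recorded but not used for ranking yet.
weights = {"page_content": 2.0, "back_links": 1.0,
           "page_traffic": 0.0, "user_feedback": 0.0}
current = score(page, weights)

# Retuning needs no re-indexing: only the weight changes.
weights["page_traffic"] = 4.0
retuned = score(page, weights)
```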

Query Server

The Query Server, or search engine software, serves users' search requests. This is the part of the search engine that is visible to Web users.

  • First, it tries to understand and interpret the user's search terms. This is likely to be the competitive front on which search engines improve the quality of search results in the near future. Unless search terms are properly interpreted, a search engine cannot find the most relevant documents. How search terms are interpreted will also partly determine how documents are indexed.
  • Second, the Query Server tries to retrieve and rank relevant documents.
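
The two steps above can be sketched in a few lines. This is a deliberately naive example: "interpretation" is reduced to lowercasing and dropping a handful of stop words, and ranking is a sum of keyword weights. The names `interpret`, `search` and `STOP_WORDS`, and the shape of the index (keyword mapped to per-document weights), are assumptions for this sketch.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "for", "in", "to"}  # illustrative only


def interpret(query):
    """Step 1: normalize the query and keep only the meaningful terms."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]


def search(query, index):
    """Step 2: retrieve matching documents and rank them by total weight.

    `index` maps keyword -> {doc_id: weight}.
    """
    scores = {}
    for term in interpret(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Everything that makes a real query server hard (synonyms, spelling, intent) would live inside `interpret`, which is why the interpretation step shapes how documents must be indexed.
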
The fundamental reason that search engines sometimes fail to return relevant documents is a lack of understanding of search relevance. The quality of search is about the quality and relevance of the pages retrieved. Google took the search experience to a new level by using the simple yet elegant PageRank algorithm to measure the importance of a Web page. Any search engine (Google, Yahoo, MSN, or a startup) that wants to take the search experience to the next level will have to come up with a proper algorithm for computing search relevance, based on a new definition of relevance.
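
For readers who have not seen it, the idea behind PageRank fits in a short function. The sketch below uses a common textbook formulation (power iteration with a damping factor of 0.85, ranks normalized to sum to 1); it is not Google's production algorithm, and the treatment of pages without outgoing links is one of several possible conventions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from the link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a graph such as `{"a": ["b"], "b": ["a"], "c": ["a"]}`, page `a` ends up with the highest rank because two pages link to it, while `c`, which nothing links to, ends up with the lowest.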



Related Topics
Google PageRank - Basics, Secrets and Common Misunderstandings
PageRank Calculator
PageRank and Linking - the Definitive Guide
Factors Affecting Search Engine Ranking
Keyword Popularity on Google - an Online SEO Tool
Website Classification for Search Engine Marketing
Increase hits - Build Web Site Traffic from Search Engines

 

 
Copyright © 2003 insightin.com. All rights reserved.