The Inner Workings of Google

How does Google Work: Crawling, Indexing and Searching

© Preetam Kaushik

Aug 23, 2009
Working of Google: How does Google Work, www.commons.wikimedia.org

When anyone searches for a topic or resource on Google, they get an impressive list of links relevant to their search. Google acts like a “smart and experienced librarian” who knows how to locate the book(s) one needs within a short time; the difference is that Google does it all in a fraction of a second.

How does Google Work?

In simple words, Google crawls Web pages, compiles an index of their content, stores that index in a huge database and finally serves (returns) useful results when queried.

What is Crawling?

Googlebot, Google's search bot software, performs the crawling. By crawling, Googlebot finds new pages (as well as updated ones) to be added to the index. Googlebot keeps track of the links on each crawled page (mainly HREF and SRC links) and then follows them, which leads it to other Web pages.
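The link-following step can be sketched as a breadth-first crawl. This is a minimal illustration, not Googlebot's actual implementation: it assumes a hypothetical `fetch` function that returns a page's HTML (or `None` for a missing page), uses a small in-memory “Web” instead of real HTTP requests, and treats link values as ready-to-use page addresses rather than resolving relative URLs.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is a hypothetical helper returning HTML, or None if the
    page does not exist. Returns (pages found, dead links seen).
    """
    frontier = deque([seed])
    seen = {seed}
    found, dead = [], []
    while frontier and len(found) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            dead.append(url)  # a dead link, to be dropped from the index
            continue
        found.append(url)
        for link in extract_links(html):
            if link not in seen:  # follow each link only once
                seen.add(link)
                frontier.append(link)
    return found, dead


# Hypothetical three-page Web; "/old" is linked to but does not exist.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <img src="/logo.png">',
    "/about": '<a href="/home">Home</a> <a href="/old">Old page</a>',
    "/logo.png": "",
}
found, dead = crawl("/home", TOY_WEB.get)
print(found)  # ['/home', '/about', '/logo.png']
print(dead)   # ['/old']
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as "/home" and "/about" do here.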

Crawling also catches new sites, updated pages and dead links, so that the index can be updated accordingly. However, Googlebot cannot process certain types of content, such as some rich media files.

Googlebot is capable of deep crawling (harvesting links) on a very large scale, reaching a vast share of the publicly accessible Web. It should be noted that Googlebot crawls and re-crawls Web pages so that the index stays up to date.
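Re-crawling can be sketched by fingerprinting each page's content and comparing it with the fingerprint stored on the previous visit. The function names, the SHA-256 fingerprint and the four categories below are illustrative assumptions, not a description of how Google actually detects changes.

```python
import hashlib


def page_hash(html):
    """Fingerprint a page's content so changes can be detected cheaply."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def recrawl(stored, urls, fetch):
    """Re-visit `urls` and classify each one against the previous crawl.

    stored: dict mapping url -> hash recorded on the previous crawl
    fetch:  hypothetical callable returning HTML, or None for a dead link
    Returns (changes, new_stored).
    """
    changes = {"new": [], "updated": [], "dead": [], "unchanged": []}
    new_stored = dict(stored)
    for url in urls:
        html = fetch(url)
        if html is None:
            changes["dead"].append(url)
            new_stored.pop(url, None)  # drop dead pages from the index
            continue
        h = page_hash(html)
        if url not in stored:
            changes["new"].append(url)
        elif stored[url] != h:
            changes["updated"].append(url)  # content changed: re-index it
        else:
            changes["unchanged"].append(url)
        new_stored[url] = h
    return changes, new_stored


old = {"/a": page_hash("v1"), "/b": page_hash("same"), "/gone": page_hash("x")}
web = {"/a": "v2", "/b": "same", "/new": "hello"}
changes, index = recrawl(old, ["/a", "/b", "/gone", "/new"], web.get)
print(changes)
# {'new': ['/new'], 'updated': ['/a'], 'dead': ['/gone'], 'unchanged': ['/b']}
```

Comparing short fingerprints instead of full page copies keeps the check cheap, and only pages marked "new" or "updated" need to be re-indexed.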

Googlebot also finds pages through an Add URL form, where site owners can submit their websites. The form has features that block spambots, which would otherwise clog it with large numbers of irrelevant URLs that could mislead users.

What is Indexing?

The index is a huge database of search terms stored in alphabetical order. For each term, it lists the pages on which the term is found and the location of the term within the text of each page.
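An index of this kind is usually called an inverted index. The sketch below is a toy version built with Python dictionaries: it maps each term to (page, word position) pairs and keeps the terms in alphabetical order. The tokenizer (lower-cased runs of letters and digits) and the page names are assumptions for illustration; Google's real index uses far more compact, distributed data structures.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: term -> list of (page id, word position).

    pages: dict mapping a page id to its text.
    """
    postings = defaultdict(list)
    for page_id, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, term in enumerate(words):
            postings[term].append((page_id, position))
    # Keep the terms in alphabetical order, as the article describes.
    return dict(sorted(postings.items()))


pages = {
    "p1": "Google crawls the Web",
    "p2": "The Web is huge",
}
index = build_index(pages)
print(list(index))   # ['crawls', 'google', 'huge', 'is', 'the', 'web']
print(index["web"])  # [('p1', 3), ('p2', 1)]
```

Storing positions, not just page ids, is what lets a search engine tell whether query words appear next to each other on a page.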

Does a “Google Search” Return Relevant Content?

Google determines the relevance of page content on the basis of hundreds of factors; PageRank, keyword density and text flow are some of them. The PageRank of a particular page indicates how popular that page is.

PageRank is, in fact, a score assigned to each Web page that Google indexes. It is computed from the Web's link structure: a page's score grows with the number of pages linking to it and with the PageRank of those linking pages. The higher the PageRank, the more important Google considers the page in terms of relevance and popularity.
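The textbook form of PageRank can be computed by power iteration: every page starts with an equal score, then repeatedly passes a share of its score to the pages it links to. The sketch below assumes the commonly cited damping factor of 0.85 and a fixed number of iterations; it illustrates the idea and is not Google's production ranking code.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank computed by power iteration.

    links maps each page to the list of pages it links to. A page with no
    outgoing links ("dangling") spreads its score evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its cut of dangling rank.
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page divides its damped score equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: C has three inbound links, D has none.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

A ranks above B even though each has a single inbound link, because A's link comes from C, the highest-scoring page: links from important pages count for more.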

If many sites link to a particular website, this implies that the site offers highly relevant content, which is exactly what Google is looking for. Moreover, Google gives importance to genuine inbound links rather than traded links (also known as reciprocal links), because such links do not reliably indicate whether the site contains relevant content.
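The distinction between genuine inbound links and traded links can be illustrated with a crude filter that skips a link from A to B whenever B also links back to A. This is a hypothetical illustration of the article's point only; it is not a documented Google rule, and legitimate sites often link to each other.

```python
def inbound_counts(links, ignore_reciprocal=True):
    """Count inbound links per page.

    links maps each page to the pages it links to. If ignore_reciprocal is
    True, a link from A to B is skipped when B also links back to A (a
    hypothetical stand-in for "traded" links).
    """
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            counts.setdefault(target, 0)
            if ignore_reciprocal and source in links.get(target, ()):
                continue  # A and B link to each other: treat as a trade
            counts[target] += 1
    return counts


# A and B trade links; C and D link to A without being linked back.
web = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A"]}
print(inbound_counts(web, ignore_reciprocal=False))  # {'B': 1, 'A': 3}
print(inbound_counts(web))                           # {'B': 0, 'A': 2}
```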

Note that the index does not contain stop words such as “the”, “is”, “or” and “why”. These words contribute little to locating relevant pages, so Google ignores them.
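Stop-word filtering takes only a few lines. The word list below is a small, hypothetical sample that includes the examples above; it is not Google's actual list.

```python
import re

# Hypothetical sample of stop words; real search engines use their own lists.
STOP_WORDS = {"the", "is", "or", "why", "a", "an", "of", "and"}


def index_terms(text):
    """Lower-case and tokenize `text`, dropping stop words before indexing."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(index_terms("Why is the sky blue?"))  # ['sky', 'blue']
```

Dropping such words shrinks the index considerably, since they appear on nearly every page yet say little about what a page is about.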

In addition, Google largely ignores meta tags, in particular the keywords meta tag, which SEO professionals have constantly “manipulated”, and focuses mainly on the content of the page to determine its relevance, which is then reflected in the page rankings.
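Focusing on page content rather than meta tags can be illustrated by extracting only the text of a page. In the sketch below, built on Python's standard `html.parser` module (not Google's parser), the keywords stuffed into the meta tag never reach the extracted text, because the parser collects only text between tags and skips `script` and `style` blocks.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects page text, skipping <script> and <style> contents.

    <meta> tags carry their content in attributes, which this parser never
    reads, so stuffed meta keywords do not reach the extracted text.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><head>
<meta name="keywords" content="cheap, free, best, cheap, free">
<title>Garden tips</title></head>
<body><p>How to grow tomatoes.</p></body></html>"""
print(visible_text(page))  # Garden tips How to grow tomatoes.
```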

The Technology Behind Google

The technology behind this process can be summarized as follows. The user types a query, which goes to Google's Web server. The Web server sends the query to the index servers, which look up the pages that match it. The doc servers then retrieve those pages and generate the “snippets” (short descriptions shown with each search result). Finally, the results are “served” to the user. As mentioned above, all of this happens within a second or less (see: Google Technology).
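The query path can be sketched with plain functions standing in for each tier. Everything here is a hypothetical in-memory stand-in: the `INDEX` and `DOCS` dictionaries, the example.com addresses, the rule that a page must contain every query term, and the simple snippet, which starts at the first matching word.

```python
# Hypothetical stand-ins for the index servers' and doc servers' data.
INDEX = {  # term -> ids of documents containing it
    "google": {1, 2},
    "crawler": {2},
}
DOCS = {  # document id -> (url, stored text)
    1: ("example.com/a", "Google is a search engine used by millions."),
    2: ("example.com/b", "Googlebot is the Google crawler that fetches pages."),
}


def index_servers(query):
    """Return ids of documents containing every query term."""
    postings = [INDEX.get(t, set()) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


def doc_servers(doc_ids, query, window=6):
    """Fetch each stored document and cut a short snippet from it."""
    terms = query.lower().split()
    results = []
    for doc_id in doc_ids:
        url, text = DOCS[doc_id]
        words = text.split()
        # Start the snippet at the first word that matches a query term.
        start = next(
            (i for i, w in enumerate(words) if w.lower().strip(".,") in terms), 0
        )
        results.append((url, " ".join(words[start:start + window])))
    return results


def web_server(query):
    """Entry point: consult the index servers, then the doc servers."""
    return doc_servers(index_servers(query), query)


for url, snippet in web_server("google crawler"):
    print(url, "-", snippet)  # example.com/b - Google crawler that fetches pages.
```

Splitting the work this way lets each tier be spread across many machines: the index servers only need the term lists, while the doc servers only need the stored pages.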

Google as a company has long followed the principle of “Don't be evil,” and this is evident in its constant attempts to make its search engine more user-friendly, with new features added regularly. The hard work behind the search engine pays off when users are satisfied with their search results and return to Google for more information.


The copyright of the article The Inner Workings of Google in SEO Tools is owned by Preetam Kaushik. Permission to republish The Inner Workings of Google in print or online must be granted by the author in writing.

