TechFocus
All about search engines
Edward Apurba Singha
Imagine yourself in a big library, frantically searching for a reference book you need for an assignment. It is like looking for a needle in a haystack. At this moment of desperation, the library management software comes to the rescue: it shows you the exact location of the book you want, along with other information about it. A search engine does the same job for the internet. The web has become a virtual world of millions of websites, and any attempt to sift out a specific piece of information from it without a search engine is almost certain to end in vain. This is why search engines have become an integral part of the internet.

The generic name 'search engine' does not refer to any electronic or mechanical device. A search engine is in fact a software system, usually reached through a specially designed web page, that collects information from other sites on the internet. It gathers this information from many different sources and keeps an index of the words it finds and where it found them. Users can then search that index for words or combinations of words.

Modern search engines owe their immense popularity to the enormous growth in the number of websites. In the early days, however, the situation was very different. People had to find files through programs such as Archie and Gopher, which kept indexes of files stored on servers connected to the internet. Early search engines maintained a relatively small index of a few hundred thousand pages and documents, and received only one or two thousand queries a day. Today, the top search engines index hundreds of millions of pages and handle tens of millions of queries a day.

The most important and fascinating aspect of a search engine is how it searches. Some search engines look pretty simple.
This humble look, however, hides considerable complexity. The vital component of a search engine is a special software robot called a spider. Spiders examine page after page to build lists of the words found on them, a process called web crawling. A spider usually starts with the most popular pages, indexes the words on them and then follows the links it finds to other pages. The details differ from one search engine to another. Google's spiders, for example, were built to identify mainly two things: the words and where on the page they were found. They looked for words in the title, subtitles and meta tags, and built an index that left out the articles "a", "an" and "the". AltaVista, by contrast, indexed every single word on a page, including articles and other insignificant words.

Meta tags can speed up the process, because spiders read them for a brief introduction to the contents of a web page. Sometimes, however, a page owner does not want a page to be indexed or crawled at all. The robots exclusion protocol was developed for such cases. Besides site-wide rules in a robots.txt file, it can be applied through a robots meta tag in the head section at the beginning of a web page, which tells the spider to leave the page alone: neither index its words nor follow its links.

Once the spiders have done their work, the search engine must store the information in a way that makes it useful. An index that stored only each word and the URL where it was found would be of limited use, because it could not tell whether a word was used prominently or only in passing on a page, and so could not rank the results. To fix this, most search engines store more than just the word and the URL. An engine may record how often each word appears on a page and then assign a weight to each entry, giving higher values to words that appear near the top of the document, in subheadings, in links, in meta tags or in the page title. Each commercial search engine has its own strategy for weighting the words in its index.
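The indexing and weighting steps can be sketched in Python. This is a minimal illustration under loudly stated assumptions: the weights (FIELD_WEIGHTS, EARLY_WORDS, EARLY_BONUS) are made-up values, not those of Google, AltaVista or any other engine, and index_page, rank and the example.com pages are hypothetical names chosen for the sketch. It covers only indexing, not crawling: it honors a robots meta tag containing noindex, optionally drops the articles "a", "an" and "the" (pass drop_articles=False to index every word, AltaVista-style), and adds up a weight for every occurrence of a word so that both frequency and position count.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights for illustration only; real engines use their own schemes.
FIELD_WEIGHTS = {"title": 5.0, "meta": 3.0, "h1": 3.0, "h2": 2.0, "a": 2.0, "body": 1.0}
EARLY_WORDS = 50   # words among the first 50 on the page get a bonus...
EARLY_BONUS = 1.5  # ...multiplying their weight by this factor
ARTICLES = {"a", "an", "the"}


class PageParser(HTMLParser):
    """Collects (word, field) pairs in document order and spots robots opt-outs."""

    def __init__(self):
        super().__init__()
        self.open_fields = []  # open tags that carry their own weight, innermost last
        self.words = []        # (word, field) pairs in document order
        self.noindex = False   # set by <meta name="robots" content="...noindex...">

    def _add(self, text, field):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.words.append((word, field))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "robots" and "noindex" in content.lower():
                self.noindex = True  # only the noindex directive is checked here
            elif name in ("keywords", "description"):
                self._add(content, "meta")
        elif tag in ("title", "h1", "h2", "a"):
            self.open_fields.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if there is one.
        for i in range(len(self.open_fields) - 1, -1, -1):
            if self.open_fields[i] == tag:
                del self.open_fields[i]
                break

    def handle_data(self, data):
        field = self.open_fields[-1] if self.open_fields else "body"
        self._add(data, field)


def index_page(url, html, index, drop_articles=True):
    """Adds a page to index (word -> {url: weight}); returns False if it opts out."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    if parser.noindex:
        return False  # the robots meta tag asks spiders to leave this page alone
    for position, (word, field) in enumerate(parser.words):
        if drop_articles and word in ARTICLES:
            continue  # skip "a", "an", "the"; drop_articles=False keeps every word
        weight = FIELD_WEIGHTS[field]
        if position < EARLY_WORDS:
            weight *= EARLY_BONUS  # words near the top of the document count more
        postings = index.setdefault(word, {})
        postings[url] = postings.get(url, 0.0) + weight  # repeats add up (frequency)
    return True


def rank(index, word):
    """URLs containing word, highest accumulated weight first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)


index = {}
index_page("http://example.com/a", "<title>Cricket scores</title><p>Latest results</p>", index)
index_page("http://example.com/b", "<title>Weekend news</title><p>The cricket match</p>", index)
index_page("http://example.com/c", '<meta name="robots" content="noindex"><p>Cricket</p>', index)
print(rank(index, "cricket"))  # ['http://example.com/a', 'http://example.com/b']
```

Page a ranks first because "cricket" sits in its title, while page b mentions it only in the body; page c is never indexed because of its robots meta tag. Changing the FIELD_WEIGHTS table changes the order of results, which is one simple way to picture how engines with different weighting strategies can disagree.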
Because each engine weights words differently, a search for the same word on different search engines produces different lists, with the pages presented in a different order.

As a user, you can apply Boolean search techniques to get more out of your searches. Some tips:

AND: joins two or more words, so that only pages containing all of them are returned. Some search engines accept the operator "+" in place of the word AND, so typing Bangladesh + Sports and pressing Enter returns pages about sports in Bangladesh.
OR: returns pages that contain at least one of the terms joined by OR.
NOT: excludes pages that contain the term following it, which discards irrelevant topics from the results. Some search engines accept the operator "-" in place of NOT.
FOLLOWED BY: one term must be directly followed by the other.
QUOTATION MARKS: words inside quotation marks are treated as a single phrase, which produces more precise results.

Boolean search runs into trouble when a word has several meanings. In such cases, newer search techniques such as concept-based search and natural-language queries can be a big help in finding relevant results.
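The Boolean operators can be sketched over a positional index, an inverted index that also records where each word occurs in a document so that word order can be checked. This is a minimal Python illustration under stated assumptions: the operators are plain function calls (and_, or_, not_, phrase and followed_by are hypothetical names) rather than a parser for typed queries, "+" and "-" are simply treated as AND and NOT, and the four sample documents are made up. Real search engines each define their own query syntax.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cases text and splits it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Maps each word to {doc_id: [positions]}; positions let us check word order."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def docs_with(index, word):
    """Set of documents containing word (empty set if the word is unknown)."""
    return set(index.get(word.lower(), {}))


def and_(index, *words):
    """AND (or "+"): documents that contain every word."""
    sets = [docs_with(index, w) for w in words]
    return set.intersection(*sets) if sets else set()


def or_(index, *words):
    """OR: documents that contain at least one of the words."""
    return set().union(*(docs_with(index, w) for w in words))


def not_(results, index, word):
    """NOT (or "-"): removes documents containing word from an earlier result set."""
    return results - docs_with(index, word)


def phrase(index, words):
    """Quotation marks: documents where the words occur consecutively, in order."""
    words = [w.lower() for w in words]
    hits = set()
    for doc_id in and_(index, *words):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


def followed_by(index, first, second):
    """FOLLOWED BY: second must come directly after first."""
    return phrase(index, [first, second])


docs = {
    "d1": "Bangladesh sports news: cricket and football",
    "d2": "Sports results from around the world",
    "d3": "Bangladesh tourism guide",
    "d4": "Sports coverage of football in Bangladesh",
}
index = build_positional_index(docs)
print(sorted(and_(index, "Bangladesh", "sports")))                      # ['d1', 'd4']
print(sorted(or_(index, "cricket", "tourism")))                         # ['d1', 'd3']
print(sorted(not_(docs_with(index, "bangladesh"), index, "football")))  # ['d3']
print(sorted(phrase(index, tokenize('"Bangladesh sports"'))))           # ['d1']
```

The AND query matches both d1 and d4, while the quoted phrase matches only d1, where the two words sit side by side; this is why phrase searches tend to give more precise results.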