A Web robot is a program that automatically traverses the web and retrieves documents. A Web robot follows the links on a website, both to pages within the site itself and to other websites that it links to. Web robots are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Well-behaved robots look for a robots.txt file at the root of a site before crawling it. Some robots will not index a website if they do not find this file, but this is not the case with all search engines, because each search engine writes its own algorithm for its robot. The most common type of Web robot is the search engine spider.
Besides indexing web pages, web robots are also used to automate maintenance tasks on a Web site, such as checking links or validating HTML code.
Web robots are also known as web crawlers, bots, spiders, wanderers, worms, ants, and automatic indexers.
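
The robots.txt check described above can be sketched in Python with the standard library's urllib.robotparser module. This is only a minimal illustration: the robots.txt content, the "ExampleBot" user agent and the example.com URLs are made up for the example, and a real robot would download the file from the site before crawling instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
# A real robot would fetch it from the root of the site it is about to crawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
allowed_public = may_fetch(parser, "ExampleBot", "https://example.com/index.html")
allowed_private = may_fetch(parser, "ExampleBot", "https://example.com/private/page.html")
```

A polite robot would call may_fetch before every request and skip any URL for which it returns False.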

How do robots work?

* Normally a robot starts from a list of URLs, especially documents with many links, such as pages on the most popular sites on the web.

* Most indexing services also allow you to submit URLs manually, which are then queued and visited by the robot.

* Sometimes other sources of URLs are used, such as scans of USENET postings, published mailing list archives, etc.

Once a robot has these starting points, it can select URLs to visit and index.
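
The process of starting from a list of URLs and following links can be sketched as a breadth-first crawl. The sketch below is an assumption-laden illustration, not how any particular search engine works: the FAKE_WEB dictionary stands in for real HTTP requests, the a.example and b.example hosts are invented, and a real robot would add robots.txt checks, rate limiting and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs for all links in the page."""
    extractor = LinkExtractor()
    extractor.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in extractor.links]


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    `fetch` is any callable that returns the HTML for a URL, or None.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid revisits
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" used instead of real network requests.
FAKE_WEB = {
    "http://a.example/": '<a href="/about.html">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about.html": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about.html#team">Team</a>',
}

order = crawl(["http://a.example/"], FAKE_WEB.get)
```

The seed list plays the role of the starting URLs, and manually submitted URLs could simply be appended to the same frontier queue.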
Different robots use different indexing strategies, depending on the algorithm chosen by their search engine. Some robots index only the HTML title or the first few paragraphs. Others parse the entire HTML and index all words, with weightings that depend on the HTML constructs they appear in. Some also parse the META tag. Each robot follows its own set of rules.
We can neither force a robot to use a particular indexing strategy nor reliably guess which one it uses.
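
To make the idea of weighting words by HTML construct concrete, here is a small sketch of one possible strategy. The weights in WEIGHTS, the choice of tags and the sample page are all assumptions made for illustration, and they do not describe the rules of any real search engine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: words in the title count more than body text.
WEIGHTS = {"title": 5, "h1": 3, "meta": 2, "body": 1}


class WeightedIndexer(HTMLParser):
    """Score each word according to the HTML construct it appears in.

    Simplification: text inside <script> or <style> is scored as body text.
    """

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._context = []  # stack of currently open weighted tags

    def _add(self, text, weight):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.scores[word] += weight

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Index the keywords and description META tags.
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self._add(attrs.get("content") or "", WEIGHTS["meta"])
        elif tag in ("title", "h1"):
            self._context.append(tag)

    def handle_endtag(self, tag):
        if self._context and self._context[-1] == tag:
            self._context.pop()

    def handle_data(self, data):
        weight = WEIGHTS[self._context[-1]] if self._context else WEIGHTS["body"]
        self._add(data, weight)


def index_page(html):
    """Return a Counter mapping each word to its weighted score."""
    indexer = WeightedIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.scores


# Made-up sample page for the example.
PAGE = """<html><head><title>Web Robots</title>
<meta name="keywords" content="crawler, spider">
</head><body><h1>Robots</h1><p>A robot follows links.</p></body></html>"""

scores = index_page(PAGE)
```

With these assumed weights, "robots" scores highest because it appears in both the title and the main heading, while words that occur only in the paragraph score 1 each. A robot with different rules would rank the same page differently.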