Facts, Fiction and views by valuepitchers

 

Tuesday, September 11, 2007

Preventing listings

A webmaster who wants to keep undesirable pages out of search listings can instruct spiders not to crawl certain files or directories. The instruction is given through a standard file called robots.txt, which is placed in the domain's root directory.
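As a rough illustration of how such a file is read, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser module. The directory names, the "ExampleBot" user agent and the example.com address are all made up for the example; a real site would list its own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the rules above let the given (hypothetical) agent fetch path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/index.html"))
print(allowed("/private/report.html"))
```

A well-behaved spider would make a check like allowed() before requesting each page. The file is only a request, so it works only with robots that choose to honour it.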



The webmaster can also explicitly exclude a page (or pages) from a search engine's database by using a meta tag that is specific to robots. Meta tags are HTML elements that provide structured metadata about a web page. They are placed in the head section of the HTML document.
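To make this concrete, here is a minimal sketch of how a robot might look for a robots meta tag in a page's head, using Python's standard html.parser module. The sample page and the RobotsMetaFinder class name are assumptions made for this example; "noindex" and "nofollow" are the commonly used directive values.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = """<html><head>
<title>Internal search results</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

finder = RobotsMetaFinder()
finder.feed(PAGE)
print(finder.directives)
```

Unlike robots.txt, the tag is part of the page itself, so a robot has to fetch the page before it can see the instruction.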



When a search engine's robot visits a website, the robots.txt file in the root directory of the domain is the first file it requests. The robot parses the file, which tells it which pages it should avoid crawling.



The robots exclusion standard is also known as the robots exclusion protocol or the robots.txt protocol. It is a convention for keeping cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable.



Search engines often use web robots to categorise and archive websites, and webmasters use them to proofread source code. The robots exclusion standard is complementary to Sitemaps, which is a robot inclusion standard for websites.



Because a search engine crawler may keep a cached copy of the robots.txt file, it may occasionally crawl pages that the webmaster does not want crawled. Pages that webmasters typically exclude from crawling include login-specific pages such as shopping carts, and user-specific content such as the results of internal searches.
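The caching problem can be pictured with a small sketch: a crawler that remembers when it last loaded the rules can decide to fetch them again once they grow too old. The is_stale helper and the one-hour limit below are assumptions made for illustration, not how any particular search engine behaves. The mtime() and modified() methods come from Python's urllib.robotparser.

```python
import time
from urllib.robotparser import RobotFileParser


def is_stale(parser, max_age_seconds, now=None):
    """Return True if the cached rules were loaded more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    return (now - parser.mtime()) > max_age_seconds


cached = RobotFileParser()
# Hypothetical rules excluding a shopping cart and internal search results.
cached.parse(["User-agent: *", "Disallow: /cart/", "Disallow: /search/"])
cached.modified()  # record the time at which the rules were loaded
loaded_at = cached.mtime()

# One minute later the copy is still considered fresh; two hours later it is not.
print(is_stale(cached, 3600, now=loaded_at + 60))
print(is_stale(cached, 3600, now=loaded_at + 7200))
```

Until the stale copy is refreshed, a crawler working from it may still visit pages that a newer robots.txt has excluded.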



In March 2007, Google warned webmasters that they should prevent the indexing of internal search results, because it considers such pages to be search spam.
