How To Control Indexing And Crawling On Your Site


Google now provides site owners, and anyone else who wants to know, with information on controlling the crawling and indexing of their sites. Through this comprehensive resource you can learn about robots.txt files, robots meta tags, X-Robots-Tag HTTP header directives and more. You can also learn how to prevent a PDF file from being indexed, or how Googlebot handles conflicting directives in your robots.txt file. This information has already been published at code.google.com, and anyone can access it at any time.
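To make the PDF case concrete: a PDF has no HTML head to hold a robots meta tag, so the usual way to ask search engines not to index it is to send an X-Robots-Tag header with the file. Here is a minimal sketch in Python, assuming your server lets you set response headers per file. The function name response_headers and the example paths are made up for illustration and are not taken from Google's documentation.

```python
import mimetypes


def response_headers(path: str) -> dict[str, str]:
    """Build HTTP response headers for a static file (hypothetical helper)."""
    # Guess the Content-Type from the file extension, with a generic fallback.
    content_type, _ = mimetypes.guess_type(path)
    headers = {"Content-Type": content_type or "application/octet-stream"}

    # Ask search engines not to index PDF files. The header does the job
    # that a robots meta tag would do on an HTML page.
    if path.lower().endswith(".pdf"):
        headers["X-Robots-Tag"] = "noindex"

    return headers


if __name__ == "__main__":
    print(response_headers("/docs/report.pdf"))
    print(response_headers("/index.html"))
```

On a real site you would normally set this header in your web server configuration instead of in application code, but the idea is the same.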


Crawling is the process of retrieving content from websites and their pages without processing the results. Indexing is the next step: making sense of the retrieved content and storing the processed results. These documents will help you control aspects of both crawling and indexing, so you can decide how crawlers should access your content and how that content should be presented to users in search results. You can learn how to allow or disallow crawling or indexing to best serve your purpose.
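As a small illustration of allowing and disallowing crawlers, the sketch below feeds a sample robots.txt to Python's standard urllib.robotparser module and asks whether a given URL may be crawled. The rules, the example.com URLs and the can_crawl helper are assumptions made up for this example. Note also that Python's parser may not resolve conflicting rules the same way Googlebot does, so treat it as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt, invented for this example: keep every crawler
# out of /private/ and allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample rules let user_agent fetch url (hypothetical helper)."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post.html",
                "https://example.com/private/notes.html"):
        print(url, "allowed" if can_crawl(url) else "blocked")
```

Keep in mind that robots.txt controls crawling, not indexing. To keep a page out of search results you need a robots meta tag or an X-Robots-Tag header, and the crawler must be allowed to fetch the page in order to see it.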

The more of this information you gather and put to use, the more control you have over how your site is crawled, and the more of a boost you can give it. For webmasters, this knowledge adds more tools to the toolbox. It helps them guide crawlers toward useful content and away from irrelevant content. This in turn determines how crawlers retrieve information from your site and index it, which affects the prominence and relevance of your site on the World Wide Web.

Along with webmasters, beginners will find these documents as useful as a good book. The documentation will also be updated from time to time as new material is added.