Friday, March 18, 2011

Why Your Website Needs a Robots.txt File


There are several reasons you would want to use a robots.txt file to control how search engine robots visit your site:

It saves your bandwidth
Spiders won't crawl areas that contain no useful information (your cgi-bin, images, administrator folders, etc.).
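To make this concrete, here is a minimal sketch, assuming Python 3 and its standard urllib.robotparser module, of a robots.txt that keeps all robots out of those folders, plus a quick check of which paths a crawler would skip. The folder names are just the examples above, the helper name build_parser is hypothetical, and the user-agent "Googlebot" is only used for the check.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that keeps every robot ("*") out of folders with nothing worth indexing.
# Folder names mirror the examples in the post; adjust them to your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /administrator/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ["/", "/about.html", "/cgi-bin/form.cgi", "/images/logo.png", "/administrator/"]:
        verdict = "crawl" if rp.can_fetch("Googlebot", path) else "skip"
        print(f"{path:25} {verdict}")
```

Every request a spider skips this way is bandwidth your server doesn't spend on pages nobody will search for.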

It gives you a very basic level of protection
It isn't real security (robots.txt is a public file, and only well-behaved robots choose to obey it), but it will keep people from easily finding material you don't want exposed through search engines. They actually have to visit your site and go to the directory instead of finding it on Google, MSN, Yahoo or Teoma.
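Because anyone can open robots.txt in a browser, listing a "hidden" directory in it actually advertises that directory. The short Python sketch below (the function name disallowed_paths and the sample folders are hypothetical) pulls every Disallow path out of a robots.txt, which is exactly what a curious visitor could do by hand.

```python
# Sample robots.txt; the folder names are hypothetical.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private-reports/   # hoping Google won't list these
Disallow: /staging/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, ignoring comments and field-name case."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(disallowed_paths(SAMPLE_ROBOTS))  # prints ['/private-reports/', '/staging/']
```

Anything truly private belongs behind a password, not just in robots.txt.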

It cleans up your logs
Every time a search engine visits your site, it requests robots.txt, which can happen several times a day. If you don't have one, each of those requests generates a "404 Not Found" error. It's hard to wade through all of these to find the genuine errors at the end of the month.
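If you want to see how much noise a missing robots.txt adds, a rough Python sketch like the one below can separate robots.txt 404s from the real ones. It assumes your server writes Apache-style Common or Combined Log Format lines; the sample entries and the function name split_404s are made up for illustration.

```python
import re

# Matches the request path and status code of a Common/Combined Log Format line, e.g.
# 66.249.66.1 - - [18/Mar/2011:10:00:00 -0500] "GET /robots.txt HTTP/1.1" 404 512
LOG_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')

# Hypothetical log entries for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [18/Mar/2011:10:00:00 -0500] "GET /robots.txt HTTP/1.1" 404 512',
    '66.249.66.1 - - [18/Mar/2011:10:00:01 -0500] "GET /index.html HTTP/1.1" 200 4096',
    '157.55.16.1 - - [18/Mar/2011:11:30:00 -0500] "GET /robots.txt HTTP/1.1" 404 512',
    '203.0.113.7 - - [18/Mar/2011:12:15:00 -0500] "GET /old-page.html HTTP/1.1" 404 512',
]


def split_404s(lines):
    """Count 404s for /robots.txt and collect the paths of all other 404s."""
    robots_404s, other_404s = 0, []
    for line in lines:
        match = LOG_RE.search(line)
        if not match or match.group("status") != "404":
            continue  # not a parsable request, or not a 404
        if match.group("path") == "/robots.txt":
            robots_404s += 1
        else:
            other_404s.append(match.group("path"))
    return robots_404s, other_404s


if __name__ == "__main__":
    robots, others = split_404s(SAMPLE_LOG)
    print(f"robots.txt 404s: {robots}")
    print(f"other 404s: {others}")
```

Once a robots.txt is in place, the first count drops to zero and the second list holds only the errors worth fixing.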

It can help you avoid penalties associated with duplicate content.
Let's say you have a high-speed and a low-speed version of your site, or a landing page intended for use with advertising campaigns. If this content duplicates other content on your site, you can find yourself out of favor with some search engines. You can use the robots.txt file to keep that content from being crawled, and therefore avoid the issue. Some webmasters also use it to exclude "test" or "development" areas of a website that are not ready for public viewing yet.
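As an illustration, assume a hypothetical layout where /lowspeed/ mirrors the main site, /landing/ holds ad-campaign pages, and /test/ and /dev/ are unfinished work. A robots.txt like the one in this Python sketch (using the standard urllib.robotparser module) would keep compliant crawlers away from all four. The example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folders: a duplicate low-speed site, ad landing pages, and work in progress.
ROBOTS_TXT = """\
User-agent: *
Disallow: /lowspeed/
Disallow: /landing/
Disallow: /test/
Disallow: /dev/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse rules from a string rather than fetching them

if __name__ == "__main__":
    for url in [
        "http://www.example.com/products.html",
        "http://www.example.com/lowspeed/products.html",
        "http://www.example.com/landing/spring-sale.html",
        "http://www.example.com/dev/new-design.html",
    ]:
        print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")
```

Keep in mind that robots.txt only stops compliant robots from crawling those pages; a blocked URL that other sites link to can still show up in search results without its content.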

It's good practice.
Pros have a robots.txt. Amateurs don't. Which group do you want your site to be in? This is more of an ego/image thing than a "real" reason, but in competitive areas or when applying for a job it can make a difference. Some employers may hesitate to hire a webmaster who didn't know how to use one, on the assumption that they may not know other, more critical things as well. Many feel it's sloppy and unprofessional not to use one.

It helps you get the most out of Google Webmaster Tools.
Google Webmaster Tools shows how Google reads your robots.txt file and flags problems with it, so a working, valid file keeps that part of the report clean. Since Webmaster Tools is so valuable for insight into what the world's most popular search engine thinks of your site, it's a good idea to use it.