I (Google) Robot
9 years ago Posted in: Blog 0


The primary reason for an SEO to be concerned with the Robots.txt file is that search engines like Google use crawlers, or “robots”, to index web content on the Internet. The flip side of the SEO coin is the other reason to care about the file: spammers also use Web robots to scan Web pages for email addresses and other personal information.

To understand what the Robots.txt file can and cannot do for you, you must first understand what it is. A Robots.txt file is a text file that Web site owners use to tell Web robots which files they should or should not include in their index. A Web site owner who uses a Robots.txt file is using the Robots Exclusion Protocol.

Because the Robots.txt file is simply a text file that is publicly available on your Web server, robots can ignore it entirely, and anyone who reads it can see exactly which sections of your Web site you do not want robots to visit. If you intend to use the Robots.txt file to hide information from spammers or other, more malicious users, you should rethink that approach.


What good is the Robots.txt file?

If robots can simply ignore the Robots.txt file, you might ask yourself, “What good is the Robots.txt file?” This is a fair question, but since we are discussing the file with regard to SEO, we need to focus on the robots that play well with it. Search engine Web robots, such as Google’s, use the Robots.txt file to decide which of your Web pages they will index and which they will leave alone.


Where do I put the Robots.txt file?

If you want search engines to use your Robots.txt file, make sure it is in the top-level directory of your Web server (the root folder for your Web site). When a Web robot looks for the Robots.txt URL, it strips the path component from the current URL and puts “/robots.txt” in its place. For example, if a Web robot is following a link to “http://www.example.com/blog/post.php”, it removes “/blog/post.php” and replaces it with “/robots.txt”, ending up with “http://www.example.com/robots.txt”. Because this is the only place the Web robot will look by default, a Robots.txt file stored anywhere other than the root directory will not be found.
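
The path replacement described above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name robots_txt_url is made up for this example and is not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/blog/post.php"))
# prints http://www.example.com/robots.txt
```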


How does the Robots.txt file work?

The Robots.txt file gives Web robots instructions about a Web site. The first line usually begins with “User-agent: ” followed by the name of the robot or robots that the rules apply to. If “User-agent: ” is followed by an asterisk (*), the file is using a wildcard character to refer to all robots, or “any robot”.

The second line, and often each line after it, names a file or directory that the robot should not index. For example, if you do not want a search engine to index your /tmp folder because it holds only temporary files with no significant content, you would use the following convention: “Disallow: /tmp/”. This tells the search engine Web robot to exclude all content in your /tmp directory.
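
To make the line-by-line structure concrete, here is a rough Python sketch that reads a Robots.txt file and groups the “Disallow” paths under the “User-agent” they belong to. It is a simplified illustration, not a complete parser: the function name parse_groups and the /private/ directory are invented for the example, and real crawlers handle more fields and edge cases.

```python
def parse_groups(text: str) -> dict:
    """Group Disallow paths by the User-agent lines that precede them.

    Simplified, hypothetical sketch: only User-agent and Disallow are read.
    """
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:
                # A User-agent line after a rule starts a new group.
                current_agents = []
                seen_rule = False
            current_agents.append(value)
            groups.setdefault(value, [])
        elif field == "disallow":
            seen_rule = True
            for agent in current_agents:
                groups[agent].append(value)
    return groups


EXAMPLE = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/  # hypothetical second directory
"""

print(parse_groups(EXAMPLE))
# prints {'*': ['/tmp/', '/private/']}
```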


If you would like all Web robots to access all of your content, use the following:

User-agent: *
Disallow:


If you would like to exclude all Web robots from all of your content use the following:

User-agent: *
Disallow: /


If you would like to exclude all Web robots from just a portion of your content, such as your /tmp directory, use the following:

User-agent: *
Disallow: /tmp/
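
If you want to check how a well-behaved robot would read these three files, Python's standard urllib.robotparser module can evaluate them. The sketch below is only an illustration: the robot name “ExampleBot” and the two test URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets shown above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_TMP = "User-agent: *\nDisallow: /tmp/\n"


def is_allowed(rules: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "http://www.example.com/blog/post.php"   # hypothetical normal page
TMP = "http://www.example.com/tmp/cache.html"   # hypothetical temporary file

for name, rules in [("allow all", ALLOW_ALL),
                    ("block all", BLOCK_ALL),
                    ("block /tmp/", BLOCK_TMP)]:
    print(name, is_allowed(rules, PAGE), is_allowed(rules, TMP))
# prints:
# allow all True True
# block all False False
# block /tmp/ True False
```

Keep in mind that this only shows what a robot that honors the file would do; as noted earlier, nothing forces a robot to obey it.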

There are many other conventions you may use to limit the content that search engine Web robots index. If you would like to learn more about these conventions, visit The Web Robots Pages. In addition to creating a Robots.txt file, it is also worth adding a robots meta tag to the head section of your pages to tell robots how to treat each page. It should look like the following:

<meta name="Robots" content="all,index,follow" />
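
As a rough illustration of how a robot might read that tag, the following Python sketch pulls the directives out of a page with the standard html.parser module. The class name RobotsMetaParser and the helper robots_directives are invented for this example; real search engines may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag (sketch)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name is compared case-insensitively, so "Robots" matches.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower()
                                for d in content.split(",") if d.strip()]


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


print(robots_directives('<meta name="Robots" content="all,index,follow" />'))
# prints ['all', 'index', 'follow']
```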







  • Copyright © 2011. OwenDevelopment. All rights reserved.