It seems that Google is currently
experimenting with new robots.txt
commands. If your robots.txt file
accidentally contains one of the
new commands, it might be that
your robots.txt file tells Google
to go away.
What is a robots.txt file?
The robots.txt file is a
simple text file that must
be placed in your root directory
(http://www.example.com/robots.txt).
It tells the search engine
spider which web pages on
your website should be indexed
and which web pages should
be ignored.
You can use a simple text
editor to create a robots.txt
file. The content of a robots.txt
file consists of so-called "records".
A record contains the instructions
for a specific search engine.
Each record consists of two
parts: a User-agent line
followed by one or more Disallow lines.
Here's an example:
User-agent: googlebot
Disallow: /cgi-bin/
This robots.txt file would
allow the "googlebot", which
is the search engine spider
of Google, to retrieve every
page from your site except
for files from the "cgi-bin" directory.
All files in the "cgi-bin" directory
will be ignored by googlebot.
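To see how a spider might read this record, here is a minimal sketch that uses Python's standard urllib.robotparser module. The page paths and the "otherbot" name are made up for illustration, and Python's parser is only a stand-in: Google's own spider may interpret a file differently.

```python
from urllib.robotparser import RobotFileParser

# The example record from above, parsed from memory (no network access needed).
parser = RobotFileParser()
parser.parse([
    "User-agent: googlebot",
    "Disallow: /cgi-bin/",
])

def allowed(agent, path):
    """Return True if `agent` may fetch `path` on the (hypothetical) example site."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

print(allowed("googlebot", "/index.html"))        # a regular page
print(allowed("googlebot", "/cgi-bin/form.cgi"))  # inside the blocked directory
print(allowed("otherbot", "/cgi-bin/form.cgi"))   # no record names this spider
```

The first and third calls should report True and the second False, because the record only names googlebot and only blocks the "cgi-bin" directory.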
Which new commands is Google testing?
Webmasters have found out
that Google seems to be experimenting
with a Noindex command
for the robots.txt file. It
basically seems to do the
same as the Disallow command
so it's not clear why Google
is using this command.
Other commands that might
be tested by Google are Noarchive
and Nofollow. However, none
of these commands is official
yet.
How does this affect your rankings on Google?
If you accidentally
use the wrong commands, you
might tell Google to go away
even though you want it
to index your pages.
For that reason, it is important
that you check the content
of your robots.txt file.
How to check your robots.txt file
Open your web browser and
enter www.yourdomain.com/robots.txt to
view the contents of your
robots.txt file. Here are
the most important tips for
a correct robots.txt file:
- There are only two official commands
for the robots.txt file: User-agent and Disallow.
Do not use more commands than
these.
- Don't change the order of
the commands. Start with the
user-agent line and then add
the disallow commands:
User-agent: *
Disallow: /cgi-bin/
- Don't use more than one
directory in a Disallow line. "Disallow:
/support /cgi-bin/ /images/" does
not work. Use an extra Disallow
line for every directory:
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
- Be sure to use the right
case. The file names on your
server are case sensitive.
If the name of your directory
is "Support", don't
write "support" in
the robots.txt file.
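The first three tips can be checked mechanically. The following is a rough sketch of such a check, not an official validator: the function name lint_robots_txt and its warning messages are hypothetical, and it only treats User-agent and Disallow as official commands, as described above. It cannot check the case of directory names, because that requires knowing what is on your server.

```python
OFFICIAL_FIELDS = {"user-agent", "disallow"}

def lint_robots_txt(text):
    """Return (line_number, message) warnings for the text of a robots.txt file."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            seen_user_agent = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line
        if ":" not in line:
            warnings.append((number, "not a 'field: value' line"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in OFFICIAL_FIELDS:
            warnings.append((number, f"unofficial command: {field}"))
        elif name == "user-agent":
            seen_user_agent = True
        else:  # a Disallow line
            if not seen_user_agent:
                warnings.append((number, "Disallow before any User-agent line"))
            if len(value.split()) > 1:
                warnings.append((number, "more than one path in a Disallow line"))
    return warnings

sample = "User-agent: *\nDisallow: /support /cgi-bin/ /images/\nNoindex: /private/\n"
for number, message in lint_robots_txt(sample):
    print(f"line {number}: {message}")
```

For the sample text, the sketch should flag line 2 for listing several directories and line 3 for the unofficial Noindex command.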
You can find user agent names
in your log files by checking
for requests to robots.txt.
Usually, all search engine
spiders should be given the
same rights. To do that, use "User-agent: *"
in your robots.txt file.
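Finding user agent names in a log file can be sketched as follows. This assumes your server writes the common "combined" access log format, which may not match your setup; the two log lines, their IP addresses, and the function name robots_txt_agents are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent parts of a
# combined-format log line. Other log formats would need a different pattern.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path") == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts

# Made-up sample lines; in practice you would read them from your log file.
sample_log = [
    '192.0.2.1 - - [10/Oct/2007:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2007:13:56:01 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]

for agent, hits in robots_txt_agents(sample_log).items():
    print(hits, agent)
```

Only the first sample line requests robots.txt, so only the Googlebot user agent should be listed.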
What happens if you don't have a robots.txt file?
If your website doesn't have
a robots.txt file (you can
check this by entering www.yourdomain.com/robots.txt in
your web browser) then search
engines will automatically
index everything they can
find on your site.
Checking your robots.txt file
is important if you want search
engines to index your web pages.
However, indexing alone is not
enough. You must also make sure
that search engines find what
they're looking for when they
index your pages.
You can make sure that Google
indexes your web pages for the
right keywords by optimizing your
website. If search engine spiders
index unoptimized pages, chances
are that you won't get high
rankings.