One of the things I'd been meaning to do for some time was to get round to creating a robots.txt file for this site. I'd looked at various sources of information, including A Standard For Robot Exclusion and How to create a robots.txt file, but was still unsure as to which bits should be excluded so, in my usual fashion, I didn't bother doing anything at all!
However the subject has come up again a couple of times over the last few days at both Wolf-Howl and Connected Internet and pricked my conscience so I thought it was time to revisit it.
One of the main reasons for creating a robots.txt file is to prevent the search engines from reaching your content from more than one location (e.g. in your monthly archives, your category folders, your XML feed and on your front page), because this could lead to duplicate content issues. Another reason is that you may have a private directory which you don't want the world to read, but that's something for Big G to explain. Today we're only looking at the SEO reasons for creating the file, so with that in mind, what should be included?
There seem to be a number of different viewpoints on what should and shouldn't be included in the robots.txt file. Some say that you should include the wp-content, wp-admin, wp-includes and feed information, whereas others say that they're fine to index but don't let the Googlebot anywhere near your archives. Another train of thought is to disallow access to your images folder, whilst others warn you that your AdSense could go belly up if you make sweeping changes to your robots file. In fact, no matter where you look, people are giving conflicting advice on the right way to create a robots.txt file for WordPress. No wonder I hadn't done anything about it until now!
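For reference, the more aggressive end of that advice usually looks something like the sketch below. This is purely an illustration of what people suggest, not a recommendation, and the paths assume a default WordPress install:

```
# A sketch of the aggressive rules some people recommend
# (illustration only -- paths assume a default WordPress install)
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /feed/
Disallow: /category/
```

Bear in mind that robots.txt rules are simple prefix matches, so `Disallow: /feed/` blocks everything whose path starts with /feed/ for every compliant crawler.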
If you were creating a regular website you'd include robots meta tags to prevent Google from indexing certain pages. However, you don't have this option within WordPress because all of the meta data is contained within only one file (header.php) which appears on every single WP page, so any "noindex,nofollow" rule would be applied across the whole site and you wouldn't get indexed at all.
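On a hand-coded site, that per-page approach is simply a robots meta tag in the head of each page you want kept out of the index, along these lines:

```html
<!-- In the <head> of an individual page you want excluded -->
<meta name="robots" content="noindex,nofollow">
```

Because header.php generates that section of the markup for every single WordPress page, adding the tag there would hide the whole site at once.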
Taking it back to basics, the reason for creating the darned thing in the first place is to prevent Google potentially ditching your content into the supplemental index so what should you include in the file to prevent that from happening?
By including the bare minimum in your robots.txt file.
Why do I say that? Well, because unless you know your WordPress install inside out you could end up shooting yourself in the foot. Every theme, plugin and tweak you've made to your site affects how your site is structured. If you change any of these parameters and don't change your robots.txt file, you could end up seriously screwing yourself over.
A more sensible way of preventing duplicate content is to be more concise in the way that you structure your site. Michael Gray offers some excellent advice in his video blog about making WordPress search engine friendly, and that is to only apply one category per post. I'd not even thought about that before, but he's right. Previously I was applying categories all over the shop, but what that meant was that Google could find the same content in half a dozen different places, so now I'm only using one category per post and will tidy up my archives shortly.
I've decided to implement a very basic version of the robots.txt file for the time being and will review the results in a few weeks' time. I decided to keep the images folder accessible to Google because I do get some traffic through image search, and whilst it's not uber sticky, it's still traffic at the end of the day. Equally, I was specific about disallowing Googlebot from indexing /page/ whilst still allowing the AdSense bot to look through archived pages.
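A minimal file along those lines might look like the sketch below. The exact rules are my reading of the description above rather than a verbatim copy of the file:

```
# Keep Googlebot out of the paginated /page/ archives
User-agent: Googlebot
Disallow: /page/

# Mediapartners-Google is the AdSense crawler; an empty Disallow
# leaves it free to read the archived pages
User-agent: Mediapartners-Google
Disallow:
```

Note that the images folder is deliberately left out of the Disallow rules, so image search traffic keeps flowing.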
I'm still confused as to whether to disallow access to the categories or dated archives to prevent possible issues. Looking at my Google data, I can't work out which bit Big G doesn't like, so I'm going to implement the basic version of the robots.txt file first and then see what happens from there.
I may well have got it horribly wrong - as I've said before, I'm no expert - but it's a start, and a key part of Twenty Steps is reporting my mistakes so you don't have to make them! I'm pretty sure we'll be revisiting this ol' chestnut again.
Originally published at http://www.twentysteps.com/creating-the-ultimate-wordpress-robotstxt-file/