The robots.txt, META-robots & rel=nofollow and their impact on SEO
One of the most common mistakes that Webmasters and SEOs make is misusing robots.txt, META-robots and the rel=nofollow anchor attribute. All of these affect indexing, crawling and PageRank distribution, and as a result they can heavily affect the SEO status of a website and its rankings. In this article we will see how robots.txt, META-robots and rel=nofollow affect SEO and how you should use them to improve the rankings of your website.
The robots.txt file (the basis of the Robots Exclusion Standard) is a special text file that is placed in the root folder of a website (for example: www.example.com/robots.txt). Webmasters use it to give instructions about their websites to web spiders and crawlers. The file states which pages should not be crawled by web robots, what the maximum requested crawl rate should be, where the XML sitemap is located, and so on. One of the most common uses of robots.txt is to prevent search engines from crawling a particular page or folder.
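As an illustration, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and asks what a compliant crawler would be allowed to fetch. The file contents, the /private/ folder and the crawl delay of 10 are assumptions made up for this example, not values from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com (illustrative values only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL before requesting it.
private_allowed = parser.can_fetch("*", "https://www.example.com/private/report.html")
home_allowed = parser.can_fetch("*", "https://www.example.com/index.html")

# Requested delay between requests; not every crawler honors this directive.
delay = parser.crawl_delay("*")

# Sitemap URLs listed in the file (available in Python 3.8 and later).
sitemaps = parser.site_maps()

print(private_allowed, home_allowed, delay, sitemaps)
```

Note that the check only tells a well-behaved robot what it may request; it does not stop a robot that chooses to ignore the file.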
Below are the answers to a few common questions about robots.txt that will help you avoid major SEO mistakes:
I blocked a page/folder/website by using robots.txt. Will Google/Bing/Yahoo crawl that page?
No, they will not crawl it. All major search engines respect the Robots Exclusion Standard. Of course, this will not protect you from spam bots, since they do not respect the standard.
I used robots.txt to block a page/folder/website. Will it appear in the search results?
YES! Blocking a particular page or a part of your website by using robots.txt does not mean that it will not appear in the Search Engine Results. The page will not be crawled by search engines but if they find enough information about the page/site on DMOZ or in the links of other pages, they will show it in the search results.
The title and the data found in the SERP will be based on the information that the search engines collected from other websites.
I blocked an internal page with robots.txt, will it receive PageRank?
YES! The PageRank algorithm does not take into account whether a page is blocked by robots.txt, so the incoming links that point to those pages do pass link juice. Additionally, since the blocked pages can't be crawled by the search engines, their outbound links will not be found, and thus they will be treated as dangling pages. This means that those pages evaporate all the PageRank that they receive, which leads to PageRank losses.
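The evaporation effect can be illustrated with a toy PageRank calculation. This is a minimal sketch under the simplified textbook model in which a page with no known outbound links forwards nothing. The three-page site, the damping factor of 0.85 and the function name toy_pagerank are assumptions for illustration, and real search engines may treat dangling pages differently.

```python
def toy_pagerank(links, damping=0.85, iterations=50):
    """Power iteration in which dangling pages forward no rank (toy model)."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling page: the rank it holds is not passed on
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Same hypothetical site twice: once fully crawlable, once with "private"
# blocked by robots.txt so that its outbound link is never discovered.
crawlable = {"home": ["about", "private"], "about": ["home"], "private": ["home"]}
blocked = {"home": ["about", "private"], "about": ["home"], "private": []}

rank_crawlable = toy_pagerank(crawlable)
rank_blocked = toy_pagerank(blocked)

total_crawlable = sum(rank_crawlable.values())
total_blocked = sum(rank_blocked.values())

print(total_crawlable, total_blocked)
```

In the crawlable version the ranks keep summing to 1, while in the blocked version the total drops because the rank that reaches the blocked page is never returned to the rest of the site.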
META-robots is a META-tag that affects search engine indexing and the way that PageRank flows from a webpage. This META-tag specifies whether a particular page should be indexed and whether its outgoing links should be followed. The default value of this tag is “index,follow”, which means that the page is allowed to be indexed and all of its links should be followed (except the ones that carry rel="nofollow"). The valid values of this META-tag are: index, noindex, follow, nofollow, none (none means noindex and nofollow [thank you Sebastian for the reminder]) and all (all means index and follow).
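To make the values concrete, here is a minimal sketch that reads the META-robots tag from a page with Python's standard html.parser module and reduces it to two flags, index and follow. The class name MetaRobotsParser, the helper robots_policy and the sample markup are hypothetical, and the logic covers only the six values listed above.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (index, follow); both default to True when nothing forbids them."""
    parser = MetaRobotsParser()
    parser.feed(html)
    found = parser.directives
    index = not (found & {"noindex", "none"})
    follow = not (found & {"nofollow", "none"})
    return index, follow


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(robots_policy(page))
```

A page without the tag falls back to the default, so robots_policy returns index and follow both set to True for it.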
Below we examine the answers to common questions about META-robots:
I blocked a page by using the noindex META-robots value. Will Google/Bing/Yahoo crawl that page?
Yes they will crawl it. The search engines will visit that page and extract all the links of the page.
So if they crawl the page, it will also appear in the search results. Correct?
Absolutely not! You should not confuse the terms crawling and indexing. It is one thing to get your pages crawled by a search engine and a completely different thing to get them indexed. Once the search engines crawl the webpage, they will identify the META-robots noindex value and they will not index it. This means that the page will not appear in the search engine results.
I am a bit confused about the follow and nofollow values. How do they affect the PageRank distribution?
The follow value means that all the links found on the page pass PageRank (except the ones that carry the rel="nofollow" attribute). The nofollow value means that all the links of the page are nofollowed, and thus no PageRank flows from the page. A page that blocks all of its links with the META-robots nofollow value is therefore considered a dangling page, and it evaporates all of its PageRank.
I blocked an internal page with META-robots noindex, will it receive PageRank?
YES! As we mentioned earlier, the PageRank algorithm does not take into account whether a page is blocked by robots.txt or META-robots, so the incoming links that point to those pages do pass link juice. The main difference from robots.txt is that pages which use the META-robots noindex value can pass their PageRank to other pages if the follow value is also used. This means that if their links are followed, the PageRank of the page will not evaporate.
The rel="nofollow" attribute is added to anchor links that should not pass any link juice to their target pages.
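As a rough illustration, the sketch below separates the links of a page into followed and nofollowed ones by inspecting the rel attribute of each anchor, again using Python's standard html.parser module. The class name LinkCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Splits the anchors of a page into followed and nofollowed targets."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel holds space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


sample = (
    '<a href="/about">About</a> '
    '<a href="/login" rel="nofollow">Login</a> '
    '<a href="https://partner.example.com" rel="external nofollow">Partner</a>'
)

collector = LinkCollector()
collector.feed(sample)
print(collector.followed, collector.nofollowed)
```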
Below we examine the answers to common questions about rel=nofollow:
I blocked all the links of my site that point to a particular page by using the rel=nofollow. Will Google/Bing/Yahoo crawl and index that page?
Nofollowing all the links of your website that point to a particular page (internal or external), does not mean that the search engines will not be able to index the target page. If they find a dofollow link pointing to that page, they will crawl and index it. So rel=nofollow should not be used as a way to prevent the indexing of a page.
Do search engines crawl the links that are blocked with rel=nofollow?
Theoretically, search engines do not crawl the target page of nofollowed links until they find at least one followed link that points to that page. In practice, the major search engines do crawl nofollowed links in many cases. Additionally, keep in mind that in some cases those links can help you speed up the indexing of a new page. Of course, the fact that search engines sometimes crawl those links does not mean that the links pass any link juice/PageRank to their targets (see below).
How is PageRank distribution affected by nofollowed links?
First of all, nofollowed links do not pass any PageRank to the target page. Period.
In the past (before 2009), nofollowed links were treated during PageRank calculations as if they did not exist, so they affected the PageRank distribution neither positively nor negatively. Unfortunately, an update in 2009 changed this, and the nofollowed links started to evaporate the amount of PageRank that they receive. To learn more about this update and how to solve this issue, read the article “The PageRank sculpting techniques and the nofollow issue”.
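The change described above comes down to simple arithmetic. The sketch below is a deliberately simplified model that ignores the damping factor and assumes the page has at least one followed link. The function name split_link_juice and the example of 3 followed and 2 nofollowed links are assumptions for illustration.

```python
from fractions import Fraction


def split_link_juice(passable_rank, followed, nofollowed, after_2009):
    """Return (rank passed per followed link, rank that evaporates)."""
    passable_rank = Fraction(passable_rank)
    if after_2009:
        # Rank is divided among all links; the nofollowed shares are lost.
        divisor = followed + nofollowed
    else:
        # Nofollowed links are ignored, so only followed links share the rank.
        divisor = followed
    per_link = passable_rank / divisor
    evaporated = passable_rank - per_link * followed
    return per_link, evaporated


before = split_link_juice(1, followed=3, nofollowed=2, after_2009=False)
after = split_link_juice(1, followed=3, nofollowed=2, after_2009=True)
print(before, after)
```

Under these assumptions each followed link receives 1/3 of the passable rank before the update and 1/5 after it, with the remaining 2/5 evaporating.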
Note that you should be very careful when using robots.txt and META-robots, since you can heavily affect your website's indexing and your search engine rankings. From an SEO perspective, the best way to prevent a page from appearing in the search results is to use the noindex,follow META-robots value. In this way, not only will the page not appear in the SERPs, but most of its PageRank will also pass to the other pages of the website through its links.
The table below summarizes how the major search engines handle robots.txt, META-robots and rel=nofollow.
|Method||Crawled||Appears in SERP||PageRank evaporates|
|robots.txt||No||Yes, if it receives enough links.||Yes, all of the PageRank of the page is evaporated.|
|META-robots noindex||Yes||No||No, if the links of the page are followed.|
|META-robots nofollow||Yes, if other followed links are found pointing to the target page.||Yes, if the target page receives links from other webpages.||Yes, all of the PageRank of the page is evaporated.|
|rel="nofollow"||Yes, if other followed links are found pointing to the target page.||Yes, if the target page receives links from other webpages.||Yes, some amount of PageRank of the page is evaporated.|
Last but not least, if you use the rel=nofollow attribute extensively on your website, you can avoid PageRank losses by using the method described in the article “PageRank Sculpting: You can still use rel=nofollow”.