{"id":1072,"date":"2010-09-13T10:26:40","date_gmt":"2010-09-13T07:26:40","guid":{"rendered":"http:\/\/www.webseoanalytics.com\/blog\/?p=1072"},"modified":"2010-09-13T10:26:40","modified_gmt":"2010-09-13T07:26:40","slug":"robotstxt-metarobots-relnofollow-the-impact-on-seo","status":"publish","type":"post","link":"http:\/\/www.webseoanalytics.com\/blog\/robotstxt-metarobots-relnofollow-the-impact-on-seo\/","title":{"rendered":"The robots.txt, META-robots &#038; rel=nofollow and their impact on SEO"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2010\/09\/metarobots-seo.jpg\" alt=\"metarobots-seo\" title=\"metarobots-seo\" width=\"650\" height=\"455\" class=\"alignnone size-full wp-image-1100\" style=\"margin-bottom:10px;\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2010\/09\/metarobots-seo.jpg 650w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2010\/09\/metarobots-seo-300x210.jpg 300w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2010\/09\/metarobots-seo-428x300.jpg 428w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/>One of the most common mistakes that many Webmasters and SEOs make is that they <strong>misuse<\/strong> the robots.txt file, the META-robots tag and the rel=nofollow anchor attribute. All of the above affect <strong>indexing<\/strong>, <strong>crawling<\/strong> and <strong>PageRank distribution<\/strong>, and as a result they can heavily affect the SEO status of a website and its <strong>rankings<\/strong>. In this article we will see what impact robots.txt, META-robots and rel=nofollow have on SEO and how you should use them to improve the rankings of your website.<\/p>\n<h2>Robots.txt<\/h2>\n<p>The robots.txt file (which implements the Robots Exclusion Standard) is a special text file that is placed in the root folder of a website (for example: www.example.com\/robots.txt). 
It is used by Webmasters to give instructions about their websites to web spiders and crawlers. This file contains information about which pages should not be crawled by web robots, what the maximum requested crawl rate should be, where the XML sitemap is located, etc. One of the common uses of robots.txt is to <strong>prevent<\/strong> search engines from <strong>crawling<\/strong> a particular page or folder.<\/p>\n<p>Below we\u2019ll see the answers to a few common questions about robots.txt that will help you prevent <strong>major SEO mistakes<\/strong>:<\/p>\n<h4>I blocked a page\/folder\/website by using robots.txt. Will Google\/Bing\/Yahoo crawl that page?<\/h4>\n<p><strong>No, they will not crawl it.<\/strong> All major Search Engines respect the Robots Exclusion Standard. This of course will not protect you from spam bots, since they do not respect the above standard.<\/p>\n<h4>I used robots.txt to block a page\/folder\/website. Will it appear in the search results?<\/h4>\n<p><strong>YES!<\/strong> Blocking a particular page or a part of your website by using robots.txt does not mean that it will not appear in the Search Engine Results. 
The page will <strong>not be crawled<\/strong> by search engines, but if they find enough information about the page\/site on <strong>DMOZ<\/strong> or in the <strong>links<\/strong> of other pages, they will show it in the search results.<\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\"><img decoding=\"async\" title=\"alexa-robots.txt-blocked\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2010\/09\/alexa-robots.txt-blocked.jpg\" alt=\"alexa-robots.txt-blocked\" \/><\/div>\n<p>The title and the data found in the SERP will be based on the information that the search engines collected from other websites.<\/p>\n<div style=\"text-align:center;margin:10px 0 20px 0;\"><object classid=\"clsid:d27cdb6e-ae6d-11cf-96b8-444553540000\" width=\"640\" height=\"385\" codebase=\"http:\/\/download.macromedia.com\/pub\/shockwave\/cabs\/flash\/swflash.cab#version=6,0,40,0\"><param name=\"allowFullScreen\" value=\"true\" \/><param name=\"allowscriptaccess\" value=\"always\" \/><param name=\"src\" value=\"http:\/\/www.youtube.com\/v\/KBdEwpRQRD0?fs=1&amp;hl=en_US\" \/><param name=\"allowfullscreen\" value=\"true\" \/><embed type=\"application\/x-shockwave-flash\" width=\"640\" height=\"385\" src=\"http:\/\/www.youtube.com\/v\/KBdEwpRQRD0?fs=1&amp;hl=en_US\" allowscriptaccess=\"always\" allowfullscreen=\"true\"><\/embed><\/object><\/div>\n<h4>I blocked an internal page with robots.txt, will it receive PageRank?<\/h4>\n<p><strong>YES! <\/strong>The PageRank algorithm does not take into account whether a page is blocked by robots.txt, so the incoming links that point to those pages do pass link juice. Additionally, since those blocked pages can't be crawled by the search engines, their outbound links will not be found and thus they will be considered <strong>dangling pages<\/strong>. 
This means that those pages <strong>evaporate<\/strong> all the PageRank that they receive, which leads to <strong>PageRank losses<\/strong>.<\/p>\n<div style=\"text-align:center;margin:10px 0 20px 0;\"><object classid=\"clsid:d27cdb6e-ae6d-11cf-96b8-444553540000\" width=\"640\" height=\"385\" codebase=\"http:\/\/download.macromedia.com\/pub\/shockwave\/cabs\/flash\/swflash.cab#version=6,0,40,0\"><param name=\"allowFullScreen\" value=\"true\" \/><param name=\"allowscriptaccess\" value=\"always\" \/><param name=\"src\" value=\"http:\/\/www.youtube.com\/v\/TkopkUPF-M8?fs=1&amp;hl=en_US\" \/><param name=\"allowfullscreen\" value=\"true\" \/><embed type=\"application\/x-shockwave-flash\" width=\"640\" height=\"385\" src=\"http:\/\/www.youtube.com\/v\/TkopkUPF-M8?fs=1&amp;hl=en_US\" allowscriptaccess=\"always\" allowfullscreen=\"true\"><\/embed><\/object><\/div>\n<h2>META-robots<\/h2>\n<p>The META-robots is a META tag that affects search engine crawling and the way that PageRank flows from a webpage. This META tag specifies whether a particular page should be <strong>indexed<\/strong> and whether the outgoing links should be <strong>followed<\/strong>. The default value of this tag is &#8220;index,follow&#8221;, which means that the page is allowed to be indexed and all the links should be followed (except the ones that have rel=&#8221;nofollow&#8221; in them). The valid values of this META tag are: <strong>index<\/strong>, <strong>noindex<\/strong>, <strong>follow<\/strong>, <strong>nofollow<\/strong>, <strong>none<\/strong> (none means noindex and nofollow [thank you Sebastian for the reminder]) and <strong>all<\/strong> (all means index and follow).<\/p>\n<p>Below we examine the answers to common questions about META-robots:<\/p>\n<h4>I blocked a page by using the noindex META-robots value. 
Will Google\/Bing\/Yahoo crawl that page?<\/h4>\n<p><strong>Yes, they will crawl it.<\/strong> The search engines will visit that page and extract all the links of the page.<\/p>\n<h4>So if they crawl the page, it will also appear in the search results. Correct?<\/h4>\n<p><strong>Absolutely NOT!<\/strong> You should not confuse the terms crawling and indexing. It is one thing to get your pages crawled by a search engine and a completely different thing to get them indexed. Once the search engines crawl the webpage, they will identify the meta-robots noindex value and they will <strong>not index<\/strong> it. This means that this page will <strong>never appear<\/strong> in the search engine results.<\/p>\n<h4>I am a bit confused by the follow and nofollow values. How do they affect the PageRank distribution?<\/h4>\n<p>The <strong>follow<\/strong> value means that all the links found in this page flow PageRank (except the ones that contain the rel=&#8221;nofollow&#8221; attribute in them). The <strong>nofollow<\/strong> value means that all the links of the page are nofollowed and thus no PageRank flows from this page. This means that a page which blocks all of its links by using the meta-robots nofollow value is considered a <strong>dangling page<\/strong> and it evaporates all of its PageRank.<\/p>\n<h4>I blocked an internal page with META-robots noindex, will it receive PageRank?<\/h4>\n<p><strong>YES! <\/strong>As we mentioned earlier, the PageRank algorithm does not take into account whether a page is blocked by robots.txt or meta-robots, so the incoming links that point to those pages do pass link juice. The main difference from robots.txt is that pages that use the META-robots <strong>noindex<\/strong> value can flow their PageRank to other pages if the <strong>follow<\/strong> value is also used. 
This means that if their links are followed, the PageRank of the page will not evaporate.<\/p>\n<h2>rel=&#8221;nofollow&#8221;<\/h2>\n<p>The rel=&#8221;nofollow&#8221; attribute is used in the anchor links that we want to block so that they <strong>do not pass<\/strong> any link juice to the target pages.<\/p>\n<p>Below we examine the answers to common questions about rel=nofollow:<\/p>\n<h4>I blocked all the links of my site that point to a particular page by using rel=nofollow. Will Google\/Bing\/Yahoo crawl and index that page?<\/h4>\n<p>Nofollowing all the links of your website that point to a particular page (internal or external) does not mean that the search engines will not be able to index the target page. If they find a dofollow link pointing to that page, they will crawl and index it. So rel=nofollow should <strong>not be used<\/strong> as a way to <strong>prevent<\/strong> the indexing of a page.<\/p>\n<div style=\"text-align:center;margin:10px 0 20px 0;\"><object classid=\"clsid:d27cdb6e-ae6d-11cf-96b8-444553540000\" width=\"640\" height=\"385\" codebase=\"http:\/\/download.macromedia.com\/pub\/shockwave\/cabs\/flash\/swflash.cab#version=6,0,40,0\"><param name=\"allowFullScreen\" value=\"true\" \/><param name=\"allowscriptaccess\" value=\"always\" \/><param name=\"src\" value=\"http:\/\/www.youtube.com\/v\/bEBIkXUYSxc?fs=1&amp;hl=en_US\" \/><param name=\"allowfullscreen\" value=\"true\" \/><embed type=\"application\/x-shockwave-flash\" width=\"640\" height=\"385\" src=\"http:\/\/www.youtube.com\/v\/bEBIkXUYSxc?fs=1&amp;hl=en_US\" allowscriptaccess=\"always\" allowfullscreen=\"true\"><\/embed><\/object><\/div>\n<h4>Do search engines crawl the links that are blocked with rel=nofollow?<\/h4>\n<p><strong>Theoretically<\/strong>, search engines <strong>do not<\/strong> <strong>crawl<\/strong> the target page of nofollowed links until they find at least one followed link that points to that page. 
<strong>Practically<\/strong>, in many cases the major search engines <strong>do crawl nofollowed<\/strong> links. Additionally, keep in mind that in some cases those links can help you speed up the indexing of a new page. Of course, the fact that in some cases they do crawl those links does not mean that they pass any link juice\/PageRank to them (see below).<\/p>\n<h4>How is PageRank distribution affected by nofollowed links?<\/h4>\n<p>First of all, nofollowed links <strong>do not pass<\/strong> any <strong>PageRank<\/strong> to the target page. Period.<\/p>\n<p>In the past (before 2009), nofollowed links were treated during PageRank calculations as if they did not exist, and they did not affect the PageRank distribution positively or negatively. Unfortunately after 2009, an update took place and the nofollowed links started to evaporate the amount of PageRank that they receive. To learn more about this update and to find out how to solve this issue, read the article &#8220;<a href=\"http:\/\/www.webseoanalytics.com\/blog\/pagerank-sculpting-techniques-nofollow-issue\/\">The PageRank sculpting techniques and the nofollow issue<\/a>&#8220;.<\/p>\n<h2>Summary<\/h2>\n<p>Note that you should be very careful when using robots.txt and META-robots, since you can heavily affect your website's indexing and your Search Engine Rankings. From an SEO perspective, the best way to prevent a page from appearing in the search results is to use the <strong>noindex,follow<\/strong> META-robots value. 
In this way, not only will the page not appear in the SERPs, but most of its PageRank will also pass to the other pages of the website through its links.<\/p>\n<p>The table below summarizes how the major search engines handle robots.txt, META-robots and rel=nofollow.<\/p>\n<div style=\"text-align:center;margin:10px 0 10px  0;\">\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td><\/td>\n<td>\n<p align=\"center\"><strong>Crawl<\/strong><\/p>\n<\/td>\n<td>\n<p align=\"center\"><strong>Appears in SERP<\/strong><\/p>\n<\/td>\n<td>\n<p align=\"center\"><strong>PageRank Evaporation<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><strong>robots.txt<\/strong><\/td>\n<td>\n<p align=\"center\">No<\/p>\n<\/td>\n<td>Yes, if it receives enough links.<\/td>\n<td>Yes, all of the PageRank of the page is evaporated.<\/td>\n<\/tr>\n<tr>\n<td><strong>META-robots noindex<\/strong><\/td>\n<td>\n<p align=\"center\">Yes<\/p>\n<\/td>\n<td>No<\/td>\n<td>No, if the links of the page are followed.<\/td>\n<\/tr>\n<tr>\n<td><strong>META-robots nofollow<\/strong><\/td>\n<td>Yes, if other followed links are found pointing to the target page.<\/td>\n<td>Yes, if the target page receives links from other webpages.<\/td>\n<td>Yes, all of the PageRank of the page is evaporated.<\/td>\n<\/tr>\n<tr>\n<td><strong>rel=&#8221;nofollow&#8221;<\/strong><\/td>\n<td>Yes, if other followed links are found pointing to the target page.<\/td>\n<td>Yes, if the target page receives links from other webpages.<\/td>\n<td>Yes, some of the PageRank of the page is evaporated.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Last but not least, if you use the rel=nofollow attribute extensively in your website, you can <strong>avoid PageRank losses<\/strong> by using the method described in the article &#8220;<a href=\"http:\/\/www.webseoanalytics.com\/blog\/pagerank-sculpting-you-can-still-use-nofollow\/\">PageRank Sculpting: You can still use 
rel=nofollow<\/a>&#8220;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the most common mistakes that many Webmasters and SEOs make is that they misuse the robots.txt, the META-robots and the rel=nofollow anchor attribute. All the above affect the indexing, the crawling and the PageRank distribution and as a result &hellip; <a href=\"http:\/\/www.webseoanalytics.com\/blog\/robotstxt-metarobots-relnofollow-the-impact-on-seo\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":1073,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,10],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts\/1072"}],"collection":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/comments?post=1072"}],"version-history":[{"count":0,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts\/1072\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/media\/1073"}],"wp:attachment":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/media?parent=1072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/categories?post=1072"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/tags?post=1072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}