{"id":2290,"date":"2011-09-26T11:32:07","date_gmt":"2011-09-26T08:32:07","guid":{"rendered":"http:\/\/www.webseoanalytics.com\/blog\/?p=2290"},"modified":"2011-09-26T11:32:07","modified_gmt":"2011-09-26T08:32:07","slug":"googles-ajax-crawling-scheme-and-its-effects-on-seo","status":"publish","type":"post","link":"http:\/\/www.webseoanalytics.com\/blog\/googles-ajax-crawling-scheme-and-its-effects-on-seo\/","title":{"rendered":"Google&#8217;s AJAX crawling scheme and its effects on SEO"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/seo-vs-ajax.jpg\" alt=\"seo-vs-ajax\" title=\"seo-vs-ajax\" width=\"600\" height=\"437\" style=\"margin:0 0 10px 30px; display:block;\" \/>Almost 2 years ago, Google proposed a new way of making <a href=\"http:\/\/googlewebmastercentral.blogspot.com\/2009\/10\/proposal-for-making-ajax-crawlable.html\" rel=\"nofollow\" target=\"_blank\">AJAX applications &amp; websites crawlable<\/a>. As most of you know, search engines are very good at indexing and analyzing HTML documents but they are <strong>not particularly good<\/strong> at understanding JavaScript and thus at crawling AJAX content. Even though 2 years have passed, only a small number of websites have properly implemented Google\u2019s suggestion. 
Moreover, despite the fact that Google has started to crawl, index and present such AJAX pages on SERPs, other search engines like Bing do not support this new \u201cmethod\u201d.<\/p>\n<p>The purpose of this article is to briefly explain <strong>what AJAX is<\/strong>, to discuss <strong>the problem of crawling AJAX<\/strong> content, to review the various <strong>solutions<\/strong> that have been proposed, to discuss the <strong>common mistakes <\/strong>that web developers make while using technologies that are not Search Engine friendly, like AJAX &amp; Flash, and finally to explain <strong>whether or not <\/strong>you should use Google\u2019s proposal. Even though this topic is a bit advanced and requires <strong>basic knowledge <\/strong>of web development techniques and of web technologies in general, I try to focus less on the programming part and more on the SEO part. If you have any questions feel free to post them in the comments below and we\u2019ll try to cover them in an upcoming blog post.<\/p>\n<h2>What is AJAX?<\/h2>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax_p.jpg\" alt=\"ajax_p\" title=\"ajax_p\" \/>\n<\/div>\n<p>AJAX is an acronym for <strong>Asynchronous JavaScript And XML<\/strong>. It is a set of web development techniques that allow software engineers to create interactive web applications. By using JavaScript the browser interacts with the user and sends web requests to the server, which replies in XML (or in JSON or HTML). AJAX is usually used to update specific parts of the HTML page without causing a redirect or a page refresh. AJAX methods also allow you to create fast web applications that significantly reduce the loading time and provide a better user experience. 
Explaining more about AJAX is beyond the scope of this article; nevertheless, if you are a developer and you want to learn more I strongly recommend that you read the <a href=\"http:\/\/www.w3schools.com\/ajax\/default.asp\" rel=\"nofollow\" target=\"_blank\">AJAX tutorial by W3Schools<\/a>.<\/p>\n<h2>The Problem of Crawling AJAX<\/h2>\n<p>The main problem of crawling AJAX is that it heavily relies on JavaScript, which is a <strong>client-side scripting language<\/strong> that runs in the browser (Internet Explorer, Mozilla, Opera, Chrome etc). Moreover, different browsers support different features and functions (even though this has started to change over the years). Last but not least, executing JavaScript requires <strong>additional resources<\/strong> and this <strong>increases the costs<\/strong> for the search engines. As we discussed in the past, when a particular method increases the operating costs of the Search Engines, they give <strong>incentives<\/strong> to the webmasters to avoid using such techniques or they work around the problem by proposing <strong>less costly approaches<\/strong>.<\/p>\n<p>Even though Google has stated that they took steps to better understand <a href=\"http:\/\/googlewebmastercentral.blogspot.com\/2008\/04\/crawling-through-html-forms.html\" rel=\"nofollow\" target=\"_blank\">JavaScript, Flash and HTML Forms<\/a>, it is still not recommended to rely on these technologies since they are not Search Engine friendly. 
Also as you will see below, the solutions that have been proposed for crawling AJAX do not rely on executing JavaScript (which would increase the costs for Search Engines) but instead they make webmasters change their <strong>Website architecture<\/strong> to make it SEO friendly.<\/p>\n<h2>Solutions for Crawling AJAX<\/h2>\n<p>The 2 most popular techniques that have been proposed over the years are the <a href=\"http:\/\/domscripting.com\/presentations\/xtech2006\/\" rel=\"nofollow\" target=\"_blank\">Hijax Approach<\/a> and the <a href=\"http:\/\/www.google.com\/support\/webmasters\/bin\/answer.py?answer=174992\" rel=\"nofollow\" target=\"_blank\">AJAX crawling scheme<\/a> of Google.<\/p>\n<h4>The Hijax technique<\/h4>\n<p>According to the Hijax technique, when you have a link that executes AJAX or JavaScript, you <strong>should not<\/strong> code it like this:<\/p>\n<div style=\"margin:10px 0 10px 0; color: #59B3D8;\">\n&lt;a href=&quot;javascript:someFunction('somepage.html#parameter=1')&quot;&gt;Click Me&lt;\/a&gt;\n<\/div>\n<p>Nor like this:<\/p>\n<div style=\"margin:10px 0 10px 0; color: #59B3D8;\">\n&lt;a href=&quot;#&quot;&gt;Click Me&lt;\/a&gt;\n<\/div>\n<p>Both of the above approaches are very popular with web developers but unfortunately they do not provide a <strong>meaningful URL<\/strong> that can be used by the search engines. By using the Hijax technique the above link should be rewritten as follows:<\/p>\n<div style=\"margin:10px 0 10px 0; color: #59B3D8;\">\n&lt;a href=&quot;somepage.html?parameter=1&quot; onclick=&quot;someFunction('somepage.html#parameter=1'); return false&quot;&gt;Click Me&lt;\/a&gt;\n<\/div>\n<p>The above code will send the search engine to the target page when JavaScript is turned off, but at the same time it will fire the AJAX code when JavaScript is turned on (obviously the someFunction method should handle the click and load the AJAX content for the user). 
As a result both the users and the search engines will be able to access the content of the linked page.<\/p>\n<p>Of course the above technique has several limitations since it does not cover cases where the AJAX content is created dynamically based on the input of the user.<\/p>\n<h4>The AJAX crawling scheme of Google<\/h4>\n<p>Google\u2019s AJAX crawling scheme proposes marking the addresses of all the pages that load AJAX content with specific characters. The whole idea behind it is to use <strong>special hash fragments<\/strong> (#!) in the URLs of those pages to indicate that they load AJAX content. When Google finds a link that points to an \u201cAJAX\u201d URL, for example \u201chttp:\/\/example.com\/page?query<strong>#!<\/strong>state\u201d, it automatically interprets it (escapes it) as \u201chttp:\/\/example.com\/page?query<strong>&amp;_escaped_fragment_=<\/strong>state\u201d. <\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-diagram-1.png\" alt=\"ajax-diagram-1\" title=\"ajax-diagram-1\" width=\"650\" \/>\n<\/div>\n<p>The programmer is <strong>forced<\/strong> to change his\/her Website Architecture in order to handle the above requests. So when Google sends a web request for the escaped URL, the server must be able to return <strong>the same<\/strong> HTML code as the one that is presented to the user when the AJAX function is called.<\/p>\n<p>After Google sees the \u201cAJAX URL\u201d and interprets (escapes) it, it grabs the content of the page and indexes it. Finally, when the indexed page is presented in the Search Results, Google shows the original AJAX URL to the user instead of the \u201cescaped\u201d one. 
As a result the programmer should be able to handle user\u2019s request and present the appropriate content when the page loads.<\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/webmasters_ajax_174992.png\" alt=\"webmasters_ajax_174992\" title=\"webmasters_ajax_174992\" width=\"400\" height=\"267\" class=\"alignnone size-full wp-image-2299\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/webmasters_ajax_174992.png 400w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/webmasters_ajax_174992-300x200.png 300w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/>\n<\/div>\n<p>So as you probably understand Google proposed a way to make the AJAX content crawlable <strong>without executing JavaScript <\/strong>(those PhDs are clever, aren\u2019t they?). This technique is more <strong>generic<\/strong> than Hijax as it covers more cases, but it is much more complicated, it requires additional coding and it is currently <strong>supported only by Google<\/strong>.<\/p>\n<h2>Common Web Development mistakes<\/h2>\n<p>As we saw above, one of the most common mistakes that web developers make is that they don\u2019t provide in the JavaScript links a <strong>meaningful URL<\/strong> for the search engines. Another thing that many web developers ignore is that according to the web specifications &amp; protocols <strong>every<\/strong> URL parameter contained after the \u201c#\u201d symbol (hash fragments) is <strong>NEVER<\/strong> sent to the web server. 
So the following URLs create exactly the same web request to the server:<\/p>\n<div style=\"margin:10px 0 10px 0; color: #59B3D8;\">\n<p>http:\/\/example.com\/#state<\/p>\n<p>http:\/\/example.com\/#param1=1&#038;param2=2<\/p>\n<p>http:\/\/example.com\/#\/directory\/page.html<\/p>\n<p>http:\/\/example.com\/#\/directory\/page.html?param=1\n<\/p><\/div>\n<p>All the above links will generate a web request to the URL http:\/\/example.com\/ and all the extra parameters after # will be totally ignored. That is why Search Engines ignore everything after # (We\u2019ll talk about \u201c#!\u201d a bit later, but yes, it also creates the same web request on the server as the previous URLs).<\/p>\n<p>SO! Based on the above we conclude that most of the AJAX or Flash techniques that promise SEO friendly URLs by using hash fragments are <strong>NOT working<\/strong>. Some of them even bother to change the title and the text of the holding page by using JavaScript in order to make the websites more SEO friendly. <strong>Don\u2019t waste your time<\/strong> with those techniques because they don\u2019t work. If you rely on JavaScript code to make your website SEO friendly you are going the wrong way!<\/p>\n<p>The only exception to the above rule is when the #! is used. So you might be thinking that if you use #! instead of # you will be ok. Unfortunately the answer is <strong>NO<\/strong>! Just by using it you will gain <strong>nothing<\/strong>. You must also write code in PHP, JSP, ASP or ASP.NET in order to ensure that your server will handle Google\u2019s AJAX crawling scheme and present the appropriate holding page (as we explained above).<\/p>\n<h2>Should you use Google\u2019s proposal? Let\u2019s focus on a case study.<\/h2>\n<p>Currently Google\u2019s AJAX crawling scheme has been implemented by a relatively <strong>small number<\/strong> of websites and many of them have not done it properly. 
One of the websites that has done a pretty good job is <a href=\"http:\/\/thebullittagency.com\/\" rel=\"nofollow\" target=\"_blank\">thebullittagency.com<\/a> (note that it is not related to our company).<\/p>\n<p>First of all let\u2019s run the query \u201cFabric Blog Features Luca Bacchetti\u201d on Google. The URL of the first result is the following:<\/p>\n<p><a href=\"http:\/\/thebullittagency.com\/#!\/blog\/21-Fabric-Blog-Features-Luca-Bacchetti-\" rel=\"nofollow\" target=\"_blank\">http:\/\/thebullittagency.com\/#!\/blog\/21-Fabric-Blog-Features-Luca-Bacchetti-<\/a><\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp.jpg\" alt=\"ajax-serp\" title=\"ajax-serp\" width=\"564\" height=\"392\" class=\"alignnone size-full wp-image-2300\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp.jpg 564w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp-300x208.jpg 300w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp-431x300.jpg 431w\" sizes=\"(max-width: 564px) 100vw, 564px\" \/>\n<\/div>\n<p>Note that the title and the snippet on SERPs are unique and come from the particular blog post. Now let\u2019s see what the escaped URL looks like:<\/p>\n<p><a href=\"http:\/\/thebullittagency.com\/?_escaped_fragment_=\/blog\/21-Fabric-Blog-Features-Luca-Bacchetti-\" rel=\"nofollow\" target=\"_blank\">http:\/\/thebullittagency.com\/?&amp;_escaped_fragment_=\/blog\/21-Fabric-Blog-Features-Luca-Bacchetti-<\/a><\/p>\n<p>As required by the AJAX crawling scheme they return <strong>more or less<\/strong> the same HTML content as the one that they load with AJAX. 
Note that if you want to be 100% safe you should present <strong>exactly<\/strong> the same code to avoid automated bans for cloaking.<\/p>\n<p>Now let\u2019s see how Google handles the PageRank for the AJAX URLs. Let\u2019s check the PageRank values by using our <a href=\"http:\/\/www.webseoanalytics.com\/free\/seo-tools\/google-pagerank-checker.php\">PageRank Check tool<\/a>. (Note that all of our tools handle the AJAX crawling scheme). Here are the results:<\/p>\n<div style=\"margin:10px 0 10px 0; color: #59B3D8;\">\n<p>http:\/\/thebullittagency.com\/#!\/blog\/ &#8211; <b>PageRank: 3<\/b><\/p>\n<p>http:\/\/thebullittagency.com\/?&amp;_escaped_fragment_=\/blog\/ &#8211; <b>PageRank: 3<\/b><\/p>\n<p>http:\/\/thebullittagency.com\/ &#8211; <b>PageRank: 4<\/b><\/p>\n<\/div>\n<p>So Google <strong>does handle<\/strong> the links and the PageRank values of the first 2 URLs separately from the homepage, since they don\u2019t have the same PR value as the homepage. Also the escaped URL (the second one) has exactly the same value as the first one. This is what we should expect after all, since Google makes it clear that they treat those 2 URLs as the same. This is actually good news because it means that if someone decides to add a link to the blog post, all the link juice and the anchor text info will pass to <strong>the actual article<\/strong> and not to the homepage.<\/p>\n<p>Now let\u2019s see how many pages are indexed. If we search Google for the query \u201csite:thebullittagency.com\u201d, we\u2019ll get more than 1 thousand results, which means that the website is indexed normally. Also if we try the query \u201csite:http:\/\/thebullittagency.com\/#!\/blog\u201d we\u2019ll get all the articles that have been written on their blog. So Google\u2019s AJAX crawling scheme is safe to use, right? 
<strong>Nope!<\/strong><\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp2.jpg\" alt=\"ajax-serp2\" title=\"ajax-serp2\" width=\"564\" height=\"392\" class=\"alignnone size-full wp-image-2301\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp2.jpg 564w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp2-300x208.jpg 300w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp2-431x300.jpg 431w\" sizes=\"(max-width: 564px) 100vw, 564px\" \/><br \/>\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp3.jpg\" alt=\"ajax-serp3\" title=\"ajax-serp3\" width=\"564\" height=\"392\" class=\"alignnone size-full wp-image-2302\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp3.jpg 564w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp3-300x208.jpg 300w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp3-431x300.jpg 431w\" sizes=\"(max-width: 564px) 100vw, 564px\" \/>\n<\/div>\n<p>Let\u2019s do the query \u201csite:thebullittagency.com\u201d on Bing. Well, the things don\u2019t look that good there. There are only 2 results, the homepage and an ugly escaped URL that was probably submitted either manually or linked directly from an external source. So Bing <strong>does not<\/strong> handle at all those URLs, they ignore everything after #! 
and when a link is placed on an internal page they pass all the juice to the homepage.<\/p>\n<div style=\"text-align:center;margin:10px 0 10px 0;\">\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp4.jpg\" alt=\"ajax-serp4\" title=\"ajax-serp4\" width=\"650\" height=\"315\" class=\"alignnone size-full wp-image-2303\" srcset=\"http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp4.jpg 650w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp4-300x145.jpg 300w, http:\/\/www.webseoanalytics.com\/blog\/wp-content\/uploads\/2011\/09\/ajax-serp4-500x242.jpg 500w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/>\n<\/div>\n<p>But if this is true then why is twitter.com indexed properly on Bing? The answer is that Twitter does use the #! in their URLs but when a search engine requests the \u201chttp:\/\/twitter.com\/username\u201d version of the page they provide the HTML normally. Of course if a user tries to access this version they do a <strong>sneaky JavaScript redirect<\/strong> to the #! version by using the following code: \u201cwindow.location.replace('\/#!\/Username');\u201d. Why does Twitter use this approach? Because by using AJAX they have <strong>fewer page refreshes<\/strong>, they improve their loading time and they reduce their <strong>operating costs<\/strong> (fewer servers, more available bandwidth etc).<\/p>\n<p>The above technique <strong>is not<\/strong> a generic method that will help you index AJAX content and it is extremely <strong>dangerous<\/strong> since it violates the policy of Google concerning JavaScript redirects. It might be ok if you are Twitter (which had its Toolbar PageRank value dropped a few months ago), but it is definitely not ok if you are a simple webmaster.<\/p>\n<h2>Conclusions<\/h2>\n<p>For me as a programmer the AJAX crawling problem is far from resolved. 
Google has proposed a clever solution for crawling AJAX content that is low-cost for them; nevertheless this approach is <strong>complicated<\/strong> and really <strong>costly<\/strong> for the developers. That is why 2 years after the proposal of the AJAX crawling scheme a very small number of websites have actually implemented it properly. Moreover we should note that currently <strong>only Google<\/strong> supports this scheme and by using it you risk <strong>losing the traffic<\/strong> that you receive from the other search engines.<\/p>\n<p>When should you use it? Perhaps you can use Google\u2019s method when you have <strong>no other choice<\/strong>. Personally I believe that you <strong>always have the choice<\/strong> of not using AJAX technology on pages that are important for Search Engines. If I had to use AJAX, I would go for the Hijax technique, which is easier, safer and supported by all search engines.<\/p>\n<p>If you feel confused by all this, I strongly recommend that you stay away from AJAX and not use it on your money-making landing pages. If you have questions or suggestions feel free to leave your comment below. Last but not least don\u2019t forget to share this article if you found it useful. Sharing is caring! \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Almost 2 years ago, Google proposed a new way of making AJAX applications &amp; websites crawlable. 
As most of you know, search engines are very good in indexing and analyzing HTML documents but they are not particularly good &hellip; <a href=\"http:\/\/www.webseoanalytics.com\/blog\/googles-ajax-crawling-scheme-and-its-effects-on-seo\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":2298,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,10],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts\/2290"}],"collection":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/comments?post=2290"}],"version-history":[{"count":0,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/posts\/2290\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/media\/2298"}],"wp:attachment":[{"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/media?parent=2290"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/categories?post=2290"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.webseoanalytics.com\/blog\/wp-json\/wp\/v2\/tags?post=2290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}