Google’s AJAX crawling scheme and its effects on SEO
The reason why I write this article is to briefly explain what AJAX is, to discuss the problem of crawling AJAX content, to view the various solutions that have been proposed, to discuss the common mistakes that web developers make while using non Search Engine friendly technologies like AJAX & Flash and finally to explain whether or not you should use Google’s proposal. Even though this topic is a bit advanced and it requires basic knowledge of the web development techniques and of the web technologies in general, I try to focus less on the programming part and more on the SEO part. If you have any questions feel free to post them on the comments below and we’ll try to cover them on an upcoming blog post.
What is AJAX?
The Problem of Crawling AJAX
Solutions for Crawling AJAX
The 2 most popular techniques that have been proposed over the years are the Hijax Approach and the AJAX crawling scheme of Google.
The Hijax technique
Neither like this:
Both of the above approaches are very popular to web developers but unfortunately they do not provide a meaningful URL that can be used by the search engines. By using the Hijax technique the above link should be rewritten as follows:
Of course the above technique has several limitations since it does not cover cases where the AJAX content is created dynamically based on the input of the user.
The AJAX crawling scheme of Google
The Google’s AJAX crawling scheme proposes to mark the addresses of all the pages that load AJAX content with specific chars. The whole idea behind it is to use special hash fragments (#!) in the URLs of those pages to indicate that they load AJAX content. When Google finds a link that points to an “AJAX” URL, for example “http://example.com/page?query#!state”, it automatically interprets it (escapes it) as “http://example.com/page?query&_escaped_fragment_=state”.
The programmer is forced to change his/her Website Architecture in order to handle the above requests. So when Google sends a web request for the escaped URL, the server must be able to return the same HTML code as the one that is presented to the user when the AJAX function is called.
After Google sees the “AJAX URL” and after interpreting (escaping it), it grabs the content of the page and indexes it. Finally when the indexed page is presented in the Search Results, Google shows the original AJAX URL to the user instead of the “escaped” one. As a result the programmer should be able to handle user’s request and present the appropriate content when the page loads.
Common Web Development mistakes
All the above links will generate a web request to the URL http://example.com/ and all the extra parameters after # will totally be ignored. That is why Search Engines ignore everything after # (We’ll talk about “#!” a bit later, but yes it creates also the same web request on the server as the previous URLs).
The only exception to the above rule is when the #! is used. So you might be thinking that if you use #! instead of # you will be ok. Unfortunately the answer is NO! Just by using it you will gain nothing. You must also write code in PHP, JSP, ASP or ASP.NET in order to ensure that your server will handle the Google’s AJAX crawling scheme and present the appropriate holding page (as we explained above).
Should you use Google’s proposal? Let’s focus on a case study.
Currently Google’s AJAX crawling scheme has been implemented by a relatively small amount of websites and many of them have not done it properly. One of the websites where it has been done a pretty good job is the thebullittagency.com (note that it is not related to our company).
First of all let’s run the query “Fabric Blog Features Luca Bacchetti” on Google. The URL of the first result is the following:
Note that the title and the snippet on SERPs are unique and come from the particular blog post. Now let’s see how the escaped URL looks like:
As required from the AJAX crawling scheme they return more or less the same HTML content as the one that they load with AJAX. Note that if you want to be 100% safe you should present exactly the same code to avoid automated bans for cloaking.
Now let’s see how Google handles the PageRank for the AJAX URLs. Let’s check the PageRank values by using our PageRank Check tool. (Note that all of our tools handle the AJAX crawling scheme). Here are the results:
http://thebullittagency.com/?&_escaped_fragment_=/blog/ – PageRank: 3
http://thebullittagency.com/ – PageRank: 4
So Google does handle the links and the PageRank values differently for the first 2 URLs since they don’t have the same PR value as the homepage. Also the escaped URL (the second one) has exactly the same value as the first one. This is what we should expect after all since Google makes it clear that they handle those 2 URLs as the same. This is actually good news because it means that if someone decides to add a link to the blog post, all the link juice and the anchor text info will pass to the actual article and not to the homepage.
Now let’s see how many pages are indexed. If we search on Google the query “site:thebullittagency.com”, we’ll get more than 1 thousand results which means that the website is normally indexed. Also if we try the query “site:http://thebullittagency.com/#!/blog” we’ll get all the articles that have been written on their blog. So Google’s AJAX crawling scheme is safe to use, right? Nope!
Let’s do the query “site:thebullittagency.com” on Bing. Well, the things don’t look that good there. There are only 2 results, the homepage and an ugly escaped URL that was probably submitted either manually or linked directly from an external source. So Bing does not handle at all those URLs, they ignore everything after #! and when a link is placed on an internal page they pass all the juice on the homepage.
For me as a programmer the AJAX crawling problem is far from resolved. Google has proposed a clever and low cost for them solution for crawling AJAX content, nevertheless this approach is complicated and really costly for the developers. That is why 2 years after the proposal of the AJAX crawling scheme a very small amount of websites have actually implemented properly. Moreover we should note that currently only Google supports this scheme and by using it you risk losing the traffic that you receive from the other search engines.
When should you use it? Perhaps you can use Google’s method when you have no other choice. Personally I believe that you always have the choice of not using AJAX technology on pages that are important for Search Engines. If I had to use AJAX, I would go for the Hijax technique that is easier, safer and supported by all search engines.
If you feel confused with all these, I strongly recommend you to stay away from AJAX and don’t use it on your money-making landing pages. If you have questions or suggestions feel free to leave your comment below. Last but not least don’t forget to share this article if you found it useful. Sharing is caring! 🙂