Duplicate Content: How to solve the problem

In the previous article “Duplicate Content: the effects on Search Engine Rankings” we explained what duplicate content is and analyzed how it can affect the search engine rankings of a website. In this article we focus on the best practices for solving the duplicate content problem and examine the technical aspects of the issue.

What is the root of the duplicate content problem?

As we saw in the previous post, duplicate content comes from submitting the same content multiple times to different pages or websites, from using multiple domain names incorrectly, and from incorrect web development and SEO practices.

In the first case, the problem is usually caused by webmasters who try to promote their websites by posting the same articles, texts or press releases on multiple websites. It can also be the result of a flawed link building strategy in which SEOs try to increase the number of backlinks by submitting the same content to multiple sources. In this case, the root of the duplicate content problem is the user who tries to promote a website with grayhat or blackhat techniques.

In the second case, the problem is caused by companies that acquire and use multiple domain names for the same website. For example, if both example.com and example.co.uk are pointed at the same website without redirects, duplicate content issues are almost certain to follow. Here again, the root of the problem is the webmaster or the web development company that does not know how to set up 301 redirections correctly and does not follow best practices.

The third case is much more interesting and technical. The root of the problem is that the HTTP protocol does not provide a standard way to identify the “best” URL of a particular page. This means that one page can be accessed by multiple valid URL addresses and at the same time no information is available about the canonical URL.

Examples:

  • http://example.com
  • http://www.Example.com
  • http://www.example.com
  • http://www.example.com/
  • http://www.example.com/index.html
  • http://www.example.com/index.htm
  • http://www.example.com/index.html?somevar=

All the above URLs could lead to the same page, but HTTP will neither point out the “best” one nor guarantee that all of them lead to the same page. So in the above example http://www.example.com and http://www.example.com/index.html could lead either to the same page or to 2 completely different pages.

We also need to keep in mind that there are many different languages (PHP, ASP.NET, ASP, JSP, CGI, ColdFusion, etc.) and web technologies that can be used to build dynamic websites. Because these technologies support different features (default vs index pages, case-sensitive vs case-insensitive paths, etc.), the situation gets more complicated.

All the above difficulties make it easy for someone who does not understand how search engines work to make mistakes in the link structure of the website and affect the SEO campaign. So the question is how can we avoid those mistakes?

How to solve the duplicate content problem

In the first case the solution is relatively easy. All you need to do is to avoid submitting the exact same content to multiple sources and always make sure you use whitehat SEO techniques. Make sure you prepare different versions for the same article or press release so that search engines will not consider it as duplicate. This will help you build better links and generate more traffic.

In the second case, when you need to acquire multiple domain names for the same website, make sure you select only one primary domain and set up HTTP 301 redirections for the rest. Say, for example, that a particular website uses 2 domain names: example.com (primary) and example.co.uk (secondary). You then want to set up a 301 redirection on example.co.uk so that whenever someone types this domain he/she is redirected to example.com. There are several ways to do this (the forwarding settings of your domain provider, .htaccess, PHP/ASP/JSP redirection, etc.). The most straightforward is usually the URL forwarding option in the panel of your domain provider; note that plain DNS records cannot issue an HTTP redirect by themselves, so check that the forwarding is configured as a permanent (301) redirect.
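
For readers who handle the redirect in application code rather than in the provider's panel, here is a minimal sketch in Python of the idea described above. It assumes example.com is the primary domain and example.co.uk the secondary one, as in the example; the names `PRIMARY_HOST`, `SECONDARY_HOSTS`, `redirect_location` and `app` are hypothetical and only serve the illustration.

```python
# Assumed setup (hypothetical values taken from the example in the text).
PRIMARY_HOST = "example.com"
SECONDARY_HOSTS = {"example.co.uk", "www.example.co.uk"}


def redirect_location(host, path, query=""):
    """Return the primary-domain URL for a secondary-domain request, else None."""
    hostname = host.split(":")[0].lower()  # drop an optional port, ignore case
    if hostname not in SECONDARY_HOSTS:
        return None
    location = f"http://{PRIMARY_HOST}{path or '/'}"
    if query:
        location += "?" + query
    return location


def app(environ, start_response):
    """Tiny WSGI application: 301 for secondary domains, content otherwise."""
    location = redirect_location(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if location:
        # 301 tells browsers and search engines that the move is permanent.
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    body = b"Primary site content\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The application can be tried locally with the standard library server, for instance `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. The path and query string are preserved so that deep links to the secondary domain land on the matching page of the primary one.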

The third case is a bit more complicated. As we said in the previous article, search engines do take steps to minimize the effect of the duplicate content problem by identifying the best URL for a particular page. They apply a set of rules to the URLs in order to identify the best possible version (for example, the trailing / is added after “.com”, the domain name is converted to lowercase, and they determine whether to use the www or the non-www version). After that they have to visit the different URLs and analyze the pages in order to determine whether they are duplicate or unique. Nevertheless, even though search engines try to solve the issue, the SEO campaign can still be affected, so it is highly recommended to work on your link structure in order to eliminate those problems.

Working on your link structure

So what you want to do is to make sure that all the links of your site point to the best URLs and that there are no situations where 2 different URLs lead to the same page.

Here is a list of the most important things you should look out for:

  1. Remove all the URL variables that are not important from all the links (SESSIONIDs, empty variables etc).
  2. Decide whether to use the www or the non-www version for your site and place a 301 redirection from the other version to the one you chose.
  3. Decide whether to use the “index.html” in the URL when you point to the default page of a folder.
  4. Add the trailing / at the end of each folder.
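
The four rules above can be sketched as a small Python function that rewrites internal links into one preferred form. This is only an illustration under assumed choices: the www version is preferred, “index.html” is dropped, a last path segment without a dot is treated as a folder, and the names `PREFERRED_HOST`, `HOST_ALIASES`, `INDEX_FILES`, `IGNORED_PARAMS` and `normalize_url` are hypothetical. Your own site policy may differ.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed site policy (hypothetical values, adjust to your own site).
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
INDEX_FILES = {"index.html", "index.htm"}
IGNORED_PARAMS = {"sessionid", "phpsessid"}


def normalize_url(url):
    """Return the preferred form of an internal URL under the policy above."""
    parts = urlsplit(url)

    # Rule 2: one host name only (urlsplit already lowercases .hostname).
    host = parts.hostname or ""
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Rules 3 and 4: drop the index file, add the trailing slash to folders.
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    elif last and "." not in last:  # assumption: no dot means a folder
        path += "/"

    # Rule 1: drop session ids and empty variables
    # (parse_qsl skips blank values by default).
    pairs = [(key, value) for key, value in parse_qsl(parts.query)
             if key.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, netloc, path, urlencode(pairs), ""))
```

Under these assumptions, every example URL listed earlier in the article normalizes to http://www.example.com/, which is the property you want from the links inside your own pages.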

There is a great article by Ian Ring on this subject so I am not going to discuss it further. Make sure you read his article “Be a Normalizer – a C14N Exterminator” and also the Wikipedia article on URL normalization. All these tips can help you optimize the link structure of your website and this is going to help you solve the major duplicate problems.

301 redirections

Another great way to solve the problem is by using 301 redirections. Especially in cases where the incoming links of a particular page are divided between the various duplicate versions of the page, it is highly recommended to use the above rules to select the “best” URL and then set up 301 redirections from the rest of the versions to it. This can be done easily by using either the .htaccess file or a programming solution (PHP redirection).

Canonical Tag

When working on the link structure of the website is not an option, there is an alternative called the Canonical Tag. The Canonical Tag was proposed by Google in order to resolve the duplicate content issue.

To be precise it is not a tag but a value of the attribute rel of the <link> tag:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

It is placed in the <head> section of the HTML page in order to notify the search engines about the best URL for that page. Using the canonical link in your pages is very useful and can drastically reduce the amount of duplicate content within your site. Additionally it is a great way to pass the link juice that is lost to the duplicates back to the canonical pages.

Keep in mind that this tag is only a hint, not a directive, for the major search engines. In order to use it properly, the pages behind the canonical URL and the duplicate URLs must be almost identical.
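
When auditing a site, it can help to check which canonical URL each page actually declares. The following is a minimal Python sketch that reads the value from a page's HTML with the standard library parser; the class `CanonicalFinder` and the function `find_canonical` are hypothetical names, and fetching the pages is left out on purpose.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the declared canonical URL, or None when the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical
```

Running `find_canonical` over the duplicate versions of a page should return the same URL for all of them; a version that returns `None` or a different URL is a candidate for fixing.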

For more information about canonical URLs, check out Matt Cutts' article “SEO advice: url canonicalization” and the official Google blog post “Specify your canonical”.

The best methods to solve the duplicate content problem

As we said above, there are several ways to solve the problem. Here is the list of the methods that you should use (Note that it is highly recommended to try solving the problem by using the first 3 ways):

  1. Work on your link structure
  2. Use 301 redirections
  3. Use the canonical tag
  4. Exclude parameters such as sessionIDs in the Google Webmaster Tools Console
  5. Last resort, block the duplicate content with robots.txt
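
If you do fall back to robots.txt, it is worth verifying that the rule blocks only the duplicates and not the original pages. Below is a minimal sketch using Python's standard `urllib.robotparser`; the `/print/` folder holding printer-friendly duplicates is a made-up example, and `ROBOTS_TXT` and `is_blocked` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_blocked(url, user_agent="*"):
    """Return True when the rules above forbid crawling the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With these rules, `is_blocked("http://www.example.com/print/article.html")` is true while the original `http://www.example.com/article.html` stays crawlable. Note that this parser matches plain path prefixes, so wildcard patterns supported by some search engines are outside the scope of this sketch.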

  • Aidan

    Good article, while Google is better at handling it now than in the past it still causes problems on dynamic ecommerce sites. Good to see Google get better at taking the canonicalisation ‘hint’ working across domains now too.

  • Joshua

    Vasilis,

    I don’t understand your argument for the first type of duplicate content. It doesn’t make sense that Google would penalize the originator of content if anyone can copy it and paste it on their site. Of course duplicate copy on YOUR URL would cause problems. But having the same article somewhere else on the web…I’m not sure that is an issue. After all, you have no control whether or not someone is going to rip you off.

    What say you?

    Best,
    Joshua

  • doug hay

    I’m not sure how the use of RSS impacts the duplicate content issue. e.g. a press release syndicated on a wire service is a piece of content that ends up being duplicated on many news sites. Wonder how Google deals with that?

  • bbriniotis

    Hi Joshua,

    Thank you for your comment. You probably misunderstood what I wrote, so allow me to clarify the issue.

    I didn’t suggest that you will get penalized for submitting the same content to other websites. There is no such thing as getting banned due to duplicate content. What I mean is that the links that come from duplicate pages will not do you any good, mainly because Google will mark those pages as duplicates. That’s why submitting the same content in 100 places does not lead to a successful link building strategy. And since those pages are on external websites where you have no control, they will affect you neither negatively nor positively.

    Best Regards,
    Vasilis

  • Adeel Janjua

    Hi… Read through the article but missed the part where the competition duplicates your content just to get you rank lower on G! Or is that a non-issue with search engines?

    Thing is that one of my competitors has copied my entire website from head to toe, including the images and put it up on a new domain name he/she just purchased. Now this will definitely hurt my rankings, or so I believe.

    I do not want to pay for a DMCA report as how many times am I going to do this, right? ‘Cuz I am sure that once I pay $99 he/she will take down that site and do it on another domain…

    How do I fight this??

  • SEO Marketing

    Using the hash:
    Few know the implementation and benefits in SEO Optimization (sites) that the character “#” can bring.
    Pound & SEO: Applications of the hash character can be extremely useful in eliminating duplicate content and also in combating the loss of link juice that is spread across different URLs.

  • Nick Squires

    Great article, thanks. Can you clarify further: if you’re able to delete duplicate content on other sites, should you do this, or will it have no positive benefit? Nick Squires

  • Door Levers

    At the moment, Copyscape is the software of choice for me when trying to check if the content I have is 100% legit. I like to be as genuine as possible when submitting any new content for the search engines.
