Duplicate content is one of the major problems that most websites face. Search engines are affected too, since they waste valuable resources crawling, analyzing and detecting those pages. Finally, since duplicate content reduces the quality of the search results, the problem also affects search engine users. So how can we solve this problem once and for all?
In this article we will focus on the root of the problem: we’ll see why it can be a major issue for both search engines and websites, and we will explain how duplicate content affects SEO. In the next article we’ll examine in detail the most common web development and SEO mistakes that lead to duplicate content problems, and we’ll suggest ways to solve the issue.
What is duplicate content?
The term duplicate content describes the situation where multiple URLs have the same or almost the same content. Note that those pages can belong to the same website or to different websites.
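To make "almost the same content" a little more concrete, here is a minimal sketch of one common textbook way to score how similar two texts are: split each text into overlapping word sequences (shingles) and compare the two sets with the Jaccard index. This is only an illustration under stated assumptions. The function names, the shingle size of 3 and the sample texts are made up for this example, and nothing here claims to be the method that any search engine actually uses.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences (shingles) in text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Score two texts from 0.0 (nothing shared) to 1.0 (identical shingles)."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical page texts that differ only in the last word.
page_a = "the new console ships with one wireless controller and a headset"
page_b = "the new console ships with one wireless controller and a cable"

print(jaccard_similarity(page_a, page_b))  # 0.8
```

A score close to 1.0 suggests that two pages are near-duplicates. Where to draw the line is a judgment call that this sketch does not try to settle.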
How is duplicate content created?
One common way to end up with duplicate content is to copy-paste the same text into different pages, or to submit the same content, article or review to multiple sites. Duplicate content can also be the result of poor web development techniques or of a badly designed link structure.
Why is duplicate content a problem?
To understand why duplicate content is a problem, you need to see it from the search engine’s point of view. Search engines need to crawl, analyze and index pages, determine the reputation of each page, and search quickly through their index in order to return results to users. Having lots of duplicate content on a website is bad for search engines, since they waste resources on pages that usually have no significant value for users.
Matt Cutts, a well-known Google employee, mentioned in a Google Webmaster Help video that crawling a large part of the web requires a relatively small number of machines (more than 25 but fewer than 1,000). This means that crawling a website requires relatively few resources. Nevertheless, analyzing the page, evaluating the links and indexing are much more time-consuming. Those of you who have coded web spiders in the past know that the analysis requires lots of CPU and memory compared to the web requests. This is due to the complexity of the algorithms used in text analysis.
Clearly duplicate content is a problem for search engine users because it affects the quality of the search results. But why is this a problem for webmasters? Well, since the problem requires additional resources that cost the search engine companies money, they try to push webmasters and SEOs to help them solve it. And the cheapest way to solve it is to motivate webmasters to eliminate their duplicate pages.
Does duplicate content affect rankings?
Even though duplicate content will not directly lead to bans from search engines (read the article “Why my SEO campaign failed? Part 1: Common On-page Optimization mistakes”), it does affect the SEO status of a website. When search engines identify cases of duplicate content, they try to determine which version of the page is the best one to show in the search results. Normally this choice is based on the age of the page, the authority of the domain, the number of incoming links, the PageRank and so on. So if a few pages of your site contain lots of text copy-pasted from other pages or websites, there is a good chance that they will not appear in the search results.
Additionally, as we said before, duplicate pages can be the result of poor programming or of a poorly developed link structure. Dynamic websites usually pass variables in each dynamic URL in order to fetch a particular record from the database.
For example, a product with id 3012 that is available in several colors could have lots of different pages (one for every color) if the color is passed as a URL variable. If the content of every page does not change significantly, this could lead to duplicate content problems.
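The effect of such URL variables can be sketched in a few lines of Python. The example below groups URLs that differ only in a variable that does not change the content in any significant way. The domain, the `product.php` path, the `color` parameter name and the helper functions are all hypothetical and chosen only for illustration. Which variables are safe to ignore depends entirely on the site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these variables do not change the page content significantly.
IGNORED_PARAMS = {"color"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop the ignored query variables and sort the remaining ones."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_canonical(urls, ignored=IGNORED_PARAMS):
    """Map each canonical URL to the list of URLs that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url, ignored)].append(url)
    return dict(groups)


# Hypothetical URLs for the product with id 3012, one per color.
urls = [
    "http://www.example.com/product.php?id=3012&color=red",
    "http://www.example.com/product.php?id=3012&color=blue",
    "http://www.example.com/product.php?color=green&id=3012",
]

for canonical, variants in group_by_canonical(urls).items():
    print(canonical, "has", len(variants), "variants")
```

All three hypothetical URLs collapse into `http://www.example.com/product.php?id=3012`, which is exactly the situation where a crawler sees several pages with nearly identical content.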
PageRank distribution is also negatively affected by the presence of duplicate pages. Since PageRank flows through links, a lot of valuable link juice is directed to duplicate pages or simply evaporates. As a result, the rankings of the website suffer.
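The dilution can be illustrated with a toy calculation. The sketch below implements the simplified textbook version of PageRank (power iteration with a damping factor of 0.85) and compares a tiny site with one product page against the same site with two duplicate variants of that page. The site structure, the page names and the iteration count are assumptions made for this example, and the real ranking systems of search engines are far more complex than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps every page to the list of pages it links to.
    Every link target must also appear as a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site with a single product page.
clean_site = {
    "home": ["product", "about"],
    "product": ["home"],
    "about": ["home"],
}

# The same site where the product exists as two near-identical pages.
duplicated_site = {
    "home": ["product_red", "product_blue", "about"],
    "product_red": ["home"],
    "product_blue": ["home"],
    "about": ["home"],
}

clean = pagerank(clean_site)
duplicated = pagerank(duplicated_site)
print(round(clean["product"], 2), round(duplicated["product_red"], 2))
```

In this toy setup the single product page ends up with a score of roughly 0.26, while each duplicate variant gets roughly 0.17, so no single version of the product is as strong as the page would be on its own.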
Finally, another reason why duplicate content can hurt your rankings is that search engines find and index only a certain number of pages from every website, and that number depends on the domain authority. If your site contains lots of duplicate pages, the re-crawl period will increase and the important new pages that you add to your website will be indexed much more slowly.
Should duplicate content be a problem for all websites?
Certainly, having the same content on many different pages of a website is not particularly useful for users, but this does not mean that it’s a critical mistake. As we mentioned above, it becomes a major problem for webmasters because it affects their search engine rankings.
Generally speaking, unique quality content is very important for users in most cases. But the question is: should duplicate content affect all webmasters, or are there cases where it should not affect their SEO status? Many users have asked Matt Cutts whether it is a problem for online stores to use the same generic product descriptions that other websites also use. Matt Cutts replied that this is a problem for their SEO campaigns, and that if they want to attract more users they should do so by providing unique quality content and by differentiating themselves from other e-commerce sites. I am sorry, Matt, but I have to strongly disagree with you on that.
My personal opinion is that when a user searches for a particular branded product, for example the new Xbox 360 console, he/she does not really care about a well-written description, the number of incoming links to the domain, the PageRank of the page, its authority and so on. He/she does care about the price, the product and the services that come with it. Using the same algorithms or principles to evaluate blogs and e-commerce websites is not the best practice. Even though I understand that there are lots of technical difficulties in evaluating products correctly, you still can’t force e-tailers, or even suggest to them, to spend lots of time and effort rewriting their product descriptions in order to avoid a duplicate content disaster. E-tailers don’t differentiate themselves from the competition by providing unique descriptions but by providing unique quality products and services.
That was the first part of the article. The next part will focus on more technical subjects and on how to solve the duplicate content problem.
Images by searchenginejournal, seodenver