
What is Duplicate Content and Its Effect on SEO?

Last week I spoke with a client about his company’s blog and gave him a few suggestions on how he could use blog articles to increase engagement and improve the SEO of the company’s website.

I was very surprised to see his negative reaction to the idea of re-posting his LinkedIn articles on the blog or re-posting other content he had created on social media in the past. His only objection was: “But isn’t that duplicated content? Won’t I be penalized by Google if I do that?”

This made me realize that a lot of people who’ve heard the term “duplicate content” often associate it with some black-hat SEO techniques, or even with a penalty. Many people don’t realize that their website already has some unintentionally created duplicate content.

In this article, I explain what duplicate content is and describe the most common situation in which it may have a negative effect on your SEO. I will also cover the main causes of unintentionally created duplicate content on a website.

What is duplicate content?

Duplicate content is content that appears on the Internet at more than one URL. URL stands for “Uniform Resource Locator”; in simple terms, it is the unique address of a web page. So, if the same content appears at more than one web address, we’ve got duplicate content.

The myth of the duplicate content penalty

There are a lot of misconceptions surrounding how Google handles duplicate content. One of the most popular is that Google could actually penalize your website if it has a large amount of duplicate content on it. This is not true! The presence of duplicate content on your website cannot cause a penalty, and there’s no such thing as a “duplicate content penalty.”

In fact, according to Matt Cutts, 25 to 30 percent of the web is duplicate content, and a recent study by Raven Tools found that 29 percent of pages had duplicate content. So we can safely assume that there’s a lot of duplicated content on the web, and it would be unreasonable to think that roughly 30 percent of all pages on the Internet are at risk of a Google penalty.

How does duplicate content affect SEO and search visibility?

While technically there’s no penalty for duplicate content, it can still have a negative impact on search engine rankings.

We know that every search engine’s goal is to provide an optimal user experience and the most relevant search results for each search query. Listing multiple URLs with the same content in the SERPs is therefore not useful for the user, nor is it for the search engine.

To provide the best search experience, search engines rarely show multiple URLs for the same content. Instead, they have to choose which version (which URL) is most likely to be the best result. So Google will choose one URL and filter out all the others in the search results.

This may create problems for the search visibility of a piece of content that has duplicates:

  • The existence of duplicates dilutes the visibility of each version: the search engine will choose one URL and filter out all the other URLs with the same content in the search results.
  • The existence of duplicates dilutes link equity: instead of all inbound links pointing to one piece of content (one URL), some websites may link to the duplicated versions (different URLs), spreading the link equity among the duplicates. Because inbound links are a ranking factor, this can then hurt the search visibility of that content.

How does Google choose which version of duplicate content to show?

As I mentioned, in the case of multiple URLs with duplicated content, Google will choose one URL and filter out all the others in the search results. How does Google choose which URL to show?

Google tries to determine the original source of the duplicated content and display that one in the SERPs. How’s that done?

Google’s algorithms group the various duplicate versions into a cluster and display the “best” URL in that cluster. The “best” URL is selected based on various signals (such as links), and the signals from the other pages within the cluster are consolidated to the one being shown. For that reason, Google recommends that website owners not block access to duplicate content: if Google can’t crawl all the versions, it can’t consolidate all the signals.
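
To make the idea of clustering more concrete, here is a minimal Python sketch. It is only an illustration: Google's real clustering and ranking signals are not public, so the exact-match fingerprint, the inbound link counts, and the function names `cluster_duplicates` and `pick_representative` are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def cluster_duplicates(pages):
    """Group URLs whose body text is identical after light normalization."""
    clusters = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and case so trivial formatting
        # differences do not split a cluster.
        normalized = " ".join(body.split()).lower()
        fingerprint = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        clusters[fingerprint].append(url)
    return list(clusters.values())


def pick_representative(cluster, inbound_links):
    """Pick the URL with the most inbound links (assumed signal).

    Ties go to the shorter URL.
    """
    return max(cluster, key=lambda url: (inbound_links.get(url, 0), -len(url)))


# Hypothetical pages and link counts, for illustration only.
pages = {
    "https://www.yourwebsite.com/article": "Coffee is  good for you.",
    "https://www.yourwebsite.com/article?source=rss": "coffee is good for you.",
    "https://www.yourwebsite.com/about": "About our shop.",
}
inbound_links = {
    "https://www.yourwebsite.com/article": 12,
    "https://www.yourwebsite.com/article?source=rss": 3,
}

clusters = cluster_duplicates(pages)
best = pick_representative(clusters[0], inbound_links)
```

In this toy data set the two article URLs end up in the same cluster, and the version with more inbound links is chosen to represent it. It also shows why blocked pages are a problem: a URL that never reaches `pages` cannot contribute its links to the cluster.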

In most cases, Google will successfully identify the original source of the content and will filter out all the duplicate versions in the search results. On the very rare occasions when Google ranks a duplicate version of the content instead of the original, webmasters can notify Google and ask it to fix the mistake by using the Scraper Report Tool.

Will publishing duplicated content from another source hurt your website?   

Let’s say, for example, that you have an e-shop that sells coffee beans online. You have recently discovered a scientific research article entitled “The Beneficial Effects of Coffee in Human Nutrition” that was published on the website of the University of Naples. You decide that this content will be very useful for your customers and you re-post (by re-post I mean copy/paste) the article on your blog. 

In this case, contrary to popular belief, you don’t risk a penalty from Google, but you won’t receive any credit for the content either. When a user searches for “Beneficial Effects of Coffee” or a similar query, they will most probably find the URL on the University of Naples’s website, where the article was initially published. Your readers, however, who most probably never visit the website of the University of Naples, will have the opportunity to find and read it on your website.

Contrary to what you may think, this non-original content won’t hurt your website’s ranking.

In which cases is duplicate content bad for your website?

However, in some situations, duplicate content may create issues for your search visibility and that’s mainly when you have duplicated content between pages on your own website. Most often this type of duplicate content is unintentionally created.

These are the most common causes of unintentionally created duplicate content on your website:

  • Your website has both WWW and non-WWW pages: if your site has separate versions at “www.yoursite.com” and “yoursite.com” (with and without the “www” prefix), and the same content lives at both versions, you’ve effectively created duplicates of each of those pages.
  • Your website has both HTTP and HTTPS pages: if your site maintains versions at both the http:// and https:// protocols and the same content lives at both versions, you’ve effectively created duplicates of each of those pages.
  • Your website uses URL parameters: URL parameters are a common source of duplicate content, because they often do not change the content of a page but still create multiple URLs that serve the same content. Most often URL parameters are used for tracking and sorting.

For example:

www.yourwebsite.com/products?pricehigh is a duplicate of www.yourwebsite.com/products.

The same goes for www.yourwebsite.com/products?pricelow, which is a duplicate of both.

www.yourwebsite.com/article is a duplicate of www.yourwebsite.com/article?source=rss

The duplicate URLs are caused not only by the parameters themselves but also by the order in which those parameters appear in the URL itself.

For example:

www.yourwebsite.com/?id=1&cat=2   is a duplicate of www.yourwebsite.com/?cat=2&id=1
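
As a rough sketch of why these URLs are really the same page, the Python snippet below rewrites a URL's query string into one fixed form: it drops parameters that are only used for tracking and sorts the rest. The function name `normalize_query` and the `IGNORED_PARAMS` list are assumptions made for this example; which parameters are safe to ignore depends entirely on your own site. The `https://` prefix is also added here only so that the URLs parse cleanly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change the page content.
IGNORED_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def normalize_query(url):
    """Return the URL with tracking parameters removed and the rest sorted."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop tracking-only parameters, then sort so that parameter
    # order no longer produces different URLs.
    kept = sorted((key, value) for key, value in params if key not in IGNORED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


a = normalize_query("https://www.yourwebsite.com/?id=1&cat=2")
b = normalize_query("https://www.yourwebsite.com/?cat=2&id=1")
c = normalize_query("https://www.yourwebsite.com/article?source=rss")
```

Here `a` and `b` come out as the same string, and `c` comes out as the plain article URL, which is how a crawler could recognize that these addresses point to one piece of content.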

  • Your website uses session IDs in the URLs: session IDs are another common source of duplicate content. This occurs when each user who visits a website is assigned a different session ID that is stored and visible in the URL.
  • Your CMS creates printer-friendly pages: printer-friendly versions of content can also cause duplicate content issues when multiple versions of the pages get indexed.

For example:
www.yourwebsite.com/print/article is a duplicate of www.yourwebsite.com/article

  • Your CMS paginates comments or reviews: most content management systems, including WordPress, have an option to paginate your comments or product reviews. This will duplicate the content of your article or product page across multiple paginated URLs.

For example:
www.yourwebsite.com/article/comment-page-1/ will have the same article content as www.yourwebsite.com/article/comment-page-2/ and www.yourwebsite.com/article
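
To see how several of these variants collapse into a single page, here is a small Python sketch that maps each URL to one preferred form and groups the variants together. It is an illustration under stated assumptions, not a definitive implementation: the choice of https and the “www” prefix as the preferred version is arbitrary, the `/print/` prefix and `comment-page-N` suffix simply follow the example URLs in this article, and the names `canonical_url` and `group_by_canonical` are made up for this example. The query string is left untouched.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Map protocol, www, print and comment-page variants to one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumed preference: always use the "www" host.
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path
    # Printer-friendly copy: /print/article -> /article
    if path.startswith("/print/"):
        path = path[len("/print"):]
    # Paginated comments: /article/comment-page-2/ -> /article
    path = re.sub(r"/comment-page-\d+/?$", "", path)
    path = path.rstrip("/") or "/"
    # Assumed preference: always use https.
    return urlunsplit(("https", host, path, parts.query, ""))


def group_by_canonical(urls):
    """Group URL variants under their shared preferred form."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


variants = [
    "http://yourwebsite.com/article",
    "https://www.yourwebsite.com/article",
    "https://www.yourwebsite.com/print/article",
    "https://www.yourwebsite.com/article/comment-page-2/",
]
groups = group_by_canonical(variants)
```

All four variants land in one group under `https://www.yourwebsite.com/article`. Running something like this over a list of your own indexed URLs is a quick way to spot groups with more than one member.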

Although Google is good at recognizing URL parameters, in all of these scenarios you risk confusing the search engine, and a less desirable URL version of a page may be shown in search results. As we saw, this can dilute both visibility and link equity.

Fortunately, there are many solutions that can help you avoid duplicate content issues, and I will cover them in my next article. Feel free to contact me if you want to know whether your business website has duplicate content issues and how to fix them.
