In life, it’s crazy how certain things can drive us crazy. I’m thinking of people who shamelessly cut in front of you in line at the supermarket.

People who don’t cook pasta al dente, and cut it when they put it in boiling water… Or those early morning alarm clocks that ring, ring and ring again to get you out of a nice warm bed.

Man angry at his alarm clock is punching it.
We warned you, dear alarm clock, though.

And then you discover duplicate content. You know, that unpleasant moment when you find all or part of your content word for word on another website.

Or when that good old Ctrl+C Ctrl+V (or Cmd+C Cmd+V for the pro-Mac crowd) has wreaked havoc again. As they say, it really drives you crazy.

Duplicate content is a real problem when you find it on other people’s sites, but it can also pop up on your own WordPress site, without you even knowing it.

This is just as annoying, especially because it can have negative consequences for your SEO.

To avoid this, follow me!

In this post, you will learn everything about this damn duplicate content, and especially concrete solutions to get rid of it. And I promise, it’s guaranteed to be free of fuss.


What is duplicate content?

Duplicate content is content that is similar on multiple web addresses (URLs) at once, whether on different pages of the same website, or on other websites.

This complicates the work of search engines like Google, which may choose to rank duplicate pages lower in their SERP (search engine results pages).

In order to make it clear, there are two main types of duplicate content:

  • Internal duplications, which take place on your own site, most of the time without your knowledge.
  • External duplications, when other sites copy all or part of your content on their pages.

The image of tracing paper

Now, to illustrate what duplicate content is, let’s take a step back several (long) years: to the elementary school benches, and the art lesson.

Do you remember the famous tracing paper, which allows you to reproduce an identical handwritten drawing? Well, duplicate content is a bit like that.

Let’s say that the basic drawing represents the original URL of your content, for example https://yourwebsite.com/your-awesome-post/.

The drawing reproduced identically (or partially) using the tracing paper, illustrates the duplicated URL: https://yourwebsite.com/your-awesome-post-bis/.

Is it clear for you? Then come back to the future, I mean to the present.

Doc, from the movie "Back to the Future", is holding electrical clamps.
It’s okay Doc, I’m ready.

More than a quarter of the web is duplicated

In 2013, Matt Cutts, a former Google engineer, said that 25% to 30% of the content published on the web was duplicate.

Even if this statistic is a bit dated, it gives you a pretty telling order of magnitude.

Fortunately, as Google states, “Mostly, this is not deceptive in origin”.

This means that the causes of duplicate content are often technical and unintentional: the webmaster that you are does not create duplicate content on purpose.

Therefore, Google, the most used search engine in the world, will not tend to penalize you if your goal is not to “deceive and manipulate” its search results.

However, be careful: although Google does not consider this practice to be spam, it does not really like duplicates either.

Why? Because in the end, it has to make extra efforts to index and “display pages containing distinct information”.

In SEO, the indexing phase corresponds to the moment when search engine robots scan pages on the entire web, in order to classify them in an index (a sort of gigantic database).
It is from this index that a search engine like Google draws the most relevant results to display in its results pages (SERPs).

How does Google deal with duplicate content?

Well, I say “it” when talking about Google, but in fact I should have said “Google’s robots”, also called spiders or Googlebots.

Here is a schematic of how they operate, when they spot duplicates:

  • They browse the web for new content, navigating from link to link (remember that the web is huge).
  • When they come across duplicate content, they group the copies into a cluster.
  • Then they display THE best result, according to them, from the content in this cluster.
Eddie Murphy makes the ok sign with his hand.

The popularity bonus, rather than the seniority bonus

The problem is that this best result does not always correspond to the original content (the one that is not duplicate).

On this point, it’s hard to blame Google: imagine how difficult the task is for it, when it has to find the original among thousands of identical contents!

To do this, Google does not rely on the publication date of a piece of content, as Daniel Roch says.

It would be too simple, since you can “modify at will in the administration the date of each of your contents”.

Google relies on “the popularity of the URL and the domain to determine who is at the origin of the content and who are the possible plagiarists”, adds Daniel Roch. “In other words, if a site with greater popularity steals content from you, you lose the battle with the search engine”.
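To make the idea concrete, here is a toy sketch in Python of the "cluster, then keep the most popular" logic described above. It is in no way Google's actual algorithm: the popularity score, the example URLs and the function names are all invented for illustration, and real search engines also detect near-duplicates, not just identical text.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_displayed_pages(pages):
    """Group pages with identical text and keep the most popular URL per group.

    `pages` is a list of (url, text, popularity) tuples. `popularity` is a
    made-up score standing in for whatever signals a search engine really uses.
    """
    clusters = defaultdict(list)
    for url, text, popularity in pages:
        clusters[fingerprint(text)].append((popularity, url))
    # max() compares the popularity first, so the most popular copy wins.
    return sorted(max(group)[1] for group in clusters.values())


pages = [
    ("https://yourwebsite.com/your-awesome-post/", "My brownie  recipe", 12),
    ("https://bigsite.example/stolen-post/", "my brownie recipe", 87),
    ("https://yourwebsite.com/other-post/", "Something else entirely", 5),
]
print(pick_displayed_pages(pages))
```

In this example, the original post loses its place to the more popular copy, which is exactly the situation Daniel Roch describes.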

Matt Cutts also explains how Google handles duplicate content in a video, if you are interested.

We can keep in mind two major things from this demonstration:

  • Strictly speaking, Google does not penalize duplicate content, except in “rare cases” where it was created “to manipulate our rankings and deceive our users”. If this happens, the site concerned “will no longer appear in its search results”.
  • The rest of the time, duplicate content is not penalized, but the outcome can feel just the same. If you are a victim of duplicate content and Google has decided not to display the original version of your content, you become invisible in its search results pages.

As a result, your search engine optimization (SEO) actions can suffer significant consequences.

A man shakes his head and says "oh no".
Yes, unfortunately.

What is the impact of duplicate content in SEO?

Duplicate content can have a negative impact on the SEO (Search Engine Optimization) of your content.

In other words, you could see traffic decrease on your site, and lose positions on search results pages for several reasons:

  • Google doesn’t know exactly which is the original version of a duplicate content, so it will only display one, and therefore “hide” all other identical results in its search results.
  • The backlinks that other users make to your duplicate content will be less effective. The links will be spread across several duplicate publications, and will therefore carry less weight. Yet the more relevant backlinks a piece of content has, the better its chances of ranking well.
  • You will consume more crawl budget (the maximum number of pages that Google can crawl on your WordPress website), because the search engine will have to spend more time crawling your duplicate content, with the risk of indexing new “original” content less quickly, or not indexing it at all.

Since duplicate content often lurks in the shadows and is not always easy to identify and tame, the following part shows you several ways to unmask it.

Zorro with his sword.

How to find and recognize duplicate content?

With your eyes: the visual method

You close them to sleep, then open them wide as soon as you wake up, and to read this article: your eyes are your first weapon to detect possible traces of duplicate content, especially external duplications.

Imagine: you published a post, several months ago, which distils tips on how to make a delicious chocolate brownie.

Now you’ve come across a publication that duplicates several passages from the original source word for word. “No way! I wrote this, you thief!”

Yes, it’s you, and you’ve been plagiarized. Now, the whole article wasn’t copied and pasted, but you might wonder if we’re in a duplicate content situation? Good point.


In this matter, there is no precise rule. That is to say that no search engine defines a limit not to be crossed, like: “if you copy 40% of a content, you are a bad duplicator!”

To help you out, let’s say that if entire sentences are copied – remember, Google talks about “large blocks of content” – you can consider that a content is duplicated.
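If you want to go beyond the naked eye, the "entire sentences copied" rule of thumb can be roughly automated. The sketch below is only an illustration in Python: the function names, the 5-word minimum and the sample texts are my own assumptions, not a threshold defined by Google or any other search engine.

```python
import re


def sentences(text: str) -> set[str]:
    """Split text into normalized sentences (lowercase, single spaces)."""
    parts = re.split(r"[.!?]+", text)
    return {" ".join(part.lower().split()) for part in parts if part.strip()}


def copied_sentences(original: str, suspect: str, min_words: int = 5) -> set[str]:
    """Return the sentences of at least `min_words` words found in both texts."""
    shared = sentences(original) & sentences(suspect)
    # Very short sentences ("Enjoy!") are too common to count as copying.
    return {sentence for sentence in shared if len(sentence.split()) >= min_words}


original = "Melt the chocolate with the butter. Add the eggs one by one. Enjoy!"
suspect = "Here is my recipe. Melt the chocolate with  the butter. Enjoy!"
print(copied_sentences(original, suspect))
```

The result lists the sentences that appear word for word in both texts. It will not catch lightly reworded passages, so treat it as a first filter rather than a verdict.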

You may feel that all you can do is cry, but know that there are also remedies to dry your tears. I’ll come back to that later in this post.

After the eyes, there is a second weapon at your disposal: a tool to detect duplicate content.

With a dedicated tool: the third-party method

There are several solutions on the market to detect internal and external duplications. Here is an overview.

Kill Duplicate

Kill Duplicate plugin is helpful against duplicate content.

Kill Duplicate is an essential premium tool that helps to identify external duplications, especially by scanning your contents.

It is a comprehensive tool that also helps you deal with plagiarism by proposing solutions directly on your dashboard (e.g. contact the host, contact the site or file a complaint).

Price: from €19/month (excl. VAT) i.e. ± $21.

Copyscape

Copyscape helps fight against external duplicate content.

Copyscape is a freemium solution that helps you find copies of your page on the web. To use it, just enter the URL of your choice in the search bar.

Then cross your fingers that nobody has copied you. 😉

You can then check which publications Copyscape has identified, to see if the content seems to be duplicated or not.

Copyscape is also available in a premium version with much more advanced features (from 3 cents per search).

DupliChecker

DupliChecker is a plagiarism checker software.

DupliChecker presents itself as “anti-plagiarism software”. Limited to 1,000 words per search in its free version, it allows you to check the originality of a text by entering its URL, pasting a piece of text, or uploading a file.

You can therefore use it before and after the publication of a piece of content. Although the many ads are regrettable, DupliChecker remains interesting because it displays several results, each with a similarity rate:

DupliChecker provides a similarity rate for duplicate content.

A Pro version is also available from $10, for a use up to 30,000 words.

Siteliner

Siteliner helps to identify internal duplicate content.

Siteliner will be perfect to “explore your site”, as it is, i.e. to identify internal duplications.

It presents its results in the form of graphs. The free version allows you to scan a site once every 30 days, up to 250 pages.

With the Pro offer, you can process up to 25,000 pages, and choose the ones you want to exclude from the identification process.

Screaming Frog

Screaming Frog is an SEO tool that tracks duplicate content.

Screaming Frog is not a tool specifically dedicated to identifying duplicate content. But it remains relevant to find internal duplications.

It is a crawler, a tool for analyzing your on-page SEO: it extracts and scans your site’s URLs for problems (e.g. broken links, title and meta description tag analysis, server errors, etc.).

It will therefore be able to inform you about certain duplicate elements such as h1 titles and title and meta description tags of your pages.

You can analyze up to 500 URLs with the free version. The Pro version costs £149/year (i.e. ± $197).

Google Search Console

Google Search Console is a useful Swiss army knife for the webmaster looking for duplicate content.

We end this list of tools with an essential Swiss Army knife: the Google Search Console.

This free tool allows you to better manage your site and track your SEO. It provides a lot of information: errors on your site, search analysis, links, indexing status, crawling errors, etc.

Unlike its little friends mentioned above, Google Search Console will not be able to tell you which URLs have been duplicated internally.

However, it can help you find out. To do this, simply go to the Index > Coverage menu. You can:

  • Check the number of indexed URLs. If you know that you have created 206 pages on your site, and that Google has indexed 674 of them, you know that there is surely some duplicate content lying around…
  • Check the excluded URLs, to know if they can fit in the duplicate content box.
Duplicate content on Google Search Console.

Also note that many SEO tools like Semrush or Ahrefs, to name a few, also have features to help you identify duplicate content on your site.

With a specific command from Google: the manual method

After this round of tools, there is one last lever you can activate to find duplicate content: Google.

To do this, the famous search engine offers operators, i.e. commands that you can specify in its search bar to filter its results more precisely.

Some of them can be efficient to hunt duplicate content, like the site search operator (site:). To search for external duplication, exclude your domain name from the search results by typing the following query:

-site:yourdomainname.com "title of your publication". This would give, in the example of the following WPMarmite article: -site:wpmarmite.com/en/ "test of 6 must-have SEO plugins on WordPress"

Google's search operators help you search for duplicate content.
Well that’s okay, after checking, these sites only use the link.
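If you have many posts to check, you can generate these queries instead of typing them by hand. Here is a small Python sketch that only builds the query and the matching search URL (it does not scrape Google). The function names are mine, and the `q` parameter is simply the one you see in your browser's address bar after a search.

```python
from urllib.parse import urlencode


def external_duplicate_query(domain: str, title: str) -> str:
    """Build a query that excludes `domain` and searches for the exact title."""
    return f'-site:{domain} "{title}"'


def google_search_url(query: str) -> str:
    """Return a Google search URL with the query safely percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = external_duplicate_query(
    "wpmarmite.com/en/", "test of 6 must-have SEO plugins on WordPress"
)
print(query)
print(google_search_url(query))
```

Open the printed URL in your browser and review the results yourself, as in the manual method.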

Well, that’s a big chunk you just swallowed. Now you know what duplicate content is and how to identify it.

Now you have to get rid of it. The rest of this post will focus on detailed instructions on how to stop:

  • Internal duplications
  • External duplications

I suggest you start with possible problems you may have on your WordPress website.

What causes internal duplicate content on WordPress (and how to solve it)?

URLs

A URL is the address of a web page. For example, the WPMarmite homepage can be found at the following URL: https://wpmarmite.com/en/.

As you can imagine, the more content your WordPress website has, the more URLs you will have. In the case of a large ecommerce website, for example, you can very quickly reach thousands of URLs if you sell a lot of products.

So far, so good. However, our famous URLs will start to bother you in some cases:

  • When they contain indications to track visits to a specific page. New parameters are then automatically added at the end of your URLs. For example, the initial URL will be https://yourpost.com, and the duplicate URL https://yourpost.com?utm_source=facebook. You may not see the difference, but a search engine will. 😉
  • When they contain parameters to filter the navigation. This is often the case on WooCommerce stores that use faceted search. This is very convenient for the user, who can sort products by size, color, price, etc. The concern is that this creates many duplicate pages, with almost word-for-word identical content, see:
    • https://yourstore.com/pants-black-size-m
    • https://yourstore.com/pants-black-size-l
  • When they use trailing slashes inconsistently. For example: https://yourstore.com/pants-black-size-m and https://yourstore.com/pants-black-size-m/ are considered two different URLs by Google, which therefore creates duplicate content.
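To spot this kind of URL variant in a list of addresses (exported from a crawler, for example), you can normalize each URL and group the ones that collapse onto the same address. The Python sketch below is a minimal illustration under my own assumptions: it only strips `utm_*` tracking parameters and trailing slashes, and the function names are invented for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return the URL without utm_* parameters or a trailing slash."""
    parts = urlsplit(url)
    # Keep every query parameter except the tracking ones (utm_source, etc.).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


def group_variants(urls):
    """Map each normalized URL to the raw variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    # Only groups with more than one raw URL are potential duplicates.
    return {key: found for key, found in groups.items() if len(found) > 1}


urls = [
    "https://yourpost.com",
    "https://yourpost.com?utm_source=facebook",
    "https://yourstore.com/pants-black-size-m",
    "https://yourstore.com/pants-black-size-m/",
    "https://yourstore.com/pants-black-size-l",
]
for normalized, variants in group_variants(urls).items():
    print(normalized, "<-", variants)
```

Filter parameters from faceted search are deliberately left alone here, since whether two filtered pages count as duplicates depends on your store.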

How to solve duplicate URLs problems?

The easiest way to solve a duplicate URL problem is to set up what is called a 301 redirect.

A 301 redirect permanently and automatically sends a visitor trying to access a URL A (e.g. https://mygreatwebsite.com) to a URL B (e.g. https://myawesomewebsite.com).

You can do this easily with the Redirection plugin.

Google also states that, rather than blocking crawlers from accessing duplicate content on your website (using a robots.txt file, for example), you can use what is called a canonical URL.

By adding a specific tag to the HTML code of your page, you tell search engines which is the original version of a duplicate page.

This way, you ensure that it is this original version that will be taken into account for display in the results pages (rather than a duplicate version).

For your information, a canonical URL uses a little extra piece of HTML code, called rel="canonical". It looks like this, in practice:

<link rel="canonical" href="https://wpmarmite.com/en/astra-theme/" />
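If you want to check which canonical URL a page declares, you can read it straight from its HTML. Here is a minimal Python sketch using only the standard library. The class and function names are mine, and it assumes you already have the page's HTML as a string (copied from "View source", for instance).

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<head><link rel="canonical" href="https://wpmarmite.com/en/astra-theme/" /></head>'
print(find_canonical(page))
```

If the function returns None, the page does not declare any canonical URL.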

If you’re using the Yoast SEO plugin, you can fill in a canonical URL via the plugin’s editing interface:

Yoast SEO settings for setting up a canonical URL.

Note that by default, Yoast SEO adds the URL of the publication as a canonical URL. You will not have to do anything in most cases.

To learn how to set up Yoast SEO like a pro, go to our dedicated guide on the subject!

The pagination of comments

After URLs, let’s talk about a second cause of duplicate content on WordPress: comment pagination.

WordPress allows you to divide the comments left by your readers on a post into several pages.

On paper, this seems convenient for sites/blogs with lots of comments.

The reader can view the most recent comments first, and then choose to read older comments by going to another page.

This is where the problem lies. New URLs will automatically be created for each page, each time with the content of your post.

How to solve the comment pagination problem?

The main thing you can do is simply not to enable this option.

By default, it will not be enabled when you install WordPress. However, I invite you to verify this by going to the following menu: Settings > Discussion.

Make sure that the box “Break comments into pages, with 50 top level comments per page and the last page displayed by default” is unchecked.

Comment pagination settings on WordPress.

Tags

Since you’re on the WordPress admin interface, stay there, nice and warm.

Now let’s talk about tags, which are used to classify your posts (a bit like your categories, except that tags are optional).

Here again, the basic intention is good, if we look at it from the user’s point of view. A tag allows them to browse all your posts related to a specific subject (e.g. cinema).

Tags settings on WordPress.
You can create a tag through the menu Posts > Tags.

For your SEO, this is much more annoying, since WordPress generates new archive pages for each tag, which means that your post will end up on additional pages.

In other words, if you create 10 tags for the same post, that post will end up on 10 additional archive pages…

How to prevent duplicate content caused by tags?

The best solution is not to use tags. If you really want to use them, think carefully about the consequences that this can have.

Domain name variations

Finally, it’s also possible that your domain name can be accessed under several variants (HTTPS, HTTP, www and without www):

  • https://example.com
  • https://www.example.com
  • http://example.com
  • http://www.example.com

Consequence? Your site will be accessible in several ways, or put differently, it will be duplicated 4 times.

This can happen for example if you have just switched to HTTPS without having redirected the HTTP version.

To find out if this is the case for you, manually enter each variant of your domain name in your favorite browser.

If a variant does not redirect to the main version of your site (i.e. the one in HTTPS), you will have to get to work.
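If you would rather not type each variant by hand, the check can be scripted. The Python sketch below is an assumption-laden illustration (the function names are mine): it builds the four variants of a domain and sends a single HEAD request to each one, without following redirects, so that you can see the status code and the `Location` header returned.

```python
import http.client
from urllib.parse import urlsplit


def domain_variants(domain: str) -> list[str]:
    """Return the HTTPS/HTTP and www/non-www variants of a domain."""
    bare = domain.removeprefix("www.")
    return [
        f"{scheme}://{host}"
        for scheme in ("https", "http")
        for host in (bare, f"www.{bare}")
    ]


def first_response(url: str, timeout: float = 10.0):
    """Send one HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        connection.request("HEAD", parts.path or "/")
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def report(domain: str) -> None:
    """Print the first response of each variant (this performs network calls)."""
    for url in domain_variants(domain):
        print(url, "->", first_response(url))
```

Calling `report("example.com")` with your own domain should ideally show a 301 status pointing to your main version for three of the four variants, and a 200 status for the main version itself.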

Jim Carrey is frantically typing on his keyboard.

How to define a single variant for your domain name?

Until the switch to the new Google Search Console in 2019, it was possible to select a preferred domain in the free Google tool.

Now, the easiest way is to perform a 301 redirect. For this, you can for example use your cPanel interface, if your hosting company uses it. See our detailed instructions on the subject.

Speaking of domain names, we can only recommend that you read our complete guide to choosing a domain name.

Well, that covers internal duplication pretty well. Now, let’s talk about the measures to apply if you ever have to deal with external duplicate content.

You’ll see, we’ll take out the heavy artillery!

3 steps to get rid of external duplicate content

Red alert. You’re sure of it, your content has been duplicated. Once you’ve got past the stage of calling the offending site names, it’s time to take action.

In this case, what do you do? Do you yell your rage? Call the police or the fire department? Do you contact the FBI, or even the CIA, if the plagiarist is American?

Instead, just take a deep breath and follow the steps below, which should solve your problem.

Step 1: Contact the site owner

Before using the hard way, take it easy. First, try to find a peaceful way out of this annoying duplicate content problem.

Start by contacting the owner of the site that copied your content to find out what is going on.

You can find information about who they are and how to contact them in several places, such as:

  • The Contact page of their website.
  • The Terms of Service page.
  • The “Author” insert at the end of their publications.
  • Their social networks.
  • Your favorite search engine. For example, type in the first and last name of the person to see what comes up.
  • The database of domain names, the WHOIS.

Whois gives you information about the owner and the host, as well as technical details. You can also search for Whois domains with Gandi and Whois.net.

Whois.net is a database of domain names.
The homepage of Whois.net

Have you found an email address after your investigation? It’s time to write your best message, both polite and firm, detailing the situation.

Explain to the person that you have found a duplication of your content, and why not add screenshots and other tangible evidence.

Continue by indicating that this is copyright infringement (no one has the right to reproduce or distribute content without permission). Finish by asking the person to remove the plagiarized content.

Did you fail, despite your best efforts? Go to step 2.

Step 2: Contact the plagiarist’s host

So the person you identified as the infringer just won’t budge? Contacting their web host might make them give in. 😉

To do so, you have several options:

  • The contact details of the host should normally be on a Legal Notice page, on the website of the person who duplicated your content.
  • If not, you can find them thanks to the Whois.

When you have found the information you were looking for, send the same type of email you wrote in step 1, just adapting it to your recipient.

Web hosts are usually quite sensitive to duplicate content, and should help you. This problem has happened several times to WPMarmite, and contacting the host has helped Alex get copied-and-pasted posts removed. 😉

If you still haven’t got your way, it’s time to do it the hard way: find out in step 3.

Step 3: Report the page to Google

Take out the last card in your deck, to be used as a last resort: reporting to Google.

To ask Google to remove from its search results “the page that infringes your copyright”, the famous search engine indicates that you have to send it a DMCA (Digital Millennium Copyright Act) request.

For your information, this is an American law whose objective is to fight against copyright infringement.

In detail, here is how to proceed, in order:

  • Go to this page, and choose the concerned Google service (normally, it will be “Google Search”).
  • Check the box “Intellectual property issue”.
  • Select “Copyright infringement”.
  • Check “Yes, I am the copyright owner or am authorized to act on the copyright owner’s behalf”.
  • Select “other” when asked about the type of content that is being infringed.
  • Click on the blue button “Create request”.
  • Complete the form, date, sign and submit it.
Google's DMCA form for copyright infringement.
A preview of the DMCA form that Google suggests you fill out.

As you have seen throughout these lines, sooner or later you will have to deal with duplicate content, whether it is internal or external.

Although Google does not directly penalize this practice, duplicate content can have harmful consequences for your SEO strategy, with a drop in traffic and in your positions in the search engine results pages.

To tackle the problem head on, this post has detailed how to get rid of this plague in a concrete way, using tools and best practices.

How do you deal with duplicate content? Share your tips and feedback with us by posting a comment.