Many content marketers tremble at the thought of duplicate content, and for good reason. Duplicate content rarely makes for an ideal experience on your site, and it can hurt your ability to rank.
But the fact of the matter is that a lot of people have duplicate content on their sites, and it usually isn’t intentional or malicious in any way, shape, or form.
This isn’t a reason to panic, but it is a reason to take corrective action because duplicate content can impact your SEO (just not in the way that you might think). Continue reading below to learn more about duplicate content, how Google regards it, and how to correct any duplicate content issues on your site.
What is duplicate content?
Put simply, duplicate content is substantive blocks of content that can be found in more than one place on the internet–whether it’s found on multiple pages of your own website or it’s found on two or more different domains.
The term covers content that either matches other content exactly or is very similar to it. So, even if the wording isn’t identical, content can still be deemed duplicate if it is similar enough.
Does Google penalize your website for duplicate content?
Many people are under the impression that Google penalizes a website for having duplicate content. On the contrary, Google themselves have confirmed that they do not penalize a website for duplicate content except on rare occasions when you’re trying to game the algorithm.
“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.” – Google Search Central
However, duplicate content still impacts your search engine rankings and can be a detriment to your SEO efforts.
(P.S. If you’re wondering if that block quote will trigger “duplicate content” warnings, fear not. This element uses a blockquote HTML tag that Google can read and understand as a quote versus text that we’d typically include in an article.)
So, why does duplicate content matter?
There are several reasons why duplicate content matters to search engines. But before we dive into the specifics, it’s important to understand the purpose of a search engine.
The main goal of any search engine is to provide its users with accurate websites and information for their given search query. Search engines are able to do this by looking at pages with unique URLs and crawling the content of that page. So, if two URLs share the same text and information, how will Google choose which version of that content to show?
If you’re scratching your head trying to figure out the answer, then you know exactly how search engines feel when encountering duplicate content. Duplicate content simply confuses them.
To get more specific, here are the three main issues that duplicate content presents for search engines:
- When duplicate content is present on your website, search engines won’t know which version of the content to include or exclude from their indices.
- Search engines won’t know whether to direct link “juice” (such as trust and link equity) to one page or keep it separated between multiple versions. If link juice is spread out between multiple pages on your site, then you’re hurting your chances of ranking higher for your target keyword.
- As described above, they won’t know which version of the content to display or rank for a search query.
Your pages with similar content are essentially competing against each other for Google’s attention. Typically, the page that has the most traffic and SEO equity (i.e., authority) will be prioritized in search engines, and the other page(s) with duplicate content will not appear in SERPs at all because the search engine will not want to show two pages with the same content for the same query.
A note about “spinning” content
Beware: Google does, in fact, penalize your website if you spin content in an auto-generated way. Google sees auto-generated “spun” content as deceptive: it is intended solely for SEO gains, not to help the user. So, when rewriting your content, don’t “spin” it. Make sure that the new content is genuinely unique.
Common causes of duplicate content (that aren’t intentional)
Oftentimes, you might be surprised to find duplicate pages on your site. This can happen for one of several reasons:
- eCommerce product pages: Many eCommerce sites may have descriptions that are shared between product variants (which all have their own URLs), or use a manufacturer’s description for products that they (plus many other retailers) sell.
- CMS categorization: While search engines regard URLs as a page’s unique identifier, content management systems (CMS) may assign an ID that’s unique to its database. For this reason, an article may be reachable at both “www.site.com/title” and “www.site.com/category/title”, and search engines pick up both URLs (even though, from a developer’s point of view, both refer back to one item in your database).
- URL parameters or session IDs: If you’re appending tracking tags to the end of your URLs, then you’re essentially creating unique URLs that refer back to the same content. Tracking tags are commonly used by content strategists who want to analyze traffic, and by ecommerce sites that use session IDs to track visitors and let them store items in a shopping cart (as an example). You don’t necessarily want to abolish these practices but do want to control their usage.
- Printer-friendly versions: If your CMS creates printer-friendly versions of your pages, then search engines can discover and regard these two URLs as duplicate content.
- Pagination: If you decide to paginate category pages or comments within a page, then it’s possible to wind up with duplicate elements across multiple pages with unique URLs.
- HTTP vs. HTTPS: If both http and https URLs are active (or if versions of your URL with and without the prefix “www” are active), then you’re looking at two separate URLs with shared content.
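Several of these causes come down to one page being reachable at more than one URL. As a rough illustration, here is a minimal Python sketch that collapses URL variants into a single form so you can see which ones point at the same content. The parameter names in TRACKING_PARAMS, and the choices to prefer https, drop “www”, and ignore trailing slashes, are assumptions made for this example. Adjust them to match how your site is actually set up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that only track visitors and never change the page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "sessionid", "sid",
}


def normalize_url(url):
    """Reduce a URL to one comparable form (https, no www, no tracking tags)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only the query parameters that can change what the page shows.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_variants(urls):
    """Return {normalized URL: [original URLs]} for URLs that collapse together."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Feeding this a list of URLs exported from a crawl or an analytics report would surface groups such as an http and an https version of the same article, which are candidates for a canonical tag or a redirect.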
How to resolve duplicate content issues
So, how do you find and resolve duplicate content?
There are a few quick-and-easy ways to detect duplicate content issues.
- Search operators: Type “site:yoursite intitle:keyword” into Google to find pages on your site that have similar or shared titles. You can also search “intitle:yourtitle” to find external webpages that match your title. A third option is to search “site:your_exact_url” to see if any identical pages come up.
- Duplicate Content Checker: This free online tool enables you to enter a URL and spot pages that share the same content or show signs of plagiarism.
- Google Search Console or Google Analytics: Using either of these free tools, you can check if two versions of the same URL appear in your reports. At this point, you’ll want to look out for URLs that are similar except for their tracking tags, http vs. https, etc.
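If you already have the text of your pages on hand (from a crawl export, for example), you can also run a rough check yourself. The sketch below is one possible approach, not how Google or any particular tool works: it flags exact matches with a hash and near matches with Python’s built-in difflib. The 0.9 threshold and the function names are assumptions chosen for illustration.

```python
import hashlib
import re
from difflib import SequenceMatcher


def normalize_text(text):
    """Lowercase the text and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text):
    """Hash of the normalized text; identical content gives identical hashes."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def similarity(a, b):
    """Score from 0.0 (nothing shared) to 1.0 (same normalized text)."""
    return SequenceMatcher(None, normalize_text(a), normalize_text(b)).ratio()


def find_duplicates(pages, threshold=0.9):
    """pages maps URL -> page text. Returns (url_a, url_b, score) for similar pairs."""
    urls = sorted(pages)
    results = []
    for index, first in enumerate(urls):
        for second in urls[index + 1:]:
            if fingerprint(pages[first]) == fingerprint(pages[second]):
                score = 1.0  # exact duplicate after normalization
            else:
                score = similarity(pages[first], pages[second])
            if score >= threshold:
                results.append((first, second, score))
    return results
```

Comparing every pair of pages gets slow on a large site, so treat this as a spot check for a handful of suspect pages rather than a full audit.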
Rewrite your pages
When it comes to deduping your content, the best way is to simply rewrite your content to be unique. This allows you to create original, high-quality information.
However, having unique content on every page of your website isn’t always realistic. For example, you might have an eCommerce site with many variation listings, or you might use pagination to aid the UX on your website. In these instances, the best solution is to implement a rel="canonical" tag so Google and other search engines understand which page is the preferred version. Search engines treat the tag as a strong hint, so the preferred version is the one they will generally crawl and index.
You’re basically saying to Google, “Okay, I admit. We have several pages with the same content, but this is the page we want you to pay attention to.”
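In practice, the canonical tag is a link element in the head of the page, written as link rel="canonical" with an href that points to the preferred URL. To check which canonical URL a page actually declares, something like the following Python sketch could work. It uses only the standard library, and the sample page and its URL are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so check each one.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical product-variant page that points at its preferred version.
VARIANT_PAGE = """<html><head>
<title>Blue widget, large</title>
<link rel="canonical" href="https://www.site.com/widgets/blue">
</head><body>...</body></html>"""

print(find_canonical(VARIANT_PAGE))
```

Running a check like this across your variant or paginated pages is a quick way to confirm that each one names the page you want search engines to pay attention to.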
In another instance, you might want to replace an old page or want all of your http links to point to https. You would then use a 301 redirect to pass all traffic and SEO value to your current page. You essentially consolidate each version of your page into one. However, avoid chaining redirects one after another: each extra hop slows down visitors and crawlers, so point every old URL directly at its final destination. In some cases, you may want to delete a page altogether.
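To keep redirects from piling up over time, it can help to review your redirect rules as a whole. Here is a small Python sketch, assuming your rules can be exported as a simple mapping of old path to new path (the function name and the paths are hypothetical). It rewrites every rule so that it points straight at the final destination, and it raises an error if two rules redirect to each other.

```python
def flatten_redirects(redirects):
    """Map every old URL directly to its final destination.

    redirects is a dict of {old_url: new_url}. Raises ValueError on a loop.
    """
    flattened = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until we reach a URL that is not redirected again.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[start] = target
    return flattened


# Hypothetical rules: "/old" was redirected twice over the years.
rules = {"/old": "/newer", "/newer": "/newest"}
print(flatten_redirects(rules))
```

The flattened mapping can then be used to update the rules in your server or CMS so that each old URL needs only a single 301 hop.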
Need a cleanup crew to help you out?
It’s best practice to check for duplicate content regularly, but we know that this process can be tedious, especially if you have a large site. If you need an extra hand, RankScience’s team can help you to audit your site and handle duplicate content, as well as other potential issues that may be hurting your SEO growth. Contact us for a free consultation.
Editor’s note: This blog was originally published in January 2020 but has since been edited to include more up-to-date information and advice.