We all know that we shouldn’t be copying content from other websites. But did you know that duplicate content can happen naturally and, while it shouldn’t cause a penalty, it can still limit your site’s potential success?
Thankfully, there are fixes available for most duplicate content issues.
“25% of all content is duplicate… not all of it is spam”
Matt Cutts, Former Head of Google Webspam
Why Is Duplicate Content a Problem?
Duplicate content means search engines have to waste time crawling all the different duplicate versions of a page, and you’re relying on them to do it in the way you want them to.
Having two or more versions of the same content also means authority signals (e.g. backlinks) and social shares are split across those versions. Each version is then weaker, which limits its performance.
If you have duplicate content on your site, you are effectively leaving the health of your website in the hands of Google’s system.
How Do I Know If I've Got Duplicate Content on My Site?
Duplication might occur if you have:
- Two domains (eg. example.com and example.net)
- One or more subdomains (eg. a desktop site at www.example.com and a mobile version at m.example.com)
- Similar content for different regional/international audiences
- Duplicate or very similar content on different pages without a canonical tag (eg. printer-friendly versions, identical tag pages)
- Duplicate meta titles and descriptions on separate pages
- Different URLs linking to the same page (eg. different parameters; /blog and /blog/)
- Shared content across different domains
What does Google say?
Google defines duplicate content as identical or ‘appreciably similar’ blocks of content across the same domain or multiple domains. The causes of this duplication do not have to be malicious for the content to be filtered out of the search results.
TAKE BACK CONTROL: HOW TO FIX 7 COMMON DUPLICATION ISSUES
SET YOUR PREFERRED DOMAIN OPTIONS
Your user sees the same site on slightly different domains, but Google sees two different sites with duplicate content. This can be caused by:
- Domain variations, e.g. example.com and example.net.
- No set preferred www or non-www option.
- Secure (https) and non-secure (http) sections/versions of the same site.
- A mobile version of your site on a subdomain (eg. m.example.com)
- Staging websites, e.g. beta.example.com
Use one of the following methods to tell Google which version of your site you want to be treated as the primary version:
- Use a 301 redirect to drive traffic from the secondary version to the primary one. This will redirect both users and search engines to the primary version.
- Alternatively, use an absolute URL in a canonical tag. This will tell search engines to rank the primary version in search results, but users will be unaffected.
- Use Webmaster Tools to tell Google about your preferred www/non-www variant.
- Use a canonical tag to tell search engines which http/https, subdomain or domain variant they should show in search results.
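As a rough illustration of the canonical-tag option, the Python sketch below maps any known variant of a URL to one primary version and builds a canonical link element with an absolute URL. The preferred scheme and host, the list of variant hosts and the function names are all assumptions made for the example, not settings from any particular site or tool.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed primary version of the site (illustrative values only).
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
# Hypothetical set of hosts that all serve the same content.
VARIANT_HOSTS = {"example.com", "www.example.com", "example.net", "www.example.net"}


def primary_url(url):
    """Map a URL on any known variant host to the preferred scheme and host."""
    parts = urlsplit(url)
    if parts.hostname not in VARIANT_HOSTS:
        return url  # not one of our domains, so leave it untouched
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def canonical_tag(url):
    """Build a canonical link element that uses an absolute URL."""
    return '<link rel="canonical" href="{}">'.format(escape(primary_url(url), quote=True))


print(canonical_tag("http://example.net/shoes?size=9"))
```

The same mapping could also supply the target of a 301 redirect, since both methods need to agree on which version is the primary one.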
SORT INTERNATIONAL SEO ISSUES WITH HREFLANG
If you have a website for each country with only minor differences, such as a currency variation, then search engines might treat each version as a separate, and duplicate, site.
Implement an hreflang tag, which tells Google which version of your site should be shown in which country and can potentially consolidate authority signals.
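To make this more concrete, here is a minimal Python sketch that generates the hreflang link elements for a set of regional versions. The function name, the language/region codes and the URLs are illustrative assumptions. Each regional page would normally carry the full set of tags, including one that points to itself.

```python
from html import escape
from typing import Dict, List, Optional


def hreflang_tags(versions: Dict[str, str], default: Optional[str] = None) -> List[str]:
    """Return one <link rel="alternate"> element per language/region version.

    `versions` maps an hreflang code (e.g. "en-gb") to the absolute URL of
    that version. `default` is an optional fallback URL for x-default.
    """
    tags = [
        '<link rel="alternate" hreflang="{}" href="{}">'.format(escape(code), escape(url))
        for code, url in sorted(versions.items())
    ]
    if default is not None:
        tags.append(
            '<link rel="alternate" hreflang="x-default" href="{}">'.format(escape(default))
        )
    return tags


# Hypothetical UK and US versions of the same product page.
for tag in hreflang_tags(
    {"en-gb": "https://example.com/uk/shoes", "en-us": "https://example.com/us/shoes"},
    default="https://example.com/shoes",
):
    print(tag)
```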
DISALLOW IDENTICAL CONTENT ON SEPARATE URLS
Content can easily be duplicated on multiple URLs within the same site, or even on separate sites if it’s been syndicated, shared across domains or plagiarized. Some common examples include:
- Providing a printer-friendly version on a separate URL
- Content (such as product/job/property descriptions) shared for multiple sites
- Syndicated content
- Contributors who plagiarize content, or other sites copying your content
- Using too many tag pages about the same subject (eg. blog, blogging, blogger)
- URLs spelled with capital letters in some places
- Variations in the way a URL is presented (eg. /news?page=1&order=recent and /news?order=recent&page=1)
- The same path repeated twice (eg. /news/news/)
- Variations in the way a URL can be terminated (eg. /news.html and /news.aspx)
First, resolve the issues with 301 redirects, to direct users to the right version of your content and help Google index the preferred version.
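Before setting up redirects, it helps to decide what the single "right" form of each URL is. The Python sketch below shows one possible normalisation: lowercase the host and path, collapse a path segment that is repeated straight after itself, drop the trailing slash and sort the query parameters. These rules and the function name are assumptions for illustration. Your own site may need different rules, for example if its paths are case-sensitive.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_url(url):
    """Return one normalised form for common URL variations (illustrative rules)."""
    parts = urlsplit(url)
    # Lowercase the path (assumes the site treats paths case-insensitively).
    segments = [s for s in parts.path.lower().split("/") if s]
    # Collapse a segment repeated immediately after itself: /news/news/ -> /news
    deduped = [s for i, s in enumerate(segments) if i == 0 or s != segments[i - 1]]
    path = "/" + "/".join(deduped)
    # Sort query parameters so their order no longer creates distinct URLs.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, query, ""))


a = normalise_url("https://example.com/News/news/?page=1&order=recent")
b = normalise_url("https://example.com/news?order=recent&page=1")
print(a == b)
```

Any URL whose normalised form differs from the requested URL is a candidate for a 301 redirect to the normalised form.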
Then implement the following safety measures:
- Implement a canonical tag on your pages to tell Google which is the preferred URL of the page.
- Change your Webmaster Tools parameter settings to exclude any parameters that don’t generate unique content.
- Disallow incorrect URLs in robots.txt to improve crawling efficiency.
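If you do add disallow rules, it is worth checking that they block only what you intend. The sketch below uses Python's standard urllib.robotparser module to test URLs against a set of rules. The rules and paths shown are made up for the example and are not a recommended robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking printer-friendly pages and a repeated path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /news/news/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in ("https://example.com/print/article-1", "https://example.com/news/"):
    print(url, is_crawlable(url))
```

Note that this standard-library parser uses simple prefix matching, so its answers may differ from Google's for rules that rely on wildcards.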
You can use CopyScape to check that your content has not been plagiarized.
REMOVE NEAR-IDENTICAL VARIATIONS OF THE SAME PAGE FROM GOOGLE’S INDEX
If you have the same content on several pages with minor variations, such as separate product pages for different colours, then the content will appear as a duplicate.
Choose the best variation for Google to index (ideally the one that receives the most search traffic) and remove any other duplicate variations from Google’s index.
- Identify variations with low search traffic in analytics.
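- Use robots.txt to prevent Google from crawling variations that should not be crawled at all.
- Use a canonical tag to direct Google to one primary variation. Google can only read a canonical tag on a page it is allowed to crawl, so don't block that page in robots.txt as well.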
You can link to the other variations from the primary page (using rel="nofollow" in the link tag) to ensure your customers can still reach your content when they get to your site.
REPLACE DUPLICATE TITLES AND DESCRIPTIONS
If you have the same titles and descriptions on separate pages, then Google may ignore them or generate its own, which can look untidy and can affect click-through rates.
Use tools like Google Webmaster Tools and DeepCrawl to identify duplicate titles and descriptions, and edit them in your CMS.
Remember, Google will only show 512 pixels of your title in the search result, so try to include the unique elements of any title tag in the first few words.
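If you have a list of URLs and their titles (from a crawl export, for example), spotting duplicates is simple to script. The Python sketch below groups pages by title after ignoring case and extra whitespace. The input format and the function name are assumptions for illustration, not the output of any specific tool.

```python
from collections import defaultdict


def duplicate_titles(pages):
    """Group URLs that share a title, ignoring case and repeated whitespace.

    `pages` maps each URL to its title. Returns only titles used more than once.
    """
    groups = defaultdict(list)
    for url, title in pages.items():
        key = " ".join(title.split()).lower()
        groups[key].append(url)
    return {title: sorted(urls) for title, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl data.
pages = {
    "/shoes/red": "Red Shoes | Shop",
    "/shoes/red-2": "red shoes  | shop",
    "/shoes/blue": "Blue Shoes | Shop",
}
print(duplicate_titles(pages))
```

The same approach works for meta descriptions.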
Duplicate links could have diminishing value and, in some cases, be ignored completely.
However, a high volume of duplicated backlinks to your site could mean your site is mistaken for spam, resulting in a dreaded Google penalty.
Identify duplicate links using your favourite backlink tool (DeepCrawl has one built in) and contact the site to ask for the links to be removed.
Treat each unique anchor text and target URL as a separate backlink. The link could be duplicated on the same page or multiple pages.
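The counting rule above can be sketched in a few lines of Python: each backlink is reduced to its (anchor text, target URL) pair, and any pair seen more than once is flagged. The tuple layout of the input and the function name are assumptions, and real backlink exports will need mapping into this shape first.

```python
from collections import Counter


def duplicate_backlinks(links):
    """Count repeated (anchor text, target URL) pairs.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples.
    Returns the pairs that occur more than once, with their counts.
    """
    counts = Counter(
        (anchor.strip().lower(), target) for _source, anchor, target in links
    )
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical backlink data.
links = [
    ("https://blog.example.org/a", "Cheap Shoes", "https://example.com/shoes"),
    ("https://blog.example.org/b", "cheap shoes", "https://example.com/shoes"),
    ("https://news.example.org/c", "Example Shop", "https://example.com/"),
]
print(duplicate_backlinks(links))
```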
FIND AND AVOID DUPLICATE CONTENT ISSUES WITH DEEPCRAWL
One ‘bad’ URL can cause big problems and affect all the other good SEO work you’re doing, so you need an accurate and reliable tool that will alert you to any issues before they start affecting traffic.
Here are just a few of the ways that DeepCrawl can help you identify and fix common duplicate content issues:
IDENTIFY DOMAIN DUPLICATION
DeepCrawl provides a report that includes all domain duplication variations and will identify any full site duplication.
Our other tool, Robotto, can detect whether a preferred www or non-www option has been configured correctly.
DETECT HREFLANG INCONSISTENCIES
DeepCrawl’s powerful hreflang report can detect hreflang values in Sitemaps, headers and HTML, showing any inconsistencies.
- Validation > Pages with hreflang Tags
- Validation > Pages without hreflang Tags
- Validation > Inconsistent hreflang Tags
- Page view > hreflang
FIND DUPLICATE PAGES ON SEPARATE URLS
The Indexation > Duplicate pages report highlights very similar pages across the site.
FIND SIMILAR CONTENT ON SEPARATE PAGES
The Content > Duplicate Body Content report looks at the body content only, to find pages which have different titles and HTML but very similar body text.
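For a feel of how "very similar body text" can be measured, here is a small Python sketch that compares two texts by their overlapping three-word sequences (shingles) and returns a Jaccard similarity between 0 and 1. This is a generic technique chosen for illustration. It is not a description of how DeepCrawl's report works, and the shingle size is an assumption.

```python
def shingles(text, size=3):
    """Return the set of consecutive `size`-word sequences in `text`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets (1.0 means identical sets)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(similarity("the quick brown fox jumps", "the quick brown fox sleeps"))
```

Pairs of pages scoring above a threshold you choose could then be reviewed by hand.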
FIND DUPLICATE TITLES AND DESCRIPTIONS
Content > Duplicate Titles and Content > Duplicate Descriptions highlight the pages which have unique body content but whose titles or descriptions are identical to those of another page.
Indexation > Unique Pages > [Page] > Links In shows you all internal links to a page so you can easily see the level of uniqueness across the links.
Run a backlinks crawl to see all the external backlinks to a page and identify a potentially large number of duplicate links which might be unnatural and could result in a penalty.
Want More Like This?
We hope that you’ve found this post useful in learning more about duplicate content issues.
You can read more about identifying duplicate content in our post about advanced methods for duplicate content detection.