Canonicalization

Canonicalization is a method used to help prevent duplicate content issues and manage the indexing of URLs in search engines. Using canonicals appropriately can be hugely helpful for SEO.

Adding a link element with the attribute “rel=canonical” signals to search engines which page is preferred for indexing. Search engines will follow it in most cases, as long as it is implemented correctly and points to an equivalent page.
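When auditing canonicals, it helps to read the tag straight from a page's HTML. The Python sketch below uses only the standard library's `html.parser`. The names `CanonicalParser` and `find_canonical` and the example.com URLs are hypothetical. Returning `None` when a page declares several different canonicals is this sketch's own assumption, chosen because conflicting canonical tags are not a reliable signal.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; attribute values can be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if absent or conflicting.

    Treating conflicting canonicals as "no usable canonical" is an assumption
    of this sketch, not a documented search engine rule.
    """
    parser = CanonicalParser()
    parser.feed(html)
    if len(set(parser.canonicals)) == 1:
        return parser.canonicals[0]
    return None


page = '<head><link rel="canonical" href="https://example.com/shoes/"></head>'
print(find_canonical(page))  # https://example.com/shoes/
```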

The collected SEO Office Hours notes below provide detailed information and best practices (straight from Google’s own search experts) for using canonicals on your website.

Order of Content for Canonicalization Doesn’t Matter

October 10, 2014

When Google checks whether pages are similar for the purpose of verifying canonicalization, the order of the content on the page doesn’t matter. Google can detect when the same content appears in a different order on a page, for example, an identical set of search results listed in a different order.
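As a toy illustration of what “order doesn’t matter” means, the Python sketch below treats each page as a list of text blocks (such as search result snippets) and compares them as multisets, so the same blocks in a different order count as the same content. This is not how Google actually compares pages; the function names and data are hypothetical.

```python
from collections import Counter
from typing import List


def _normalise(block: str) -> str:
    # Collapse whitespace and case so trivial formatting differences don't count.
    return " ".join(block.split()).lower()


def same_content_any_order(blocks_a: List[str], blocks_b: List[str]) -> bool:
    """True if both pages contain the same blocks, ignoring their order.

    Counter compares the blocks as multisets, so repeated blocks must
    appear the same number of times on both pages.
    """
    return Counter(map(_normalise, blocks_a)) == Counter(map(_normalise, blocks_b))


page_1 = ["Red shoes", "Blue shoes", "Green shoes"]
page_2 = ["Green shoes", "Red shoes", "Blue shoes"]
print(same_content_any_order(page_1, page_2))      # True
print(same_content_any_order(page_1, page_2[:2]))  # False
```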


Use Disallow to Improve Crawling Efficiency

October 10, 2014

John recommends against using robots.txt to block duplicate URLs, because Google can’t crawl a blocked URL and so can’t consolidate its authority signals with the preferred version. However, he says there are occasions when crawling efficiency is more important.
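The trade-off is easier to see with a concrete robots.txt. A disallowed URL is never fetched, so any canonical tag on it can’t be read either. The sketch below uses Python’s standard `urllib.robotparser` to test which URLs a hypothetical robots.txt blocks; the rules and example.com URLs are made up. Note that Python’s parser does plain prefix matching and may not match Googlebot’s handling of wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that trades canonical consolidation for crawl
# efficiency by blocking internal search and sorted listing URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /products/sort/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://example.com/products/shoes/",
    "https://example.com/products/sort/price-asc/",
    "https://example.com/search?q=shoes",
):
    status = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, status)
```

Any rel=canonical on the two blocked URLs would go unseen, which is the cost John describes.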


Hreflang URLs Should Always Be Canonical URLs

October 10, 2014

Don’t include any URLs in hreflang annotations that redirect, are non-indexable, or are canonicalised to another URL; otherwise, the annotations might be ignored.
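One way to apply this rule is to check every hreflang target against crawl data before publishing. The Python sketch below assumes a hypothetical `PageInfo` record per crawled URL (status code, indexability, declared canonical); the field names, the `hreflang_problems` function and the example data are illustrative, not the output format of any particular crawler.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageInfo:
    """Hypothetical crawl data for one URL (field names are illustrative)."""
    status: int               # HTTP status code returned for the URL
    indexable: bool           # False if, e.g., the page carries a noindex directive
    canonical: Optional[str]  # href of rel="canonical", or None if absent


def hreflang_problems(hreflang_urls: List[str],
                      crawl: Dict[str, PageInfo]) -> Dict[str, str]:
    """Map each problematic hreflang URL to the reason it may be ignored."""
    problems: Dict[str, str] = {}
    for url in hreflang_urls:
        page = crawl.get(url)
        if page is None:
            problems[url] = "not crawled"
        elif 300 <= page.status < 400:
            problems[url] = "redirects"
        elif page.status != 200:
            problems[url] = f"returns HTTP {page.status}"
        elif not page.indexable:
            problems[url] = "non-indexable"
        elif page.canonical is not None and page.canonical != url:
            problems[url] = f"canonicalised to {page.canonical}"
    return problems


crawl = {
    "https://example.com/en/": PageInfo(200, True, "https://example.com/en/"),
    "https://example.com/de/": PageInfo(301, True, None),
    "https://example.com/fr/": PageInfo(200, True, "https://example.com/en/"),
}
print(hreflang_problems(list(crawl), crawl))
# {'https://example.com/de/': 'redirects', 'https://example.com/fr/': 'canonicalised to https://example.com/en/'}
```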


Hreflang Should Canonicalise to Preferred HTTP/HTTPS Variation

September 22, 2014

When you have multiple language sites with hreflang, and HTTP and HTTPS versions of those sites, you don’t need to worry about hreflang on the non-canonical version. So if you canonicalise from HTTP to HTTPS, you don’t need any hreflang on the HTTP version.
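A minimal Python sketch of that setup follows. It assumes the HTTP and HTTPS versions share identical paths and that the HTTP copy is canonicalised with a rel=canonical tag rather than redirected. `ALTERNATES`, `https_canonical` and `head_tags` are hypothetical names, and the example.com URLs are made up. Only the canonical HTTPS page gets hreflang annotations; the HTTP page carries just its canonical tag.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit

# Hypothetical language versions of one page; only the HTTPS URLs are canonical.
ALTERNATES = {
    "en": "https://example.com/en/",
    "de": "https://example.com/de/",
}


def https_canonical(url: str) -> str:
    """Map a URL to its HTTPS equivalent, keeping host, path and query.

    Assumes HTTP and HTTPS serve the same paths; any #fragment is dropped.
    """
    parts = urlsplit(url)
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, ""))


def head_tags(request_url: str) -> List[str]:
    """Build the <head> link tags for the requested URL."""
    canonical = https_canonical(request_url)
    tags = [f'<link rel="canonical" href="{canonical}">']
    # hreflang is only needed on the canonical HTTPS version.
    if request_url == canonical:
        tags += [
            f'<link rel="alternate" hreflang="{lang}" href="{href}">'
            for lang, href in ALTERNATES.items()
        ]
    return tags


print(head_tags("http://example.com/en/"))
# ['<link rel="canonical" href="https://example.com/en/">']
print(len(head_tags("https://example.com/en/")))  # 3
```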


Canonicalised Pages Stay in Google’s Index

August 29, 2014

Canonicalised pages may still show as indexed in site: searches, depending on the ‘site structure’. A canonical tag is not treated as strictly as a redirect, so a canonicalised page can still surface in search results for content that is unique to it. The canonical URL is also not crawled immediately, as the target of a redirect would be. John suggests that if a site has a large number of incorrect canonical tags, such as many pages canonicalising to a single page, Google might ignore all canonical tags across the site. Google clearly recommends cleaning up broken canonical tags.
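To spot the “many pages canonicalising to a single page” pattern, one option is to count how often each URL is named as the canonical of some other page. In the Python sketch below, the input mapping, the `canonical_hotspots` name and the `threshold` value are all hypothetical; Google has not published a number, so the result is a list of URLs to review, not proof of an error (many parameter variants legitimately pointing at one page is normal).

```python
from collections import Counter
from typing import Dict


def canonical_hotspots(canonicals: Dict[str, str], threshold: int) -> Dict[str, int]:
    """Return URLs that at least `threshold` other pages name as canonical.

    `canonicals` maps each crawled page URL to the href of its rel="canonical".
    Self-referencing canonicals are skipped; `threshold` is an arbitrary audit
    cut-off chosen for illustration.
    """
    targets = Counter(
        target for page, target in canonicals.items() if target != page
    )
    return {url: n for url, n in targets.most_common() if n >= threshold}


pages = {
    "https://example.com/": "https://example.com/",
    "https://example.com/shoes/": "https://example.com/",
    "https://example.com/hats/": "https://example.com/",
    "https://example.com/bags/": "https://example.com/",
    "https://example.com/shoes/?sort=price": "https://example.com/shoes/",
}
print(canonical_hotspots(pages, threshold=3))  # {'https://example.com/': 3}
```

Here three distinct category pages all point at the homepage, which is the kind of likely-incorrect canonical worth cleaning up.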

