Order of Content for Canonicalization Doesn’t Matter
When Google checks whether pages are similar for the purpose of verifying canonicalization, the order of the content on the page doesn’t matter. Google can detect when the same content appears in a different order on a page, for example a set of identical search results sorted differently.
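As a rough illustration of what "order doesn’t matter" means, the sketch below compares two pages as unordered collections of content blocks. This is not Google’s algorithm, which isn’t described here; the function name and the idea of splitting a page into text blocks are assumptions made purely for the example.

```python
from collections import Counter
from typing import Iterable


def same_content_ignoring_order(blocks_a: Iterable[str], blocks_b: Iterable[str]) -> bool:
    """Return True if both pages contain the same blocks, in any order.

    Hypothetical helper: each page is represented as a list of text blocks
    (for example, one block per search result). Blocks are normalised by
    trimming whitespace and lowercasing before being counted.
    """
    def normalise(blocks: Iterable[str]) -> Counter:
        return Counter(block.strip().lower() for block in blocks)

    return normalise(blocks_a) == normalise(blocks_b)


# The same three results, sorted differently on each page.
page_sorted_by_price = ["Red shoes", "Blue shoes", "Green shoes"]
page_sorted_by_name = ["Blue shoes", "Green shoes", "Red shoes"]
print(same_content_ignoring_order(page_sorted_by_price, page_sorted_by_name))
```

A real similarity check would be fuzzier than an exact match, but the point stands: reordering the same items alone does not make a page look different.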
Use Disallow to Improve Crawling Efficiency
John generally recommends against blocking URLs with robots.txt, because it prevents Google from consolidating authority signals, but says there are occasions when crawling efficiency is more important.
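For reference, a disallow rule can be checked with Python’s standard urllib.robotparser module before it goes live. The sketch below is only an illustration: the /search/ path and the example.com URLs are assumptions, not rules John recommended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://example.com/search/?q=shoes"))
print(is_crawlable("https://example.com/products/shoes"))
```

Bear in mind the trade-off described above: a disallowed URL can’t be crawled, so any signals pointing at it can’t be consolidated onto another page.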
Hreflang URLs Should Always Be Canonical URLs
Don’t include any URLs in hreflang annotations that redirect, are non-indexable, or are canonicalised to another URL; otherwise the annotations might be ignored.
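A simple audit of hreflang targets might look like the sketch below. The PageInfo structure and its field names are hypothetical stand-ins for whatever your crawler reports, and the checks only cover the three cases mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageInfo:
    """Hypothetical crawl record for a URL referenced in hreflang."""
    url: str
    status: int = 200
    noindex: bool = False
    canonical: Optional[str] = None  # None means the page has no canonical tag


def hreflang_problem(page: PageInfo) -> Optional[str]:
    """Return a short reason the URL is unsuitable for hreflang, or None if it looks fine."""
    if 300 <= page.status < 400:
        return "redirects"
    if page.status != 200 or page.noindex:
        return "non-indexable"
    if page.canonical is not None and page.canonical != page.url:
        return "canonicalised to another URL"
    return None


pages = [
    PageInfo("https://example.com/en/"),
    PageInfo("https://example.com/de", status=301),
    PageInfo("https://example.com/fr/", noindex=True),
    PageInfo("https://example.com/es/?ref=nav", canonical="https://example.com/es/"),
]
for page in pages:
    print(page.url, "->", hreflang_problem(page) or "ok")
```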
Hreflang Should Canonicalise to Preferred HTTP/HTTPS Variation
When you have multiple language sites with hreflang, and both HTTP and HTTPS versions of those sites, you don’t need to worry about the hreflang for the non-canonical version. So if you canonicalise from HTTP to HTTPS, you don’t need any hreflang on the HTTP pages.
Canonicalised Pages Stay in Google’s Index
Canonicalised pages may continue to show as indexed in site: searches, depending on the ‘site structure’. A canonical tag is not treated as strongly as a redirect, and the canonicalised page can still surface for its unique content. Canonical targets are also not crawled immediately, as a redirect target would be. John suggests that if a site has a large number of incorrect canonical tags, such as many pages canonicalising to a single page, Google might ignore all canonical tags across the site. Google clearly recommends cleaning up broken canonical tags.
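One way to spot the "many pages canonicalising to a single page" pattern in crawl data is to count how many distinct pages point at each canonical target. The sketch below assumes a simple mapping of page URL to canonical URL; the function name and the threshold are arbitrary choices for illustration, not values Google has published.

```python
from collections import Counter
from typing import Dict


def heavily_targeted_canonicals(canonicals: Dict[str, str], threshold: int = 3) -> Dict[str, int]:
    """Return canonical targets that at least `threshold` other pages point to.

    `canonicals` maps each page URL to the URL in its canonical tag.
    Self-referencing canonicals are ignored, since they are expected.
    """
    counts = Counter(
        target for source, target in canonicals.items() if target != source
    )
    return {target: n for target, n in counts.items() if n >= threshold}


crawl = {
    "https://example.com/": "https://example.com/",
    "https://example.com/a": "https://example.com/",
    "https://example.com/b": "https://example.com/",
    "https://example.com/c": "https://example.com/",
    "https://example.com/d?sort=price": "https://example.com/d",
}
print(heavily_targeted_canonicals(crawl))
```

A high count isn’t necessarily wrong (parameterised URLs often share one canonical), so the flagged targets are candidates for manual review rather than confirmed errors.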