Use rel="canonical" or robots.txt instead of the nofollow attribute for internal linking
A question was asked about whether it was appropriate to use the nofollow attribute on internal links to avoid unnecessary crawl requests for URLs that you don’t wish to be crawled or indexed.
John replied that it’s an option, but it doesn’t make much sense to do this for internal links. In most cases, it’s recommended to use the rel=canonical tag to point at the URLs you want to be indexed instead, or use the disallow directive in robots.txt for URLs you really don’t want to be crawled.
He suggested first working out whether there is a page you would prefer to have indexed; if so, use the canonical. If the URLs are causing crawling problems, consider robots.txt instead. He clarified that with the canonical, Google would first need to crawl the page, but over time would focus on the canonical URL and use that primarily for crawling and indexing.
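As a rough illustration of the robots.txt route, the sketch below uses Python's standard `urllib.robotparser` to check which URLs a set of disallow rules would block. The rules, paths, and `example.com` URLs are placeholders rather than anything from the hangout, and the standard-library parser only approximates how Googlebot interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are placeholders for URLs
# you really don't want crawled (e.g. internal search results).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cart/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/lessons/",
        "https://example.com/search/?q=piano",
    ):
        print(candidate, is_crawlable(candidate))
```

Note that a disallowed URL cannot be crawled at all, so Google would never see a canonical placed on it; the two approaches are alternatives rather than something to stack on the same URL.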
Updating backlinks to a migrated domain helps with canonicalization
An attendee was talking about a website migration from domain A to domain B. They were setting up redirects, but asked whether the page authority and rankings would be negatively affected if there were many existing backlinks that point to domain A.
John replied that setting up redirects and using the Change of Address tool in Search Console will help Google understand the changes made during a site migration. However, he said that on a per-page basis they also look at canonicalization. For migrated domains, redirects, internal links, and canonical tags all play a role, but so do external links. If Google sees a lot of external links pointing to the old URL, it might index the old URL instead of the new one, because those linking signals suggest the change could be temporary. During site migrations, Google recommends finding the larger websites that link to your previous domain and asking them to update those backlinks, so that all signals align with the new domain.
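A minimal sketch of the two tasks described here: mapping each old URL to its redirect target, and listing which referring sites still link to the old domain. The host names are placeholders, the backlink list is assumed to come from whatever link report you have available, and counting links per referring host is only a stand-in for "larger websites".

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names standing in for "domain A" and "domain B".
OLD_HOST = "domain-a.example"
NEW_HOST = "domain-b.example"


def redirect_target(url: str) -> Optional[str]:
    """Return the equivalent URL on the new domain, or None if `url`
    is not on the old domain. Path and query string are preserved."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return None
    return urlunsplit(("https", NEW_HOST, parts.path or "/", parts.query, ""))


def referrers_to_contact(
    backlinks: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int]]:
    """Given (source_url, target_url) pairs, count how many links each
    referring host still points at the old domain, most links first."""
    counts = Counter(
        urlsplit(source).hostname
        for source, target in backlinks
        if urlsplit(target).hostname == OLD_HOST
    )
    return counts.most_common()
```

The output of `referrers_to_contact` is just a prioritised outreach list; the redirects themselves would still be configured on the server.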
Unless locations have unique content offerings, separate pages are not recommended
When asked about whether to canonicalize so-called ‘doorway pages’, John was keen to stress that there’s no one solution that fits every situation. The example given was a site that has separate pages for ‘piano lessons birmingham’ and ‘piano lessons london’. If there’s something unique about the offerings in each city, it’s generally fine to have separate URLs. If the information on both is the same, it’s recommended to consider folding these into one ‘stronger’ page, rather than diluting signals across multiple near-identical ones. You could also consider a mix of the two approaches if there’s a stand-out, unique element in one of those locations.
Best practices for canonicals on paginated pages can depend on your wider internal linking structure
John tackled one of the most common questions asked of SEOs: how should we be handling canonical attributes on paginated pages? Ultimately, it depends on the site architecture. If internal linking is strong enough across the wider site, it’s feasible to canonicalize all paginated URLs to page 1 without content dropping from the index. However, if you rely on Google crawling pages 2, 3… and so on to find all of the content you want to be crawled, make sure that paginated URLs self-canonicalize.
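The two options can be sketched as a small helper that picks the canonical URL for a paginated page. The `page` query parameter and the `example.com` URLs are assumptions for illustration; sites that paginate via the path would need different logic.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def paginated_canonical(
    url: str, self_canonical: bool = True, page_param: str = "page"
) -> str:
    """Choose the canonical URL for a paginated page.

    self_canonical=True  -> each paginated URL points at itself
                            (use when Google must crawl pages 2, 3, ...).
    self_canonical=False -> every page points at page 1, here assumed to
                            be the same URL without the page parameter.
    """
    if self_canonical:
        return url
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def canonical_link_tag(url: str) -> str:
    """Render the <link> element that would go in the page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'
```

For example, `paginated_canonical("https://example.com/lessons?page=3", self_canonical=False)` yields `https://example.com/lessons`, while the default call returns the URL unchanged.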
Make sure important content is not found only on canonicalized pages
John answered a question about whether the content on a canonicalized page needs to match the content on its canonical page exactly. He replied that it doesn’t. With a canonical tag, Google will try to index the canonical page that was specified, so any content that appears only on the non-canonical pages won’t be indexed. Make sure that any critical content from canonicalized pages is also on the canonical page.
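One simple way to audit this is to list the passages you consider critical on a canonicalized page and check that each also appears in the text of the canonical page. The helper below is a hypothetical sketch that compares plain text after normalising case and whitespace; it assumes you have already extracted the visible text from both pages.

```python
from typing import Iterable, List


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def missing_from_canonical(
    critical_snippets: Iterable[str], canonical_text: str
) -> List[str]:
    """Return the snippets that do not appear in the canonical page's text."""
    haystack = _normalise(canonical_text)
    return [s for s in critical_snippets if _normalise(s) not in haystack]
```

Anything returned by `missing_from_canonical` exists only on the non-canonical version and is a candidate for copying across.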
The URL parameter tool does not prevent pages from being crawled
John explained that any URLs set to be ignored within the URL Parameter tool may still be crawled, albeit at a much slower rate. Parameter rules set in the tool can also help Google to make decisions on which canonical tags should be followed.
Anything Contained on Non-canonical Pages Will Not Be Used for Indexing Purposes
When Google picks a canonical for a page, they will understand there is a set of pages, but only focus on the content and links of the canonical page. Anything that is only contained on the non-canonical versions will not be used for indexing purposes. If you have content on those pages that you would like to be indexed, John recommends making sure the pages are different enough that they are not treated as versions of the same page.
Review Canonical Signals if Google Is Continually Picking a Different Canonical to the Ones Set
Google may occasionally pick a canonical that is different from the one that has been set for certain pages, but this doesn’t change anything from a ranking point of view. However, if you’re seeing this on a large scale, John recommends reviewing whether you are sending confusing signals to Google.
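To see how widespread the problem is, you could compare the canonical each page declares with the canonical Google reports selecting. The sketch below parses the declared canonical with Python's standard `html.parser`; the `google_selected` mapping is an assumed input that you would compile yourself (for example from Search Console), not something this code fetches.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def declared_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_mismatches(
    pages: Dict[str, str], google_selected: Dict[str, str]
) -> List[str]:
    """List page URLs whose declared canonical differs from the one
    Google selected. `pages` maps URL -> HTML source; `google_selected`
    maps URL -> Google-selected canonical (hypothetical input)."""
    return [
        url
        for url, html in pages.items()
        if url in google_selected
        and declared_canonical(html) != google_selected[url]
    ]
```

A handful of mismatches is expected; a long list suggests checking whether redirects, internal links, and canonical tags all point the same way.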
No Need to Remove Internal Links on Non-canonical Pages as Google is Able to Figure Out Connections
Google sees links from a canonical page to a canonicalised page, and sometimes there can be multiple internal links that are associated with each. In this case, Google will combine all of the signals and keep them with each page, but is able to understand the connection between the canonical and canonicalised pages.