Don’t Use URLs That Change on the Fly
If URLs change on the fly, for example to include session IDs, Google will spend more resources crawling duplicate content. It also makes it harder for Google to choose the right canonical page.
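As a rough illustration, here's a minimal Python sketch (assuming a hypothetical `sessionid` query parameter) showing how session-ID URLs multiply into duplicates that a simple normalization step would collapse back into a single URL:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def without_session_id(url: str) -> str:
    """Drop the hypothetical "sessionid" parameter so every visitor's
    variant collapses back to one crawlable URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() != "sessionid"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

crawled = [
    "https://example.com/shoes?sessionid=a1b2",
    "https://example.com/shoes?sessionid=c3d4",
    "https://example.com/shoes?sessionid=e5f6",
]
# Three crawlable URLs, one page: crawl budget spent on duplicates.
print({without_session_id(u) for u in crawled})  # {'https://example.com/shoes'}
```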
Geotargeting Can’t Be Applied to Language Versions Which Are Split Out Using Parameters
Google is unable to automatically detect and apply geotargeting if different language or country versions of a site are separated out using parameters.
Google Only Crawls Hash URLs When Unique Content Is Detected
Google doesn’t usually crawl hash URLs unless it has detected over time that there is something unique it needs to pick up from the hashed version.
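Part of the reason is that the hash fragment is handled client side and never sent to the server, so every hash variation of a URL resolves to the same document. A quick Python illustration:

```python
from urllib.parse import urldefrag

# The fragment ("#...") is not sent to the server, so from a crawler's
# point of view these all fetch the same document.
urls = [
    "https://example.com/guide",
    "https://example.com/guide#chapter-2",
    "https://example.com/guide#reviews",
]
print({urldefrag(u).url for u in urls})
# {'https://example.com/guide'}
```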
Use Log Files to Identify Crawl Budget Wastage & Issues With URL Structure
When auditing eCommerce sites, John recommends first looking at which URLs are crawled by Googlebot. From there, identify crawl budget wastage and, if necessary, change the site’s URL structure to stop Googlebot from crawling unwanted URLs with parameters, filters, etc.
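Here's a minimal Python sketch of that first step, assuming a combined-format access log (the filename and regex are illustrative and may need adjusting for your server's format). It tallies which query parameters Googlebot is spending requests on:

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Illustrative pattern for a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

param_hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        query = urlsplit(match.group("url")).query
        # Count which parameters Googlebot keeps requesting.
        for key, _ in parse_qsl(query, keep_blank_values=True):
            param_hits[key] += 1

for key, hits in param_hits.most_common(10):
    print(f"{key}: {hits} Googlebot requests")
```

Parameters that rack up large request counts without serving unique content are the first candidates for cleaning up the URL structure.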
Google Can Ignore UTM Tracking Parameters in URLs
For URLs with UTM tracking parameters, Google will ignore the parameters and focus on the primary URL instead. However, Google will still try to crawl UTM parameter URLs if they are linked to externally, so use the URL parameter handling tool in GSC or a canonical tag in these cases.
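As a sketch of the canonical-tag fallback, the Python snippet below (using a hypothetical blog URL) strips `utm_*` parameters and prints the canonical tag you would emit on the page:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_for(url: str) -> str:
    """Drop utm_* tracking parameters and return the URL to reference
    in a <link rel="canonical"> tag."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

url = "https://example.com/blog/post?utm_source=newsletter&utm_medium=email"
print(f'<link rel="canonical" href="{canonical_for(url)}" />')
# <link rel="canonical" href="https://example.com/blog/post" />
```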
Parameter Handling Signals Are Stronger Than the Canonical Signal
Google won’t blindly follow URL parameter handling rules set in GSC, but John says these are a stronger signal than canonical tags.
Content Loaded on Hash URLs Won’t Be Indexed
Google doesn’t index hash URLs separately, so if content is only loaded when the hash is appended to a URL, rather than being loaded on the main URL itself, that content won’t be indexed.
GSC URL Parameters Are Signals, Not Rules, for Crawling
Rules set in the URL Parameters tool in Search Console are used by Google as a signal for what it shouldn’t crawl, but they aren’t obeyed in the same way as robots directives.
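The contrast is easy to see with Python's built-in robots.txt parser: a Disallow rule is a directive crawlers are expected to obey outright, whereas the URL Parameters settings are only hints. The rule and URL below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt blocking an internal search path outright.
robots_txt = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The parameterised search URL is blocked by rule, not merely hinted at.
print(parser.can_fetch("Googlebot", "https://example.com/search?color=red&size=9"))
# False
```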
Google Can Proactively Assume Duplicate Pages Before Crawling Them
Google will sometimes assume that pages are duplicates before crawling them. This can happen when URLs have multiple parameters that don’t actually change the content being served.
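As a conceptual sketch (not Google's actual mechanism), the Python snippet below groups URLs by their path plus only the parameters that affect content; `sort` and `view` are hypothetical presentation-only parameters:

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameters that change presentation, not the underlying content.
NON_CONTENT_PARAMS = {"sort", "view", "sessionid"}

groups = defaultdict(list)
for url in [
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sort=rating",
    "https://example.com/shoes?view=grid",
]:
    parts = urlsplit(url)
    content_params = tuple(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in NON_CONTENT_PARAMS))
    groups[(parts.path, content_params)].append(url)

for (path, params), urls in groups.items():
    print(f"{len(urls)} URLs likely treated as one page at {path}: {urls}")
```

URL patterns that collapse into the same group here are the kind Google may fold together as duplicates without ever fetching each variant.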