Notes from the Google Webmaster Hangout on the 5th of May, 2017.
Report Incorrect Structured Data Usage to Google
Use the web spam report tool to report structured data that is being used improperly. John reiterated that structured data has no direct impact on ranking positions; it only affects how Google displays information in search results.
Google Filters Identical Duplicates During Indexing, and Near Duplicates From Search Results Pages
When Google recognises identical pages, it will choose one version to index; when pages are merely similar, only one of them may show up in search results. To decide which of a set of identical pages to index, Google looks at signals such as rel canonical tags, redirects, and internal and external links.
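Google has not published how it detects duplicates, but the distinction between "identical" and "similar" can be illustrated with a hypothetical sketch: a hash of the normalised text catches exact copies, while word-shingle Jaccard similarity catches near duplicates. The 3-word shingles and the 0.8 threshold are arbitrary assumptions, not Google values.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def fingerprint(text: str) -> str:
    """Exact-duplicate fingerprint: SHA-256 of the normalised text."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences used for near-duplicate comparison."""
    words = normalise(text).split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two pages have in common."""
    return len(a & b) / len(a | b) if a | b else 1.0


def classify(page_a: str, page_b: str, threshold: float = 0.8) -> str:
    """Hypothetical classification: 'identical', 'near-duplicate' or 'distinct'."""
    if fingerprint(page_a) == fingerprint(page_b):
        return "identical"
    if jaccard(shingles(page_a), shingles(page_b)) >= threshold:
        return "near-duplicate"
    return "distinct"
```

Two pages that differ only in a trailing price or currency word would typically come out as near duplicates rather than identical, which mirrors the indexing-versus-results split described above.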
Near-Identical Pages with HREFLANG May Be Rolled Together
If you have pages that are identical apart from a very small difference, such as the currency, Google may roll them together into one indexed page but use HREFLANG to decide which URL to show in search results.
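As a sketch of the HREFLANG side of this, the helper below builds the block of alternate links for a set of currency variants. The example.com URLs and the en-gb/en-us split (pounds vs dollars) are hypothetical; the function name is illustrative only.

```python
from html import escape


def hreflang_links(alternates: dict) -> str:
    """Build <link rel="alternate" hreflang="..."> tags for a set of equivalent pages.

    `alternates` maps a language-region code (e.g. "en-gb") to the URL of the
    variant for that audience. The same block is meant to go on every variant.
    """
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]
    return "\n".join(lines)


# Hypothetical UK (GBP) and US (USD) versions of the same product page.
print(hreflang_links({
    "en-gb": "https://example.com/uk/widget",
    "en-us": "https://example.com/us/widget",
}))
```

Even if Google folds the two pages together for indexing, these annotations give it what it needs to show the UK URL to UK searchers and the US URL to US searchers.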
Google Periodically Recrawls Pages with Crawl Errors
Google will sometimes retry URLs that have previously returned crawl errors, even over a number of years, to make sure it is not missing anything new. If you see old URLs showing up as crawl errors, it's not something you need to fix.
Self-referencing HREFLANG Tags are Recommended but Not Required
Self-referencing HREFLANG tags are recommended because they make it easier for webmasters to copy the same block of HREFLANG links across inter-related pages and to validate that it is set up correctly, but they are not essential for Google.
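The validation benefit can be sketched with a small checker that takes the HREFLANG block found on each page in a cluster and reports missing self-references and missing return links. It is a hypothetical helper, not a Google tool; since self-references are optional for Google, that message is worded as a soft warning.

```python
def check_hreflang(pages: dict) -> list:
    """Return human-readable problems in a cluster of hreflang annotations.

    `pages` maps each page URL to the {hreflang code: URL} block found on it.
    """
    problems = []
    for url, block in pages.items():
        if url not in block.values():
            problems.append(f"{url}: no self-referencing hreflang link (optional for Google)")
        for target in block.values():
            if target == url:
                continue  # self-reference, nothing further to check
            if target not in pages:
                problems.append(f"{url}: links to {target}, which was not crawled")
            elif url not in pages[target].values():
                problems.append(f"{url}: {target} does not link back")
    return problems
```

When every page carries the identical block, including its own URL, the checker has nothing to report, which is exactly why copying one block everywhere is convenient.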
Soft 404 Pages May Be Indexed then Later Dropped Out of the Index
Google may initially index pages that are later classified as soft 404s, and then drops them from the index once it has processed their content.
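Google's soft 404 classifier is not public, but a rough heuristic can help you spot pages on your own site that might end up classified this way. The phrase list and the 50-word threshold below are arbitrary assumptions chosen for illustration.

```python
# Hypothetical phrases that suggest an error or empty page served with 200 OK.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "no results found")


def looks_like_soft_404(status: int, body: str, min_words: int = 50) -> bool:
    """Flag pages that return 200 OK but look like an error or empty page."""
    if status != 200:
        return False  # real 404/410 responses are not *soft* 404s
    text = body.lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(text.split()) < min_words  # very thin pages are also suspicious
```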
Sitemap Errors Don’t Impact Rankings but can Slow Down Indexing
Sitemaps help Google crawl and index sites more effectively. If a sitemap can't be processed properly, Google may take longer to index pages, as it has to rely on normal crawling to find them.
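One common cause of unprocessable sitemaps is hand-built XML with unescaped characters such as `&` in URLs. A minimal sketch using the standard library avoids this, since ElementTree escapes text automatically. It emits only the required `<urlset>`, `<url>` and `<loc>` elements of the sitemaps.org protocol; the example URL is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list) -> bytes:
    """Return a minimal XML sitemap containing one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # '&' etc. are escaped for us
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


print(build_sitemap(["https://example.com/products?colour=red&size=m"]).decode("utf-8"))
```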
Remove Incorrect Canonical Tags Until They Can Be Fixed
Google has algorithms to catch common mistakes webmasters make when implementing canonical tags, but these may not always work, so John recommends removing incorrect canonical tags until they can be implemented correctly.
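Before deciding whether to remove canonical tags, it helps to audit them. The sketch below flags two frequently cited mistakes: more than one canonical tag on a page, and deep pages canonicalised to the home page. Which mistakes Google's own algorithms look for is not stated here, so these two checks are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def canonical_warnings(page_url: str, page_html: str) -> list:
    """Return warnings for two common (assumed) canonical tag mistakes."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    warnings = []
    if len(finder.canonicals) > 1:
        warnings.append("more than one canonical tag")
    page_is_deep = urlsplit(page_url).path not in ("", "/")
    for href in finder.canonicals:
        target = urlsplit(urljoin(page_url, href))  # resolve relative hrefs
        if page_is_deep and target.path in ("", "/"):
            warnings.append(f"deep page canonicalised to the home page: {href}")
    return warnings
```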
PubSubHubbub is the Fastest Way to Get Content into Google
RSS feeds using PubSubHubbub are the quickest way to get new and updated content into Google.
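With PubSubHubbub, the publisher notifies a hub when its feed changes, and the hub pushes the update to subscribers instead of waiting for them to poll. A minimal sketch of that publish notification (a form-encoded POST with `hub.mode=publish` and `hub.url`) is below. The hub URL is an assumption (Google's public hub); use whichever hub your feed declares, and the feed URL is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Google's public hub. Use the hub your feed advertises via <link rel="hub">.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def publish_ping(feed_url: str, hub_url: str = HUB_URL) -> Request:
    """Build the POST that tells a PubSubHubbub hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = publish_ping("https://example.com/feed.xml")
# To actually notify the hub: urllib.request.urlopen(request)
```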
Indexable Product Variations Should Reflect Search Behaviour
Product variations that people actually search for (for example, a specific colour or size) should be made indexable; variations that nobody searches for should be folded together into a single page.
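The rule can be expressed as a simple planning step. In this hypothetical sketch, search demand per variant comes from your own keyword research, the threshold of 10 searches is arbitrary, and "folding together" is assumed to mean pointing the variant's rel canonical at the parent product page.

```python
def plan_variants(parent_url: str, variant_searches: dict, min_searches: int = 10) -> dict:
    """Decide, per variant URL, whether to index it or fold it into the parent.

    `variant_searches` maps a variant URL to its estimated monthly searches.
    """
    plan = {}
    for url, searches in variant_searches.items():
        if searches >= min_searches:
            plan[url] = "index"  # people search for this variant: give it its own page
        else:
            plan[url] = f"canonical -> {parent_url}"  # fold into the main product page
    return plan
```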
Google Doesn’t Always Crawl Lazy-loaded Images
Lazy-loaded images that only appear after a user interacts with a page won't always be crawled, as Googlebot doesn't trigger every event on a page. Use the Fetch and Render tool to see what Googlebot sees.
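Alongside Fetch and Render, a quick static check can list images whose real URL is not in the raw HTML at all. This sketch assumes the common (but non-standard) convention of keeping the real URL in a `data-src` attribute, with `src` either missing or set to a placeholder; adjust the attribute name to whatever your lazy-loading script uses.

```python
from html.parser import HTMLParser


class LazyImageFinder(HTMLParser):
    """Find <img> tags whose real URL lives only in a data-src attribute (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.js_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        real = attrs.get("data-src")
        if real and attrs.get("src") != real:
            # The real image only appears once a script swaps it in.
            self.js_only.append(real)


finder = LazyImageFinder()
finder.feed('<img src="placeholder.gif" data-src="/img/a.jpg"><img src="/img/b.jpg">')
print(finder.js_only)
```

Images in this list depend on a script running, so they are the ones worth confirming in Fetch and Render.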
301 and 302 Redirects Only Determine Which URL is Indexed
A 301 signals to Google that the destination URL should be indexed, while a 302 signals that the original URL should be indexed. In both cases, Google uses the content from the destination page.
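The statement above can be restated as a toy model: the status code decides which URL is indexed, while the content always comes from the destination. This is a simplification of what was said in the hangout, not Google's actual logic, and the function name is hypothetical.

```python
def redirect_outcome(status: int, source: str, destination: str) -> tuple:
    """Return (indexed URL, URL whose content is used) for a 301 or 302 redirect."""
    if status == 301:
        indexed = destination  # permanent: the destination URL is indexed
    elif status == 302:
        indexed = source  # temporary: the original URL is kept in the index
    else:
        raise ValueError("only 301 and 302 are modelled here")
    return indexed, destination  # content always comes from the destination page
```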
Google's Crawl Budget is Limited per Server
Google limits the crawl rate across all sites hosted on the same server so that it doesn't overload that server when crawling them.
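To illustrate why sites on shared hosting compete for crawl capacity, here is a hypothetical rate limiter keyed by server rather than by domain. Treating the server IP as "the server", the host-to-IP mapping and the one-fetch-per-second limit are all assumptions; Google's real crawl scheduling is not public.

```python
from collections import defaultdict


class PerServerLimiter:
    """Allow at most `max_per_second` fetches per server IP, shared by all hosts on it."""

    def __init__(self, host_to_ip: dict, max_per_second: float = 1.0):
        self.host_to_ip = host_to_ip
        self.interval = 1.0 / max_per_second
        self.next_allowed = defaultdict(float)  # server IP -> earliest next fetch time

    def try_fetch(self, host: str, now: float) -> bool:
        ip = self.host_to_ip[host]
        if now < self.next_allowed[ip]:
            return False  # this server was hit too recently, even if via another site
        self.next_allowed[ip] = now + self.interval
        return True
```

Two sites on the same IP share one budget, so crawling one of them delays the other, while a site on a different server is unaffected.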
Domain Migrations Should Take a Few Days
Google have improved the process for domain migrations, so the new URLs should be picked up within a couple of days.
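A domain migration is commonly implemented by 301-redirecting each old URL to its equivalent on the new domain (standard practice, not something stated in the hangout). A minimal sketch of the URL mapping, with hypothetical host names:

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_url(old_url: str, old_host: str, new_host: str) -> str:
    """Map a URL on the old domain to the same path and query on the new domain."""
    parts = urlsplit(old_url)
    if parts.netloc != old_host:
        raise ValueError(f"{old_url} is not on {old_host}")
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))
```

Keeping paths and queries unchanged gives each old URL a one-to-one destination, which makes the move straightforward for Google to follow.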