In the Google Webmaster Hangout on 6th November, John Mueller discussed HTTP/2, how Google’s full rendering of webpages affects your SEO requirements, and mentioned the If-Modified-Since header. Here are our notes with timestamps.
Googlebot doesn't support HTTP/2-only crawling
2:50: If your site only works over HTTP/2, Google won’t be able to crawl it properly at the moment, so it still needs to work over HTTP/1.1. John mentioned that Google is working on HTTP/2 support, and he suspects it will be ready around the end of the year.
Full page load now affects SEO, not just HTML
6:25: Because Google is now rendering pages in full, fetching CSS/JS files and images, you need to be concerned about the full page load, not just the HTML. If CSS/JS and image files take a long time to load, or if your page requires a lot of different files to be fetched in order to render the page as a browser would, then that will also ‘slow down how fast we’re able to index your content’.
‘If-Modified-Since’ and crawl efficiency
9:09: The If-Modified-Since request header is essentially a way to optimize crawling: Google, or browsers in general, send the date of their cached copy and the server only returns content that has changed since then. If nothing has changed, the server can reply with an empty 304 Not Modified response, and Google will reuse the cached version of the content instead of having to transfer the same content again.
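As a sketch of how a server might honour this header (a minimal illustration, not Google’s implementation; the dates and page body are placeholders):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def respond(last_modified, if_modified_since):
    """Answer a conditional GET.

    last_modified: datetime when the resource last changed.
    if_modified_since: the client's If-Modified-Since header, or None.
    Returns (status_code, body).
    """
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since and last_modified <= since:
            # Nothing has changed since the client's cached copy,
            # so send an empty 304 and let the cached version be reused.
            return 304, b""
    return 200, b"<html>full page body</html>"

page_changed = datetime(2015, 11, 1, tzinfo=timezone.utc)

# Cached copy is newer than the last change -> 304 Not Modified
print(respond(page_changed, "Sun, 08 Nov 2015 00:00:00 GMT")[0])  # 304

# No validator sent -> full 200 response
print(respond(page_changed, None)[0])  # 200
```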
Search Console Index Count can include duplicates
10:33: The Index Count in Search Console may include duplicate content (including multiple URLs that lead to the same page) which may be filtered out of ‘real-world’ search results. John recommends creating a Sitemap file containing all of your unique URLs, then submitting this in Search Console: this will give you more accurate information on the number of those URLs that are actually indexed.
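A minimal Sitemap file of that kind might look like this (the example.com URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-11-06</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/widget</loc>
  </url>
</urlset>
```

Listing only the unique, canonical URL for each page keeps the Sitemap’s indexed count meaningful.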
Noindexed canonical URLs
23:10: A canonical tag pointing to a noindexed page is a problem, as it sends Google conflicting signals: this suggests that the canonical might just be ignored.
Canonicalize product variants, don’t noindex
24:39: If you have product variant pages, it is best to canonicalize the variants to a single version rather than noindexing them, as this consolidates all the ranking signals. Note, however, that any unique content on those variant pages will then not appear in search results.
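For example, on a hypothetical variant page such as /widget?colour=blue, the canonical tag would point at the main version:

```html
<!-- In the <head> of the variant page /widget?colour=blue -->
<link rel="canonical" href="https://www.example.com/widget">
```

The target page itself should stay indexable: as noted above, a canonical pointing at a noindexed page sends conflicting signals.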
Establish hierarchy within site architecture
29:00: If you link every page to every other page, it’s difficult for Google to establish the context of any particular page. Google will also have trouble finding content that can only be reached through internal search on large sites.
Form field values and SEO
37:10: Google does look at text within form fields, such as the options in drop-downs, but not at HTML comments.
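To illustrate the distinction (a made-up form fragment):

```html
<select name="size">
  <!-- The visible text of these options can be read by Google -->
  <option value="s">Small</option>
  <option value="l">Large</option>
</select>
<!-- ...but text inside an HTML comment, like this one, is not indexed. -->
```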
Order of site: command results
38:03: Results for a site: query are ordered in a way ‘that we think makes sense’ for someone looking for that site, so you shouldn’t worry if your homepage isn’t at the top.
Bug with the link: search operator
39:10: The link: search operator in Google doesn’t work for all sites due to a bug, but hasn’t been retired.
Demoted Sitelinks might still appear
43:14: Demoting a Sitelink just reduces its weight: it should appear lower down, but might still appear. If you don’t want it to appear at all, then noindexing it might be a better approach, although this means that the page won’t be shown in search at all.
Bear in mind that a Sitelink demotion may take several weeks to process.
If you have a really bad Sitelink that won’t disappear, but you want it to remain indexed, then you can contact Google.
Words with/without accented characters treated as synonyms
49:10: Google can usually treat words with and without accented characters as synonyms and return similar results, because people search for both forms.
Content should be visible on the page when it’s marked up
The implication here is that you might be penalized if you’re trying to trick Google with markup that’s not visible on the page.
It's OK to redirect Googlebot, not users, from URLs with tracking parameters
53:17: John says that although it’s technically cloaking, he doesn’t actually find it ‘that problematic’ to redirect Googlebot from URLs with tracking parameters to the canonical URLs, while leaving users unredirected so that they can still be tracked in analytics.
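A server-side sketch of that setup, assuming a hypothetical list of tracking parameters and naive user-agent matching (robust Googlebot verification would also check the requesting IP via reverse DNS, not just the user-agent string):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters; adjust to your analytics setup.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid"}

def googlebot_redirect(url, user_agent):
    """Return (301, clean_url) when Googlebot requests a URL carrying
    tracking parameters, otherwise (200, None) to serve the page as-is."""
    if "Googlebot" not in user_agent:
        return 200, None  # ordinary users keep their tracking parameters
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    kept = [(k, v) for k, v in params if k not in TRACKING_PARAMS]
    if len(kept) == len(params):
        return 200, None  # nothing to strip: serve the page normally
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path,
                        urlencode(kept), parts.fragment))
    return 301, clean

# Googlebot is redirected to the canonical URL...
print(googlebot_redirect("https://www.example.com/page?utm_source=email",
                         "Googlebot/2.1 (+http://www.google.com/bot.html)"))
# -> (301, 'https://www.example.com/page')

# ...while a regular browser is served the tracked URL unchanged.
print(googlebot_redirect("https://www.example.com/page?utm_source=email",
                         "Mozilla/5.0"))
# -> (200, None)
```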