Algorithm Changes May Result in Changes to Crawl Rate
The number of pages Google wants to crawl may change following an algorithm update, either because some pages are now considered less important to show in search results, or because of improvements to crawling optimization.
Specify Timezone Formats Consistently Across Site & Sitemaps
Google is able to understand different timezone formats, for example, UTC vs GMT. However, it’s important to use one timezone format consistently across a site and its sitemaps to avoid confusing Google.
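One practical way to stay consistent is to normalise every timestamp to a single format before it is written into pages or sitemaps. The sketch below (a hypothetical helper, not a Google requirement) converts any timezone-aware datetime to the W3C Datetime format used in sitemap `lastmod` values, always in UTC with a `Z` suffix:

```python
from datetime import datetime, timedelta, timezone

def lastmod_utc(dt: datetime) -> str:
    """Normalise any aware datetime to one W3C Datetime format: UTC with a 'Z' suffix."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# The same instant expressed in two different zones normalises identically.
paris_summer = timezone(timedelta(hours=2))  # illustrative fixed offset
a = datetime(2023, 6, 1, 14, 30, tzinfo=paris_summer)
b = datetime(2023, 6, 1, 12, 30, tzinfo=timezone.utc)
print(lastmod_utc(a))  # 2023-06-01T12:30:00Z
print(lastmod_utc(b))  # 2023-06-01T12:30:00Z
```

Funnelling all timestamps through one helper like this means the site and its sitemaps can never disagree on format.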
If a Robots.txt File Returns a Server Error for a Brief Period of Time Google Will Not Crawl Anything From the Site
If a robots.txt file returns a server error for a brief period of time, Google will not crawl anything from the website until it can access the file and crawl normally again. While it is blocked from reaching the file, Google assumes all URLs are blocked and will flag this in GSC. You can identify when this has occurred by reviewing the response code and response size returned for each robots.txt request in your server logs.
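A quick way to pull those robots.txt requests out of access logs is a small script like the sketch below. It assumes combined-log-format lines (field positions may differ on your server) and extracts the status code and response size for each robots.txt fetch:

```python
import re

# Matches the request, status and size fields of a combined-log-format line
# for robots.txt fetches; adjust the pattern to your own log format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) /robots\.txt[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def robots_txt_responses(log_lines):
    """Yield (status_code, response_size) for each robots.txt request found."""
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            size = m.group("size")
            yield int(m.group("status")), (int(size) if size != "-" else 0)

# Illustrative log lines, not real Googlebot traffic.
logs = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 512',
    '66.249.66.1 - - [10/Oct/2023:14:02:10 +0000] "GET /robots.txt HTTP/1.1" 503 0',
    '66.249.66.1 - - [10/Oct/2023:14:03:00 +0000] "GET /page HTTP/1.1" 200 2048',
]
for status, size in robots_txt_responses(logs):
    print(status, size)  # 200 512, then 503 0
```

Any 5xx rows in that output mark the windows during which Google would have treated the whole site as blocked.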
It is Normal for Google to Occasionally Crawl Old URLs
Due to their rendering processes, Google will occasionally re-crawl old URLs to check how they are set up. You may see this within your log files, but it is normal and will not cause any problems.
Having a Reasonable Amount of HTML Comments Has No Effect on SEO
Comments within the HTML of a page have no effect on SEO unless there is a very large amount of them, as they can make it difficult for Google to work out where the main content is and may increase the size and slow the loading of the page. However, John confirmed he has never come across a page where HTML comments have been a problem.
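If you want a rough sense of whether comments are a meaningful share of a page's weight, a simple ratio check works. This is an illustrative sketch (the `comment_byte_share` helper is hypothetical, and a naive regex rather than a full HTML parser):

```python
import re

def comment_byte_share(html: str) -> float:
    """Rough share of page bytes taken up by <!-- ... --> comments."""
    comment_bytes = sum(len(c) for c in re.findall(r"<!--.*?-->", html, re.DOTALL))
    return comment_bytes / len(html) if html else 0.0

page = "<html><!-- nav placeholder --><body><p>Hello</p></body></html>"
print(f"{comment_byte_share(page):.0%} of this page is comments")
```

Unless that share is substantial on a large page, comments are not worth worrying about for SEO.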
Upper Limit For Recrawling Pages is Six Months
Google tends to recrawl pages at least once every six months as an upper limit.
Google is Able to Display Structured Data Results as Soon as the Page Has Been Re-crawled
After adding structured data markup to a page, Google will be able to display the structured data results the next time it crawls and indexes that page.
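Structured data is commonly embedded as a JSON-LD script tag in the page's HTML. The sketch below builds a minimal Article payload (property values are illustrative) the way a template might render it:

```python
import json

# A minimal schema.org Article payload; the values here are illustrative.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2023-06-01",
}

# Embed it the way templates typically do: a JSON-LD script tag in the <head>.
script_tag = '<script type="application/ld+json">%s</script>' % json.dumps(article)
print(script_tag)
```

Once a tag like this is live, no extra submission step is needed; the markup is picked up on the next crawl and index of the page.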
Google Still Respects the Meta Robots Unavailable After Directive
Google still respects the meta robots unavailable_after directive, which is used to specify a date after which a page will no longer be available. John explained that their systems will likely recrawl the page around the specified date to make sure they are not removing pages from the index that are still available.
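As a sketch of how this plays out, the directive is a meta tag with a date (Google's documentation accepts widely adopted date formats such as ISO 8601), and the page stays eligible for the index until that date has passed. The helper below is hypothetical, just modelling the cutoff logic:

```python
from datetime import date

# Illustrative tag; ISO 8601 is one of the date formats Google documents as accepted.
meta_tag = '<meta name="robots" content="unavailable_after: 2024-12-31">'

def is_past_unavailable_after(deadline: date, today: date) -> bool:
    """True once the unavailable_after date has passed; Google may recrawl
    the page around this date to confirm before dropping it from the index."""
    return today > deadline

print(is_past_unavailable_after(date(2024, 12, 31), date(2025, 1, 15)))  # True
print(is_past_unavailable_after(date(2024, 12, 31), date(2024, 6, 1)))   # False
```

Before the deadline the tag has no removal effect; after it, the page becomes a candidate for removal once Google has recrawled and confirmed.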
Google Doesn’t Crawl Any URLs From Hostname When Robots.txt Temporarily 503s
If Google encounters a 503 when crawling a robots.txt file, it will temporarily not crawl any URLs on that hostname.