Crawling

Before a page can be indexed, and therefore appear within search results, it must be crawled by search engine crawlers such as Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to the correct guidelines. These are covered within our Hangout Notes, along with further research and recommendations.

Removing Low-Quality Pages Takes Months to Impact Crawling and Site Quality

March 20, 2020 Source

Removing low-quality pages from your site may have a positive impact on how the rest of the site is crawled, but it could take 3-9 months before you see changes in crawling, which can be measured using log files. Improvements in overall site quality may take even longer to have an impact. It’s unusual for the removal of cruft content to have any negative impact.
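One way to measure this is to track Googlebot request volume over time in your server logs. The sketch below is a rough illustration, assuming an Apache/Nginx combined log format and a hypothetical access.log path; in practice, Googlebot hits should also be verified (for example via reverse DNS) rather than trusted on the user agent alone.

```python
# Count daily Googlebot requests in an access log to track crawl volume
# over time. Assumes the Apache/Nginx combined log format; the file path
# is hypothetical.
import re
from collections import Counter
from datetime import datetime

LOG_PATH = "access.log"  # hypothetical path

# Captures the date portion of "[20/Mar/2020:10:15:32 +0000]" and the final
# quoted field of a combined-format line (the user agent).
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]+\].*"([^"]*)"\s*$')

daily_hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group(2):
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            daily_hits[day] += 1

for day, hits in sorted(daily_hits.items()):
    print(f"{day}: {hits} Googlebot requests")
```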


Average Fetch Time May Be Affected by Groups of Slower Pages

March 20, 2020 Source

If Google is spending more time crawling a particular group of slow pages, this may make the average fetch time and crawled data look worse.
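As a toy illustration of the effect, a relatively small group of slow pages can pull the overall average fetch time well above what most of the site actually delivers; all figures below are made up.

```python
# Toy example of how a group of slow pages skews the average fetch time.
# All numbers are made up for illustration.
fast_pages = {"fetches": 9000, "avg_ms": 200}   # bulk of the site
slow_group = {"fetches": 1000, "avg_ms": 2500}  # e.g. a slow, heavy section

total_fetches = fast_pages["fetches"] + slow_group["fetches"]
overall_avg_ms = (
    fast_pages["fetches"] * fast_pages["avg_ms"]
    + slow_group["fetches"] * slow_group["avg_ms"]
) / total_fetches

# 10% of fetches at 2500ms pulls the reported average from 200ms up to 430ms.
print(f"Overall average fetch time: {overall_avg_ms:.0f} ms")
```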


Rendered Page Resources Are Included in Google’s Crawl Rate

March 20, 2020 Source

The resources that Google fetches when they render a page are included in Google’s crawling budget and reported in the Crawl Stats data in Search Console.


Algorithm Changes May Result in Changes to Crawl Rate

February 21, 2020 Source

The number of pages which Google wants to crawl may change during algorithm changes, which may be due to some pages being considered less important to show in search results, or from crawling optimization improvements.


Specify Timezone Formats Consistently Across Site & Sitemaps

February 18, 2020 Source

Google is able to understand different timezone formats, for example, UTC vs GMT. However, it’s important to use one timezone format consistently across a site and its sitemaps to avoid confusing Google.
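For example, when generating lastmod values for a sitemap, it is safer to normalise every timestamp to a single representation, such as the W3C datetime format in UTC, rather than mixing offsets and abbreviations. The sketch below shows one way to do that normalisation; the helper name and sample inputs are illustrative, not a Google requirement.

```python
# Normalise timestamps to one consistent W3C datetime representation (UTC,
# explicit offset) before writing them into sitemap <lastmod> values.
# The helper name and sample inputs are illustrative.
from datetime import datetime, timezone

def to_lastmod(dt: datetime) -> str:
    """Return a W3C datetime string in UTC, e.g. 2020-02-18T09:30:00+00:00."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes from the CMS are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")

# Mixed inputs all come out in the same format and timezone.
print(to_lastmod(datetime(2020, 2, 18, 9, 30)))                        # naive
print(to_lastmod(datetime(2020, 2, 18, 10, 30, tzinfo=timezone.utc)))  # aware
```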


If a Robots.txt File Returns a Server Error for a Brief Period of Time Google Will Not Crawl Anything From the Site

January 31, 2020 Source

If a robots.txt file returns a server error for a brief period of time, Google will not crawl anything from the website until it is able to access the file and crawl normally again. During the period when Google is blocked from reaching the file, it assumes all URLs are blocked and will therefore flag this in Google Search Console. You can identify when this has occurred by reviewing the robots.txt requests in your server logs and checking the response code and size returned for each request.
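A minimal sketch of that log check is shown below: it filters requests for /robots.txt and prints the timestamp, status code, and response size for each, so that short windows of 5xx responses stand out. The combined log format and file path are assumptions.

```python
# Surface robots.txt fetches whose status code looks off, e.g. a window of
# 5xx responses during which Google would stop crawling the site.
# Assumes the combined log format; the file path is hypothetical.
import re

LOG_PATH = "access.log"  # hypothetical path
LINE_RE = re.compile(r'\[([^\]]+)\] "(?:GET|HEAD) /robots\.txt[^"]*" (\d{3}) (\S+)')

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        timestamp, status, size = match.groups()
        flag = "  <-- server error" if status.startswith("5") else ""
        print(f"{timestamp}  status={status}  bytes={size}{flag}")
```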


It is Normal for Google to Occasionally Crawl Old URLs

January 31, 2020 Source

Due to its rendering processes, Google will occasionally recrawl old URLs in order to check how they are set up. You may see this within your log files, but it is normal and will not cause any problems.


Having a Reasonable Amount of HTML Comments Has No Effect on SEO

January 24, 2020 Source

Comments within the HTML of a page do not have any effect on SEO unless there is a very large amount of them, as they can make it difficult for Google to work out where the main content is and may increase the size and loading time of the page. However, John confirmed he has never come across a page where HTML comments have been a problem.


Upper Limit For Recrawling Pages is Six Months

January 22, 2020 Source

As an upper limit, Google tends to recrawl pages at least once every six months.


Related Topics

Indexing, Crawl Budget, Crawl Errors, Crawl Rate, Disallow, Sitemaps, Last Modified, Nofollow, Noindex, RSS, Canonicalization, Fetch and Render