Rendered Page Resources Are Included in Google’s Crawl Rate
The resources that Google fetches when it renders a page count towards the site's crawl budget and are reported in the Crawl Stats data in Search Console.
Redirects Can Impact Crawl Budget Due to Added Time for URLs to be Fetched
If a site has a lot of redirects, crawl budget can be affected because Google will detect that URLs are taking longer to fetch and will limit the number of simultaneous requests it makes to the site to avoid causing any issues for the server.
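As a rough illustration of why chains add up, a short script like the sketch below (assuming Python with the requests library and a hypothetical urls.txt file listing URLs to check) can surface redirect chains, since every extra hop is an additional fetch Googlebot has to make:

```python
# Sketch: count redirect hops for a list of URLs so long chains can be flattened.
# Assumes a local "urls.txt" file with one URL per line (hypothetical input).
import requests

def redirect_hops(url: str) -> list[str]:
    """Return the chain of URLs visited before the final response."""
    response = requests.get(url, allow_redirects=True, timeout=10)
    return [r.url for r in response.history]  # one entry per redirect hop

with open("urls.txt") as handle:
    for line in handle:
        url = line.strip()
        if not url:
            continue
        hops = redirect_hops(url)
        if len(hops) > 1:  # more than one hop means a chain worth flattening
            print(f"{url} redirects {len(hops)} times: {' -> '.join(hops)}")
```

Flattening any chains this surfaces (redirecting each old URL straight to its final destination) reduces the number of fetches Googlebot spends on redirects.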
Excluded Pages in GSC Are Included in Overall Crawl Budget
The pages that have been excluded in the GSC Index Coverage report count towards the site's overall crawl budget. However, if your site has crawl budget limitations, your important pages that are valid for indexing will be prioritised.
Crawl Budget Not Affected by Response Time of Third Party Tags
For Google, crawl budget is determined by how many pages and resources it fetches from a website per day. If pages have slow response times, Google may crawl the site less to avoid overloading the server, but this is not affected by any third-party tags on the page.
Putting Resources on a Separate Subdomain May Not Optimize Crawl Budget
Google can still recognise when subdomains are hosted on the same server and will therefore distribute crawl budget across the server as a whole, since that server still has to process all of the requests. However, putting static resources on a CDN means crawling is balanced across the two sources independently.
Check Server Logs If More Pages Crawled Than Expected
Crawl Budget Updates Based on Changes Made to Site
A site’s crawl budget changes a lot over time, as Google’s algorithms react quickly to changes made to a website. For example, if a new CMS is launched incorrectly with no caching and the site becomes slow, Googlebot will likely slow down crawling over the next couple of days so that the server isn’t overloaded.
Use Log Files to Identify Crawl Budget Wastage & Issues With URL Structure
When auditing eCommerce sites, John recommends first looking at which URLs Googlebot is crawling, then identifying crawl budget wastage and, where necessary, changing the site’s URL structure to stop Googlebot crawling unwanted URLs such as those with parameters or filters.
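A minimal sketch of that first step might look like the following, assuming a combined-format access log saved as access.log (hypothetical path) and only Python's standard library; it counts Googlebot requests and flags parameterised URLs as candidates for crawl budget wastage:

```python
# Sketch: summarise which URLs Googlebot requests and how many carry query parameters.
# Assumes a combined-format access log at "access.log" (hypothetical path/filename).
import re
from collections import Counter
from urllib.parse import urlparse

LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')

googlebot_urls = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        if "Googlebot" not in line:  # naive user-agent check; verify hits via reverse DNS in practice
            continue
        match = LINE_RE.search(line)
        if match:
            googlebot_urls[match.group("path")] += 1

# URLs with a query string are likely parameter/filter variants worth reviewing.
parameterised = {url: hits for url, hits in googlebot_urls.items() if urlparse(url).query}

print(f"Distinct URLs crawled by Googlebot: {len(googlebot_urls)}")
print(f"Distinct parameterised URLs (possible wastage): {len(parameterised)}")
for url, hits in Counter(parameterised).most_common(20):
    print(f"{hits:6d}  {url}")
```

The output gives a starting point for deciding which parameterised or filtered URL patterns to consolidate, canonicalise, or block from crawling.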