Crawl Budget

A crawl budget is allocated to every site and determines how many pages and resources search engines can crawl. Our Hangout Notes cover recommendations for optimizing this crawl budget, along with insights from Google on how crawl budget is controlled.

Crawl Budget Not Affected by Response Time of Third Party Tags

May 10, 2019 Source

For Google, crawl budget is determined by how many pages and resources it fetches from a website per day. If a page has a slow response time, Google may crawl the site less to avoid overloading the server, but this is not affected by any third party tags on the page.

Putting Resources on a Separate Subdomain May Not Optimize Crawl Budget

May 3, 2019 Source

Google can still recognise when subdomains are hosted on the same server and will therefore distribute crawl budget for the server as a whole, as the same server still has to process all of the requests. However, putting static resources on a CDN will balance crawling across the two sources independently.

Check Server Logs If More Pages Crawled Than Expected

May 1, 2019 Source

If Googlebot is crawling many more pages than it actually needs to on a site, John recommends checking the server logs to determine exactly which pages Googlebot is crawling. For example, it could be that JavaScript files with a session ID attached are being crawled, inflating the total number of crawled pages.
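The log check above can be sketched with a short script. This is an illustrative example only, not a tool John mentioned: it assumes your server writes Apache/Nginx combined-format access logs, and it strips query strings so that session-ID variants of the same JavaScript file collapse into one entry.

```python
import re
from collections import Counter

# Matches the request and user-agent fields of a combined-log-format line, e.g.:
# 66.249.66.1 - - [01/May/2019:10:00:00 +0000] "GET /app.js?sessionid=a1 HTTP/1.1" 200 512 "-" "...Googlebot/2.1..."
LINE_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def googlebot_url_counts(log_lines):
    """Count URLs requested by Googlebot, dropping query strings so
    session-ID variants of the same resource are grouped together."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group(2):
            path = match.group(1).split("?")[0]  # drop ?sessionid=... etc.
            counts[path] += 1
    return counts

# Usage: print the most-crawled paths from an access log.
# with open("access.log") as f:
#     for path, hits in googlebot_url_counts(f).most_common(20):
#         print(hits, path)
```

A path that accounts for a disproportionate share of hits here, such as a single script crawled thousands of times under different session IDs, is the kind of bloat this note describes.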

Crawl Budget Limitations May Delay JavaScript Rendering

December 21, 2018 Source

Sometimes a delay in Google’s JavaScript rendering is caused by crawl budget limitations. Google is actively working on reducing the gap between crawling pages and rendering them with JavaScript, but it will take some time, so they recommend dynamic, hybrid or server-side rendering for sites with a lot of content.

Crawl Budget Updates Based on Changes Made to Site

December 14, 2018 Source

A site’s crawl budget changes a lot over time, as Google’s algorithms react quickly to changes made to a website. For example, if a new CMS is launched incorrectly with no caching and it’s slow, then Googlebot will likely slow down crawling over the next couple of days so that the server isn’t overloaded.

Use Log Files to Identify Crawl Budget Wastage & Issues With URL Structure

July 13, 2018 Source

When auditing eCommerce sites, John recommends first looking at which URLs Googlebot crawls, then identifying crawl budget wastage and, if necessary, changing the site’s URL structure to stop Googlebot crawling unwanted URLs with parameters, filters, etc.
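One way to spot the parameter-driven wastage this note describes is to tally how often each query-parameter name appears across the crawled URLs pulled from your logs. The sketch below is a hypothetical example, not something from the hangout; the parameter names (`colour`, `sort`) are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def parameter_waste_report(crawled_urls):
    """Tally how often each query-parameter name appears in the crawled
    URLs, to reveal filters that spawn many near-duplicate pages."""
    param_counts = Counter()
    for url in crawled_urls:
        query = urlsplit(url).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            param_counts[name] += 1
    return param_counts

# Usage with URLs extracted from server logs (hypothetical values):
urls = [
    "/shoes?colour=red&sort=price",
    "/shoes?colour=blue&sort=price",
    "/shoes?colour=red",
    "/about",
]
report = parameter_waste_report(urls)
```

Parameters that dominate the report are candidates for restructuring, such as removing the filter from crawlable links or blocking the pattern in robots.txt.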

Small to Medium-Sized Sites Don’t Have to Worry About Crawl Budget

April 6, 2018 Source

Sites with ‘a couple hundred thousand pages’ or fewer don’t need to worry about crawl budget; Google will be able to crawl them just fine.

4xx Errors Don’t Mean Your Crawl Budget is Being Wasted

February 20, 2018 Source

Seeing Googlebot crawling old 404/410 pages doesn’t mean your crawl budget is being wasted. Google revisits these when there is nothing else on the site left to crawl, which is a sign that it has spare capacity to crawl more.

Google AdsBot Crawling Doesn’t Impact Crawl Budget For Organic Search

January 9, 2018 Source

If Google AdsBot is crawling millions of ad pages, this won’t eat into your crawl budget for organic search. John recommends checking for tagged URLs in any ad campaigns to reduce ad crawling.

Related Topics

Crawling Indexing Crawl Errors Crawl Rate Disallow Sitemaps Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render