Crawling

Before a page can be indexed, and therefore appear within search results, it must be crawled by search engine crawlers, like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to the correct guidelines. These are covered within our Hangout Notes, as well as further research and recommendations.

Using International IP Redirects Will Prevent Google From Finding Other Versions of A Site

July 12, 2019 Source

If you are redirecting based on international IP addresses, Google is likely to see only the redirect to the English version, because Googlebot crawls almost entirely from US IP addresses, and it would drop all of the other versions.
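The problem can be illustrated with a minimal Python sketch of a geo-IP redirect (the country-to-path mapping and URL scheme here are hypothetical):

```python
# Hypothetical geo-IP redirect: every visitor is sent to the language
# version matching their country, with English as the fallback.
REDIRECTS = {"US": "/en/", "DE": "/de/", "FR": "/fr/"}

def redirect_for(country_code):
    """Return the locale path a visitor from this country is redirected to."""
    return REDIRECTS.get(country_code, "/en/")

# Googlebot crawls almost exclusively from US IP addresses, so whichever
# URL it requests, it is always redirected to the English version --
# the /de/ and /fr/ versions are never crawled.
googlebot_country = "US"
print(redirect_for(googlebot_country))  # /en/
```

Since the crawler never reaches the non-English URLs, those versions cannot be indexed, regardless of how they are linked internally.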


External Links Are More Critical for Initial Content Discovery & Crawling

June 28, 2019 Source

External links are useful for helping Google find and crawl new websites, but they become less important to Google once it has already discovered the site in question.


Images Implemented Via Lazy Loading Can be Used Like Any Other Image on a Page

June 25, 2019 Source

Images implemented via the lazy load script can be added to structured data and sitemaps without any issues, as long as they are embedded in a way that Googlebot is able to pick up.
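One way to embed lazy-loaded images so that Googlebot can pick them up is to keep a plain src attribute in the markup, either via native lazy loading or a noscript fallback for script-based approaches (the filenames and class name below are illustrative):

```html
<!-- Native lazy loading: the src is present in the HTML, so Googlebot
     can discover the image without executing any script. -->
<img src="/images/product.jpg" loading="lazy" alt="Product photo"
     width="640" height="480">

<!-- Script-based lazy loading (a script swaps data-src into src on
     scroll) should include a noscript fallback so the image remains
     crawlable even if the script never runs for the crawler. -->
<img data-src="/images/hero.jpg" class="lazyload" alt="Hero image">
<noscript><img src="/images/hero.jpg" alt="Hero image"></noscript>
```

Images embedded this way can then be referenced in structured data and image sitemaps as normal.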


Google Doesn’t Need To Be Able To Crawl The Add to Cart Pages of A Site

June 25, 2019 Source

It is not essential for Google to crawl the add to cart pages on e-commerce sites, so this shouldn’t affect a site’s performance in search for purchase intent queries.


Googlebot Does Crawl From a Handful of Regional IPs

June 14, 2019 Source

Googlebot does crawl from a small number of regional IPs, particularly in countries where they know it is hard to crawl from the US.


An Updated User Agent is Expected to Reflect The New Modern Rendering Infrastructure

June 14, 2019 Source

Google has been experimenting with the current user agent settings and is re-thinking the setup. John expects some changes to be announced in the future around an updated user agent, so that it reflects the new modern rendering infrastructure.


The Site Diversity Update Won’t Affect How Subdomains Are Crawled

June 11, 2019 Source

The new change that was launched to show more diversity of sites in the search results won't impact the way subdomains are currently crawled and processed; it will only impact how they are shown in the search results.


News Sites Shown in Forum Snippets Can Reformat Their Comment Sections or Block Comments From Crawling

June 11, 2019 Source

If a news site is being shown in forum snippets and this is problematic for you, either reformat the comment sections in a way that demotes the importance of this content, or block the comments from being crawlable by Google.
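If the comments are loaded from their own URLs (for example via an embed or a separate comments path), they can be blocked in robots.txt. A sketch, assuming a hypothetical /comments/ path; adjust to wherever the comment content is actually served from:

```
# robots.txt -- the path below is illustrative, not a standard location.
User-agent: *
Disallow: /comments/
```

Note that this only works when the comments have their own crawlable URLs; comments rendered inline in the article HTML would need the reformatting approach instead.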


Google Has an Upper Limit of Around 5,000 Internal Links Per Page For Crawling

June 11, 2019 Source

Sites don’t normally exceed Google’s upper crawl limit for links on a page as it is quite high at around 5,000 links per page. However, John recommends only having necessary internal links so PageRank isn’t diluted too much.
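A quick way to sanity-check a page against that limit is to count its anchor tags. A minimal sketch using Python's standard-library HTML parser (the sample markup is illustrative):

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Count <a href> links on a page, e.g. to check a template
    against Google's roughly 5,000-links-per-page crawl limit."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Only anchors with an href attribute count as links.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

page = '<a href="/">Home</a> <a href="/shop">Shop</a> <a name="x">anchor</a>'
counter = LinkCounter()
counter.feed(page)
print(counter.count)  # 2
```

In practice the limit is rarely the concern; trimming unnecessary internal links matters more for avoiding PageRank dilution than for staying under the cap.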


Related Topics

Indexing, Crawl Budget, Crawl Errors, Crawl Rate, Disallow, Sitemaps, Last Modified, Nofollow, Noindex, RSS, Canonicalization, Fetch and Render