Before a page can be indexed, and therefore appear within search results, it must be crawled by search engine crawlers, like Googlebot. There are many factors to consider in order to get pages crawled and to ensure they adhere to the relevant guidelines. These are covered within our Hangout Notes, along with further research and recommendations.

404 Pages Crawled Less Than Noindex

October 27, 2015 Source

For expired or removed content, John says that Google prefers a 404, as it results in less crawling than a noindex.
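As a hedged sketch, the difference between the two server behaviours for a removed page looks like this (the URL path is hypothetical):

```
# Preferred: the server answers with a 404, so Googlebot can
# gradually drop the URL and re-crawl it less often
GET /discontinued-product  →  HTTP/1.1 404 Not Found

# Alternative: a noindex page still returns 200, so Googlebot must
# keep re-crawling it to see that the noindex directive is still present
GET /discontinued-product  →  HTTP/1.1 200 OK
                              X-Robots-Tag: noindex
```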

Fetch & Render Shows Results for a Googlebot and Browser User Agent

October 27, 2015 Source

The Fetch and Render tool shows you two different renders: one for Googlebot, fetched with the Googlebot user agent, and one for users, fetched with a browser user agent. If JS/CSS is disallowed for Googlebot, it may not be able to render all of the content in the same way.
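For example, robots.txt rules like the following (the paths are hypothetical) would stop Googlebot from fetching the CSS and JS needed to build the page, so the Googlebot render may differ from the browser render:

```
User-agent: Googlebot
Disallow: /assets/css/
Disallow: /assets/js/
```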

Clean HTML and Structured Data Helps Google Understand Content

September 11, 2015 Source

Clean HTML and structured markup help Google better understand the content and context of a page.
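As an illustrative sketch, structured markup such as JSON-LD makes the context explicit for Google (the values below are hypothetical):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article Headline",
  "datePublished": "2015-09-11",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
</script>
```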

URLs in JavaScript May Be Crawled

May 19, 2015 Source

JavaScript variables which look like URLs may be crawled, which can generate server errors. These errors can be ignored, or the paths can be blocked with robots.txt.
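As a sketch of how this happens (the variable and path are hypothetical), a robots.txt rule can silence the resulting errors:

```
# JavaScript containing a URL-like string, e.g.:
#   var endpoint = "/ajax/load-items";
# may lead Googlebot to request /ajax/load-items and log errors.
# If the errors are noisy, block the path:
User-agent: *
Disallow: /ajax/
```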

HTML Crawling Faster Than JavaScript for Page Discovery

April 24, 2015 Source

JavaScript processing takes longer than pure HTML crawling, so it isn’t suitable for fast discovery of pages. John says ‘it takes another cycle or two longer to process’.
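As a sketch (the paths are hypothetical), a link in the raw HTML can be discovered on the first crawl pass, while a JavaScript-inserted link is only found after the page has been rendered:

```html
<!-- Discoverable immediately from the raw HTML -->
<a href="/new-page">New page</a>

<!-- Only discoverable after the JavaScript is executed and rendered -->
<script>
  var a = document.createElement("a");
  a.href = "/other-new-page";
  a.textContent = "Other new page";
  document.body.appendChild(a);
</script>
```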

Image Re-Crawling Takes Longer After a URL Change

March 27, 2015 Source

Images are not crawled very frequently, so when you migrate them to new URLs or domains, re-crawling will take much longer than it does for pages, perhaps months.
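When migrating images, a permanent redirect helps Google connect the old and new URLs, though re-crawling can still take months. A minimal sketch, assuming an Apache server and hypothetical paths:

```
# .htaccess on the old host: permanently redirect image URLs to the new domain
RedirectMatch 301 ^/images/(.*)$ https://newdomain.example/images/$1
```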

Wildcard Subdomain Configuration Causes Crawl Issues

December 23, 2014 Source

Using wildcard subdomains can make a site difficult to crawl.
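As a sketch of the underlying problem (the values are hypothetical), a wildcard DNS record makes every subdomain resolve, so crawlers can discover an effectively unlimited number of hostnames serving the same content:

```
; Any subdomain (anything.example.com) resolves to the same server
*.example.com.  IN  A  203.0.113.10
```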

CSS and JS Crawling Is Important for Mobile Compatibility

December 16, 2014 Source

Allowing your CSS and JavaScript files to be crawled does affect desktop pages, but it is more important for mobile pages, as Google needs to render them to test for mobile compatibility.
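A minimal robots.txt sketch (the paths are hypothetical) that keeps rendering resources crawlable even when their parent directory is blocked, since the more specific rule takes precedence:

```
User-agent: *
# Blocking a directory wholesale...
Disallow: /assets/
# ...but explicitly allowing the CSS and JS Google needs for rendering
Allow: /assets/css/
Allow: /assets/js/
```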

Noindex Pages Can’t Accumulate PageRank

November 7, 2014 Source

Noindex pages can’t accumulate PageRank for the site, even though the pages can be crawled, so this isn’t an advantage over disallowing.
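For reference, the noindex directive is set on the page itself, which is why the page can still be crawled while being kept out of the index:

```html
<!-- The page can still be crawled, but will not be indexed,
     and it does not accumulate PageRank for the site -->
<meta name="robots" content="noindex">
```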

Related Topics

Indexing Crawl Budget Crawl Errors Crawl Rate Disallow Sitemaps Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render