
Crawling

Before a page can be indexed (and therefore appear within search results), it must first be crawled by search engine crawlers like Googlebot. There are many things to consider in order to get pages crawled and to ensure they adhere to the correct guidelines. These are covered in our SEO Office Hours notes below, along with further research and recommendations.

For more SEO knowledge on crawling and help optimizing your site’s crawlability, check out Lumar’s additional resources.

If a Robots.txt File Returns a Server Error for a Brief Period of Time, Google Will Not Crawl Anything From the Site

If a robots.txt file returns a server error for a brief period of time, Google will not crawl anything from the website until it is able to access the file and crawl normally again. During the period when Google is blocked from reaching the file, it assumes all URLs are blocked and will flag this in Google Search Console. You can use the robots.txt requests in your server logs to identify when this has occurred by reviewing the response code and size returned for each request.

31 Jan 2020
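To illustrate the kind of log check described above, here is a minimal sketch (not from the Office Hours notes) that scans an access log for robots.txt requests and flags any 5xx responses; the log path and the combined log format are assumptions you would adjust for your own server.

```python
import re

# Hypothetical log path; assumes the common "combined" access log format, e.g.
# 1.2.3.4 - - [31/Jan/2020:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 503 197 "-" "UA"
LOG_PATH = "access.log"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

with open(LOG_PATH) as log_file:
    for line in log_file:
        match = LINE_RE.match(line)
        if not match or match.group("path") != "/robots.txt":
            continue
        status = int(match.group("status"))
        if 500 <= status < 600:
            # Each hit here marks a window in which Google assumes all URLs are blocked.
            print(f"{match.group('time')}  robots.txt -> {status} "
                  f"({match.group('size')} bytes)")
```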

It is Normal for Google to Occasionally Crawl Old URLs

Due to their rendering processes, Google will occasionally re-crawl old URLs in order to check how they are set up. You may see this within your log files, but it is normal and will not cause any problems.

31 Jan 2020

Having a Reasonable Amount of HTML Comments Has No Effect on SEO

Comments within the HTML of a page do not have any effect on SEO unless there are a large number of them, as they can make it difficult for Google to work out where the content is and may increase the size and slow the loading of the page. However, John confirmed he has never come across a page where HTML comments have been a problem.

24 Jan 2020

Upper Limit For Recrawling Pages is Six Months

Google tends to recrawl pages at least once every six months, so roughly six months is the upper limit for how long a page will go without being recrawled.

22 Jan 2020

Google is Able to Display Structured Data Results as Soon as the Page Has Been Re-crawled

Once pages have been configured to send structured data to Google, it will be able to display the structured data results the next time it crawls and indexes that page.

10 Jan 2020
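As a hedged illustration of what "configuring a page to send structured data" can look like, the sketch below builds a JSON-LD block that would be embedded in the page's HTML; the Article type and all field values are placeholder examples, not a required schema.

```python
import json

# Placeholder values; any schema.org type that matches the page content can be used.
structured_data = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2020-01-10",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

# Embed the rendered <script> block in the page's HTML; Google can pick it up
# the next time the page is crawled and indexed.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(structured_data, indent=2)
    + "\n</script>"
)
print(snippet)
```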

Google Still Respects the Meta Robots Unavailable After Directive

Google still respects the meta robots unavailable_after directive, which is used to specify a date after which a page will no longer be available. John explained that their systems will likely recrawl the page around the date specified in order to make sure they are not removing pages from the index that are still available.

10 Jan 2020
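To make the directive concrete, here is a small sketch that extracts an unavailable_after value from a meta robots tag and checks whether the date has passed. The HTML snippet and date are invented examples, and the date format shown (RFC 822-style) is just one widely used option.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Invented example of a page's meta robots tag using the unavailable_after directive.
PAGE_HEAD = '<meta name="robots" content="unavailable_after: 25 Jun 2020 15:00:00 GMT">'

class RobotsMetaParser(HTMLParser):
    """Collects the unavailable_after date, if any, from meta robots tags."""

    def __init__(self):
        super().__init__()
        self.unavailable_after = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for directive in attrs.get("content", "").split(","):
                name, _, value = directive.partition(":")
                if name.strip().lower() == "unavailable_after":
                    self.unavailable_after = parsedate_to_datetime(value.strip())

parser = RobotsMetaParser()
parser.feed(PAGE_HEAD)
if parser.unavailable_after:
    expired = parser.unavailable_after <= datetime.now(timezone.utc)
    print(f"unavailable_after: {parser.unavailable_after.isoformat()} (expired: {expired})")
```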

Google Doesn’t Crawl Any URLs From a Hostname When Robots.txt Temporarily 503s

If Google encounters a 503 when crawling a robots.txt file, it will temporarily not crawl any URLs on that hostname.

13 Dec 2019
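A quick way to see which side of this behaviour you are on is to check what status code your robots.txt actually returns; the sketch below does that with the standard library, using a placeholder hostname.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Placeholder hostname; point this at your own robots.txt.
ROBOTS_URL = "https://www.example.com/robots.txt"

try:
    with urlopen(Request(ROBOTS_URL)) as response:
        status = response.status
except HTTPError as error:
    status = error.code

if 500 <= status < 600:
    # While robots.txt returns a 5xx, Google holds off crawling the whole hostname.
    print(f"robots.txt returned {status}: expect crawling of this hostname to pause")
else:
    print(f"robots.txt returned {status}")
```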

Google May Crawl More Frequently if it Detects Site Structure Has Changed

If you remove a large number of URLs, causing Google to crawl a lot of 404 pages, it may take this as a signal that your site structure has changed. This may lead to Google crawling the site more frequently in order to understand the changes.

10 Dec 2019
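If you want to see this pattern in your own data, a companion to the earlier log sketch can count Googlebot requests that hit 404s per day; the log path and combined log format are again assumptions.

```python
import re
from collections import Counter

# Hypothetical log path; same combined log format assumed as in the robots.txt
# sketch, with the trailing referrer and user-agent fields included.
LOG_PATH = "access.log"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+)[^\]]*\] "\S+ \S+ \S+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

not_found_per_day = Counter()
with open(LOG_PATH) as log_file:
    for line in log_file:
        match = LINE_RE.match(line)
        if match and "Googlebot" in match.group("agent") and match.group("status") == "404":
            not_found_per_day[match.group("day")] += 1

# A sharp jump in these counts after removing many URLs is the signal described above.
for day, count in not_found_per_day.most_common():
    print(day, count)
```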

Google May Still Crawl Parts of a Site With Desktop Crawler

Even with the shift to mobile-first indexing, Google may still crawl parts of a site with the desktop crawler. John explained that this will not impact the site as long as things are working well on mobile.

15 Nov 2019
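To get a feel for how much desktop crawling remains on your site, you could classify Googlebot hits in your logs by user agent. The function below is a rough heuristic sketch; the user-agent strings are abbreviated examples, and a proper check would also rule out spoofed agents (for example via reverse DNS verification).

```python
def classify_googlebot(user_agent: str) -> str:
    """Rough split of Googlebot requests into desktop and smartphone crawls.

    Heuristic only: the smartphone crawler's user agent contains "Mobile",
    the desktop crawler's does not.
    """
    if "Googlebot" not in user_agent:
        return "not googlebot"
    return "googlebot smartphone" if "Mobile" in user_agent else "googlebot desktop"


# Abbreviated example user-agent strings for illustration.
examples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.92 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]
for ua in examples:
    print(classify_googlebot(ua))
```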

Use View Source or Inspect Element to Ensure Hidden Content is Readily Accessible in the HTML

If you have content hidden behind a tab or accordion, John recommends using the view source or inspect element tools to make sure the content is in the HTML by default. Content pre-loaded in the HTML will be treated as normal content on the page; however, if it requires an interaction to load, Google will not be able to crawl or index it.

1 Nov 2019
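The same check can be scripted: fetch the page and test whether a phrase from the tabbed or accordion content appears in the raw HTML, which is the programmatic equivalent of "view source". The URL and phrase below are placeholders.

```python
from urllib.request import Request, urlopen

# Placeholder URL and phrase; use a page of your own and a snippet of the
# content that sits behind the tab or accordion.
PAGE_URL = "https://www.example.com/product"
PHRASE = "Delivery and returns"

request = Request(PAGE_URL, headers={"User-Agent": "crawlability-check"})
with urlopen(request) as response:
    html = response.read().decode("utf-8", errors="replace")

# Equivalent to "view source": if the phrase is present, the content is loaded
# by default; if it only appears after a click, Google may not crawl or index it.
if PHRASE in html:
    print("Phrase found in the initial HTML: content is pre-loaded.")
else:
    print("Phrase not found in the initial HTML: it may require an interaction to load.")
```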
