A sitemap is a list of the live URLs on a site, used to tell search engine crawlers which pages are most important and should therefore be crawled and indexed. There are several things to consider when creating sitemaps, as well as how search engines interpret them. We cover a range of these topics within our Hangout Notes, along with best practice recommendations and advice from Google.

Sitemaps Submitted Through GSC Will be Remembered for Longer

March 6, 2018 Source

Google’s memory for sitemaps is longer for those submitted through Google Search Console. Sitemaps that are submitted through robots.txt or are pinged anonymously are forgotten once they are removed from the robots.txt, or if they haven’t been pinged for a while.
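The robots.txt submission method mentioned above is a single directive in the file; a minimal sketch, assuming a hypothetical domain and sitemap filename:

```
# robots.txt at https://www.example.com/robots.txt
User-agent: *
Disallow:

# Sitemap reference; per the note above, Google may forget
# this sitemap once the line is removed from robots.txt
Sitemap: https://www.example.com/sitemap.xml
```

By contrast, a sitemap submitted in Google Search Console stays listed (and remembered) until it is explicitly deleted there.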

Sitemap Files Returning 404s Don’t Cause Issues for Google

March 6, 2018 Source

Sitemap files that return 404s don’t cause any issues for Google from an SEO perspective; they are simply treated as 404s.

Sitemaps Are More Critical for Larger Sites with High Churn of Content

February 23, 2018 Source

Sitemaps are more useful for larger websites that have a lot of new and changing content. It is still best practice to have sitemaps for smaller sites whose content largely stays the same, but they are less critical for helping search engines find new pages.
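For the larger sites described above, the sitemap protocol caps each file at 50,000 URLs, so high-churn sites typically split their URLs across multiple sitemaps referenced from a sitemap index. A sketch, with hypothetical filenames:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each child sitemap holds up to 50,000 URLs -->
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2018-02-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-articles.xml</loc>
    <lastmod>2018-02-22</lastmod>
  </sitemap>
</sitemapindex>
```

Splitting by section also makes it easier to spot indexing problems per content type in Search Console.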

Static Sitemap Filenames Are Recommended

February 1, 2018 Source

John recommends using static sitemap filenames that don’t change every time they are generated, so that Google doesn’t waste time crawling sitemap URLs which no longer exist.
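The distinction is between a stable filename that is overwritten on each regeneration and a freshly named file each time; the URLs below are hypothetical illustrations:

```
# Recommended: a static filename, overwritten on each regeneration
https://www.example.com/sitemap-products.xml

# Avoid: a timestamped filename that changes with every generation,
# leaving Google to keep recrawling old sitemap URLs that now 404
https://www.example.com/sitemap-2018-01-29-1204.xml
```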

Make Sure There is a Clear Connection Between Your Mobile & Desktop Sites

January 9, 2018 Source

It’s possible to include m. pages in your main sitemap file to help Google discover and crawl them for mobile-first indexing, but if there is a clear connection between the desktop and mobile sites then this won’t be necessary.
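The “clear connection” between separate desktop and m. URLs is typically expressed with rel="alternate" and rel="canonical" annotations in the HTML of each version. A sketch, assuming hypothetical URLs:

```html
<!-- On the desktop page: https://www.example.com/page -->
<link rel="alternate"
      media="only screen and (max-width: 640px)"
      href="https://m.example.com/page">

<!-- On the mobile page: https://m.example.com/page -->
<link rel="canonical" href="https://www.example.com/page">
```

With these annotations in place, Google can pair the two versions without the m. URLs needing to appear in the sitemap.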

Canonicals Are Chosen by Google Using XML Sitemap URLs

January 9, 2018 Source

XML sitemap URLs are used to help inform Google’s decision on which URL is chosen to be the canonical.

Google Uses Scheduler to Determine Recrawl Date

December 15, 2017 Source

Google uses a scheduler before crawling to work out when URLs need to be recrawled. It will increase the crawl rate of a URL if it receives signals that it should do so, e.g. an updated modification date in sitemaps, or internal linking (especially from the homepage).
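The sitemap signal above is the lastmod element on a URL entry. A minimal sketch, with a hypothetical URL:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/updated-page</loc>
    <!-- An updated lastmod is one signal the scheduler
         can use to bring the recrawl date forward -->
    <lastmod>2017-12-14</lastmod>
  </url>
</urlset>
```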

Crawl Frequency Attribute in XML Sitemaps Doesn’t Impact Crawl Rate

December 15, 2017 Source

Google takes no notice of the change frequency attribute in XML sitemaps, nor of any priority values that are set. Only the last modification timestamp will impact crawl rate.
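In sitemap protocol terms, these are the changefreq and priority elements on a URL entry; a sketch annotating which elements Google uses, per the note above (URL is hypothetical):

```xml
<url>
  <loc>https://www.example.com/page</loc>
  <lastmod>2017-12-10</lastmod>      <!-- used: can influence crawl rate -->
  <changefreq>daily</changefreq>     <!-- ignored by Google -->
  <priority>0.8</priority>          <!-- ignored by Google -->
</url>
```

Both elements remain valid in the sitemaps.org schema, so including them does no harm; they simply carry no weight with Google.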

New Search Console Will Show More Sitemap Data

December 12, 2017 Source

The new Search Console will show more detailed information regarding sitemaps and more detail per sitemap file.

Related Topics

Crawling Indexing Crawl Budget Crawl Errors Crawl Rate Disallow Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render