Sitemaps

A sitemap is a list of the live URLs on a site. It tells search engine crawlers which pages are most important and should therefore be crawled and indexed. There are several things to consider when creating sitemaps, including how search engines treat them. We cover a range of these topics in our Hangout Notes, along with best practice recommendations and advice from Google.

Block Videos From Search By Adding Video URL & Thumbnail to Robots.txt or Setting Expiration Date in Sitemap

July 13, 2018 Source

You can signal to Google that a video should not be included in search by blocking the video file and thumbnail image in robots.txt, or by specifying an expiration date in a video sitemap file.
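
As a rough illustration of both options, the sketch below builds the robots.txt rules and a video sitemap entry with an expiration date. The example.com URLs, the function names, and the choice of `User-agent: *` are assumptions made for the example; the element names follow Google's video sitemap extension.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def robots_rules(video_url, thumbnail_url):
    """Return robots.txt lines that disallow the video file and its thumbnail.

    Assumes both files live on the host that serves this robots.txt.
    """
    paths = [urlparse(url).path for url in (video_url, thumbnail_url)]
    return "\n".join(["User-agent: *"] + [f"Disallow: {path}" for path in paths])


def video_entry(page_url, video_url, thumbnail_url, title, description, expires):
    """Return a one-URL video sitemap whose video carries an expiration date."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    fields = [
        ("thumbnail_loc", thumbnail_url),
        ("title", title),
        ("description", description),
        ("content_loc", video_url),
        ("expiration_date", expires),  # W3C datetime string
    ]
    for name, value in fields:
        ET.SubElement(video, f"{{{VIDEO_NS}}}{name}").text = value
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    video = "https://www.example.com/videos/clip.mp4"      # hypothetical URL
    thumb = "https://www.example.com/thumbs/clip.jpg"      # hypothetical URL
    print(robots_rules(video, thumb))
    print(video_entry("https://www.example.com/watch/clip", video, thumb,
                      "Example clip", "A short example video.",
                      "2018-12-31T00:00:00+00:00"))
```

Either signal should be enough on its own according to the note above; the two functions are shown together only for comparison.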


A Sitemap is the Best Way for Google to Quickly Process Noindex at Scale

July 10, 2018 Source

Include the pages you’ve added a noindex tag to in a sitemap file with a last modified date so that Google picks up the change as quickly as possible. Make sure last modified dates are realistic and aren’t identical for every page, as this looks artificial to Google.
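
A minimal sketch of this idea, assuming you already track a real modification date for each noindexed page: it writes each URL with its own `<lastmod>` and flags the case where every date is identical. The function names and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, modified in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Use the date the page actually changed, e.g. when noindex was added.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


def lastmods_look_uniform(entries):
    """Return True when several pages all share the same last modified date."""
    return len(entries) > 1 and len({modified for _, modified in entries}) == 1


if __name__ == "__main__":
    noindexed = [  # hypothetical pages and dates
        ("https://www.example.com/old-offer", date(2018, 7, 1)),
        ("https://www.example.com/archived-post", date(2018, 7, 4)),
    ]
    if lastmods_look_uniform(noindexed):
        print("Warning: every lastmod is identical")
    print(build_sitemap(noindexed))
```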


You Only Need to Include Standalone AMP Pages in Sitemaps

June 26, 2018 Source

As long as the main page has the link rel=amphtml, you don’t need to include AMP pages in sitemaps unless they are standalone pages. Google can access the HTML of the main page and update the AMP cache when changes are made to the content.


Google Will Crawl Sitemaps That Have Been Removed from GSC

May 15, 2018 Source

Removing an old sitemap file from GSC isn’t enough to stop it from being crawled; you also need to remove it from the server to prevent Google from finding and crawling it. John recommends fixing the sitemap file instead if possible.


Large Hreflang Sets Should be Included in Sitemap Files

March 6, 2018 Source

If you have a large set of hreflang tags, John recommends putting them in your sitemap files, as this makes them easier to maintain.
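
To show what hreflang annotations in a sitemap can look like, here is a small sketch that generates one `<url>` entry per language version, each listing every alternate (including itself) as an `xhtml:link` element. The language codes, URLs, and function name are assumptions for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(alternates):
    """Build sitemap XML from a {hreflang: url} mapping for one piece of content."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Every language version lists the full set of alternates.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    versions = {  # hypothetical language versions of one page
        "en": "https://www.example.com/en/page",
        "de": "https://www.example.com/de/page",
    }
    print(build_hreflang_sitemap(versions))
```

Keeping the mapping in one place means adding a language is a single change, rather than an edit to the head of every page in the set.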


Sitemaps Submitted Through GSC Will be Remembered for Longer

March 6, 2018 Source

Google’s memory for sitemaps is longer for those submitted through Google Search Console. Sitemaps that are submitted through robots.txt or are pinged anonymously are forgotten once they are removed from the robots.txt, or if they haven’t been pinged for a while.
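
If you rely on robots.txt to advertise sitemaps, it can help to check which sitemap URLs the file currently lists. The sketch below uses Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later); the robots.txt content and the helper name are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    example = "\n".join([  # hypothetical robots.txt content
        "User-agent: *",
        "Disallow: /private/",
        "Sitemap: https://www.example.com/sitemap.xml",
    ])
    print(sitemaps_in_robots(example))
```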


Sitemap Files Returning 404s Don’t Cause Issues for Google

March 6, 2018 Source

Sitemap files that return 404s don’t cause any issues for Google from an SEO perspective; they will just be left as 404s.


Sitemaps Are More Critical for Larger Sites with High Churn of Content

February 23, 2018 Source

Sitemaps are more useful for larger websites that have a lot of new and changing content. It is still best practice to have sitemaps for smaller sites whose content largely stays the same, but they are less critical for helping search engines find new pages.


Static Sitemap Filenames Are Recommended

February 1, 2018 Source

John recommends using static sitemap filenames that don’t change every time the files are generated, so that Google doesn’t waste time crawling sitemap URLs which no longer exist.
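
One way to follow this recommendation is to have the generator always overwrite the same file instead of writing a new, timestamped one. The sketch below assumes a hypothetical `write_sitemap` helper and a fixed name of `sitemap.xml`; it writes to a temporary file first and then swaps it into place, so crawlers never fetch a half-written file.

```python
import os
import tempfile

STATIC_SITEMAP_NAME = "sitemap.xml"  # assumed fixed filename


def write_sitemap(xml_text, directory, filename=STATIC_SITEMAP_NAME):
    """Write the sitemap to the same path on every run and return that path."""
    final_path = os.path.join(directory, filename)
    # Write to a temporary file in the same directory, then replace atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # stand-in for the web root
    print(write_sitemap("<urlset></urlset>", out_dir))
```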


Related Topics

Crawling, Indexing, Crawl Budget, Crawl Errors, Crawl Rate, Disallow, Last Modified, Nofollow, Noindex, RSS, Canonicalization, Fetch and Render