Sitemaps

A sitemap is a list of the live URLs on a site, used to tell search engine crawlers which pages are most important and therefore which should be crawled and indexed. There are several things to consider when creating sitemaps, as well as understanding how search engines view them. We cover a range of these topics within our Hangout Notes, along with best practice recommendations and advice from Google.

Use Sitemaps Ping, Last Modified and Separate Sitemaps to Index Updated Content

March 20, 2020 Source

To help Google index updated content more quickly, ping Google when a sitemap has been updated, use Last Modified dates in sitemaps, and use a separate sitemap for updated content so it can be crawled more frequently.
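For illustration, a minimal sitemap entry with a Last Modified date might look like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/updated-page</loc>
    <!-- W3C Datetime format; signals when the page last changed -->
    <lastmod>2020-03-20</lastmod>
  </url>
</urlset>
```

At the time of this Hangout, you could also notify Google of an updated sitemap by requesting `https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml` (substituting your own sitemap URL).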


Specify Timezone Formats Consistently Across Site & Sitemaps

February 18, 2020 Source

Google is able to understand different timezone formats, for example, UTC vs GMT. However, it’s important to use one timezone format consistently across a site and its sitemaps to avoid confusing Google.
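As a sketch, the W3C Datetime format used in sitemaps allows equivalent timestamps to be written with different timezone designators; the dates below are illustrative:

```xml
<!-- Both represent the same instant; pick one style and use it
     consistently across the site and its sitemaps -->
<lastmod>2020-02-18T09:30:00+00:00</lastmod>
<lastmod>2020-02-18T09:30:00Z</lastmod>
```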


Include Most Recently Changed Content in Separate Sitemap

February 18, 2020 Source

Rather than submitting all of your sitemaps regularly to get Googlebot to find and crawl newly updated pages, John recommends adding recently changed pages into a separate sitemap which can be submitted more frequently, while leaving more stable, unchanged pages in existing sitemaps.
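One way to structure this, assuming hypothetical sitemap filenames, is a sitemap index that splits stable pages from recently changed ones:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Stable pages that rarely change -->
  <sitemap>
    <loc>https://www.example.com/sitemap-archive.xml</loc>
  </sitemap>
  <!-- Recently changed pages; this file is updated and resubmitted frequently -->
  <sitemap>
    <loc>https://www.example.com/sitemap-fresh.xml</loc>
    <lastmod>2020-02-18</lastmod>
  </sitemap>
</sitemapindex>
```

Only the smaller "fresh" sitemap then needs to be resubmitted often, rather than the full set.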


Use the Last Modified Date to Provide a Hierarchy of Changes Made to A Site

February 7, 2020 Source

John recommends using the last modified date in sitemaps in a reasonable way to provide a clear hierarchy of the changes that have been made on a site. This helps Google understand which pages are important and focus crawling on them first.


“Discovered Not Indexed” Pages May Show in GSC When Only Linked in Sitemap

October 29, 2019 Source

Pages may show as “Discovered Not Indexed” in GSC if they have been submitted in a sitemap but aren’t linked to within the site itself.


Google Has a Separate User Agent For Crawling Sitemaps & For GSC Verification

October 1, 2019 Source

Google has a separate user agent that fetches the sitemap file, as well as one that crawls for GSC verification. John recommends making sure neither of these is blocked.
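A common way sites accidentally block these fetches is an overly broad robots.txt rule covering the sitemap's location. A minimal sketch, with hypothetical paths:

```
# Blocking a directory is fine, but make sure the sitemap
# file itself is not caught by a Disallow rule
User-agent: *
Disallow: /private/

# Declaring the sitemap location here also helps crawlers find it
Sitemap: https://www.example.com/sitemap.xml
```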


Internally Link Pages Together to Increase Discoverability & Reduce Reliance on XML Sitemap

September 3, 2019 Source

Internally linking pages together helps Googlebot to discover the pages on your site more easily, and reduces the reliance on using XML sitemaps for URL discovery.


XML Sitemaps Should Include URLs on Same Path Unless Submitted Via Verified Property in GSC

August 23, 2019 Source

XML sitemaps should contain URLs on the same path. However, URLs in sitemaps submitted via GSC can be for any valid property within your GSC account.
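To illustrate the path rule (all URLs hypothetical): a sitemap's location determines which URLs it may list when it has not been submitted through GSC.

```
# Sitemap location: https://www.example.com/shop/sitemap.xml
https://www.example.com/shop/shirts   <- valid: same path as the sitemap
https://www.example.com/blog/news     <- outside the path: only accepted if
                                         submitted via a verified GSC property
```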


Missing Sitemap Data in GSC API is a Known Error

August 9, 2019 Source

When switching over to the new GSC UI for sitemap reporting, which took place in early April 2019, an issue occurred within the API where data stopped updating. The team is looking into this, and John expects they will document the error soon, with advice for those affected.


Related Topics

Crawling Indexing Crawl Budget Crawl Errors Crawl Rate Disallow Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render