A sitemap is a list of the live URLs on a site. It tells search engine crawlers which pages are most important and therefore which ones should be crawled and indexed. There are several things to consider when creating sitemaps, including how search engines treat them. We cover a range of these topics in our Hangout Notes, along with best practice recommendations and advice from Google.

Low Proportion of Indexed Pages Points to Technical Issue

September 22, 2017 Source

If a site has a low proportion of indexed pages, this usually points to a technical issue rather than a quality issue. Compare the sitemap index counts with the index status report to look for differences. Try splitting up the sitemap file, checking indexed pages with the info: query, and checking that rel canonicals match the URLs used in the sitemap file, hreflang annotations and internal links. Uppercase, lowercase and trailing slashes all matter. Then check crawl stats to get an idea of the crawl rate and whether it is reasonable.
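
As a rough illustration of the case and trailing-slash point, the sketch below pairs each sitemap URL with a canonical URL that differs from it only in letter case or a trailing slash. The function names and the example.com URLs are hypothetical, and the loose comparison is only a way to surface near misses, not how Google canonicalises URLs.

```python
def loose_form(url):
    # Loose comparison key, used only to pair up near-duplicates.
    return url.lower().rstrip("/")


def near_misses(sitemap_urls, canonical_urls):
    """Return (sitemap_url, canonical_url) pairs that match loosely but not exactly."""
    exact = set(canonical_urls)
    by_loose_form = {}
    for canonical in canonical_urls:
        by_loose_form.setdefault(loose_form(canonical), canonical)

    pairs = []
    for url in sitemap_urls:
        if url in exact:
            continue  # exact match, nothing to report
        candidate = by_loose_form.get(loose_form(url))
        if candidate is not None:
            pairs.append((url, candidate))
    return pairs


sitemap_urls = [
    "https://example.com/Shoes",
    "https://example.com/hats/",
    "https://example.com/bags",
]
canonical_urls = [
    "https://example.com/shoes",
    "https://example.com/hats",
    "https://example.com/bags",
]
for sitemap_url, canonical_url in near_misses(sitemap_urls, canonical_urls):
    print(sitemap_url, "!=", canonical_url)
```

Any pair reported here is a URL where the sitemap and the rel canonical disagree, which is worth fixing before looking at quality issues.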

GSC Sitemaps Report Can Take Couple of Days to Update

September 8, 2017 Source

The Sitemaps report in GSC can take a couple of days to update after changes have been made to a sitemap, which may explain why errors that no longer exist are still reported.

Google Validates Sitemap Files Immediately After Submission

September 5, 2017 Source

The 50k URL limit for sitemaps is based on the number of entries or elements in the sitemap file (including alternate linked URLs), and this is validated immediately after the file is submitted. So if there are too many URLs in the sitemap file, you will be made aware of that straight away.
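
To check a file before submitting it, a count along these lines can help. This is a minimal sketch that assumes each `<loc>` and each `xhtml:link` alternate counts as one entry, which is one reading of the note above rather than a documented formula. The `count_entries` name and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
MAX_ENTRIES = 50_000


def count_entries(sitemap_xml):
    """Count <loc> entries plus xhtml:link alternates (assumed counting rule)."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall(f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc")
    alternates = root.findall(f"{{{SITEMAP_NS}}}url/{{{XHTML_NS}}}link")
    return len(locs) + len(alternates)


SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/"/>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
""".strip()

total = count_entries(SAMPLE)
print(total, "entries;", "over limit" if total > MAX_ENTRIES else "within limit")
```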

Ensure Separate Sitemap Files Don’t Contain URL Overlap

August 11, 2017 Source

Having separate dynamic and static sitemap files is fine, as long as there is no URL overlap.
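
One way to check for overlap is to collect the `<loc>` values from each file and intersect them. The sketch below assumes both files are standard XML sitemaps. The `STATIC_SITEMAP` and `DYNAMIC_SITEMAP` samples and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(sitemap_xml):
    """Return the set of <loc> URLs in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text}


def overlapping_urls(first_xml, second_xml):
    """Return URLs that appear in both sitemap files, sorted for stable output."""
    return sorted(sitemap_locs(first_xml) & sitemap_locs(second_xml))


STATIC_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>
""".strip()

DYNAMIC_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/1</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>
""".strip()

print(overlapping_urls(STATIC_SITEMAP, DYNAMIC_SITEMAP))
```

An empty list means the two files do not share any URL.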

Submit Sitemap With Updated Last Modification Date For Faster Crawling of Updated Pages

August 11, 2017 Source

Submit a sitemap file with an updated last modification date to speed up the crawling and indexing of pages that have been changed.
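
A sitemap carrying last modification dates could be generated roughly as follows. This is a sketch only: the `build_sitemap` helper and the page data are hypothetical, and it writes `<lastmod>` as a plain ISO date (YYYY-MM-DD).

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from a mapping of URL -> last modification date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod should reflect when the page content actually changed
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


pages = {
    "https://example.com/pricing": date(2017, 8, 11),
    "https://example.com/about": date(2017, 6, 2),
}
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```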

Site: Search Operator Isn’t True Indicator of All Indexed Pages

August 2, 2017 Source

The site: search operator isn’t a true indicator of all the pages that are indexed on a site. Use a sitemap file to submit the URLs you care about.

Internal & Sitemap Links May Override Canonical Tags

July 7, 2017 Source

Google uses a number of factors to determine which URLs to show. Canonicalised pages may still be chosen if you link to them internally and in Sitemaps.

Use Sitemaps With Last Modified for Expired Content

June 16, 2017 Source

Use a last modified date with a regularly updated Sitemap to help get expired pages picked up more quickly.

Split Up Sitemaps to Identify Pages Indexed by Google

June 2, 2017 Source

There is no direct way to get information on which specific URLs are indexed in Google. To narrow down which URLs have been indexed, you can split the sitemap up into smaller parts and compare the indexed counts for each part. However, you shouldn’t focus on getting high numbers of URLs indexed, but more on the relevance of indexed pages and content.
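
As an illustration of splitting a sitemap, the sketch below groups URLs by their first path segment so that each group can be written to its own sitemap file. Grouping by site section is an assumption made here for the example; the advice above only says to use smaller parts. The function name and URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_section(urls):
    """Group URLs by first path segment; URLs at the site root go under 'root'."""
    groups = defaultdict(list)
    for url in urls:
        first_segment = urlsplit(url).path.strip("/").split("/")[0]
        groups[first_segment or "root"].append(url)
    return dict(groups)


urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/products/widget",
]
for section, section_urls in group_by_section(urls).items():
    print(section, len(section_urls))
```

Each group can then be submitted as a separate sitemap file, so the indexed count reported for that file applies to one section of the site.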
