How to Set Up Sitemap Audits in DeepCrawl

Adam Gent

On 10th April 2019 • 9 min read

A valid XML Sitemap file or Sitemap Index can be added to a crawl project. However, it is important to understand what type of data you want to include in any XML Sitemap audit.

There are three ways that XML Sitemaps can be audited in DeepCrawl:

- An All Data Sources crawl project
- A Web Crawl and XML Sitemap crawl project
- An XML Sitemap crawl project

Each of these crawl projects has advantages and disadvantages depending on what you want to achieve.

Before setting up a crawl project, it is important that any Sitemaps included in a crawl are tested to make sure they are valid.
 

Test XML Sitemap Files

An invalid Sitemap will produce inaccurate crawl data, which means the crawl will need to be re-run.

To avoid inaccurate crawl data, our team recommends testing every XML Sitemap for validity before adding it to a crawl project.

Once you are confident the XML Sitemaps are valid, then you can choose to include them in a crawl project which meets your audit objectives.
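As a rough illustration, a basic validity check can be sketched with the Python standard library. The `validate_sitemap` helper and the sample XML below are assumptions for illustration only, not part of DeepCrawl:

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(xml_text):
    """Return a list of problems found in a sitemap XML string.

    Checks well-formedness, the sitemap namespace, and that every
    <url> entry contains a <loc>. Hypothetical helper, not DeepCrawl's
    own validator.
    """
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag not in (SITEMAP_NS + "urlset", SITEMAP_NS + "sitemapindex"):
        problems.append(f"unexpected root element: {root.tag}")
    for url in root.findall(SITEMAP_NS + "url"):
        if url.find(SITEMAP_NS + "loc") is None:
            problems.append("<url> entry missing <loc>")
    return problems

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
</urlset>"""
print(validate_sitemap(sitemap))  # → []
```

An empty list means the file passed these basic checks; a full validation would also verify `<loc>` values are absolute URLs under the correct domain.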
 

Choose Your Crawl Project When Auditing XML Sitemaps

It is essential to understand the differences between the different crawl projects to make sure data from DeepCrawl achieves your intended goal.

Here is a chart to compare the three different types of crawl project:

Crawl Project | Validate XML files | Use traffic and backlink data | Use internal link data | Audit URLs in XML Sitemaps | Run crawl quickly
--- | --- | --- | --- | --- | ---
All Data Sources | Yes | Yes | Yes | Yes | No
Web Crawl and XML Sitemap | Yes | No | Yes | Yes | No
XML Sitemap | Yes | No | No | Yes | Yes

For each crawl project, we have provided further detail about the disadvantages and advantages of using them.
 

All Data Sources Crawl Project

In this crawl project type, all the data sources are selected.

This helps you better understand the URLs found within the XML Sitemap files in the context of other data sources (Google Search Console, backlinks, etc.).

Advantages:

- Sitemap URLs are analysed in the context of every other data source: the web crawl, Google Search Console, analytics, and backlinks.

Disadvantages:

- Crawls take longer to run, as DeepCrawl must request and process the extra data sources.

When Should This Project Type be Used?

A crawl project with all data sources selected is best scheduled to run every month or quarter, because the extra data sources DeepCrawl needs to request and process make the crawl take more time.

Data Sources Should Use the Same Primary Domain Name

When connecting all data sources within the crawl set up, please make sure they are all using the same primary domain as the XML Sitemap and web crawl (example.com, www.example.com, etc.).

If the primary domains are different in the backlink and analytics data sources, then any XML Sitemap analysis will be invalid in DeepCrawl. This is because all impression, click, and backlink metrics associated with a URL will not match up to the URLs found in the Sitemaps.
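A simple pre-crawl check along these lines can be sketched in Python; `hosts_match`, the sample URLs, and the primary domain value are hypothetical, not DeepCrawl functionality:

```python
from urllib.parse import urlparse

def hosts_match(sitemap_urls, primary_domain):
    """Return URLs whose host differs from the crawl's primary domain.

    Illustrative only: a mismatch here means metrics keyed to one
    host will not line up with Sitemap URLs on another.
    """
    return [u for u in sitemap_urls if urlparse(u).netloc != primary_domain]

urls = [
    "https://www.example.com/page-a",
    "https://example.com/page-b",   # bare domain: will not match www host
]
print(hosts_match(urls, "www.example.com"))  # → ['https://example.com/page-b']
```

Any URL the check returns would need its data source reconciled to the same host before the Sitemap analysis can be trusted.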
 

Web Crawl and XML Sitemap Crawl Project

In this crawl project type, only the website and XML Sitemap data sources are selected.

The website and XML Sitemap crawl project helps to better understand the URLs found within the XML Sitemap files in the context of the on-site signals (internal links, orphaned pages, canonicalization, etc.).

Advantages:

- Sitemap URLs are analysed alongside on-site signals such as internal links, orphaned pages, and canonicalisation.

Disadvantages:

- No traffic or backlink data is included, and the crawl is slower than a Sitemap-only crawl because the whole website must also be crawled.

When Should This Project Type be Used?

A crawl project with the website and XML Sitemap data sources selected is best scheduled to run every month or quarter. Because DeepCrawl must request and process every page on the website as well as the Sitemaps, this crawl takes more time than a Sitemap-only crawl.

Website Crawl Should Use the Same Primary Domain Name

When choosing both the website crawl data and XML Sitemap data source, make sure both are using the same primary domain.

If the primary domain in the website data source is different, then any XML Sitemap analysis will be invalid in DeepCrawl, because the on-page metrics associated with a URL will not match up to the URLs found in the Sitemaps.
 

XML Sitemap Crawl Project

In this crawl project type, only the XML Sitemap data source is selected.

The XML Sitemap crawl project helps to quickly understand any issues with the URLs found in XML Sitemap files (HTTP status codes, noindexed pages, broken links, etc.).

Advantages:

- Crawls complete quickly and surface Sitemap issues such as HTTP status codes, noindexed pages, and broken links.

Disadvantages:

- No internal link, traffic, or backlink data is available to put the Sitemap URLs in context.

When Should This Project Type be Used?

The crawl project is ideal for when development or SEO teams want to monitor XML Sitemaps and want to get data back quickly.
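As a sketch of the kind of triage this project type surfaces, the snippet below flags Sitemap URLs with a bad status code or a noindex directive. The `sitemap_issues` helper and the tuple format are illustrative assumptions, not DeepCrawl's export format:

```python
def sitemap_issues(crawl_rows):
    """Flag Sitemap URLs that return a non-200 status or are noindexed.

    crawl_rows is a list of (url, status_code, is_noindex) tuples --
    a stand-in for exported crawl data.
    """
    issues = []
    for url, status, noindex in crawl_rows:
        if status != 200:
            issues.append((url, f"HTTP {status}"))
        elif noindex:
            issues.append((url, "noindex"))
    return issues

rows = [
    ("https://example.com/", 200, False),
    ("https://example.com/old", 404, False),       # should be removed
    ("https://example.com/private", 200, True),    # should not be in a Sitemap
]
print(sitemap_issues(rows))
```

Either finding means the URL should be removed from the Sitemap or the page itself fixed, which is exactly the feedback loop a fast Sitemap-only crawl supports.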
 

Summary

There is no right or wrong way to set up a crawl project for auditing XML Sitemaps. Just remember, when setting up an XML Sitemap project in DeepCrawl, to test that your Sitemaps are valid before crawling, choose the crawl project type that matches your audit objectives, and keep every data source on the same primary domain.

 


Author

Adam Gent

Search Engine Optimisation (SEO) professional with over 8 years’ experience in the search marketing industry. I have worked with a range of client campaigns over the years, from small and medium-sized enterprises to FTSE 100 global high-street brands.
