Reports, Metrics, and Datasources in DeepCrawl

Alec Bertram

On 5th July 2019 • 5 min read

At DeepCrawl we track over 250 metrics to help our users understand their website. In this guide, we will explain a bit more about how our systems work and how we store the information from your crawls.


Explore DeepCrawl Reports

Click on the links below to understand the metrics behind specific reports. All reports have been grouped into datasources.

URLs and Pages | Links | Unique Links | Sitemaps


Below is more information on how DeepCrawl calculates metrics and reports.

What are metrics in DeepCrawl?

A metric is a piece of information about a page, link, or sitemap that we have either extracted from a URL or calculated in our system (e.g. DeepRank).

Here are some examples of metrics we store around a URL: its page title, its DeepRank, whether it has a meta noindex tag, whether it is indexable, and the number of Search Console impressions it received.

There are different levels of metrics that we calculate within the DeepCrawl system.

For example, Meta Noindex is a low-level, true-or-false metric that lets you know whether a page has a noindex meta tag. Indexable is a high-level metric that needs to take several other metrics into account to be accurate (such as noindex tags, headers, canonicalisation, etc.). Once calculated, all of these metrics let our system identify whether a page is indexable or non-indexable.
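
To make the difference concrete, here is a rough Python sketch of how a high-level indexability flag could be derived from a few low-level metrics. It is purely illustrative: the metric names and checks are assumptions for the example, not DeepCrawl's actual implementation.

    # Illustrative sketch only; not DeepCrawl's actual logic.
    # The metric names (status_code, meta_noindex, noindex_header, canonical_url)
    # are hypothetical examples of low-level metrics.
    def is_indexable(page):
        """Combine several low-level metrics into one high-level flag."""
        if page["status_code"] != 200:
            return False  # non-200 responses are not indexable
        if page["meta_noindex"]:
            return False  # page has a noindex meta tag
        if page["noindex_header"]:
            return False  # noindex set via the X-Robots-Tag response header
        canonical = page.get("canonical_url")
        if canonical and canonical != page["url"]:
            return False  # canonicalised to a different URL
        return True

    page = {
        "url": "https://example.com/widgets",
        "status_code": 200,
        "meta_noindex": False,
        "noindex_header": False,
        "canonical_url": "https://example.com/widgets",
    }
    print(is_indexable(page))  # True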

For all pages fetched and processed in our system, we collect more than 300 metrics which include everything from a page's title to the number of Search Console impressions.

What are reports in DeepCrawl?

A report in DeepCrawl is a combination of different metrics - while a metric is an individual piece of information about a page, a report takes many metrics and their values into account.

For example, the Page Title metric is the title that we extracted from your page, but the Short Titles report is a list of URLs which have a short title and are indexable.
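
As a rough illustration of how a report combines metrics, the Python sketch below filters a set of pages down to indexable URLs with short titles. The 30-character threshold and the field names are assumptions made for this example, not DeepCrawl's definitions.

    # Hypothetical sketch of a "Short Titles" style report built from two metrics.
    SHORT_TITLE_LIMIT = 30  # assumed threshold, for illustration only

    pages = [
        {"url": "https://example.com/", "page_title": "Home", "indexable": True},
        {"url": "https://example.com/contact", "page_title": "Contact Us | Example Widgets Ltd", "indexable": True},
        {"url": "https://example.com/old", "page_title": "Old", "indexable": False},
    ]

    # A report is a filtered view over many URLs, built from their metric values.
    short_titles_report = [
        p["url"]
        for p in pages
        if p["indexable"] and len(p["page_title"]) < SHORT_TITLE_LIMIT
    ]
    print(short_titles_report)  # ['https://example.com/']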

Examples of reports in DeepCrawl:


What are DeepCrawl’s datasources?

During our crawls, we collect information about URLs, the links between those URLs, and sitemaps. Because these three types of data are so different from each other, we split them into separate main datasources.

Pages and URLs

This datasource contains each URL and all metrics related to each URL. For example:

Links

This datasource contains each link and related metrics, for example:

It also contains links that have issues, for example broken links, links that cross between protocols (such as HTTPS pages linking to HTTP URLs), and a few other cases.

We do not currently store every single link and its source that we see during a crawl, as this typically amounts to terabytes of data. If you are interested in all of the links between pages, look at Unique Links.

Unique Links

This datasource contains every unique link that we saw during the crawl. For example:

If your website has a navigation link to the homepage on every page of the website, then we will save that link once along with a count of the times we saw that link.
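
To illustrate, the Python sketch below collapses repeated sightings of the same link into a single record with a count. The data shape is invented for the example and is not how DeepCrawl stores links internally.

    # Illustrative only: collapsing every link sighting into unique links with a count.
    from collections import Counter

    # Each tuple is (target URL, anchor text) for a link found somewhere during a crawl.
    observed_links = [
        ("https://example.com/", "Home"),
        ("https://example.com/", "Home"),
        ("https://example.com/pricing", "Pricing"),
        ("https://example.com/", "Home"),
    ]

    unique_links = Counter(observed_links)
    for (target, anchor), count in unique_links.items():
        print(target, anchor, count)
    # https://example.com/ Home 3
    # https://example.com/pricing Pricing 1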

Sitemaps

This datasource includes information about the sitemaps we processed during the crawl. For example:


Using Reports and Metrics with the API

You can query URLs in the API using reports - the two concepts for this are:

Reports contain aggregate information - total is the count of URLs which match that report query, added is the number of new URLs in that report since the last crawl, etc. This is available by calling /accounts/:account_id/projects/:project_id/crawls/:crawl_id/reports/:report_code_basic

Report rows are the raw data of each URL (and relevant metrics) within that report. These can be accessed in the API using /accounts/:account_id/projects/:project_id/crawls/:crawl_id/reports/:report_code_basic/report_rows
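
For illustration, here is a minimal Python sketch of calling these two endpoints with the requests library. The API host, authentication header, IDs, and report code below are placeholders rather than documented values, so check the DeepCrawl API documentation for the real ones.

    # Minimal sketch only; the host, auth header, IDs, and report code are placeholders.
    import requests

    BASE = "https://api.example.com"                  # placeholder API host
    HEADERS = {"X-Auth-Token": "YOUR_SESSION_TOKEN"}  # assumed auth scheme

    path = "/accounts/123/projects/456/crawls/789/reports/short_titles_basic"

    # Aggregate report information (total, added, etc.)
    report = requests.get(BASE + path, headers=HEADERS).json()
    print(report.get("total"), report.get("added"))

    # Report rows: the raw data for each URL (and its metrics) within the report
    rows = requests.get(BASE + path + "/report_rows", headers=HEADERS).json()
    for row in rows[:5]:
        print(row)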

Author

Alec Bertram

Alec is the Head of Product at DeepCrawl. He has 10 years of experience in SEO, and works to make sure we're building the most valuable things we can for our users.
