The new Data Explorer enables our customers to summarise millions of rows of crawl data in seconds and streamline their workflows. It allows them to gain unique insights into their crawl data and quickly communicate SEO issues to other stakeholders.
Note: The feature is currently in beta and can only be accessed by customers with the segmentation add-on. We will soon give all customers access to the Data Explorer.
What is the Data Explorer?
The Data Explorer allows our customers to summarise crawl data and calculate simple aggregate functions. The feature uses a new service that aggregates data by specific dimensions.
Why use Data Explorer?
The Data Explorer allows our customers to:
- Save time: It summarises millions of rows of data in seconds.
- Summarise crawl data: Data Explorer can summarise millions of rows of data using simple aggregate functions.
- Communicate to stakeholders: Key SEO issues can easily be spotted and shared with stakeholders.
How does Data Explorer work?
The Data Explorer acts as an advanced pivot table that allows you to calculate aggregate functions for numeric metrics (DeepRank, internal link count, backlink count, log hits, etc.).
At the moment, we allow our customers to group crawl data using the following dimensions:
- HTTP Status Code
- Tree View (coming soon)
The technology we use currently calculates only a small set of aggregate functions for relevant metrics: Count, Sum, and Average (Avg).
Our customers can calculate these aggregate functions on any numeric metric in our system. A list of metrics and reports can be found here.
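Conceptually, each Data Explorer view is a group-by-and-aggregate over the crawl table. The sketch below only illustrates that idea with a toy pandas DataFrame; the column names (http_status_code, deeprank, links_in_count) are assumptions for the example, not DeepCrawl's actual schema.

```python
import pandas as pd

# Toy crawl extract; all column names here are illustrative, not DeepCrawl's real schema.
crawl = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d", "/e"],
    "http_status_code": [200, 200, 404, 301, 404],
    "deeprank": [3.2, 2.8, 1.1, 0.9, 1.4],
    "links_in_count": [120, 95, 4, 2, 7],
})

# Group by a dimension (HTTP Status Code) and apply aggregate functions
# to numeric metrics -- the same idea as the Data Explorer's pivot-style view.
summary = crawl.groupby("http_status_code").agg(
    url_count=("url", "count"),
    avg_deeprank=("deeprank", "mean"),
    total_links_in=("links_in_count", "sum"),
)

print(summary)
```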
Data Explorer scenarios
We have provided a few common scenarios to showcase the unique data insights available to SEOs.
Internal Link Analysis
This scenario can be used to identify any internal link opportunities across different important page types on your website.
In this scenario, a client wants to understand any internal link opportunities available to optimize pages within the /appliances/ directory.
Data Explorer is used to quickly summarise the internal linking metrics for important page types using segments. To understand the internal linking of the page types, the SEO analyst used the following metrics and aggregate functions (a rough sketch of the equivalent calculation follows the list):
- URL, Count
- DeepRank, Avg
- Links In Count, Avg
- Links Out Count, Avg
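The same summary could be reproduced outside the tool roughly like this; the segment names, values, and column names in the snippet are made up for illustration and are not DeepCrawl field names.

```python
import pandas as pd

# Illustrative crawl rows tagged with a segment; all names and values are assumptions.
pages = pd.DataFrame({
    "url": ["/appliances/a", "/appliances/b", "/laptops/a", "/laptops/b"],
    "segment": ["Appliances", "Appliances", "Laptops", "Laptops"],
    "deeprank": [1.2, 1.5, 3.4, 3.1],
    "links_in_count": [6, 9, 180, 150],
    "links_out_count": [40, 38, 55, 60],
})

# URL count plus average DeepRank and link counts per segment,
# mirroring the metric/aggregate pairs listed above.
by_segment = pages.groupby("segment").agg(
    url_count=("url", "count"),
    avg_deeprank=("deeprank", "mean"),
    avg_links_in=("links_in_count", "mean"),
    avg_links_out=("links_out_count", "mean"),
).sort_values("avg_deeprank")

# Segments with the lowest average DeepRank surface first.
print(by_segment)
```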
In a few seconds, the SEO analyst identified that the average DeepRank and Followed Links In Count for the appliances pages are lower than for other important page types in the crawl.
Canonical link signals
This scenario can be used to identify inconsistencies across canonical link signals on your website.
In this scenario, a client has reported that canonicalized parameter URLs are being indexed by Google.
To understand any canonical link signal inconsistencies across the parameter URLs, the SEO analyst added a new Parameter segment to DeepCrawl. They then used the following metrics and aggregate functions with the Segment dimension selected (a sketch of the equivalent calculation follows the list):
- URL, Count
- Canonical Links In Count, Sum
- Followed Links In Count, Sum
- Sitemaps In Count, Sum
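A rough equivalent of this summary, again with made-up data and column names purely for illustration:

```python
import pandas as pd

# Hypothetical rows for canonicalized parameter URLs vs. other segments;
# all column and segment names are illustrative.
urls = pd.DataFrame({
    "url": ["/p?color=red", "/p?color=blue", "/p", "/q"],
    "segment": ["Parameter", "Parameter", "Product", "Product"],
    "canonical_links_in_count": [3, 5, 40, 25],
    "followed_links_in_count": [120, 85, 60, 45],
    "sitemaps_in_count": [0, 0, 1, 1],
})

# Summing the signals per segment makes conflicting signals easy to spot,
# e.g. canonicalized parameter URLs that still receive many followed links.
signals = urls.groupby("segment").agg(
    url_count=("url", "count"),
    canonical_links_in=("canonical_links_in_count", "sum"),
    followed_links_in=("followed_links_in_count", "sum"),
    sitemaps_in=("sitemaps_in_count", "sum"),
)

print(signals)
```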
The team identified followed internal links pointing to the canonicalized parameter URLs. Google has spoken in detail about its canonicalization algorithm and mentioned that PageRank (links) is a signal, so the team suspects that internal linking could be causing Google to ignore the canonical tags.
The SEO team can also easily communicate this summary with the development team and other stakeholders.
XML Sitemap analysis
This scenario can be used to segment XML Sitemap issues for important page types across the website.
In this scenario, an SEO team wants to understand why pages within the /appliances/ directory are not being discovered and crawled quickly by Google.
To analyze the sitemaps in the crawl, the team added the following metrics and aggregate functions with the Segment dimension selected (a sketch of the calculation follows the list):
- URL, Count
- Sitemaps In Count, Sum
- Followed Links In Count, Sum
- DeepRank, Avg
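The useful number in this summary is the gap between crawled URLs and sitemap entries per segment, sketched roughly below; the data and column names are invented for illustration.

```python
import pandas as pd

# Illustrative data: crawled URLs per segment and whether each URL
# appears in an XML sitemap (column and segment names are assumptions).
crawled = pd.DataFrame({
    "url": ["/appliances/1", "/appliances/2", "/appliances/3", "/tvs/1"],
    "segment": ["Appliances", "Appliances", "Appliances", "TVs"],
    "sitemaps_in_count": [1, 0, 0, 1],
})

coverage = crawled.groupby("segment").agg(
    urls_in_crawl=("url", "count"),
    urls_in_sitemaps=("sitemaps_in_count", "sum"),
)

# The gap between crawled URLs and sitemap entries highlights pages
# that still need to be added to the XML sitemaps.
coverage["missing_from_sitemaps"] = (
    coverage["urls_in_crawl"] - coverage["urls_in_sitemaps"]
)
print(coverage)
```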
The number of appliance URLs being found in sitemaps is smaller than the number of URLs found in the crawl. This indicates to the team that not all the new appliance pages have been added to sitemaps.
In seconds the team identified that the new pages need to be added to sitemaps and submitted to Google.
The SEO team can easily communicate the scale of the issue to other stakeholders.
Broken page analysis
This scenario can be used to quickly identify broken pages on the website.
In this scenario, the team wants to quickly identify broken 4xx pages that still have SEO value.
To do this, the team used the following metrics and aggregate functions with the HTTP Status Code dimension selected (a sketch of the calculation follows the list):
- URL, Count
- Analytics Visits, Sum
- Backlink Count, Sum
- Backlink Domains Count, Sum
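An illustrative version of the same summary; the status codes, traffic and backlink figures, and column names are all made up for the example.

```python
import pandas as pd

# Toy data; column names and values are illustrative only.
pages = pd.DataFrame({
    "url": ["/old-promo", "/retired-product", "/live-page"],
    "http_status_code": [404, 410, 200],
    "analytics_visits": [230, 15, 5000],
    "backlink_count": [48, 2, 300],
    "backlink_domains_count": [12, 1, 40],
})

# Keep only broken (4xx) pages, then sum the "SEO value" metrics per status
# code, mirroring the HTTP Status Code dimension in the Data Explorer.
broken = pages[pages["http_status_code"].between(400, 499)]
value = broken.groupby("http_status_code").agg(
    url_count=("url", "count"),
    visits=("analytics_visits", "sum"),
    backlinks=("backlink_count", "sum"),
    backlink_domains=("backlink_domains_count", "sum"),
)

# Broken pages that still attract traffic and backlinks are redirect candidates.
print(value)
```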
As you can see from the screenshot above, the team quickly identified broken pages with traffic and backlinks (SEO value).
This data should still be investigated within DeepCrawl and third-party tools to identify the pages to 301 redirect. The SEO team can also quickly summarise and communicate the available opportunities to the wider team in seconds.
Web Vital Analysis
In this scenario, the SEO team wants to quickly understand Core Web Vital metrics across the most important pages on the website.
The SEO team updated the table to include the following metrics and aggregate functions and selected the Segment dimension (a sketch of the calculation follows the list):
- Time to First Byte (TTFB), Avg
- Largest Contentful Paint (LCP), Avg
- Cumulative Layout Shift (CLS), Avg
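To make the list concrete, the sketch below averages illustrative values per segment and flags averages that fall outside Google's published "good" thresholds (TTFB ≤ 0.8 s, LCP ≤ 2.5 s, CLS ≤ 0.1); the data, segment names, and column names are invented for the example.

```python
import pandas as pd

# Illustrative per-URL values (seconds for TTFB/LCP, unitless CLS);
# segment and column names are assumptions for the example.
vitals = pd.DataFrame({
    "segment": ["Appliances", "Appliances", "TVs", "TVs"],
    "ttfb": [1.4, 1.1, 0.9, 1.2],
    "lcp": [2.9, 2.4, 2.2, 3.1],
    "cls": [0.05, 0.02, 0.18, 0.22],
})

averages = vitals.groupby("segment").agg(
    avg_ttfb=("ttfb", "mean"),
    avg_lcp=("lcp", "mean"),
    avg_cls=("cls", "mean"),
)

# Flag segment averages that fall outside Google's "good" thresholds
# (TTFB <= 0.8 s, LCP <= 2.5 s, CLS <= 0.1).
averages["ttfb_ok"] = averages["avg_ttfb"] <= 0.8
averages["lcp_ok"] = averages["avg_lcp"] <= 2.5
averages["cls_ok"] = averages["avg_cls"] <= 0.1
print(averages)
```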
The team found that the average TTFB for all segments falls outside the “good” criteria that Lighthouse and Google recommend. There are also page types showing a poor average score for the CLS metric.
This data quickly shows that further investigation is needed to understand which pages are performing poorly, but the table can be used to summarise and communicate these problems to internal stakeholders.
Crawl error analysis
In this scenario, the team wishes to understand whether any crawl errors were found in the crawl and to quickly summarise the information for the engineering team.
Using the default Data Explorer view the team selected the HTTP Status Code dimension.
The team can quickly identify crawl errors that were found in the crawl using the SUM aggregate function and the Internal Links In Count metric.
This data can then be quickly summarised and communicated to internal stakeholders.
How can I get started?
Our team will first roll out the Data Explorer for all customers with the segmentation feature, and then give access to all our customers.
Once you have access you will be able to see Data Explorer in the navigation.
Frequently Asked Questions
Is the tree summary going to be added to the Data Explorer?
Yes, we are currently testing the tree view with our developers and plan to release this dimension soon.
Why are all the metrics not available?
At the moment, the Data Explorer only aggregates numeric metrics (integer and float data types). The only string metric we allow to be counted is URL. As we gather feedback from customers, we’ll identify other metrics that could be added to the Data Explorer.
Will the Data Explorer replace the Site Explorer?
Eventually we plan to replace the Site Explorer feature with the Data Explorer.
Do you plan to release other enhancements?
We’re always looking to improve our features, and feedback from our beta testers has already been valuable in improving the Data Explorer.
Please provide feedback through the feedback form or your Customer Success Manager.
Is the summarised data available in the API?
Yes, all the aggregate data is available in the API. Read our API docs for more information.
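For teams that want to pull the aggregated data programmatically, a request along the following lines could work. Note that the base URL, endpoint path, parameters, and authentication shown here are placeholders for illustration only; the real values are in the API documentation.

```python
import requests

# Hypothetical example only: the base URL, endpoint, parameters, and auth
# header are placeholders -- refer to the API docs for the real values.
API_BASE = "https://api.example.com"   # placeholder base URL
TOKEN = "YOUR_API_TOKEN"               # placeholder credential

response = requests.get(
    f"{API_BASE}/crawls/123/aggregates",   # placeholder endpoint
    params={"dimension": "segment", "metric": "deeprank", "function": "avg"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```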
Is the aggregation data available in Data Studio?
The data aggregate calculations are not available in Google Data Studio at the moment, although we will look at getting this added as soon as we can.
The Data Explorer will help our customers save time and summarise important technical SEO issues across millions of rows of data.
This is just the start of the data insights we are trying to provide for our customers, and we plan to make more changes based on customer feedback over the next few months.