Integrating Log File Summary Data with DeepCrawl

Adam Gordon

On 19th March 2018 • 7 min read

Our integration allows us to automatically get log file data every time we crawl a website, meaning that log file data is always up to date, and there is no need to manually upload data.


What data does DeepCrawl get from Logz.io?

DeepCrawl queries your Logz.io account and gets all the URLs which received traffic from search engine crawlers within the specified date range, along with the count of hits from Google’s desktop crawler and the count of hits from Google’s mobile crawler. This data is used to populate the "Log File" and "Bot Hit" reports.
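Conceptually, the per-URL data pulled in looks something like the sketch below (an illustration only — the field names here are assumptions, not DeepCrawl's actual response schema):

```python
# Illustrative sketch of the per-URL log summary data (hypothetical field
# names, not DeepCrawl's actual schema).
log_summary = [
    {"url": "https://example.com/",
     "googlebot_desktop_hits": 120, "googlebot_mobile_hits": 340},
    {"url": "https://example.com/old-page",
     "googlebot_desktop_hits": 2, "googlebot_mobile_hits": 0},
]

# Total bot hits per URL, the kind of figure the Bot Hit reports are built on.
totals = {row["url"]: row["googlebot_desktop_hits"] + row["googlebot_mobile_hits"]
          for row in log_summary}

print(totals["https://example.com/"])  # 460
```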


How do I set up and connect it to my DeepCrawl account?

To set up the integration, you will first need to contact DeepCrawl support. Navigate to the Connected Apps page from the dropdown menu in the top right of the app and click “Request Access to Logzio API”. A member of the DeepCrawl team will get in touch with you to discuss the setup process and help you create a new Logz.io account.

Once your account has been set up, you will need to connect it to your DeepCrawl account. You can do this from the Connected Apps page (or from Step 2 of the Crawl Setup).

To connect your account to DeepCrawl, navigate to the Connected Apps page and click “Add Account”. You will now need to go to the Logz.io app to generate a token. Once in the app, click on the small cog on the right of the screen and then click Shared Tokens. Type a name for your token in the Token Name field and click Save. A new token will then appear in the table. Copy the token and jump back to DeepCrawl. Paste the token into the Token field, enter a descriptive label into the Label field, and click “Add Account”. Your account is now integrated with DeepCrawl.


How do I add log summary data to a crawl?

First, navigate to Step 2 of the crawl setup and open the Log Summary tab.
If you have connected your account with DeepCrawl, you will see an ‘Add Query’ button. Click this button to open DeepCrawl’s query builder, which will help you set up a query to pull in log summary data for your website from Logz.io. (This query will run the next time the crawl runs.)

The query builder contains pre-filled values for the most common server log file setup; if your server’s log file setup is different, you can update the values for the query here.
It’s best to confirm the values with the development team who maintain your website.

Below is an explanation of each of the different fields in the query builder:

Base URL:
Enter the base domain to use for relative URLs in the query. If left blank, the primary domain will be used.

Token:
The token used to connect DeepCrawl to your account.

Date Range:
The timeframe from which log data is collected.

Desktop user agent regex:
The user agent to use when retrieving desktop bot requests. Regex can be used.

Maximum number of URLs to fetch:
The maximum number of URLs to fetch from Logz.io.

Mobile user agent regex:
The user agent to use when retrieving mobile bot requests. Regex can be used.
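To illustrate how the desktop and mobile regex fields differ, here is a small sketch. The regex values below are assumptions you should verify against your own logs, not DeepCrawl defaults; they rely on the fact that Googlebot's smartphone crawler identifies itself with an Android device string while the desktop crawler does not:

```python
import re

# Real Googlebot user-agent strings, as published in Google's
# crawler documentation.
DESKTOP_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
              "+http://www.google.com/bot.html)")
MOBILE_UA = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
             "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
             "+http://www.google.com/bot.html)")

# Hypothetical regex values for the two query builder fields:
# mobile Googlebot carries an Android device string, desktop does not.
mobile_regex = re.compile(r"Android.*Googlebot")
desktop_regex = re.compile(r"^(?!.*Android).*Googlebot")

print(bool(mobile_regex.search(MOBILE_UA)))    # True
print(bool(desktop_regex.search(DESKTOP_UA)))  # True
```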

URL field name:
The name of the field containing the URL, e.g. URI.

User agent field name:
The name of the field containing the user agent, e.g. Agent.

Query filter (JSON): (Optional)
An optional JSON filter that can be used to restrict the query to specific subdomains.
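Logz.io is built on the ELK stack, so the filter will typically follow Elasticsearch query DSL. As a hypothetical example (the `host` field name is an assumption — your logs may index the hostname under a different field, so confirm with your development team), restricting the query to a single subdomain might look like:

```json
{
  "term": {
    "host": "blog.example.com"
  }
}
```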

Once you have confirmed these values with the developers, you can click the Save button and then finish the rest of the project setup screen and start a crawl.


How do I view data when the crawl has finished?

Once your crawl has finished, navigate to that crawl in the DeepCrawl app. In the sidebar you will notice a category named Log Files. This category contains many different reports relating to log file analysis.

These reports will give you deeper insights into how your website is performing by utilising your log file summary data. Here are some examples of the insights they provide:


No Bot Hits, High Bot Hits, Medium Bot Hits & Low Bot Hits

Knowing where search engines focus their crawling time on your site allows you to optimize your crawl budget, and you can even break bot hits down by device


Error Pages with Bot Hits

Identifying where crawl budget is being wasted crawling pages returning 4xx and 5xx status codes


Non-Indexable Pages with Bot Hits

Uncovering which of your non-indexable pages are being crawled by search engine bots


Pages without Bot Hits in Sitemaps

Flagging URLs which are included in your sitemap but aren’t being crawled by search engine bots
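To make the bucketing idea behind these reports concrete, here is a rough sketch of segmenting URLs by bot hits. The thresholds are invented for illustration only; DeepCrawl's actual segment boundaries may differ:

```python
def bot_hit_segment(hits, high=100, low=10):
    """Bucket a URL by bot hit count. Thresholds are illustrative only."""
    if hits == 0:
        return "No Bot Hits"
    if hits >= high:
        return "High Bot Hits"
    if hits >= low:
        return "Medium Bot Hits"
    return "Low Bot Hits"

# Hypothetical per-URL bot hit counts from a log summary.
pages = {"/": 340, "/old-page": 0, "/category": 25, "/faq": 3}
segments = {url: bot_hit_segment(hits) for url, hits in pages.items()}

print(segments["/"])          # High Bot Hits
print(segments["/old-page"])  # No Bot Hits
```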

If you’ve got any questions about our exciting new integration we’d be happy to answer them. Just shoot us over a message or get in touch with your account manager.


Adam Gordon

Product Manager at DeepCrawl.
