Our Logz.io integration allows us to automatically get log file data every time we crawl a website, meaning that log file data is always up to date, and there is no need to manually upload data.
What data does DeepCrawl get from Logz.io?
DeepCrawl queries your Logz.io account and gets all the URLs which received traffic from search engine crawlers within the specified date range along with the count of hits from Google’s desktop crawler and the count of hits from Google’s mobile crawler. This data is used to populate the "Log File" and "Bot Hit" reports.
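As a rough illustration of what this per-URL summary looks like, the sketch below counts Googlebot desktop and mobile hits from a list of log entries. It is not DeepCrawl's implementation: the `(url, user_agent)` record shape, the `summarise_bot_hits` function name and the simple "Mobile" substring test are all assumptions made for the example.

```python
from collections import defaultdict

# Googlebot user agent strings as published by Google (the Chrome version varies).
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def summarise_bot_hits(records):
    """Count Googlebot desktop and mobile hits per URL.

    `records` is an iterable of (url, user_agent) pairs. This shape is an
    assumption for the example, not the Logz.io response format.
    """
    summary = defaultdict(lambda: {"desktop": 0, "mobile": 0})
    for url, user_agent in records:
        if "Googlebot" not in user_agent:
            continue  # this sketch only counts Google's crawlers
        device = "mobile" if "Mobile" in user_agent else "desktop"
        summary[url][device] += 1
    return dict(summary)


sample = [
    ("/products", DESKTOP_UA),
    ("/products", MOBILE_UA),
    ("/products", MOBILE_UA),
    ("/about", DESKTOP_UA),
    ("/about", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),  # a regular visitor
]
hits = summarise_bot_hits(sample)
```

Here `hits` maps each URL to its desktop and mobile hit counts, which is the kind of summary the reports are built on.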
How do I set up Logz.io and connect it to my DeepCrawl account?
To set up the integration, you first need to contact DeepCrawl support, who will set up your Logz.io account. To do this, navigate to the Connected Apps page from the dropdown menu in the top right of the app, click on Logz.io and click “Request Access to Logzio API”. A member of the DeepCrawl team will get in touch with you to discuss the setup process and will help you create a new Logz.io account.
Once your Logz.io account has been set up, you will need to connect it to your DeepCrawl account. You can do this from the Connected Apps page or from Step 2 of the crawl setup.
To connect your Logz.io account to DeepCrawl, navigate to the Connected Apps page, click on Logz.io and click “Add Logz.io Account”. You will now need to go to the Logz.io app (https://app.logz.io/) to generate a token. Once in the Logz.io app, click on the small cog on the right of the screen and then click on Shared Tokens. Type a name for your token in the Token Name field and click Save. A new token will then appear in the table. Copy the token and return to DeepCrawl. Paste the token into the Token field, enter a similar label for the token in the Label field and click “Add Logz.io Account”. Your Logz.io account is now integrated with DeepCrawl.
How do I add Logz.io log summary data to a crawl?
First, navigate to Step 2 of the crawl setup, open the Log Summary tab and click on the Logz.io tab.
Provided you have connected your Logz.io account to DeepCrawl, you will see an ‘Add Logz.io Query’ button. Click this button to open DeepCrawl’s query builder, which will help you set up a query to pull in log summary data for your website from Logz.io. (This query will run the next time the crawl runs.)
The query builder contains pre-filled values for the most common server log file setup, but if your server’s log file setup is different, you can update the values for the query here.
It’s best to speak to the development team who maintain your website to confirm the values for the query.
Below is an explanation of each of the different fields in the query builder:
Enter the base domain to use for relative URLs in the query. If left blank, the primary domain will be used.
The Logz.io token used to connect DeepCrawl to your Logz.io account.
The date range from which to collect logs.
Desktop user agent regex:
The user agent to use when retrieving desktop bot requests. Regex can be used.
Maximum number of URLs to fetch:
The maximum number of URLs to fetch from Logz.io.
Mobile agent regex:
The user agent to use when retrieving mobile bot requests. Regex can be used.
URL field name:
The name of the URL field in Logz.io, e.g. URI.
User agent field name:
The name of the user agent field in Logz.io, e.g. Agent.
Query filter (JSON): (Optional)
An optional filter that restricts the URLs to specific subdomains.
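Before saving custom values, it can help to check them locally. The sketch below tests a desktop and a mobile user agent pattern against Googlebot's published user agent strings and checks that a filter is valid JSON. The patterns are hypothetical examples, not the pre-filled defaults, and Python's regex dialect may differ from the one the query builder uses. The filter's structure and the `host` field name are also placeholders only, since the format DeepCrawl expects is not described here.

```python
import json
import re

# Hypothetical patterns for illustration; not DeepCrawl's pre-filled defaults.
DESKTOP_REGEX = r"^Mozilla/5\.0 \(compatible; Googlebot/"
MOBILE_REGEX = r"Android.+Googlebot/"

# Googlebot user agent strings as published by Google (the Chrome version varies).
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Placeholder filter: the structure and the `host` field name are guesses.
EXAMPLE_FILTER = '{"term": {"host": "www.example.com"}}'


def matching_devices(user_agent):
    """Return the device labels whose pattern matches the user agent.

    Exactly one label per bot user agent means the two patterns do not
    overlap, so a hit would not be counted as both desktop and mobile.
    """
    patterns = {"desktop": DESKTOP_REGEX, "mobile": MOBILE_REGEX}
    return [
        device
        for device, pattern in patterns.items()
        if re.search(pattern, user_agent)
    ]


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


desktop_match = matching_devices(DESKTOP_UA)
mobile_match = matching_devices(MOBILE_UA)
filter_ok = is_valid_json(EXAMPLE_FILTER)
```

A looser desktop pattern such as `Googlebot` alone would also match the mobile user agent, which is why the check looks at both patterns together.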
Once you have confirmed these values with the developers, you can click the Save button and then finish the rest of the project setup screen and start a crawl.
How do I view Logz.io data when the crawl has finished?
Once your crawl has finished, navigate to that crawl in the DeepCrawl app. In the sidebar you will notice a category named Log Files. This category contains many different reports relating to log file analysis.
These reports will give you deeper insights into how your website is performing by utilising your log file summary data. Here are some examples of the insights they provide:
No Bot Hits, High Bot Hits, Medium Bot Hits & Low Bot Hits
Knowing where search engines focus their time crawling your site, allowing you to optimize your crawl budget (you can even break bot hits down by device)
Error Pages with Bot Hits
Identifying where crawl budget is being wasted crawling pages returning 4xx and 5xx status codes
Non-Indexable Pages with Bot Hits
Uncovering which of your non-indexable pages are being crawled by search engine bots
Pages without Bot Hits in Sitemaps
Flagging URLs which are included in your sitemap but aren’t being crawled by search engine bots
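To make the logic behind three of these reports concrete, here is a minimal sketch that combines crawl data with bot hit counts. It is an illustration only, not how DeepCrawl builds its reports: the `log_file_reports` function, the `status`, `indexable` and `in_sitemap` keys and the sample data are all hypothetical. The High, Medium and Low Bot Hits reports are left out because their thresholds are not stated here.

```python
def log_file_reports(pages, bot_hits):
    """Group crawled URLs into three log file reports.

    pages:    dict of url -> {"status": int, "indexable": bool, "in_sitemap": bool}
    bot_hits: dict of url -> total number of bot hits (missing means zero)
    Both shapes are assumptions made for this example.
    """
    reports = {
        "Error Pages with Bot Hits": [],
        "Non-Indexable Pages with Bot Hits": [],
        "Pages without Bot Hits in Sitemaps": [],
    }
    for url, page in pages.items():
        hits = bot_hits.get(url, 0)
        is_error = 400 <= page["status"] < 600
        if hits > 0 and is_error:
            reports["Error Pages with Bot Hits"].append(url)
        # Assumption: error pages are reported separately, so they are
        # excluded from the non-indexable report in this sketch.
        if hits > 0 and not is_error and not page["indexable"]:
            reports["Non-Indexable Pages with Bot Hits"].append(url)
        if hits == 0 and page["in_sitemap"]:
            reports["Pages without Bot Hits in Sitemaps"].append(url)
    return reports


pages = {
    "/": {"status": 200, "indexable": True, "in_sitemap": True},
    "/old": {"status": 404, "indexable": False, "in_sitemap": False},
    "/tag/x": {"status": 200, "indexable": False, "in_sitemap": False},
    "/new": {"status": 200, "indexable": True, "in_sitemap": True},
}
bot_hits = {"/": 120, "/old": 7, "/tag/x": 15}

reports = log_file_reports(pages, bot_hits)
```

In this sample, `/old` is crawl budget spent on an error page, `/tag/x` is a non-indexable page that bots still visit, and `/new` is in the sitemap but has not been crawled by bots.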
If you’ve got any questions about our exciting new Logz.io integration we’d be happy to answer them. Just shoot us over a message or get in touch with your account manager.