DeepCrawl Integration: Google Search Console

Adam Gent

On 9th July 2019 • 14 min read

The Performance report in Search Console helps businesses track the organic search traffic and search appearance of a website or app. At DeepCrawl, we allow users to integrate Google Search Console analytics with crawl data to identify issues with their website or app.
 

What is Google Search Console?

Google Search Console is a free web service from Google that helps webmasters troubleshoot problems with their websites. It also allows SEOs and site owners to:

Google does have a public API for Search Console. At the moment DeepCrawl only uses organic search analytics data from the API in our integration.
 

Why integrate Google Search Console with DeepCrawl?

Integrating Google Search Console Performance data with DeepCrawl allows users to further enrich their reports. DeepCrawl users will be able to:

To get these reports, Google Search Console needs to be added as a crawl source within a user's DeepCrawl project.
 

Setting up a Google Account in DeepCrawl

There are two ways a user can connect a Google Account in DeepCrawl: through the Connected Apps page, or through the DeepCrawl property settings in the Sources step of a Project.

Connected Apps method

1. Navigate to the Connected Apps page.

2. Click on the “ADD GOOGLE ACCOUNT” button.

3. Log in to the Google Account which has the Google Search Console profiles you want to include in crawls.

4. In the Connected Apps page, users can manage what Google accounts are connected in DeepCrawl.

5. Once a Google account is connected, navigate to the Sources settings in the project you want to include Google Search Console data in and select the Google Search Console source (a green tick appears when selected).

6. Select the Google Search Console view you want to use in project crawls. The selected view will then appear on the right of the list.

7. That’s it. DeepCrawl will now fetch URLs found in Google Search Console during the crawl.

DeepCrawl property settings

1. In the Sources settings in a Project, scroll down to the Google Search Console source.

2. Select the Google Search Console source (a green tick appears when selected).

3. Click on the “ADD GOOGLE ACCOUNT” button.

4. Log in to the Google Account which has the Google Search Console properties you want to include in crawls.

5. Finally, select the Google Search Console property you want to use in project crawls. The selected property will then appear on the right.

6. That’s it. DeepCrawl will now fetch URLs from the chosen property in Google Search Console during the crawl.
 

Configure Google Search Console Filters

Once Google Search Console is integrated, DeepCrawl allows users to filter the requested landing page URLs from the Performance data.

To configure the Performance data filter before DeepCrawl fetches the data, click on “Configure Google Search Console” under Google Search Console in the Sources settings (step 2).

Clicking on the configure filter options will open up the settings.

The following filters can be used to alter the URLs which are extracted by DeepCrawl:

1. Country: Group data by a specific country.

2. Search type: Group data by type of Google search (web, image, video).

3. Date range: Select a date range (10, 15, 30, 60 and 100 days).

4. Include queries: Include URL data only for pages that appear for the specified keywords in Google Search.

5. Exclude queries: Exclude URL data for pages that appear for the specified keywords in Google Search.

6. Clicks: Only include pages that received at least a minimum number of clicks in Search.

For a breakdown of these filters and their dimensions please read the following section in the Google Search Console guide here.

By default the date range for Google Search Console data is 100 days.
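As a rough illustration, the filters above could map onto a Search Console Search Analytics query body as sketched below. This is not DeepCrawl's actual implementation. The function names `build_query_body` and `apply_min_clicks` are hypothetical, the way the date window is counted is an assumption, and the clicks filter is applied after fetching because this sketch assumes the API filters on dimensions only.

```python
from datetime import date, timedelta


def build_query_body(today, days=100, country=None, search_type="web",
                     include_query=None, exclude_query=None):
    """Build a Search Analytics request body grouped by page (hypothetical helper)."""
    filters = []
    if country:
        # Country is expected as a three-letter ISO 3166-1 alpha-3 code, e.g. "gbr".
        filters.append({"dimension": "country", "operator": "equals",
                        "expression": country})
    if include_query:
        filters.append({"dimension": "query", "operator": "contains",
                        "expression": include_query})
    if exclude_query:
        filters.append({"dimension": "query", "operator": "notContains",
                        "expression": exclude_query})

    body = {
        # Assumption: the range is counted back from "today" by `days` days.
        "startDate": (today - timedelta(days=days)).isoformat(),
        "endDate": today.isoformat(),
        "dimensions": ["page"],
        "type": search_type,  # "web", "image" or "video"
    }
    if filters:
        body["dimensionFilterGroups"] = [{"groupType": "and", "filters": filters}]
    return body


def apply_min_clicks(rows, min_clicks):
    """Keep only rows whose click count reaches the minimum (applied client-side)."""
    return [row for row in rows if row.get("clicks", 0) >= min_clicks]
```

For example, `build_query_body(date(2019, 7, 9), days=30, country="gbr", include_query="shoes")` produces a body covering 2019-06-09 to 2019-07-09 with two dimension filters.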
 

Google Search Console metrics extracted

The following metrics are extracted from Google Search Console:

These metrics are all pulled for desktop and mobile devices (desktop clicks vs mobile clicks). This allows DeepCrawl to identify problems between mobile and desktop URLs in Google SERPs.
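To make the desktop versus mobile comparison concrete, here is a minimal sketch that splits clicks by device for each page. It assumes rows shaped like Search Analytics API results grouped by the `page` and `device` dimensions. The helper names and the "desktop clicks but no mobile clicks" check are illustrative assumptions, not a description of DeepCrawl's reports.

```python
from collections import defaultdict


def clicks_by_device(rows):
    """Sum clicks per page for desktop and mobile.

    Each row is assumed to look like {"keys": [page, device], "clicks": n}.
    """
    per_page = defaultdict(lambda: {"DESKTOP": 0, "MOBILE": 0})
    for row in rows:
        page, device = row["keys"]
        device = device.upper()  # normalise casing defensively
        if device in ("DESKTOP", "MOBILE"):
            per_page[page][device] += row.get("clicks", 0)
    return dict(per_page)


def desktop_only_pages(per_page):
    """Return pages that received desktop clicks but no mobile clicks."""
    return sorted(page for page, clicks in per_page.items()
                  if clicks["DESKTOP"] > 0 and clicks["MOBILE"] == 0)
```

A page that shows up in `desktop_only_pages` is only a prompt for investigation; low mobile traffic can have many causes.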
 

Issue Reports in DeepCrawl

When Google Search Console is integrated with DeepCrawl, our software can identify issues with pages. The following reports provide information on the issues which are likely to have a significant negative effect on your search rankings:

 

Config and Information Reports in DeepCrawl

As well as providing issue reports, DeepCrawl also provides reports which help users better understand their website:

 

Frequently Asked Questions

How can the Google Search Console data be extracted?

The Google Search Console data can be extracted by visiting the Pages in Search Console report.

All the Google Search Console analytics data crawled by DeepCrawl can be exported to a CSV with metadata and other on-page SEO signals.

To visit the report in DeepCrawl, simply enter "Pages in Search Console" in the top left search bar.

Pages in Search Console

How can the Google Search Console data be viewed in DeepCrawl?

The Google Search Console data can be viewed either in the Pages in Search Console report or by visiting the page report in DeepCrawl.

To visit the single page report, click on any URL in a report to be taken to its page report.

How does Google Search Console work with DeepCrawl?

The integration works as follows:

1. A client connects to their Google account in DeepCrawl.

2. A user then chooses the Google Search Console profile they want to use in a crawl.

3. DeepCrawl sends a request to the Google Search Console API.

4. Google Search Console API accepts the DeepCrawl request.

5. When fetching URL data, DeepCrawl uses the protocol and domain of the selected Google Search Console property, unlike the Google Analytics integration, which uses the DeepCrawl primary domain.

6. DeepCrawl crawls URLs found in the Google Search Console data.

7. DeepCrawl saves certain metrics for each URL found in Google Search Console (see below).

8. All Google Search Console data is saved, pulled into a client's crawl and reported on.

Unless filtered, DeepCrawl will pull in all available URL data from the Search Console API. It is not limited by the 1,000-row limit in the Google Search Console UI.
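The point about going beyond the UI row limit can be sketched as simple pagination over the Search Analytics API, which accepts `rowLimit` (up to 25,000 per request) and `startRow` in the query body. The `run_query` callable below is a hypothetical stand-in for whatever sends the request and returns the parsed JSON response; this is an illustration under those assumptions, not DeepCrawl's code.

```python
def fetch_all_rows(run_query, base_body, page_size=25000):
    """Collect every row by paging through results with startRow/rowLimit.

    run_query: hypothetical callable taking a request body (dict) and
    returning the API response as a dict, with rows under the "rows" key.
    """
    rows = []
    start_row = 0
    while True:
        body = dict(base_body, rowLimit=page_size, startRow=start_row)
        batch = run_query(body).get("rows", [])
        rows.extend(batch)
        # A short (or empty) page means there is nothing left to fetch.
        if len(batch) < page_size:
            break
        start_row += page_size
    return rows
```

Passing the request function in as an argument keeps the paging logic independent of any particular HTTP or Google client library.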

How does DeepCrawl choose the protocol and hostname for the URLs in Google Search Console?

DeepCrawl uses the domain of the property selected in Google Search Console. For example, if you select https://example.com/ as a source in the Google Search Console settings in step 2, then DeepCrawl will use https://example.com/ as the primary domain when fetching URL data from the API.

Why isn’t Google Search Console data matching with my crawl data?

The most common reason Google Search Console can’t be compared with crawl data is that the domain property setting in GSC does not match up with the primary domain or secondary domain in DeepCrawl.

For example:

Our team recommends carefully checking the domains selected before running a crawl, because you’ll have to run another crawl to get the data.
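A quick pre-crawl check along these lines can catch a mismatch early. The sketch below compares the protocol and hostname of a Search Console URL-prefix property against a project's primary and secondary domains. Treating "match" as an exact scheme and host comparison is an assumption made for illustration, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return the (scheme, hostname) pair of a URL, lowercased."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), (parts.hostname or "").lower())


def property_matches_project(property_url, primary_domain, secondary_domains=()):
    """True if the property shares a protocol and hostname with any project domain."""
    target = origin(property_url)
    return any(origin(domain) == target
               for domain in (primary_domain, *secondary_domains))
```

Under this assumption, `property_matches_project("https://example.com/", "https://www.example.com/")` is false because the hostnames differ, while adding https://example.com/ as a secondary domain makes it true.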

Does DeepCrawl fetch URL data for a domain property which is not the primary domain?

Yes, DeepCrawl will fetch, parse and process URLs from selected domain properties found in the Google Search Console source settings.

For example if the following settings are used:

Then DeepCrawl will fetch https://example.com/ and https://marketing.example.com/ URLs from the Google Search Console API. The https://marketing.example.com/ URLs can be seen in the Pages in Search Console report. However, unless the subdomain is found in a web crawl, the data won’t appear in certain reports and may show false positives (orphaned pages etc.).

Why is the Google Search Console property I want to select not showing up?

This is usually due to the Google Account which has been connected. If the Google Search Console property setting has not been set up in the connected Google account then it will not appear in the list.

Does DeepCrawl show the keywords that drove impressions/clicks in any reports?

No, our reports do not display keywords, only pages. Also, DeepCrawl groups data by the page dimension, not the query dimension, when fetching data from the Search Console API.

For more information around the page dimension please read the following official Google help documentation.

The pages fetched from the API can still be filtered based on specific keywords such as brand or non-brand keywords. A user can do this in the project setup under the “Configure Google Search Console” settings.

Does DeepCrawl support the new Search Console Domain properties verification?

Google recently updated the property types within Google Search Console and introduced ‘domain properties’, which include all subdomains and protocols of one domain. Domain properties allow users to verify and collect data for a whole domain in one report, and they use the DNS method of verification.

At the current time, DeepCrawl only supports URL prefix properties, which are verified using historic methods such as the HTML tag, HTML file upload, Google Analytics and Google Tag Manager verification methods.
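For readers working with the Search Console API directly, the two property types can be told apart by their identifier: Domain properties are listed with an `sc-domain:` prefix (for example `sc-domain:example.com`), while URL prefix properties are plain URLs. The sketch below separates the two. The function names are hypothetical, and the filtering simply mirrors the support statement above rather than DeepCrawl's internal logic.

```python
def property_type(site_url):
    """Classify a Search Console property identifier."""
    if site_url.startswith("sc-domain:"):
        return "domain"
    if site_url.startswith(("http://", "https://")):
        return "url-prefix"
    return "unknown"


def url_prefix_properties(site_urls):
    """Keep only URL prefix properties, the type described as supported above."""
    return [url for url in site_urls if property_type(url) == "url-prefix"]
```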

Does DeepCrawl connect to Bing, Yandex, or Baidu Search Console/Webmaster tools?

No, we only support Google Search Console at this time.

Is it possible to add multiple Google Accounts to DeepCrawl?

Yes. DeepCrawl allows multiple Google Accounts to be added. Also, any profiles in Google Search Console will be consolidated into one list in the Sources settings.

For example, all the Google Search Console profiles under seo@gmail.com and marketing@gmail.com would be consolidated into one list in the Sources settings in a project setup.
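Conceptually, the consolidation described above amounts to merging the property lists of every connected account into one. The sketch below shows one way that could look. The function name is hypothetical, and de-duplicating a property that is visible to several accounts is an assumption rather than documented DeepCrawl behaviour.

```python
def consolidate_properties(accounts):
    """Merge per-account property lists into one sorted list.

    accounts: mapping of account email -> list of property URLs.
    Returns (property, [emails with access]) pairs sorted by property.
    """
    merged = {}
    for email, properties in accounts.items():
        for prop in properties:
            merged.setdefault(prop, []).append(email)
    return sorted(merged.items())
```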

Any questions about Google Search Console and DeepCrawl?

If you have any further questions about Google Search Console and DeepCrawl don’t hesitate to get in touch.

Author

Adam Gent

Search Engine Optimisation (SEO) professional with over 8 years’ experience in the search marketing industry. I have worked with a range of client campaigns over the years, from small and medium-sized enterprises to FTSE 100 global high-street brands.
