BigQuery Connector

Alec Bertram

On 8th December 2020 • 14 min read

Being able to measure and analyze the impact of technical SEO improvements against wider business metrics has never been so important.

SEOs today need to prove their work leads to greater revenue and business improvement in order to fully understand the value of their work and to build a stronger business case for increasing budgets, acquiring new tools, and having a bigger stake in decision-making.

DeepCrawl has now added a BigQuery connector so you can bring your crawl data and all your other data sources together to do just this.
 

Why use the BigQuery connector?

Our BigQuery connector will allow you to easily access all your crawl data and integrate it with many tools that you use in your day-to-day work.

Historically, the only way for users to get all URL data about a crawl was to generate a CSV, download it, decompress it, and parse it. As our CSVs are not made for programmatic use, any update we do to the CSV data or format tends to break user-built integrations. 

BigQuery availability gives users access to all of their data through a powerful SQL querying interface. Users who can install the BigQuery ODBC driver in their environment can query BigQuery as if it were any other SQL database. Developers can also use the Google SDKs to quickly test and deploy apps without writing much code.
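As a sketch of the SDK route, the following Python snippet builds and runs a simple aggregation against a crawl table. The billing project, dataset, table, and column names (including http_status_code) are placeholders for illustration, not guaranteed field names:

```python
def count_by_status_query(table: str) -> str:
    """Build a query counting crawled URLs by HTTP status code."""
    return (
        "SELECT http_status_code, COUNT(*) AS urls "
        f"FROM `{table}` "
        "GROUP BY http_status_code ORDER BY urls DESC"
    )

def main() -> None:
    # Requires `pip install google-cloud-bigquery` and Google Cloud credentials.
    from google.cloud import bigquery

    # Queries run (and are billed) in your Billing Project, while the data
    # itself lives in the shared deepcrawl-bi project.
    client = bigquery.Client(project="my-billing-project")  # placeholder
    table = "deepcrawl-bi.project_123456.crawl_urls_20201225_654321"
    for row in client.query(count_by_status_query(table)).result():
        print(row.http_status_code, row.urls)

# main()  # uncomment to run against your own Billing Project
```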

Many businesses will not load their commercially sensitive data (e.g. traffic or sales) into DeepCrawl. If they hold that business data in BigQuery (or any SQL database), they can merge it with their DeepCrawl data in BigQuery and run their analysis there.
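A hedged sketch of what such a merge might look like: the query below joins a crawl table with a hypothetical business revenue table. The table names and the landing_page and revenue columns are illustrative, not part of any real schema:

```python
def revenue_join_query(crawl_table: str, revenue_table: str) -> str:
    """Join crawl URLs with a business revenue table on URL.

    All table and column names here are placeholders -- adapt them
    to your own schema.
    """
    return f"""
        SELECT c.url, r.revenue
        FROM `{crawl_table}` AS c
        JOIN `{revenue_table}` AS r
          ON r.landing_page = c.url
        ORDER BY r.revenue DESC
    """
```

The resulting string can be passed to any BigQuery client (SDK, ODBC, or the web console).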

When BigQuery is enabled for a project, we will automatically import all new crawls to a dataset that is shared with you.

Data format

Each DeepCrawl project is a BigQuery dataset that uses the project's ID as its name (such as project_123456). Each crawl is a table inside this dataset, named crawl_urls_[DATE]_[CRAWL ID] – for example, 'crawl_urls_20201225_654321'.
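The naming convention can be reproduced programmatically. A minimal sketch, assuming the date is formatted as YYYYMMDD:

```python
import datetime

def dataset_name(project_id: int) -> str:
    """Build the dataset name for a DeepCrawl project ID."""
    return f"project_{project_id}"

def crawl_table_name(crawl_date: datetime.date, crawl_id: int) -> str:
    """Build a crawl table name following the crawl_urls_[DATE]_[CRAWL ID] pattern."""
    return f"crawl_urls_{crawl_date:%Y%m%d}_{crawl_id}"
```

For example, `crawl_table_name(datetime.date(2020, 12, 25), 654321)` yields `"crawl_urls_20201225_654321"`.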

The crawl data itself contains all of the data that we know about each URL in the crawl. 
 

DeepCrawl SQL Cheat Sheet

We’ve structured our data in a way that will allow you to get all of the insights that you need. 

Rather than manually selecting a crawl, you can use the below queries to logically choose the table you want to work with (such as “latest crawl”, “all crawls from the last 30 days”). The connector comes with some default data views which internally use these queries, so in some cases, it is easier to use those.
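As one illustration of the "latest crawl" idea, BigQuery wildcard tables can be used: because each table suffix starts with the crawl date, the lexicographic maximum of _TABLE_SUFFIX is the most recent crawl. This is a sketch, not the connector's own view definition, and the dataset name is a placeholder:

```python
def latest_crawl_query(dataset: str) -> str:
    """Select all rows from the most recent crawl table in a dataset.

    Relies on the date-prefixed table suffix sorting chronologically.
    """
    return f"""
        SELECT *
        FROM `{dataset}.crawl_urls_*`
        WHERE _TABLE_SUFFIX = (
            SELECT MAX(_TABLE_SUFFIX)
            FROM `{dataset}.crawl_urls_*`
        )
    """
```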

View the SQL Cheat Sheet
 

Data Projects & Billing Projects 

Unless you request otherwise, your data will be saved into the BigQuery project “deepcrawl-bi” and shared with you.

While you will have permission to access this data, the deepcrawl-bi project does not come with any querying capability. Querying capability comes from a separate Billing Project that will be provided to you (this is specific to your organization, so you should not share it). Therefore, when using data in the deepcrawl-bi project with any tool, you must specify this separate Billing Project.

If you require DeepCrawl data to be saved to your own BigQuery project or within your own GVPC, please discuss this with your CSM. 
 

Permissions

Data is shared with you on a project-by-project basis. This means that you may share a project with a single team member and that person will not have access to any of your other BigQuery data.
 

Compatible systems

Below are some of the popular systems that natively integrate with Google BigQuery. Other systems may be compatible with BigQuery (or compatible with our other BI integrations) – please ask your CSM if you do not see your favorite tool.
 

Business Intelligence Tools

Looker

DeepCrawl connects to Looker via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Looker. 

See: Looker docs: BigQuery Integration

Tableau

DeepCrawl connects to Tableau via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Tableau. 

See also: Tableau docs: Google BigQuery

Create a data source

When creating a data source, find Google BigQuery under “Connectors”. After authorizing Tableau to access your BigQuery data, you will be presented with a list of your BigQuery projects – you’ll need to select both the Billing Project and the Project that you were assigned.

The Dataset is your DeepCrawl project, and the number corresponds to the ID of your project in our system. To find your project ID, please see our BigQuery cheat sheet.

After choosing your project, you will be presented with all of the crawls currently in BigQuery. Choosing any of the tables will give you access to the data in that crawl, or you can use one of the preconfigured views (such as latest_crawl) to always have access to the latest data.

PowerBI

DeepCrawl connects to PowerBI via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available to PowerBI. 

See also: Microsoft docs: Connect to a Google BigQuery database in Power BI Desktop

Find Google BigQuery in the Get Data wizard and sign in to your Google account.

Find the project that you wish to import, tick a table, and you’re done.

Google Data Studio

DeepCrawl has two ways to connect to Google Data Studio: 

DeepCrawl connects raw URL-level data to DataStudio via our BigQuery integration. When a crawl finishes, it will be loaded into BigQuery, and data will become available in DataStudio. 

To use BigQuery in Data Studio, you must select the correct billing project as below:

  1. Open the Data Studio BigQuery Connector.
  2. Choose “Shared Projects”.
  3. Select your billing project in the first column.
  4. Type “deepcrawl-bi” in the “Shared project name” input.
  5. Your projects will appear in the Dataset area.


 

Data Hubs and Data Lakes

Google BigQuery

Our integration with Google BigQuery will automatically send your DeepCrawl data to BigQuery when a crawl finishes. 

There is no additional setup to do – your CSM will share your BigQuery project with you, and you can access data in BigQuery or using a Google SDK.

Azure

DeepCrawl data can be loaded into Azure via our Google BigQuery integration. When a crawl finishes, it will immediately be loaded into BigQuery and can be accessed with Azure Data Factory.

Please see Microsoft’s official guide for more information: Azure Docs: Copy data from Google BigQuery by using Azure Data Factory

Tealium

DeepCrawl connects to Tealium via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Tealium. 

Tealium advertises that it can connect to Google BigQuery but does not provide documentation for doing so. You will need to contact your Tealium CSM for information about importing DeepCrawl data into Tealium via BigQuery.
 

Data Science

Jupyter 

DeepCrawl connects to Jupyter via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Jupyter. 

There are many guides available about connecting BigQuery to Jupyter.

Python Pandas

DeepCrawl connects to Pandas via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Pandas. Find out more about the DeepCrawl BigQuery integration. 

See: Downloading BigQuery data to pandas using the BigQuery Storage API
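A small sketch of loading a crawl table into a DataFrame with the BigQuery client library. The billing project, table name, and the url and http_status_code columns are placeholders, not guaranteed field names:

```python
def sample_query(table: str, limit: int = 1000) -> str:
    """Build a simple query selecting illustrative columns from a crawl table."""
    return f"SELECT url, http_status_code FROM `{table}` LIMIT {limit}"

def query_to_dataframe(table: str):
    """Run the query and return a pandas DataFrame (sketch).

    Requires `pip install google-cloud-bigquery pandas` and Google Cloud
    credentials for your Billing Project.
    """
    from google.cloud import bigquery

    client = bigquery.Client(project="my-billing-project")  # placeholder
    return client.query(sample_query(table)).to_dataframe()

# df = query_to_dataframe("deepcrawl-bi.project_123456.crawl_urls_20201225_654321")
```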

Apache Zeppelin

DeepCrawl connects to Zeppelin via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Apache Zeppelin.

See: BigQuery Interpreter for Apache Zeppelin

Dataiku

DeepCrawl connects to Dataiku via our BigQuery integration – when a crawl finishes, it will be loaded into BigQuery, and data will become available in Dataiku.

See also: Dataiku docs: Google BigQuery

You will first need to create a Service Account in your Google Cloud project:

  1. Find the Service Accounts page under the “IAM and Admin” menu in the Google Cloud Console.
  2. Choose “Create Service Account” and give the service account a relevant name, such as “dataiku-[yourname]”.
  3. When granting access to the project, it is recommended that you give the service account only the minimum permissions it needs. For our purposes, “BigQuery User” may be appropriate.
  4. On step 3 of the wizard, you do not need to grant access to further users.
  5. After the service account has been created, find “Create Key” in its actions menu and create a JSON key. A key file will be downloaded to your computer.
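Before pasting the key into Dataiku, it can help to sanity-check the downloaded JSON file. A minimal sketch, assuming the standard service-account key shape (the required-field list mirrors what a BigQuery connection typically asks for):

```python
import json

# Fields a service-account key normally contains and a connection needs.
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")

def check_service_account_key(path: str) -> dict:
    """Load a service-account JSON key and verify its key fields exist."""
    with open(path) as fh:
        key = json.load(fh)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    return key
```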

Dataiku natively supports BigQuery in its paid version. If the native connection does not work, you may need to install the Google BigQuery JDBC driver as described in the Dataiku docs.

Visit the Connections screen in Dataiku and create a new Google BigQuery connection.

Open the Service Account key file that you downloaded in Notepad or another text editor, and use its contents to populate the corresponding connection fields.

You should now be able to test and create the connection, and then create BigQuery datasets.
 

How can I get started?

If you want to get started with DeepCrawl’s BigQuery connector, reach out to your Customer Success Manager today.
 

Summary

As always please share any feedback as this helps to improve DeepCrawl. If you have any questions about this new feature, please feel free to get in touch with our team.

 

Author

Alec Bertram

Alec is the Head of Product at DeepCrawl. He has 10 years of experience in SEO and works to make sure we're building the most valuable things we can for our users.
