How Secure is my data?
All data is stored using Amazon Web Services, which has been architected to be one of the most secure cloud computing environments available. The crawl data is stored in a database on EC2 servers until the crawl is archived or deleted, and the report data and backups are archived in S3. We use a VPN and security groups to prevent unauthorized access to the data.
Can I Run Multiple Crawls at the Same Time?
Yes, you can run up to 10 crawls simultaneously.
Will DeepCrawl Activity affect the stats in my Analytics Package?
Some older analytics packages use log file data stored on the web server. This data can be affected by any crawling activity, whether from Google, Bing or DeepCrawl.
Do I need to add tracking tags or authenticate a site before I can use DeepCrawl?
No, you don’t need to do anything under normal circumstances to crawl a public website. No additional tracking tags or authentication processes are normally required.
It’s important to advise your IT manager, or whoever is responsible for hosting your website, so that your crawl does not get blocked.
If you want to crawl a private website on your own network, e.g. a test website or staging environment, you will need to grant DeepCrawl access by allowlisting our user agent or IP address on your network, along with any basic authentication credentials or DNS configuration that is needed.
How is DeepCrawl Different from Other Services?
DeepCrawl has been designed by experienced SEOs and used extensively in the field to solve real problems.
The level of detail available is more extensive than most other crawlers and the data is presented in a more digestible, actionable format.
Because DeepCrawl runs as a cloud-based service, it can handle much larger crawls than software-based crawlers that run on your local computer. Crawls are also unaffected by the power of your local machine, or by other processes your local machine is running.
There is a very high level of customization and control available for more experienced users, allowing crawls to be tailored to suit a specific project.
What does DeepCrawl do?
DeepCrawl is a cloud-based web crawler that you control.
You can set it to crawl your website, staging environment, external sites, analytics data, backlinks, sitemaps and URL lists, with a host of flexible crawl types.
DeepCrawl helps you analyze your website architecture and understand and monitor technical issues, to improve your SEO performance.
You can use DeepCrawl for:
- Technical Auditing
- Site Redevelopment/Migrations
- Website Change Management
- Link Auditing
- Competitor Intelligence
- Landing Page Analysis
- Website Architecture Optimization
- Website Development Testing
- Competitor Analysis
Pricing & Payments
What are Active Projects?
Active projects are those which have had a crawl run in the current billing period. If you have hit your limit, you will only be able to run crawls on the projects which are active, until the next billing period.
If you need to increase this, you can purchase more Active Projects for your account with add-ons.
The number of inactive projects you can have in your account is unlimited, which means you don’t have to worry about deleting anything.
Is there a limit on the number of websites I can crawl?
We do not limit the number of different domains you can crawl, but we do have a limit on the number of ‘Active’ projects in your account, depending on your package.
Active projects are those which have had a crawl run in the current billing period.
Can I access my reports if I cancel?
Your data will be available until the account expires. To continue using your data, please export it before your account expires.
How do I change my credit or debit card details?
If you are paying via PayPal, log in to your PayPal account, click on MyPayPal, select Wallet, and from there you can choose the card details you wish to amend.
If you are paying directly via credit or debit card, you can contact your bank and amend the card details for your direct debit or standing order from there. Alternatively, you can change your details from within the platform via your Subscription area, or email firstname.lastname@example.org and we’ll do it for you.
How do I cancel my monthly plan?
If you pay via PayPal, the simplest way to cancel your existing subscription is via these instructions in your PayPal account.
If you pay via credit or debit card, most high street banks allow you to cancel direct debits via your online banking. Alternatively, call your bank or drop us an email at email@example.com and we’ll cancel it from our side.
Your remaining credit allocation will be available until the expiry date. Please note that you will only have access to the DeepCrawl interface until this date, so if you wish to continue using your data, please use the export functions before your account expires.
If you have any trouble with this please contact your Account Manager or email our Customer Success team at firstname.lastname@example.org.
I’ve used my monthly allowance. How do I buy extra credits?
Log into your account, go to Subscription and click ‘Buy Credits’ under the Credits section. These add-on credits will last for 30 days from the date of purchase.
How do I reactivate an old account?
Simply log in to your account and go to your Subscription area. Then click on ‘Reactivate’, located next to your latest package icon.
How do I downgrade my monthly plan?
The simplest way to do this is to go to Subscription and click the ‘Downgrade’ button under the Credits section. If you have any trouble with this please contact your Account Manager or email our Customer Success team at email@example.com.
How do I upgrade my monthly plan?
The simplest way to do this is to go to Subscription and click the ‘Upgrade’ button under the Credits section. If you have any trouble with this please contact your Account Manager or email our Customer Success team at firstname.lastname@example.org.
Where can I find my invoices?
All payment related actions can be found under Subscription within the application. To find your invoices, click the ‘Payment Details & Invoices’ button on the subscription screen.
Can I pay via invoice?
Please contact us on email@example.com and specify the package that you are interested in purchasing.
Which currency can I pay in?
You can pay in US Dollars, Euros or British Pounds. Select your preferred currency at the top right hand side of the pricing page, then select the package you want, click buy and follow the steps. You will then be billed in the currency of your choice.
How long are Add-on credits valid for?
The Add-on credits are valid for 1 month from the date of purchase.
Is there a minimum contract term?
Our Starter and Consultant packages include no minimum contract term, so you can pay on a month-by-month basis. However, with our Corporate package there is a minimum commitment of 12 months.
Do I have to buy a monthly plan or can I Pay As You Go?
Our plans are available on a monthly basis. You can also purchase a one-off add-on for your monthly plan in-platform should you run out of credits and need more to complete a project, as well as purchase recurring project and/or URL add-ons.
What technology does DeepCrawl use?
The service is run entirely within the Amazon Web Services cloud computing platform.
Do you have whitelabel options?
Yes, the interface can be white-labeled with your own logo.
Do you have an API?
Yes, the DeepCrawl API is available for all users.
The API key and instructions are available under API Access.
Usage is subject to a fair usage policy, but if you have very specific requirements, feel free to run them by us – firstname.lastname@example.org
You can find our current API documentation here:
What are the limits for custom extractions?
Each project can have up to 30 separate custom extractions, with up to 20 matches and 64KB of data per extraction.
What is the maximum file size for URL lists, XML Sitemaps, analytics data or backlinks data uploads?
We accept file uploads of up to 100MB for your URL lists, XML Sitemaps, analytics data and backlinks data.
How many credits does the average Universal Crawl consume?
This depends entirely on the website being crawled, and whether any crawl limitations or restrictions have been applied in the Advanced Settings.
Can I pause a live crawl?
You can manually pause a crawl at any point during the ‘Crawling’ phase. This can then be resumed at a later time, but will automatically finalize after 72 hours.
A crawl will pause automatically under certain circumstances (if the relevant options have been selected pre-crawl): when it reaches the set limit, or when it runs out of credits before reaching the limit. In either case, the crawl will remain paused for 72 hours before finalizing automatically.
You can also alter the crawl speed, depth and URL limit of the crawl without needing to pause at all.
In the test site section, does the test site authentication work with IIS?
IIS supports basic authentication, which normally works with DeepCrawl.
Configure Basic Authentication (IIS 7)
Other password solutions that rely on cookies may be implemented, but these won’t work with DeepCrawl, as we do not store cookies.
Does DeepCrawl crawl CSS and JavaScript files?
Yes, DeepCrawl detects and crawls CSS and JS files to check their HTTP status, and reports on broken or disallowed files.
You can change this setting in Advanced Settings > Scope > Crawl Restrictions.
Does DeepCrawl crawl and report on PDF documents for download on my site?
PDF documents are detected if they are linked internally and reported in a list.
If you implement the ‘Check Non-HTML File types’ setting in Advanced Settings, DeepCrawl will check the HTTP status of these links.
Does DeepCrawl detect image Alt tags on my site?
DeepCrawl currently looks at the alt text for linked images, which is displayed in the internal linking data reports.
It is possible to use custom extractions in the advanced settings, to identify empty alt tags on unlinked images.
Does DeepCrawl detect H1/H2 etc tags on my site?
DeepCrawl detects and extracts H1, H2 and H3 tags by default and creates reports on multiple H1s and missing H1s.
DeepCrawl does not detect H4, H5 and beyond by default; however, these can be captured using a DeepCrawl custom extraction. Check out our custom extraction guide to find out how to do it.
Can I get DeepCrawl to obey or ignore my robots.txt file when it crawls my site?
DeepCrawl will obey the robots.txt live on your site, based on the user agent you have selected for the crawl.
You can also use the DeepCrawl Robots Overwrite feature to ignore your current robots.txt file during a crawl, and use the alternative version you have specified.
If DeepCrawl is specifically disallowed in a robots.txt file then we will always respect this (a stealth crawl may allow you to run a successful crawl of the site in this case).
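As a rough illustration of how user-agent-based robots.txt rules work, Python's standard `urllib.robotparser` applies the same logic a crawler follows. The robots.txt content and URLs below are made-up examples, not DeepCrawl's actual parser:

```python
from urllib import robotparser

# A hypothetical robots.txt that disallows one named crawler entirely,
# and blocks everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: deepcrawl
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named crawler is blocked everywhere...
print(parser.can_fetch("deepcrawl", "https://example.com/page"))       # False
# ...while other user agents are only blocked from /private/.
print(parser.can_fetch("Googlebot", "https://example.com/page"))       # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

A crawl obeying robots.txt simply skips any URL for which this check fails; the Robots Overwrite feature amounts to parsing your alternative file instead of the live one.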
Will DeepCrawl slow down my site when it’s crawling?
Most sites never experience a slowdown whilst being crawled by DeepCrawl.
A slowdown can occur if the server does not have the capacity to handle user demand, or if user demand increases while DeepCrawl is running at the same time.
If this is the case, you can cap the maximum speed of the crawler to prevent any impact on site performance. You can also optimize your crawl activity further by increasing your crawl rate during known quiet periods, e.g. 1am-5am.
This can all be set with the Crawl Rate restriction settings.
Can I set my crawl to run at certain times or automatically?
With DeepCrawl, you can set your crawl to run at certain times, at certain speeds (URLs per second) and even set up schedules for your crawls e.g. weekly, daily, constant (24 hours) & more.
This can all be set under phase 3 of crawl setup > Crawl Rate Restrictions.
For example, you may want to only run your crawls within a 1am – 5am time window.
So you would restrict the crawl to 0 URLs per second from 5am until 1am. This ensures no URLs are crawled outside the 1am-5am slot, for example to avoid your peak traffic hours.
To do this, select ‘Add Restriction’, choose 5am to 1am, and set a crawl rate of 0.
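The scheduling logic above can be sketched as a simple lookup: the effective rate at any hour is the restricted rate if the hour falls inside a window, otherwise the default. The tuple format here is an illustrative assumption, not DeepCrawl's internal representation:

```python
def rate_at(hour, default_rate, restrictions):
    """Return the crawl rate (URLs/second) in effect at a given hour (0-23).

    restrictions: list of (start_hour, end_hour, rate) tuples; a window
    that wraps past midnight (e.g. 5 -> 1) covers 5am through 1am.
    """
    for start, end, rate in restrictions:
        if start <= end:
            in_window = start <= hour < end
        else:  # window wraps past midnight
            in_window = hour >= start or hour < end
        if in_window:
            return rate
    return default_rate

# Restrict crawling to 0 URLs/s from 5am until 1am, as in the example above,
# leaving the 1am-5am slot at the default rate of 3 URLs/s.
restrictions = [(5, 1, 0)]
print(rate_at(3, 3, restrictions))   # 3  (inside the 1am-5am crawl window)
print(rate_at(14, 3, restrictions))  # 0  (restricted)
```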
When I use a mobile bot to crawl my website, what changes from a normal Googlebot crawl?
How can I tell if DeepCrawl is crawling my site?
DeepCrawl will always identify itself by including ‘DeepCrawl’ within the user agent string.
See below for a comprehensive list of user agent strings.
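Because the identifier always appears in the user agent string, checking a request is a one-liner. A minimal sketch (the helper name is illustrative, not an official tool):

```python
def is_deepcrawl(user_agent: str) -> bool:
    """DeepCrawl identifies itself in every user agent string it sends."""
    return "deepcrawl" in user_agent.lower()

print(is_deepcrawl(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) "
    "https://deepcrawl.com/bot"
))  # True
print(is_deepcrawl(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # False
```

Grepping your access logs for the same substring will show DeepCrawl's requests even when it crawls with a Googlebot-style user agent.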
What IP address will DeepCrawl requests come from?
By default, requests from the DeepCrawl crawler come from the IP address 18.104.22.168
What user agent does DeepCrawl use to crawl?
DeepCrawl offers a wide range of user agents to use for a crawl including the most common search engines, desktop browsers and mobile devices.
You can also add your own custom user agents.
By default, we crawl as Googlebot, and can be identified by the following string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) https://deepcrawl.com/bot
Here’s a comprehensive list covering available User Agents and their full strings:
Applebot: [“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1) https://deepcrawl.com/bot”]
Baidu: [“Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) https://deepcrawl.com/bot”]
Bingbot: [“Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) https://deepcrawl.com/bot”]
Bingbot Mobile: [“Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b https://deepcrawl.com/bot”]
Chrome: [“Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24 https://deepcrawl.com/bot”]
Chrome Mobile: [“Mozilla/5.0 (Linux; Android 7.0; SM-G892A Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Mobile Safari/537.36 https://deepcrawl.com/bot”]
DeepCrawl: [“deepcrawl https://deepcrawl.com/bot”]
Facebook: [“facebookexternalhit/1.1 https://deepcrawl.com/bot”]
Firefox: [“Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:22.214.171.124) Gecko/20091221 Firefox/3.5.7 https://deepcrawl.com/bot”]
Google Web Preview: [“Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13 https://deepcrawl.com/bot”]
Googlebot: [“Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) https://deepcrawl.com/bot”]
Googlebot (legacy): [“Mozilla/5.0 (compatible; Googlebot/2.1; https://deepcrawl.com/bot)”]
Googlebot Smartphone: [“Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) https://deepcrawl.com/bot”]
Googlebot-Image: [“Googlebot-Image/1.0 https://deepcrawl.com/bot”]
Googlebot-Mobile Feature phone: [“SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/126.96.36.199.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html) https://deepcrawl.com/bot”]
Googlebot-News: [“Googlebot-News https://deepcrawl.com/bot”]
Googlebot-Video: [“Googlebot-Video/1.0 https://deepcrawl.com/bot”]
Internet Explorer 6: [“Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) https://deepcrawl.com/bot”]
Internet Explorer 8: [“Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0) https://deepcrawl.com/bot”]
iPhone: [“Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3 https://deepcrawl.com/bot”]
iPhone X: [“Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1 https://deepcrawl.com/bot”]
Yandex: [“Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) https://deepcrawl.com/bot”]
How does DeepCrawl handle international character encoding?
DeepCrawl can correctly handle characters in any language, including the content length calculations. URLs with non-Latin characters will be displayed in an unencoded format in the interface and downloads.
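For illustration, here is how a percent-encoded non-Latin URL maps to the unencoded form shown in the interface, using Python's standard library (the path is a made-up example):

```python
from urllib.parse import quote, unquote

decoded = "/suche/bücher"           # how the URL is displayed in the interface
encoded = quote(decoded)            # how it travels over the wire (UTF-8, percent-encoded)
print(encoded)                      # /suche/b%C3%BCcher
print(unquote(encoded) == decoded)  # True: decoding recovers the original
```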
Why does my 5xx report show 503 errors that aren’t visible when I check them manually?
This can sometimes be caused by the default crawl speed of 3 URLs per second being too fast for your site’s servers: pages are recorded by DeepCrawl as 503 errors, but render fine for the average user. In your crawl settings, you can reduce the maximum crawl speed to 1 URL per second to reduce the chance of 503 errors being reported. The crawl will take longer to complete, but your 5xx reports are less likely to contain these 503s.
What is the date range of the information shown in Google Analytics?
How is the Duplicate Body Content report different to the Duplicate Pages report?
The Duplicate Body Content report shows URLs where DeepCrawl has looked at the text in the body of the page only, whereas results in the Duplicate Pages report come from DeepCrawl analyzing the full page, including HTML tags. The Duplicate Body Content report can therefore pick up pages which are similar but have different templates.
Why can I see pages in the Duplicate Pages report that aren’t duplicates?
Duplication is a subjective measure. We have tuned our algorithm to pick up very similar pages as well as identical pages because most people want to see these. Sometimes it picks up false positives which can be ignored. The duplication sensitivity settings can be adjusted in the Report Settings if you want to remove some of the similar pages.
How does DeepCrawl detect duplicate pages?
An exact duplicate page is the easiest to detect, but isn’t very useful, as it misses a lot of ‘similar’ pages.
The DeepCrawl algorithm is tuned to allow a small amount of variation. The algorithm finds pages that are almost identical. We ignore very small differences, because web pages often contain small pieces of dynamic content, such as dates.
We classify duplication within our algorithm as:
Duplicate Pages
- Identical Title
- Close to identical Body Content
Duplicate Body Content
- Close to identical Body Content
We report the most authoritative page (based on its DeepRank score) as a Primary Duplicate and list it under the Primary Pages section. The page(s) that DeepCrawl considers to be nearly identical (based on the above criteria) and hold less authority will be listed as the duplicates.
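To make "close to identical" concrete, here is a toy near-duplicate check using a similarity ratio from Python's standard library. This is a sketch of the idea only, not DeepCrawl's actual algorithm, and the threshold is an arbitrary example:

```python
from difflib import SequenceMatcher

def near_duplicates(body_a: str, body_b: str, threshold: float = 0.9) -> bool:
    """Treat two body texts as duplicates when their similarity ratio exceeds
    a threshold, so tiny dynamic fragments (dates, counters) don't break the
    match. Illustrative only -- not DeepCrawl's real duplication algorithm."""
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold

a = "Welcome to our shop. Free delivery on all orders. Updated 2024-01-01."
b = "Welcome to our shop. Free delivery on all orders. Updated 2024-02-15."
print(near_duplicates(a, b))  # True: only the date fragment differs
print(near_duplicates(a, "Completely different page content."))  # False
```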
In addition, the following page types get excluded from Duplicate reports:
Very occasionally there are false positives, but in the majority of cases the algorithm correctly identifies duplicate pages. DeepCrawl is constantly being fine-tuned, so please let us know if you experience a false positive and send us an example to email@example.com.
If you’d like to find out more about how to identify and handle duplicate pages, read our blog post on how URL duplication could be harming your website and how to stop it.
What is DeepRank?
DeepRank is a measurement of internal link weight calculated in a similar way to Google’s basic PageRank algorithm. DeepCrawl stores every internal link and starts by giving each link the same value. It then iterates through all the found links a number of times, to calculate the DeepRank for each page, which is the sum of all link values pointing to the page. With each iteration the values move towards their final value.
It is a signal of authority, and can help to indicate the most important URLs in the current report, or within the entire crawl.
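The iteration described above can be sketched as a simplified PageRank-style computation. The damping factor and iteration count are illustrative assumptions; this is the general idea, not DeepCrawl's exact formula:

```python
def deeprank(links, iterations=20, damping=0.85):
    """Iterative link-weight scoring over an internal link graph.

    links: dict mapping each page to the list of pages it links to.
    Every link starts with equal value; repeated iterations move each
    page's score towards its final value.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}  # equal starting value
    for _ in range(iterations):
        incoming = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:  # each page splits its weight evenly across its links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    incoming[t] += share
        rank = incoming
    return rank

site = {"home": ["about", "products"], "about": ["home"], "products": ["home"]}
scores = deeprank(site)
# "home" receives links from every other page, so it scores highest.
print(max(scores, key=scores.get))  # home
```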
How are issues and changes prioritized?
Every report is assigned a weight, to represent the importance of the issue and its potential impact. Reports are also given a sign: positive, negative, or neutral. The list of issues is filtered to negative reports, and ordered by the number of items in the report multiplied by the weight, which is why the issues are rarely displayed in numerical order. The changes are ordered by the number of added or removed issues in the report, multiplied by their weight.
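A minimal sketch of that ordering: filter to negative reports, then sort by item count multiplied by weight. The report names, signs and weights below are illustrative examples, not DeepCrawl's real values:

```python
reports = [
    {"name": "Broken Pages",    "sign": "negative", "weight": 3.0, "items": 40},
    {"name": "Missing Titles",  "sign": "negative", "weight": 1.5, "items": 200},
    {"name": "Indexable Pages", "sign": "positive", "weight": 1.0, "items": 900},
]

# Keep only negative reports, ranked by items x weight (highest first).
issues = sorted(
    (r for r in reports if r["sign"] == "negative"),
    key=lambda r: r["items"] * r["weight"],
    reverse=True,
)
print([r["name"] for r in issues])  # ['Missing Titles', 'Broken Pages']
```

Note that Missing Titles (200 × 1.5 = 300) outranks Broken Pages (40 × 3.0 = 120) despite its lower weight, which is why issues rarely appear in simple numerical order.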
How do you report changes in report contents?
In addition to calculating the URLs which are relevant to a report, we also calculate the changes in URLs between crawls. If a URL appears in a report and wasn’t in that report in the previous crawl, it will be included in the ‘Added report’. If the URL was included in the previous crawl, and is present in the current crawl, but is no longer in that specific report, then it is reported in the ‘Removed report’. If the URL was in the previous crawl, but is not included in any report in the current crawl, it is included in the ‘Missing report’ (e.g. the URL may have been unlinked since we last crawled, or may now fall outside of the scope of the crawl).
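The Added/Removed/Missing logic above is essentially set arithmetic between crawls. A sketch under the assumption that each report and each crawl can be treated as a set of URLs:

```python
def report_changes(prev_report, curr_report, prev_crawl, curr_crawl):
    """Classify URL changes for one report between two crawls.

    prev_report/curr_report: URLs in this specific report per crawl.
    prev_crawl/curr_crawl: all URLs found in each crawl.
    """
    added = curr_report - prev_report                 # newly in this report
    removed = (prev_report - curr_report) & curr_crawl  # still crawled, but left the report
    missing = prev_report - curr_crawl                # no longer in the crawl at all
    return added, removed, missing

added, removed, missing = report_changes(
    prev_report={"/a", "/b", "/c"},
    curr_report={"/a", "/d"},
    prev_crawl={"/a", "/b", "/c", "/d"},
    curr_crawl={"/a", "/b", "/d"},
)
print(sorted(added), sorted(removed), sorted(missing))  # ['/d'] ['/b'] ['/c']
```

Here /d entered the report, /b was crawled again but no longer triggers the report, and /c (perhaps unlinked since the last crawl) has dropped out of the crawl entirely.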
Do shared report links expire?
You can choose the expiration time-frame from a dropdown of options when sharing the report. These range from 24 hours to 6 months, with our default set as 1 month. Please bear in mind that the online reports are only available in the interface for the most recent crawl.
How long are the reports available?
At the moment, we keep crawl data archived for the lifespan of the client’s account.
What happens to my data when I cancel my account?
Upon account expiry, your account will fall dormant in case you wish to reactivate it at any time. Should you wish to have all of your data permanently deleted, you will need to request this specifically. Please speak to your Account Manager or the Customer Success team at firstname.lastname@example.org.
Does DeepCrawl back up reports and crawl data?
Crawl data, including all tables used to display reports, is backed up in Amazon S3 storage, which is Write Once Read Many and therefore highly reliable. All user and account data is backed up every hour.
Can I view reports before a crawl is finished?
No. Reports are not available until a crawl has been finalized. This is because the majority of the calculations DeepCrawl performs, such as duplication detection and internal linking analysis, require a complete set of page URLs before they can begin.