Limiting the Size and Depth of a Crawl

You can restrict the overall size and depth of a crawl, either before you start or while the crawl is running.

This is useful for preventing URL credits from being used up unintentionally, or for running a discovery crawl when you first start crawling a website and don’t yet know the optimal settings.
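
The limits themselves live in DeepCrawl’s crawl settings, but the underlying idea is easy to see in code. Below is a minimal standalone sketch of a crawler capped by depth and by total URL count; the limit values and helper names are illustrative, not DeepCrawl’s.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
import re

MAX_DEPTH = 3    # hypothetical cap: don't follow links more than 3 hops deep
MAX_URLS = 500   # hypothetical cap: stop discovering after 500 URLs

def crawl(start_url):
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth) pairs, breadth-first
    while queue:
        url, depth = queue.popleft()
        if depth >= MAX_DEPTH:
            continue  # depth limit reached: don't fetch or follow further
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page: skip it
        # Naive href extraction; fine for a sketch, not for production.
        for href in re.findall(r'href="([^"#]+)"', html):
            link = urljoin(url, href)
            if len(seen) >= MAX_URLS:
                return seen  # size limit reached: stop the crawl entirely
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen
```

A discovery crawl is simply the same loop with deliberately small caps, run once to gauge how large the site really is before committing more credits.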

Four Awesome Things You Can Do With Regex

Many of DeepCrawl’s features are centred on identifying and monitoring issues with site architecture. But the tool can also be used creatively to improve user experience, gather data about the structure of your site, and even make non-technical tasks, such as seeking out text on your site, easier and more reliable.
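
As a flavour of what such patterns look like, here is a small Python sketch; the patterns and sample HTML are invented for illustration, with one regex flagging leftover placeholder copy and another pulling out phone numbers for auditing.

```python
import re

html = '<p>Contact us on 020 7946 0018 or see our Lorem ipsum draft.</p>'

# Flag leftover placeholder copy that shouldn't ship to production.
placeholder = re.compile(r'lorem\s+ipsum', re.IGNORECASE)

# Pull out UK-style phone numbers so they can be audited for consistency.
phone = re.compile(r'\b0\d{2,4}[ -]?\d{3,4}[ -]?\d{3,4}\b')

print(bool(placeholder.search(html)))  # True: placeholder text found
print(phone.findall(html))             # ['020 7946 0018']
```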

Modifying URLs And Stripping Parameters

You can modify URLs as they are being crawled, using the ‘Remove URL Parameters’ and ‘URL Rewriting’ features in the Advanced Settings at step 4 of the crawl setup.

These features are useful for tasks such as removing URL components that complicate analysis of your website, or rewriting URLs to point at an external website such as a lookup service (e.g. retrieving information from an API for a set of your page URLs).
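
To make the two operations concrete, here is a local sketch using Python’s urllib.parse. The parameter names (utm_source, sessionid) and the lookup host are hypothetical; in DeepCrawl you would configure the equivalent rules in Advanced Settings rather than write code.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

def strip_params(url, params_to_remove):
    """Remove tracking parameters that fragment URL-level reporting."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in params_to_remove]
    return urlunparse(parts._replace(query=urlencode(kept)))

def rewrite_host(url, new_host):
    """Point a crawled URL at another host, e.g. an external lookup API."""
    return urlunparse(urlparse(url)._replace(netloc=new_host))

url = "https://example.com/shoes?utm_source=news&sessionid=42&page=2"
print(strip_params(url, {"utm_source", "sessionid"}))
# https://example.com/shoes?page=2
print(rewrite_host(url, "api.example.com"))
# https://api.example.com/shoes?utm_source=news&sessionid=42&page=2
```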

Restricting a Crawl to Certain Pages

You may want to check or analyze a specific section of your website, instead of crawling your full site.

This can be useful after adding a new channel to your website, for filtering out script-based URLs and subdomains, or for ensuring your URL credits are spent on specific sections of a website.

It’s also handy for international websites, where you may only want to analyze a specific country’s version of the site.

You can restrict a crawl to any set of pages, using a mixture of inclusion and exclusion rules in the Advanced Settings.
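
The rules themselves are configured in the crawl settings, but their combined effect can be sketched as follows; the include/exclude patterns below are hypothetical examples, and DeepCrawl’s actual rule syntax may differ.

```python
import re

INCLUDE = [re.compile(r"^https://example\.com/blog/")]
EXCLUDE = [re.compile(r"\?sessionid="),
           re.compile(r"^https://m\.example\.com/")]

def in_scope(url):
    """Crawl a URL only if it matches an include rule and no exclude rule."""
    return (any(rule.search(url) for rule in INCLUDE)
            and not any(rule.search(url) for rule in EXCLUDE))

print(in_scope("https://example.com/blog/post-1"))             # True
print(in_scope("https://example.com/blog/post-1?sessionid=9"))  # False
print(in_scope("https://example.com/shop/shoes"))               # False
```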

How to Fix your Failed Website Crawls

Sometimes, when running a crawl on a site (or a section of a site), you may find that it isn’t progressing past the first level of URLs. When this happens, only the base domain or “start URLs” are actually crawled.

This problem has several possible causes, and there are various ways to rectify it.

Comparing A Test Website To A Live Website

You can crawl your site’s staging or test environment and compare it to your live website to see how they differ.

This can help you test a new version of your website, or part of it, before releasing it to the live environment, and check site-wide additions such as canonical tags, social tags, or pagination implementation.
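
As an illustration of the idea, the sketch below fetches the same paths from hypothetical live and staging hostnames and diffs the canonical tag; DeepCrawl performs this kind of comparison across whole crawls rather than one URL at a time.

```python
import re
from urllib.request import urlopen

# Assumes the canonical tag's rel attribute precedes href; fine for a sketch.
CANONICAL = re.compile(r'<link[^>]+rel="canonical"[^>]+href="([^"]+)"')

def canonical_of(url):
    html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
    match = CANONICAL.search(html)
    return match.group(1) if match else None

for path in ["/", "/products/", "/blog/"]:
    live = canonical_of("https://www.example.com" + path)
    test = canonical_of("https://staging.example.com" + path)
    if live != test:
        print(f"{path}: live={live!r} vs staging={test!r}")
```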

Crawling Multiple Domains Or Subdomains

It’s possible to configure exactly which domains and subdomains will be included in your crawl.

This is useful if you wish to filter in/out known subdomains or if you only want to crawl a very specific area of your website.
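
Conceptually, this is an allow-list of hosts. The sketch below shows two common scoping policies with hypothetical hostnames; in DeepCrawl the equivalent choice is a crawl setting, not code you write.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"example.com", "www.example.com", "blog.example.com"}

def host_in_scope(url):
    """Keep a URL only if its host is one of the configured (sub)domains."""
    return urlparse(url).hostname in ALLOWED_HOSTS

def any_subdomain(url, root="example.com"):
    """Alternatively, allow the root domain plus every subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == root or host.endswith("." + root)

print(host_in_scope("https://blog.example.com/post"))  # True
print(host_in_scope("https://shop.example.com/item"))  # False
print(any_subdomain("https://shop.example.com/item"))  # True
```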

Using Custom Extraction

You can extract specific information and data from any web page by running a custom extraction with DeepCrawl. This can be useful if you need to check your analytics or social tagging, or to extract backlink and product data.
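
As a rough local analogue, the sketch below uses regular expressions to pull a made-up analytics ID and a product price out of sample HTML; with DeepCrawl you would supply similar patterns in the Custom Extraction settings instead.

```python
import re

html = """
<script>ga('create', 'UA-12345-6', 'auto');</script>
<span class="price" content="49.99">£49.99</span>
"""

# Match a Universal Analytics-style tracking ID anywhere in the page.
analytics_id = re.search(r"UA-\d{4,10}-\d{1,4}", html)
# Capture the machine-readable price from a (hypothetical) markup pattern.
price = re.search(r'class="price"[^>]*content="([\d.]+)"', html)

print(analytics_id.group(0) if analytics_id else "no analytics tag")  # UA-12345-6
print(price.group(1) if price else "no price")                        # 49.99
```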
