I’ve been putting off the move to DeepCrawl 2.0 from 1.9 for a while, but I’ve finally taken the plunge, and have to admit: I should have done it a long time ago. The new version is not only an upgrade, it’s miles ahead of what I was getting out of 1.9, and it’s already helped me find opportunities for site improvements which I was missing before. In case you’re not familiar with DeepCrawl, here are some of the highlights:
- DeepCrawl 2.0 is cloud-based, so it crawls large sites without eating up huge amounts of memory the way a desktop application does.
- It plays nicely with Google Analytics, an integration that allows for far deeper insights than a typical website crawler.
- The new DeepCrawl 2.0 has an intuitive user interface that smooths out the learning curve (something that was missing for me in 1.9).
What’s New in DeepCrawl 2.0
It seemed like I had only just gotten a grasp of DeepCrawl 1.9 when 2.0 became available, and after all of the effort I’d put into the old version, I was skeptical about the new one. The user interface in version 1.9 was tricky to navigate at times – remembering where certain reports were nested in the tool, for example. I saw a difference immediately once I switched to 2.0: the new interface is much cleaner and more intuitive.
DeepCrawl 2.0 is reminiscent of a WordPress layout, something that a WP junky like myself thoroughly appreciates. The left navigation bar makes moving around the tool faster and simpler, making the jump from 1.9 to 2.0 exceptionally smooth. In no time, I was able to pull off high-quality audits and take advantage of the new features 2.0 provides. And in case you were thinking the new version is just a slick new bod, there are definitely some great new features under the hood as well.
Chasing Down Non-Indexable Pages
A tool that tells you half of your web pages are non-indexable has only really done half the job. DeepCrawl 2.0 runs the rest of the way home with a report that lets you unpack the reasons for the non-indexable URLs. The tool gives you enough data to determine whether pages aren’t being indexed because of canonicalization or because they carry a robots directive like noindex, nofollow or disallow.
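For reference, the directives mentioned above typically look like this in a page’s source (a generic illustration with example URLs, not DeepCrawl output):

```html
<!-- A meta robots directive keeps this page out of the index
     and tells crawlers not to follow its links: -->
<meta name="robots" content="noindex, nofollow">

<!-- A canonical tag points indexing signals at a different URL,
     so this page itself may not be indexed: -->
<link rel="canonical" href="https://example.com/preferred-page/">
```

A disallow, by contrast, lives in robots.txt rather than on the page, e.g. `Disallow: /staging/` under `User-agent: *`.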
DeepCrawl also gives you another weapon for your SEO arsenal: with search engine algorithm updates like Google Panda becoming increasingly content-centric, the “Min Content/HTML Ratio” report provides an extremely valuable set of data. It shows you which of your pages have a relatively low volume of content, so that you can beef them up with valuable content (just remember – it needs to genuinely benefit your users!). To prioritize the pages it identifies, use the Level and DeepRank metrics: Level is the number of clicks a page sits from the home page, and DeepRank is a measurement of authority calculated similarly to Google’s PageRank.
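To make the metric concrete, here’s a rough sketch in R of what a content-to-HTML ratio measures – strip the tags, then compare visible-text length to total source length. This is only an illustration of the idea; DeepCrawl’s exact formula may differ.

```r
# Rough content-to-HTML ratio: visible text length / total source length.
# Illustrative only – not DeepCrawl's exact calculation.
content_html_ratio <- function(html) {
  text <- gsub("<[^>]+>", "", html)  # strip markup
  text <- gsub("\\s+", " ", text)    # collapse whitespace
  nchar(trimws(text)) / nchar(html)
}

# A markup-heavy "thin" page scores low:
thin_page <- "<html><head><title>t</title></head><body><div><p>Hi</p></div></body></html>"
content_html_ratio(thin_page)
```

Pages scoring well below your site’s average are the candidates to beef up.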
Uncovering Broken Pages Driving Traffic
Repairing broken pages that are driving people to your site is always a quick win, and DeepCrawl 2.0 makes it even easier. The tool lets you find and fix these pages before they slip off the search index radar or get deleted by the web admin – and before the traffic arriving at those URLs suffers any more poor user experience. It even features a “share link” function that makes it easy to share this report (or any report, for that matter) with developers and other stakeholders.
Locating Orphaned Pages
Similar to the broken pages driving traffic report, DeepCrawl leverages data from Google Analytics to bring to light pages that are generating organic traffic through Google but haven’t been linked to internally. Once identified, you can score an easy win by adding internal links to these pages.
Crawl Source Gap Analysis
You can perform a killer gap analysis by incorporating as many as five sources in any one crawl, making it a simple one-stop-shop for easy linking opportunities. For example, you can run a crawl of your website, sitemap(s), analytics data, backlinks and a list of URLs.
I recommend setting the crawl up to run a check on your sitemaps, analytics and backlinks to dig deep and get a thorough gap analysis on your site. When setting up a crawl which includes Google Analytics, you have two options: a quick sync with your account to get between 7 and 30 days of data or a deeper historical look with a manual upload of up to six months of data from GA, Omniture or another analytics tool.
Scraping and Custom Extractions
Most crawler tools will allow you to find custom values, but DeepCrawl goes one better by providing you with 20 kinds of custom extractions. Using these, you can find exactly what you’re looking for in data not included within the default crawl report and let DeepCrawl double as a scraping tool. The data you can pinpoint includes analytics information (Google Analytics, WebTrends, Omniture, Nielsen, etc.) as well as tracking code, missing ALT tags, schema.org markup and other classes of rich snippets.
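To give a feel for what a pattern-based extraction does, here’s a base-R sketch that pulls img tags lacking an ALT attribute out of a page’s source. The HTML snippet and regex are illustrative examples, not DeepCrawl’s own extraction syntax.

```r
# Illustrative scrape: find <img> tags with no alt attribute in page source.
# The snippet and pattern are examples, not DeepCrawl's extraction syntax.
page_src <- '<img src="a.png" alt="logo"><img src="b.png"><img src="c.png">'

img_tags    <- regmatches(page_src, gregexpr("<img[^>]*>", page_src))[[1]]
missing_alt <- img_tags[!grepl('alt="', img_tags, fixed = TRUE)]
missing_alt  # the two tags without alt text
```

In DeepCrawl itself you’d define the pattern once in the crawl settings, and every crawled URL gets the extracted values added to its report row.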
Massive Data Exports
Amongst all of the awesome features of DeepCrawl 2.0, at the top of my list are the huge data sets the tool is capable of exporting. Looking at all of that data can be overwhelming, but with filters you can mine truly valuable information from it within the tool – there are more than 120 filtering options, which can be saved as tasks to build your own customized crawl reports. Alternatively, you can work with the data in Microsoft Excel or, my favorite, RStudio. All exports from DeepCrawl can be downloaded as Excel files as well as PDFs, PNGs and other common file types, and for me, RStudio makes it simple to work through hundreds of thousands, or even millions, of URLs.
Here’s how to set it up in R Studio:
- Create a data frame (in this case, called “clientsiteurls”) from your DeepCrawl export.
- Create another data frame from a subset that filters for rows with a certain number of reported visits in Google Analytics but low internal linking on your site.
siteopps <- subset(clientsiteurls, ga_visits >= 1000 & deeprank <= 5)
- View your new “siteopps” data frame in a rows and columns format.
- Write your new, filtered data frame to a CSV file named “clientseowins.csv”
write.csv(siteopps, file = "clientseowins.csv")
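Putting the steps together, here’s a minimal end-to-end sketch. The ga_visits and deeprank column names are taken from the filter above, and the toy data frame stands in for reading a real DeepCrawl export with read.csv:

```r
# Stand-in for read.csv("your-deepcrawl-export.csv") – a toy export with
# the columns the filter uses (column names assumed from the example above).
clientsiteurls <- data.frame(
  url       = c("/a", "/b", "/c"),
  ga_visits = c(2500, 400, 1200),
  deeprank  = c(3, 2, 8)
)

# Keep pages with healthy GA traffic but weak internal authority.
siteopps <- subset(clientsiteurls, ga_visits >= 1000 & deeprank <= 5)

# View(siteopps)  # inspect in RStudio's spreadsheet-style viewer

# Export the opportunities for your dev team.
write.csv(siteopps, file = "clientseowins.csv", row.names = FALSE)
```

Tune the 1000-visit and DeepRank-5 thresholds to your own site’s traffic profile.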
This doesn’t cover every feature of DeepCrawl 2.0, but it does give you a good idea of the vast and valuable insights you can learn from the tool.
What do you think of DeepCrawl 2.0?