Published on 30th November 2021 by Natalie Stubbs

How to find 404 errors on your website — and what to do with them

 



It goes without saying that linking to broken pages on your domain is not recommended. Not only are broken links annoying for users, but they can also slow down search engine crawlers and contribute to issues like higher bounce rates, or less time spent by visitors on your site.

But 404 errors are not an inherently bad thing. In fact, returning a 404 error page when a user navigates to a URL that doesn’t exist is often considered best practice. Google has stated many times that 404s alone don’t harm your website’s overall indexing or ranking. So what impact do 404s actually have, and is there any real benefit to cleaning up 404 errors on your site?

 


This post is part of Deepcrawl’s series on Website Health. We are diving deep into each of the 7 categories of the SEO Funnel to help SEOs and marketers learn more about the many elements of search engine optimization that contribute to a high-performing, healthy website. Here, we’re diving into HTTP status codes — specifically 4xx errors — and discussing how they affect your SEO.



 
 

What are 404 errors? (404 HTTP status codes explained)

Every time a user or search engine attempts to access a URL on your site, an HTTP request is made and your server returns an HTTP status code that indicates whether the request to access the page was successful.
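As a rough illustration of that request and response cycle, the sketch below asks a server for a URL and reports the status code it sends back. The function name `fetch_status`, the `opener` parameter, and the user agent string are assumptions made for this example; only Python's standard `urllib` module is used.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server sends back for `url`.

    `opener` is a hypothetical injection point so the function can be
    exercised without touching the network; it defaults to urlopen.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check-sketch"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            # urlopen follows redirects, so this is the status of the
            # final response rather than of any 3xx hop along the way.
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises an exception for 4xx and 5xx responses;
        # the status code is available on the exception itself.
        return error.code
```

Calling `fetch_status("https://www.example.com/some-removed-page")` against a real server would return 404 if that page cannot be found. Connection failures (for example, a DNS error) are deliberately left to raise, since they are not HTTP responses.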

 

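HTTP status codes fall into one of five categories (1xx informational, 2xx success, 3xx redirection, 4xx client error and 5xx server error), and the category determines how the response is treated by Google.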
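The five categories are defined by the first digit of the status code. A minimal sketch of that grouping follows; the function name `status_class` and the label strings are choices made for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map an HTTP status code to the name of its category."""
    if not 100 <= code <= 599:
        raise ValueError(f"{code} is not a valid HTTP status code")
    # The hundreds digit identifies the category, e.g. 404 -> 4.
    return STATUS_CLASSES[code // 100]
```

Here `status_class(404)` returns `"client error"`, which is why 404s are grouped with other 4xx responses such as 401 and 403.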

 

Returning a 404 error code signals that a page has not been found. Perhaps the content on that URL has been removed, or perhaps there was never anything there to begin with. All the web browser knows is that the requested content cannot be located at that address.

 
 

How to find 404 errors on your website

Finding 404 errors using Deepcrawl 

Using Deepcrawl, finding pages that return 404 status codes is as simple as navigating to the “All Pages” report and filtering by “HTTP Status Code > Equals > 404”. You can also use the “Broken Pages” report for a full list of 4xx errors. Note that this report includes other 4xx responses, such as 403s and 401s, as well as 404s; you can use the same filtering to remove those from the list.

An overview of all non-200 pages and their status codes is also located in the main dashboard. Simply click the bar next to “Broken Pages (4xx Errors)” to be directed to the relevant report.

Example of a non-200 status code page report in Deepcrawl’s Analytics Hub

Alongside reports that identify all of your site’s 404 errors, there’s the option to filter this down further by source. For instance, finding all of the 404 pages that have backlinks pointing to them is as simple as navigating to the “Broken Pages with Backlinks” report. Or, if you want to see all of the 404 pages being linked to internally, the “Broken Links” report is the one to use.

Our “Unique Broken Links” report can also be useful when prioritizing URLs that need urgent attention. Here you’ll find all of the broken pages that are linked to internally on your site, handily sorted by URL and anchor text. It’s an easy way to see which 404 pages are linked to most commonly, and from where.

 

Finding 404 errors with Google Search Console

If you’re not using Deepcrawl, then Google Search Console is a good starting place for finding URLs that return a 404 error code. The Coverage report in GSC contains a list of URLs that have been submitted to Google and returned a 404 status code when they were last crawled. 

 

A note on “soft 404s”

If you’re using GSC to locate 404 error pages, you might also notice a report on something called a “soft 404”. Soft 404s are pages that tell a user that a page does not exist, but still return a valid 200 response code. A soft 404 is an indication that Google has found no content of value on that page, or is otherwise struggling to make sense of why the page exists.

Soft 404s are different from standard 404s in that they’re not truly returning a broken page response code. However, the label of a soft 404 can be enough for Google to drop a page from its index.

Google often sees empty pages as soft 404s. If you’re using Deepcrawl, try setting up a custom extraction for pages with a word count below a certain threshold. If you see URLs of any value appearing here or in GSC’s soft 404 report, it’s worth taking the time to review these separately and make any relevant on-page improvements.
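To make the idea concrete, here is a rough heuristic for flagging likely soft 404s in your own crawl data. It is a sketch only and does not reproduce how Google or Deepcrawl classify pages: the phrase list, the 50-word threshold, and the function name `looks_like_soft_404` are all assumptions chosen for illustration.

```python
# Hypothetical phrases that often appear on "not found" templates.
NOT_FOUND_PHRASES = (
    "page not found",
    "page could not be found",
    "page does not exist",
    "page doesn't exist",
)


def looks_like_soft_404(status, visible_text, min_words=50):
    """Flag a page that returns 200 but reads like an error or empty page."""
    if status != 200:
        # A real 4xx or 5xx response is not a *soft* 404.
        return False
    text = visible_text.lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    # Very thin pages are candidates for review as well.
    return len(text.split()) < min_words
```

Anything this flags still needs a manual review; a short page can be perfectly valid, so the threshold should be tuned to the kind of content your site publishes.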

 

Finding 404 errors with Google Analytics

Google Analytics doesn’t provide a dedicated report for 404 pages. However, it’s possible to find them if you know the standard page title given to 404 pages on your domain. Simply head to “Behavior > Site Content > All Pages” and set the primary dimension to “Page Title”. From there, you should be able to filter results by entering the 404 page title into the search box.

 
 

When 404s become a problem for SEO

Once you’ve identified your 404 pages, it’s time to determine whether or not they need fixing. 

As mentioned, having pages that return a 404 error isn’t necessarily cause for concern. Diagnosing whether or not a broken page needs fixing is more about understanding how and when users might encounter that page.

Returning a 404 error page when you’re certain that a URL should not exist on your site is widely accepted. Google’s own documentation confirms that having some 404 errors alone will not harm your site’s search performance. However, there are some instances where 404s may require some extra attention, including:

 

When should a page be a 404?

No two sites are exactly the same, but there are some general rules to follow when deciding which action to take around removed or relocated pages.

 

If the page should still exist…

While a natural part of the web, 404 errors can still occur where they’re not supposed to. Restore any content that’s been accidentally removed and wait for the page to be re-indexed by search engines.

 

If the page has been temporarily removed…

A 404 status code isn’t the recommended course of action for a page that’s only been removed temporarily. A 302 redirect is a better choice. Consider further steps like removing internal links while the 302 is in place, then restoring them when the content is reinstated. This gives search engines the best chance of finding those pages quickly.

 

If the page has been removed but still has value…

A page that no longer exists and has no replacement can usually be allowed to 404. Returning this status code generally results in Google slowing down its crawling of the page, until eventually it gets dropped from the index altogether (this usually takes about a month).

Even if a page is gone for good, however, there are some extra considerations to be aware of: 

 

If the page has been permanently removed and has no link value…

Pages that have permanently been removed and have no link value can be given a 410 status code. This indicates that the page has gone completely and has been intentionally removed. Google currently views 404 and 410 pages in the same way, but a 410 is a good option if you know for certain that the content will not be reinstated. 
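The four scenarios above can be summarized as a simple lookup. This is only a sketch of the general rules described in this section: the `PageState` names and the `recommended_status` function are made up for the example, and real decisions will depend on your site.

```python
from enum import Enum


class PageState(Enum):
    SHOULD_EXIST = "should still exist"
    TEMPORARILY_REMOVED = "temporarily removed"
    REMOVED_NO_REPLACEMENT = "removed, no replacement"
    PERMANENTLY_REMOVED_NO_LINK_VALUE = "permanently removed, no link value"


# Status code suggested by the guidance above for each scenario.
RECOMMENDED_STATUS = {
    PageState.SHOULD_EXIST: 200,  # restore the content
    PageState.TEMPORARILY_REMOVED: 302,  # temporary redirect
    PageState.REMOVED_NO_REPLACEMENT: 404,  # can usually be left to 404
    PageState.PERMANENTLY_REMOVED_NO_LINK_VALUE: 410,  # intentionally gone
}


def recommended_status(state):
    """Return the status code this section suggests for a page state."""
    return RECOMMENDED_STATUS[state]
```

For example, `recommended_status(PageState.TEMPORARILY_REMOVED)` returns 302. Since Google currently treats 404 and 410 in the same way, the last two rows differ mainly in how explicitly they signal that the removal was deliberate.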

 
 

Handling valid 404 pages

We’ve discussed all the reasons 404 errors are a natural, and often helpful, part of the web. You could therefore be forgiven for thinking no further action is necessary when a page is left to 404, but that’s not strictly the case. 

404 pages that occur naturally should return a proper 404 HTTP response code. They should also not be blocked via robots.txt, as this can make it harder for Google to understand how you want the page to be treated.
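One way to check the robots.txt point is with Python's standard `urllib.robotparser` module. The sketch below takes the text of a robots.txt file and reports whether a given URL is disallowed for a crawler; the function name `is_blocked_by_robots` and the default `"Googlebot"` user agent are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True if `robots_txt` disallows `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With a robots.txt containing `User-agent: *` and `Disallow: /old/`, the call `is_blocked_by_robots(rules, "https://www.example.com/old/page")` returns `True`, which would prevent a crawler from ever seeing the 404 response on that URL. Note that the standard library parser may not match Google's own robots.txt handling in every edge case.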

You may also need to work on refining your 404 error page, ensuring that it’s user-friendly and informative.

 

What makes a good 404 error page?

Landing on a 404 page can be frustrating for users. As webmasters, it’s our job to ease that frustration and direct users to the content they’re looking for (or at least the next best thing). That responsibility falls to your 404 error page, so it’s worthwhile to spend some time getting it right.

There’s no hard and fast rule as to what constitutes an effective 404 error page. Often, it depends on the type of site and the nature of the content that’s being searched for. 

However, there are some steadfast recommendations for 404 pages that all webmasters should follow:

Exactly how you meet these recommendations is up to you. Some sites bring the wow factor with stunning visuals, while others use humor in a bid to keep users engaged. 

 

The best approach is to view the page through the eyes of a brand new user. If you were landing on the page for the first time, would you know how to get back on track? Is it clear that the page you’ve requested has not been found, but that there could be other content of interest on the same domain? If the answer is anything but a clear and resounding “yes,” it’s time to improve that 404 error page.
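As a minimal sketch of those two requirements working together (a genuine 404 status code plus a page that helps the visitor get back on track), the example below uses Python's built-in `http.server`. The `KNOWN_PATHS` content, the page wording, and the names `respond` and `SiteHandler` are all hypothetical; a production site would do this in its own framework or server configuration.

```python
import html
from http.server import BaseHTTPRequestHandler

# Hypothetical site content, keyed by path.
KNOWN_PATHS = {"/": "<h1>Home</h1>"}


def not_found_page(path):
    """Build a 404 page that says what happened and offers a way back."""
    safe_path = html.escape(path)  # never echo raw user input into HTML
    return (
        "<!doctype html><html><head><title>Page not found</title></head><body>"
        "<h1>Sorry, we couldn't find that page</h1>"
        f"<p>Nothing exists at <code>{safe_path}</code>.</p>"
        '<p><a href="/">Go to the homepage</a></p>'
        "</body></html>"
    )


def respond(path):
    """Return (status, body) for a requested path."""
    if path in KNOWN_PATHS:
        return 200, KNOWN_PATHS[path]
    # Send a real 404 status, not a 200 with an error message (a soft 404).
    return 404, not_found_page(path)


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, you could run `HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()` after importing `HTTPServer` from `http.server`, then request a path that is not in `KNOWN_PATHS`.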

 

 

Author

Natalie Stubbs

Natalie is an Associate Technical SEO at Deepcrawl and forms part of our Professional Services team. A fan of all things content-related, she has a passion for helping clients improve their technical SEO by making complex concepts more accessible. Outside of work, you'll usually find her spending quality time with her cat.
