It goes without saying that linking to broken pages on your domain is not recommended. Not only are broken links annoying for users, but they can also slow down search engine crawlers and contribute to issues like higher bounce rates, or less time spent by visitors on your site.</SPAN></P> <P><SPAN style="font-weight: 400;">But 404 errors are not an inherently bad thing. In fact, returning a 404 error page when a user navigates to a URL that </SPAN><I><SPAN style="font-weight: 400;">doesn’t exist</SPAN></I><SPAN style="font-weight: 400;"> is often considered best practice. Google has stated many times that 404s alone </SPAN><B>don’t </B><SPAN style="font-weight: 400;">harm your overall website’s indexing or ranking. So what impact are 404s actually having, and is there any real benefit to cleaning up 404 errors on your site?</SPAN></P> <P> </P> <HR/> <P>This post is part of Deepcrawl’s series on <STRONG>Website Health</STRONG>. We are diving deep into each of the <A href="https://www.deepcrawl.com/blog/best-practice/the-seo-revenue-funnel-framework-visualizing-the-path-to-organic-search-success/" target="_blank" rel="noopener">7 categories of the SEO Funnel</A> to help SEOs and marketers learn more about the many elements of search engine optimization that contribute to a high-performing, healthy website. 
Here, we’re diving into <STRONG>HTTP status codes</STRONG> — specifically 4xx errors — and discussing how they affect your SEO.</P> <P><A href="https://www.deepcrawl.com/blog/best-practice/the-seo-revenue-funnel-framework-visualizing-the-path-to-organic-search-success/" target="_blank" rel="noopener"><IMG src="https://www.deepcrawl.com/wp-content/uploads/2021/11/Blog-Website-Health-Topic-Availability.png" alt="Website health - an SEO blog series on HTTP status code availability"/></A></P> <HR/> <P> <BR/> </P> <H2>404 HTTP status codes explained</H2> <P>Every time a user or search engine attempts to access your URL, an HTTP request is made and your server sends out an HTTP status code that is used to indicate whether the request to access a page was successful.</P> <P> </P> <P><SPAN style="font-weight: 400;">HTTP status codes fall into one of five categories, which determines </SPAN><A href="https://developers.google.com/search/docs/advanced/crawling/http-network-errors" target="_blank" rel="noopener"><SPAN style="font-weight: 400;">how they’re treated by Google</SPAN></A><SPAN style="font-weight: 400;">:</SPAN></P> <UL> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;"><STRONG>1xx</STRONG> – Informational response</SPAN></LI> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;"><STRONG>2xx</STRONG> – Success</SPAN></LI> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;"><STRONG>3xx</STRONG> – Redirection</SPAN></LI> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;"><STRONG>4xx</STRONG> – Client errors</SPAN></LI> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;"><STRONG>5xx</STRONG> – Server errors</SPAN></LI> </UL> <P> </P> <P><SPAN style="font-weight: 400;">Returning a 404 error code signals that a page has not been found. Perhaps the content on that URL has been removed, or perhaps there was never anything there to begin with. 
All the web browser knows is that the requested content cannot be located at that address.</SPAN></P> <P> <BR/> </P> <H2><STRONG>How to find 404 errors on your website</STRONG></H2> <H3><B><I>Finding 404 errors using Deepcrawl </I></B></H3> <P><SPAN style="font-weight: 400;">Using Deepcrawl, finding pages that return 404 status codes is as simple as navigating to the “</SPAN><I><SPAN style="font-weight: 400;">All Pages</SPAN></I><SPAN style="font-weight: 400;">” report and filtering by “HTTP Status Code > Equals > 404”. You can also use the “</SPAN><I><SPAN style="font-weight: 400;">Broken Pages”</SPAN></I><SPAN style="font-weight: 400;"> report for a full list of <A href="https://www.deepcrawl.com/knowledge/hangout-library/4xx-errors/" target="_blank" rel="noopener">4xx errors</A>. Note that this report also includes other 4xx responses, such as 403s and 401s; you can use filtering here to remove those from the list.</SPAN></P> <P><SPAN style="font-weight: 400;">An overview of all </SPAN><I><SPAN style="font-weight: 400;">non-200</SPAN></I><SPAN style="font-weight: 400;"> pages and their status codes is also located in the main dashboard. Simply click the bar next to </SPAN><I><SPAN style="font-weight: 400;">“Broken Pages (4xx Errors)”</SPAN></I><SPAN style="font-weight: 400;"> to be directed to the relevant report.</SPAN></P> <P><IMG src="https://www.deepcrawl.com/wp-content/uploads/2021/11/Blog-Deepcrawl-report-Non-200-page-example.png" alt="Example of a Non-200 status code page report on Deepcrawl's Analytics Hub"/><BR/> <SPAN style="font-weight: 400;">Alongside reports that identify all of your site’s 404 errors, there’s the option to filter these down further by </SPAN><I><SPAN style="font-weight: 400;">source</SPAN></I><SPAN style="font-weight: 400;">. 
For instance, </SPAN><B><I>finding all of the 404 pages that have backlinks pointing to them </I></B><SPAN style="font-weight: 400;">is as simple as navigating to the “</SPAN><I><SPAN style="font-weight: 400;">Broken Pages with Backlinks”</SPAN></I><SPAN style="font-weight: 400;"> report. Or, if you wanted to see all of the 404 pages being linked to </SPAN><I><SPAN style="font-weight: 400;">internally</SPAN></I><SPAN style="font-weight: 400;">, the “</SPAN><I><SPAN style="font-weight: 400;">Broken Links</SPAN></I><SPAN style="font-weight: 400;">” report is the one to use.</SPAN></P> <P><SPAN style="font-weight: 400;">Our “</SPAN><I><SPAN style="font-weight: 400;">Unique Broken Links</SPAN></I><SPAN style="font-weight: 400;">” report can also be useful when prioritizing URLs that need urgent attention. Here you’ll find all of the broken pages that are linked to internally on your site, handily sorted by URL and anchor text. It’s an easy way to see which 404 pages are linked to most commonly, and from where.</SPAN></P> <P> </P> <H3><B><I>Finding 404 errors with Google Search Console</I></B></H3> <P><SPAN style="font-weight: 400;">If you’re not using Deepcrawl, then Google Search Console is a good starting place for finding URLs that return a 404 error code. The </SPAN><I><SPAN style="font-weight: 400;">Coverage</SPAN></I><SPAN style="font-weight: 400;"> report in GSC contains a list of URLs that have been submitted to Google and returned a 404 status code when they were last crawled. </SPAN></P> <P> </P> <H3><B><I>A note on “soft 404s”</I></B></H3> <P><SPAN style="font-weight: 400;">If you’re using GSC to locate 404 error pages, you might also notice a report on something called a “</SPAN><I><SPAN style="font-weight: 400;">soft 404</SPAN></I><SPAN style="font-weight: 400;">”. Soft 404s are pages that tell a user that a page does not exist, but still return a valid 200 response code. 
A soft 404 is an indication that Google has found no content of value on that page, or is otherwise struggling to make sense of why the page exists.</SPAN></P> <P><SPAN style="font-weight: 400;">Soft 404s are different from standard 404s in that they’re not actually returning an error response code. However, </SPAN><B><I>the label of a soft 404 can be enough for Google to drop a page from its index</I></B><SPAN style="font-weight: 400;">. </SPAN></P> <P><SPAN style="font-weight: 400;">Google often sees empty pages as soft 404s. If you’re using Deepcrawl, try setting up a custom extraction for pages with a word count below a certain threshold. If you see URLs of any value appearing here or in GSC’s soft 404 report, it’s worth taking the time to review these separately and make any relevant on-page improvements.</SPAN></P> <P> </P> <H3><B><I>Finding 404 errors with Google Analytics</I></B></H3> <P><SPAN style="font-weight: 400;">Google Analytics doesn’t provide a dedicated report for 404 pages. However, it’s possible to find them if you know the standard page title given to 404 pages on your domain. Simply head to “</SPAN><I><SPAN style="font-weight: 400;">Behavior > Site Content > All Pages</SPAN></I><SPAN style="font-weight: 400;">” and set the primary dimension to “</SPAN><I><SPAN style="font-weight: 400;">Page Title</SPAN></I><SPAN style="font-weight: 400;">”. From there, you should be able to filter results by entering the 404 page title into the search box.</SPAN></P> <P> <BR/> </P> <H2><STRONG>When 404s become a problem</STRONG></H2> <P><SPAN style="font-weight: 400;">Once you’ve identified your 404 pages, it’s time to determine whether or not they need fixing. </SPAN></P> <P><SPAN style="font-weight: 400;">As mentioned, having pages that return a 404 error isn’t necessarily cause for concern. 
Diagnosing whether or not a broken page needs fixing is more about </SPAN><B>understanding </B><B><I>how</I></B><B> and </B><B><I>when</I></B><B> users might encounter that page.</B></P> <P><SPAN style="font-weight: 400;">Returning a 404 error page when you’re certain that a URL should not exist on your site is widely accepted. Google’s own documentation confirms that having some 404 errors alone </SPAN><A href="https://support.google.com/webmasters/answer/2445990?hl=en" target="_blank" rel="noopener"><SPAN style="font-weight: 400;">will not harm your site’s search performance</SPAN></A><SPAN style="font-weight: 400;">. </SPAN><B>However</B><SPAN style="font-weight: 400;">, there are some instances where 404s may require some extra attention, including:</SPAN></P> <UL> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;">Submitted URLs that return a 404 status code</SPAN></LI> <LI style="font-weight: 400;" aria-level="1"><SPAN style="font-weight: 400;">Content that has been moved to another location (this should result in a 3xx redirect rather than a 404) </SPAN></LI> </UL> <H3><STRONG>Should it be a 404?</STRONG></H3> <P><SPAN style="font-weight: 400;">No two sites are exactly the same, but there are some general rules to follow when deciding which action to take around removed or relocated pages.</SPAN></P> <P> </P> <P><B><I>If the page should still exist…</I></B></P> <P><SPAN style="font-weight: 400;">While a natural part of the web, 404 errors can still occur where they’re not supposed to. Restore any content that’s been accidentally removed and wait for the page to be re-indexed by search engines.</SPAN></P> <P> </P> <P><B><I>If the page has been temporarily removed…</I></B></P> <P><SPAN style="font-weight: 400;">A 404 status code isn’t the recommended course of action for a page that’s only been removed </SPAN><I><SPAN style="font-weight: 400;">temporarily</SPAN></I><SPAN style="font-weight: 400;">. A 302 redirect is a better choice. 
Consider further steps like removing internal links while the 302 is in action, then restoring them when the content is reinstated. This gives search engines the best chance of finding those pages quickly.</SPAN></P> <P> </P> <P><B><I>If the page has been removed but still has value…</I></B></P> <P><SPAN style="font-weight: 400;">A page that no longer exists and has no replacement can usually be allowed to 404. Returning this status code generally results in Google slowing down its crawling of the page, until eventually it gets dropped from the index altogether (this usually takes about a month).</SPAN></P> <P><SPAN style="font-weight: 400;">Even if a page is gone for good, however, there are some extra considerations to be aware of: </SPAN></P> <UL> <LI><B>Internal links</B><SPAN style="font-weight: 400;"> – Does the page in question have internal links pointing to it? If so, it’s worth removing or replacing these links to prevent users from clicking through to a broken page. </SPAN><SPAN style="font-weight: 400;">Linking to 404s internally can also lead to </SPAN><A href="https://www.deepcrawl.com/blog/best-practice/5-ecommerce-seo-mistakes-bloating-your-website/" target="_blank" rel="noopener"><SPAN style="font-weight: 400;">unwanted crawl bloat</SPAN></A><SPAN style="font-weight: 400;"> and negatively impact the time it takes search engines to discover and crawl the pages that really matter.</SPAN></LI> <LI><B>Backlinks</B><SPAN style="font-weight: 400;"><SPAN style="font-weight: 400;"> – Are there any links coming from <I>external</I> sources? If so, allowing the page to 404 could result in <B><I>wasted link equity</I></B>. Check the URL’s referring domains before letting a page 404. If there’s anything of value, you may want to consider a <A href="https://www.deepcrawl.com/knowledge/hangout-library/redirects/" target="_blank" rel="noopener">301 redirect</A> instead. The ‘Broken Pages with Backlinks’ report comes in useful here. 
It’s also possible to narrow down and prioritize pages to redirect using the ‘Broken Pages with Traffic’ report, as this highlights any 404s that users are finding organically.<BR/> </SPAN></SPAN></LI> </UL> <P> </P> <P><B><I>If the page has been permanently removed and has no link value…</I></B></P> <P><SPAN style="font-weight: 400;">Pages that have permanently been removed and have no link value can be given a 410 status code. This indicates that the page has gone completely and has been intentionally removed. Google currently views 404 and 410 pages in the same way, but a 410 is a good option if you know for certain that the content will not be reinstated. </SPAN></P> <P> <BR/> </P> <H2><STRONG>Handling valid 404 pages</STRONG></H2> <P><SPAN style="font-weight: 400;">We’ve discussed all the reasons 404 errors are a natural, and often helpful, part of the web. You could therefore be forgiven for thinking no further action is necessary when a page is left to 404, but that’s not strictly the case. </SPAN></P> <P><SPAN style="font-weight: 400;">404 pages that occur naturally should return a proper 404 HTTP response code. They should also </SPAN><I><SPAN style="font-weight: 400;">not</SPAN></I><SPAN style="font-weight: 400;"> be blocked via </SPAN><A href="https://www.deepcrawl.com/knowledge/technical-seo-library/robots-txt/" target="_blank" rel="noopener"><SPAN style="font-weight: 400;">robots.txt</SPAN></A><SPAN style="font-weight: 400;">, as this can make it harder for Google to understand how you want the page to be treated.</SPAN></P> <P><SPAN style="font-weight: 400;">You may also need to work on refining your 404 error page, ensuring that it’s user-friendly and informative.</SPAN></P> <P> </P> <H3><B><I>What makes a good 404 error page?</I></B></H3> <P><SPAN style="font-weight: 400;">Running into a 404 error can be frustrating for users. 
As webmasters, it’s our job to ease that frustration and direct users to the content they’re looking for (or at least the next best thing). That responsibility falls to your 404 error page, so it’s worthwhile to spend some time getting it right.</SPAN></P> <P><SPAN style="font-weight: 400;">There’s no hard and fast rule as to what constitutes an effective 404 error page. Often, it depends on the type of site and the nature of the content that’s being searched for. </SPAN></P> <P><SPAN style="font-weight: 400;">However, </SPAN><B><I>there are some steadfast recommendations for 404 pages that all webmasters should follow</I></B><SPAN style="font-weight: 400;">:</SPAN></P> <UL> <LI>Clearly display the error code and explain that the page can’t be found</LI> <LI>Follow the branding on the rest of the site</LI> <LI>Include clear navigational links</LI> </UL> <P><SPAN style="font-weight: 400;">Exactly how you meet these recommendations is up to you. Some sites bring the <EM>wow</EM> factor with stunning visuals, while others use humor in a bid to keep users engaged. </SPAN></P> <P> </P> <P><SPAN style="font-weight: 400;">The best approach is to view the page through the eyes of a brand new user. If you were landing on the page for the first time, would you know how to get back on track? Is it clear that the page you’ve requested has not been found, but that there could be other content of interest on the same domain? If the answer is anything but a clear and resounding “<EM>yes</EM>,” it’s time to improve that 404 error page.