Spammy backlinks to 404 pages are ignored by default
When asked how to deal with thousands of spammy backlinks, John was keen to reassure users that low-quality sites linking to 404 pages won’t impact your site negatively. Because the target is a 404, Google essentially reads that link as not connecting to anything and ignores it (for the same reason, it’s important to review links from valuable sources that point to 404 pages and redirect them). If it’s just a handful of sites providing spammy backlinks to non-404 pages, the recommendation is to set up a domain-level disavow.
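Google’s disavow file is a plain-text list with one entry per line, where a `domain:` prefix disavows every link from that domain rather than a single URL (the domains below are placeholders):

```text
# Lines starting with # are comments.
# Domain-level entries tell Google to ignore all links from the domain:
domain:spammy-directory.example
domain:link-farm.example
# Individual URLs can also be disavowed:
https://spammy-directory.example/widgets/links.html
```

The file is uploaded through Google Search Console’s disavow links tool; domain-level entries are usually preferable to listing hundreds of individual spammy URLs.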
Use log file analysis to understand which older 404 pages may benefit from redirects
When doing a website migration, it’s important to make sure that important external links are redirected so that users don’t land on a 404 and the value of the link is lost. This can also be reflected in the search results over time. John mentioned that after around two years a difference may no longer be noticeable, but if really strong external links point to a 404 or broken page, it is still worthwhile to redirect them. He clarified that you can analyze this further by using log files to see which older 404 pages search engines are still regularly trying to access, which can be a sign that they should be redirected to something more useful.
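As a sketch of that log-file analysis, assuming an Apache/Nginx combined log format (the regex and the `googlebot_404s` helper are illustrative, not a standard tool; a production check should also verify Googlebot by reverse DNS rather than trusting the user-agent string):

```python
import re
from collections import Counter

# Combined log format:
# IP - - [timestamp] "METHOD /path HTTP/x.y" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
    r'.*"(?P<ua>[^"]*)"$'
)

def googlebot_404s(log_lines):
    """Count 404 paths that Googlebot is still requesting —
    candidates for a redirect to something more useful."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1
    return hits
```

Paths that Googlebot keeps requesting at a high rate are the strongest redirect candidates.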
There is No Guarantee of Faster Results By Using 410 Status Codes
To remove a full section of a site from the index, it is best to return a 410 status code on those pages. 404 and 410 send different signals to Googlebot, with 410 being the clearer signal that a page has been removed permanently. However, because Google encounters a large number of incorrect signals, Martin explained that it treats these status codes as hints, so there is no guarantee that using a 410 will produce faster results.
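As an illustration, a whole section can be marked as gone with Apache’s mod_alias (`/old-section/` is a placeholder path; equivalent directives exist for Nginx and other servers):

```apache
# Return 410 Gone for every URL under /old-section/
RedirectMatch gone ^/old-section/
```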
Return 404 or 410 Status Codes to Prevent Googlebot Processing Files from Hacked Domains
If you have a legacy hacked domain, the best way to prevent Google from crawling the old URLs is to add rules to your .htaccess file that return a 404 or 410 status code when a hacked URL is accessed. This stops Googlebot from processing the files and making calls to the database.
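A minimal .htaccess sketch, assuming the hacked pages followed a recognisable URL pattern (the patterns below are placeholders for whatever the spam URLs on your site look like):

```apache
RewriteEngine On
# Serve 410 Gone for injected spam URL patterns before the CMS runs,
# so no application code or database calls are triggered.
RewriteRule ^cheap-pills/ - [G,L]
# The R flag also accepts non-redirect status codes such as 404.
RewriteRule ^buy-now- - [R=404,L]
```

Because mod_rewrite answers these requests directly, the underlying application never executes for them.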
404 or 410 Status Codes Will Not Impact a Website’s Rankings
If Google identifies 404 or 410 pages on a site, it will continue to crawl them in case anything changes, but will gradually reduce the crawl frequency to concentrate on the pages that return 200 status codes.
Google Checks Status Codes Before Attempting to Render Pages
Google checks the status code of a page before doing anything else, such as rendering content. This helps it identify which pages can be indexed and which it shouldn’t render. For example, if your page returns a 404, Google won’t render anything from it.
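The described ordering can be sketched as a small decision function (the action names and the simplified status handling are illustrative, not Google’s actual pipeline):

```python
def crawl_action(status: int) -> str:
    """The status code is inspected before any rendering is attempted."""
    if 200 <= status < 300:
        return "render-and-index"   # only successful pages proceed to rendering
    if status in (404, 410):
        return "drop-from-index"    # content is never rendered
    if 300 <= status < 400:
        return "follow-redirect"
    return "retry-later"            # e.g. 5xx treated as transient (simplified)
```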
Google Does Not Index 404 Pages
If a page returns a 404 status code, Google will not index the page’s content. However, if the page has only recently become a 404 and Google has not yet recrawled it, the page will still appear in search results. A related issue (known as a soft 404) occurs when the server displays an error page to users but still returns a 200 status code to crawlers.
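That last case can be spotted with a simple heuristic: a 200 response whose body reads like an error page. A naive sketch (the marker list is illustrative and would need tuning for a real site):

```python
# Phrases that commonly appear on error pages served with a 200 status.
NOT_FOUND_MARKERS = ("page not found", "does not exist", "no longer available")

def looks_like_soft_404(status: int, body: str) -> bool:
    """True when a page returns 200 but its content reads like a 404 page."""
    if status != 200:
        return False  # a real error status is not a *soft* 404
    text = body.lower()
    return any(marker in text for marker in NOT_FOUND_MARKERS)
```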
Signals Are Kept For 4xx or 5xx Error Pages Previously Dropped from the Index When They Are Re-added
If your pages returned a 4xx or 5xx error for a while and were dropped from the index, but become available again after a month or so, for example, Google will be able to return them to the search results in roughly the same state they were in before. They won’t have to start ranking again from nothing.