Don’t Rely on Unsupported Robots Directives in Robots.txt Being Respected By Google
Don’t rely on noindex directives in robots.txt, as they aren’t officially supported by Google. John says it’s fine to use robots directives in robots.txt, but make sure you have a backup in case they don’t work.
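One possible backup is to send the same signal through the `X-Robots-Tag` HTTP header (or a robots meta tag) for any path that robots.txt tries to noindex. The sketch below is illustrative only: the function names are hypothetical, and the path matching is a plain prefix check that ignores wildcards and user-agent groups.

```python
# Directive names treated as unsupported in robots.txt (an assumption for this sketch).
UNSUPPORTED = {"noindex"}


def unsupported_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Return (directive, path) pairs for unsupported directives in robots.txt."""
    found = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name = name.strip().lower()
        if name in UNSUPPORTED:
            found.append((name, value.strip()))
    return found


def backup_headers(path: str, rules: list[tuple[str, str]]) -> dict[str, str]:
    """Headers to add to the response for `path` so noindex does not depend on robots.txt."""
    for directive, prefix in rules:
        # Simple prefix match only; real robots.txt matching also handles * and $.
        if directive == "noindex" and prefix and path.startswith(prefix):
            return {"X-Robots-Tag": "noindex"}
    return {}
```

For example, with a robots.txt containing `Noindex: /drafts/`, `backup_headers("/drafts/post-1", rules)` yields an `X-Robots-Tag: noindex` header, while other paths get no extra header. Note that a page must remain crawlable for Google to see the header.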
A Sitemap is the Best Way for Google to Quickly Process Noindex at Scale
Make sure the pages you’ve added a noindex tag to are included in a sitemap file with a last modified date, so that Google picks up the change as quickly as possible. Make sure last modified dates are realistic and aren’t the same for every page, as this looks artificial to Google.
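As a rough sketch of what this could look like, the following builds a sitemap with a per-page `lastmod` value and flags the case where every page shares the same date. The function names and example URLs are hypothetical; the sitemap namespace is the standard one from sitemaps.org.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, date]) -> str:
    """Build sitemap XML from a mapping of URL to its real last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in sorted(pages.items()):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


def lastmod_looks_artificial(pages: dict[str, date]) -> bool:
    """True when several pages all share one identical last-modified date."""
    return len(pages) > 1 and len(set(pages.values())) == 1


pages = {
    "https://example.com/old-product": date(2024, 1, 5),
    "https://example.com/retired-category": date(2024, 2, 17),
}
sitemap_xml = build_sitemap(pages)
```

The dates passed in should come from when each page actually changed (for instance, when the noindex tag was added), not from the time the sitemap was generated.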
Google Eventually Treats Noindexed Pages as Soft 404s
Google treats noindexed pages as soft 404s after a period of time, as the page is ignored and essentially doesn’t exist in their eyes.
Videos Blocked From Googlebot May Still Be Crawled and Indexed
Blocking Googlebot from crawling a video may still result in a video snippet appearing in search if the video file is embedded from a different location, if some Google datacentres haven’t yet seen the updated version, or if the video URL has parameters attached.
Noindex & 410 Pages Are Removed Faster Than 404
Noindex and 410 remove pages from Google’s index at about the same speed, and both are slightly quicker than using a 404.
Noindexing Images Will Cause Omissions From Image Search & Video Search
Noindexed images won’t appear in Google image search. If a site hosts its own videos and noindexes its images, the thumbnail image won’t be indexed, meaning that the video won’t be indexed either.
Ensure All Product Pages Can be Crawled With Considered Use of Noindex
eCommerce sites with facets should be careful about which pages are noindexed, because noindexing too broadly (e.g. all category pages) may make it difficult for Googlebot to crawl individual product pages. Webmasters might instead consider noindexing specific facets, or noindexing everything after a certain number of pages in a paginated set.
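A selective rule like this could be sketched as a small function that decides, per URL, whether to emit a noindex tag. The facet parameter names, the `page` query parameter, and the page cutoff below are all hypothetical placeholders; a real site would substitute its own URL scheme.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical facet parameters that should not be indexed.
NOINDEX_FACETS = {"color", "sort"}
# Hypothetical cutoff: paginated pages beyond this number are noindexed.
MAX_INDEXED_PAGE = 5


def should_noindex(url: str) -> bool:
    """Noindex selected facets and deep pagination; leave everything else indexable."""
    query = parse_qs(urlsplit(url).query)
    if NOINDEX_FACETS.intersection(query):
        return True
    try:
        page = int(query.get("page", ["1"])[0])
    except ValueError:
        page = 1  # treat an unparseable page number as the first page
    return page > MAX_INDEXED_PAGE
```

Under these assumptions, plain category pages and the first few paginated pages stay indexable, so product links on them remain easy to discover, while `?color=red` or `?page=9` variants would be noindexed.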
Noindex Errors Are Differentiated by Source of URL in the New Search Console
In the new Search Console, noindex errors are differentiated by the source of the URL. Google will assume an error on your side if you submit a noindexed URL (for example, in a sitemap), as opposed to noindexed URLs it finds through crawling.
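The same distinction can be reproduced in your own audits. This is a minimal sketch, assuming you already have a list of noindexed URLs and a list of URLs you submitted; it does not call any Search Console API, and the function name and labels are hypothetical.

```python
def split_by_source(noindexed: list[str], submitted: list[str]) -> dict[str, list[str]]:
    """Group noindexed URLs by whether they were submitted or only discovered."""
    noindexed_set = set(noindexed)
    submitted_set = set(submitted)
    return {
        # Submitted and noindexed: likely to be reported as an error on your side.
        "submitted": sorted(noindexed_set & submitted_set),
        # Noindexed but never submitted: found through crawling only.
        "discovered": sorted(noindexed_set - submitted_set),
    }
```

Reviewing the `submitted` group helps confirm that each noindexed URL in a sitemap is there deliberately rather than by mistake.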