Either Disallow Pages in Robots.txt or Noindex Them, Not Both
Noindexing a page and blocking it in robots.txt means the noindex directive will never be seen, because Googlebot won’t be able to crawl the page to read it. Instead, John recommends using one or the other.
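A minimal sketch of the two options, using a hypothetical /private-page/ URL. To keep the page out of the index, leave it crawlable and serve a noindex:

```html
<!-- On /private-page/, with no matching Disallow rule in robots.txt,
     so Googlebot can crawl the page and read the directive -->
<meta name="robots" content="noindex">
```

To stop crawling instead, disallow the path in robots.txt, accepting that any noindex on the page will never be seen:

```
User-agent: *
Disallow: /private-page/
```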
Disallowed Pages With Backlinks Can Be Indexed by Google
Pages blocked by robots.txt cannot be crawled by Googlebot. However, if a disallowed page has links pointing to it, Google can determine that it is worth indexing even though it cannot crawl the page.
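As a hedged illustration, if a URL blocked like this has been indexed from its backlinks, the fix that follows from the point above is to stop disallowing it so Googlebot can crawl the page and see a noindex (the /old-report/ path is hypothetical):

```
# Before: crawling is blocked, but the URL can still be indexed
# from anchor text in links pointing at it
User-agent: *
Disallow: /old-report/

# After: remove the Disallow rule and serve
# <meta name="robots" content="noindex"> on /old-report/
# so Google can crawl the page and drop it from the index
```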
Google Supports X-Robots-Tag Noindex to Block Images for Googlebot
Google respects an X-Robots-Tag noindex directive in image response headers.
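On Apache, one common way to send this header for image files is with mod_headers; a sketch (the extension list is just an example):

```apache
<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>
```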
Focus on Search Console Data When Reviewing Links to Disavow
If you choose to disavow links, use the link data in Google Search Console, as this gives you an accurate picture of the links you need to focus on.
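For reference, a disavow file is a plain text file uploaded through Search Console's disavow tool; the domains and URLs below are hypothetical:

```
# Links identified from Search Console link data
# Disavow every link from a domain
domain:spammy-links.example
# Disavow a single URL
https://blog.example/paid-link-page.html
```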
Block Videos From Search By Adding Video URL & Thumbnail to Robots.txt or Setting Expiration Date in Sitemap
You can signal to Google that a video should not be included in search by blocking the video file and its thumbnail image in robots.txt, or by specifying an expiration date in a video sitemap file.
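A sketch of both options, with hypothetical file paths. Blocking the video file and thumbnail in robots.txt:

```
User-agent: *
Disallow: /videos/launch.mp4
Disallow: /thumbnails/launch.jpg
```

Or setting a past expiration date in a video sitemap entry, which tells Google the video should no longer be shown:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/launch-video/</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbnails/launch.jpg</video:thumbnail_loc>
      <video:title>Launch video</video:title>
      <video:description>Product launch recording.</video:description>
      <video:content_loc>https://example.com/videos/launch.mp4</video:content_loc>
      <video:expiration_date>2019-01-01T00:00:00+00:00</video:expiration_date>
    </video:video>
  </url>
</urlset>
```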
Don’t Rely on Unsupported Robots Directives in Robots.txt Being Respected By Google
Don’t rely on noindex directives in robots.txt, as they aren’t officially supported by Google. John says it’s fine to use robots directives in robots.txt, but make sure you have a backup in case they don’t work.
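For example, a robots.txt noindex line (which Google may ignore) paired with a supported backup; the /drafts/ path is hypothetical:

```
User-agent: *
# Unofficial directive - not officially supported by Google
Noindex: /drafts/

# Backup: also serve <meta name="robots" content="noindex"> on the
# pages under /drafts/ (and leave them crawlable) so the outcome
# doesn't depend on the unsupported line being honoured
```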
Google Uses the Most Specific Matching Rule in Robots.txt
When rules of different specificity match the same URL, Google will follow the most specific (longest) matching rule in robots.txt.
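For example, when both a Disallow and a more specific Allow match the same URL, the longer rule wins (hypothetical paths):

```
User-agent: *
Disallow: /downloads/
Allow: /downloads/free/

# Both rules match /downloads/free/guide.pdf; the Allow rule has
# the longer (more specific) path, so Googlebot may crawl the URL
```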
Check Robots.txt Implementation if Disallowed URLs Accessed by Googlebot
Googlebot doesn’t deliberately ignore disallow rules in robots.txt files. If Googlebot is crawling disallowed pages, check that the robots.txt file has been set up correctly server-side using Google’s robots.txt tester. Also, check that nothing on the server is logging URLs as being accessed in one way when they were requested in a different way; this can occur with URL rewriting.
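Alongside Google’s robots.txt tester, a quick sanity check can be scripted; a minimal Python sketch using the standard library (the URLs are hypothetical, and note that urllib.robotparser doesn’t support Googlebot’s wildcard matching):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for url in (
    "https://example.com/private/report.html",
    "https://example.com/public/index.html",
):
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "disallowed"
    print(url, verdict)
```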
Site Removal Request is Fastest Way to Remove Site From Search
Disallowing a whole site won’t necessarily remove it from search. If the site has links pointing to it, Google may still index pages based on information from the anchor text. The fastest way to remove a site from search is to use the site removal request in Search Console.
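For reference, this is the whole-site block that will stop crawling but won’t reliably remove already-linked pages from the index:

```
User-agent: *
Disallow: /
```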