The disallow directive is used to instruct search engines not to crawl a page on a site and is added within the robots.txt file. Note that this does not guarantee the page will be kept out of search results: a disallowed page can still be indexed, for example via links from other pages. Within our Hangout Notes, we explain how Google deals with disallow directives, with best practice advice and examples.
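As an illustration, a minimal robots.txt with a disallow rule looks like this (the `/private/` path is a placeholder):

```
User-agent: *
Disallow: /private/
```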

Disallowed Pages With Backlinks Can be Indexed by Google

July 9, 2019 Source

Pages blocked by robots.txt cannot be crawled by Googlebot. However, if a disallowed page has links pointing to it, Google can determine it is worth indexing despite not being able to crawl the page.

Google Supports X-Robots Noindex to Block Images for Googlebot

December 21, 2018 Source

Google respects x-robots noindex in image response headers.
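For example, on an Apache server (with mod_headers enabled) this header can be applied to image responses; this is a sketch, and the file extensions should be adjusted to your site:

```
<FilesMatch "\.(png|jpe?g|gif|webp)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```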

Focus on Search Console Data When Reviewing Links to Disavow

August 21, 2018 Source

If you choose to disavow links, use the data in Google Search Console as this will give you an accurate picture of what you need to focus on.
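A disavow file is a plain text file uploaded via the disavow tool, with one URL or domain per line; the domain below is a placeholder:

```
# Disavow an entire domain
domain:spammy-site.example
# Disavow a single URL
http://spammy-site.example/bad-page.html
```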

Block Videos From Search By Adding Video URL & Thumbnail to Robots.txt or Setting Expiration Date in Sitemap

July 13, 2018 Source

You can signal to Google for a video not to be included in search by blocking the video file and thumbnail image in robots.txt or by specifying an expiration date using a video sitemap file.
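For the sitemap approach, a video entry with an expiration date might look like the following (URLs are placeholders; once the `video:expiration_date` passes, the video is no longer eligible for search):

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/videos/launch</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbs/launch.jpg</video:thumbnail_loc>
      <video:title>Product launch</video:title>
      <video:description>Recording of the launch event.</video:description>
      <video:content_loc>https://example.com/videos/launch.mp4</video:content_loc>
      <video:expiration_date>2030-01-01T00:00:00+00:00</video:expiration_date>
    </video:video>
  </url>
</urlset>
```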

Don’t Rely on Unsupported Robots Directives in Robots.txt Being Respected By Google

July 13, 2018 Source

Don’t rely on noindex directives in robots.txt, as they aren’t officially supported by Google. John says it’s fine to use robots directives in robots.txt, but make sure you have a backup in case they don’t work.

Check Robots.txt Implementation if Disallowed URLs Accessed by Googlebot

January 12, 2018 Source

Googlebot doesn’t explicitly ignore URLs in robots.txt files. If Googlebot is crawling these pages, check whether the robots.txt file has been set up incorrectly server side, using Google’s robots.txt tester. Also, check that there’s nothing on the server which logs URLs as being accessed in one way but requested in a different way; this can occur with URL rewriting.
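As a quick sanity check, you can also parse your rules with Python’s standard library. Note that `urllib.robotparser` uses simpler matching semantics than Google’s own parser, so treat it as a rough check only; the rules and URLs below are hypothetical:

```python
from urllib import robotparser

# Hypothetical rules; in practice, use the lines from your live robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check whether specific URLs are crawlable for a given user agent.
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/public/page"))   # True
```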

Google Uses the Most Specific Matching Rule in Robots.txt

January 12, 2018 Source

When different levels of detail exist in robots.txt Google will follow the most specific matching rule.
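A minimal sketch of this behaviour, under simplifying assumptions (prefix matching only, no `*` or `$` wildcards; on a length tie, the least restrictive rule, i.e. allow, wins, as Google documents):

```python
def is_allowed(path, rules):
    """rules: list of ("allow" | "disallow", path_prefix) pairs.

    Returns True if the most specific (longest) matching rule permits
    crawling. Simplified model of Google's documented behaviour.
    """
    matches = [(len(prefix), kind) for kind, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule matches: crawling is allowed
    # Longest prefix wins; on a tie, "allow" beats "disallow".
    _, kind = max(matches, key=lambda m: (m[0], m[1] == "allow"))
    return kind == "allow"

rules = [("disallow", "/folder/"), ("allow", "/folder/page")]
print(is_allowed("/folder/page", rules))   # True: the allow rule is more specific
print(is_allowed("/folder/other", rules))  # False: only the disallow rule matches
```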

Site Removal Request is Fastest Way to Remove Site From Search

September 8, 2017 Source

Disallowing a whole site won’t necessarily remove it from search. If the site has links pointing to it then Google may still index pages based on information from the anchor text. The fastest way to remove a site from search is using the site removal request in Search Console.

Disallowed Pages May Take Time to be Dropped From Index

August 2, 2017 Source

Disallowed pages may take a while to be dropped from the index if they aren’t crawled very frequently. For critical issues, you can temporarily remove URLs from search results using Search Console.

Related Topics

Crawling Indexing Crawl Budget Crawl Errors Crawl Rate Sitemaps Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render