Disallow Directives in Robots.txt

The disallow directive (added within a website’s robots.txt file) instructs search engines not to crawl a page or path on a site. Because a blocked page usually cannot be crawled, this will normally also keep it out of search results, although, as several of the recaps below note, a disallowed page can still be indexed if other signals point to it.
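For reference, a minimal robots.txt using the disallow directive looks like the following (the path is illustrative):

```text
User-agent: *
Disallow: /private/
```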

Within the SEO Office Hours recaps below, we share insights from Google Search Central on how they handle disallow directives, along with SEO best practice advice and examples.

Either Disallow Pages in Robots.txt or Noindex Them, Not Both

August 23, 2019 Source

Noindexing a page and blocking it in robots.txt means the noindex directive will never be seen, because Googlebot cannot crawl the page to find it. John recommends using one or the other, not both.
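As an illustration, a setup like the following is self-defeating: the disallow rule prevents Googlebot from ever fetching the page, so the noindex tag on it is never seen (URLs are made up for the example):

```text
# robots.txt
User-agent: *
Disallow: /old-page/

# Meta tag in the HTML of /old-page/ — never crawled, so never seen:
# <meta name="robots" content="noindex">
```

To have the noindex respected, the page must remain crawlable, i.e. the Disallow line would need to be removed.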

Disallowed Pages With Backlinks Can be Indexed by Google

July 9, 2019 Source

Pages blocked by robots.txt cannot be crawled by Googlebot. However, if a disallowed page has links pointing to it, Google can determine it is worth indexing despite being unable to crawl the page.

Google Supports X-Robots Noindex to Block Images for Googlebot

December 21, 2018 Source

Google respects x-robots noindex in image response headers.
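One common way to serve this header is via web server configuration. The following is a sketch for Apache, assuming the mod_headers module is enabled (the file pattern is illustrative):

```text
<FilesMatch "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

Because the directive is sent as an HTTP response header, it works for image files, which cannot carry a meta robots tag themselves.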

Focus on Search Console Data When Reviewing Links to Disavow

August 21, 2018 Source

If you choose to disavow links, use the data in Google Search Console as this will give you an accurate picture of what you need to focus on.

Block Videos From Search By Adding Video URL & Thumbnail to Robots.txt or Setting Expiration Date in Sitemap

July 13, 2018 Source

You can signal to Google that a video should not be included in search by blocking the video file and its thumbnail image in robots.txt, or by specifying an expiration date in a video sitemap file.
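As a sketch, the expiration-date approach uses the `video:expiration_date` element in a video sitemap entry (URLs, titles, and the date are illustrative):

```xml
<url>
  <loc>https://example.com/videos/launch.html</loc>
  <video:video>
    <video:thumbnail_loc>https://example.com/thumbs/launch.jpg</video:thumbnail_loc>
    <video:title>Product launch</video:title>
    <video:description>Recording of the launch event.</video:description>
    <video:content_loc>https://example.com/videos/launch.mp4</video:content_loc>
    <video:expiration_date>2018-12-31T00:00:00+00:00</video:expiration_date>
  </video:video>
</url>
```

Once the expiration date passes, the video should no longer be eligible to appear in video search results.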

Don’t Rely on Unsupported Robots Directives in Robots.txt Being Respected By Google

July 13, 2018 Source

Don’t rely on noindex directives in robots.txt, as they aren’t officially supported by Google. John says it’s fine to use such directives in robots.txt, but make sure you have a backup in case they aren’t respected.

Google Uses the Most Specific Matching Rule in Robots.txt

January 12, 2018 Source

When rules of different specificity in robots.txt match the same URL, Google follows the most specific (longest) matching rule.
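The "longest match wins" behaviour can be sketched in a few lines of Python. This is a simplified illustration, not Google's implementation: it ignores wildcards and Google's tie-breaking in favour of Allow, and the rules and paths are made up:

```python
def most_specific_verdict(rules, path):
    """Return the verdict of the longest matching rule prefix.

    `rules` is a list of (verdict, path_prefix) tuples, where verdict
    is "allow" or "disallow". Wildcards and allow-wins tie-breaking
    are deliberately omitted from this sketch.
    """
    best_len, verdict = -1, "allow"  # no matching rule: crawling is allowed
    for rule_verdict, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len, verdict = len(prefix), rule_verdict
    return verdict


rules = [("disallow", "/folder/"), ("allow", "/folder/page")]
print(most_specific_verdict(rules, "/folder/page"))   # allow: longer rule wins
print(most_specific_verdict(rules, "/folder/other"))  # disallow
```

So even though `/folder/` is disallowed as a whole, the more specific `Allow: /folder/page` rule takes precedence for that URL.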

Check Robots.txt Implementation if Disallowed URLs Accessed by Googlebot

January 12, 2018 Source

Googlebot doesn’t explicitly ignore URLs in robots.txt files. If Googlebot is crawling disallowed pages, check that the robots.txt file has been set up correctly server side using Google’s robots.txt tester. Also, check that nothing on the server logs URLs as being accessed one way when they were actually requested another way; this can occur with URL rewriting.
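Alongside Google’s robots.txt tester, you can sanity-check how a parser interprets your rules with Python’s standard library. This is just a supplementary check with made-up URLs, not a substitute for the tester:

```python
from urllib import robotparser

# Parse a robots.txt body directly (here as a list of lines) and ask
# whether a given user agent may fetch specific URLs.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/public/page"))   # True
```

If a URL you expected to be blocked comes back as fetchable here, the rule itself is likely at fault; if it is blocked here but still being crawled, look at server-side issues such as URL rewriting instead.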

Site Removal Request is Fastest Way to Remove Site From Search

September 8, 2017 Source

Disallowing a whole site won’t necessarily remove it from search. If the site has links pointing to it, Google may still index pages based on information such as the anchor text of those links. The fastest way to remove a site from search is to use the site removal request in Search Console.

Related Topics

Crawling Indexing Crawl Budget Crawl Errors Crawl Rate Sitemaps Last Modified Nofollow Noindex RSS Canonicalization Fetch and Render