The disallow directive is used to instruct search engines not to crawl a page on a site and is added within the robots.txt file. Note that this prevents crawling but does not guarantee the page will stay out of search results: a disallowed URL can still be indexed if Google discovers it through links. Within our Hangout Notes, we explain how Google deals with disallow directives, with best practice advice and examples.
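A disallow rule can be sketched and checked with Python's standard-library robots.txt parser. The domain and path below are placeholders for illustration only:

```python
from urllib import robotparser

# A minimal robots.txt with a single disallow rule; example.com and
# the /private/ path are hypothetical.
robots_txt = """User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Crawlers honouring the rule skip anything under /private/ but may
# still fetch other paths.
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
print(parser.can_fetch("*", "https://example.com/page"))          # True
```

This only models the crawling rule; as noted above, keeping a URL out of the index requires noindex rather than disallow.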

Disallowed Pages May Take Time to be Dropped From Index

August 2, 2017

Disallowed pages may take a while to be dropped from the index if they aren't crawled very frequently. For critical issues, you can temporarily remove URLs from search results using Search Console.

Redirecting Robots.txt Is OK

June 16, 2017

Google will follow redirects for robots.txt files.
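For example, a site consolidating on a single canonical host might redirect requests for robots.txt like this (a hypothetical nginx sketch; the hostnames are placeholders):

```nginx
# Hypothetical: redirect robots.txt requests on an old host to the
# canonical version. Google follows this redirect when fetching the file.
server {
    server_name old.example.com;

    location = /robots.txt {
        return 301 https://www.example.com/robots.txt;
    }
}
```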

Disallowing Internal Search Pages Won't Impact the Sitelinks Search Box Markup

May 30, 2017

Internal search pages on a site do not need to be crawlable for the Sitelinks Search Box markup to work. Google doesn't differentiate between desktop and mobile URLs in this markup, so you might want to set up a redirect to the mobile search pages for mobile devices.
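The Sitelinks Search Box markup itself is independent of whether the search pages are crawlable. A typical JSON-LD sketch looks like the following (the URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://www.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
```

The `target` URL here can point at a search page that is disallowed in robots.txt without affecting the markup's eligibility, per the note above.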

Interstitials Blocked with Robots.txt Might be Seen as Cloaking

May 16, 2017

You can prevent Google from seeing a JavaScript-run interstitial by blocking the JavaScript file with robots.txt, but Google recommends against this, as it might be seen as cloaking.
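Blocking the script behind an interstitial would look something like this (the file path is hypothetical, and as noted above, Google advises against this approach):

```
User-agent: Googlebot
Disallow: /assets/interstitial.js
```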

Server Performance and Robots.txt Can Impact HTTPS Migrations

February 24, 2017

An HTTPS migration might have problems if Google is unable to crawl the site due to server performance issues or files blocked in robots.txt.

Don’t Disallow a Migrated Domain

January 10, 2017

If you disallow a migrated domain, Google can't follow any of its redirects, so backlink authority cannot be passed on.

Use Noindex or Canonical on Faceted URLs Instead of Disallow

September 23, 2016

John recommends against using a robots.txt disallow to prevent faceted URLs from being crawled, as they may still be indexed. Instead, allow them to be crawled and use a noindex or canonical tag, unless they are causing server performance issues.
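On a faceted URL, the recommended tags might look like this (the paths are hypothetical; pick one approach depending on whether the facet should be consolidated or simply kept out of the index):

```html
<!-- Option 1: keep the faceted URL out of the index -->
<meta name="robots" content="noindex">

<!-- Option 2: consolidate signals to the unfaceted version -->
<link rel="canonical" href="https://www.example.com/shoes">
```

Either tag only works if the page remains crawlable, which is why disallowing these URLs is counterproductive.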

Only Disallowed Scripts Which Affect Content Are an Issue

May 17, 2016

Disallowed scripts which are flagged as errors are only an issue if they affect the display of content you want indexed; otherwise, it's OK to leave them disallowed.

Robots.txt Overrides Parameter Settings

May 17, 2016

URL parameter settings in Search Console are a hint for Google, which it validates periodically. A robots.txt disallow overrides the parameter settings, so it's better to use the parameter tool to consolidate duplicate pages rather than disallowing them.
