The disallow directive is used to instruct search engines not to crawl a page on a site and is added within the robots.txt file. Note that this prevents crawling but does not guarantee the page will stay out of search results: a disallowed URL can still be indexed if Google discovers it through links. Within our Hangout Notes, we explain how Google deals with disallow directives, with best practice advice and examples.
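A disallow rule can be sketched and checked with Python's standard-library robots.txt parser. The domain and path below are placeholders for illustration only:

```python
from urllib import robotparser

# A minimal robots.txt with a single disallow rule; example.com and
# the /private/ path are hypothetical.
robots_txt = """User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Crawlers honouring the rule skip anything under /private/ but may
# still fetch other paths.
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
print(parser.can_fetch("*", "https://example.com/page"))          # True
```

This only models the crawling rule; as noted above, keeping a URL out of the index requires noindex rather than disallow.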

Disallowed Pages May Take Time to be Dropped From Index

August 2, 2017

Disallowed pages may take a while to be dropped from the index if they aren't crawled very frequently. For critical issues, you can temporarily remove URLs from search results using Search Console.

Redirecting Robots.txt Is OK

June 16, 2017

Google will follow redirects for robots.txt files.
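For example, a site consolidating on a single canonical host might redirect requests for robots.txt like this (a hypothetical nginx sketch; the hostnames are placeholders):

```nginx
# Hypothetical: redirect robots.txt requests on an old host to the
# canonical version. Google follows this redirect when fetching the file.
server {
    server_name old.example.com;

    location = /robots.txt {
        return 301 https://www.example.com/robots.txt;
    }
}
```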

Disallowing Internal Search Pages Won't Impact the Sitelinks Search Box Markup

May 30, 2017

Internal search pages on a site do not need to be crawlable for the Sitelinks Search Box markup to work. Google doesn't differentiate between desktop and mobile URLs in this markup, so you might want to set up a redirect to the mobile search pages for mobile devices.
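The Sitelinks Search Box markup itself is independent of whether the search pages are crawlable. A typical JSON-LD sketch looks like the following (the URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://www.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
```

The `target` URL here can point at a search page that is disallowed in robots.txt without affecting the markup's eligibility, per the note above.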

Interstitials Blocked with Robots.txt Might be Seen as Cloaking

May 16, 2017

You can prevent Google from seeing a JavaScript-run interstitial by blocking the JavaScript file with robots.txt, but Google recommends against this, as it might be seen as cloaking.
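Blocking the script behind an interstitial would look something like this (the file path is hypothetical, and as noted above, Google advises against this approach):

```
User-agent: Googlebot
Disallow: /assets/interstitial.js
```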

Server Performance and Robots.txt Can Impact HTTPS Migrations

February 24, 2017

An HTTPS migration might have problems if Google is unable to crawl the site due to server performance issues or files blocked in robots.txt.

Don’t Disallow a Migrated Domain

January 10, 2017

If you disallow a migrated domain, Google can't follow any of its redirects, so backlink authority cannot be passed on.

Use Noindex or Canonical on Faceted URLs Instead of Disallow

September 23, 2016

John recommends against using a robots.txt disallow to prevent faceted URLs from being crawled, as they may still be indexed. Instead, allow them to be crawled and use a noindex or canonical tag, unless they are causing server performance issues.
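On a faceted URL, the recommended tags might look like this (the paths are hypothetical; pick one approach depending on whether the facet should be consolidated or simply kept out of the index):

```html
<!-- Option 1: keep the faceted URL out of the index -->
<meta name="robots" content="noindex">

<!-- Option 2: consolidate signals to the unfaceted version -->
<link rel="canonical" href="https://www.example.com/shoes">
```

Either tag only works if the page remains crawlable, which is why disallowing these URLs is counterproductive.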

Only Disallowed Scripts Which Affect Content Are an Issue

May 17, 2016

Disallowed scripts which are flagged as errors are only an issue if they affect the display of content you want indexed; otherwise, it's OK to leave them disallowed.

Robots.txt Overrides Parameter Settings

May 17, 2016

URL parameter settings in Search Console are a hint for Google, which it validates periodically. A robots.txt disallow overrides the parameter settings, so it's better to use the parameter tool to consolidate duplicate pages rather than disallowing them.
