Hands up who knew you can add noindex directives within your robots.txt file before John Mueller spoke about it in the Google Webmaster Hangout this week?
Yes, that’s a thing. And it’s not a new thing, either: we know Google have supported this feature for at least ten years, because Matt Cutts first mentioned it back in 2008. DeepCrawl has also supported it since 2011.
And yet, if you were unaware you could do it, you’re not alone. It’s been hiding in plain sight; bet Google wish they could keep a few other things that quiet…
So is it actually useful?
Although Google doesn’t offer any additional support for it and John was slightly vague in the video – stating ‘we used to support it’ but then ‘you shouldn’t rely on it’ – his comments and our testing suggest it’s still in operation.
If you’re already disallowing in robots.txt...
Unlike disallowed pages, which can still be indexed if Google finds links pointing to them, noindexed pages don’t end up in the index and therefore won’t show in search results. Combine both in robots.txt to optimise your crawl efficiency: the noindex will stop the page showing in search results, and the disallow will stop it being crawled.
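To check that a robots.txt really does pair the two directives, a short script can help. The following is a minimal sketch in Python, not an official parser: the sample file, its paths and the function names are all made up for illustration, and it only reads User-agent, Disallow and Noindex lines.

```python
from collections import defaultdict

# Hypothetical robots.txt content; the paths are placeholders, not from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Noindex: /search/
Disallow: /tmp/
"""


def parse_rules(robots_txt):
    """Collect Disallow and Noindex path rules per user-agent group."""
    rules = defaultdict(lambda: {"disallow": [], "noindex": []})
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has listed a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("disallow", "noindex"):
            in_rules = True
            if value:  # an empty value blocks nothing
                for agent in agents:
                    rules[agent][field].append(value)
    return dict(rules)


def blocked_and_noindexed(rules, agent="*"):
    """Return paths that carry both a Disallow and a Noindex rule."""
    group = rules.get(agent, {"disallow": [], "noindex": []})
    return [path for path in group["disallow"] if path in group["noindex"]]


rules = parse_rules(SAMPLE_ROBOTS_TXT)
print(blocked_and_noindexed(rules))  # paths that are both disallowed and noindexed
```

In the sample, /search/ is both disallowed and noindexed, while /tmp/ is only disallowed, so it could still be indexed if Google finds links to it.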
This is the holy grail of robots control that SEOs have been looking for.
If you’re already adding meta noindex...
Adding this directive in robots.txt is quicker, cleaner and easier to manage than getting a meta noindex added to specific pages. It also means less confusion over which directives override which (because the robots.txt will override any directives that have been added on the page).
While a meta noindex is still great for conditional noindexing (for example, only noindexing results pages with 0 results), the robots.txt noindex is useful for noindexing based on URL patterns.
Just add a noindex directive to your robots.txt alongside the disallow one. That’s it.
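For anyone managing a long list of URL patterns, generating the paired lines is less error-prone than typing them. Here is a rough sketch in Python under a few assumptions: the patterns are invented examples, the helper names are hypothetical, and the matcher assumes Google-style wildcards, where * matches any run of characters and a trailing $ anchors the end of the URL.

```python
import re


def build_robots_txt(patterns, user_agent="*"):
    """Emit a Disallow line and a Noindex line for each URL pattern."""
    lines = [f"User-agent: {user_agent}"]
    for pattern in patterns:
        lines.append(f"Disallow: {pattern}")
        lines.append(f"Noindex: {pattern}")
    return "\n".join(lines) + "\n"


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes '*' is a wildcard and a trailing '$' means 'end of URL'.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def matches(pattern, path):
    """True if the path falls under the pattern (matched from the start)."""
    return pattern_to_regex(pattern).match(path) is not None


# Hypothetical patterns: internal search results and sorted listings.
print(build_robots_txt(["/search/", "/*?sort="]))
# User-agent: *
# Disallow: /search/
# Noindex: /search/
# Disallow: /*?sort=
# Noindex: /*?sort=
```

Running matches("/*?sort=", "/shoes?sort=price") before publishing is a quick way to confirm a pattern catches the URLs you meant and nothing else. Treat it as a sanity check only, and confirm the live behaviour in the testing tools described below.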
Testing and monitoring using Search Console and DeepCrawl
Test how your noindex directive is working in the Search Console testing tool, as you would with any other robots.txt directive (in Crawl > robots.txt Tester).
DeepCrawl already supports the robots.txt noindex directive: check which pages are being noindexed in your report via Indexation > Non-Indexable Pages > Noindex Pages.
You’ll see a list of URLs that have been noindexed, plus where they have been noindexed: robots, meta or header.
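To make the three sources concrete, here is a small sketch in Python of how a page’s noindex signals could be classified. It is an illustration only and not how DeepCrawl works internally: the function name, the simple prefix matching for robots.txt rules and the sample inputs are all assumptions.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Flags a page whose robots meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def noindex_sources(path, robots_noindex_prefixes, headers, html):
    """Return which of 'robots', 'meta' and 'header' noindex this page."""
    sources = []
    # robots: simplified to plain prefix matching, no wildcards
    if any(path.startswith(prefix) for prefix in robots_noindex_prefixes):
        sources.append("robots")
    # meta: <meta name="robots" content="noindex"> in the HTML
    parser = MetaRobotsParser()
    parser.feed(html)
    if parser.noindex:
        sources.append("meta")
    # header: X-Robots-Tag response header (header names are case-insensitive)
    tag = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    if "noindex" in tag.lower():
        sources.append("header")
    return sources


# Hypothetical page that is noindexed in all three places.
print(noindex_sources(
    "/search/shoes",
    ["/search/"],
    {"X-Robots-Tag": "noindex"},
    '<meta name="robots" content="noindex, follow">',
))
```

A page flagged by more than one source is worth a second look, since it usually means the same rule is being maintained in two places.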
If you’d rather not see noindexed pages in your report, you can exclude them from the crawl in Advanced Settings.