URL-level Robots Directives

Rachel Costello
Rachel Costello

On 11th June 2018 • 17 min read

In the previous guide, we took you through what a robots.txt file is and how to use it. Now we’ll take a look at a couple of different ways that you can apply robots directives at a page level to instruct search engine bots with rules about how they should treat the pages on your site.

 

What are URL-level robots directives?

Robots directives are sections of code that give instructions to website crawlers about how a page’s content should be crawled or indexed. Robots meta tags allow a granular approach to controlling how specific pages are indexed and shown in search engine results pages.

 

Where do they exist?

Robots directives can either exist in the HTTP response header or HTML head. If they are included within a page, they must be located in the <head> section in order to be read and understood by search engines.

X-Robots-Tags are placed in the HTTP response header and are useful for controlling the indexing of non-html content, such as PDF files, video or word documents where it isn’t possible to add robots directives in the HTML head.

Here are a few considerations around X-Robots-Tags:

 

How do they apply?

Search engines need to crawl a page to see these directives. They can only be followed and understood if crawlers have access to the page, so adding a noindex tag then disallowing the page in the robots.txt file will mean that search engines will not see the noindex meta tag.

You can also create combined robots directives by separating them with commas. For example, you can use the following combination to instruct crawlers not to index a page, but to follow all the links on a page:

 

How do you set instructions for specific user agents?

You can address specific user agents and instruct them on how to index a particular page by including a different name=“” attribute. For example, you can tell Bing not to index a page with the meta tag:

<meta name=“bingbot” content=“noindex” />

If you want to address different user agents and provide them with different instructions, you can use multiple robots directives in separate meta tags.

 

Noindex

In this section of the guide, we’ll explain what the noindex tag is and how it should be used for SEO.
Noindex tag example

What does noindex do?

A noindex indicates to search engines to crawl a page, but not to include it in their indices. It effectively stops a page from appearing in the search engine results pages (SERPs). It is useful for keeping irrelevant, unnecessary pages from being indexed which can create index bloat.

Google will still crawl noindex pages, but less frequently than indexable pages, so it can help improve the use of your crawl budget.

Noindex gives a direct instruction around how to handle a page, as opposed to ‘Index’ which does nothing as that is the default state for search engines.

What does noindex look like?

This is how the noindex meta tag appears in the HTML head:

<head>
<meta name=”robots” content=”noindex” />
(…)
</head>

This is how the noindex X-Robots-Tag appears in the HTTP header:

HTTP/1.1 200 OK
Date: Fri, 21 July 2017 21:00:00 GMT
(…)
X-Robots-Tag: noindex

Why might you use noindex?

There are a number of different reasons why you might want to use a noindex tag for particular pages on your site, which might include:

When shouldn’t you use noindex?

Noindex can be a useful solution for helping reduce index bloat, but it isn’t the answer for everything. Here are some examples where you shouldn’t use noindex:

 

Nofollow

Now let’s look at the nofollow tag, what it is used for and how to use it properly.

Nofollow example

How does nofollow work?

Nofollow tells a crawler not to follow any links on a page or on a specific link (depending on the implementation), meaning that no link equity will be passed on to the target URL. To restrict crawling and improve crawler efficiency, you can use nofollow to prevent URLs that you don’t want equity to be passed to from being crawled.

Using ‘follow’ won’t do anything as this is the default attribute for search engines so it is not required. Nofollow is a command as it goes against the default and gives an instruction.

It’s important to note that nofollowed link targets will still be crawled if they are linked to elsewhere without the nofollow tag, so you need to make sure it is used consistently. Also, if you have two links on a page to the same target, but one is nofollowed, then Google will just crawl the followable link.

Google respects the nofollow directive and won’t pass PageRank across those links. Bing also responds to nofollow, however, for Google and Bing, a page that is nofollowed may still appear in their indices if it is linked to without nofollow either internally or externally. Nofollow also works for Baidu, and link weight won’t be counted with this directive in place. Yandex doesn’t support nofollow, and recommends using noindex instead.

How do you implement nofollow?

If you want to implement nofollow at a page level, this would prevent all links on that page from being crawled, including <a>  links, canonicals and rel alternates. In this instance you would include this code snippet in the <head> section:

<head>
<meta name=”robots” content=”nofollow”>
(…)
</head>

When implementing nofollow at an individual link level, you would include this code snippet where you would place the link in the HTML:

<a href=”example.html” rel=’nofollow”>here</a>

The strictest rule wins with nofollow and the most restrictive directive is applied, so this means that the page level meta nofollow will override a link level follow, for example.

Why might you use nofollow?

There are various reasons why you might use nofollow for particular pages or links on your site. Here are some examples where you might use nofollow:

 

Other Robots Directives

There are a number of other directives you can use to communicate with search engines around how to handle the pages on your site. Here are some examples and how each of them works.

None

<meta name=”robots” content=”none”>

This directive is equivalent to “noindex, nofollow.” It basically instructs search engines that the page should be ignored.

Noarchive

<meta name=”robots” content=”noarchive”>

This directive instructs search engines not to show a cached link in search results. It can also be used to prevent competitors scraping your content, which is especially useful on ecommerce sites where prices update frequently.

It’s important to note that this directive prevents users from being able to view a cached version of the page when your site is down or inaccessible, however.

Nosnippet

<meta name=”robots” content=”nosnippet”>

This directive instructs search engines not to show a snippet for this page in the search results. Nosnippet is useful if you want to have more control over what to display to users about the content of the site which could lead to increased CTR depending on the search query.

Notranslate

<meta name=”robots” content=”notranslate”>

This directive instructs search engines not to offer a translation of the page in the results pages. This can be useful as you may not want the content on your page to be able to be translated, because sometimes the automatic translation is not accurate.

You can also specify sections of your page not to be translated using a class:

<div class=”notranslate”>
<…>
</div>

Noimageindex

<meta name=”robots” content=”noimageindex”>

This directive instructs search engines not to index images on the page. It is useful if you don’t want others republishing your images as their own, especially if they’re copyrighted. However, if images are linked to elsewhere on a site, search engines will still index them so the x-robots tag may be a better option.

Unavailable_after

<meta name=”robots” content=”unavailable_after: Monday, 11-June-18 12:00:00 UTC”>

This directive instructs search engines not to show the page in the results pages after a specified date or time. This is useful for pages that are relevant for a specific timeframe, such as event registration pages or promotional landing pages (e.g. Black Friday.) The content will be deindexed after the event has finished and you no longer require the page to appear.

For pages with a limited useful time window, using “unavailable_after” is more convenient than putting a noindex on the page at a later date.

Noodp

<meta name=”robots” content=”noodp”>

This directive instructs search engines not to use metadata from the Open Directory project for titles or snippets shown for this page. This has recently depreciated, however, and DMOZ ceases to exist.

Author

Rachel Costello
Rachel Costello

Rachel Costello is a Technical SEO & Content Manager at DeepCrawl. You'll most often find her writing and speaking about all things SEO.

Get the knowledge and inspiration you need to build a profitable business - straight to your inbox.

Subscribe today