Following on from our beginner’s guide to implementing noindex, disallow and nofollow directives, we’re delving a bit further into the murky waters of crawl control. In this guide for intermediate and advanced SEOs, we’ll cover PageRank, JS/CSS files, indexation, parameters, pattern matching and how search engines handle conflicting Allow/Disallow rules in robots.txt, as well as using noindex for thin pages, duplicate content and Sitelinks management, error recovery, combining noindex with disallow, and Sitemaps.
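As a taste of the pattern-matching and rule-conflict topics, here’s a minimal robots.txt sketch. The paths are hypothetical; the conflict handling described in the comments reflects Google’s documented behaviour of applying the most specific (longest) matching rule:

```
User-agent: *
# Pattern matching: * matches any sequence of characters,
# $ anchors the pattern to the end of the URL
Disallow: /*?sessionid=
Disallow: /*.pdf$

# Conflicting rules: Google applies the most specific (longest)
# matching rule, so this single file stays crawlable even though
# its parent directory is disallowed
Disallow: /downloads/
Allow: /downloads/whitepaper.html
```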
The three words above might sound like SEO gobbledegook, but they’re words worth knowing, since understanding how to use them means you can order Googlebot around. Which is fun.
So let’s start with the basics: there are three directives you can use to control how search engines crawl and index your site (see the examples just after this list):
- Noindex: tells search engines not to include your page(s) in search results.
- Disallow: tells them not to crawl your page(s).
- Nofollow: tells them not to follow the links on your page.
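To make those three directives concrete, here’s a minimal sketch of where each one typically lives (the paths shown are hypothetical examples, not recommendations):

```
# robots.txt — Disallow stops compliant crawlers fetching matching paths
User-agent: *
Disallow: /private/
```

```
<!-- In a page's <head> — noindex and nofollow via the robots meta tag -->
<meta name="robots" content="noindex, nofollow">
```

Note the difference in scope: Disallow operates at the crawler level via robots.txt, while noindex and nofollow are page-level instructions. Nofollow can also be applied to a single link using the rel="nofollow" attribute.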
Hands up: who knew you could add noindex directives within your robots.txt file before John Mueller spoke about it in this week’s Google Webmaster Hangout?
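For context, the robots.txt noindex syntax discussed in the Hangout looks like a Disallow line (a hedged example with a hypothetical path; Google has never officially documented this directive, so it shouldn’t be relied on as your only noindexing method):

```
User-agent: Googlebot
Noindex: /old-campaign-pages/
```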
Auditing URLs blocked by robots.txt is a tricky business, and getting it wrong can be massively risky for your rankings. That’s why we turned to SEO expert Glenn Gabe for an in-depth guide to the subject.
Writing and making changes to a robots.txt file can make even the most hardened SEOs a little bit nervous. Just one erroneous character can have a major impact on performance, or even wipe your entire site out of the search results.
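To make that risk concrete, here’s a hypothetical example: the difference between blocking a single directory and blocking every URL on the site comes down to a few characters.

```
# Blocks crawling of the /private/ directory only
User-agent: *
Disallow: /private/

# Blocks crawling of the ENTIRE site — with no path
# after the slash, nothing can be fetched
User-agent: *
Disallow: /
```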