Indexing

For web pages to appear in search results, they must be in Google’s index. Search engine indexing is a complex topic that depends on a number of different factors. Our Hangout Notes on indexing cover a range of best-practice advice to help ensure your website’s important pages are indexed by search engines.

Sites Not Indexed in Search May Be Due to Spam or Technical Issues

March 17, 2020

There is a big difference between a site that completely disappears from Google search and one that has been demoted. Removal from the index is usually caused by a very significant web spam issue or a technical issue. If a site has been demoted and is not ranking as well as before, the cause may be the quality of its content or the way the site is set up. Spammy backlinks are unlikely to be the cause.


Some Machine-translated Content Can be High Enough Quality to be Indexed

February 18, 2020

Machine-translated content is getting more sophisticated and producing better results, so pages that are translated to a high enough quality are fine to be indexed. However, the translations should be checked by humans for accuracy and quality, which can be difficult to scale across a large number of translated pages.


If You Have a Manual Action in Place, GSC Will Still Show the Page as Indexed

February 7, 2020

If you have a manual action or URL removal in place, the URL Inspection tool in Google Search Console (GSC) will still show a page as indexed, but the page won’t appear in search results. This is because manual actions and URL removals are filters applied on top of the search results, so a page can be indexed but not shown.


Anything Contained on Non-canonical Pages Will Not Be Used for Indexing Purposes

February 7, 2020

When Google picks a canonical for a page, it recognizes that there is a set of duplicate pages but focuses only on the content and links of the canonical page. Anything that appears only on the non-canonical versions will not be used for indexing purposes. If those pages contain content you would like to be indexed, Google’s John Mueller recommends making sure the pages are sufficiently different from each other so they are not treated as duplicates.
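
Google treats `rel="canonical"` as a hint and may pick a different canonical than the one a page declares, so the most a site owner can check locally is the declared signal. As a rough audit aid, the Python sketch below (standard library only; the function names are hypothetical) flags pages whose declared canonical points at another URL, which are the pages where unique content is at risk of being ignored. It compares URLs as exact strings, so normalize trailing slashes or parameter order first if your site varies them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.href = attr_map["href"]


def declared_canonical(page_url, html):
    """Return the absolute canonical URL the page declares, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative hrefs such as "/shoes" against the page's own URL.
    return urljoin(page_url, finder.href) if finder.href else None


def declares_other_canonical(page_url, html):
    """True if the page points its canonical at a different URL.

    Content that exists only on such a page is at risk of being ignored,
    because Google may index the canonical version instead.
    """
    canonical = declared_canonical(page_url, html)
    return canonical is not None and canonical != page_url
```

Running it over a crawl export and reviewing every page where `declares_other_canonical` is true shows which content would only be indexed if it also lives on the canonical URL.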


If a Robots.txt File Returns a Server Error for a Brief Period of Time, Google Will Not Crawl Anything From the Site

January 31, 2020

If a robots.txt file returns a server error, even briefly, Google will not crawl anything from the website until it can access the file and crawl normally again. While it is unable to reach the file, Google assumes all URLs are blocked and will flag this in GSC. To find when this happened, check the robots.txt requests in your server logs and review the response code and size returned for each request.
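
The log review above can be scripted. A minimal Python sketch, assuming your server writes the Apache/Nginx "combined" log format (adjust the regular expression for other formats); the function name is hypothetical, and the script does not verify that requests really came from Googlebot.

```python
import re

# Assumes the Apache/Nginx "combined" log format, for example:
# 66.249.66.1 - - [31/Jan/2020:10:15:32 +0000] "GET /robots.txt HTTP/1.1" 503 0 "-" "Googlebot/2.1"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def robots_txt_server_errors(lines):
    """Yield (timestamp, status, response_size) for robots.txt requests answered with a 5xx."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("path").split("?")[0] != "/robots.txt":
            continue
        status = int(match.group("status"))
        if 500 <= status <= 599:
            size = match.group("size")
            # "-" means no body was sent; report it as size 0.
            yield match.group("time"), status, 0 if size == "-" else int(size)
```

Each timestamp it returns marks a moment when Google may have paused crawling the whole site, which you can line up against any blocked-URL warnings in GSC.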


Google is Able to Display Structured Data Results as Soon as the Page Has Been Re-crawled

January 10, 2020

Once a page has been set up with structured data, Google can display the corresponding structured data results the next time it crawls and indexes that page.
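
For reference, structured data is commonly added as a JSON-LD `<script>` block in the page HTML. The sketch below builds one in Python; the schema.org `Article` type, the chosen properties, and the function name are only an illustration, so check Google's documentation for the properties each result type requires.

```python
import json


def article_json_ld(headline, date_published, author_name):
    """Build a schema.org Article JSON-LD <script> tag (illustrative properties only)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name},
    }
    # Escape "<" so a value such as "</script>" cannot close the tag early;
    # "\u003c" is still valid JSON and decodes back to "<".
    body = json.dumps(data).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + body + "</script>"
```

The markup only affects search results after Google has re-crawled and re-indexed the page, so newly added blocks will not show up immediately.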


Technical Issues Can Cause Content to be Indexed on Scraper Sites Before Original Site

January 7, 2020

If content is appearing in the index from scraper sites before it appears from the original site, this could be due to technical issues on the original site. For example, Googlebot might be unable to find the main hub pages or category pages, or it may be getting stuck in crawl traps by following an excessive number of parameter URLs.
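
One way to spot the parameter-driven crawl traps mentioned above is to count how many distinct parameterized URLs a crawl (or your server logs) found for each path. A minimal Python sketch; the function names are hypothetical and the threshold of 100 variants is an arbitrary assumption to tune for your site.

```python
from collections import Counter
from urllib.parse import urlsplit


def parameter_variants_per_path(urls):
    """Count distinct parameterized URLs for each host + path."""
    counts = Counter()
    for url in set(urls):  # ignore exact duplicate URLs
        parts = urlsplit(url)
        if parts.query:
            counts[parts.netloc + parts.path] += 1
    return counts


def likely_crawl_traps(urls, threshold=100):
    """Return host + path keys whose parameter variants meet the (assumed) threshold."""
    counts = parameter_variants_per_path(urls)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Paths that surface here, such as a listing page with many sort and pagination combinations, are candidates for tighter internal linking or parameter handling so Googlebot spends its time on hub and category pages instead.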


Google Doesn’t Show Preference for Multi-page Websites Over Single-page Websites in Rankings

November 12, 2019

Google doesn’t prefer websites with lots of pages over single-page websites when ranking them; a single-page website can rank well too.


Make Category Pages Indexable & Internal Search Pages Non-indexable

November 12, 2019

To avoid URL duplication and index bloat, focus on providing high-quality category pages and make sure they are indexable. Internal search pages should be noindexed, as the many possible search combinations often create low-quality pages.
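
A minimal Python sketch of this rule, assuming (hypothetically) that internal search lives under `/search` or uses a `q` query parameter; adapt both, and the function names, to your site's URL scheme. Category pages get no robots meta tag, since being indexable is the default.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL conventions for internal search; change to match your site.
SEARCH_PATH = "/search"
SEARCH_PARAM = "q"

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def is_internal_search(url):
    """True if the URL looks like an internal site-search results page."""
    parts = urlsplit(url)
    # Match "/search" and "/search/..." but not unrelated paths like "/searchlight".
    on_search_path = parts.path == SEARCH_PATH or parts.path.startswith(SEARCH_PATH + "/")
    has_search_param = SEARCH_PARAM in parse_qs(parts.query, keep_blank_values=True)
    return on_search_path or has_search_param


def robots_meta_for(url):
    """Return a noindex meta tag for internal search pages, or "" to leave the page indexable."""
    return NOINDEX_TAG if is_internal_search(url) else ""
```

Keep these search URLs crawlable in robots.txt: if Googlebot cannot fetch a page, it cannot see the noindex directive on it.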

