Indexing

For web pages to appear in search results, they must be in Google’s index. Search engine indexing is a complex topic that depends on several factors. Our SEO Office Hours Notes on indexing cover a range of best practices and compile the indexability advice Google has shared in its Office Hours sessions, to help ensure your website’s important pages are indexed by search engines.

Pages with Internally Duplicated Content Are Indexed Separately but Folded Together in Search

Google will index pages with duplicate blocks of text separately but will work out which of those pages is most relevant to show for each query and will show just one of them in the search results.

5 Apr 2019

Google Can Index Pages Blocked by Robots.txt

Google can index pages blocked in robots.txt if they have internal links pointing to them. In a scenario like this, Google will likely use a title from some of the internal links pointing to the page, but the page will rarely be shown in search because Google has very little information about it.
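The distinction between crawling and indexing can be illustrated with a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the helper name `is_crawl_allowed` are assumptions made for illustration. The check only answers whether a URL may be fetched; it says nothing about whether the URL can appear in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules allow user_agent to fetch url.

    This covers crawling only. A disallowed URL can still be indexed
    (without its content) when other pages link to it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/page")
allowed = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/public/page")
print(blocked, allowed)
```

Because a blocked page is never fetched, a crawler cannot see any noindex directive on it, which is consistent with the note above that Google has very little information about such pages.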

22 Mar 2019

Replace Unnecessary URL Parameters with Fragment Identifiers

John recommends replacing unnecessary URL parameters with fragment identifiers because anything after the # is usually dropped for indexing, whereas parameter URLs can be indexed separately.
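As a rough sketch of this idea, the Python function below moves selected query parameters into the fragment. The parameter names in `UNNECESSARY_PARAMS`, the function name, and the example URL are hypothetical; which parameters count as unnecessary depends on the site. Note that fragments are not sent to the server, so anything moved there has to be read client-side.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
UNNECESSARY_PARAMS = {"utm_source", "utm_medium", "ref"}


def move_params_to_fragment(url, unnecessary=UNNECESSARY_PARAMS):
    """Move unnecessary query parameters after the '#'.

    URLs that already carry a fragment are returned unchanged, since
    merging would alter the existing anchor.
    """
    parts = urlsplit(url)
    if parts.fragment:
        return url

    keep, move = [], []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        (move if key in unnecessary else keep).append((key, value))

    if not move:
        return url

    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(keep), urlencode(move))
    )


print(move_params_to_fragment("https://example.com/shoes?color=red&utm_source=news"))
# https://example.com/shoes?color=red#utm_source=news
```

In this sketch `color` stays in the query string because it is assumed to change the page content, while the tracking parameter ends up in the part of the URL that is usually dropped for indexing.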

19 Mar 2019

Mobile-friendly Test Errors Not Necessarily Representative of Google’s Indexing Process

The Mobile-friendly test might not pull all page resources because it is a live tool and needs to be time-efficient. However, this isn’t representative of Google’s indexing process, which uses cached versions of some of a page’s resources.

8 Mar 2019

Google Folds Together Same Language Country Versions & Swaps Out URLs Dependent on Searcher Location

Google identifies different country versions of the same language as duplicates and folds them together for indexing. Google can then swap out the URLs depending on where the search is performed.

19 Feb 2019

Index User-Generated Content if it Provides Value

It is fine to index user-generated content, such as comments, but it is up to webmasters to decide whether it is valuable content that should be visible to search engines. In the case of comments, it might make sense to block them from being crawled and indexed.

11 Dec 2018

Crawling But Not Indexing Pages is Normal for Pages with Content Already on Other Indexed Pages

It’s normal for Google to crawl URLs but not index them if it doesn’t consider them useful for search, such as index and archive pages whose content is already indexed on other pages. This has been the case for a long time, but these pages have become more visible recently due to the ‘Crawled – currently not indexed’ report in Search Console.

30 Nov 2018

It’s Normal for Google to Index XML Sitemap Files

If you see an XML sitemap file showing in the search results when you search for a specific URL on your website, this is normal and won’t cause any issues. If you don’t want XML sitemaps to be indexed, serve them with an X-Robots-Tag: noindex HTTP response header.
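One possible way to send such a header is sketched below as a small WSGI wrapper in Python. The wrapper name `add_sitemap_noindex`, the `/sitemap.xml` path, and `demo_app` are assumptions for illustration only; in practice the header is often set in the web server or CDN configuration instead.

```python
def add_sitemap_noindex(app, sitemap_paths=("/sitemap.xml",)):
    """Wrap a WSGI app so responses for sitemap paths carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Only sitemap responses get the extra header.
            if environ.get("PATH_INFO") in sitemap_paths:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the site's real application."""
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [b"<urlset/>"]


application = add_sitemap_noindex(demo_app)
```

The header leaves the sitemap fetchable, so it can still be used for URL discovery, while asking search engines not to show the file itself in results.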

27 Nov 2018

Internally Link to Seasonal Content so Google Will Index It

Publish seasonal content far enough in advance for Google to index it in time for the required period. Also link to this content internally so that Google knows these pages are important and relevant to users, which will improve indexing.

16 Nov 2018

‘Discovered – Currently not indexed’ GSC Report Pages Have No Value for Crawling & Indexing

Google knows about pages in the ‘Discovered – currently not indexed’ report in Google Search Console but hasn’t prioritised them for crawling and indexing. This is usually due to internal linking and content duplication issues.