What is Search Engine Indexing & How Does it Work?

What happens once a search engine has finished crawling a page? Let’s take a look at the indexing process that search engines use to store information about web pages, enabling them to quickly return relevant, high-quality results.

What’s the need for indexing by search engines?

Remember the days before the internet when you’d have to consult an encyclopedia to learn about the world and dig through the Yellow Pages to find a plumber? Even in the early days of the web, before search engines, we had to search through directories to retrieve information. What a time-consuming process. How did we ever have the patience?

Search engines have revolutionized information retrieval to the extent that users expect near-instantaneous responses to their search queries.

What is search engine indexing?

Indexing is the process by which search engines organize information before a search to enable super-fast responses to queries.

Scanning every individual page for keywords and topics at query time would be far too slow a way for search engines to identify relevant information. Instead, search engines (including Google) use an inverted index, also known as a reverse index.

What is an inverted index?

An inverted index is a system in which a database of text elements is compiled along with pointers to the documents that contain those elements. Search engines use a process called tokenization to break text into individual words (tokens), which are typically normalized, for example by reducing them to a root form, so that fewer resources are needed to store and retrieve data. This is a much faster approach than listing all known documents against all relevant keywords and characters.

An example of inverted indexing

Below is a very basic example that illustrates the concept of inverted indexing. In the example, you can see that each keyword (or token) is associated with a row of documents in which that element was identified.

Keyword	Document Path 1	Document Path 2	Document Path 3
SEO	example.com/seo-tips	moz.com	…
HTTPS	deepcrawl.co.uk/https-speed	example.com/https-future	…

This example uses URLs, but these might be document IDs instead, depending on how the search engine is structured.
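
To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not how any particular search engine implements indexing: the function names (`tokenize`, `build_inverted_index`, `search`), the sample URLs, and the page text are all hypothetical, and the tokenizer simply lowercases text and splits it into words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of document IDs (here, URLs) that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical documents keyed by URL; the page text is invented for illustration.
documents = {
    "example.com/seo-tips": "SEO tips for beginners",
    "example.com/https-future": "The future of HTTPS and SEO",
    "example.com/https-speed": "Does HTTPS affect page speed",
}

index = build_inverted_index(documents)
print(sorted(index["seo"]))                # every document containing "seo"
print(sorted(search(index, "https seo")))  # documents containing both tokens
```

Because the index is built ahead of time, answering a query only requires looking up each token and intersecting the resulting sets, rather than rescanning every document.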

The cached version of a page

In addition to indexing pages, search engines may also store a highly compressed text-only version of a document including all HTML and metadata.

The cached document is the latest snapshot of the page that the search engine has seen.

The cached version of a page can be accessed (in Google) by clicking the little green arrow next to each search result’s URL and selecting the cached option. Alternatively, you can use the ‘cache:’ Google search operator to view the cached version of the page.

Bing offers the same facility to view the cached version of a page via a green down arrow next to each search result but doesn’t currently support the ‘cache:’ search operator.

What is PageRank?

“PageRank” is a Google algorithm named after Google co-founder Larry Page (yes, really!). It assigns each page a value, calculated from the links pointing at that page, in order to determine the page’s value relative to every other page on the internet. The value passed by each individual link is based on the number and value of the links that point to the linking page.
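
Because a page’s value depends on the value of the pages linking to it, PageRank is usually computed iteratively. The sketch below is a simplified illustration based on the originally published PageRank formulation, not Google’s production algorithm, which is not public. The `pagerank` function, the four-page link graph, and the damping factor of 0.85 (a commonly cited default from the original formulation) are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline value.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's value equally across its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its value over every page.
                for other in pages:
                    new_ranks[other] += damping * ranks[page] / n
        ranks = new_ranks
    return ranks


# Hypothetical four-page site used only for illustration.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}

ranks = pagerank(links)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {rank:.3f}")
```

In this toy graph the home page ends up with the highest value because every other page links to it, while the contact page, which receives no links, ends up with the lowest.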

PageRank is just one of the many signals used within the large Google ranking algorithm.
An approximation of PageRank values was initially provided by Google, but these values are no longer publicly visible.

While PageRank is a Google term, all commercial search engines calculate and use an equivalent link equity metric. Some SEO tools try to estimate PageRank using their own logic and calculations, for example Page Authority in Moz tools, Trust Flow in Majestic, or URL Rating in Ahrefs. Lumar has a metric called DeepRank to measure the value of pages based on the internal links within a website.

How PageRank flows through pages

Pages pass PageRank, or link equity, through to other pages via links. When a page links to content elsewhere it is seen as a vote of confidence and trust, in that the content being linked to is being recommended as relevant and useful for users. The count of these links — and the measure of how authoritative the linking website is — determines the relative PageRank of the linked-to page.

PageRank is divided equally across all discovered links on the page. For example, if your page has five links, each link would pass 20% of the page’s PageRank to its target page. Links that use the rel="nofollow" attribute do not pass PageRank.
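
The arithmetic of the five-link example can be sketched in a few lines of Python. The `link_shares` function and the URLs are hypothetical. The sketch also assumes that nofollow links still count toward the number of links a page’s value is divided by, and that their share is simply not passed on; this is one reading of the description above rather than a confirmed detail of any search engine.

```python
def link_shares(page_rank, links):
    """Split a page's PageRank equally across its links.

    links is a list of (target_url, is_nofollow) tuples. Every link is
    allotted an equal share, but nofollow links pass nothing to their target
    (an assumption of this sketch).
    """
    if not links:
        return {}
    share = page_rank / len(links)
    passed = {}
    for target, is_nofollow in links:
        if is_nofollow:
            continue  # This link's share is not passed on.
        passed[target] = passed.get(target, 0.0) + share
    return passed


# Hypothetical page with five links, one of them nofollow.
links = [
    ("example.com/a", False),
    ("example.com/b", False),
    ("example.com/c", False),
    ("example.com/d", False),
    ("example.com/sponsored", True),
]

shares = link_shares(1.0, links)
for target, value in shares.items():
    print(f"{target} receives {value:.0%} of the page's PageRank")
```

Each of the four followed links receives 20% of the page’s value, and the nofollow link’s target receives nothing.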

The importance of backlinks

Backlinks are a cornerstone of how search engines understand the importance of a page. There have been many studies and tests performed to identify the correlation between backlinks and rankings.

Research by Moz into the top 50 Google results for approximately 15,000 search queries found that 99.2% of those results had at least one external backlink. On top of this, SEOs consistently rate backlinks as one of the most important ranking factors in surveys.

