How to make AJAX applications crawlable

Adam Gent
Adam Gent

On 3rd October 2019 • 9 min read

If you want to crawl a website with an AJAX application, you will need to use the AJAX crawling feature to allow DeepCrawl to access the links and content on the site.

Note: Google stopped using the AJAX crawling scheme at the end of quarter 2 of 2018. If you are relying on the AJAX crawling scheme for Google to crawl and index dynamic content, then this is no longer supported. For more information on getting JavaScript websites crawled and indexed please read new Google Developers documentation.

In this guide we will be focusing on:

  1. What is AJAX?
  2. How does the AJAX crawling scheme work?
  3. How do I configure DeepCrawl to crawl AJAX websites?

 

What is AJAX?

The term AJAX stands for Asynchronous JavaScript and XML. It is a technique used by developers to create interactive websites using: XML, HTML, CSS, and JavaScript.

AJAX allows developers to update content, when an event is triggered, without having to make the user reload a page. Although an AJAX website can create an excellent experience for users, it can cause server issues for search engines.

The main problems with AJAX websites are that:

Both problems stop search engines from crawling and indexing dynamic content on a website. To manage these problems, Google came up with the AJAX crawling scheme which allows search engines to crawl and index these types of websites.
 

How does the AJAX crawling scheme work?

A website that implements the AJAX crawling scheme provides search engine crawlers with a HTML snapshot of a dynamic page. The search engine is served an “ugly URL”, while the user is served with the dynamic “clean URL” of a web page.

For example:

For an overview on how to create HTML snapshots and the AJAX crawling scheme read the following official guide by Google.
 

How to configure AJAX crawling in DeepCrawl

When configuring DeepCrawl it is important to make sure that:

  1. The AJAX website supports the AJAX crawling scheme.
  2. The advanced settings in the DeepCrawl project are updated.

 

Supporting the AJAX crawling scheme

Our team recommends following the AJAX crawling scheme instructions to implement it on an AJAX website. It is important to note that there are two ways to indicate the scheme: An AJAX website which has hashbang (#!) in the URL, and AJAX websites which do not include a hashbang (#!) in the URL.

Our team has provided further details below for each setup and how it impacts DeepCrawl.

AJAX websites using hashbang URLs (#!)

For DeepCrawl to crawl an AJAX website which has a hashbang in the URL it needs the following requirements:

  1. The AJAX crawling scheme is indicated on clean URLs using hasbang (#!).
  2. The site’s server should be setup to handle requests for ugly URLs.
  3. The ugly URL should contain the HTML snapshot of the page.

If these requirements are not met, then DeepCrawl will be unable to crawl an AJAX website.

AJAX websites without hashbang URLs (#!)

For DeepCrawl to crawl an AJAX website without a hashbang in the URL it needs the following requirements:

  1. AJAX crawling scheme is indicated on clean URLs using meta fragment tag.
  2. The _escape_fragment_ parameter is appended to the end of clean URLs.
  3. The ugly URL should contain the HTML snapshot of the page.

The meta fragment tag and _escape_fragment_ parameter only need to be included on pages that are using AJAX. They do not need to be added to every page of a website, unless all pages use AJAX to load content.
 

Updating DeepCrawl settings to crawl AJAX websites

Once the website is updated to properly indicate the AJAX crawling scheme to search engines, you will need to update the advanced settings in DeepCrawl.

1. Firstly, uncheck the ‘Strip # Fragments from all URLs’ box. This will force DeepCrawl to crawl the hashed URLs.

Strip fragment URL setting

2. Then create the following three rules in the URL rewriting settings.

URL rewriting rules

 

Match From Match To Case Options
#! ?_escaped_fragment_= No Change
^(?!.*?_escaped_fragment_=)(.*?.*) $1&_escaped_fragment_= No Change
^(?!.*?_escaped_fragment_=)(.*) $1?_escaped_fragment_= No Change

 

3. Hit “Save” at the bottom of the advanced settings in the project to save the rewrite settings.

save URL rewrite

4. Run a test crawl to see if the new project settings are working.
 

How does the URL rewrite rule work?

The first rewrite rule will replace #! with _escaped_fragment_= in all URLs, for example:

Pretty URL: www.example.com/document#!resource_1
Rewritten URL: www.example.com/document?_escaped_fragment_=resource_1

This will allow our crawler to access the HTML snapshot of the page.

The second rewrite rule will append the escaped fragment onto the end of a URL that contains parameters, for example:

Pretty URL: www.example.com/document/?resource_1
Rewritten URL: www.example.com/document/?resource_1&_escaped_fragment_=

The third rewrite rule will append the escaped fragment onto the end of a URL that does not contain parameters, for example:

Pretty URL: www.example.com/document/resource_1
Rewritten URL: www.example.com/document/resource_1?_escaped_fragment_=

These rules will also allow for links on the website, which already contain the ‘?_escaped_fragment_’, in which case the solution should not be appended.
 

Frequently Asked Questions

Why does Google no longer support the AJAX crawling scheme?

Google stopped officially crawling #! URLs in the summer of 2018. They have stopped supporting this scheme as Googlebot can now render AJAX websites using the web rendering service (WRS).

Do any other search engines support the AJAX crawling scheme?

At the time of writing this guide, Bing and Yandex still support the AJAX crawling scheme. Neither of these search engines have announced plans to stop supporting the AJAX crawling scheme.

How long will DeepCrawl support the AJAX crawling scheme?

The DeepCrawl team does not have any plans to deprecate this feature. For any updates please follow our blog.

What should I do if DeepCrawl is not crawling our AJAX website?

If DeepCrawl will not crawl your website, even with AJAX crawling enabled, then we recommend checking the following:

We'd also recommend reading our debugging blocked crawls and crawling issue guides.
 

AJAX crawling feedback

If you have any further questions about AJAX crawling then please get in touch.

Author

Adam Gent
Adam Gent

Search Engine Optimisation (SEO) professional with over 8 years’ experience in the search marketing industry. I have worked with a range of client campaigns over the years, from small and medium-sized enterprises to FTSE 100 global high-street brands.

Get the knowledge and inspiration you need to build a profitable business - straight to your inbox.

Subscribe today