If you need to crawl a site whose content is generated dynamically with AJAX, you may find that it uses the escaped fragment solution to make that content accessible to spiders.
There are two scenarios where this is implemented: pages with and without a hashbang in their URLs.
Pages with hashbang in the URLs (hash fragment with exclamation mark)
Firstly, the URLs that you want crawled might include the hashbang, for example:
These are known as ‘pretty URLs’.
Here, anything before the ‘#!’ identifies a document that is returned by the server, while anything after it is used by the browser to work out which content within that document needs to be built, or displayed, for the user.
For the crawler to access these pages, the #! has to be replaced by ?_escaped_fragment_=, transforming the pretty URL into what is known as an ‘ugly URL’.
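The pretty-to-ugly transformation described above can be sketched in a few lines of Python. This is only an illustration: the function name `to_ugly_url` and the example.com URLs are hypothetical, and the percent-encoding is deliberately conservative (it may encode more characters than a search engine's crawler would).

```python
from urllib.parse import quote


def to_ugly_url(pretty_url: str) -> str:
    """Turn a hashbang ('pretty') URL into its escaped fragment ('ugly') form."""
    base, hashbang, fragment = pretty_url.partition("#!")
    if not hashbang:
        return pretty_url  # no '#!' present, so there is nothing to rewrite
    # Use '&' if the document URL already carries a query string, else '?'.
    joiner = "&" if "?" in base else "?"
    # Percent-encode characters that would be ambiguous inside a query value
    # ('&', '#', '%', '+', spaces). This is an assumption of the sketch and
    # over-encodes some harmless characters.
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe='=/')}"


print(to_ugly_url("http://example.com/#!key=value"))
```

A URL without a hashbang is returned unchanged, so the function can be applied to every link found during a crawl.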
Pages without a hashbang in the URLs
Alternatively, the URLs might not contain #!. In this scenario, the _escaped_fragment_= parameter needs to be appended to the end of the URL in order to make the ugly URL, for example:
These ugly URLs are only used for crawling, as search engines will return the pretty URL to the user once the document has been crawled and indexed.
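To illustrate why the ugly URL never needs to be shown to users, here is a rough sketch of the reverse mapping from an ugly URL back to its pretty form. The name `to_pretty_url` and the example.com URLs are hypothetical, and the sketch assumes that `_escaped_fragment_=` is the last parameter in the URL.

```python
from urllib.parse import unquote

MARKER = "_escaped_fragment_="


def to_pretty_url(ugly_url: str) -> str:
    """Map an escaped fragment ('ugly') URL back to its 'pretty' form."""
    head, marker, fragment = ugly_url.partition(MARKER)
    if not marker:
        return ugly_url  # not an ugly URL, leave it untouched
    base = head[:-1]  # drop the '?' or '&' that preceded the marker
    # An empty value means the page had no hashbang in the first place.
    return f"{base}#!{unquote(fragment)}" if fragment else base


print(to_pretty_url("http://example.com/?_escaped_fragment_=key=value"))
print(to_pretty_url("http://example.com/page?_escaped_fragment_="))
```

The first call restores a hashbang URL, while the second simply strips the empty parameter, which corresponds to the scenario without a hashbang.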
Requirements for the escaped fragment solution
There are two requirements for the escaped fragment solution to work. Firstly, the site needs to ‘opt in’ to the AJAX crawling scheme, so that the crawler requests the ugly URLs. This is done by adding a trigger to the head of an HTML page:
If your page does not include the hashbang (#!), but does include the directive in the head, then the escaped fragment will be appended to the end of the URL, as above.
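As a way of checking whether a page has opted in, the sketch below looks for the trigger in a page's HTML. It assumes the trigger is the `<meta name="fragment" content="!">` tag defined in Google's AJAX crawling specification. The class and function names are hypothetical, and for simplicity the sketch does not verify that the tag sits inside the head.

```python
from html.parser import HTMLParser


class FragmentMetaFinder(HTMLParser):
    """Records whether a <meta name="fragment" content="!"> tag was seen."""

    def __init__(self):
        super().__init__()
        self.opted_in = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "fragment" and attributes.get("content") == "!":
            self.opted_in = True


def opts_in_to_ajax_crawling(html: str) -> bool:
    finder = FragmentMetaFinder()
    finder.feed(html)
    return finder.opted_in


page = '<html><head><meta name="fragment" content="!"></head><body></body></html>'
print(opts_in_to_ajax_crawling(page))
```

Self-closing tags (`<meta ... />`) are handled as well, because the standard library parser routes them through the same start-tag handler.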
Crawling Escaped Fragment with DeepCrawl
Like any crawler, DeepCrawl is unable to see anything that isn’t part of the HTTP request (the fragment after the # is never sent to the server), so by default it will not identify hashbang URLs as separate URLs.
However, it’s possible to crawl such a site by making a few adjustments in the Advanced Settings of your DeepCrawl project.
1) Firstly, uncheck the ‘Strip # Fragments from all URLs’ box. This will force DeepCrawl to crawl the hashed URLs.
2) Then, under the URL rewriting section, add the following:
If you already have a DeepCrawl account, you can see this working here using Fetch as DeepCrawl.
The first rewrite rule replaces the #! with the _escaped_fragment_= parameter in all URLs, for example:
This means that these URLs can be crawled.
The second rewrite rule will append the escaped fragment onto the end of a URL that contains parameters, so that:
is rewritten to:
The third rewrite rule will append the escaped fragment onto the end of a URL that does not contain parameters, so that:
is rewritten to:
These rules also allow for links on the website that already contain ‘?_escaped_fragment_’, in which case the escaped fragment is not appended a second time.
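The behaviour of the three rules can be approximated with regular expressions in Python. The patterns below are illustrative assumptions rather than the exact rules entered in DeepCrawl, and the example.com URLs are hypothetical. The rules are applied in order, and the sketch does not handle a URL that has both a query string and a hashbang.

```python
import re

# (pattern, replacement) pairs, applied in order to every URL.
REWRITE_RULES = [
    # Rule 1: replace the hashbang with the escaped fragment parameter.
    (re.compile(r"#!"), "?_escaped_fragment_="),
    # Rule 2: the URL has parameters and no escaped fragment yet, so append with '&'.
    (re.compile(r"^(?!.*_escaped_fragment_)(.*\?.*)$"), r"\1&_escaped_fragment_="),
    # Rule 3: the URL has no parameters at all, so append with '?'.
    (re.compile(r"^(?!.*[?])(.*)$"), r"\1?_escaped_fragment_="),
]


def rewrite(url: str) -> str:
    for pattern, replacement in REWRITE_RULES:
        url = pattern.sub(replacement, url, count=1)
    return url


for example in (
    "http://example.com/#!products",
    "http://example.com/page?id=1",
    "http://example.com/page",
    "http://example.com/page?_escaped_fragment_=",
):
    print(example, "->", rewrite(example))
```

The negative lookahead in the second rule is what leaves links that already contain the escaped fragment untouched, and the third rule skips them because they already contain a ‘?’.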
A word of caution…
Google is likely to drop support for AJAX crawling at some point in the future. John Mueller has stated: “I suspect at some point we'll deprecate our recommendation to use the AJAX-crawling proposal (escaped-fragment/hash-bang-URLs), though we'll probably support crawling & indexing of that content for a longer time.” Read more details in SERoundtable’s post.