Page Rendering Service (PRS)
At Google I/O 2019, Martin Splitt, a Webmaster Trends Analyst at Google, announced that the web rendering service (WRS) component of Googlebot uses the latest stable version of Chrome to render web pages.
This means that both DeepCrawl’s page rendering service (PRS) and Googlebot’s WRS now use the latest stable version of Chrome to render the web.
Because our PRS is designed to fetch and render pages in a similar way to Googlebot and the WRS, and both use the same stable release of Chrome, there should be few discrepancies between the web platform features and capabilities our PRS supports and what Googlebot can render.
For a full list of features that the latest stable version of Google Chrome supports, we recommend referring to chromestatus.com or using the compare function on caniuse.com.
For more information about the WRS and Googlebot, our team recommends the following resources:
- Google I/O 2019: Google Search and JavaScript Sites
- Googlebot: SEO Mythbusting
If you wish to better understand whether Bingbot can render your pages, we recommend using the Bing mobile-friendly test tool, as it uses the same customisable rendering engine as Bingbot.
How does PRS work in DeepCrawl?
DeepCrawl is a cloud-based website crawler that follows links on a website or web app and takes snapshots of page-level technical SEO data.
The page rendering service works in DeepCrawl as follows:
- Start URL(s) and URL data sources are entered in the project settings.
- The web crawler begins with the start URL(s) based on the project settings.
- The start URL is fetched using the PRS.
- The PRS fetches a page and will wait a maximum of 10 seconds for the server to respond and page to load.
- The PRS will then wait a maximum of 5 seconds for any custom injection scripts to run.
- Once the page has responded and loaded, and the custom scripts have run, the crawler captures both the raw HTML and rendered HTML of the page.
- The rendered HTML is parsed, and the SEO metrics are stored by the crawler.
- Any links discovered in the rendered HTML of the page are added to the crawl scheduler.
- DeepCrawl continues to crawl URLs which are found and added to the crawl schedule.
- The crawl scheduler waits until all web documents on the same level (click depth) have been found before the crawler can begin crawling the next level (even if lower level pages are in the URL crawl queue).
- All SEO metrics fetched by the crawler are passed to our transformer, which processes the SEO data and calculates metrics (e.g. DeepRank).
- Once the transformer has finished analysing the data, it passes it to the reporting API, and the technical reports in the DeepCrawl app are populated.
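The level-by-level scheduling described above can be sketched as a breadth-first traversal. This is a simplified illustration under our own assumptions, not DeepCrawl's internal code; a plain object stands in for the fetched and rendered link graph:

```javascript
// Simplified sketch of level-by-level (click depth) crawl scheduling.
// In the real PRS, each URL would be fetched and rendered before its
// links are discovered; here `linkGraph` supplies them directly.
function crawlByLevel(linkGraph, startUrls) {
  const seen = new Set(startUrls);
  const order = [];            // URLs in the order they are crawled
  let level = [...startUrls];  // URLs at the current click depth

  while (level.length > 0) {
    const nextLevel = [];
    // Every URL at the current depth is crawled before moving deeper,
    // even if deeper URLs are already waiting in the queue.
    for (const url of level) {
      order.push(url);
      for (const link of linkGraph[url] || []) {
        if (!seen.has(link)) {
          seen.add(link);
          nextLevel.push(link); // scheduled for the next level
        }
      }
    }
    level = nextLevel;
  }
  return order;
}
```

For example, with a start URL linking to two pages that in turn link deeper, both level-1 pages are crawled before any level-2 page.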
Make sure PRS can crawl your website
Before crawling your website with the PRS, our team recommends reviewing the specifications below.
PRS and anchor links
Examples of links that DeepCrawl will follow:
- <a href="https://break-hearts-not-links.com">
- <a href="/get/to/the/crawler.html">
Examples of links PRS will not follow (by default):
- <a routerLink="I/am/your/crawler.html">
- <span href="https://example.com">
- <a onclick="goto('https://example.com')">
This is in line with current SEO best practice and what Google recommends in its Search Console help documentation.
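As a rough illustration, the rule of thumb above can be expressed as a predicate: only `<a>` elements with an `href` attribute are followed. This is a simplified sketch of the rule, not DeepCrawl's actual link parsing code:

```javascript
// Simplified predicate for whether the PRS treats a link as crawlable.
// Only <a> elements with an href attribute are followed; routerLink,
// onclick handlers, and href on non-anchor tags are ignored by default.
function isCrawlableLink(tagName, attributes) {
  return tagName.toLowerCase() === 'a' &&
         typeof attributes.href === 'string' &&
         attributes.href.length > 0;
}
```

So `isCrawlableLink('a', { href: '/get/to/the/crawler.html' })` is followed, while `isCrawlableLink('a', { routerLink: 'I/am/your/crawler.html' })` and `isCrawlableLink('span', { href: 'https://example.com' })` are not.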
PRS and dynamic content
It is essential to understand that rendered HTML elements which require user interaction will not be picked up by the PRS. Any critical navigation elements or content which do not appear in the DOM until a user clicks or gives consent will not be captured by DeepCrawl.
Examples of dynamic elements the PRS will not pick up:
- onclick events
- onmouseover and onmouseout events
- Deferred loading of page elements (lazy loading)
This default behaviour is in line with how Google currently handles events after a page has loaded.
- A Search-Marketer’s Guide to Google IO 2018
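To make the point concrete, the sketch below simulates a page where some content only appears inside a click handler. The "DOM" here is a plain object and all names are illustrative; it is not how the PRS is implemented, only why event-gated content is missed:

```javascript
// Sketch: content that only appears after a user event is invisible to
// the renderer, because the PRS never fires user events.
function renderPage() {
  const dom = { content: ['Visible on load'] };
  // This handler is never invoked during a PRS crawl.
  const onclick = () => dom.content.push('Revealed after click');
  return { dom, onclick };
}

// The snapshot is taken without firing any user events:
const page = renderPage();
const snapshot = [...page.dom.content]; // only the load-time content
```

Content that must be in the crawled (and indexed) HTML should therefore be present in the DOM on load, not behind a click or consent event.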
PRS is stateless when crawling pages
When the PRS renders a page, it is stateless by default, meaning that:
- Local storage data is cleared when each page is rendered.
- HTTP cookies are not accepted when the page is rendered.
This also means that, by default, any content which requires users to accept cookies will not be rendered by the PRS.
This is in line with Google’s own web rendering service specifications.
PRS declines permission requests
Any content which requires users to consent is declined by the page rendering service by default, for example:
- Camera API
- Geolocation API
- Notifications API
This is in line with how Google’s web rendering service handles permission requests.
PRS static geo IP address
The PRS is unable to run a rendered crawl with a specified geo static IP. All requests from the rendered crawler will come from the IP address `18.104.22.168`, which is based in the United States.
If you need to whitelist us to allow crawling, you should add this IP address to your whitelist.
PRS and custom DNS
Custom DNS settings do not currently work with rendering.
PRS is not able to detect state changes
The PRS is unable to detect state changes by default.
If your website uses state changes, the PRS can detect them if each state change is turned into a proper location change by a script added in the “Custom Script” field.
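As an illustration, a script along these lines could map fragment-based ("#!") routes onto crawlable paths. This is our own hedged sketch, not the exact script from DeepCrawl's settings, and the function name is made up:

```javascript
// Illustrative sketch: turn a hash-based state change into a proper
// location change so the PRS can treat it as a navigable URL.
// NOT DeepCrawl's actual custom script; names are ours.
function hashRouteToPath(url) {
  // '/page#!/section/1' -> '/section/1'; URLs without a '#!' route pass through.
  const hashIndex = url.indexOf('#!');
  if (hashIndex === -1) return url;
  const route = url.slice(hashIndex + 2);
  return route.startsWith('/') ? route : '/' + route;
}

// In the "Custom Script" field you would wire this to the browser, e.g.:
// if (window.location.hash.startsWith('#!')) {
//   window.location.replace(hashRouteToPath(window.location.href));
// }
```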
PRS disables certain interfaces and capabilities
The PRS disables the following interfaces and capabilities in Google Chrome:
- IndexedDB and WebSQL interfaces
- Service Workers
- WebGL interface
This is in line with Google’s web rendering service specifications when handling certain interfaces and capabilities.
PRS blocks analytics and ad scripts
The PRS blocks common analytics and advertisement scripts by default. Because the PRS uses an off-the-shelf version of Chrome, it would otherwise execute many analytics, advertisement, and other tracking scripts during a crawl. To stop analytics data from being inflated while crawling with the PRS, we block these scripts by default.
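Conceptually, the blocking works like request filtering against a list of known tracker URL patterns. The sketch below is our own illustration; DeepCrawl maintains the real blocklist, and the patterns shown are just examples of well-known tracking script URLs:

```javascript
// Minimal sketch of blocking tracking scripts by URL pattern matching.
// Illustrative patterns only; the real blocklist is maintained by DeepCrawl.
const BLOCKED_PATTERNS = [
  'google-analytics.com/analytics.js',
  'googletagmanager.com/gtm.js',
  'doubleclick.net',
];

// Decide whether an outgoing request should be cancelled during rendering.
function shouldBlockRequest(url) {
  return BLOCKED_PATTERNS.some((pattern) => url.includes(pattern));
}
```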
Analytics Scripts Blocked
A list of analytics tracking codes DeepCrawl blocks by default:
Advertisements Scripts Blocked
A list of advertisement tracking codes DeepCrawl blocks by default:
Block custom analytics or advertisement scripts
PRS custom script injection
The PRS allows custom scripts to be injected into a page while it is being rendered. This unique feature allows for additional analysis and web page manipulation.
Custom scripts can be added in the “Custom Script” field in the project settings.
To pull data injected onto the page, the script’s output needs to be added to the page and then extracted using the Custom Extraction feature.
This page rendering functionality allows users to:
- Manipulate elements in the Document Object Model (DOM) of a page
- Analyse and extract Chrome page load timings for each page
- Create virtual crawls and change behaviour of DeepCrawl
Learn more about using DeepCrawl custom script injection to collect Chrome page speed metrics.
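For instance, a custom script could compute page load timings from the browser's Navigation Timing API and write them into the DOM for Custom Extraction. The helper below only does the arithmetic so it can be shown standalone; the function name, element id, and wiring are our own assumptions, not DeepCrawl's documented script:

```javascript
// Compute simple load metrics from Navigation Timing style timestamps
// (all values are millisecond timestamps, as in window.performance.timing).
function loadMetrics(timing) {
  return {
    ttfb: timing.responseStart - timing.requestStart,
    domContentLoaded: timing.domContentLoadedEventEnd - timing.navigationStart,
    pageLoad: timing.loadEventEnd - timing.navigationStart,
  };
}

// In a custom injection script (browser only), the output could be
// exposed on the page for Custom Extraction, e.g.:
// const el = document.createElement('div');
// el.id = 'prs-timings'; // hypothetical element id
// el.textContent = JSON.stringify(loadMetrics(window.performance.timing));
// document.body.appendChild(el);
```

A Custom Extraction rule targeting the injected element would then pull the JSON string into the crawl reports.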
Frequently Asked Questions
If your website still cannot be crawled, then we’d recommend getting in touch with our support team.
How fast should I crawl my website with the PRS?
This will depend on your web server and your website’s technical stack.
Our team always recommends running a sample crawl first to test the speed settings, to make sure that your site’s server can handle the PRS.
If you are unsure of what speed to set the crawler, please contact our Customer Success team using the help portal in the DeepCrawl app.
How do I crawl using the AJAX crawling scheme?
At the time of writing, DeepCrawl still supports the AJAX crawling scheme. For more information, please read our 60-second DeepCrawl AJAX crawling guide on how to set this up.
Please be aware that, even though DeepCrawl supports the AJAX crawling scheme, Google has officially announced that it deprecated support for this crawling scheme.
What is the maximum timeout for the PRS?
The maximum render timeout period is 15 seconds per page. This is broken down into two steps:
- PRS has a maximum timeout period of 10 seconds for a page to respond and for content to load, then
- It will wait for a maximum of 5 seconds for any custom scripts added to the settings to load.
If the page does not complete rendering within the 15 second timeout period, then the PRS will take whatever content has been loaded on the page at that point for processing.
The page rendering service will always evaluate custom scripts as long as the server responds within 10 seconds (i.e. time to first byte).
This means that if your page takes 20 seconds to render, we will use whatever content is rendered at the 15 second point, but anything after that will be ignored.
If the server takes 14 seconds to respond to our initial request, then we will only allow it to render for one second before taking a snapshot of the page for processing.
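The two-stage budget above can be written out as simple arithmetic. This is a sketch of the behaviour described in this section, under our own names and assumptions, not DeepCrawl's code:

```javascript
// Sketch of the PRS render budget: a 15 second hard cap per page,
// with custom scripts only evaluated when the server responds
// within the 10 second response window.
const TOTAL_BUDGET_MS = 15000;    // hard cap before the snapshot is taken
const RESPONSE_BUDGET_MS = 10000; // response + content load window

// Milliseconds of rendering allowed after the first byte arrives.
// A server that takes 14 s to respond leaves only 1 s of render time.
function renderTimeAfterResponse(ttfbMs) {
  return Math.max(0, TOTAL_BUDGET_MS - ttfbMs);
}

// Custom scripts run only when time to first byte is within 10 seconds.
function customScriptsWillRun(ttfbMs) {
  return ttfbMs <= RESPONSE_BUDGET_MS;
}
```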
Does my website’s speed affect how fast the PRS crawls?
Yes. If a website’s performance is slow, then the PRS will crawl at a slower rate.
To identify slow pages on your website, we recommend using Google Lighthouse or PageSpeed Insights.
What is the “render_timed_out” error?
If you receive a “render_timed_out” error, it means that when we tried to render the page, the server did not respond at all within the maximum timeout of 15 seconds.
This error is only used when we had no HTTP response headers and no body HTML at all.
If you are seeing this error message consistently throughout a crawl, it is likely that your server stopped responding to our crawler during the crawl – it may have been overwhelmed by requests (in which case, reducing the crawl rate can help).
Page Rendering Service Feedback
Our team sees the page rendering service as being a flagship feature in DeepCrawl which will give us the ability to add new features like Chrome page load timings. If you have any requests or ideas about what we should be doing, then please get in touch.