Deepcrawl is now Lumar. Read more.
DeepcrawlはLumarになりました。 詳細はこちら

Identifying, Monitoring and Extracting Schema Tags

SEO and Digital Marketing Best Practices
 

What is Schema Microdata?

Microdata is a form of structured markup which allows you to add metadata onto existing page content in order to help search engines understand the purpose of the content and provide richer search results for users.

You can find out more about schema metadata, including further benefits and the different types of markup in our Technical SEO Library.

Schema supported by Google

There are several forms of schema markup that are supported by Google, these include;

Company information: Provides a company overview, logo, contact details and social media profiles.

Company Information Markup

Events: This can be used by performers, venues or ticket selling sites to display dates, times, locations and other information about the event. This information can then be shown in Google’s event search features.

Event Markup

Rich snippets: Shows product information, recipes, reviews, event details (in regular search results), video thumbnails and news article information.

Review Rich Snippet

Breadcrumbs & site search boxes: These can enable the user to go directly to the specific information they want.

Site Search Box

A full list of items which can be added through schema markup can be found here.

Schema supported by Bing and Yandex

Bing currently supports markup for breadcrumbs, business and organisation information, events, people, products and offers, recipes and reviews. Read Bing’s Markup Overview for more information.

Yandex supports markup for address and organisation information, videos, questions and answers, software and movies. Products and prices, recipes and creative content including blog posts and audio objects are also supported by Yandex. View their schema.org guide for more details.

 

How to implement structured markup

Using in-page markup, structured data can be added to the HTML code of the page the information is contained on. The code is typically implemented into the head of the page, but adding it to the body is also valid.

Further information on implementation and guidelines for the use of structured markup can be found in Google’s Structured Data guide.

 

Different markup formats

Schema markup can be implemented using three different methods; Microdata, RFDa and JSON.

Microdata

Microdata is an open-community specification used to include additional tag attributes to existing HTML content.

An example of Microdata Website structured markup looks like this:

Microdata Structured Markup

RDFa

RDFa (Resource Description Framework in Attributes) is a HTML5 extension which uses the in-line markup syntax and supports linked data to describe user-visible content to search engines.

An example of RDFa Postal Address structured markup looks like this:

RDFa Structured Markup

JSON-LD

JSON-LD (JavaScript Object Notation for Linked Data) is Google’s recommended format for implementing structured markup to a page. This is due to the fact that most new types of structured data come out in this format first. JSON-LD is also provided as a blockcode, so it is easily separated from the page’s content.

An example of JSON-LD Organisation structured markup looks like this:

JSON Structured Markup
 

Auditing and monitoring schema tags using DeepCrawl

Using DeepCrawl’s Custom Extraction Tool you can check for, and extract, several types of schema tags including microdata, RDFa and JSON. This powerful feature cuts hours out of your site audits by showing you which pages have schema tags, plus which tags have been added/removed since your last crawl.

DeepCrawl Custom Extraction

Extracting schema tags through Custom Extraction

The Custom Extraction settings are found in the fourth step of the crawl setup under Advanced Settings.

DeepCrawl Advanced Settings

To check for and extract schema tags, enter the relevant regex rule below into the Extraction Regex box in Advanced Settings > Custom Extraction > New Extraction Rule.

 

Pre-written regexes for schema custom extraction

To help you extract your site’s schema tags, we’ve put together some different pre-written regexes below, depending on the method you are using.

Microdata only:

microdata:

(itemtype=["']http\:\/\/schema.org)

microdata-itemtype:

itemtype\s?=['"\s]?http\:\/\/schema.org\/([^\"\s\']*)

RDFA only:

rdfa:

(vocab=['"]http:\/\/schema.org\/['"])

rdfa-typeof:

typeof=['"]([^"']*)"

JSON only:

json-ld:

(<script type=['"]application\/ld\+json['"]>)

json-ld-type:

['"]@type['"]\s?:\s?['"]([^"']*)

Here is an example of how the JSON-LD type custom extraction would appear in the app:

DeepCrawl JSON Schema Tags Custom Extraction

For more information on using regex rules, our regex guide provides a great introduction to help you get started.

Identifying mixed schema formats

Google advises against using both microdata and JSON-LD types on a single page as it can cause confusion and can be complicated to implement and maintain. The following regex will enable you to identify any pages which contain both formats.

Mixed_Schema_Formats:

itemtype="http:\/\/schema.org\/.*?conte(x)t"\s?:\s?"http:\/\/schema.org"|conte(x)t"\s?:\s?"
http:\/\/schema.org".*?itemtype="http:\/\/schema.org\/
 

Custom Extraction Reports

Once your crawl has finalised, you will be able to find your custom extraction reports in the navigation on the left-side of the app.

DeepCrawl Custom Extraction Reports

From here you will be able to see the type of schema which has been identified on each page, along with the key metrics for each in one central report.

DeepCrawl Schema Custom Extraction Reports
 

Testing and validating schema tags

While extracting your schema tags will tell you where they are within your site, they are worthless if they are not valid. The following testing tools can help to identify any issues which may be occurring with the setup and implementation of structured data.

 

The importance of monitoring structured markup

Ensuring you are correctly implementing structured data is important to help you stand out in saturated search engine results pages. This is particularly useful if you are not currently ranking as highly as your competitors, as it enables you to achieve SERP real estate. In some cases, rich results are also likely to drive more meaningful clicks to your website. Therefore, monitoring and analysing the implementation of these tags is essential to ensure your content is able to appear in search results that are relevant to what your customers are looking for, helping to direct relevant visitors to your site.

 

Sign up for a DeepCrawl account to identify schema tags on your site

Sign up for a DeepCrawl account to try out our custom extraction feature for yourself and gain an understanding of the schema tags currently on your site.

 

Learn more about Custom Extraction

Custom Extraction is a powerful tool that can be used to check a number of different elements, including extracting analytics, social tags, backlinks, product data, and much more. It can also be used for performance measurement by collecting chrome page speed metrics, which you can find more about in our guide here.

Avatar image for Ruth Everett
Ruth Everett

Technical SEO

Ruth Everett is a data & insights manager at Code First Girls, and a former technical SEO analyst at Lumar. You'll most often find her helping clients improve their technical SEO, writing about all things SEO, and watching videos of dogs.

Newsletter

Get the best digital marketing & SEO insights, straight to your inbox