Advanced Structured Data Principles and Practical Implementation
SEO Track >> Building a Better Web: Structured Data Applications
SMXL Milan 2018 featured 60 speakers from the U.S., Europe and the Middle East. I take great pride in being Chairman of this event and wish to thank Business International for their continued trust in me and my ability to organize and manage such an important event.
This year I took the stand with a presentation on a panel about structured data that I put together with David Amerland.
David gave an overview of the impact of, and the need for, providing “intelligent” data to search engines. My presentation went into the details and provided insights on:
- The Benefits of Structured Data
- A closer look at the underlying principles
- Practical implementation aspects
The Benefits of Structured Data
Implementing structured data isn’t easy, nor is it straightforward: it will come at a cost.
Your first effort is going to be proving to management why investing in structured data is important.
This chart is taken from a website I recently optimized. It’s a small business operating in a very specific niche. It lacked SEO (Before SEO).
A first level of optimization pushed the number (and the quality) of search queries up a notch (After SEO). Once structured data and semantic optimization were implemented and understood by Google, the number of search queries increased significantly and improved in quality (After Structured Data Implementation & Entity Identification).
A consequence of better targeting is that you will attract a smaller number of more specific queries, which in turn will generate cost-effective leads or sales. In this example, which was taken from another website, we have seen year on year:
- a 10% fall in traffic
- a 20% lift in goal completions
- a 31% goal conversion rate
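As a rough sanity check, the three figures hang together: less traffic combined with more goal completions implies a higher conversion rate. The sketch below assumes that the conversion rate is goal completions divided by visits and that the 10% and 20% figures are rounded; the original does not state how the 31% was computed, so this is an illustration rather than a reconstruction of the actual analytics data.

```python
# Year-on-year relative changes quoted above (assumed to be rounded values).
traffic_change = -0.10      # 10% fall in traffic
completions_change = 0.20   # 20% lift in goal completions

# Assumption: conversion rate = goal completions / visits, so its relative
# change is the ratio of the two growth factors, minus one.
conversion_rate_change = (1 + completions_change) / (1 + traffic_change) - 1

print(f"Implied change in conversion rate: {conversion_rate_change:.1%}")
```

With these rounded inputs the implied lift is roughly one third, which is in the same range as the 31% reported.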
Theoretical Aspects of Structured Data
An information web is an organic entity that grows from the interests and energy of the communities that support it
In other words, and from a practical standpoint: the web has been the playground of millions of people, professionals, small businesses and corporations, all at work over the past 20 years producing billions of web pages, most of which lack the structure that would help search engines understand what they are about.
Anyone can say Anything about Any Topic
This is an enormous problem for the search engines for a very simple reason: they often lack the “intelligence” to “understand” the context. This can lead to weak (or wrong) inferencing because of a weak (or inaccurate) understanding of on-page concepts.
The Resource Description Framework (RDF) is a framework for describing resources. Resources can be anything: documents, people, physical objects or abstract concepts.
In RDF, information is represented with a node-arc model: a graph in which nodes are connected by labeled arcs.
In RDF the description of a resource is represented by a series of triples. The components of each triple are:
Subject – Predicate – Object
A triple emulates the structure of a simple phrase such as “Sante lives in L’Aquila…”
- The Subject of the triple is the URI, identifying the resource being described.
- The Object may be a value (string, number, date, …) or the URI of another resource which is somehow related to the subject.
- The Predicate explains the relationship between subject and object and is a URI chosen among those available in the various Vocabularies …
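To make the three components concrete, here is a minimal Python sketch of the phrase “Sante lives in L’Aquila” as a triple. The two `example.com` URIs are hypothetical placeholders, and mapping “lives in” to the Schema.org `homeLocation` property is an assumption made for illustration only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # URI of the resource being described
    predicate: str           # URI of a property taken from a vocabulary
    obj: str                 # URI of another resource, or a literal value
    obj_is_uri: bool = True  # False when the object is a literal value


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (no literal escaping)."""
    obj = f"<{t.obj}>" if t.obj_is_uri else f'"{t.obj}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical URIs, used for illustration only.
lives_in = Triple(
    subject="https://example.com/people/sante",
    predicate="http://schema.org/homeLocation",
    obj="https://example.com/places/laquila",
)

line = to_ntriples(lives_in)
print(line)
```

The `obj_is_uri` flag mirrors the distinction made above: an object is either a plain value or the URI of another resource.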
This may sound extremely theoretical and way too difficult to understand but it’s not – let’s see this in action and understand what I’ve just stated:
Let’s take a simple phrase, analyze it, and break it down into triples using RDF:
Mario Rossi works for Azienda Srl. Their corporate VATID is ITXXXXXXXXXXX, they can be reached via email at email@example.com or you can call at +39. XXX XX XX XXX
The underlying concepts:
- Mario Rossi is a Person
- The entity Mario Rossi has name “Mario Rossi”
- He worksFor Azienda Srl
- Azienda Srl has VatID ITXXXXXXXXXXX
- Azienda Srl has email email@example.com
- Azienda Srl has telephone +39. XXX XX XX XXX
- Azienda Srl is an Organization
- Azienda Srl has legalName Azienda Srl
Each one of these statements can be organized in a triple:
<https://www.mariorossi.it/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
<https://www.mariorossi.it/> <http://schema.org/name> "Mario Rossi" .
<https://www.mariorossi.it/> <http://schema.org/worksFor> <https://www.aziendasrl.it/> .
<https://www.aziendasrl.it/> <http://schema.org/vatID> "ITXXXXXXXXXXX" .
<https://www.aziendasrl.it/> <http://schema.org/email> "email@example.com" .
<https://www.aziendasrl.it/> <http://schema.org/telephone> "+39. XXX XX XX XXX" .
<https://www.aziendasrl.it/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organization> .
<https://www.aziendasrl.it/> <http://schema.org/legalName> "Azienda Srl" .
The current trend is to serialize this information in a JSON-LD file as follows:
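The original snippet is not reproduced in this text, so what follows is a minimal sketch rather than the exact markup from the presentation. It builds a JSON-LD object in Python from the eight triples above. Nesting the organization under `worksFor`, and using `@id` to carry the two URIs, are assumptions about how the data could reasonably be laid out.

```python
import json

# Values are copied from the triples above; the placeholders (VAT ID,
# telephone) are kept exactly as they appear in the text.
person = {
    "@context": "http://schema.org",
    "@type": "Person",
    "@id": "https://www.mariorossi.it/",
    "name": "Mario Rossi",
    "worksFor": {
        "@type": "Organization",
        "@id": "https://www.aziendasrl.it/",
        "legalName": "Azienda Srl",
        "vatID": "ITXXXXXXXXXXX",
        "email": "email@example.com",
        "telephone": "+39. XXX XX XX XXX",
    },
}

# Serialize to the JSON-LD text that would be embedded in the page.
person_json_ld = json.dumps(person, indent=2, ensure_ascii=False)
print(person_json_ld)
```

On a web page, the printed JSON would typically be placed inside a `<script type="application/ld+json">` element.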
Examples of Structured Data Implementation
Schema Markup for Documents using JSON-LD
Google provides rich results in the SERPs for only a limited number of content types, but these should not be the main focus of your attention. Your content might simply not fit what Google is featuring, or you may fall short of some of the requirements to be featured in a position zero carousel. For example, you might have an interesting article to write and think you could benefit from the Article rich snippet markup. The truth is that your chances of making it into the Top stories carousel are very slim unless your site is a recognized authority in that specific vertical.
Should this stop you from implementing structured data? No, it shouldn’t!
Here’s an example.
The world is full of documents and forms – take this one as an example to analyze:
In many cases, a document such as this cannot be described by a simple ALT attribute. Structured data and Schema.org offer the opportunity to provide a series of important details that help search engines understand the topic, together with context that helps identify (with a higher degree of confidence) the right answer to the queries of users in need of specific information. Which information can we extract and make explicit about this document?
- We can categorize this document as of Type DigitalDocument
- It has been issued to a generic Mr. Mario Rossi, owner of website mariorossi.it, at section https://www.mariorossi.it/documents/
- The name of this DigitalDocument is Approved Waivers
- This DigitalDocument is about URI https://www.uscis.gov/i-601
- https://www.uscis.gov/i-601 is of Type Thing
- https://www.uscis.gov/i-601 has name “I-601, Application for Waiver of Grounds of Inadmissibility”
- https://www.mariorossi.it/documents/ has part https://www.mariorossi.it/documents/waiver.png
- https://www.mariorossi.it/documents/waiver.png is of Type ImageObject
- https://www.mariorossi.it/documents/waiver.png has name “Form I-797, Notice of Action”
- https://www.mariorossi.it/documents/waiver.png has description “application for travel document: approval notice”
- https://www.mariorossi.it/documents/waiver.png is about https://www.wikidata.org/wiki/Q5422397
- https://www.wikidata.org/wiki/Q5422397 has name “Extreme Hardship”
We could go on and add more information to better describe the document – there are many more Types we can use. This depends on your specific needs and overall strategy towards building a knowledge graph for your website.
What would this look like if we were to put it in a JSON-LD file? Like this:
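The original snippet is not reproduced in this text, so the following is a minimal sketch of how the statements listed above could be expressed, again built in Python and printed as JSON-LD. The nesting via `about` and `hasPart` follows the list. The statement that the document was issued to Mr. Mario Rossi is left out because the list does not name a property for it, and no type is given for the Wikidata entity for the same reason.

```python
import json

# Each nested object corresponds to one URI in the list above.
document = {
    "@context": "http://schema.org",
    "@type": "DigitalDocument",
    "@id": "https://www.mariorossi.it/documents/",
    "name": "Approved Waivers",
    "about": {
        "@type": "Thing",
        "@id": "https://www.uscis.gov/i-601",
        "name": "I-601, Application for Waiver of Grounds of Inadmissibility",
    },
    "hasPart": {
        "@type": "ImageObject",
        "@id": "https://www.mariorossi.it/documents/waiver.png",
        "name": "Form I-797, Notice of Action",
        "description": "application for travel document: approval notice",
        "about": {
            # The list gives only a name for this entity, so no @type is set.
            "@id": "https://www.wikidata.org/wiki/Q5422397",
            "name": "Extreme Hardship",
        },
    },
}

# Serialize to the JSON-LD text that would be embedded in the page.
document_json_ld = json.dumps(document, indent=2, ensure_ascii=False)
print(document_json_ld)
```

As with the previous example, the printed JSON would typically be embedded in the page inside a `<script type="application/ld+json">` element.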
We can take this one step further by identifying the most relevant entities on the page and across the website, taking our disambiguation efforts to the next level and reducing ambiguity and uncertainty.
Would you like to know more about this subject?
Download my presentation now, or leave a comment – I’ll answer.