Structured data and Gutenberg

The new “Gutenberg” editor is supposed to be the future of WordPress. The project aims to completely revamp the editing UI and leaves behind many old paradigms. But many of those are well established, loved and needed for future developments. So I’m asking myself: What about structured data?

What is structured data?

Just a few days ago I had the pleasure of attending and speaking at a convention titled “How technology changes journalism” at the Academy of Political Education in Tutzing, Bavaria. That event gave me access to people like Werner Wittmann, the head of digitalization at “Kicker”, one of the biggest German football magazines, and Johannes Sommer, CEO of Retresco, one of the market leaders in content automation.

Everyone came from a vastly different field, but everyone agreed on the importance of structured data. The definition changes a bit depending on the context: an SEO consultant has a somewhat different idea of structured data than a sports journalist. But the underlying concept is always the same.

We are essentially talking about a collection of machine-readable data that describes a piece of content or that is the content itself. Typical examples are the tags and categories in WordPress, which are meant to describe the content of a blog post. In real-world applications it doesn’t end there.

Examples and use cases

The daily weather forecasts are a good example of structured data in use. Huge networks of mostly automated systems collect data (temperature, humidity, wind speed, etc.) and transform it into content of every form: websites, Google, TV, radio, newspapers. Companies like Retresco take the data and form thousands of articles and texts from it. Human journalists would be incapable of producing output on this scale, so automation based on data frees them from that burden.
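To make the idea concrete, here is a minimal sketch of turning structured weather readings into text. It is not how Retresco or any real forecasting system works; the field names, the wind thresholds and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeatherReading:
    """One structured observation. Field names are illustrative assumptions."""
    city: str
    temperature_c: float
    humidity_pct: int
    wind_kmh: float


def describe_wind(wind_kmh: float) -> str:
    # Thresholds are arbitrary and chosen only for this example.
    if wind_kmh < 10:
        return "calm"
    if wind_kmh < 40:
        return "breezy"
    return "stormy"


def forecast_sentence(reading: WeatherReading) -> str:
    """Render one reading as a short human-readable sentence."""
    return (
        f"{reading.city}: {reading.temperature_c:.0f} degrees Celsius, "
        f"{reading.humidity_pct}% humidity, {describe_wind(reading.wind_kmh)}."
    )


# Made-up sample data, not real measurements.
readings = [
    WeatherReading("Munich", 14.2, 80, 12.0),
    WeatherReading("Hamburg", 11.6, 91, 45.0),
]

for reading in readings:
    print(forecast_sentence(reading))
```

The same list of readings could just as well feed a chart, a push notification or a voice response, which is the point: the data is collected once and rendered many times.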

Further examples are easy to find. Nearly every aspect of sports reporting is based on data collected from the games. Google uses structured data to enhance its search results… and now we’ve hit the nerve! Having structured data available gives us a measurable advantage in SEO!

What do we need that for?

Generally speaking, whenever we want a (semi-)automated system to understand our content, we need structured data. As mentioned, Google is the use case closest to WordPress. Google uses the concept of the “Semantic Web” to create machine-readable entities from websites whose HTML contains structured data.
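One common way to put such data into a page is a JSON-LD snippet using the schema.org vocabulary. The sketch below is only an illustration of that technique: the function name and the sample author and date are assumptions, and a real site would fill these values from its own post data.

```python
import json


def blog_posting_jsonld(headline: str, author: str,
                        date_published: str, keywords: list[str]) -> str:
    """Build a JSON-LD <script> element describing a blog post.

    Uses the schema.org "BlogPosting" type; all argument values are
    supplied by the caller and are purely illustrative here.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "keywords": keywords,
    }
    payload = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the script element early.
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical author and date, used only to show the output shape.
print(blog_posting_jsonld(
    "Structured data and Gutenberg",
    "Jane Doe",
    "2017-09-01",
    ["WordPress", "Gutenberg", "structured data"],
))
```

The tags and categories mentioned earlier map naturally onto the `keywords` field, so the data an editor already maintains can be reused here without a second round of data entry.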

But it doesn’t end there. Almost every AI or Voice Assistant feature currently relies on data. If you want to offer an Alexa Skill, a Google Home integration or a Siri App, you will need structured data.

Even if you only want to display regular content, having your data structured is a clear advantage. You are able to rearrange and customize your content based on device, usage and user. A personal use case in the WordPress world is our Beergarden Finder.

The Beergarden Finder

This project was a proof of concept for using WordPress as a headless CMS, serving as the backend for an app. We ended up using WordPress itself for the marketing page and collected the data on the beergardens as structured data in a dedicated custom post type.

For every beergarden we collected address, location, opening hours, beer brand and much more, so that we could serve the data based on the current need of the user. This would not have been possible with a blob of text.
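A rough sketch of what such a record makes possible follows. It is not the actual Beergarden Finder code: the class, the field names and the placeholder entries are assumptions, and only the kinds of fields (address, location, opening hours, beer brand) come from the project.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Beergarden:
    """One structured entry. Field names are illustrative assumptions."""
    name: str
    address: str
    latitude: float
    longitude: float
    opens: time
    closes: time
    beer_brand: str

    def is_open(self, at: time) -> bool:
        # Simplification: assumes opening hours do not cross midnight.
        return self.opens <= at < self.closes


def open_gardens(gardens: list[Beergarden], at: time,
                 beer_brand: Optional[str] = None) -> list[str]:
    """Return names of gardens open at the given time, optionally by brand."""
    return [
        garden.name
        for garden in gardens
        if garden.is_open(at)
        and (beer_brand is None or garden.beer_brand == beer_brand)
    ]


# Placeholder data, not real beergardens.
SAMPLE_GARDENS = [
    Beergarden("Example Garten A", "Example Street 1, Munich",
               48.14, 11.58, time(11, 0), time(23, 0), "Brand X"),
    Beergarden("Example Garten B", "Example Street 2, Munich",
               48.16, 11.56, time(16, 0), time(22, 0), "Brand Y"),
]

print(open_gardens(SAMPLE_GARDENS, time(12, 0)))
print(open_gardens(SAMPLE_GARDENS, time(18, 0), beer_brand="Brand Y"))
```

Questions like "what is open right now" or "who serves my favourite brand" are one-line filters over fields. With a free-text description, each of them would require parsing prose first.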

So it is vitally important to collect all this data in your backend by hand. There will come a time when NLP (Natural Language Processing) is more reliable and accessible, but those systems are not yet available to the broad masses and are not usable for every use case.

The other end of the spectrum

The opposite of having structured data is using a page builder. The content is arranged in a blob of text, HTML, media and custom code that is very hard for an external system to understand. Websites built with page builders tend to make SEO very hard, and the story is very similar for structured data.

The current approach on many pages is to define a second, hidden set of content that serves as the structured data representation of the visible content. I think it is clear that this approach doesn’t scale very well and quickly becomes a burden on the editors.

Why are we talking about this?

My main argument up to this point is that having structured data available for your content is a way to future-proof it. The technological surge of the past decade has shown that we have basically no idea how our content will be consumed in the next decade.

The WordPress Team and especially the REST API team have made a tremendous effort to prepare us for that uncertainty. Want to attach a Chatbot to your content? The REST API is your friend. Alexa Skills, Mobile Apps, Watch-Faces for smartwatches? So much more beyond the usual website is possible thanks to awesome WordPress contributors.
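As a small illustration of how an external client might read content through the REST API, here is a sketch that builds a request for the core posts route and extracts the titles from the response. The helper names and `example.com` are assumptions; the `/wp-json/wp/v2/posts` route, the `per_page` parameter and the `title.rendered` field are part of the core REST API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def posts_url(site: str, per_page: int = 5) -> str:
    """Build the URL of the core posts endpoint for a WordPress site."""
    query = urlencode({"per_page": per_page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def post_titles(posts: list[dict]) -> list[str]:
    """Extract the rendered title from each post object in a response."""
    return [post["title"]["rendered"] for post in posts]


def fetch_titles(site: str) -> list[str]:
    """Fetch the latest posts over HTTP and return their titles.

    Needs network access and a site with the REST API enabled,
    so it is defined here but not called.
    """
    with urlopen(posts_url(site)) as response:
        return post_titles(json.load(response))


# Offline demonstration with a hand-written payload in the response shape.
sample_response = [
    {"id": 1, "title": {"rendered": "Structured data and Gutenberg"}},
    {"id": 2, "title": {"rendered": "The Beergarden Finder"}},
]
print(posts_url("https://example.com/"))
print(post_titles(sample_response))
```

A chatbot, a mobile app or a watch face would start from the same kind of request and then decide for itself how to present the result.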

But the Gutenberg team? The metabox discussion has painfully shown the stance of the core Gutenberg team on this issue. Having the ability to attach a variety of custom content boxes to posts, custom post types and different areas of WordPress seems like an afterthought, sacrificed in the race to become yet another page builder like Squarespace or Wix.

With TinyMCE our content is saved as semi-valid HTML in the database. That is a type of content that is very readable for many applications, so outputting it to systems like RSS feeds, Instant Articles, Google AMP, external apps attached via the REST API, etc. is not very hard.

The Gutenberg team chose to combine content and presentation in a custom set of code comments that gets saved in the database. And while the TinyMCE solution is not perfect from a structured data standpoint, the Gutenberg way makes this content almost unusable without a huge effort of custom interpretation, which costs development time and money.
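To show what “custom interpretation” means in practice, here is a deliberately simplified sketch that lists the blocks found in a piece of saved post content. It assumes the comment delimiter form `<!-- wp:name {json attributes} -->`; the sample content is made up, and the regular expression is not the official Gutenberg parser (it ignores nesting and other edge cases).

```python
import json
import re

# Matches an opening block comment such as:
#   <!-- wp:heading {"level":2} -->
# Closing comments (<!-- /wp:heading -->) are intentionally not matched.
BLOCK_OPENER = re.compile(
    r"<!--\s+wp:(?P<name>[a-z][a-z0-9_/-]*)\s+(?P<attrs>\{.*?\}\s+)?(?P<void>/)?-->",
    re.DOTALL,
)


def list_blocks(post_content: str) -> list[tuple[str, dict]]:
    """Return (block name, attributes) for every opening block comment."""
    blocks = []
    for match in BLOCK_OPENER.finditer(post_content):
        raw_attrs = match.group("attrs")
        attrs = json.loads(raw_attrs.strip()) if raw_attrs else {}
        blocks.append((match.group("name"), attrs))
    return blocks


# Made-up sample content in the assumed serialization format.
SAMPLE_CONTENT = """<!-- wp:heading {"level":2} -->
<h2>Opening hours</h2>
<!-- /wp:heading -->

<!-- wp:paragraph -->
<p>Open daily from 11:00.</p>
<!-- /wp:paragraph -->
"""

for name, attrs in list_blocks(SAMPLE_CONTENT):
    print(name, attrs)
```

Note that this only recovers which blocks exist and their attributes. The actual values a consumer cares about, such as the opening time in the paragraph, are still buried in markup and would need a second parsing step.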

Why we are worried about this!

While having structured data available is currently more or less negligible for small portfolio websites, nearly every one of our enterprise customers relies on having that data available.

The use-cases range from displaying all stores on a map to pushing and clustering content from WordPress to a complicated set of display advertising, apps and digital signage. Long story short: Having structured data is essential for the enterprise market.

The current approach makes us worry about the extensibility and usability of custom metadata, and therefore about the choice of WordPress in that market segment, where the contenders are Contentful and huge enterprise CMSs like the Adobe Marketing Cloud.

Our plea

Dear Gutenberg team, please think long and hard about how to attach and edit custom metadata on a piece of content. Saving everything as a blob of custom code is no solution.

We do not want WordPress to become yet another of those awful page builders. We want WordPress to become the industry leader to create and curate content for the modern web of tomorrow! 

3 Replies to “Structured data and Gutenberg”

  1. I understand your concern that storing structured data has been slow moving in Gutenberg. I hope you’ll be pleased to see progress on this, Gutenberg 1.2 was released today, which includes support for storing data in post meta, instead of post_content.

    I’d love for you to check this out, and see how it works for your use case. If there are changes we could make to improve it for you, please let us know in the Gutenberg issue tracker!

    1. Hey Gary,

      hell yeah I will! I really appreciate your work and thank you for pushing WordPress forward!