A Short Case Study Involving Table Schema Frictionless Specs at the European Union
Do you remember Costas Simatos? He introduced the Frictionless Data community to the Interoperability Test Bed (ITB), an online platform that can be used to test systems against technical specifications; curious minds will find a recording of his presentation on the subject on YouTube. Amongst the tools it offers is a CSV validator that relies on the Table Schema specifications. These specifications fill a gap left by RFC 4180, which defines the CSV format itself but offers no structured way of defining the content of individual fields in terms of data types, formats and constraints. This was already reported as a clear benefit of the Frictionless specifications back in 2020, when a beta version of the CSV validator was launched.
Frictionless specifications are flexible while still allowing users to define unambiguously the expected content of a given field, which is why they were officially adopted to realise the validator for the Kohesio pilot phase of 2014-2020, Kohesio being the “Project Information Portal for Cohesion Policy”. The Table Schema specifications made it easy and convenient for the Interoperability Test Bed to establish constraints and describe the data to be validated concisely, starting from an initial set of CSV syntax rules and converting those written, mostly non-technical definitions into their Frictionless equivalent. Using simple JSON objects, Frictionless specifications allowed the ITB to enforce data validation in multiple ways, as can be observed from the schema used for the CSV validator. The following list highlights the core aspects of the Table Schema standard that were put to use:
- Dates can be defined with string formatting (e.g. `%d/%m/%Y` stands for day/month/year);
- Constraints can indicate whether a column can contain empty values or not;
- Constraints can also specify a valid range of values (e.g. `"minimum": 0.0` and `"maximum": 100.0`);
- Constraints can specify an enumeration of valid values to choose from (e.g. `"enum": ["2014-2020", "2021-2027"]`);
- Constraints can be specified in custom ways, such as with regular expressions for powerful string matching capabilities;
- Data types can be enforced for any column;
- Columns can be forced to adopt a specific name, and a description can be provided for each one of them.
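To make the list above more concrete, here is a minimal Python sketch, using only the standard library, of a Table Schema descriptor that combines those features, together with a toy validator that applies them to CSV text. The field names (`project_id`, `start_date`, `cofinancing_rate`, `programming_period`), the regular expression and the helper functions `cast` and `validate` are hypothetical. This is neither the ITB validator's code nor the API of any Frictionless library, and it covers only a small subset of the types and constraints the specification defines.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical descriptor: field names, pattern and values are illustrative
# assumptions, not the actual schema used by the ITB validator.
SCHEMA = {
    "fields": [
        {
            "name": "project_id",
            "description": "Project identifier",
            "type": "string",
            "constraints": {"required": True, "pattern": "[A-Z]{2}[0-9]{4}"},
        },
        {
            "name": "start_date",
            "description": "Start date as day/month/year",
            "type": "date",
            "format": "%d/%m/%Y",
            "constraints": {"required": True},
        },
        {
            "name": "cofinancing_rate",
            "description": "Co-financing rate in percent",
            "type": "number",
            "constraints": {"minimum": 0.0, "maximum": 100.0},
        },
        {
            "name": "programming_period",
            "description": "Programming period",
            "type": "string",
            "constraints": {"required": True, "enum": ["2014-2020", "2021-2027"]},
        },
    ]
}


def cast(field, raw):
    """Convert a raw cell to the field's declared type (only a few types handled)."""
    kind = field.get("type", "string")
    if kind == "number":
        return float(raw)
    if kind == "integer":
        return int(raw)
    if kind == "date":
        fmt = field.get("format", "default")
        # "default" means ISO 8601; otherwise the format is a strptime pattern.
        return datetime.strptime(raw, "%Y-%m-%d" if fmt == "default" else fmt).date()
    return raw  # strings are kept as-is


def validate(text, schema):
    """Return a list of error messages for CSV `text` checked against `schema`."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [f["name"] for f in schema["fields"]]
    # Column names must match the schema exactly and in order.
    if reader.fieldnames != expected:
        return [f"header {reader.fieldnames} does not match {expected}"]
    errors = []
    for line, row in enumerate(reader, start=2):  # line 1 is the header
        for field in schema["fields"]:
            name, c = field["name"], field.get("constraints", {})
            raw = row[name] or ""  # missing trailing cells come back as None
            if raw == "":
                if c.get("required"):
                    errors.append(f"line {line}: {name} is required")
                continue
            try:
                value = cast(field, raw)
            except ValueError:
                errors.append(f"line {line}: {name}={raw!r} is not a valid {field['type']}")
                continue
            if "minimum" in c and value < c["minimum"]:
                errors.append(f"line {line}: {name}={raw!r} is below {c['minimum']}")
            if "maximum" in c and value > c["maximum"]:
                errors.append(f"line {line}: {name}={raw!r} is above {c['maximum']}")
            if "enum" in c and value not in c["enum"]:
                errors.append(f"line {line}: {name}={raw!r} is not one of {c['enum']}")
            # Patterns apply to the whole raw string, hence fullmatch.
            if "pattern" in c and not re.fullmatch(c["pattern"], raw):
                errors.append(f"line {line}: {name}={raw!r} does not match {c['pattern']!r}")
    return errors


SAMPLE = (
    "project_id,start_date,cofinancing_rate,programming_period\n"
    "AB1234,01/03/2021,85.0,2021-2027\n"
    "ab12,2021-03-01,120,2028-2034\n"
)

for message in validate(SAMPLE, SCHEMA):
    print(message)
```

Running the sketch prints four errors, all for line 3 (pattern, date format, maximum and enumeration), while line 2 passes. Keeping the descriptor as plain data, separate from the checking code, mirrors the idea behind Table Schema: the same description of the data can drive any number of implementations.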
Because these specifications can be expressed as portable text files, they have been built into a multitude of tools for greater user convenience, and the validation process has been documented extensively. JSON code snippets from the documentation highlight that this format conveys all the necessary information in a readable manner and lets users extend the original specifications as needed. In this particular instance, the CSV validator can be used as a Docker image, as part of a command-line application, inside a web application and even as a SOAP API.
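As a small illustration of what “portable text file” means here, the following Python sketch writes a hypothetical one-field Table Schema descriptor to a JSON file and reads it back with nothing but the standard library. The file name `schema.json` and the field itself are assumptions chosen for the example; the point is only that the descriptor survives the round trip unchanged, so any tool that reads JSON can consume the same file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical single-field descriptor, for illustration only.
schema = {
    "fields": [
        {
            "name": "programming_period",
            "description": "Cohesion policy programming period",
            "type": "string",
            "constraints": {"required": True, "enum": ["2014-2020", "2021-2027"]},
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "schema.json"  # hypothetical file name
    # Serialise the descriptor as plain, human-readable JSON text.
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    text = path.read_text(encoding="utf-8")
    # Any JSON-aware tool can load the same file back unchanged.
    loaded = json.loads(text)

print(text)
assert loaded == schema
```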
Frictionless specifications were the missing piece of the puzzle that enabled the ITB to rely on a well-documented set of standards for their data validation needs. But there is more on the table (no pun intended): whether you need to manage files, tables or entire datasets, there are Frictionless standards to cover you. As the growing list of adopters and collaborations demonstrates, there are many use cases to make a data project shine with Frictionless.
Are you working on a great project that should become the next glowing star in the world of Frictionless Data? Feel free to reach out to spread the good news!