Frictionless Data is about removing the friction in working with data. We are doing this by developing a set of tools, standards, and best practices for publishing data. The heart of Frictionless Data is the Data Package standard, a containerization format for any kind of data based on existing practices for publishing open-source software.
Our work building and deploying CKAN, and our exposure to many data publication workflows, has taught us that there is too much friction in working with data. The frictions we seek to remove—in getting, sharing, and validating data—stop people from truly benefiting from the wealth of data being opened up every day. This kills the cycle of find/improve/share that makes for a dynamic and productive data ecosystem.
The idea behind Frictionless Data is to decouple data publishers from their consumers by creating a common interchange format: publishers can publish their data as a Data Package, and consumers can load that Data Package into their favorite tool. Without Data Packages, every tool for working with data has to support an ever-increasing number of formats for import and export.
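To make the interchange idea concrete: a Data Package is, at its core, a `datapackage.json` descriptor sitting alongside the data files it describes. The sketch below builds a minimal descriptor with Python's standard library; the package and file names are placeholders, not part of any real dataset.

```python
import json

# A minimal Data Package descriptor following the spec's core shape:
# a package name plus a list of resources pointing at data files.
# "example-data" and "data.csv" are hypothetical placeholder names.
descriptor = {
    "name": "example-data",
    "title": "Example Data Package",
    "resources": [
        {
            "name": "data",
            "path": "data.csv",
            "format": "csv",
        }
    ],
}

# Publishers ship this as datapackage.json next to the data files;
# consumers read it to discover and load the resources with their tool
# of choice, without needing to support each publisher's raw format.
print(json.dumps(descriptor, indent=2))
```

Because the descriptor is plain JSON, any tool in any language can read it; that is what lets publishers and consumers stay decoupled.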
We have been working on these and similar issues for nearly a decade, and we think that the time is right for frictionless data. Help us get there.
We see our approach as analogous to standardization efforts in the transport of physical goods. Historically, loading goods on a cargo ship was slow, manual, and expensive. The solution to these issues came in the form of containerization, the development of several ISO standards specifying the dimensions of containers used in global shipping. Containerization dramatically reduced the cost and time required for transporting goods by enabling the automation of several elements of the transport pipeline.
We consider transporting data between and among tools today to be comparable to shipping physical goods in the pre-containerization era. Before you can properly begin an analysis of your data or build a data-intensive app, you have to extract, clean, and prepare your data: procedures that are often slow, manual, and expensive. Radical improvements in data logistics—through specialisation and standardisation—can get us to a world where we spend less time sorting through and cleaning data and more time creating useful insight.