Data Packages
A Data Package is a simple container format used to describe and package a collection of data. The format provides a simple contract for data interoperability that supports frictionless delivery, installation and management of data.
Data Packages can be used to package any kind of data. At the same time, for specific common data types such as tabular data, the format supports important additional descriptive metadata, for example describing the columns and data types in a CSV file.
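As a rough illustration, the sketch below builds the descriptive metadata for a package holding a single CSV file and prints it as JSON. It assumes the descriptor is the JSON object conventionally stored as `datapackage.json`, using the `name`, `resources`, `path` and `schema` properties; the helper `make_descriptor` and every package, resource, file and column name are hypothetical.

```python
import json

def make_descriptor(package_name, csv_path, fields):
    """Build a minimal descriptor for a package holding a single CSV file.

    `fields` is a list of (column name, type) pairs; the resulting
    schema lists one Table Schema field per column.
    """
    return {
        "name": package_name,
        "resources": [
            {
                "name": "data",  # hypothetical resource name
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }

# Hypothetical package, file and column names, for illustration only.
descriptor = make_descriptor(
    "example-package", "data.csv", [("id", "integer"), ("label", "string")]
)

# The descriptor is plain JSON, so it stays readable and hand-editable.
print(json.dumps(descriptor, indent=2))
```

The data file itself is left untouched; only the descriptor carries the extra information about its columns.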
The following core principles inform our approach:
- Simplicity
- Extensibility and customisation by design
- Metadata that is human-editable and machine-usable
- Reuse of existing standard formats for data
- Language, technology and infrastructure agnostic
The Data Package Suite of Specifications
Over time, the single Data Package spec has evolved into a suite of specs, partly through componentization, in which the original spec was split into several components, and partly through extension.
The main specifications are:
- Data Package specification, a simple format for packaging data for sharing between tools and people
- Tabular Data Package, a format for packaging tabular data that builds on Data Package and which uses:
  - Table Schema, a specification for defining a schema for tabular data
  - CSV Dialect Description Format (CSV-DDF), a specification for defining a dialect for CSV data
How do these specifications relate?
A Data Package can "contain" any type of file. A Tabular Data Package is a type of Data Package specialized for tabular data; it "contains" one or more CSV files. In a Tabular Data Package, each CSV must have a schema defined using Table Schema and, optionally, a dialect defined using CSV-DDF. An application or library that consumes Tabular Data Packages must therefore understand not only the full Data Package specification but also Table Schema and CSV-DDF.
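The sketch below illustrates what this means for a consumer. It reads one CSV resource by applying its optional CSV-DDF dialect (`delimiter`, `quoteChar`, `header`) through Python's standard `csv` module, then casts each value according to the `type` of the matching Table Schema field. The helper `read_resource`, the `CASTERS` mapping, and the resource and column names are hypothetical, and only three Table Schema types are handled; a real implementation has to follow the full specifications.

```python
import csv
import io

# Hypothetical descriptor; in practice it would be loaded from datapackage.json.
descriptor = {
    "resources": [
        {
            "name": "items",
            "path": "items.csv",
            # Optional CSV-DDF dialect: this file uses ';' as its delimiter.
            "dialect": {"delimiter": ";", "header": True},
            # Required Table Schema describing each column.
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "label", "type": "string"},
                ]
            },
        }
    ]
}

# Casters for a few Table Schema types only; a real reader handles many more.
CASTERS = {"string": str, "integer": int, "number": float}

def read_resource(resource, csv_text):
    """Parse one tabular resource using its dialect, then cast via its schema."""
    schema = resource.get("schema")
    if schema is None:
        # A Tabular Data Package requires a schema for every CSV.
        raise ValueError(f"resource {resource.get('name')!r} has no schema")
    dialect = resource.get("dialect", {})  # fall back to CSV defaults
    reader = csv.reader(
        io.StringIO(csv_text),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
    )
    rows = list(reader)
    if dialect.get("header", True):
        rows = rows[1:]  # drop the header row
    fields = schema["fields"]
    return [
        {f["name"]: CASTERS[f["type"]](value) for f, value in zip(fields, row)}
        for row in rows
    ]

# Illustrative in-memory CSV content instead of reading items.csv from disk.
rows = read_resource(descriptor["resources"][0], "id;label\n1;alpha\n2;beta\n")
print(rows)
```

Keeping the dialect and the schema separate mirrors the split between the specifications: the dialect says how to tokenize the file, while the schema says what each token means.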
For more information on each specification, see below: