Tools, Integrations, Libraries, and Platforms
Tools and Integrations
Data Package Viewer (service)
View Data Package metadata in human-readable form.
Good Tables (service)
Continuous data validation, as a service: http://goodtables.io/
Good Tables (service)
A web service to validate and process tabular data: http://goodtables.okfnlabs.org/
Data Quality Dashboard
Data Quality Dashboards display statistics on a collection of published data.
Data Package Pipelines
Framework for processing data packages in pipelines of modular components.
Data Package Manager (dpm)
Data Package Manager.
DataPackagist is a web service for creating Data Packages: http://datapackagist.openknowledge.io/
A desktop CSV editor for data publishers: http://comma-chameleon.io/
A set of functions written in M for working with Tabular Data Packages in Power BI Desktop and Power Query (also known as ‘Get & Transform’) in Excel.
Python language parser for a tabular format for structured metadata: http://metatab.org/
CSV Lint is a web service for validating tabular data: http://csvlint.io/
An easy interface for documenting data packages.
A simple JavaFX application to load, save, and edit a CSV file, with a JSON configuration that defines checks for the values in each column: http://frosch95.github.io/SmartCSV.fx/
Create simple APIs from CSV files.
The Data Retriever uses the Data Package format internally. It is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it: http://www.data-retriever.org/
BIML Enabled Tabular Data Package Importer
BIML (Business Intelligence Markup Language) project that uses datapackage.json to generate SSIS packages that can load the contents of a Tabular Data Package into a SQL Server database.
The following libraries implement at least a subset of the Frictionless Data stack: http://specs.frictionlessdata.io/implementation/.
A Python library for working with Data Packages.
A Python library for working with Table Schema.
Table Schema to SQL module for jsontableschema-py.
Table Schema to BigQuery module for jsontableschema-py.
Table Schema to Pandas module for jsontableschema-py.
Validate and process tabular data in Python.
Consistent interface for stream reading and writing tabular data (csv/xls/json/etc).
Data Package reader for Pandas.
Create an ERD for a database given as Table Schema.
Create Table Schema from a live PostgreSQL database.
CSVDDF support for Python.
R Data Package Library.
R Data Package Manager.
R Open Data Protocols Library.
Ruby library and tools for working with Data Packages.
A Ruby library for working with Table Schema.
A Ruby gem for validating the syntax and contents of CSV files.
Work with tabular data packages: download, load, or query datasets using SQL via ActiveRecord, so it works with any SQL database (defaults to an in-memory SQLite database).
A validator and storage library for working with Table Schema.
Provides struct specifications for Data Package as well as a command line tool to create Data Packages.
A function to read data from a Tabular Data Package is available for download from MATLAB Central’s File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/47506-read-tabular-data-package
Data Packages are currently being published by the following repositories:
The Fiscal Data Package is the native format for datasets published on OpenSpending.
Data.world provides all datasets as Data Packages.
Open Power System Data
Open Power System Data develops a free-of-charge platform for open data dedicated to electricity system researchers.
datahub.io (and other CKAN instances)
All datasets on datahub.io can be exported as Data Packages. Other CKAN instances can install the extension to gain this feature.
Central de Dados
Central de Dados is a repository of Open Data in Portugal.
Octopub provides a platform to publish CSV data on an automatically created webpage.
HarvestChoice publishes its bulk agricultural data as zipped Data Packages.
Dataship is a way to share data and analysis, from simple charts to complex machine learning, with anyone in the world easily and for free.
Tesera publishes a variety of Data Package-aware tools.
Case Study: http://frictionlessdata.io/case-studies/tesera/
Python Test Suite
Perl module (preliminary)
Import for Google Spreadsheets
Import Tabular Data Packages into Google Spreadsheets.
The default representation of a Data Package is JSON. However, for convenience, users may wish to represent Data Package metadata in other formats.
All the tools, libraries, and platforms above assume the JSON representation.
JSON media types registered with IANA.
CSVY uses a YAML version of the Table Schema convention.
Metatab uses a tabular representation for metadata (which can be translated to Data Package metadata).
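For comparison with the alternative representations above, here is a minimal sketch of the default JSON representation, built with only the Python standard library. The package name, file path, and field names are hypothetical placeholders, and the helper `build_descriptor` is an illustration rather than part of any library listed on this page. It assumes the common descriptor layout of a `resources` array whose entries carry a `path` and a Table Schema `fields` list.

```python
import json


def build_descriptor(name, csv_path, fields):
    """Return a minimal Tabular Data Package descriptor as a dict.

    `fields` is a list of (field_name, field_type) pairs; all values
    passed in below are illustrative placeholders.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name + "-data",
                "path": csv_path,
                # Table Schema: one entry per column in the CSV file.
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


descriptor = build_descriptor(
    "example-package",  # hypothetical package name
    "data.csv",         # hypothetical path to the data file
    [("id", "integer"), ("label", "string")],
)

# Serialize to the text that would be saved as datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Writing `descriptor_json` to a file named `datapackage.json` next to the data file gives the layout that the JSON-based tools above expect; the libraries listed earlier can then be used to validate it.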