Developer Guide

This guide introduces you to the Frictionless Data tool stack and how you can contribute to it.

Asking questions and getting help

If you have a question or want help, the best way to get assistance is to join our public chat channel and ask there -- prompt responses are guaranteed:

https://gitter.im/frictionlessdata/chat

Example and Test Data Packages

We have prepared a variety of example and test data packages for use in development:

Key Concepts and Pre-requisites

This entity diagram gives an overview of how the main objects fit together. The top row is a generic Data Package and the row below shows the case of a Tabular Data Package.

This guide will focus on Tabular Data Packages, as that is the most commonly used form of Data Package and is suited to most tools.

This guide assumes you already have some high-level familiarity with the Data Package family of specifications. Please take a few minutes to look at the overview if you are not yet familiar with those specs.

Implementing a Data Package Tool Stack

Here's a diagram that illustrates some of the core components of a full Data Package implementation.

The italicised items indicate functionality that is less important and is often not included in implementations.

General Introduction

As a developer, the primary actions you want to support are:

  • Importing data (and metadata) in a Data Package into your system
  • Exporting data (and metadata) from your system to a Data Package

Additional actions include:

  • Creating a Data Package from scratch
  • Validating the data in a Data Package (is the data what it says it should be?)
  • Validating the metadata in a Data Package
  • Visualizing the Data Package
  • Publishing the Data Package to an online repository

Programming Language

This is example pseudo-code for a Data Package library in a programming language like Python or Javascript.

Importing a Data Package

# location of the Data Package, e.g. a URL or a path on disk
var location = '/my/data/package/'

# this "imports" the Data Package providing a native DataPackage object to work with
# Note: you usually will not load the data itself
var myDataPackage = new DataPackage(location)
var myDataResource = myDataPackage.getResource(indexOrNameOfResource)

# this would return an iterator over row objects if the data was in rows
# optional support for casting data to the right type based on Table Schema
var dataStream = myDataResource.stream(cast=True)

# instead of an iterator you may want simply to convert to native structured data type
# for example, in R where you have a dataframe you would do something like
var dataframe = myDataResource.asDataFrame()
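
To make the import flow above concrete, here is a minimal Python sketch that uses only the standard library. It is not the API of any Frictionless Data library: the helper names (load_descriptor, get_resource, stream_rows), the CASTERS table and the sample package contents are all assumptions made for illustration, and only CSV resources with a few field types are handled.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical casters for a few Table Schema field types (not exhaustive).
CASTERS = {"integer": int, "number": float, "string": str}


def load_descriptor(package_dir):
    """Read datapackage.json from a directory and return it as a dict."""
    with open(Path(package_dir) / "datapackage.json", encoding="utf-8") as f:
        return json.load(f)


def get_resource(descriptor, index_or_name):
    """Look up a resource descriptor by position or by name."""
    resources = descriptor["resources"]
    if isinstance(index_or_name, int):
        return resources[index_or_name]
    return next(r for r in resources if r["name"] == index_or_name)


def stream_rows(package_dir, resource, cast=True):
    """Yield each CSV row as a dict, optionally cast using the Table Schema."""
    types = {f["name"]: f.get("type", "string") for f in resource["schema"]["fields"]}
    with open(Path(package_dir) / resource["path"], newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if cast:
                row = {k: CASTERS.get(types.get(k, "string"), str)(v) for k, v in row.items()}
            yield row


# Build a tiny example package in a temporary directory so the sketch runs as-is.
package_dir = Path(tempfile.mkdtemp())
(package_dir / "data.csv").write_text("id,price\n1,9.5\n2,12.0\n", encoding="utf-8")
(package_dir / "datapackage.json").write_text(json.dumps({
    "name": "example",
    "title": "My Amazing Data",
    "resources": [{
        "name": "prices",
        "path": "data.csv",
        "schema": {"fields": [
            {"name": "id", "type": "integer"},
            {"name": "price", "type": "number"},
        ]},
    }],
}), encoding="utf-8")

descriptor = load_descriptor(package_dir)
resource = get_resource(descriptor, "prices")
rows = list(stream_rows(package_dir, resource))
print(descriptor["title"], rows)
```

Note that loading the descriptor does not read the data itself; rows are only read when the stream is consumed.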

Accessing metadata

# Here we access the Data Package metadata
# the exact accessor structure is up to you - here it is an attribute called
# descriptor that acts like a dictionary
print(myDataPackage.descriptor['title'])

Exporting a Data Package

A simple functional style approach that gets across the idea:

# e.g. a location on disk
var exportLocation = '/path/to/where/data/package/should/go'
export_data_package(nativeDataObject, exportLocation)

A more object-oriented model fitting with our previous examples would be:

var myDataPackage = export_data_package(nativeDataObject)
myDataPackage.save(exportLocation)

# if the native data is more like a table of data then you might have
var myDataPackage = new DataPackage()
myDataPackage.addResourceFromNativeDataObject(nativeDataObject)

# once exported, save it to disk
myDataPackage.saveToDisk(path)

# You can also provide access to the Data Package datapackage.json
# That way clients of your library can decide how they save this themselves
var readyForJSONSaving = myDataPackage.dataPackageJSON()
saveJson(readyForJSONSaving, '/path/to/save/datapackage.json')
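
As a rough illustration of the functional-style export, the following Python sketch writes a list of dictionaries to a directory containing data.csv and datapackage.json. The function body, the infer_type helper, the file names and the sample data are assumptions for illustration only, and the type inference is deliberately crude.

```python
import csv
import json
import tempfile
from pathlib import Path


def infer_type(values):
    """Guess a Table Schema type from Python values (a deliberately crude rule)."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "number"
    return "string"


def export_data_package(rows, export_location, name="exported-data"):
    """Write rows (a non-empty list of dicts) as data.csv plus datapackage.json."""
    export_dir = Path(export_location)
    export_dir.mkdir(parents=True, exist_ok=True)
    field_names = list(rows[0].keys())

    with open(export_dir / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=field_names)
        writer.writeheader()
        writer.writerows(rows)

    descriptor = {
        "name": name,
        "resources": [{
            "name": "data",
            "path": "data.csv",
            "schema": {"fields": [
                {"name": n, "type": infer_type([r[n] for r in rows])}
                for n in field_names
            ]},
        }],
    }
    (export_dir / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )
    return descriptor


# Hypothetical native data: a list of dicts standing in for your system's own structure.
native_data = [
    {"city": "Berlin", "population": 3645000},
    {"city": "Paris", "population": 2161000},
]
export_location = Path(tempfile.mkdtemp()) / "my-package"
descriptor = export_data_package(native_data, export_location)
```

Returning the descriptor as a plain dictionary mirrors the last idea above: callers can save or post-process the metadata however they like.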

Creating a Data Package from scratch

var myMetadata = {
  title: 'My Amazing Data'
}
var myDataPackage = new DataPackage(myMetadata)

Data Validation

This is Tabular Data specific.

var resource = myDataPackage.getResource()
# check the data conforms to the Table Schema
resource.validate()

# a more explicit version might look like
var schema = resource.schemaAsJSON()
var tsValidator = new TSValidator(schema)
# validate a data stream
tsValidator.validate(resource.stream())
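
One way to picture what validating a data stream involves is the Python sketch below, which checks each row against the field types in a Table Schema and collects errors instead of stopping at the first one. The validate_rows function, the small set of supported types and the sample rows are assumptions for illustration; a real validator would also need to handle constraints, formats and the remaining field types.

```python
def validate_rows(schema, rows):
    """Return (row_number, field_name, message) tuples for values that fail to cast."""
    checkers = {"integer": int, "number": float, "string": str}
    errors = []
    for row_number, row in enumerate(rows, start=1):
        for field in schema["fields"]:
            name = field["name"]
            field_type = field.get("type", "string")
            if name not in row:
                errors.append((row_number, name, "missing value"))
                continue
            try:
                # Casting is used as the check: a failed cast means an invalid value.
                checkers.get(field_type, str)(row[name])
            except (TypeError, ValueError):
                errors.append((row_number, name, f"not a valid {field_type}"))
    return errors


schema = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "price", "type": "number"},
]}
rows = [
    {"id": "1", "price": "9.5"},
    {"id": "two", "price": "12.0"},
    {"id": "3"},
]
errors = validate_rows(schema, rows)
for error in errors:
    print(error)
```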

Validating Metadata
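
As a hedged sketch of what metadata validation could look like, the Python function below checks a descriptor for a few structural rules. The rules used here (a non-empty resources list, and each resource having a name and exactly one of path or data) are a simplification assumed for illustration; the specifications themselves are the authoritative source, and the function name validate_descriptor is hypothetical.

```python
def validate_descriptor(descriptor):
    """Return a list of human-readable problems found in a descriptor dict."""
    errors = []
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        errors.append("descriptor needs a non-empty 'resources' list")
        return errors
    for index, resource in enumerate(resources):
        if not isinstance(resource, dict):
            errors.append(f"resource {index} is not an object")
            continue
        if "name" not in resource:
            errors.append(f"resource {index} has no 'name'")
        # Assumed rule: a resource points at its data via 'path' or 'data', not both.
        if ("path" in resource) == ("data" in resource):
            errors.append(f"resource {index} needs exactly one of 'path' or 'data'")
    return errors


good = {"title": "My Amazing Data", "resources": [{"name": "prices", "path": "data.csv"}]}
bad = {"title": "My Amazing Data"}
print(validate_descriptor(good))
print(validate_descriptor(bad))
```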

Specific Software and Platforms

For a particular tool or platform, usually all you need is a simple import or export:

# import into SQL (implemented in some language)
import_datapackage_into_sql(pathToDataPackage, sqlDatabaseInfo)

# import into Google BigQuery
import_datapackage_into_bigquery(pathToDataPackage, bigQueryInfo)
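
To show the shape such an import function might take, here is a minimal Python sketch that loads rows into SQLite using the standard sqlite3 module. The function name import_rows_into_sql, the SQL_TYPES mapping and the sample data are assumptions for illustration; it takes already-loaded rows and a Table Schema rather than a path to a Data Package, and it is not the interface of any existing Frictionless Data library.

```python
import sqlite3

# Assumed mapping from a few Table Schema types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "number": "REAL", "string": "TEXT"}


def import_rows_into_sql(connection, table_name, schema, rows):
    """Create a table from a Table Schema and insert the rows into it."""
    names = [field["name"] for field in schema["fields"]]
    columns = ", ".join(
        f'"{field["name"]}" {SQL_TYPES.get(field.get("type", "string"), "TEXT")}'
        for field in schema["fields"]
    )
    placeholders = ", ".join("?" for _ in names)
    # Table and column names are interpolated directly, so they must come
    # from a trusted descriptor; row values go through bound parameters.
    connection.execute(f'CREATE TABLE "{table_name}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})',
        [tuple(row[name] for name in names) for row in rows],
    )
    connection.commit()


schema = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "price", "type": "number"},
]}
rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": 12.0}]

connection = sqlite3.connect(":memory:")
import_rows_into_sql(connection, "prices", schema, rows)
result = connection.execute("SELECT id, price FROM prices ORDER BY id").fetchall()
print(result)
```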

Examples

Python

The main Python library for working with Data Packages is datapackage:

See http://github.com/frictionlessdata/datapackage-py

Additional functionality such as Table Schema (TS) support and TS integration:

tabulator is a utility library that provides a consistent interface for reading tabular data:

https://github.com/frictionlessdata/tabulator-py

Here's an overview of the Python libraries available and how they fit together:

[Figure: Python libraries]

Javascript

Following "Node" style we have partitioned the Javascript library into pieces, see this list of libraries:

SQL Integration

Coming soon: walk-through of Python SQL integration.

Related blog post: http://okfnlabs.org/blog/2017/10/05/frictionless-data-specs-v1-updates.html
