          Using Data Packages in Clojure

          May 7, 2018 by Matt Thompson

          Matt Thompson was one of 2017’s Frictionless Data Tool Fund grantees, tasked with extending the implementation of the core Frictionless Data data package and table schema libraries to the Clojure programming language. You can read more about this in his grantee profile. In this post, Thompson shows how to set up and use the Clojure libraries for working with Tabular Data Packages.

          This tutorial uses a worked example of downloading a data package from a remote location on the web, and using the Frictionless Data tools to read its contents and metadata into Clojure data structures.

          # Setup

          First, we need to set up the project structure using the Leiningen tool. If you don’t have Leiningen on your system, download and install it first. Once it is set up, run the following command from the command line to create the folders and files for a basic Clojure project:

          
          lein new periodic-table
          
          

          This will create the periodic-table folder. Inside the periodic-table/src/periodic_table folder (Leiningen replaces the hyphen with an underscore in directory names) should be a file named core.clj. This is the file you will edit during this tutorial.

          # The Data

          For this tutorial, we will use a pre-built data package, the Periodic Table Data Package hosted by the Frictionless Data project. A Data Package is a simple container format used to describe and package a collection of data. It consists of two parts:

          • Metadata that describes the structure and contents of the package
          • Resources such as data files that form the contents of the package
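
To make the two parts concrete, here is a minimal sketch of how a descriptor might look once it has been parsed into a Clojure map. The values below are illustrative assumptions, not a copy of the real periodic-table datapackage.json, and example-descriptor and field-names are hypothetical names. The keys (name, title, resources, path, schema, fields) follow the Data Package specification.

```clojure
;; A hand-written, illustrative descriptor. The values are assumptions,
;; not copied from the real periodic-table package.
(def example-descriptor
  {:name "periodic-table"
   :title "Periodic Table"
   :resources [{:name "data"
                :path "data.csv"
                :schema {:fields [{:name "atomic number" :type "integer"}
                                  {:name "symbol" :type "string"}
                                  {:name "name" :type "string"}
                                  {:name "atomic mass" :type "number"}
                                  {:name "metal or nonmetal?" :type "string"}]}}]})

;; Metadata: top-level keys describe the package itself.
(println (:title example-descriptor))

;; Resources: each entry points at a data file and describes its columns.
(def field-names
  (map :name (get-in example-descriptor [:resources 0 :schema :fields])))
(println field-names)
```

The metadata and the resources live in the same map, which is why a single descriptor is enough to both locate the data and interpret its columns.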

          Our Clojure code will download the data package and process it using the metadata contained in the package. The data package can be found on GitHub.

          The data package contains data about elements in the periodic table, including each element’s name, atomic number, symbol and atomic weight. The table below shows a sample taken from the first three rows of the CSV file:

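          | atomic number | symbol | name     | atomic mass | metal or nonmetal? |
          |---------------|--------|----------|-------------|--------------------|
          | 1             | H      | Hydrogen | 1.00794     | nonmetal           |
          | 2             | He     | Helium   | 4.002602    | noble gas          |
          | 3             | Li     | Lithium  | 6.941       | alkali metal       |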

          # Loading the Data Package

          The first step is to load the data package into a Clojure data structure (a map). To do this, we require the data package library in our code (giving it the alias dp) and then call its load function with the location of the package. Enter the following code into the core.clj file:

          (ns periodic-table.core
            (:require [frictionlessdata.datapackage :as dp]
                      [frictionlessdata.tableschema :as ts]
                      [clojure.spec.alpha :as s]))
          
          (def pkg
            (dp/load "https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"))
          

          This pulls the data in from the remote GitHub location and converts the metadata into a Clojure map. We can access this metadata by using the descriptor function along with keys such as :name and :title to get the relevant information:

          (println (str "Package name: " (dp/descriptor pkg :name)))
          (println (str "Package title: " (dp/descriptor pkg :title)))
          

          The package descriptor contains metadata that describes the contents of the data package. What about accessing the data itself? We can get to it using the get-resources function:

          (def table (dp/get-resources pkg :data))
          
          (doseq [row table]
            (println row))
          

          The above code locates the data in the data package, then goes through it line by line and prints the contents.
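
To get a feel for what each row contains without relying on the library, the sketch below parses the three sample rows shown earlier using plain Clojure. The csv-text string and the parse-rows helper are hypothetical names for illustration only. The naive comma split does not handle quoted fields, and this is not a description of how the datapackage library reads its data.

```clojure
(require '[clojure.string :as str])

;; The three sample rows from the tutorial, written out as CSV text.
(def csv-text
  "atomic number,symbol,name,atomic mass,metal or nonmetal?
1,H,Hydrogen,1.00794,nonmetal
2,He,Helium,4.002602,noble gas
3,Li,Lithium,6.941,alkali metal")

(defn parse-rows
  "Split CSV text into maps keyed by the header row.
  Naive: splits on commas, so quoted fields are not supported."
  [text]
  (let [[header & lines] (map #(str/split % #",") (str/split-lines text))]
    (map #(zipmap header %) lines)))

(doseq [row (parse-rows csv-text)]
  (println row))
```

Note that every value in these maps is still a string, including the atomic number and atomic mass.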

          # Casting Types with clojure.spec

          We can use Clojure’s spec library to define a schema for our data, which can then be used to cast the types of the data in the CSV file.

          Below is a spec description of a periodic element type, consisting of an atomic number, atomic symbol, the element’s name, its mass, and its classification as a metal or nonmetal:

          (s/def ::number int?)
          (s/def ::symbol string?)
          (s/def ::name string?)
          (s/def ::mass float?)
          (s/def ::metal string?)
          
          (s/def ::element (s/keys :req [::number ::symbol ::name ::mass ::metal]))
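
As a rough illustration of why casting is needed, the sketch below checks a raw CSV row against the spec before and after its values are converted. The cast-row helper, raw-row and uncast are hypothetical names introduced here. The manual conversion with Long/parseLong and Double/parseDouble is only one way to do it and is not necessarily how the Frictionless Data libraries cast values.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::number int?)
(s/def ::symbol string?)
(s/def ::name string?)
(s/def ::mass float?)
(s/def ::metal string?)
(s/def ::element (s/keys :req [::number ::symbol ::name ::mass ::metal]))

;; A row as it comes out of a CSV file: every value is still a string.
(def raw-row ["1" "H" "Hydrogen" "1.00794" "nonmetal"])

(defn cast-row
  "Hypothetical helper: turn one row of CSV strings into a typed map."
  [[number sym nm mass metal]]
  {::number (Long/parseLong number)
   ::symbol sym
   ::name   nm
   ::mass   (Double/parseDouble mass)
   ::metal  metal})

(def uncast (zipmap [::number ::symbol ::name ::mass ::metal] raw-row))
(def hydrogen (cast-row raw-row))

(println (s/valid? ::element uncast))    ; false: number and mass are strings
(println (s/valid? ::element hydrogen))  ; true
;; explain-str describes which values failed which predicate.
(println (s/explain-str ::element uncast))
```

Once a row passes s/valid?, numeric comparisons on its ::number and ::mass values behave as expected.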
          

          The above spec can be used to cast values in our tabular data so that they match the specified schema. The example below shows our tabular data values being cast to fit the spec description. Then the -main function loops through the elements, printing only those with an atomic mass under 10.

          (ns periodic-table.core
            (:require [frictionlessdata.datapackage :as dp]
                      [frictionlessdata.tableschema :as ts]
                      [clojure.spec.alpha :as s]))
          
          ;; Specs for the individual columns of the CSV file
          (s/def ::number int?)
          (s/def ::symbol string?)
          (s/def ::name string?)
          (s/def ::mass float?)
          (s/def ::metal string?)
          
          ;; An element is a map containing all five keys
          (s/def ::element (s/keys :req [::number ::symbol ::name ::mass ::metal]))
          
          ;; Load the data package from its remote descriptor
          (def pkg
            (dp/load "https://raw.githubusercontent.com/frictionlessdata/example-data-packages/62d47b454d95a95b6029214b9533de79401e953a/periodic-table/datapackage.json"))
          
          (def resources (dp/get-resources pkg :data))
          
          ;; Cast each row so that its values match the ::element spec
          (def elements (dp/cast resources ::element))
          
          ;; Print only the elements with an atomic mass under 10
          (defn -main []
            (doseq [e elements]
              (when (< (::mass e) 10)
                (println e))))
          

          When run, the program produces the following output:

          $ lein run
          {::number 1 ::symbol "H" ::name "Hydrogen" ::mass 1.00794 ::metal "nonmetal"}
          {::number 2 ::symbol "He" ::name "Helium" ::mass 4.002602 ::metal "noble gas"}
          {::number 3 ::symbol "Li" ::name "Lithium" ::mass 6.941 ::metal "alkali metal"}
          {::number 4 ::symbol "Be" ::name "Beryllium" ::mass 9.012182 ::metal "alkaline earth metal"}
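
One practical note on lein run: a project created with the default lein new template does not declare a main namespace, so Leiningen needs a :main entry in project.clj to know where -main lives. The fragment below is a sketch under that assumption. The Clojure version shown is an assumption, and the coordinates of the Frictionless Data libraries are deliberately left out because the original post does not list them.

```clojure
;; project.clj (sketch): the :main entry is the point of this example.
(defproject periodic-table "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 ;; plus the Frictionless Data libraries (coordinates not shown here)
                 ]
  ;; Tells `lein run` which namespace holds the -main function.
  :main periodic-table.core)
```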
          

          This concludes our simple tutorial for using the Clojure libraries for Frictionless Data.

          We welcome your feedback and questions via our Frictionless Data Gitter chat or via GitHub issues on the datapackage-clj repository.
