GoodTables is a managed service for validating tabular data. It can check both the structure of your data (e.g. all rows have the same number of columns) and its contents (e.g. all dates are valid). Internally, it reports problems using the Data Quality Spec, a catalogue of common tabular data errors. GoodTables also supports data described by Data Package and Table Schema.
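To make the idea of a content check concrete, here is a minimal Python sketch (3.7+) of checking cell values against a Table Schema-style descriptor. It is not GoodTables' implementation or API: the `SCHEMA` dictionary, its column names `id` and `published`, and the `check_contents` helper are hypothetical, and only two field types are handled (`integer`, and `date` in ISO 8601 `YYYY-MM-DD` form).

```python
import csv
import io
from datetime import date

# Hypothetical descriptor: the "fields"/"name"/"type" layout follows Table Schema,
# but the column names are made up for this sketch.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "published", "type": "date"},
    ]
}

# Only two types are supported here; each checker raises if the value is invalid.
CHECKERS = {
    "integer": int,
    "date": date.fromisoformat,  # expects YYYY-MM-DD and rejects impossible dates
}


def check_contents(text, schema):
    """Return (row_number, column_name, value) for every cell that fails its declared type."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in schema["fields"]:
            value = row.get(field["name"])
            try:
                CHECKERS[field["type"]](value)
            except (TypeError, ValueError):
                # ValueError: malformed value; TypeError: cell missing entirely (None)
                errors.append((row_number, field["name"], value))
    return errors
```

For example, `check_contents("id,published\n1,2021-02-28\n2,2021-02-30\n", SCHEMA)` flags row 3, because February 30 is not a real date, while the first data row passes.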
Let’s visit the GoodTables website and log in with GitHub to start validating our data.
In the dashboard, add a data source using GitHub (Amazon S3 is also supported, but we’re only covering GitHub here).
We need to create a GitHub repository to store our helloworld.csv file. Make sure you use the valid CSV from our example above.
Because our helloworld.csv contains valid, well-structured data, the results come back as valid, as seen in the image below.
Now, let’s replace it with invalid tabular data and see what the checks return:
As expected, this build fails because GoodTables detects several structural errors (“Blank Header”, “Missing Value”, and “Extra Value”).
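The three structural errors can be illustrated with a small Python sketch that compares every row against the header row. The `check_structure` helper is hypothetical and only loosely modelled on the Data Quality Spec error names; it is not how GoodTables itself is implemented. Blank rows, which the spec treats as a separate error, are simply skipped here.

```python
import csv
import io


def check_structure(text):
    """Return (error_name, row_number, column_number) tuples; rows and columns are 1-based."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    errors = []
    # Blank Header: a header cell that is empty or whitespace only
    for col, name in enumerate(header, start=1):
        if not name.strip():
            errors.append(("Blank Header", 1, col))
    for row_number, row in enumerate(body, start=2):
        if not row:
            continue  # blank row: a different error in the spec, ignored in this sketch
        if len(row) < len(header):
            # Missing Value: the row has fewer cells than the header
            for col in range(len(row) + 1, len(header) + 1):
                errors.append(("Missing Value", row_number, col))
        elif len(row) > len(header):
            # Extra Value: the row has more cells than the header
            for col in range(len(header) + 1, len(row) + 1):
                errors.append(("Extra Value", row_number, col))
    return errors
```

With the made-up input `"id,,name\n1,2021,Alice\n2,2021\n3,2021,Bob,extra\n"`, the sketch reports a Blank Header in column 2, a Missing Value in row 3 and an Extra Value in row 4, mirroring the kinds of failures the build shows.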
Additionally, here’s a video walkthrough of the steps outlined above.