# About
# Why we’re here
Data integration is essential, but it is currently painful and confusing. We want to bring simplicity and grace to the messy world of data.
We want building data pipelines to be as easy as playing with LEGO. We want a frictionless data ecosystem built on the Unix philosophy, where data flows with simplicity and grace across diverse tools and teams, enabled by the specs and patterns we have shared. We want everyone working with data to have access to these core patterns and tools.
We imagine a data ecosystem that is effortless, pure and clear: one where data flows everywhere, easily, like water.
# What we do
We offer specifications and tools, integrated into an overarching framework, that bring simplicity and ease to the data experience and transform it. We help you produce clean data efficiently and gain ongoing insights reliably. We help you standardize, package, validate, share, and publish your data.
# Who we are here for
We are here for the data managers and the data architects, the data wranglers and the data geeks. The people who work with data day to day – and those who design solutions for them. People who have moved beyond the quick hack in a notebook. People looking for the power of the command line but without the bulk and inelegance of big data tooling. They are data engineers and data scientists, researchers and technologists.
# What’s different about us?
We work with, add to and enhance your existing tooling (progressive enhancement). We’re lightweight and coherent. We have a big vision and lightweight tools. We cover the whole data journey to and from the end user – our patterns and tools can apply to Excel as well as Hadoop.
# Design Principles
# Focused
Each tool or pattern focuses on one part of the data chain, one specific feature (for example, packaging), and a few specific types of data (for example, tabular data).
# Web Oriented
Build for the web using formats that are web “native”, such as JSON, and that work naturally over HTTP, such as plain-text CSV (which can be streamed).
# Distributed
We design for a distributed ecosystem with no centralized, single point of failure or dependence.
# Open
Anyone should be able to freely and openly use and reuse what we build. Our community is open to everyone.
# Existing Tooling
Integrate as easily as possible with existing tools, both by building integrations and by designing for direct use. For example, we like CSV because everyone has a tool that can read it.
# Simple and Lightweight
Add the minimum, do the least required, keep it simple. For example: use the most basic formats, require only the most essential metadata, and keep nothing extraneous in the data.