Is reproducing someone else’s research data a Frictionless experience?
The “reproducibility crisis” is a hot topic in scientific research these days. Can you reproduce published data from another laboratory? Can you follow the published scientific methods and get the same result? Unfortunately, the answer to these questions is often no.
One of the goals of Frictionless Data is to help researchers make their work more reproducible. To achieve this, we focus on making data more understandable (make sure to document your metadata!), of higher quality (via validation checks), and easier to reuse (by standardization and packaging).
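To make these three measures concrete, here is a minimal sketch using only the Python standard library. It documents a small table in a Data Package style descriptor (the content of a `datapackage.json` file) and then checks each cell against its declared type. The CSV content, the package and field names, and the `check_rows` helper are all hypothetical and exist only for illustration. The check is a hand-rolled stand-in for the idea of validation, not the goodtables or Frictionless implementation.

```python
import csv
import io
import json

# Hypothetical example data; not taken from any Fellow's dataset.
CSV_TEXT = "site,depth_m\nA,1.5\nB,not_measured\n"

# Metadata: a minimal descriptor that documents what each column means.
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "samples",
            "path": "samples.csv",
            "schema": {
                "fields": [
                    {"name": "site", "type": "string",
                     "description": "Sampling site code"},
                    {"name": "depth_m", "type": "number",
                     "description": "Depth in metres"},
                ]
            },
        }
    ],
}


def check_rows(csv_text, schema):
    """Return (row_number, field_name, value) for each cell that fails its type check."""
    # Simplified mapping from declared field types to Python casts (an assumption).
    casters = {"string": str, "number": float, "integer": int}
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start at 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            value = row[field["name"]]
            try:
                casters[field["type"]](value)
            except ValueError:
                errors.append((row_number, field["name"], value))
    return errors


errors = check_rows(CSV_TEXT, descriptor["resources"][0]["schema"])
print(json.dumps(descriptor, indent=2))  # what would be saved as datapackage.json
print(errors)  # [(3, 'depth_m', 'not_measured')]
```

Because the descriptor travels with the data, someone reusing the table can see that `depth_m` is meant to be numeric and that row 3 breaks that expectation, without having to ask the original author.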
As a test of these reproducibility measures, we tasked the Frictionless Fellows with reproducing each other’s data packages! This was a great learning experience for the Fellows and revealed some important lessons about how to make their data more (re)usable. Click on the blog links below to read more about their experiences!
# Reproduciendo un viaje a Mo’rea by Sele Yang (Cohort 1)
“My journey through Lily’s data took me to Mo’rea, French Polynesia, where she used a range of tools to collect a total of 175 interviews with residents and researchers in the region…To reproduce Lily’s data, I initially used the DataPackage Creator tool to load her raw data and begin reviewing the data type specifications that the tool had generated automatically.”
# Packaging Ouso’s Data by Lily Zhao (Cohort 1)
“This week I had the opportunity to work with my colleague’s data. He created a Datapackage which I replicated. In doing so, I learned a lot about the Datapackage web interface….Using these data Ouso and his co-authors evaluate the ability of high-resolution melting analysis to identify illegally targeted wildlife species.”
# Data Barter: Real-life data interactions by Ouso Daniel (Cohort 1)
“Exchanging data packages and working backwards from them is an important test in the illustration of the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, consequently promoting collaboration. By trying to reproduce Monica’s work I was able to capture an error, which I highlighted for her attention, thus improved the work. Exactly how science is supposed to work!”
# On README files, sharing data and interoperability by Anne Lee Steele (Cohort 2)
“One of the goals of the Frictionless Data Fellowship has been to help us make our research more interoperable, which is another way of saying: something that other researchers can use, even if they have entirely different systems or tools with which they approach the same topic….What if researchers of all types wrote prototypical “data packages” about their research, that gave greater context to their work, or explained its wider relevance? In my fields, many researchers tend to find this in ‘the art of the footnote’, but this type of informal knowledge or context is not operationalized in any real way.”
# Using Frictionless tools to help you understand open data by Dani Alcalá-López (Cohort 2)
“A few weeks ago, the fellows did an interesting exercise: We would try to replicate each other’s DataPackages in pairs. We had spent some time before creating and validating DataPackages with our own data. Now it was the time to see how would it be to work with someone else’s. This experience was intended to be a way for us to check how it was to be at the other side.”
# Validating someone else’s data! by Katerina Drakoulaki (Cohort 2)
“The first thing I did was to go through the README file on my fellow’s repository. Since the repository was in a completely different field, I really had to read through everything very carefully, and think about the terms they used….Validating the data (to the extent that it was possible after all) was easy using the goodtables tools.”
# Reproducing Jacqueline’s Datapackage and Revalidating her Data! by Sam Wilairat (Cohort 2)
“Using Jacqueline’s GitHub repository, Frictionless Data Package Creator, and Goodtables, I feel that I can confidently reuse her dataset for my own research purposes. While there was one piece of metadata missing from her dataset, her publicly published datapackage .JSON file on her repository helped me to quickly figure out how to interpret the unlabeled column. I also feel confident that the data is valid because after doing a visual scan of the dataset, I used the Goodtables tool to double check that the data was valid!”
# Reproducing a data package by Jacqueline Maasch (Cohort 2)
“Is it easy to reproduce someone else’s data package? Sometimes, but not always. Tools that automate data management can standardize the process, making reproducibility simpler to achieve. However, accurately anticipating a tool’s expected behavior is essential, especially when mixing technologies.”
# Validating data from Daniel Alcalá-López by Evelyn Night (Cohort 2)
“In a fast paced research world where there’s an approximate increase of 8-9% in scientific publications every year, an overload of information is usually fed to the outside world. Unfortunately for us, most of this information is often wasted due to the reproducibility crisis marred by data or code that’s often locked away. We explored the question, ‘how reproducible is your data?’ by exchanging personal data and validating them according to the instructions that are outlined in the fellows’ recent goodtables blogs.”