I’d like to present a tool useful for preparing datapackages: tuttle, a make for data.
When we write scripts to create data, we rarely get it right the first time. How many times have you had to comment out the beginning of a script so that execution jumps directly to the part you are fixing?
With tuttle, you won’t have to. First, it computes only what is necessary: for example, if a file has already been downloaded, it won’t be downloaded again. Moreover, when you change a line of code, tuttle knows exactly which data must be invalidated and which part of the code must be run again.
This brings fluidity and repeatability when you work on your own. But it is also very useful for teamwork: when you merge code, pull scripts modified by someone else, or use a continuous integration system, you don’t need to wonder which data is no longer valid: tuttle works it out for you.
Moreover, you can follow the progress of the computation in a report like this one: http://stuff.lexman.org/s-and-p-500/scripts/.tuttle/report.html
You can use any language or arbitrary tool (like an xls-to-csv converter or a git command), but tuttle already has built-in support for shell, batch, Python, and SQL for SQLite.
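To give a feel for what this looks like, here is a rough sketch of a tuttlefile with two chained rules. The file names and commands are invented for illustration, and I am writing the syntax from memory, so please check the tutorial below for the exact details. The idea is that each rule declares its outputs on the left of `<-`, its inputs on the right, optionally a processor after `!`, and then the indented code that produces the outputs:

```
# Rule 1: convert a spreadsheet to csv (default shell processor).
# "in2csv" is just an example command, not part of tuttle.
file://prices.csv <- file://prices.xls
    in2csv prices.xls > prices.csv

# Rule 2: post-process the csv with the python processor.
file://prices_clean.csv <- file://prices.csv ! python
    import csv
    with open('prices.csv') as src, open('prices_clean.csv', 'w') as dst:
        for row in csv.reader(src):
            csv.writer(dst).writerow(row)
```

If `prices.xls` hasn’t changed, neither rule runs again; if you edit only the second rule, only `prices_clean.csv` is invalidated and recomputed.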
If you’re interested in this fluent way of working on data, tuttle’s tutorial explains in detail how to use it: https://github.com/lexman/tuttle/blob/master/doc/tutorial_musketeers/tutorial.md .
You can also have a look at how I translated one of the core packages (s-and-p-500) with a tuttlefile: https://github.com/lexman/s-and-p-500/blob/tuttle/scripts/tuttlefile . It runs every hour on my server, so whenever the xls file changes, tuttle handles the whole datapackage update and even pushes the data back to GitHub.
Tuttle is a tool for collaborating on data the way we collaborate on code, so it might be of interest to the Open Knowledge Foundation.
Hope you will enjoy it,