Data packages with R

Dear all,

@samgta and I are in the process of creating a co-op called, dedicated to services around open data (training, helping NGOs to use open data for their lobbying, etc.).

One of the things we would be interested in is helping as many people as possible use open datasets. That would imply documenting and curating datasets. But of course this would be even better if not only we could do that, but if others could do it on their own too. So we would be interested in creating a set of tools that we, as well as anyone else, could use to document, curate and share datasets.

As an R adept, I’d be interested in using R to develop these tools (including GUIs, with Shiny for example). Compatibility with OKF’s Data Packages spec (or, at least to begin with, the Tabular Data Package spec) would also be an important feature. We would build on the existing ROpenSci package.

Now the R Consortium has a program to support, financially or otherwise, “Technical Initiatives, Community Events and Training to Support R User Community”. We are considering submitting an application, so we were wondering whether anyone here might be interested in working with us on it, or even whether the OKFN would like to take part in the submission.

Anyway, we’re still at the beginning of our thinking on this, so we’re very much open to remarks, criticisms, suggestions and so on.



Hi @joelgombin and @samgta,

First of all, congratulations on the initiative. These kinds of tools are very much needed, at least from my perspective, especially because of this point in @joelgombin’s first post:

“if others could do it on their own too.”

It seems current development is focused on Python, which is understandable since Python is widely used, but R deserves attention too, especially since curators need to see what they are doing as they work with the data. In that sense R is friendlier (you can view the data, view your changes, etc., while in Python you have to trust your instinct and skills :laughing:).

That said, even though I am not a technical user, if there’s anything I can do to help, please let me know. Congratulations again on the initiative, and good luck!!


Thanks @gsilvapt for your kind words! Much appreciated :wink:

To be clear, we are interested in creating two different kinds of tools:

  • tools for users with some technical background, who could benefit from dedicated R packages. Here the idea would be to ease and promote good practices and standards: the idea that data should be packaged is making its way into the R community these days, but the practices are not really standardised yet.
  • perhaps more importantly, tools for non-technical users, who couldn’t create a data package on their own. Here R would be the underlying engine, but the UI would have to be a GUI, ideally a friendly one. So I think we agree that the key here is to get as many people as possible to participate and to lower the barriers to participation!

Hi, thanks for raising this issue! The ROpenSci package is set for some major development work in the near future. Have you tried it out yet? Do you have real data and, specifically, real issues you want addressed when working with that data? At any rate, I’m PM’ing you :slight_smile:

Hi @joelgombin,
I do CKAN training and support and use R from time to time. It’s been really cool to see some of the R integrations. You probably saw Florian Mayer’s presentation about building a whole scientific workflow with CKAN and R: “Pyramids, Pipelines and a Can-of-Sweave” (CKAN Asia-Pacific Meetup).

Anyway, I would be happy to help out with the application if you’re still looking for volunteers. I’m going to try to be at the Labs hangout this evening in case you’re joining: Labs Hangout May 2016


Thanks @mattfullerton for your answer. I’ll try to join the hangout tonight, though I might not be able to (it’s my son’s bath time ;-))
In any case, as I was saying to @danfowler by PM, we could set up a hangout call once our own ideas and projects become clearer. At this stage we haven’t decided exactly which direction to go, as we are waiting both to get reactions from you guys and to be able to anticipate our workload over the next few months.
Thanks for the link to the presentation!

Hi all, I posted a very brief introduction to using Data Packages with R here: Using Data Packages with R - Open Knowledge Labs

Let me know what you think!