Watercooler - Core Datasets


Place for general discussion.

Core Data Curators - Introductions

Hurrah! We have our first new packaged core dataset thanks to sxren https://github.com/datasets/registry/issues/56


US CPI has been a bit neglected and needs a maintainer: https://github.com/datasets/registry/issues/64


Anyone in or near London? The ODI are having a meetup: https://www.eventbrite.co.uk/e/open-data-london-meet-up-tickets-15350802664. I don’t know if I can make it, but I’m going to try.


I’ve tentatively signed up. It means an overnight stay in London! If I can make it, I’ll come say hello.




Hi, I likely can’t make the ODI’s Open Data London event, but I will be running Open Data Maker London on Thursday 5th February (next Thursday):

This is also probably a better venue for working on the core datasets, as it is oriented more towards making than talking (plus I’ve made “core datasets” a theme).

@ekoner @EvilPhil hope you can make it!


I hope to make it, depending on my workshop timings. It would be great to speak to someone about the process. It’ll help tie everything together so I can get started contributing.




@ekoner great - the event runs 18:30 to 21:00 and it’s no problem if you do not turn up right at the start.

@EvilPhil how about you?


I don’t know. I will be in London, but for training, so I’ll see if I can make it. It would be good to meet everybody in person.


A question, if you allow - and I am not even sure this is the right forum.

When I work with datasets I try to note a few things

  • where did I get it (=source, url)
  • when did I get it and give an estimate when I need to refresh
  • note key steps and issues I encounter when cleaning up the data
  • gaps if any (= often)

I have not seen these in the introduction, but in my experience this information is very helpful. What do you think of adding it (as a comment in the package.json)?
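To make the four notes above concrete, here is a minimal Python sketch of how they might be recorded as a small JSON-friendly structure. The function name, the field names and the example URL are all made up for illustration; they are not taken from the Data Package docs or any existing tool.

```python
import json
from datetime import date, timedelta


def provenance_note(source_url, retrieved, refresh_days, cleaning_steps, gaps):
    """Collect the four kinds of note from the list above in one dict.

    All field names here are hypothetical, chosen only for this sketch.
    """
    return {
        "source": source_url,                    # where the data came from
        "retrieved": retrieved.isoformat(),      # when it was fetched
        # estimated date by which the data should be refreshed
        "refresh_by": (retrieved + timedelta(days=refresh_days)).isoformat(),
        "cleaning_notes": list(cleaning_steps),  # key steps and issues
        "known_gaps": list(gaps),                # gaps, if any
    }


note = provenance_note(
    source_url="https://example.org/data.csv",  # placeholder URL
    retrieved=date(2015, 1, 29),
    refresh_days=30,
    cleaning_steps=["dropped footer rows", "normalised dates to ISO 8601"],
    gaps=["no values for 1998"],
)
print(json.dumps(note, indent=2))
```

A structure like this could sit next to the data or be merged into the package metadata; which of those is preferable is exactly the open question in this post.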


@andreas this is exactly the right place to be asking. Note that you are also free to start a new “topic” in the forum (just put it in the core datasets category), which will give your question a dedicated thread.

First, these are great questions and we should be adding answers to them to the primary Data Packaging docs, e.g. http://data.okfn.org/doc/publish (and subpages), as we go (you can submit improvements to those pages btw!)


Looking for somewhere to make your next contribution? Here are some places to look:


Hi all, our amazing current managing editor @sxren has less time currently due to other commitments so we are looking for someone to step up and help coordinate activity on the core datasets queue: https://github.com/datasets/registry/issues

You’ll get one-on-one tutoring and support from me and there are plenty of people helping out :)


Perhaps a bit off topic, but as I’ve finally started working on the Irish House Prices index I’m wondering if anybody knows where I can get a free or very cheap Linux VM in the cloud? I’m hoping to run a monthly cron job and push updates to the price index to GitHub.
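One piece of such a monthly job can be sketched in Python: only overwrite the data file when the freshly fetched content actually differs, so that a run with no new figures produces nothing to commit. The function name and the crontab line in the comment are hypothetical; fetching the data and the git commit/push step are deliberately left out.

```python
import hashlib
from pathlib import Path

# A crontab entry along these lines would run a script at 06:00 on the
# first day of each month (script path is a placeholder):
#   0 6 1 * * python3 /path/to/update.py


def write_if_changed(path, new_bytes):
    """Write new_bytes to path only if the content differs.

    Returns True when the file was written, False when it was already
    up to date.
    """
    target = Path(path)
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        new_digest = hashlib.sha256(new_bytes).hexdigest()
        if old_digest == new_digest:
            return False  # nothing new this month, so nothing to commit
    target.write_bytes(new_bytes)
    return True
```

The return value tells the calling script whether a commit and push are worth attempting at all.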


@EvilPhil this is really interesting as this is a general need on this project - I wonder if we should start a dedicated thread on this (focused on what we would want in terms of scraping).

Let’s see if we can find a cheap VM which we could use for this sort of scraping: obviously there’s stuff like DigitalOcean, Linode, Dreamhost etc - but it would be nice to get something pro-bono maybe.

I should also flag Morph in case it fits your needs already: http://okfnlabs.org/blog/2014/03/22/morph.html


Hello @EvilPhil,

you might be interested in these small, cheap hosting options: