Watercooler - Core Datasets

rufuspollock · January 5, 2015, 10:11pm

Place for general discussion.

rufuspollock · January 11, 2015, 1:58pm

Hurrah! We have our first new packaged core dataset, thanks to sxren: CO2 PPM data · Issue #56 · datasets/awesome-data · GitHub

rufuspollock · January 16, 2015, 1:56pm

US CPI has been a bit neglected and needs a maintainer: US CPI data · Issue #64 · datasets/awesome-data · GitHub

EvilPhil · January 24, 2015, 9:48pm

Anyone in or near London? The ODI are having a meetup: Open Data London Meet Up Tickets, Thu 12 Feb 2015 at 18:15 | Eventbrite. I don’t know if I can make it, but I’m going to try.

ekoner · January 25, 2015, 11:05am

I’ve tentatively signed up. It means an overnight stay in London! If I can make it, I’ll come say hello.

Cheers

Edafe

rufuspollock · January 27, 2015, 12:25pm

Hi, I likely can’t make the ODI’s Open Data London event, but I will be running Open Data Maker London on Thursday 5th February (next Thursday):

http://attending.io/events/open-data-maker-london-feb-2015

This is also probably a better venue for working on the core datasets, as it is oriented more towards making than talking (plus I’ve made “core datasets” a theme).

@ekoner @EvilPhil hope you can make it!

ekoner · January 28, 2015, 6:31am

I hope to make it, depending on my workshop timings. It would be great to speak to someone about the process. It’ll help tie everything together so I can get started contributing.

Cheers

Edafe

rufuspollock · January 28, 2015, 7:51am

@ekoner great - the event runs 18:30 to 21:00 and it’s no problem if you do not turn up right at the start.

@EvilPhil how about you?

EvilPhil · January 31, 2015, 9:26pm

I don’t know. I will be in London, but for training. I’ll see if I can make it. It would be good to meet everybody in person.

andreas · February 5, 2015, 11:01pm

A question, if you allow - and I am not even sure this is the right forum.

When I work with datasets I try to note a few things:

where did I get it (=source, url)
when did I get it and give an estimate when I need to refresh
note key steps and issues I encounter when cleaning up the data
gaps if any (= often)

I have not seen these in the introduction, but in my experience this information is very helpful. What do you think of adding this? (As a comment in the package.json?)

rufuspollock · February 6, 2015, 9:35am

@andreas this is exactly the right place to be asking. Note that you are also free to start a new “topic” in the forum (just put it in the core datasets category) - which will mean your question gets a dedicated thread.

First, these are great questions and we should be adding answers to them to the primary Data Packaging docs e.g. http://data.okfn.org/doc/publish (and subpages) as we go (you can submit improvements to those pages btw!)

andreas:

where did I get it (=source, url)

ANS: this is supported by the sources field in datapackage.json - see http://dataprotocols.org/data-packages/#recommended-fields

when did I get it and give an estimate when I need to refresh

ANS: this we do not do since we assume this is part of management. That said, we are thinking about adding info for time series about periodicity. If that is what you are after, can I suggest you open a new topic on this so we can discuss it specifically?

note key steps and issues I encounter when cleaning up the data
gaps if any (= often)

ANS: This should be part of the ‘Preparation’ section of the README.md - see the Optional Extras section at bottom of http://data.okfn.org/doc/publish-tabular (note we should probably move these instructions somewhere better and more obvious!)
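
To make the two answers above concrete, here is a minimal Python sketch that writes the source into a sources entry of the descriptor and the cleaning notes into a Preparation section of the README. Everything specific in it is an assumption for illustration: the helper names build_descriptor and build_readme, the dataset name, and the example.com URL are hypothetical, and the keys used inside each source entry ("name" and "web") are assumed here, so check the linked spec for the key names in the version you target.

```python
import json


def build_descriptor(name, title, source_name, source_url):
    """Return a minimal descriptor dict with a single 'sources' entry."""
    return {
        "name": name,
        "title": title,
        # Assumed key names for a source entry; verify against the spec.
        "sources": [{"name": source_name, "web": source_url}],
    }


def build_readme(title, steps, gaps):
    """Return README.md text with a 'Preparation' section.

    steps: cleaning steps and issues met while preparing the data
    gaps:  known gaps in the data (may be empty)
    """
    lines = [title, "", "## Preparation", ""]
    lines += ["- " + step for step in steps]
    if gaps:
        lines += ["", "Known gaps:", ""]
        lines += ["- " + gap for gap in gaps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset and source, for illustration only.
    descriptor = build_descriptor(
        "example-prices",
        "Example prices",
        "Example Statistics Office",
        "http://example.com/prices.csv",
    )
    print(json.dumps(descriptor, indent=2))
    print(build_readme(
        "Example prices",
        ["Dropped header rows", "Converted dates to ISO 8601"],
        ["No data for 2009"],
    ))
```

The retrieval date and refresh estimate are deliberately left out of the descriptor, in line with the answer above that this is treated as part of management.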

rufuspollock · February 22, 2015, 10:07pm

Looking for somewhere to make your next contribution? Here are some places to look:

rufuspollock · April 9, 2015, 8:49pm

Hi all, our amazing current managing editor @sxren has less time at the moment due to other commitments, so we are looking for someone to step up and help coordinate activity on the core datasets queue: Issues · datasets/awesome-data · GitHub

You’ll get one-on-one tutoring and support from me, and there are plenty of people helping out.

EvilPhil · May 7, 2015, 8:33pm

Perhaps a bit off topic, but as I’ve finally started working on the Irish House Prices index, I’m wondering if anybody knows where I can get a free or very cheap Linux VM in the cloud? I’m hoping to run a monthly cron job and push updates to the price index to GitHub.

rufuspollock · May 8, 2015, 7:55am

@EvilPhil this is really interesting, as this is a general need on this project - I wonder if we should start a dedicated thread on this (focused on what we would want in terms of scraping).

Let’s see if we can find a cheap VM which we could use for this sort of scraping: obviously there’s stuff like DigitalOcean, Linode, Dreamhost etc - but it would be nice to get something pro-bono maybe.

I should also flag Morph in case it fits your needs already: Morph, a scraper platform for hackers and would be hackers - Open Knowledge Labs

Lexman · October 7, 2015, 3:42pm

Hello @EvilPhil,

You might be interested in these small, cheap hosting options:

https://www.kimsufi.com/fr/index.xml dedicated server at 6 €/month
High-performance Dedicated server Dedibox | Scaleway dedicated server at 6 €/month
https://www.scaleway.com/pricing/ cloud server at 0.0072 €/hr, i.e. 3.5 €/month. It’s ARM, not a normal Intel x86 computer, but it should be OK if you are running Node or Python scripts…