Hi all!
I’m a newcomer looking for some orientation in this crazy and exciting world of open data. As a coder just dipping my toes into civic hacking, I see so much unrealized potential here – it seems that a bit of elbow grease invested in UX, for the benefit of non-technical folks, could go a long way toward making open data more accessible. At least, that’s what I experienced first-hand at a civic tech hackathon I attended yesterday.
As an outsider, I’m interested in your comments (especially where I’m wrong about things).
It appears to me that the discourse around open data is dominated by publishers and academics, with some overlap. The former tend to dump unstructured data on their sites (sometimes CKAN, sometimes custom); the latter tend to philosophize about metadata schemas, ontologies, curation strategies, and so on. Meanwhile, third-party users quietly muddle through with CKAN APIs or scraping scripts – not a problem for those who already know exactly which dataset they need.
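To make that “muddling through” concrete: for those of us who can code, querying a portal is a few lines against CKAN’s standard action API. A quick sketch (demo.ckan.org is CKAN’s public demo instance, standing in for any portal; the search term is arbitrary):

```ts
// Search a CKAN portal via its action API (same endpoint on every CKAN site).
// demo.ckan.org is just CKAN's public demo; swap in any city/state portal.
async function searchCkan(baseUrl: string, query: string): Promise<string[]> {
  const url = `${baseUrl}/api/3/action/package_search?q=${encodeURIComponent(query)}`;
  const body = await (await fetch(url)).json();
  if (!body.success) throw new Error("CKAN API error");
  // Each result is a dataset record with its attached resources (files/APIs).
  return body.result.results.map((d: any) => d.name);
}

searchCkan("https://demo.ckan.org", "education").then(console.log);
```

Easy enough for us; a non-starter for everyone else.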
However, a huge potential open data audience of researchers and government policy analysts is left out of the discussion. They don’t have the technical background; they don’t want to sift through dozens of city/state/federal/other sites to find data; and even when they do find it, it’s hard for them to add the structural info that their visualization tools need to properly graph the numbers. And those are exactly the people we most want making good use of the data, no?
Think: government workers with a legal or economics background, needing to quickly understand dozens of datasets from different places as they try to make evidence-based policy recommendations. Why, in 2018, can’t they open up their BI software, go “File → Import Data → Search on Datahub”, and have 99% of the world’s open datasets at their fingertips? Just like I, as a coder, have been able to type `npm install [any-javascript-lib-in-the-world]` without having to think about the where and how, since 2010? Quantity is key here, not quality, as long as basic structural info has been extracted.
Socrata has made some inroads here. opendatanetwork.com is the only open-data platform I know of that comes close to being actually useful for casual consumers looking for specific information. But it’s closed-source. It’s US-only. It doesn’t let anyone else upload anything. No datapackage.jsons, only raw downloads or an API. And it only has structural info because the Socrata platform seems to force publishers to include it.
Do you think it is at all feasible to turn e.g. datahub.io into the open equivalent of opendatanetwork.com, with the help of volunteer civic hackers? I.e., a public registry of semi-automatically generated datapackage.json files that just link to the actual data, wherever it is hosted? So that one day I will be able to type `data pull zimbabwe-harare-highschool-gradrates-2012` in my terminal, or go to Tableau and click Import Data → Search Datahub → NOAA 11404 Hydrographic Survey? Again, it doesn’t have to be “core data” quality; it just has to work.
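To illustrate what “semi-automatically generated” could mean, here is a rough sketch: point a script at a CSV, guess column types from one sample row, and emit a datapackage.json that merely links to the data. Everything in it (the URL, the crude type inference) is hypothetical; a real harvester would need to be much smarter, but even this level of structural info would let a BI tool graph the numbers:

```ts
// Rough sketch: derive a minimal datapackage.json for a remote CSV.
// The URL is made up; a real harvester would sample many rows and
// handle quoting, encodings, and non-CSV formats.
async function inferDataPackage(name: string, csvUrl: string) {
  const text = await (await fetch(csvUrl)).text();
  const [headerLine, firstRow = ""] = text.split(/\r?\n/);
  const headers = headerLine.split(",");
  const sample = firstRow.split(",");
  const fields = headers.map((h, i) => ({
    name: h.trim(),
    // One-row type guess: the "basic structural info" mentioned above.
    type: sample[i] !== "" && !isNaN(Number(sample[i])) ? "number" : "string",
  }));
  // The package only *links* to the data; hosting stays wherever it already is.
  return { name, resources: [{ path: csvUrl, format: "csv", schema: { fields } }] };
}

inferDataPackage("zimbabwe-harare-highschool-gradrates-2012",
                 "https://example.org/gradrates-2012.csv")
  .then(pkg => console.log(JSON.stringify(pkg, null, 2)));
```

The registry itself would then just be a searchable pile of these files.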
I look forward to your input!
Sebastian