DataPortals.org: An experiment in community data curation
This post gives an update on DataPortals.org, sets out some next steps and solicits support to make this happen.
Interested in helping update DataPortals.org or curating contributions? That’s awesome: please just leave a comment below and we’ll get back to you.
Rufus @rufuspollock & Jonathan @jwyg
Situation
http://dataportals.org/ is an online listing of (open) data portals, created in 2011 by Open Knowledge International at the initiative of Jonathan Gray and supported by LOD2. There is a clear demand for such a site and for rich metadata on portals: for example, on a recent visit to China I met a university group compiling a database of portals in China. No other community-maintained database offers content of comparable quality.
A November 2015 refactor moved the database from Google Docs to a CSV file, which powers a Node web app running on Heroku. Adding or updating information is cumbersome: users must open a GitHub issue (Submit - Data Portals), which is then reviewed and merged into the CSV by hand. This change was made to deal with a growing spam problem.
As of March 2018 the process has no curator. The previous curator, @todrobbins, has stepped down after two years of sterling service (huge thank-you, Tod!), and there are a number of unprocessed submissions: Issues · okfn/dataportals.org · GitHub
Complication
This is a valuable resource, but it is not up to date, it lacks a current maintainer, and it is hard for people to add to or modify the database. Adding a portal currently takes several steps, including significant manual work by an expert curator to convert the GitHub issue for a new portal into a new line in the CSV (the portal data arrives as unstructured text in the issue). There is also no obvious path for updating existing information: Contribute - Data Portals (you can open a PR against the CSV if you know how).
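To illustrate the manual step that could be automated, here is a minimal Python sketch that turns an issue body into a CSV line. It assumes submitters write one `Field: value` pair per line; the column names in `COLUMNS` are hypothetical and may not match the real portals CSV.

```python
import csv
import io
import re

# Hypothetical column order; the real portals CSV may use different columns.
COLUMNS = ["title", "url", "country", "publisher", "description"]

# Matches lines such as "URL: https://data.example.org" (key may not contain ':').
FIELD_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def parse_issue_body(body: str) -> dict:
    """Extract 'Key: value' lines whose lower-cased key is a known column."""
    row = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # free text without a key is ignored
        key = match.group(1).strip().lower()
        if key in COLUMNS:
            row[key] = match.group(2)
    return row


def row_to_csv_line(row: dict) -> str:
    """Serialise one portal record as a CSV line, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buffer.getvalue()
```

For example, an issue body of `Title: Example City Open Data`, `URL: https://data.example.org` and `Country: GB` on separate lines becomes `Example City Open Data,https://data.example.org,GB,,`. Using the `csv` module rather than joining strings means descriptions containing commas or quotes are escaped correctly.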
We are also not yet collecting richer information on portal metadata, data availability and data quality: Collect info on portal metadata and data availability and quality · Issue #117 · okfn/dataportals.org · GitHub
Smaller improvements we would like to make:
- Data Package: Make this into a Data Package · Issue #64 · okfn/dataportals.org · GitHub
- Add search: Search (and browse) page · Issue #16 · okfn/dataportals.org · GitHub
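A Data Package is essentially the CSV plus a `datapackage.json` descriptor. The Python sketch below generates a minimal descriptor from the CSV header, following the Frictionless Data Package and Table Schema specifications. The file path `data/portals.csv`, the package name and the resource name are assumptions for illustration.

```python
import csv
import json

# Hypothetical location of the portals CSV inside the repository.
DEFAULT_CSV_PATH = "data/portals.csv"


def read_header(csv_path: str) -> list:
    """Return the column names from the first row of the CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def field_for(column: str) -> dict:
    """Describe one column as a Table Schema field; every column is typed as a string."""
    field = {"name": column, "type": "string"}
    if column == "url":
        field["format"] = "uri"  # Table Schema string format for URLs
    return field


def build_descriptor(header: list, csv_path: str = DEFAULT_CSV_PATH,
                     package_name: str = "dataportals") -> dict:
    """Build a minimal datapackage.json descriptor with a single CSV resource."""
    return {
        "name": package_name,
        "resources": [
            {
                "name": "portals",
                "path": csv_path,
                "format": "csv",
                "mediatype": "text/csv",
                "schema": {"fields": [field_for(col) for col in header]},
            }
        ],
    }


def write_descriptor(descriptor: dict, out_path: str = "datapackage.json") -> None:
    """Write the descriptor as pretty-printed JSON."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(descriptor, f, indent=2)
```

Usage would be along the lines of `write_descriptor(build_descriptor(read_header("data/portals.csv")))`. Typing every column as a string is a deliberately conservative starting point; curators could tighten individual field types later.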
Question
How can we curate updates to DataPortals.org in a way that is easy (for users to submit and for curators to review and add), high quality (no spam) and sustainable (allowing handover between curators and volunteer involvement)?
Proposed Solution
(Re)introduce structured submission, e.g. via Google Forms; automate turning each submission into a pull request that can then be reviewed and merged; and recruit some new site curators to oversee the contribution flow going forward.
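Before a form response becomes a pull request, a script could screen out incomplete entries, non-web URLs and duplicates so curators only review plausible submissions. The Python sketch below assumes form responses arrive as dicts (for example, rows of a Google Sheet linked to the form and exported as CSV); the required fields and checks are hypothetical, and the step that actually opens the pull request is not shown.

```python
from urllib.parse import urlparse

# Hypothetical required fields for a form submission.
REQUIRED = ("title", "url", "country")


def normalise_url(url: str) -> str:
    """Lower-case scheme and host, drop a trailing slash, and ignore query/fragment."""
    parsed = urlparse(url.strip())
    return f"{parsed.scheme.lower()}://{parsed.netloc.lower()}{parsed.path.rstrip('/')}"


def validate_submission(sub: dict, known_urls: set) -> list:
    """Return a list of problems; an empty list means the row is ready for a PR."""
    problems = []
    for field in REQUIRED:
        if not (sub.get(field) or "").strip():
            problems.append(f"missing {field}")
    url = (sub.get("url") or "").strip()
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an http(s) address")
        elif normalise_url(url) in known_urls:
            problems.append("portal already listed")
    return problems


def triage(submissions, existing_urls):
    """Split form rows into (accepted, rejected); rejected items carry their problems."""
    known = {normalise_url(u) for u in existing_urls}
    accepted, rejected = [], []
    for sub in submissions:
        problems = validate_submission(sub, known)
        if problems:
            rejected.append((sub, problems))
        else:
            accepted.append(sub)
            known.add(normalise_url(sub["url"]))  # catch duplicates within the same batch
    return accepted, rejected
```

Automated checks like these only reduce noise; a human curator would still review and merge each pull request.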
More technical details are being worked out in this issue:
In addition, depending on time:
- Review all outstanding submissions, perhaps automating their conversion to CSV rows: Issues · okfn/dataportals.org · GitHub
- Finish data packaging: Make this into a Data Package · Issue #64 · okfn/dataportals.org · GitHub
- Enhance the set of information we collect on portals
Help Wanted
We need help to make this happen! What is needed:
- Someone to help go through outstanding submissions and add them to the CSV: Issues · okfn/dataportals.org · GitHub (one-off job, 1-2h)
- Coding help to improve the workflow and the site (time needed not yet clear)
- Help writing up docs and promoting the project (2h one-off and/or ongoing assistance, not much!)
- Help with ongoing curation: just keep an eye on GitHub notifications (probably around 30 minutes a month!)
If you are interested, post in the comments. We’ll kick off with a short call for everyone interested.
The Future
Our aim: DataPortals.org should be the best source of up-to-date information on data portals around the world.
It would be great to:
- Make its portal metadata richer, with screenshots, data on the contents of each portal, how often the portal updates, etc.
- Improve the user interface, including adding search
- Provide summary stats: how many portals are there, and where? How much open data do they hold? Of what types, and (the tough one) of what quality …
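The simplest of these summary stats can already be computed from the CSV. The Python sketch below counts portals per country; the `country` column name is an assumption, and rows with a blank value are counted as `unknown`.

```python
import csv
import io
from collections import Counter


def portals_per_country(csv_text: str, country_column: str = "country") -> Counter:
    """Count portals per value of a (hypothetical) country column; blanks count as 'unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        country = (row.get(country_column) or "").strip() or "unknown"
        counts[country] += 1
    return counts
```

For example, reading the file with `open("data/portals.csv", encoding="utf-8").read()` (path assumed) and calling `.most_common(10)` on the result would list the ten countries with the most portals. Harder questions, such as how much data the portals hold or how good it is, would need the richer metadata described above.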