DataPortals.org - April 2018 Update and Invitation to Contribute


#1

DataPortals.org: An experiment in community data curation

This post gives an update on DataPortals.org, sets out some next steps and solicits support to make this happen.

:pick: :helicopter: Interested in helping update DataPortals.org or curating contributions? That’s awesome :fireworks: - please just leave a comment below and we’ll get back to you :smile::email:

Rufus @rufuspollock & Jonathan @jwyg

Situation

http://dataportals.org/ is an online listing of (Open) Data Portals created in 2011 by Open Knowledge International at the initiative of Jonathan Gray and supported by LOD2. There is a clear demand for such a site and for rich metadata on portals. For example, on a recent visit to China I met a university group collecting a database of portals in China. There is no similar community-maintained database with an equivalent level of quality content.

A Nov 2015 refactor moved the database from gdocs to a CSV, which powers a node webapp running on heroku. Adding or updating information is cumbersome: users must open a github issue (http://dataportals.org/add), which is then reviewed and hand-merged into the CSV (this change was made to deal with a growing spam problem).

As of March 2018 the process lacks a curator: the current curator @todrobbins has stepped down after two years of sterling service (huge thank-you Tod!), and there are a number of unprocessed submissions: https://github.com/okfn/dataportals.org/issues?q=is%3Aissue+is%3Aopen+label%3Asubmission

Complication

This is a valuable resource, but it is not up to date, it lacks a current maintainer, and it is hard for people to add to or modify the database. The current process for adding a portal involves several steps, including significant manual intervention from an expert curator, who has to convert the github issue for a new portal into a new line in the CSV (the portal data in the github issue is unstructured). There is also no obvious path for updating info: http://dataportals.org/contribute (you can do this via a PR to the CSV if you know how to do that).

We are also not yet collecting richer info on portal metadata and data availability and quality https://github.com/okfn/dataportals.org/issues/117

Smaller, less significant improvements we would like to make:

Question

How can we curate updates to DataPortals.org in a way that is easy (for users to submit, easy for curators to review/add), high quality (no spam) and sustainable (allows for handover between curators, for volunteering)?

Proposed Solution

(Re)Introduce structured submission via e.g. google forms, and automate the process of turning each submission into a pull request that can then be reviewed and merged. In parallel, recruit some new site curators to oversee the contribution flow going forward.
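As a rough illustration of the "structured submission to reviewable CSV change" step, here is a minimal sketch in Python. It is not the project's actual tooling: the column names (`name`, `title`, `url`, `location`, `publisher`) and the function names are assumptions made up for the example, and the real CSV in the repo may be laid out differently. The idea is only that a form response arrives as key/value pairs, gets basic validation (the anti-spam gate still being the human review of the PR), and is appended as one new CSV line.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical column names; the real CSV in the repository may differ.
FIELDNAMES = ["name", "title", "url", "location", "publisher"]


def validate_submission(submission: dict) -> list:
    """Return a list of problems; an empty list means the row looks usable."""
    problems = []
    for field in ("name", "url"):
        if not submission.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    url = submission.get("url", "").strip()
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not a valid http(s) URL: {url}")
    return problems


def append_submission(csv_text: str, submission: dict) -> str:
    """Append one validated submission to CSV text and return the new text."""
    problems = validate_submission(submission)
    if problems:
        raise ValueError("; ".join(problems))

    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or FIELDNAMES
    rows = list(reader)

    # Reject obvious duplicates so curators do not review the same portal twice.
    new_url = submission["url"].strip().rstrip("/")
    if any((row.get("url") or "").rstrip("/") == new_url for row in rows):
        raise ValueError("portal already listed")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # Fields the submitter left out are written as empty cells.
    writer.writerow({key: submission.get(key, "").strip() for key in fieldnames})
    return out.getvalue()
```

The text returned by `append_submission` is what an automated job would commit on a branch before opening the pull request; opening the PR itself (e.g. via the GitHub API) is deliberately left out of the sketch.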

More technical details in progress in this issue:

In addition, depending on time:

Help Wanted

We need help to make this happen! What is needed:

  • Someone to help go through outstanding submissions and add them to the CSV - https://github.com/okfn/dataportals.org/issues?q=is%3Aissue+is%3Aopen+label%3Asubmission :clock1: 1-2h one-off job
  • Coding help to improve the workflow and the site :clock1: not sure how long
  • Help writing up docs and promoting the project :clock1: 2h one off and/or ongoing assistance (not much!)
  • Help with ongoing curation :clock1: just keep an eye on github notifications. Probably around 30m a month!

If you are interested post in the comments. We’ll kick off with a short call for interested people.

The Future

DataPortals.org is the best source for up to date information on data portals around the world.

It would be great to:

  • Make its portal metadata richer with screenshots, data on the contents of portals, how often the portal updates etc
  • Improve the user interface, including search
  • Add summary stats: how many portals are there, and where? How much open data is in them? What types, and (tough) what quality …
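For the simplest of those summary stats ("how many portals are there, where?"), a minimal sketch in Python could look like the following. The `location` column name is an assumption for illustration, not necessarily what the CSV in the repo uses.

```python
import csv
import io
from collections import Counter


def portals_per_location(csv_text: str, column: str = "location") -> Counter:
    """Count portals per value of `column`; blank cells are grouped as 'unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row.get(column) or "").strip() or "unknown"
        for row in reader
    )
```

`sum(counts.values())` on the result gives the total number of portals, and `counts.most_common(10)` the ten best-covered locations.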

#2

Some thoughts on this…

At Link Digital we have experimented with an approach that provides those who submit a portal with a few bits of value. The idea is that this will help create a more organically grown register of portals, driven by those who want to know certain things about specific portals.

The alpha release for this is found at datashades.info. It is only designed to work for CKAN portals at this stage and certainly will have bugs :slight_smile:

If you enter a portal URL such as https://data.nsw.gov.au then you’ll eventually receive a permalink with some information about the portal (such as https://datashades.info/d82d4bb338e8591d8de5d79b4f3698a0).

At the moment we have a few tabs shown. The first lets portal owners see how their portal stacks up against all the other similarly indexed data portals. The next provides some value to CKAN developers, who can see what extensions are used, especially compared to other portals. The third tab is a work in progress but would provide information about how the portal has changed over time - more datasets, more users, etc. This tab is intended to provide value to end users.

Any of these three types of users would be motivated, by the value being provided, to enter/register a new portal.

A similar approach may help grow and maintain dataportals.org, or Link Digital could simply push any newly entered portal locations into dataportals.org via a simple integration.


#3

Hi @rufuspollock, Hope you are having a good time.

Thank you so much for starting this thread about DataPortals.org, I have been following the project since @todrobbins started curating it and would definitely like to provide helping hands if needed.

The proposed solution looks good, and I believe we can make the workflow much better once we start working on it as an enhancement. I can work with the team of Open Knowledge Nepal to solve the existing issue, starting from the first week of May. Our experience of building Open Data Nepal (http://opendatanepal.com) will also help us.

Looking forward!


#4

Thank you @rufuspollock and best wishes! I will contribute where I can, but I appreciate being recognized. It means a lot to me. It’s been a great pleasure to work on this project.


#5

Also, I just merged this helpful update:

Thank you, Meiran Zhiyenbayev!


#6

Can I suggest you open a separate thread for these, so we can discuss them separately from the question of maintaining the current data portals database?


#7

@nikeshbalami this is great, and let’s sync in the first week of May. If you or your team could help as part-time curators from then, that would be great

@todrobbins just want to say a big thank-you again and wherever you do have time we appreciate any mentorship you can provide :slight_smile:

@Starl3n it would be great if you could push this info in. Ultimately, I would like to get one well-supported point of truth, including harvesting of additional info …


#8

Thanks @rufuspollock, I will start working on it from the first week of May.


#9

It’s a shame this didn’t come up a few months from now; https://www.datatig.com/ already has half an answer and today I’ve started work on the other half!

https://www.datatig.com/gh/okfn/dataportals.org/b/master/ already provides an interface for browsing the data, searching and exporting it to use in other places.

I’m now working on an edit feature - people would be able to log in with their GitHub accounts on https://www.datatig.com/ , fill in a simple web form and hit submit. We will then make a pull request against the original repository. This protects against spam, because an admin must accept or refuse the PR. It also cuts down on the amount of admin work to do - the data will already be structured in the correct way, so if it’s good, they just have to press Merge!

Anyway, I will keep an eye on this thread. Curious to see what people work out!


#10

@jarofgreen what we really want is the editor part and for it to be linked to dataportals.org repo with its CSV and datapackage.json. Can you provide that do you think?


#11

Great - could you drop in okfn/chat at some point so we can set up a time to sync.


#12

for it to be linked to dataportals.org repo

Yes, that’s what I was trying to say (badly). When someone edits on DataTig a GitHub Pull Request will be made against the original GitHub repo and the data files in there.


#13

Sure @rufuspollock, will start the work ASAP.