What datasets should we include in the 2015 Global Open Data Index? Our public consultation is now open!

Mor · June 18, 2015, 1:37pm

Hello all,

I am happy to announce that today we are starting a new phase in the Global Open Data Index - public consultation! In the next 10 days you will have the opportunity to advise us on which datasets we should add to this year’s Global OD Index. Your opinion and collaborative effort can help determine which topics and themes we want to promote beyond the baseline we have now.

There are two ways to contribute to the consultation:

Answer the Wiki survey - The platform is simple: you are shown two datasets and choose the one that you see as a higher priority to include in the Global Open Data Index. Can’t find a dataset that you think is important? Add your own idea to the pool. There is no vote limit, so vote as much as you want and shape the Index.

Focused consultation with civil society organisations - We will send a survey to various civil society organisations to get more details about the data and information that concern them. We invite everybody who wants to have more input on the datasets to fill in the survey as well.

The consultation will be open for the next 10 days and will close on June 28.
For more information about the process, please read the blog post about it.

Please post all of your questions, comments and suggestions for improvements in this topic.

Thank you for your help,
Mor

rufuspollock · June 18, 2015, 8:20pm

First: this is fantastic and great to have this consultation and this process.

Second, I assume you checked it, but wanted to link the existing list of “additional” datasets which we collected in this spreadsheet: (linked from http://census.okfn.org/global/)

Third, looking at All Our Ideas I couldn’t see “Government procurement contracts” (though we do have “call for tenders” and “Future tenders”). As this has regularly been a no. 1 suggested item for inclusion, I was surprised not to see it in there (unless I’ve missed it!)

Fourth, is it useful (and possible in All Our Ideas) to have a short description for each dataset? I think that would really help, as otherwise it can be a bit unclear in some cases.

Mor · June 18, 2015, 8:44pm

Hi Rufus,

Yes, I did look at this spreadsheet. It was very helpful and I used it for the original ideas in ‘Allourideas’.

Secondly, I did put Government procurement contracts in my list but forgot to add it to the survey. I will add it now. Thanks!

Lastly, All Our Ideas has a limit of 140 characters, which is a bit frustrating when it comes to describing datasets. However, it has a good feature - “I can’t decide”. This feature gives us feedback on why users can’t decide between two options, so we can learn from it. Not ideal, but useful.

Thanks for the feedback and the reference to the table!

ovoicu · June 19, 2015, 7:46pm

Nice idea, congrats!

If I may make some suggestions, I think that some kind of filtering of the proposals would be useful:

eliminating the proposals that tackle privacy issues. „Addresses” or „prescriptions” are such examples. The second is very sensitive.
better explaining some proposals. Maybe the person who proposed „prescriptions” had some other type of health statistics in mind?
eliminating the proposals that are part of larger datasets. For example, „government spending on X” is a part of „government spending”.
thinking twice about the proposals that are very broad and difficult to handle, such as the one with „utilities and services”. Remember that the Index relies on volunteers.

I think that it will be easier to get a relevant result with fewer proposals. I guess most people will get bored before getting to select from all pairs.

Mor · June 22, 2015, 11:01am

Hi Ovidiu,

Great comments!
Some of my answers -

All Our Ideas gives us a really broad idea of which datasets are in demand; the disadvantage is that it can’t be very specific. In the case of privacy, there are prescription datasets that can be anonymised. Here is what code4SA did with such a dataset: http://mpr.code4sa.org . I do agree that privacy issues should be taken into account in our final decision.
You are completely right about broad proposals. There are other factors that we should take into account, such as which body is responsible for the data. In the case of utilities and services, the data is usually gathered and published at the local level and not the national one.

I will look today at which proposals we should deactivate. However, this program is supposed to know how to deal with new ideas, so I believe it will help us in the long run.

Any other feedback is always welcome.

PJPauwels · June 22, 2015, 12:54pm

I love the fact that we can use Allourideas.org, which is such a simple tool, to find input on such a complex platform as the Global index. I think it is a great way to have a broad ideation phase with a first insight into the preferences of the community.

The issue I have however, is that the Global Index is based on national datasets (gathered on federal level) and already the majority of datasets in the WikiSurvey are either regional, provincial or locally gathered in our country. This is also not really discussed in the introductory blogpost on the public consultation.

Why it matters:
E.g. Pollutant Emissions, a dataset in the current Global Index, is gathered at a regional level in Belgium, so as long as not all the regions have opened up those specific datasets, we cannot claim to be open on that part.

This shows a limitation of the Global Index. Is this public consultation intended to find only datasets that apply to federal authorities, or can we add another ‘data availability’ parameter to compensate for adding non-federal datasets?

An example of the latter suggestion would be something like this: the data exists in digital form and is publicly available online, machine readable, available in bulk, openly licensed and up to date, but only in Flanders and Brussels, not in Wallonia. This should mean that the data is 66% open for that subject, rather than 100%, providing another axis on top of the platform. It does get more complicated if multiple regions score differently on the other parameters.
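To make the arithmetic concrete, here is a minimal Python sketch of such a coverage axis. It is only an illustration and not how the Index scores anything: the function names, the criterion keys and the all-or-nothing rule per region are assumptions made up for this example. Note that two regions out of three is 66.7%, which the figure above rounds down to 66%.

```python
from fractions import Fraction

# Hypothetical keys for the openness questions listed in the post.
CRITERIA = (
    "exists", "digital", "online", "machine_readable",
    "bulk", "open_licence", "up_to_date",
)


def regional_coverage(regions):
    """Share of regions in which the dataset meets every criterion."""
    if not regions:
        raise ValueError("at least one region is required")
    fully_open = sum(
        1 for answers in regions.values()
        if all(answers.get(c, False) for c in CRITERIA)
    )
    return Fraction(fully_open, len(regions))


def coverage_per_criterion(regions):
    """Share of regions meeting each criterion, for when regions differ."""
    if not regions:
        raise ValueError("at least one region is required")
    return {
        c: Fraction(sum(1 for a in regions.values() if a.get(c, False)),
                    len(regions))
        for c in CRITERIA
    }


all_yes = dict.fromkeys(CRITERIA, True)
all_no = dict.fromkeys(CRITERIA, False)
belgium = {"Flanders": all_yes, "Brussels": all_yes, "Wallonia": all_no}

share = regional_coverage(belgium)
print(share)                  # 2/3
print(f"{float(share):.1%}")  # 66.7%
```

The per-criterion variant is one possible answer to the more complicated case: instead of one number per dataset, it reports a share for each question, so a region that is online but not openly licensed still counts where it should. Weighting regions equally is itself an assumption; weighting by population would be another option.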

The suggestion above is just a proverbial brainfart, because it was such a great stumbling block for us in the last few years. So I’m wondering what the plan is this year to handle this.

okfnvince · June 29, 2015, 10:57am

Some very strange things are happening on All Our Ideas.

The OKFN blog stated “This public consultation will be open for the next 10 days and will be closed at June 28th” yet two new entries have been added on the 29th (“Every hospital ward with size, location, phone number, and current visting hours” and “List of registered intellectual property works”) and one of the top 3 entries has simply disappeared: no more “Real Time Traffic data” anywhere on the list!!!

Does anyone here know why those changes were made?

Mor · June 29, 2015, 11:34am

Hi!

We announced yesterday on Twitter that we are extending the vote by 24 hours; therefore, we added some new entries to the list (and took out real time traffic data by mistake).

Keep in mind that the fact that a dataset got a high score does not mean that it is going to be included in the Index. We give this consultation a lot of weight in our final decision, but we are also taking this survey into account as part of the whole consultation. You are all invited to fill it in as well. In addition, there are other factors in choosing a dataset, such as which authority is responsible for collecting the data - the federal or the local level.

This brings me to @PJPauwels’ comment: dataset definitions will be our next step in this consultation. First we wanted to learn which topics are most relevant. Hopefully, we can start discussing the definitions themselves later this week. Just remember, though - we are never going to have one size fits all here. This is the challenge, and I hope that together we can make it easier and more reliable.

emmaAkin · July 17, 2015, 10:58am

For me, I think that if the codebase is available, volunteers from that country could refine the data representation for each region, per county, and this should be allowed to influence the region’s overall representation on a particular dataset.

earl_butterworth · July 23, 2015, 12:47am

I’m sorry that I’ve missed the cut-off, but I would like to make suggestions for next year.

Some governments are looking more to NGOs to provide frontline, non-critical services; e.g. social housing intake and accommodation management, disability services, or early intervention to support families before a child is at risk; e.g. parenting or anger management classes. This is especially the case in the human or social services sector.

Given the significant investment in these funded services, the index should have clarity of datasets for what, where and how these funded services can be accessed, either directly from the government, or via funded services. This should be a key value add for open data: who is doing what where, allowing citizens to find what is available near them.

There is also a shift in emphasis for human / social services from measuring and reporting inputs (funding, full time equivalent headcounts) to measuring outcomes; e.g. how many children are in out of home care for more than 12 months. In Australia these figures are reported to a central, federal government body, the Australian Institute of Health and Welfare (AIHW), by all the regions / states that either directly operate or license service delivery. The reporting is sourced from the regions, but published at the national level, and allows comparison as all the measures are based upon agreed national standards.

What changes should be considered to the Open Data Index to support these types of datasets? What approach should be taken to regional government service information which is reported at a federal government level?