Introducing The New Proposed Global Open Data Index Survey

dannylammerhirt · June 15, 2016, 12:45pm

Dear all,

as we have mentioned earlier, Open Knowledge International revised the GODI survey for possible improvements. Today we released an official blog post introducing our revised survey questions and the issues we try to tackle with them (you will notice that most of these issues have been addressed in this forum).

In this thread we would like to discuss the proposed changes with you and collect your thoughts and ideas about it. What do you find positive? Where do you see issues (especially flagging to the local index groups)? Where could we improve?

Besides a blog post we created a table in order to make it easier for you to compare old and new questions.

We are very much looking forward to hear your thoughts!

Mor · June 16, 2016, 1:26am

I am tagging some people here who commented about this before, in case they missed this! We would love to hear your thoughts!
@dirdigeng @RouxRC @herrmann @cmfg @timdavies @Stephen @anab @carlos_iglesias_moro

timdavies · June 16, 2016, 10:22am

I’ve had a quick read through - and this looks like really great progress.

I’ve added a few questions / comments into the doc - but overall this looks like a great step forward in tightening up definitions and methods.

cmfg · June 17, 2016, 12:55pm

Looks good!
A few comment from a government perspective on the three parts of the blogpost:

On the 1.: interesting measure but I agree that it should not be scored because of lack of comparability
On the 2.: would be really interested in “clearly defined datasets” but I guess it will be unveiled in a following blogpost. That is always great to kniow the list and definitions as soon as possible so open data taskforce like mine can ask for anticipated released. I am not sure though of what is the implication of the 2.: would only one dataset be considered? I think it is important to still have a comment section so several datasets can be listed, for instance if we want to find out about the datasets on a specific subject in a specific country.
On the 3.: Much needed clarification of the questions, good job you did on writing them in plain langage!

PJPauwels · June 20, 2016, 1:11pm

Just to understand the context of the survey questions:
Will the type of datasets be the same as last year?
Will transport (timetables) and health performance again be included in the census but not in the index?

I left a few comments in the table, but overall great job. I would however leave the findability subquestion out of it. There are so many parameters that determine how easy (or hard) it is to find a certain dataset.

dannylammerhirt · June 23, 2016, 10:51am

I see that there is a quite important discussion about whether we should analyse a complete dataset, whether we over-score a partial dataset that is completely open, etc. This is a BIG POINT and we should discuss here - because our new survey especially addresses this. Let me explain here:

with the new index we want to encourage governments to publish all data in one dataset - i.e. in one file containing all fields/characteristics we want to see in there. This is our reference point - this is why we have dataset definitions and we clearly only want to evaluate the datasets that meet all our requirements - hence using Q5 (which we consider to integrate into Q3).
however, there are a lot of cases where these data are not provided in one dataset. To answer to @carlos_iglesias_moro comment (how Q1 and Q5 relate to each other). We decided that it would be a radical step to only measure a dataset that contains all our requirements - e.g. a spreadsheet containing water pollutants of all watersources in one file, etc.) - to be rigorous we would have to ask “Are all data included in a file” and if that’s not the case we would stop the survey - because actually we only want to analyse the openness of datasets that meet all our requirements

We decided against this step and also accept the evaluation of partial datasets. And this opens two issues discussed by @RouxRC - 1) do we “over-score” datasets if they are only partial (openness vs. “completeness”) and 2) shall we analyse multiple datasets or focus on one partial dataset?

To point 1 - we definitely only want to evaluate our reference dataset (meeting all of our criteria). If there is no such dataset, we still want to see if there are other datasets we could evaluate - to understand, how open these datasets are, to acknowledge first steps taken by government in the right direction and to sensitize our submitters for the fact that they are only looking at a partial dataset and still give them the chance to evaluate it.

But also we want to encourage governments to publish a complete dataset - and therefore we want to explicitely flag in the overall score something like this “THE SCORE ONLY APPLIES TO A PART OF THE DATA - THE DATASET CANNOT BE REGARDED FULLY OPEN”. Alternatively we can have a lower score in total - e.g. subtracting 50% score for partial datasets, or sth. similar. The point is that we exactly DO NOT want to communicate that the dataset is fully open if it does not even meet our criteria - but we do not want to cut off datasets that are partial either. The critical point here is how we can incentivize governments to publish complete datasets. Key is to have a clever way of flagging this - and a disclaimer might not be enough if we display a 100% score - so considering negative scores might be an option here, that we will consider for our weighting.

To point 2 - In past editions we allowed to analyse several datasets, but I think it is methodologically a problem to evaluate multiple datasets because we compare apples and oranges:

In Romania we found several datasets for national statistics - one was for free but not machine-readable, another one was available in bulk but had to be paid - in the end the dataset got a 100% score - because we added up partial scores in one overall score. http://index.okfn.org/place/romania/statistics/

National maps of the UK are not complete, but still we evaluated it as bulk - it got a 100% score even without containing Northern Ireland. http://index.okfn.org/place/united-kingdom/map/

In Cameroon we found company registers for several types of enterprises - the dataset got a score of 0% because every question was answered with “Unsure”. http://index.okfn.org/place/cameroon/companies/

So we had several cases where partial datasets were treated differently - all leading to different scores. But the case of Romania shows that it does not make sense to add up scores for different datasets because it makes our evaluation arbitrary again - what if a dataset contains some characteristics and is for free - while the complete dataset has to be paid. We cannot simply add up their scores because in the end the message is “A specific dataset is open to a certain extent”.

I agree, @RouxRC , that it makes sense to document alternative datasets. This is also why we use Q2.2: We want to see where datasets can be found. We could repurpose question 2.2. and use it to list alternative datasets - a comment section could be used so submitters can describe alternative datasets (re: @cmfg) and tell us our rationale why they only looked at one specific dataset (which should most likely be the fact that this dataset was thee one most compliant with Q3.

dannylammerhirt · June 23, 2016, 11:46am

One thing I forgot to mention (@RouxRC): I think it is an interesting idea to have several iterations of one survey for different datasets and then to calculate an overall score for the sum of the datasets. I’m more worried that these iterations would increase the workload for our reviewers tremendously and will suffer from the problem that an average score - as you propose - relies on an exhaustive list of datasets for each data category.

And this might become a new problem for us: when can we say that our average score represents the landscape of open gov datasets correctly. The problem I see is : where do we draw a line which datasets to look at in the first place - all datasets that contain at least one characteristic we are asking for? Or only datasets that meet all our dataset requirements?

Don’t get me wrong, I really like your idea and see your point. But maybe you could specify a bit or tell me if you see a problem with the points I mentioned above

herrmann · June 23, 2016, 7:17pm

I think this is a good, thought out revision to the GODI Survey. I still have some concerns, though.

The methodology keeps changing every year, making it not comparable across years and unreliable as a KPI for open government data. In fact, I foresee governments will increasingly use the Barometer instead the the GODI for metrics in the coming years because of this.
The new text for Q1 makes it completely redundant in light of Q2. The question to be considered is this: is there any value in ascertaining whether or not government has the data (but does not publish it) at all? Arguably, just publishing existing data is a lot easier and cheaper than doing that but also having to start collecting and managing it in the first place. With this in mind, people will likely have an easier time advocating for governments to open up existing data than advocating for them to start a whole new line of business to collect and manage the data, notwithstanding how much justified the latter request is.
There’s still not a clear enough procedure for evaluating data that is split across multiple datasets. It seems that the document proposes that, in these cases, contributors should choose one of the datasets and ignore the rest. The chosen partial dataset would then be evaluated, but it’s not clear yet how the partial score would be calculated.

I like @RouxRC’s idea of averaging the scores of datasets. I think it could sort of wrap up two problems (data discoverability and data split into multiple datasets) in a single solution. We won’t need an exhaustive list of datasets before calculating an average score - just the ones contributors could find. If they couldn’t find a dataset, it means it is not accessible enough, so it would be reasonable to not consider it.

Mor · June 24, 2016, 12:08pm

Hi @herrmann -

About comparability - technically, the index was never comparable. I think that as a fast growing field we can’t use the same methodology if it’s wrong or not answer our need. The index should reflect in my opinion, a mirror for how a government is doing in a certain time. Also, if we will always use the same methodology, government will always try and play to the same rules and we won’t challenge them.

Regarding Barometer / index preference - We are fully cooperating with the Web Foundation and we working toward join forces on the Open Data assessment. I think that this should not be one of our concerns now and in the future.

herrmann · June 29, 2016, 7:16pm

Hi, @Mor.

If comparability is not supposed to be an intended goal for the index, the UI should just refrain from showing changes in rank in comparison to previous years (as it has always done).

I don’t think keeping the methodology a little more stable would lead governments to “cheat” or to become complacent, as you seem to have suggested. On the contrary, comparability is a virtue of any good metric. I feel that without comparability, governments will tend to use the GODI as a metric less as time goes on, as you can only manage something if you have a proper metric (notwithstanding the existence of other possible metrics, I mentioned the Barometer just as an example).

I’m not arguing to keep the methodology stagnant, mind you. Just to be mindful of how much and how fast you change. And when you do, do it slowly and parsimoniously.

noamoss · September 3, 2016, 10:07pm

I would consider marking stable and comparable criteria and features, to clarify users what can (and therefore, what can not) be used for chronological comparison.

Changing and evolving measurements can become a part of the annual message of GODI, giving voice for the civil society expectations in the field.

PJPauwels · September 13, 2016, 8:57am

Hi @Mor and @dannylammerhirt, do you have any news on the roadmap and / or timeline of the Global Open Index?

Normally local groups get poked somewhere in the end of September / beginning of October.

Would be cool if we have some breathing room (Cf. room to ask around for contributors) so that we can crowdsource the entries amongst data experts and data owners. Last year was such a short deadline that we mainly had to do it ourselves.

oscarmontiel · September 19, 2016, 2:19pm

Hi @PJPauwels, we are planning on launching the new survey by mid October. It will remain open for submissions until the end of November and then we’ll start with the review process, this means there is still around one month to contact submitters.
We’ll contact everyone with further news later in the week and whenever we have a timelines with more specific dates.

PJPauwels · September 20, 2016, 7:43am

Alright! Looking forward to it.

adolflow · October 15, 2016, 9:42pm

Hi @tlacoyodefrijol

What about this survey? I would like Open Knowledge Spain to participate

Best

oscarmontiel · October 17, 2016, 5:31pm

Hi @adolflow,

We had to reschedule the launch of the survey. We updated on the blog here. The new date is November 1st. We’ll contact everyone before we launch, so you can be ready tu submit during those weeks.

alyssaluo · October 18, 2016, 2:10am

Hi Oscar @tlacoyodefrijol

Tried to email you but could get through. Thought to follow up with you again on the GODI. Please note that the Singapore Government is keen to participate in the 2016 ranking and we will give our full support in the assessment. Noted that you will be approaching a third party civil organisation to get their inputs in Nov, and subsequently come to us (government) for inputs.

For Singapore Govt, Aloysius and I (Alyssa) will be the contact points. Our emails are Alyssa_Luo@mof.gov.sg and Aloysius_Seah@mof.gov.sg.

Thanks!

oscarmontiel · October 20, 2016, 9:52pm

Hi Alyssa,

Which email did you use? We’ll let you know when we start receiving government inputs.

Best,
Oscar

alyssaluo · October 21, 2016, 1:05am

Message Classification: Restricted
Hi Oscar,

I used the email on your website: oscar@okfn.org mailto:oscar@okfn.org. Thanks, look forward to hearing from you!

WARNING: This message may contain confidential information. If this email has been sent to you by mistake, please notify the sender and delete it immediately. Unauthorised communication and disclosure of any information in the email is an offence under the Official Secrets Act (Cap 213)
P Please consider the environment before printing this e-mail

MOF Email IDhttps://mof.km.gov.sg/?xmail-id=8d9b1207-7ef4-4494-ab24-0de18699dc5d
[3S-id=8d9b1207-7ef4-4494-ab24-0de18699dc5d:5265d989]

alyssaluo · November 1, 2016, 1:38am

@tlacoyodefrijol @Mor The Singapore Govt has finally rolled out our Open Data Licence! Check it out at Open Government Products – Medium

p.s. thanks Mor for liaising with us previously

Feel free to clarify if you have any questions on the licence!

Topic		Replies	Views
GODI 2016: update on the new survey & introducing the new scoring Open Data Index	0	1448	July 19, 2016
The Global Open Data Index is now OPEN for submissions Global Open Data Index 2015	9	2302	September 21, 2015
Submissions 2016 closed, thank you all! Global Open Data Index 2016 opendataindex	12	1539	February 7, 2017
The 2016 Global Open Data Index is live! Global Open Data Index 2016 opendataindex , godi	4	1223	May 4, 2017
The global open data index methodology change and it's impact on the local indexes Local Open Data Index	15	2251	June 29, 2016

Introducing The New Proposed Global Open Data Index Survey

Related topics