United Kingdom / What does 0% open really mean?


#1

Does anyone have any views on what 0% open means in reality?

The United Kingdom (Great Britain and Northern Ireland) has been given 0% for the whole of the Elections, Locations and Water quality indicators.

Is this really saying that there is no data available at all for these headings and that they are 100% closed? Are things as black and white as that?


#2

Hi Nick,

IMO the review process has been lacking this year; I was the submitter for those three entries and they were rejected without any discussion (or even notification).

I have protested the decision for Location, but haven’t had a response yet:

Owen


#3

Thank you. This does seem a bit unfortunate.
Nick


#4

Same thing with Bulgaria. I was a submitter on many of the entries and added corrections on all the others in the forum. While I did get some follow-ups, Bulgaria got 0% on several topics after getting more than 80% in previous years. My requests for comment on the review process were also left unanswered.


#5

Thank you, this is very useful to know.

Of course a key thing that Open Knowledge are linked with is transparency and if this is not clearly evident in the decision making then it could undermine the whole process and diminish the credibility that any government gives the Index.

As with so many things the logic should be that with the evidence available others should be able to replicate the process and generate the same results. If this is not the case that would be worrying.

It would be particularly helpful if, for example on the methodology page, each indicator came with a live example from a country that demonstrates the ideal result and how it was calculated, that is, worked examples. These should also show how marks are deducted for issues with levels of openness.

At the moment colleagues in the major organisations who generate the data are confused as there seem to be moving goalposts which makes it harder to know where to target future improvements.


#6

Replication is always hard when you don’t speak the language. I did my best, but on my own I couldn’t find any of the datasets listed on the French results page of this index.

The results are a shame really. The submission process was a mess and the reviews apparently didn’t compensate.

Last year Bulgaria got 90% on water quality, 55% on government spending and 60% on national maps. Now it got 0% on all of these, despite me flagging the issues in the submissions. You know what changed in the past year? Bulgaria introduced a government open data portal based on CKAN with all those datasets and hundreds more, in a far better format. That’s the change.

How did this happen?


#7

Hi @yurukov - I am sorry you feel disappointed about the index. Following the feedback that @nickmhalliday and @owenboswarva gave us, we tried to explain what 0% means. You can read more about it on the index itself:
https://index.okfn.org/interpretation/

We have this public dialogue phase to see where the assessment is going wrong and what can help data publishers and users, so please let us know where there are issues. Yesterday we also made sure that we would not miss a post and that the review of these is rigorous. I hope that will help to make GODI a better tool, since we see it as more than just a ranking: it is a place to surface issues, so keep them coming!


#8

Hi @yurukov, @nickmhalliday, @owenboswarva

I would like to explain what our idea behind this was.

The Global Open Data Index is designed to assess specific data categories. In this regard we share similarities and differences with other open data assessments, like the Open Data Barometer or Open Data Inventory (ODIN).

We are similar in that our results are a proxy to understand the wider landscape of open data publication. National open data assessments by their very nature can only present snapshots of open data. By using key datasets, we seek to capture information that is identified as particularly relevant for the public.

We differ from ODIN, the Barometer and others, who assess a representative sample of datasets in different sectors. GODI puts emphasis on the usefulness of data. We are assessing clearly defined datasets, with specific levels of granularity and defined data elements. I think both approaches have their justification, and the devil is in the detail. Let me try to explain each approach.

Measuring a representative sample of open data
A representative sample is intended to assess a broader data category. Here data categories tend to be described rather broadly as a general orientation. Reviewers need to choose the most representative dataset. This leaves space to acknowledge the publication of different datasets.

This line of thought seems to resonate with many comments in this forum (see here, here, here, or this very same thread).

GODI’s approach - assessing usable datasets
GODI assesses the availability of data for which we identified use cases. The rationale is that only the availability of specific data elements enables certain use cases (roughly speaking). With this approach we can also measure some aspects of data quality, such as completeness (through the combination of data elements and their granularity).

We do this for a number of reasons:

  • We want to create a comparable ranking based on consistent criteria.
  • We reduce reviewer bias and personal judgement.
  • We can use GODI as a testbed for research around data quality, data “relevance” and data usability. These factors are currently lacking in open data assessments.

Each point brings questions with it:

  • What are our methods to identify data relevance? What are methodological biases, challenges, and what can be learnt for the future?
  • What is the best level of detail for each data category?
  • Is the current approach to only accept “entire” datasets reasonable?

Let’s discuss this!

In order to have a clearer discussion about this, I propose to discuss GODI’s approach in two different threads. This makes it easier for us to come to a conclusion, since we are basically talking about two different things:

  1. Our key datasets, what they contain, and our methods to define them
  2. Our process of scoring a key dataset (“All in or nothing” vs. weighted assessment)
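To make the second topic concrete, here is a minimal sketch of the two scoring styles side by side. It is an illustration only, not GODI's actual scoring code: the element names, the equal weighting and both function names are assumptions made up for this example.

```python
# Hypothetical required data elements for one key dataset.
# These names are invented for illustration, not GODI's real criteria.
REQUIRED_ELEMENTS = {
    "company_name",
    "company_address",
    "company_id",
    "registration_date",
}


def all_or_nothing_score(published):
    """Return 100 only if every required element is published, else 0."""
    return 100 if REQUIRED_ELEMENTS <= set(published) else 0


def weighted_score(published):
    """Return the percentage of required elements that are published.

    Assumes every element carries the same weight.
    """
    found = REQUIRED_ELEMENTS & set(published)
    return round(100 * len(found) / len(REQUIRED_ELEMENTS))


missing_dataset = set()  # nothing is published at all
nearly_complete = {"company_name", "company_address", "company_id"}

for label, data in [("missing", missing_dataset), ("nearly complete", nearly_complete)]:
    print(label, all_or_nothing_score(data), weighted_score(data))
```

Under these assumptions the all-or-nothing rule gives both datasets the same 0, while the equal-weight rule separates them. Whether equal weights are the right choice is exactly the kind of question the second thread is meant for.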

Please do comment on these threads! It is very important for us to discuss both topics. @nickmhalliday I will get back to your concerns about worked examples in these threads. We hope that thereby our research process becomes as transparent as possible.

I will add links to these topics once I have created them.


#9

I am copying here my reply from the other thread, about the company register in Romania. It is the same text but I have seen this more general discussion too late.

Well, in theory it sounds very good. But in practice, you have a red flag saying “It’s not publicly available”, which is the same for a dataset that does not exist and for one that exists, is in an open format and under an open license, and contains all the indicators except one (which, by the way, is not the most relevant). And for both situations you put a big 0% label.

To understand the difference between the “0% for a dataset that does not exist” and the “0% for a dataset in open format, under open license, missing a slightly less relevant indicator”, one has to read the methodology and a lot of explanations on some obscure forum.

Maybe if you try to look from the user perspective for a moment, you will understand why this approach, as valid as it may be from the academic point of view, is almost completely irrelevant for my advocacy work.

And one more thing: I can accept and be fine if I am not in your target group. I can use other products for my work. But because I contributed to the GODI in the past, and I used it with some success, I felt the need to express my lack of satisfaction with the current evolution.


#10

Hi Danny, I’m interested in joining those threads, but I don’t see them. So I’m going to give my input here and I’ll copy it elsewhere once the topics are created.

I think the specific requirements of each dataset are hard to define in a way that encompasses all countries, and I agree that we have to be flexible by positively scoring datasets even if they don’t meet all the criteria.
But what happens if they have a dataset on the topic that does not meet ANY of the defined criteria? An example of this would be an election dataset that is an aggregate of the votes for each party, published in an open format with an open license on an open data portal. In this case I would like to add this to the index, but should the score be 0?

In my opinion there should be a minimal non-zero score for datasets that exist but don’t meet any of the content criteria.
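As a rough sketch of what I mean, assuming a made-up floor of 10 points and a simple proportional score (neither is part of the current methodology, and the function name is my own):

```python
# Hypothetical floor: the minimum score for any dataset that exists.
# The value 10 is an arbitrary choice for illustration.
FLOOR_SCORE = 10


def score_with_floor(dataset_exists, criteria_met, criteria_total):
    """Score a dataset from 0 to 100, never below the floor if it exists."""
    if not dataset_exists:
        return 0  # nothing published on the topic at all
    if criteria_total <= 0:
        raise ValueError("criteria_total must be positive")
    proportional = round(100 * criteria_met / criteria_total)
    # An existing dataset never drops to 0, even with no criteria met.
    return max(FLOOR_SCORE, proportional)


print(score_with_floor(False, 0, 5))  # no dataset published
print(score_with_floor(True, 0, 5))   # dataset exists, meets no criteria
print(score_with_floor(True, 2, 5))   # dataset meets two of five criteria
```

With a rule like this, the aggregated election dataset above would show up in the index with a small score instead of being indistinguishable from a dataset that was never published.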


#11

Hi @martinsz, thanks a lot for your feedback, this is great! I am currently writing up our explanations for the two threads and will post them throughout the day.


#12

As a short update, here are the two topics to discuss our 0% results. Please address your feedback to these two topics, as I think that it is best to split the discussion.


#13

It is good to speak about this here. Of course open data advocates are one of our core groups and we do not want to be an ivory tower exercise for the sake of creating “just another index”.

Can you explain what your advocacy work usually looks like, and how the index design could support your work? We take this feedback very seriously, so please do share your experiences and concerns with us.