Question about the methodology

Good Evening,
I’m working on my Master’s thesis and I’ve described the methodology that Open Knowledge used to come up with the ranking. I had a seminar today and I was asked: how did Open Knowledge assign the score for each question? Based on what? (For example: data available? score 5.)
I explained that weights were given to questions which the foundation assessed to be critical in opening data, but they were not convinced! They asked me to investigate, as there must be a scientific approach behind the scoring.

any idea?


All the information about the methodology is here:

In short:
Each dataset in each place is evaluated using nine questions that examine the openness of the datasets based on the Open Definition.

High weights were given to questions we assessed to be critical in opening data
30 points were given to the open license question, a topic which is still problematic in open data implementation and re-use.
Open license is the key aspect of actually being open data. Many people release some data but license it in a way that creates a barrier to use, and especially restricts reuse which is critical.
15 points were given to the machine readable question, since without data being machine readable, it is hard to reuse and reveal the potential of the data.
Again, a major barrier and another area where governments frequently fall down. Reuse is significantly impacted without machine readability and a lack of machine readability can rob data that is openly licensed of any actual practical “openness.”
15 points were given for data being free of charge since, again, it presents a major obstacle to being practically open.
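The weighting described above amounts to a simple weighted sum over yes/no answers. A minimal sketch for illustration; only the three weights named above (30 for open license, 15 for machine readable, 15 for free of charge) come from the methodology, and the remaining question names and point splits are placeholders:

```python
# Sketch of the Index's weighted scoring as a weighted sum of yes/no answers.
# Only the 30/15/15 weights are stated in the methodology; the other six
# question names and weights below are placeholders for illustration.
WEIGHTS = {
    "openly_licensed": 30,   # stated in the methodology
    "machine_readable": 15,  # stated in the methodology
    "free_of_charge": 15,    # stated in the methodology
    # six further questions share the remaining 40 points (placeholder split):
    "exists": 5,
    "digital": 5,
    "publicly_available": 5,
    "online": 5,
    "up_to_date": 10,
    "bulk": 10,
}

def dataset_score(answers):
    """Score a dataset by summing the weights of questions answered 'yes'."""
    return sum(WEIGHTS[q] for q, yes in answers.items() if yes)

# A dataset that meets every criterion except an open license:
example = {q: True for q in WEIGHTS}
example["openly_licensed"] = False
print(dataset_score(example))  # 100 - 30 = 70
```

This makes the design choice visible: a dataset can lose at most 30 of 100 points on any single question, so the license question dominates the score, as intended.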

What do they mean by a more scientific approach?

Hello Mor, thank you for your quick response.
They want to know how each score was assigned. For example, why 5 for data availability and not 6 or 7? Were any statistics used to distribute the scores?

We based it on past years’ results, where we saw what the problematic factors were.
We didn’t use any statistics for the scores. Scores were given as such so we would not need to standardize them later on. The users and crowd of the Index are not academics, and therefore we simplified the Index so it can be used by everyone.

Having said that, I went through the methodology with the Oxford Internet Institute, and they didn’t mention that this was not a scientific approach, so I still don’t understand the feedback from your program.

I am attaching the Open Data Barometer, that uses almost the same scoring as we do. I didn’t see any type of a statistical logic behind it either.

@rufuspollock @timdavies @carlos_iglesias_moro - thoughts?

Is it worth mentioning here the approaches taken to validate the score?

I.e. In both the Index and the Barometer each score is reviewed by at least one independent reviewer.

I believe in the 2015 Index, you had both country and domain reviewers right? So that the original crowdsourced score was cross-checked by someone looking at the country, and cross-checked by someone comparing it to other datasets of the same kind.

This is one approach to improve the validity of a score, and address the potential for coder bias.

In the two editions of the Barometer I worked on, we did check for outliers by running some summary statistics, and then double-checking scores and their justifications.
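As a rough illustration of that kind of outlier check (the Barometer’s exact procedure isn’t described here; flagging scores more than two standard deviations from the mean is just one common approach, and the scores below are invented):

```python
# Illustrative outlier check over a set of scores using summary statistics.
# This is an assumption about how such a check might look, not the
# Barometer's actual procedure.
from statistics import mean, stdev

def flag_outliers(scores, threshold=2.0):
    """Return (index, score) pairs more than `threshold` std devs from the mean."""
    mu, sigma = mean(scores), stdev(scores)
    return [(i, s) for i, s in enumerate(scores)
            if abs(s - mu) > threshold * sigma]

scores = [70, 65, 72, 68, 5, 71, 69]   # one suspiciously low score
print(flag_outliers(scores))           # [(4, 5)] — the score of 5 is flagged
```

Flagged scores would then be double-checked against their written justifications, as described above.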

The alternative, more statistical, approach might be to collect multiple rankings from independent scorers and to average these out. This is the approach taken, for example, in World Economic Forum surveys. However, it is (a) very costly; and (b) doesn’t guarantee that all coders interpreted the questions the same way, and it is open to a range of other possible biases.
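For completeness, that averaging approach is mechanically very simple (all coder counts and scores below are invented for illustration):

```python
# Sketch of averaging scores from multiple independent coders per dataset.
# The datasets and scores here are made up for illustration.
from statistics import mean

coder_scores = {
    "dataset_A": [70, 65, 75],  # three independent coders
    "dataset_B": [30, 50, 40],
}
averaged = {d: mean(s) for d, s in coder_scores.items()}
print(averaged)
```

The statistical work is trivial; the cost lies in recruiting enough independent coders per dataset and ensuring they all read the questions the same way.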

Hope these observations are useful.