Hi @yurukov, @nickmhalliday, @owenboswarva
I would like to explain the idea behind this.
The Global Open Data Index is designed to assess specific data categories. In this regard we share similarities and differences with other open data assessments, like the Open Data Barometer or Open Data Inventory (ODIN).
We are similar in that our results are a proxy to understand the wider landscape of open data publication. National open data assessments by their very nature can only present snapshots of open data. By using key datasets, we seek to capture information that is identified as particularly relevant for the public.
We differ from ODIN, the Barometer, and others that assess a representative sample of datasets across different sectors. GODI puts emphasis on the usefulness of data: we assess clearly defined datasets, with specific levels of granularity and defined data elements. I think both approaches have their justification, and the devil is in the detail. Let me try to explain each approach.
Measuring a representative sample of open data
A representative sample is intended to assess a broader data category. In this model, data categories tend to be described broadly, as a general orientation, and reviewers need to choose the most representative dataset. This leaves space to acknowledge the publication of different datasets.
This line of thought seems to resonate with many comments in this forum (see here, here, here, or this very same thread).
GODI’s approach - assessing usable datasets
GODI assesses the availability of data for which we have identified use cases. The rationale is that, roughly speaking, only the availability of specific data elements enables certain use cases. This approach also lets us measure some aspects of data quality, such as completeness (through the combination of data elements and their granularity).
We do this for a number of reasons:
- We want to create a comparable ranking based on consistent criteria.
- We want to reduce reviewer bias and personal judgement.
- We want to use GODI as a testbed for research around data quality, data "relevance", and data usability. These factors are currently lacking in open data assessments.
Each point raises questions:
- What are our methods to identify data relevance? What are methodological biases, challenges, and what can be learnt for the future?
- What is the best level of detail for each data category?
- Is the current approach to only accept “entire” datasets reasonable?
Let’s discuss this!
To keep the discussion clear, I propose discussing GODI's approach in two separate threads. This makes it easier for us to come to a conclusion, since we are essentially talking about two different things:
- Our key datasets, what they contain, and our methods to define them
- Our process of scoring a key dataset ("all or nothing" vs. weighted assessment)
Please do comment on these threads! It is very important for us to discuss both topics. @nickmhalliday I will get back to your concerns about worked examples in these threads. We hope this makes our research process as transparent as possible.
I will add links to these topics once I have created them.