What are GODI's key datasets (and how do we define them)?

This is a follow-up thread on a discussion around this year’s results of GODI.

As discussed in this topic, people have been asking what our results, particularly 0% scores, mean. In this topic we’d like to explain our key datasets and how we define them. In another thread, we will present how GODI scores key datasets and why.
Both are important to understand our final score.

We also invite all of you to comment on our data definitions (a list of our key datasets is available in this document).

This document explains what key datasets are, how we define them, and the current challenges. We hope that this can spark debate and thinking about how GODI can become more relevant for you.

Please share your opinions with us and others. What do you think about our dataset characteristics? Do they resonate with your priorities? If not, why? All perspectives are welcome, whether you are a government official, an open data advocate, or a data user yourself.

I’m flagging this discussion to some of you who had questions or concerns about this @ovoicu, @owenboswarva, @Lin_Zhaowei, @gustavo_uy, @gonzaloiglesias, @Enrique_Zapata, @arianit, @nickmhalliday, @martinsz, @yurukov

See Our process of finding and evaluating key datasets - #3 by ppkrauss

PS: duplicating topic?

Hi @ppkrauss,

This thread shall be used to talk about the data GODI assesses. So it’s all about what data we measure and why this data is important to assess.

The other thread you mention shall be used to discuss how we assess this data. Either through an all-or-nothing approach, as currently done, or a different approach?

Both threads discuss the two most important aspects of our evaluation. They are important to improve what our final results communicate.


To gather all input around our dataset definitions, I’ll copy some other threads in here.

For example similar discussions around our election results here, here and here.

@nickmhalliday, @dread, please do feel free to comment on our list of key datasets regarding our level of granularity for elections data.

In the meantime, we will get back to the National Democratic Institute and discuss our results with them as well.

Given the importance of research to national and global agendas (particularly UN 2030), and a growing number of national policies, I think it would be helpful to investigate the feasibility of adding publicly funded research as a dataset. At the moment, open (government) data and open (research) data are continuing along somewhat parallel tracks, but a more joined-up approach, even just in terms of looking at research in this way, could also support a number of related efforts around data use and reproducibility. Looking at the openness of what is funded under the principle of “as open as possible, as closed as necessary” would respect the intellectual freedom of researchers, as well as the commercial interests and privacy issues inherent in research. My perspective is as someone who has been involved in both communities as a civil society advocate.



I think the list lacks “levels of maturity” for each dataset. For example, the Procurement dataset requirements include:
Tender phase
* Tenders per government office
* Tender name
* Tender description
* Tender status

Award phase
* Awards per government office
* Award title
* Award description
* Value of the award
* Supplier’s name

But this is very simplified: other procurement phases are missing, and real-world data includes much more. Procurement data could also include:

* date of bidding
* supplier name of each participant
* supplier unique ID
* supplier address
* tender items (each item/service/work inside the tender)
* procurement classification codes
* and so on

I think that “levels of maturity” of data openness could include different requirements at each level. I’m sure that other datasets are quite similar.
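The maturity-level idea above could be sketched as a simple field check: a dataset that publishes only the minimum required fields gets a “basic” rating, while one that also publishes the richer fields gets an “advanced” rating. This is a minimal illustrative sketch, not GODI’s actual methodology; the field names and level definitions here are assumptions for the example.

```python
# Hypothetical sketch of a "levels of maturity" check for procurement data.
# Field names and level thresholds are illustrative assumptions only.

BASIC_FIELDS = {
    "tender_name", "tender_description", "tender_status",
    "award_title", "award_value", "supplier_name",
}
ADVANCED_FIELDS = BASIC_FIELDS | {
    "bidding_date", "supplier_id", "supplier_address",
    "tender_items", "classification_codes",
}

def maturity_level(published_fields):
    """Return a maturity label based on which fields a dataset provides."""
    fields = set(published_fields)
    if ADVANCED_FIELDS <= fields:   # all advanced fields present
        return "advanced"
    if BASIC_FIELDS <= fields:      # only the minimum requirements met
        return "basic"
    return "insufficient"

print(maturity_level({"tender_name", "tender_status"}))  # insufficient
print(maturity_level(BASIC_FIELDS))                      # basic
print(maturity_level(ADVANCED_FIELDS))                   # advanced
```

A scheme like this makes the scoring graded rather than all-or-nothing, which is exactly the trade-off discussed in the companion thread on scoring.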

Also, I would like to mention that the most important dataset, “Laws and bills”, is missing. We need it.

Hi @ibegtin

Thanks so much for your feedback. Just to be sure I understand correctly: you propose that the Index adds more data elements, so that we distinguish between a basic dataset (minimum information required) and an advanced/mature dataset (more data elements provided)?

It is an interesting idea. The only caveat is a possibly high error rate during our assessment. As we often see, governments publish data in many places. Depending on the data type and provider, it is tough to find all the specific data elements online.

So in fact, if we don’t find all the data, that can also be an indication of poor data findability (something we wanted to measure by counting URLs or unique domain names).

So measuring data maturity might be fairly hard, and we would possibly mix this with intervening variables like findability. Nonetheless, a very interesting point to think about.
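The findability proxy mentioned above, counting unique domain names across the places where data was found, could be sketched like this. The URLs are made up for illustration; the source only mentions the counting idea, not any implementation.

```python
# Hypothetical sketch: measuring data findability by counting how many
# distinct domains a submitter had to visit to assemble a dataset.
from urllib.parse import urlparse

def unique_domains(urls):
    """Return the set of distinct hostnames among the given URLs."""
    return {urlparse(u).netloc for u in urls}

# Example: procurement data scattered across two government sites.
found_at = [
    "https://data.gov.example/tenders.csv",
    "https://procurement.example/awards.json",
    "https://data.gov.example/awards.csv",
]
print(len(unique_domains(found_at)))  # 2
```

The more domains involved, the harder the data is to find, so a higher count could feed into a findability penalty. This would at least separate the findability signal from the maturity signal.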

To your other point: we already capture draft legislation (i.e. bills) and national law. Or are you referring to something different?


Hi @fiona,

I think more clarity around the openness of publicly funded research sounds like a great idea. I’m wondering in terms of key datasets - how would you define a research key dataset? Would you propose to focus on a specific research area? Would you measure open access to publications, or to raw research data?

Looking forward to hearing your ideas :slight_smile:

Hi Danny,

Very often the focus is on STEM subjects rather than all research areas, so it would be important to try to be inclusive.
Open access to publications would be easier to capture in some countries at present, but the underlying data could very well be the future, so that might be the area to focus on now, while some governments, especially in Europe, are trying to figure out their policies in this space.
Clearly not every country yet has the capacity to start reporting on this, but many countries are copying others in terms of research exercises and measurement, open access policies are growing daily, and so on. So now is a good time to begin defining what this could look like.