What are GODI's key datasets (and how we define them)?

dannylammerhirt · May 12, 2017, 12:02pm

This is a follow-up thread to a discussion about this year’s GODI results.

As discussed in this topic, people are asking what our results, particularly 0% scores, mean. In this topic we’d like to explain our key datasets and how we define them. In another thread, we will present how GODI scores key datasets and why.
Both are important for understanding our final score.

We also invite all of you to comment on our data definitions (a list of our key datasets is available in this document).

This document explains what key datasets are, how we define them, and what the current challenges are. We hope that this can spark debate and thinking about how GODI can become more relevant for you.

Please share your opinions with us and others. What do you think about our dataset characteristics? Do they resonate with your priorities? If not, why? All perspectives are welcome - whether from government, open data advocates, or data users themselves.

I’m flagging this discussion to some of you who had questions or concerns about this: @ovoicu, @owenboswarva, @Lin_Zhaowei, @gustavo_uy, @gonzaloiglesias, @Enrique_Zapata, @arianit, @nickmhalliday, @martinsz, @yurukov

ppkrauss · May 13, 2017, 2:33pm

See Our process of finding and evaluating key datasets - #3 by ppkrauss

PS: duplicating topic?

dannylammerhirt · May 15, 2017, 8:51am

Hi @ppkrauss,

This thread is for talking about the data GODI assesses. So it’s all about what data we measure and why this data is important to assess.

The other thread you mention is for discussing how we assess this data: through an all-or-nothing approach, as currently done, or through a different approach?

Both threads discuss the two most important aspects of our evaluation. They are important to improve what our final results communicate.

dannylammerhirt · May 15, 2017, 11:54am

To gather all input around our dataset definitions, I’ll copy some other threads in here.

For example, there are similar discussions around our election results here, here and here.

@nickmhalliday, @dread, please do feel free to comment on our list of key datasets, in particular on our level of granularity for elections data.

In the meantime, we will get back to the National Democratic Institute and discuss our results with them as well.

fiona · May 17, 2017, 9:52am

Given the importance of research to national and global agendas (particularly UN 2030) and a growing number of national policies, I think it would be helpful to investigate the feasibility of adding publicly funded research as a dataset. At the moment, open (government) data and open (research) data are continuing along somewhat parallel tracks, but a more joined-up approach, even just in terms of looking at research in this way, could also support a number of related efforts around data use and reproducibility. Looking at the openness of what is funded under the principle of “open as possible, closed as necessary” would respect the intellectual freedom of researchers as well as the commercial interests and privacy issues inherent in research. My perspective is that of someone who has been involved in both communities as a civil society advocate.

ibegtin · May 22, 2017, 6:06am

Hi!

I think that the list lacks “levels of maturity” for each dataset. For example, the Procurement dataset requirements include:

Tender phase
* Tenders per government office
* Tender name
* Tender description
* Tender status

Award phase
* Awards per government office
* Award title
* Award description
* Value of the award
* Supplier’s name

But this is very simplified. Other procurement phases are missing, and actual procurement data includes many more fields. Procurement data could also include:

* date of bidding
* supplier name of each participant
* supplier unique id
* supplier address
* tender items (each item/service/work inside this tender)
* procurement classification codes
* etc.

I think that “levels of maturity” of data openness could include different requirements at each level. I’m sure that other datasets are quite similar.
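To make the idea concrete, here is a minimal Python sketch of how such levels could be checked. The level names ("basic", "advanced") and the field identifiers are hypothetical slugs of the data elements listed above, not an official GODI schema, and the split between levels is only an assumption for illustration.

```python
# Hypothetical maturity levels, ordered from lowest to highest.
# Field names are illustrative slugs of the elements listed in this post.
MATURITY_LEVELS = [
    ("basic", {
        "government_office", "tender_name", "tender_description",
        "tender_status", "award_title", "award_description",
        "award_value", "supplier_name",
    }),
    ("advanced", {
        "bidding_date", "supplier_id", "supplier_address",
        "tender_items", "classification_codes",
    }),
]


def maturity_level(published_fields):
    """Return the highest level reached, or None if even the lowest is unmet.

    A level counts as reached only if its own required fields and those of
    every lower level are all present in the published data.
    """
    published = set(published_fields)
    reached = None
    for name, required in MATURITY_LEVELS:
        if not required <= published:  # some required field is missing
            break
        reached = name
    return reached
```

A dataset that publishes only the currently required elements would come out as "basic"; one that omits, say, the award value would reach no level at all.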

Also, I would like to mention that the most important dataset, “Laws and bills”, is missing. We need it.

dannylammerhirt · May 22, 2017, 6:37am

Hi @ibegtin

Thanks so much for your feedback. So just to be sure I understand correctly - you propose that the Index adds more data elements, so that we separate between a basic dataset (minimum information required) and an advanced/mature dataset (more data elements provided)?

It is an interesting idea. The only caveat is a possibly high error rate during our assessment. As we often see, governments publish data in many places. Depending on the data type and provider, it is tough to find all specific data elements online.

So in fact, if we don’t find all data, it can also be an indication of poor data findability (something we wanted to measure by counting URLs or unique domain names).

So measuring data maturity might be fairly hard, and we would possibly mix this with intervening variables like findability. Nonetheless, a very interesting point to think about.
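As a rough illustration of the findability proxy mentioned above (counting unique domain names), a minimal Python sketch could look like this. The function name is hypothetical and this is not how GODI actually computes anything; it only assumes that the reviewer records the URLs where the data elements were found.

```python
from urllib.parse import urlparse


def count_unique_domains(urls):
    """Count distinct host names among the URLs where data was found."""
    # urlparse(...).hostname is lower-cased, so "DATA.example.org" and
    # "data.example.org" are treated as the same host.
    hosts = {urlparse(url).hostname for url in urls}
    hosts.discard(None)  # entries without a scheme/host part are ignored
    return len(hosts)
```

A higher count for one dataset would suggest that its elements are scattered across more sites, which is one possible signal of poorer findability.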

To your other point, we capture draft legislation (= bills) and national law already. Or are you referring to something different?

Best
Danny

dannylammerhirt · May 22, 2017, 6:41am

Hi @fiona,

I think more clarity around the openness of publicly funded research sounds like a great idea. I’m wondering in terms of key datasets - how would you define a research key dataset? Would you propose to focus on a specific research area? Would you measure open access to publications, or to raw research data?

Looking forward to hearing your ideas.

fiona · May 24, 2017, 3:59pm

Hi Danny,

Very often there is a focus on STEM subjects, and not all research areas, so it would be important to try and be inclusive.
Open access to publications would be easier to capture in some countries at present, but the underlying data could very well be the future, so that might be an area to focus on now, while some governments, especially in Europe, are trying to figure out their policies in this area.
Clearly, not every country has the capacity yet to start reporting on this, but many countries are copying others in terms of research exercises and measurement, and open access policies are growing daily, so now is a good time to begin to define what this could look like.
