Evaluating questions for GODI - Methodology consultation #2


Hello all,

Sorry for the delay, but here is the second methodological consultation for the index. In this consultation, we will look at the following question - Should we add more questions to the Global Open Data Census?

Based on past discussions, we were thinking of adding two more questions:

  1. Can users provide feedback on the dataset? This question might seem easy - whether there is a feedback form or not. However, how do we define feedback? Is an email address at the bottom of a portal enough?

  2. Is the data historic? - Is the data relevant not only for this year but also for the past, allowing advanced analysis? What do we do when the datasets are divided by year, or in the case of real-time data? Maybe this should be part of the dataset description and not a question?

To answer the questions above, I would like to give us all some background about GODI - the focus of GODI and the crowdsourcing method.

The Open Data Research Framework defines four competencies in which we can measure open data -

  1. Context/environment - The background context in which open data is being provided - by governance level or sector.
  2. Data - The supply, and qualities of open datasets, including the legal, technical, practical and social openness of data, as well as issues of data relevance and quality.
  3. Use - The direct use (or reuse) of open datasets.
  4. Impact - Addressing the benefits gained from using specific open datasets, or the returns from open data initiatives in general.

GODI addresses only the Data element of the measurement. Currently, we would like to keep it as is and not explore the other competencies, since this allows us to focus on one point of measurement and improve it.

In addition, GODI is a crowdsourced effort. This means that contributors' knowledge of Open Data may vary, from college students to professors, from first-time volunteers to experienced activists. We see this as a good thing, and as a tool that enables a learning and interaction experience with open data.

This also means that we need to acknowledge our limits - we need to keep the survey straightforward and simple so it is easy to use. We also can't ask questions that take too much time to answer, since volunteers' time is a limited resource.

I hope this is enough to move us forward.
Please comment here to add your thoughts and ideas.
This consultation will be open until Friday, 17.7.



They’re both tricky…

I honestly don’t think you can really measure the possibility of providing feedback. You can give extra points for having a “feedback button” or email address, but you’ll end up rewarding people who might not even check that email and punishing others who might be much more receptive even if they haven’t considered being explicit about feedback. That’s not a great incentive… Also, forget data portals - does a general contact from the publisher in the page footer count as feedback? I smell trouble here :smile:

On the historic bit, it could be an awesome addition to the description, but it seems very problematic too. What about changes in format or quality over time? Here in Uruguay we have open data on national budgets going back all the way to the sixties, but that data’s quality sucks. Also, how far into the past are we measuring? As I said in the previous consultation, this especially affects developing countries with poor data management. Even when a country currently has decent data infrastructure, that doesn’t mean that data as recent as this decade isn’t still on paper or in deprecated systems.

In general, I feel GODI should be simple and as quantitative as possible. Feedback is definitely not quantifiable - a form is not enough, and it would be naive to think we can honestly assess whether every publisher engages in open discussion. On the other hand, historic data is more on the quantitative side, but it’s tied to definitions that are very arbitrary (how many years, what quality, bulk or individual datasets, available through an API).


As we consider adding more datasets and more questions, we drift closer to the ODI’s Open Data Certificates initiative. As submissions are crowdsourced and not automated via the collection of metadata, I think the Open Data Census questions should be kept simple (i.e. as is).

I personally think that GODI is a great marketing tool, but I see it as a stepping stone to wider measurement like Open Data Certificates and the Open Data Monitor.


Maybe I’m going against the flow here, but I have to say it. From my experience with the local census in Romania, the list of questions is rather long and sometimes difficult to apply. I know that we are mainly discussing GODI, but we should keep in mind the vision of expanding the census to cover a variety of governments.

Let’s take these three questions:
(1) data exists
(2) data is in digital format
(3) data is online

It makes sense, in theory. But in practice, we could cover only the third one. If the volunteer found the data online, it obviously exists and is in digital format. But if it is not online, what do we do? Call or send FOIA requests to all municipalities to find out whether they collect the data and store it in digital format? That doesn’t work with volunteers. And, frankly speaking, if the data is collected at all these days, it is in digital format (or I am being too naive about the usage of technology). In the end, we compromised and used the third question as a proxy for the first two: the data exists if it is online.

I know that there is a logical shortcut in here, but maybe we should think about this idea for a moment from the advocacy perspective. We want our governments to make the data available online, in particular for these datasets. It makes no sense to store any of these datasets on paper. And it is less relevant from the openness perspective if it is stored somewhere on the servers of the administration. I would start directly with question (3): is it online?


(Esp @Stephen). The point of the Index is to provide a way to assess progress on open government data worldwide by focusing on comparable, standardized datasets and assessing them on their openness as defined by the Open Definition.

In that sense, I would agree that additional questions may be a distraction. My personal vote would be to leave out e.g. “feedback on the dataset” and “historicity” of the data, and to encourage any comments on these in the comments / description section (this has largely been our previous approach). My main objection isn’t so much the broadening as simply that this is very hard to evaluate systematically, and its actual relation to openness (and usefulness) is limited. For example, a government may not have a systematic feedback mechanism but might be really responsive. Conversely, it may have a formal feedback mechanism but be very poor about actually doing anything.

Aside: The Open Definition is the core standard here. I would also note that the Open Definition has had - and continues to have - informal “badging” (which is very similar to lightweight certificates) for more than a decade.


I completely agree with @danielcarranza’s points.

In addition, on the historical bit, I can’t see how it could be defined objectively in a way that fits all countries.


I would rather agree with the present arrangement, unless we want to remove some very essential feature of openness. If the data is not digital, it cannot easily be used in diverse ways, and to me, if it is not machine-readable, it is like “obfuscating”. The existence of data is a superset of “the data is online”.


I would also add that it would be necessary for other countries and regions to catch up on the understanding and use of the Index’s measurement paradigm before we think of refining the dataset assessment; otherwise, when they join, they would be overwhelmed by the complexity of the measurement.


I agree with @danielcarranza. On “historical”, it is difficult to define in general terms what extent of historical data would be sufficient, and in many use cases it is up-to-date data that is by far the most important. Where there are exceptions (for instance, weather observations), the minimum extent of historical data should be in the dataset definition. On “feedback”, I agree that there is a big risk of perverse incentives or “tick-box compliance” without real benefit. In addition, let’s keep it simple!


For users, I think the best way to provide feedback about the data is through applications that consume or use those datasets. The common user needs relevant information from a dataset, and most dataset sites provide only the raw data, not tools or applications that help users find the information in the datasets.
