Reflections on this year's index


#1

Pfiuuu. Finished… That was a long run: added up over all our participants, filling in the whole French census probably took us, collectively, more or less 24 hours of research, discussion, analysis of existing submissions and, finally, submissions and comments in the last minutes before the deadline.

I really believe the interface and nature of the questions make it too time-consuming to fill in the census now. The OpenData community that is supposed to take the time to fill this in is mostly volunteer-based, and focusing this much time on the census sounds like too much to me. At the very least, a considerably longer period to fill it in would be greatly appreciated in the future.

Apart from this, a few criticisms:

  • It really is not that complex to implement an internal login at least by default for those who wouldn’t want to register via Google or Facebook on one hand, and Disqus on the other. Even just a simple OpenID as existed in the first versions of CKAN would be enough. It used to be ok, I really do not understand how such a regression can happen in an open community project.

  • Having only one submission possible cannot work anymore since the review is supposed to happen afterwards, thematically. It made sense when reviewers had to validate a submission on the fly so that others could then update or fix it, with the reviewer then validating it again. But now it makes the first submitter the only one able to actually fill in the form. Others can only struggle to explain possible corrections in comments.

  • The requirements often include multiple different kinds of data for the same set. This makes filling in the form quite hard in many cases, as some of the data might be fully open while some may not. Such common cases make it really hard to actually reflect the reality of data availability.

  • The requirements often include update-regularity constraints. This becomes a bit problematic when open data actually exists but is only updated yearly or so: a strict reading would say the data should then be marked as not even existing, whereas a lot of criteria can be met… Hopefully everyone will have reflected this under the up-to-date criterion instead…

  • Last but not least, the previous census used to let one check some things such as free or machine readable even when the data was not available for free. Now, some No choices block other fields. This is unfortunate, as in some cases it gives an underestimated, wrong idea of the real situation.

PS: I also noticed a bug when submitting a revision from the previous year: when one of the “blocking-other-fields” fields is changed to No, after validation the form returns an error saying the now greyed-out fields are set to null. One has to reset the first field to Yes, set the blocked ones to something (that will not be kept…) and then set it back to No for the form to actually validate.
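To make the expected behaviour concrete, here is a minimal Python sketch of a validation rule that skips greyed-out fields instead of rejecting them as null. It is not the census code: the field names and the `BLOCKED_BY` mapping are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Hypothetical mapping: a blocking field and the fields it greys out when set to "No".
BLOCKED_BY = {
    "exists": ["machine_readable", "free"],
}

def validate(form: dict[str, Optional[str]]) -> list[str]:
    """Return the names of fields that still need an answer."""
    skipped = set()
    for blocker, dependents in BLOCKED_BY.items():
        if form.get(blocker) == "No":
            # Greyed-out fields cannot be answered, so a null value is expected.
            skipped.update(dependents)
    return [name for name, value in form.items()
            if value is None and name not in skipped]

# A revision where the blocking field was changed to "No" validates cleanly.
print(validate({"exists": "No", "machine_readable": None, "free": None}))
```

With a rule like this, the workaround of temporarily switching the first field back to Yes would no longer be needed.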


#2

Thanks Benjamin, good reflections. I agree with all of them, and I reinforce the first one about “plural login”. Other problems, in my perception, are social:

  1. no participation in answering: the only participant is “the first to arrive”, as in a game… This feature is not democratic or participative… it scares people.

  2. no information about who the experts are at “Awaiting review”: the experts must be indicated before the submission.

  3. no democratic/transparent process to elect the experts of item 2.


PS: one more problem with login: I do not remember whether it was Facebook, Google or another, and the system does not remember either, so perhaps I have 2 or 3 independent users there :wink:


#3

+1 to everything Benjamin said.

I would add that, for next year, there should be a better way to evaluate the dataset requirements (qualifying criteria). Quite often a dataset meets all but one of the requirements, but it is still useful to people - definitely more useful than not having the data. So a score of zero, same as if no data existed, seems awfully inaccurate if we intend to measure the usefulness of open data to citizens.

I don’t know the solution for this problem. Perhaps count points for each requirement, or multiply the points obtained from the questions by the percentage of requirements met. Ideas are welcome.
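As a rough illustration of the second idea, here is a small Python sketch comparing the all-or-nothing rule with a proportional one. The numbers (70 points from the questions, 3 of 4 requirements met) are made up for the example and are not taken from the Index.

```python
def all_or_nothing(question_points: float, met: int, total: int) -> float:
    """Score is kept only when every qualifying requirement is met."""
    return question_points if met == total else 0.0

def proportional(question_points: float, met: int, total: int) -> float:
    """Points from the questions, scaled by the share of requirements met."""
    return question_points * met / total

# Hypothetical dataset: 70 points from the questions, 3 of 4 requirements met.
print(all_or_nothing(70, 3, 4))  # 0.0
print(proportional(70, 3, 4))    # 52.5
```

The proportional variant still rewards meeting every requirement, but a dataset that misses one of them no longer scores the same as no data at all.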

Finally, there is the quite frequent case where data in the same set is split among two or more datasets, often from different publishers, each of which will have different answers for the questions, as mentioned by @b_ooghe. In that case, we arbitrarily chose one to go as the “main” dataset and all the others were mentioned in the comments section.

As for the answers to the questions, we didn’t have time to discuss this case and be consistent in the responses every time. In some cases we chose “no” unless the answer was yes to each and every one of the datasets (essentially treating all these datasets on the same set as if it was just one combined dataset) (SOLUTION 1). In others, like example 2, we answered the questions considering just the dataset chosen as the “main” dataset and ignored the rest (SOLUTION 2). Other contributors, as I found out later, chose “unsure” as the answer to questions in cases like this (SOLUTION 3).

Example 1: Weather forecast for Brazil has 3 datasets: [A] and [B] from Inmet, [C] from Inpe. A and C are forecasts, B is past historical measurements. Only B is machine-readable, but we chose “no” as the answer for the corresponding question because we treated the three datasets as one.

Example 2: Company register for Brazil has 3 datasets: [D] from Receita Federal, [E] from Ministério do Planejamento and [F] from JUCESP. Only E is openly licensed and machine readable, the others are not. Only E and F can be searched for company names and provide company addresses. Only D and F are regularly updated. Only D is complete (E is only for registered suppliers or government contractors, F is only for companies in the state of São Paulo).

What I propose is that we should decide on the appropriate solution (SOLUTION 1, 2 or 3) for these situations. Then make it normalized across all countries during the review process. And perhaps, for next year, take into account these “multiple datasets in a set” situations for the census model and forms themselves.
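To show how much the choice matters, here is a minimal Python sketch that applies the three solutions to Example 2. The yes/no values come from the description above; the question keys, the function names, the choice of D as the “main” dataset and the reading of SOLUTION 3 as “unsure whenever the datasets disagree” are my assumptions.

```python
# Answers for Example 2 (company register), as described above.
ANSWERS = {
    "D": {"openly_licensed": False, "machine_readable": False,
          "searchable": False, "regularly_updated": True, "complete": True},
    "E": {"openly_licensed": True, "machine_readable": True,
          "searchable": True, "regularly_updated": False, "complete": False},
    "F": {"openly_licensed": False, "machine_readable": False,
          "searchable": True, "regularly_updated": True, "complete": False},
}

def solution_1(answers):
    """'yes' only if every dataset in the set answers yes."""
    questions = next(iter(answers.values())).keys()
    return {q: "yes" if all(a[q] for a in answers.values()) else "no"
            for q in questions}

def solution_2(answers, main):
    """Consider only the dataset chosen as the main one."""
    return {q: "yes" if v else "no" for q, v in answers[main].items()}

def solution_3(answers):
    """'unsure' whenever the datasets disagree (assumed interpretation)."""
    result = {}
    for q in next(iter(answers.values())):
        values = {a[q] for a in answers.values()}
        if len(values) > 1:
            result[q] = "unsure"
        else:
            result[q] = "yes" if values.pop() else "no"
    return result

print(solution_1(ANSWERS))
print(solution_2(ANSWERS, main="D"))
print(solution_3(ANSWERS))
```

For this example, SOLUTION 1 gives “no” to every question, SOLUTION 2 with D as main gives “yes” only for regular updates and completeness, and SOLUTION 3 gives “unsure” everywhere, so the same country could end up with three quite different scores.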

Finally, another improvement for next time should be a wider job of spreading the word about the start of the census. We only learned about it after the original period had expired and then been extended. Perhaps a more direct outreach could prove more fruitful: contacting past contributors and government officials directly. The Open Data Barometer, for comparison, started this year asking directly for government contributions, which I believe will lead to digging up findings that independent researchers working alone wouldn’t have found before.


#4

Guys, great comments. Just some answers:

  1. there is a possibility to comment on each submission. This is done because multiple submissions are confusing for the reviewers (feedback from last year). Sure, it’s through disqus, but I think commenting is a good thing. Maybe we can improve the user experience of commenting rather than adding more submissions?

  2. I do not follow - why do you need to know submitters before the submissions? Like in any research, their names will be published in the final result. You can find the submission process in the methodology section.

  3. We have tried to make this as participative as possible, and it takes time to learn. Why does the community need to decide who the reviewers are? We also wrote about the process here.

Guys, feedback is great, but I need more information in order to know how to improve it; specific examples would be good.


#5

@herrmann The criteria are known. This is experimental here, so we are trying our best. It is mentioned in the methodology section as well. We hope that next year we can add a check box to each criterion so that both reviewers and submitters can indicate more clearly whether it is met.

I do not understand your examples from different sources - we are looking for the official federal source. I also do not know Brazil like you do - what is Inmet? What is Inpe? I do know we need to allow submissions for more than one agency; this would also be a good improvement for the local indexes. Currently, each reviewer will decide what the criteria are and how to measure them. They will write their justifications during the process, so please be a bit patient with it.

Lastly, the Index is not government oriented; it is a civil society one. I also know that the Barometer contacts governments for its review process, not for contributions. We hope that we can get government reviews as well after this stage is done. I did contact all chapters and ambassadors of Open Knowledge, including the Brazilian one, right at the beginning of the Index…

Again, this is not me defending the Index, but I need to know more concrete information in order to keep this record valid and good for the next Index improvement… I would also like to know what we need to keep…


#6

Yes, @Mor, this is mentioned in the 2015 GODI methodology. And from what I understand of it, if a dataset misses just one of the qualifying criteria, it automatically loses all points in the category. Does anyone honestly think that a country that, for instance, publishes national maps every two years but otherwise meets all the other criteria for national maps should be measured no better than a country that doesn’t even have basic national cartography (let alone publish it)? I argue that the current all-or-nothing rule for qualifying criteria makes the GODI a very blunt instrument for measuring how a country is handling open data publication.

As for the check boxes next to the criteria next year, I think that is indeed a very good idea for improving accuracy and making sure both contributors and reviewers pay attention to and consider them. +1

Inmet is the National Meteorology Institute, linked to the Ministry of Agriculture, and Inpe is the National Institute for Space Research, linked to the Ministry of Science and Technology. Both are official federal agencies that offer weather forecasts. But neither of them seems to be concerned with open data at this time.

I’m not sure what you mean when you say the review process of the Barometer, but I can say from first hand experience that we were asked to fill in a blank questionnaire from scratch.

As for the outreach, as I said before, my suggestion is that you should be contacting past contributors directly to inform them of the census. In my case, I still wear both hats, having been involved with Open Knowledge for 6 years and with government for 5 years. I have been contributing to the Index since its very first edition (then called the “Open Data Census”). Yet, if I hadn’t stumbled upon the information here on Discuss, albeit late, I would not have known about it and would not have contributed this time. Either contacting past contributors to the Index or contacting government open data initiatives would have solved this issue. If asking for government contributions is out of the question (as evidenced in the methodology, and that is an entirely different discussion), please do try to reach past contributors as the census opens. I hope that qualifies as concrete information for the next Index.

What you should keep is the basic structure of the Index and its crowdsourced nature - which is unique and very different to, say, the Barometer. The new dataset categories added this year were also very relevant additions.


#7

in 2016 I think it would be better to state the full deadline (date/time/timezone) in the global index header when the period opens not just the deadline extended to date (20th sep but unfortunately not then updated to 28th sep and that will have deterred some input)

i was surprised at the apparently lower engagement in the process in 2015 and, particularly in the places with Open Knowledge groups, the relatively late submissions

i think Open Knowledge should seriously consider nominal funding (or securing more funding) contributions for any key jurisdictions omitted in 2015 – they will always be able to source open government hacktivists that do not have enough cash but some spare hours – i noted that luxembourg

i helped out extensively in 2013 (15+ British influenced offshore jurisdictions) and just some interesting jurisdictions in 2014 such as Saudi Arabia as I could not allocate the same level of unpaid hours

with the reviewer process removed in 2015, to be honest, I would have expected and certainly would still prefer my name to be removed from Jersey and any other places from 2013/2014
mor – could you progress that for me at least please, and perhaps ping any others to confirm preference??

well done though as i believe a no cost benchmark is a very effective tool to lobby any governments that drag their feet on open government or think they are above average when in fact they are not as it helps prompt and frame a debate – even if that focuses a bit too much on relative position in the first year or so of government upping their game and shifting to more engagement with external interested parties – sorry Jersey, Guernsey but i don’t make those rules!! ;O)


#8

I notice that in some submissions people have just clicked “Unsure” to “Does the data exist” and left no comments.

This prevents others from making a better submission. I think adding comments should be mandatory if you are going to answer “No” or “Unsure” to “Does the data exist?”.

People should explain the effort they have made to try to find the data, the searches they performed, and the websites they rejected and why.
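A check like this could be quite small. Here is a minimal Python sketch, assuming the answer and the comment arrive as plain strings; the function name and the message are hypothetical, not part of the census code.

```python
from typing import Optional

# Answers to "Does the data exist?" that should come with an explanation.
NEEDS_COMMENT = {"No", "Unsure"}

def check_exists_answer(answer: str, comment: str) -> Optional[str]:
    """Return an error message if a required comment is missing, else None."""
    if answer in NEEDS_COMMENT and not comment.strip():
        return ("Please describe the searches you performed "
                "and the websites you rejected, and why.")
    return None

print(check_exists_answer("Unsure", ""))  # error message
print(check_exists_answer("Yes", ""))     # None
```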


#9

Just filled in all for Canada and Ireland.

There may be some issues with some of the new indicators:

a) maps, some of the maps may be produced by different agencies, but only one URL applies, scale may also confuse people
b) water quality, not sure we can ask for weekly data updates, I do not know of any government that does that, and in the context of Canada, impossible
c) hospital admin data and infectious diseases are often available at different organizations

  • it is possible to have bulk download but only accessible behind a paywall
  • it is possible to have data on the web but not open
  • we may need to have the option to add more than one url (i.e. health, maps)

Those are my quick reflections. I will dive into mapping soon.


#10

Is there someone running the Cameroon census?
Let me know please! I would gladly take on the responsibility on behalf of Cameroon.


#11

Thanks all for your answers (I’m @b_ooghe as well as @RouxRC, not sure how nor why this other account got created when posting here…).

@Mor:
Commenting is definitely good and important. But it is not enough: being able to propose counter submissions would make the whole collaborative process way more contributive.

Regarding @Herrmann’s example, I believe it is quite simple: imagine, for instance, that National Statistics requires 3 different datasets and that one of the three does not exist whereas the other two are fully open. The current form makes it impossible to reflect such a situation without guidelines, and these would hardly be followed by everyone.

This could be solved in various ways, but it requires a bit more than a single checkbox, for instance:

  • by being able to fill in the form multiple times for multiple data entities in the same category, the same way as when recording multiple data resources in a CKAN data package;
  • by having an adjustable percentage gauge to indicate how much of a requirement is actually covered in the evaluation; for instance, in France for transport timetables, we could indicate this way that data for some trains is open while data for others is not.
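Combining both ideas, a category score could be a coverage-weighted sum over its entries. The sketch below is only an illustration in Python: the function name, the 60/40 split and the scores are made-up values, not real figures for French timetables.

```python
def category_score(entries: list[tuple[int, int]]) -> float:
    """Weighted score for a category.

    Each entry is (coverage_percent, score): the share of the requirement
    covered by that data entity, and the score that entity obtained.
    """
    total_coverage = sum(coverage for coverage, _ in entries)
    if total_coverage > 100:
        raise ValueError("coverage percentages must not exceed 100 in total")
    return sum(coverage * score for coverage, score in entries) / 100

# Hypothetical: 60% of the timetable data is fully open, 40% is not open at all.
print(category_score([(60, 100), (40, 0)]))  # 60.0
```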

#12

Thanks Open Knowledge team for your work.
In September 2015 the Ministry of Finance of Ukraine launched the transparency project named “E-Data”, which is primarily focused on publishing transactions of the National Treasury. And today the Ukrainian government published on FB: “Ukraine has risen in an annual ranking of countries Global Open Data Index up to 58th place”. The information spread within minutes.


#13

Thanks to the whole Global Open Data Index 2015 team (!). About GODI 2016: can we improve it?

We notice that a recurrent problem is about licenses and license inference:

  • When there are only implied licenses, how do we express that with a “license name”? How do we check that the interpretation of that context’s implied license is reliable?

  • When an “exotic license” is used, how do we check the basic clauses of the license?

So, some suggestions:

1) use a “family of the license” in GODI statistics. See concept at okfn/licenses/issues/54

2) Help to construct a dataset of all (open or not) licenses at Core Datasets Project to manage licenses attributes, standard names, etc.


#14

Great points! Another question to consider -

What to do when data is known as creative commons by law but doesn’t have a license or, worse, has a copyright notice in the footer of the website?


#15

Hello Natasha,

Ukraine has not risen in the Index, for the sole reason that Ukraine was not part of last year’s Index. Did anyone respond to this announcement?


#16

Hi Mor, thanks. Yes, this case is also frequent (!)… The proposal,
https://github.com/ppKrauss/licenses
is also about implied licenses, as you describe, with no explicit citation of a license in the document…
And the worse case, yes, is perhaps not so rare :smile:
In order to overcome contradictions, we need some RFC-like process to endorse “implied licenses reports” …
The Brazilian report is a good example,

https://github.com/ppKrauss/licenses/blob/master/reports/implied-lex-BR-v1.md

There are also other drafts to show that is not difficult,

https://github.com/ppKrauss/licenses/blob/master/reports/implied-lex-IL-v1.md
https://github.com/ppKrauss/licenses/blob/master/reports/implied-lex-IS-v1.md
https://github.com/ppKrauss/licenses/blob/master/reports/implied-lex-US-v1.md

and, of course, the worst case is when we must use the “universal default”,

https://github.com/ppKrauss/licenses/blob/master/reports/implied-berne-v1971.md

:wink:


#17

Hello, Mor!
Don’t worry about this situation. You have responded to the request of Denis Gurskiy, so the official information has been corrected. After your explanations, it became clear that one website is just a platform for collecting submissions, while the other website is the Index site that we need. We don’t compare two years.
My colleagues and I tried to contribute to the development of open data in Ukraine. I am a member of the “E-Data” project working team. I noticed that the rating of Ukraine in 2015 was calculated without some factors that could improve the rating. For example, the fact that the data on the website spending.gov.ua is machine readable: the file type is CSV. And about the Government Budget: the data is in the public domain, you can download it, and it is free.
What can we do so that the 2016 Index is calculated as correctly as possible?
Thank you!