"Not available free online" grading is not implemented correctly?



If a dataset is free but requires registration, it’s graded as “not available free online”.

This criterion seems to penalise countries that try to provide more real-time data. It even seems somewhat counter to the open data movement.

In the case of Singapore, we provide realtime updates for our Pollutant Emissions and Weather Forecast datasets. As with most API data providers, we require users to register for practical reasons – we cannot provide unlimited bandwidth for such high-frequency resources.

Perhaps it would be more relevant to consider whether the registration process is easy or not, and whether access is granted immediately or with some delay.

@Lin_Zhaowei,

This is a fundamental point, and it caused headaches for last year’s index (see this thread for further info).

All the points you mention are correct. The “API issue” is a matter of balancing the functionality of real-time access and user-friendliness against 1) open data’s technical requirement of access to ‘complete’ data (bulk data) and 2) open data’s normative requirement of allowing users to choose whether they want to register or not.

Let me respond to your point that GODI penalizes API access as “not available free online”. This is only partly true. If you go to the submission page and click on the question-mark symbol next to question B2, the following text pops up:

Answer “Yes”, if the data are made available by the government on a public website. Answer “No” if the data are NOT available online or are available online only after registering, requesting the data from a civil servant via email, completing a contact form or another similar administrative process.

APIs normally do not qualify for this question because they make registration mandatory. If a country only provides data through an API, it loses 10 points for question B2. Yet we do not say that the data is not available for free, which would mean that you have to pay for access (asked in B4) and which would block the open license question (B7). Instead, the GODI 2016 edition lets submitters continue the survey, an improvement that brings more clarity compared to previous editions.
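As a rough illustration of the B2 rule quoted above, here is a minimal Python sketch. It is only one reading of the help text, not GODI’s actual scoring code: the names `AccessRoute`, `answer_b2` and `b2_score` are hypothetical, and the assumption that a single registration-free route is enough for a “Yes” is inferred from the statement that a country loses points if it *only* provides data through an API.

```python
from dataclasses import dataclass
from typing import List

# Points at stake for B2, taken from the post above ("loses 10 points").
B2_POINTS = 10


@dataclass
class AccessRoute:
    """Hypothetical description of one way a dataset can be obtained."""
    on_public_website: bool
    requires_registration: bool = False
    # Email to a civil servant, contact form, or a similar administrative step.
    requires_request: bool = False


def answer_b2(routes: List[AccessRoute]) -> str:
    """Return "Yes" if at least one route is public and needs no administrative step.

    Assumption: one qualifying route is sufficient, even if other routes
    (such as an API) require registration.
    """
    for route in routes:
        gated = route.requires_registration or route.requires_request
        if route.on_public_website and not gated:
            return "Yes"
    return "No"


def b2_score(routes: List[AccessRoute]) -> int:
    """Points earned on B2 under the reading sketched above."""
    return B2_POINTS if answer_b2(routes) == "Yes" else 0


# A dataset offered only through an API that requires registration.
api_only = [AccessRoute(on_public_website=True, requires_registration=True)]

# The same dataset with an additional registration-free download.
api_plus_download = api_only + [AccessRoute(on_public_website=True)]

print(answer_b2(api_only), b2_score(api_only))
print(answer_b2(api_plus_download), b2_score(api_plus_download))
```

Under this reading, a registration-only API answers “No” on B2, while adding any registration-free download alongside it flips the answer to “Yes” without removing the API.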

We agree that API access is a great feature (see my comments below), and we regard it as a valuable additional form of accessing data. But there are other reasons why we didn’t include API access in the scoring:

The Global Open Data Index is a cross-country assessment, and we want its results to be comparable across countries. We therefore base our requirements for weather and air quality data on governmental documentation standards, as well as on use cases that are not so time-critical that they would require real-time data provision. For instance, some pollutants are reported by governments only at 8-hour intervals. We kept our definition of timely data provision broad so that it can apply to all pollutants across the board. We definitely lose some detail here, but we avoid the confusion that different timeliness criteria for different data elements would cause.

However, we allow submitters to document API access in question B2 because this is valuable contextual information. In future, we will consider ways to score different forms of data access depending on whether the data are time-critical or not. This would, however, come at the expense of making our assessment more complex.

@dannylammerhirt,

Many thanks for the detailed response and the link to the interesting discussion.

I fully agree with you that the GODI assessment must be made comparable and requirements should be broad and not overly complex. I’m also pleased to hear that this year’s survey has been streamlined to allow for greater clarity for datasets accessible via API.

I think the 2015 discussion, which focused on the bulk/historical vs realtime dichotomy, overlooked one additional possibility – that an API can provide both realtime and historical data. For Data.gov.sg, we’ve made it a point to store all historical data (from the point where we started publishing the APIs) so that users can query and download the data in bulk via API.

What I think needs to be clarified is whether the need for registration is, in itself, seen as counter to open data. I’ve looked again at the OKFN’s Open Data Definition (http://opendefinition.org/od/2.1/en/) and there is no explicit clause on this. While registration is admittedly an extra step, in our case access is granted immediately by default.

I feel that our expected score on this metric makes it seem that we proactively restrict usage, when that is not the case.

Lin_Zhaowei

On a side note, we are starting to work on making CSV dumps available for realtime datasets as well. This is to cater to less technical users who may want to analyse the data in bulk. But due to the size of certain realtime datasets, it’s going to take us some time to figure out the best implementation.