Odd technical/legal categorizations of index criteria


#1

Over at this post @mlinksva raises a question about the Open Data Index Methodology. I’ve copied the issue from GitHub so more people can see it (as suggested by @Mor)

http://index.okfn.org/methodology/ says

Each dataset in each place is evaluated using nine questions that examine the technical and the legal openness of the dataset. In order to balance between the two aspects, each question is weighted differently and worth a different score. Together, the six technical questions are worth 50 points, the three legal questions are also worth 50 points.

The following questions examine technical openness:

Does the data exist?
Is the data in digital form?
Is the data available online?
Is the data machine-readable?
Is it available in bulk?
Is the data provided on a timely and up to date basis?

The following questions examine the legal status of openness:

Is the data publicly available?
Is the data available for free?
Is the data openly licensed?

Not clear to me how public availability and price are legal statuses.

Also not clear how some of the technical criteria are… technical.

Except for possibly the licensing and machine-readability criteria, the technical/legal distinction seems artificial and arbitrary.

I’d consider eliminating the distinction. If the 3 criteria currently categorized as ‘legal’ are worth half the score, it should be because they are worth that, not because of an arbitrary and odd assignment into legal, and the same goes for the ones currently assigned to technical.

What do you think?


#2

I think that it does make sense if we interpret “legal” broadly as meaning the conditions of access:

  • Not publicly available is de facto exclusion
  • Freely available is a condition of the license (i.e. whether a fee is charged), so “legal”
  • Openly licensed: this fits the Open Definition’s notion of openly licensed (i.e. the license permits freedom to use, reuse and redistribute etc)

#3

I could argue most of the ‘technical’ points are conditions of access as well. I don’t get your second point at all – that’s not the case for any Open license. I don’t really care, just seems rather odd to me still.


#4

I tend to agree with @mlinksva. Perhaps it would be more meaningful to explain why each question received its individual weight and not get hung up on grouping questions. I think the explanations could have an educational value.


#5

@Stephen - can you elaborate on what you mean by explanations? To be honest, every weighting system is arbitrary, and at the end of the day it depends on what we want to emphasize. I think we give more points to machine readability and licensing since they are crucial points on which most countries failed (specifically licensing; most of the datasets are red because of this).

If you have any concrete ideas for this section, I would be happy to incorporate them in the upcoming Index and make it clearer.


#6

By “explanations” I meant explain why “openly licensed” gets a weight of 30 and bulk gets 10.

I don’t think @mlinksva or I were suggesting changing the weight of each question - just that the grouping didn’t make sense. Questions should be weighted, as you say, according to where you want to place emphasis - not by starting with a 50-50 split between “technical” and “legal” groupings.

So, as a suggestion:

  • drop the legal-technical grouping in the methodology
  • for each question, assign a weight and explain why it is more/less important, e.g.
    • Open licence means you can reuse it - that’s really important hence our weight is 30.
    • If the data is not available in bulk it’s not the end of the world - maybe there’s an API or you can download multiple files - hence our weight is 10.
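
To make the suggestion concrete, here is a rough sketch of what a flat, per-question weighting with a rationale attached to each question could look like. Only the weights for “openly licensed” (30) and “bulk” (10) come from this thread; every other weight, all the key names, and the `score` helper are hypothetical placeholders, chosen only so that the total comes to 100.

```python
# Sketch of a flat weighting table: one weight and one rationale per question,
# with no technical/legal grouping.
# ASSUMPTION: only "openly_licensed" (30) and "bulk" (10) are taken from the
# thread; the other weights and all rationale strings are illustrative.
WEIGHTS = {
    "exists":           (5,  "Without the data nothing else matters."),
    "digital":          (5,  "Paper records are hard to reuse."),
    "publicly_available": (5, "Data nobody can reach is de facto closed."),
    "free":             (15, "A fee excludes many potential users."),
    "online":           (5,  "Online access removes the need to ask."),
    "machine_readable": (15, "Needed for any automated reuse."),
    "bulk":             (10, "Not the end of the world: an API may do."),
    "openly_licensed":  (30, "An open licence means you can reuse it."),
    "timely":           (10, "Stale data loses much of its value."),
}


def score(answers):
    """Sum the weights of every question answered 'yes' (True).

    `answers` maps question keys to booleans; missing keys count as 'no'.
    """
    return sum(
        weight
        for question, (weight, _rationale) in WEIGHTS.items()
        if answers.get(question, False)
    )


def explain():
    """Return one 'question (weight): rationale' line per question."""
    return [f"{q} ({w}): {why}" for q, (w, why) in WEIGHTS.items()]


if __name__ == "__main__":
    # A dataset that meets every criterion except open licensing.
    answers = {q: True for q in WEIGHTS}
    answers["openly_licensed"] = False
    print(score(answers))
    print("\n".join(explain()))
```

The point of the layout is that each weight sits next to its own justification, so the methodology page could be generated from the same table that does the scoring.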