Data Availability and Machine-readability


#1

For the digital form criterion, even if the data is not publicly available, we can answer “yes” when it exists in digital form inside the government (e.g. in a relational database system). Does this also apply to other criteria such as machine-readability?

In some cases, the data is publicly available but not openly licensed, and it is published only as HTML (i.e. as web pages) rather than in a machine-readable format, even though we know the data is in digital form and machine-readable inside the government. In these cases, can we still answer “yes” to the machine-readability criterion?

In the case of Government Spending of Korea, where the dataset is in digital form but only the high-level parts are publicly available, I answered “yes” to the machine-readability criterion, since the available parts are machine-readable and there seems to be a feature for government officials to export data to Excel.

Before I go further, I need clarification on this. Any thoughts, @Mor? :wink:


Assessing open data or official data?
#2

As always, great questions @jgkim!

So, I would say that the point of analysis is the data that is out there and published, not the data that the government works with internally. The reason is that we want to see the state of open data. So if the government published the data as a PDF document, even though we can clearly see it came from an Excel file, I would still mark it as not machine-readable. Users would not be able to use the data if it’s not in machine-readable form, and I don’t want to reward governments for making the data fully open only to themselves and not to the public. In your case, if the data is only published in HTML, it is not machine-readable.
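
To make the rule concrete, here is a rough Python sketch. The function name and the set of formats counted as machine-readable are assumptions made for illustration, not an official Index definition; only the HTML and PDF cases come from the reasoning above.

```python
# Assumed set of machine-readable formats (illustrative, not an official list).
# HTML and PDF are deliberately absent, following the rule described above.
MACHINE_READABLE = {"csv", "json", "xml", "xls", "xlsx"}


def is_machine_readable(published_formats):
    """Return True if at least one *published* format is machine-readable.

    Only pass formats the public can actually download; formats that exist
    solely inside the government (e.g. an internal Excel export) do not count.
    """
    formats = {f.lower().lstrip(".") for f in published_formats}
    return bool(formats & MACHINE_READABLE)


print(is_machine_readable(["HTML"]))        # data published only as web pages
print(is_machine_readable(["pdf", "csv"]))  # a CSV is published alongside a PDF
```

The key design choice is the input: the check looks only at what is published, so an internal export feature never changes the answer.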

BTW, be very careful with the Spending dataset: we are looking for a very detailed dataset, not only high-level figures.


#3

I tried my best with the spending dataset. As far as I can tell, the spending dataset of the Republic of Korea does exist, but only high-level data is publicly available. So I answered as follows:

  • The data exists.
  • It’s in digital form.
  • It’s NOT publicly available.
  • It’s NOT available for free.
  • It’s NOT online.
  • It’s machine-readable. (Because the available high-level data is machine-readable)
  • It’s NOT available in bulk.
  • It’s NOT openly licensed.
  • It’s provided on a timely and up to date basis. (Because the available high-level data is up to date)

Am I missing something here?

Thanks for the answer as always, @Mor. :smile: