Publishing Quality Data

What makes open data not quality-ready?
Cowardice about opening interesting data
Obscure formats
Trying to release “perfect” open data leads to not opening data at all

-Access to the API is in some cases (VBB Brandenburg) very difficult
-Standardization can be too confusing ("confusion of standards")
-Big XML files are sometimes a barrier to data quality
-When documentation is missing, the data is not good quality

  • Data production bias - we need to be aware of coverage and make sure the data represents the entire population. You need to be aware of the sample; the sampling methodology is important as well.
    -Timeliness of the data and its updates.
  • A lot of data is produced in a non-machine-processable format - we need to structure the unstructured.
  • No capacity to produce high-quality data; this needs specialized capacity in the topic that you cover.
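
Several of the points above (missing documentation, non-machine-processable formats, timeliness) can be checked automatically. Below is a minimal sketch of such a check. The `DatasetInfo` fields, the list of formats, and the one-year threshold are all assumptions made for illustration and do not come from any existing standard or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: formats treated as machine-processable in this sketch.
MACHINE_READABLE = {"csv", "json", "geojson", "xml"}


@dataclass
class DatasetInfo:
    """Hypothetical metadata record for one published dataset."""
    name: str
    file_format: str
    last_updated: date
    documentation_url: Optional[str] = None


def quality_issues(info: DatasetInfo, today: date, max_age_days: int = 365) -> List[str]:
    """Return a list of human-readable quality problems (empty if none found)."""
    issues = []
    if not info.documentation_url:
        issues.append("missing documentation")
    if info.file_format.lower() not in MACHINE_READABLE:
        issues.append("format is not machine-processable")
    if (today - info.last_updated).days > max_age_days:
        issues.append("data is out of date")
    return issues
```

A check like this covers only the mechanical issues; coverage and sampling bias still need a human who knows the topic.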

Examples of bad data:

Ecuador public education data - A change of IT systems created friction in the data; there is no quality control and the data is not open to the public.

German Member of Parliament Income Data (published on website)
-only categories (10 categories, ending with 100.000+)
-no defined format (machine processing not possible)
-not updated

EcoCounter API (enumerates bikers and )
-Error margin is about 20%
-4 different PHP files with different parameters → very confusing
-output: GO-Json with a bad structure
-no access allowed

Knesset Open Data
-No unique identifier for a bill (it changes when the bill moves to a different committee)
-No documentation about the API
-Tech team does not understand the basic concepts
-Scraping from the website is blocked
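
To illustrate the identifier problem: one common approach is to assign an identifier once, when the record is created, and to treat the committee as an ordinary attribute that can change. The sketch below is only an illustration of that idea. The `Bill` class and its fields are hypothetical and say nothing about how the Knesset systems actually work.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bill:
    """Hypothetical bill record with an identifier that never changes."""
    title: str
    committee: str
    # Assigned once at creation; not touched when the bill moves.
    bill_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Committees the bill has previously been assigned to, oldest first.
    history: List[str] = field(default_factory=list)

    def move_to(self, new_committee: str) -> None:
        """Reassign the bill while keeping its identifier stable."""
        self.history.append(self.committee)
        self.committee = new_committee
```

With a stable `bill_id`, data users can follow one bill across committees without guessing which records belong together.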

What problem do we need to tackle first?

  • Documentation - We do not need to create perfect documentation, but we should release it fast and often; it is always evolving. How can we create better documentation processes? How do you deal with noise?

  • Standards - How can we become more aware of standards and make data more structured?

  • How can we praise good examples so we can learn from them (without over-glorifying them)?

  • How can we give constructive feedback about bad data instead of only complaining?

  • How can we be more consistent?

(and thank you Markus for helping to take the notes).


Thanks for sharing the notes. It would be great if people could share examples of good open data documentation and good open standards. Perhaps by reviewing these we could make a template of what must or should be included and how.

I think OpenAPI (swagger.io) is one potential exemplar for documenting an API and letting you try it out before deciding to use the data.
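
For anyone who has not seen one, here is a rough sketch of what a very small OpenAPI 3.0 description could look like, built as a Python dictionary and written out as JSON. The API title, the `/datasets` path, and the response shape are invented for illustration and do not describe any real service.

```python
import json

# Minimal OpenAPI 3.0 document: "openapi", "info" and "paths" are the
# top-level fields used here; everything under them is a made-up example.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Open Data API", "version": "0.1.0"},
    "paths": {
        "/datasets": {
            "get": {
                "summary": "List available datasets",
                "responses": {
                    "200": {
                        "description": "A JSON array of dataset names",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "array",
                                    "items": {"type": "string"},
                                }
                            }
                        },
                    }
                },
            }
        }
    },
}


def to_json(document: dict) -> str:
    """Serialize the description so documentation tools can read it."""
    return json.dumps(document, indent=2)
```

A file like this can be loaded into Swagger UI or a similar viewer, which is what gives the "try it out" experience.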

This is a nice post. It shares valuable information about open data documentation and helps readers learn more about its standards.