Publishing quality Data

iodc
unconference

#1

What makes open data not quality-ready?
Cowardice about opening interesting data
Obscure formats
Trying to open “perfect” data leads to not opening data at all

-Access to the API is in some cases (VBB Brandenburg) very difficult
-Standardization can be too confusing ("confusion of standards")
-Big XML files are sometimes a barrier to data quality
-When documentation is missing, the data is not good quality

  • Data production bias - we need to be aware of coverage and make sure the data represents the entire population. You need to be aware of the sample, and the sampling methodology matters as well.
    -Timeliness of the data and its updates.
  • A lot of data is produced in a non-machine-processable format - we need to structure the unstructured.
  • No capacity to produce high-quality data, which needs specialized capacity in the topic that you cover.
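
To make the coverage point above a bit more concrete, here is a minimal sketch of a check that compares each group's share in a sample against its share in the population. The function name `coverage_gaps`, the 5-point tolerance, and the urban/rural numbers are all made up for illustration. A real check would follow whatever sampling methodology the publisher documents.

```python
def coverage_gaps(sample_counts, population_shares, tolerance=0.05):
    """Return groups whose share in the sample differs from the
    population share by more than `tolerance` (absolute difference)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for group, expected in population_shares.items():
        # Groups missing from the sample count as a share of zero.
        observed = sample_counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = (round(observed, 3), expected)
    return gaps


# Hypothetical numbers, for illustration only.
sample = {"urban": 800, "rural": 200}
population = {"urban": 0.6, "rural": 0.4}
print(coverage_gaps(sample, population))
```

An empty result means every group is within the tolerance. Anything else is a hint that the sample does not look like the population.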

Examples of bad data:

Ecuador public education data - a change of IT systems creates friction in the data, there is no quality control, and it is not open to the public.

German Member of Parliament Income Data (published on website)
-only categories (10 categories, the highest being 100.000+)
-no given format (not machine-processable)
-not updated

EcoCounter API (enumerates bikers and )
-Error margin is about 20%
-4 different PHP files with different parameters -> very confusing
-output: GO-Json with a bad structure
-no access allowed

Knesset Open Data
-No unique identifier for a bill (changing between different committees)
-No documentation about the API
-Tech-team does not understand the basic concepts
-Scraping from the website is blocked

What problem do we need to tackle first?

  • Documentation - We do not need to create perfect documentation, but should release it fast and often; it is always evolving. How can we create better documentation processes? How do you deal with noise?

  • Standards - How can we become more aware of standards and make data more structured?

  • How can we praise good examples so we can learn from them (without over-glorifying them)?

  • How can we give constructive feedback about bad data rather than only complain?

  • How can we be more consistent?

(and thank you Markus for helping to take the notes).


#2

Thanks for sharing the notes. It would be great if people could share examples of good open data documentation and good open standards. Perhaps by reviewing these we could make a template of what must or should be included and how.
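
As a starting point for such a template, here is a rough sketch of a checklist that reports which documentation fields a dataset is missing. The field names are my own assumptions, loosely derived from the problems listed in the session notes (format, updates, methodology, identifiers, error margin). They are not an agreed standard, and the example dataset is hypothetical.

```python
# Hypothetical checklist; each field maps to a problem raised in the notes.
REQUIRED_FIELDS = {
    "format": "machine-readable file format of the data",
    "last_updated": "date of the most recent update",
    "update_frequency": "how often the data is refreshed",
    "methodology": "how the data was collected or sampled",
    "identifier_scheme": "how records are uniquely identified",
    "known_error_margin": "documented accuracy or error margin",
}


def missing_documentation(doc):
    """Return the required fields that are absent or blank in `doc`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Made-up dataset description, for illustration only.
example = {
    "format": "CSV",
    "last_updated": "2016-10-01",
    "methodology": "",
}
for field in missing_documentation(example):
    print(f"missing: {field} ({REQUIRED_FIELDS[field]})")
```

Reviewing real examples of good documentation would tell us which of these fields belong under "must" and which under "should".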

I think using OpenAPI (swagger.io) is one potential exemplar for documenting an API and letting you try it out before deciding to use the data.
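
To give a feel for what that looks like, here is a minimal sketch of an OpenAPI 3.0 description, built as a Python dictionary and printed as JSON. The API title, the `/datasets/{dataset_id}` path, and the parameter name are hypothetical and not taken from any real service. Only the top-level layout (`openapi`, `info`, `paths`) follows the OpenAPI specification.

```python
import json

# Hypothetical API description; the endpoint and names are illustrative only.
spec = {
    "openapi": "3.0.3",
    "info": {
        "title": "Example open data API (hypothetical)",
        "version": "0.1.0",
    },
    "paths": {
        "/datasets/{dataset_id}": {
            "get": {
                "summary": "Fetch one dataset record",
                "parameters": [
                    {
                        "name": "dataset_id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The dataset record",
                        "content": {
                            "application/json": {
                                "schema": {"type": "object"}
                            }
                        },
                    },
                    "404": {"description": "No dataset with this identifier"},
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))
```

A file like this can be loaded into Swagger tooling, which renders browsable documentation and lets people try requests against the API.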