Help text and tool tips - 6. Is the data machine readable?

Stephen · August 4, 2015, 11:15am

As mentioned over here, I’m proposing shorter text for the nine census questions. Here’s the 6th suggestion.

Question 6: Is the data machine readable?

Files are digital, yes, but not all can be processed or parsed easily by a computer. To answer this question, you need to look at the dataset’s file type. As a rule of thumb, the following file types are machine readable:

XLS

CSV

JSON

XML

If the files are in the following formats, they are NOT machine readable:

HTML

PDF

DOC

GIF

JPEG

PPT

If you have a different file type and you don’t know if it’s machine readable or not, send an email to the Open Data Census list.

Proposed:

Data is machine readable if it is structured and can be automatically read and processed by a computer. Common machine readable file formats include CSV, XLS, JSON, XML, RDF and SHP.

The following file formats are NOT machine readable: HTML, PDF, DOC, GIF, JPEG, PPT.

I’ve dropped the email statement as we now have Support in the top menu that links to this forum.
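For anyone who wants to apply the proposed rule of thumb in bulk, here is a minimal sketch in Python. The two extension lists are copied from the proposed text above; the function name `classify_format` and the "unknown" fallback are my own assumptions for illustration, not part of the census tooling. It only looks at the file extension, so it says nothing about whether the content itself is structured.

```python
from pathlib import Path

# Extension lists taken from the proposed help text.
MACHINE_READABLE = {"csv", "xls", "json", "xml", "rdf", "shp"}
NOT_MACHINE_READABLE = {"html", "pdf", "doc", "gif", "jpeg", "ppt"}


def classify_format(filename: str) -> str:
    """Classify a file by extension only (hypothetical helper).

    Returns "machine readable", "not machine readable", or "unknown"
    when the extension is in neither list.
    """
    # Take the final extension, lower-cased and without the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in MACHINE_READABLE:
        return "machine readable"
    if ext in NOT_MACHINE_READABLE:
        return "not machine readable"
    return "unknown"


if __name__ == "__main__":
    for name in ["budget.CSV", "report.pdf", "scan.jpg"]:
        print(name, "->", classify_format(name))
```

Note that `scan.jpg` comes back as "unknown" because the help text lists JPEG but not the JPG spelling. A file type in neither list is a case for asking on the forum rather than guessing.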

dirdigeng · August 4, 2015, 12:58pm

There has recently been a big debate about this on the Open Definition mailing list. The key requirement is that the content is easily processible and modifiable by a computer. That means that a JPG of a document is not “machine readable” but a JPG of a picture of a person is “machine readable”. A more difficult case is a JPG of a map tile - although the definition of “national map” would currently allow raster maps as well as vector or CAD maps.

Mor · August 12, 2015, 1:57pm

This is a really good point. I think that for the current datasets, JPG does not count as machine readable. Let’s play it by ear; we can always change the help text when needed.

Mor · August 12, 2015, 1:59pm

I’m still leaving the support to the forum.
I’m also keeping it in bullet points. This is one of the most problematic questions in the forum and needs to be as clear as possible. I will think about what to do with the tool tip.

cosnate · August 13, 2015, 8:39pm

A JPEG probably becomes machine readable when it has metadata, and when a tool becomes freely and widely available that scans the JPEG and assigns metadata to it from a larger database? Like facial recognition software?

cosnate · August 13, 2015, 8:43pm

The shortened version is better IMO.

I still have a lot of navigation problems with the support forums. The menus and logins don’t stay consistent across the hyperlinked structure. There is probably no cheap and easy way to consolidate that. Disqus is good in that it tracks my questions for me and provides link backs, versus the scenario where I would leave a question and never be able to find my way back to the answer. I think some type of common breadcrumb menu might improve navigation. I have a GitHub and a Facebook login working on the site but still don’t have the permissions I need. I’ve been in 3 different support areas, I think.
