Help text and tool tips - 6. Is the data machine readable?

As mentioned over here, I’m proposing shorter text for the nine census questions. Here’s the 6th suggestion.

Question 6: Is the data machine readable?

Files are digital, yes, but not all can be processed or parsed easily by a computer. To answer this question, you need to look at the dataset's file type. As a rule of thumb, the following file types are machine readable:

  • XLS
  • CSV
  • JSON
  • XML

If the files are in the following formats, they are NOT machine readable:

  • HTML
  • PDF
  • DOC
  • GIF
  • JPEG
  • PPT

If you have a different file type and you don’t know if it’s machine readable or not, send an email to the Open Data Census list.


Data is machine readable if it is structured and can be automatically read and processed by a computer. Common machine readable file formats include CSV, XLS, JSON, XML, RDF, SHP.

The following file formats are NOT machine readable: HTML, PDF, DOC, GIF, JPEG, PPT.
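
For anyone who wants to pre-check a list of files against this rule of thumb, here is a minimal Python sketch. The function name classify_file and the "unknown" fallback are my own assumptions and not part of any census tooling; the two sets simply copy the formats listed above. It only looks at the file extension, so it says nothing about whether the contents are actually well structured.

```python
from pathlib import Path

# Formats copied from the proposed help text above.
MACHINE_READABLE = {"csv", "xls", "json", "xml", "rdf", "shp"}
NOT_MACHINE_READABLE = {"html", "pdf", "doc", "gif", "jpeg", "ppt"}


def classify_file(filename):
    """Classify a file by its extension only (hypothetical helper)."""
    # Take the last extension, ignore case, and drop the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in MACHINE_READABLE:
        return "machine readable"
    if ext in NOT_MACHINE_READABLE:
        return "not machine readable"
    # Anything not on either list needs a human decision.
    return "unknown"


if __name__ == "__main__":
    for name in ["budget.CSV", "report.pdf", "timetable.ods"]:
        print(name, "->", classify_file(name))
```

Anything that is on neither list (ODS, say, or the JPG spelling of JPEG) comes back as "unknown", which is the case where you would ask in the forum.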

I’ve dropped the email statement as we now have Support in the top menu that links to this forum.

There has recently been a big debate about this on the Open Definition mailing list. The key requirement is that the content is easily processible and modifiable by a computer. That means that a JPG of a document is not “machine readable” but a JPG of a picture of a person is “machine readable”. A more difficult case is a JPG of a map tile - although the definition of “national map” would currently allow raster maps as well as vector or CAD maps.

This is a really good point. I think that for the current dataset, JPG does not count as machine readable. Let's play it by ear; we can always change the help text when needed.

And I'd keep it in bullet points. This is one of the most problematic questions in the forum and needs to be as clear as possible. I will think about what to do with the tool tip.

A JPEG probably becomes machine readable when it has metadata, and when a tool becomes freely and widely available that scans the JPEG and assigns metadata to it from a larger database? Like facial recognition software?

The shortened version is better IMO.

