Machine Readable HTML

jalbertbowden · October 19, 2015, 10:22pm

I noticed that HTML is still not considered machine readable…
If it’s properly formatted, it’s machine readable, and I don’t think ruling it out as an option is the right approach, although I’m not 100% sure where to go from there.

Perhaps require the submitter to ensure the markup is valid and that any microformats/schema validate as well?

Essentially, I think in a few cases there will be machine readable data that gets marked as not machine readable simply because the data is in HTML.

I also think this plays into the worldwide phenomenon of not caring about markup, which hurts no one more than the end user(s), and is a plague on the web.

Mor · October 20, 2015, 12:07pm

Hello,

Great question!

The reasons for that are simple:
1. Most of our submitters do not have the technical knowledge to tell the difference between what is properly formatted and what is not. I think that governments will also find it difficult to understand.
2. It still means that the data need to be scraped, instead of handed as is to the user. Scraping data takes a lot of time and effort that could be saved if the data were already available in an easy-to-process format.

I invite you to dive into some of the index results, especially in developing countries, and look at the HTML there. I promise you will be shocked by some of the websites.

I hope this helps!

Jasmine_Lai · November 2, 2015, 6:55am

Hi,

Is querying records from a data query tool considered machine-readable? Our customers offer a comprehensive database via a data query tool that allows the public to retrieve data according to the parameters entered. I feel this is an excellent method of sharing data, and it’s real-time data extraction. It is human-friendly but not necessarily machine-friendly, though. We do have to define machine-readable as adhering to specific file formats, being license-free, etc.

Thanks.

Mor · November 2, 2015, 2:50pm

Hi @Jasmine_Lai - This year, we decided across the board that search interfaces are not bulk and not machine readable. The reason for this is that they do not meet the Open Definition, and they also don’t give users the full access they need to analyse the data sufficiently.
I hope that answers the question.

jalbertbowden · November 2, 2015, 5:14pm

i agree with 1 and 2, although once you have a process for scraping, particularly with microformats, it should be relatively painless.
no need to dive into results, i’m well aware of the horrible state of markup being produced on the web.
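
To make the "process for scraping" point concrete, here is a minimal sketch of pulling rows out of a well-formed HTML table using only Python's standard library. The sample markup, the `TableExtractor` class, and the `extract_rows` helper are all hypothetical, made up for illustration; real index submissions will be messier, and nested tables or missing row tags would need extra handling.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table contents of `html` as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample data, not taken from any real submission.
SAMPLE = """
<table>
  <tr><th>Country</th><th>Budget</th></tr>
  <tr><td>Examplia</td><td>1200</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
```

This only works this cleanly because the sample markup is valid and consistently structured, which is exactly the condition under discussion.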

i appreciate your response, but still have to say that labeling all HTML as not machine readable is incorrect, and only furthers the illusion in most circles that markup doesn’t matter.

thanks for the response