W3C CSV for the Web - how does it relate to Data Packages?

herrmann · July 4, 2016, 12:24pm

To clarify what I meant about offline use, W3C’s tabular data model has a couple of ways to do offline processing of CSV metadata:

“Overriding metadata”, by which a user can specify, e.g. in a command line interface, a metadata file to use to validate the CSV
“Embedded metadata”, by which the metadata is inserted as comment headers in the beginning of the CSV file

Neither of those is practical for offline use.

The first offers no formal association between the data and metadata. Presumably, practitioners might use the same file name with a different extension to link the CSV and the schema files. But that is not part of the standard, and you can’t guarantee that data sources will offer data and metadata files named in this way. So offline data processors are left with guesswork when trying to locate the schema.
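To illustrate the kind of guesswork I mean, here is a minimal sketch of what an offline tool might end up doing. The function names and the list of naming patterns are my own assumptions for illustration; a given data source may follow none of them.

```python
from pathlib import Path


def candidate_metadata_paths(csv_path):
    """Return guessed metadata locations for a local CSV file, in priority order.

    The patterns below are illustrative assumptions, not a guaranteed convention.
    """
    csv_path = Path(csv_path)
    return [
        # data.csv -> data.json (same name, different extension)
        csv_path.with_suffix(".json"),
        # data.csv -> data.csv-metadata.json
        csv_path.with_name(csv_path.name + "-metadata.json"),
        # a single metadata file shared by the whole directory
        csv_path.with_name("csv-metadata.json"),
    ]


def find_metadata(csv_path):
    """Return the first guessed metadata file that exists, or None."""
    for candidate in candidate_metadata_paths(csv_path):
        if candidate.is_file():
            return candidate
    return None
```

If `find_metadata` returns `None`, the tool has no way of knowing whether the metadata does not exist or is simply named differently, which is the problem.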

The second offers a strong link, as both data and metadata are provided in the same file. However, most CSV data processing tools probably don’t handle comments in CSV files well, and will either fail to load a file that uses this approach or treat the comment lines as data, thus lowering compatibility.
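As a rough demonstration of the compatibility problem, the sketch below feeds a CSV with leading comment lines to Python's standard `csv` module. The `# key: value` comment format and the sample content are made up for illustration and are not the syntax defined by the W3C documents.

```python
import csv
import io

# Hypothetical file content: two comment lines carrying metadata, then the data.
SAMPLE = "# title: Example\n# license: CC0\nid,name\n1,Ana\n2,Bo\n"


def read_naive(text):
    """Parse the text as plain CSV, the way a comment-unaware tool would."""
    return list(csv.reader(io.StringIO(text)))


def read_skipping_comments(text, prefix="#"):
    """Drop lines starting with the comment prefix before parsing."""
    lines = (line for line in io.StringIO(text) if not line.startswith(prefix))
    return list(csv.reader(lines))


naive_rows = read_naive(SAMPLE)
clean_rows = read_skipping_comments(SAMPLE)

print(naive_rows[0])  # the first "row" is actually a comment line
print(clean_rows[0])  # the real header row
```

The naive reader returns the comment lines as one-column rows, so the header ends up on the third row. The workaround of skipping lines by prefix is itself fragile: a quoted multi-line field whose continuation line happens to start with `#` would be corrupted.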

Lastly, about the competition of standards, there is also a precedent for having two different W3C standards that do essentially the same thing: see Microdata vs. RDFa. So I see no problem if the W3C eventually accepts both the Tabular Data Model and Tabular Data Packages simultaneously, especially if Tabular Data Packages gain a lot of traction in practical usage.

It might also be possible to provide metadata using both of the standards simultaneously for the same CSV data, akin to marking up hypertext with both Microdata and RDFa at the same time, but I haven’t really looked into this possibility in detail to see if it is feasible.