Rufus,
Thanks for the history. I figured that all of this had been thought about before, but I wasn’t aware of HXL.
HXL is a somewhat different thing. Metatab isn’t row oriented in the same way. Metatab is not really a CSV data file, it’s a grid format for structured data, sometimes stored in CSV. Since it’s not strictly row-oriented, the format can store general structured data, including data package metadata that has multiple resources.
The Metatab and Datapackage.json formats are homologous; one can be converted to the other. For instance, here is a Metatab input that creates the example datapackage.json for the GDP package, from the data package documentation:
“a data package has many resources.” The number of resources hasn’t been a problem so far; I’m also working on a Metatab version of the metadata for the US census, which has about 100 resources, 1000 tables and 9000 fields.
"If you have this as well as JSON you have another format to support" Not entirely; the two formats are homologous, and the conversion between datapackage.csv and datapackage.json is completely programmatic. If you serialize a JSON file to Metatab and then convert it back to JSON, you get the input file back (after canonicalization), for any possible JSON file. So Metatab can use the same tool chain as datapackage.json by converting to JSON first. If the data package tool chain included the Metatab Python parser, it could work with a datapackage.csv file in exactly the same way as a datapackage.json file.
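To illustrate the round-trip idea in general terms, here is a minimal Python sketch. It is not Metatab's actual syntax or parser; it assumes a hypothetical two-column layout (a JSON-encoded path plus a JSON-encoded value per row), and the function names and the sample document are made up for the example. It only shows that nested JSON can be written to a grid and recovered without loss.

```python
import csv
import io
import json


def flatten(obj, path=()):
    """Yield (path, value) rows for every leaf of a JSON-like structure."""
    if isinstance(obj, dict) and obj:
        for key, val in obj.items():
            yield from flatten(val, path + (key,))
    elif isinstance(obj, list) and obj:
        for index, val in enumerate(obj):
            yield from flatten(val, path + (index,))
    else:
        # Scalars and empty containers are stored as leaf values.
        yield path, obj


def unflatten(rows):
    """Rebuild the structure from rows produced by flatten(), in order."""
    root = None
    for path, value in rows:
        if not path:
            return value  # the whole document was a single leaf
        if root is None:
            root = [] if isinstance(path[0], int) else {}
        node = root
        # Walk down to the parent of the leaf, creating containers as needed.
        for key, nxt in zip(path, path[1:]):
            child = [] if isinstance(nxt, int) else {}
            if isinstance(node, list):
                if key == len(node):
                    node.append(child)
                node = node[key]
            else:
                node = node.setdefault(key, child)
        if isinstance(node, list):
            node.append(value)
        else:
            node[path[-1]] = value
    return root


def to_csv(obj):
    """Serialize to CSV text: one row per leaf (hypothetical layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for path, value in flatten(obj):
        writer.writerow([json.dumps(list(path)), json.dumps(value)])
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text written by to_csv() back into a structure."""
    reader = csv.reader(io.StringIO(text))
    return unflatten((tuple(json.loads(p)), json.loads(v)) for p, v in reader)


# Made-up sample document, loosely shaped like a data package descriptor.
doc = {
    "name": "example",
    "resources": [
        {"path": "data/a.csv", "schema": {"fields": [{"name": "x", "type": "number"}]}},
        {"path": "data/b.csv"},
    ],
    "keywords": [],
}
round_tripped = from_csv(to_csv(doc))
```

In this sketch `round_tripped` compares equal to `doc`, which is the property that lets a JSON-based tool chain consume a tabular file by converting it first.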
"or multiple metadata sheets." For our use case, in which the metadata is stored in Excel, this is actually an advantage. Our users will be submitting Excel files into a workflow, and the Excel file will have the data and metadata in separate tabs. The largest section of the metadata, the schema, is also in a separate tab. This separation makes it easier to use, but, of course, it’s more tailored to Excel than to CSVs in a zip file.
“Representing this in tabular structure leads either to complex structure in your single metadata sheet” I don’t think the Metatab structure is more complex than JSON, and I find it much easier to read than JSON, especially for non-technical users (particularly in spreadsheets, where we can add color and styling to separate sections).
“Humans: pretty readable and editable by experts (coders)” That is the crux of our problem; our metadata creators aren’t coders, so if we don’t give them a more familiar way to create metadata, we don’t get any metadata.
Also, Metatab makes it possible to entirely manage a CKAN data package from a Google Spreadsheet.
Since we’re committed to the Metatab format for a pilot project in California, it would be really valuable to learn from your experience with, and thoughts on, the prior work you’ve done on a tabular metadata format. Could you refer me to that prior work or any related documents?
Perhaps, as a next step, you could propose a test case to validate the format against? I’d be happy to put together a demo.