Columns with multiple attributes in tabular data


#1

I am not sure of the most elegant way to express variables with multiple attributes in the Table Schema for tabular data. For example, variable A could have a mean value and a standard deviation, as could variable B. One implementation I have seen, in Open Power System Data (OPSD), is to use labels like "A_mean" and "A_standard_deviation", but this is quite ugly compared to the elegance of the rest of the standard. I had originally hoped to require that each data table cover only one variable, so this wouldn't be a problem, but it looks like our working group wants uncertainty and variability specified separately (as probability distributions) for each variable. By the way, there is no guarantee that A and B would have the same attribute columns, so adding the variable name as a new column probably wouldn't work.
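To make the OPSD-style convention concrete, here is a minimal sketch that builds a Table Schema-style descriptor with one field per (variable, attribute) pair, named `<variable>_<attribute>`, and then recovers the pair from a flat field name. The helper names `flat_schema` and `split_field_name`, and the attribute names used, are hypothetical; only the `fields`, `name` and `type` keys come from Table Schema. Because variables may carry different attributes, each variable gets its own attribute list, and splitting relies on knowing the attribute vocabulary, since variable names may themselves contain underscores.

```python
import json


def flat_schema(variables):
    """Build a Table Schema-style descriptor using '<variable>_<attribute>' field names.

    `variables` maps each (hypothetical) variable name to its own list of attributes,
    so A and B do not need to share the same attribute columns.
    """
    fields = [
        {"name": f"{var}_{attr}", "type": "number"}
        for var, attrs in variables.items()
        for attr in attrs
    ]
    return {"fields": fields}


def split_field_name(name, attributes):
    """Recover (variable, attribute) from a flat field name by matching a known suffix.

    Longer attribute names are tried first so that, e.g., 'standard_deviation'
    would win over a shorter attribute such as 'deviation'.
    """
    for attr in sorted(attributes, key=len, reverse=True):
        suffix = "_" + attr
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], attr
    raise ValueError(f"no known attribute suffix in field name: {name!r}")


if __name__ == "__main__":
    # Hypothetical variables with differing attribute sets.
    schema = flat_schema({"A": ["mean", "standard_deviation"], "B": ["minimum", "maximum"]})
    print(json.dumps(schema, indent=2))
    known = {"mean", "standard_deviation", "minimum", "maximum"}
    print([split_field_name(f["name"], known) for f in schema["fields"]])
```

This shows why the convention feels awkward: the variable/attribute structure lives only in a naming rule that every consumer has to re-implement, rather than in the schema itself.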

It is noteworthy that OPSD also provides its data in a MultiIndex format that is not Data Package compliant.
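As a rough illustration of what a MultiIndex-style file looks like, the sketch below writes a CSV with two header rows: variable names on the first row and attribute names on the second. The function `to_two_level_csv` and the column names are hypothetical, and this is not necessarily the exact layout OPSD uses. As far as I understand it, Table Schema maps each column to a single field name, which is why a layout like this falls outside the standard.

```python
import csv
import io


def to_two_level_csv(columns, rows):
    """Serialise rows under a two-row header.

    `columns` is a list of (variable, attribute) tuples; the first header row holds
    the variable names and the second holds the attribute names.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([var for var, _ in columns])
    writer.writerow([attr for _, attr in columns])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical columns: A and B carry different attributes.
    columns = [("A", "mean"), ("A", "standard_deviation"), ("B", "minimum"), ("B", "maximum")]
    print(to_two_level_csv(columns, [[1.0, 0.1, 0.5, 2.0]]), end="")
```

A file in this shape can be read back into a column MultiIndex with pandas via `read_csv(..., header=[0, 1])`, which keeps the variable/attribute grouping explicit instead of encoding it in field names.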

For context: I am part of a working group trying to come up with a better standard for one part of sustainability assessment (see the motivation blog post and the draft standard).

Sorry if this is a duplicate question; I did try searching the standard and the forums.


#2

Hi @cmutel, I scribbled some notes about this while thinking about how to describe data quality as a set of metrics and measures. Issue #364 in the Specs repo is also worth a look.

I hope that helps.