Next MVP: CSV to Data Package to Aggregate to Visualization

OK, after more trialing and thinking through of this workflow since the first experiment, I’ve come up with the following. Excuse the brain dump :smile:.

The visualization code needs one, and likely only one, currency for visualization purposes and it needs a single column of amounts (the ‘measure’). These are stored together in an entry in the “measures” object of the current OpenSpending Data Package draft, so that’s good and probably means the visualization code needs to read the datapackage.json. But a single datapackage can have multiple measures, so how should it choose? Just the first?
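To make that question concrete, here’s a minimal sketch of how the visualization code might pick its measure from a parsed datapackage.json. The shape of the `measures` object (name → entry holding column and currency) follows the current draft as described above, but the selection rule, an explicit name if given, else the first entry, is just one possible answer, not anything the spec decides:

```python
# Assumed shape, per the current OpenSpending Data Package draft:
# "measures" maps a measure name to an entry with its column and
# currency. The fall-back-to-first rule is an assumption, not spec.
def choose_measure(datapackage, name=None):
    measures = datapackage["measures"]
    if name is not None:
        return name, measures[name]
    first = next(iter(measures))  # first key in parsed JSON order
    return first, measures[first]
```

If “just the first” turns out to be too arbitrary, the same function shape works for a convention like a `defaultMeasure` key in the datapackage.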

The visualization code also needs to know where the Aggregate CSV/JSON lives. The obvious place to pull this from would be the aggregates.json file we’re working off currently, but again, how to specify which one?
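A hypothetical sketch only, since the real shape of aggregates.json is exactly what we’re still working out: here I assume a list of entries, each with a `name` and a `url` pointing at the aggregate CSV/JSON, and the same open “which one?” question answered by defaulting to the first entry:

```python
# Assumed aggregates.json shape: a list of entries, each carrying a
# "name" and a "url" for its aggregate CSV/JSON. Both field names and
# the default-to-first rule are assumptions, not settled spec.
def find_aggregate(aggregates, name=None):
    if name is None:
        return aggregates[0]  # same open question: default to first?
    for entry in aggregates:
        if entry.get("name") == name:
            return entry
    raise KeyError(f"no aggregate named {name!r}")
```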

Along the same lines, the resources object can specify multiple CSVs: how should the aggregation code determine which resource to aggregate on?
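One possible convention, assuming resources follow the Data Package spec (a `resources` array whose entries may carry a `name`): let the aggregation spec name its resource explicitly, and fall back to the first one otherwise. The fallback mirrors the same open question as with measures:

```python
# Assumes the standard Data Package "resources" array; "name" is
# optional there, so the explicit-name path may not always be usable.
def choose_resource(datapackage, name=None):
    resources = datapackage["resources"]
    if name is not None:
        for resource in resources:
            if resource.get("name") == name:
                return resource
        raise KeyError(f"no resource named {name!r}")
    return resources[0]  # default: the first resource
```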

Lastly (and I’m mainly thinking out loud here) the SQL statements in the aggregates.json should reflect the mapped dimensions and measures specified in the datapackage.json (currently, they refer directly to the original CSV columns). So the aggregation code (somewhere between CSV read and jts-sql load) should store the logical columns as fields in the SQL. That seems right, I think. But it also means the user must use the logical model to write her aggregation specification. Is that good?
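A sketch of what that load step could look like: alias each original CSV column to its logical name when the table is created, so the user’s aggregation SQL only ever sees the logical model. The mapping shape (logical name → source column) and all the names here are hypothetical, not part of any current file:

```python
# Alias raw CSV columns to their logical dimension/measure names at
# load time; downstream aggregation SQL then refers only to the
# logical names, never the original headers. All names hypothetical.
def load_select(mapping, table):
    cols = ", ".join(f'"{src}" AS "{logical}"'
                     for logical, src in mapping.items())
    return f'SELECT {cols} FROM "{table}"'
```

With something like this in place between CSV read and jts-sql load, an aggregation spec written as `SUM(amount) GROUP BY year` would work regardless of what the source file called those columns.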
