Next MVP: CSV to Data Package to Aggregate to Visualization

danfowler · September 17, 2015, 1:12pm

[
  {
    "columns": ["func1"],
    "year": "2006",
    "aggregate": "approved",
    "where": "transfer = 'Excluding transfers' AND Econ_func = 'Function'",
    "file": "by-func1-2006",
    "format": "csv"
  }
]

In the recent tech hangout, there was much concern over building a simple aggregation spec that used raw SQL to define aggregates. I was also concerned with some of the redundancy inherent in this approach (e.g. “SELECT a,b,c … GROUP BY a,b,c”). In trying to build an script that imports fully normalized datapackages and runs aggregates on the data, I, of course, ran into the problem of needing to join lots of tables.

It turns out that when using SQLAlchemy, it is easier to use a more structured query specification to define these aggregates while also eliminating the redundant GROUP BYs. The above is what I came up with while working on a particular dataset that happened to have a year column and needed special conditions (e.g. “transfer = ‘Excluding transfers’ AND Econ_func = ‘Function’”) to generate a coherent aggregate.

What do you think? Should the “aggregate” column be renamed? How should I handle specific, uncommon conditions without creating a brand new Query Language?

Topic		Replies	Views
OpenSpending Next visualization experiment OpenSpending	6	2661	September 23, 2015
Data modelling/structure decisions OpenSpending	2	1804	May 29, 2017
OpenSpending For Dummies OpenSpending	9	1746	February 13, 2016
OpenSpending Data Package Structure OpenSpending	9	2264	May 26, 2015
Fiscal Data Package versus XBRL? OpenSpending	14	2413	May 17, 2021

Next MVP: CSV to Data Package to Aggregate to Visualization

Related topics