2015 Near Term Technical Roadmap for OpenSpending

pwalsh · April 28, 2015, 1:45pm

Based on the discussions we’ve had and the general scope of the work, I think we should proceed by first defining the scope of a spike solution, and then iterate out from that.

A good spike solution should enable us to get our data structure (OSEP-04) solid around a few ideal use cases, and demonstrate some very basic (but essential) value to the end user.

It might also be possible to have a small team working on a spike solution, while others concentrate on more general architectural and developer-user experience issues.

A proposed spike solution

Here is a really basic (yet essential) flow that a spike solution could aim to support:

User has a single CSV file of spend data
User interacts with Web UI to model this CSV (create an OpenSpending Data Package)
User uploads the (valid) OpenSpending Data Package
User can navigate to the data package directory (each data package would have an index.html added to it, which provides a formatted view of the metadata/data, and a link to the raw data sources as a minimal API)
When User’s Data Package is uploaded, an aggregation task runs on the package (we still need to define the most basic aggregation task on spend data)
Once aggregation is completed on a Data Package, a link is provided to the aggregate sources (also as CSV), along with a simple visualisation over them (these links could be provided via the index.html of the data package)
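To make the aggregation step a bit more concrete, here is a minimal sketch of what a "most basic" aggregation could look like. It assumes the aggregation is a sum of an amount column grouped by one other column; that choice, the column names (department, year, amount) and the function names are all hypothetical, since the actual task still needs to be defined.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def aggregate_spend(csv_text, group_by, amount_field="amount"):
    """Sum `amount_field` for each distinct value of `group_by`."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0 and avoids float rounding
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_by]] += Decimal(row[amount_field])
    return dict(totals)


def totals_to_csv(totals, group_by, amount_field="amount"):
    """Serialise the totals as CSV so they can be stored next to the raw data."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([group_by, amount_field])
    for key in sorted(totals):
        writer.writerow([key, totals[key]])
    return out.getvalue()


# Made-up sample data, purely for illustration.
SAMPLE = """department,year,amount
Health,2014,100.50
Education,2014,200.00
Health,2015,49.50
"""

if __name__ == "__main__":
    print(totals_to_csv(aggregate_spend(SAMPLE, "department"), "department"))
```

The output is itself a CSV, which fits the idea of linking to the aggregate sources from the package's index.html.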

This solution could be completed without any user auth/z service, but it wouldn’t be publicly usable (even for real user testing). So, we’d need to consider an auth/z microservice, which could be developed in parallel, or simply use OAuth via Google or similar for now, just for the spike solution.

Also, some type of task queue would be needed (at least, a way to trigger the aggregation service when a new data package is uploaded). Even if this were mocked for a spike solution, this is another area that could be developed in parallel (and indeed, it is critical for the microservice approach generally).
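As a rough illustration of how the trigger could be mocked for the spike, here is a tiny in-process publish/subscribe sketch. Everything in it is an assumption: the InMemoryBus class, the "datapackage.uploaded" topic name and the payload shape are invented for the example, and a real task queue would deliver messages asynchronously rather than inline.

```python
from collections import defaultdict


class InMemoryBus:
    """Tiny synchronous publish/subscribe bus, standing in for a real queue."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # A real queue would deliver asynchronously; here handlers run inline.
        for handler in self._handlers[topic]:
            handler(payload)


aggregated = []  # records which packages the mock aggregation service has seen


def on_package_uploaded(payload):
    # Hypothetical handler: a real service would fetch the package and aggregate it.
    aggregated.append(payload["package"])


bus = InMemoryBus()
bus.subscribe("datapackage.uploaded", on_package_uploaded)
bus.publish("datapackage.uploaded", {"package": "example-budget-2015"})
```

The upload step only needs to know the topic name, not which services are listening, which is what would let the queue be swapped for a real one later without touching the uploader.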

Components

So, the components of this solution would be:

UI to model and load data (note that a CLI POC to load data has been developed, and would form the basis of a UI)
S3 (or similar) backend to store data packages
Microservice to aggregate data packages when they hit S3
Port part of openspendingjs (treemap?) to work with the new data package aggregates
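For the modelling component, here is a hedged sketch of drafting a descriptor from a CSV header. It uses the generic Data Package layout (name, resources, schema.fields) only; it is not the OSEP-04 structure, and the draft_descriptor function, the typing rule (amount column is a number, everything else a string) and the sample names are assumptions for illustration.

```python
import csv
import io
import json


def draft_descriptor(name, csv_path, csv_text, amount_field):
    """Build a draft descriptor dict from the header row of a CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if amount_field not in header:
        raise ValueError(f"column {amount_field!r} not found in {header}")
    fields = [
        # Assumption: only the amount column is numeric; a real UI would let
        # the user confirm or change each field's type.
        {"name": col, "type": "number" if col == amount_field else "string"}
        for col in header
    ]
    return {
        "name": name,
        "resources": [{"path": csv_path, "schema": {"fields": fields}}],
    }


if __name__ == "__main__":
    text = "department,year,amount\nHealth,2014,100.50\n"
    descriptor = draft_descriptor("example-budget", "data/spend.csv", text, "amount")
    print(json.dumps(descriptor, indent=2))
```

A UI would present this draft for the user to correct before the package is uploaded.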

And in parallel, either to be integrated directly into the spike solution or to follow it:

Auth/z service
pubsub / task queue service that would eventually bind all OpenSpending microservices together

Goals

Work out fine details of OpenSpending Data Package
Get an idea of what types of API OpenSpending will be able to offer over raw CSV files, and how (and therefore start to spec out use cases for OLAP, arbitrary queries, etc.)
Have a basis on which to plan migration of existing data to the new system
