2015 Near Term Technical Roadmap for OpenSpending

rufuspollock · April 18, 2015, 8:48pm

This thread is about establishing a clear near-term (6-12m) roadmap for OpenSpending technical work. In particular, what (sub)-components (see diagram) should we focus on in what order?

Context

OSEP 1 - from which that diagram comes - sets out a high-level architecture but not a specific roadmap, i.e. it does not say which components get worked on in what order.

We therefore need to work on a roadmap, especially a near-term roadmap (next 6-12m), as a matter of priority.

This topic is a place for discussion of that near-term roadmap. We intend to boot an OSEP document to help frame the discussion here and to record the agreed roadmap arising from this discussion.

Lastly, we observe that OSEP 1 also does not provide sufficient detail on specific components of the roadmap to allow that prioritization to happen effectively. We therefore anticipate that some of the work here will be adding enough detail on a given component to allow for proper prioritization of it and/or its subcomponents. As an example, the in-progress OSEP 5 starts to give details of the Import/ETL workflow.

pwalsh · April 28, 2015, 1:45pm

Based on the discussions we’ve had and the general scope of the work, I think we should proceed by first defining the scope of a spike solution, and iterate out from that.

A good spike solution should enable us to get our data structure solid (OSEP-04) around a few ideal use cases, and demonstrate some very basic (but essential) value to the end user.

It might also be possible to have a small team working on a spike solution, while others concentrate on more general architectural and developer-user experience issues.

A proposed spike solution

Here is a really basic (yet essential) flow that a spike solution could aim to support:

1. User has a single CSV file of spend data
2. User interacts with Web UI to model this CSV (create an Open Spending Data Package)
3. User uploads the (valid) Open Spending Data Package
4. User can navigate to the data package directory (so, each data package would have an index.html added to it, which provides a formatted view of metadata/data, and a link to the raw data sources as a minimal API)
5. When User’s Data Package is uploaded, an aggregation task runs on the package (need to define the most basic aggregation task on spend data)
6. Once aggregation is completed on a Data Package, a link is provided to the aggregate sources (also as CSV), and a simple visualisation over that (these links could be provided via the index.html of the data package)
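
To make "the most basic aggregation task" a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the column names (`department`, `amount`), the sample rows and the function names are assumptions made up for this example, not part of any OSEP. It sums the amount per group and writes the result back out as CSV, which is roughly what the aggregate source linked from the data package could look like.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def aggregate_spend(rows, group_by, amount_field="amount"):
    """Sum `amount_field` for each distinct value of `group_by`."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        totals[row[group_by]] += Decimal(row[amount_field])
    return dict(totals)


def write_aggregate_csv(totals, group_by, amount_field="amount"):
    """Serialise the totals as CSV text, sorted by group for stable output."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([group_by, amount_field])
    for key in sorted(totals):
        writer.writerow([key, totals[key]])
    return out.getvalue()


# Hypothetical sample data; a real task would read the package's CSV resource.
SAMPLE = """department,amount
health,100.50
education,200.00
health,49.50
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = aggregate_spend(rows, group_by="department")
aggregate_csv = write_aggregate_csv(totals, group_by="department")
print(aggregate_csv)
```

Decimal is used rather than float so that currency amounts are not subject to binary rounding. Which dimensions to group by is exactly the part that still needs defining.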

This solution could be completed without any user auth/z service, but it wouldn’t be publicly usable (even for real user testing). So, we’d need to consider an auth/z microservice, which could be developed in parallel, or we could simply use OAuth via Google or similar for now, just for the spike solution.

Also, some type of task queue would be needed (at least, a way to trigger the aggregation service when a new data package is uploaded). Even if this was mocked for a spike solution, this is another area that could be developed in parallel (and indeed, it is critical for the micro service approach generally).
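
As a rough idea of what a mocked trigger could look like for the spike, here is a small in-process publish/subscribe sketch in Python. Everything in it is hypothetical: the `TaskQueue` class, the `package.uploaded` event name and the payload shape are invented for illustration, and a real setup would sit on top of an actual queue or pubsub service rather than the standard library `queue` module.

```python
import queue

# Hypothetical event name; nothing here is an agreed OpenSpending API.
PACKAGE_UPLOADED = "package.uploaded"


class TaskQueue:
    """In-process stand-in for a pubsub / task queue service."""

    def __init__(self):
        self._events = queue.Queue()
        self._handlers = {}

    def subscribe(self, event, handler):
        """Register `handler` to be called for every `event` published."""
        self._handlers.setdefault(event, []).append(handler)

    def publish(self, event, payload):
        """Queue an event; handlers run later, when the queue is drained."""
        self._events.put((event, payload))

    def drain(self):
        """Process all queued events in order and return how many were handled."""
        handled = 0
        while True:
            try:
                event, payload = self._events.get_nowait()
            except queue.Empty:
                return handled
            for handler in self._handlers.get(event, []):
                handler(payload)
            handled += 1


aggregated = []


def run_aggregation(payload):
    # Stand-in for the aggregation micro service: just records the package name.
    aggregated.append(payload["package"])


tasks = TaskQueue()
tasks.subscribe(PACKAGE_UPLOADED, run_aggregation)
# The upload step would publish this once a data package has landed in storage.
tasks.publish(PACKAGE_UPLOADED, {"package": "example-budget-2015"})
handled = tasks.drain()
```

The point of the mock is that the upload step only knows about the event, not about the aggregation service, so the real queue can be swapped in later without touching either side.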

Components

So, the components of this solution would be:

UI to model and load data (note that a CLI POC to load data has been developed, and would form the basis of a UI)
S3 (or similar) backend to store data packages
Micro service to aggregate data packages when they hit S3
Port part of openspendingjs (treemap?) to work with the new data package aggregates
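
For the modelling component, a minimal sketch of what "model this CSV" might produce is below. It guesses a type per column and emits a descriptor in the general shape of a Tabular Data Package (`name`, `resources`, `schema.fields`). This is an assumption-laden illustration: the function names, file path and sample data are made up, the type guessing is deliberately naive, and the OpenSpending-specific parts of the descriptor (OSEP-04) are not shown because they are still being worked out.

```python
import csv
import io
import json


def guess_type(values):
    """Guess 'integer', 'number' or 'string' from a column's cell values."""
    if not values:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def build_descriptor(name, csv_path, csv_text):
    """Build a minimal data package descriptor for a single CSV resource."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = [
        {"name": column, "type": guess_type([row[column] for row in rows])}
        for column in reader.fieldnames
    ]
    return {
        "name": name,
        "resources": [{"path": csv_path, "schema": {"fields": fields}}],
    }


# Hypothetical sample file contents and package name.
SAMPLE = """department,year,amount
health,2015,100.50
education,2015,200.00
"""

descriptor = build_descriptor("example-budget-2015", "data/spend.csv", SAMPLE)
print(json.dumps(descriptor, indent=2))
```

In the UI the guessed types would only be a starting point that the user then corrects.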

And in parallel, to either be directly integrated into the spike solution, or after the solution:

Auth/z service
pubsub / task queue service that would eventually bind all Open Spending micro services together

Goals

Work out fine details of OpenSpending Data Package
Get an idea of how/what type of APIs Open Spending will be able to offer over raw CSV files (and therefore start to spec out use cases for OLAP, arbitrary queries, etc.)
Have a basis on which to plan migration of existing data to the new system

rufuspollock · April 30, 2015, 12:16pm

Comments:

Add a short summary at the top of what this spike solution does for the user e.g. “User can start from a file (CSV) and have it visualized (and browsable) in a few quick and simple steps”
Suggest dropping step 4. I think a way to “browse the catalog” can come later and is probably part of a dedicated flask app. In the meantime we have tools like http://data.okfn.org/tools/view as stop-gaps.
Step 6: index.html from step 4 is gone now (see previous point). Suggest that we probably want to have our minimal flask app (or even just an HTML/JS app) here so we can have theming and the like.
Overall: I think it would be worth mocking the front page for this app right off and thinking of the 1-2-3 parts of this. I think this app (whether flask or otherwise) would be where we’d actually implement the landing page and step 6, and possibly at least the frontend of everything in between.

More generally, here are some thoughts from our discussion on possible MVPs / spike solutions:

MVP 1
Start from a file (CSV) and see it visualized (and browsable)
Import UI + DataStore (basic) + Aggregation (basic) + Browse and search + Visualization (e.g. treemap)
Not needed
- Identity (?)
- Authorization for datastore (?)
MVP 2
Start from a file (CSV) and see it visualized
Import UI + DataStore (basic) + Aggregation (Basic) + Visualization
Not needed
- Identity
- Authorization
- Any API (for viz) beyond CORS + S3
- Search and browse
MVP 3
Take existing data from current system and see it visualized
MVP 2 minus the import UI (we do not need to import, as we script the conversion)

pwalsh · May 21, 2015, 9:27am

Hi all,

I’ve booted a simple Jekyll site for all the roadmap and specification work on the new OpenSpending (which I’ve dubbed OpenSpending Next in the absence of anything better to call it):

http://labs.openspending.org/next/

Important links in the context of this thread:

Roadmap: this takes the points raised in this thread and formalises them into an actionable roadmap
Components and Teams: Work will focus on specific components. This section makes those explicit, provides some ballpark estimates, and provides a way to register interest in joining a team

The site is hosted on GitHub Pages:

https://github.com/openspending/next

We welcome pull requests and issues to make suggestions on the existing roadmap, and to help out with the specifications - particularly user stories and use cases at this stage. Every page has a link for editing, to make this easier.

rufuspollock · July 7, 2015, 4:12pm

Shall we close and archive this topic now in favour of http://community.openspending.org/next/roadmap/ ?
