2015 Near Term Technical Roadmap for OpenSpending


#1

This thread is about establishing a clear near-term (6-12m) roadmap for OpenSpending technical work. In particular, what (sub)-components (see diagram) should we focus on in what order?

Context

OSEP 1 (the source of that diagram) sets out a high-level architecture, but it does not set out a specific roadmap, i.e. which components get worked on and in what order.

We therefore need to work on a roadmap, especially a near-term roadmap (next 6-12m), as a matter of priority.

This topic is a place for discussion of that near-term roadmap. We intend to boot an OSEP document to help frame the discussion here and to record the agreed roadmap arising from this discussion.

Lastly, we observe that OSEP 1 also does not describe specific components in enough detail for that prioritization to happen effectively. We therefore anticipate that some of the work here will be specifying a given component in enough detail to allow proper prioritization of it and/or its subcomponents. For example, the in-progress OSEP 5 starts to detail the Import/ETL workflow.



#3

Based on the discussions we’ve had about the general scope of the work, I think we should proceed by first defining the scope of a spike solution, and then iterate out from that.

A good spike solution should enable us to get our data structure (OSEP-04) solid around a few ideal use cases, and demonstrate some very basic (but essential) value to the end user.

It might also be possible to have a small team working on a spike solution, while others concentrate on more general architectural and developer-user experience issues.

A proposed spike solution

Here is a really basic (yet essential) flow that a spike solution could aim to support:

  1. User has a single CSV file of spend data
  2. User interacts with Web UI to model this CSV (create an Open Spending Data Package)
  3. User uploads the (valid) Open Spending Data Package
  4. User can navigate to the data package directory (so, each data package would have an index.html added to it, which provides a formatted view of metadata/data, and a link to the raw data sources as a minimal API)
  5. When User’s Data Package is uploaded, an aggregation task runs on the package (need to define the most basic aggregation task on spend data)
  6. Once aggregation is completed on a Data Package, a link is provided to the aggregate sources (also as CSV), and a simple visualisation over that. (these links could be provided via the index.html of the data package)
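
As a rough illustration of steps 1-3, the snippet below turns a single spend CSV into a minimal descriptor in the shape of a Data Package `datapackage.json` (a `name` plus `resources` with a field `schema`). It is only a sketch: the naive number/string type inference, the default `data.csv` path and especially the `mapping` block (which columns are measures vs. dimensions) are hypothetical placeholders for whatever OSEP-04 eventually specifies.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a Data Package field type ("number" or "string") from sample strings."""
    if not values:
        return "string"
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return "string"
    return "number"


def model_csv(name, csv_text, measure, dimensions, path="data.csv"):
    """Build a minimal datapackage.json-style descriptor for one spend CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    rows = list(reader)
    missing = [col for col in [measure, *dimensions] if col not in headers]
    if missing:
        raise ValueError(f"columns not found in CSV: {missing}")
    fields = [{"name": h, "type": infer_type([row[h] for row in rows])} for h in headers]
    return {
        "name": name,
        "resources": [{"path": path, "schema": {"fields": fields}}],
        # Hypothetical OpenSpending mapping; its real shape is for OSEP-04 to settle.
        "mapping": {"measures": [measure], "dimensions": list(dimensions)},
    }


# Example: a tiny spend file with one measure and two dimensions.
sample = "year,department,amount\n2014,Health,100\n2014,Education,250.50\n"
descriptor = model_csv("example-spend", sample, "amount", ["year", "department"])
print(json.dumps(descriptor, indent=2))
```

In the spike, the Web UI would effectively be a friendlier way of choosing the `measure` and `dimensions` arguments before the descriptor and CSV are uploaded together.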

This solution could be completed without any user auth/z service, but it wouldn’t be publicly usable (even for real user testing). So we’d need to consider either an auth/z microservice, which could be developed in parallel, or simply using OAuth via Google (or similar) for now, just for the spike solution.

Also, some type of task queue would be needed (at least, a way to trigger the aggregation service when a new data package is uploaded). Even if this were mocked for the spike solution, this is another area that could be developed in parallel (and indeed, it is critical for the microservice approach generally).
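
A mocked task queue for the spike could be as small as the in-process publish/subscribe sketch below. The `TaskQueue` class and the `datapackage.uploaded` topic name are hypothetical; a real deployment would swap this for an actual pub/sub or queue service, but the contract (publish an event on upload, let subscribed services react to it) would stay the same.

```python
import queue
from collections import defaultdict


class TaskQueue:
    """Toy in-process stand-in for a real pub/sub or task-queue service."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._pending = queue.Queue()

    def subscribe(self, topic, handler):
        """Register `handler(payload)` to run for every event on `topic`."""
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        """Queue an event; nothing runs until a worker drains the queue."""
        self._pending.put((topic, payload))

    def run_pending(self):
        """Deliver queued events to subscribers; return the number of handler calls."""
        calls = 0
        while True:
            try:
                topic, payload = self._pending.get_nowait()
            except queue.Empty:
                return calls
            for handler in self._handlers.get(topic, []):
                handler(payload)
                calls += 1


# Hypothetical wiring: an upload event triggers the aggregation step.
aggregated = []
tasks = TaskQueue()
tasks.subscribe("datapackage.uploaded", lambda pkg: aggregated.append(pkg["name"]))
tasks.publish("datapackage.uploaded", {"name": "example-spend"})
delivered = tasks.run_pending()
print(delivered, aggregated)  # 1 ['example-spend']
```

Keeping publishing and delivery separate, even in a mock, means the uploader never calls the aggregation service directly, which is the decoupling the microservice approach relies on.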

Components

So, the components of this solution would be:

  • UI to model and load data (note that a CLI POC to load data has been developed, and would form the basis of a UI)
  • S3 (or similar) backend to store data packages
  • Microservice to aggregate data packages when they hit S3
  • Port part of openspendingjs (treemap?) to work with the new data package aggregates
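
The "most basic aggregation task on spend data" is still to be defined; one plausible candidate is summing the amount column per value of a single dimension and publishing the result as another CSV. The sketch below assumes exactly that; the `aggregate_csv` name is hypothetical, and reading from and writing to S3 is deliberately left out so only the aggregation logic is shown.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def aggregate_csv(csv_text, measure, dimension):
    """Sum `measure` for each distinct value of `dimension`; return aggregate CSV text."""
    totals = defaultdict(Decimal)  # Decimal avoids float rounding on money amounts
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[dimension]] += Decimal(row[measure])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([dimension, measure])
    for key in sorted(totals):
        writer.writerow([key, totals[key]])
    return out.getvalue()


sample = "year,department,amount\n2014,Health,100\n2014,Education,250.50\n2015,Health,40\n"
print(aggregate_csv(sample, "amount", "department"))
# department,amount
# Education,250.50
# Health,140
```

An aggregate CSV like this is also the kind of input a ported treemap would need: one label column and one size column.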

And in parallel, to either be directly integrated into the spike solution, or after the solution:

  • Auth/z service
  • pubsub / task queue service that would eventually bind all Open Spending micro services together

Goals

  • Work out fine details of OpenSpending Data Package
  • Get an idea of how/what type of APIs Open Spending will be able to offer over raw CSV files (and therefore start to spec out use cases for OLAP, arbitrary queries, etc.)
  • Have a basis on which to plan migration of existing data to the new system
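
To get a feel for what an API over raw CSV files could offer, the sketch below borrows two common OLAP notions: a "cut" (filter rows to fixed column values) and a "drilldown" (group by one or more columns), then sums a measure. The `query` function and its parameters are hypothetical; this is not a proposed OpenSpending API, just a way to see how far plain CSV rows go before a real OLAP backend is needed.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def query(rows, measure, drilldown, cut=None):
    """Filter rows by `cut` (column -> required value), then sum `measure`
    grouped by the `drilldown` columns, like a tiny OLAP aggregate call."""
    cut = cut or {}
    totals = defaultdict(Decimal)
    for row in rows:
        if all(row[col] == value for col, value in cut.items()):
            key = tuple(row[col] for col in drilldown)
            totals[key] += Decimal(row[measure])
    return [
        {**dict(zip(drilldown, key)), measure: total}
        for key, total in sorted(totals.items())
    ]


sample = "year,department,amount\n2014,Health,100\n2014,Education,250.50\n2015,Health,40\n"
rows = list(csv.DictReader(io.StringIO(sample)))
result = query(rows, "amount", ["department"], cut={"year": "2014"})
print(result)
```

Every query here is a full scan of the file, which is fine for small spend datasets but is exactly the point at which use cases for a proper OLAP store or arbitrary queries would start to be specified.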

#4

Comments:

  • Add a short summary at the top of what this spike solution does for the user e.g. “User can start from a file (CSV) and have it visualized (and browsable) in a few quick and simple steps”
  • Suggest dropping step 4. I think a way to “browse the catalog” can come later and is probably part of a dedicated flask app. In the meantime we have tools like http://data.okfn.org/tools/view as stop-gaps.
  • Step 6: the index.html from step 4 is gone now (see previous point). Suggest that we probably want our minimal flask app (or even just an HTML/JS app) here so we can have theming and the like.
  • Overall: I think it would be worth mocking the front page for this app right away and thinking through the 1-2-3 parts of this. I think this app (whether flask or otherwise) would be where we’d actually implement the landing page and step 6, and possibly at least the frontend of everything in between.

More generally, here are some thoughts from our discussion on possible MVPs / spike solutions:

  • MVP 1
    • Start from a file (CSV) and see it visualized (and browsable)
    • Import UI + DataStore (basic) + Aggregation (basic) + Browse and search + Visualization (e.g. treemap)
    • Not needed
      • Identity (?)
      • Authorization for datastore (?)
  • MVP 2
    • Start from a file (CSV) and see it visualized
    • Import UI + DataStore (basic) + Aggregation (basic) + Visualization
    • Not needed
      • Identity
      • Authorization
      • Any API (for viz) beyond CORS + S3
      • Search and browse
  • MVP 3
    • Take existing data from current system and see it visualized
    • MVP 2 minus import UI (as we do not need to import; we script the conversion)
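
For MVP 2, "no API beyond CORS + S3" means the browser-side visualization fetches data package and aggregate files straight from the bucket. A minimal sketch of the read-only CORS rules that would allow this is below. The origin and bucket name are placeholders, and `apply_cors` assumes boto3 is installed and AWS credentials are configured; it uses boto3's `put_bucket_cors` call.

```python
def cors_configuration(origins, max_age_seconds=3000):
    """Read-only CORS rules letting browser-side visualizations GET files from S3."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(origins),
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": max_age_seconds,
            }
        ]
    }


def apply_cors(bucket_name, origins):
    """Attach the rules to a bucket. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_cors(Bucket=bucket_name, CORSConfiguration=cors_configuration(origins))


# Placeholder origin; apply_cors("example-openspending-bucket", ...) would push it.
config = cors_configuration(["http://labs.openspending.org"])
print(config)
```

Restricting the methods to GET and HEAD keeps the bucket read-only from the browser; uploads in MVP 2 would still go through whatever credentials the import tooling uses.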

#5

Hi all,

I’ve booted a simple Jekyll site for all the roadmap and specification work on the new OpenSpending (which I’ve dubbed OpenSpending Next in absence of anything better to call it):

http://labs.openspending.org/next/

Important links in the context of this thread:

  • Roadmap: this takes the points raised in this thread and formalises them into an actionable roadmap
  • Components and Teams: Work will focus on specific components. This section makes those explicit, provides some ballpark estimates, and provides a way to register interest in joining a team

The site is hosted on GitHub Pages:

We welcome pull requests and issues to make suggestions on the existing roadmap, and to help out with the specifications, particularly user stories and use cases at this stage. Every page has a link for editing, to make this easier.


#6

Shall we close and archive this topic now, in favour of http://community.openspending.org/next/roadmap/?

