How to handle big data (OpenSpending in Ukraine)

vanuan · December 3, 2016, 5:51pm

In Ukraine, we have terabytes of spending data available.
It’s near-real-time government spending data at the level of individual bank transactions.
How to handle such a massive array of information?
How much would it cost?
How to make it useful?

vanuan · December 4, 2016, 7:44am

As a first try, I’ve uploaded a 5-day extract of the data to the next.openspending platform.

Here’s my experience with OpenSpending:

There are several fields for which I had to choose “Unknown”:
a) transaction description. Unfortunately, this field is unstructured: classification codes, contract identifiers, budgetary codes, court orders, etc. are all packed into a single raw string. Can OpenSpending help in any way with parsing these raw description fields and extracting the valuable information?
b) recipient bank (name and code). I couldn’t find a way to specify which bank the money goes to.
c) payer bank (name and code). Similarly, there appears to be no field dedicated to the payer’s bank account.
When I visualize the recipient, I see its identifier, but not its display name.
When I edit the dataset, it looks like the date format is not saved; it’s always “%Y-%m-%d”.
I couldn’t find a way to enrich my dataset with external resources. I found that there is an “external hooks” feature. What is it for? There’s also a “Data Mine” feature. Is it for enrichment?

Overall, the uploading experience is great. But I don’t know how well it scales.
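
For the raw description field in point a), one possible pre-processing step before upload is to pull out the embedded identifiers with regular expressions. The sketch below is only an illustration: the patterns, the field names (`contract_id`, `classification_code`, `date`) and the sample string are all hypothetical, since the real description format is not shown in this thread and would need its own patterns. It is not an OpenSpending feature.

```python
import re
from datetime import datetime

# Hypothetical patterns: the real description strings may look quite different,
# so each pattern here is an assumption made for illustration only.
PATTERNS = {
    "contract_id": re.compile(r"contract\s+(?:no\.\s*)?([\w/-]+)", re.IGNORECASE),
    "classification_code": re.compile(r"\bcode\s+(\d{4})\b", re.IGNORECASE),
    "date": re.compile(r"\b(\d{2}\.\d{2}\.\d{4})\b"),
}


def parse_description(text):
    """Return a dict with whichever hypothetical fields are found in the text."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    # Normalise a DD.MM.YYYY date to the "%Y-%m-%d" format mentioned above.
    if "date" in fields:
        parsed = datetime.strptime(fields["date"], "%d.%m.%Y")
        fields["date"] = parsed.strftime("%Y-%m-%d")
    return fields


# Invented sample string, not real data.
sample = "Payment under contract No. 123/45 dated 01.12.2016; code 2240"
print(parse_description(sample))
```

The extracted values could then be written to separate columns of the CSV so that they can be mapped to their own fields during upload. Descriptions that match none of the patterns simply yield an empty dict.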

adam · December 4, 2016, 9:11am

Hey @vanuan!

First things first: I’m opening proper GitHub issues for all the problems you mentioned. Adding more data types is super easy, and we’ll investigate the rest.

Generally speaking we’ve always wanted to work with spending data, but we didn’t have a proper use case.
If you’re up to it, I’d love to have a short call with you to show you around and answer all your questions - especially around our framework for automatically uploading data to openspending (see the datapackage-pipeline framework and where we use it). With that you can do data enrichment, custom processing and much more.

Hoping to hear from you!

pudo · December 4, 2016, 9:48am

I’m wondering if it may be possible to attach Babbage to Amazon Redshift. It wouldn’t be fast (i.e. multi-second responses), but it would certainly scale very well.

Where does the Ukrainian treasury publish this data? Can you share a link?

vanuan · December 4, 2016, 8:38pm

@pudo It’s http://spending.gov.ua/

See also Entry for Government Spending / Ukraine
