At Open State Foundation we’re reviewing different standards to see if they can be used to represent (aggregated) financial data of governments (budget and spending, to be exact). We were mostly wondering whether you looked at XBRL specifically when developing this format, and why CSV was chosen as the file format for the actual data. And why was CSV better than XBRL?
We also have some questions on how to model this kind of data properly in FDP, since governments will be reporting the data at several aggregation levels (a top-down pyramid model, so to speak). How do we correctly represent that? I.e. does every aggregation level need its own file? And how can we specify the ordering at the aggregated levels?
Hi @breyten, thanks for your question! I’m sorry for the late reply - I missed your post earlier and have now moved it to the OpenSpending category, which is more suitable for this discussion. I’m pinging @adam here, who’s best placed to address your question as spec lead, and @brook may add some thoughts as tech lead on OpenSpending.
I talked briefly with your colleague Tom Kunzler about this yesterday, and my understanding is that you’re looking to represent nested data and were expecting something a bit more flexible than CSV to do that, as well as to include the kind of formulas and references that XBRL supports. Could you maybe share a link to an example file that shows what you’re looking to model, and elaborate a bit on the use case in which this modeling would take place? Many thanks!
First, a clarification: the Fiscal Data Package is a standard for packaging fiscal data with its metadata in a ‘data package’. The data package uses a JSON file to describe its contents - usually CSV files when working with tabular data, but any other file format works as well. Similarly, XBRL uses XML as its underlying data format.
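To make the split between metadata and data concrete, here is a minimal sketch of a descriptor file (the names, fields, and file paths below are illustrative, not a normative example from the spec):

```python
import json

# A minimal data package descriptor (a sketch; names are illustrative).
# The descriptor is a plain JSON file ("datapackage.json") that sits next
# to the actual data, which stays in ordinary CSV files.
descriptor = {
    "name": "example-budget",
    "resources": [
        {
            "name": "budget",
            "path": "budget.csv",  # the data itself remains plain CSV
            "schema": {
                "fields": [
                    {"name": "year", "type": "integer"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}

# This string is what would be saved as datapackage.json.
text = json.dumps(descriptor, indent=2)
```

Anyone without FDP tooling can still open `budget.csv` directly; the descriptor only adds machine-readable context.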
So, to rephrase your question - why is a CSV file better than an XML file? Because everyone has software that can open and analyze CSV files, while almost no one has the necessary skills or tools to work with an XBRL-flavored XML file.
This example touches the fundamental difference between FDP and XBRL - FDP was designed as an open standard, while XBRL wasn’t. Each part of the FDP standard is meant to be as portable and compatible as possible with as many tools and systems as possible. It is much less strict than XBRL, and will adapt to and embrace various fiscal systems and methods. It has extensive tooling, and as mentioned above, one doesn’t even need that tooling to make use of the data.
XBRL, on the other hand, is very complex. Its quickstart documentation on the official website advises not to attempt to work with it on your own, but to use one of the available commercial software suites instead. The ‘hello world’ package is a zip file containing 40 different, interconnected XML files.
Now, don’t get me wrong - being strict and complex does have its merits; for example, in case you’re a regulator that needs to inspect large financial institutions. However, when we talk about making data accessible and useful to the public, openness, usability and simplicity trump strictness.
As to how to model the data -
In my experience (working with government budget and spending data from all around the world), the best way to publish data is in its most denormalized form. Publishing nested data will always make it more cumbersome to process and analyze; such data is often better represented as a combination of interconnected flat tables. These flat tables still support hierarchies (even multiple ones, e.g. administrative, functional & economic classifications in budget files).
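To illustrate the point, here is a sketch of a denormalized budget table (figures and column names are made up): each hierarchy level becomes its own column, so aggregating to any level is just a group-by over the prefix columns:

```python
import csv
import io

# Sketch: a flat, denormalized table where each level of a 3-deep
# functional classification is its own column (all values invented).
rows = [
    {"level1": "Social Protection", "level2": "Old Age", "level3": "Pensions", "amount": 1200},
    {"level1": "Social Protection", "level2": "Old Age", "level3": "Care", "amount": 300},
    {"level1": "Education", "level2": "Primary", "level3": "Teaching", "amount": 800},
]

# The whole hierarchy fits in a single plain CSV file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["level1", "level2", "level3", "amount"])
writer.writeheader()
writer.writerows(rows)

# Aggregating to the top level is a simple group-by on one column.
totals = {}
for r in rows:
    totals[r["level1"]] = totals.get(r["level1"], 0) + r["amount"]
print(totals)  # {'Social Protection': 1500, 'Education': 800}
```

The same file can carry several independent classifications side by side - just add another group of columns.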
If you’d like, we can schedule a call where we go over your data and a few examples from around the world, and see which model best fits your needs.
This question is for all:
I keep hearing about XBRL and its elements used to tag data.
How is this different from HTML?
I’ve asked this question multiple times now of people in the XBRL space and cannot get a straight answer.
@jalbertbowden fundamentally, HTML and XML are very similar - in fact, both are extensions of a more primitive standard, SGML.
There are major differences, though - HTML is normally used to describe web pages and how to render them visually. XML is primarily used to describe and transport structured data.
HTML has a set of predefined elements, defined in the HTML standard, while XML is flexible and supports any elements that the specific application needs. Specifically, XBRL is a set of schemas and rules applied on XML files to support a standard method for conveying structured fiscal data.
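A tiny sketch of the difference (the element names below are invented for illustration - real XBRL taxonomies are far richer): the XML version carries its meaning in the markup itself, while the HTML version only describes presentation.

```python
import xml.etree.ElementTree as ET

# The same figure as application-specific XML (XBRL-style, invented names)
# versus presentational HTML.
xml_doc = "<report><revenue unit='EUR' period='2019'>5000</revenue></report>"
html_doc = "<table><tr><td>Revenue 2019</td><td>5000</td></tr></table>"

# In the XML version, a program can read the meaning straight from the
# markup: which concept this is, its unit, and its period.
revenue = ET.fromstring(xml_doc).find("revenue")
print(revenue.get("unit"), revenue.text)  # EUR 5000

# In the HTML version, "Revenue 2019" is just display text inside a cell -
# recovering the concept, unit, and period requires guessing.
```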
As for example files, this is a bit difficult (mostly because XBRL in itself is quite difficult to work with), and we don’t have the files (yet?) to show you the exact difficulty we’re running into. The hierarchy of the data should be on average about 4-5 levels deep. Up until now we have received Excel files which are denormalized in such a way that the levels are individual columns. This results in a file of a few thousand lines, which is ok.
However, in this process we’re not able to determine the exact order of the columns (because of limitations in the software that generates these files), so we’re looking for alternative ways to represent this data – with FDP this would be possible by assuming the list in the metadata file is ordered, once we let each level have its own file, so to speak.
So to elaborate a bit more I have two more questions:
Do you have any experience in processing these CSV files at a larger scale, i.e. with different encodings, separators and such?
Is there any practice of combining all the files? I.e. should I make a zip or a tarball for easier submission of these packages? Is there any common practice?
Thanks for the response, but I asked how XBRL differed from HTML.
HTML is not normally used to describe how to render web pages visually - CSS is. HTML is the foundation, CSS is the style.
HTML is primarily used poorly across the web; however, even then it is used to describe and transport structured data. In ideal implementations, this is amplified by using proper markup, as well as microformats, microdata, and Schema.org - all of which are rules/schemas applied to HTML documents that support standard methodologies for conveying structured fiscal data.
I get that you want to mark up some data so that it is tagged specifically; what I do not get is how that is different from proper HTML implementation. Is there missing functionality HTML doesn’t have? I’m leaning on this being the reality, however, I have yet to see or hear an example clearly showing this.
And without said example, I find it very confusing that there’s a push for a standard that does what HTML does, while missing out on all of the other benefits using HTML provides to users.
Anyways, I’m just looking for some details/examples here because I’m still in the dark on XBRL’s necessity.
because XBRL in itself is quite difficult to work with
Agreed
The hierarchy of the data should be on average about 4-5 levels deep.
In FDP we define a concept of “Column Types”, which are semantic concepts attached to columns. These “Column Types” can be defined to have an inherent order - e.g. to represent budget hierarchy.
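As a rough sketch of the idea (the `columnType` strings below are illustrative, not the exact identifiers from the spec): each hierarchy level is its own column, and the column type - not the physical column order - encodes its place in the tree.

```python
# Sketch of field metadata with ordered column types (names illustrative).
fields = [
    {"name": "chapter", "columnType": "functional-classification:level1"},
    {"name": "article", "columnType": "functional-classification:level2"},
    {"name": "item",    "columnType": "functional-classification:level3"},
    {"name": "amount",  "columnType": "value"},
]

# The hierarchy order is explicit in the metadata, so it does not depend
# on the order in which the columns happen to appear in the CSV.
levels = [f["name"] for f in fields if "level" in f["columnType"]]
print(levels)  # ['chapter', 'article', 'item']
```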
This results in a file of a few thousand lines, which is ok.
Yes, this is very reasonable (although we have worked with >1M rows in a single file as well).
so we’re looking for alternative ways to represent this data – with FDP this would be possible by assuming the list in the metadata file is ordered once we let each level have its own file, so to speak.
See above - there’s no need to implicitly assume hierarchy from field order, there’s an explicit way to declare this.
Do you have any experience in processing the CSV files on a larger scale, i.e. with different encodings and separators and such?
Yes, encodings and CSV dialects are handled perfectly by the FDP and its accompanying toolset.
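For illustration, the gist of dialect handling can be sketched with just the Python standard library (the FDP toolset does this for you, with broader coverage):

```python
import csv
import io

# Sketch: a semicolon-separated file, as is common in Dutch/European data.
raw = "year;amount\n2019;1200\n2020;1300\n"

# Detect the delimiter from a sample, then parse with the detected dialect.
dialect = csv.Sniffer().sniff(raw)
rows = list(csv.DictReader(io.StringIO(raw), dialect=dialect))
print(rows[0])  # {'year': '2019', 'amount': '1200'}
```

Encodings work the same way: the file is decoded (e.g. from Windows-1252 or UTF-8) before parsing, and the descriptor can record which encoding a resource uses.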
Is there any practice of combining all the files?
You can have one resource per file in the data package, and then optionally zip the result (for easier transport). Additionally, I recommend using the dataflows library for creating data packages with some data processing - e.g. taking multiple files and concatenating them into one, applying custom validation rules, etc.
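A minimal sketch of the zip option (file names and contents below are made up): the descriptor and all its CSV resources travel together as one archive.

```python
import io
import zipfile

# Sketch: bundle a descriptor and its CSV resources for easier submission.
files = {
    "datapackage.json": '{"name": "example-budget", "resources": []}',
    "data/budget.csv": "year,amount\n2019,1200\n",
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, content in files.items():
        zf.writestr(name, content)

# The receiver unpacks the archive and reads the descriptor as usual.
with zipfile.ZipFile(buf) as zf:
    names = sorted(zf.namelist())
print(names)  # ['data/budget.csv', 'datapackage.json']
```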
Thanks, Adam. Yes I have an example. Unfortunately, this discussion board does not allow me to input links. Anyone who wants to see an example can email me at marc[at]publicsectorcredit[dot]org