Using data from within Excel/LibreOffice

opendata

#1

Hi all,
I understand that this Frictionless Data is about helping people consume data easily, but I don’t see that you are looking to integrate with spreadsheets, so feel free to suggest another forum for this post.

I’m trying to understand how can one consume public Open Data from inside a spreadsheet application, without navigating to a portal and downloading a CSV using the browser. Because I imagine that there might be some users that would use more Open Data if it’s right there inside the tool they already use.

So my first approach was to check if the spreasheet apps had any way of consuming online information, which they did, but I was not able to use them.

To be more precise, here’s how Excel declares to have support for OData stanard:

It seems like the OData protocol is a standard set of APIs so that any app can consume data from any portal. There’s even a CKAN extension for it: http://extensions.ckan.org/extension/odata/
And there’s a recent fork that improves it.

I searched this forum for mentions of OData but there’s none, so I’m guessing there’s something wrong with it, isn’t there?

I don’t have access to the Excel version that has OData support. I tested in Excel 2003 and it seemed to have a feature where you would use an embedded browser to find tables in HTML and import them to the workbook. But it had so many javascript errors that I was not able to use it with the CKAN-based argentinian data portal (www.datos.gob.ar)

About LibreOffice, they also claim to have support for something called WebQuery, but I can’t follow this wiki’s instruction in my LibreOffice 5 installation, maybe it’s changed.

https://help.libreoffice.org/Calc/Inserting_External_Data_in_Table_WebQuery

So, tu summarize: I think consuming data and browsing data catalogs from inside spreadsheet applications could be a great way to improve data usability, but I can’t find any solution that currently works.
If you know about any, please let me know.


#2

CC @rufuspollock


Hi @martinsz.

Your entire suggestion is actually amazing and an idea to follow in the future, for sure. I am not sure how this can be implemented, where or how, as the current state allows the users to download data from many open sources (in-house, we have Core Datasets and DataHub and the first may allow some interaction with GitHub’s API, even though there are some cool options within DataHub) and use that data directly in your programming environment. Pretty cool for those who need to program some data analysis, research or similar. Well, worst case scenario, you can download the CSV and open in a spreadsheet and then do the same work.

However, if OData only works with proprietary software (namely Excel), then I think it will not be a priority to get that working for now.

And please bear with us in our flaws, as the project is relatively new and we have been working hard on other key ends and tools. Feel free to contribute to the project if you will!

Thanks!


#3

Hi Gustavo, thanks for the answer.
So I’m confirming my impression that there’s no easy way to consume open data from inside spreasheet apps, which seems like a gap the process. Now I will have to confirm if there’s an actual use case for that.

I picture this user story:
The accountant is using a spreadshet application to (for example) evaluate investments in different countries, and she needs to know the inflation rates for each country. She would go to Insert > Open Data… there a window appears with a search box, she writes “Nepal inflation”, this would trigger a search of all datasets those that match. Then she would select the one that she needs, from the Nepali Central Bank and select the last row, the column that contains the number, and click the “insert” button.
This would create a formula in that row that has a function that includes parameters such as the search string, the portal url, the dataset url and the row/colum she selected, so the data can be refreshed if the dataset is updated.

I see two issues with this:

  1. Data index
    Core datasets and datahub seem to be very useful resources and they both have data on inflation. But isn’t http://dataportals.org/ more useful as you get lots of datasets directly from the portals?

If the user searches for specific data that is mentioned inside the dataset and not in the metadata, then we would need to index the contents, which adds complexity.

1.1) Looking at the Core Datasets inflation dataset, it does include data on nepal, but it’s metadata does not name all the countries it has inflation data for, so it would not be sufficient for the kind of search index needed for my use case.

1.2) In DataHub I searched for inflation and I got a disparity of sources
https://datahub.io/dataset?q=inflation&sort=score+desc%2C+metadata_modified+desc

Which means that the user would have to do some heavy parsing before picking a dataset. Searching for “nepal inflation” is empty, but “new zealand inflation” is 276 good results.

1.3) And for the DataPortals.org, it doesn’t index all the portals, but maybe that project has the capacity to provide a more extensive set of data sources to the search index.

Don’t you think?

  1. Once we have an advanced search API that includes all datasets, the interface becomes the issue. We could have a web-based interface like all other projects, but I’m interested in evaluating a spreadsheet-plugin based interface, since I think our users are already there and might enjoy in-place assistance for data gathering. Designing such interface in a way that is easy to use for the target users is a nice challenge.

In summary: Developing a plugin like this would be of medium complexity and the actual demand for it needs to be established (it could very well be just a fantasy in my mind) before doing anything. But if this conversation continues, I might try to make a survey or something.


#4

Not sure if it is like that but keep in mind the focus of our tools is something else since there was none existing before. Nevertheless, if you need to perform extensive analysis, you will most certainly use other tools other than spreadsheets - Examples are R and Python.

Do you mean in the datapackage.json file? well, tha’s normal. The metadata is machine readable, even if that means we cannot read them as is.

DataHub hosts many other open source projects to other ends. I guess you would need to source for the data set you would like to use. However, in the case of this indicator, to use open data, you should look at Core Datasets. If you cannot find that data set, then open a new issue suggesting this indicator as a core dataset. Moreover, you can contribute with some sources for us to use.

I am not sure how this project is buld, but I feel this was done with small inputs from the users. You can add all the portals you feel are missing here - we actually appreciate you do!

For now, the focus of OKI in this regard is to create the necessary tools to make data open. A few years ago, there was not a thing called open standards, data packages, tools, GitHub repositories with the load of information we have today. Also keep in mind this is a non-profit organisation and, thus, we usually take baby steps because we are dependent on contributors and volunteers.

I am not trying to discourage you of this idea nor to sound rude. The truth is you are correct, that is a missing feature, but keep in mind the focus of the project. That said, there is no need to run surveys. If you feel there is something that you’d like to propose, head over here and follow the guide on “how to contribute”. If you have some programming skills, I am certain you can push your project upwards and get it published in the right place.


#5

Hey, short answer now. From that link I found this conversation that is actually very similar to what I’m proposing, isn’t it?

Thanks Gustavo.


#6

Indeed it looks. You can maybe contribute there by proposing your own features :slight_smile: