Hey! Sorry for the late response. I missed the email at first, and then there were the holidays. @mor just pinged me about this the other day. Happy to explain why our licenses are the way they are… The short answer is that we wouldn’t be able to survive if we did this, as the vast bulk of our money comes not from grants but from selling the underlying data to proprietary data users (business information providers, credit reference agencies, banks, corporate investigators). This income supports not just making the whole database free to view on the web, but also API access for public benefit users (NGOs, journalists, academics) and the bulk data extracts we regularly produce for NGOs, journalists and academics. It also supports the other public benefit work we do, including campaigning for company information to be open, providing expertise and analysis for the rest of civil society, responding to consultations, etc. OpenOwnership, the global beneficial ownership register being created by civil society, wouldn’t exist if it hadn’t been for the many thousands of hours we put into the project before the pilot started.

If we made the whole dataset available as an unrestricted download, we would disappear, none of this would happen, and we think that would be bad for the world. There are other issues too: legal ones (to access some of the data we need to enter into API agreements with company registers, which place limitations on what we can do, and there are also potential data protection issues), and the size of the database (many gigs), which makes it practically difficult. But sustainability is the key one.
The OpenCorporates database is licensed under the Open Database License. A plain language summary of the ODbL is available on the Open Data Commons website.
We source the information in our databases from government and other sources through a variety of means, including directly from government websites and APIs, from publicly available datasets, or through Freedom of Information requests. We spend a lot of time, effort, and even money on getting this data and turning it into a workable and highly usable resource.
We do not claim any rights over the information we receive from our government sources, and we attribute them whenever possible. This information is known as the “Contents” under the ODbL.
If the database is licensed under the ODbL, then surely it is openly licensed?
To actually be open, it would also need to make its data available in bulk, as required by the Open Definition (http://opendefinition.org). Does OpenCorporates do this? In general, providing database dumps, even for a large database like this, is not that tough (at least if done, say, every month).
Finally, the site used to include the “Open Data” button and a link to http://opendefinition.org/, and on multiple occasions Chris has told me that OpenCorporates was and would remain open data, which included bulk access and an open license …
Hi Rufus
[Apologies for brevity. Currently travelling with limited online access.]
The licence on OpenCorporates hasn’t changed, nor have our policies. What Hera wrote above explains why we don’t make the data available as an unrestricted download, which has been the case since the beginning (you and I have had several discussions about this over the past 6 years, and my understanding was that you understood our approach). Our belief in open data hasn’t wavered, and we have spent a lot of money working to persuade governments to make company registers available as open data, and also to give academics, NGOs, journalists and others data dumps and API access.
But the work that we do costs money, and has to be sustainable, particularly in the current environment, and that’s why (as before) we can’t make the whole of OpenCorporates available as a bulk unrestricted download.
Happy to discuss further when I’m back from travels.
Best
Chris
@countculture Hi Chris, my understanding when this was first raised with you several years ago was that there were some temporary technical obstacles in the way of you making bulk access available, not that it was a permanent policy decision. I think I have also made clear that limiting bulk access would make the data non-open.
Since its inception in 2005, the Open Definition has been unambiguous that bulk access is required. I note that bulk access does not require free access to an API (providing an API is a service on top of the data).
I very much understand the need to be sustainable and I personally believe we need much more systematic funding support for open data in the current environment.
At the same time, we need to be clear and unambiguous about what open data means. It would be like saying software was free and open but you can’t use it non-commercially, or that you are allowed access to each individual file but not allowed to use the whole set together. Moving away from the clear lines set down in the Open Definition risks seriously undermining the integrity and credibility of open knowledge and open data in the broader community. For example, people trust that open data means they won’t get locked in and will always be able to get full access to the data; if they suddenly discover that is not so, it may really undermine their trust.