Non-open "open data" on beta.usaspending.gov


#1

If you click through the tiny link in the footer (“NOTE: You must click here for very important D&B information.”), you find the following statement on this page, https://beta.usaspending.gov/#/db_info:

D&B hereby grants you, the user, a license for a limited, non-exclusive use of D&B data within the limitations set forth herein. By using this website you agree that you shall not use D&B Open Data without giving written attribution to the source of such data (i.e., D&B) and shall not access, use or disseminate D&B Open Data in bulk, (i.e., in amounts sufficient for use as an original source or as a substitute for the product and/or service being licensed hereunder). [emphasis added]

Note that the definition of open data explicitly includes bulk access. I am concerned about this because I think it is one of the major areas in which a clear definition of open data is being undermined (cf. this thread: “OKI's session @OGP civil society morning? What is the future of Open Gov?”).

Data which is not available in bulk is not open data …


#2

I agree completely. A restriction on redistributing the data in bulk is non-open (and thus bad). What do we do about this? Find the right contacts to ask to remove this restriction? If that doesn’t work, write a blog post or otherwise reach out to relevant communities / journalists etc. to push back?


#3

Wow. That is a bit sneaky and definitely doesn’t qualify as open data.

It seems like Dun & Bradstreet’s stipulations are overriding GSA’s/18F’s typical open data policies/operating procedures with this project. I’ll reach out to the developers and see about starting a conversation about this.


#4

Great finding, and of course you are right: a limitation on redistribution inevitably compromises the openness of the licence.

However, we can’t be that surprised by a bad open licence either.

In the UK, we use government datasets every day that are licensed under the Open Government Licence (OGL), which is “open until proved closed”. It says: “This licence does not cover: (…) third party rights the Information Provider is not authorised to license (…)”.

The biggest example of this comes from my time working with Open Addresses in 2015. You probably know the story already: an FOI request revealed that Land Registry’s famous “Price Paid Data” could not have been released openly in the first place, because its addresses are “normalised” by checking them against Ordnance Survey’s AddressBase and Royal Mail’s PAF, both commercial data products (read more here).

Only in June 2017 did Land Registry make the matter explicit, limiting use to a) personal and/or non-commercial purposes, or b) display for the purpose of providing residential property price information services. Still OGL, though!

I would suggest that anybody treat the licence as a starting point rather than the last word. If an open dataset is key to something you are doing, don't be content with finding a licence that looks compatible with your needs; push your investigation further by probing the publisher, engaging with them, etc.

What do you reckon?


#6

Be careful. That is a perfectly reasonable disclaimer: it is the legal folks covering themselves in case they accidentally release some info they can’t (e.g. personal info). We reviewed the OGL and it is compliant. I get that there are unfortunate examples like the one you described, but once the problem was discovered, this data became non-open (they did not continue saying it was open data).

The D&B example above is very different. To be clear, I think the US government's work here is great; I just think this is a clear case of a subtle but very important undermining of what open data means.

BTW: can you open a separate thread about the address registry data? I think it is a big and interesting issue worth digging into further, but discussing it here would take this thread off-topic.