License compatibility as imperative?

wolftune · July 8, 2021, 11:02pm

Continuing the discussion from License Approval Request: Ritchey Permissive License v12:

(meta: please note that the “reply as a new topic” function exists on this forum)

Robbie, this is a compelling position. It sounds like you are suggesting that the practical interoperability of resources via license compatibility should be considered part of the OD. You’re saying that license incompatibility effectively introduces non-openness, right?

How does this connect with the idea of format and so on? So, one could argue that some data or software code in an obscure format or language is impractical in its openness. The fact that two codebases can’t be mixed because they use different programming languages… but that’s just an extreme for sake of comparison.

I think incompatibility of licenses should be considered costly enough that the justification for a new incompatible license should be that much higher of a bar in order to accept it. I can see that policy making sense.

mlinksva · July 8, 2021, 11:56pm

For what it’s worth, this has all been considered in the past.

OD 2.1 says an open license should be compatible with other open licenses, and I recall that was discussed intensively in development of 2.1. Some wanted to exclude any shoulds, some wanted to call out the importance of compatibility somehow, and that’s where we landed.
The submission checklist asks about compatibility (q6).
The recommended licenses list requires compatibility with relevant open licenses.

Possibly the language at each of the above could be improved, but I think it’s basically working. Incompatible licenses do effectively have a higher bar: incompatibility is a smell that invites additional scrutiny.

robbiemorrison · July 9, 2021, 5:43am

@mlinksva Pleased to hear that license compatibility is a consideration for OKF relative to the prevailing context. @wolftune also to note that legal and technical compatibilities are completely orthogonal in my understanding.

I have been working on data‑capable license compatibilities using directed graphs (see here) and am continuing that work. Each arrow will shortly be numbered and accompanied by the best legal citations I can find. Only as a very last resort will I read and interpret the license texts myself. My current focus is on national licenses, including those used in the United Kingdom (OGL‑UK‑3.0), Germany (dl‑de/by‑2.0), and France (Licence Ouverte). More in a few weeks' time.
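As a rough illustration of the directed-graph idea, the sketch below stores "material under A can be used in a work released under B" as an edge from A to B and checks whether a path exists between two licenses. The three edges are illustrative assumptions only, not legal findings, and the names `COMPATIBLE_INTO` and `can_flow` are hypothetical. Treating compatibility as transitive along a path is itself a simplifying assumption.

```python
from collections import deque

# Illustrative edges only: "A -> B" means material under A is assumed to be
# usable in a work released under B. These are assumptions, not legal findings.
COMPATIBLE_INTO = {
    "CC0-1.0": {"CC-BY-4.0"},
    "CC-BY-4.0": {"CC-BY-SA-4.0"},
    "CC-BY-SA-4.0": set(),
}


def can_flow(graph, source, target):
    """Return True if a directed path leads from source to target."""
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For example, `can_flow(COMPATIBLE_INTO, "CC0-1.0", "CC-BY-SA-4.0")` follows two edges, while the reverse query finds no path because the edges are one-way.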

Now, are the OKF reports generated during license approval public and open? For instance, the German dl‑de/by‑2.0 was recently approved and I would like to include the OKF analysis in my assessment. For the record, my current point of reference for that license is this document:

Bimesdörfe, Kathrin (editor) (February 2019). Datenlizenzen für Open Government Data: Rechtliches Kurzgutachten: Handreichung zu den Nutzungsrechteregelungen gebräuchlicher Open Data Lizenzen und Empfehlungen für ihren Einsatz [Data licenses for Open Government Data: Legal brief: Guidance on the usage rights of common open data licenses and recommendations for their use] (in German). Düsseldorf, Germany: Ministerium für Wirtschaft, Innovation, Digitalisierung und Energie des Landes Nordrhein-Westfalen.

Finally, I recently asked the European Commission to undertake that exact same analysis in a submission on the proposed Data Act. That kind of analysis, with the weight of the Commission behind it, is desperately needed.

mlinksva · July 9, 2021, 7:04pm

Aside: I don’t know, possibly the wrong question. The Open Definition Advisory Council was started long ago as an autonomously governed body largely, as I recall/understood, because OKF at the time thought it was important to define Open and approve licenses with respect to non-software (which of course OSI covers, and side note, in the last year they have evaluated at least one open hardware license; there is also an Open Source Hardware Definition, but no license evaluation body or list of OSHW licenses) and OKF was an interested party (having some relationship to the Open Data Commons licenses which I’ve forgotten, and they may have too). It would be good to surface documentation for all this. The ODAC has been relatively inactive the last few years. I think it could use a bit of a refresh (the ODAC, not necessarily the OD) probably with the involvement of OKF.

Yes, in the form of discussions on this forum, or previously the mailing list. Glad you take the long view – the German 2.0 open data licenses (BY and 0) were discussed and approved July/August/September 2014. I made an issue about documenting this for each license. Pull requests welcome, or drop references here.

robbiemorrison · July 9, 2021, 7:48pm

@mlinksva Many thanks for the update. I feel quite relieved that the material is available. My views are somewhat shaped by trying to obtain the legal analysis underpinning the recently released Linux Foundation CDLA‑Permissive‑2.0 license. All I have to work off is their media release:

Linux Foundation (22 June 2021). Enabling easier collaboration on open data for AI and ML with CDLA-Permissive-2.0 — Press release. Linux Foundation.

Many of the statements in that media release are either really vague or open to challenge. Proper legal analysis does exist, for instance IBM gave private presentations on the new license. But my efforts to obtain that information have so far not led anywhere. There is not much interest in servicing those at the bottom of the food chain, it would seem. But I will keep trying.

For the record, the CDLA‑Permissive‑2.0 is not a license agreement under US law, it is a public license. And I have my doubts whether the license will enable easier collaboration; instead it may well suck data out of the CC‑BY‑4.0 realm and prevent modifications from being shipped back. That is why I am so keen to see the legal analysis the Linux Foundation possesses in order to understand what issues are being addressed that the CC0‑1.0 and CC‑BY‑4.0 waivers/licenses cannot serve.

On reflection, I don’t think CDLA‑Permissive‑2.0 will get much traction in Europe in any case. But the Linux Foundation push it at every opportunity. If anyone from the Linux Foundation reads this posting, could they get in contact please. That would be very much appreciated. Or alternatively and better still, just post a link to the legal analysis.

mlinksva · July 9, 2021, 8:27pm

We should discuss CDLA-Permissive-2.0 when it is submitted, which I understand it will be.

FWIW I’m not concerned about this. It was looked at by CC (thus the quote in the press release), and by me (I will of course recuse myself from voting for approval, as I did with O-UDA-1.0, which is one of CDLA-Permissive-2.0’s two parents).

robbiemorrison · July 9, 2021, 8:47pm

Thanks. I saw the Creative Commons quote in the Linux Foundation press release. It was open ended and certainly fell short of an endorsement.

But the real issue I believe is legal siloing. This is not software where any number of programs can be written in any number of languages within a spectrum of licenses ranging from MIT to AGPL.

Data is different. Often raw data can only be observed or measured once. And if it goes above CC‑BY‑4.0, it is, in my field, effectively lost. My domain is energy system modeling, but other domains will be similar. More generally science, in Europe at least, is settling on CC‑BY‑4.0. So my sucked out comment stands. And open science will lose when projects license or relicense data in CDLA‑Permissive‑2.0.

For completeness, CC‑BY‑4.0 in effect creates its own legal silo too. But it just might be the one that everyone can agree upon. If so, that would be a tremendous step forward.

There are other licenses that define CC‑BY‑4.0 as inbound compatible. The United Kingdom OGL‑UK‑3.0 for instance. Soon we will have quite a number of silos with that design requirement. If the OKF is relaxed about that, so be it. But I am certainly not.

mlinksva · July 10, 2021, 7:32pm

I found Open Definition Advisory Council launched – Open Knowledge Foundation blog but I’m sure some of my impressions above were based on private conversations that I only vaguely recall.

I think no restrictions (e.g., CC0-1.0, but it doesn’t really matter which instrument) are a better goal to aim for – from same era as the inception of the OD and ODAC, see Science Commons » Protocol for Implementing Open Access Data – but I agree CC-BY-4.0 or at least compatible terms are a step forward.

SimonPoole · July 13, 2021, 6:56pm

While I suspect that most readers here will sympathize with the sentiment (fewer licenses would be more), I for one would find requiring CC BY 4.0 compatibility a mistake; see Maintenance and future of the Open Data Commons licences - #31 by SimonPoole, to avoid repeating myself.

It -is- a licence that is perfectly fine for data that you want to stop disappearing into proprietary silos and to remain open, something I can completely sympathize with. But many data sources simply want some liability protection and attribution and are completely OK with proprietary use of their content, and then CC BY 4.0 is the wrong choice. The quagmire right now is that there is no popular licence that actually fulfills this role (the ODC-By would be the obvious choice, but given the unfixed problems with the text it is essentially dead).

Note: the OGL-UK-3.0 clearly is a permissive licence and has nearly no restrictions on reuse, it isn’t a surprise that so licensed data can be included in CC BY 4.0, though it is nice to explicitly state it.

robbiemorrison · July 19, 2021, 12:25pm

My thanks to the two respondents. I am a little slow in replying, having been on holiday. On the CC0‑1.0 versus CC‑BY‑4.0 question, I am relatively ambivalent. But in many cases, CC‑BY‑4.0 material will be mixed in and that license will then necessarily prevail. Interesting to note (as mentioned earlier) that a literature review suggests that those involved in scientific research favor CC0‑1.0 while those dealing with public interest information provision favor CC‑BY‑4.0 due to its improved ability to retain provenance — more here.

An earlier remark by @SimonPoole that the CC‑BY‑4.0 model distinguishes between existing and adapted material may work for prose but it does not work for data in general. The changes involved are normally too fine-grained to track in this manner. Moreover @SimonPoole writes:

The above statement is simply incorrect. The OGL‑UK‑3.0 states that CC‑BY‑4.0 is inbound compatible, hence the reverse cannot be true unless both licenses are legally equivalent — which they are not. Just one seemingly trivial condition will be sufficient to show why: the OGL‑UK‑3.0 has a choice of law provision and the CC‑BY‑4.0 does not. Indeed that one seemingly minor governing law clause is enough to prevent reverse compatibility. Arguments about how lax or otherwise different licenses might be are simply inadequate in this context — indeed the small details are material.

So that means that the OGL‑UK‑3.0 is a terminus license, meaning that once data is so licensed it cannot be returned to CC‑BY‑4.0 status. Moreover no other license (that I am aware of) will take OGL‑UK‑3.0 material, so that ends the path.
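In graph terms, a terminus license in the sense just described is a node with no outgoing compatibility edges. A minimal sketch follows, using placeholder license names (`Licence-A`, `Licence-B`, `Licence-C`) rather than any real compatibility findings; the function name `terminus_licenses` is hypothetical.

```python
def terminus_licenses(graph):
    """Return, sorted, every license that has no outgoing edge.

    graph maps a license name to the set of licenses its material is
    assumed to be able to move into.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    return sorted(name for name in nodes if not graph.get(name))


# Placeholder data for illustration only.
EXAMPLE_GRAPH = {
    "Licence-A": {"Licence-B"},
    "Licence-B": {"Licence-C"},
}
```

Here `terminus_licenses(EXAMPLE_GRAPH)` reports only `Licence-C`, since material can enter it but nothing is recorded as leaving it.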

Continuing to promote licenses like the OGL‑UK‑3.0 or CDLA‑Permissive‑2.0 will necessarily fragment the open data space due to legal incompatibilities. This has relatively little to do with the operational merits of individual licenses and everything to do with how they play together. Moreover I cannot understand how legal siloing could be considered remotely desirable in aggregate. Sure there will be some edge cases where additional constraints are necessary and bespoke terms of use are indicated. But my focus is for general purpose public interest information and the CC‑BY‑4.0 is perfectly adequate. That license has also been endorsed for data by a number of bodies, including the European Commission and the German energy network regulator.

I would like to put together a position statement explaining these issues for the Open Knowledge Foundation to consider as a matter of license approval policy. But I am only willing to draft such a statement if it would be considered with an open mind. Please let me know, OKF.

robbiemorrison · July 19, 2021, 12:32pm

I believe that policy was authored in 2007, about five years before the data‑capable CC‑BY‑4.0 license was developed and released. Its relevance to this discussion is near zero, I am afraid to say.

SimonPoole · July 19, 2021, 4:31pm

The text of the OGL v3 states the following:

These terms are compatible with the Creative Commons Attribution License 4.0 and the Open Data Commons Attribution License, both of which license copyright and database rights. This means that when the Information is adapted and licensed under either of those licences, you automatically satisfy the conditions of the OGL when you comply with the other licence. The OGLv3.0 is Open Definition compliant.

So unless you are reading a different OGL v3 than I am, they are simply saying that OGL v3 licensed material can be distributed on CC BY 4.0 terms (which is exactly what I pointed out). Not the other way around, as you seem to be implying. This would “technically” be impossible as the OGL has no mechanism to maintain the copyleft aspects of all CC BY licenses.

The choice of law point is interesting, but I don’t believe that it in practice creates an incompatibility.

robbiemorrison · July 19, 2021, 5:13pm

Hi @SimonPoole. That is really interesting. And we will be reading the same license text — I am using the one that the SPDX project links to. I interpret that statement you quote above in exactly the opposite way. That CC‑BY‑4.0 material can (at least by assertion if not by clause‑by‑clause analysis) be optionally mixed with and relicensed under OGL‑UK‑3.0. In the same way that MIT code can be optionally imported and relicensed under GPL‑3.0‑or‑later.

Furthermore, I am not sure why you regard CC‑BY‑4.0 as being copyleft. The ODbL‑1.0 is an example of a copyleft data‑capable license, for instance. And the choice of law clause in the OGL‑UK‑3.0 naturally adds restrictions that cannot be arbitrarily removed by any given reuser at will. At least not legitimately.

It is worth noting that a license can potentially claim to have inbound or outbound compatibilities that may not withstand legal scrutiny. In that case, a court would need to decide which attribute to favor: the claim of directional compatibility or the conflicting terms. That is not something that I can do of course.

In my current exercise of determining directed compatibilities, I try to cite legal analysis wherever possible and will only use my own interpretations as an absolutely last resort and then mark those conclusions as strictly provisional.

So I stand by my earlier assertion that the OGL‑UK‑3.0 is a terminal license. With the proviso that I have some legal analysis on file that I have yet to work through.

SimonPoole · July 19, 2021, 6:04pm

Furthermore, I am not sure why you regard CC‑BY‑4.0 as being copyleft.

From the text

Section 2 – Scope.
…
5. Downstream recipients.
…
2. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.

But not only that, it was clearly the intent of the drafters of the licence too.

Wrt your previous response note that section 4 of CC BY 4.0 clarifies how “Adapted Material” works for data.

robbiemorrison · July 19, 2021, 6:09pm

My initial response is, as always, that classifying and contesting the attributes of individual licenses largely misses the point. Indeed I wrote this to the European Commission in a recent submission a month back (¶ 51):

The debate on the best choice of data license or set of licenses rarely looks at the question of legal interoperability — meaning can material under one licence be mixed with material under another license and republished. Rather, the merits of individual classes of license are debated and then the merits of individual licenses. This same discussion takes place within our [energy systems analysis] community too. But this approach tackles the problem from the wrong end.

That said, the terms‑of‑use of each particular license are highly material — in relation to what users may do and how that material might be legitimately combined and distributed with material under other forms of public license.

To return to the question posed by @SimonPoole of whether the CC‑BY‑4.0 license can or should be considered copyleft. On first glance, the CC‑BY‑4.0 and CC‑BY‑SA‑4.0 licenses are typically regarded as “permissive” and “copyleft” respectively. While noting that that terminology was developed to classify software licensing — and that Creative Commons instead use the loosely related terms “attribution” and “share‑alike”. And in the software domain, copyleft licenses were designed to keep the covered code forever within the software commons — while permissive licenses intentionally allowed the covered code and any local improvements to migrate into proprietary software without the need to additionally reveal and return those improvements. If you wish, that latter action being a form of permitted enclosure, at least obliquely so.

So the broader question is essentially this: does the CC‑BY‑4.0 license force retention in the information commons? And conversely, does this particular license prevent downstream use in proprietary products. And the answers are yes and no, respectively. Some may consider those responses perverse perhaps? But one should also note that software can potentially exist in source and compiled forms and there is clearly no equivalent to binary‑only distribution for content — “binary” used here in the sense of executable files.

Stepping back, I agree with @SimonPoole that §2.a.5.B imposes copyleft‑like obligations in a similar fashion to the way that the GNU General Public License (GPL) family prohibits additional restrictions. I am going to ask about this clause elsewhere and will report back if I discover anything useful. Also to note that I read this earlier posting carefully and found it particularly instructive.

Regarding specific downstream use‑cases, deeming the CC‑BY‑4.0 to be inbound compatible with the United Kingdom government OGL‑UK‑3.0, as that license does, would seem to contradict §2.a.5.B. We talked earlier about choice of law provisions in this regard. And indeed I observed that deemed interoperability could well be in conflict with the respective terms‑of‑use. That said, importing to OGL‑UK‑3.0 is also a use‑case of limited interest to me, but it will be of interest to those working with United Kingdom public sector information. Similar claims of inbound‑compatibility are implied by the Linux Foundation in relation to their recent CDLA‑Permissive‑2.0 license. One certainly wonders how those compatibility assertions could have survived analysis by crown law offices and corporate legal departments.

Also needing examination are the notions of “licensed” and “adapted” material, as defined in CC‑BY‑4.0 under §1.f and §1.a respectively. I guess these notions derive from the free software presumption that every contributor retains their own copyright and the so‑called inbound=outbound precept such that the same license implicitly applies. Chestek (2017) examines these doctrines and argues that they should be abandoned in favor of a joint authorship doctrine. Nor has the inbound=outbound concept ever been tested in court, so its legal status remains uncertain in respect of computer code at least.

Regarding data specifically, the CC‑BY‑4.0 under §4 covers only 96/9/EC databases. Much of the data‑related material passed around in my community are just simple‑minded datasets — ranging from ASCII lists to HDF5 high‑performance storage formats — without the necessary seek functionality to attract database protection. Moreover, my community is starting to toy with semantic web technologies and the interaction between 96/9/EC and those technologies will doubtless be a nightmare.

As an aside, the 96/9/EC database directive is currently under review by the European Commission. My pick is that database protection within the European Union will be removed in due course, possibly as soon as next year.

To close, my interest in data licensing is to go no more stringent than CC‑BY‑4.0. So the prohibition on significant downstream restrictions in §2.a.5.B is of no direct interest to me or most of my community.

That said, the free software world has been down the track of license proliferation. It would be very sad to see the OKF do likewise — by effectively promoting new licenses that create Open Definition-compliant legal silos because no one sought to fully analyze the wider context. Is that really the world that open data advocates would like to advance?

Finally, my earlier offer to the OKF to draft a position paper still stands. I remain concerned about no‑retreat or terminus licenses being applied to data and, in particular, the approval of new licenses that fall into this camp. Could the OKF respond either way? The ball is in your court!

References

Chestek, Pamela S (2017). “A theory of joint authorship for free and open source software projects”. Colorado Technology Law Journal. 16: 285–326. Open access.

herrmann · July 20, 2021, 2:12pm

Hi @robbiemorrison . Thanks for the in depth explanation of the issue. I completely agree that the open data community should avoid license proliferation and I support your idea to draft a position paper about it. I could even review it if I may.

About the review of the 96/9/EC database directive by the European Commission you mentioned, could you please show me any references about that? The Brazilian Lei de Direito Autoral has a similar provision in Article 7, XIII, and a repeal of the database directive by the EU would go a long way to push advocacy in Brazil to do something similar, making it simpler for data publishers to do the necessary prior legal clearance and helping to create an ecosystem that facilitates data reuse.

robbiemorrison · July 20, 2021, 2:32pm

Hi @herrmann, the review of the 96/9/EC database directive is part of consultation on a proposed Data Act (this would still be a “Bill” under United Kingdom idiom):

European Parliament (June 2021). “2 a Europe fit for the digital age: Data Act”. Legislative Train.

The main background document is this:

European Commission (28 May 2021). Inception impact assessment: Data Act (including the review of the Directive 96/9/EC on the legal protection of databases) — Ares(2021)3527151. Brussels, Belgium: European Commission. Lead DG: CNECT/G1. Landing page for download given. Download name: 090166e5ddb6bc31.pdf.

The next round of public consultation closes on Friday 3 September 2021, see here. An earlier submission I coordinated is now up on Zenodo:

Morrison, Robbie (25 June 2021). Submission on a proposed Data Act for the European Union from the perspective of energy system analysis — Release 07. doi:10.5281/zenodo.5032198. Berlin, Germany. Creative Commons CC‑BY‑4.0 license.

For some insight into the definitional differences between datasets and legally‑protected databases in relation to German law, please see:

Morrison, Robbie (4 January 2021). Urheberrechtsgesetz (UrhG): software and data related definitions depicted using UML class modeling — Release 10. doi:10.5281/zenodo.5115643.

robbiemorrison · July 20, 2021, 3:59pm

For reference, here is an extract from my inquiry on a legal forum covering mostly open source software:

Following discussions on the Open Knowledge Foundation (OKF) discussion forum (see here), an interesting legal conundrum has arisen for open data licensing.

At least one open‑data‑capable license claims material licensed CC‑BY‑4.0 is inbound‑compatible. Indeed this consideration seems to be a relatively common design criterion. That and other licenses include:

United Kingdom government OGL‑UK‑3.0
Linux Foundation CDLA‑Permissive‑2.0 (albeit based on hints in the LF press release because not yet listed on the SPDX site and nor has the underpinning legal analysis been made public)

I’ll use the OGL‑UK‑3.0 as an example as the details are fully accessible and a single illustration is sufficient in any case. This discussion is about importing datasets under CC‑BY‑4.0 licensing into other licensing regimes — but excludes objects under or potentially under 96/9/EC database protection for the sake of simplicity.

The OGL‑UK‑3.0 claims that CC‑BY‑4.0 material is inbound-compatible (not possible to cite clauses because nothing is numbered in the legal text) (emphasis added):

These terms are compatible with the Creative Commons Attribution License 4.0 and the Open Data Commons Attribution License, both of which license copyright and database rights. This means that when the Information is adapted and licensed under either of those licences, you automatically satisfy the conditions of the OGL when you comply with the other licence.

In addition, the OGL‑UK‑3.0 provides for a choice of law (emphasis added) — which the CC‑BY‑4.0 does not:

This licence is governed by the laws of the jurisdiction in which the Information Provider has its principal place of business, unless otherwise specified by the Information Provider.

Turning to CC‑BY‑4.0, section §2.a.5.B states that additional restrictions are prohibited (emphasis added):

No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.

My argument is that constraining the choice of law is a non‑trivial restriction. So although the OGL‑UK‑3.0 claims inbound‑compatibility from CC‑BY‑4.0, there is at least one term‑of‑use that would preclude this action. And there may also be other provisions in conflict too — I did not examine the two licenses side‑by‑side in detail.

So if my analysis stands, the OGL‑UK‑3.0 license will naturally create its own isolated silo. And if my analysis is wrong, the OGL‑UK‑3.0 creates a terminus license in the sense that material transferred from CC‑BY‑4.0 cannot be returned to CC‑BY‑4.0 for more general usage.

Addendum for clarity

In general:

1. for collections of data under or potentially under copyright and related rights to be inbound‑compatible, the license on the inbound material must be no more onerous than the license on the receiving material in every respect
2. if the license on the inbound material additionally prohibits further restrictions, then the only way that point 1 may be satisfied is if the inbound and receiving licenses are legally identical in every regard

Moreover, a 96/9/EC database is automatically a collection of data so that point 1 is exhaustive.

In some senses, point 2 could be read as a potentially onerous provision under the rationale of point 1 — but because it is entirely non‑specific, it is necessary to separate it out and accord it its own statement of logic.
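The two points above can be expressed as a toy predicate. This is only a sketch of the stated logic, not a legal test. It assumes each license can be reduced to a set of restriction labels plus a flag for "prohibits further restrictions". The class `License`, the function `inbound_compatible`, the labels, and the example licenses `Licence-X` and `Licence-Y` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    restrictions: frozenset  # hypothetical labels, e.g. "attribution"
    prohibits_further_restrictions: bool = False


def inbound_compatible(inbound, receiving):
    """Apply points 1 and 2 to a pair of simplified licenses."""
    # Point 1: the inbound license must be no more onerous than the
    # receiving license, so its restrictions must be a subset.
    if not inbound.restrictions <= receiving.restrictions:
        return False
    # Point 2: if the inbound license forbids added restrictions, the
    # receiving license may add none, so the two sets must be equal.
    if inbound.prohibits_further_restrictions:
        return inbound.restrictions == receiving.restrictions
    return True


# Hypothetical licenses for illustration only.
X = License("Licence-X", frozenset({"attribution"}), True)
Y = License("Licence-Y", frozenset({"attribution", "choice-of-law"}))
```

Under this encoding `inbound_compatible(X, Y)` is false because `Licence-Y` adds a restriction that `Licence-X` forbids, and `inbound_compatible(Y, X)` is false because `Licence-Y` is the more onerous of the two.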

Based on the above reasoning therefore, the United Kingdom government might well be advised to favor the CC‑BY‑4.0 license over the OGL‑UK‑3.0 license for the public interest information it releases and for the scientific research outputs it funds.

SimonPoole · July 22, 2021, 8:45am

I wouldn’t hold my breath, my impression from a meeting with the team working on the data act a fortnight ago is that that is not on the table (tweaking potentially maybe).

systemed · July 22, 2021, 8:50am

I interpret that statement you quote above in exactly the opposite way. That CC‑BY‑4.0 material can (at least by assertion if not by clause‑by‑clause analysis) be optionally mixed with and relicensed under OGL‑UK‑3.0.

No, @simonpoole has the correct interpretation here.

OGL is essentially a publishing licence. It is designed for the release of Government documents and data. Incorporating other data into OGL-licensed works is not a key consideration of its design. Maximising reuse of OGL-licensed works is.

Fairly obviously, no licence can unilaterally waive the provisions of another licence, and I cannot believe that the authors of OGL would assert that it can.