PSI Directive Review: Your opinion on the proposed changes

Hi all,

As some of you may know, the European Commission has issued its proposal for a revised PSI Directive. See the proposal here (archived copy held by the Publications Office of the European Union on the Archive-It Wayback Machine). I’ll read the proposal in more detail in the coming days, but was wondering whether other people have opinions already.

At first glance it seems to me some changes are quite positive, e.g.

  • Possibly less room for public sector bodies to charge fees above marginal costs
  • Clearly addressing public utilities and public transport sector
  • Including publicly funded research

On the other hand, the EC only wants to address public utilities providers, and it seems the new Directive would not apply to procured services? A policy similar to France’s loi pour une République numérique seems like a better approach in my opinion.

What do others think?

Hi @dannylammerhirt and thanks for pointing to this,

I agree with you: the proposal is going in the right direction, with some very positive moves in my opinion, but it will not bring all member states to the level of the best players such as France.

I think that the strategy of the EU is to provide something that everyone agrees on rather than something only a few will really implement. That could explain why they are very shy about including private entities.

That being said:

Good points to me are:

  • Free of charge becoming the norm
  • Inclusion of public utilities - though this may vary a lot between member states
  • Inclusion of research data
  • Fighting exclusive arrangements between public and private stakeholders
  • Inclusion of dynamic data - though this is a very broad definition
  • Establishing a list of high value datasets at EU level - probably the most important element to me

Bad points are:

  • Nothing new on the cultural sector, still left behind with many exceptions - you can blame France for that, they are the ones who lobby strongly
  • Nothing new on licenses, data interoperability and quality
  • Private entities remain excluded, even when they provide public services or when their data are of high public value

Best
Pierre

Hello all. My primary interest is open energy sector data for building open energy system models. I recently coordinated a community submission on the current (2013) re-use of public sector information (PSI) directive and also provided evidence in person in Brussels. Open energy system modelers are highly reliant on what the European Commission terms privately held data of public interest. Many modelers are also active in scientific research. Specific thoughts follow, based on COM/2018/234 unless otherwise indicated.

Sui generis database rights

We suggested European sui generis database rights be removed for PSI providers because it makes no sense to restrict downloading under this currently ill-resolved economic right. That provision seems to be clarified under revised article 1 (6) (see page 10).

Privately held data

As indicated, energy system modelers are highly reliant on privately held data of public interest. It seems little progress was made on this theme, despite having been flagged as an issue for consultation by the Commission. Our specific request to require that datasets published under the electricity market transparency regulation 543/2013 be open licensed was not forthcoming.

My lay reading is that energy sector regulators, such as the German BNetzA, would classify as “public undertakings” (a term which covers but is not limited to public sector bodies or bodies governed by public law) and not as private entities (definition on page 9 and referencing 2014/25/EU), but that utilities within the water, energy, transport, and postal services sectors remain outside the proposed revisions. In which case, the best option for energy modelers is to continue to argue for open licensed market transparency regulations as an issue of market integrity and not public interest PSI re-use.

Research data

The desire to provide open access to research data, while laudable in concept, is weak in practice. Well accepted definitions of “open access”, by Peter Suber for example, can mean merely public provision at no cost. This was exactly how Cambridge University described the recent release of Stephen Hawking’s PhD thesis in digital format under full copyright.

It is imperative instead to push for open data licenses such as Creative Commons CC-BY-4.0. That then allows science to be conducted the way it should be and avoids the need for researchers to otherwise operate in a legal gray zone. Such licensing is the only way that the FAIR reusability principle (Wilkinson 2016), mentioned in the proposed revisions, can be operationalized (page 7).

Open licensing

To continue, the proposed revisions are wholly inadequate in terms of open licensing. Draft recital (42) is aspirational in this regard, but far from sufficient (page 12). Arguments in our submission regarding the need for universal open licensing (or public domain dedications) on PSI did not prevail. In addition, our request for analysis on whether the machine processing of legally obtained copyright protected datasets is lawful was not traversed (this is very likely a breach of copyright).

In the absence of suitable open licensing, much of the PSI becoming available, in part through Commission policy, is simply not usable in downstream ecosystems because third parties who modify, remix, and then redistribute this information will potentially be in breach of copyright. Odd that the Commission chose to almost totally ignore this key aspect.

The consultation synopsis report (Unknown 2018) also fails to mention open licensing, despite at least one submitter making this a key theme in their written feedback. The open licensing of content and data is clearly a blind spot for the Commission.

But of course, once PSI is open licensed, its short run marginal cost of provision by intermediaries within the aforementioned ecosystem sinks to near zero. In short, the Commission has failed to grasp what open data means in terms of primary provider cost recovery. Collecting the “marginal cost for reproduction, provision and dissemination of information” by a PSI provider can only be maintained where there is either no licensing or only proprietary licensing on offer. This same principle is well understood in the free software world.

Copyrightability

That said, much of the PSI under discussion, at least in the energy domain, is simply not eligible for copyright. Moreover, any proprietary license applied to material that does not attract copyright is quite possibly unenforceable. Again we asked the Commission to examine the question of copyrightability in relation to PSI but they did not.

(The misconception that one can claim copyright on measured datasets is, for instance, widespread. I recently attended a presentation by the Linux Foundation that suggested that ocean temperature measurements could be subject to copyright and therefore benefit from one of their new CDLA open data licenses.)

Closure

I think it a strategic mistake to treat the aspirational gains in these proposed revisions — such as the recommendations that APIs be offered or that cost recovery be avoided — as a substitute for clear policy on genuine open data.

And by that I mean the use of established open licenses. Indeed mention of an “open license” occurs only three times and is otherwise never discussed or defined. Contrast this with the excellent European Commission (24 July 2014) guidelines which cover open licensing in detail.

The question of copyrightability remains the elephant in the room. This is an absolutely central issue and its omission in the proposed revisions is odd to say the least.

What is nonetheless understandable although unfortunate is that the concept of privately held data of public interest fell by the wayside. In particular, a body that “operates in normal market conditions, aims to make a profit, and bears the losses resulting from the exercise of its activity” (directive 2014/24/EU recital 10) is excluded from the proposed revisions, even where strong public interests apply. In which case, other less attractive avenues, including crowdsourcing, will have to suffice. We will indeed get the information we require one way or another and can then circulate it at no cost to users because the long run marginal cost of provision can more easily be funded in other ways.

In short, these proposed revisions represent a lost opportunity to make public sector information genuinely open and reusable. HTH, Robbie.

Primary references

European Commission (2018). Proposal for a revision of the Public Sector Information (PSI) Directive. Digital Single Market. Brussels, Belgium. Web page.

European Commission (25 April 2018). Proposal for a Directive of the European Parliament and of the Council on the re-use of public sector information (recast) — COM (2018) 234 final. Brussels, Belgium: Council of the European Union.

European Commission (24 July 2014). “Commission notice: guidelines on recommended standard licences, datasets and charging for the reuse of documents”. Official Journal of the European Union. C 240: 1–10.

Wilkinson, Mark D et al (15 March 2016). “The FAIR Guiding Principles for scientific data management and stewardship — Comment”. Scientific Data. 3: 160018. doi:10.1038/sdata.2016.18.

Secondary references

Unknown (25 April 2018). Consultation on PSI directive review (also known as the synopsis report on the PSI directive). Brussels, Belgium: European Commission.

European Commission (25 April 2018). Commission staff working document — Evaluation — Accompanying the document: Proposal for a Directive of the European Parliament and of the Council on the re-use of public sector information — SWD (2018) 145 final. Brussels, Belgium: European Commission.

European Commission (25 April 2018). Commission staff working document — Impact assessment — Accompanying the document: Proposal for a Directive of the European Parliament and of the Council on the re-use of public sector information — SWD (2018) 127 final. Brussels, Belgium: European Commission.

European Commission Regulatory Scrutiny Board (14 February 2018). Impact Assessment / Re-use of Public Sector Information — Opinion — Ares(2018). Brussels, Belgium: European Commission.

European Commission (28 March 2014). “Directive 2014/24/EU of the European Parliament and of the Council of 26 February 2014 on public procurement and repealing directive 2004/18/EC (text with EEA relevance)”. Official Journal of the European Union. L 94: 65–242.

European Commission (27 June 2013). “Directive 2013/37/EU of the European Parliament and of the Council of 26 June 2013 amending Directive 2003/98/EC on the re-use of public sector information (text with EEA relevance)”. Official Journal of the European Union. L 175: 1–8.

European Commission (15 June 2013). “Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets and amending Annex I to Regulation (EC) No 714/2009 of the European Parliament and of the Council”. Official Journal of the European Union. L 163: 1–12.

Thank you @pzwsk and @robbiemorrison.

I agree with your points and was reading a bit more closely through the proposal. I spotted some other issues which seem odd.

High-value datasets:

  • The EC may “define other applicable modalities”, such as “any conditions for re-use”. There is a risk that a list of EU-wide high-value datasets also includes use restrictions that are non-compliant with the Open Definition. This is odd because it possibly opens the door to preventing high-value datasets from being combined with other data (combinability being one of the criteria the EC uses to define high value in the first place).

  • High-value datasets are established via “delegated acts”, which are prepared by expert groups. I think there should be an explanation of who these expert groups should include (e.g. civil society), and how civil society can engage with proposals for such acts. Even though it is usually part of the EU lawmaking process that the public may comment on proposals, I think it is good to clarify how the public may influence what counts as a high-value dataset.

  • Licensing: Entirely agreed with both of you. I think it is necessary that the PSI Directive requires governments to codify CC0, CC BY 4.0 or CC BY-SA 4.0 in their open data policies wherever open licensing schemes have not established widely used bespoke licences. Where such bespoke licences do exist (e.g. in France, Germany, Italy), the PSI Directive could, instead of requiring re-licensing, require legal compatibility tests of government bespoke licences with standard open licences, as well as with other common bespoke licences.

  • Public utilities: I tried to wrap my head around the definition of “bodies governed by public law”. The PSI proposal is fairly straightforward (the body needs to receive much public funding, or government needs to represent the majority of the board, or regulate the body). There seems to be quite some inconsistency across EU countries as to what activities count as general interest or are considered to have a commercial character (given different liberalisation approaches, etc., but I’m not an expert). I’m also struggling to understand how much data can actually be provided this way.

Best
Danny

Hi,

I’ve just read the consolidated version you guys posted on Twitter ( https://twitter.com/vavoida/status/1070660003260968960 ). Looks like there are interesting good points, especially regarding private entities. On the other hand, and contrary to the initial version as pointed out by @dannylammerhirt in his message, a first list of high-value datasets is established in an annex, and the Commission could then expand this list via “delegated acts” (which is a great point!).

The last consolidated version includes 6 categories of high-value datasets:

  1. Geospatial Data
  2. Earth observation and environment
  3. Meteorological data
  4. Statistics
  5. Companies
  6. Transport data

I don’t understand why “National Law” isn’t present anymore in this list. Indeed, this category was proposed by:

It looks like it’s the only high-value dataset which wasn’t kept among those proposed by the different committees :thinking:

I think it’s both weird and disappointing as national law is among the most fundamental datasets for citizens (with two categories in the Open Data Index: “National Laws” and “Draft Legislation”). All the more since recital 6 says: “The public sector in the Member States collects, produces, reproduces and disseminates a wide range of information in many areas of activity, such as social, political, economic, legal, geographical, environmental, weather, seismicity, tourist, business, patent and education. Documents produced by public sector bodies of executive, legislative or judicial nature constitute a vast, diverse and valuable pool of resources that can benefit society.”.

Do you have any idea why this category of datasets was excluded? Can OKFN voice its opinion regarding this “issue”? (I have no idea what the deadlines are, pretty hard to follow draft EU legislation, maybe because those datasets are missing :stuck_out_tongue: )

Best,

Antoine

Thanks Antoine, all,

as discussed with some Open Knowledge EU groups, we suggest launching an Open Data Census of high value datasets, and @oscarmontiel agreed to set it up.

This may help to:

  • Improve discussion around what a high-value dataset is for the EU and suggest other datasets/refine current definitions;
  • Evaluate state of open data for HV datasets currently suggested and identify data gaps / main barriers to re-use;
  • Inform the current debate at EU Parliament level including NGOs, MPs and open data advocates;

Below is the list of current high value datasets as described here http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A8-2018-0438&language=EN:

  • Postcodes
  • Cadastral map
  • Topographic map
  • Marine map
  • Administrative boundaries
  • Weather monitoring data
  • Land (soil) quality data
  • Water quality data
  • Seismicity
  • Energy consumption
  • Energy performance of buildings and emission levels
  • Weather forecasts
  • Rain
  • Wind
  • Atmospheric pressure
  • Statistics up to local level (GDP, age, unemployment, income, education)
  • Company and business registers including ownership and management, registration identifiers
  • Public transport timetables of all modes of transport
  • Information on public works
  • State of transport network including traffic information

Hi @a455bcd9, hi @pzwsk ,

Thank you so much for picking up this topic again. Firstly, I’m all in favour of the proposal to publish legal texts. To be honest, I struggled quite a bit when I first started to follow the PSI Directive recast process. The EC can definitely do a much better job of publishing information on its legislative processes. Currently the best form of access to this information seems to be via personal contacts.

Pierre, I love the idea of organising an Open Data Survey around high value datasets to start a public debate on how we can account for the ‘value’ of data. I’d be more than happy to participate in this process, help facilitate the discussion around dataset definitions / valuation of data, and spread the word.

If you want to move forward with this idea, maybe we can all set up a call early in the new year?

Danny

It seems like the link posted above wasn’t working. I got a working website here. Is that the same link you sent @pzwsk? In the ANNEX IIa, there are only 6 datasets listed. The instance can still be found on psi.survey.okfn.org but it would be great to add the descriptions as the Parliament wrote them for each data set.

The instance is now live at http://psi.survey.okfn.org/

Hi @oscarmontiel

yes it is the same link, thanks

the list is very broadly defined, you have 6 categories and I am counting around 20 high-value datasets but with only a name.

See related articles pertaining to definition of high-value datasets that will be discussed by Parliament (amendments):

(58) In order to set in place conditions supporting the re-use of documents which is associated with important civic or socio-economic benefits having a particular high value for economy and society, a list of categories of high value datasets is included in Annex IIa. The power to adopt acts in accordance with Article 290 of the Treaty on the Functioning of the European Union should be delegated to the Commission in respect of additions to the list of categories of datasets set out in Annex IIa, and the addition of specific high-value datasets among the documents to which this Directive applies, along with the modalities of their publication and re-use. It is of particular importance that the Commission carry out appropriate consultations during its preparatory work, including at expert level, and that those consultations be conducted in accordance with the principles laid down in the Interinstitutional Agreement of 13 April 2016 on Better Law-Making. In particular, to ensure equal participation in the preparation of delegated acts, the European Parliament and the Council receive all documents at the same time as Member States’ experts, and their experts systematically have access to meetings of Commission expert groups dealing with the preparation of delegated acts.

(59) An EU-wide list of datasets with a particular potential to generate civic or socio-economic benefits together with harmonised re-use conditions constitutes an important enabler of cross-border data applications and services. Annex IIa provides a list of categories of high value datasets which could be amended by a delegated act. The additional categories for the list should take into account sectoral legislation that already regulates the publication of datasets, as well as the categories indicated in the Technical Annex of the G8 Open Data Charter and in the Commission’s Notice 2014/C 240/01. In the process leading to the identification of additional categories or datasets for the list, the Commission should carry out an impact assessment and appropriate public consultations, including at expert level. For the purposes of the impact assessment, the Commission should carry out public consultations with all interested parties, including public sector bodies, public undertakings, data re-users, research organisations, civil society groups and representative organisations. All interested parties should be given the possibility to submit suggestions to the Commission for additional categories of high value datasets or concrete datasets.

(60) In view of ensuring their maximum impact and to facilitate re-use, the high-value datasets should be made available for re-use with minimal legal restrictions and at no cost. High value datasets should be published via a single point of access to promote findability and facilitate access. They should also be published via Application Programming Interfaces, whenever the dataset in question contains dynamic data.

(60 a) The High Value Datasets identified within the categories listed in Annex IIa have the potential to generate civic or socio-economic benefits, and advance fundamental societal and democratic tasks. In order to further the goals of transparency, accountability, compliance, efficiency and fair competition, it is necessary to include datasets from among categories such as business registers, budget and government spending, procurement, and statistics. To encourage innovative services and products, to stimulate sustainable growth, and to contribute to high consumer protection standards, including by taking into account factors that have no immediate economic value, such as education, environment, or healthcare, it is necessary to include datasets from among the categories of national law, earth observation and environmental data, as well as geospatial data.

How to describe a high-value dataset?

As we saw with the G8 Charter and other lists of key datasets, a category and a name are not enough to define high-value datasets and ensure interoperability and equal potential for re-use between Member States.

Below is a list of information that could be part of the definition of high-value datasets.

  • Description: a paragraph describing the high-value dataset (what is expected) in clear language and including minimum criteria.
  • Attributes: a list of key data attributes (information) the dataset is expected to contain. For instance, a postcode dataset is expected to contain GPS position.
  • Resolution: Minimum resolution or granularity
  • Formats: list of digital formats typically used to provide and read such a dataset
  • Rationale: A paragraph describing why this dataset is of high-value and including references to studies, consultation, legislation, and so on.
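
The fields above could be captured as a simple record. Below is a minimal Python sketch, not a proposed standard: the class name `HighValueDataset`, the helper `missing_fields` and the example values are all hypothetical, and placing “Postcodes” under “Geospatial Data” is an assumption made only for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class HighValueDataset:
    """Hypothetical record for one high-value dataset definition."""
    category: str                      # e.g. one of the Annex IIa categories
    name: str                          # short dataset name
    description: str = ""              # what is expected, incl. minimum criteria
    attributes: List[str] = field(default_factory=list)  # key data attributes
    resolution: str = ""               # minimum resolution or granularity
    formats: List[str] = field(default_factory=list)     # typical digital formats
    rationale: str = ""                # why the dataset is of high value

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative entry only; the category assignment is an assumption.
postcodes = HighValueDataset(
    category="Geospatial Data",
    name="Postcodes",
    attributes=["postcode", "GPS position"],
)
print(postcodes.missing_fields())
```

A check like `missing_fields` would show at a glance which parts of a definition still need to be written before a dataset can be reviewed consistently across Member States.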

Hi @dannylammerhirt , all,

The survey is a great idea! Would be great to set up a call early in the new year.

@oscarmontiel, is there a way to add datasets to the list? Could we also list different “high value datasets” from previous reports / charters / etc. for instance?

Best,

@a455bcd9 It is possible to add new datasets, but I would recommend that we keep it simple (we already have 20 datasets listed). I think for now it is important to focus on how we want to follow the datasets we have currently and who would be in charge of doing that.

It seems that the possible compromise would be to have categories listed in the annex but no specific datasets. National laws are still not present. What other categories should be added, such as Election results or Financial information?

Hi, I have been trying to add a dataset in the survey but I cannot unlock question B4.

Can you have a look @oscarmontiel ?

Thanks!

Hi Pierre,

It seems like the characteristics of the datasets aren’t defined. Since B3 is a required question, B4 can’t be loaded.

I see, but at this stage there are no clear characteristics.

We may add our own though based on standard defs.

How can I add them myself?

Best

I think right now we have two options: either doing an initial review with the characteristics not being compulsory, or adding characteristics based on standard definitions, as you said. If you go to the admin page (just add /admin at the end of the survey url) you can see the backend spreadsheet and modify them yourself.

@dannylammerhirt, do you know if the French bill has since been passed into law? The link you mentioned, which is from 2016, only mentions that the bill would be considered by the French Senate in April 2016. I searched for loi des données d'intérêt général but found only the link you had already posted and similar ones, but not a reference to an actual enacted law.

Edit: to answer myself, yes, as I have found out, the bill did indeed pass into law. It was enacted in 2016, as LOI n° 2016-1321 du 7 octobre 2016 pour une République numérique.

Sorry to barge in here like this, but I was reading the proposal blog post and would be interested to hear if someone knows what the situation is with this data.

It appears that the data is at least aggregated under EBRA, and that one then calls one of the designated registry keepers, which appear to be the original national registries. Access has a cost, and the terms are such that results cannot be cached, which among other things hinders creating a good user experience. :slight_smile:

I know I can go to the Finnish registry and query/dump the data, but does that work in Europe in general? Is this the way to go, i.e. going one by one to the different registries to dump the data?

I agree with the PSI goals and would especially like to see business-related data made available more openly.

Adding a bridge to the topic on a proposed Data Act — this being the next piece of draft legislation in the evolution of data law within the European Union.