Data Package Versions

I’m really pleased with the Data Package Version pattern but I think a couple more scenarios need to be added to the pattern and I’d like your thoughts on how the version number should be incremented.

Scenarios

You have published a tabular data package grants.csv v1.0.0. It has a foreign key relationship with another tabular data package codes.csv v2.0.0. The codes.csv data has changed: some codes have been combined and others split.

Is this:

  • a breaking change causing an increment in the MAJOR version number
  • a backwards-compatible change that only needs a change in the MINOR version number?

If the grants.csv data is updated to use the new codes, and the foreign key reference is updated to point at the new codes.csv, is this a MAJOR change because the table schema has changed?
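For concreteness, the relationship in question might look like this in the grants resource's table schema (field names here are illustrative, not taken from the actual packages):

```json
{
  "fields": [
    {"name": "grant_id", "type": "string"},
    {"name": "code", "type": "string"}
  ],
  "foreignKeys": [
    {
      "fields": "code",
      "reference": {"resource": "codes", "fields": "code"}
    }
  ]
}
```

The question is which parts of a change to codes.csv ripple into this schema, and how far the version number should move when they do.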

Look forward to hearing your thoughts :smile:


Nice collection of proposed patterns!

I believe that combining or splitting rows should also be considered a breaking change, causing an increment in the MAJOR version number, because that would make it incompatible with other tables that use it in a foreign key relationship.

About the second question: if grants.csv is updated just to make its foreign keys compatible with the new version of the table it references, its dependencies declaration should also be updated to state that it depends on the new version of codes.csv, as indicated in the dependencies pattern.
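If I'm reading the dependencies pattern correctly, that declaration in grants.csv's datapackage.json would look something like this (the dataDependencies property name and the version numbers are my assumptions, not taken from this thread):

```json
{
  "name": "grants",
  "version": "1.0.1",
  "dataDependencies": {
    "codes": "3.0.0"
  }
}
```

With the dependency pinned like this, a consumer can see at a glance which release of codes.csv the grants data was corrected against.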

As for whether this should increment grants.csv's MAJOR, MINOR or PATCH version, I'm not sure. If an application uses this table individually, then it should not break, as the update only corrects some of its values for compatibility with a new version of one of its dependencies. On the other hand, if an application makes use of the data in grants.csv and also all of its dependencies, then a MAJOR change to any of those dependencies would also break the application.


Thanks @herrmann. Based on your points:

  • I think codes.csv is a MAJOR change
  • I’m leaning towards grants.csv being a MAJOR change

Interested in hearing thoughts from others…


I’d say both were MAJOR changes.


I’ll draft a PR for the Data Package Version pattern

Proposed change to pattern. Feedback welcome :slightly_smiling_face:

Data Package Version

The Data Package version format follows the Semantic Versioning specification format: MAJOR.MINOR.PATCH

The version numbers, and the way they change, convey meaning about how the data package has been modified from one version to the next.

Specification

Given a Data Package version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible changes, e.g.

  • Change the table schema
  • Change the name of fields, a data resource or a data package
  • Change the data package id
  • Add, remove or re-order fields
  • Change a foreignKey relationship to refer to a different resource

MINOR version when you add data in a backwards-compatible manner, e.g.

  • Add new data to an existing data resource
  • Add a new data resource

PATCH version when you make backwards-compatible fixes, e.g.

  • Corrections to existing data
  • Changes to metadata

Scenarios

  • You are developing your data through public consultation. Start your initial data release at 0.1.0
  • You release your data for the first time. Use version 1.0.0
  • You append last month’s data to an existing release. Increment the MINOR version number
  • You append a column to the data. Increment the MAJOR version number
  • You relocate the data to a new URL or path. No change in the version number
  • You change a title, description, or other descriptive metadata. Increment the PATCH version
  • You fix a data entry error by modifying a value. Increment the PATCH version
  • You split a row of data in a foreign key reference table. Increment the MAJOR version number
  • You update the data and schema to refer to a new version of a foreign key reference table. Increment the MAJOR version number
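Walking the first few scenarios through with invented numbers: a package first released at 1.0.0 becomes 1.1.0 when last month's data is appended, 1.1.1 after a data entry fix, and 2.0.0 once a column is appended. The version itself is just the version property in datapackage.json:

```json
{
  "name": "grants",
  "title": "Grants",
  "version": "1.1.1",
  "resources": [
    {"name": "grants", "path": "grants.csv"}
  ]
}
```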

That’s a nice way to word it, @Stephen. It’s exactly as we had been discussing.

However, I still have doubts about the idea that updating the data to make it compatible with a dependency should necessarily increment the MAJOR version number. I think it kind of contradicts this part of the pattern:

PATCH version when you make backwards-compatible fixes, e.g.

  • Corrections to existing data
  • Changes to metadata

When you combine this pattern with the dependencies pattern, which explicitly models which version of the data it depends on for foreign keys, it seems that an increment in the MAJOR version number of a dependency is already explicit enough. The application can then decide whether or not it will need to use all of its dependencies.

In case it does, and if there is an increment in the MAJOR version number of any of its dependencies, it’s already clear enough that there is a change in the set of ‘the data plus all of its dependencies’ that would break the application.

On the other hand, if the application does not use all of its dependencies (e.g. if it does not need the fields that have foreign keys), the change would not break it. The application can figure this out by looking at the dependencies and deciding whether or not it needs to dip into them. However, if the situation discussed here causes an increment to the MAJOR version number, the application cannot make use of the data, because the versioning system is indicating a breaking change even though the data would still be usable by the unmodified application.

So, maybe this situation should be labeled as a PATCH change in grants.csv. The specs can then let the application itself figure out whether or not it needs to make use of its dependencies, in which case a MAJOR version bump to any of those would be considered a breaking change.

I can see your point: if codes.csv is in the same datapackage.json, then the foreignKeys reference in the schema won’t have changed, and hence it wouldn’t require a MAJOR version change based on

MAJOR version when you make incompatible changes, e.g.

  • Change the table schema

However, as I’m implementing the Foreign Keys to Data Packages pattern, I was thinking about the case where codes.csv lives in a separate data package. There, the foreignKeys reference would change, which would then trigger a MAJOR version change.
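Under that pattern, the reference itself names the package it points at, so moving to a new release of codes.csv shows up as a schema change. A sketch of my reading of that pattern, with a purely illustrative URL:

```json
{
  "fields": "code",
  "reference": {
    "package": "https://example.com/codes/3.0.0/datapackage.json",
    "resource": "codes",
    "fields": "code"
  }
}
```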

Looks like further refinement is needed. Wording changes to the above welcome.

We probably need to cater for both cases. Perhaps these pattern statements need clarifying:

  • Corrections to existing data (to differentiate between fixing errors and re-coding values)
  • Change the table schema

I’ve tried to be more explicit in the pattern below. What do you think? (I’m expecting some debate on my constraints statements.)

Love to hear from the original pattern contributors @henrykironde @ethanwhite @zhangcandrew @pwalsh @rufuspollock

@herrmann the change below makes your suggested grants.csv PATCH change, a MINOR change
@rufuspollock the change below makes your suggested grants.csv MAJOR change, a MINOR change

(I removed the forum solution indicator until we’re agreed.)

Data Package Version

The Data Package version format follows the Semantic Versioning specification format: MAJOR.MINOR.PATCH

The version numbers, and the way they change, convey meaning about how the data package has been modified from one version to the next.

Given a Data Package version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible changes, e.g.

  • Change the data package, resource or field name or identifier
  • Add, remove or re-order fields
  • Change a field type or format
  • Change a field constraint to be more restrictive
  • Combine, split, delete or change the meaning of data that is referenced by another data resource

MINOR version when you add data or change metadata in a backwards-compatible manner, e.g.

  • Add a new data resource to a data package
  • Add new data to an existing data resource
  • Change a field constraint to be less restrictive
  • Update a reference to another data resource
  • Change data to reflect changes in referenced data

PATCH version when you make backwards-compatible fixes, e.g.

  • Correct errors in existing data
  • Change descriptive metadata properties

Scenarios

  • You are developing your data through public consultation. Start your initial data release at 0.1.0
  • You release your data for the first time. Use version 1.0.0
  • You append last month’s data to an existing release. Increment the MINOR version number
  • You append a column to the data. Increment the MAJOR version number
  • You relocate the data to a new URL or path. No change in the version number
  • You change a title, description, or other descriptive metadata. Increment the PATCH version
  • You fix a data entry error by modifying a value. Increment the PATCH version
  • You split a row of data in a foreign key reference table. Increment the MAJOR version number
  • You update the data and schema to refer to a new version of a foreign key reference table. Increment the MINOR version number
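To illustrate the two constraint bullets above, suppose grants.csv has an amount field (field name and bounds invented):

```json
{
  "name": "amount",
  "type": "integer",
  "constraints": {"minimum": 0, "maximum": 100000}
}
```

Lowering maximum to 50000 is more restrictive: previously valid rows may now fail validation, so that’s a MAJOR change. Raising it to 200000 is less restrictive: every previously valid row still validates, so that’s a MINOR change.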

It looks good, @Stephen!

I think the main issue we should be concerned with is whether changes are backwards compatible or break compatibility, and this proposal seems sensible to me. The constraint statements you suggest fit well with that line of thought: a change to a field constraint that makes it more restrictive does break compatibility, but the other way around does not.


This looks good to me. Nice work @Stephen!


PR submitted…