I’m really pleased with the Data Package Version pattern, but I think a couple more scenarios need to be added to it, and I’d like your thoughts on how the version number should be incremented.
Scenarios
You have published a tabular data package grants.csv v1.0.0. It has a foreign key relationship with another tabular data package codes.csv v2.0.0. The codes.csv data has changed: some codes have been combined and others split.
Is this:
a breaking change causing an increment in the MAJOR version number, or
a backwards-compatible change that only needs a change in the MINOR version number?
If grants.csv data is updated to use the new codes and the foreign key reference is updated to use the new codes.csv, is this a MAJOR change because the table schema has been changed?
I believe that combining or splitting lines should also be considered a breaking change, causing an increment in the MAJOR version number, because that would make it incompatible with other tables that use it in a foreign key relationship.
About the second question, if grants.csv is updated just to make its foreign keys compatible with the new version of the table which it references, its dependencies declaration should also be updated to state that it depends on the new version of codes.csv, as indicated in the dependencies pattern.
As for whether this should increment grants.csv’s MAJOR, MINOR or PATCH version, I’m not sure. If an application uses this table on its own, then it should not break, as the update only corrects some of its values for compatibility with a new version of one of its dependencies. On the other hand, if an application makes use of the data in grants.csv and also all of its dependencies, then a MAJOR change to any of its dependencies would also break the application.
That’s a nice way to word it, @Stephen. It’s exactly as we had been discussing.
However, I still have doubts about the idea that updating the data to make it compatible with a dependency should necessarily increment the MAJOR version number. I think it somewhat contradicts this part of the pattern:
PATCH version when you make backwards-compatible fixes, e.g.
Corrections to existing data
Changes to metadata
When you combine this pattern with the dependencies pattern, which explicitly models which version of the data it depends on for foreign keys, it seems that an increment in the MAJOR version number of a dependency is already explicit enough. The application can then decide whether or not it will need to use all of its dependencies.
In case it does, and if there is an increment in the MAJOR version number of any of its dependencies, it’s already clear enough that there is a change in the set of ‘the data plus all of its dependencies’ that would break the application.
On the other hand, if the application does not use all of its dependencies (e.g. if it does not need the fields that have foreign keys), the change would not break the application. The application can figure this out by looking at the dependencies and deciding whether or not it needs to dip into them. However, if the situation discussed here causes an increment to the MAJOR version number, the application cannot make use of the data because the versioning system is indicating a breaking change, even though the data would still be usable by the unmodified application.
So maybe this situation should be labeled as a PATCH change in grants.csv. The specs can then let the application figure out for itself whether or not it needs and makes use of its dependencies, in which case a MAJOR version increment in any of them would be considered a breaking change.
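To make that reasoning concrete, here is a rough sketch of how an application might decide for itself whether a dependency update breaks it. Everything in it is an assumption for illustration: the plain dicts of dependency versions and the function names are hypothetical and are not the descriptor format of the dependencies pattern.

```python
def major_of(version):
    """Return the MAJOR part of a MAJOR.MINOR.PATCH version string."""
    return int(version.split(".")[0])


def broken_dependencies(old_versions, new_versions, used_by_app):
    """List the dependencies the application actually uses whose MAJOR changed.

    old_versions / new_versions: hypothetical dicts of dependency name -> version.
    used_by_app: names of the dependencies the application reads.
    """
    broken = []
    for name in used_by_app:
        if major_of(new_versions[name]) != major_of(old_versions[name]):
            broken.append(name)
    return broken


old = {"codes": "2.0.0"}
new = {"codes": "3.0.0"}

# An application that joins grants.csv against codes.csv is affected.
print(broken_dependencies(old, new, used_by_app=["codes"]))

# An application that ignores the foreign key fields is not.
print(broken_dependencies(old, new, used_by_app=[]))
```

With this approach the version of grants.csv itself would not need to signal the break; the MAJOR increment of codes.csv is what the application checks.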
I can see your point: if codes.csv is in the same datapackage.json, then the foreignKeys reference in the schema won’t have changed and hence won’t require a MAJOR version change based on
MAJOR version when you make incompatible changes, e.g.
Change the table schema
As I’m implementing the Foreign Keys to Data Packages pattern, I was thinking about that and the foreignKeys reference would change. This would then invoke a MAJOR version change.
Looks like further refinement is needed. Wording changes to the above welcome.
We probably need to cater for:
version number changes being consistent regardless of where the foreignKey referenced data is stored
@herrmann the change below makes your suggested grants.csv PATCH change a MINOR change.
@rufuspollock the change below makes your suggested grants.csv MAJOR change a MINOR change.
(I removed the forum solution indicator until we’re agreed.)
Data Package Version
The Data Package version format follows the Semantic Versioning specification format: MAJOR.MINOR.PATCH
The version numbers, and the way they change, convey meaning about how the data package has been modified from one version to the next.
Given a Data Package version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible changes, e.g.
Change the data package, resource or field name or identifier
Add, remove or re-order fields
Change a field type or format
Change a field constraint to be more restrictive
Combine, split, delete or change the meaning of data that is referenced by another data resource
MINOR version when you add data or change metadata in a backwards-compatible manner, e.g.
Add a new data resource to a data package
Add new data to an existing data resource
Change a field constraint to be less restrictive
Update a reference to another data resource
Change data to reflect changes in referenced data
PATCH version when you make backwards-compatible fixes, e.g.
Correct errors in existing data
Change descriptive metadata properties
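As a sketch only, the rules above could be expressed as a lookup table where the highest-ranking change in a release decides the increment. The change labels below are made-up shorthand for the bullet points in this proposal, not identifiers from any specification.

```python
# Ordered from least to most significant.
LEVELS = ["none", "patch", "minor", "major"]

# Hypothetical labels, one per bullet point in the proposed rules.
RULES = {
    "rename_package_resource_or_field": "major",
    "add_remove_or_reorder_fields": "major",
    "change_field_type_or_format": "major",
    "tighten_field_constraint": "major",
    "restructure_referenced_data": "major",
    "add_resource": "minor",
    "add_data": "minor",
    "loosen_field_constraint": "minor",
    "update_reference": "minor",
    "sync_with_referenced_data": "minor",
    "correct_data_error": "patch",
    "change_descriptive_metadata": "patch",
}


def required_bump(changes):
    """Return the most significant increment required by a list of changes."""
    level = "none"
    for change in changes:
        candidate = RULES[change]
        if LEVELS.index(candidate) > LEVELS.index(level):
            level = candidate
    return level


# Appending data and fixing a typo in the same release: MINOR wins.
print(required_bump(["add_data", "correct_data_error"]))
```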
Scenarios
You are developing your data through public consultation. Start your initial data release at 0.1.0
You release your data for the first time. Use version 1.0.0
You append last month’s data to an existing release. Increment the MINOR version number
You append a column to the data. Increment the MAJOR version number
You relocate the data to a new URL or path. No change in the version number
You change a title, description, or other descriptive metadata. Increment the PATCH version
You fix a data entry error by modifying a value. Increment the PATCH version
You split a row of data in a foreign key reference table. Increment the MAJOR version number
You update the data and schema to refer to a new version of a foreign key reference table. Increment the MINOR version number
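For illustration, the scenarios above can be walked through with a small helper that applies an increment following the usual Semantic Versioning reset rules (lower parts go back to zero). The function name `bump` is just an assumption for this sketch.

```python
def bump(version, level):
    """Apply a MAJOR, MINOR or PATCH increment to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if level == "none":
        return version
    raise ValueError(f"unknown level: {level}")


version = "1.0.0"                 # first release
version = bump(version, "minor")  # append last month's data
version = bump(version, "patch")  # fix a data entry error
version = bump(version, "none")   # relocate the data to a new URL
version = bump(version, "major")  # append a column
print(version)
```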
I think the main issue we should be concerned with is whether changes are backwards compatible or break compatibility, and this proposal seems sensible to me. The constraint statements you suggest fit well with that line of thought: changing a field constraint to make it more restrictive does break compatibility, but the other way around does not.