I’m wondering if anyone has done any thinking about how it might be possible to provide signed+versioned data packages?
The use case I have in mind is verifiability: as a provider of data, I want to reassure users of a data package that the copy they download has not been modified. Modifying it is fine, of course, but I feel there are use cases where having confidence that nobody has modified the data in a detrimental way is valuable.
My questions are really:
- How might this work?
- How would this work with different/updated versions of a data package?
- Can anyone else see any value in this?
Thanks!
Ross.
Hi @rossjones, the BagIt spec (draft-kunze-bagit-08) is another specification for packaging a bundle of files. It has the idea of creating a manifest-md5.txt (md5 can be replaced with other algorithms) with a list of checksums per file in a “bag”, as well as a tagmanifest-md5.txt file for the metadata. It may be worth incorporating that thinking here, as it seems rather translatable.
edit: of course, this would mean that the checksum for the datapackage.json metadata itself would have to be located external to it. Something like datapackage-manifest.json.
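To make the manifest idea concrete, here is a minimal sketch in Python. It is not an implementation of BagIt; it only borrows the "one checksum per file" layout. The function names (file_digest, build_manifest) are made up for illustration, and SHA-256 is used in place of md5 as an assumption. The resulting text would need to be stored outside the hashed directory (or excluded from it), since a file cannot contain its own checksum.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> str:
    """Build manifest text with one '<checksum>  <relative path>' line per file.

    Hypothetical helper: the line layout is modelled loosely on BagIt manifests.
    """
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    lines = [f"{file_digest(root / rel, algorithm)}  {rel}" for rel in rel_paths]
    return "\n".join(lines) + "\n"
```

A consumer would recompute the digests after download and compare them line by line against the published manifest.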
@rossjones good question. There was this existing discussion here: tabular-data-package should specify how to sign data · Issue #213 · frictionlessdata/specs · GitHub
Basic answer would be something like:
- Hash the datapackage.json
- Make sure datapackage.json includes hashes of resources
- Versions: put a version number in the datapackage.json (if you want something more elaborate and git-like, we could look into that).
- If you want: sign the data package with your private key, so that users can verify it with your public key
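
The hashing and versioning steps above could be sketched roughly as follows, using only the Python standard library. Everything here is an assumption for illustration: the helper names (seal_package, verify_package), the use of SHA-256, the "sha256:" prefix on the resource hash, and the requirement that each resource has a local "path". The signing step is not shown; it would typically be done by signing the returned descriptor hash (or the datapackage.json file itself) with an external tool such as GnuPG.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def seal_package(root: Path, version: str) -> str:
    """Record a version and per-resource hashes in datapackage.json.

    Returns the hash of the rewritten descriptor, which is the value a
    provider would publish (and optionally sign) out of band.
    """
    descriptor_path = root / "datapackage.json"
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    descriptor["version"] = version
    for resource in descriptor.get("resources", []):
        # Assumes every resource has a local, relative "path".
        data = (root / resource["path"]).read_bytes()
        resource["hash"] = "sha256:" + sha256_hex(data)
    serialized = json.dumps(descriptor, sort_keys=True, indent=2).encode("utf-8")
    descriptor_path.write_bytes(serialized)
    return sha256_hex(serialized)


def verify_package(root: Path, expected_descriptor_hash: str) -> bool:
    """Check the descriptor hash first, then every resource hash it lists."""
    raw = (root / "datapackage.json").read_bytes()
    if sha256_hex(raw) != expected_descriptor_hash:
        return False
    descriptor = json.loads(raw)
    return all(
        resource.get("hash")
        == "sha256:" + sha256_hex((root / resource["path"]).read_bytes())
        for resource in descriptor.get("resources", [])
    )
```

Because the descriptor carries the resource hashes, a single trusted hash (or signature) over datapackage.json is enough to cover the whole package, and a new version simply produces a new descriptor hash.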