Signed Data Packages

I’m wondering if anyone has done any thinking about how it might be possible to provide signed+versioned data packages?

The use case I have in mind is verifiability: as a provider of data, I want to reassure users of a data package that the copy they download has not been modified. Modifying it is fine of course, but I feel there are use cases where it is valuable to be confident that nobody has modified the data in a detrimental way.

My questions are really:

  • How might this work?
  • How would this work with different/updated versions of a data package?
  • Can anyone else see any value in this?

Thanks!

Ross.

Hi @rossjones, the BagIt spec (draft-kunze-bagit-08) is another specification for packaging a bundle of files. It has the idea of creating a manifest-md5.txt (md5 can be replaced with other algorithms) that lists a checksum for each file in a “bag”, as well as a tagmanifest-md5.txt file for the metadata. It might be worth incorporating that thinking here, as it seems rather translatable.

edit: of course, this would mean that the checksum for the datapackage.json metadata itself would have to be located external to it. Something like datapackage-manifest.json.
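To make the manifest idea concrete, here is a rough Python sketch that produces one "<checksum> <relative path>" line per file, in the spirit of a BagIt manifest. It is only an illustration: the function names are made up, sha256 is an arbitrary choice of algorithm, and it does not try to implement the full BagIt layout or the hypothetical datapackage-manifest.json.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> str:
    """Return manifest text with one '<digest> <relative path>' line per file.

    Hypothetical helper: paths are relative to `root` and sorted so the
    output is stable between runs.
    """
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    lines = [f"{file_digest(root / rel, algorithm)} {rel}" for rel in rel_paths]
    return "\n".join(lines) + "\n"
```

Because datapackage.json is just another file under the package directory, its checksum ends up in the manifest too, which is why the manifest itself has to live outside the descriptor.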

@rossjones good question. There was this existing discussion here: tabular-data-package should specify how to sign data · Issue #213 · frictionlessdata/specs · GitHub

Basic answer would be something like:

  • Hash the datapackage.json
    • Make sure datapackage.json includes hashes of resources
    • Versions: put a version number in the datapackage.json (if you want something more elaborate, such as using git, we could look into that).
  • If you want: sign the data package with your private key, so that users can verify the signature with your public key
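
A minimal sketch of the hashing steps above, in Python. Treat the details as assumptions rather than a settled design: the helper names are hypothetical, sha256 is an arbitrary choice, the "sha256:" prefix on each resource's hash value is one possible convention, and hashing a canonical JSON serialization (instead of the raw file bytes) is just one way to keep the digest independent of key order.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def add_resource_hashes(descriptor: dict, base: Path) -> dict:
    """Return a copy of the descriptor with a 'hash' entry on each resource."""
    updated = json.loads(json.dumps(descriptor))  # cheap deep copy
    for resource in updated.get("resources", []):
        data = (base / resource["path"]).read_bytes()
        resource["hash"] = "sha256:" + sha256_hex(data)
    return updated


def descriptor_digest(descriptor: dict) -> str:
    """Hash a canonical serialization of the descriptor (sorted keys)."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


def verify(descriptor: dict, base: Path, expected_digest: str) -> bool:
    """Check the descriptor digest, then every resource hash it lists."""
    if descriptor_digest(descriptor) != expected_digest:
        return False
    for resource in descriptor.get("resources", []):
        data = (base / resource["path"]).read_bytes()
        if resource.get("hash") != "sha256:" + sha256_hex(data):
            return False
    return True
```

Since the version number sits inside datapackage.json, bumping it changes the descriptor digest, so each version gets its own digest. The sketch stops short of signing: the digest (or the datapackage.json file itself) is what the provider would sign with their private key using an external tool, and users would check that signature with the public key before trusting the hashes.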