Signed Data Packages


#1

Has anyone thought about how it might be possible to provide signed and versioned data packages?

The use case I have in mind is verifiability: as a provider of data, I want to reassure users of a data package that the copy they downloaded has not been modified. Modifying it is fine, of course, but I feel there are use cases where it is valuable to be confident that nobody has altered the data in a detrimental way.

My questions are really:

  • How might this work?
  • How would this work with different/updated versions of a data package?
  • Can anyone else see any value in this?

Thanks!

Ross.


#2

Hi @rossjones, the BagIt spec (https://tools.ietf.org/html/draft-kunze-bagit-08) is another specification for packaging a bundle of files. It has the idea of creating a manifest-md5.txt (md5 can be replaced with other algorithms) that lists a checksum for each file in a “bag”, as well as a tagmanifest-md5.txt file for the metadata. It may be worth incorporating that thinking here, as it seems rather translatable.

edit: of course, this would mean that the checksum for the datapackage.json metadata itself would have to be located external to it. Something like datapackage-manifest.json.
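
As a rough illustration of the manifest idea, here is a minimal Python sketch that writes one "checksum, then relative path" line per file, loosely in the style of a BagIt manifest. The function names (file_digest, build_manifest) and the choice of sha256 are assumptions made for this example; this is not a full BagIt implementation.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> str:
    """Build one 'checksum  relative/path' line per file under root.

    Hypothetical helper: the line layout only approximates a BagIt manifest.
    """
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        lines.append(f"{file_digest(path, algorithm)}  {rel}")
    return "\n".join(lines) + "\n"
```

The returned text would be saved outside the package directory (or the manifest file excluded from the walk), so that the manifest does not end up containing a checksum of itself.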


#3

@rossjones good question. There is an existing discussion here: https://github.com/frictionlessdata/specs/issues/213

Basic answer would be something like:

  • Hash the datapackage.json
    • Make sure datapackage.json includes hashes of resources
    • Versions: put a version number in the datapackage.json (if you want something more elaborate and git-like, we could look into that).
  • If you want: sign the data package with your private key, so that users can verify it with your public key
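
The hashing steps in this list could be sketched roughly as follows in Python. This is only an illustration under assumptions: the helper names (seal_package, verify_package), the "sha256:" prefix on each resource's hash value, and the sorted-key JSON serialization are choices made for the example. The signing step is left out because the Python standard library has no public-key signing; the digest returned by seal_package is the value a provider would sign and publish.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def seal_package(root: Path, version: str) -> str:
    """Record a version and per-resource hashes in datapackage.json,
    then return the digest of the rewritten descriptor."""
    descriptor_path = root / "datapackage.json"
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    descriptor["version"] = version
    for resource in descriptor.get("resources", []):
        data = (root / resource["path"]).read_bytes()
        # Assumed layout: algorithm prefix followed by the hex digest.
        resource["hash"] = "sha256:" + sha256_hex(data)
    # Sorted keys keep the serialization, and so the digest, deterministic.
    payload = json.dumps(descriptor, sort_keys=True, indent=2).encode("utf-8")
    descriptor_path.write_bytes(payload)
    return sha256_hex(payload)


def verify_package(root: Path, expected_digest: str) -> bool:
    """Check the descriptor digest first, then every resource hash it lists."""
    payload = (root / "datapackage.json").read_bytes()
    if sha256_hex(payload) != expected_digest:
        return False
    descriptor = json.loads(payload)
    return all(
        resource.get("hash")
        == "sha256:" + sha256_hex((root / resource["path"]).read_bytes())
        for resource in descriptor.get("resources", [])
    )
```

Because the descriptor contains the resource hashes, a user only needs one trusted value (the descriptor digest, or a signature over it) to check the whole package.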