Shared table schema


#1

Suppose I have a Tabular Data Package with a bunch of CSV files, all of which should use the same schema. Is there a way to share a single schema for validating all of them? Ideally, I would write the schema only once in the datapackage.json file and have it applied to all of its resources.


#2

Hi @herrmann,

As we discussed over Telegram, the way to reuse table schemas is to create them in a separate JSON file (instead of inside the datapackage.json) and reference them. Instead of:

// datapackage.json
{
  "resources": [{
    "name": "data",
    "schema": {
      // The table schema
    }
  }],
  // ...
}

You’d have:

// datapackage.json
{
  "resources": [{
    "name": "data",
    "schema": "tableschema.json"
  }],
  // ...
}

// tableschema.json
{
  // The table schema
}

So you can simply add "schema": "tableschema.json" to all resources that share the same schema. There’s an example of this on https://frictionlessdata.io/specs/tabular-data-resource/, but we could probably improve it.
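If you have many CSV files, you could also generate the descriptor instead of writing each resource by hand. Here is a minimal Python sketch using only the standard library. The package name, the file paths, and the `build_descriptor` helper are made up for illustration; only the `"schema": "tableschema.json"` reference comes from the approach above.

```python
import json


def build_descriptor(csv_paths, schema_path="tableschema.json"):
    """Build a descriptor whose resources all point at one shared schema file."""
    resources = []
    for path in csv_paths:
        # Derive the resource name from the file name (hypothetical convention).
        name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0].lower()
        resources.append({
            "name": name,
            "path": path,
            # Same relative path for every resource, so the schema is written once.
            "schema": schema_path,
        })
    # "example-package" is a placeholder name.
    return {"name": "example-package", "resources": resources}


# Hypothetical CSV files that share one schema.
descriptor = build_descriptor(["data/sales-2017.csv", "data/sales-2018.csv"])
print(json.dumps(descriptor, indent=2))
```

Writing the printed JSON to datapackage.json, next to a tableschema.json file, gives the layout shown above with one entry per CSV file.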

In the future, I see us adopting JSON References to allow the reused table schema to live in the datapackage.json file itself. Meanwhile, using a separate table schema file works.
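To give an idea of what that could look like, here is a rough Python sketch that resolves local `{"$ref": "#/..."}` pointers inside a single descriptor. This is not part of the current specs: the top-level `"schemas"` key and the `resolve_refs` helper are assumptions made up for this example, and the resolver skips pointer escaping (`~0`, `~1`), remote references, and cycle detection.

```python
def resolve_refs(node, root):
    """Recursively replace {"$ref": "#/a/b"} objects with the value they point to."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            target = root
            # Walk the pointer one segment at a time from the document root.
            for key in ref[2:].split("/"):
                target = target[int(key)] if isinstance(target, list) else target[key]
            return resolve_refs(target, root)
        return {k: resolve_refs(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node


# Hypothetical descriptor: "schemas" is NOT defined by the current specs.
package = {
    "schemas": {"common": {"fields": [{"name": "id", "type": "integer"}]}},
    "resources": [
        {"name": "a", "path": "a.csv", "schema": {"$ref": "#/schemas/common"}},
        {"name": "b", "path": "b.csv", "schema": {"$ref": "#/schemas/common"}},
    ],
}

resolved = resolve_refs(package, package)
```

After resolution, both resources carry the full table schema inline, which is what a tool would need before validating the CSV files.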


#3

Thank you, @vitorbaptista, I’ll try using it that way.

Indeed, the Tabular Data Resource specification does show it in the examples, but an added explanation would be nice. I suppose I could just send a PR to the GitHub repository to make this point clearer.

About JSON references, do you mean this IETF draft? I think it would be useful as an added option in the future, but I imagine that support in JSON parsing tools would probably be slow to catch up with this feature.


#4

I agree we could be clearer. And yes, that’s the standard I’m referring to. I haven’t checked, but even though it’s a draft, it’s somewhat common, so I imagine most libraries support it already. However, this is something for the future, as it would require a spec update.


#5

Yes, I agree.
I’ve just made a pull request with a proposal for how to word this in the specs.