Suppose I have a Tabular Data Package with a bunch of CSV files, all of which should use the same schema. Is there a way to share a single schema for validating all of them? Ideally, I would write the schema only once in the datapackage.json file and have it applied to all of its resources.
Hi @herrmann,
As we discussed over Telegram, the way to reuse table schemas is to create them in a separate JSON file (instead of inside the datapackage.json) and reference them. Instead of:
// datapackage.json
{
  "resources": [{
    "name": "data",
    "schema": {
      // The table schema
    }
  }],
  // ...
}
You’d have:
// datapackage.json
{
  "resources": [{
    "name": "data",
    "schema": "tableschema.json"
  }],
  // ...
}

// tableschema.json
{
  // The table schema
}
So you can simply add "schema": "tableschema.json" to all resources that share the same schema. There’s an example of this on the Tabular Data Resource page of the Frictionless Standards, but we could probably improve it.
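If you have many CSV files, you could also generate the descriptor rather than writing each resource by hand. Here is a minimal sketch using only the Python standard library. The CSV file names, the package name, and the build_descriptor helper are made up for illustration; the only part taken from the approach above is that every resource carries the same "schema": "tableschema.json" string.

```python
import json

# Hypothetical file names; replace them with the CSV files in your package.
csv_files = ["data-2019.csv", "data-2020.csv", "data-2021.csv"]


def build_descriptor(paths, schema_path="tableschema.json"):
    """Return a descriptor in which every resource points at the same schema file."""
    resources = []
    for path in paths:
        resources.append({
            # Resource name derived from the file name without its extension.
            "name": path.rsplit(".", 1)[0],
            "path": path,
            "profile": "tabular-data-resource",
            # A string here is a reference to a schema file, not an inline schema.
            "schema": schema_path,
        })
    return {
        "name": "example-package",  # hypothetical package name
        "profile": "tabular-data-package",
        "resources": resources,
    }


descriptor = build_descriptor(csv_files)
print(json.dumps(descriptor, indent=2))
```

The printed JSON can be saved as datapackage.json next to tableschema.json and the CSV files.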
In the future, I can see us adopting JSON References to allow the reused table schema to live in the datapackage.json file itself. Meanwhile, using a separate table schema file works.
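To give an idea of what that could look like, here is a purely hypothetical sketch. None of this is in the current specs: the top-level "schemas" key, the {"$ref": "#/schemas/shared"} layout, and the resolve_pointer and inline_schemas helpers are all assumptions for illustration. It only handles local references and replaces each one with a copy of the shared schema.

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a local JSON Pointer such as '#/schemas/shared' inside `document`."""
    if not pointer.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {pointer!r}")
    node = document
    for token in pointer[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): '~1' means '/', '~0' means '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_schemas(descriptor):
    """Return a copy of `descriptor` with each resource's $ref replaced by the schema."""
    resolved = copy.deepcopy(descriptor)
    for resource in resolved["resources"]:
        schema = resource.get("schema")
        if isinstance(schema, dict) and "$ref" in schema:
            resource["schema"] = copy.deepcopy(resolve_pointer(descriptor, schema["$ref"]))
    return resolved


# Hypothetical descriptor layout; NOT something the specs define today.
descriptor = {
    "schemas": {"shared": {"fields": [{"name": "id", "type": "integer"}]}},
    "resources": [
        {"name": "a", "path": "a.csv", "schema": {"$ref": "#/schemas/shared"}},
        {"name": "b", "path": "b.csv", "schema": {"$ref": "#/schemas/shared"}},
    ],
}

resolved = inline_schemas(descriptor)
print(resolved["resources"][0]["schema"])
```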
Thank you, @vitorbaptista, I’ll try using it that way.
Indeed, the Tabular Data Resource specification does show it in the examples, but an added explanation would be nice. I suppose I could just send a PR to the GitHub repository to make this point clearer.
About JSON references, do you mean this IETF draft? I think it would be useful as an added option in the future, but I imagine that support in JSON parsing tools would probably be slow to catch up with this feature.
I agree we could be clearer. And yes, that’s the standard I’m referring to. I haven’t checked, but even though it’s a draft, it’s somewhat common, so I imagine most libraries support it already. However, this is something for the future, as it would require a spec update.
Yes, I agree.
I’ve just made a pull request with a proposal on how to word this in the specs.