The Tabular Data Package as the one source for data definition

MikeThacker · August 15, 2019, 4:13pm

I’d like to check our planned approach to defining a data package and using that definition as the one source of all definitions from which schemas and other resources can be generated.

We’d like to:

Take an existing tabular data package (the Open Referral tabular data package that links services, providers, locations, etc.)
Define proposed extensions (annotated as such)
Define constraints to get the “application profile” which says how the package will be used in our situation (which fields to use, which vocabularies are used to populate them, …)
Autogenerate from that one or more entity relation diagrams, with colour coding to distinguish the original from extensions and constraints
Autogenerate JSON schemas which define responses to web methods querying the data (e.g. getService)
Autogenerate CSV schemas for tabular (partial) views of the data (e.g. a list of services)
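
To make the JSON schema step concrete, here is a minimal sketch of deriving a JSON Schema for one row of a table from its Table Schema fields. It is only an illustration under assumptions: the function name, the type mapping and the cut-down `service` resource are hypothetical, not the real Open Referral definition or the code from any existing tool.

```python
import json

# Table Schema field type -> JSON Schema fragment (only a few types covered).
TYPE_MAP = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "boolean": {"type": "boolean"},
    "date": {"type": "string", "format": "date"},
    "datetime": {"type": "string", "format": "date-time"},
}


def table_schema_to_json_schema(resource):
    """Build a JSON Schema object describing one row of a tabular resource."""
    properties = {}
    required = []
    for field in resource["schema"]["fields"]:
        # Unknown types fall back to an unconstrained property.
        prop = dict(TYPE_MAP.get(field.get("type", "string"), {}))
        if "description" in field:
            prop["description"] = field["description"]
        constraints = field.get("constraints", {})
        if "enum" in constraints:
            prop["enum"] = constraints["enum"]
        if constraints.get("required"):
            required.append(field["name"])
        properties[field["name"]] = prop
    schema = {"title": resource["name"], "type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


# Hypothetical, cut-down resource used only for illustration.
service = {
    "name": "service",
    "schema": {
        "fields": [
            {"name": "id", "type": "string", "constraints": {"required": True}},
            {"name": "name", "type": "string", "constraints": {"required": True}},
            {
                "name": "status",
                "type": "string",
                "constraints": {"enum": ["active", "inactive"]},
            },
        ]
    },
}

service_json_schema = table_schema_to_json_schema(service)
print(json.dumps(service_json_schema, indent=2))
```

A response schema for something like getService would presumably also need to nest related tables by following foreign keys, which this sketch does not attempt.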

Is this logic sound? Are there existing tools that do any of this?

I only know of the jts_ERD tool, which covers a small part of it.

Thanks

MikeThacker · September 9, 2019, 4:25pm

Well, the silence is deafening, but we’ve pressed on anyway, written the code and put it on GitHub.

See the Schemas and Schema generation part of this readme file.

Feedback welcomed.

lwinfree · September 11, 2019, 5:27pm

Hi @MikeThacker! Thanks for the posts and sorry for the delay. I’d love to give you some feedback. To help us understand, could you please give us some context and background on this project and how you are using datapackages? And could you please clarify if there are specific things with datapackages that we can help you with?
Thanks!

MikeThacker · September 12, 2019, 8:29am

Hello @lwinfree and thanks for your response.

Although I’m trying to design an approach I can use for many projects where we refine a data standard, for this specific project I’m looking for a way to document extensions to the existing OpenReferral data format standard and define an application profile (saying how the standard will be used in a particular scenario).

OpenReferral already has a Tabular Data Package, an Entity Relation Diagram and an API. I think the latter two are manually crafted from the first. I want an automated way of generating them (and more) from the first.

Once that is done, I will use a copy of the Tabular Data Package (with a few more properties added) to define proposed extensions to the existing OpenReferral standard and a further copy to define our application profile (stating which tables/fields to use, enumerations, taxonomies from which to populate values, …).

My colleague and I have made good progress on this as shown in our GitHub Human-Services repository.

“And could you please clarify if there are specific things with data packages that we can help you with?”

Well I’d really just like to know if this is a sensible use of data packages and if anyone can see a flaw in the logic. Essentially I want one single machine-readable source defining a data standard from which I can generate ERD, schemas and human-readable documentation.

Thanks very much

lwinfree · September 20, 2019, 6:28pm

Hi @MikeThacker! Thanks for providing more detail. I’ve shared this with the broad FD team.
For now, I’m wondering if you have been working with the OpenReferral team on your project? One of our current Tool Fund grantees is focused on building datapackage support for their Human Services API: https://frictionlessdata.io/articles/open-referral/. Let me know if you’d like to be connected to them - I think there are some synergies between what you are working on.
Thanks,

MikeThacker · September 23, 2019, 10:27am

Hello @lwinfree. Yes, my colleagues and I have been speaking with Greg at Open Referral. We’re using the Tabular Data Package to record our proposed extensions to the Open Referral schema and will submit them more formally if our piloting shows they work.

My post here was more to get feedback on how sensible it is to use a Tabular Data Package definition with extras as the source from which all documentation and schemas are derived.
Thanks

rufuspollock · December 29, 2019, 2:20pm

@MikeThacker yes this sounds quite sensible based on a quick read through.

MikeThacker · February 17, 2022, 11:34am

Since my original post, we’ve made good progress using an annotated tabular data package to autogenerate variants (different tabular data packages) of that and then associated machine-readable resources.

We’ve now concluded that we should keep a pure main tabular data package (without our annotations) with the full data structure, and use separate machine-readable definitions for each “application profile”. An application profile will be a tabular data package that contains a subset of the tables and fields in the main package. It might also change the optional/required setting of fields.

Is there a standard way of documenting and generating these application profiles, i.e. these views on a full data package? All we’ve done so far is to define Jolt transformations.
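
To illustrate the kind of view I mean (this is not a standard mechanism and not our Jolt transformations, just a sketch in Python), a profile could list the tables and fields to keep and the fields it makes required. The profile format, the function name and the tiny example package below are all hypothetical.

```python
import copy


def apply_profile(package, profile):
    """Return a new package with only the tables and fields the profile lists."""
    result = copy.deepcopy(package)  # leave the main package untouched
    kept = []
    for resource in result["resources"]:
        spec = profile["resources"].get(resource["name"])
        if spec is None:
            continue  # table is not used in this profile
        wanted = set(spec["fields"])
        required = set(spec.get("required", []))
        fields = [f for f in resource["schema"]["fields"] if f["name"] in wanted]
        for field in fields:
            if field["name"] in required:
                # Tighten optional -> required for this profile only.
                field.setdefault("constraints", {})["required"] = True
        resource["schema"]["fields"] = fields
        kept.append(resource)
    result["resources"] = kept
    return result


# Hypothetical main package, heavily cut down for illustration.
main_package = {
    "name": "main",
    "resources": [
        {
            "name": "service",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string", "constraints": {"required": True}},
                    {"name": "name", "type": "string"},
                    {"name": "email", "type": "string"},
                    {"name": "url", "type": "string"},
                ]
            },
        },
        {"name": "location", "schema": {"fields": [{"name": "id", "type": "string"}]}},
    ],
}

# Hypothetical profile: keep one table, three of its fields, require email.
profile = {"resources": {"service": {"fields": ["id", "name", "email"], "required": ["email"]}}}

profile_package = apply_profile(main_package, profile)
```

The sketch ignores primary and foreign keys; a real version would need to drop or check any key that points at a removed table or field.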

Related: Is there a way of defining extra constraints? E.g. one of two fields must be populated, or there must be at least one record in a one-to-many relationship (i.e. a cardinality of 1:∞).
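
As far as I can tell, Table Schema field constraints apply to one field at a time, so for now I can only picture these as custom checks run over the data. A minimal sketch of the two examples, with hypothetical function names, field names and rows:

```python
def check_any_of(rows, field_names):
    """Return the indexes of rows in which none of the named fields is populated."""
    failures = []
    for index, row in enumerate(rows):
        populated = [name for name in field_names if row.get(name) not in (None, "")]
        if not populated:
            failures.append(index)
    return failures


def check_min_one_child(parents, children, parent_key, child_fk):
    """Return the keys of parent rows that no child row refers to."""
    referenced = {child.get(child_fk) for child in children}
    return [parent[parent_key] for parent in parents if parent[parent_key] not in referenced]


# Hypothetical rows, as they might be read from CSV files.
services = [
    {"id": "s1", "email": "info@example.org", "url": ""},
    {"id": "s2", "email": "", "url": ""},
    {"id": "s3", "email": "", "url": "https://example.org"},
]
service_at_location = [
    {"id": "sl1", "service_id": "s1"},
    {"id": "sl2", "service_id": "s1"},
    {"id": "sl3", "service_id": "s3"},
]

# Rows where neither email nor url is populated.
missing_contact = check_any_of(services, ["email", "url"])

# Services with no location record at all.
without_location = check_min_one_child(services, service_at_location, "id", "service_id")
```

What I am really after is a way to declare such rules in the package itself rather than in code.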

There’s some more discussion here.

TIA

sarapetti · February 21, 2022, 4:03pm

Hi Mike, thanks very much for these questions. Could I please ask you to repost this to Discord? You are more likely to get community inputs there.

You can also use the Matrix bridge to access the channel.

Thanks!
