The Tabular Data Package as the one source for data definition

I’d like to check our planned approach to defining a data package and using that definition as the one source of all definitions from which schemas and other resources can be generated.

We’d like to:

  1. Take an existing tabular data package (the Open Referral tabular data package that links services, providers, locations, etc.)
  2. Define proposed extensions (annotated as such)
  3. Define constraints to get the “application profile” which says how the package will be used in our situation (which fields to use, which vocabularies are used to populate them, …)
  4. Auto-generate from that one or more entity relation diagrams with colour coding to distinguish the original from extensions and constraints
  5. Auto-generate JSON schemas which define responses to web methods querying the data (e.g. getService)
  6. Auto-generate CSV schemas for tabular (partial) views of the data (e.g. a list of services)
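
To make step 5 concrete, here is a minimal sketch of how a JSON Schema could be derived from one resource of a Tabular Data Package descriptor. It only relies on standard Table Schema keys (`fields`, `type`, `constraints.required`, `constraints.enum`). The sample `package`, its `service` resource and the function name `resource_to_json_schema` are made up for illustration and are not taken from the Open Referral package, and the type mapping is deliberately partial.

```python
import json

# Partial, illustrative mapping from Table Schema types to JSON Schema fragments.
TYPE_MAP = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "boolean": {"type": "boolean"},
    "date": {"type": "string", "format": "date"},
    "datetime": {"type": "string", "format": "date-time"},
}


def resource_to_json_schema(resource):
    """Build a JSON Schema (as a dict) describing one row of a resource."""
    properties = {}
    required = []
    for field in resource["schema"]["fields"]:
        # Unknown or missing types fall back to a plain string.
        prop = dict(TYPE_MAP.get(field.get("type", "string"), {"type": "string"}))
        constraints = field.get("constraints", {})
        if "enum" in constraints:
            prop["enum"] = list(constraints["enum"])
        if field.get("description"):
            prop["description"] = field["description"]
        properties[field["name"]] = prop
        if constraints.get("required"):
            required.append(field["name"])
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": resource["name"],
        "type": "object",
        "properties": properties,
    }
    if required:
        schema["required"] = required
    return schema


# Hypothetical descriptor fragment, for illustration only.
package = {
    "name": "example-package",
    "resources": [
        {
            "name": "service",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string", "constraints": {"required": True}},
                    {"name": "name", "type": "string", "constraints": {"required": True}},
                    {
                        "name": "status",
                        "type": "string",
                        "constraints": {"enum": ["active", "inactive"]},
                    },
                ]
            },
        }
    ],
}

schema = resource_to_json_schema(package["resources"][0])
print(json.dumps(schema, indent=2))
```

A real getService response would probably nest related tables (locations, contacts and so on), which would mean following the foreign keys in the descriptor. This sketch covers a single flat table only.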

Is this logic sound? Are there existing tools that do any of this?

I only know of the jts_ERD tool, which covers a small part of it.

Thanks

Well, the silence is deafening :grinning: but we’ve pressed on anyway, written the code and put it on GitHub.

See the Schemas and Schema generation part of this readme file.

Feedback welcomed.

Hi @MikeThacker! Thanks for the posts and sorry for the delay. I’d love to give you some feedback. To help us understand, could you please give us some context and background on this project and how you are using datapackages? And could you please clarify if there are specific things with datapackages that we can help you with?
Thanks!

Hello @lwinfree and thanks for your response.

Although I’m trying to design an approach I can use for many projects where we refine a data standard, for this specific project I’m looking for a way to document extensions to the existing OpenReferral data format standard and define an application profile (saying how the standard will be used in a particular scenario).

OpenReferral already has a Tabular Data Package, an Entity Relation Diagram and an API. I think the latter two are manually crafted from the first. I want an automated way of generating them (and more) from the first.

Once that is done, I will use a copy of the Tabular Data Package (with a few more properties added) to define proposed extensions to the existing OpenReferral standard, and a further copy to define our application profile (stating which tables/fields to use, enumerations, taxonomies from which to populate values, …).
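
As a rough illustration of the colour-coded diagram idea, the sketch below turns a descriptor into Graphviz DOT text, drawing one box per table and one arrow per Table Schema foreign key. The `x-extension` property used to flag proposed extensions is an assumption invented for this example (it is not part of the Data Package specification or of our repository), as are the sample tables and the function name `package_to_dot`.

```python
def package_to_dot(package, extension_key="x-extension"):
    """Return Graphviz DOT text for the tables and foreign keys in a package.

    Tables carrying a truthy `extension_key` property (a hypothetical
    annotation) are coloured differently from tables in the original standard.
    """
    lines = ["digraph erd {", "  node [shape=box, style=filled];"]
    for resource in package["resources"]:
        name = resource["name"]
        colour = "lightyellow" if resource.get(extension_key) else "lightblue"
        lines.append(f'  "{name}" [fillcolor={colour}];')
    for resource in package["resources"]:
        name = resource["name"]
        for fk in resource.get("schema", {}).get("foreignKeys", []):
            # In Table Schema an empty resource name means a self-reference.
            target = fk["reference"]["resource"] or name
            lines.append(f'  "{name}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical descriptor fragment, for illustration only.
package = {
    "resources": [
        {"name": "organization", "schema": {"fields": [{"name": "id", "type": "string"}]}},
        {
            "name": "service",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "organization_id", "type": "string"},
                ],
                "foreignKeys": [
                    {
                        "fields": "organization_id",
                        "reference": {"resource": "organization", "fields": "id"},
                    }
                ],
            },
        },
        {
            "name": "review",
            "x-extension": True,  # assumed marker for a proposed extension table
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "service_id", "type": "string"},
                ],
                "foreignKeys": [
                    {
                        "fields": "service_id",
                        "reference": {"resource": "service", "fields": "id"},
                    }
                ],
            },
        },
    ]
}

dot = package_to_dot(package)
print(dot)
```

The output can be saved to a file and rendered with Graphviz, for example `dot -Tsvg erd.dot -o erd.svg`. The same marker could in principle be applied per field rather than per table, to colour individual extension fields.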

My colleague and I have made good progress on this as shown in our GitHub Human-Services repository.

“And could you please clarify if there are specific things with data packages that we can help you with?”

Well, I’d really just like to know if this is a sensible use of data packages and if anyone can see a flaw in the logic. Essentially I want one single machine-readable source defining a data standard, from which I can generate ERDs, schemas and human-readable documentation.

Thanks very much

Hi @MikeThacker! Thanks for providing more detail. I’ve shared this with the broad FD team.
For now, I’m wondering if you have been working with the OpenReferral team on your project? One of our current Tool Fund grantees is focused on building datapackage support for their Human Services API: https://frictionlessdata.io/articles/open-referral/. Let me know if you’d like to be connected to them - I think there are some synergies between what you are working on.
Thanks,