Data Package Libraries Overview

rufuspollock · August 19, 2015, 9:56am

This topic gives an overview of the key libraries for managing Data Packages and sub-specifications such as Tabular Data Package and JSON Table Schema.

(Figure source on GDocs)

Crudely:

For each specification the core operations are:
- Core: "modelling" the underlying JSON spec in a given language's classes, e.g. a "DataPackage" class in Python
- Create / Init: creating new Data Packages (datapackage.json) and writing them to disk
- Validate: checking that datapackage.json is valid (this is not about checking the data)
- Read: loading a Data Package (datapackage.json) from disk and normalizing it (this can be almost trivial, but e.g. inlining READMEs is not, and it can also encapsulate loading from remote URLs)
- Stream: streaming the data of resources (this is something that could live outside the core library)
Generally we would expect one library to provide all of the core operations, or at least most of them. They may be partially split out into separate libraries (as in the node libraries) and then combined in one top-level library (dpm).
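
To make the Create / Validate / Read split concrete, here is a rough sketch in plain Python using only the standard library. It is not the API of any existing Data Package library: the function names (`init_package`, `validate_descriptor`, `read_package`), the `readme` key used for inlining, and the very small set of validation checks are all assumptions made for illustration, and a real validator would check against the full spec.

```python
import json
from pathlib import Path


def init_package(directory, name, resource_paths):
    """Create / Init: build a minimal descriptor and write datapackage.json to disk."""
    descriptor = {
        "name": name,
        "resources": [{"path": p} for p in resource_paths],
    }
    target = Path(directory) / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return descriptor


def validate_descriptor(descriptor):
    """Validate: check the descriptor only, never the data.

    Returns a list of error strings (empty means valid). The checks are a
    deliberately small, assumed subset and not the full specification.
    """
    errors = []
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        errors.append("'resources' must be a non-empty list")
        return errors
    for i, res in enumerate(resources):
        if not any(key in res for key in ("path", "url", "data")):
            errors.append(f"resource {i} has no 'path', 'url' or 'data'")
    return errors


def read_package(directory):
    """Read: load datapackage.json and normalize it by inlining README.md.

    Storing the text under a 'readme' key is an assumption of this sketch.
    """
    base = Path(directory)
    descriptor = json.loads((base / "datapackage.json").read_text(encoding="utf-8"))
    readme = base / "README.md"
    if "readme" not in descriptor and readme.exists():
        descriptor["readme"] = readme.read_text(encoding="utf-8")
    return descriptor
```

Keeping validation as a function over the already-loaded descriptor means it works the same whether the descriptor came from disk, from a remote URL, or was built in memory.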
In addition there are a variety of other libraries, which are in fact often the most useful:
- infer: inferring a resource schema (JSON Table Schema for Tabular Data Package) from the data
- data validation: checking data against a schema
- loading data into a given target backend, e.g. SQL
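
As a rough illustration of how these three pieces fit together, the sketch below infers a tiny schema from CSV text, checks rows against it, and loads them into SQLite, all with the Python standard library. The function names, the three-type vocabulary (integer, number, string) and the SQL type mapping are simplifying assumptions for this example, not the behaviour of any existing library.

```python
import csv
import io
import sqlite3

# Assumed mapping from the simplified schema types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "number": "REAL", "string": "TEXT"}


def _cast(value, type_name):
    """Convert a CSV string to the given type; raises ValueError on failure."""
    if type_name == "integer":
        return int(value)
    if type_name == "number":
        return float(value)
    return value


def _guess_type(values):
    """Return the narrowest type that every value in the column casts to."""
    for type_name in ("integer", "number"):
        try:
            for value in values:
                _cast(value, type_name)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """infer: derive a minimal schema ({"fields": [...]}) from the data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "fields": [
            {"name": name, "type": _guess_type([row[name] for row in rows])}
            for name in reader.fieldnames
        ]
    }


def validate_rows(csv_text, schema):
    """data validation: return (row number, field, value) for each bad cell."""
    errors = []
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for field in schema["fields"]:
            try:
                _cast(row[field["name"]], field["type"])
            except ValueError:
                errors.append((number, field["name"], row[field["name"]]))
    return errors


def load_into_sqlite(conn, table, csv_text, schema):
    """loading: create a table from the schema and insert the cast rows.

    Table and field names are interpolated into SQL here, so this sketch
    assumes they come from a trusted source.
    """
    fields = schema["fields"]
    columns = ", ".join(f'"{f["name"]}" {SQL_TYPES[f["type"]]}' for f in fields)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = [
        tuple(_cast(row[f["name"]], f["type"]) for f in fields)
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)
```

The schema is the only thing shared between the three functions, which is why these features can live in separate libraries from the core Data Package handling.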

Status

The next step would be to produce a small spreadsheet with a matrix of these features against the existing language implementations (a more detailed version of the current roadmap at http://data.okfn.org/roadmap/).

