Data package readme.md implementation


#1

It is not clear how to implement the README.md element of Data Packages.

  • A README.md can be included in datapackage.zip but it is not referenced in datapackage.json.
  • There is a data package description property that appears to be defined similarly to the README.md.

So are the README.md and the data package description two ways of achieving the same thing?

If not, what different content is placed in each?

Presenting readme content on a platform

Platforms that support data packages may offer to download:

  • a datapackage.json file
  • a datapackage.zip
  • or both

If only a datapackage.json file is offered, then the README.md content can be lost.

One solution is to embed the README.md content in the datapackage.json. This is done on datahub.io, but strangely the property is called readme, not description.

What should be done?

I think clear guidance is needed on the difference between README.md and the data package description property.

I think implementations should ensure that data package consumers get all the information available. Hence, if a datapackage.json is the only download offered, then the README.md must be converted to a data package property.

What is not clear is:

  • what property should the readme content be written to - description or the non-standard readme?
  • what if a description property already has content?

#2

@Stephen on datahub.io we embed the README.md file when processing data packages and store it under the readme property for various reasons, e.g. we don’t want to overwrite the publisher-provided description property.
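
For illustration, here is a minimal sketch of that kind of inlining. It is not the actual datahub.io code: the function name inline_readme is hypothetical, and it assumes the package directory holds datapackage.json and, optionally, README.md side by side.

```python
import json
from pathlib import Path


def inline_readme(package_dir):
    """Return the descriptor with README.md inlined as `readme` (hypothetical helper)."""
    package_dir = Path(package_dir)
    descriptor = json.loads(
        (package_dir / "datapackage.json").read_text(encoding="utf-8")
    )
    readme_path = package_dir / "README.md"
    if readme_path.exists():
        # Stored under the non-standard `readme` key so that a
        # publisher-provided `description` is left untouched.
        descriptor["readme"] = readme_path.read_text(encoding="utf-8")
    return descriptor
```

Writing to a separate key sidesteps the question of what to do when description already has content, at the cost of using a property that is not in the spec.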

The Data Package spec has some information about including a README.md (http://frictionlessdata.io/specs/data-package/):

Additional files such as a README, scripts (for processing or analyzing the data) and other material may be provided. By convention scripts go in a scripts directory and thus, a more elaborate data package could look like this:

datapackage.json  # (required) metadata and schemas for this data package
README.md         # (optional) README in markdown format

# data files may go either in data subdirectory or in main directory
mydata.csv
data/otherdata.csv

# the directory for code scripts - again these can go in the base directory
scripts/my-preparation-script.py

Here is an example: https://github.com/datasets/population

Why have a README.md file rather than an inlined description or readme property? I think it is more convenient, especially when you have long text and want to keep your metadata minimal. In my understanding, the description property is for a short description, whereas README.md is for longer content.


#3

@anuveyatsu has done a great job explaining a nice approach.

To summarize:

  • README.md provides a convenient way to edit this info (outside of JSON). When transporting, we often inline README.md into the datapackage.json as the readme property.
  • description: a description (usually a paragraph or so). In my own practice I often don’t fill in the description and just generate it from the first paragraph of the README. But if you don’t have a README you can use this instead.
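
Generating the description from the first paragraph of the README could be sketched roughly as below. This is only an illustration under some assumptions: the README has already been inlined under readme, paragraphs are separated by blank lines, and Markdown heading lines are skipped. The function name description_from_readme is hypothetical.

```python
import re


def description_from_readme(descriptor):
    """Return a copy of the descriptor with an empty `description` filled from `readme`."""
    result = dict(descriptor)
    if result.get("description"):
        return result  # keep the publisher-provided description
    readme = result.get("readme", "")
    # Paragraphs are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", readme.strip()):
        # Drop Markdown heading lines, then join the rest into one line.
        lines = [
            line.strip()
            for line in block.splitlines()
            if not line.lstrip().startswith("#")
        ]
        text = " ".join(line for line in lines if line)
        if text:
            result["description"] = text
            break
    return result
```

An existing description always wins here, so the README only acts as a fallback.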