Recently Open Knowledge International have come up with a rather elegant solution for data dissemination called Data Packages. It is basically a simple way to keep data in good old CSV, with the added value of a simple JSON schema file that describes each data field (is it a numeric field for continuous data? for count data? etc.). There is already a lot of documentation on the Data Packages website, including tutorials by @danfowler explaining how to take advantage of Data Packages in many popular scripting and programming languages, with the result of avoiding typical problems like strings or other data types imported as factors.
From my perspective and first impressions, Data Packages make it much easier to implement a reproducible procedure for my own personal use. At the same time data sharing becomes rather immediate, because there is more explicit metadata that can be automatically associated when importing into R (or Jupyter, MATLAB or any other compatible environment). There are also validation tools written specifically for Data Packages, again based on the simple association of one CSV file and one JSON file (more complex setups are possible, of course) using the JSON Table Schema.
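To make the "one CSV file plus one JSON file" idea concrete, here is a rough sketch in plain Python of what a schema-driven import could look like. It is a hand-rolled reader using only the standard library, not the official Data Packages tooling. The field names, the sample values and the `load_typed_rows` helper are all invented for illustration, and only three of the JSON Table Schema types (`string`, `integer`, `number`) are handled.

```python
import csv
import io
import json

# Hypothetical schema in the spirit of JSON Table Schema:
# one descriptor per CSV column, giving its name and type.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "sample_id", "type": "string"},
    {"name": "sherd_count", "type": "integer"},
    {"name": "sio2_pct", "type": "number"}
  ]
}
"""

# Hypothetical data, invented for illustration only.
CSV_TEXT = """sample_id,sherd_count,sio2_pct
A001,12,58.3
A002,7,61.0
"""

# Map schema type names to Python casting functions (a small subset).
CASTERS = {"string": str, "integer": int, "number": float}


def load_typed_rows(csv_text, schema):
    """Read CSV text and cast every value according to the schema."""
    casters = {f["name"]: CASTERS[f["type"]] for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Basic validation: the header must list exactly the schema fields, in order.
    if reader.fieldnames != list(casters):
        raise ValueError("CSV header does not match schema fields")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append({name: cast(row[name]) for name, cast in casters.items()})
        except ValueError as err:
            # A value that cannot be cast is reported with its line number.
            raise ValueError(f"line {line_no}: {err}") from err
    return rows


schema = json.loads(SCHEMA_JSON)
rows = load_typed_rows(CSV_TEXT, schema)
print(rows[0])
```

The point is that the types come from the JSON file rather than from guesswork at import time, so counts arrive as integers and measurements as floats, and a malformed value fails loudly instead of silently turning a column into text.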
In short, I think many archaeologists may be interested in testing Data Packages, especially those already using R or Python, and perhaps in providing some feedback to the developers about their specific use cases and any issues they encounter.
For example, and again based on my own experience, I think most if not all “supplementary data” for archaeometry studies based on chemical methods could easily be published as Data Packages. That would make aggregation and comparison of newly published data almost zero-effort, and would help reproducibility - I know some who care a lot about that!
Find inventories (e.g. from excavations) are another of my pet peeves: relatively simple information is stored in a wide variety of digital formats. One would naively think that standardization by consensus for such basic stuff was something previous generations had already solved, but we all know that is not the case. Again, Data Packages are not conceived as a panacea, but my personal view is that if they gain momentum there are some practical advantages for the general diffusion of open data.
What kind of feedback are we talking about? Concrete practical feedback provided here on the forum is the easiest way to get started. We are also planning to create a dedicated GitHub repository, e.g. frictionlessdata/pilot-archaeology, where we could keep example datasets, and to hold a series of hangouts for discussing face-to-face and collecting potential user stories. We expect that priority will be given to specific needs (which may well turn out to have wider use) rather than to ambitious, discipline-wide goals.
We would like to have a first hangout on 21st September.