@steko, I like your take on the two rep-x-ability terms. There’s been a lot of discussion about them recently in several disciplines.
There was a major article published recently in Science Translational Medicine: “What does research reproducibility mean?” The authors present reproducibility as “the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results”.
This is distinct from replicability, “which refers to the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected.”
They further define some new terms: methods reproducibility, results reproducibility, and inferential reproducibility (I didn’t find these of great relevance).
But it’s worth noting that the definitions in this paper, which are also consistent with a linguistic analysis at the renowned Language Log blog, are the exact opposite of those used by the ACM, which takes its definitions from the International Vocabulary of Metrology. Here are the ACM definitions:
Reproducibility (Different team, different experimental setup)
The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.
Replicability (Different team, same experimental setup)
The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.
These differences in definitions have also been noted in two recent Nature News articles: “Muddled meanings hamper efforts to fix reproducibility crisis” and “1,500 scientists lift the lid on reproducibility”. Both report on the lack of a common definition of reproducibility, despite widespread recognition that irreproducible research is a problem. So they are helpful for establishing that there is a range of definitions, and for situating this book more precisely in the existing literature on the topic.
My JAMT paper includes these definitions for archaeologists, which are aligned with how I see the terms used in other social sciences (and not the ACM):
Reproducibility: A study is reproducible if there is a specific set of computational functions/analyses (usually specified in terms of code) that exactly reproduce all of the numbers and data visualizations in a published paper from raw data. Reproducibility does not require independent data collection and instead uses the methods and data collected by the original investigator. https://osf.io/s9tya/
Replicability: A study is replicated when another researcher independently implements the same methods of data collection and analysis with a new data set. http://languagelog.ldc.upenn.edu/nll/?p=21956
The crux of the difference is in ‘independent methods’ and ‘new data set’. So I’d say that two people separately measuring the same lithic assemblage in basically the same way (caliper measurements, etc.) are doing ‘empirical reproducibility’. Of course there will be some difference in their results, but probably not much. A replication study would be if a third researcher went back to the site, collected a new sample of stone artefacts (‘new data set’), and measured them with a 3D scanner (‘independent methods’). This third researcher would have quite different results because of their new data and different methods, but they’d be well-placed to test the substantive anthropological claims that the previous researchers made about the assemblage.
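To make the distinction concrete, here is a rough sketch in Python. Every number in it is invented for illustration, and the "analysis" is just a mean flake length, so treat it as a toy under those assumptions rather than a real workflow. It also simplifies the replication case: only the data change, not the measuring method.

```python
from statistics import mean

# All measurements are hypothetical, invented for illustration (lengths in mm).
first_measurer = [42.1, 37.5, 51.0, 44.3, 39.8]    # original caliper data
second_measurer = [42.3, 37.4, 50.8, 44.5, 39.9]   # same artefacts, measured again
new_sample = [40.2, 45.9, 38.7, 47.1, 43.0, 41.9]  # different artefacts, same site


def mean_length(lengths_mm):
    """The 'analysis': mean length, rounded to 0.1 mm."""
    return round(mean(lengths_mm), 1)


def compare(label, lengths_mm, published):
    """Describe how a new run of the analysis differs from the published value."""
    value = mean_length(lengths_mm)
    return f"{label}: {value} mm ({value - published:+.1f} mm vs. published)"


# The value that would appear in the (hypothetical) paper.
published = mean_length(first_measurer)

report = [
    # Same data, same code: the number should match exactly.
    compare("Rerun on the original data", first_measurer, published),
    # Same artefacts measured again: small differences are expected.
    compare("Second person, same assemblage", second_measurer, published),
    # New sample: the number may differ, the substantive claim is what gets tested.
    compare("New sample from the site", new_sample, published),
]

for line in report:
    print(line)
```

Only the first comparison is expected to match to the last digit. The other two are judged on whether they support the same conclusion, not on whether the numbers are identical.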
But that’s really getting into the weeds, and as I noted earlier in the thread, archaeologists generally value these kinds of studies and we see people doing their PhD on them and publishing papers. Accessing museum collections for new analysis or re-analysis is routine.
What we’re missing is all the magic between data collection and publication. We have no culture of rep-x-ability for archaeological data analysis; normally it’s a private, even secret, process. Archaeologists rarely share any of the analysis pipeline, and I think this is bad for the discipline, and bad for science in general. New methods often appear in the literature, but any serious user has to reverse engineer the journal article in order to use the new methods for themselves. This is a huge obstacle to innovation and sharing of new methods.
In my perfect world, every archaeology paper would be accompanied by a code & data repository that allows the reader to reproduce the results presented in the paper. This means that when I see a cool plot or useful computation in a paper, I can easily get the code, study the author’s methods in detail, adapt them for my own data, cite their paper, build on it, teach it to my students, and so on. Maybe some day…
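As a rough idea of what a reader could do with such a repository, here is a minimal Python sketch. The file name, column name, and measurements are all hypothetical, and a temporary directory stands in for the repository. It is only one way the idea could look, not a description of any existing template. The author records a checksum of the raw data and provides one function that goes from raw data to reported numbers. The reader confirms the data are unchanged and reruns that function.

```python
import csv
import hashlib
import tempfile
from pathlib import Path
from statistics import mean


def sha256_of(path):
    """Checksum of a file, so a reader can confirm they have the same raw data."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_analysis(raw_csv):
    """Single entry point: raw data in, reported numbers out."""
    with open(raw_csv, newline="") as f:
        lengths = [float(row["length_mm"]) for row in csv.DictReader(f)]
    return {"n": len(lengths), "mean_length_mm": round(mean(lengths), 1)}


# A temporary directory stands in for the code & data repository (assumption).
with tempfile.TemporaryDirectory() as repo:
    raw = Path(repo) / "flakes.csv"  # hypothetical raw data file
    raw.write_text("id,length_mm\n1,42.1\n2,37.5\n3,51.0\n")  # invented values

    # Author's side: record the data checksum and the reported results.
    recorded_hash = sha256_of(raw)
    reported = run_analysis(raw)

    # Reader's side: check the data are unchanged, then rerun the same code.
    data_unchanged = sha256_of(raw) == recorded_hash
    rerun = run_analysis(raw)

results_match = rerun == reported
print("data unchanged:", data_unchanged)
print("results match:", results_match)
```

The point of the single `run_analysis` entry point is that nothing between the raw data and the reported numbers happens by hand, so the reader can adapt the same function to their own data.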
My current template for achieving this in my own work is here: https://github.com/benmarwick/researchcompendium which has a bunch of robust open source software engineering tools working for me to save time and catch errors.