Black-box studies are a crucial part of assessing the accuracy and reliability of subjective decisions in forensics. The extant black-box studies have generally had two components. In the first component, examiners are asked to assess forensic samples (typically questioned and known samples, with ground truth known to the study designers) just as they would in real casework and to report their decisions on a pre-defined scale. This component addresses the reproducibility (between-examiner consistency) and validity (accuracy) of examiner decisions. In the second component, examiners are asked to re-assess a small subset of the samples that they evaluated in the first component. This allows researchers to assess repeatability, or within-examiner consistency.
Motivated by these studies, this project explores statistical methods for the analysis of black-box studies. In particular, we use these studies as examples to demonstrate a statistical approach to inference about reliability that combines the data from the reproducibility and repeatability components while also allowing for inference regarding examiner-sample interactions.
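To illustrate why combining the two sets of assessments matters, consider a minimal sketch based on a two-way random-effects model. This is an illustrative assumption, not necessarily the model used in this project: it treats each decision as a numeric score, and all symbols are hypothetical. Here $y_{ijk}$ denotes the $k$-th assessment of sample $j$ by examiner $i$, $a_i$ is an examiner effect, $b_j$ is a sample effect, and $(ab)_{ij}$ is the examiner-sample interaction.

```latex
\begin{align*}
y_{ijk} &= \mu + a_i + b_j + (ab)_{ij} + \varepsilon_{ijk}, \\
a_i &\sim N(0, \sigma_a^2), \quad
b_j \sim N(0, \sigma_b^2), \quad
(ab)_{ij} \sim N(0, \sigma_{ab}^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2), \\[4pt]
% Same examiner, same sample, two occasions (repeatability):
\operatorname{Var}\left(y_{ijk} - y_{ijk'}\right) &= 2\sigma_\varepsilon^2, \qquad k \neq k', \\
% Two different examiners, same sample (reproducibility):
\operatorname{Var}\left(y_{ijk} - y_{i'jl}\right) &= 2\left(\sigma_a^2 + \sigma_{ab}^2 + \sigma_\varepsilon^2\right), \qquad i \neq i'.
\end{align*}
```

Under these assumptions, a design with a single assessment per examiner-sample pair cannot separate $\sigma_{ab}^2$ from $\sigma_\varepsilon^2$, because the two terms always appear together. The repeated assessments identify $\sigma_\varepsilon^2$ on their own, which is what makes inference about the interaction possible once both sets of data are analyzed jointly. Because examiner decisions are typically reported on an ordinal or categorical scale, a model for real data would likely need a categorical analogue of this decomposition.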