
Extracting Case-Specific Data from Validation Studies

Primary Author: Steve Lund
Secondary Author: Hari Iyer
Type: Webinar

This CSAFE webinar was held on May 10, 2022.

Presenters:
Steve Lund
Statistical Engineering Division, NIST

Hari Iyer
Statistical Engineering Division, NIST

Presentation Description:
The legal and scientific communities agree that any forensic method or algorithm should undergo validation testing before it is relied upon in the judicial system to make life-altering decisions. Validation results are often summarized using average error rates. Given that method performance can vary due to factors like the quantity, quality, and complexity of the questioned impression or sample, case-specific performance assessments are far more relevant than overall average performance assessments.

A given body of validation testing may provide a lot of information about some use cases and little-to-no information about others. Unfortunately, there are often far too many potential use cases to conduct many validation tests for every scenario that occurs in casework. That is, we will generally have little to no validation data for the exact scenario of a given case. This unsettling truth may cause some to wonder whether one can reasonably say anything about the expected performance of an algorithm applied in a given case.

Rather than addressing these concerns with sound science, common phrases like “validated” or “established error rate” promote an imaginary finish line and sidestep critical considerations of case-specific reliability. What is needed is a paradigm shift; instead of asking whether a method or algorithm has been “validated,” we should be asking, “what does the available body of validation testing suggest about the performance of the method in the case at hand?” Because performance varies with the type of case considered, this question should be asked for each and every case in which the method is applied. We need an organized, easy-to-follow overview of the information provided by available validation data regarding algorithm performance in the current case.

By modeling method performance as a function of factors that describe a case’s type (e.g., the estimated amount of a given contributor’s DNA in a mixture or the number of minutiae detected in a latent fingerprint) and that are suspected of affecting difficulty, we order validation tests in terms of difficulty. For a given case, we then locate its place in the ordering and assess performance among the contiguous subsets of validation runs that are less difficult and more difficult than the current case, respectively. These performance characterizations are used to estimate an interval intended to span the method’s performance in situations like the case at hand. Using this approach provides the following critical and easy-to-understand information:

  1. How many validation tests have been conducted in scenarios more challenging than the case at hand?
  2. How well did the method perform among these validation tests?
  3. How many validation tests have been conducted in scenarios less challenging than the case at hand?
  4. How well did the method perform among these validation tests?
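
The bookkeeping behind these four questions can be made concrete with a minimal sketch. The Python example below is illustrative only; the single numeric difficulty score, the toy data, and the function names are assumptions rather than anything taken from the presentation or the PROVEDIt analysis. It orders validation runs by difficulty, splits them around the case at hand, and reports the counts and observed error rates on each side.

```python
from dataclasses import dataclass

@dataclass
class ValidationRun:
    difficulty: float   # assumed single difficulty score (e.g., built from DNA quantity or minutiae count)
    correct: bool       # whether the method reached the correct conclusion in this validation run

def split_by_case_difficulty(runs, case_difficulty):
    """Partition validation runs into those no harder and those harder than the case at hand."""
    ordered = sorted(runs, key=lambda r: r.difficulty)
    easier = [r for r in ordered if r.difficulty <= case_difficulty]
    harder = [r for r in ordered if r.difficulty > case_difficulty]
    return easier, harder

def error_rate(runs):
    """Observed error rate among a subset of validation runs (None if the subset is empty)."""
    if not runs:
        return None
    return sum(not r.correct for r in runs) / len(runs)

# Made-up validation data for illustration only.
runs = [ValidationRun(d, c) for d, c in
        [(0.2, True), (0.4, True), (0.5, False), (0.7, True), (0.9, False)]]
easier, harder = split_by_case_difficulty(runs, case_difficulty=0.6)
print(len(harder), error_rate(harder))   # how many harder scenarios, and how the method did
print(len(easier), error_rate(easier))   # the same for the easier scenarios
```

In practice, the ordering would be driven by the case-type factors named above, and the observed performance on each side would feed the interval estimate described in the preceding paragraph.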

Continually placing the information gathered during validation front and center in reports and testimony focuses attention on the body of empirical results rather than on opinions and speculation. Additionally, this framework can help forensic service providers determine and justify whether a given method should be applied to a given case, as well as decide whether and where to further invest resources in validation, either in support of a particular case or generally. Unlike a binary notion of “validated” methods, this approach reflects the natural benefit of continued testing. When many validation samples are available, one can increasingly restrict the set of uniformly more challenging use cases to those only slightly more challenging than the current case, thereby providing a better “worst-case” performance estimate. This is the appropriate scientific lens through which to view validation testing: more testing produces less biased results and lower uncertainties. We illustrate the approach described above for the discipline of DNA mixture interpretation using the publicly available PROVEDIt database.
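
To illustrate the point about continued testing, the short sketch below (again hypothetical, with made-up data and an assumed difficulty score) restricts the “harder” comparison set to runs only slightly more difficult than the case at hand, which becomes possible as the body of validation runs grows.

```python
# (difficulty, correct) pairs for illustration only; not real validation results.
runs = [(0.2, True), (0.4, True), (0.5, False), (0.55, True), (0.62, True),
        (0.65, False), (0.7, True), (0.9, False)]
case_difficulty, margin = 0.6, 0.1

# Keep only runs slightly harder than the case: difficulty in (case, case + margin].
nearby_harder = [ok for d, ok in runs if case_difficulty < d <= case_difficulty + margin]
errors = sum(not ok for ok in nearby_harder)
print(f"{len(nearby_harder)} runs slightly harder than the case; {errors} errors observed")
```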

Webinar Recording:

Related Resources

Source Camera Identification with Multi-Camera Smartphones

An overview of source camera identification on multi-camera smartphones, an introduction to the new CSAFE multi-camera smartphone image database, and a summary of recent results on the iPhone 14 Pro.
What’s in a Name? Consistency in Latent Print Examiners’ Naming Conventions and Perceptions of Minutiae Frequency

Fingerprint minutia types influence LPEs’ decision-making processes during analysis and evaluation, with features perceived to be rarer generally given more weight. However, no large-scale studies comparing examiner perceptions of minutiae…
An alternative statistical framework for measuring proficiency

Item Response Theory, a class of statistical methods used prominently in educational testing, can be used to measure LPE proficiency in annual tests or research studies, while simultaneously accounting for…
Examiner variability in pattern evidence: proficiency, inconclusive tendency, and reporting styles

The current approach to characterizing uncertainty in pattern evidence disciplines has focused on error rate studies, which provide aggregated error rates over many examiners and pieces of evidence. However, decisions…