Extracting Case-Specific Data from Validation Studies

Primary Author: Steve Lund
Secondary Authors: Hari Iyer
Type: Webinar

This CSAFE webinar was held on May 10, 2022.

Presenters:
Steve Lund
Statistical Engineering Division, NIST

Hari Iyer
Statistical Engineering Division, NIST

Presentation Description:
The legal and scientific communities agree that any forensic method or algorithm should undergo validation testing before it is relied upon in the judicial system to make life-altering decisions. Validation results are often summarized using average error rates. Given that method performance can vary due to factors like the quantity, quality, and complexity of the questioned impression or sample, case-specific performance assessments are far more relevant than overall average performance assessments.

A given body of validation testing may provide a lot of information about some use cases and little-to-no information about others. Unfortunately, there are often far too many potential use cases to conduct many validation tests for every scenario that occurs in casework. That is, we will generally have little to no validation data for the exact scenario of a given case. This unsettling truth may cause some to wonder whether one can reasonably say anything about the expected performance of an algorithm applied in a given case.

Rather than addressing these concerns with sound science, common phrases like “validated” or “established error rate” promote an imaginary finish line and sidestep critical considerations of case-specific reliability. What is needed is a paradigm shift; instead of asking whether a method or algorithm has been “validated,” we should be asking, “what does the available body of validation testing suggest about the performance of the method in the case at hand?” Because performance varies with the type of case considered, this question should be asked for each and every case in which the method is applied. We need an organized, easy-to-follow overview of the information provided by available validation data regarding algorithm performance in the current case.

By modeling method performance using factors that describe a case’s type (e.g., estimated amount of a given contributor’s DNA in a mixture or the number of minutiae detected in a latent fingerprint) and are suspected of affecting difficulty, we order validation tests in terms of difficulty. For a given case, we then find its place in the ordering and assess the performance among contiguous subsets of validation runs less difficult and more difficult than the current case, respectively. These performance characterizations are used to estimate an interval intended to span the method’s performance in situations like the case at hand. Using this approach provides the following critical and easy-to-understand information:

  1. How many validation tests have been conducted in scenarios more challenging than the case at hand?
  2. How well did the method perform among these validation tests?
  3. How many validation tests have been conducted in scenarios less challenging than the case at hand?
  4. How well did the method perform among these validation tests?
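
The four questions above can be sketched in a few lines of code. This is a minimal illustration, not the presenters' implementation: it assumes each validation run has already been reduced to a single hypothetical difficulty score (larger meaning harder) and a correct/incorrect outcome, and the names ValidationRun, case_specific_summary, and the subset size k are introduced here purely for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationRun:
    difficulty: float  # hypothetical score; larger is assumed to mean harder
    correct: bool      # whether the method gave the right answer on this run


def case_specific_summary(runs, case_difficulty, k):
    """Summarize the k nearest easier and k nearest harder validation runs."""
    ordered = sorted(runs, key=lambda r: r.difficulty)
    scores = [r.difficulty for r in ordered]

    # ordered[:lo] are strictly easier than the case, ordered[hi:] strictly harder.
    lo = bisect_left(scores, case_difficulty)
    hi = bisect_right(scores, case_difficulty)

    # Contiguous subsets adjacent to the case's place in the ordering.
    easier = ordered[max(0, lo - k):lo]
    harder = ordered[hi:hi + k]

    def rate(subset):
        # Fraction of correct outcomes; None when there are no runs to assess.
        return sum(r.correct for r in subset) / len(subset) if subset else None

    return {
        "n_harder_total": len(ordered) - hi,  # question 1
        "accuracy_harder": rate(harder),      # question 2
        "n_easier_total": lo,                 # question 3
        "accuracy_easier": rate(easier),      # question 4
        "n_harder_used": len(harder),
        "n_easier_used": len(easier),
    }


# Made-up outcomes for ten runs with difficulty scores 1 through 10.
outcomes = [True, True, True, False, True, True, False, True, False, False]
runs = [ValidationRun(float(d), c) for d, c in enumerate(outcomes, start=1)]

summary = case_specific_summary(runs, case_difficulty=5.5, k=4)
print(summary)
```

The two accuracy values bracket the performance one might expect for the case at hand: the easier subset suggests an optimistic bound and the harder subset a pessimistic one. When no validation runs exist on one side of the case, the corresponding accuracy is reported as None rather than guessed, which makes the absence of relevant validation data explicit.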

Continually placing the information gathered in validation front and center in reports and testimony focuses attention on the body of empirical results rather than opinions and speculation. Additionally, this framework can help forensic service providers determine and justify whether a given method should be applied to a given case as well as decide whether and where to further invest resources for validation, either in support of a particular case or generally. Unlike a binary notion of “validated” methods, this approach reflects the natural benefit of continued testing. When many validation samples are available, one can increasingly restrict the set of uniformly more challenging use cases to those only slightly more challenging than the current case, thereby providing a better “worst-case” performance estimate. This is the appropriate scientific lens through which to view validation testing – more testing produces less biased results and lower uncertainties. We illustrate the approach described above for the discipline of DNA mixture interpretation using the publicly available ProvedIT database.

Related Resources

Is it a True Match? Top k correlations in a database search

This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.

Graph-Theoretic Techniques for Forensic Image Comparisons

This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.

Presumption of Innocence, Probable Cause, and Prior Probability—Bayes Meets Due Process

This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.

Demonstrative Evidence and the Use of Algorithms in Jury Trials

We investigate how the use of bullet comparison algorithms and demonstrative evidence may affect juror perceptions of reliability, credibility, and understanding of expert witnesses and presented evidence. The use of…