Webinar: Blind Testing in Firearms Examination

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Spring 2022 Webinar Series on Thursday, June 23, 2022, from 11:00am-Noon CT. The presentation will be “Blind Testing in Firearms Examination: Preliminary Results, Benefits, Limitations, and Future Directions.” 

Maddisen Neuman
Quality / Research Associate – Houston Forensic Science Center

Presentation Description:
Open proficiency tests created by external vendors are used to meet accreditation requirements and demonstrate examiner competence; however, the tests are limited in their ability to monitor the entire quality management system from evidence submission to reporting of results. As such, many scholars have considered blind proficiency testing to be a more precise test of an individual’s accuracy (1).

In December 2015, the Houston Forensic Science Center (HFSC) implemented a blind quality control (blind QC) program in firearms examination. The intent of the blind QC program is to supplement open proficiency tests by creating mock cases that are submitted to the Firearms section as routine casework, with the expectation that the examiners are unaware that they are receiving test cases. Blind QC cases are created by Firearms section management and submitted by members of the Quality Division. Firearms section management evaluates the mock evidence prior to submission to determine the ground truth and expected results. Once the case is complete, the Firearms section management reviews the results to determine satisfactory completion.

This presentation will outline the results of 51 blind QC cases that were reported between December 2015 and June 2021, resulting in 570 sufficiency and comparison conclusions. No eliminations were reported for true matching pairs, and no identifications were reported for true non-matching pairs; thus, all results were considered satisfactory. In 40.3% (n=225) of the comparisons, the ground truth result was either elimination or identification, but an inconclusive conclusion was reached. Variables such as the examiners assigned to the case, training program, examiner experience level, intended complexity of the case, and evidence type were examined for their contributions to the inconclusive determinations. The type of evidence compared was determined to be the main factor contributing to inconclusive results; specifically, bullets were the main contributors to inconclusive results at a rate of 61.8% (n=168) compared to cartridge cases at a rate of 21.5% (n=57).

The webinar will conclude with a discussion on the rate of inconclusive conclusions, benefits and challenges of the blind QC program, and future directions for blind testing research in firearms examination. While a blind testing program can be challenging to implement, HFSC’s blind QC program demonstrates that the quality management system and procedures used by the Firearms section can obtain accurate and reliable results and provides examiners added confidence in court. Additionally, the blind QC program can be tailored to target specific research questions and provide opportunities for collaboration with other laboratories and researchers.

1. Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council. Strengthening forensic science in the United States: A path forward. Washington, DC, 2009.


The webinars are free and open to the public, and researchers, collaborators and members of the broader forensics and statistics communities are especially encouraged to attend. Each 60-minute webinar will allow for discussion and questions.



AAFS 2022 Recap: An Internal Validation Study of the TopMatch 3D Scanner for Cartridge Cases

A CSAFE lab technician loads a tray of cartridge cases into the TopMatch 3D scanner.

By Samantha Springer, a research assistant at the Center for Statistics and Applications in Forensic Evidence (CSAFE)


At the 74th annual American Academy of Forensic Sciences (AAFS) scientific conference, Kayli Carrillo, a doctoral candidate at Sam Houston State University, presented a study that showed promising results for the future use of virtual microscopy in assisting forensic examiners with analyzing ballistic evidence. The study, performed by Carrillo and her colleagues at the Harris County Institute of Forensic Sciences in Houston, Texas, utilized a TopMatch VCM system identical to the microscopes used in CSAFE’s ballistics lab. CSAFE’s lab is part of the Roy J. Carver High Resolution Microscopy Facility at Iowa State University.

The internal validation study involved three stages of examination in which multiple examiners analyzed cartridge cases from known and unknown sources. The three phases produced very few inconclusive determinations, and no false positives or false negatives were reported. These results indicate high internal validity and suggest that virtual comparison microscopy, specifically the TopMatch software, can aid in forensic analysis.

Continued research will add a fourth step to further evaluate the inconclusive determinations made by examiners in the study. This step will compare the inconclusive conclusions reached using VCM with those reached using light comparison microscopy, also known as 2D microscopy. Based on the promising findings of this study, the Harris County Institute of Forensic Sciences plans to utilize TopMatch microscopy in the analysis of their cartridge cases.

CSAFE researchers have made great strides in developing statistical and scientific foundations for assessing and matching firearms and toolmarks. Learn more at https://forensicstats.org/firearms-and-toolmark-analysis/.

Insights: Treatment of Inconclusives in the AFTE Range of Conclusions




Several studies have estimated the error rates of firearm examiners in recent years, most of which showed very small error rates overall. However, the actual calculation of these error rates, particularly how each study treats inconclusive results, differed significantly between studies. Researchers funded by CSAFE revisited these studies to see how inconclusive results were treated and how these differences impacted their overall error rate calculations.

Lead Researchers

Heike Hofmann
Susan Vanderplas
Alicia Carriquiry


Journal

Law, Probability and Risk

Publication Date

September 2020



Survey various studies that assess the error rates of firearms examinations.


Determine the differences in how inconclusives are treated in each study.


Identify areas where these studies can be improved.

The Study

Hofmann et al. surveyed the most cited black box studies involving firearm and toolmark analysis. These studies varied in structure, having closed-set or open-set data. They were also conducted in different regions, either in the United States and Canada or in the European Union. The most relevant difference, however, was how each study treated inconclusive results.

All studies used one of three methods to treat inconclusives:

  • Option 1: Exclude the inconclusive from the error rate.
  • Option 2: Include the inconclusive as a correct result.
  • Option 3: Include the inconclusive as an incorrect result.
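The three options can be sketched as a small calculation. This is only an illustration: the `Tally` class, the `error_rate` function, and the counts are hypothetical and are not taken from any of the surveyed studies.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Decision counts for a hypothetical black box study."""
    correct: int       # conclusive decisions that agree with ground truth
    incorrect: int     # conclusive decisions that contradict ground truth
    inconclusive: int  # decisions reported as inconclusive


def error_rate(t: Tally, option: int) -> float:
    """Error rate under one of the three treatments of inconclusives."""
    total = t.correct + t.incorrect + t.inconclusive
    if option == 1:  # exclude inconclusives from the calculation
        return t.incorrect / (t.correct + t.incorrect)
    if option == 2:  # count inconclusives as correct results
        return t.incorrect / total
    if option == 3:  # count inconclusives as incorrect results
        return (t.incorrect + t.inconclusive) / total
    raise ValueError("option must be 1, 2, or 3")


# Made-up counts, chosen only to show how far the options can diverge.
tally = Tally(correct=900, incorrect=10, inconclusive=90)
rates = {opt: error_rate(tally, opt) for opt in (1, 2, 3)}
for opt, rate in rates.items():
    print(f"Option {opt}: {rate:.3%}")
```

With the same underlying decisions, the reported error rate ranges from about 1% to 10% depending only on how the inconclusives are handled.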

Key Terms:

Black Box Study: a study that evaluates only the correctness of a participant’s decisions.

Closed-set Study: one in which every questioned sample comes from one of the known sources included in the study.

Open-set Study: one in which the questioned samples may come from sources outside the set of knowns.


Option 1 was deemed inappropriate for accurate error rates. Option 2 was useful for error rates of the individual examiners, while Option 3 reflected the error rates of the process itself.

Examiners tended to lean towards identification over inconclusive or elimination. In addition, they were far more likely to reach an inconclusive with different-source evidence, which should have been an elimination in nearly all cases.

Process errors occurred at higher rates than examiner errors.

Design issues created a bias toward the prosecution. These included closed-set studies in which every questioned sample had a matching known source, prescreened kit components that inflated the rate of identifications, and multiple known sources, which made it impossible to quantify a proper error rate for eliminations.

Fig. 1. Sketch of the relationship between ground truth of evidence (dots) and examiners’ decisions (shaded areas). In a perfect scenario dots only appear on the shaded area of the same colour. Any dots on differently coloured backgrounds indicate an error in the examination process.



Hofmann et al. propose a fourth option:

  • Include the inconclusive as an elimination.
  • Calculate the error rates for the examiner and the process separately.
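A minimal sketch of the first bullet, recoding each inconclusive as an elimination and scoring against ground truth, might look like the following. The function name, dictionary keys, and counts are hypothetical. How the paper separates examiner errors from process errors is not spelled out here, so the sketch only reports one rate per ground-truth group.

```python
def option4_error_rates(same_source: dict, different_source: dict) -> dict:
    """Score decisions after treating every inconclusive as an elimination."""
    # Same-source pairs: only an identification is correct, so an
    # inconclusive (recoded as an elimination) is a missed identification.
    ss_total = sum(same_source.values())
    missed = same_source["elimination"] + same_source["inconclusive"]

    # Different-source pairs: an inconclusive recoded as an elimination is
    # correct, so only identifications count as errors.
    ds_total = sum(different_source.values())
    false_ids = different_source["identification"]

    return {
        "missed_identification_rate": missed / ss_total,
        "false_identification_rate": false_ids / ds_total,
    }


# Made-up counts for illustration only.
same = {"identification": 450, "inconclusive": 40, "elimination": 10}
different = {"identification": 5, "inconclusive": 195, "elimination": 300}
result = option4_error_rates(same, different)
print(result)
```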

While most studies included a bias toward the prosecution, this was not the case for studies conducted in the European Union. Further study is recommended to verify this difference and determine its cause.

Insights: Comparison of three similarity scores for bullet LEA matching




As technology advances in the forensic sciences, it is important to evaluate the performance of recent innovations. Researchers funded by CSAFE judged the efficacy of different scoring methods for comparing land engraved areas (LEAs) found on bullets.

Lead Researchers

Susan Vanderplas
Melissa Nally
Tylor Klep
Christina Cadevall
Heike Hofmann


Journal

Forensic Science International

Publication Date

March 2020


Evaluate the performance of scoring measures at a land-to-land level, using random forest scoring, cross correlation and consecutive matching striae (CMS).

Consider the efficacy of these scoring measures on a bullet-to-bullet level.

The Study

  • Data was taken from three separate studies, each using similar firearms from the same manufacturer, Ruger, to compare land engraved areas (LEAs). LEAs are the areas on a bullet marked by a gun barrel’s lands, the sections in between the grooves on the barrel’s rifling.
  • Examiners processed the LEA data through a matching algorithm and scored it using these three methods:


Random Forest (RF):

A form of machine-learning that utilizes a series of decision trees to reach a single result.


Cross-Correlation (CC):

A measure of similarity between two series of data.
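As a rough illustration of cross-correlation, the sketch below slides one series against another and keeps the highest Pearson correlation found over a range of lags. It is not the implementation used in the study; the function names, the `max_lag` and `min_overlap` parameters, and the sample profile are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no pattern to correlate
    return cov / sqrt(vx * vy)


def max_cross_correlation(x, y, max_lag, min_overlap=3):
    """Highest correlation between x and y over shifts of up to max_lag."""
    best = -1.0  # returned unchanged if no lag leaves enough overlap
    for lag in range(-max_lag, max_lag + 1):
        # A positive lag drops the start of x; a negative lag drops the start of y.
        a, b = (x[lag:], y) if lag >= 0 else (x, y[-lag:])
        n = min(len(a), len(b))
        if n < min_overlap:
            continue
        best = max(best, pearson(a[:n], b[:n]))
    return best


# Hypothetical profile and a copy of it shifted by two positions.
profile = [0, 1, 3, 2, 5, 4, 1, 0, 2, 6, 3, 1]
shifted = profile[2:]
score = max_cross_correlation(profile, shifted, max_lag=3)
print(round(score, 6))
```

Searching over lags matters because two scans of the same land are rarely aligned at exactly the same starting position.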


Consecutive Matching Striae (CMS):

Identifying the similarities between the peaks and valleys of LEAs.
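A simplified sketch of the CMS idea is to mark which striae on one profile have a counterpart at nearly the same position on the other, then take the longest unbroken run of matches. The function name, the tolerance parameter, and the positions below are hypothetical, and the sketch does not enforce one-to-one pairing of striae as a careful implementation would.

```python
def longest_cms_run(striae_a, striae_b, tol):
    """Longest run of consecutive striae in striae_a with a match in striae_b."""
    best = run = 0
    for pos in sorted(striae_a):
        # A stria "matches" if some stria on the other profile lies within tol.
        if any(abs(pos - other) <= tol for other in striae_b):
            run += 1
            best = max(best, run)
        else:
            run = 0  # a non-matching stria breaks the consecutive run
    return best


# Hypothetical striae positions along two aligned profiles.
land_1 = [10, 25, 40, 58, 70, 90]
land_2 = [11, 24, 41, 75, 89]
cms = longest_cms_run(land_1, land_2, tol=2)
print(cms)
```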


The Equal Error rate of each scoring method across multiple studies

  • On a bullet-to-bullet level, the Random Forest and Cross-Correlation scoring methods made no errors.
  • On a land-to-land level, the RF and CC methods outperformed the CMS method.
  • When comparing equal error rates, the CMS method had an error rate of over 20%, while both the RF and CC methods’ error rates were roughly 5%. The RF method performed slightly better.
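The equal error rate is the error level at the score threshold where the false positive rate and the false negative rate coincide. The sketch below approximates it by scanning the observed scores as candidate thresholds. The function name and the sample scores are made up for illustration, and the sketch assumes that higher scores indicate greater similarity and that both groups are non-empty.

```python
def equal_error_rate(scores, same_source):
    """Approximate EER: error rate where FNR and FPR are closest to equal."""
    pos = [s for s, match in zip(scores, same_source) if match]
    neg = [s for s, match in zip(scores, same_source) if not match]
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # Same-source pairs scored below the threshold are missed matches.
        fnr = sum(s < threshold for s in pos) / len(pos)
        # Different-source pairs scored at or above it are false matches.
        fpr = sum(s >= threshold for s in neg) / len(neg)
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer


# Hypothetical similarity scores: four same-source and four different-source pairs.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [True, True, True, True, False, False, False, False]
eer = equal_error_rate(scores, labels)
print(eer)
```

A single number of this kind makes scoring methods comparable without having to fix a decision threshold for each one in advance.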



The random forest algorithm struggled to identify damage to bullets that obscured LEAs. Such damage was caused by deficiencies in the gun barrel, such as pitting from gunpowder, or by “tank rash” on expended bullets.

  • In future studies, examiners could pair the RF algorithm with another algorithm to assess the quality of the data and determine which portions can be used for comparison.

All the studies used firearms from Ruger, a manufacturer picked because their firearms mark very well on bullets. Future studies can assess the performance of these scoring methods on firearms from different manufacturers whose marks vary in quality.

Insights: A Robust Approach to Automatically Locating Grooves in 3D Bullet Land Scans




Land engraved areas (LEAs) can be important distinguishing factors when analyzing 3D scans from bullets. Creating a 3D image of an LEA requires examiners to also scan portions of the neighboring groove engraved areas (GEAs). Current modeling techniques often struggle to separate LEAs from GEAs. CSAFE researchers developed a new method to automatically remove GEA data and tested this method’s performance against previously proposed techniques.

Lead Researchers

Kiegan Rice
Ulrike Genschel
Heike Hofmann


Journal

Journal of Forensic Sciences

Publication Date

13 December 2019



Present and discuss automated methods for identifying “shoulder locations” between LEAs and GEAs.

The Study

  • Rice et al. gathered 3D scans of 104 bullets from two available data sets (Hamby 44 and Houston), resulting in a total of 622 LEA scans.
  • They removed the curvature from these 3D scans to make 2D crosscuts of each LEA.
  • Using the 2D crosscuts, the team estimated the shoulder locations between LEAs and GEAs using three different models:


Rollapply:

A function (in this case, one available through the open-source “bulletxtrctr” package) which applies a rolling average to smooth out outliers in data.
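To show what a rolling average does to a noisy crosscut, here is a minimal standard-library sketch. It is not the bulletxtrctr function; the name `rolling_mean`, the window size, and the sample heights are assumptions made for illustration.

```python
def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values.

    The result has len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


# Hypothetical crosscut heights with one outlier (the 10).
heights = [1, 2, 3, 10, 5, 6, 7]
smoothed = rolling_mean(heights, window=3)
print(smoothed)
```

The outlier is spread across neighboring windows instead of appearing as a single spike, which dampens it but does not remove its influence entirely.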

Robust Linear Model:

A quadratic linear model that minimizes absolute deviations and is therefore less influenced by outliers.

Robust Locally Weighted Regression (LOESS):

A weighted average of many parametric models to fit subsets of data.


Figure: Areas of misidentification for the Hamby set 44 and the Houston test set.

In this graphic, an Area of Misidentification less than 100 is considered a small deviation, between 100 and 1000 is medium, and greater than 1000 is a large deviation.
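The size categories described above can be written as a small helper. The function name is hypothetical, and the treatment of values exactly equal to 100 or 1000 (counted here as medium) is an assumption, since the text does not say which category the boundaries fall into.

```python
def deviation_category(area):
    """Label an area of misidentification as small, medium, or large."""
    if area < 0:
        raise ValueError("area of misidentification cannot be negative")
    if area < 100:
        return "small"
    if area <= 1000:  # boundary values are treated as medium (assumption)
        return "medium"
    return "large"


# Hypothetical areas for three shoulder-location estimates.
labels = [deviation_category(a) for a in (12.5, 430, 2750)]
print(labels)
```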

  • The Robust LOESS model significantly outperformed the Rollapply and Robust Linear models, resulting primarily in small deviations across all test sets.
  • Conversely, the Robust Linear model had the weakest performance of all three, with mostly large deviations across both Houston sets, and only outperforming the Rollapply model in the right shoulder section of the Hamby 44 set.
  • These results were expected, as the Robust LOESS model is intended to be flexible and handle areas that a quadratic linear model would fail to address.



Both the Hamby 44 and Houston datasets used firearms from the same manufacturer. Future studies can expand on these findings by using a wider variety of barrel types, including different caliber sizes, manufacturers and nontraditional rifling techniques.