NIST Releases Results from a Black Box Study for Digital Forensic Examiners


The National Institute of Standards and Technology (NIST) has published the results from a black box study for digital forensic examiners. The report, released in February 2022, describes the study’s methodology and summarizes the results.

The study was conducted online and was open to anyone in the public or private sectors working in the digital forensics field. Participants examined and reported on simulated digital evidence from casework-like scenarios. NIST said the study’s goal was to assess the performance of the digital forensic community as a whole.

Results from a Black-Box Study for Digital Forensic Examiners (NISTIR 8412) can be viewed at

From Results from a Black-Box Study for Digital Forensic Examiners, page 33:

Summary Key Takeaways

Despite the limitations of the study, two key takeaways about the state of the digital evidence discipline emerged:

  • Digital forensics examiners showed that they can answer difficult questions related to the analysis of mobile phones and personal computers. Questions ranged from basic, such as identifying who the user of the phone had contacted, to advanced questions that related to the use of the TOR browser.
  • The response to the study underscored the size, variety, and complexity of the field. The study received responses from examiners working in international, federal, state, local government, and private labs whose major work included law enforcement, defense, intelligence, and incident response/computer security. There were also responses from people outside of these areas.


Results Available from OSAC Registry Implementation Survey


The Organization of Scientific Area Committees for Forensic Science (OSAC) released the results from its first annual Registry Implementation Survey. The report, published in February 2022, provides a detailed look at the respondents and the implementation status of the 46 standards represented in the survey.

In the summer of 2021, OSAC released the survey, which targeted forensic science service providers from across the country. It was designed to help OSAC better understand how the standards on the OSAC Registry are being used, the challenges around standards implementation and what support is needed to improve it.

The OSAC Registry Implementation Survey: 2021 Report is available at

From page 10 of the OSAC Registry Implementation Survey: 2021 Report:

Priority for Implementing Standards
When asked what priority survey participants considered standards implementation for their organization, half of the respondents (50%) said it was a medium priority, or important. This was followed by 34% of respondents indicating that implementation was a high priority, or very important. Twenty-three respondents (14.8%) indicated that implementation was a low priority or not a priority at this time (Figure 4).

Figure 4. Priorities for Standards Implementation

A Study on Improving Forensic Decision Making will be the Topic of CSAFE’s February Webinar

Figure 2 from the study shows sources of cognitive bias in sampling, observations, testing strategies, analysis and/or conclusions that impact even experts. These sources of bias are organized in a taxonomy of three categories: case-specific sources (Category A), individual-specific sources (Category B) and sources that relate to human nature (Category C).

A new study that proposes a broad and versatile approach to strengthening expert decision making will be the focus of an upcoming Center for Statistics and Applications in Forensic Evidence (CSAFE) webinar.

The webinar, Improving Forensic Decision Making: A Human-Cognitive Perspective, will be held Thursday, Feb. 17 from 12–1 p.m. CST. It is free and open to the public.

Itiel Dror

During the webinar, Itiel Dror, a cognitive neuroscience researcher from University College London, will discuss his journal article, Linear Sequential Unmasking–Expanded (LSU-E): A general approach for improving decision making as well as minimizing noise and bias. The article was published in Forensic Science International: Synergy and co-authored by Jeff Kukucka, associate professor of psychology at Towson University.

In the article, the authors introduce Linear Sequential Unmasking–Expanded (LSU-E), an approach that can be applied to all forensic decisions and that reduces noise and improves decisions “by cognitively optimizing the sequence of information in a way that maximizes information utility and thereby produces better and more reliable decisions.”

From the Abstract:

In this paper, we draw upon classic cognitive and psychological research on factors that influence and underpin expert decision making to propose a broad and versatile approach to strengthening expert decision making. Experts from all domains should first form an initial impression based solely on the raw data/evidence, devoid of any reference material or context, even if relevant. Only thereafter can they consider what other information they should receive and in what order based on its objectivity, relevance, and biasing power. It is furthermore essential to transparently document the impact and role of the various pieces of information on the decision making process. As a result of using LSU-E, decisions will not only be more transparent and less noisy, but it will also make sure that the contributions of different pieces of information are justified by, and proportional to, their strength.

To register for the February webinar, visit

The CSAFE Spring 2022 Webinar Series is sponsored by the National Institute of Standards and Technology (NIST) through cooperative agreement 70NANB20H019.

ASCLD Forensic Research Committee Provides Useful Resources for Researchers and Practitioners


The American Society of Crime Laboratory Directors (ASCLD) Forensic Research Committee (FRC) works to identify the research, development, technology and evaluation needs and priorities for the forensic science community. The FRC has several initiatives and resources available on its website to aid both researchers and practitioners. Below, we highlight a few of those resources.

For more information, visit the FRC website:

Collaboration Hub

The FRC collaboration hub hosts the Researcher-Practitioner Collaboration Directory. The directory helps connect researchers with ongoing projects to practitioners who are willing to participate in the studies. The searchable directory includes descriptions of each project, including an abstract and the estimated participant time involved. Researchers can easily submit their projects for inclusion in the directory by completing an online form.

ASCLD Research Priorities

The FRC has created a list of high-impact research opportunities that identifies key areas where research would support the forensic science community and enhance lab operations. The research priorities list for 2022-2024 can be downloaded at

Lightning Talks

The FRC hosts a virtual “Lightning Talks” series to highlight new and emerging research in all areas of forensic science. Each episode features three short talks given by practitioners, researchers or students. Previous Lightning Talks are archived on FRC’s YouTube page.

Laboratories and Educators Alliance Program (LEAP)

LEAP facilitates collaborative research between academia and forensic science laboratories. This program identifies forensic science needs and provides a platform for laboratories, researchers and students to seek projects aligning with their mutual research capabilities. The FRC website includes a map of LEAP partners, a short video explaining LEAP and sign-up forms for crime labs and universities. LEAP is a joint effort between ASCLD and the Council of Forensic Science Educators (COFSE).

Validation and Evaluation Repository

The Validation and Evaluation Repository is a list of unique validations and evaluations conducted by forensic labs and universities. ASCLD’s summary of the repository states, “It is ASCLD’s hope that this listing will foster communication and reduce unnecessary repetition of validations and evaluations to benefit the forensic community.” The searchable repository is available at

Research Executive Summaries

The Future Forensics Subcommittee of the FRC has initiated the publication of brief executive summaries of the recent literature within the forensic sciences. The summaries are written by ASCLD members and are meant to provide a brief overview of noteworthy publications and trends in the literature. Currently, the summaries include reviews in the areas of fingermarks, controlled substances, paint and glass evidence, forensic toxicology, forensic biology, gunshot residue analysis and firearms and toolmarks.

Treatment of inconclusives in the AFTE range of conclusions

In the past decade, and in response to the recommendations set forth by the National Research Council Committee on Identifying the Needs of the Forensic Sciences Community (2009), scientists have conducted several black-box studies that attempt to estimate the error rates of firearm examiners. Most of these studies have resulted in vanishingly small error rates, and at least one of them (D. P. Baldwin, S. J. Bajic, M. Morris, and D. Zamzow. A Study of False-Positive and False-Negative Error Rates in Cartridge Case Comparisons. Technical report, Ames Lab IA, Fort Belvoir, VA, April 2014.) was cited by the President’s Council of Advisors on Science and Technology (PCAST) during the Obama administration as an example of a well-designed experiment. What has received little attention, however, is the actual calculation of error rates and, in particular, the effect of inconclusive findings on those error estimates. The treatment of inconclusives in the assessment of errors has far-reaching implications in the legal system. Here, we revisit several black-box studies in the area of firearms examination, investigating their treatment of inconclusive results. It is clear that there are stark differences in the rate of inconclusive results in regions with different norms for training and reporting conclusions. More surprisingly, the rate of inconclusive decisions for materials from different sources is notably higher than the rate of inconclusive decisions for same-source materials in some regions. To mitigate the effects of this difference, we propose a unifying approach to the calculation of error rates that is directly applicable in forensic laboratories and in legal settings.
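To see why the handling of inconclusives matters, the short Python sketch below computes false-negative and false-positive rates under three common conventions: counting inconclusives as correct, excluding them, or counting them as errors. The counts are made up for illustration, the names (`Counts`, `error_rates`) are hypothetical, and none of the three conventions should be read as the unifying approach the authors propose, which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Examiner decisions for comparisons that share one ground truth."""
    identification: int
    inconclusive: int
    elimination: int

    @property
    def total(self) -> int:
        return self.identification + self.inconclusive + self.elimination


def error_rates(same: Counts, diff: Counts) -> dict:
    """Return (false-negative rate, false-positive rate) per convention."""
    fn = same.elimination      # same-source pairs wrongly eliminated
    fp = diff.identification   # different-source pairs wrongly identified
    return {
        # Inconclusives stay in the denominator and count as correct.
        "inconclusive_as_correct": (fn / same.total, fp / diff.total),
        # Inconclusives are dropped from the denominator.
        "inconclusive_excluded": (
            fn / (same.total - same.inconclusive),
            fp / (diff.total - diff.inconclusive),
        ),
        # Inconclusives count as errors.
        "inconclusive_as_error": (
            (fn + same.inconclusive) / same.total,
            (fp + diff.inconclusive) / diff.total,
        ),
    }


# Made-up counts, chosen so that inconclusives are more frequent
# for different-source comparisons than for same-source ones.
same_source = Counts(identification=90, inconclusive=8, elimination=2)
different_source = Counts(identification=1, inconclusive=39, elimination=60)

rates = error_rates(same_source, different_source)
for name, (fnr, fpr) in rates.items():
    print(f"{name}: FNR={fnr:.3f}, FPR={fpr:.3f}")
```

With these illustrative counts, the false-positive rate ranges from 1% to 40% depending only on the convention chosen, which is the kind of sensitivity the abstract points to.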

Assessing the resources and requirements of statistics education in forensic science

With the increasing ability to easily collect and analyze data, statistics plays a more critical role in scientific research activities, such as designing experiments, controlling processes, and understanding or validating lab results. As a result, incorporating statistics training into the curriculum is becoming a trend in STEM education across a range of fields. However, assessing the level and focus of statistical skills that each discipline requires is complicated and subjective. Situations vary based on the subject, program level and expectation, and university resources. As part of the Center for Statistics and Applications in Forensic Evidence (CSAFE), we assess the statistics requirements in accredited university programs in forensic science, through reviewing accreditation requirements and analyzing program admission requirements and curricula. We present results for this pilot project characterizing the expectation of the American Academy of Forensic Sciences for statistics skills and their alignment with tasks performed by forensic scientists, statistics teaching resources available to forensics programs, and possible solutions for reducing any identified gaps.

Consensus on validation of forensic voice comparison

Since the 1960s, there have been calls for forensic voice comparison to be empirically validated under casework conditions. Since around 2000, there have been an increasing number of researchers and practitioners who conduct forensic-voice-comparison research and casework within the likelihood-ratio framework. In recent years, this community of researchers and practitioners has made substantial progress toward validation under casework conditions becoming a standard part of practice: Procedures for conducting validation have been developed, along with graphics and metrics for representing the results, and an increasing number of papers are being published that include empirical validation of forensic-voice-comparison systems under conditions reflecting casework conditions. An outstanding question, however, is: In the context of a case, given the results of an empirical validation of a forensic-voice-comparison system, how can one decide whether the system is good enough for its output to be used in court? This paper provides a statement of consensus developed in response to this question. Contributors included individuals who had knowledge and experience of validating forensic-voice comparison systems in research and/or casework contexts, and individuals who had actually presented validation results to courts. They also included individuals who could bring a legal perspective on these matters, and individuals with knowledge and experience of validation in forensic science more broadly. We provide recommendations on what practitioners should do when conducting evaluations and validations, and what they should present to the court. Although our focus is explicitly on forensic voice comparison, we hope that this contribution will be of interest to an audience concerned with validation in forensic science more broadly. Although not written specifically for a legal audience, we hope that this contribution will still be of interest to lawyers.

Commentary on Curley et al. Assessing cognitive bias in forensic decisions: a review and outlook

In their recent critical review titled “Assessing Cognitive Bias in Forensic Decisions: A Review and Outlook,” Curley et al. (1) offer a confused and incomplete discussion of “task relevance” in forensic science. Their failure to adopt a clear and appropriate definition of “task relevance” undermines the central conclusion of their article: the assertion that it is not necessarily an error for forensic scientists to rely on task-irrelevant information and that “task-irrelevant contextual information may sometimes aid forensic decision makers.” This conceptual flaw in the article becomes clear when we define “task relevance” appropriately, in the manner it was defined by the U.S. National Commission on Forensic Science (2). The Commission’s definition provides a bright-line standard for distinguishing contextual information that is helpful and should be considered from contextual information that is unhelpful and should not be considered. Once that matter is clarified, it becomes possible to discuss intelligently whether steps should be taken to minimize examiners’ exposure to task-irrelevant information in order to reduce the potential for contextual bias.

Estimating Repeatability and Reproducibility with Limited Replications

In many measurement settings, it is important to assess the reliability and validity of measurements. As an example, forensic examiners are called upon to assess the quality of forensic evidence and draw conclusions about the evidence (e.g., whether two fingerprints came from the same source). Reliability and validity are often assessed through “black box” studies in which examiners make judgments regarding evidence of known origin under conditions meant to imitate real investigation. An open question is whether examiners differ in their ability to assess different items of evidence, i.e., whether there are examiner-by-evidence interactions. For logistical and cost reasons it is not practical to obtain a full set of replicate measurements. We leverage a hierarchical Bayesian analysis of variance model to address this limitation and simultaneously explain the variation in the decisions both between different examiners (reproducibility) and within an examiner (repeatability). The model can be applied to continuous, binary or ordinal data. Simulation studies demonstrate the approach and the methods are applied to data from handwriting and latent print examinations.
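The two quantities can be illustrated with a much simpler descriptive calculation than the hierarchical Bayesian model described above. The Python sketch below is only an assumption-laden illustration on made-up binary decisions: it measures repeatability as the share of examiner-item cells whose repeated decisions agree, and reproducibility as the share of examiner pairs whose first decisions on the same item agree. The function names and data layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# decisions[(examiner, item)] = decisions over repeated trials (made-up data);
# 1 = "same source", 0 = "different source".
decisions = {
    ("A", 1): [1, 1],
    ("A", 2): [0, 1],
    ("B", 1): [1, 1],
    ("B", 2): [0, 0],
    ("C", 1): [0, 0],
    ("C", 2): [0, 0],
}


def repeatability(decisions):
    """Share of (examiner, item) cells whose repeated decisions all agree."""
    repeated = [d for d in decisions.values() if len(d) > 1]
    return sum(len(set(d)) == 1 for d in repeated) / len(repeated)


def reproducibility(decisions):
    """Share of examiner pairs whose first decisions on the same item agree."""
    by_item = {}
    for (_examiner, item), d in decisions.items():
        by_item.setdefault(item, []).append(d[0])
    agree = total = 0
    for first_decisions in by_item.values():
        for a, b in combinations(first_decisions, 2):
            agree += int(a == b)
            total += 1
    return agree / total


rep = repeatability(decisions)
repro = reproducibility(decisions)
print(f"repeatability={rep:.3f}, reproducibility={repro:.3f}")
```

Raw agreement rates like these need replicate decisions for every cell they summarize, which is the practical limitation that motivates a model-based approach when replications are scarce.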

Using Machine Learning Methods to Predict Similarity of Striations on Bullet Lands

Recent advances in microscopy have made it possible to collect 3D topographic data, enabling virtual comparisons based on the collected 3D data alongside traditional comparison microscopy. Automatic matching algorithms have been introduced for various scenarios, such as matching cartridge cases (Tai and Eddy 2018) or matching bullet striae (Hare et al. 2017b, Chu et al. 2013, De Kinder and Bonfanti 1999). One key aspect of validating automatic matching algorithms is to evaluate the performance of the algorithm on external tests. Here, we present a discussion of the performance of the matching algorithm (Hare et al. 2017b) in three studies. We consider matching performance based on the random forest score, cross correlation, and consecutive matching striae (CMS) at the land-to-land level and, using Sequential Average Maxima scores, also at the bullet-to-bullet level. Cross correlation and random forest scores both result in perfect discrimination of same-source and different-source bullets. At the land-to-land level, discrimination (based on area under the curve, AUC) is excellent (> 0.90).
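As a reminder of what the AUC figure measures, the Python sketch below computes it directly from two lists of similarity scores as the probability that a randomly chosen same-source comparison scores higher than a randomly chosen different-source comparison, with ties counted as one half. The scores are made up and the function name is hypothetical; this is not the authors' evaluation code.

```python
def auc(same_source_scores, different_source_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for s in same_source_scores:
        for d in different_source_scores:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5  # ties count as half a win
    return wins / (len(same_source_scores) * len(different_source_scores))


# Made-up similarity scores for illustration only.
overlapping = auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2, 0.1])
separated = auc([0.9, 0.8], [0.3, 0.2])
print(f"overlapping scores: AUC={overlapping:.3f}")
print(f"fully separated scores: AUC={separated:.3f}")
```

An AUC of 1.0 corresponds to perfect discrimination: every same-source score exceeds every different-source score.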