NIST Seeks Digital Forensics Experts to Participate in Vital ‘Blackbox’ Study

Objectivity and accuracy are cornerstones of forensic science. Yet everyone can agree that humans make errors — but to what degree when it comes to gathering and analyzing digital forensic evidence?

The National Institute of Standards and Technology (NIST) is launching the first “blackbox” research study to quantify the accuracy of computer and mobile phone forensics and answer this question.

Digital evidence introduces an additional layer of potential human error, especially given rapidly evolving technologies and situations in which key evidence must be identified and extracted from large volumes of digital data. It is for these reasons that CSAFE, too, is working on a mobile app analysis tool, EVIHUNTER.

On a broader scale, this NIST study responds to the 2009 National Academy of Sciences report, Strengthening Forensic Science in the United States: A Path Forward, which calls for blackbox studies to measure the reliability of forensic methods that involve human judgement.

Digital evidence, though grounded in technology, certainly relies on the human element. By participating in the NIST study, digital forensic practitioners can help strengthen the future of forensic science by providing a foundation of quantitative probability that courts and jurors can use to weigh the validity of presented digital evidence and analysis — and that can inform future studies in this realm. Digital forensic experts can also answer a question paramount to their own goals and missions: Are our industry’s methods accurate and reliable?

The Study Details

Blackbox studies are unique in their anonymity. They assess only the reliability and accuracy (right or wrong) of methods that rely on human judgement, without concern for how experts reached their answers. The study therefore will not judge individuals on their performance, but rather aims to measure the performance of the digital forensics community as a whole.

The study will be conducted online. Enrollment is now open, and the test will be available for approximately three months.

Digital forensic experts who volunteer for the study will download simulated evidence from the NIST website, in the form of one virtual mobile phone and one virtual computer. With a time commitment of roughly two hours, participants will examine the simulated digital evidence and answer a series of questions similar to those expected in a real criminal investigation. Participants may use the forensic software tools of their choosing to analyze the forensic images.

Who Can Participate

All public and private sector digital examiners who conduct hard drive or mobile phone examinations as part of their official duties are encouraged to volunteer and participate in this study.

No individual’s or laboratory’s performance will be reported. Rather, NIST will publish anonymized, comprehensive results on the overall performance of the digital forensic expert community and of different sectors within that community.

To learn more or to enroll in this study vital to advancing digital forensics, visit NIST Blackbox Study for Digital Examiners and follow the simple steps to get started.


[Enroll in NIST Blackbox Study]

Insights: Psychometric Analysis of Forensic Examiner Behavior



Understanding how fingerprint examiners’ proficiency and behavior influence their decisions when interpreting evidence requires many analytical models. Because final source identifications still rely on complex and subjective interpretation of the evidence, researchers sought to better identify and study uncertainty in examiners’ decision making. By applying novel methods such as Item Response Theory (IRT) alongside existing tools like error rate studies, the team proposes a new approach that accounts for differences among examiners and in task difficulty levels.

Lead Researchers

Amanda Luby
Anjali Mazumder
Brian Junker

Publication Date

June 13, 2020



The Goals

Survey recent advances in psychometric analysis of forensic decision-making.

Use behavioral models from the field of Item Response Theory to better understand the operating characteristics of the identification tasks that examiners perform.


The Data

A 2011 FBI Black Box study assigned 169 fingerprint examiners a selection of items to analyze. For each pair of prints, examiners provided a latent print evaluation, a source determination, a reason, and a rating of the task’s difficulty.

Key Definitions


Psychometrics

Using factors such as aptitudes and personality traits to study differences between individuals.

Item Response Trees (IRTrees)

Visual representation of each decision an examiner makes in the process of performing an identification task. Based on IRT, which attempts to explain the connections between the properties of a test item –– a piece of fingerprint evidence –– and an individual’s –– a fingerprint examiner’s –– performance in response to that item.

Cultural Consensus Theory (CCT)

A method that facilitates the discovery and description of consensus among a group of people with shared beliefs. For this study, CCT helps identify the common knowledge and beliefs among fingerprint examiners –– things that examiners may take for granted but that laypeople would not necessarily know.
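As a rough illustration of the consensus idea — not the formal CCT model used in the paper — the sketch below alternates between two estimates: a consensus "answer key" built from competence-weighted votes, and each informant’s competence measured as agreement with that key. All function and variable names here are illustrative, not from the study.

```python
def cct_consensus(responses, iterations=10):
    """Minimal consensus sketch in the spirit of Cultural Consensus Theory.

    responses[i][q] is informant i's answer to question q. Alternately:
      1. estimate the answer key by competence-weighted voting, and
      2. estimate each informant's competence as agreement with the key.
    """
    n_informants = len(responses)
    n_items = len(responses[0])
    competence = [0.5] * n_informants  # uninformative starting weights
    key = []
    for _ in range(iterations):
        # Step 1: weighted vote for each item's consensus answer.
        key = []
        for q in range(n_items):
            votes = {}
            for i in range(n_informants):
                ans = responses[i][q]
                votes[ans] = votes.get(ans, 0.0) + competence[i]
            key.append(max(votes, key=votes.get))
        # Step 2: competence = fraction of answers matching the key.
        competence = [
            sum(responses[i][q] == key[q] for q in range(n_items)) / n_items
            for i in range(n_informants)
        ]
    return key, competence

answers = [
    ["A", "B", "A", "A"],
    ["A", "B", "A", "B"],
    ["B", "B", "A", "A"],
]
key, comp = cct_consensus(answers)
print(key)   # ['A', 'B', 'A', 'A']
```

In this toy example, the consensus key emerges from majority agreement, and the first informant — who matches the key on every item — receives the highest competence score.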



The Methods

Researchers segmented the data with the Rasch model to separate a latent print’s difficulty level from an examiner’s proficiency. This allowed comparison with the existing method of studying error rates.
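The core of the Rasch model can be stated in a few lines: the probability of a correct response depends only on the gap between an examiner’s proficiency (theta) and the item’s difficulty (b). This is a generic sketch of the model’s response function, not the fitting procedure the researchers used.

```python
import math

def rasch_probability(proficiency: float, difficulty: float) -> float:
    """Rasch model response function:
    P(correct) = exp(theta - b) / (1 + exp(theta - b)),
    written here in the equivalent logistic form."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))

# An examiner whose proficiency exactly matches an item's difficulty
# is expected to answer correctly half the time.
print(rasch_probability(0.0, 0.0))   # 0.5

# Greater proficiency relative to difficulty raises the success probability.
print(rasch_probability(2.0, 0.0) > rasch_probability(0.0, 0.0))   # True
```

Because proficiency and difficulty enter only through their difference, the model cleanly separates the two quantities — which is what lets examiners be compared even when item difficulty varies.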


Then they constructed IRTrees to model a fingerprint examiner’s decision-making process when deciding whether a print is a positive match, a negative match, inconclusive, or of no latent value. (See Figure 1)


Finally, the team used IRTrees and Cultural Consensus Theory to create “answer keys” –– a set of reasons and shared knowledge –– that provide insight into how a fingerprint examiner arrives at an “inconclusive” or “no value” decision. (See Figure 2)
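The IRTree idea can be sketched by chaining Rasch-style branch probabilities: each internal node of the tree (does the print have value? does the examiner reach a conclusion? is it an identification?) is treated as its own binary pseudo-item, and a leaf outcome’s probability is the product along its path. The tree shape and parameter names below are a simplified illustration, not the exact trees from the paper.

```python
import math

def branch_p(theta: float, b: float) -> float:
    """Rasch-form probability of taking the 'yes' branch at one tree node."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def irtree_outcome_probs(theta, b_value, b_conclusive, b_match):
    """Leaf probabilities for a simplified examiner decision tree:
    no value -> inconclusive -> identification / exclusion.
    Each internal node has its own difficulty parameter."""
    p_value = branch_p(theta, b_value)        # print judged to have value
    p_concl = branch_p(theta, b_conclusive)   # examiner reaches a conclusion
    p_match = branch_p(theta, b_match)        # conclusion is an identification
    return {
        "no value": 1.0 - p_value,
        "inconclusive": p_value * (1.0 - p_concl),
        "identification": p_value * p_concl * p_match,
        "exclusion": p_value * p_concl * (1.0 - p_match),
    }

probs = irtree_outcome_probs(theta=0.5, b_value=-1.0,
                             b_conclusive=0.0, b_match=0.2)
# Path probabilities over the leaves always sum to one.
print(abs(sum(probs.values()) - 1.0) < 1e-12)   # True
```

Structuring the outcome this way is what lets “inconclusive” and “no value” be modeled as informative decisions in their own right rather than discarded as non-answers.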

Figure 1: Visual representation of latent print analysis.

Figure 2: How examiners arrive at “inconclusive” or “no value” decisions.



The Results

IRT models offer substantial improvements over current examiner error rate studies, including the ability to justifiably compare the proficiencies of examiners who did not complete the same identification tasks, and the ability to see the influence of task difficulty in examiner proficiency estimates.


IRTrees give researchers the ability to accurately model the complex decision making in fingerprint identification tasks –– which involves much more than simply stating that a print is a “match” or a “non-match.” This reveals the skill involved in fingerprint examination work.


Examiners tend to overrate the difficulty of moderately difficult tasks, while underrating the difficulty of extremely easy or extremely difficult tasks.



This analysis was somewhat limited by the available data; for confidentiality and privacy reasons, the FBI Black Box study does not provide the reference prints used or personal details about the examiners themselves. Future collaboration with experts in both fingerprint analysis and human decision making can provide more detailed data and thus improve the models.