NIST Updates Software Reference Library

Software files can be identified by a sort of electronic fingerprint called a hash. The NSRL dataset update makes it easy to separate hashes indicating run-of-the-mill files from those that might contain incriminating evidence, making investigative work easier. Credit: N. Hanacek/NIST

The National Institute of Standards and Technology (NIST) announced an update to the National Software Reference Library. The expanded, more searchable database will make it easier to sift through seized computers, phones and other electronic equipment.

The database plays a frequent role in criminal investigations involving electronic files, which can be evidence of wrongdoing. According to the NIST news release, “In the first major update to the NSRL in two decades, NIST has increased the number and type of records in the database to reflect the widening variety of software files that law enforcement might encounter on a device. The agency has also changed the format of the records to make the NSRL more searchable.”

NIST said that criminal and civil investigations frequently involve digital evidence in the form of software and files from seized computers and phones. Investigators need a way to filter out the large quantities of data irrelevant to the investigation so they can focus on finding relevant evidence.

The news release stated, “The update comes at a time when investigators must contend with a rapidly expanding universe of software, most of which produces numerous files that are stored in memory. Each of these files can be identified by a sort of electronic fingerprint called a hash, which is the key to the sifting process. The sophistication of the sifting process can vary depending on the type of investigation being performed.”
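The hash-based sifting described in the release can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and uses a tiny, hypothetical set of known hashes in place of the real NSRL dataset. The file names and contents are invented for illustration, and the NSRL's actual record format is not represented here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def filter_unknown(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files whose hash is NOT in the known-file set."""
    return [name for name, data in files.items()
            if sha256_hex(data) not in known_hashes]


# Hypothetical contents of a seized device (file name -> file bytes).
seized_files = {
    "notepad.exe": b"common application bytes",
    "libcommon.dll": b"common library bytes",
    "ledger.xlsx": b"user-created spreadsheet bytes",
}

# Hypothetical reference set: hashes of run-of-the-mill software files.
known_hashes = {
    sha256_hex(b"common application bytes"),
    sha256_hex(b"common library bytes"),
}

# Only files absent from the reference set remain for manual review.
needs_review = filter_unknown(seized_files, known_hashes)
print(needs_review)  # prints ['ledger.xlsx']
```

Because a hash depends only on a file's contents, a renamed copy of a known file would still be filtered out, while any file that differs by even one byte would be kept for review.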

NIST reported that the NSRL’s reference dataset doubled from half a billion hash records in August 2019 to more than a billion in March 2022.

The news release notes why the dataset is important to digital forensic labs: “This growth makes the NSRL a vitally important tool for digital forensics labs, which specialize in this sort of file review. Such work has become a crucial part of investigations: There are about 11,000 digital forensics labs in the United States (compared with about 400 crime labs).”

The previous database version dates back 20 years, and while searching was possible, it was cumbersome. The new NSRL update will make it easier for users to create custom filters to sort through files and find what they need for a particular investigation.

The dataset and more information on the update are available at

The Center for Statistics and Applications in Forensic Evidence (CSAFE), a NIST Center of Excellence, conducts research addressing the need for forensic tools and methods for digital evidence. Learn more about this research at



On April 22, the Center for Statistics and Applications in Forensic Evidence (CSAFE) hosted the webinar Shining a Light on Black Box Studies. It was presented by Dr. Kori Khan, an assistant professor in the department of statistics at Iowa State University, and Dr. Alicia Carriquiry, CSAFE director and Distinguished Professor and President’s Chair in Statistics at Iowa State.

In the webinar, Khan and Carriquiry used two case studies—the Ames II ballistics study and a palmar prints study by Heidi Eldridge, Marco De Donno, and Christophe Champod (referred to in the webinar as the EDC study)—to illustrate the common problems of examiner representation and high levels of non-response (also called missingness) in Black Box studies, as well as recommendations for addressing these issues in the future.

If you did not attend the webinar live, the recording is available at

What is Foundational Validity?

To understand Black Box studies, we must first understand foundational validity. The 2016 PCAST report brought Black Box studies into focus and identified them as a subject of interest. The report detailed that feature-comparison disciplines need to establish foundational validity, which means that empirical studies must show that, with known probability:

  • An examiner obtains correct results for true positives and true negatives.
  • An examiner obtains the same results when analyzing samples from the same types of sources.
  • Different examiners arrive at the same conclusions.

What is a Black Box Study?

The PCAST report proposed that the only way to establish foundational validity for feature comparison methods that rely on some amount of subjective determination is through multiple, independent Black Box studies. In these studies, the examiner is treated as a “Black Box”: researchers look only at the conclusions the examiner reaches, not at the subjective reasoning behind them.

Method: Examiners are given test sets and samples and asked to render opinions about what their conclusion would have been if this were actual casework. Examiners are not asked how they arrive at these conclusions. Data is collected and analyzed to establish accuracy. In a later phase, participants are given more data, and their responses are again collected and then measured for repeatability and reproducibility.

Goal: The goal with Black Box studies is to analyze how well the examiners perform in providing accurate results. Therefore, in these studies, it is essential that ground truth be known with certainty.

What are the common types of measures in Black Box studies?

The four common types of measures are False Positive Error Rate (FPR), False Negative Error Rate (FNR), Sensitivity, and Specificity. Inconclusive decisions are generally counted as neither incorrect identifications nor incorrect exclusions, so they are not treated as errors.
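As a rough illustration of these four measures, the sketch below computes them from decision counts. All counts are hypothetical, and the `keep_inconclusives` switch is an assumption added for illustration: the text says only that inconclusives are not treated as errors, so whether they remain in the denominator is a choice each study has to state.

```python
def error_rates(tp: int, fn: int, tn: int, fp: int,
                inc_same: int = 0, inc_diff: int = 0,
                keep_inconclusives: bool = True) -> dict[str, float]:
    """Compute FPR, FNR, sensitivity, and specificity from decision counts.

    tp / fn / inc_same: correct identifications, false exclusions, and
        inconclusives on same-source comparisons.
    tn / fp / inc_diff: correct exclusions, false identifications, and
        inconclusives on different-source comparisons.
    keep_inconclusives: if True, inconclusives stay in the denominators
        (counted as decisions, but never as errors); if False, they are dropped.
    """
    same_total = tp + fn + (inc_same if keep_inconclusives else 0)
    diff_total = tn + fp + (inc_diff if keep_inconclusives else 0)
    if same_total == 0 or diff_total == 0:
        raise ValueError("need at least one comparison of each kind")
    return {
        "FPR": fp / diff_total,          # false identifications
        "FNR": fn / same_total,          # false exclusions
        "sensitivity": tp / same_total,  # correct identifications
        "specificity": tn / diff_total,  # correct exclusions
    }


# Hypothetical counts, not taken from any study.
counts = dict(tp=900, fn=20, tn=700, fp=10, inc_same=80, inc_diff=290)
with_inc = error_rates(**counts, keep_inconclusives=True)
without_inc = error_rates(**counts, keep_inconclusives=False)
print(with_inc)
print(without_inc)
```

With these made-up counts, dropping the inconclusives shrinks the denominators and raises both reported error rates, which is why the treatment of inconclusives matters when comparing studies.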

What are some common problems in some existing Black Box studies?

Representative Samples of Examiners

In order for results to reflect real-world scenarios, we need to ensure that the Black Box volunteer participants are representative of the population of interest. In an ideal scenario, volunteers are pulled from a list of persons within the population of interest, though this is not always possible.

All Black Box studies rely on volunteer participation, which can lead to self-selection bias, meaning those who volunteer are different from those who don’t. For example, perhaps those who volunteer are less busy than those who don’t volunteer. Therefore, it’s important that Black Box studies have inclusion criteria to help make the volunteer set as representative of the population of interest as possible.

In the Ames II case study, volunteers were solicited through the FBI and the Association of Firearm and Tool Mark Examiners (AFTE) contact list. Participants were limited by the following criteria:

Problems with this set:

  • Many examiners do not work for an accredited U.S. public crime laboratory.
  • Many examiners are not current members of AFTE.

Overall, there is strong evidence that the volunteer set in this study does not match or represent the population of interest, which can negatively influence the accuracy of Black Box study results.

Handling Missing Data

Statistical literature has many rules of thumb stating that it is acceptable to carry out statistical analyses on the observed data if the missing data accounts for no more than 5–20% of the total and the missingness is “ignorable.” If missingness is non-ignorable, any amount of missingness can bias estimates. Across most Black Box studies, 30–40% of the data is missing. We can adjust for some non-response, but first we must know whether it is ignorable or non-ignorable.

  • Adjusting for missing data depends on the missingness mechanism (potentially at two levels: unit and item).
  • Ignorable:
    • Missing Completely at Random (MCAR): the probability that any observation is missing does not depend on any other variable in the dataset (observed or unobserved).
    • Missing at Random (MAR): the probability that any observation is missing depends only on other observed variables.
  • Non-ignorable:
    • Not Missing at Random (NMAR): the probability that any observation is missing depends on unobserved values.
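The practical difference between these mechanisms can be seen in a small simulation. The sketch below is illustrative only: the two examiner groups, their error rates, and every missingness probability are hypothetical values chosen so that the true overall error rate is 10%. It compares a naive estimate (observed responses only) with an estimate reweighted by the known 50/50 group split.

```python
import random

# Hypothetical per-group error rates; with a 50/50 split the true rate is 0.10.
TRUE_ERROR = {"A": 0.05, "B": 0.15}


def observed_error_rates(mechanism: str, n: int = 100_000, seed: int = 0):
    """Simulate n responses and return (naive, group_adjusted) error rates."""
    rng = random.Random(seed)
    seen = {"A": [0, 0], "B": [0, 0]}  # group -> [observed errors, observed items]
    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        is_error = rng.random() < TRUE_ERROR[group]
        if mechanism == "MCAR":
            p_missing = 0.35                           # depends on nothing
        elif mechanism == "MAR":
            p_missing = 0.20 if group == "A" else 0.50  # depends on observed group
        elif mechanism == "NMAR":
            p_missing = 0.70 if is_error else 0.30      # depends on unseen outcome
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:                   # response was returned
            seen[group][0] += is_error
            seen[group][1] += 1
    naive = (seen["A"][0] + seen["B"][0]) / (seen["A"][1] + seen["B"][1])
    # Reweight each group's observed rate by its known population share.
    adjusted = 0.5 * seen["A"][0] / seen["A"][1] + 0.5 * seen["B"][0] / seen["B"][1]
    return naive, adjusted


for mechanism in ("MCAR", "MAR", "NMAR"):
    naive, adjusted = observed_error_rates(mechanism)
    print(f"{mechanism}: naive={naive:.3f} adjusted={adjusted:.3f}")
```

Under these assumptions, the naive estimate is close to the truth only for MCAR, reweighting by the observed group repairs the MAR case, and neither estimate recovers the true rate under NMAR. That is the sense in which non-ignorable missingness cannot be fixed from the observed data alone.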

To make this determination, the following data at a minimum must be known:

  • The participants who enrolled and did not participate
  • The participants who enrolled and did participate
  • Demographics for each of these groups of examiners
  • The total number of test sets and types assigned to each examiner
  • For each examiner, a list of the items he/she did or did not answer
  • For each examiner, a list of the items he/she did or did not correctly answer

Most Black Box studies do not release this information or the raw data.

However, the EDC study made much of the necessary data known, allowing researchers to study missingness empirically. If there is a characteristic of examiners that is associated with higher error rates, and if that characteristic is also associated with higher levels of missingness, we have evidence that the missingness is non-ignorable and can come up with ways to address it.

In this example, of the 226 examiners who returned some test sets in the study, 197 also had demographic information. Of those 197, 53 failed to render a decision for over half of the 75 tests presented to them. The EDC study noted that examiners who worked for non-U.S. entities committed 50% of the false positives made in the study but accounted for less than 20% of the examiners. Researchers wanted to discover whether examiners who worked for non-U.S. entities also had higher rates of missingness. If missingness were unrelated to employer, about 19% of the examiners with missingness of over half would be expected to work for non-U.S. entities; the observed share was 28%.

Researchers then conducted a hypothesis test to see if there was an association between working for a non-U.S. entity and missingness. They drew a random sample of examiners, calculated the proportion of the sample who worked for non-U.S. entities, repeated this many times, and compared the observed value of 28% to the calculated ones.

  • H0: Working for a non-U.S. entity is statistically independent of missingness
  • HA: Working for a non-U.S. entity is associated with higher missingness

Using this method, researchers found that the observed result (28%) would occur only 4% of the time if there were no relationship between missingness and working for a non-U.S. entity, meaning that there is strong evidence that working for a non-U.S. entity is associated with higher missingness.
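The resampling procedure described above can be sketched as follows. The counts are assumptions reconstructed from the rounded percentages in the text (about 19% of 197 examiners, and about 28% of the 53 examiners with high missingness), and the sample size of 53 is also an assumption, so the resulting p-value is only expected to land near the reported 4%, not to reproduce it exactly. The exact hypergeometric tail is included purely as a cross-check on the simulation.

```python
import math
import random

# Assumed counts, rebuilt from rounded percentages; the study's exact
# counts may differ.
N_EXAMINERS = 197      # examiners with demographic information
N_NON_US = 37          # roughly 19% of 197 (assumption)
N_HIGH_MISSING = 53    # examiners with missingness over half
OBSERVED_NON_US = 15   # roughly 28% of 53 (assumption)


def resampling_p_value(reps: int = 20_000, seed: int = 0) -> float:
    """Share of random samples with at least as many non-U.S. examiners as observed."""
    rng = random.Random(seed)
    # 1 marks an examiner working for a non-U.S. entity, 0 otherwise.
    population = [1] * N_NON_US + [0] * (N_EXAMINERS - N_NON_US)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, N_HIGH_MISSING)  # without replacement
        if sum(sample) >= OBSERVED_NON_US:
            hits += 1
    return hits / reps


def exact_p_value() -> float:
    """Exact hypergeometric upper-tail probability for the same question."""
    total = math.comb(N_EXAMINERS, N_HIGH_MISSING)
    upper = min(N_NON_US, N_HIGH_MISSING)
    tail = sum(
        math.comb(N_NON_US, k)
        * math.comb(N_EXAMINERS - N_NON_US, N_HIGH_MISSING - k)
        for k in range(OBSERVED_NON_US, upper + 1)
    )
    return tail / total


p_sim = resampling_p_value()
p_exact = exact_p_value()
print(f"simulated p = {p_sim:.4f}, exact p = {p_exact:.4f}")
```

A small p-value here means that random groups of 53 examiners rarely contain as many non-U.S. examiners as the high-missingness group does, which is the evidence against H0.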

Researchers repeated the process to test whether missingness is higher among examiners who did not work for an accredited lab and had similar findings.

In this case, the hypothesis test showed that the observed result (47% missingness) would be expected only about 0.29% of the time. Therefore, there is strong evidence that working for an unaccredited lab is associated with higher missingness.

What are the next steps for gaining insights from Black Box studies?

The two issues discussed in this webinar—lack of a representative sample of participants and non-ignorable non-response—can be addressed in the short term with minor funding and cooperation among researchers.


To obtain a representative sample of participants:

  • Draw a random sample of courts (state, federal, nationwide, etc.)
  • Enumerate experts in each
  • Stratify and sample experts
  • Even if the person refuses to participate, at least we know in which ways (education, gender, age, etc.) the participants are or are not representative of the population of interest.
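The sampling steps in this list can be sketched in a few lines. Everything in the example is hypothetical: the court names, expert IDs, and the choice of employer type as the stratification variable are invented for illustration and are not taken from the webinar.

```python
import random
from collections import defaultdict


def sample_experts(courts, n_courts, per_stratum, seed=0):
    """Sample courts, enumerate their experts, then sample within strata.

    courts: dict mapping court name -> list of (expert_id, stratum) pairs.
    Returns (chosen_courts, {stratum: [sampled expert ids]}).
    """
    rng = random.Random(seed)
    # Step 1: draw a random sample of courts.
    chosen = rng.sample(sorted(courts), n_courts)
    # Step 2: enumerate the experts appearing in the chosen courts
    # (a set removes experts who appear in more than one court).
    strata = defaultdict(set)
    for court in chosen:
        for expert_id, stratum in courts[court]:
            strata[stratum].add(expert_id)
    # Step 3: sample experts separately within each stratum.
    selected = {}
    for stratum in sorted(strata):
        pool = sorted(strata[stratum])
        selected[stratum] = rng.sample(pool, min(per_stratum, len(pool)))
    return chosen, selected


# Hypothetical sampling frame.
courts = {
    "Court 1": [("E01", "public lab"), ("E02", "private"), ("E03", "public lab")],
    "Court 2": [("E03", "public lab"), ("E04", "private"), ("E05", "private")],
    "Court 3": [("E06", "public lab"), ("E07", "public lab"), ("E08", "private")],
}

chosen_courts, selected = sample_experts(courts, n_courts=2, per_stratum=2)
print(chosen_courts)
print(selected)
```

Because the frame records every enumerated expert and stratum, the characteristics of those who decline can still be compared with those who take part, which is the point of the last bullet.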


To address non-ignorable non-response:

  • Non-response is producing the biggest biases in the studies that have been published.
  • Adjusting for non-response is necessary for the future of Black Box studies.
  • Results can be adjusted if those who conduct the studies release more data and increase transparency to aid collaboration.

Longer term solutions include:

  • Limiting who qualifies as an “expert” when testifying in court (existing parameters require little to no certification, education, or testing)
  • Institutionalized, regular discipline-wide testing with expectations of participation.
  • Requirements to share data from Black Box studies in more granular form.

NIJ Announces New Forensic Science Strategic Research Plan for 2022–2026


The National Institute of Justice (NIJ) Office of Investigative and Forensic Sciences has published its latest four-year Forensic Science Strategic Research Plan.

The plan focuses on improving the quality and practice of forensic science through innovative solutions that support research and development, testing and evaluation, technology, and information exchange.

As stated in the plan, “NIJ developed this Forensic Science Strategic Research Plan to communicate its research agenda and advance its forensic science research mission. The strategic priorities and objectives outlined in this plan closely parallel the opportunities and challenges faced by the forensic science community.”

NIJ identified five strategic research priorities, each of which includes a comprehensive set of objectives and action items.

According to the NIJ, the strategic research plan should interest many stakeholders, including crime laboratory professionals, researchers and technology developers, the legal community, and policymakers.

The Forensic Science Strategic Research plan is available to download at

New Study Explores Public Beliefs About the Reliability of Forensic Science


By Samantha Springer, a research assistant at the Center for Statistics and Applications in Forensic Evidence


As with many scientific fields, forensic science has faced public and expert criticism since its inception. In response, the discipline must find ways to increase confidence in its methods and usage. One potential means of doing so was examined in a recent paper by Carlos Miguel Ibaviosa and Jason M. Chin, who posited that increased transparency and openness could solve forensic science’s public image problem.

The paper “Beyond CSI: Calibrating public beliefs about the reliability of forensic science through openness and transparency” looks at the criticisms surrounding forensic science in three stages. To begin, the authors look closely at studies that have examined the CSI Effect. This hypothesis holds that procedural shows like CSI, which portray forensic science as infallible, give the public an unrealistic view of the field, which could in turn affect how forensic evidence is weighed in court. Most versions of the hypothesis assume this effect causes the public to view forensic science in an overly positive and trusting way. However, after reviewing studies that tested these hypotheses, the paper concludes that their findings are inconsistent and their methods poorly constructed.

After determining that the general public is not strongly swayed by versions of forensic science they see depicted on TV, the authors review studies performed over the past 11 years that have found the public’s view of forensic science to be one of skepticism.

Although the five studies reviewed were not all performed by the same researchers, their methods were relatively comparable, and all looked at participants’ ratings of reliability for multiple different forensic tests, including DNA and bitemark evidence.

Overall, data suggested a disconnect between what experts and research regard as accurate and what the public understands as being accurate.

An example of this can be found in DNA analysis. While viewed as the gold standard within the forensic science community, two studies conducted 11 years apart showed a decrease in public trust for DNA—from a 94% reliability rating in a 2008 study to 83% in 2019.

Public views of unvalidated methods such as bitemark analysis were also miscalibrated relative to actual scientific findings. One study conducted in 2015 found that the public rated the reliability of bitemark analysis at 89.26%, higher than the 88.15% rating given to the much more reliable fingerprint analysis.

The differences in language, sample size, and other sample characteristics across the studies prevent a definitive conclusion about public views of forensic science over time and their relation to expert findings on reliability. However, the authors suggest the findings still indicate a cause for concern for forensic evidence practitioners and others in the forensic science field.

Having reviewed these studies and found that the CSI Effect makes little impression on the public, the authors ask what, then, is responsible for the disconnect between experts and the public regarding the reliability of forensic evidence.

The suggested answer is that the ability of DNA methodologies to detect previous errors, which in many cases had led to wrongful convictions, was widely covered in the news, as were reports by academic bodies criticizing some of the methods responsible for such miscarriages of justice. This coverage could have contributed to an overall public distrust in forensic science that must now be mitigated, and thus the paper turns to possible ways in which the field can bolster its credibility.

The paper’s recommendations for improved public perception and credibility focus on three components supported by research:

  1. Epistemic trust. Epistemic trust is the trust in knowledge given to us by others. On the part of the public, this trust rests on the perceived competence of the researcher, the benevolence they show toward improving society, and the integrity with which they follow scientific principles. Researchers secure public epistemic trust by acknowledging mistakes and uncertainty in their work.
  2. The promotion of openness and transparency in the scientific field. When this is done, high-quality science will be distinguishable from low-quality science, as the public and scientists involved will be able to review the data and methods of different studies. Even an expressed intention of transparency has been shown to strengthen the epistemic trust of the field.
  3. Alignment with public expectations. Studies found that participants view questionable research practices, such as selective reporting, as highly morally unacceptable, despite their use not being outwardly illegal. Following these preferences will show a willingness to engage with the public as well as a dedication to good methodology.

Read the Study

Beyond CSI: Calibrating public beliefs about the reliability of forensic science through openness and transparency, Science & Justice, published online Feb. 17, 2022.

OSAC Footwear & Tire Subcommittee Develops Process Map

An overview of the Footwear and Tire Examination Process Map developed by the OSAC Footwear & Tire Subcommittee

By Samantha Springer, a research assistant at the Center for Statistics and Applications in Forensic Evidence (CSAFE)


On June 8, 2022, the Organization of Scientific Area Committees for Forensic Science’s (OSAC) Footwear & Tire Subcommittee published a current practice document for footwear and tire examination.

The 37-page document consists of multiple process maps that cover a range of practices in the field of footwear and tire examination, including casts, gel lifts, known and unknown assessments, and different types of substrates with or without the presence of blood. Additionally, the flowcharts cover administrative processes such as verification and reporting, technical assessments, and administrative assessments.

The current practice document defines its purpose as five-fold:

  • help improve efficiencies while reducing errors,
  • highlight gaps where further research or standardization would be beneficial,
  • assist with training new examiners,
  • develop specific laboratory policies, and
  • identify best practices.

The document represents current practices rather than best practices, so that practitioners can find the process their lab uses; it therefore does not necessarily endorse all the methodologies shown in the process maps. According to an article published by the National Institute of Standards and Technology (NIST), David Kanaris, Chair of the OSAC subcommittee, plans to release a more interactive version of the document in the future.

NIST facilitated the development of this process map through a collaboration between the NIST Forensic Science Program and OSAC’s Footwear & Tire Subcommittee.

Other OSAC subcommittees have released process maps for other forensic science areas, including speaker recognition, DNA, friction ridge examinations, and firearms examinations.

CSAFE researchers Alicia Carriquiry, CSAFE director, and Jacqueline Speir, an associate professor at West Virginia University, are members of the OSAC Footwear & Tire Subcommittee.

Learn more about CSAFE’s work on footwear impression analysis at