Webinar: Shining a Light on Black Box Studies

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Spring 2022 Webinar Series on Friday, April 22, 2022, from 11:00am-Noon CST. The presentation will be “Shining a Light on Black Box Studies.” 

Presenters:
Dr. Kori Khan
Assistant Professor – Iowa State University
Dr. Alicia Carriquiry
Director – CSAFE

Presentation Description:
The American criminal justice system heavily relies on conclusions reached by the forensic science community. In the last ten years, there has been an increased interest in assessing the validity of the methods used to reach such conclusions. For pattern comparison disciplines, however, this can be difficult because the methods employed rely on visual examinations and subjective determinations. Recently, “black box studies” have been put forward as the gold standard for estimating the rate of errors a discipline makes to assist judges in assessing the validity and admissibility of the analysis of forensic evidence. These studies have since been conducted for various disciplines and their findings are used by judges across the country to justify the admission of forensic evidence testimony. These black box studies suffer from flawed experimental designs and inappropriate statistical analyses. We show that these limitations likely lead to underestimated error rates and preclude researchers from drawing conclusions about a discipline’s error rates. With a view to future studies, we propose minimal statistical criteria for black box studies and describe some of the data that need to be available to plan and implement such studies.

The webinars are free and open to the public, but researchers, collaborators and members of the broader forensics and statistics communities are encouraged to attend. Each 60-minute webinar will allow for discussion and questions.

Webinar: Modeling And iNventory of Tread Impression System (MANTIS) – The development, deployment and application of an active footwear data collection system

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Spring 2022 Webinar Series on Thursday, March 24, 2022, from 11:00am-Noon CST. The presentation will be “Modeling And iNventory of Tread Impression System (MANTIS): The development, deployment and application of an active footwear data collection system.” 

Presenters:
Dr. Richard Stone
Associate Professor – Iowa State University
Dr. Susan VanderPlas
Assistant Professor – University of Nebraska-Lincoln

Presentation Description:
This session will detail the development, capabilities and successful deployment of the Modeling And iNventory of Tread Impression System (MANTIS). The MANTIS optics scanner takes real-time video of gait as the shoe comes in contact with the cover plate (the clear portion). It synchronizes a series of video cameras to create a detailed image of the shoe that can later be processed by software such as SIFT + RANSAC to create the tread pattern for comparison. In the four-camera configuration (cameras located in the housing), the cameras capture between 8 and 15 megapixels. The use of video optics is expandable to utilize the laser scanning option, though the current utilization focuses on optical capture, thus allowing for tread capture during dynamic movement, i.e., a person walking or running across the system.

The webinars are free and open to the public, but researchers, collaborators and members of the broader forensics and statistics communities are encouraged to attend. Each 60-minute webinar will allow for discussion and questions.

Webinar: Improving Forensic Decision Making: a Human-Cognitive Perspective

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Spring 2022 Webinar Series on Thursday, February 17th, 2022, from Noon-1:00 pm CST. The presentation will be “Improving Forensic Decision Making: a Human-Cognitive Perspective.” 

Presenter:
Itiel Dror
Cognitive Neuroscience Researcher – University College London

Presentation Description:
Humans play a critical role in forensic decision making. Drawing upon classic cognitive and psychological research on factors that influence and underpin expert decision making, this webinar will show the weaknesses and vulnerabilities in forensic decision making. The presenter will also propose a broad and versatile approach to strengthening forensic expert decisions.

Associated Reading:
Linear Sequential Unmasking–Expanded (LSU-E): A general approach for improving decision making as well as minimizing noise and bias

The webinars are free and open to the public, but researchers, collaborators and members of the broader forensics and statistics communities are encouraged to attend. Each 60-minute webinar will allow for discussion and questions.

Webinar: Using Mixture Models to Examine Group Differences: An Illustration Involving the Perceived Strength of Forensic Science Evidence

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Spring 2022 Webinar Series on Thursday, December 9th, 2021, from 9:00-10:00 am CT. The presentation will be “Using Mixture Models to Examine Group Differences: An Illustration Involving the Perceived Strength of Forensic Science Evidence.” 

Presenter:
Naomi Kaplan Damary, PhD
Lecturer – The Hebrew University of Jerusalem

Presentation Description:
Forensic examiners compare items to assess whether they originate from a common source. In reaching conclusions, they consider the probability of the observed similarities and differences under alternative assumptions regarding the source(s) of the items (i.e., same or different source). These conclusions can be reported in various ways, including likelihood ratios or random match probabilities. Thompson et al. (2018) examined how lay people perceive the strength of these reports through the use of paired comparison models, obtaining rank-ordered lists of the various statements and an indication of the perceived differences among them. The current study expands this research by examining whether the population is composed of sub-populations that interpret these statements differently and whether their differences can be characterized. A mixture model that allows for multiple sub-populations with possibly different rankings of the statements is fit to the data, and the possibility that covariates explain sub-population membership is considered. A deeper understanding of the way potential jurors perceive various forms of forensic reporting could improve communication in the courtroom.

Associated Reading:
Insights: Using Mixture Models to Examine Group Differences Among Jurors

The webinars are free and open to the public, but researchers, collaborators and members of the broader forensics and statistics communities are encouraged to attend. Each 60-minute webinar will allow for discussion and questions.

Webinar: A Survey of Fingerprint Examiners’ Attitudes towards Probabilistic Reporting

This event took place on September 22, 2021. A recording of the event can be found below.

Presenter:
Simon Cole
Professor – University of California, Irvine

Presentation Description:
Over the past decade, with increasing scientific scrutiny on forensic reporting practices, there have been several efforts to introduce statistical thinking and probabilistic reasoning into forensic practice. These efforts have been met with mixed reactions—a common one being skepticism, or downright hostility, towards this objective. For probabilistic reasoning to be adopted in forensic practice, more than statistical knowledge will be necessary. Social scientific knowledge will be critical to effectively understand the sources of concern and barriers to implementation. This study reports the findings of a survey of forensic fingerprint examiners about reporting practices across the discipline and practitioners’ attitudes and characterizations of probabilistic reporting. Overall, despite its adoption by a small number of practitioners, community-wide adoption of probabilistic reporting in the friction ridge discipline faces challenges. We found that almost no respondents currently report probabilistically. Perhaps more surprisingly, most respondents who claimed to report probabilistically, in fact, do not. Furthermore, we found that two-thirds of respondents perceive probabilistic reporting as ‘inappropriate’—their most common concern being that defense attorneys would take advantage of uncertainty or that probabilistic reports would mislead, or be misunderstood by, other criminal justice system actors. If probabilistic reporting is to be adopted, much work is still needed to better educate practitioners on the importance and utility of probabilistic reasoning in order to facilitate a path towards improved reporting practices.

The data and materials used in this research are publicly available at: https://data.csafe.iastate.edu/ExaminerSurveyStudies/.

Associated Reading:
Insights: Mt. Everest—We Are Going to Lose Many

Insights: Latent Print Quality in Blind Proficiency Testing

INSIGHT

Latent Print Quality in Blind Proficiency Testing: Using Quality Metrics to Examine Laboratory Performance

OVERVIEW

Organizations and leaders continually call for blind proficiency testing in modern forensic labs because it more accurately simulates routine examiner casework. In response, researchers funded by CSAFE worked with the Houston Forensic Science Center to assess the results of its blind quality control program and then applied quality metrics to the test materials to see if the quality of the prints affected examiners’ conclusions.

Lead Researchers

Brett O. Gardner
Maddisen Neuman
Sharon Kelley

Journal

Forensic Science International

Publication Date

May 7, 2021

THE GOALS

1. Examine the results of blind proficiency testing within a fingerprint examination unit of a crime laboratory.

2. Use available quality metrics software to measure the quality of the submitted prints.

3. See if there is an association between fingerprint quality and examiners’ conclusions.

The Studies

The Quality Division at the Houston Forensic Science Center inserted 376 prints into 144 blind test cases over a two-year period. In these cases, examiners determined if the prints were of sufficient quality to search in their Automated Fingerprint Identification System (AFIS). After searching AFIS for corresponding prints, they concluded whether or not the test prints were similar enough to make a Preliminary AFIS Association (PAA). Then, the Blind Quality Control (BQC) team judged the examiners’ accuracy.

Meanwhile, Gardner et al. entered the same test prints into LQMetrics, a software tool commonly used by fingerprint examiners that rates the quality of a print image on a scale of 0–100. The team scored print images with a quality score greater than 65 as “Good,” 45–65 as “Bad,” and lower than 45 as “Ugly.”
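The Good/Bad/Ugly thresholds described above can be written as a small helper. This is only a sketch: it does not call LQMetrics, the function name quality_category is hypothetical, and the example scores are made up for illustration. It assumes the 45–65 band is inclusive at both ends, as the wording above suggests.

```python
from collections import Counter


def quality_category(score: float) -> str:
    """Map a 0-100 quality score to the Good/Bad/Ugly categories.

    Thresholds follow the text: greater than 65 is Good,
    45 through 65 is Bad, and lower than 45 is Ugly.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 65:
        return "Good"
    if score >= 45:
        return "Bad"
    return "Ugly"


# Hypothetical scores, chosen to exercise the boundaries (not study data).
example_scores = [82, 65, 45, 44.9, 12]
categories = [quality_category(s) for s in example_scores]
counts = Counter(categories)

for score, category in zip(example_scores, categories):
    print(f"{score}: {category}")
```

Tallying the categories with Counter mirrors how the prints are grouped in the results that follow.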

Results

Examiner Conclusions Among Good, Bad and Ugly Latent Prints

Print quality (categorized as Good, Bad or Ugly) was significantly associated with examiner conclusions and ultimate accuracy. Note: N = 373 prints. There were 133 Good prints, 114 Bad prints and 126 Ugly prints.

FOCUS ON THE FUTURE

The correct source for prints submitted to AFIS appeared in the top ten results only 41.7% of the time, lower than an estimated 53.4% of the time based on the quality of such prints. This highlights the potential for blind proficiency testing to gauge the accuracy of the entire system, including AFIS.

This study only included prints that had the potential to be submitted to AFIS, dismissing images not labeled as latent prints. Future studies should include a full range of images to more closely reflect real-world casework.

Insights: Latent Print Comparison and Examiner Conclusions

INSIGHT

Latent Print Comparison and Examiner Conclusions: A Field Analysis of Case Processing in One Crime Laboratory

OVERVIEW

While research on error rates and identifying areas of bias and influence in forensic examination exists, most of it occurs under controlled conditions. With this in mind, researchers set out to investigate real-world latent print comparison-based casework performed by the Houston Forensic Science Center (HFSC) and to assess the results of their latent print analyses for an entire year.

Lead Researchers

Brett O. Gardner
Sharon Kelley 
Maddisen Neuman

Journal

Forensic Science International

Publication Date

December 2, 2020

THE GOALS

1. Analyze the HFSC latent print unit’s 2018 casework and describe examiner conclusions.

2. Explore what factors might have affected the examiners’ decisions.

3. Establish the extent of differences between individual examiners’ conclusions.

The Study

Researchers gathered data from JusticeTrax, HFSC’s laboratory information management system. With this, they looked at 20,494 latent print samples the HFSC team examined in 2018. In total, 17 different examiners submitted reports that year. All examiners were certified by the International Association for Identification and had anywhere from 5 to 36 years of experience in the field.

When provided a latent print for comparison, the examiners first checked if the print had enough usable data to enter into an Automated Fingerprint Identification System (AFIS). If so, the examiners then reached one of three conclusions based on AFIS results:

No Association: The print is not a potential match with any known print in the AFIS database.

Preliminary AFIS Association (PAA): The print is a potential match with a known print in the AFIS database.

Reverse Hit: The print is not a potential match with any known print in the AFIS database, but later matches a newly added record print.

Results

1. 44.8% of the prints examined had enough usable data to enter into AFIS. Out of the 11,812 prints entered into AFIS, only 20.7% (2,429 prints) resulted in a PAA.

2. Examiners were slightly more likely to conclude a print was sufficient to enter into AFIS in cases involving a person offense (a crime committed against a person).

3. The types of AFIS software used produced vastly different results. The county-level AFIS (called MorphoTrak) and the federal-level AFIS (called Next Generation Identification, or NGI) were both nearly five times more likely to result in a PAA than the state-level AFIS (called NEC).

4. Individual examiners applied drastically different standards for whether a print had enough usable data to enter into AFIS, and again for whether the AFIS results could be considered a PAA. PAA rates differed by roughly a factor of two: one examiner concluded 13.3% of their AFIS results were PAAs, while another concluded 27.1% were.

FOCUS ON THE FUTURE

The major differences between the county, state and federal-level AFIS software indicate that more research is needed on AFIS databases to increase their reliability across the board.

These results only reflect the work of one crime lab over the course of one year. Future research should be conducted with multiple labs in various locations.

HFSC made significant changes to its workflow in recent years, which may contribute to the disparity in examiner conclusions.

Insights: Implementing Blind Proficiency Testing in Forensic Laboratories

INSIGHT

Implementing Blind Proficiency Testing in Forensic Laboratories: Motivation, Obstacles, and Recommendations

OVERVIEW

Accredited forensic laboratories are required to conduct proficiency testing, but most rely solely on declared proficiency tests. A 2014 study showed that only 10% of forensic labs in the United States performed blind proficiency testing, whereas blind tests are standard in other fields, including medical and drug testing laboratories. Researchers wanted to identify the barriers to widespread blind proficiency testing and generate solutions for removing these obstacles. After reviewing the existing research, they concluded that they needed to convene a meeting of experts to establish an understanding of the challenges to implementation.

Lead Researchers

Robin Mejia
Maria Cuellar
Jeff Salyards

Journal

Forensic Science International: Synergy

Publication Date

September 2020

Participants

CSAFE met with laboratory directors and quality managers from seven forensic laboratory systems in the eastern US and the Houston Forensic Science Center. Two of the quality managers represented the Association of Forensic Quality Assurance Managers (AFQAM). In addition, several professors, graduate students and researchers from three universities attended the meeting.

APPROACH AND METHODOLOGY

1. Compare blind proficiency testing to declared testing, then have participants discuss the potential advantages of establishing blind testing as standard.

2. Facilitate and document a discussion of the logistical and cultural barriers labs might face when adopting blind testing. Use this to create a list of challenges.

3. Collect and analyze suggested steps labs can take to overcome the list of challenges to implementing blind proficiency testing.

Challenges and Solutions

Challenge: Realistic test case creation can be complex.
Proposed solution: Quality managers develop the expertise to create test cases; laboratories create a shared evidence bank.

Challenge: The development of realistic submission materials may be difficult.
Proposed solution: The QA staff must develop the knowledge locally to ensure the test evidence conforms with a jurisdiction’s typical cases.

Challenge: Cost may be prohibitive.
Proposed solution: Multiple laboratories can share resources and make joint purchases; external test providers could develop materials to lower the cost.

Challenge: Tests must be submitted to the lab by an outside law enforcement agency (LEA).
Proposed solution: The choice of LEA should be made locally, based on the relationship between lab management and the LEA.

Challenge: Not all Laboratory Information Management Systems (LIMS) are equipped to easily flag and track test cases.
Proposed solution: Labs can choose to either use a LIMS with this functionality or develop an in-house system to flag test cases.

Challenge: Labs must ensure results are not released as real cases.
Proposed solution: The QA team will need to work with individuals in other units of the lab to prevent accidental releases. It may also be useful to have contacts in the submitting LEA or local District Attorney’s office.

Challenge: Proficiency tests could impact metrics, so labs need to decide whether to include them.
Proposed solution: These decisions must be made on a lab-by-lab basis; a consortium of labs or organizations such as AFQAM can aid in standardization.

Challenge: Blind testing challenges the cultural myth of 100% accuracy.
Proposed solution: Senior lab management must champion blind testing and show that adding it as a tool will both demonstrate the quality of examiners and help labs discover and remedy errors.

Learn More

Watch the HFSC webinar, “Crime Lab Proficiency and Quality Management.”

Dr. Robin Mejia discusses “Implementing Blind Proficiency Testing” in the CSAFE webinar.

Insights: Psychometric Analysis of Forensic Examiner Behavior

INSIGHT

Psychometric Analysis of Forensic Examiner Behavior

OVERVIEW

Understanding how fingerprint examiners’ proficiency and behavior influence their decisions when interpreting evidence requires the use of many analytical models. Because final source identifications still rely on examiners’ complex and subjective interpretation of the evidence, researchers sought to better identify and study uncertainty in examiners’ decision making. By applying novel methods like Item Response Theory (IRT) to existing tools like error rate studies, the team proposes a new approach to account for differences among examiners and in task difficulty levels.

Lead Researchers

Amanda Luby
Anjali Mazumder
Brian Junker

Publication Date

June 13, 2020

THE GOALS

1. Survey recent advances in psychometric analysis of forensic decision-making.

2. Use behavioral models from the field of Item Response Theory to better understand the operating characteristics of the identification tasks that examiners perform.

APPROACH AND METHODOLOGY

The Data

A 2011 FBI Black Box study assigned 169 fingerprint examiners a selection of items to analyze. For each pair of prints, examiners provided a latent print evaluation, a source determination, a reason and a rating of the difficulty of the task.

Key Definitions

Psychometrics

Using factors such as aptitudes and personality traits to study the differences between individuals.

Item Response Trees (IRTrees)

Visual representation of each decision an examiner makes in the process of performing an identification task. IRTrees are based on IRT, which attempts to explain the connection between the properties of a test item (here, a piece of fingerprint evidence) and an individual’s performance in response to that item (here, a fingerprint examiner’s).

Cultural Consensus Theory (CCT)

A method that facilitates the discovery and description of consensus among a group of people with shared beliefs. For this study, CCT helps identify the common knowledge and beliefs among fingerprint examiners: things that examiners may take for granted but that laypeople would not necessarily know.

APPLYING IRTREES AND CCT TO FINGERPRINT ANALYSIS

1. Researchers applied the Rasch model to the data to separate a latent print’s difficulty level from an examiner’s proficiency. This allowed comparison to the existing method of studying error rates.

2. Then they constructed IRTrees to model a fingerprint examiner’s decision-making process when deciding whether a print is a positive match, negative match, inconclusive, or has no latent value. (See Figure 1.)

3. Finally, the team used IRTrees and Cultural Consensus Theory to create “answer keys” (a set of reasons and shared knowledge) that provide insight into how a fingerprint examiner arrives at an “inconclusive” or “no value” decision. (See Figure 2.)
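As a rough illustration of the Rasch model named in the first step, the standard form of the model gives the probability of a correct response as a logistic function of the difference between a person's proficiency and an item's difficulty. The sketch below is not the researchers' code: the function name, the examiner and print labels, and all numeric values are hypothetical, chosen only to show how the two quantities trade off.

```python
import math


def rasch_probability(proficiency: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    Standard form: P = 1 / (1 + exp(-(proficiency - difficulty))),
    with both quantities on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))


# Hypothetical values on the logit scale (not taken from the study).
examiners = {"examiner_a": 1.5, "examiner_b": 0.0}
prints = {"easy_print": -1.0, "hard_print": 2.0}

# Probability that each hypothetical examiner responds correctly to each print.
table = {
    (name, item): rasch_probability(theta, b)
    for name, theta in examiners.items()
    for item, b in prints.items()
}

for (name, item), p in table.items():
    print(f"{name} on {item}: {p:.2f}")
```

Because proficiency and difficulty sit on a shared scale in this model, a harder print lowers the probability for every examiner, and a more proficient examiner has a higher probability on every print.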

Figure 1

Visual representation of latent print analysis

Figure 2

How examiners arrive at “inconclusive” or “no value” decisions

KEY TAKEAWAYS FOR PRACTITIONERS

1. Using IRT models provides substantial improvements over current examiner error rate studies. These include the ability to justifiably compare examiner proficiencies even if the examiners did not do the same identification tasks, and the ability to see the influence of task difficulty in examiner proficiency estimates.

2. IRTrees give researchers the ability to accurately model the complex decision-making in fingerprint identification tasks, which involves much more than simply stating a print is a “match” or a “non-match.” This reveals the skill involved in fingerprint examination work.

3. Examiners tend to overrate the difficulty of middling-difficulty tasks, while underrating the difficulty of extremely easy or extremely difficult tasks.

FOCUS ON THE FUTURE

This analysis was somewhat limited by available data; for confidentiality and privacy reasons, the FBI Black Box Study provides neither the reference prints used nor the personal details of the examiners themselves. Future collaboration with experts, both in fingerprint analysis and human decision making, can provide more detailed data and thus improve the models.