
INSIGHT

Latent Print Quality in Blind Proficiency Testing:

Using Quality Metrics to Examine Laboratory Performance

OVERVIEW

Organizations and leaders increasingly call for blind proficiency testing in modern forensic labs because it more accurately simulates routine examiner casework. In response, researchers funded by CSAFE worked with the Houston Forensic Science Center to assess the results of its blind quality control program, then applied quality metrics to the test materials to see whether print quality affected examiners' conclusions.

Lead Researchers

Brett O. Gardner
Maddisen Neuman
Sharon Kelley

Journal

Forensic Science International

Publication Date

May 7, 2021

THE GOALS

1

Examine the results of blind proficiency testing within a fingerprint examination unit of a crime laboratory.

2

Use available quality metrics software to measure the quality of the submitted prints.

3

See if there is an association between fingerprint quality and examiners’ conclusions.

The Studies

The Quality Division at the Houston Forensic Science Center inserted 376 prints into 144 blind test cases over a two-year period. In these cases, examiners determined whether the prints were of sufficient quality to search in the lab's Automated Fingerprint Identification System (AFIS). After searching AFIS for corresponding prints, they concluded whether the test prints were similar enough to make a Preliminary AFIS Association (PAA). The Blind Quality Control (BQC) team then assessed the accuracy of the examiners' conclusions.


Meanwhile, Gardner et al. entered the same test prints into LQMetrics, a software tool commonly used by fingerprint examiners that rates the quality of a print image on a scale of 0–100. The team labeled print images with a quality score above 65 as “Good,” from 45 to 65 as “Bad,” and below 45 as “Ugly.”
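Under these cutoffs, the bucketing rule is mechanical. Below is a minimal sketch in Python; the function name is ours, and the boundary scores 45 and 65 are assigned to “Bad” per the 45–65 range above.

```python
def quality_category(score: float) -> str:
    """Bucket an LQMetrics quality score (0-100) into the study's
    Good / Bad / Ugly categories. Boundary scores of exactly 45 or
    65 fall in "Bad", per the 45-65 range given in the brief."""
    if score > 65:
        return "Good"
    if score >= 45:
        return "Bad"
    return "Ugly"

# Example: quality_category(72) -> "Good"; quality_category(50) -> "Bad"
```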

Results

Examiner Conclusions Among Good, Bad and Ugly Latent Prints

Print quality (categorized as Good, Bad or Ugly) was significantly associated with examiner conclusions and with ultimate accuracy. Note: N = 373 prints (133 Good, 114 Bad and 126 Ugly).
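The brief does not say which statistical test underlies “significantly associated”; a chi-square test of independence on the quality-by-conclusion contingency table is one standard way to check such an association. In the sketch below, the row totals match the reported category counts, but the per-conclusion splits are invented placeholders for illustration only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: print quality (Good, Bad, Ugly).
# Columns: hypothetical conclusion categories, e.g.
# (PAA, entered into AFIS without PAA, not entered).
# Cell counts are placeholders; only the row totals
# (133, 114, 126) come from the brief.
table = np.array([
    [90, 30, 13],   # Good  (placeholder split of 133)
    [50, 40, 24],   # Bad   (placeholder split of 114)
    [20, 40, 66],   # Ugly  (placeholder split of 126)
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```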

FOCUS ON THE FUTURE


The correct source for prints submitted to AFIS appeared in the top ten results only 41.7% of the time, lower than the estimated 53.4% expected given the quality of those prints. This highlights the potential for blind proficiency testing to gauge the accuracy of the entire system, including AFIS.

This study included only prints with the potential to be submitted to AFIS, excluding images not labeled as latent prints. Future studies should include the full range of images to more closely reflect real-world casework.


INSIGHT

Latent Print Comparison and Examiner Conclusions

A Field Analysis of Case Processing in One Crime Laboratory

OVERVIEW

Research on error rates and on sources of bias and influence in forensic examination exists, but most of it has been conducted under controlled conditions. With this in mind, researchers set out to investigate real-world latent print comparison casework at the Houston Forensic Science Center (HFSC) and to assess the results of its latent print analyses over an entire year.

Lead Researchers

Brett O. Gardner
Sharon Kelley 
Maddisen Neuman

Journal

Forensic Science International

Publication Date

December 2, 2020

THE GOALS

1

Analyze the HFSC latent print unit’s 2018 casework and describe examiner conclusions.

2

Explore what factors might have affected the examiners’ decisions.

3

Establish the extent of differences between individual examiners' conclusions.

The Study

Researchers gathered data from JusticeTrax, HFSC's laboratory information management system, and examined the 20,494 latent print samples the HFSC team processed in 2018. In total, 17 examiners submitted reports that year. All were certified by the International Association for Identification and had from 5 to 36 years of experience in the field.

When provided a latent print for comparison, the examiners first checked whether the print had enough usable data to enter into an Automated Fingerprint Identification System (AFIS). If so, the examiners then reached one of three conclusions based on the AFIS results (a code sketch of this flow appears after the list):

No Association: The print is not a potential match with any known print in the AFIS database

Preliminary AFIS Association (PAA): The print is a potential match with a known print in the AFIS database

Reverse Hit: The print is not a potential match with any known print in the AFIS database, but later matches to newly added record prints
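Read as code, the workflow above is a small triage function. The sketch below is illustrative only: the function and parameter names are ours, and a real Reverse Hit occurs later in time, when newly added record prints are searched against earlier entries, which a single flag can only approximate.

```python
from enum import Enum
from typing import Optional

class Conclusion(Enum):
    NO_ASSOCIATION = "No Association"
    PAA = "Preliminary AFIS Association"
    REVERSE_HIT = "Reverse Hit"

def triage_print(sufficient_for_afis: bool,
                 afis_potential_match: bool,
                 later_record_match: bool = False) -> Optional[Conclusion]:
    """Sketch of the workflow described above: a suitability check,
    an AFIS search, then one of three conclusions. Returns None when
    the print lacks enough usable data to enter into AFIS."""
    if not sufficient_for_afis:
        return None                        # never searched in AFIS
    if afis_potential_match:
        return Conclusion.PAA              # potential match in the database
    if later_record_match:
        return Conclusion.REVERSE_HIT      # matched a newly added record print
    return Conclusion.NO_ASSOCIATION       # no potential match found
```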

Results

1

44.8% of the prints examined had enough usable data to enter into AFIS.


Of the 11,812 prints entered into AFIS, only 20.7% (2,429 prints) resulted in a PAA.


2

Examiners were slightly more likely to conclude a print was sufficient to enter into AFIS in cases involving a person offense (a crime committed against a person).

3

The AFIS platforms produced vastly different results: the county-level AFIS (MorphoTrak) and the federal-level AFIS (Next Generation Identification, or NGI) were each nearly five times more likely to result in a PAA than the state-level AFIS (NEC).

4

Individual examiners applied drastically different standards for whether a print had enough usable data to enter into AFIS, and again for whether the AFIS results could be considered a PAA. PAA rates varied by roughly a factor of two: one examiner concluded that 13.3% of their AFIS results were PAAs, while another reached 27.1%.

FOCUS ON THE FUTURE

 

The major differences between the county-, state- and federal-level AFIS software indicate that more research on AFIS databases is needed to increase their reliability across the board.

These results only reflect the work of one crime lab over the course of one year. Future research should be conducted with multiple labs in various locations.

HFSC made significant changes to its workflow in recent years, which may have contributed to the disparities in examiner conclusions.


INSIGHT

Implementing Blind Proficiency Testing in Forensic Laboratories:

Motivation, Obstacles, and Recommendations

OVERVIEW

Accredited forensic laboratories are required to conduct proficiency testing, but most rely solely on declared proficiency tests. A 2014 study showed that only 10% of forensic labs in the United States performed blind proficiency testing, even though blind tests are standard in other fields, including medical and drug testing laboratories. Researchers wanted to identify the barriers to widespread blind proficiency testing and generate solutions for removing them. After reviewing the existing research, they convened a meeting of experts to establish an understanding of the challenges to implementation.

Lead Researchers

Robin Mejia
Maria Cuellar
Jeff Salyards

Journal

Forensic Science International: Synergy

Publication Date

September 2020

Participants

CSAFE met with laboratory directors and quality managers from seven forensic laboratory systems in the eastern US and the Houston Forensic Science Center. Two of the quality managers represented the Association of Forensic Quality Assurance Managers (AFQAM). In addition, several professors, graduate students and researchers from three universities attended the meeting.

APPROACH AND METHODOLOGY

1

Compare blind proficiency testing to declared testing, then have participants discuss the potential advantages of establishing blind testing as standard.

2

Facilitate and document a discussion of the logistical and cultural barriers labs might face when adopting blind testing. Use this to create a list of challenges.

3

Collect and analyze suggested steps labs can take to overcome the list of challenges to implementing blind proficiency testing.

Challenges and Solutions

Challenge: Realistic test cases can be complex to create.
Proposed solution: Quality managers develop the expertise to create test cases; laboratories create a shared evidence bank.

Challenge: Realistic submission materials may be difficult to develop.
Proposed solution: QA staff develop the knowledge locally to ensure test evidence conforms to a jurisdiction's typical cases.

Challenge: The cost may be prohibitive.
Proposed solution: Multiple laboratories can share resources and make joint purchases; external test providers could develop materials to lower the cost.

Challenge: Tests must be submitted to the lab by an outside law enforcement agency (LEA).
Proposed solution: Which LEA to work with should be decided locally, based on the relationship between lab management and the LEA.

Challenge: Not all laboratory information management systems (LIMS) can easily flag and track test cases.
Proposed solution: Labs can either use a LIMS with this functionality or develop an in-house system to flag test cases.

Challenge: Labs must ensure test results are not released as real cases.
Proposed solution: The QA team works with individuals in other units of the lab to prevent accidental releases; contacts in the submitting LEA or the local District Attorney's office can also help.

Challenge: Proficiency tests could affect a lab's performance metrics, so labs must decide whether to include them.
Proposed solution: These decisions are made on a lab-by-lab basis; a consortium of labs or organizations such as AFQAM can aid standardization.

Challenge: Blind testing challenges the cultural myth of 100% accuracy.
Proposed solution: Senior lab management must champion blind testing, showing that it both demonstrates examiner quality and helps labs discover and remedy errors.

Learn More


Watch the HFSC webinar, “Crime Lab Proficiency and Quality Management.”

Dr. Robin Mejia discusses “Implementing Blind Proficiency Testing” in a CSAFE webinar.


INSIGHT

Psychometric Analysis of Forensic Examiner Behavior

OVERVIEW

Understanding how fingerprint examiners' proficiency and behavior influence their decisions when interpreting evidence requires many analytical models. Because final source identifications still rely on complex, subjective interpretation of the evidence, researchers sought to better identify and study the uncertainty in examiners' decision making. By applying methods from Item Response Theory (IRT) to existing tools such as error rate studies, the team proposes a new approach that accounts for differences among examiners and in task difficulty.

Lead Researchers

Amanda Luby
Anjali Mazumder
Brian Junker

Publication Date

June 13, 2020

THE GOALS

1

Survey recent advances in psychometric analysis of forensic decision-making.

2

Use behavioral models from the field of Item Response Theory to better understand the operating characteristics of the identification tasks that examiners perform.

APPROACH AND METHODOLOGY

The Data

The 2011 FBI Black Box study assigned 169 fingerprint examiners a selection of items to analyze. For each pair of prints, examiners recorded a latent print evaluation, a source determination, a reason and a rating of the task's difficulty.
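Each response in the study can be thought of as one record per examiner-item pair. The sketch below shows one possible representation; the field names and example categories are our assumptions, not the study's actual coding scheme.

```python
from dataclasses import dataclass

@dataclass
class BlackBoxResponse:
    """One examiner's response to one print pair, holding the four
    recorded fields described above. Names are illustrative."""
    examiner_id: str
    item_id: str
    evaluation: str            # latent print evaluation (e.g., value / no value)
    source_determination: str  # e.g., identification / exclusion / inconclusive
    reason: str                # stated reason for the determination
    difficulty_rating: str     # examiner's rating of the task's difficulty
```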

Key Definitions

Psychometrics

Using factors such as aptitudes and personality traits to study differences between individuals.

Item Response Trees (IRTrees)

A visual representation of each decision an examiner makes in the process of performing an identification task. IRTrees are based on IRT, which attempts to explain the connection between the properties of a test item (here, a piece of fingerprint evidence) and an individual's (here, a fingerprint examiner's) performance in response to that item.

Cultural Consensus Theory (CCT)

A method that facilitates the discovery and description of consensus among a group of people with shared beliefs. For this study, CCT helps identify the common knowledge and beliefs among fingerprint examiners: things that examiners may take for granted but that laypeople would not necessarily know.

APPLYING IRTREES AND CCT TO FINGERPRINT ANALYSIS

1

Researchers first fit the data with the Rasch model to separate a latent print's difficulty level from an examiner's proficiency, which allowed comparison with the existing approach of studying error rates. (A minimal sketch of the model appears after this list.)

2

They then constructed IRTrees to model a fingerprint examiner's decision-making process when deciding whether a print is a positive match, a negative match, inconclusive, or has no latent value. (See Figure 1; a code sketch follows this list.)

3

Finally, the team used IRTrees and Cultural Consensus Theory to create “answer keys” (sets of reasons and shared knowledge) that provide insight into how a fingerprint examiner arrives at an “inconclusive” or “no value” decision. (See Figure 2)
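For step 1, the Rasch model expresses the probability of a correct response as a contest between examiner proficiency and print difficulty. The sketch below is a minimal illustration; the parameter values are invented, whereas the study estimated these quantities from the Black Box data.

```python
import numpy as np

def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b))),
    where theta is examiner proficiency and b is item (print) difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Illustrative values only: the same examiner (theta = 1.0) facing
# an easy print (b = -1.0) versus a hard print (b = 2.0).
print(rasch_prob(theta=1.0, b=-1.0))  # ~0.88
print(rasch_prob(theta=1.0, b=2.0))   # ~0.27
```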
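For step 2, an IRTree chains binary decisions, each with its own Rasch-style node, into probabilities for the four outcomes named above. The tree shape below follows the decision sequence described for Figure 1; the node ordering and the difficulty parameters are our assumptions for illustration.

```python
import numpy as np

def node_prob(theta: float, b: float) -> float:
    """Branch probability at a single decision node (Rasch-style)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def irtree_outcome_probs(theta: float, b_value: float,
                         b_conclusive: float, b_match: float) -> dict:
    """Chain three binary decisions (has value? conclusive? positive match?)
    into probabilities for the four outcomes; the probabilities sum to 1."""
    p_value = node_prob(theta, b_value)       # print has latent value
    p_concl = node_prob(theta, b_conclusive)  # examiner reaches a conclusion
    p_match = node_prob(theta, b_match)       # the conclusion is a positive match
    return {
        "no latent value": 1 - p_value,
        "inconclusive":    p_value * (1 - p_concl),
        "positive match":  p_value * p_concl * p_match,
        "negative match":  p_value * p_concl * (1 - p_match),
    }

# Illustrative parameters only:
print(irtree_outcome_probs(theta=0.5, b_value=-1.0, b_conclusive=0.0, b_match=0.2))
```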

Figure 1

Visual representation of latent print analysis

Figure 2

How examiners arrive at “inconclusive” or “no value” decisions

KEY TAKEAWAYS FOR PRACTITIONERS

1

IRT models provide substantial improvements over current examiner error rate studies, including the ability to justifiably compare examiner proficiencies even when examiners did not perform the same identification tasks, and to see the influence of task difficulty in examiner proficiency estimates.

2

IRTrees give researchers the ability to accurately model the complex decision-making in fingerprint identification tasks, which involves much more than simply stating that a print is a “match” or a “non-match.” This reveals the skill involved in fingerprint examination work.

3

Examiners tend to overrate the difficulty of middling-difficulty tasks, while underrating the difficulty of extremely easy or extremely difficult tasks.

FOCUS ON THE FUTURE


This analysis was somewhat limited by the available data: for confidentiality and privacy reasons, the FBI Black Box study provides neither the reference prints used nor personal details about the examiners. Future collaboration with experts in fingerprint analysis and human decision making can provide more detailed data and thus improve the models.