
INSIGHTS

Mock Jurors’ Evaluation of Firearm Examiner Testimony

OVERVIEW

Traditionally, firearm and toolmark experts have testified that a weapon leaves “unique” marks on bullets and casings, permitting a “source identification” conclusion. While scientific organizations have called this sort of categorical assertion into question, jurors still place a great deal of weight on a firearms expert’s testimony.

To examine the weight jurors place on such testimony, researchers conducted two studies: the first evaluated whether more cautious language influenced jurors’ opinions of expert testimony, and the second measured whether cross-examination altered those opinions.

Lead Researchers

Brandon L. Garrett
Nicholas Scurich
William E. Crozier

Journal

Law & Human Behavior

Publication Date

2020

Goals

The team tested four hypotheses in these studies:

1. Jurors will accord significant weight to testimony that declares a categorical “match” between two casings.

2. More cautious language in a firearms expert’s testimony will not change jurors’ opinions.

3. Only the most cautious language (i.e., “cannot exclude the gun”) will lower guilty verdicts.

4. Cross-examination will lower guilty verdicts, depending on the specific language used.

The Studies

Study 1:

1,420 participants read a synopsis of a criminal case that included the testimony of a firearms expert. The expert gave one of seven specifically worded conclusions, ranging from a simple “match,” to a more cautious “reasonable degree of ballistic certainty,” to “cannot be excluded.”

The participants then decided whether they would convict based on the testimony.

Study 2:

1,260 participants were given the same synopsis, with two important changes:

  • The expert’s testimony had three possible conclusions (inconclusive, a conclusive match, or a cautious
    “cannot be excluded”) rather than seven.
  • Some participants also heard cross-examination of the firearms expert.

The participants again decided whether they would convict the defendant and rated the testimony’s credibility.

Results

Study 1:

Figure 1. Proportion of guilty verdicts with 95% confidence intervals.

  • Compared to an inconclusive result, a “match” tripled the rate of guilty verdicts. Variations in how the “match” was described did not affect verdicts.
  • The sole exception was when the match was described as “the defendant’s gun ‘cannot be excluded’ as the source.” That phrasing doubled, rather than tripled, the rate of guilty verdicts compared to an inconclusive result.

Study 2:

Figure 2. Proportion of guilty verdicts (with 95% confidence intervals) in each experimental condition.

  • Cross-examination did not help jurors consistently discount firearms conclusions. This is consistent with prior work showing mixed effects of cross-examination on jury perceptions of the strength of evidence.
  • “Cannot exclude” and “identification” conclusions led to significantly more guilty verdicts than the “inconclusive” condition.

Focus on the future

 

While it is unfortunate that more cautious language does not affect jurors’ decisions, there is no downside to adopting it, because it can prevent misleading or overstated conclusions.

Future studies should provide video testimony and discussion to better mimic a real-world trial.

The methods that firearms experts use have not been adequately tested, so jurors cannot accurately judge the strength of the evidence or the expert’s proficiency. This requires further research into the validity and reliability of firearms comparison methods.


INSIGHT

Latent Print Comparison and Examiner Conclusions

A Field Analysis of Case Processing in One Crime Laboratory

OVERVIEW

While research exists on error rates and on sources of bias and influence in forensic examination, most of it is conducted under controlled conditions. With this in mind, researchers set out to investigate real-world latent print comparison casework performed by the Houston Forensic Science Center (HFSC) and to assess the results of its latent print analyses over an entire year.

Lead Researchers

Brett O. Gardner
Sharon Kelley 
Maddisen Neuman

Journal

Forensic Science International

Publication Date

December 2, 2020

THE GOALS

1. Analyze the HFSC latent print unit’s 2018 casework and describe examiner conclusions.

2. Explore what factors might have affected the examiners’ decisions.

3. Establish the extent of differences between individual examiners’ conclusions.

The Study

Researchers gathered data from JusticeTrax, HFSC’s laboratory information management system. With this, they looked at 20,494 latent print samples the HFSC team examined in 2018. In total, 17 different examiners submitted reports that year. All examiners were certified by the International Association for Identification and had anywhere from 5 to 36 years of experience in the field.

When provided a latent print for comparison, the examiners first checked if the print had enough usable data to enter into an Automated Fingerprint Identification System (AFIS). If so, the examiners then made one of three conclusions based on AFIS results:

No Association: The print is not a potential match with any known print in the AFIS database

Preliminary AFIS Association (PAA): The print is a potential match with a known print in the AFIS database

Reverse Hit: The print is not a potential match with any known print in the AFIS database, but later matches to newly added record prints
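The workflow described above can be summarized as a small decision function. The sketch below is a schematic of the process as described in this brief, not HFSC’s actual case-processing software:

```python
def latent_print_outcome(sufficient_for_afis, afis_candidates, later_record_match=False):
    """Schematic of the HFSC latent print workflow described above.
    Illustrative only; not HFSC's actual case-processing logic."""
    if not sufficient_for_afis:
        return "not entered into AFIS"          # print lacked usable data
    if afis_candidates:
        return "Preliminary AFIS Association"   # potential match in the database
    if later_record_match:
        return "Reverse Hit"                    # matched a record print added later
    return "No Association"

print(latent_print_outcome(True, afis_candidates=["candidate 1"]))
# Preliminary AFIS Association
```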

Results

1. 44.8% of the prints examined had enough usable data to enter into AFIS. Of the 11,812 prints entered into AFIS, only 20.7% (2,429 prints) resulted in a PAA.

2. Examiners were slightly more likely to conclude a print was sufficient to enter into AFIS in cases involving a person offense (a crime committed against a person).

3. The types of AFIS software used produced vastly different results: the county-level AFIS (MorphoTrak) and the federal-level AFIS (Next Generation Identification, or NGI) were each nearly five times more likely to return a PAA than the state-level AFIS (NEC).

4. Individual examiners applied drastically different standards, both for whether a print had enough usable data to enter into AFIS and for whether the AFIS results could be considered a PAA. PAA rates varied roughly twofold across examiners: one examiner judged 13.3% of their AFIS results to be PAAs, while another judged 27.1% to be.

FOCUS ON THE FUTURE

 

The major differences between the county-, state- and federal-level AFIS software indicate that more research is needed on AFIS databases to increase their reliability across the board.

These results only reflect the work of one crime lab over the course of one year. Future research should be conducted with multiple labs in various locations.

HFSC made significant changes to its workflow in recent years, which may have contributed to the disparity in examiner conclusions.


INSIGHTS

Treatment of Inconclusives in the AFTE Range of Conclusions

OVERVIEW

Several studies in recent years have estimated the error rates of firearm examiners, most reporting very small error rates overall. However, how these error rates were calculated, particularly how each study treated inconclusive results, differed significantly between studies. Researchers funded by CSAFE revisited these studies to see how inconclusive results were treated and how these differences affected the overall error rate calculations.

Lead Researchers

Heike Hofmann
Susan Vanderplas
Alicia Carriquiry

Journal

Law, Probability and Risk

Publication Date

September 2020

THE GOALS

1. Survey various studies that assess the error rates of firearms examinations.

2. Determine the differences in how inconclusives are treated in each study.

3. Identify areas where these studies can be improved.

The Study

Hofmann et al. surveyed the most cited black box studies involving firearm and toolmark analysis. These studies varied in structure, using either closed-set or open-set data, and were conducted in different regions: the United States and Canada, or the European Union. The most relevant difference, however, was how each study treated inconclusive results.

All studies used one of three methods to treat inconclusives:

  • Option 1: Exclude the inconclusive from the error rate.
  • Option 2: Include the inconclusive as a correct result.
  • Option 3: Include the inconclusive as an incorrect result.
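The practical impact of this choice is easy to see with a small calculation. Below is a minimal Python sketch using hypothetical counts for a set of different-source comparisons; the numbers are invented for illustration and are not taken from any surveyed study:

```python
# Hypothetical tallies for different-source comparisons in a black box study.
eliminations = 180     # correct decisions for different-source pairs
identifications = 4    # false identifications (errors)
inconclusives = 16     # inconclusive decisions

conclusive = eliminations + identifications

rate_1 = identifications / conclusive                     # Option 1: exclude inconclusives
rate_2 = identifications / (conclusive + inconclusives)   # Option 2: inconclusives correct
rate_3 = (identifications + inconclusives) / (conclusive + inconclusives)  # Option 3: incorrect

print(f"Option 1: {rate_1:.3f}  Option 2: {rate_2:.3f}  Option 3: {rate_3:.3f}")
# Option 1: 0.022  Option 2: 0.020  Option 3: 0.100
```

Under these made-up counts, the three options yield error rates of roughly 2.2%, 2.0% and 10.0%, which is why the treatment of inconclusives can dominate comparisons between studies.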

Key Terms:

Black Box Study: a study that evaluates only the correctness of a participant’s decisions.

Closed-set Study: one in which all known and questioned samples come from the same source.

Open-set Study: one in which the questioned samples may come from outside sources.

Results

Option 1 was deemed inappropriate for calculating accurate error rates. Option 2 is useful for estimating the error rates of individual examiners, while Option 3 reflects the error rate of the examination process itself.

Examiners tended to lean toward identification over inconclusive or elimination. In addition, they were far more likely to reach an inconclusive result with different-source evidence, which should have been an elimination in nearly all cases.

Process errors occurred at higher rates than examiner errors.

Design issues created a bias toward the prosecution: closed-set studies in which all samples came from the same source, prescreened kit components that inflated the rate of identifications, and multiple known sources that made it impossible to quantify a proper error rate for eliminations.

Fig. 1. Sketch of the relationship between ground truth of evidence (dots) and examiners’ decisions (shaded areas). In a perfect scenario dots only appear on the shaded area of the same colour. Any dots on differently coloured backgrounds indicate an error in the examination process.

FOCUS ON THE FUTURE

 

Hofmann et al. propose a fourth option:

  • Include the inconclusive as an elimination.
  • Calculate the error rates for the examiner and the process separately.

While most studies included a bias toward the prosecution, this was not the case for studies conducted in the European Union. Further study is recommended to verify this difference and determine its cause.


INSIGHT

Probabilistic Reporting in Criminal Cases in the United States:

A Baseline Study

OVERVIEW

Forensic examiners are frequently asked to provide reports and testimony in court, and there have been calls for them to report their findings probabilistically. Terms like “match,” “consistent with” or “identical” are categorical in nature, not statistical; they do not communicate the value of the evidence in terms of probability. While there is robust debate over how forensic scientists should report, less attention has been paid to how they do report.

Lead Researchers

Simon A. Cole 
Matt Barno

Journal

Science & Justice

Publication Date

September 2020

Key Research Questions

1. To what extent are forensic reports in these disciplines consistent with published standards?

2. To what extent are forensic reports in these disciplines probabilistic and, if so, how is probability expressed?

APPROACH AND METHODOLOGY

Data Set

572 transcripts and reports gathered from Westlaw, consultants’ files and proficiency tests using a heterogeneous, opportunistic data collection approach.

What

Researchers reviewed reports across four pattern disciplines:

  • Friction Ridge Prints
  • Firearms & Toolmarks
  • Questioned Documents
  • Shoeprints

How

Using disciplinary standards as a framework, researchers determined the type of report being reviewed and if it used standard terminology. Then, they coded each report both for whether or not it was probabilistic and for the type of language used, such as “same source,” “identified” and “consistent.”
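To make the coding step concrete, here is a toy version of such a coding scheme in Python. The term lists are hypothetical and far simpler than the study’s actual codebook:

```python
# Hypothetical term lists for coding report language; illustrative only.
CATEGORICAL_TERMS = ("match", "identified", "same source", "consistent")
PROBABILISTIC_TERMS = ("probability", "likelihood ratio", "more likely than not")

def code_report(conclusion_text):
    """Code a report conclusion as categorical and/or probabilistic."""
    text = conclusion_text.lower()
    return {
        "categorical": any(term in text for term in CATEGORICAL_TERMS),
        "probabilistic": any(term in text for term in PROBABILISTIC_TERMS),
    }

print(code_report("The latent print was identified as the defendant's."))
# {'categorical': True, 'probabilistic': False}
```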

KEY TAKEAWAYS for Practitioners

Across all four disciplines, the prevailing standards for reporting were categorical in nature. The majority of reports analyzed adhered to the reporting standards for their discipline, but discussion of probability was extremely rare and, even where it appeared, was frequently used to dismiss the use of probability itself.

Reports using categorical terms: friction ridge prints, 89%; firearms & toolmarks, 67%; questioned documents, 50%; shoeprints, 87%.

Reports using terms that adhered to their disciplinary standards: friction ridge prints, 74%; firearms & toolmarks, 100%; questioned documents, 96%; shoeprints, 82%.

Reports using probabilistic terms: friction ridge prints, 11%; firearms & toolmarks, 33%; questioned documents, 50%; shoeprints, 13%.

Focus on the future

 

To increase the probabilistic reporting of forensic results:

1. Incorporate probabilistic reporting into disciplinary standards.

2. Educate practitioners, lawyers, and judges on the reasons for, and importance of, probabilistic reporting.

3. Demand that experts quantify their uncertainty when testifying in court.


INSIGHT

Juror Appraisals of Forensic Science Evidence:

Effects of Proficiency and Cross-examination

OVERVIEW

Researchers conducted two studies to determine how much an examiner’s blind proficiency score affects jurors’ confidence in the examiner’s testimony.

Lead Researchers

William E. Crozier
Jeff Kukucka
Brandon L. Garrett

Journal

Forensic Science International

Publication Date

October 2020

Key Research Questions

1. Determine how disclosing blind proficiency test results can inform a jury’s decision making.

2. Assess how using these proficiency test results in cross-examination can influence jurors.

APPROACH AND METHODOLOGY

WHO

Two separate groups (1,398 participants in Study 1, and 1,420 in Study 2) read a mock trial transcript in which a forensic examiner provided the central evidence.

What

Evidence: a bitemark on a victim’s arm or a fingerprint on the robber’s gun.

Blind Proficiency Scores: the examiner either made zero mistakes in the past year (high proficiency), made six mistakes in the past year (low proficiency), claimed high proficiency without proof (high unproven proficiency), or did not discuss their proficiency at all (control).

How

Participants in both studies were asked to render a verdict, estimate the likelihood of the defendant’s guilt, and provide opinions on the examiner and the evidence.

KEY TAKEAWAYS for Practitioners

1. Stating proficiency scores did influence participants’ verdicts. In both studies, the examiner presented as having low proficiency elicited fewer convictions than the other examiners.

2. While the high-proficiency examiner did not elicit more convictions than the control in Study 1, they did in Study 2, and they also withstood cross-examination better than the other examiners.

3. In both studies, proficiency information influenced participants’ opinions of the examiners themselves, but not of their domain’s methods or evidence.

Focus on the future

 

Despite having lower conviction rates, the low-proficiency examiners were still viewed favorably and still secured convictions a majority of the time in both studies (65% and 71%, respectively), so fears of an examiner being “burned” by a low proficiency score appear largely overblown.

For defense lawyers to ask about proficiency results, they need access to the information. Crime laboratories could gain a significant advantage by disclosing only high proficiency scores, so it is important that such information be disclosed evenly and transparently.

Next Steps

 

The components and data of both studies are available on the Open Science Framework.


INSIGHT

Implementing Blind Proficiency Testing in Forensic Laboratories:

Motivation, Obstacles, and Recommendations

OVERVIEW

Accredited forensic laboratories are required to conduct proficiency testing, but most rely solely on declared proficiency tests. A 2014 study showed that only 10% of forensic labs in the United States performed blind proficiency testing, whereas blind tests are standard in other fields, including medical and drug testing laboratories. Researchers wanted to identify the barriers to widespread blind proficiency testing and generate solutions for removing these obstacles. After reviewing the existing research, they convened a meeting of experts to establish an understanding of the challenges to implementation.

Lead Researchers

Robin Mejia
Maria Cuellar
Jeff Salyards

Journal

Forensic Science International: Synergy

Publication Date

September 2020

Participants

CSAFE met with laboratory directors and quality managers from seven forensic laboratory systems in the eastern US and the Houston Forensic Science Center. Two of the quality managers represented the Association of Forensic Quality Assurance Managers (AFQAM). In addition, several professors, graduate students and researchers from three universities attended the meeting.

APPROACH AND METHODOLOGY

1. Compare blind proficiency testing to declared testing, then have participants discuss the potential advantages of establishing blind testing as standard.

2. Facilitate and document a discussion of the logistical and cultural barriers labs might face when adopting blind testing, and use it to create a list of challenges.

3. Collect and analyze suggested steps labs can take to overcome those challenges to implementing blind proficiency testing.

Challenges and Solutions

Challenge: Realistic test cases can be complex to create.
Proposed solution: Quality managers develop the expertise to create test cases; laboratories create a shared evidence bank.

Challenge: Realistic submission materials may be difficult to develop.
Proposed solution: QA staff develop the knowledge locally to ensure test evidence conforms with a jurisdiction’s typical cases.

Challenge: Costs may be prohibitive.
Proposed solution: Multiple laboratories can share resources and make joint purchases; external test providers could develop materials to lower costs.

Challenge: Tests must be submitted to the lab by an outside law enforcement agency (LEA).
Proposed solution: Which LEA to work with should be decided locally, based on the relationship between lab management and the LEA.

Challenge: Not all laboratory information management systems (LIMS) can easily flag and track test cases.
Proposed solution: Labs can either use a LIMS with this functionality or develop an in-house system to flag test cases.

Challenge: Labs must ensure test results are not released as real cases.
Proposed solution: The QA team will need to work with individuals in other units of the lab to prevent accidental releases; contacts in the submitting LEA or the local District Attorney’s office may also help.

Challenge: Proficiency tests could affect performance metrics, so labs need to decide whether to include them.
Proposed solution: These decisions must be made on a lab-by-lab basis; a consortium of labs or organizations such as AFQAM can aid in standardization.

Challenge: Blind testing challenges the cultural myth of 100% accuracy.
Proposed solution: Senior lab management must champion blind testing, showing that it both demonstrates examiner quality and helps labs discover and remedy errors.

Learn More

 

Watch the HFSC webinar, “Crime Lab Proficiency and Quality Management.”

Dr. Robin Mejia discusses “Implementing Blind Proficiency Testing” in the CSAFE webinar.


INSIGHT

A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

OVERVIEW

Researchers developed and tested a statistical algorithm for analyzing the shapes made in handwriting to determine their source. Unlike other programs that analyze what words are written, this algorithm analyzes how the words are written.

Lead Researchers

Amy M. Crawford
Nicholas S. Berry
Alicia L. Carriquiry

Journal

Statistical Analysis and Data Mining

Publication Date

August 2020

The Goals

1. Develop a semi-automated process to examine and compare handwriting samples from questioned and reference documents.

2. Improve upon the existing methodology for determining writership.

APPROACH AND METHODOLOGY

In this study, researchers scanned and analyzed 162 documents handwritten by 27 writers from the Computer Vision Lab database, a publicly available source of handwritten text samples, and broke the writing down into 52,541 unique graphs using the processing program handwriter. From there, a K-means algorithm clustered the graphs into 40 groups of similar graphs, each anchored by a mean, or center, graph. To allocate graphs to groups, researchers developed a new way to measure the distance between graphs.
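As a rough illustration of the clustering step, here is a toy K-means (Lloyd’s algorithm) in Python. It represents each graph as a numeric feature vector and uses Euclidean distance; the actual method uses the researchers’ purpose-built distance between graphs, so this is a sketch of the idea rather than their implementation:

```python
import numpy as np

def kmeans(features, k=40, n_iter=100, seed=0):
    """Toy Lloyd's algorithm: cluster feature vectors into k groups."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each graph to its nearest center. Euclidean distance stands
        # in for the paper's purpose-built graph-to-graph distance measure.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-center each group on the mean of its assigned members.
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical stand-in: 500 graphs summarized by 10 shape features each.
X = np.random.default_rng(1).normal(size=(500, 10))
labels, centers = kmeans(X, k=40)
```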

Then, researchers processed an additional document from each of the 27 writers, held back as a “questioned document,” to test whether the algorithm could accurately determine which document belonged to which writer. The new method for clustering graphs appears to be an improvement over the current approach based on adjacency grouping, which relies only on the edge connectivity of graphs.

Using adjacency grouping: above 50% probability of a match on 23 of the 27 questioned documents.

Using the new dynamic K-means clustering: above 90% probability of a match on 23 documents, with 26 of the 27 correctly matched.

Key Definitions

Graphs

Simple structures with nodes and edges to represent shapes that constitute handwriting

Writership

The set of graphs a person typically makes when writing

K-means Algorithm

An iterative algorithm that separates data points into clusters based on nearest mean values

KEY TAKEAWAYS FOR PRACTITIONERS

1

The new approach shows promise, as it allows practitioners to more objectively analyze
handwriting by studying the way letters and words are formed.

2

When compared to the more readily available but more volatile adjacency grouping
method, the K-means clustering method contributed to greater accuracy when trying to
identify the writer of a questioned document from among a closed set of potential writers.

FOCUS ON THE FUTURE

 

The new method favors certain properties of handwriting over others to assess similarities and can be extended to incorporate additional features.

The mean of a group of graphs is often a shape that does not actually occur in the document. Instead of centering groups using a mean graph, researchers are exploring whether using an exemplar graph as a group’s anchor will simplify calculations.

Next Steps

Handwriter

Explore and try the handwriter algorithm by downloading it

CSAFE Handwriting Database

Investigate publicly available databases of handwritten documents

Computer Vision Lab database

Investigate publicly available databases of handwritten documents


INSIGHT

Statistical Methods for the Forensic Analysis of Geolocated Event Data

OVERVIEW

Researchers investigated the application of statistical methods to forensic questions involving spatial event-based digital data. A motivating example involves assessing whether or not two sets of GPS locations corresponding to digital events were generated by the same source. The team established two approaches to quantify the strength of evidence concerning this question.

Lead Researchers

Christopher Galbraith
Padhraic Smyth
Hal S. Stern

Journal

Forensic Science International: Digital Investigation

Publication Date

July 2020

The Goal

Develop quantitative techniques for the forensic analysis of geolocated event data.

APPROACH AND METHODOLOGY

Researchers collected geolocation data from Twitter messages over two spatial regions, Orange County, CA and the borough of Manhattan in New York City, from May 2015 to February 2016. Selecting only tweets from public accounts, they were able to gather GPS data regarding the frequency of geolocated events in each area.

Key Definitions

Likelihood Ratio (LR)

A comparison of the probability of observing a set of evidence measures under two different theories in order to assess relative support for the theories.

Score-Based Likelihood Ratio (SLR)

An approach that summarizes evidence measures by a score function before applying the likelihood ratio approach.
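In symbols (notation chosen here for illustration, with $e_A$ and $e_B$ the two sets of locations, $\Delta$ a score function, $g$ the density of scores, and $H_s$, $H_d$ the same-source and different-source hypotheses):

$$\mathrm{LR} = \frac{p(e_A, e_B \mid H_s)}{p(e_A, e_B \mid H_d)}, \qquad \mathrm{SLR} = \frac{g\big(\Delta(e_A, e_B) \mid H_s\big)}{g\big(\Delta(e_A, e_B) \mid H_d\big)}$$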

This study considered a scenario in which the source of two sets of tweet locations is in question: the tweets could come from two different devices or from the same device during two different time periods.

The team used kernel density estimation to construct a likelihood ratio for the observed tweets under two competing hypotheses: that the tweets come from the same source or from different sources.

The second, score-based approach produces a likelihood ratio that first summarizes the similarity of the two sets of locations with a score and then assesses the strength of the evidence.

Decisions based on both the LR and SLR approaches were compared to known ground truth to determine true- and false-positive rates.
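A minimal sketch of the score side of this comparison, assuming each set of locations is an array of (longitude, latitude) points. The score used here, mean log-density under a kernel density estimate, is a stand-in chosen for illustration and is not necessarily the paper’s score function:

```python
import numpy as np
from scipy.stats import gaussian_kde

def same_source_score(locs_a, locs_b):
    """Toy similarity score: mean log-density of set B's points under a KDE
    fitted to set A's points. Illustrative only."""
    kde_a = gaussian_kde(locs_a.T)                  # scipy expects shape (dim, n)
    return np.log(kde_a(locs_b.T) + 1e-300).mean()  # epsilon guards log(0)

# Hypothetical event sets; columns are (longitude, latitude).
rng = np.random.default_rng(0)
device_1 = rng.normal(loc=[-117.8, 33.7], scale=0.05, size=(200, 2))   # Orange County area
device_2 = rng.normal(loc=[-117.8, 33.7], scale=0.05, size=(150, 2))   # same region
device_3 = rng.normal(loc=[-73.97, 40.78], scale=0.05, size=(150, 2))  # Manhattan area

print(same_source_score(device_1, device_2))  # higher: same-source-like
print(same_source_score(device_1, device_3))  # much lower: different source
```

In the SLR approach, such a score would then be compared against the distributions of scores observed for known same-source and known different-source pairs.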

KEY TAKEAWAYS FOR PRACTITIONERS

1. Both methods show promise in distinguishing same-source pairs of spatial event data from different-source pairs.

2. The LR approach outperformed the SLR approach for all dataset sizes considered.

3. The behavior of both approaches can be affected by the characteristics of the observed region and the amount of evidential data available.

FOCUS ON THE FUTURE

 

In this study, sets of locations gathered from Twitter were defined by time period. Other ways of defining sets of locations, such as grouping multiple devices over the same time period, could yield different results.

The amount of available data (the number of tweets) impacts the score-based approach.


INSIGHT

Psychometric Analysis of Forensic Examiner Behavior

OVERVIEW

Understanding how fingerprint examiners’ proficiency and behavior influence their decisions when interpreting evidence requires many analytical models. Because final source identifications still rely on complex and subjective interpretation of the evidence by examiners, researchers sought to better identify and study uncertainty in examiners’ decision making. By applying novel methods such as Item Response Theory (IRT) to existing tools like error rate studies, the team proposes a new approach that accounts for differences among examiners and in task difficulty levels.

Lead Researchers

Amanda Luby
Anjali Mazumder
Brian Junker

Publication Date

June 13, 2020

THE GOALS

1. Survey recent advances in psychometric analysis of forensic decision-making.

2. Use behavioral models from the field of Item Response Theory to better understand the operating characteristics of the identification tasks that examiners perform.

APPROACH AND METHODOLOGY

The Data

A 2011 FBI Black Box study assigned 169 fingerprint examiners a selection of items to analyze; for each pair of prints, examiners provided a latent print evaluation, a source determination, a reason, and a rating of the task’s difficulty.

Key Definitions

Psychometrics

Using factors such as aptitudes and personality traits to study differences between individuals.

Item Response Trees (IRTrees)

Visual representation of each decision an examiner makes in the process of performing an identification task. Based on IRT, which attempts to explain the connection between the properties of a test item (here, a piece of fingerprint evidence) and an individual’s (a fingerprint examiner’s) performance in response to that item.

Cultural Consensus Theory (CCT)

A method that facilitates the discovery and description of consensus among a group of people with shared beliefs. For this study, CCT helps identify the common knowledge and beliefs among fingerprint examiners –– things that examiners may take for granted but that laypeople would not necessarily know.

APPLYING IRTREES AND CCT TO FINGERPRINT ANALYSIS

1. Researchers segmented the data with the Rasch model to separate a latent print’s difficulty level from an examiner’s proficiency, allowing comparison with the existing method of studying error rates (see the model sketch after this list).

2. They then constructed IRTrees to model a fingerprint examiner’s decision-making process when deciding whether a print is a positive match, a negative match, inconclusive, or of no value. (See Figure 1)

3. Finally, the team used IRTrees and Cultural Consensus Theory to create “answer keys” (a set of reasons and shared knowledge) that provide insight into how a fingerprint examiner arrives at an “inconclusive” or “no value” decision. (See Figure 2)
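For reference, the Rasch model used in step 1 can be written as follows (standard IRT notation; a sketch, not the authors’ full specification):

$$P(Y_{ij} = 1) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}$$

where $Y_{ij}$ indicates whether examiner $j$ answered item $i$ correctly, $\theta_j$ is the examiner’s proficiency, and $b_i$ is the item’s difficulty; the probability of a correct decision rises with proficiency and falls with difficulty.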

Figure 1

Visual representation of latent print analysis

Figure 2

How examiners arrive at “inconclusive” or “no value” decisions

KEY TAKEAWAYS FOR PRACTITIONERS

1. IRT models provide a substantial improvement over current examiner error rate studies: examiner proficiencies can be compared justifiably even when examiners did not perform the same identification tasks, and the influence of task difficulty is reflected in proficiency estimates.

2. IRTrees let researchers accurately model the complex decision-making in fingerprint identification tasks, which involves far more than simply declaring a print a “match” or a “non-match.” This reveals the skill involved in fingerprint examination work.

3. Examiners tend to overrate the difficulty of middling tasks, while underrating the difficulty of extremely easy or extremely difficult tasks.

FOCUS ON THE FUTURE

 

This analysis was somewhat limited by available data; for confidentiality and privacy considerations, the FBI Black Box Study does not provide the reference prints used nor the personal details of the examiners themselves. Future collaboration with experts, both in fingerprint analysis and human decision making, can provide more detailed data and thus improve the models.


INSIGHT

Error Rates, Likelihood Ratios, and Jury Evaluation of Forensic Evidence

OVERVIEW

Forensic examiner testimony regularly plays a role in criminal cases, yet little is known about how much weight such testimony carries in jurors’ judgments.

Researchers set out to learn more: What impact does testimony that is further qualified by error rates and likelihood ratios have on jurors’ conclusions concerning fingerprint comparison evidence and a novel technique involving voice comparison evidence?

Lead Researchers

Brandon L. Garrett, J.D.
William E. Crozier, Ph.D.
Rebecca Grady, Ph.D.

Journal

Journal of Forensic Sciences

Publication Date

April 22, 2020

THE HYPOTHESES

Participants would place less weight on voice comparison testimony than on fingerprint testimony, due to cultural familiarity and perceptions.

Participants who heard error rate information would put less weight on forensic evidence, voting guilty less often, than participants who heard traditional, generic instructions lacking error rates.

Participants who heard likelihood ratios would place less weight on forensic expert testimony than on testimony offering an unequivocal, categorical conclusion of an identification or match.

APPROACH AND METHODOLOGY

WHO

900 participants read a mock trial transcript about a convenience store robbery with one piece of evidence linking the defendant to the crime

WHAT

2 (Evidence: Fingerprint vs. Voice Comparison)
x 2 (Identification: Categorical or Likelihood Ratio)
x 2 (Instructions: Generic vs. Error Rate) design
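Crossing the three factors yields the eight conditions, as this small sketch illustrates (labels paraphrase the conditions above):

```python
from itertools import product

evidence = ["fingerprint", "voice comparison"]
identification = ["categorical", "likelihood ratio"]
instructions = ["generic", "error rate"]

# Full factorial 2 x 2 x 2 design: every combination of the three factors.
conditions = list(product(evidence, identification, instructions))
print(len(conditions))  # 8
```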

HOW

Participants were randomly assigned to 1 of the 8 conditions.

After reading the materials and jury instructions, participants decided whether they would vote, beyond a reasonable doubt, that the defendant was guilty.

KEY TAKEAWAYS FOR PRACTITIONERS

Laypeople gave more weight to fingerprint evidence than voice comparison evidence.

Fewer guilty verdicts arose from voice evidence; novel forensic methods might not provide powerful evidence of guilt.

Jurors perceived fingerprint evidence as less reliable when they learned about error rates.

Error rate information appears particularly important for types of forensic evidence that people may already assume as highly reliable.

Participants considering fingerprint evidence were more likely to find the defendant not guilty when provided instruction on error rates. When the fingerprint expert offered a likelihood ratio, the error rate instructions did not decrease guilty verdicts.

When asked which error is worse, wrongly convicting an innocent person or failing to convict a guilty person, the majority of participants were most concerned with convicting an innocent person.

Participants who believed convicting an innocent person was the worse error were less likely to vote guilty, reflecting greater doubt about the evidence. Those who had greater concern for releasing a guilty person were more likely to vote guilty, and some participants considered the two errors equally bad.

Researchers found, overall, that presenting an error rate moderated the weight of evidence only when paired with a fingerprint identification.

FOCUS ON THE FUTURE

To produce better judicial outcomes when juries are composed of laypeople:

Direct efforts toward offering more explicit judicial instructions.

Craft a better explanation of evidence limitations.

Consider these findings when developing new forensic techniques: juries trust novel techniques less, even when those techniques prove more reliable and lower error rates.

Pay attention to juror preconceptions about the reliability of evidence.