
INSIGHTS

Using Mixture Models to Examine Group Differences Among Jurors:

An Illustration Involving the Perceived Strength of Forensic Science Evidence

OVERVIEW

It is critically important for jurors to be able to understand forensic evidence,
and just as important for researchers to understand how jurors perceive scientific reports.
Researchers have devised a novel approach, using statistical mixture
models, to identify subpopulations that appear to respond differently to
presentations of forensic evidence.

Lead Researchers

Naomi Kaplan-Damary
William C. Thompson
Rebecca Hofstein Grady
Hal S. Stern

Journal

Law, Probability and Risk

Publication Date

30 January 2021

Publication Number

IN 116 IMPL

Goals

1

Use statistical models to determine if subpopulations exist among samples of mock jurors.

2

Determine if these subpopulations have clear differences in how they perceive forensic evidence.

THE THREE STUDIES

Definition:

Mixture model approach:
a probabilistic model that detects subpopulations within a study population empirically, i.e., without a priori hypotheses about their characteristics.
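The paper's mixture-model approach is more elaborate than this, but the core idea can be sketched with a toy expectation-maximization (EM) fit of a two-component Gaussian mixture. Everything below is invented for illustration: the "ratings" are simulated, and the two-subgroup structure is assumed, not taken from the studies.

```python
# Illustrative sketch (not the authors' actual model): fitting a
# two-component Gaussian mixture to one-dimensional ratings with EM.
# Each fitted component can be read as a candidate subpopulation.
import math, random

random.seed(0)
# Simulated strength-of-evidence ratings from two hypothetical subgroups
data = ([random.gauss(3.0, 0.5) for _ in range(60)]
        + [random.gauss(7.0, 0.8) for _ in range(40)])

def pdf(x, m, s):
    # Normal density with mean m and standard deviation s
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# Initial guesses for mixing weights, means, and standard deviations
w = [0.5, 0.5]
mu = [2.0, 8.0]
sd = [1.0, 1.0]

for _ in range(200):
    # E-step: responsibility of each component for each observation
    resp = []
    for x in data:
        d = [w[k] * pdf(x, mu[k], sd[k]) for k in (0, 1)]
        t = d[0] + d[1]
        resp.append([d[0] / t, d[1] / t])
    # M-step: re-estimate parameters from the responsibilities
    for k in (0, 1):
        nk = sum(r[k] for r in resp)
        w[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sd[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                              for r, x in zip(resp, data)) / nk)

print([round(v, 2) for v in mu])  # component means near the two subgroup centers
```

The key property, for this document's purposes, is that the subgroups are recovered from the response data alone, with no a priori hypothesis about who belongs to which group.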

Results

  • Data from the three studies suggest that subpopulations exist and perceive statements differently.
  • The mixture model approach found subpopulation structures not detected by the hypothesis-driven approach.
  • One of the three studies found participants with higher numeracy tended to respond more strongly to statistical statements, while those with lower numeracy preferred more categorical statements.


Focus on the future

 

The existence of group differences in how evidence is perceived suggests that forensic experts need to present their findings in multiple ways. This would better address the full range of potential jurors.

These studies were limited by the relatively small number of participants. A larger study population may reveal more about the nature of population heterogeneity.

In future studies, Kaplan-Damary et al. recommend a greater number of participants and the consideration of a greater number of personal characteristics.


INSIGHTS

Mock Jurors’ Evaluation of Firearm Examiner Testimony

OVERVIEW

Traditionally, firearm and toolmark experts have testified that a weapon leaves “unique” marks on bullets and casings permitting a “source identification” conclusion to be made. While scientific organizations have called this sort of categorical assertion into question, jurors still place a great deal of weight on a firearms expert’s testimony.

To examine the weight jurors place on these testimonies, researchers conducted two studies: the first evaluated if using more cautious language influenced jurors’ opinions on expert testimony, and the second measured if cross-examination altered these opinions.

Lead Researchers

Brandon L. Garrett
Nicholas Scurich
William E. Crozier

Journal

Law & Human Behavior

Publication Date

2020

Publication Number

IN 115 IMPL

Goals

The team tested four hypotheses in these studies:

1

Jurors will accord significant weight to testimony declaring a categorical “match” between two casings.

2

Jurors’ opinions will not be changed by more cautious language in a firearms expert’s testimony.

3

Guilty verdicts will be lowered only by the most cautious language (i.e., “cannot exclude the gun”).

4

Cross-examination will lower guilty verdicts depending on the specific language used.

The Studies

Study 1:

1,420 participants read a synopsis of a criminal case which included the testimony of a firearms expert. The expert gave one of seven specifically worded conclusions, ranging from a “simple match,” to a more cautious “reasonable degree of ballistic certainty,” to “cannot be excluded.”

The participants then decided whether they would convict based on the testimony.

Study 2:

1,260 participants were given the same synopsis, with two important changes:

  • The expert’s testimony had three possible conclusions (inconclusive, a conclusive match, or a cautious
    “cannot be excluded”) rather than seven.
  • Some participants also heard cross-examination of the firearms expert.

The participants again decided whether they would convict the defendant and rated the testimony’s credibility.

Results

Study 1:


Figure 1. Proportion of guilty verdicts with 95% confidence intervals.

  • Compared to an inconclusive result, finding a “match” tripled the rate of guilty verdicts. Variations in how the “match” was described did not affect verdicts.
  • The sole exception was when the match was described as the defendant’s gun “cannot be excluded” as the source; then the rate of guilty verdicts doubled, rather than tripled, compared to an inconclusive result.
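The 95% confidence intervals shown in the figures are standard binomial intervals for proportions. As an illustration only, here is one common choice, the Wilson score interval, computed with made-up counts (the function and numbers are not the study's data):

```python
# Sketch: 95% Wilson score confidence interval for a proportion,
# the kind of interval drawn as error bars on a guilty-verdict figure.
import math

def wilson_ci(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical example: 90 guilty verdicts among 200 mock jurors
lo, hi = wilson_ci(90, 200)
print(round(lo, 3), round(hi, 3))
```

The interval width shrinks roughly with the square root of the number of jurors per condition, which is why each condition needs a sizable group of participants.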

Study 2:


Figure 2. Proportion of guilty verdicts (with 95% confidence intervals) in each experimental condition.

  • Cross-examination did not help jurors to consistently discount firearms conclusions. This is consistent with prior work showing mixed effects of cross-examination on jury perceptions of strength of evidence.
  • “Cannot exclude” and “identification” conclusions led to significantly more guilty verdicts than the “inconclusive” condition.

Focus on the future

 

Although more cautious language does not appear to change jurors’ decisions, there is no downside to adopting it, because it can prevent misleading or overstated conclusions.

Future studies should provide video testimony and discussion to better mimic a real-world trial.

The methods that firearms experts use have not been adequately tested, so jurors cannot accurately judge the strength of the evidence or the expert’s proficiency. This requires further research into the validity and reliability of firearms comparison methods.


INSIGHT

Probabilistic Reporting in Criminal Cases in the United States:

A Baseline Study

OVERVIEW

Forensic examiners are frequently asked to give reports and testimony in court, and there have been calls for them to report their findings probabilistically. Terms like “match,” “consistent with,” or “identical” are categorical in nature, not statistical: they do not communicate the value of the evidence in terms of probability. While there is robust debate over how forensic scientists should report, less attention is paid to how they do report.

Lead Researchers

Simon A. Cole 
Matt Barno

Journal

Science & Justice

Publication Date

September 2020

Publication Number

IN 112 IMPL

Key Research Questions

1

To what extent are forensic reports in these disciplines consistent with published standards?

2

To what extent are forensic reports in these disciplines probabilistic, and, if so, how is probability expressed?

APPROACH AND METHODOLOGY

Data Set

572 transcripts and reports from Westlaw, consultants’ files, and proficiency tests, collected using a heterogeneous, opportunistic approach.

What

Researchers reviewed reports across four pattern disciplines:

  • Friction Ridge Prints
  • Firearms & Toolmarks
  • Questioned Documents
  • Shoeprints

How

Using disciplinary standards as a framework, researchers determined the type of report being reviewed and if it used standard terminology. Then, they coded each report both for whether or not it was probabilistic and for the type of language used, such as “same source,” “identified” and “consistent.”

KEY TAKEAWAYS for Practitioners

Across all four disciplines, the prevailing standards for reporting were categorical in nature. The majority of reports analyzed adhered to the reporting standards for their discipline, but discussion of probability was extremely rare and, even where it appeared, was frequently used to dismiss the use of probability itself.

Proportion of reports, by discipline, that used categorical terms, that used terms adhering to their disciplinary standards, and that used probabilistic terms:

Discipline               Categorical terms   Adhered to standards   Probabilistic terms
Friction ridge prints    89%                 74%                    11%
Firearms & toolmarks     67%                 100%                   33%
Questioned documents     50%                 96%                    50%
Shoeprints               87%                 82%                    13%

Focus on the future

 

To increase the probabilistic reporting of forensics results:

1

Incorporate probabilistic reporting into disciplinary standards.

2

Educate practitioners, lawyers, and judges on the reasons for, and importance of, probabilistic reporting.

3

Demand that experts quantify their uncertainty when testifying in court.


INSIGHT

Juror Appraisals of Forensic Science Evidence:

Effects of Proficiency and Cross-examination

OVERVIEW

Researchers conducted two studies to determine how much an examiner’s blind proficiency score affects jurors’ confidence in the examiner’s testimony.

Lead Researchers

William E. Crozier
Jeff Kukucka
Brandon L. Garrett

Journal

Forensic Science International

Publication Date

October 2020

Publication Number

IN 111 IMPL

Key Research Questions

1

Determine how disclosing blind proficiency test results can inform a jury’s decision making.

2

Assess how using these proficiency test results in cross-examination can influence jurors.

APPROACH AND METHODOLOGY

WHO

Two separate groups (1,398 participants in Study 1, and 1,420 in Study 2) read a mock trial transcript in which a forensic examiner provided the central evidence.

What

Evidence: bitemark on a victim’s arm or a fingerprint on the robber’s gun.

Blind Proficiency Scores: the examiner either made zero mistakes in the past year (high proficiency), made six mistakes in the past year (low proficiency), claimed high proficiency without proof (high unproven proficiency), or did not discuss their proficiency at all (control).

How

Participants in both studies were asked to render a verdict, estimate the likelihood of the defendant’s guilt, and provide opinions on the examiner and the evidence.

KEY TAKEAWAYS for Practitioners

1

Stating proficiency scores did influence the participants’ verdicts. In both studies, the examiner presented as having low proficiency elicited fewer convictions than the other examiners.

2

While the high-proficiency examiner did not elicit more convictions than the control in Study 1, that examiner elicited more convictions in Study 2 and also withstood cross-examination better than the other examiners.

3

In both studies, proficiency information influenced the participants’ opinions of the examiners themselves, but not of their discipline’s methods or evidence.

Focus on the future

 

Despite having lower conviction rates, the low-proficiency examiners were still viewed favorably and still secured convictions a majority of the time in both studies (65% and 71%, respectively), so fears of an examiner being “burned” by a low proficiency score are largely overblown.

For defense lawyers to ask about proficiency results, they require access to the information. However, crime laboratories can potentially gain a significant advantage by only disclosing high-proficiency scores. Thus, it is important that such information be disclosed evenly and transparently.

Next Steps

 

The components and data of both studies are available on the Open Science Framework.


INSIGHT

Error Rates, Likelihood Ratios, and Jury Evaluation of Forensic Evidence

OVERVIEW

Forensic examiner testimony regularly plays a role in criminal cases, yet little is known about how much weight jurors give that testimony.

Researchers set out to learn more: What impact does testimony that is further qualified by error rates and likelihood ratios have on jurors’ conclusions concerning fingerprint comparison evidence and a novel technique involving voice comparison evidence?

Lead Researchers

Brandon L. Garrett, J.D.
William E. Crozier, Ph.D.
Rebecca Grady, Ph.D.

Journal

Journal of Forensic Sciences

Publication Date

22 April 2020

Publication Number

IN 106 IMPL

THE HYPOTHESIS

Participants would place less weight on voice comparison testimony than they would on fingerprint testimony, due to cultural familiarity and perceptions.

Participants who heard error rate information would put less weight on forensic evidence — voting guilty less often — than participants who heard traditional and generic instructions lacking error rates.

Participants who heard likelihood ratios would place less weight on forensic expert testimony compared to testimony offering an unequivocal and categorical conclusion of an ID or match.
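A likelihood ratio expresses how much more probable the observed evidence is under the same-source proposition than under the different-source proposition, rather than asserting a categorical "match." The following toy sketch is purely illustrative: the similarity score and both score distributions are invented, not drawn from the study.

```python
# Toy illustration of likelihood-ratio reporting: the same observed
# similarity score is reported as a ratio of its probability density
# under "same source" vs. "different source". All values are invented.
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

score = 0.8  # hypothetical similarity score between two samples

# Assumed score distributions under each proposition
p_same = normal_pdf(score, mean=0.9, sd=0.1)   # if both samples share a source
p_diff = normal_pdf(score, mean=0.3, sd=0.2)   # if the sources differ

lr = p_same / p_diff
print(f"LR = {lr:.1f}")  # >1 supports same-source; <1 supports different-source
```

An examiner reporting this way would say the evidence is some number of times more likely under the same-source proposition, leaving the ultimate verdict weighing to the jury.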

APPROACH AND METHODOLOGY

WHO

900 participants read a mock trial transcript about a convenience store robbery with a single piece of forensic evidence linking the defendant to the crime.

WHAT

2 (Evidence: Fingerprint vs. Voice Comparison)
× 2 (Identification: Categorical vs. Likelihood Ratio)
× 2 (Instructions: Generic vs. Error Rate) design

HOW

Participants were randomly assigned to one of the eight conditions.

After reading the materials and jury instructions, participants decided whether they would vote that the defendant was guilty beyond a reasonable doubt.

KEY TAKEAWAYS FOR PRACTITIONERS

Laypeople gave more weight to fingerprint evidence than voice comparison evidence.

Fewer guilty verdicts arose from voice evidence — novel forensic evidence methods might not provide powerful evidence of guilt.

Jurors viewed fingerprint evidence as less reliable when they learned about error rates.

Error rate information appears particularly important for types of forensic evidence that people may already assume to be highly reliable.

Participants considering fingerprint evidence were more likely to find the defendant not guilty when provided instruction on error rates. When the fingerprint expert offered a likelihood ratio, the error rate instructions did not decrease guilty verdicts.

When asked which error is worse, wrongly convicting an innocent person or failing to convict a guilty person, the majority of participants were most concerned with convicting an innocent person.

  • Participants who believed convicting an innocent person was the worse error were less likely to vote guilty, reflecting greater doubt in the evidence.
  • Those with greater concern for releasing a guilty person were more likely to vote guilty.
  • Other participants believed the two errors were equally bad.

Researchers found, overall, that presenting an error rate moderated the weight of evidence only when paired with a fingerprint identification.

FOCUS ON THE FUTURE

To produce better judicial outcomes when juries are composed of laypeople:

  • Direct efforts toward offering more explicit judicial instructions.
  • Craft better explanations of evidence limitations.
  • Consider these findings when developing new forensic techniques: jurors trust novel techniques less, even when they are more reliable and have lower error rates.
  • Pay attention to juror preconceptions about the reliability of evidence.


INSIGHT

What do Forensic Analysts Consider Relevant to their Decision Making?

OVERVIEW

Forensic analysts make critical judgments that can play a crucial role in criminal investigations, so it is important that their decisions are as objective as possible. However, they often receive information that may not be relevant to their work and can subconsciously bias their analyses.
Researchers surveyed analysts from multiple forensic disciplines to see what information they consider relevant to their tasks.

Lead Researchers

Brett O. Gardner
Sharon Kelley
Daniel C. Murrie
Itiel E. Dror

Journal

Science and Justice

Publication Date

September 2019

Publication Number

IN 101 IMPL

Goals

1

Discover what information analysts consider relevant

2

Evaluate whether there is a general consensus across disciplines

3

Determine if these opinions match the National Commission on Forensic Science’s definition of task-relevance

The Study

The National Commission on Forensic Science (NCFS) defines task-relevant information as:

“Necessary for drawing conclusions: 1) about the propositions in question, 2) from the physical evidence that has been designated for examination, [and] 3) through the correct application of an accepted analytic method by a competent analyst.”

The team surveyed 183 forensic analysts among four primary forensic disciplines: Biology, Pattern Evidence, Chemistry, and Crime Scene Investigation. The survey contained 16 different types of information regarding either a case, suspect, or victim.

The analysts categorized the importance of each type of information to their specific tasks, labeling them as either:

Essential

Irrelevant

Would Review If Available

Results

1

Among the four forensic science disciplines and 16 types of information (64 total task-relevance ratings), the analysts reached 100% consensus only three times. For 45 of the 64 items, analysts’ opinions directly contradicted each other.

2

However, for 36 ratings the analysts reached near-consensus, with over 75% agreement. Pattern evidence analysts had the highest rate of consensus; crime scene investigators had the most disagreement.

3

Most analysts, apart from crime scene investigators, agreed that personal information regarding a suspect or victim was irrelevant to their tasks. This is consistent with the NCFS’s guidelines for task relevance.

4

The opinions of crime scene investigators were distinct from the other disciplines, as their task is to gather information rather than analyze it.

Focus on the future

 

While the survey captures which types of information analysts consider relevant, it does not explain why they made those judgments.

It is important to remember that people do not always know the full reasoning behind their decision making.

Even within the same forensic disciplines, different laboratories may not have the same guidelines for what they consider relevant.

The forensic disciplines must reach a general consensus on what information is task-relevant.