Algorithmic Evidence in Criminal Trials

Guest Blog

Kori Khan
Assistant Professor
Department of Statistics, Iowa State University

We are currently in an era where machine learning and algorithms offer novel approaches to solving problems both new and old. Algorithmic approaches are swiftly being adopted for a range of issues: from making hiring decisions for private companies to sentencing criminal defendants. At the same time, researchers and legislators are struggling with how to evaluate and regulate such approaches.

The regulation of algorithmic output becomes simultaneously more complex and pressing in the context of the American criminal justice system. U.S. courts are regularly admitting evidence generated from algorithms in criminal cases. This is perhaps unsurprising given the permissive standards for admission of evidence in American criminal trials. Once admitted, however, the algorithms used to generate the evidence—which are often proprietary or designed for litigation—present a unique challenge. Attorneys and judges face questions about how to evaluate algorithmic output when a person’s liberty hangs in the balance. Devising answers to these questions inevitably involves delving into an increasingly contentious issue—access to the source code.

In criminal courts across the country, it appears most criminal defendants have been denied access to the source code of algorithms used to produce evidence against them. I write “it appears” because here, as in most areas of the law, empirical research into legal trends is limited to case studies or observations about cases that have drawn media attention. In these cases, the reasons for denying a criminal defendant access to the source code have not been consistent. Some decisions have pointed out that the prosecution does not own the source code and therefore is not required to produce it. Others implicitly acknowledge that the prosecution could be required to produce the source code and instead find that the defendant has not shown a need for access to it. It is worth emphasizing that these decisions have not found that the defendant does not need access to source code, but rather that the defendant has failed to sufficiently establish that need. The underlying message in many of these decisions, whether implicit or explicit, is that there will be cases, perhaps quite similar to the case being considered, where a defendant will require access to source code to mount an effective defense. The question of how to handle access to the code in such cases does not have a clear answer.

Legal scholars are scrambling to provide guidance. Loosely speaking, proposals can be categorized into two groups: those that rely on existing legal frameworks and those that suggest a new framework might be necessary. For the former category, the heart of the issue is the tension between the intellectual property rights of the algorithm’s producer and the defendant’s constitutional rights. On the one hand, the producers of algorithms often have a commercial interest in ensuring that competitors do not have access to the source code. On the other hand, criminal defendants have the right to question the weight of the evidence presented in court.

There is a range of opinions on how to balance these competing interests. These opinions run along a spectrum from always allowing defendants access to source code to rarely allowing it. However, most fall somewhere in the middle. Some have suggested “front-end” measures in which lawmakers establish protocols to ensure the accuracy of algorithmic output before its use in criminal courts. These measures might include escrowing the source code, similar to how some states have handled voting technology. Within the courtroom, suggestions for protecting the producers of code include traditional measures, such as the protective orders commonly used in trade secret suits. Other scholars have proposed that a defendant might not always need access to source code. For example, some suggest that if the producer of the algorithm is willing to run tests constructed by the defense team, this may be sufficient in many cases. Most of these suggestions make two key assumptions: (1) either legislators or defense attorneys should be able to devise standards to identify the cases for which access to source code is necessary to evaluate an algorithm, and (2) legislators or defense attorneys can devise these standards without access to the source code themselves.
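To make the idea of defense-constructed tests concrete, here is a minimal sketch in Python of what such black-box testing might look like. Everything in it is an assumption for illustration: `DefenseCase`, `run_defense_cases`, and `stand_in_tool` are hypothetical names, and the stand-in tool is a trivial placeholder rather than any real forensic program.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DefenseCase:
    """One defense-constructed input with the answer the defense expects."""
    name: str
    inputs: Dict[str, float]
    expected: float
    tolerance: float


def run_defense_cases(tool: Callable[[Dict[str, float]], float],
                      cases: List[DefenseCase]) -> List[str]:
    """Run each case through the tool and describe every result outside tolerance."""
    failures = []
    for case in cases:
        observed = tool(case.inputs)
        if abs(observed - case.expected) > case.tolerance:
            failures.append(
                f"{case.name}: expected {case.expected}, observed {observed}"
            )
    return failures


def stand_in_tool(inputs: Dict[str, float]) -> float:
    """Hypothetical placeholder for the producer's software: a simple ratio."""
    return inputs["numerator"] / inputs["denominator"]


# Hypothetical cases whose correct answers are known in advance.
cases = [
    DefenseCase("equal weights", {"numerator": 1.0, "denominator": 1.0}, 1.0, 1e-9),
    DefenseCase("known ratio", {"numerator": 3.0, "denominator": 4.0}, 0.75, 1e-9),
]
failures = run_defense_cases(stand_in_tool, cases)
print(failures if failures else "all defense cases within tolerance")
```

A harness like this can only probe the inputs someone thought to try. Passing every case shows agreement on those cases, not that the underlying code is free of errors, which is one reason the two assumptions above matter.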

These assumptions require legislators and defense attorneys to answer questions that the scientific community itself cannot answer. Outside of the legal setting, researchers are faced with a similar problem: how can we evaluate scientific findings that rely on computational research? For the scientific community, the answer for the moment is that we are not sure. There is evidence that the traditional methods of peer review are inadequate. In response, academic journals and institutes have begun to require that researchers share their source code and any relevant data. This is increasingly viewed as a minimal standard to begin to evaluate computational research, including algorithmic approaches. However, just as within the legal community, the scientific community has no clear answers for how to handle privacy or proprietary interests in the evaluation process.

In the past, forensic science methods used in criminal trials have largely been developed and evaluated outside the purview of the larger scientific community, often on a case-by-case basis. As both the legal and scientific communities face the challenge of regulating algorithms, there is an opportunity to expand existing interdisciplinary forums and create new ones.

Learn about source code in criminal trials by attending the Source Code on Trial Symposium on March 12 from 2:30 to 4 p.m. Register at https://forensicstats.org/source-code-on-trial-symposium/.

Closed Source Forensic Software: Confronting the Evidence?

There is a persistent underlying flaw in the criminal justice system, stemming from unvalidated forensic science cloaked in intellectual property. Not only does this pose a risk when forensic evidence is a key factor in criminal convictions, but it also reveals how confidential forensic technology could violate defendants’ constitutional rights.

Forensic analysis software, used to generate evidence in criminal trial proceedings, frequently contains closed source code. Such proprietary software prevents the scientific community, the public, juries, attorneys, and defendants from accessing the fundamental methods — or potential errors therein — that can ultimately influence verdicts. This creates a pathway for individuals to be wrongly convicted as a result of jurors being swayed by flawed evidence disguised as good science.

An excellent example is the case of United States v. Ellis, in which DNA was the key evidence used against the defendant accused of illegal firearm possession. The police forensic lab found the DNA analysis inconclusive, prompting further analysis by third-party-owned software. With multiple hypotheses and test variations run on the sample, the prosecution relied on the result of one particular analysis based on the assumption that the defendant was one of four possible contributors to the DNA sample.
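Why the assumed number of contributors matters can be seen in a toy calculation. The sketch below uses a classical textbook formula for a single-locus DNA mixture (no allele dropout, independent contributors, Hardy-Weinberg proportions). It is not the method used by the software in this case, which the record described here does not disclose, and the allele labels and frequencies are hypothetical.

```python
from itertools import combinations


def prob_unknowns_explain(required, allowed, freqs, n_unknown):
    """Probability that n_unknown random people carry only alleles in `allowed`
    and, between them, every allele in `required` (inclusion-exclusion)."""
    required = sorted(required)
    if len(required) > 2 * n_unknown:
        return 0.0  # too few allele copies to cover every required allele
    total = 0.0
    for k in range(len(required) + 1):
        for excluded in combinations(required, k):
            pool = sum(freqs[a] for a in allowed if a not in excluded)
            total += (-1) ** k * pool ** (2 * n_unknown)
    return total


def likelihood_ratio(mixture, suspect, freqs, n_contributors):
    """LR comparing 'suspect plus (n-1) unknowns' with 'n unknowns'."""
    mixture = set(mixture)
    if len(mixture) > 2 * n_contributors:
        raise ValueError("more alleles than n contributors could carry")
    if not set(suspect) <= mixture:
        return 0.0  # suspect has an allele absent from the mixture
    unexplained = mixture - set(suspect)
    numerator = prob_unknowns_explain(unexplained, mixture, freqs, n_contributors - 1)
    denominator = prob_unknowns_explain(mixture, mixture, freqs, n_contributors)
    return numerator / denominator


# Hypothetical allele frequencies at one locus.
freqs = {"a": 0.1, "b": 0.2, "c": 0.3}
for n in (2, 3, 4):
    lr = likelihood_ratio({"a", "b", "c"}, ("a", "b"), freqs, n)
    print(f"assumed contributors = {n}: LR = {lr:.3f}")
```

Even in this simplified setting, the same evidence yields a different likelihood ratio for each assumed number of contributors, so the choice of hypothesis is itself something a defendant may need to examine.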

When Mr. Ellis’ attorney requested access to the source code, “…the government refused to disclose it, arguing that the information is protected by trade secrets.”

In response, the Electronic Frontier Foundation (EFF) and American Civil Liberties Union of Pennsylvania filed an amicus brief with the United States District Court for the Western District of Pennsylvania, outlining the conflict between closed source code, defendants’ Sixth Amendment rights, and the right of the public to oversee criminal trials.

“Source code, and other aspects of forensic software programs used in a criminal prosecution, must be disclosed in order to ensure that innocent people do not end up behind bars,” said the EFF. “Or worse — on death row.”

While it is understandable that developers of forensic software wish to protect their intellectual property, that wish raises a fundamental question: should IP be protected at the expense of civil rights? To protect the innocent, maintain public oversight, and ensure the advancement of forensic science practices, the curtain must be pulled back on protected methodologies. Arguably, doing so would lead to fairer trials and greater trust in the scientific tools used within the criminal justice system.

Open Source Software: Applications in Forensics and The Courtroom

This is an invited blog post from Richard Torres, an attorney at the Legal Aid Society in New York City. Guest blog posts do not necessarily reflect the views of CSAFE. CSAFE is highlighting this topic due to our team’s commitment to the development of open source software and its role in increasing the fair administration of justice.

There is no national oversight body to protect our courts from bad forensic science. The scientific community is the only gatekeeper between questionable forensic practices and the courts, but scientists are not able to examine what they cannot see.

Forensic science software responsible for analyzing evidence is often built with closed source code. This prevents the public, scientists, and ultimately the legal system from being able to review the methods (or code) used to derive scientific conclusions – conclusions that are of utmost importance when someone’s freedom, and in some cases life, is on the line.

Software tools with open source, publicly available code allow the scientific and legal communities to conduct a fair review of the processes and steps used to evaluate evidence. The case study below demonstrates the need for greater use of and access to open source software within the criminal justice system.

The Necessity of Open Source Code: A Case Study

The New York City Office of the Chief Medical Examiner (OCME) was one of the first crime laboratories to implement probabilistic genotyping. Probabilistic genotyping is a recent approach for interpreting complex DNA mixtures, many otherwise uninterpretable, and assigning them a statistical weight. It seems that most forensic labs are heading in this direction.

OCME was early in the probabilistic genotyping game when it created a program called the Forensic Statistical Tool (FST). The OCME, with the backing of the five New York City District Attorneys, fought hard to keep the source code for FST behind a tightly closed door even though it was created using taxpayers’ dollars.

FST was brought online around 2011. For over six years, OCME lab analysts would testify under oath as to FST results but were unable to explain how FST got its answer. Most judges and jurors did not appear particularly concerned – people trust computers.

My office, The Legal Aid Society, repeatedly sought access to the FST source code in cases where our clients were facing years, even decades in prison. We need to know if faulty forensic science is being used to imprison our clients. Unfortunately, in state court, judges simply would not force OCME to share their code.

A lawyer in our office, Clinton Hughes, decided to build an open source program called reQBT that used the exact same calculations as FST. He recruited college interns with math and computer science backgrounds to assist with the finer details – we are lawyers, not scientists. He ran reQBT on FST’s validation study data and, in many cases, reQBT got the same answers. In some cases, however, it got different answers, and Clint’s team kept reviewing reQBT’s code without finding the cause. We increasingly doubted that the problem lay with reQBT. Was FST’s computer code implementing OCME’s biological models reliably?
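The kind of comparison described here, running an independent reimplementation on the same validation data and looking for disagreements, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample identifiers, the numbers, and the function name `find_discrepancies` are hypothetical, and nothing here reflects the actual reQBT or FST code or data.

```python
import math
from typing import Dict, List, Optional, Tuple


def find_discrepancies(
    reference: Dict[str, float],
    candidate: Dict[str, float],
    rel_tol: float = 1e-6,
) -> List[Tuple[str, float, Optional[float]]]:
    """List samples where the candidate result is missing or differs from the
    reference result by more than the relative tolerance."""
    mismatches = []
    for sample_id in sorted(reference):
        ref_value = reference[sample_id]
        cand_value = candidate.get(sample_id)
        if cand_value is None or not math.isclose(ref_value, cand_value, rel_tol=rel_tol):
            mismatches.append((sample_id, ref_value, cand_value))
    return mismatches


# Hypothetical published results and hypothetical reimplementation results.
published = {"sample-01": 1250.0, "sample-02": 3.4, "sample-03": 87000.0}
reimplemented = {"sample-01": 1250.0, "sample-02": 3.4, "sample-03": 91500.0}

for sample_id, ref_value, cand_value in find_discrepancies(published, reimplemented):
    print(f"{sample_id}: published {ref_value}, reimplementation {cand_value}")
```

A report like this shows only that the two programs disagree on a sample. It cannot say which program departs from the documented model, and that is the question that required access to the source code.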

In late 2016, as FST was being replaced with a newer program called STRmix, Chris Flood and Sylvie Levine from the New York Federal Defenders’ Office were able to convince Federal Judge Valerie Caproni to order OCME to provide the defense with the FST source code. The Federal Defenders hired Nathan Adams of Forensic Bioinformatics to perform a code review – this was the first time a defense expert reviewed the FST source code. Our suspicion that FST was not properly implementing OCME’s models was confirmed. Adams found that FST performed its calculations differently from what OCME had described and that the difference affected FST’s results. There was a catch. Judge Caproni signed a court order preventing Adams from disclosing the specifics of the code problems. FST remained mostly in the dark.

OCME did not immediately respond to Adams alleging in federal court that FST was performing different calculations from what OCME claimed. There were no letters to defense attorneys stating that there may be a problem with FST. It appears that the journals that published the FST studies were not notified either. Were state prosecutors notified?

We renewed our requests to state court judges to order disclosure of the source code given what Adams found. Ultimately, OCME employees conceded that there was a code change that affected FST’s calculations – possibly a different code change from the one found by Adams. Yet, there was no full-scale validation study to establish how well the calculation changes worked.

The Legal Aid Society and Federal Defenders lodged a complaint with the New York State Inspector General requesting an investigation into the changes to FST’s code alongside other concerns about OCME. FST was used in over a thousand cases. Will they need to be reopened? Are people in prison based on faulty forensics?

National press recently reported on our complaint to the Inspector General. At that point, OCME did an about-face on open source. They now say they will share the FST source code.

Applying Lessons Learned to All Forensic Science Disciplines

This problem is not unique to DNA software. Access to open source software is critical to the analysis of pattern and digital evidence. Eliminating bias and error in forensic science is only possible through the total transparency open source code allows. Forensic scientists and researchers responsible for developing these tools must be committed to making their methods publicly available. Lawyers and judges need this transparency to ensure the reliability of the adversarial trial process and to better ensure that innocent people are not sitting in jail.

More information about the mission of CSAFE can be found on our homepage. Learn more about the impact the Legal Aid Society is making on the criminal justice system on their website.