Skip to content

New Study Uses a Score-based Likelihood Ratio System for Handwriting Identification

Researchers from the Center for Statistics and Applications in Forensic Evidence (CSAFE) developed and tested a statistical method for handwriting identification, using a score-based likelihood ratio system to determine the weight of the evidence.

The study, published in Statistical Analysis and Data Mining, was led by Danica Ommen, an assistant professor of statistics at Iowa State University, and Madeline Johnson, a former Iowa State graduate student.

Ommen and Johnson set out to extend the scope of CSAFE handwriting research. Earlier work performed by CSAFE researchers focused on estimating the probability that two handwritten documents have the same source and only considered a closed set case, which requires all of the potential sources of the handwriting to be known.

Figure 1 from the journal article shows one of the clusters extracted from a writing sample using CSAFE’s R package, handwriter. A cluster is characterized by an exemplar graph, shown in red.
Click on image to enlarge. Figure 1 from the journal article shows one of the clusters extracted from a writing sample using CSAFE’s R package, handwriter. A cluster is characterized by an exemplar graph, shown in red.

Ommen and Johnson instead focused on an open set case, where the writer of the documents can be anyone in a defined population subgroup. Specifically, their objective was to estimate the probability that two samples were authored by the same or by a different writer without the requirement of knowing all of the potential sources.

They used 90 handwriting samples of various lengths from CSAFE’s handwriting database. The writing was broken down into small graphical structures, called graphs, by CSAFE’s publicly available R package, handwriter, and grouped into clusters for comparison.

Ommen and Johnson considered two possible scenarios when comparing the samples: common source and specific source. They explained that a common source scenario is where two questioned documents with unknown writers are compared to determine whether they came from the same source. And a specific source scenario is where a questioned document is compared to a prepared sample from a known writer.

They then used score-based likelihood ratios to approximate the weight of evidence in both types of scenarios. They used three different approaches for the specific source scenario when generating the known non-matching comparisons.

For each of the scenarios considered, Ommen and Johnson used random forest algorithms to determine the comparison scores, including a pre-trained random forest using all of the gathered data and one trained according to the relevant forensic scenario and associated known writer.

They found that the trained random forest performed well with longer writing samples in the common source scenarios but struggled with shorter ones.

“This is not unexpected because the longer documents contain more writing which we would expect to provide more information about a writer’s writing style and be the most representative of the proportion of graphs per cluster compared to shorter documents,” Ommen and Johnson explained in the study.

The specific source score-based likelihood ratios performed better than the common source score-based likelihood ratios because they were tailored to the case at hand. In all scenarios, it was more difficult for the score-based likelihood ratio system to confirm samples with the same source than with different sources.

Ommen and Johnson concluded that while each score-based likelihood ratio performed well, deciding which one is appropriate depends on the question of interest and available evidence.

“Common source SLRs should be applied when there are two questioned documents, and the goal is to quantify the probability of them being from the same writer against the probability of them being from different writers, without specifying which writer this is,” they write.

On the other hand, Ommen and Johnson said that the specific source score-based likelihood ratios should be used when there is one questioned document, a known suspect who wrote a control document, and a database of handwriting from writers in a relevant population.

“In all three specific source approaches, the goal is to quantify the probability of the questioned document being written by the suspect against the probability that the questioned document was written by someone else,” they write.

Download the journal article and read the insights at https://forensicstats.org/blog/2022/01/10/insights-handwriting-identification-using-random-forests-and-score-based-likelihood-ratios/.

CSAFE director Alicia Carriquiry will give an update on CSAFE handwriting analysis and demonstrate handwriter in an upcoming webinar. The webinar will be held Tuesday, Oct. 18 from 11 a.m.–noon CT. For more information and to register, visit https://forensicstats.org/event/webinar-handwriter-a-demonstration-and-update-on-csafe-handwriting-analysis/.

FROM THE BLOG