Insights: Handwriting Identification Using Random Forests and Score-Based Likelihood Ratios

OVERVIEW

Handwriting analysis has long been a largely subjective field, relying on visual inspection by trained examiners to determine whether questioned documents come from the same source. In recent years, however, researchers have worked to develop methods and software that quantify the similarity between writing samples more objectively. Researchers funded by CSAFE developed and tested a new statistical method for handwriting identification that uses a score-based likelihood ratio (SLR) system to determine the evidential value of a comparison.

Lead Researchers

Madeline Quinn Johnson
Danica M. Ommen

Journal

Statistical Analysis and Data Mining

Publication Date

03 December 2021

The Goals

1. Apply the SLR system to various handwritten documents.

2. Evaluate the system’s performance with various approaches to the data.

The Study

CSAFE collected handwriting samples from 90 participants, using prompts of various lengths to get samples of different sizes. These writing samples were broken down into graphs, or writing segments with nodes and connecting edges, then grouped into clusters for comparison.
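One way to picture how clustered graphs support a comparison is to summarize each document by the share of its graphs that fall in each cluster. The sketch below is illustrative only: the cluster assignments, the four-cluster setup, and the function names are hypothetical, and this summary does not state that the study computed its features in exactly this way.

```python
from collections import Counter


def cluster_profile(cluster_ids, n_clusters):
    """Return the proportion of a document's graphs that fall in each cluster."""
    if not cluster_ids:
        raise ValueError("a document needs at least one graph")
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def profile_differences(profile_a, profile_b):
    """Absolute per-cluster differences between two documents' profiles."""
    return [abs(a - b) for a, b in zip(profile_a, profile_b)]


# Hypothetical cluster assignments for the graphs of two documents (4 clusters).
doc_a = [0, 0, 1, 2, 2, 2, 3, 3]
doc_b = [0, 1, 1, 1, 2, 3, 3, 3]

profile_a = cluster_profile(doc_a, n_clusters=4)
profile_b = cluster_profile(doc_b, n_clusters=4)
differences = profile_differences(profile_a, profile_b)
print(profile_a, profile_b, differences)
```

Small differences across all clusters would suggest similar writing habits, while large differences would suggest different writers.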

When comparing the gathered samples, Johnson and Ommen considered two possible scenarios:

Common Source Scenario: two questioned documents with unknown writers are compared to determine whether they come from the same source.

Specific Source Scenario: a questioned document is compared to a prepared sample from a known writer.

They then used Score-based Likelihood Ratios (SLRs) to approximate the weight of the evidence in both types of scenarios.
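In general terms, a score-based likelihood ratio compares how likely an observed similarity score is among known same-source comparisons versus known different-source comparisons. The following minimal sketch estimates both densities with Gaussian kernels. The scores, the bandwidth, and the function names are assumptions made for illustration, not the study's data or code.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def score_based_lr(score, same_source_scores, diff_source_scores, bandwidth=0.1):
    """Ratio of the score's density under same-source vs. different-source comparisons."""
    numerator = gaussian_kde(same_source_scores, bandwidth)(score)
    denominator = gaussian_kde(diff_source_scores, bandwidth)(score)
    return numerator / denominator


# Hypothetical similarity scores in [0, 1]; higher means more similar.
same_source_scores = [0.80, 0.85, 0.90, 0.75, 0.95]
diff_source_scores = [0.10, 0.20, 0.15, 0.30, 0.25]

slr_high = score_based_lr(0.85, same_source_scores, diff_source_scores)
slr_low = score_based_lr(0.20, same_source_scores, diff_source_scores)
print(slr_high, slr_low)
```

A ratio above 1 lends support to the same-source proposition, and a ratio below 1 lends support to the different-source proposition.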

The researchers used three different approaches when generating the known non-matching comparisons for the specific source SLRs:

Trace-Anchored Approach: only uses comparisons between the questioned document (the trace) and a collection of writers different from the specific source (the background population).

Source-Anchored Approach: only uses comparisons between writing from the specific source and the background population.

General-Match Approach: only uses comparisons between samples from different writers in the background population.
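The three approaches differ only in which pairs of documents are treated as known non-matching comparisons. The sketch below builds each set of pairs from a toy collection. The document labels, the dictionary layout, and the function names are hypothetical.

```python
from itertools import combinations


def trace_anchored(trace, background):
    """Pair the questioned document with every background document."""
    return [(trace, doc) for docs in background.values() for doc in docs]


def source_anchored(source_docs, background):
    """Pair each document from the specific source with every background document."""
    return [(s, doc) for s in source_docs for docs in background.values() for doc in docs]


def general_match(background):
    """Pair documents written by two different background writers."""
    pairs = []
    for writer_a, writer_b in combinations(sorted(background), 2):
        pairs.extend((a, b) for a in background[writer_a] for b in background[writer_b])
    return pairs


# Hypothetical data: one trace, two known samples, three background writers.
trace = "Q1"
source_docs = ["S1", "S2"]
background = {"w1": ["w1_a", "w1_b"], "w2": ["w2_a"], "w3": ["w3_a", "w3_b"]}

trace_pairs = trace_anchored(trace, background)
source_pairs = source_anchored(source_docs, background)
general_pairs = general_match(background)
print(len(trace_pairs), len(source_pairs), len(general_pairs))
```

Only the first two sets involve material from the case at hand; the general-match set is built entirely from the background population.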

To obtain the comparison scores behind the SLRs for each scenario, they used random forest algorithms, including a pre-trained random forest built from all of the gathered data and one trained according to the relevant SLR.
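To give a rough sense of how a tree ensemble can turn a pair of documents into a single comparison score, the sketch below trains a toy bagged ensemble of one-split trees (decision stumps), a heavily simplified stand-in for a random forest. Using the fraction of trees that vote "same writer" as the score is an assumption here, as are the feature values, labels, and function names; this is not the researchers' implementation.

```python
import random


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that classifies the most rows correctly.

    The stump predicts 1 (same writer) when the feature value is <= threshold,
    since small differences between documents suggest a common writer.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({row[feature] for row in rows}):
        correct = sum((row[feature] <= threshold) == bool(label)
                      for row, label in zip(rows, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return feature, best_threshold


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def comparison_score(forest, row):
    """Fraction of stumps that vote 'same writer' for this pair of documents."""
    return sum(row[feature] <= threshold for feature, threshold in forest) / len(forest)


# Hypothetical per-cluster differences for document pairs (1 = same writer).
rows = [[0.02, 0.05], [0.04, 0.01], [0.03, 0.03], [0.05, 0.02],
        [0.30, 0.25], [0.22, 0.35], [0.28, 0.40], [0.35, 0.20]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
score_similar = comparison_score(forest, [0.02, 0.01])
score_dissimilar = comparison_score(forest, [0.40, 0.45])
score_ambiguous = comparison_score(forest, [0.10, 0.10])
print(score_similar, score_dissimilar, score_ambiguous)
```

Scores like these would then feed the density estimates used to form an SLR.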

Results

1. In common source scenarios, the trained random forest performed well with longer writing samples, but struggled with shorter ones.

2. The specific source SLRs performed better than the common source SLRs because they are tailored to the case at hand.

3. In all scenarios, it was more difficult for the SLR system to confirm samples with the same source than with different sources.

FOCUS ON THE FUTURE

The SLRs do not perform well with short documents, possibly due to a mismatch between the number of clusters used and the length of the document. Future work could determine the optimal number of clusters based on the document’s length.

Because the SLRs provide data on the strength of forensic handwriting evidence for an open set of sources, this approach is an improvement on the previous clustering method developed by CSAFE, which used a closed set of known sources.

Insights: A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

OVERVIEW

Researchers developed and tested a statistical algorithm for analyzing the shapes made in handwriting to determine their source. Unlike other programs that analyze what words are written, this algorithm analyzes how the words are written.

Lead Researchers

Amy M. Crawford
Nicholas S. Berry
Alicia L. Carriquiry

Journal

Statistical Analysis and Data Mining

Publication Date

August 2020

The Goals

1

Develop a semi-automated process to examine and compare handwriting samples from questioned and reference documents.

2

Improve upon the existing methodology for determining writership.

APPROACH AND METHODOLOGY

In this study, researchers scanned and analyzed 162 handwritten documents by 27 writers from the Computer Vision Lab database, a publicly available source of handwritten text samples, and broke down the writing into 52,541 unique graphs using the processing program handwriter. From there, a K-means algorithm clustered the graphs into 40 groups of similar graphs, each anchored by a mean or center graph. To allocate graphs to groups, researchers developed a new way to measure the distance between graphs.
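The clustering step can be pictured with a small K-means example. The sketch below is a simplification under stated assumptions: it clusters 2D points with Euclidean distance, whereas the study clustered handwriting graphs into 40 groups using a purpose-built distance between graphs. The data, the farthest-point initialization, and the function names are all hypothetical.

```python
import math


def init_centers(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, n_iter=20):
    """Alternate between assigning points to centers and recomputing the means."""
    centers = init_centers(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical 2D stand-ins for graphs: two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Each resulting center plays the role of the mean graph that anchors a group.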

Then, researchers processed an additional document from each of the 27 writers, held back as a “questioned document,” to test whether the algorithm could accurately determine which document belonged to which writer. The new method for clustering graphs appears to be an improvement over the current approach based on adjacency grouping, which relies only on edge connectivity of graphs.

Results on the 27 questioned documents:

Using adjacency grouping: above 50% probability of a match on 23 documents.

Using dynamic K-means clustering: above 90% probability of a match on 23 documents; correctly matched 26 documents.

Key Definitions

Graphs: Simple structures with nodes and edges to represent shapes that constitute handwriting.

Writership: The set of graphs a person typically makes when writing.

K-means Algorithm: An iterative algorithm that separates data points into clusters based on nearest mean values.

KEY TAKEAWAYS FOR PRACTITIONERS

1. The new approach shows promise, as it allows practitioners to more objectively analyze handwriting by studying the way letters and words are formed.

2. Compared with the more readily available but more volatile adjacency grouping method, the K-means clustering method was more accurate at identifying the writer of a questioned document from among a closed set of potential writers.

FOCUS ON THE FUTURE

The new method favors certain properties of handwriting over others to assess similarities and can be extended to incorporate additional features.

The mean of a group of graphs is often a shape that does not actually occur in the document. Instead of centering groups using a mean graph, researchers are exploring whether using an exemplar graph as a group’s anchor will simplify calculations.
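The difference between a mean anchor and an exemplar anchor can be shown with a small example. In the sketch below, the exemplar is taken to be the group member with the smallest total distance to the other members (often called a medoid), which is one common definition and an assumption here. The 2D points stand in for graphs, and the function names are hypothetical.

```python
import math


def mean_point(points):
    """Coordinate-wise mean; it need not coincide with any actual member."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def exemplar(points):
    """The member with the smallest total distance to all members of the group."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical group of 2D stand-ins for graphs.
group = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]

group_mean = mean_point(group)
group_exemplar = exemplar(group)
print(group_mean, group_exemplar)
```

Here the mean is a point that no member occupies, while the exemplar is always an observed member of the group.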

Next Steps

Handwriter: Explore and try the handwriter algorithm by downloading it.

CSAFE Handwriting Database: Investigate publicly available databases of handwritten documents.

Computer Vision Lab database: Investigate publicly available databases of handwritten documents.