Webinar: handwriter: A Demonstration and Update on CSAFE Handwriting Analysis

CSAFE invites researchers, collaborators, and members of the broader forensics and statistics communities to participate in our Fall 2022 Webinar Series on Tuesday, October 18, 2022, from 11:00 a.m. to noon CT. The presentation will be “handwriter: A Demonstration and Update on CSAFE Handwriting Analysis.”

Presenter:
Alicia Carriquiry
Director, Center for Statistics and Applications in Forensic Evidence (CSAFE)
Distinguished Professor and President’s Chair, Department of Statistics – Iowa State University

Presentation Description:

Forensic handwriting analysis relies on the principle of individuality: no two writers produce identical writing, and given enough quality and quantity of writing, it is possible to infer whether two documents were written by the same person. Forensic handwriting analysis is carried out by examiners trained to find subtle differences and similarities between a questioned document and a reference sample. Examiners visually compare samples and offer an opinion regarding the source of the questioned document. 

In the last 10-15 years, researchers have proposed algorithmic tools to complement examiners’ visual assessments. A well-known software system called FLASH ID (Sciometrics, LLC) first decomposes the image of a questioned sample into structures called graphemes and then characterizes them by their topology and shape. Given a closed set of reference samples, the software computes a score that quantifies the similarity between the questioned document and the references. FLASH ID has been extensively tested and has been shown to exhibit very good accuracy. 

At CSAFE, we are working on semi-automated methods suitable for closed or open sets of reference writers, and for the examination of samples at the level of words or at the level of graphical structures similar, but not identical, to graphemes. In the webinar we will describe each of these methods and show initial but promising results. When the reference set of writers is closed, we use a Bayesian approach that outputs a probability of writership for each writer in the set; because the output is an estimated probability, the interpretation of results is straightforward. We are still developing and testing the word-based approach and the more algorithmic approach that can be used when the set of potential writers is open, but we can show some initial results and our plans for future development. We will also demonstrate handwriter, our software implementing these methods. The program is not yet fully functional, but an early version is freely available at https://github.com/CSAFE-ISU/handwriter.
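To make the closed-set idea concrete, below is a minimal sketch in Python rather than the handwriter implementation (handwriter is an R package), with hypothetical names such as writership_posteriors and a deliberately simplified model: each document is reduced to counts of how often its writing components fall into K clusters, each candidate writer's reference material yields a smoothed cluster-fill profile, and Bayes' rule over the closed set of writers turns the questioned document's counts into a posterior probability of writership for each writer.

```python
# Minimal sketch (not CSAFE's actual implementation) of a closed-set Bayesian
# writership calculation over cluster-fill counts.
import numpy as np

def writership_posteriors(questioned_counts, reference_counts, alpha=1.0):
    """questioned_counts: (K,) cluster counts for the questioned document.
    reference_counts: (W, K) cluster counts for W known writers.
    alpha: Dirichlet-style smoothing so unseen clusters keep nonzero probability.
    Returns (W,) posterior probabilities assuming a uniform prior on writers."""
    q = np.asarray(questioned_counts, dtype=float)
    R = np.asarray(reference_counts, dtype=float)
    # Smoothed cluster-fill probabilities for each writer.
    probs = (R + alpha) / (R + alpha).sum(axis=1, keepdims=True)
    # Multinomial log-likelihood of the questioned counts under each writer
    # (the multinomial coefficient is identical across writers and cancels).
    log_lik = (q * np.log(probs)).sum(axis=1)
    log_post = log_lik - np.logaddexp.reduce(log_lik)  # uniform prior over the closed set
    return np.exp(log_post)

# Example: 3 candidate writers, 5 clusters.
refs = [[40, 10, 5, 30, 15], [10, 35, 20, 5, 30], [25, 25, 25, 15, 10]]
print(writership_posteriors([8, 2, 1, 6, 3], refs))
```

Because the output sums to one over the closed set, each value reads directly as an estimated probability of writership, which is the interpretational advantage described above.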

 

Webinars are free and open to the public, and researchers, collaborators, and members of the broader forensics and statistics communities are especially encouraged to attend. Space is guaranteed only for the first 150 registrants. Each 60-minute webinar will allow time for discussion and questions.

Sign up using the form below (the Chrome and Safari web browsers work best):

Insights: Handwriting Identification Using Random Forests and Score-Based Likelihood Ratios

INSIGHT

Handwriting Identification Using Random Forests and Score-Based Likelihood Ratios

OVERVIEW

Handwriting analysis has long been a largely subjective field of study, relying on visual inspections by trained examiners to determine whether questioned documents come from the same source. In recent years, however, efforts have been made to develop methods and software that quantify the similarity between writing samples more objectively. Researchers funded by CSAFE developed and tested a new statistical method for handwriting identification that uses a score-based likelihood ratio (SLR) system to determine the evidential value of a comparison.

Lead Researchers

Madeline Quinn Johnson
Danica M. Ommen

Journal

Statistical Analysis and Data Mining

Publication Date

03 December 2021

Publication Number

IN 124 HW

The Goals

1. Apply the SLR system to various handwritten documents.

2. Evaluate the system’s performance with various approaches to the data.

The Study

CSAFE collected handwriting samples from 90 participants, using prompts of various lengths to get samples of different sizes. These writing samples were broken down into graphs, or writing segments with nodes and connecting edges, then grouped into clusters for comparison.

When comparing the gathered samples, Johnson and Ommen considered two possible scenarios:

Common Source Scenario:

two questioned documents with unknown writers are compared to determine whether they come from the same source.

Specific Source Scenario:

a questioned document is compared to a prepared sample from a known writer.

They then used Score-based Likelihood Ratios (SLRs) to approximate the weight of the evidence in both types of scenarios.
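As a rough illustration of what the SLR quantifies (a minimal sketch, not the authors' implementation; the synthetic scores, variable names, and use of kernel density estimation below are assumptions for demonstration): the SLR is the density of the observed comparison score under the same-source proposition divided by its density under the different-source proposition, with both densities estimated from scores of known same-source and known different-source comparisons.

```python
# A minimal score-based likelihood ratio (SLR) sketch.
import numpy as np
from scipy.stats import gaussian_kde

def score_based_lr(observed_score, same_source_scores, diff_source_scores):
    """SLR = density of the observed score under 'same source' divided by its
    density under 'different source', both estimated by kernel density estimation."""
    same_kde = gaussian_kde(same_source_scores)
    diff_kde = gaussian_kde(diff_source_scores)
    return same_kde([observed_score])[0] / diff_kde([observed_score])[0]

# Illustrative synthetic scores in [0, 1], where higher means "more similar".
rng = np.random.default_rng(0)
same = rng.beta(8, 2, size=500)   # known same-source comparisons tend to score high
diff = rng.beta(2, 8, size=500)   # known different-source comparisons tend to score low
print(score_based_lr(0.85, same, diff))   # SLR >> 1 supports the same-source proposition
```

An SLR well above 1 supports the same-source proposition, a value well below 1 supports the different-source proposition, and values near 1 are uninformative.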

The researchers used three different approaches when generating the known non-matching comparisons for the specific source SLRs:

Trace-Anchored Approach:

only uses comparisons between the questioned document (the trace) and a collection of writers different from the specific source (the background population).

Source-Anchored Approach:

only uses comparisons between writing from the specific source and the background population.

General-Match Approach:

only uses comparisons between samples from different writers in the background population.

To generate the comparison scores behind each SLR, the researchers used random forest algorithms: one random forest pre-trained on all of the gathered data, and one trained specifically for the relevant SLR approach.
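A hedged sketch of one common way to do this (not necessarily the authors' exact pipeline; the cluster-fill profiles, pair features, and function names below are hypothetical): train a random forest on document pairs labelled same writer or different writer, and use the fraction of trees voting "same writer" as the comparison score that feeds the SLR above.

```python
# Sketch: a random forest as a source of pairwise comparison scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-ins: 40 writers, 5 documents each, every document reduced to a
# 10-cluster fill profile; same-writer documents share a profile plus noise.
writer_profiles = rng.dirichlet(np.ones(10), size=40)
docs = np.repeat(writer_profiles, 5, axis=0) + rng.normal(0, 0.02, size=(200, 10))
writers = np.repeat(np.arange(40), 5)

def pair_features(doc_a, doc_b):
    """Hypothetical pair features: absolute difference of two cluster profiles."""
    return np.abs(np.asarray(doc_a, float) - np.asarray(doc_b, float))

# Labelled training pairs: 1 = same writer, 0 = different writers.
pairs = [(i, j) for i in range(199) for j in (i + 1, i + 2) if j < 200]
X = np.array([pair_features(docs[i], docs[j]) for i, j in pairs])
y = np.array([int(writers[i] == writers[j]) for i, j in pairs])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def comparison_score(doc_a, doc_b):
    # Fraction of trees (predicted probability) voting "same writer"; scores like
    # this are what the same-source and different-source SLR densities are built from.
    return float(forest.predict_proba([pair_features(doc_a, doc_b)])[0, 1])

# For specific-source SLRs, the known non-matching score set can be assembled
# trace-anchored (questioned document vs. background writers), source-anchored
# (known writer vs. background writers), or general-match (background writers
# compared with each other).
print(comparison_score(docs[0], docs[1]), comparison_score(docs[0], docs[7]))
```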

Results

1. In common source scenarios, the trained random forest performed well with longer writing samples but struggled with shorter ones.

2. The specific source SLRs performed better than the common source SLRs because they are tailored to the case at hand.

3. In all scenarios, the SLR system found it harder to confirm that samples came from the same source than to confirm that they came from different sources.

FOCUS ON THE FUTURE

 

The SLRs do not perform well with short documents, possibly due to a mismatch between the number of clusters used and the length of the document. Future work could determine the optimal number of clusters based on the document’s length.

Because the SLRs quantify the strength of forensic handwriting evidence for an open set of sources, this approach is an improvement on the previous clustering method developed by CSAFE, which used a closed set of known sources.

Insights: A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

INSIGHT

A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

OVERVIEW

Researchers developed and tested a statistical algorithm for analyzing the shapes made in handwriting to determine their source. Unlike other programs that analyze what words are written, this algorithm analyzes how the words are written.

Lead Researchers

Amy M. Crawford
Nicholas S. Berry
Alicia L. Carriquiry

Journal

Statistical Analysis and Data Mining

Publication Date

August 2020

Publication Number

IN 109 HW

The Goals

1. Develop a semi-automated process to examine and compare handwriting samples from questioned and reference documents.

2. Improve upon the existing methodology for determining writership.

APPROACH AND METHODOLOGY

In this study, researchers scanned and analyzed 162 handwritten documents by 27 writers from the Computer Vision Lab database, a publicly available source of handwritten text samples, and broke down the writing into 52,541 unique graphs using the processing program handwriter. From there, a K-means algorithm clustered the graphs into 40 groups of similar graphs, each anchored by a mean or center graph. To allocate graphs to groups, researchers developed a new way to measure the distance between graphs.
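The sketch below is a deliberate simplification of this step, not the paper's algorithm: the actual method clusters the graphs themselves using a purpose-built graph-to-graph distance and mean graphs, whereas here each graph is assumed to be reduced to a small numeric feature vector so that an off-the-shelf K-means with K = 40 can stand in for the idea, and a document is then summarized by how its graphs fill the clusters.

```python
# Simplified stand-in for the clustering step (not the paper's algorithm).
import numpy as np
from sklearn.cluster import KMeans

def cluster_fill_profile(graph_features, kmeans):
    """Fraction of a document's graphs landing in each of the K clusters."""
    labels = kmeans.predict(graph_features)
    counts = np.bincount(labels, minlength=kmeans.n_clusters)
    return counts / counts.sum()

# Synthetic stand-in for the corpus-wide collection of graph feature vectors
# (hypothetical features such as node count, edge count, loops, aspect ratio, ...).
rng = np.random.default_rng(0)
all_graph_features = rng.normal(size=(5000, 6))

kmeans = KMeans(n_clusters=40, n_init=10, random_state=0).fit(all_graph_features)

# Cluster-fill profile for one document's graphs (here, an arbitrary slice).
print(cluster_fill_profile(all_graph_features[:120], kmeans))
```

The resulting cluster-fill profiles are the per-document summaries that the writership comparison works from.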

Then, researchers processed an additional document from each of the 27 writers –– held back as a “questioned document” –– to test if the algorithm could accurately determine which document belonged to which writer. The new method for clustering graphs appears to be an improvement over the current approach based on adjacency grouping, which relies only on the edge connectivity of graphs.

Using adjacency grouping: above 50% probability of a match on 23 documents.

Using dynamic K-means clustering: above 90% probability of a match on 23 documents, with 26 documents correctly matched.

Key Definitions

Graphs

Simple structures with nodes and edges to represent shapes that constitute handwriting

Writership

The set of graphs a person typically makes when writing

K-means Algorithm

An iterative algorithm that separates data points into clusters based on nearest mean values

KEY TAKEAWAYS FOR PRACTITIONERS

1. The new approach shows promise, as it allows practitioners to more objectively analyze handwriting by studying the way letters and words are formed.

2. Compared to the more readily available but more volatile adjacency grouping method, the K-means clustering method contributed to greater accuracy when identifying the writer of a questioned document from among a closed set of potential writers.

FOCUS ON THE FUTURE

 

The new method favors certain properties of handwriting over others to assess similarities and can be extended to incorporate additional features.

The mean of a group of graphs is often a shape that does not actually occur in the document. Instead of centering groups using a mean graph, researchers are exploring whether using an exemplar graph as a group’s anchor will simplify calculations.
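In spirit, an exemplar anchor is a medoid: the cluster member whose total distance to the other members is smallest under whatever graph distance is in use. A minimal, hypothetical sketch of that selection step, assuming a precomputed pairwise distance matrix:

```python
# Sketch of the exemplar idea (not the researchers' implementation): anchor a
# cluster on its medoid instead of a mean shape that may occur in no document.
import numpy as np

def medoid_index(distance_matrix):
    """Index of the medoid given a symmetric (n, n) pairwise distance matrix."""
    D = np.asarray(distance_matrix, dtype=float)
    return int(D.sum(axis=1).argmin())

# Toy example with 4 cluster members.
D = np.array([[0.0, 1.0, 2.0, 4.0],
              [1.0, 0.0, 1.5, 3.0],
              [2.0, 1.5, 0.0, 2.5],
              [4.0, 3.0, 2.5, 0.0]])
print(medoid_index(D))   # member 1 has the smallest total distance to the others
```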

Next Steps

handwriter

Explore and try the handwriter algorithm by downloading it from https://github.com/CSAFE-ISU/handwriter.

CSAFE Handwriting Database

Investigate publicly available databases of handwritten documents

Computer Vision Lab database

Investigate publicly available databases of handwritten documents