
CSAFE Study Explores Clustering Method for Handwriting Components and Writership Analysis

Three graphs whose assignment will contribute to characterizing a writer’s style. D is the deterministic group assignment. C is the cluster group assignment.

Recently published research from the Center for Statistics and Applications in Forensic Evidence (CSAFE) details a new approach for analyzing the shapes made in handwriting to determine their source, a development that may one day be useful in forensic practice.

In the study, published in Statistical Analysis and Data Mining, the researchers scanned and analyzed handwritten documents from a variety of writers in the Computer Vision Lab (CVL) database. The writing was segmented into small graphical structures, called graphs, using CSAFE’s publicly available handwriter R package. These graphs are the basis for evaluating a writer’s structural style.

Next, the graphs were grouped using measures based on the similarity of their structural attributes. Alicia Carriquiry, director of CSAFE and a co-author of the study, said the team developed an approach that lets them control the number of groups, each of which holds graphs that may not be identical in connectivity but share a similar dominating structure.

“We developed a dynamic and flexible K-means-based clustering method that is tolerant to incidental pen strokes,” said Carriquiry. “This allows graphs with similar, but not identical, graphical structures to be placed in the same group.”

She said their clustering algorithm relies on a novel distance measure and a mean calculation that they designed specifically for graphs that represent handwriting. Using a subset of the CVL handwriting database, they applied their clustering method to a set of documents with known origins to perform a writership analysis.
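To make the idea concrete, here is a minimal sketch of a K-means-style loop in which the distance measure and the mean calculation are supplied by the caller. It is not the study's algorithm: the function name `kmeans_custom`, the Euclidean `toy_distance`, and the componentwise `toy_mean` are hypothetical stand-ins, and the 2-D points merely take the place of handwriting graphs. The study's graph-specific distance and mean are not reproduced here.

```python
import math
import random


def kmeans_custom(items, k, distance, mean, n_iter=50, seed=0):
    """K-means-style clustering with caller-supplied distance and mean.

    Illustrative only: `distance(a, b)` and `mean(members)` are placeholders
    for whatever measures suit the objects being clustered.
    """
    rng = random.Random(seed)
    centers = rng.sample(list(items), k)  # start from k distinct items
    labels = None
    for _ in range(n_iter):
        # Assignment step: each item goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: distance(x, centers[j])) for x in items
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: recompute each center from its current members.
        for j in range(k):
            members = [x for x, lab in zip(items, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return labels, centers


# Toy stand-ins (assumptions): 2-D points instead of handwriting graphs.
def toy_distance(a, b):
    return math.dist(a, b)


def toy_mean(members):
    n = len(members)
    return tuple(sum(m[i] for m in members) / n for i in range(2))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans_custom(points, k=2, distance=toy_distance, mean=toy_mean)
```

Because the loop only ever calls `distance` and `mean`, swapping in measures defined on graphs would not require changing the loop itself, which is the sense in which a K-means-based method can be adapted to non-numeric objects.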

The word “csafe” taken from a processed document. (A) Binarized pen strokes in gray, with the pixel-wide skeleton overlaid in black. Red dots denote graph breakpoints of the connected writing. (B–G) Each graph shown separately, where red dots show nodes (endpoints and intersections).

“The clusters for graphs we created from the CVL handwriting samples show that writers can be characterized and identified by examining the frequency with which they emit graphs to the various clusters,” Carriquiry said.
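One simple way to picture "the frequency with which they emit graphs to the various clusters" is a per-writer vector of cluster proportions. The sketch below assumes each graph in a document has already been given an integer cluster label; the function name `cluster_profile` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cluster_profile(assignments, n_clusters):
    """Return the share of a writer's graphs that fall in each cluster.

    `assignments` is a list of integer cluster labels, one per graph
    (an assumed input format for this illustration).
    """
    if not assignments:
        raise ValueError("need at least one graph assignment")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(c, 0) / total for c in range(n_clusters)]


# Hypothetical document: four graphs assigned to clusters 0, 0, 1 and 2.
profile = cluster_profile([0, 0, 1, 2], n_clusters=3)
```

Two writers with different habits would tend to produce different proportion vectors, which is what makes such a profile usable as a description of style.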

They also processed additional writing samples that were held back as “questioned documents” to test whether the algorithm could accurately determine which document belonged to which writer. Carriquiry and her team discovered that their new method for clustering graphs appears to be an improvement over the current clustering approach.
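As a rough illustration of the questioned-document test, the sketch below compares a questioned document's cluster-frequency profile with the profiles of known writers and returns the closest one. This is a deliberately simple nearest-profile comparison using an L1 distance, swapped in for illustration; it is not the statistical writership analysis used in the study, and the writer names and numbers are made up.

```python
def closest_writer(questioned_profile, known_profiles):
    """Return the known writer whose profile is nearest in L1 distance.

    `known_profiles` maps a writer name to a list of cluster proportions
    (an assumed format; the study's model is not reproduced here).
    """
    def l1(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    return min(
        known_profiles,
        key=lambda writer: l1(questioned_profile, known_profiles[writer]),
    )


# Hypothetical profiles over three clusters.
known = {
    "writer_a": [0.7, 0.2, 0.1],
    "writer_b": [0.1, 0.3, 0.6],
}
match = closest_writer([0.6, 0.3, 0.1], known)
```

A nearest-profile rule gives a single answer with no measure of uncertainty, which is one reason a statistical model is preferable for forensic use.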

Carriquiry said they are continuing to test the new method before it is ready for real handwriting analysis cases. “Not only could the clustering method be useful in forensic practice, but could be applicable in any other field where the data come in the form of graphical objects,” she said.

Contributing to the study were former Iowa State University doctoral students Amy Crawford and Nicholas Berry. Both are now statistical scientists at Berry Consultants LLC.

View and download the full journal article at https://lib.dr.iastate.edu/csafe_pubs/77/.

Watch the recording from the Handwriting Analysis at CSAFE webinar presented by Alicia Carriquiry on March 11, 2021.

Insights: A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

Edge distance measure calculation companion figures.
