INSIGHT
A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis
OVERVIEW
Researchers developed and tested a statistical algorithm for analyzing the shapes made in handwriting to determine their source. Unlike other programs that analyze what words are written, this algorithm analyzes how the words are written.
Lead Researchers
Amy M. Crawford
Nicholas S. Berry
Alicia L. Carriquiry
Journal
Statistical Analysis and Data Mining
Publication Date
August 2020
Publication Number
IN 109 HW
The Goals
1. Develop a semi-automated process to examine and compare handwriting samples from questioned and reference documents.
2. Improve upon the existing methodology for determining writership.
APPROACH AND METHODOLOGY
In this study, researchers scanned and analyzed 162 handwritten documents by 27 writers from the Computer Vision Lab database, a publicly available source of handwritten text samples, and broke down the writing into 52,541 unique graphs using the processing program handwriter. From there, a K-means algorithm clustered the graphs into 40 groups of similar graphs, each anchored by a mean or center graph. To allocate graphs to groups, researchers developed a new way to measure the distance between graphs.
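The clustering step described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the study clusters graphs using its own purpose-built distance measure, whereas this sketch assumes each graph has already been reduced to a hypothetical numeric feature vector and uses Euclidean distance as a stand-in. The names kmeans, euclidean, and mean_point are illustrative only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in distance (an assumption); the study defines its own graph distance.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_point(points: Sequence[Point]) -> Point:
    # Coordinate-wise mean; plays the role of a group's "center graph".
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(
    points: Sequence[Point],
    k: int,
    distance: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centers = rng.sample(list(points), k)
    assignments: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the group with the nearest center.
        new_assignments = [
            min(range(k), key=lambda j: distance(p, centers[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed group, so the clustering has converged.
        assignments = new_assignments
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = mean_point(members)
    return centers, assignments
```

In the study, k would be 40 and the points would be the 52,541 graphs; swapping in a different distance function only requires passing it as the distance argument.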
Then, researchers processed an additional document from each of the 27 writers, held back as a “questioned document,” to test whether the algorithm could accurately determine which document belonged to which writer. The new method for clustering graphs appears to be an improvement over the current approach based on adjacency grouping, which relies only on the edge connectivity of graphs.
Using adjacency grouping: above 50% probability of a match on 23 documents
Using dynamic K-means clustering: above 90% probability of a match on 23 documents; correctly matched 26 documents
Key Definitions
Graphs
Simple structures with nodes and edges to represent shapes that constitute handwriting
Writership
The set of graphs a person typically makes when writing
K-means Algorithm
An iterative algorithm that separates data points into clusters based on nearest mean values
KEY TAKEAWAYS FOR PRACTITIONERS
1. The new approach shows promise, as it allows practitioners to more objectively analyze handwriting by studying the way letters and words are formed.
2. When compared to the more readily available but more volatile adjacency grouping method, the K-means clustering method contributed to greater accuracy when trying to identify the writer of a questioned document from among a closed set of potential writers.
FOCUS ON THE FUTURE
The new method favors certain properties of handwriting over others to assess similarities and can be extended to incorporate additional features.
The mean of a group of graphs is often a shape that does not actually occur in the document. Instead of centering groups using a mean graph, researchers are exploring whether using an exemplar graph as a group’s anchor will simplify calculations.
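The difference between a mean anchor and an exemplar anchor can be shown with a small Python sketch. It assumes, purely for illustration, that graphs are represented as hypothetical 2-D feature vectors compared by Euclidean distance, and it picks the exemplar as the group member with the smallest total distance to the others (a medoid). The researchers' actual exemplar criterion is not stated here, so that choice is an assumption.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in distance (an assumption); the study uses its own graph distance.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_point(points: Sequence[Point]) -> Point:
    # Coordinate-wise mean; it need not coincide with any observed point.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def exemplar(
    points: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
) -> Point:
    # Medoid: the observed point with the smallest total distance to the group.
    return min(points, key=lambda c: sum(distance(c, p) for p in points))


group = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
center = mean_point(group)   # an averaged shape that was never observed
anchor = exemplar(group)     # always one of the observed points
```

Because an exemplar is always an observed member of its group, only distances between observed graphs are needed and no averaged shape has to be constructed.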