# A Clustering Method for Graphical Handwriting Components and Statistical Writership Analysis

## OVERVIEW

Researchers developed and tested a statistical algorithm for analyzing the shapes made in handwriting to determine their source. Unlike other programs that analyze what words are written, this algorithm analyzes how the words are written.

Amy M. Crawford
Nicholas S. Berry
Alicia L. Carriquiry

###### Journal

Statistical Analysis and Data Mining

August 2020

## The Goals

### 1

Develop a semi-automated process to examine and compare handwriting samples from questioned and reference documents.

### 2

Improve upon the existing methodology for determining writership.

## APPROACH AND METHODOLOGY

In this study, researchers scanned and analyzed 162 handwritten documents by 27 writers from the Computer Vision Lab database, a publicly available source of handwritten text samples. Using the processing program `handwriter`, they broke the writing down into 52,541 unique graphs. A K-means algorithm then clustered these graphs into 40 groups of similar graphs, each anchored by a mean (center) graph. To allocate graphs to groups, researchers developed a new way to measure the distance between graphs.
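The clustering step can be illustrated with a minimal K-means sketch. It is not the researchers' implementation. It assumes each graph has already been reduced to a fixed-length numeric vector, and it uses Euclidean distance as a stand-in for the purpose-built graph distance described above. The names `euclidean`, `mean_point`, and `k_means` are hypothetical, and the toy data uses 2 groups rather than 40.

```python
import random


def euclidean(a, b):
    """Stand-in distance (assumption); the study uses its own graph distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(points, k, distance=euclidean, max_iter=100, seed=0):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its members."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: distance(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a group is empty
                centers[j] = mean_point(members)
    return labels, centers


# Toy data (hypothetical): two well-separated groups of "graphs".
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = k_means(points, k=2)
```

Because the distance is passed in as a function, a different measure of graph similarity could be swapped in without changing the assignment step. The mean computation would need a matching definition for graphs.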

Then, researchers processed an additional document from each of the 27 writers, held back as a “questioned document,” to test whether the algorithm could accurately determine which document belonged to which writer. The new method for clustering graphs appears to be an improvement over the current approach based on adjacency grouping, which relies only on edge connectivity of graphs.
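To make the questioned-document test concrete, the sketch below scores a questioned document against each writer's reference documents using counts of graphs per cluster. It is a deliberately simplified stand-in, not the statistical model used in the study: it assumes a smoothed multinomial likelihood and an equal prior for every writer. The function name `writer_posteriors`, the smoothing parameter `alpha`, and all counts are hypothetical.

```python
import math


def writer_posteriors(reference_counts, questioned_counts, alpha=1.0):
    """Return a probability per writer for a questioned document.

    reference_counts: dict mapping writer -> list of graph counts per cluster
    questioned_counts: list of graph counts per cluster for the questioned document
    alpha: additive smoothing so unseen clusters do not get zero probability
    """
    log_scores = {}
    for writer, counts in reference_counts.items():
        total = sum(counts) + alpha * len(counts)
        # Multinomial log-likelihood, up to a constant shared by all writers.
        log_scores[writer] = sum(
            q * math.log((c + alpha) / total)
            for q, c in zip(questioned_counts, counts)
        )
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {w: math.exp(s - top) for w, s in log_scores.items()}
    norm = sum(weights.values())
    return {w: v / norm for w, v in weights.items()}


# Hypothetical counts over three clusters for two writers.
reference = {"writer_a": [30, 5, 5], "writer_b": [5, 30, 5]}
questioned = [12, 2, 3]
posteriors = writer_posteriors(reference, questioned)
best_match = max(posteriors, key=posteriors.get)
```

In this toy case the questioned document puts most of its graphs in the first cluster, as `writer_a` does, so `writer_a` receives the higher probability.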

above 50% probability of a match on 23 documents

### Using dynamic K-means clustering

- above 90% probability of a match on 23 documents
- correctly matched 26 documents

### Key Definitions

#### Graphs

Simple structures with nodes and edges to represent shapes that constitute handwriting

#### Writership

The set of graphs a person typically makes when writing

#### K-means Algorithm

An iterative algorithm that separates data points into clusters based on nearest mean values

## FOCUS ON THE FUTURE

The new method favors certain properties of handwriting over others to assess similarities and can be extended to incorporate additional features.

The mean of a group of graphs is often a shape that does not actually occur in the document. Instead of centering groups using a mean graph, researchers are exploring whether using an exemplar graph as a group’s anchor will simplify calculations.
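The difference between a mean and an exemplar can be shown with a small sketch. Here an exemplar is taken to be the group member with the smallest total distance to all other members (often called a medoid). That definition is an assumption made for illustration, and the researchers may define it differently. The function name `exemplar` and the one-dimensional toy data are hypothetical.

```python
def exemplar(members, distance):
    """Return the member with the smallest total distance to all members."""
    return min(members, key=lambda c: sum(distance(c, m) for m in members))


def abs_diff(a, b):
    """Toy distance on numbers; a graph distance would take its place."""
    return abs(a - b)


# Toy group (hypothetical): four similar values and one outlier.
group = [0.0, 1.0, 2.0, 3.0, 20.0]
group_mean = sum(group) / len(group)   # 5.2, not a member of the group
group_exemplar = exemplar(group, abs_diff)  # 2.0, an actual member
```

The exemplar is always an observed item, so only pairwise distances between existing graphs are needed and no averaged shape has to be constructed.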