Datasets consisting of geolocated events provide rich spatial characterizations of human behavior. Individuals tend to be self-consistent over time while generating such events, visiting the same locations such as home, the office, or the gym. In this paper we develop an approach to quantify similarity between sets of spatial events, drawing inspiration from the forensic evaluation of DNA evidence. A randomization-based technique is applied in which locations are sampled from conditional distributions of spatial locations (constructed via mixtures of kernel density estimates with weights derived from discrete locations). Score functions based on the distance between groups of events are then computed and used to construct coincidental match probabilities. We illustrate the approach with a large geolocation data set collected from Twitter users. Results are compared with an alternative approach that assesses similarity by computing the log-likelihood of one set of spatial events under a mixture-KDE fitted to another. Our experimental results indicate that the proposed method can accurately assess the similarity between sets of geolocations, with potential applications in forensic and cybersecurity settings.
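The randomization idea in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the score (mean nearest-neighbor distance), the two-component mixture with a fixed weight `alpha`, the Gaussian kernel with a fixed `bandwidth`, and all function names are hypothetical choices made for this example. The abstract does not specify how the mixture weights are derived from discrete locations, so that step is replaced here by a constant.

```python
import math
import random

def mean_nn_distance(a, b):
    """Score: mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def sample_mixture_kde(individual, population, n, alpha, bandwidth, rng):
    """Draw n points from alpha * KDE(individual) + (1 - alpha) * KDE(population).

    Sampling from a Gaussian KDE means picking one data point at random
    and adding Gaussian noise with standard deviation `bandwidth`.
    """
    samples = []
    for _ in range(n):
        pool = individual if rng.random() < alpha else population
        cx, cy = rng.choice(pool)
        samples.append((rng.gauss(cx, bandwidth), rng.gauss(cy, bandwidth)))
    return samples

def coincidental_match_probability(set_a, set_b, reference_sets, population,
                                   n_sim=200, alpha=0.8, bandwidth=0.01, seed=0):
    """Estimate how often a simulated set from some other individual
    scores at least as close to set_a as set_b does (assumed definition)."""
    rng = random.Random(seed)
    observed = mean_nn_distance(set_a, set_b)
    hits = 0
    for _ in range(n_sim):
        other = rng.choice(reference_sets)  # a different individual's events
        simulated = sample_mixture_kde(other, population, len(set_b),
                                       alpha, bandwidth, rng)
        if mean_nn_distance(set_a, simulated) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

Under these assumptions, a small returned value suggests that the observed closeness of the two sets would rarely arise by coincidence, while a value near 1 suggests the sets are no closer than those of unrelated individuals.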
Spatial DNA: Measuring similarity of geolocation datasets with applications to forensics

Conference/Workshop:
American Statistical Association Joint Statistical Meetings
Published: 2019
Primary Author: Christopher Galbraith
Secondary Authors: Padhraic Smyth
Type: Presentation Slides
Research Area: Digital
Related Resources
LibDroid: Summarizing information flow of Android Native Libraries via Static Analysis
With advancements in technology, people are taking advantage of mobile devices to access e-mails, search the web, and video chat. Therefore, extracting evidence from mobile phones is an important component…
Evaluating Reference Sets for Score-Based Likelihood Ratios for Camera Device Identification
An investigator wants to know if an illicit image captured by an unknown camera was taken by a person of interest’s (POI’s) phone. Score-based likelihood ratios (SLRs) have been used…
Likelihood Ratios for Categorical Evidence with Applications to Digital Forensics
In forensic investigations, the goal of evidence evaluation is often to address source-/identity-based questions in which the evidence consists of two sets of observations: one from an unknown source tied…
Automatic Detection of Android Steganography Apps via Symbolic Execution and Tree Matching
The recent focus of cyber security on automated detection of malware for Android apps has omitted the study of some apps used for “legitimate” purposes, such as steganography apps. Mobile…