
Spatial DNA: Measuring similarity of geolocation datasets with applications to forensics

Conference/Workshop:
American Statistical Association Joint Statistical Meetings
Published: 2019
Primary Author: Christopher Galbraith
Secondary Authors: Padhraic Smyth
Research Area: Digital

Datasets consisting of geolocated events provide rich spatial characterizations of human behavior. Individuals tend to be self-consistent over time when generating such events, repeatedly visiting the same locations, such as home, the office, or the gym. In this paper we develop an approach to quantify similarity between sets of spatial events, drawing inspiration from the forensic evaluation of DNA evidence. A randomization-based technique is applied in which locations are sampled from conditional distributions of spatial locations (constructed via mixtures of kernel density estimates with weights derived from discrete locations). Score functions based on the distance between groups of events are then computed and used to construct coincidental match probabilities. We illustrate the approach with a large geolocation dataset collected from Twitter users. Results are compared with an alternative measure of similarity: the log-likelihood of one set of spatial events under a mixture-KDE constructed from another. Our experimental results indicate that the proposed method can accurately assess the similarity between sets of geolocations, with potential applications in forensic and cybersecurity settings.
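The log-likelihood comparison method mentioned above can be illustrated with a minimal sketch in Python and NumPy. It assumes isotropic Gaussian kernels, planar (projected) coordinates rather than raw latitude/longitude, and a mixture of an individual-level KDE and a population-level KDE with fixed, hand-picked weights. The function names, the `(centers, bandwidth)` component format, the bandwidths, and the weights are all hypothetical; the paper's actual weighting scheme derived from discrete locations is not reproduced here.

```python
import numpy as np


def kde_logpdf(points, centers, bandwidth):
    """Log-density at `points` of an isotropic 2-D Gaussian KDE built on `centers`."""
    # Squared Euclidean distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    log_k = -d2 / (2.0 * bandwidth**2) - np.log(2.0 * np.pi * bandwidth**2)
    # Log of the average kernel value, computed stably (log-mean-exp).
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()


def mixture_kde_logpdf(points, components, weights):
    """Log-density of a weighted mixture of KDEs (hypothetical (centers, bandwidth) format)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Shape (n_components, n_points): log w_k + log f_k(x).
    terms = np.stack([np.log(w) + kde_logpdf(points, c, h)
                      for w, (c, h) in zip(weights, components)])
    m = terms.max(axis=0)
    return m + np.log(np.exp(terms - m).sum(axis=0))


def avg_loglik(events_b, components_a, weights_a):
    """Average per-event log-likelihood of set B under a mixture-KDE built from set A."""
    return float(mixture_kde_logpdf(events_b, components_a, weights_a).mean())


# Usage with synthetic data: user A splits time between "home" and "office".
rng = np.random.default_rng(0)
home, office = np.array([0.0, 0.0]), np.array([5.0, 5.0])
events_a = np.vstack([home + rng.normal(0, 0.3, (40, 2)),
                      office + rng.normal(0, 0.3, (20, 2))])
population = rng.uniform(-20, 20, size=(500, 2))  # hypothetical background events
components_a = [(events_a, 0.5), (population, 2.0)]
weights_a = [0.8, 0.2]  # illustrative individual vs. population weights

same = np.vstack([home + rng.normal(0, 0.3, (10, 2)),
                  office + rng.normal(0, 0.3, (5, 2))])
other = rng.uniform(10, 20, size=(15, 2))
print(avg_loglik(same, components_a, weights_a) > avg_loglik(other, components_a, weights_a))
```

The population component keeps the density strictly positive away from an individual's own locations, so a set of events from elsewhere receives a finite (if low) log-likelihood instead of negative infinity.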
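The randomization-based coincidental match probability described in the abstract can be sketched as follows, under several illustrative assumptions that are not taken from the paper. The score function is the mean nearest-neighbor distance from each event in B to set A, reference sets are sampled from a hypothetical population-level mixture-KDE, and the probability is estimated as the fraction of simulated sets that score at least as close to A as the observed B. All names, coordinates, and parameters are hypothetical.

```python
import numpy as np


def sample_mixture_kde(components, weights, n, rng):
    """Draw n 2-D points from a mixture of isotropic Gaussian KDEs.

    `components` is a hypothetical list of (centers, bandwidth) pairs.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    comp_idx = rng.choice(len(components), size=n, p=weights)
    out = np.empty((n, 2))
    for k, (centers, h) in enumerate(components):
        mask = comp_idx == k
        m = int(mask.sum())
        if m == 0:
            continue
        # A KDE sample is a randomly chosen center plus Gaussian kernel noise.
        picks = centers[rng.integers(0, len(centers), size=m)]
        out[mask] = picks + rng.normal(0.0, h, size=(m, 2))
    return out


def mean_nn_distance(set_a, set_b):
    """Score: mean distance from each event in B to its nearest event in A (smaller = closer)."""
    d = np.sqrt(((set_b[:, None, :] - set_a[None, :, :]) ** 2).sum(axis=-1))
    return float(d.min(axis=1).mean())


def coincidental_match_probability(set_a, set_b, components, weights, n_sims=1000, seed=0):
    """Fraction of simulated sets (same size as B) scoring at least as close to A as B does."""
    rng = np.random.default_rng(seed)
    observed = mean_nn_distance(set_a, set_b)
    sims = np.array([
        mean_nn_distance(set_a, sample_mixture_kde(components, weights, len(set_b), rng))
        for _ in range(n_sims)
    ])
    return float(np.mean(sims <= observed))


# Usage with synthetic data.
rng = np.random.default_rng(1)
home, office = np.array([0.0, 0.0]), np.array([5.0, 5.0])
events_a = np.vstack([home + rng.normal(0, 0.3, (40, 2)),
                      office + rng.normal(0, 0.3, (20, 2))])
population = [(rng.uniform(-20, 20, size=(500, 2)), 1.0)]  # hypothetical reference KDE
same = np.vstack([home + rng.normal(0, 0.3, (10, 2)),
                  office + rng.normal(0, 0.3, (5, 2))])
other = np.array([-18.0, -18.0]) + rng.normal(0, 0.3, (15, 2))
print(coincidental_match_probability(events_a, same, population, [1.0], n_sims=500))
print(coincidental_match_probability(events_a, other, population, [1.0], n_sims=500))
```

A small value means that randomly generated event sets rarely come as close to A as B does, which supports a common source. With a finite number of simulations an estimate of zero only indicates a probability below roughly `1 / n_sims`.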

Related Resources

Statistical Methods for the Forensic Analysis of User-Event Data

A common question in forensic analysis is whether two observed data sets originate from the same source or from different sources. Statistical approaches to addressing this question have been widely…

Statistical Methods for the Forensic Analysis of Geolocated Event Data

A common question in forensic analysis is whether two observed data sets originated from the same source or from different sources. Statistical approaches to addressing this question have been widely…

Statistical models to predict exposure settings using two different iPhone camera apps

The StegoAppDB [Newman, J. (2019)] is a digital image database containing camera data from Android and iPhone mobile phones and developed for forensic purposes. Taken with a custom-designed camera app…

Statistical methods for digital image forensics: Algorithm mismatch for blind spatial steganalysis and score-based likelihood ratios for camera device identification

Forensic science currently faces a variety of challenges. Statistically suitable reference databases need to be developed and maintained. Subjective methods that can introduce bias need to be replaced by objective…