Spatial DNA: Measuring similarity of geolocation datasets with applications to forensics

Conference/Workshop:
American Statistical Association Joint Statistical Meetings
Published: 2019
Primary Author: Christopher Galbraith
Secondary Authors: Padhraic Smyth
Research Area: Digital

Datasets consisting of geolocated events provide rich spatial characterizations of human behavior. Individuals tend to be self-consistent over time while generating such events, repeatedly visiting the same locations, such as home, the office, or the gym. In this paper we develop an approach to quantify similarity between sets of spatial events, drawing inspiration from the forensic evaluation of DNA evidence. A randomization-based technique is applied in which locations are sampled from conditional distributions of spatial locations (constructed via mixtures of kernel density estimates with weights derived from discrete locations). Score functions based on the distance between groups of events are then computed and used to construct coincidental match probabilities. We illustrate the approach with a large geolocation dataset collected from Twitter users. Results are compared with an alternative similarity measure: the log-likelihood of one set of spatial events under a mixture-KDE built from another. Our experimental results indicate that the proposed method can accurately assess the similarity between sets of geolocations, with potential applications in forensic and cybersecurity settings.
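
The randomization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified setup: the score is the mean nearest-neighbor distance between two sets of 2D points, and simulated sets are drawn from a single Gaussian KDE over a reference population instead of the weighted mixture-KDE the abstract describes. All function names, the `bandwidth` value, and the `n_sim` count are hypothetical choices for illustration.

```python
import math
import random


def mean_nn_distance(a, b):
    """Score: average distance from each point in b to its nearest point in a."""
    return sum(min(math.dist(p, q) for q in a) for p in b) / len(b)


def sample_kde(points, n, bandwidth, rng):
    """Draw n points from a Gaussian KDE: pick a data point, then add Gaussian noise."""
    samples = []
    for _ in range(n):
        x, y = rng.choice(points)
        samples.append((rng.gauss(x, bandwidth), rng.gauss(y, bandwidth)))
    return samples


def coincidental_match_probability(a, b, population, bandwidth=0.01, n_sim=500, seed=0):
    """Estimate how often a randomly simulated set scores at least as close to a as b does.

    Assumption: the population KDE stands in for the conditional sampling
    distribution; the paper's mixture weights are not reproduced here.
    """
    rng = random.Random(seed)
    observed = mean_nn_distance(a, b)
    hits = 0
    for _ in range(n_sim):
        simulated = sample_kde(population, len(b), bandwidth, rng)
        if mean_nn_distance(a, simulated) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

Under these assumptions, a small value suggests that the observed closeness between the two sets is unlikely to arise by coincidence from the reference population, while a value near the middle of the range is consistent with an unrelated source.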

Related Resources

Forensic Analysis of Android Cloud SDKs

This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.
The Impact of Multi-Camera Smart Phones on Source Camera Identification

An investigator has a questioned image from an unknown source and wants to determine whether it came from a camera on a person of interest’s smartphone. This scenario is referred…
Likelihood ratios for changepoints in categorical event data with applications in digital forensics

We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have…
Producing Datasets: Capturing Images on Multi-Camera Smartphones for Source Camera Identification

This poster introduces the new CSAFE Multi-camera Smartphone Image Database and describes how the images were collected and reviewed.