Quantifying the association between discrete event time series with applications to digital forensics

We consider the problem of quantifying the degree of association between pairs of discrete event time series, with potential applications in forensic and cybersecurity settings. We focus in particular on the case where two associated event series exhibit temporal clustering such that the occurrence of one type of event at a particular time increases the likelihood that an event of the other type will also occur nearby in time. We pursue a non‐parametric approach to the problem and investigate various score functions to quantify association, including characteristics of marked point processes and summary statistics of interevent times. Two techniques are proposed for assessing the significance of the measured degree of association: a population‐based approach to calculating score‐based likelihood ratios when a sample from a relevant population is available, and a resampling approach to computing coincidental match probabilities when only a single pair of event series is available. The methods are applied to simulated data and to two real-world data sets consisting of logs of computer activity, and they achieve accurate results across all data sets.
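As a rough illustration of the resampling idea, the sketch below estimates a coincidental match probability for a single pair of event series. The score (mean gap from each event in one series to the nearest event in the other), the circular-shift resampling scheme, and all function and parameter names are assumptions made for illustration; the paper's own score functions and resampling procedure may differ.

```python
import bisect
import random


def mean_nearest_gap(a_times, b_times):
    """Mean absolute gap from each event in a_times to its nearest event in b_times.

    Smaller values suggest stronger temporal association (illustrative score only).
    """
    if not a_times or not b_times:
        raise ValueError("both event series must be non-empty")
    b_sorted = sorted(b_times)
    total = 0.0
    for t in a_times:
        i = bisect.bisect_left(b_sorted, t)
        candidates = []
        if i < len(b_sorted):
            candidates.append(b_sorted[i] - t)      # nearest event at or after t
        if i > 0:
            candidates.append(t - b_sorted[i - 1])  # nearest event before t
        total += min(candidates)
    return total / len(a_times)


def coincidental_match_probability(a_times, b_times, period, n_resamples=2000, seed=0):
    """Estimate how often a randomly shifted copy of b_times scores at least as
    well (gap no larger) as the observed pair.

    Assumption: event times lie in [0, period) and a circular shift of b_times
    is an acceptable stand-in for an unrelated series.
    """
    rng = random.Random(seed)
    observed = mean_nearest_gap(a_times, b_times)
    hits = 0
    for _ in range(n_resamples):
        shift = rng.uniform(0.0, period)
        shifted = [(t + shift) % period for t in b_times]
        if mean_nearest_gap(a_times, shifted) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

For two series whose events nearly coincide, for example a series and a copy of it offset by one time unit, the estimate should be small, since few random shifts reproduce such close alignment.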

Score-based Likelihood Ratios for Camera Device Identification

Many areas of forensics are moving away from the notion of classifying evidence simply as a match or non-match. Instead, some use score-based likelihood ratios (SLR) to quantify the similarity between two pieces of evidence, such as a fingerprint obtained from a crime scene and a fingerprint obtained from a suspect. We apply trace-anchored score-based likelihood ratios to the camera device identification problem. We use photo-response non-uniformity (PRNU) as a camera fingerprint and one minus the normalized correlation as a similarity score. We calculate trace-anchored SLRs for 10,000 images from seven camera devices from the BOSSbase image dataset. We include a comparison between our results and the universal detector method.
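The similarity score described above, one minus the normalized correlation, can be sketched as follows. This is a minimal illustration that assumes the two PRNU estimates have already been flattened into equal-length sequences and that "normalized correlation" means the mean-centered form; the function name is hypothetical and PRNU extraction itself is not shown.

```python
import math


def one_minus_normalized_correlation(x, y):
    """Return 1 - NCC(x, y) for two equal-length numeric sequences.

    Assumes the mean-centered normalized correlation:
    0 means perfectly correlated, 2 means perfectly anti-correlated.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("normalized correlation is undefined for constant input")
    return 1.0 - numerator / denominator
```

Under this convention a smaller score indicates greater similarity between the two fingerprints.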

Algorithm mismatch in spatial steganalysis

The number and availability of steganographic embedding algorithms continue to grow. Many traditional blind steganalysis frameworks require training examples from every embedding algorithm, but collecting, storing and processing representative examples of each algorithm can quickly become untenable. Our motivation for this paper is to create a straightforward framework for blind steganalysis that is not data-intensive and requires only examples of cover images and a single embedding algorithm for training. Our blind steganalysis framework addresses the case of algorithm mismatch, where a classifier is trained on one algorithm and tested on another, with four spatial embedding algorithms: LSB matching, MiPOD, S-UNIWARD and WOW.

We use RAW image data from the BOSSbase database and data collected from six iPhone devices. Ensemble Classifiers with Spatial Rich Model features are trained on a single embedding algorithm and tested on each of the four algorithms. Classifiers trained on MiPOD, S-UNIWARD and WOW data achieve decent error rates when tested on all four algorithms. Most notably, an Ensemble Classifier with an adjusted decision threshold trained on LSB matching data achieves decent detection results on MiPOD, S-UNIWARD and WOW data.

Score-based likelihood ratios in device identification

Many areas of forensics are moving away from the notion of classifying evidence simply as a match or non-match. Instead, some use score-based likelihood ratios (SLR) to quantify the similarity between two pieces of evidence, such as a fingerprint obtained from a crime scene and a fingerprint obtained from a suspect. We apply trace-anchored score-based likelihood ratios to the camera device identification problem. We use photo-response non-uniformity (PRNU) as a camera fingerprint and one minus the normalized correlation as a similarity score. We calculate trace-anchored SLRs for 10,000 images from seven camera devices from the BOSSbase image dataset. We include a comparison between our results and the universal detector method.

Automatic Classification of Bloodstain Patterns Caused by Gunshot and Blunt Impact at Various Distances

The forensics discipline of bloodstain pattern analysis plays an important role in crime scene analysis and reconstruction. One reconstruction question is whether the blood has been spattered via gunshot or blunt impact such as beating or stabbing. This paper proposes an automated framework to classify bloodstain spatter patterns generated under controlled conditions into either gunshot or blunt impact classes. Classification is performed using machine learning. The study uses 94 blood spatter patterns that are available as public data sets, designs a set of features with possible relevance to classification, and applies the random forests method to rank the most useful features and perform classification. The study shows that classification accuracy decreases as the distance between the blood source and the target surface collecting the stains increases. Based on the data set used in this study, the model achieves 99% accuracy in classifying spatter patterns at distances of 30 cm, 93% accuracy at distances of 60 cm, and 86% accuracy at distances of 120 cm. Results with 10 additional backspatter patterns also show that the presence of muzzle gases can reduce classification accuracy.

Letter to the Editor: Automatic Classification of Bloodstain Patterns

The forensics discipline of bloodstain pattern analysis plays an important role in crime scene analysis and reconstruction. One reconstruction question is whether the blood has been spattered via gunshot or blunt impact such as beating or stabbing. This paper proposes an automated framework to classify bloodstain spatter patterns generated under controlled conditions into either gunshot or blunt impact classes. Classification is performed using machine learning. The study uses 94 blood spatter patterns that are available as public data sets, designs a set of features with possible relevance to classification, and applies the random forests method to rank the most useful features and perform classification. The study shows that classification accuracy decreases as the distance between the blood source and the target surface collecting the stains increases. Based on the data set used in this study, the model achieves 99% accuracy in classifying spatter patterns at distances of 30 cm, 93% accuracy at distances of 60 cm, and 86% accuracy at distances of 120 cm. Results with 10 additional backspatter patterns also show that the presence of muzzle gases can reduce classification accuracy.

Insights: A Robust Approach to Automatically Locating Grooves in 3D Bullet Land Scans

OVERVIEW

Land engraved areas (LEAs) can be important distinguishing factors when analyzing 3D scans from bullets. Creating a 3D image of an LEA requires examiners to also scan portions of the neighboring groove engraved areas (GEAs). Current modeling techniques often struggle to separate LEAs from GEAs. CSAFE researchers developed a new method to automatically remove GEA data and tested this method’s performance against previously proposed techniques.

Lead Researchers

Kiegan Rice
Ulrike Genschel
Heike Hofmann

Journal

Journal of Forensic Sciences

Publication Date

13 December 2019

Publication Number

IN 103 FT

GOAL

Present and discuss automated methods for identifying “shoulder locations” between LEAs and GEAs.

The Study

Rice et al. gathered 3D scans of 104 bullets from two available data sets (Hamby 44 and Houston), resulting in a total of 622 LEA scans.
They removed the curvature from these 3D scans to make 2D crosscuts of each LEA.
Using the 2D crosscuts, the team estimated the shoulder locations between LEAs and GEAs using three different models:

Rollapply:

A function (in this case, one available through the open-source “bulletxtrctr” package) which applies a rolling average to smooth out outliers in data.

Robust Linear Model:

A quadratic linear model that minimizes absolute deviations and is therefore less influenced by outliers.

Robust Locally Weighted Regression (LOESS):

A weighted average of many parametric models to fit subsets of data.
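The rolling-average idea behind the first of the three approaches above can be sketched as follows. This is a minimal, generic illustration in Python; it is not the implementation used in the “bulletxtrctr” package, and the function name, the centered window, and the handling of edge positions are assumptions.

```python
def rolling_mean(values, window):
    """Centered rolling mean over an odd-sized window.

    Positions that lack a full window (the first and last window // 2 entries)
    are returned as None rather than being padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        smoothed[i] = sum(chunk) / window
    return smoothed


# Hypothetical crosscut heights with one spike in the middle.
heights = [1, 2, 3, 10, 3, 2, 1]
print(rolling_mean(heights, 3))
```

A wider window damps isolated spikes more strongly but also blurs sharp transitions in the profile, which is the trade-off any such smoother has to balance.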

Results

Figure: Areas of misidentification for the Hamby set 44 and the Houston test set.

In this graphic, an area of misidentification less than 100 is considered a small deviation, between 100 and 1000 is medium, and greater than 1000 is a large deviation.
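The thresholds in this description can be written as a small helper. The function name is hypothetical, and treating values of exactly 100 and exactly 1000 as medium is an assumption, since the text does not say which category the boundary values fall into.

```python
def deviation_category(area):
    """Map an area of misidentification to the small/medium/large labels.

    Assumption: the boundary values 100 and 1000 are counted as medium.
    """
    if area < 0:
        raise ValueError("area of misidentification cannot be negative")
    if area < 100:
        return "small"
    if area <= 1000:
        return "medium"
    return "large"
```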

The Robust LOESS model significantly outperformed the Rollapply and Robust Linear models, resulting primarily in small deviations across all test sets.
Conversely, the Robust Linear model had the weakest performance of all three, with mostly large deviations across both Houston sets, and only outperforming the Rollapply model in the right shoulder section of the Hamby 44 set.
These results were expected, as the Robust LOESS model is intended to be flexible and handle areas that a quadratic linear model would fail to address.

FOCUS ON THE FUTURE

Both the Hamby 44 and Houston datasets used firearms from the same manufacturer. Future studies can expand on these findings by using a wider variety of barrel types, including different caliber sizes, manufacturers and nontraditional rifling techniques.

Author: Molly Doolittle
