NIST Seeks Public Comment on Draft Report of Digital Forensic Methods

The National Institute of Standards and Technology (NIST) has published Digital Investigation Techniques: A NIST Scientific Foundation Review. The draft report will be open for public comments through July 11, 2022.

The report reviews the methods that digital forensic experts use to analyze evidence from computers, mobile phones and other electronic devices.

According to a news release from NIST, the authors of the report examined peer-reviewed literature, documentation from software developers, test results on forensic tools, standards and best practices documents and other sources of information.

The release also notes that the report discusses several challenges that digital forensic experts face, including the rapid pace of technological change. The report recommends better methods for sharing information among experts and a more structured approach to testing forensic tools.

NIST will host a webinar to discuss the draft report and its findings on June 1 from 1–3 p.m. EDT. For more information about the webinar and to register, visit www.nist.gov/news-events/events/2022/06/webinar-digital-investigation-techniques-nist-scientific-foundation.

Read the full news release on the report at www.nist.gov/news-events/news/2022/05/nist-publishes-review-digital-forensic-methods.

The Center for Statistics and Applications in Forensic Evidence (CSAFE), a NIST Center of Excellence, conducts research addressing the need for forensic tools and methods for digital evidence. Learn more about this research at forensicstats.org/digital-evidence.

Insights: Statistical Methods for the Forensic Analysis of Geolocated Event Data

OVERVIEW

Researchers investigated the application of statistical methods to forensic questions involving spatial event-based digital data. A motivating example is assessing whether two sets of GPS locations corresponding to digital events were generated by the same source. The team developed two approaches to quantify the strength of evidence on this question.

Lead Researchers

Christopher Galbraith
Padhraic Smyth
Hal S. Stern

Journal

Forensic Science International: Digital Investigation

Publication Date

July 2020

The Goal

Develop quantitative techniques for the forensic analysis of geolocated event data.

APPROACH AND METHODOLOGY

Researchers collected geolocation data from Twitter messages in two regions, Orange County, CA, and the borough of Manhattan in New York City, from May 2015 to February 2016. Using only tweets from public accounts, they gathered GPS data on the frequency of geolocated events in each area.

Key Definitions

Likelihood Ratio (LR)

A comparison of the probability of observing a set of evidence measures under two different theories in order to assess relative support for the theories.

Score-Based Likelihood Ratio (SLR)

An approach that summarizes evidence measures by a score function before applying the likelihood ratio approach.
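For reference, both quantities can be written in conventional forensic-statistics notation. The symbols are our own shorthand, not taken from the paper: E is the evidence (the two sets of locations A and B), H_s and H_d are the same-source and different-source hypotheses, Δ(A, B) is a score function comparing the two sets, and g is the density of that score under each hypothesis.

```latex
\mathrm{LR} = \frac{p(E \mid H_s)}{p(E \mid H_d)},
\qquad
\mathrm{SLR} = \frac{g\big(\Delta(A, B) \mid H_s\big)}{g\big(\Delta(A, B) \mid H_d\big)}
```

Values greater than 1 support the same-source hypothesis; values less than 1 support the different-source hypothesis.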

This study considered a scenario in which two sets of tweet locations are available and the question is whether both sets were generated by the same source. The two sets could come from different devices or from the same device during two different time periods.

The first approach used kernel density estimation to construct a likelihood ratio for the observed tweet locations under two competing hypotheses: the two sets of tweets came from the same source, or they came from different sources.
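A minimal sketch of a kernel-density likelihood ratio is shown below. It is not the authors' model. It assumes locations have already been projected to planar (x, y) coordinates, uses a fixed isotropic Gaussian bandwidth, treats locations as independent, and models the same-source hypothesis with a hypothetical mixture (the `weight` parameter) of a density estimated from the first set and a population density estimated from a reference sample of locations.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Isotropic 2-D Gaussian kernel density estimate built from (x, y) points."""
    n = len(points)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * n)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points
        )

    return density


def log_likelihood_ratio(set_a, set_b, population, bandwidth=1.0, weight=0.8):
    """Log LR for the locations in set_b (illustrative assumptions only).

    H_s: set_b follows a density estimated from set_a, mixed with the
         population density (weight is a hypothetical tuning parameter).
    H_d: set_b follows the population density alone.
    Locations are treated as independent given the density.
    """
    f_a = gaussian_kde_2d(set_a, bandwidth)
    f_pop = gaussian_kde_2d(population, bandwidth)
    total = 0.0
    for x, y in set_b:
        p_same = weight * f_a(x, y) + (1.0 - weight) * f_pop(x, y)
        p_diff = f_pop(x, y)
        total += math.log(p_same) - math.log(p_diff)
    return total
```

A positive log LR favors the same-source hypothesis and a negative one favors different sources. Mixing in the population density keeps the same-source density from collapsing to zero when a location in the second set falls far from every location in the first.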

The second approach, a score-based likelihood ratio, first summarizes the similarity of the two sets of locations with a score and then assesses the strength of the evidence based on that score.
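The score-based variant can be sketched in the same spirit. Here the score function (mean nearest-neighbor distance between the two sets) and the bandwidth are hypothetical choices, not necessarily those used in the study. The reference scores would in practice come from pairs with known same-source and different-source status.

```python
import math


def mean_nn_distance(set_a, set_b):
    """Hypothetical score: average distance from each point in set_b to its nearest point in set_a."""
    return sum(min(math.dist(b, a) for a in set_a) for b in set_b) / len(set_b)


def kde_1d(samples, bandwidth):
    """1-D Gaussian kernel density estimate over a list of reference scores."""
    c = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda s: c * sum(math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for x in samples)


def score_based_lr(score, same_source_scores, diff_source_scores, bandwidth=0.5):
    """SLR: density of the observed score among same-source reference pairs,
    divided by its density among different-source reference pairs."""
    g_same = kde_1d(same_source_scores, bandwidth)
    g_diff = kde_1d(diff_source_scores, bandwidth)
    return g_same(score) / g_diff(score)
```

Because the score reduces each pair of location sets to a single number, only a one-dimensional density has to be estimated under each hypothesis, which is the main practical appeal of the score-based approach.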

Decisions based on the LR and SLR approaches were compared against known ground truth to determine true-positive and false-positive rates.
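Turning likelihood ratios into decisions and error rates is simple arithmetic. The sketch below assumes a hypothetical decision rule that declares "same source" whenever the log LR exceeds a threshold (0 by default, which corresponds to LR > 1), applied to pairs with known ground truth.

```python
def error_rates(results, threshold=0.0):
    """Compute (true-positive rate, false-positive rate).

    results: iterable of (log_lr, is_same_source) pairs with known ground truth.
    A pair is declared same-source when its log LR exceeds the threshold.
    """
    tp = fp = positives = negatives = 0
    for log_lr, is_same in results:
        decide_same = log_lr > threshold
        if is_same:
            positives += 1
            tp += decide_same  # bool counts as 0 or 1
        else:
            negatives += 1
            fp += decide_same
    return tp / positives, fp / negatives
```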

KEY TAKEAWAYS FOR PRACTITIONERS

1

Both methods show promise for distinguishing same-source pairs of spatial event data from different-source pairs.

2

The LR approach outperformed the SLR approach for all dataset sizes considered.

3

The behavior of both approaches can be affected by the characteristics of the region observed and the amount of evidential data available.

FOCUS ON THE FUTURE

In this study, sets of locations gathered from Twitter were defined by time period. Other ways of defining sets of locations, for example using multiple devices over the same time period, could yield different results.

The amount of available data (the number of tweets) affects the performance of the score-based approach.

Insights: Quantifying the Association Between Discrete Event Time Series with Applications to Digital Forensics

OVERVIEW

Digital devices offer examiners a new opportunity: every user event, such as opening software, browsing online, or sending an email, is logged, creating an event time series. However, correlating this kind of user-generated event data between two devices can be difficult for examiners. The research team set out to quantify the degree of association between two event time series, both with and without population data.

Lead Researchers

Christopher Galbraith
Padhraic Smyth
Hal S. Stern

Journal

Journal of the Royal Statistical Society

Publication Date

January 2020

The Goals

1

Investigate suitable measures to quantify the association between two event series on digital devices.

2

Assess how likely it is that the series were generated by the same source rather than by different sources, and thereby assess the degree of association between the two event series.

APPROACH AND METHODOLOGY

Researchers explored a variety of measures for quantifying the association between two discrete event time series, using multiple score functions to measure the similarity between the series. These score functions were able to discriminate between same-source and different-source pairs of event series.
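As an illustration of what a score function for two discrete event time series might look like, the sketch below computes the mean time from each event in one series to the nearest event in the other. This particular score is an assumption chosen for simplicity, not necessarily one of the measures used in the study.

```python
import bisect


def mean_min_time_diff(a_times, b_times):
    """Hypothetical score for two event time series: the average, over events in b,
    of the time to the nearest event in a. Smaller values mean closer association.
    The score is not symmetric in a and b."""
    a = sorted(a_times)
    total = 0.0
    for t in b_times:
        i = bisect.bisect_left(a, t)
        # The nearest event in a is either just before or at/just after t.
        neighbors = [abs(a[j] - t) for j in (i - 1, i) if 0 <= j < len(a)]
        total += min(neighbors)
    return total / len(b_times)
```

Sorting the first series once and using binary search keeps each lookup logarithmic, so the score stays cheap even for long event logs.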

The following methods for assessing the strength of association for a given pair of event series proved most accurate:

1

Constructing score-based likelihood ratios (SLRs) that assess the relative likelihood of observing a given degree of association when the series came from the same source versus different sources. This is a population-based approach: it relies on a sample of event series from a relevant population.

2

Calculating coincidental match probabilities (CMPs) when working with a single pair of event series. A different-source score distribution is simulated through what the research team calls sessionized resampling, and the observed score is compared against that distribution. This method can still produce accurate results when a sample from a relevant population is not available.
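A simplified sketch of the CMP idea follows. It assumes that sessionized resampling can be approximated by splitting the second series into sessions (runs of events separated by gaps no longer than a hypothetical `max_gap`), moving each session to a random start within the observation window while keeping its internal spacing, and rescoring. The CMP is then estimated as the fraction of resampled pairs that score at least as similar as the observed pair. The authors' resampling scheme may differ in its details, and the score function here is also a hypothetical choice.

```python
import bisect
import random


def mean_min_time_diff(a_times, b_times):
    """Mean time from each event in b to the nearest event in a (smaller = more associated)."""
    a = sorted(a_times)
    total = 0.0
    for t in b_times:
        i = bisect.bisect_left(a, t)
        total += min(abs(a[j] - t) for j in (i - 1, i) if 0 <= j < len(a))
    return total / len(b_times)


def sessionize(times, max_gap):
    """Split event times into sessions wherever consecutive events are more than max_gap apart."""
    times = sorted(times)
    sessions = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > max_gap:
            sessions.append([])
        sessions[-1].append(cur)
    return sessions


def resample_sessions(sessions, window, rng):
    """Move each session to a random start in [0, window - duration], keeping its internal spacing."""
    resampled = []
    for s in sessions:
        duration = s[-1] - s[0]
        start = rng.uniform(0.0, max(window - duration, 0.0))
        resampled.extend(start + (t - s[0]) for t in s)
    return resampled


def coincidental_match_probability(a_times, b_times, window,
                                   max_gap=5.0, n_resamples=1000, seed=0):
    """Estimate the CMP as the fraction of resampled (different-source-like) pairs
    whose score is at least as small, i.e. as similar, as the observed score."""
    rng = random.Random(seed)
    observed = mean_min_time_diff(a_times, b_times)
    sessions = sessionize(b_times, max_gap)
    hits = sum(
        mean_min_time_diff(a_times, resample_sessions(sessions, window, rng)) <= observed
        for _ in range(n_resamples)
    )
    return hits / n_resamples
```

Resampling whole sessions rather than individual events preserves the bursty, within-session structure of user activity while breaking any alignment with the other series, which is the intuition behind simulating a different-source distribution from a single pair.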

KEY TAKEAWAYS FOR PRACTITIONERS

1

The population-based approach of SLRs remains the preferred technique in terms of accuracy and interpretability.

2

The resampling technique using CMPs shows significant potential for quantifying the association between a pair of event time series. It can help examiners assess whether two event series were created by the same person, especially when no population data is available.

3

When multiple event series are available, combining these techniques could be valuable for pattern mining, that is, for determining which series are associated with one another.

4

Developments in this area could benefit work in both forensic and cybersecurity settings.
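The pattern-mining idea in takeaway 3 could start from a simple pairwise screen: score every pair of series and rank the pairs by association. The sketch below does this with a symmetrized mean nearest-event time. The score, the device names, and the ranking step are hypothetical illustrations, not a method from the study.

```python
import bisect
from itertools import combinations


def mean_min_time_diff(a_times, b_times):
    """Mean time from each event in b to the nearest event in a (smaller = more associated)."""
    a = sorted(a_times)
    total = 0.0
    for t in b_times:
        i = bisect.bisect_left(a, t)
        total += min(abs(a[j] - t) for j in (i - 1, i) if 0 <= j < len(a))
    return total / len(b_times)


def rank_series_pairs(series):
    """Score every pair of named event series and return (score, name_a, name_b)
    tuples sorted from most to least associated (lowest score first)."""
    ranked = []
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        # Average both directions so the pair score does not depend on order.
        score = 0.5 * (mean_min_time_diff(a, b) + mean_min_time_diff(b, a))
        ranked.append((score, name_a, name_b))
    return sorted(ranked)
```

For example, with hypothetical event times `{"laptop": [0, 10, 20, 30], "phone": [1, 11, 21, 31], "tablet": [50, 65, 80]}`, the laptop and phone pair ranks first. In practice the top-ranked pairs would then be examined with a calibrated measure such as an SLR or CMP.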

Next Steps

Both SLR and CMP techniques require more extensive study and testing before being used in practice by forensic examiners.

All of the techniques described are implemented in the open-source R package assocr.