Digital Evidence

Overarching Goals

From the outset, CSAFE and NIST set out to address the growing demand for forensic tools and methods designed specifically for digital evidence, all built on strong probabilistic and statistical foundations.

Digital devices play a central role in modern society, and criminal investigations that require examination of digital devices and digital artifacts are common.

Examples of digital-evidence-based activities of forensic experts include detecting the use of steganographic algorithms by criminals to hide illicit text or images in otherwise innocent pictures or videos, recovering illicit data or software hidden on phones and tablets, and examining digital traces of user activity (e.g., messages/texts, login events) for signs of criminal activity.

Jennifer Newman

Associate Professor

Iowa State University

Padhraic Smyth

Chancellor's Professor

University of California, Irvine

Yong Guan

Professor

Iowa State University

Additional Team Members

Roy Maxion
maxion@cs.cmu.edu

Hal S. Stern
sternh@uci.edu

Li Lin
llin@iastate.edu

Barbara Guttman (NIST)
barbara.guttman@nist.gov

Focus Areas

The CSAFE team at ISU has developed and publicly released a reference dataset, StegoAppDB, comprising more than 960,000 highly provenanced images. Because the algorithms used by steganography (stego) software applications are not public, the images were created using our own approach of modeling those applications. The data is used to develop and test algorithms and to create new models of the camera pipeline. The next generation of algorithms for detecting hidden images will rely on deep learning with millions of free parameters (see, e.g., Boroumand et al., 2019). To allow investigators to work with deep learning methods, we are growing the database substantially by automating image collection and stego creation, for example by using a drone to collect large amounts of image data.

Steganography is a technique for sending a secret message hidden in a common medium, such as a digital image, so that the very existence of the payload escapes notice. The effort required to hide a payload and the effort required to detect one are highly asymmetric: a developer can easily create an app that hides a payload, while steganalysis, the detection of hidden content, requires a forensic image analyst with expert knowledge and often extensive resources. Hundreds of thousands of stego apps have been downloaded for mobile devices, yet until recently steganalysis has been studied mainly in academic settings and not specifically for mobile apps.
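To make the asymmetry concrete, the following is a minimal sketch of least-significant-bit (LSB) embedding, one of the simplest textbook steganography schemes. It is an illustration only and is not the algorithm of any app in StegoAppDB; the function names, the 4-byte length header, and the use of a plain byte string as a stand-in for raw pixel values are all assumptions made for the example.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytes:
    """Hide payload in the least significant bit of successive cover bytes."""
    # Assumed framing: a 4-byte big-endian length header, then the payload.
    data = len(payload).to_bytes(4, "big") + payload
    # Flatten to individual bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for this payload")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract_lsb(stego: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""
    def read_bytes(start_bit: int, n_bytes: int) -> bytes:
        out = bytearray()
        for b in range(n_bytes):
            value = 0
            for k in range(8):
                value = (value << 1) | (stego[start_bit + 8 * b + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


# Stand-in for the raw 8-bit pixel values of a cover image.
cover = bytes(range(256)) * 2
stego = embed_lsb(cover, b"meet at noon")
recovered = extract_lsb(stego)
# No pixel value moves by more than one intensity level.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Hiding the message takes a few lines and changes each affected pixel value by at most one level. Deciding whether an arbitrary image carries such a payload, without knowing the scheme or framing used, is the much harder problem that steganalysis addresses.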

In CSAFE 1.0, our project focused on the careful construction of the first forensic reference database for stego images created from mobile apps, producing StegoAppDB. StegoAppDB has more than 960,000 innocent and stego images, is fully provenanced and copyright-free, and is publicly available online. However, StegoAppDB contains only indoor scenes, a limited number of distinct scenes, and a limited number of auto-exposure images. For CSAFE 2.0, we propose to add image data selectively to make StegoAppDB more attractive for wider use in the forensic imaging community, in the following manner:

  1. Add several million outdoor images so that the most successful machine learning techniques, deep learning neural networks, have access to ample data.
  2. Add images that have been photo-edited before embedding payload, which we suspect is a common occurrence. In fact, it is speculated that image editing operations may affect steganalysis error rates, but little to nothing is known about this, and certainly nothing for mobile stego images.
  3. Include stego apps beyond those currently in StegoAppDB and create an in-house detection tool for verification of our stego process, since no software package exists specifically to detect stego images created with mobile stego apps. We will use emulators and reverse engineering for the stego app analysis to retain accurate parameters for data provenance.

Our goal is to acquire outdoor scene data in a resource-efficient manner using a car, a bike, and/or a robot, with our in-house camera app Cameraw, taking many photos in an automated fashion, using simple and inexpensive mounts.

To establish support in and partnership with the practitioner community, we propose to conduct a brief survey at the beginning of the project. Its goal is to obtain information on the status quo of steganography in the digital evidence community and to establish connections with labs to identify their needs related to steganography. During meetings at SWGDE in 2019, the PI met Glenn Cavin, Lieutenant, Plano Police Department, Plano, Texas, who has provided a letter of support expressing his interest in this project. The detection tool could also be of interest to forensic practitioners.

The expanded database, built from combinations of these three image generation methods, will give users access to more realistically simulated evidence for mobile steganography and provide a source of data for developing mobile steganalysis tools. The in-house detection tool might provide insight for such a tool in the long term.

The goal of this project is to develop new statistical methodologies to support digital forensics in the analysis of user-generated event data, focusing on native or cloud-based logged event data from devices such as computers and mobile phones, typically of the form <accountID, timestamp, geolocation, metadata>. The forensic questions to be addressed will focus on identity, such as whether two sets of geolocation events were generated by the same individual or by two different individuals.

The rapid increase in the use of digital devices in the past decade has resulted in significant new challenges for digital forensics. In this context, the goal of this project is to develop new statistical methodologies to support digital forensics in analysis of user-generated event data. The project will focus on native or cloud-based logged event data from devices such as computers and mobile phones, typically of the form <accountID, timestamp, geolocation, metadata>. The types of forensic questions to be addressed will focus on identity-related questions, such as whether two sets of geolocation events were generated by the same individual or by two different individuals. Traditional approaches in statistical forensics, such as likelihood ratios and score-based likelihood ratios, will be extended to handle user-generated event data, particularly data with spatial and spatio-temporal information, building on our prior work in Phase 1 of CSAFE. The extensions will require development of robust and accurate new probabilistic models and fast new algorithms for large data sets. The project will use a variety of different real-world user event datasets as testbeds for development and evaluation. Deliverables from the project will include research papers, tutorial webinars, public testbed datasets and open-source software.
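As a rough illustration of what a score-based likelihood ratio (SLR) for geolocation events can look like, the sketch below computes a similarity score between two sets of events and compares how typical that score is among same-source and different-source reference pairs. Everything specific here is an assumption made for the example and is not the project's method: the `Event` fields beyond those named above, the nearest-neighbor distance score, the fixed-bandwidth Gaussian kernel density estimate, and the made-up reference scores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(a: Event, b: Event) -> float:
    """Great-circle distance between two events in kilometers."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def score(set_a, set_b) -> float:
    """Symmetrized mean nearest-neighbor distance (km); smaller means more similar."""
    def one_way(xs, ys):
        return sum(min(haversine_km(x, y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(set_a, set_b) + one_way(set_b, set_a))


def kde(x: float, samples, bandwidth: float) -> float:
    """Gaussian kernel density estimate of the reference scores, evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def slr(observed: float, same_scores, diff_scores, bandwidth: float = 1.0) -> float:
    """Density of the score under same-source pairs over different-source pairs."""
    return kde(observed, same_scores, bandwidth) / kde(observed, diff_scores, bandwidth)


# Hypothetical reference scores from pairs with known ground truth.
same_scores = [0.2, 0.5, 0.8, 1.1, 0.4]
diff_scores = [6.0, 9.5, 12.0, 7.5, 15.0]

set_a = [Event("acct1", 0.0, 33.6405, -117.8443), Event("acct1", 3600.0, 33.6846, -117.8265)]
set_b = [Event("acct2", 60.0, 33.6410, -117.8450), Event("acct2", 7200.0, 33.6850, -117.8260)]

observed = score(set_a, set_b)
ratio = slr(observed, same_scores, diff_scores)
```

A ratio above 1 means the observed score is more typical of same-source pairs in the reference data, and a ratio below 1 means the opposite. In practice the choice of score, the density estimator, and above all the reference population determine how meaningful the ratio is, and a score very far from all reference values can make the kernel estimate underflow to zero.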

With the EviHunter tools, we are constructing a prototype Android App Evidence Database (AED). In Phase II of the project, we plan to extend EviHunter to analyze complex features of Android apps by addressing current challenges. With the EviHunter technologies and the AED, we expect forensic investigators to improve the efficiency, precision, and completeness of their casework.

The plethora of mobile apps has introduced significant challenges for digital forensics practitioners because of the large number and diversity of apps: for example, over 2.6 million apps are on Google Play, and many more are available from 53 other app stores across the world. The average smartphone user has more than 80 apps on their phone and uses close to 40 of them each month. The lack of expert knowledge about this large and varied collection of apps creates problems for practitioners, including an inability to keep up with the rapid updates of apps on the app markets, difficulty discovering critical and relevant evidence, and large casework backlogs in crime labs.

This project ultimately aims to develop EviHunter, a new path-, context-, and flow-sensitive Android application program analysis tool set that reports the forensically relevant evidential data generated by the Android apps on a mobile device by analyzing each app's installation package using both static and dynamic approaches. During the first five years of the CSAFE project, we developed automated Android app program analysis tools to discover most (if not all) evidential data generated by the apps on a suspect’s mobile device or a remote cloud server. We have developed an award-winning EviHunter toolset (published at ACM CCS 2018 and IEEE SADFE 2018), and with the EviHunter tools we are constructing a prototype Android App Evidence Database. In the next three years of the project (CSAFE Phase II), we plan to extend EviHunter to analyze complex features of Android apps by addressing the following challenges: (1) software obfuscation, (2) the limited capability of Soot in generating call graphs, (3) packing/unpacking of apps, (4) native code and third-party libraries, (5) adapting the dynamic Monkey-based tool to better support dynamic EviHunter, and (6) building a better and more complete list of source/sink methods. We will also review the design of the prototype EviHunter tools and optimize the implementation to reduce memory usage and computational cost. With deeper knowledge of evidence types and their frequency of use, we plan to apply statistical methods to focus on the more critical program paths inside apps and speed up the analysis.
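The role of source and sink methods can be illustrated with a toy forward taint propagation over straight-line code. This is only a sketch of the general idea and not EviHunter's analysis: the statement format, the short source and sink lists, the evidence-type labels, and the file paths are all hypothetical, and a real tool must also handle branches, calls across methods, obfuscation, and native code.

```python
# Hypothetical source methods mapped to the evidence type they introduce.
SOURCES = {
    "LocationManager.getLastKnownLocation": "location",
    "TelephonyManager.getDeviceId": "device_id",
}
# Hypothetical sink methods that persist data to local storage.
SINKS = {"FileOutputStream.write", "SharedPreferences.Editor.putString"}


def find_evidence(statements):
    """Return {file path: evidence types} for tainted data that reaches a sink."""
    taint = {}   # variable name -> set of evidence types it may carry
    report = {}  # file path -> set of evidence types written there
    for st in statements:
        # Collect the taint carried by every variable this statement reads.
        incoming = set()
        for var in st.get("uses", []):
            incoming |= taint.get(var, set())
        call = st.get("call")
        if call in SOURCES:
            incoming.add(SOURCES[call])
        if call in SINKS and incoming:
            report.setdefault(st["path"], set()).update(incoming)
        if "def" in st:
            # Later statements see the most recent definition of the variable.
            taint[st["def"]] = incoming
    return report


# Toy app: location flows into a log file, while a plain timestamp does not count.
app = [
    {"def": "loc", "call": "LocationManager.getLastKnownLocation"},
    {"def": "msg", "uses": ["loc"], "call": "String.format"},
    {"def": "ts", "call": "System.currentTimeMillis"},
    {"uses": ["msg"], "call": "FileOutputStream.write", "path": "files/track.log"},
    {"uses": ["ts"], "call": "FileOutputStream.write", "path": "files/clock.log"},
]
report = find_evidence(app)
```

The output pairs each file path with the kinds of evidence that may be stored there, which is the sort of mapping an evidence database could record per app. The quality of the result depends directly on how complete the source and sink lists are, which is why challenge (6) above matters.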

We also plan to continue to collect and construct (likely the largest) Android App Evidence Database. We will work with Iowa State University Legal Counsel to examine the potential legal issues and make the database accessible to the practitioners.

With the EviHunter technologies and the AED, we expect forensic investigators to improve the efficiency, precision, and completeness of their casework compared with current mobile device forensic practice, which relies on a partly manual process using Cellebrite and other mobile forensic tools. In the ideal case, we hope to reduce the time spent on a mobile device in a case from one or two days to 10-15 minutes, which would make practitioners’ work significantly easier.

Knowledge transfer

Statistical methods for digital image forensics: Algorithm mismatch for blind spatial steganalysis and score-based likelihood ratios for camera device identification

Published: 2020 | By: Stephanie Reinders

Forensic science currently faces a variety of challenges. Statistically suitable reference databases need to be developed and maintained. Subjective methods that can introduce bias need to be replaced by objective methods. Highly technical forensic methods need to be clearly and…

View on Digital Repository


Statistical Methods for the Forensic Analysis of User-Event Data

Published: 2020 | By: Chris Galbraith

A common question in forensic analysis is whether two observed data sets originate from the same source or from different sources. Statistical approaches to addressing this question have been widely adopted within the forensics community, particularly for DNA evidence, providing…

View on Digital Repository


Hunting wild stego images, a domain adaptation problem in digital image forensics

Published: 2020 | By: Li Lin

Digital image forensics is a field encompassing camera identification, forgery detection and steganalysis. Statistical modeling and machine learning have been successfully applied in the academic community of this maturing field. Still, large gaps exist between academic results and applications used…

View on Digital Repository


Statistical Methods for the Forensic Analysis of Geolocated Event Data

Published: 2020 | By: Christopher Galbraith

A common question in forensic analysis is whether two observed data sets originated from the same source or from different sources. Statistical approaches to addressing this question have been widely adopted within the forensics community, particularly for DNA evidence. Here…

View on Digital Repository


CSAFE 2020 All Hands Meeting

The 2020 All Hands Meeting was held May 12 and 13, 2020 and served as the closing to the last 5 years of CSAFE research and focused on kicking off new initiatives for the next phase of the center, CSAFE…


Android App Forensic Evidence Database (AndroidAED)

Published: 2020 | By: Chen Shi

After attending this presentation, attendees will better understand how AndroidAED will be beneficial for academic researchers whose studies relate to mobile applications that grant them the ability to search through many of the available applications across various third-party app stores.

View on Digital Repository


A Wild Manhunt for Stego Images Created by Mobile Apps

Published: 2020 | By: Li Lin

As mobile Internet and telecommunication technology develops at high speed, the digital image forensics academic community is facing a growing challenge. • Mobile applications (Apps) allow a user to easily edit/process an image for a variety of purposes. • Thanks…

View on Digital Repository


Quantifying the association between discrete event time series with applications to digital forensics

Published: 2020 | By: Christopher Galbraith

We consider the problem of quantifying the degree of association between pairs of discrete event time series, with potential applications in forensic and cybersecurity settings. We focus in particular on the case where two associated event series exhibit temporal clustering…

View on Digital Repository


Score-based Likelihood Ratios for Camera Device Identification

Published: 2020 | By: Stephanie Reinders

Many areas of forensics are moving away from the notion of classifying evidence simply as a match or non-match. Instead, some use score-based likelihood ratios (SLR) to quantify the similarity between two pieces of evidence, such as a fingerprint obtained…

View on Digital Repository


Algorithm mismatch in spatial steganalysis

Published: 2019 | By: Stephanie Reinders

The number and availability of steganographic embedding algorithms continue to grow. Many traditional blind steganalysis frameworks require training examples from every embedding algorithm, but collecting, storing and processing representative examples of each algorithm can quickly become untenable. Our motivation for…

View on Digital Repository


Score-based likelihood ratios in device identification

Published: 2020 | By: Stephanie Reinders

Many areas of forensics are moving away from the notion of classifying evidence simply as a match or non-match. Instead, some use score-based likelihood ratios (SLR) to quantify the similarity between two pieces of evidence, such as a fingerprint obtained…

View on Digital Repository


An Overview of CSAFE’s Publicly Available Datasets

Forensic scientists and researchers can now directly implement CSAFE data into their laboratory work. CSAFE forensic databases are publicly available and can be utilized to test current analysis techniques, design new methods, and increase confidence in forensic evidence conclusions. This…


StegoAppDB: A steganography apps forensics image database

Published: 2019 | By: Jennifer Newman/J. Newman presented

In this paper, we present a new reference dataset simulating digital evidence for image (photographic) steganography. Steganography detection is a digital image forensic topic that is relatively unknown in practical forensics, although stego app use in the wild is on…

View on Digital Repository


Spatial DNA: Measuring similarity of geolocation datasets with applications to forensics

Published: 2019 | By: Christopher Galbraith

Datasets consisting of geolocated events provide rich spatial characterizations of human behavior. Individuals tend to be self-consistent over time while generating such events, visiting the same locations such as home, the office, or the gym. In this paper we develop…

View on Digital Repository


Adversarial Matching of Dark Net Market Vendor Accounts

Published: 2019 | By: Xiao Hui Tai

Many datasets feature seemingly disparate entries that actually refer to the same entity. Reconciling these entries, or "matching," is challenging, especially in situations where there are errors in the data. In certain contexts, the situation is even more complicated: an…

View on Digital Repository


Identifying Digital Evidence from Android Devices via Static and Dynamic Analysis

This CSAFE Center Wide webinar was presented on June 25, 2019 by Dr. Yong Guan, CSAFE researcher at Iowa State University. Presentation Description: The more than 50 app stores across the world provide mobile phone users with access to 8 million…


Android™ App Forensic Evidence Database

Published: 2019 | By: Chao-Chun C. Cheng

After attending this presentation, attendees will understand how to use this new Android™ Center for Statistics and Applications in Forensic Evidence-App Evidence Database (CSAFE-AED) in their casework investigation. This presentation will introduce the basics, challenges, and limitations of the current…

View on Digital Repository


StegoAppDB and the Prevalence of Mobile Steganography

Published: 2019 | By: Jennifer Newman

After attending this presentation, attendees will be aware of the first mobile stego app database for image forensics and the current lack of software to detect the prevalence of stego use on mobile devices.

View on Digital Repository


StegoAppDB: A steganography apps forensics image database

Published: 2019 | By: Jennifer Newman

In this paper, we present a new reference dataset simulating digital evidence for image (photographic) steganography. Steganography detection is a digital image forensic topic that is relatively unknown in practical forensics, although stego app use in the wild is on…

View on Digital Repository


Tackling Android Stego Apps in the Wild

Published: 2018 | By: Wenhao Chen

Digital image forensics is a young but maturing field, encompassing key areas such as camera identification, detection of forged images, and steganalysis. However, large gaps exist between academic results and applications used by practicing forensic analysts. To move academic discoveries…

View on Digital Repository


COMMUNITY CALL-TO-ACTION

Want to collaborate with CSAFE on a project? Contact us to share your idea.
