Overarching Goals
From the outset, CSAFE and NIST set out to address the growing demand for forensic tools and methods specifically for digital evidence, all built on strong probabilistic and statistical foundations.
Digital devices play a central role in modern society, and criminal investigations that require examining digital devices and digital artifacts are common.
Examples of forensic experts' work with digital evidence include detecting criminals' use of steganographic algorithms to hide illicit text or images in otherwise innocent pictures or videos, recovering illicit data or software hidden on phones and tablets, and examining digital traces of user activity (e.g., messages/texts, login events) for signs of criminal activity.
Looking for webinars, short courses, presentations, or publications on Digital Evidence?
Additional Team Members
Roy Maxion
maxion@cs.cmu.edu
Hal S. Stern
sternh@uci.edu
Li Lin
llin@iastate.edu
Barbara Guttman (NIST)
barbara.guttman@nist.gov
Focus Areas
The CSAFE team at ISU has developed and made public a reference dataset, StegoAppDB, comprising more than 960,000 highly provenanced images. Because the algorithms used by steganography (stego) software applications are not public, the images were created using our unique approach of modeling the apps themselves. The data are used to develop and test algorithms and to create new models of the camera pipeline. The next generation of algorithms for detecting hidden images will rely on deep learning algorithms with millions of free parameters (see, e.g., Boroumand et al., 2019). To allow investigators to work with deep learning methods, we are growing the database substantially by automating image data collection and stego creation, for example by using a drone to collect copious amounts of image data.
Steganography is a tool for sending a secret message hidden in a common medium, such as a digital image, so that the very existence of the payload escapes notice. The efforts required to hide and to detect a payload are highly unequal: a developer can easily create an app to hide a payload, whereas steganalysis, the detection of hidden content by a forensic image analyst, requires expert knowledge and often extensive resources. Stego apps for mobile devices have been downloaded hundreds of thousands of times, yet until recently steganalysis has appeared mainly in academic settings and has not targeted mobile apps specifically.
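To make the hide/detect asymmetry concrete, the following minimal Python sketch shows least-significant-bit (LSB) replacement, one of the simplest textbook embedding methods. It assumes a grayscale cover image already flattened into a list of 8-bit pixel values and a 4-byte length header; `embed_lsb` and `extract_lsb` are hypothetical helper names. Because the algorithms of mobile stego apps are not public, this should not be read as what any particular app does.

```python
def embed_lsb(pixels: list[int], message: bytes) -> list[int]:
    """Hide `message` in the least significant bits of 8-bit pixel values.

    Generic LSB replacement for illustration only; not the (unpublished)
    algorithm of any specific mobile stego app.
    """
    # Hypothetical framing: prefix the payload with its length as 4 big-endian bytes.
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this payload")
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        # Clear the LSB, then set it to the payload bit; each pixel changes by at most 1.
        stego[idx] = (stego[idx] & ~1) | bit
    return stego


def extract_lsb(pixels: list[int]) -> bytes:
    """Recover a payload written by embed_lsb."""

    def read_bytes(start_bit: int, n_bytes: int) -> bytes:
        out = bytearray()
        for b in range(n_bytes):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start_bit + 8 * b + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


if __name__ == "__main__":
    cover = [(37 * i) % 256 for i in range(512)]  # synthetic stand-in for image pixels
    stego = embed_lsb(cover, b"meet at noon")
    print(extract_lsb(stego))
```

Every pixel changes by at most one gray level, so the stego image looks identical to the cover; recognizing such changes reliably is the statistical problem that steganalysis has to solve.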
In CSAFE 1.0, our project focused on the careful construction of the first forensic reference database of stego images created with mobile apps, producing StegoAppDB. StegoAppDB has more than 960,000 innocent and stego images, is fully provenanced and copyright-free, and is publicly available online. However, StegoAppDB contains only indoor scenes, a limited number of distinct scenes, and a limited number of auto-exposure images. For CSAFE 2.0, we propose to add image data selectively to make StegoAppDB more attractive for wider use in the forensic imaging community, as follows:
- Add several million outdoor images so that deep learning neural networks, currently the most successful machine learning techniques, have access to ample data.
- Add images that were photo-edited before the payload was embedded, which we suspect is a common occurrence. It is speculated that image editing operations affect steganalysis error rates, but little to nothing is known about this, and certainly nothing for mobile stego images.
- Include stego apps beyond those currently in StegoAppDB, and create an in-house detection tool to verify our stego process, since no software package exists specifically to detect stego images created with mobile stego apps. We will use emulators and reverse engineering to analyze the stego apps and retain accurate parameters for data provenance.
Our goal is to acquire outdoor scene data resource-efficiently, using a car, a bike, and/or a robot fitted with simple, inexpensive mounts, together with our in-house camera app, Cameraw, to take many photos automatically.
To build support and partnerships within the practitioner community, we propose to conduct a brief survey at the beginning of the project. Its goal is to learn the status quo of steganography in the digital evidence community and to establish connections with labs to identify their steganography-related needs. During SWGDE meetings in 2019, the PI met Glenn Cavin, Lieutenant, Plano Police Department, Plano, Texas, who has provided a letter of support expressing his interest in this project. The detection tool may also be of interest to forensic practitioners.
The expanded database, built from combinations of these three image generation methods, will give users access to more realistically simulated evidence for mobile steganography and provide a source of data for developing mobile steganalysis tools. In the long term, the in-house detection tool might provide insight for such a tool.
The rapid increase in the use of digital devices in the past decade has resulted in significant new challenges for digital forensics. In this context, the goal of this project is to develop new statistical methodologies to support digital forensics in the analysis of user-generated event data. The project will focus on native or cloud-based logged event data from devices such as computers and mobile phones, typically of the form <accountID, timestamp, geolocation, metadata>. The forensic questions to be addressed will focus on identity, such as whether two sets of geolocation events were generated by the same individual or by two different individuals. Traditional approaches in statistical forensics, such as likelihood ratios and score-based likelihood ratios, will be extended to handle user-generated event data, particularly data with spatial and spatio-temporal information, building on our prior work in Phase 1 of CSAFE. The extensions will require the development of robust and accurate new probabilistic models and fast new algorithms for large data sets. The project will use a variety of real-world user event datasets as testbeds for development and evaluation. Deliverables from the project will include research papers, tutorial webinars, public testbed datasets, and open-source software.
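For categorical event data, for example counts of two accounts' events in a few activity categories, the likelihood-ratio idea can be sketched in Python under a textbook Dirichlet-multinomial model. This is an illustrative assumption, not the project's model: the prior `alpha`, the three event categories, and the counts in the example are hypothetical. Under the same-source hypothesis both count vectors share one category-probability vector; under the different-source hypothesis each has its own, and the multinomial coefficients cancel in the ratio.

```python
from math import lgamma


def log_multivariate_beta(alpha: list[float]) -> float:
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_lr_same_source(counts_a: list[int], counts_b: list[int],
                       alpha: list[float]) -> float:
    """Log likelihood ratio, same source vs. different sources.

    Assumed model (illustrative only): theta ~ Dirichlet(alpha) per source,
    counts ~ Multinomial(theta).
      LR = B(alpha + a + b) * B(alpha) / (B(alpha + a) * B(alpha + b))
    """
    if not (len(counts_a) == len(counts_b) == len(alpha)):
        raise ValueError("counts and alpha must have the same number of categories")
    post_a = [p + x for p, x in zip(alpha, counts_a)]
    post_b = [p + y for p, y in zip(alpha, counts_b)]
    post_ab = [p + x + y for p, x, y in zip(alpha, counts_a, counts_b)]
    return (log_multivariate_beta(post_ab) + log_multivariate_beta(alpha)
            - log_multivariate_beta(post_a) - log_multivariate_beta(post_b))


if __name__ == "__main__":
    alpha = [1.0, 1.0, 1.0]  # hypothetical uniform prior over three event categories
    print(log_lr_same_source([20, 5, 2], [18, 6, 3], alpha))  # similar usage patterns
    print(log_lr_same_source([20, 5, 2], [1, 4, 22], alpha))  # dissimilar usage patterns
```

A positive log-LR favors the same-source hypothesis and a negative one favors different sources; the prior `alpha` affects the magnitude, so in practice it would have to be chosen or estimated from reference data.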
The plethora of mobile apps has introduced significant challenges for digital forensics practitioners because of the sheer number and diversity of apps: for example, there are over 2.6 million apps on Google Play, plus many others available from 53 other app stores around the world. The average smartphone user has more than 80 apps on their phone and uses close to 40 of them each month. Troublingly, practitioners' lack of expert knowledge about so many and such varied apps has led to serious challenges, including the inability to keep up with frequent app updates on the app markets, difficulty discovering critical and relevant evidence, and the common problem of large casework backlogs in crime labs.
This project ultimately aims to develop EviHunter, a new path-, context-, and flow-sensitive Android application program analysis toolset that reports the forensically relevant evidential data generated by the Android apps on a mobile device by parsing each app's installation package with both static and dynamic approaches. During the first five years of the CSAFE project, we developed automated Android app program analysis tools to discover most (if not all) evidential data generated by apps on a suspect's mobile device or a remote cloud server. We have developed the award-winning EviHunter toolset (published at ACM CCS 2018 and IEEE SADFE 2018), and with the EviHunter tools we are constructing a prototype Android App Evidence Database (AED). In the next three years of the project (CSAFE Phase II), we plan to extend EviHunter to analyze complex features of Android apps by addressing the following challenges: (1) software obfuscation, (2) the limited capability of Soot in generating call graphs, (3) packing/unpacking of apps, (4) native code and third-party libraries, (5) adapting the Monkey-based dynamic tool to better support dynamic EviHunter, and (6) building a better and more complete list of source/sink methods. We will also review the design of the prototype EviHunter tools and optimize the implementation to reduce memory usage and computational complexity. With deeper knowledge of evidence types and how frequently they are used, we plan to apply statistical methods to focus on the more critical program paths inside apps and so speed up app analysis.
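The source/sink idea behind item (6) can be illustrated with a toy Python sketch: given which methods' outputs flow into which other methods, report every source whose data can reach a sink that persists data on the device, i.e., a place where evidence may end up. The flow graph, the app-internal method names (`Tracker.format`, `ChatDb.insert`, `ThemeConfig.getColor`), and the source/sink lists are hypothetical; Android framework method names are used only as labels. Real tools obtain such flows from static and dynamic analysis of the app package, and this plain breadth-first search is far coarser than the path-, context-, and flow-sensitive analysis EviHunter targets.

```python
from collections import deque

# Hypothetical toy data-flow graph for one app: an edge u -> v means a value
# produced at u can flow into v. Written by hand here for illustration.
FLOWS: dict[str, list[str]] = {
    "LocationManager.getLastKnownLocation": ["Tracker.format"],
    "Tracker.format": ["SharedPreferences.Editor.putString", "Log.d"],
    "EditText.getText": ["ChatDb.insert"],
    "ChatDb.insert": ["SQLiteDatabase.insert"],
    "ThemeConfig.getColor": ["View.setBackgroundColor"],
}

# Hypothetical source (data origin) and sink (persistent storage) method lists.
SOURCES = {"LocationManager.getLastKnownLocation", "EditText.getText", "ThemeConfig.getColor"}
SINKS = {"SharedPreferences.Editor.putString", "SQLiteDatabase.insert"}


def evidence_flows(flows: dict[str, list[str]], sources: set[str],
                   sinks: set[str]) -> list[tuple[str, str]]:
    """Return sorted (source, sink) pairs where data from the source can reach the sink.

    Plain breadth-first reachability: flow- and context-insensitive.
    """
    found = set()
    for src in sources:
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                found.add((src, node))
            for nxt in flows.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return sorted(found)


if __name__ == "__main__":
    for source, sink in evidence_flows(FLOWS, SOURCES, SINKS):
        print(f"{source} -> {sink}")
```

In this toy graph the location and chat-text flows are reported, while the theme color, which only reaches the UI, is not.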
We also plan to continue collecting data for and constructing the Android App Evidence Database, likely the largest of its kind. We will work with Iowa State University Legal Counsel to examine potential legal issues and make the database accessible to practitioners.
With the EviHunter technologies and the AED, we expect forensic investigators to improve the efficiency, precision, and completeness of their casework compared with current mobile device forensic practice, which remains partly manual even with Cellebrite and other mobile forensic tools. Ideally, we hope to reduce the time spent on a mobile device in a case from one or two days to 10 to 15 minutes, which would make practitioners' work significantly easier.
This project addresses two scenarios. In the specific source scenario, an investigator has two pieces of evidence: a questioned image Q from an unknown camera and a person of interest’s (POI’s) phone. The investigator wants to determine whether Q was captured by a camera on the POI’s phone. In the common source scenario, an investigator has two questioned images Q1 and Q2 from unknown cameras and the investigator wants to determine whether Q1 and Q2 were taken by the same camera. We plan to develop and test an objective, reproducible score-based likelihood ratio (SLR) approach to quantify the similarity between a questioned image Q and a camera fingerprint K from the POI’s phone (specific source scenario) or between two questioned images Q1 and Q2 (common source scenario) on a large, authenticated image dataset that includes multi-camera phones. This approach will enable the investigator to both determine whether image Q originated from a camera on the POI’s phone and to quantify the strength of the evidence.
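A score-based likelihood ratio compares how probable an observed similarity score is among known same-camera comparisons versus known different-camera comparisons. The Python sketch below assumes the similarity scores (for example, correlations between an image's noise residual and a camera fingerprint) have already been computed, and estimates each score density with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth, which is one common choice and not necessarily the project's. The function names and all score values are hypothetical.

```python
import math
import statistics


def gaussian_kde(samples: list[float]):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth."""
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)

    def density(x: float) -> float:
        total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        return total / (n * h * math.sqrt(2 * math.pi))

    return density


def score_based_lr(score: float, same_source_scores: list[float],
                   different_source_scores: list[float]) -> float:
    """SLR = f(score | same source) / f(score | different sources)."""
    numerator = gaussian_kde(same_source_scores)(score)
    denominator = gaussian_kde(different_source_scores)(score)
    if denominator == 0.0:
        return math.inf  # score lies far outside the different-source reference scores
    return numerator / denominator


if __name__ == "__main__":
    # Made-up reference scores, for illustration only.
    same = [0.12, 0.18, 0.09, 0.22, 0.15, 0.20, 0.11, 0.17]
    different = [0.01, -0.02, 0.05, 0.00, 0.03, -0.01, 0.07, 0.02]
    print(score_based_lr(0.16, same, different))
    print(score_based_lr(0.00, same, different))
```

An SLR above 1 supports the same-source proposition and one below 1 supports different sources. Because the different-source scores come from a reference set of non-matching phones, the composition of that reference set directly shapes the denominator.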
Currently available authenticated image datasets from mobile phones cover a limited number of models and do not contain enough images from each camera on a device for analysis. We plan to augment available datasets with authenticated images from a wide variety of phone brands and models, including multiple phones of each model and multiple images from each camera on each phone.
Our preliminary results for the specific source scenario on a small set of phones show that if questioned image Q and camera fingerprint K come from cameras of the same model, calculating SLRs with a reference set of close non-matching phones (phones of the same model as the POI's phone) produces the lowest rate of false inclusions. However, if Q and K come from different models, a reference set of close non-matching phones can lead to a higher false inclusion rate than a reference set that spans a variety of models and brands. Thus, knowing which phone model captured Q is important for the investigator. If this result holds on the larger dataset, we will develop a method to estimate the likelihood that Q originated from a specific phone brand and model.
Knowledge Transfer
Forensic Analysis of Android Cloud SDKs
Type: Presentation Slides Research Area(s): Digital
Published: 2024 | By: Chen Shi
This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.
The Impact of Multi-Camera Smart Phones on Source Camera Identification
Type: Presentation Slides Research Area(s): Digital
Published: 2024 | By: Stephanie Reinders
An investigator has a questioned image from an unknown source and wants to determine whether it came from a camera on a person of interest’s smartphone. This scenario is referred to as source camera identification. Researchers discovered that slight imperfections…
Likelihood ratios for changepoints in categorical event data with applications in digital forensics
Type: Publication Research Area(s): Digital, Forensic Statistics
Published: 2024 | By: Rachel Longjohn
We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have been generated by a single individual (the device/account owner) or…
Producing Datasets: Capturing Images on Multi-Camera Smartphones for Source Camera Identification
Type: Poster Research Area(s): Digital
Published: 2024 | By: Megan McGuire
This poster introduces the new CSAFE Multi-camera Smartphone Image Database and describes how the images were collected and reviewed.
The Impact of Zoom on Smartphone Camera Identification
Type: Poster Research Area(s): Digital
Published: 2024 | By: Gavin Norton
This poster explores the impact of digital zoom on source camera identification. Images were collected at 5 zoom magnifications using the telephoto cameras of ten iPhone 14 Pro smartphones.
Source Camera Identification with Multi-Camera Smartphones
Type: Presentation Slides Research Area(s): Digital, Forensic Statistics
Published: 2023 | By: Stephanie Reinders
An overview of source camera identification on multi-camera smartphones, an introduction to the new CSAFE multi-camera smartphone image database, and a summary of recent results on the iPhone 14 Pro.
An Anti-Fuzzing Approach for Android Apps
Type: Conference Proceeding, Publication Research Area(s): Digital
Published: 2023 | By: Chris Chao-Chun Cheng
One of the significant problems in mobile app forensic analysis is extracting app evidence from the device. Given that mobile apps can generate more than 19K files on a device [6], manually inspecting every file is time consuming…
Forensic Analysis of Android Cryptocurrency Wallet Applications
Type: Conference Proceeding, Publication Research Area(s): Digital
Published: 2023 | By: Chen Shi
Crypto wallet apps that integrate with various blockchains allow users to make digital currency transactions with QR codes. According to reports from financesonline [3], there were over 68 million crypto wallet app users in 2021. As new crypto wallets…
Variations and Extensions of Information Leakage Metrics with Applications to Privacy Problems with Imperfect Statistical Information
Type: Conference Proceeding, Publication Research Area(s): Digital, Forensic Statistics
Published: 2023 | By: Shahnewaz Karim Sakib
The conventional information leakage metrics assume that an adversary has complete knowledge of the distribution of the mechanism used to disclose information correlated with the sensitive attributes of a system. The only uncertainty arises from the specific realizations that are…
Camera Device Identification and the Effects of Underexposure
Type: Dissertation, Publication Research Area(s): Digital, Forensic Statistics
Published: 2023 | By: Seth Pierre
Technology today allows a photograph from a digital camera to be matched with the camera that took it. However, the matching software was created over 10 years ago using data that is not necessarily representative of today’s data. The objective…
Likelihood ratios for categorical count data with applications in digital forensics
Type: Publication Research Area(s): Digital, Forensic Statistics
Published: 2022 | By: Rachel Longjohn
We consider the forensic context in which the goal is to assess whether two sets of observed data came from the same source or from different sources. In particular, we focus on the situation in which the evidence consists of…
CSAFE Project Update & ASCLD FRC Collaboration
Type: Presentation Slides Research Area(s): Digital, Firearms and Toolmarks, Footwear, Forensic Statistics, Handwriting, Implementation and Practice
Published: 2022 | By: Jeff Salyards
This presentation highlighted CSAFE's collaboration with the ASCLD FRC Collaboration Hub.
Forensic Analysis on Android Social Networking Applications
Type: Presentation Slides Research Area(s): Digital
Published: 2023 | By: Chen Shi
This presentation is from the 75th Anniversary Conference of the American Academy of Forensic Sciences, Orlando, Florida, February 13-18, 2023. Posted with permission of CSAFE.
Source Camera Identification on Multi-Camera Phones
Type: Presentation Slides Research Area(s): Digital, Forensic Statistics
Published: 2023 | By: Stephanie Reinders
Camera identification addresses the scenario where an investigator has a questioned digital image from an unknown camera. The investigator wants to know whether the questioned image was taken by a camera on a person of interest’s phone. Researchers discovered that…
A Likelihood Ratio Approach for Detecting Behavioral Changes in Device Usage Over Time
Type: Poster Research Area(s): Digital
Published: 2023 | By: Rachel Longjohn
This work focuses on the situation in which investigators have obtained as evidence logs of user-generated activities on a device, such as sending text messages or emails, opening or interacting with mobile apps, or making calls from particular locations. Quantitative…
LibDroid: Summarizing information flow of Android Native Libraries via Static Analysis
Type: Publication Research Area(s): Digital
Published: 2022 | By: Chen Shi
With advancements in technology, people are taking advantage of mobile devices to access e-mails, search the web, and video chat. Therefore, extracting evidence from mobile phones is an important component of the investigation process. As Android app developers could leverage…
Evaluating Reference Sets for Score-Based Likelihood Ratios for Camera Device Identification
Type: Presentation Slides Research Area(s): Digital
Published: 2022 | By: Stephanie Reinders
An investigator wants to know if an illicit image captured by an unknown camera was taken by a person of interest’s (POI’s) phone. Score-based likelihood ratios (SLRs) have been used to answer this question in previous research. We explore whether…
Likelihood Ratios for Categorical Evidence with Applications to Digital Forensics
Type: Poster Research Area(s): Digital, Forensic Statistics
Published: 2022 | By: Rachel Longjohn
In forensic investigations, the goal of evidence evaluation is often to address source-/identity-based questions in which the evidence consists of two sets of observations: one from an unknown source tied to a crime and the other from a known source.…
Automatic Detection of Android Steganography Apps via Symbolic Execution and Tree Matching
Type: Publication Research Area(s): Digital
Published: 2021 | By: Wenhao Chen
The recent focus of cyber security on automated detection of malware for Android apps has omitted the study of some apps used for “legitimate” purposes, such as steganography apps. Mobile steganography apps can be used for delivering harmful messages, and…
Tutorial on Likelihood Ratios with Applications in Digital Forensics
Type: Webinar Research Area(s): Digital, Forensic Statistics, Training and Education
This CSAFE webinar was held on September 15, 2022. Presenters: Rachel Longjohn, PhD Student, Department of Statistics, University of California, Irvine; Dr. Padhraic Smyth, Chancellor's Professor, Departments of Statistics and Computer Sciences, University of California, Irvine. Presentation Description:…
COMMUNITY CALL-TO-ACTION
Want to collaborate with CSAFE on a project? Contact us to share your idea.