Analyzing User-Generated Event Data (Digital)

CSAFE researchers are developing statistical methods to analyze user-generated event data from digital devices such as mobile phones and computers. Such data includes web navigation actions, web searches, text messaging, social media posting, and file editing. A particular focus of the work is developing methods and algorithms to assess how likely it is that different event streams are related, for example whether two event streams were generated by the same individual.

Area

Digital Evidence

Disciplines

Statistics, Computer Science

Research

  • Developed a statistical method for change detection in event time-series data, based on hidden Markov models with Poisson-distributed observations.
  • In the next phase, CSAFE researchers shifted the primary focus from statistical change detection (for a single event stream) to detecting consistency or dependency among multiple event streams. Building on a statistical framework known as marked point processes, the researchers developed methodology and algorithms for (a) measuring the association between different event streams, and (b) quantifying the degree of certainty in such associations via score-based likelihood ratios and probabilities of coincidental match.
  • Experimental results, using both simulated and real user data, indicated that the proposed approach has significant discriminative power on these data sets and shows promise for applications in digital forensics. In a case study with real-world event data from laptops and phones, the researchers demonstrated hold-out true positive rates in the range of 85% to 95% and false positive rates of 3% to 10% for distinguishing same-source from different-source pairs of event streams.
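
The idea behind score-based likelihood ratios and coincidental match probabilities, mentioned in the list above, can be illustrated with a minimal Python sketch. It is not the researchers' implementation. It assumes a similarity score has already been computed for each pair of event streams, and it estimates the score densities with a simple Gaussian kernel, which is a choice made here for illustration only. The function names, the bandwidth, and the scores are all hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`
    using a Gaussian kernel (an illustrative choice, not from the source)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def score_based_lr(score, same_scores, diff_scores, bandwidth=0.1):
    """Ratio of the estimated score density under 'same source'
    to the estimated score density under 'different source'."""
    f_same = gaussian_kde(same_scores, bandwidth)
    f_diff = gaussian_kde(diff_scores, bandwidth)
    # Note: this divides by the different-source density, which can
    # underflow to zero for scores far from every reference score.
    return f_same(score) / f_diff(score)


def coincidental_match_probability(score, diff_scores):
    """Fraction of known different-source pairs whose score is at least
    as high as the observed score."""
    return sum(1 for s in diff_scores if s >= score) / len(diff_scores)


# Hypothetical similarity scores (higher = more similar), made up for
# illustration; they are not results from the research described above.
same_scores = [0.82, 0.91, 0.77, 0.88, 0.95, 0.80]
diff_scores = [0.10, 0.25, 0.32, 0.18, 0.41, 0.85, 0.22, 0.15]

observed = 0.84
slr = score_based_lr(observed, same_scores, diff_scores)
cmp_ = coincidental_match_probability(observed, diff_scores)

print(f"score-based likelihood ratio: {slr:.2f}")
print(f"coincidental match probability: {cmp_:.3f}")
```

A ratio above 1 means the observed score is more typical of same-source pairs than of different-source pairs in the reference data, and a ratio below 1 means the opposite. The coincidental match probability here is an empirical tail fraction, so its resolution is limited by the number of different-source reference pairs.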

Benefits of Research

The outcomes of this research can help forensic investigators improve the quality of information extracted from devices during digital forensic investigations, and potentially provide more quantitative statistical support for evidence and arguments presented in court related to digital traces of user behavior on a device.

Select Publications, Conference Papers, Presentations and/or Tools

Presentation, 10th International Conference on Forensic Inference and Statistics, Minneapolis, MN, September 2017.

Software coming soon: Researchers plan to make the software publicly available as open-source code for others in the digital forensics community to use.