Many datasets feature seemingly disparate entries that actually refer to the same entity. Reconciling these entries, or “matching,” is challenging, especially in situations where there are errors in the data. In certain contexts, the situation is even more complicated: an active adversary may have a vested interest in having the matching process fail. By leveraging eight years of data, we investigate one such adversarial context: matching different online anonymous marketplace vendor handles to unique sellers. Using a combination of random forest classifiers and hierarchical clustering on a set of features that would be hard for an adversary to forge or mimic, we manage to obtain reasonable performance (over 75% precision and recall on labels generated using heuristics), despite generally lacking any ground truth for training. Our algorithm performs particularly well for the top 30% of accounts by sales volume, and hints that 22,163 accounts with at least one confirmed sale map to 15,652 distinct sellers—of which 12,155 operate only one account, and the remainder between 2 and 11 different accounts. Case study analysis further confirms that our algorithm manages to identify non-trivial matches, as well as impersonation attempts.
Adversarial Matching of Dark Net Market Vendor Accounts

Conference/Workshop:
25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Published: 2019
Primary Author: Xiao Hui Tai
Secondary Authors: Kyle Soska, N. Christian
Type: Presentation Slides
Research Area: Digital
Related Resources
Forensic Analysis of Android Cloud SDKs
This presentation is from the 76th Annual Conference of the American Academy of Forensic Sciences (AAFS), Denver, Colorado, February 19-24, 2024.
The Impact of Multi-Camera Smart Phones on Source Camera Identification
An investigator has a questioned image from an unknown source and wants to determine whether it came from a camera on a person of interest’s smartphone. This scenario is referred…
Likelihood ratios for changepoints in categorical event data with applications in digital forensics
We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have…
Producing Datasets: Capturing Images on Multi-Camera Smartphones for Source Camera Identification
This poster introduces the new CSAFE Multi-camera Smartphone Image Database and describes how the image were collected and reviewed.