StegoAppDB: A Forensics Image Database for Mobile Steganography
StegoAppDB, a steganography apps forensics image database, is a database of image data from mobile phones. It is the first database to consist of mobile phone photographs and stego images produced by mobile stego apps, together with a rich set of side information. StegoAppDB contains over 960,000 innocent and stego images captured with 24 distinct devices covering 10 phone models, with detailed provenance data such as a wide range of ISO and exposure settings, EXIF data, the stego app used, message information, and embedding rate, among other information.
A search on the database falls into two main categories: searching on stego and related images, and searching on original images. Stego images are created using Android and iOS mobile stego apps. We provide a cover-stego image pair for each stego image so that the data can be used for machine learning applications of steganalysis. Original images are acquired using our own Cameraw camera app and saved in both DNG and high-quality JPEG formats. Cameraw is available on GitHub for both iOS and Android. We retain the original devices and continue to add to the database. Although the database was designed for steganography, it may also be suitable for other areas of digital image forensics.
For example, a portion of the data is appropriate for research into camera device identification using the Photo Response Non-Uniformity (PRNU) fingerprint.
This database is publicly available and has no copyright or privacy issues associated with it.
For more information about the data in the database, please see the FAQ.
Q: How can I download portions of the data?
A: Go to the “Search” link and select either “Originals” or “Stegos.” The search is an additive process: choose your search criteria from the options provided.
The database can be downloaded here.
Q: How can I download the entire set of files in the database?
A: The entire contents of the database can be downloaded here. The file is approximately 1.2 TB, so make sure you have enough free disk space to accommodate it.
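Before starting the full download, you can check the free space on the target drive programmatically. The Python sketch below uses the standard library's shutil.disk_usage; the 1.2 TB figure comes from this FAQ (read here as decimal terabytes), while the helper name has_room_for_download and the 10% safety margin are our own assumptions. If the download is an archive you plan to unpack, allow for additional space.

```python
import shutil

# Approximate size of the full StegoAppDB download stated in this FAQ (~1.2 TB, decimal).
FULL_DOWNLOAD_BYTES = int(1.2e12)

def has_room_for_download(target_dir: str,
                          required_bytes: int = FULL_DOWNLOAD_BYTES,
                          margin: float = 0.10) -> bool:
    """Return True if the filesystem holding target_dir has enough free space.

    has_room_for_download is a hypothetical helper; the 10% margin is an
    arbitrary safety buffer, not a StegoAppDB requirement.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes * (1 + margin)

if __name__ == "__main__":
    target = "."
    if has_room_for_download(target):
        print("Enough free space for the full download.")
    else:
        print("Not enough free space; free up disk or choose another drive.")
```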
Q: What additional data is given with the images?
A: Your download includes the images, a text file recording your search parameters, and a CSV file with all the attributes of each image in the download. In addition, a link to a PDF file titled “SAD Instructions and Information” is included with each download and is also available on this webpage. It gives more details about the downloaded data, including the file folder structure, image types, and contents of the CSV files. The CSV file provides more parameters than are available to query on; exposure settings, the stego app used, and the hidden message are some of the attributes it contains.
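Because the CSV file carries more attributes than the search page lets you query on, it can be convenient to load it and filter it locally. The sketch below uses Python's standard csv module; the function names are hypothetical, and since the real column names are not listed here, check your file's header row (or the “SAD Instructions and Information” PDF) for the actual attribute names.

```python
import csv
import sys
from typing import Dict, Iterable, List, Tuple

def load_attributes(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse a download CSV into (column names, one dict of attributes per image)."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    # fieldnames is filled in from the header row once reading has started
    return list(reader.fieldnames or []), rows

def filter_rows(rows: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    """Keep only rows whose `column` equals `value` (CSV values are strings)."""
    return [row for row in rows if row.get(column) == value]

if __name__ == "__main__":
    # Usage: python read_sad_csv.py path/to/download.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        columns, rows = load_attributes(f)
    print("Attributes:", columns)
    print("Images listed:", len(rows))
```

Once you know the real column name for, say, the stego app, filter_rows(rows, column, app_name) keeps only the images produced by that app.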
Q: Which paper should I cite if I use StegoAppDB?
Q: What is an “original image”?
A: We define an original image to be an image acquired by the mobile phone camera. An original image has the default pixel dimensions dictated by the camera app. An original image can be used in several ways. For example, a piece can be cropped out and the smaller image used as the input/cover image to a machine learning algorithm. Alternatively, the user can select an original image as input to a stego app on the phone.
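Cropping a cover-sized piece out of an original image amounts to slicing a window of rows and columns. The sketch below is a minimal, dependency-free illustration that treats a decoded grayscale image as a list of pixel rows; the helper names and the centered window are assumptions for illustration, not a description of how StegoAppDB's own covers were cut. With a NumPy array, the same window is simply image[top:top + height, left:left + width].

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel values (a simplification)

def crop(image: Sequence[Sequence[int]], top: int, left: int,
         height: int, width: int) -> Image:
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("crop window falls outside the image")
    return [list(row[left:left + width]) for row in image[top:top + height]]

def center_crop(image: Sequence[Sequence[int]], size: int) -> Image:
    """Cut a size x size patch from the middle of the image (hypothetical helper)."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return crop(image, top, left, size, size)
```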
Q: What is a “cover image”?
A: A cover image is an image that is used directly for embedding; the corresponding stego image is created from it. A cover image can also be described as a stego image embedded at zero rate, that is, a stego image that carries no message. A cover image and its stego image have the same pixel dimensions and are visually almost identical, differing only at the locations where message bits reside. Features are extracted from cover-stego image pairs and used in machine learning to train a classifier. In machine learning algorithms, a cover image is often a cropped (smaller) piece of an original image, because original-sized images can be too large for machine learning algorithms to process efficiently.
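Because a cover image and its stego image share the same pixel dimensions, they can be compared position by position. The sketch below (hypothetical helper names, images as lists of pixel rows) checks that the dimensions match and lists the positions where the two differ. It compares decoded pixel values only: for apps that embed in the JPEG DCT domain, one changed coefficient can alter many pixels in its 8x8 block, so the fraction of changed pixels is not the same as the number of embedded bits or the embedding rate recorded in the CSV file.

```python
from typing import List, Sequence, Tuple

def changed_positions(cover: Sequence[Sequence[int]],
                      stego: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """List (row, col) positions where the stego pixel differs from the cover pixel."""
    if len(cover) != len(stego) or any(len(a) != len(b) for a, b in zip(cover, stego)):
        raise ValueError("cover and stego must have the same pixel dimensions")
    return [(r, c)
            for r, (cover_row, stego_row) in enumerate(zip(cover, stego))
            for c, (a, b) in enumerate(zip(cover_row, stego_row))
            if a != b]

def change_rate(cover: Sequence[Sequence[int]], stego: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels that differ between cover and stego (not the embedding rate)."""
    total = sum(len(row) for row in cover)
    return len(changed_positions(cover, stego)) / total
```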
Q: What is an “input image”?
A: We define an input image as an image whose values and pixel dimensions are known, and which is fed to an algorithm that hides a message either in it or in a resized version of it. In many academic steganalysis algorithms, the input image is the cover image itself, which can be a cropped sub-image of an original image. In mobile stego apps, however, the input image is typically an original image (one taken by the phone’s camera) selected as input in the app’s GUI. Once the user selects it, the input (original) image is passed to the stego app’s internal code and is often downsized. Images created internally by the app cannot be accessed by the user, so the downsized image is not available for machine learning purposes, since the two images in a cover-stego pair must be the same size.
Q: What is a “stego image”?
A: A stego image in StegoAppDB is the image output by a steganography-embedding mobile app (see the search page for the specific stego apps used in SAD). The CSV file that accompanies a stego-image download contains extensive information describing the parameters used to create each stego image. For example, it gives not only the name of the mobile app that produced the stego image, but also the cover image name, any input image associated with the cover image, and the original image from which these other images are derived. Additional information includes the embedded message text, the location in the message dictionary of the randomly chosen message text, the embedding rate, and any passwords used to perform the embedding. This information is provided so that users who extract the message can verify its content.
Q: Where can I find the EXIF file for an image?
A: The EXIF file is included only with the “original image” files (see above for the definition of “original image”). An original image is acquired with a mobile phone’s camera. EXIF files are not included with input images, cover images, or stego images.
Q: My search is not bringing up any images. Why?
A: The search is an additive process, so at least one box must be checked in each section to produce results. If no boxes are checked in a section, the search will not find any images and an error message in red is displayed at the top of the search page. Another possible reason is that no image satisfies all of the selected criteria, for example when the exposure settings are too restrictive.
Q: My search finds too many images and the download file size is too large. What do I do?
A: Select fewer criteria. You may select fewer phones, fewer apps (for stegos), or a narrower range of ISO and/or exposure time settings.
Q: How many images are in the database?
A: Over 960,000!
Q: How are the images interrelated, such as which cover image is cropped from which original image, what stego images are created, etc.?
A: The entity relationship (ER) diagram for the database describes how the data is related. See here for the paper describing the database, and see here for the ER diagram itself. The ER diagram gives a general overview of the relationships between images in the database, and also gives the attributes associated with each image. For a specific stego image, the name of the cover image used to create it is given in the CSV file downloaded with the stego images. The document titled “PDF file of Download Information and Instructions” also contains information that may help you understand the relationships between images in your download.
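Pairing each stego image with its cover from the CSV file is a common first step when building a training set. The sketch below assumes two hypothetical column names, stego_name and cover_name, and hypothetical cover and stego folders; substitute the real column names and the folder layout described in the instructions PDF that comes with your download.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names: check the header row of the CSV in your download.
STEGO_COL = "stego_name"
COVER_COL = "cover_name"

def cover_stego_pairs(lines: Iterable[str], stego_dir: Path, cover_dir: Path,
                      stego_col: str = STEGO_COL,
                      cover_col: str = COVER_COL) -> List[Tuple[Path, Path]]:
    """Return (cover_path, stego_path) pairs listed in a stego-download CSV."""
    return [(cover_dir / row[cover_col], stego_dir / row[stego_col])
            for row in csv.DictReader(lines)]

def group_by_cover(pairs: List[Tuple[Path, Path]]) -> Dict[Path, List[Path]]:
    """Map each cover image to every stego image created from it."""
    groups: Dict[Path, List[Path]] = {}
    for cover, stego in pairs:
        groups.setdefault(cover, []).append(stego)
    return groups

if __name__ == "__main__":
    # Usage sketch; "download.csv", "covers", and "stegos" are placeholder paths.
    with open("download.csv", newline="", encoding="utf-8") as f:
        pairs = cover_stego_pairs(f, Path("stegos"), Path("covers"))
    print(len(pairs), "cover-stego pairs found")
```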
Download Information and Instructions
Information and Instructions are available here.
Please check back for updates.