In my last blog post I discussed the art of embedding secret messages in any file so that only the sender and the receiver know about the presence of that message. This is called steganography. In this post I will write about the information security discipline that tries to discover this kind of messages.
Steganalysis is the counter part of steganography and it is defined as the art or science of discovering hidden data in cover objects. The people who works in this discipline are called steganalysts.
Nowadays, a lot of different techniques have recently appeared in steganalysis but they can be generally summarized by 3 branches:
- "Chi-square" Methods: The chi-square attack is a statistical test to measure if a given set of observed data and an expected set of data are similar or not. The original version of this attack could detect sequentially embedded messages and was later generalized to randomly scattered messages.
- Distinguishing Statistic Methods: In this approach, the steganalyst first carefully inspects the embedding algorithm and then identifies a quantity (the distinguishing statistics) that changes predictably with the length of the embedded message. The detection philosophy is not limited to any specific type of the embedding operation and works for randomly scattered messages as well. One disadvantage of this approach is that the detection needs to be customized to each embedding paradigm and the design of proper distinguishing statistics cannot be easily automatized.
- Blind Classifier Methods: First, a blind detector needs to learn what a typical, unmodified image looks like from multiple perspectives. Then, a classifier is trained to learn the differences between an unmodified image and a stegoimage (an image that has been modified). This methodology combined with a powerful classifier gives very impressive results.
It is really important to mention that the job of a steganalyst is to detect if there is a secret message hidden in a digital file. It is not their job to recover the secret message.
There are many different methods for detecting if an image has been modified. One of the easiest ones is developed by using the idea that cameras doesn't use all the different colors in the nature. Cameras approximate some of the colors to a near color so they don't need to manage a big amount of different values in the color palette. For example, let's assume that we have a grey-scaled image with grey intensities from 0 to 255, it is easier to use only half of those values by rounding the odds values to the next even number.
Figure 1 shows the histogram for an image using this value compression method. An image histogram is a graphical representation of the number of pixels in an image as a function of their intensity. You can notice that there are values which never appears in the image, those are the ones that are rounded to another value for managing a smaller color palette.
Figure 1. Image histogram from a camera image.
Figure 2 shows the histogram for the same image after hiding a message. We can see that now there are more different values in the color palette. This happens because when we use a LSB steganographic method, we modify the last bit of every pixel, so the values that were not used in the original color palette appears in the histogram of the modified image.
Figure 2. Image histogram from a camera image after embedding a secret message
So, we can know if this kind of image has been modified simply by checking the histograms.
While this has been just a brief introduction to steganalysis, it is a very deep and fascinating discipline. If you want to know a little more just leave a comment and I will reply you as soon as possible.