18 June 2001
Air Force Research Laboratory,
August 2000
The use of digital images is becoming more prevalent, from taking pictures with a digital camera, to using satellite imagery to track military targets in Kosovo. With this increased reliance on digital images for information, the need to ensure their authenticity increases as well.
Existing digital imagery authentication techniques are based on cryptographic principles and digital signatures. These schemes effectively protect the data from modification during transmission, but they offer no protection following transmission. Since the information needed for these schemes to perform the authentication is separate from the data, an attacker can simply modify the data, recalculate the new message digest or digital signature, and attach them together. Without knowledge of the original data or of the original authentication information, it is impossible to contest the authenticity of the modified digital image. However, since the value of digital images is based on its content, the image bits can be modified to embed codes without changing the meaning of its content. Once the codes are embedded in the data content and the data is manipulated, these codes will also be modified so the authenticator can examine them to verify the integrity of the data.
One approach to verify an image's authenticity is to embed checksums into the least significant bits (LSB) of the image. A secret numeric key known by both the sender and the recipient protects these checksums. Walton1 proposes a technique that uses a key-dependent pseudo-random walk on the image. The checksum is obtained by summing the numbers determined by the seven most significant bits and taking a remainder operation with a large integer L. The checksum is inserted in binary form in the least significant bit of selected pixels. This could be repeated for many disjoint random walks or for one random walk that goes through all pixels. The method is very fast and often modifies only half of the pixels by one gray level, which is undetectable to the human eye.
Although the LSBs of the image are disturbed, the resulting image is perceptually equivalent to the original. The human eye cannot detect any difference because changes to the LSB of a pixel affect its value by only one. Since pixel values range from 1 to 256, there will be at most a �1/256 change (�0.39%) in pixel intensity. This is important since the intent is to minimize the degradation of image fidelity during the embedding procedure. This article extends Walton's algorithm to allow one to localize the tampering.
The algorithm for embedding a checksum is outlined by Walton in three steps. The use of this algorithm is varied, as reported in this article, to achieve different results, but the embedding process is the same throughout. To embed a checksum, start with a M�N matrix of pixel values that will be used for embedding.
Walton's steps for embedding are:
(1) use the user-specified key to determine the random walk over the matrix,
(2) go over the entire matrix to construct the checksum out of the seven most-significant bits of each pixel, and
(3) embed the checksum bits, one by one, into the pixels at the random-walk addresses.
To extract the checksum and check for authenticity:
(1) use the user-specified key to determine the random walk over the matrix,
(2) go over the entire matrix to construct a checksum out of the seven most-significant bits of each pixel, and
(3) use the random-walk addresses and assemble the checksum from the LSBs of the visited pixels that will be compared with the checksum obtained in step two.
There are three different implementations to this basic algorithm. The first version, named "Build One," protects the entire image with one checksum and can determine whether the image was tampered. The second version, "Build Two," embeds checksums into sub-blocks of a grayscale image in order to localize tampering. The last version, "Build Three," extends Build Two to deal with color red, green, blue (RGB) images and implements a walk-dependent checksum. A walk-dependent checksum prevents tampering based on exchanging groups of pixels with the same checksum or exchanging sub-blocks with their embedded checksums. Rather than obtain the checksum from summing the pixel values, multiply the pixel value by a function of the position of that pixel, and then sum these products.
Build One takes a M�N grayscale image and a numeric key and embeds a single checksum into the image. Since every pixel in the image is used to create the 32-bit checksum, the whole image is protected. Although 32 pixels (maximum) will be modified by one grayscale level (maximum), any minute tampering on the image can be detected. Because the checksum, CS, is computed from summing the pixel values throughout the whole image (Equation 1), modifying the image will invalidate the checksum.
Equation 1:
where
CS32-bit = 32-bit checksum number
pi,j = pixel value at position (i, j)
Output is one of the following two messages:
"Your image file most likely has not been tampered with."
or
"Your image file has been tampered with!"
The authenticator either signifies the image has been tampered with or has not likely been tampered with. The quantifier "most likely" points to a weakness of this scheme. A successful attack on this scheme is to exchange pixels in the image. Exchanging pixels, provided the pixels containing the checksum are not touched, will not affect the checksum, and therefore, this form of tampering would not be detected.
Build One attempts to protect every pixel in the image with a single checksum, no matter what the dimensions of the image are. Although it may be able to detect tampering, it cannot give any indication as to the tampered location in the image.
Build Two embeds multiple checksums into an image in order to localize any tampering. Build Two takes a M�N grayscale image, and a numeric key, and also allows the user to specify a block size B. The image is broken up into B�B sub-blocks, each embedded with a checksum.
Since the image will be broken into square blocks, the only way to protect every pixel is to have an image with a length and width that are multiples of the block size chosen. For example, if an image is 125 pixels wide and the specified block size is 30, the last five-pixel columns will be unprotected. If the size of the image is known and the block size is chosen appropriately, the number of unprotected pixels can be kept to a minimum. If it is crucial to protect all pixels, an algorithm must be devised for these leftover blocks.
The smallest block size used has sides that are six pixels long. There are 36 pixels in this block and at least 32 pixels are needed to embed the 32-bit checksum.
When Build Two checks for authenticity, any block without matching checksums is marked with a black X and a white border. This strategy allows a tampered block to be noticed, regardless of the background color.
The following figures give an example of Build Two. In Figure 1, an original aerial photograph of an Army Urban Training Site near Fort Drum, New York is shown. Figure 2 shows the image after the checksums have been embedded into sub-blocks (every 6x6 block of pixels) of the image. It is impossible for human eyes to detect the difference between the two. Figure 3 gives an example of a tampered version of the image. Figure 4 shows the receiver's view after the image undergoes the tamper (checksum extraction) test. Any tampered blocks are marked.
Build Two is able to localize and mark the tampered areas. The smaller the chosen block size, the more precise the algorithm can pinpoint areas in an image. The price for this greater accuracy, however, is performance. The smaller the block size, the more checksums to calculate and embed, and consequently, the longer it would take to verify the image's authenticity. Performance varies inversely with the size of the blocks.
During initial testing, some blocks were reported as tampered even when the image had not been touched. This was due to the boundary conditions imposed by MATLAB's representation of intensity value. To solve the problem, the image was normalized to [0,255] instead of using MATLAB's range of [1,256]. This normalization, performed before embedding and extracting, ensured correct encoding of the LSB.
The weaknesses of Build Two are similar to the weaknesses of Build One. Exchanging pixels in each sub-block will not be detected. Moreover, exchanging sub-blocks will also be undetectable. To overcome these weaknesses, the embedded checksum must be walk-dependent.
Build Three extends Build Two to deal with RGB (color) images and introduces a walk-dependant checksum. The walk-dependant checksum of build Three does not allow pixels or blocks to be exchanged without detection, which is a weakness of the previous two builds. Build Three calculates the checksum by multiplying the pixel values by a function of their positions, and then summing them (Equation 2).
Equation 2:
where
CS32-bit = 32-bit checksum number
pi,j = pixel value at position (i, j)
f(i,j) = function of i and j
Since an RGB image is structured like three grayscale images on top of each other (a M�N matrix for each RGB level), there are more checksums embedded in the RGB image. Consequently, performance suffers. If the same size image and block size are used for a grayscale and an RGB image, the RGB version would embed three times as many checksums as the grayscale because it embeds them into each RGB level separately. The trade-off is that modifications localized to a single color plane can be detected.
The following figures are the results of using Build Three to embed checksums into every 12�12 block of pixels of a test image. Figure 5 shows the original image, while Figure 6 shows the image with the embedded checksum. Figure 7 shows the image after some minor modifications were made. Figure 8 shows the results of using the authentication algorithm. Any tampered areas were highlighted to make the changes obvious.
Since each of the three levels of RGB images are embedded with a checksum in Build Three, it is possible to detect minute changes in just one of the color levels. Figures 9, 10, and 11 demonstrate this capability. The original image was modified in three areas. In each area, a different color plane was slightly modified to produce the tampered image. Although the modifications are imperceivable by the human eye, the implemented algorithm was able to successfully localize the changes (see Figures 10 and 11).
With the increased reliance on images for information, it is very important to protect digital images from unauthorized modification. The proposed algorithm checks for and localizes any tampering to a digital image.
All three previous builds work well for detecting any modification to an image. However, because these algorithms are based on modifying the LSBs, they cannot distinguish between malicious and non-malicious attacks on a digital image. Malicious attacks are attacks that modify the content of an image including manipulation, deletion, or addition of features. Non-malicious attacks modify the pixel values of an image, but do not modify its contents. Examples of non-malicious attacks are adjustment of brightness or contrast, lossy compression, or filtering. An acceptable content authentication scheme should protect all pixels in an image and distinguish between non-malicious and malicious changes to digital images.
One possible approach to solve the malicious/non-malicious weakness of the previous builds is to embed watermarks in the frequency domain rather than in the LSBs of pixels. Frequency domain techniques generally provide more robustness with respect to most of the signal processing attacks. LSB embedding techniques, although easy to implement, are too fragile for content authentication purposes.
Another approach is to embed the checksum into a more significant bit-plane (i.e., the third or fourth bit). With this approach, the image would be less sensitive to minor changes, such as adding noise. M-sequences could also be embedded into the image instead of a checksum, as proposed by van Schyndel, Tirkel, and Osborne2. With embedded m-sequences, a correlation detector is used to determine whether an image (or block) has been tampered.
Figure 1. Original image |
Figure 2. Protected image |
Figure 3. Tampered image |
Figure 4. Checked for authenticity |
Figure 5. Original image |
Figure 6. Protected image |
Figure 7. Tampered image |
Figure 8. Checked for authenticity |
Figure 9. Original image |
|
Figure 11. Checked for authenticity |
References
1 S. Walton, "Information Authentication for a Slippery New Age," Dr. Dobbs Journal, April 1995, Vol. 20, No. 4, pp. 18-26.
2 R. G. van Schyndel, A. Z. Tirkel, and C. F. Osborne, "A Digital Watermark," Proc. of the IEEE International Conference on Image Processing, November 1994, Vol. 2, pp. 86-90.
This article was written by 1Lt Arnold Baldoza and Mr. Michael Sieffert of the Air Force Research Laboratory's Information Directorate. For more information contact TECH CONNECT at 1-800-203-6451 or visit the web site at http://www.afrl.af.mil (Technology Transfer). Reference document IF-99-05.
About the Author:
MICHAEL J. SIEFFERT is currently a full time student at Binghamton University of the State University of New York. Fall 2000 will mark the beginning of his junior year as a Computer Science major. He also plans to minor in Mathematics. When not in school, Michael is working as an intern at the Air Force Research Laboratory in Rome NY, where he is a member of a team researching "Smart Images."