Image Manipulation in MATLAB

Image Manipulation in MATLAB Due 11/2 at 11:59 PM 1 Introduction A couple of your friends in the business school have a concept for a great new picture editing application for mobile devices, but they need help on the technical side of things. They’ve asked you to come on board and help write some of the code necessary to manipulate the images. It just so happens that you have been studying matrix manipulations in your differential equations class, and this knowledge along with some basic programming skills will allow you to make a significant contribution to the application. Digital images are just matrices of pixels, and any type of matrix operation can be applied to a matrix containing image data. In this project, you will explore some ways to manipulate images using MATLAB. We’ll start off with transformation matrices and then move on to image compression. 2 Basic MATLAB commands MATLAB has some nice built in functions for reading and writing image files—the first command we’ll be using is imread, which reads in an image file saved in the current directory. imread works with most popular image file formats. To view an image in a matlab figure, use imagesc. imagesc is similar to image, but for our purposes will work more consistently. The following code will read in an image with file name photo1.jpg, save it as the variable X, and display the image in a matlab figure window. Make sure you run the code from the same folder that contains the image. After reading in an image like this, X(:,:,1) is a 2-D matrix with intensity values for the red channel, X(:,:,2) for the green and X(:,:,3) for the blue. When images are read in using imread, MATLAB stores the data as 8-bit integers, or integers that can range from 0 to 255. If we want to perform mathematical operations on the image data using floating point numbers, the integers must be converted to floats as well. If you just read in an image as X, you can use X double = double(X); to perform the floating point conversion. If you want to write image data to an image file, you can use imwrite. Note that if you converted the image data to the double format, you’ll need to convert back to integer values. The command to do this is uint8. The following code shows how to read in an image, convert to double, and write to a .jpg file. You’ll want to build your image manipulation code around this template if you wish to write your output to an image file. 1 Note that if you want to view the image after you’ve converted the image data to double format, you may need to convert back to uint8 just as you did to use imwrite. In other words, the command you’ll need to view an image once you’ve converted it to double format is imagesc(uint8(X double)). For much of this lab, we’ll be working with grayscale versions of photos to keep the matrix manipulations simpler. To make a grayscale image out of our color image, we want to combine the matrices that store the information for the red, green, and blue colors into one matrix whose (i, j)-th entry tells the intensity level of the pixel at that position in the image, with 0 being black and 255 being white. Since we want the information from each of the matrices storing the red, green, and blue values for the image to contribute to the intensity of each pixel, we’ll create the matrix for the grayscale image by making a linear combination of each of X double(:,:,1), X double(:,:,2), and X double(:,:,3). The following lines of code will convert the image to grayscale and display this converted image in a MATLAB figure: X gray = .3*X double(:,:,1) + .3*X double(:,:,2) + .4*X double(:,:,3); imagesc(uint8(X gray)) colormap(’gray’) 3 Image Manipulation Matrix multiplication allows us to transform a vector in many ways. The following matrix takes the entries of a vector and shifts them down one position, cycling the last entry around to the top. ? ? ? ? ? ? 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ? ? ? ? ? ? ? ? ? ? ? ? 1 2 3 4 5 ? ? ? ? ? ? = ? ? ? ? ? ? 5 1 2 3 4 ? ? ? ? ? ? Matrices like the one on the left above, which we’ll call P, can be helpfully visualized with the MATLAB command spy(). If P is defined in the MATLAB workspace, then typing spy(P) gives the figure shown below. spy() is especially useful for visualizing sparse* matrices that have large dimensions. *Matrices with relatively few nonzero entries. 2 Figure 1: The figure shows the location of the nonzero entries of the matrix P. The axes give the row (y-axis) and column (x-axis) indices for the nonzero entries. Notice that if you start with the identity matrix and interchange rows until you get the matrix on the left above, multiplying a vector by that new matrix applies the same row interchanges to the vector. Each of the rows were shifted down one, and the last row cycled around to the top. This is called a permutation matrix because it is obtained by permuting the rows and columns of the identity matrix. In this case, row 1 turns into row 6, row 2 turns into row 3 etc. This transformation matrix works on matrices, too. ? ? ? ? ? ? 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ? ? ? ? ? ? ? ? ? ? ? ? 1 6 11 16 21 2 7 12 17 22 3 8 13 18 23 4 9 14 19 24 5 10 15 20 25 ? ? ? ? ? ? = ? ? ? ? ? ? 5 10 15 20 25 1 6 11 16 21 2 7 12 17 22 3 8 13 18 23 4 9 14 19 24 ? ? ? ? ? ? Notice how the matrix multiplication cycled the rows around in the matrix the same way it did in the vector. Since an image is just a matrix, we can transform them using a linear transformation. The following image was transformed using a 256 ×256 version of the transformation matrix above, shifting the image down by 50 pixels. 3 Here’s the code that produced the image above. Row vectors can be transformed too, just by multiplying by the transformation matrix on the right side. As an example: 1 2 3 4 5 ? ? ? ? ? ? 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ? ? ? ? ? ? = 2 3 4 5 1 However, the reordering is not the same as before—the transpose of the transformation matrix must be used to shift the elements so that the last element is first. 1 2 3 4 5 ? ? ? ? ? ? 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 ? ? ? ? ? ? = 5 1 2 3 4 4 Image Compression 4.1 The Discrete Sine Transform The Discrete Sine Transform (DST) is a technique for decomposing a vector into a linear combination of sine functions with different frequencies. The idea is similar to that of a Taylor series, except instead of using polynomials to approximate a function, we are using sine functions. If the data in a vector are smooth, then the low frequency components will dominate the linear combination. If the data are not smooth (discontinuous, jagged, rapidly increasing or decreasing), then the coefficients on the higher frequency components will have greater magnitude. In practice, transforming a vector with the DST means multiplying the vector by a special matrix called (unsurprisingly) the DST matrix. There are several ways to define the DST matrix; for 4 this assignment, use: Si,j = r 2 n sin p(i - 1 2 )(j - 1 2 ) n ! Where S is an n × n matrix, and Si,j is the entry of S in the i-th row and j-th column. There are several ways to construct this matrix, but the simplest way is to use nested for loops. If you get stuck, you may find the second half of Worksheet 5 on the APPM 2460 web page helpful. To apply this transform matrix to a vector, just multiply. So if y is the transformed version of x, we would obtain it by computing y = Sx. Since S is square, the 1-D DST (i.e. the DST that operates on vectors) is an operation that takes in a vector of length n and returns another vector of length n. For 1-D data (vectors), the output is a vector containing weights for the different frequency components; the higher the weight, the more important that frequency is. However, we cannot use S to transform our data because our image data is a matrix, which is two dimensional. So, we will need the 2-D transform. Thankfully, it’s very easy to compute the 2-D DST using the 1-D transform matrix. Let Xg be the grayscale version of the image data† . Then the 2-D DST for the image Xg is: Y = SXgS T Intuitively, you can think of SXg as applying applying the 1-D DST to the columns of Xg, and XgS T as applying the 1-D DST to the rows of Xg. So SXgS T applies the 1-D transform to both the rows and the columns of Xg. Our DST matrix has the special property that it is symmetric, or equal to its transpose. So for our DST matrix S, S T = S Now we can define the 2-D transformed image as: Y = SXgS If we want to get our original image back from the DST, we’ll need to know the inverse of S. Our matrix S also has the property that it is its own inverse. So we have, S -1 = S This is useful, since inverses are often difficult to compute. †Xg is a matrix whose (i, j) entry represents the grayscale level at pixel position (i, j). In our case, the values range from 0 to 255, with 0 being black and 255 being white. 5 Figure 2: DST coefficients for an image (values in the matrix Y ). Values in the upper left are weights on low frequency sine components while values in the lower right are weights on high frequency sine components. Since the values in the upper left are significantly larger than those in the lower right, we can see that low frequencies dominate the overall image. 4.2 Compression JPEG is a type of “lossy compression,” which means that the compressed file contains less information than the original. Since human eyes are better at seeing lower frequency components, we can afford to toss out the highest frequency components of our transformed image. The more uniform an image, the more data we can throw away without causing a noticeable loss in quality. More complicated images can still be compressed, but heavy compression is more noticeable. Thankfully, the DST helps us sort out which components of the image are represented by low frequencies, which are the ones we need to keep. The information corresponding to the highest frequencies is stored in the lower right of the transformed matrix, while the information for the lowest frequencies is stored in the upper left. Therefore, we want to save data in the upper left, and not store data from the remaining entries. We will simulate this effect by zeroing out the high frequency components. The following code will zero out the matrix below the off diagonal: Adjusting the value of p moves the diagonal up and down the matrix, affecting how much data are 6 retained. This illustration shows how the off diagonal moves with changes in p. After deleting the high frequency data, the inverse 2-D DST must be applied to return the transformed image back to normal space (right now it will look nothing like the original photograph). Since none of the zeros need to be stored, this process could allow for a significant reduction in file size. The compression we perform here is a simplified version of the compression involved when storing an image as a JPEG file, which uses a Discrete Cosine Transform on 8 × 8 blocks of the image data. Figure 3: The left figure was obtained by performing the DST on the image Y = SXS, zeroing out the bottom right “half” of the matrix Y (using the above algorithm with p=0.5), and then applying the inverse DST. The right figure was obtained by zeroing out the top left “half” of Y and applying the DST. This shows that the values in the bottom right half of the matrix store the image data that corresponds to quickly oscillating pixel intensities, or fine textures in the image. The values in the top left of Y correspond to image data with coarser, smoother texture. We also clearly see that the values in the top left contain the vast majority of the information in the image. 5 Questions Your friends have asked you to write code capable of shifting, cropping, and compressing images (image compression can also double as a blurring effect because removing high frequency information smooths sharp edges). To do so, follow the steps laid out in the questions below. In your report, you should include examples of images that you have processed with your code, and you should describe the techniques you used to achieve your results. For this project, submit all your code in an appendix at the end of your report. Make sure to comment your code clearly, so it’s very obvious which problem the code was for. Output not supported by code in the appendix will not be counted. 7 5.1 Image Translation 1. Write a MATLAB function to read in the files photo1.jpg and photo2.jpg, which can be found on the APPM 2460 web page, and store them as matrices of doubles. Convert the color arrays into grayscale matrices and include these grayscale images in your writeup. These tasks can be completed by piecing together the blocks of code given in Section 2. 2. When a grayscale image is stored as a matrix, it is a simple task to alter the exposure of the image. Remembering that the values of the matrix represent the pixel intensity, increase the exposure in photo2.jpg so that the resulting image appears more “whited out” than the original. Include this increased-exposure image in your report and place the original alongside it so your friends can clearly see the difference between the two. 3. Given an image that is 4 pixels x 4 pixels, so that the grayscale representation of the image is a 4 x 4 matrix called A, what matrix E would you multiply A by in order to switch the left-most column of pixels with the right-most column of pixels? Should you multiply A by E on the right or the left (i.e. would AE, EA, or both give the desired result)? 4. The matrix from photo1.jpg is not square. We can, however, still apply matrices to shift the pixels. Find a matrix that will perform a horizontal shift of 240 pixels to photo1.jpg and include the shifted image in your write up. Hint: We saw with n × n matrices that to perform a horizontal shift we multiply our matrix by a transformation matrix on the right. The transformation matrix on the right was obtained by transforming the columns of the n × n identity in the same way we wanted the columns of the image matrix to be transformed. For a non-square matrix X, we can take the same approach, but we have to start with the correct identity matrix. Think about the dimensions of the matrix you want to transform and find the matrix IR such that XIR = X. Manipulate the columns of IR to obtain the transformation matrix. Display your matrix using spy(). 5. How could you perform a horizontal and vertical shift? That is, what matrix operations would need to be applied to to get an image to wrap around both horizontally and vertically? Apply transformations to the original matrix from photo1.jpg that result in both a horizontal and vertical shift. This matrix isn’t square, so think about the dimensions for the appropriate transformation matrices. Shift the image 240 pixels horizontally and 100 pixels vertically. Display your transformation matrix/matrices using spy(). 6. Using what you learned about transformation matrices, determine what matrix would be required to flip an image upside down. Using that transformation, flip photo2.jpg upside down. Use spy() once more to display your transformation matrix. 7. What should transposing an image matrix do? Try it with photo2.jpg. Does it look the way you expected? 8. Use matrix operations to “fix” (shift the image around so that it matches a portion of the photo1.jpg image) the distorted mount2.jpg, and reconnect it to mount1.jpg. You can combine the pictures by creating a new matrix that contains the values for both images. When you are done editing and combining the photos, you should have reconstructed a grayscale version of photo1.jpg. Hint: When viewing the image using imagesc, the axis labels are the pixel numbers. 8 5.2 Image Compression 9. Verify that, for the 5 × 5 matrix S, S = S -1 . That is, show that SS = I5 (this can be done in MATLAB). 10. Determine what steps need to be taken to undo the 2-D DST. Remember that our DST is defined by Y = SXgS, and also the special properties of S. You can easily check to see if your inverse transform works by applying it to Y and viewing it with imagesc. 11. Perform our simplified JPEG-type compression on the image of Albus the cat, photo2.jpg • Read an image into MATLAB and store as a matrix of doubles • Convert the 3-D RGB matrix to a 2-D grayscale matrix • Perform the 2-D discrete sine transform on the grayscale image data • Delete some of the less important values in the transformed matrix using the included algorithm • Perform the inverse discrete sine transform • View the image or write to a file Compress the image with several different values of p. Include sample images for compression values that don’t cause an obvious drop in quality, as well as some that do. 12. You should be able to make p pretty small before noticing a significant loss in quality. Explain why you think this might be the case. The point of image compression is to reduce storage requirements; compare the number of non-zero entries in the transformed image matrix (Y , not Xg) to a qualitative impression of compressed image quality. What value of p do you think provides a good balance? (no correct answer, just explain) 13. The compression ratio is defined as the ratio between the uncompressed file size and compressed size CR = uncompressed size compressed size . Determine CR for p = 0.5, 0.3, 0.1, using the number of nonzero entries in your transformed image matrix Y as a substitute for file size. In your view, what p value (any value of p, not just p = 0.5, 0.3, 0.1) gives the best (largest) CR while still maintaining reasonable image quality? 9