Preparing for work
As always, first open a terminal window and run the command.update50
to make sure your application is already up to date. Before you begin, follow this cd ~ / workspace
wget http://cdn.cs50.net/2015/fall/psets/4/pset4/pset4.zip
to download a ZIP archive of this task. Now if you run ls you will see that you have a file called pset4.zip in your ~/workspace directory . Extract it using the command: If you run the lsunzip pset4.zip
command again , you will see that another directory has appeared. Now you can delete the zip file as shown below: Let's open the pset4 directory, run ls , and make sure the directory contains rm -f pset4.zip
cd pset4
bmp / jpg / questions.txt
whodunit or “Who did this?”
If you've ever seen the default Windows XP desktop (https://en.wikipedia.org/wiki/Bliss_(image)) (hills and blue sky), then you've seen BMP. On web pages, you've most likely seen GIFs. Have you looked at digital photos? So, we had the joy of seeing JPEG. If you've ever taken a screenshot on a Mac, you've most likely seen a PNG. Read on the Internet about the BMP, GIF, JPEG, PNG formats and answer these questions:-
How many colors does each format support?
-
Which format supports animation?
-
What is the difference between lossy and lossless compression?
-
Which of these formats uses lossy compression?
-
What happens from a technical point of view when a file is deleted in a FAT file system?
-
What can be done to ensure (with a high probability) that deleted files cannot be recovered?
xxd -c 24 -g 3 -s 54 smiley.bmp
You should see what is shown below; We have again highlighted in red all instances of 0000ff. In the image in the leftmost column you can see the addresses in the file, which are equivalent to the offset from the first byte of the file. All of them are given in hexadecimal number system. If we convert hexadecimal 00000036 to decimal, we get 54. So you're looking at the 54th byte from smiley.bmp . Recall that in 24-bit BMP files, the first 14 + 40 = 54 bytes are filled with metadata. So if you want to see the metadata, run the following command: xxd -c 24 -g 3 smiley.bmp
If smiley.bmp contains ASCII characters , we will see them in the rightmost column in xxd instead of all those dots. So, smiley is a 24-bit BMP (each pixel is represented by 24 ÷ 8 = 3 bytes) with a size (resolution) of 8x8 pixels. Each line (or "Scanline" as it is called) thus takes up (8 pixels) x (3 bytes per pixel) = 24 bytes. This number is a multiple of four, and this is important because the BMP file is stored slightly differently if the number of bytes in the line is not a multiple of four. So, in small.bmp, another 24-bit BMP file in our folder, you can see a green 3x3 pixel box. If you open it in an image viewer, you will see that it resembles the image shown below, only smaller in size. Each line in small.bmp thus occupies (3 pixels) × (3 bytes per pixel) = 9 bytes, which is not a multiple of 4. To get a line length that is a multiple of 4, it is padded with additional zeros: between 0 and 3 bytes we fill each line in 24-bit BMP format (can you guess why that is?). For small.bmp , 3 bytes of zeros are needed, since (3 pixels) x (3 bytes per pixel) + (3 bytes of padding) = 12 bytes, which is really a multiple of 4. To "see" this padding, do the following. xxd -c 12 -g 3 -s 54 small.bmp
Note that we use a different value for -c than with smiley.bmp , so xxd only outputs 4 columns this time (3 for the green square and 1 for padding). For clarity, we have highlighted all instances of 00ff00 in green. For contrast, let's use xxd for the large.bmp file . It looks exactly the same as small.bmp, only its resolution is 12x12 pixels, that is, four times larger. Run the command below. You may need to expand the window to avoid transfer. xxd -c 36 -g 3 -s 54 large.bmp
You'll see something like this: Please note, there are no digressions in this BMP! After all, (12 pixels) × (3 bytes per pixel) = 36 bytes, and this is a multiple of 4. The xxd hex editor showed us the bytes in our BMP files. How can we get them programmatically? In copy.c there is one program whose sole purpose in life is to create a copy of the BMP, piece by piece. Yes, you can use cp for this . However, cp will not be able to help Mr. Boddy. Let's hope copy.c does this, so here we go: ./copy smiley.bmp copy.bmp
If you now run ls (with the appropriate flag), you'll see that smiley.bmp and copy.bmp are indeed the same size. Let's check again if this is actually true? diff smiley.bmp copy.bmp
If this command doesn't display anything on the screen, it means that the files are indeed identical (important: some programs, like Photoshop, include trailing zeros at the ends of some VMPs. Our version of copy discards them, so don't worry if in case copying other BMPs that you have downloaded or created for testing, the copy will be a few bytes smaller than the original). You can open both files in Ristretto's image viewer (double-click) to confirm this visually. But diff does this comparison byte by byte, so her vision is sharper than yours! How was this copy created? It turns out that copy.c is related to bmp.h . Let's make sure: open bmp.h. There you'll see the actual definitions of those headers we've already mentioned, adapted from Microsoft's own implementations. Additionally, this file defines the BYTE, DWORD, LONG, and WORD data types, which are the data types typically found in the world of Win32 (i.e., Windows) programming. Note that these are essentially aliases for primitives that you are (hopefully) already familiar with. It turns out that BITMAPFILEHEADER and BITMAPINFOHEADER were using these types. This file also defines a struct called RGBTRIPLE. It “encapsulates” three bytes: one blue, one green and one red (this is the order in which we will look for RGB triplets on the disk). How are these structs useful? To recap, a file is simply a sequence of bytes (or ultimately bits) on disk. However, these bytes are typically ordered so that the first few represent something, then the next few represent something else, and so on. File "formats" exist because we have standards, or rules, that define what bytes mean what. Now, we can simply read the file from disk into RAM as one large byte array. And we remember that the byte at position [i] represents one thing, while the byte at position [j] is something else. But why not give some of these bytes names so we can more easily retrieve them from memory? This is exactly what the structures in bmp.h help us with. Instead of thinking of a file as one long sequence of bytes, we see it broken down into more understandable blocks - sequences of structures. Recall that smiley.bmp has a resolution of 8x8 pixels, so it takes up 14 + 40 + (8 × 8) × 3 = 246 bytes on disk (you can check this using the ls command). Here's what it looks like on disk according to Microsoft: We can see that order matters when it comes to members of structs. Byte 57 is rgbtBlue (not, say, rgbtRed) because rgbtBlue is defined in RGBTRIPLE first. By the way, our use of the packed attribute ensures that clang does not try to “word align” members (with the address of the first byte of each member being a multiple of 4), so that we do not end up with holes in our structures that do not exist on disk at all. Let's move on. Find the URLs that match BITMAPFILEHEADER and BITMAPINFOHEADER, as per the comments in bmp.h. Attention, great moment: you are starting to use MSDN (Microsoft Developer Network)! Instead of scrolling further through copy.c , answer a few questions to understand how the code works. As always, the man command is your true friend, and now also MSDN. If you don't know the answers, Google it and think about it. You can also refer to the stdio.h file at https://reference.cs50.net/.
-
Set a breakpoint in main (by clicking to the left of the ruler with main line numbers).
-
In a terminal tab , go to ~/workspace/pset4/bmp , and compile copy.c into the copy program using make.
-
Run debug50 copy smiley.bmp copy.bmp , this will open the debugger panel on the right.
-
Walk through the program step by step using the panel on the right. Note bf and bi . In ~/workspace/pset4/questions.txt , answer the questions:
-
What is stdint.h ?
-
What is the point of using uint8_t , uint32_t , int32_t and uint16_t in a program?
-
How many bytes does BYTE , DWORD , LONG , and WORD contain respectively (Assuming 32-bit architecture)?
- What (ASCII, decimal or hexadecimal) should the first two bytes of a BMP file be? (leading bytes, which are used to identify the file format (with high probability) are often called "magic numbers").
-
What's the difference between bfSize and biSize?
-
What does a negative biHeight mean?
-
Which field in BITMAPINFOHEADER defines the color depth in BMP (i.e. bits per pixel)?
-
Why can fopen function return NULL in copy.c 37?
-
Why is the third argument to fread in our code equal to 1?
-
What value in copy.c 70 defines padding if bi.biWidth is 3?
-
What does fseek do?
-
What is SEEK_CUR?
Back to Mr. Boddy. Exercise:
Write a program called whodunit in a file called whodunit.c that shows Mr. Boddy's drawing. Hmmmm, what? Like copy, the program must take exactly two command line arguments, and if you run your program as shown below, the result will be stored in verdict.bmp, in which Mr. Boddy's drawing will not be noisy../whodunit clue.bmp verdict.b
Let us suggest that you will start solving this mystery by running the command below. cp copy.c whodunit.c
You might be amazed at how many lines of code you need to write to help Mr. Boddy. There is nothing unnecessary hidden in smiley.bmp , so feel free to test the program on this file. It is small, and you can compare the output of your program and the output of xxd during development (or maybe there is something hidden in smiley.bmp ? Actually, no). By the way, this problem can be solved in different ways. Once you identify Mr. Boddy's drawing, he will rest in peace. Since whodunit can be implemented in multiple ways, you won't be able to check the correctness of the implementations with check50 . And let it spoil your fun, but the assistants' solution is also not available for the whodunit problem . Finally, in the file In ~/workspace/pset4/questions.txt , answer the following question: Whodunit? //ктоэтосделал?
resize
Well, now - the next test! Let's write a program called resize in resize.c . It will resize an uncompressed 24-bit BMP image in steps of n. Your application must accept exactly three command line arguments, with the first (n) being an integer not greater than 100, the second being the name of the file that will be modified, and the third being the name of the saved version of the modified file.Usage: ./resize n infile outfile
With such a program, we could create large.bmp from small.bmp by resizing the latter by 4 (i.e., multiplying both the width and height by 4), as shown below. For simplicity, you can start the task by copying copy.c./resize 4 small.bmp large.bmp
again and naming the copy resize.c . But first, ask yourself and answer these questions: what does it mean to change the BMP size (you can assume that n times the infile size will not exceed 232 - 1) . Determine which fields in BITMAPFILEHEADER and BITMAPINFOHEADER you need to change. Consider whether you need to add or remove scanlines fields . And, yes, be grateful that we are not asking you to consider all possible values of n from 0 to 1! (although, if you're interested, this is a problem from a hacker's book ;)). However, we assume that for n = 1 the program will work correctly and the output file outfile will be the same size as the original infile. Do you want to check the program using check50? Type the following command: ~cs50/pset4/resize
Well, if you want to see, for example, the
large.bmp headers (in a more user-friendly form than xxd allows), you need to run the following command:
~cs50/pset4/peek large.bmp
Even better, if you want to compare your headers with the headers of the CS50 assistant files. You can run commands inside your
~/workspace/pset4/bmp directory (think about what each command does). If you used
malloc , be sure to use
free to prevent memory leaks. Try using
valgrind to check for leaks.
./resize 4 small.bmp student.bmp
~cs50/pset4/resize 4 small.bmp staff.bmp
~cs50/pset4/peek student.bmp staff.bmp
How to decide?
-
Open the file that we need to enlarge, and also create and open a new file in which the enlarged image will be recorded;
-
update header information for the output file. Since our image is in BMP format and we are changing its size, we need to write the header of the new file with the new dimensions. What will change? The file size, as well as the image size - its width and height.
-
We read the outgoing file, line by line, pixel by pixel. To do this, we again turn to our file I/O library and the fread function. It takes a pointer to a structure that will contain the bytes read, the size of the single element we are going to read, the number of such elements, and a pointer to the file from which we will read.
-
We increase each line horizontally according to the specified scale, and write the result to the output file.
How do we write files? We have a function fwrite, to which we pass an indicator to the structure where the data to be written to the file is located, the size of the element, their number and a pointer to the output file. To organize the loop, we can use the for loop we are already familiar with .
-
Fill in the gaps! If the number of pixels in a line is not a multiple of four, we must add "alignment" - zero bytes. We will need a formula to calculate the alignment size. To write null bytes to an output file, you can use the fputc function, passing it the character you want to write and a pointer to the output file.
Now that we've stretched the string horizontally and added an alignment to the output file, we need to move the current position in the output file because we need to jump over the alignment.
-
Increase vertical size. It's more complicated, but we can use the sample code from copy.c (copy.c opens the output file, writes a header to the output file, reads the image from the source file line by line, pixel by pixel, and writes them to the output file). Based on this, the first thing you can do is run the following command: cp copy.c resize.c
To stretch an image vertically means to copy each line several times. There are several different ways to do this. For example, using rewriting, when we save all the pixels of one line in memory and write this line to the output file in a loop as many times as needed. Another method is recopying: after reading a line from the outgoing file, writing it to the output file and aligning it, return the fseek function back to the beginning of the line in the outgoing file and repeat everything again several times.
-
Open the file with the contents of the memory card.
-
Find the beginning of the JPEG file. All files on this card are images in JPEG format.
recover
In anticipation of Week 4's problem paper, I've spent the last few days looking at photos saved in JPEG format by my digital camera on a 1 GB CompactFlash (CF) memory card. Please don't tell me that I've actually spent the last few days on Facebook instead. Unfortunately, my computer skills leave much to be desired, and without knowing it, I accidentally deleted all the photos! Fortunately, in the computer world, "deleted" usually does not equal "killed." My computer insists that the memory card is now empty, but I know it's lying. Assignment: Write a program in ~/workspace/pset4/jpg/recover.c that will recover these photos. Hmmm. Okay, here's one more clarification. Although the JPEG format is more complex than BMP, JPEG has "signatures," byte patterns that help distinguish it from other file formats. Most JPEG files start with the following three bytes:0xff 0xd8 0xff
From the first byte to the third, from left to right. The fourth byte will most likely be one of the following combinations: 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,0xe8, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef. In other words, the first four bits of the fourth byte of a JPEG file are 1110. Chances are, if you find one of these patterns on the drive where the photos were stored (like my memory card), this will be the start of the JPEG file. Of course, you may encounter this on which disk purely by chance; data recovery cannot be called an exact science.
GO TO FULL VERSION