JavaRush /Java Blog /Random EN /Harvard CS50: Week 3 Assignments (Lectures 7 and 8), Part...
Masha
Level 41

Harvard CS50: Week 3 Assignments (Lectures 7 and 8), Part 1

Published in the Random EN group
Lectures from the Harvard course on the fundamentals of programming CS50 Additional materials: asymptotic notation, sorting and searching algorithms Part two. "Tag" in C

Preparation

Before taking on the problems, watch lectures 7-8 , read and delve into the “ Additional materials of the third week ”. They are devoted to search algorithms (linear and binary), sorting (there are many of them!), as well as the work of a debugger (the ability to work with a debugger to debug programs is an extremely important skill!). If everything went as it should with the lectures and theory, you can easily answer the test questions:
  • Why does binary search require a sorted array?
  • Why is bubble sort estimated to be O(n2)?
  • Why is the insertion sort estimate Ω(n)?
  • How does selection sort work? Describe in three sentences (or less).
  • What is the upper limit (worst case) on how long merge sort can run?
  • The GDB debugger allows you to debug a program. And if you formulate more specifically, what exactly does it allow you to do?

Goals

  • Learn to work with projects that contain multiple files
  • Learn to read someone else's source code
  • Understand various algorithms and recursive functions
Most of these goals are practically impossible to formally evaluate. Therefore, from the point of view of formal verification of problems, there is little that will seem difficult to you. However, we remind you: you can only learn programming through constant practice! Therefore, we encourage you not only to solve the tasks, but also to spend additional time and effort and implement all the algorithms that were discussed this week. Forward!

Begin

Recall that for the practice problems in weeks one and two, you wrote programs from scratch and created your own pset1 and pset2 directories using the mkdir command . For the third week's practice assignment, you need to download the distribution (or "distro" as we call it) that we wrote and add your own lines of code to it. First, read our code and try to understand it. The most important skill this week is learning how to read other people's code. So, log in to cs50.io and run the command update50 in a terminal window to make sure the workspace version is up to date. If you accidentally closed the terminal window and cannot find it, go to the View menu and make sure that the Console option is checked (check it if it is not). Click on (+), inside the green circle on the frame of the terminal window, select New Terminal . Harvard CS50: Week 3 Assignments (Lectures 7 and 8), Part 1 - 1 After that, run the command cd ~/workspace and make sure you are inside the workspace directory (this is your directory). After that, run the command: wget http://cdn.cs50.net/2015/fall/psets/3/pset3/pset3.zip to download the ZIP archive of the problem book into your workspace (wget is the Linux download command). If everything is ok, you will see the following phrase in the line: 'pset3.zip' saved Make sure that you really downloaded pset3.zip by running the command: ls and then run unzip pset3.zip to unzip the file. If you now run the ls command again , you will also see the pset3 directory . Go to it by running the command cd pset3 and then request to view the contents of the directory again: ls you will see that there are two subdirectories in this directory: fifteen find Already intriguing!

Search

Let's now dive into one of these subdirectories. To do this, we'll run the command: cd ~/workspace/pset3/find If you display the contents of this folder on the screen (you probably already remember how to do this), this is what you should see: helpers.c helpers.h Makefile find.c generate.c Wow, there are a lot of files. But don't worry, we'll deal with them now. The file generate.c contains code for a program that uses a "pseudo-random number generator" (generated by the drand48 function ) to generate a large number of random (actually pseudo-random, computers can't handle pure randomness!) numbers, and place them one at a time in line.Compile the program: make generate Now run it by running the command: ./generate The program will give you the following message: Usage: generate n [s] This means that the program expects one or two command line arguments.Using the first one, n, is mandatory; this number means how many pseudorandom numbers you want to create. The parameter s is optional, as indicated by the square brackets. The number s can be called the "seed" for the pseudorandom number generator. By "seed" we mean the input data to the pseudorandom number generator that affects its output. For example, if you are using drand48, first By calling srand48 (another function whose purpose is to "seed" drand48) with an argument of, say, 1, and then calling drand48 three times, drand48 might return 2728, then 29785, then 54710. If you instead of the previous one, using drand48, first call srand48 with an argument of, say, 2, and then use drand48 again three times, drand48 might return 59797, then 10425, then 37569. But if you re-see drand48 by calling srand48 again with an argument of 1, the result of the next three calls to drand48 you will get 2728 again, then 29785, then 54710! You see, everything happens for a reason. Run the program again, this time, say n=10, as shown below; you will see a list of 10 pseudo-random numbers. ./generate 10 Let's run the program a third time with the same value of n; you should see another list of 10 numbers. Now try running the program with two parameters. Let's take s=0 as shown below. ./generate 10 0 Now run the same command again: ./generate 10 0 You can’t even argue: you saw the same “random” sequence of ten digits again. This is what happens if you don't change the pseudorandom number generator seeds. Now let's look at the generate.c program itself(remember how?). The comments at the beginning of this file explain the general functionality of the program. But we seem to have forgotten to comment on the code itself. Read the code carefully and read it until you understand the meaning of each line. Then comment out this code for us, replacing each TODO with a phrase describing the purpose or functionality of the corresponding line or lines of code. (note: unsigned int is an “unsigned” int, meaning it cannot be negative). To get more information about rand or srand, run: man drand48 or man srand48 After you've commented out as much as you can in generate.c, recompile the program to make sure you didn't break anything: make generate If generate no longer compiles, fix what you broke: )! As a reminder, the make command automates code compilation, so you don't have to run the clang command by manually inserting a bunch of switches. But in fact, make simply simplifies our input, but in fact, the same clang with the parameters we need is hidden behind it. However, as your programs get larger, make can no longer figure out from the context how to compile the code. In this case, you will have to tell make how to compile the programs, especially if they involve using different source files (such as .c). To solve this problem we will use configuration files (Makefiles) that tell make exactly what to do. How did the make command know how we should compile generate? In fact, the team used a configuration file that we had already written. This file is called Makefile and it is located in the same directory as generate.c . Makefile is a list of rules specifying how to create the file generate from generate.c. In the file you will find, in particular, the relevant lines: generate: generate.c clang -ggdb3 -O0 -std=c11 -Wall -Werror -o generate generate.c The first line tells make that a "target" called generate should be created by calling the command from the second line. Moreover, the first line tells make that generate depends on generate.c, which means that make will only rebuild generate on subsequent runs if that file has been changed. Not a bad time-saving trick, right? In fact, run the command below again if you didn't change generate.c . make generate You will be informed that generate has already been updated to the current version. Note : The indentation at the beginning of the second line is not a sequence of spaces, but rather a tab character. Unfortunately, make requires commands to be preceded by tabs, so be careful not to change them to spaces or you'll run into weird errors! The -Werror flag tells the clang commandtreat warnings (bad) as errors (even worse), so you'll be forced (in a good, educational way!) to correct them. Now let's look at the find.c file . Note that this program expects one command line argument: "igloo", which must be found in a "haystack" of values. After that, review the code and compile the program by running the command below. make find make basically gave us the following: clang -ggdb3 -O0 -std=c11 -Wall -Werror -o find find.c helpers.c -lcs50 -lm Pay attention! You've just compiled an application consisting of one, but two files: helpers.c and find.c . How did make know about this? To understand this, open the Makefile again to see what's really going on there. The relevant lines are below. find: find.c helpers.c helpers.h clang -ggdb3 -O0 -std=c11 -Wall -Werror -o find find.c helpers.c -lcs50 -lm What we mean by dependency (after the colon) is that any changes to find.c , helpers.c , or helpers.h will force make to rebuild find the next time it is called for those purposes. Now run this program by doing, for example, the following: ./find 13 After this, you will be asked to provide a certain stack (that is, integers), and feed them one straw at a time. Once you get tired of typing numbers, press the key combination Ctrl-D . This combination will send the program an end-of-file (EOF) character. The symbol will force the GetInt function from the CS50 library to return the constant INT_MAX , and this, in turn, in find.c will force find to stop typing "stack". The program will now look for the needle in the haystack you provided, and it will eventually tell you if it was found. In short, the program searches for some value in an array. At least she should, but the catch is that she won't find anything yet! But here you come to the rescue! We'll talk about the importance of your role a little later. In fact, the process of providing a haystack can be automated, at least by "merging" the output of generate into find as input. For example, the command below passes 1000 pseudorandom numbers to find, which then searches among those values ​​for 42. ./generate 1000 | ./find 42 Note that when generate passes the raw data to find, you won't see the generate number, but you will see find running. Alternatively, you can "redirect" the output of generate to a file with a command like this: ./generate 1000 > numbers.txt The contents of this file can be redirected to the input of find with the command: ./find 42 < numbers.txt Let's take another look at the code in the Makefile and notice the following line: all: find generate This means that you You can build generate and find by running the following command: make all Moreover, the command below this line is equivalent to the command above it, since make builds the first entry in the Makefile by default. make If only you could boil down all these practice tasks to one command! But - alas! Finally, pay attention to these last lines in the Makefile: clean: rm -f *.o a.out core find generate This entry allows you to remove all files that end in .o or are called core (we'll explain this in a moment!), and also run the find or generate programs simply by executing the line: If you want to make clean experiment , then here's something to be careful with: don't add, say, *.c to that last line of the Makefile! (why?) All lines that begin with the # sign are just comments.

Task 1: Search

It's time for something interesting! Note that find.c calls search , a function declared in helpers.h . Unfortunately, we forgot to implement this function completely in helpers.c ! (It should be noted that we could put the contents of helpers.h and helpers.c into one find.c . But sometimes it is better to separate programs into several files. Especially if some functions, or rather utility functions, may be useful for other programs. Like the CS50 library functions for example. Take a look at helpers.c and you will see that search always returns false, regardless of whether the given value is in values. Rewrite search so that it uses linear search, returning true , if value is in values ​​and false if value is not in values. Take care to return false immediately if n is not positive. When you are ready to check for validity, try calling the following command: Since one of the numbers ./generate 1000 50 | ./find 127 printed with generate when 50 was seeded is 127, your code should find the needle! For contrast, try this command: ./generate 1000 50 | ./find 128 Since 128 is not among the set of numbers generated by generate when 50 was seeded, your code does not must find the "needle". Run other tests by running generate with some seed, analyzing the output, and looking for the needle that should be in the haystack. Note that main in find.c is written in such a way that find returns 0 if the "needle" is found, otherwise it returns 1. You can check the so-called "output code" that main returns when executed after running some other commands echo $? . For example, if your implementation of search is correct, if you run ./generate 1000 50 | ./find 127 echo $? you will see 0, while 127 is among the 1000 numbers output by generate with a seed of 50, so the search you wrote should return true. In this case, main (written by us) should return 0 (that is, exit). If you give the following input ./generate 1000 50 | ./find 128 echo $? , you will see 1 because 128 is not among the 1000 numbers that were output by generate with a seed of 50. In this case, search (written by you) will return false, and main (written by us) will return 1 ( and then finish the job). Do you understand the logic? When everything is ready to be checked using check50, run the following command: check50 2015.fall.pset3.find helpers.c By the way, you shouldn’t get used to testing your code using check50 before testing it yourself. After all, check50 doesn't really exist, so running the code with your own input samples, comparing the actual output to the expected output, is the best habit you can get into at this point. It's true, don't become addicted! If you're interested in playing with the cs50 assistants' implementation of find, run this command: ~cs50/pset3/find

Sorting

Linear search is not something that is truly impressive, but from the first lectures (and in the seventh we repeated this trick again!) you remember that you can find a needle in a haystack much faster. But first we need to sort our haystack. Note that find.c calls sort , a function declared in helpers.h . Unfortunately, we “forgot” to fully implement this feature in helpers.c ! Look into helpers.c and you'll see that sort returns instantly, even though find's main function actually passes it the actual array. Now remember the array declaration syntax. Not only do you specify the element type of this array, but you also specify its size in square brackets. This is what we do for haystack in find.c : int haystack [MAX]; But when traversing an array, you only specify its name. And we do it exactly the same way when we go through haystack to do sort in find.c : sort (haystack, size); (Guess why we pass the size of that array separately?) When we declare a function that takes a one-dimensional array as an argument, we don't need to specify the size of the array . Likewise, we didn't do this when we declared sort in helpers.h (and helpers.c): void sort (int values [] int n); Implement sort so that the function actually sorts the array of numbers from small to large. It needs to run time equal to O(n 2 ), where n is the size of the array. You'll likely want to implement bubble sort, selection sort, or insertion sort, as we learned about those in week three. However, it is important to note: there is no "correct" way to implement these algorithms. There are a huge number of variations. In fact, you can always improve them if you see fit, as long as your implementation converges to O(n2 ) . However, leave the experimentation to the function body, and in order to simplify testing, do not change our sort declaration. It should look like this: void sort (int values [] int n); Since the function returns a void value, it should not return a sorted array. Instead, it must "destructively" sort the actual array it is "running" through, moving its elements. In week four we will discuss that arrays are passed by reference rather than by value. That is, sort will not receive copies of the array, but the original array itself. While you can't change our sort declaration, you can always define your own function in helpers.c, which can then be called from sort. Decide for yourself how best to test your sort implementation. Don't forget that printf and GDB will help you! And don't forget that you can create the same sequence of pseudo-random numbers over and over again by explicitly specifying the seed values ​​for generate.

Binary search, tips

The first thing you need to understand about the find function is that we already have written code (distribution). We are simply filling in some gaps in existing code. The find.c program requests input of numbers (that is, to fill a "stack"), and then searches for a "needle" in it at the user's request, using sort and search - functions defined in helpers.c . So, find is already written, we need to write helpers . So here's what we need to do:
  1. Implement search. This function should return true if the value is found in the stack and false if it is not there.
  2. Implement sort, a function that sorts an array of values.
Initially, the search was implemented as linear. But you can do better. The linear search algorithm runs in O(n) time . Your task is to write a binary search algorithm. Its execution time is O(log n) . However, its problem is that it needs sorted data as input, otherwise it will not work. You remember the example of the torn book, and you probably know why this is so. If you have gone through a binary search through the elements and there are no more of them left (that is, there is simply no “needle” in this “stack”), you need the function to return false. Binary search can be implemented iteratively or recursively (David Malan talked about recursion in the eighth lecture).
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION