
Complexity of algorithms

Hello! Today's lecture will be a little different from the rest: it is only indirectly related to Java. Still, this topic is very important for every programmer. We're going to talk about algorithms. What is an algorithm? In simple terms, it is a sequence of actions that must be performed to achieve a desired result. We use algorithms in everyday life all the time. For example, every morning you face a task: get to school or work and, at the same time, be:
  • dressed
  • clean
  • fed
What algorithm will allow you to achieve this result?
  1. Wake up with an alarm.
  2. Take a shower and wash up.
  3. Prepare breakfast, make coffee/tea.
  4. Eat.
  5. If you didn't iron your clothes the night before, iron them.
  6. Get dressed.
  7. Leave the house.
This sequence of actions will definitely get you the desired result. In programming, the essence of our work is the constant solving of problems, and a significant share of those problems can be solved using well-known algorithms. For example, suppose you're faced with the task of sorting a list of 100 names in an array. The task itself is quite simple, but it can be solved in different ways. Here is one possible solution. An algorithm for sorting names alphabetically:
  1. Buy (or download from the Internet) the 1966 edition of the "Dictionary of Russian Personal Names".
  2. Find each name from our list in this dictionary.
  3. Write down on a piece of paper the page of the dictionary where each name appears.
  4. Arrange the names in order using your notes.
Will such a sequence of actions solve our problem? Yes, it will. But will this solution be efficient? Hardly. Here we come to another very important property of algorithms: their efficiency. A problem can be solved in different ways, but both in programming and in everyday life we choose the method that is most effective. If your task is to make a buttered sandwich, you could certainly start by planting wheat and milking a cow. But that would be an inefficient solution: it would take a very long time and cost a lot of money. To solve your simple problem, you can just buy bread and butter. The algorithm with the wheat and the cow, although it does solve the problem, is too complicated to apply in practice.

To assess the complexity of algorithms in programming, a special notation was created called Big-O ("big O"). Big-O lets you estimate how much an algorithm's execution time depends on the data passed to it. Let's look at the simplest example: data transfer. Imagine that you need to transfer some information as a file over a long distance (say, 5,000 kilometers). Which algorithm will be the most efficient? It depends on the data it has to work with. Suppose we have a 10-megabyte audio file. In this case, the most efficient algorithm is to transfer the file over the Internet: it will take a couple of minutes at most.

So, let's state our algorithm once again: "If you want to transfer information in the form of files over a distance of 5,000 kilometers, you should transmit the data over the Internet." Great. Now let's analyze it. Does it solve our problem? Generally speaking, yes, it does. But what about its complexity? Hmm, this is where things get interesting. The fact is that our algorithm depends heavily on the incoming data, namely on the size of the files. Right now we have 10 megabytes, and everything is fine. But what if we need to transfer 500 megabytes? 20 gigabytes? 500 terabytes? 30 petabytes? Will our algorithm stop working? No, all these data volumes can still be transferred. Will it take longer? Yes, it will! Now we know an important property of our algorithm: the larger the amount of data to be transferred, the longer the algorithm takes to complete.

But we'd like a more precise picture of this dependence (between the size of the data and the time needed to transfer it). In our case, the complexity of the algorithm is linear. "Linear" means that as the amount of data grows, the transfer time grows roughly proportionally: twice as much data takes about twice as long to transfer, ten times as much data takes ten times as long. Using Big-O notation, the complexity of our algorithm is expressed as O(N). This notation is worth remembering: it is always used for algorithms with linear complexity.

Note that we are not talking about "environmental" variables here at all: the speed of the Internet connection, the power of our computer, and so on. When evaluating the complexity of an algorithm, those simply don't matter; we can't control them anyway. Big-O evaluates the algorithm itself, regardless of the "environment" it will have to run in. Let's continue with our example. Suppose it eventually turns out that the files to be transferred total 800 terabytes. If we transmit them over the Internet, the problem will, of course, be solved.
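Before moving on, here's a quick back-of-the-envelope sketch of how long such a transfer takes and how the time grows with the amount of data. This is a minimal illustration, assuming a 100-megabit-per-second link and decimal units (1 terabyte = 10^12 bytes); the class and method names are made up for this example:

public class TransferTimeSketch {

   // Assumed link speed: 100 megabits per second = 1e8 bits per second
   static final double LINK_BITS_PER_SECOND = 100e6;

   // Transfer time in seconds for a payload of the given size in bytes
   static double transferSeconds(double bytes) {
       return bytes * 8 / LINK_BITS_PER_SECOND;
   }

   public static void main(String[] args) {
       double day = 86_400; // seconds in a day

       System.out.printf("10 MB:  %.1f seconds%n", transferSeconds(10e6));
       System.out.printf("400 TB: %.0f days%n", transferSeconds(400e12) / day);
       // Twice the data -> twice the time: that's linear growth, O(N)
       System.out.printf("800 TB: %.0f days%n", transferSeconds(800e12) / day);
   }
}

Doubling the payload from 400 to 800 terabytes doubles the time: that proportionality is exactly what O(N) describes.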
There's just one problem: as the sketch above suggests, it would take roughly 740 days to transmit over a typical modern home link at 100 megabits per second. Almost 2 years! :O Our algorithm is clearly not suitable here. We need some other solution! Unexpectedly, an IT giant, Amazon, comes to our aid! Its Amazon Snowmobile service lets you load huge amounts of data into mobile storage and deliver it to the right address by truck! So, we have a new algorithm: "If you want to transfer information in the form of files over 5,000 kilometers, and transferring it over the Internet would take more than 14 days, you should ship the data on an Amazon truck." The figure of 14 days is chosen arbitrarily here: let's say it's the maximum period we can afford.

Let's analyze our algorithm. What about speed? Even if the truck travels at only 50 km/h, it will cover 5,000 kilometers in just 100 hours. That's just over four days! Much better than the Internet transmission option. And what about the complexity of this algorithm? Will it also be linear, O(N)? No, it will not. The truck doesn't care how much you load into it: it will still travel at about the same speed and arrive on time. Whether we have 800 terabytes or 10 times as much data, the truck will still reach its destination in the same four-plus days. In other words, the truck-based data delivery algorithm has constant complexity. "Constant" means it does not depend on the data passed to the algorithm. Put a 1 GB flash drive in the truck, and it will arrive in four-plus days. Put in disks holding 800 terabytes of data, and it will arrive in the same four-plus days. In Big-O notation, constant complexity is denoted as O(1).

Now that we're familiar with O(N) and O(1), let's look at more "programmer-flavored" examples :) Say you're given an array of 100 numbers, and the task is to print each of them to the console. You write a regular for loop that does the job:
int[] numbers = new int[100];
// ...fill the array with numbers

for (int i : numbers) {
   System.out.println(i);
}
What is the complexity of this algorithm? Linear, O(N). The number of actions the program must perform depends on how many numbers were passed to it. If there are 100 numbers in the array, there will be 100 actions (lines printed to the screen). If there are 10,000 numbers in the array, 10,000 actions will need to be performed. Can our algorithm be improved? No: no matter what, we have to make N passes through the array and perform N outputs to the console. Let's consider another example.
import java.util.LinkedList;

public class Main {
   public static void main(String[] args) {
       LinkedList<Integer> numbers = new LinkedList<>();

       // Each call inserts at index 0, i.e. at the head of the list
       numbers.add(0, 20202);
       numbers.add(0, 123);
       numbers.add(0, 8283);
   }
}
We have an empty LinkedList into which we insert several numbers. We need to evaluate the complexity of inserting a single number into the LinkedList in our example, and how it depends on the number of elements in the list. The answer is O(1), constant complexity. Why? Note that each time we insert a number at the beginning of the list. And, as you remember, when you insert into a LinkedList, the elements don't move anywhere: the links are reassigned (if you've forgotten how LinkedList works, take a look at one of our old lectures). If the first number in our list is currently x, and we insert the number y at the beginning of the list, all it takes is:
x.previous = y;     // the old head now points back to y
y.previous = null;  // y is the new head, nothing comes before it
y.next = x;         // y points forward to the old head
For this relinking of references, it doesn't matter how many numbers are currently in the LinkedList: one or a billion. The complexity of the algorithm is constant: O(1).
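To see why the cost is constant, here is a minimal sketch of that relinking with a hand-rolled node class. Note that Node and insertAtHead are illustrative stand-ins, not the actual java.util.LinkedList internals:

public class DoublyLinkedSketch {

   static class Node {
       int value;
       Node previous;
       Node next;

       Node(int value) { this.value = value; }
   }

   // Inserts a new value before the current head and returns the new head.
   // Always the same handful of reference assignments: O(1), no matter
   // how many nodes the list already contains.
   static Node insertAtHead(Node head, int value) {
       Node y = new Node(value);
       y.previous = null;      // y becomes the new head, nothing before it
       y.next = head;          // y points forward to the old head
       if (head != null) {
           head.previous = y;  // the old head points back to y
       }
       return y;
   }

   public static void main(String[] args) {
       Node head = null;
       head = insertAtHead(head, 20202);
       head = insertAtHead(head, 123);
       head = insertAtHead(head, 8283);
       System.out.println(head.value); // 8283 - the last value inserted
   }
}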

Logarithmic complexity

Don't panic! :) If the word "logarithmic" makes you want to close the lecture and read no further, hold on for a couple of minutes. There will be no mathematical difficulties here (there are plenty of explanations like that elsewhere), and we'll work through all the examples in plain terms.

Imagine that your task is to find one specific number in an array of 100 numbers. More precisely, to check whether it's there at all. As soon as the desired number is found, the search must stop, and the following line should be printed to the console: "The desired number has been found! Its index in the array = ...." How would you solve such a problem? Here the solution is obvious: you go through the elements of the array one by one, starting from the first (or the last), and check whether the current number matches the one you're looking for. Accordingly, the number of actions depends directly on the number of elements in the array. If we have 100 numbers, we may need to step to the next element 100 times and perform 100 comparisons. If there are 1,000 numbers, there can be 1,000 checks. This is obviously linear complexity, O(N).

Now let's add one clarification to our example: the array in which you need to find the number is sorted in ascending order. Does that change anything for our task? We could still search for the desired number by brute force. But instead we can use the well-known binary search algorithm. Suppose our sorted array holds 10 numbers and we need to find the number 23. Instead of scanning the numbers one by one, we simply divide the array into 2 parts and check the middle number: the one in cell 4. It turns out to be 16, and we are looking for 23. The current number is smaller. What does that mean? That none of the preceding numbers (the ones before 16) even need to be checked: they are definitely smaller than the one we're looking for, because our array is sorted!

So we continue the search among the remaining 5 elements. Notice: we've done only one check, but we've already ruled out half the options. Now we repeat the step: divide the remaining range in half again and take its middle element. It's 56, which is greater than the one we're looking for. What does that mean? That we discard 3 more options: the number 56 itself and the two numbers after it (they are definitely greater than 23, because the array is sorted). We have only 2 numbers left to check, the ones at array indices 5 and 6. We check the first of them, and it's exactly what we were looking for: the number 23! Its index = 5!

Let's look at the results of our algorithm, and then work out its complexity. (By the way, now you can see why it's called binary: its essence is repeatedly dividing the data in half.) The result is impressive! If we had looked for the number with a linear search, we would have needed up to 10 checks, but with binary search we needed only 3! In the worst case there would have been 4, if at the last step the number we needed had been the second one rather than the first. What about the algorithm's complexity? This is the most interesting part :) The binary search algorithm depends far less on the number of elements in the array than the linear search algorithm (i.e., simple enumeration) does.
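Here is a minimal iterative implementation of binary search in Java. The array contents are made up to match the walkthrough above (16 at index 4, 56 at index 7, and our target 23 at index 5):

public class BinarySearchSketch {

   // Classic iterative binary search over a sorted array.
   // Returns the index of target, or -1 if it is not present.
   static int binarySearch(int[] sorted, int target) {
       int left = 0;
       int right = sorted.length - 1;
       while (left <= right) {
           int middle = left + (right - left) / 2; // avoids int overflow
           if (sorted[middle] == target) {
               return middle;
           } else if (sorted[middle] < target) {
               left = middle + 1;  // discard the left half
           } else {
               right = middle - 1; // discard the right half
           }
       }
       return -1; // the target is not in the array
   }

   public static void main(String[] args) {
       int[] numbers = {1, 3, 7, 9, 16, 23, 30, 56, 62, 90};
       int index = binarySearch(numbers, 23);
       System.out.println("The desired number has been found! Its index in the array = " + index);
   }
}

Each pass of the loop halves the search range, so for an array of N elements it runs at most floor(log2(N)) + 1 times: 4 checks for 10 elements, 10 for 1,000, 14 for 10,000 — exactly the numbers discussed below.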
With 10 elements in the array, a linear search needs at most 10 checks, while a binary search needs at most 4. The difference is a factor of 2.5. But for an array of 1,000 elements, a linear search needs up to 1,000 checks, while a binary search needs only 10! The difference is already a factor of 100! Notice: the number of elements in the array grew by a factor of 100 (from 10 to 1,000), while the number of checks required by binary search grew only by a factor of 2.5 (from 4 to 10). At 10,000 elements the difference is even more impressive: 10,000 checks for linear search versus just 14 checks for binary. And again: the number of elements grew 1,000-fold (from 10 to 10,000), while the number of checks grew only 3.5-fold (from 4 to 14).

The complexity of the binary search algorithm is logarithmic, or, in Big-O notation, O(log n). Why is it called that? A logarithm is the inverse of exponentiation, and the binary logarithm answers the question: to what power must 2 be raised to get a given number? For example, suppose we have 10,000 elements to search through with binary search. How do you calculate the exact number of checks needed? It's enough to answer a simple question: to what power must 2 be raised so that the result is >= the number of elements being checked? For 10,000, that's the 14th power: 2 to the 13th is too small (8,192), but 2 to the 14th = 16,384, which satisfies our condition (it is >= the number of elements in the array). We've found the logarithm: 14. That's the maximum number of checks we need! :)

Algorithms and their complexity are too broad a topic to fit into one lecture. But knowing it is very important: in many interviews you will get algorithmic problems. For theory, I can recommend a few books. You can start with "Grokking Algorithms": although the examples in the book are written in Python, the book's language and examples are very simple. It's the best option for a beginner, and it's also short. For more serious reading, try the books by Robert Lafore and Robert Sedgwick. Both are written in Java, which will make learning a bit easier for you; after all, you already know this language quite well! :) For students with a solid mathematical background, the best option is the book by Thomas Cormen. But theory alone won't fill you up! "Know" != "be able". You can practice solving algorithm problems on HackerRank and Leetcode. Problems from there are often used even in interviews at Google and Facebook, so you definitely won't be bored :) To reinforce this lecture's material, I recommend watching an excellent video about Big-O on YouTube. See you in the next lectures! :)