
Algorithm complexity

Hello! Today's lecture will be a little different from the rest: it is only indirectly related to Java. Still, this topic is very important for every programmer. We'll talk about algorithms. What is an algorithm? In simple terms, it is a sequence of actions that must be performed to achieve a desired result. We use algorithms in everyday life all the time. For example, every morning you face a task: get to school or work, and at the same time be:
  • Dressed
  • Clean
  • Well-fed
What algorithm will allow you to achieve this result?
  1. Wake up to an alarm clock.
  2. Take a shower, wash your face.
  3. Prepare breakfast, make coffee/tea.
  4. Eat.
  5. If you haven’t ironed your clothes since the evening, iron them.
  6. Get dressed.
  7. Leave the house.
This sequence of actions will definitely get you the desired result. In programming, the whole point of our work is constantly solving problems. A significant share of those problems can be solved with already known algorithms. For example, suppose you face this task: sort a list of 100 names in an array. The problem is quite simple, but it can be solved in different ways. Here is one possible solution (we'll contrast it with a one-line Java version right after the list). Algorithm for sorting names alphabetically:
  1. Buy or download on the Internet “Dictionary of Russian Personal Names” 1966 edition.
  2. Find every name on our list in this dictionary.
  3. Write down on a piece of paper which page of the dictionary the name is on.
  4. Put the names in order using the notes on a piece of paper.
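For contrast, here is what an efficient solution to the same task might look like in Java - a minimal sketch using the standard library (the sample names are made up, but imagine 100 of them):

import java.util.Arrays;

public class SortNames {
    public static void main(String[] args) {
        String[] names = { "Olga", "Ivan", "Boris", "Anna" }; // hypothetical sample data
        Arrays.sort(names); // sorts alphabetically (natural String order)
        System.out.println(Arrays.toString(names)); // [Anna, Boris, Ivan, Olga]
    }
}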
Will the dictionary-based sequence of actions solve our problem? Yes, it will. But will this solution be efficient? Hardly. Here we come to another very important property of algorithms: their efficiency. A problem can be solved in different ways, but both in programming and in everyday life we choose the method that is most effective. If your task is to make a buttered sandwich, you could, of course, start by sowing wheat and milking a cow. But that would be an ineffective solution: it would take a lot of time and cost a lot of money. You can solve your simple problem by just buying bread and butter. The wheat-and-cow algorithm does solve the problem, but it is far too complex to apply in practice.

To assess the complexity of algorithms in programming, a special notation called Big-O ("big O") was created. Big-O lets you estimate how strongly an algorithm's execution time depends on the data passed into it. Let's look at the simplest example: data transfer. Imagine that you need to transmit some information in the form of a file over a long distance (say, 5,000 kilometers). Which algorithm will be the most efficient? It depends on the data it has to work with. Suppose we have an audio file of 10 megabytes. In that case, the most efficient algorithm is to transfer the file over the Internet: it will take only a couple of minutes! So, let's state our algorithm: "If you need to transfer information in the form of files over a distance of 5,000 kilometers, use data transmission over the Internet." Great. Now let's analyze it. Does it solve our problem? On the whole, yes, it does. But what can we say about its complexity? Hmm, this is where things get interesting. Our algorithm depends very strongly on the incoming data, namely on the size of the files. Right now we have 10 megabytes and everything is fine. What if we need to transfer 500 megabytes? 20 gigabytes? 500 terabytes? 30 petabytes? Will our algorithm stop working? No, all of these amounts of data can still be transferred. Will it take longer? Yes, it will! Now we know an important feature of our algorithm: the larger the data to be transferred, the longer the algorithm takes to complete. But we would like to understand this relationship (between the size of the data and the time needed to transfer it) more precisely. In our case, the complexity of the algorithm is linear. "Linear" means that as the volume of data grows, the transfer time grows roughly proportionally: 2 times more data takes 2 times longer to transfer, 10 times more data takes 10 times longer. In Big-O notation, the complexity of our algorithm is written as O(N). Remember this notation: it is always used for algorithms with linear complexity. Note that we are not talking here about "variable" factors: Internet speed, the power of our computer, and so on. When assessing the complexity of an algorithm, these simply don't matter, since we have no control over them anyway. Big-O evaluates the algorithm itself, regardless of the "environment" it will have to run in. Let's continue with our example. Suppose it eventually turns out that the files to be transferred add up to 800 terabytes.
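As a quick sanity check of that linear relationship, here is a minimal Java sketch. The 100 Mbit/s link speed and the decimal units are assumptions made for the estimate:

public class TransferTime {

    // Time to push `bytes` through a link of `bitsPerSecond`:
    // it grows linearly with the data size, i.e. O(N).
    static double secondsToTransfer(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double link = 100e6; // assumed 100 Mbit/s home connection

        System.out.printf("10 MB:  %.1f seconds%n", secondsToTransfer(10e6, link));
        System.out.printf("800 TB: %.0f days%n", secondsToTransfer(800e12, link) / 86400);
        // roughly 0.8 seconds vs roughly 740 days: double the data, double the time
    }
}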
If we transmit them over the Internet, the problem will, of course, be solved. There's just one catch: as the sketch above shows, transmission over a standard modern link (100 megabits per second) of the kind most of us have at home would take roughly 740 days. Almost two years! :O Our algorithm is clearly not suitable here. We need some other solution! Unexpectedly, the IT giant Amazon comes to our aid: its Amazon Snowmobile service lets you load large amounts of data into mobile storage and deliver it to the desired address by truck! So we have a new algorithm: "If you need to transfer information in the form of files over a distance of 5,000 kilometers, and transferring it over the Internet would take more than 14 days, use Amazon truck delivery." The figure of 14 days was chosen arbitrarily here: say it is the maximum period we can afford. Let's analyze this algorithm. What about speed? Even if the truck travels at only 50 km/h, it will cover 5,000 kilometers in just 100 hours - a little over four days. That is much better than the Internet option. And what about the complexity of this algorithm? Will it also be linear, O(N)? No, it will not. The truck doesn't care how much you load into it: it will still drive at roughly the same speed and arrive on time. Whether we have 800 terabytes or 10 times more, the truck will still get there in those same four-plus days. In other words, the truck-based data delivery algorithm has constant complexity. Here "constant" means it does not depend on the data passed to the algorithm: put a 1 GB flash drive in the truck, and it arrives in four-plus days; put disks holding 800 terabytes in it, and it arrives in the same four-plus days. In Big-O notation, constant complexity is written as O(1).

Now that we have met O(N) and O(1), let's look at more "programmer-like" examples :) Say you are given an array of 100 numbers, and the task is to print each of them to the console. You write an ordinary for loop that does the job:
int[] numbers = new int[100];
// ... fill the array with numbers

for (int i: numbers) {
   System.out.println(i);
}
What is the complexity of this algorithm? Linear, O(N). The number of actions the program must perform depends on exactly how many numbers were passed to it. If there are 100 numbers in the array, there will be 100 actions (console outputs); if there are 10,000 numbers, 10,000 actions will be needed. Can our algorithm be improved? No: whatever we do, we have to visit all N elements of the array and perform N console outputs. Let's look at another example.
import java.util.LinkedList;

public static void main(String[] args) {

   LinkedList<Integer> numbers = new LinkedList<>();

   // each add(0, ...) inserts the new number at the very beginning of the list
   numbers.add(0, 20202);
   numbers.add(0, 123);
   numbers.add(0, 8283);
}
We have an empty LinkedList into which we insert several numbers. We need to estimate the complexity of inserting a single number into the LinkedList in our example, and how it depends on the number of elements already in the list. The answer is O(1), constant complexity. Why? Note that each time we insert the number at the beginning of the list. Besides, as you remember, when you insert an element into a LinkedList, the existing elements are not shifted anywhere: the links are reassigned (if you've forgotten how LinkedList works, take a look at one of our old lectures). If the first number in our list is currently x, and we insert the number y at the beginning of the list, all that's needed is:
x.previous = y; // the old first element x now points back to y
y.previous = null; // y becomes the new first element
y.next = x; // y points forward to x
For this link reassignment, it doesn't matter how many numbers the LinkedList currently holds - one or a billion. The complexity of the algorithm is constant: O(1).
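To make this concrete, here is a toy doubly linked list - not the real java.util.LinkedList internals, just an illustrative sketch - where inserting at the head is always the same few link updates, however long the list is:

public class ToyLinkedList {

    static class Node {
        int value;
        Node previous;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;

    // Always the same few link updates - O(1),
    // whether the list holds one element or a billion.
    void addFirst(int value) {
        Node y = new Node(value);
        y.next = head;         // y points forward to the old head
        y.previous = null;     // y is now the first node
        if (head != null) {
            head.previous = y; // the old head points back to y
        }
        head = y;
    }

    public static void main(String[] args) {
        ToyLinkedList numbers = new ToyLinkedList();
        numbers.addFirst(20202);
        numbers.addFirst(123);
        numbers.addFirst(8283);
        System.out.println(numbers.head.value); // 8283
    }
}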

Logarithmic complexity

Don't panic! :) If the word "logarithmic" makes you want to close the lecture and read no further, wait a couple of minutes. There will be no mathematical difficulties here (there are plenty of such explanations elsewhere), and we will analyze all the examples "on our fingers." Imagine that your task is to find one specific number in an array of 100 numbers - more precisely, to check whether it is there at all. As soon as the required number is found, the search must stop, and the console should display: "The required number has been found! Its index in the array = ...." How would you solve such a problem? Here the solution is obvious: iterate over the array elements one by one, starting from the first (or the last), and check whether the current number matches the one you're looking for. The number of actions thus depends directly on the number of elements in the array: 100 numbers means up to 100 moves to the next element and up to 100 match checks; 1,000 numbers means up to 1,000 checks. This is obviously linear complexity, O(N).

Now let's add one refinement to the example: the array in which you need to find the number is sorted in ascending order. Does this change anything? We could still find the desired number by brute force, but instead we can use the well-known binary search algorithm.

[Figure: binary search for the number 23 in a sorted array of 10 elements]

In the top row of the figure we see a sorted array in which we need to find the number 23. Instead of iterating over the numbers, we simply divide the array into two parts and check the middle element. We find the number in cell 4 and check it (second row of the figure). That number is 16, and we are looking for 23. The current number is less than the target. What does that mean? That none of the preceding numbers (those before 16) need to be checked: they are certainly less than the one we want, because our array is sorted! We continue the search among the remaining 5 elements. Note: we have made only one check, yet we have already eliminated half of the possible options. Only 5 elements remain. We repeat the step: again divide the remaining range in two and take the middle element (row 3 of the figure). That number is 56, and it is greater than the one we want. What does that mean? That we discard 3 more options: the number 56 itself and the two numbers after it (they are certainly greater than 23, because the array is sorted). Only 2 numbers are left to check (the last row of the figure), those with array indices 5 and 6. We check the first of them, and that is exactly what we were looking for: the number 23! Its index is 5!

Let's look at the results of our algorithm, and then work out its complexity. (By the way, now you can see why it is called binary: its essence is repeatedly dividing the data in two.) The result is impressive! A linear search for the number would have needed up to 10 checks, but binary search did it in 3! In the worst case there would have been 4, if at the last step the number we needed had turned out to be the second rather than the first. What about complexity? This is the most interesting part :) The binary search algorithm depends far less on the number of elements in the array than linear search (that is, simple enumeration) does.
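Here is a minimal Java sketch of binary search (iterative, on an int array), with a counter added so you can watch how few checks it actually makes; the test data in main is made up for the demonstration:

public class BinarySearchDemo {

    // Returns the index of `target` in the sorted array, or -1 if it is absent.
    static int binarySearch(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        int checks = 0;
        while (low <= high) {
            int mid = (low + high) >>> 1; // middle of the current range
            checks++;
            if (sorted[mid] == target) {
                System.out.println("Found after " + checks + " checks");
                return mid;
            } else if (sorted[mid] < target) {
                low = mid + 1;   // discard the left half: all less than target
            } else {
                high = mid - 1;  // discard the right half: all greater than target
            }
        }
        System.out.println("Not found after " + checks + " checks");
        return -1;
    }

    public static void main(String[] args) {
        int[] numbers = new int[10_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i * 2; // already sorted in ascending order
        }
        binarySearch(numbers, 19_998); // near-worst case: at most ~14 checks
    }
}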
With 10 elements in the array, linear search needs at most 10 checks, and binary search at most 4: a difference of 2.5 times. But for an array of 1,000 elements, linear search needs up to 1,000 checks, while binary search needs only 10! The difference is already 100-fold. Note: the number of elements in the array grew 100 times (from 10 to 1,000), while the number of checks binary search needs grew only 2.5 times, from 4 to 10. At 10,000 elements the difference is even more impressive: 10,000 checks for linear search versus a mere 14 checks for binary search. And again: the number of elements grew 1,000 times (from 10 to 10,000), but the number of checks grew only 3.5 times (from 4 to 14).

The complexity of the binary search algorithm is logarithmic, or, in Big-O notation, O(log n). Why is it called that? A logarithm is the inverse of exponentiation, and the binary logarithm deals with powers of 2. Say we have 10,000 elements to search through with binary search, and no picture in front of us: how do we count the exact number of checks needed? It is enough to answer a simple question: to what power must 2 be raised so that the result is >= the number of elements being checked? For 10,000 that power is 14: 2 to the 13th is too small (8,192), but 2 to the 14th = 16,384, which satisfies our condition (it is >= the number of elements in the array). We've found the logarithm: 14. That's how many checks we need at most! :)

Algorithms and their complexity are too vast a topic to fit into one lecture, but knowing it is very important: in many interviews you will get algorithmic problems. For theory, I can recommend several books. A good place to start is "Grokking Algorithms": although its examples are written in Python, the book's language and examples are very simple. It is the best option for a beginner, and it is also short. More serious reading: the books by Robert Lafore and Robert Sedgewick. Both use Java, which will make learning a little easier for you - after all, you already know the language quite well! :) For students with a solid mathematical background, the best option is the book by Thomas Cormen. But theory alone won't be enough! "Know" != "be able". You can practice solving algorithm problems on HackerRank and LeetCode. Problems from there are often used even in interviews at Google and Facebook, so you definitely won't be bored :) To reinforce this material, I recommend an excellent YouTube video about Big-O. See you at the next lectures! :)