
Review and testing of sorting methods. Part I

The other day, in the comments on VKontakte, I got into an argument with another student on the project. The essence of the dispute was "who will win": the sort() method from the java.util.Arrays class, or self-written implementations of the simple algorithms - bubble, insertion, selection, and Shell sort. For some the answer may be obvious, but since the dispute arose, and each side had "respected sources" backing its point of view, it was decided to conduct a study and stretch the gray matter along the way by implementing the various algorithms. TL;DR: java.util.Arrays.sort() is the unconditional leader on arrays of 100,000 elements or more; at smaller sizes the Shell method can sometimes compete with it. The remaining algorithms considered here lose hopelessly and can be useful only under rather exotic conditions. Now let's look at how arrays are sorted on our silicon uber-devices.

Selection sort

Let's start with the simplest and most obvious method. Its essence is perfectly demonstrated by Robert Sedgwick in his video lecture on Coursera (the animation from there, which I rather badly compressed into a gif, is in the original post). Running through the array from the first element, at each step we look for the minimum element in the remaining right-hand part and swap it with the current one. As a result, each step fixes one more element in its final, sorted position. Here is Java code implementing this algorithm:
public void sort(int[] array) {
    int n = array.length;
    for (int i = 0; i < n; i++) {
        // Find the smallest element in array[i..n-1] and put it into position i
        int minIndex = min(array, i, n - 1);
        swap(array, i, minIndex);
    }
}

public static void swap(int[] array, int i, int j) {
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

// Returns the index of the smallest element in array[begin..end]
public static int min(int[] array, int begin, int end) {
    int minVal = array[begin];
    int minIndex = begin;
    for (int i = begin + 1; i <= end; i++) {
        if (array[i] < minVal) {
            minVal = array[i];
            minIndex = i;
        }
    }
    return minIndex;
}
Analysis of the algorithm shows that the entire remainder of the array has to be combed through on every pass: we need N + (N-1) + (N-2) + ... + 1 = N(N+1)/2, or roughly N^2/2, comparisons. Thus the complexity of the algorithm is O(N^2). What does this mean? It means that by doubling the number of elements N in the array, we increase the running time of the algorithm not 2 but 2^2 = 4 times; increasing N 10 times increases the running time 100 times, and so on. On my 2012 laptop with a Core i3 processor running Ubuntu 14.04, I got the following running times: [Image: selection sort running times]
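The post doesn't include the measurement harness itself, so here is a minimal sketch of how such timings can be reproduced; the SelectionSort wrapper class is an assumed name for a class holding the sort() shown above:

import java.util.Random;

public class SortBenchmark {
    public static void main(String[] args) {
        int[] sizes = {1_000, 10_000, 100_000};   // illustrative test sizes
        Random random = new Random(42);           // fixed seed for repeatable runs
        for (int n : sizes) {
            int[] array = random.ints(n).toArray();   // random input of length n
            long start = System.nanoTime();
            new SelectionSort().sort(array);          // hypothetical wrapper of the sort() above
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("n = %d: %d ms%n", n, elapsedMs);
        }
    }
}

For honest numbers, a serious benchmark would also warm up the JIT and average several runs; this sketch only shows the shape of the measurement.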

Insertion sort

Here the idea is slightly different. Again, let's turn to Dr. Sedgwick's animation (see the original post). Whatever lies ahead we haven't even looked at yet, while everything we leave behind always remains in order. The point is that we "return" each new element of the original array toward the beginning until it "rests" against a smaller element. So we again have N passes (one for each element of the original array), but on each pass, in most cases, we examine only part of the remainder rather than all of it. We hit the worst case of 1 + 2 + ... + (N-1), i.e. roughly N^2/2 comparisons, only if every element has to be returned all the way to the beginning, which happens for an input array sorted in reverse order (bad luck, bad luck). For an already sorted array (here's luck) it is a complete freebie: each pass does a single comparison and leaves the element in place, so the algorithm runs in time proportional to N. The complexity of the algorithm is still determined by the theoretical worst case, i.e. O(N^2). On average, the running time is proportional to N^2/4, i.e. twice as fast as the previous algorithm. In my implementation, due to non-optimal use of swaps, the running time came out longer than Selection's; I plan to correct and update the post soon. Here is the code and the result of running it on the same machine:
public void sort(int[] array) {
    int length = array.length;
    for (int i = 1; i < length; i++) {
        // Sink array[i] toward the beginning until it rests on a smaller element
        for (int j = i; j >= 1; j--) {
            if (array[j] < array[j - 1])
                swap(array, j, j - 1);
            else
                break;
        }
    }
}
[Image: insertion sort running times]
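As for the swap-heavy inner loop: here is a sketch of the usual shift-based fix hinted at above - my assumption about the planned correction, not the author's updated code. Instead of swapping on every comparison, we hold the current element aside, shift larger elements one slot to the right, and write the element once at the end:

public void sort(int[] array) {
    int length = array.length;
    for (int i = 1; i < length; i++) {
        int current = array[i];   // the element being inserted
        int j = i;
        // Shift larger elements right instead of swapping on every step
        while (j >= 1 && array[j - 1] > current) {
            array[j] = array[j - 1];
            j--;
        }
        array[j] = current;       // a single write instead of repeated swaps
    }
}

The asymptotics don't change, but each position shift now costs one array write instead of a full three-assignment swap.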

Shell sort

A smart guy named Donald Shell noticed back in 1959 that the most expensive cases in the insertion algorithm are those where an element returns very far toward the beginning of the array: on one pass we return an element by a couple of positions, while on another it crawls almost the entire array back to the beginning, which is far and slow. Can this be done in one go, jumping over several elements at a time? He found such a way. It consists of sequentially performing special partial sorts, generally called d-sort or, following Sedgwick, h-sort (I suspect h stands for hop). A 3-sort, for example, compares the element in question not with the previous one, but with the one 3 positions back; if they are swapped, it compares again with the element another 3 positions back, and so on. The result is a "3-sorted" array, in which every chain of elements taken 3 positions apart is in order, so no element is far from its final place. The insertion algorithm then handles such an array easily and pleasantly. By the way, a "1-sort" is nothing other than plain insertion sort =) By sequentially applying h-sort to the array with decreasing values of h, we can sort a large array faster. Here's what it looks like in the animation (see the original post). The challenge is choosing the right sequence of partial sorts; the performance of the algorithm ultimately depends on it. The most common choice is the sequence proposed by Donald Knuth: h = h*3 + 1, i.e. 1, 4, 13, 40, ... and so on, up to 1/3 of the array size. This sequence gives decent performance and is easy to implement. A full analysis of the algorithm requires serious mathematics and is beyond my powers; the scope of such an analysis is also widened by the many variants of h-sequences. Empirically, the speed of the algorithm is very good - see for yourself: [Image: Shell sort running times] A million elements in less than a second! And here is the Java code with the Knuth sequence.
public void sort(int[] array) {
    // Build the Knuth sequence 1, 4, 13, 40, ... up to about 1/3 of the array size
    int h = 1;
    while (h * 3 < array.length)
        h = h * 3 + 1;

    // Perform h-sorts with decreasing h; the final pass (h = 1) is plain insertion sort
    while (h >= 1) {
        hSort(array, h);
        h = h / 3;
    }
}

private void hSort(int[] array, int h) {
    int length = array.length;
    for (int i = h; i < length; i++) {
        // Insertion sort over the chain of elements h positions apart
        for (int j = i; j >= h; j = j - h) {
            if (array[j] < array[j - h])
                swap(array, j, j - h);
            else
                break;
        }
    }
}
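To sanity-check any of these implementations against the JDK, a quick test along the following lines can be used (a sketch; ShellSort is an assumed name for a class wrapping the sort() above):

import java.util.Arrays;
import java.util.Random;

public class SortTest {
    public static void main(String[] args) {
        int[] data = new Random().ints(1_000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);              // reference result from the JDK

        new ShellSort().sort(data);         // implementation under test
        System.out.println(Arrays.equals(data, expected)
                ? "OK: output matches Arrays.sort()"
                : "FAIL: arrays differ");
    }
}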

Bubble sort

This is a classic! Almost every novice programmer implements this algorithm. It's such a classic that Dr. Sedgwick didn't even have an animation for it, so I had to do the work myself (see the original post). On each pass we traverse the array from beginning to end, swapping neighboring elements that are out of order. As a result, the largest elements "bubble up" (hence the name) to the end of the array. We start each new pass optimistically hoping the array is already sorted (isSorted = true); at the end of the pass, if we see we were mistaken, we start a new one. The difficulty is that, once again, we traverse (almost) the entire array on every pass. A comparison happens at every step, and a swap at almost every step, which makes this one of the slowest algorithms (among rationally implemented ones, not counting exotica like cocktail shaker sort). Interestingly, the complexity here is formally also O(N^2), but the constant factor is much higher than for insertion and selection. Algorithm code:
public void sort(int[] array) {
    boolean isSorted;
    int nMinusOne = array.length - 1;
    for (int i = 0; i < nMinusOne; i++) {
        isSorted = true;
        // After pass i, the i largest elements already sit at the end of the array
        for (int j = 0; j < nMinusOne - i; j++) {
            if (array[j] > array[j + 1]) {
                swap(array, j, j + 1);
                isSorted = false;
            }
        }
        if (isSorted)   // no swaps on this pass - the array is sorted, stop early
            return;
    }
}
Running time: [Image: bubble sort running times] Feel the difference: more than half an hour for a million elements! Conclusion: never use this algorithm!!!

Summary of the first part

To wrap up, I suggest looking at the summary table for these algorithms. [Image: summary table of running times] You can also compare them with the results of the built-in java.util.Arrays.sort() method. It looks like some kind of magic - what could be faster than Shell? I will write about that in the next part, where we will look at the widely used quicksort, as well as merge sort, learn about the difference between sorting arrays of primitives and of reference types, and get acquainted with a very important interface in this business, Comparable ;) Below you can study a chart built on a logarithmic scale from the table data. The flatter the line, the better the algorithm =) [Image: log-scale chart of running times] For those who want to download the whole project and run the tests themselves, here is the link: Java See you in the next part! =)
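As a teaser for that comparison, timing the built-in method itself takes only a few lines (a sketch in the spirit of the measurement code above):

import java.util.Arrays;
import java.util.Random;

public class BuiltinSortTiming {
    public static void main(String[] args) {
        int[] array = new Random().ints(1_000_000).toArray();
        long start = System.nanoTime();
        Arrays.sort(array);   // for int[] this is a dual-pivot quicksort under the hood
        System.out.printf("Arrays.sort on 1M ints: %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
    }
}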