[Java] Write a sort faster than Arrays.sort

I am doing competition programming with Java, but the worst calculation amount of the function ʻArrays.sort that sorts the array of primitive type provided in the standard library of Java` is $ O (N ^ 2). I heard that it is $, and I implemented my own sort function because it was bad (however, if it is not primitive, it seems that Merge Sort of $ O (N \ log N) $ is used at worst. Stable This is also subject to change as it is only guaranteed to be a sort).

(Addition 2020/04/14 15:27) It seems that Java 14 has been modified so that the worst computational complexity is $ O (N \ log N) $ as follows. However, it seems that it will take some time before Java 14 can be used in competition pros, so self-made sorting Looks good to have. [Java 14 Arrays.sort (int [])](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Arrays.html#sort #sort ( Quoted from int []))

The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on all data sets, and is typically faster than traditional (one-pivot) Quicksort implementations.

(End of postscript)

Also, I wanted to beat ʻArrays.sort` if I wrote it anyway, so I implemented a multi-pattern sort algorithm with a device for speeding up and compared the speed.

This time, we implemented and measured the following three types of sorting algorithms, which make them fight against ʻArrays.sort`.

Merge Sort (recursive, stupid implementation)
Merge Sort (recursive) + Insertion Sort
Merge Sort (non-recursive) + Insertion Sort
Radix Sort

The array size used for the test was $ N = 10 ^ 5, 10 ^ 6, 10 ^ 7 $, and the average value of multiple execution times was measured. Due to the execution time, $ N = 10 ^ 5 $ So we can only measure $ 1000 $ times, $ 10 ^ 6 $ for $ 100 $ times, $ 10 ^ 7 $ for $ 10 $ times. The test cases are java.util.Random'snextInt ()andnextLong (). It was generated using, and in order to eliminate the difference due to input, the generated array was copied and each sort was executed and measured for the array with the same contents.

It is known that the processing for sorted arrays is explosive as a feature of ʻArrays.sort`, but I felt it was non-essential, so this time I aimed to win in a random case.

Merge Sort (recursive version) + Insertion Sort

Merge Sort is an algorithm that briefly describes: The detailed explanation is omitted because many people have already written it in an easy-to-understand manner.

Step 1. Divide the array into the first half and the second half. Step 2. Sort the first and second half of the array with a recursive call. Step 3. Merge the sorted first half and second half well to sort the whole thing.

Regarding the complexity of this algorithm, we can see that the recurrence formula is $ O (N \ log N) $ when the size of the array is $ N $. The lower limit of the complexity of sorting based on comparison is $ O. It is known to be (N \ log N) $, which means it is fast on order.

However, when I actually wrote and operated this algorithm, the speed was far below that of ʻArrays.sort` as shown below. The implementation of this Merge Sort is the recursive version of Merge Sort + shown later. Just delete the Insertion Sort part from the Insertion Sort code and add the termination condition, which is almost the same, so omit it.

int array sort	N=10^5	N=10^6	N=10^7
Arrays.sort (Standard library)	5.979	73.396	891.750
MergeSort (Recursion)	9.373	117.921	1238.865

One of the causes is the behavior when the array is divided into small pieces. Generally, if the order is improved, the calculation and the way of holding the data become complicated, so the overhead is the order for the small size. When it comes to Merge Sort, the orderly inferior Insertion Sort (insertion sort, computational complexity $ O (N ^ 2) $) is faster when it comes to merge sort. ..

Therefore, set a boundary value for the array size, and if an array with a size smaller than that boundary value is passed, rewrite Merge Sort to execute Insertion Sort. The code with this rewrite is shown below. In the code below, all ranges are treated as half-open intervals.

`RecursionalMergeSort.java`


//Boundary value to switch from Merge Sort to Insertion Sort
private static final int INSERTION_SORT_THRESHOLD = 60;

//Functions visible from the outside
public static final void sort(int[] a, int from, int to) {
    sort(a, from, to, new int[a.length >> 1]);
}

// Merge Sort +Processing body of Insertion Sort
//work is a work area for saving the original array when merging the first half and the second half..
//work is reused to save memory usage.
private static final void sort(int[] a, int begin, int end, int[] work) {
    //Performs an insertion sort if the length of the given array is less than or equal to the boundary value
    if (end - begin <= INSERTION_SORT_THRESHOLD) {
        insertionSort(a, begin, end);
        return;
    }
    //Subscript in the middle of the array. 
    // >>1 represents dividing by 2.
    int mid = (begin + end) >> 1;
    //Recursively sort the first half of the array.
    sort(a, begin, mid, work);
    //Recursively sort the second half of the array.
    sort(a, mid, end, work);
    //First half array length
    int len = mid - begin;
    //Copy the first half of the array to the work area.
    //Copying an array is a System rather than a for statement.arraycopy works faster
    System.arraycopy(a, begin, work, 0, len);
    //i is the subscript of the storage location. 
    //wi is work(=First half of the array)Indicates which subscript you are looking at.
    //ti represents which subscript in the second half of the array you are looking at.
    //Subscripts in the second half of the array start with mid.
    for (int i = begin, wi = 0, ti = mid;; i++) {
        if (ti == end) {
            //If you have seen all the second half,Copy the elements remaining in the work area
            System.arraycopy(work, wi, a, i, len - wi);
            //There are no more elements to merge, so exit the loop
            break;
        } else if (work[wi] > a[ti]) {
            //Compare the element you are looking at in the first half with the element you are looking at in the second half.
            //Store the smaller one,Shift the subscript back. 
            a[i] = a[ti++];
        } else {
            //Same as above.
            a[i] = work[wi++];
            //If you finish looking at the first half,The rest of the elements in the second half are already in the array
            if (wi == len) {
                break;
            }
        }
    }
}

//Insertion Sort called when the array size is small.
private static final void insertionSort(int[] a, int from, int to) {
    //Array a is from<= x <When arranging in ascending order in the range of i, 
    // a[i]Find where to insert by shifting the element back.
    for (int i = from + 1; i < to; i++) {
        //Value to insert
        int tmp = a[i];
        // a[i-1]If it is above, there is no need to search
        if (a[i - 1] > tmp) {
            //Insertion subscript
            int j = i;
            //The first time always satisfies the while conditional statement, so do-whil
            do {
                //Move the element back by one.
                a[j] = a[j - 1];
                //Insertion destination is earlier
                j--;
            //"Reach the beginning" or "a"[i]Discover the following elements "
            } while (j > from && a[j - 1] > tmp);
            // a[i]Put in the place where it should be inserted.
            a[j] = tmp;
        }
    }
}

The comparison results are as follows. (Units are milliseconds, using the same test data)

int array sort	N=10^5	N=10^6	N=10^7
Arrays.sort (Standard library)	5.979	73.396	891.750
MergeSort (Recursion)	9.373	117.921	1238.865
MergeSort (Recursion) + Insertion Sort	6.557	82.228	991.833

It's still not as fast as ʻArrays.sort, but it's been accelerated to a practically acceptable speed. However, our goal is to implement a faster sort than ʻArrays.sort, so it's even faster.

A recursive call may be the cause of the slowness. It is said that rewriting the recursive call to for or while will improve the speed. Therefore, the next step is to implement Merge Sort that does not use recursion.

Merge Sort (non-recursive) + Insertion Sort

In the recursive version of Merge Sort, it was implemented with an image (= top-down) descending from top to bottom (from larger to smaller), but in the non-recursive version, it is implemented in the opposite direction (= bottom-up). That is, we first decompose the arrays into smaller sizes and then sort the arrays. We will call this sorted subcolumn a block. Then we will call this block in the same way as in the previous implementation. By merging two by two and increasing the size, the entire column is finally sorted. The implementation is shown below, but the code and comments common to the recursive version are omitted as appropriate.

`NonRecursionalMergeSort.java`


//Because it is non-recursive,The work area array is generated internally(You don't have to pass it as an argument).
public static final void sort(int[] a, int begin, int end) {
    //First of all,Perform Insertion Sort for each small block.
    //this time,Use the boundary value defined in the previous implementation as the smallest unit of block
    //i is the first subscript of the block
    for (int i = begin;;) {
        //j is the subscript at the end of the block(However,Since it is a half-open interval, j is not included.)
        int j = i + INSERTION_SORT_THRESHOLD;
        //The array length is not always divisible by the size of the block.
        //So,Determine if the block is wide enough. 
        if (j < end) {//If you can get enough width
            //Execute Insertion Sort
            insertionSort(a, i, j);
        } else {//If you don't have enough width(=When you reach the end of the array)
            //Run Insertion Sort from i to end.
            insertionSort(a, i, end);
            //I've reached the end so I'm out of the loop.
            break;
        }
        //The starting point of the next block is j. (i,Note that j was a half-open section)
        i = j;
    }
    //The length of the range of arrays to sort.
    int len = end - begin;
    //Work area.Same as before,Used to save the first half of the array.
    //The boundary is not always in the middle because the block division produces fractions..
    //Therefore,The area to be secured becomes wider.
    int[] work = new int[len];
    //block is the size of the block.Every time you merge, the size doubles, 
    //Update formula is block<<=1.
    for (int block = INSERTION_SORT_THRESHOLD; block <= len; block <<= 1) {
        //Because I merge two blocks at a time,Calculate the size of two blocks.
        int twoBlocks = block << 1;
        //from is a subscript that points to the beginning of the first of the two blocks to merge.
        //Therefore,In the update formula, add the size of two blocks.
        //max is the maximum value that can be taken from(-1).
        //There is no need to merge if the fraction is less than one interval., 
        // max = end -The length of one block is subtracted as a block.
        for (int from = begin, max = end - block; from < max; from += twoBlocks) {
            //Boundary between two blocks(Point to the beginning of the second block).
            int mid = from + block;
            //End of second block.Take min, paying attention to the end of the interval.
            int to = Math.min(from + twoBlocks, end);
            //The merge of the two blocks is the same as the recursive version, so the explanation is omitted..
            System.arraycopy(a, from, work, 0, block);
            for (int i = from, wi = 0, ti = mid;; i++) {
                if (ti == to) {
                    System.arraycopy(work, wi, a, i, block - wi);
                    break;
                } else if (work[wi] > a[ti]) {
                    a[i] = a[ti++];
                } else {
                    a[i] = work[wi++];
                    if (wi == block) {
                        break;
                    }
                }
            }
        }
    }
}
//The insertionSort method is the same as the recursive version, so it is omitted..

The comparison result is as follows.

int array sort	N=10^5	N=10^6	N=10^7
Arrays.sort (Standard library)	5.979	73.396	891.750
MergeSort (Recursion)	9.373	117.921	1238.865
MergeSort (Recursion) + Insertion Sort	6.557	82.228	991.833
MergeSort (Non-recursive) + Insertion Sort	3.926	43.144	477.579

It seems that it was possible to speed up considerably just by rewriting it non-recursively. It is superior to ʻArrays.sort`.

However, there are actually faster algorithms for sorting arrays that have ʻintorlong` as elements.

Radix Sort

Here, we implement an algorithm called Radix Sort. The amount of calculation of Radix Sort is $ O (k \ ast N) with the word length of scale notation (for example, $ 10 $ base notation) as $ k $. ) $.

Radix Sort is not a "comparison" sorting algorithm. What it means is that when an element can be represented in some radix notation, it only focuses on the value of each digit without a direct comparison. You can sort by. ʻIntandlong are represented by bit columns, and this bit column is nothing but radix sort. Therefore, columns consisting of ʻint and long are sorted by Radix Sort. It will be sorted. If you look at the implementation shown below, you can see that there is no comparison of the two elements of the array even though it is a sorting algorithm.

Specifically, the algorithm consists of the following k Steps, where k is the bit length. By performing each Step with $ O (N) $, the total $ O (k \ ast N) $ becomes Can be achieved.

Step 1. Prepare a bucket to put the element whose lowest bit is 0 and a bucket to put the element whose lowest bit is 1, and store the elements of the sequence. Step 2. Do the same operation for the second bit from the bottom as in Step 1. However, for any two elements that go into the same bucket, the order in which they are stored should not break the order of the buckets in step 1. Must be something. Step i. Sort for the i-th bit from the bottom in the same way. Step k. Sort for the kth bit from the bottom in the same way.

I find it difficult to understand the "however, ..." in Step 2, so I'll follow the flow with an example. In the following, a = [010, 011, 100, 101,000] is sorted according to the above procedure. Since the word length is 3, the sorting will be completed in Steps 1 to 3 below.

Step 1. Let a = [010, 011, 100, 101, 000]-> bucket = [[010, 100, 000], [011, 101]]. Return this to a and a = [010, 100, 000] , 011, 101] and updated.

Step 2. Let a = [010, 100, 000, 011, 101]-> bucket = [[100, 000, 101], [010, 011]]. Return this to a and a = [010, 100, 000] , 011, 101] and updated. At this time, [100, 000, 101] put in bucket 0 are put in the same order as a before processing. The same applies to bucket 1.

Step 3. Set a = [100, 000, 101, 010, 011]-> bucket = [[000, 010, 011], [100, 101]] and set this back to a, a = [000, 010, 011 Updated with 100, 101]. A is certainly sorted.

The above is the general flow of Radix Sort, but there is one big problem. It is the treatment of negative numbers. In many languages such as Java, negative integers are represented using 2's complement representation, so If the above algorithm is applied as it is, a negative number with the most significant bit of 1 will be sorted as being larger than a positive number (in other words, sorting will be performed if it is regarded as ʻunsigned). Also, the magnitude relation of negative numbers in signed and the magnitude relation of ʻunsigned are just the opposite. Therefore, consider that the correct sort sequence can be obtained by following the steps below.

Step 1. Perform Radix Sort in descending order (as ʻunsigned). Step 2. Sorted in descending order by ʻunsigned, so the negative number before and the positive number after it are fixed. Also, if you look at the magnitude relationship as signed, the negative number and the positive number are positive. The numbers are arranged in descending order. Therefore, if only the negative number column is inverted and only the positive number column is inverted, the sorted column with signed can be obtained.

Now you can write a Radix Sort that also supports negative numbers. However, in the implementation shown below, 8 bits are grouped together as one place to shorten the word length. In other words, we are doing Radix Sort using $ 2 ^ 8 = 256 $ radix sort notation.

`RadixSort.java`


//bucket size.
private static final int BUCKET_SIZE = 256;
//int is 32bit,Word length in 256 base is 32/ 8 = 4.
private static final int INT_RECURSION = 4;
//Lower 8 bits(Single digit in 256 base)Mask for extracting
private static final int MASK = 0xff;

//Functions visible from the outside
public static final void sort(int[] a, int from, int to) {
    sort(a, from, to, 0, new int[to - from]);
}

//Bucket is represented by a one-dimensional array to save memory.
//l means that it is now Step l. (However, 0-indexed l= 0, 1, 2, 3) 
private static final void sort(int[] a, final int from, final int to, final int l, final int[] bucket) {
    //Since the mask is done after shifting to the right,Calculate the right shift amount.
    final int shift = l << 3;
    //Manage the number of pieces in each bucket.
    //I want the starting point of each bucket, so I have it as a cumulative sum.
    final int[] cnt = new int[BUCKET_SIZE + 1];
    //Manage how many are placed in each bucket.
    final int[] put = new int[BUCKET_SIZE];
    for (int i = from; i < to; i++) {
        // a[i] >>>Shift the desired 8 bits to the lowest level.
        // (a[i] >>> shift) &MASK tells you which bucket to enter.
        //However,The subscript will be shifted by one as it will be accumulated later..
        cnt[((a[i] >>> shift) & MASK) + 1]++;
    }
    //If you take the accumulation, cnt[i]Stores the start point of the i-th bucket in.
    for (int i = 0; i < BUCKET_SIZE; i++) {
        cnt[i + 1] += cnt[i];
    }
    for (int i = from; i < to; i++) {
        //Calculate which bucket to enter in the same way as before.
        int bi = (a[i] >>> shift) & MASK;
        //Stored in bith bucket. 
        //Do not overwrite what you have already placed, put[bi]Increase
        bucket[cnt[bi] + put[bi]++] = a[i];
    }
    //I want to sort in descending order,Scan bucket in reverse order
    //idx represents the next place to be stored on the array a.
    for (int i = BUCKET_SIZE - 1, idx = from; i >= 0; i--) {
        //The starting point of the i-th bucket
        int begin = cnt[i];
        //i-th bucket size.Calculated using the cumulative sum of cnt.
        int len = cnt[i + 1] - begin;
        //Copy the contents of the bucket back into the original array.
        System.arraycopy(bucket, begin, a, idx, len);
        //The next place to store is len ahead
        idx += len;
    }
    //The next step is Step l+ 1.
    final int nxtL = l + 1;
    //Is there still a place left?
    if (nxtL < INT_RECURSION) {
        //To the next step.bucket is reused
        sort(a, from, to, nxtL, bucket);
        // i =If you have reached here with 0,Descending sort by unsigned is finished
        if (l == 0) {
            //Variables to reuse.
            int lft, rgt;
            //Negative number in the first half,Since there is a positive number in the second half, binary search for the boundary.
            //Find a negative number. O(log N).
            lft = from - 1; rgt = to;
            while (rgt - lft > 1) {
                int mid = (lft + rgt) >> 1;
                if (a[mid] < 0) {
                    lft = mid;
                } else {
                    rgt = mid;
                }
            }
            //Negative number is the value of rgt.
            final int negative = rgt;
            //Subsequence of array a[from:negative-1](Closedsection)To reverse.
            //Negative numbers are arranged in descending order in this section.
            lft = from; rgt = negative - 1;
            while (rgt > lft) {
                int tmp = a[lft]; a[lft] = a[rgt]; a[rgt] = tmp;
                lft++; rgt--;
            }
            //Subsequence of array a[negative:to-1](Closedsection)To reverse.
            //Positive numbers are arranged in descending order in this section.
            lft = negative; rgt = to - 1;
            while (rgt > lft) {
                int tmp = a[lft]; a[lft] = a[rgt]; a[rgt] = tmp;
                lft++; rgt--;
            }
        }
    }
}

The comparison result is as follows.

int array sort	N=10^5	N=10^6	N=10^7
Arrays.sort (Standard library)	5.979	73.396	891.750
MergeSort (Recursion)	9.373	117.921	1238.865
MergeSort (Recursion) + Insertion Sort	6.557	82.228	991.833
MergeSort (Non-recursive) + Insertion Sort	3.926	43.144	477.579
RadixSort	1.000	11.124	133.713

It's an overwhelming victory. However, one thing to be aware of is that sorting the long array takes a lot of time because the order is proportional to the word length. Actually, it is measured by the long array. The result is that the performance of Radix Sort is degraded as shown below.

long array sort	N=10^5	N=10^6	N=10^7
Arrays.sort (Standard library)	6.300	67.841	751.904
MergeSort (Recursion)	9.110	111.099	1340.971
MergeSort (Recursion) + Insertion Sort	6.594	76.131	922.294
MergeSort (Non-recursive) + Insertion Sort	4.108	43.538	484.530
RadixSort	2.684	29.359	339.683

Still, it is by far the fastest of all.

Summary

I think I was able to beat ʻArrays.sort` brilliantly. I don't think Radix Sort was fooled, but I'm very happy that I also won Merge Sort.