Java's Stream API is super fast if used well

Introduction

Java's Stream API provides parallel streams, which make it easy to write parallel processing; used properly, they can greatly improve performance. The feature was added in Java 8, so it is hardly new, but I have recently come to appreciate parallel streams, so I am writing about them here.

Operating environment

Contents

Here we look at the effect of parallel streams using code that checks whether each number from 0 to 600 million is a Kaprekar number (definition 2).[^1]

[^1]: You don't need to know what a Kaprekar number is. It is simply some processing that puts load on the CPU. If you still want to know, please search for it. The number 600 million has no particular meaning either; it is just a range that takes a reasonable amount of time to process.

Example without using parallel streams

First, let's look at an example that does not use parallel streams (a sequential stream). The code is rough and long, so for now just look at the main method at the bottom.

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ParallelStreamSample {
    //Whether it is the Kaprekar number (definition 2)
    private static boolean isKaprekarNumber2(long n) {
        List<Character> chars = String.valueOf(n)
                                    .chars()
                                    .mapToObj(c -> (char)c)
                                    .collect(Collectors.toList());
        // min
        chars.sort(Comparator.naturalOrder());
        var min = parseLong(join(chars));

        // max
        chars.sort(Comparator.reverseOrder());
        var max = parseLong(join(chars));

        return n == (max - min);
    }

    private static <T> String join(List<T> list) {
        var sb = new StringBuilder();
        for(T item : list) {
            sb.append(item);
        }
        return sb.toString();
    }

    private static long parseLong(String s) {
        if(s.isEmpty()) {
            return 0;
        } else {
            return Long.parseLong(s);
        }
    }

    public static void main(String[] args) {
        System.out.println("--------------------");
        System.out.println("Kaprekar number definition 2");
        long start = System.nanoTime();
        LongStream.rangeClosed(0, 600_000_000)
                //Sequential stream (usually this call is not needed, but is described for comparison with parallel streams)
                .sequential()
                .filter(n -> n % 9 == 0)
                .filter(ParallelStreamSample::isKaprekarNumber2)
                .forEachOrdered(System.out::println);
        long end = System.nanoTime();

        System.out.printf("processing time(ms): %d\n", (end - start) / 1_000_000);
        System.out.println("--------------------");
    }
}

In my environment, the result is as follows.

--------------------
Kaprekar number definition 2
0
495
6174
549945
631764
63317664
97508421
554999445
processing time(ms): 60284
--------------------

It takes a little over a minute. I'd like to make it a little faster, but how fast would it be with parallel streams?

Example of using parallel streams

Next, let's look at an example of using parallel streams.

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ParallelStreamSample {
    //Whether it is the Kaprekar number (definition 2)
    private static boolean isKaprekarNumber2(long n) {
        List<Character> chars = String.valueOf(n)
                                    .chars()
                                    .mapToObj(c -> (char)c)
                                    .collect(Collectors.toList());
        // min
        chars.sort(Comparator.naturalOrder());
        var min = parseLong(join(chars));

        // max
        chars.sort(Comparator.reverseOrder());
        var max = parseLong(join(chars));

        return n == (max - min);
    }

    private static <T> String join(List<T> list) {
        var sb = new StringBuilder();
        for(T item : list) {
            sb.append(item);
        }
        return sb.toString();
    }

    private static long parseLong(String s) {
        if(s.isEmpty()) {
            return 0;
        } else {
            return Long.parseLong(s);
        }
    }

    public static void main(String[] args) {
        System.out.println("--------------------");
        System.out.println("Kaprekar number definition 2");
        long start = System.nanoTime();
        LongStream.rangeClosed(0, 600_000_000)
                //Parallel stream
                .parallel()
                .filter(n -> n % 9 == 0)
                .filter(ParallelStreamSample::isKaprekarNumber2)
                .forEachOrdered(System.out::println);
        long end = System.nanoTime();

        System.out.printf("processing time(ms): %d\n", (end - start) / 1_000_000);
        System.out.println("--------------------");
    }
}

Apart from the comment, only one line changed: `sequential()` became `parallel()`. That is all it takes to use a parallel stream instead of a sequential one.

This has the following results in my environment.

--------------------
Kaprekar number definition 2
0
495
6174
549945
631764
63317664
97508421
554999445
processing time(ms): 22366
--------------------

It takes about 22 seconds. The sequential stream took about 60 seconds, so the processing time dropped to well under half (roughly 37%). Wonderful.
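Before trusting a timing comparison like this, it helps to confirm that both variants return the same answer. The following is a minimal sketch of such a check on a much smaller range (0 to 1,000,000). The class name `SequentialVsParallelCheck`, the `find` helper, and the char-array rewrite of `isKaprekarNumber2` are assumptions made for this illustration, not the code measured above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class SequentialVsParallelCheck {
    // Same check as isKaprekarNumber2 in the article, rewritten on a char array
    // (an illustrative variant, not the measured code).
    static boolean isKaprekarNumber2(long n) {
        char[] digits = String.valueOf(n).toCharArray();
        Arrays.sort(digits); // ascending order gives the smallest arrangement
        String ascending = new String(digits);
        long min = Long.parseLong(ascending);
        // Reversing the ascending digits gives the largest arrangement.
        long max = Long.parseLong(new StringBuilder(ascending).reverse().toString());
        return n == max - min;
    }

    // Collects matches in encounter order, sequentially or in parallel.
    static List<Long> find(long upperInclusive, boolean parallel) {
        LongStream range = LongStream.rangeClosed(0, upperInclusive);
        if (parallel) {
            range = range.parallel();
        }
        return range
                .filter(n -> n % 9 == 0)
                .filter(SequentialVsParallelCheck::isKaprekarNumber2)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> sequential = find(1_000_000, false);
        List<Long> parallel = find(1_000_000, true);
        System.out.println(sequential);                  // [0, 495, 6174, 549945, 631764]
        System.out.println(sequential.equals(parallel)); // true
    }
}
```

The range is kept small so the check finishes quickly; the five values printed are the ones below one million in the results listed above.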

Why is it faster?

This is because a sequential stream runs on a single thread, and therefore on one core at a time, while a parallel stream splits the work across multiple cores. It's like cracking the whip on cores that were slacking off so that they do their share. (Just a mental image.)
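One way to see this for yourself is to record which threads actually execute the stream's work. The sketch below does that; the class name `WorkerThreadNames` and the helper `threadsUsed` are hypothetical names chosen for this illustration. The set of thread names you get in the parallel case depends on your machine, so no output is shown for it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class WorkerThreadNames {
    // Runs a trivial stream and returns the names of the threads that processed elements.
    static Set<String> threadsUsed(boolean parallel) {
        // A concurrent set, because several threads may add to it at once.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream range = IntStream.range(0, 1_000_000);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Sequential: only the calling thread does the work.
        System.out.println("sequential: " + threadsUsed(false));
        // Parallel: typically the calling thread plus fork/join worker threads.
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```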

About the conditions for using parallel streams

This sample is faster with just one line change, but it doesn't always work that way. To get the benefits of parallel streams, some conditions must be met.

The execution environment is multi-core

This is a basic premise. Parallel streams get faster by using multiple cores, so they will not be faster in a single-core environment. If anything, the overhead of parallel processing makes them slower.
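If you are unsure how many cores the JVM can see, a quick check like the following can help. This is a rough sketch: the class name `CoreCheck` and the helper `parallelLikelyToHelp` are made up for illustration, and the reference to the common fork/join pool assumes the default setup, where parallel streams run their work on that pool.

```java
import java.util.concurrent.ForkJoinPool;

class CoreCheck {
    // A deliberately simple rule of thumb: more than one core is a precondition, not a guarantee.
    static boolean parallelLikelyToHelp(int cores) {
        return cores > 1;
    }

    public static void main(String[] args) {
        // Number of processors available to this JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        // Target parallelism of the common pool (assumed here to be the pool parallel streams use).
        int parallelism = ForkJoinPool.commonPool().getParallelism();

        System.out.println("available processors:    " + cores);
        System.out.println("common pool parallelism: " + parallelism);
        System.out.println("parallel stream may help: " + parallelLikelyToHelp(cores));
    }
}
```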

The CPU is the bottleneck

This is also a basic premise. The idea is to go faster by using all of the CPU's resources, so it is pointless if something other than the CPU (disk or network I/O, for example) is the bottleneck.

Must be a thread-safe implementation

Running on multiple cores means running on multiple threads, so the processing must be thread-safe. The `isKaprekarNumber2()` method above is thread-safe because it depends only on its argument, but let's see what happens with a non-thread-safe implementation that depends on a field instead. (It's a contrived example...)

import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ThreadUnsafeSample {
    public static void main(String[] args) {
        System.out.println("--------------------");
        System.out.println("Kaprekar number definition 2");
        long start = System.nanoTime();

        var myNum = new MyNumber();
        LongStream.rangeClosed(0, 600_000_000)
                .parallel()
                .filter(n -> n % 9 == 0)
                .filter(n -> {
                    //Not thread-safe because it reads and writes fields without exclusive control
                    myNum.setNum(n);
                    return myNum.isKaprekarNumber2();
                })
                .forEachOrdered(System.out::println);
        long end = System.nanoTime();

        System.out.printf("processing time(ms): %d\n", (end - start) / 1_000_000);
        System.out.println("--------------------");
    }

    private static class MyNumber {
        private long num;

        private void setNum(long num) {
            this.num = num;
        }

        //Whether it is the Kaprekar number (definition 2)
        private boolean isKaprekarNumber2() {
            List<Character> chars = String.valueOf(num)
                                        .chars()
                                        .mapToObj(c -> (char)c)
                                        .collect(Collectors.toList());
            // min
            chars.sort(Comparator.naturalOrder());
            var min = parseLong(join(chars));

            // max
            chars.sort(Comparator.reverseOrder());
            var max = parseLong(join(chars));

            return num == (max - min);
        }

        private <T> String join(List<T> list) {
            var sb = new StringBuilder();
            for(T item : list) {
                sb.append(item);
            }
            return sb.toString();
        }

        private long parseLong(String s) {
            if(s.isEmpty()) {
                return 0;
            } else {
                return Long.parseLong(s);
            }
        }
    }
}

In my environment, I got the following clearly incorrect result:

--------------------
Kaprekar number definition 2
processing time(ms): 23902
--------------------

As you can see, using a parallel stream with a non-thread-safe implementation does not work as expected, so be careful. Of course, when I deleted `parallel()` or replaced it with `sequential()` and ran it as a sequential stream, the result was correct. (Setting aside that it is slow...)

--------------------
Kaprekar number definition 2
0
495
6174
549945
631764
63317664
97508421
554999445
processing time(ms): 75049
--------------------

No exclusive control (locking) is performed

Continuing from the above, if the reads and writes of the field are placed under exclusive control, the result becomes correct.

.filter(n -> {
    synchronized(myNum) {
        myNum.setNum(n);
        return myNum.isKaprekarNumber2();    
    }
})

The following is the execution result.

--------------------
Kaprekar number definition 2
0
495
6174
549945
631764
63317664
97508421
554999445
processing time(ms): 90823
--------------------

The output is correct, but it is about 1.5 times slower than the original sequential implementation. This is because the exclusive control means the work effectively runs on one thread at a time, with the overhead of parallel execution added on top. That defeats the purpose of using a parallel stream.
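One way to avoid both the race and the lock is to stop sharing mutable state at all. The sketch below is an assumed variant of the `MyNumber` example in which the field is `final` and a fresh instance is created per element, so there is nothing to synchronize. The class name `ImmutableNumberSample` and the `find` helper are hypothetical, and the range is reduced to 1,000,000 to keep the run short; I have not measured it on the full 600 million.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class ImmutableNumberSample {
    // Immutable: the value is fixed at construction, so instances are safe to use from any thread.
    static final class MyNumber {
        private final long num;

        MyNumber(long num) {
            this.num = num;
        }

        // Whether it is a Kaprekar number (definition 2)
        boolean isKaprekarNumber2() {
            char[] digits = String.valueOf(num).toCharArray();
            Arrays.sort(digits);
            String ascending = new String(digits);
            long min = Long.parseLong(ascending);
            long max = Long.parseLong(new StringBuilder(ascending).reverse().toString());
            return num == max - min;
        }
    }

    static List<Long> find(long upperInclusive) {
        return LongStream.rangeClosed(0, upperInclusive)
                .parallel()
                .filter(n -> n % 9 == 0)
                // Each element gets its own object; no shared field, no synchronized block.
                .filter(n -> new MyNumber(n).isKaprekarNumber2())
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(find(1_000_000)); // [0, 495, 6174, 549945, 631764]
    }
}
```

This is essentially the same design as the original static `isKaprekarNumber2(long n)`: the result depends only on the input, which is what lets the threads run without waiting on each other.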

Summary

If you use parallel streams well, you can expect a significant increase in performance. However, there are many conditions to obtain the effect, so use it with caution.

Digression

About the following part

// max
chars.sort(Comparator.reverseOrder());

Actually, the following is probably faster, since chars is already sorted in ascending order at this point.

// max
Collections.reverse(chars);

However, fixing it is a hassle and it doesn't affect the main point much, so I have left it as it is.
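For reference, here is a small sketch showing that the two approaches produce the same digit order when the list is already sorted ascending. The class name `ReverseVsSort` and its helper methods are assumptions for this illustration; it checks only equivalence and makes no claim about how much faster `Collections.reverse` is.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ReverseVsSort {
    // Digits of n as a list, sorted ascending (the state of chars after the "min" step).
    static List<Character> sortedDigits(long n) {
        List<Character> chars = String.valueOf(n)
                .chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
        chars.sort(Comparator.naturalOrder());
        return chars;
    }

    // The article's approach: sort again in descending order.
    static List<Character> maxBySort(long n) {
        List<Character> chars = sortedDigits(n);
        chars.sort(Comparator.reverseOrder());
        return chars;
    }

    // The alternative: reverse the already sorted list in place.
    static List<Character> maxByReverse(long n) {
        List<Character> chars = sortedDigits(n);
        Collections.reverse(chars);
        return chars;
    }

    public static void main(String[] args) {
        System.out.println(maxBySort(6174));    // [7, 6, 4, 1]
        System.out.println(maxByReverse(6174)); // [7, 6, 4, 1]
    }
}
```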
