In "Java 8 Stream Rough Summary" I introduced most of the Stream operations, but the reduction operations seem to need a more detailed explanation, so I will write about them here. This article is basically based on the Javadoc of `Stream`.
A reduction operation combines all of the elements in a stream into a single result by repeatedly applying an accumulation function. In Japanese, reduction is sometimes translated as "convolution" (畳み込み), but it is not convolution in the usual mathematical sense; it is closer to a summation or a product (or rather, a summation or a product is one kind of reduction). Reduction operations include simple reductions, which return a single value, and mutable reductions, which return a container holding multiple values (such as a Collection).
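As a quick illustration (a minimal example of my own, not from the Javadoc), summing the integers 1 through 4 is a reduction: repeated addition collapses all the elements into a single value.

```java
import java.util.stream.Stream;

public class SumReduction {
    public static void main(String[] args) {
        // A reduction: fold all elements into one value with an accumulation function (+).
        int sum = Stream.of(1, 2, 3, 4)
                        .reduce(0, (a, b) -> a + b); // 0+1+2+3+4 = 10
        System.out.println(sum); // 10
    }
}
```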
The `reduce` method returns the result of accumulating the elements with an accumulation function (`accumulator`). There are three overloads of the `reduce` method:
Method |
---|
`T reduce(T identity, BinaryOperator<T> accumulator)` |
`Optional<T> reduce(BinaryOperator<T> accumulator)` |
`<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)` |
`T`: the type of the elements in the stream
`accumulator` is a function (interface) that computes the cumulative result by repeatedly combining elements. Inside the `reduce` method, `accumulator.apply` is called to combine two values into an intermediate result, and this is repeated on the intermediate results until the final result is obtained.
Therefore, `accumulator.apply` must be an operation for which the associative law holds.
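Conceptually, the two-argument `reduce(identity, accumulator)` on a sequential stream behaves like the following loop. This is only a simplified sketch for intuition (the names are my own), not the actual JDK implementation, which also handles parallel splitting:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceSketch {
    // Simplified sequential equivalent of stream.reduce(identity, accumulator).
    static <T> T reduce(List<T> elements, T identity, BinaryOperator<T> accumulator) {
        T result = identity;
        for (T element : elements) {
            // Each step combines the intermediate result with the next element.
            result = accumulator.apply(result, element);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reduce(Arrays.asList(1, 2, 3, 4), 0, Integer::sum)); // 10
    }
}
```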
`identity` is the identity element of `accumulator.apply`.
Mathematical definition of identity element
For any element a in a set with a binary operation * defined on it, an element e that satisfies the following property is called an identity element.
a * e = e * a = a
Expressed in Java terms
An element `identity` that satisfies the following property for every element a in the stream is an identity element.
accumulator.apply(identity, a) == accumulator.apply(a, identity) == a
To give a concrete example, the identity element of + (addition) on the real numbers is 0, and the identity element of * (multiplication) is 1.
When a = 2
0 + 2 = 2 + 0 = 2
1 * 2 = 2 * 1 = 2
If the stream has no elements, the identity value itself is returned as the result of the reduction. You can choose not to specify an identity element (the second overload), but you should specify one whenever possible, because it simplifies the processing inside `reduce` (especially for parallel processing).
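For example (a small sketch of my own, not one of the article's examples), reducing an empty stream with an identity element returns that identity, while the overload without an identity wraps its result in an `Optional`, which is empty here:

```java
import java.util.Optional;
import java.util.stream.Stream;

public class EmptyStreamReduce {
    public static void main(String[] args) {
        // With an identity: the identity itself comes back for an empty stream.
        int sum = Stream.<Integer>empty().reduce(0, Integer::sum);
        System.out.println(sum); // 0

        // Without an identity: the result is wrapped in an Optional,
        // which is empty when the stream has no elements.
        Optional<Integer> maybeSum = Stream.<Integer>empty().reduce(Integer::sum);
        System.out.println(maybeSum.isPresent()); // false
    }
}
```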
`combiner` is a function (interface) for merging cumulative results.
In a parallel stream, it combines the results computed by each thread.
If you do not specify a `combiner` (the first and second overloads), this merging is also done by the `accumulator`. For a sequential stream, the `combiner` is not used even if you specify one.
Like the `accumulator`, the `combiner` must have `identity` as its identity element and must satisfy the associative law, and it must also satisfy the following compatibility condition with the `accumulator`:
combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)
A `combiner` is needed when the function used to combine elements differs from the function used to merge the cumulative results.
The sum of squares is one such case.
For a sum of squares, two elements are combined as $a^2 + b^2$, but if the same function were also used to merge the partial results, you would get $(a^2 + b^2)^2 + (c^2 + d^2)^2$.
Therefore, the reduction for a sum of squares can be realized by making the `combiner` a simple addition, as in the following code:
Stream.iterate(1, x->x+1)
.limit(4)
.parallel()
.reduce(0, (x,y)->x*x + y*y, (x,y)->x+y);
This computes the sum of squares of 1 through 4, so the result is 30.
However, when the `combiner` is not used, the result is different.
Since it is hard to observe the order of evaluation in parallel processing, change this to sequential processing and run it: the result becomes 1172.
That is the result of the following calculation:
$(((0^2+1^2)^2+2^2)^2+3^2)^2+4^2 = 1172$
However, this kind of processing can be expressed more simply using `map`.
For a sum of squares, all you have to do is square each element with `map` and then reduce with a plain sum.
Stream.iterate(1, x->x+1)
.limit(4)
.parallel()
.map(x->x*x)
.reduce(0, (x,y)->x+y);
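The three-argument `reduce` is also the one to reach for when the result type differs from the element type. As a sketch of my own (not an example from the original article), the following sums the lengths of strings: the accumulator adds one element's length to an `int` partial sum, and the combiner adds two partial sums, which satisfies the compatibility condition above.

```java
import java.util.stream.Stream;

public class LengthSumReduce {
    public static void main(String[] args) {
        // U (Integer) differs from T (String), so a combiner is mandatory:
        // the accumulator adds one element's length to a partial sum,
        // the combiner merges two partial sums.
        int totalLength = Stream.of("foo", "barbaz", "qux")
                                .parallel()
                                .reduce(0,
                                        (len, s) -> len + s.length(),
                                        Integer::sum);
        System.out.println(totalLength); // 12
    }
}
```

In practice `mapToInt(String::length).sum()` would be the simpler way to write this; the point here is only to show the shape of the three-argument overload.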
The `collect` method returns a mutable result container instead of a single value.
There are two overloads of the `collect` method.
Method |
---|
`<R> R collect(Supplier<R> supplier, BiConsumer<R,? super T> accumulator, BiConsumer<R,R> combiner)` |
`<R,A> R collect(Collector<? super T,A,R> collector)` |
With the first overload, you define each operation on the result container yourself: you pass the create, add, and merge operations, and each one is used in the reduction.
`supplier` is the operation that creates an instance of the result container.
You will usually pass the result container class's `::new` (a constructor reference), but if you have a factory class you can also pass its creation method.
For parallel processing, `supplier` is called multiple times, and it must create a new instance each time.
`accumulator` is the operation that adds one element to the result container. For Collection-type containers, the `add` method fits this role.
It must be an operation for which the associative law holds.
`combiner` is the operation that merges two result containers.
For parallel processing, the result containers created in each thread are merged by the `combiner`.
For Collection-type containers, the `addAll` method fits this role. As with ordinary reduction, the associative law must hold and the operation must be compatible with the `accumulator`.
It is also not used for sequential streams.
For example, a reduction that stores the elements of the stream in an `ArrayList` and returns it looks like this:
Stream.iterate(1, x->x+1)
.limit(10)
.parallel()
.collect(ArrayList::new,
Collection::add,
Collection::addAll);
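The result container does not have to be a Collection. As another sketch (the class name is mine; the idiom itself appears in the `Stream.collect` Javadoc), the same three-argument `collect` can accumulate strings into a `StringBuilder`:

```java
import java.util.stream.Stream;

public class StringBuilderCollect {
    public static void main(String[] args) {
        // supplier:    create a new StringBuilder (a fresh one per thread in parallel)
        // accumulator: append one element to the container
        // combiner:    append one container's contents to another
        StringBuilder sb = Stream.of("a", "b", "c", "d")
                                 .parallel()
                                 .collect(StringBuilder::new,
                                          StringBuilder::append,
                                          StringBuilder::append);
        System.out.println(sb); // abcd
    }
}
```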
A `Collector` is an object that encapsulates the `supplier`, `accumulator`, and `combiner` operations. `Collector` itself is an interface; in addition to the three operations above, it defines a `characteristics` method that returns the collector's set of characteristics and a `finisher` method that performs the final conversion of the result.
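One way to create your own Collector is the `Collector.of` factory method. The following is a sketch of my own (the class name and output format are assumptions, not from the original article): it joins integers into a `String`, using a `StringBuilder` as the intermediate container and `toString` as the `finisher`.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CustomCollectorExample {
    public static void main(String[] args) {
        // supplier / accumulator / combiner plus a finisher that converts
        // the intermediate StringBuilder into the final String result.
        Collector<Integer, StringBuilder, String> joinNumbers =
                Collector.of(StringBuilder::new,                  // supplier
                             (sb, i) -> sb.append(i).append(','), // accumulator
                             StringBuilder::append,               // combiner (BinaryOperator)
                             StringBuilder::toString);            // finisher

        String result = Stream.of(1, 2, 3).collect(joinNumbers);
        System.out.println(result); // 1,2,3,
    }
}
```

There is also an overload of `Collector.of` without a finisher, for the case where the intermediate container type and the result type are the same.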
Although you can create your own Collector in this way, the `Collectors` class already provides many static methods that return useful collectors.
For example, the list-building process from before can be written with the `toList` method as follows.
Stream.iterate(1, x->x+1)
.limit(10)
.parallel()
.collect(Collectors.toList());
However, `toList` does not guarantee which List implementation is returned (it happened to be an `ArrayList` when I ran the code above).
If you want to choose the Collection implementation yourself, there is the `toCollection` method.
Stream.iterate(1, x->x+1)
.limit(10)
.parallel()
.collect(Collectors.toCollection(ArrayList::new));
The `toCollection` method takes only the `supplier` (the creation operation).
Any implementation class of `Collection` can be used, so it could just as well be a `HashSet` (although for Sets there is also a dedicated `toSet` method).
The `Collectors` class offers many convenient methods like these, so mutable reduction is easy to use by relying on them.
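For instance (small examples of my own, not from the original article), `Collectors.joining` concatenates strings and `Collectors.groupingBy` groups elements into a `Map` of Lists:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectorsExamples {
    public static void main(String[] args) {
        // Concatenate the elements with a delimiter.
        String joined = Stream.of("a", "b", "c")
                              .collect(Collectors.joining(", "));
        System.out.println(joined); // a, b, c

        // Group numbers into a Map keyed by even/odd.
        Map<Boolean, List<Integer>> byParity = Stream.of(1, 2, 3, 4, 5)
                .collect(Collectors.groupingBy(x -> x % 2 == 0));
        System.out.println(byParity); // e.g. {false=[1, 3, 5], true=[2, 4]}
    }
}
```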