Let's consider the meaning of "stream" and "collect" in Java's Stream API.

## Introduction

Java introduced a functional programming facility called the Stream API quite a long time ago. Functional programming was already widespread by then and Java was a fairly late adopter, but I think it was introduced anyway because, when used well, it lets you write highly readable code efficiently. (Reference: "[Functional programming for mediocre programmers](https://anopara.net/2016/04/14/%E5%B9%B3%E5%87%A1%E3%81%AA%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9E%E3%81%AB%E3%81%A8%E3%81%A3%E3%81%A6%E3%81%AE%E9%96%A2%E6%95%B0%E5%9E%8B%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3/)")

However, the other day in a lecture I heard the opinion that "Java's collection operations (the Stream API) feel forced compared with pure functional languages such as Haskell, and I don't want to use them." Specifically, in a functional language `list.map(...)` is enough, whereas in Java you have to write the following every time:

**`list.stream().map(/* */).collect(Collectors.toList())`**

The complaint was that having to insert `stream()` and `collect(...)` each time makes for a redundant writing style.

I myself prefer to use the Stream API, since it is better than nothing, but I can see that some aspects of it might be hard to accept for people who like functional languages.

So why does Java not adopt a simple style like `list.map(...)`, and instead make you call `stream()` every time to convert to a separate type called Stream, and then convert back again with `collect`? Java has a culture of careful language design, so whatever one thinks of the result, there must be some reason behind it. I think there are two main reasons:

 1. Lazy evaluation
 2. Object-oriented constraints

 Below, I will give my thoughts in detail.

## What is lazy evaluation?
Generally, when performing collection operations, rebuilding the collection at every step along the way is wasteful and raises performance concerns.
To prevent this, even though the code looks as if the collection is transformed step by step, the result should actually be generated in one go at the end. This way of computing, in which a value is not calculated until it is needed, is called **lazy evaluation**.
For example:

```java
List<String> list = Arrays.asList("foo", "bar", "hoge", "foo", "fuga");
list.stream()
  .filter(s -> s.startsWith("f"))
  .map(s -> s.toUpperCase())
  .collect(Collectors.toSet()); // ["FOO", "FUGA"]
```

In a collection operation like this one, which

- extracts the strings starting with "f",
- converts the strings to uppercase, and
- removes duplicates (converts to a Set),

no new collection with 3 elements is created at the moment `filter` is called. The actual collection is generated only when `collect(Collectors.toSet())` is called at the end.
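The claim that elements flow through the whole pipeline one at a time, rather than each step producing its own collection, can be observed with a small trace. This is only a sketch: the class name `LazyTraceExample` and the `trace` helper are made up for illustration, and writing to a log list from inside the lambdas is done purely for demonstration on a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class LazyTraceExample {
    // Hypothetical helper: records the order in which filter and map see each element.
    static List<String> trace(List<String> input) {
        List<String> log = new ArrayList<>();
        Set<String> result = input.stream()
            .filter(s -> {
                log.add("filter:" + s);
                return s.startsWith("f");
            })
            .map(s -> {
                log.add("map:" + s);
                return s.toUpperCase();
            })
            .collect(Collectors.toSet());
        log.add("result size:" + result.size());
        return log;
    }

    public static void main(String[] args) {
        trace(Arrays.asList("foo", "bar", "hoge", "foo", "fuga"))
            .forEach(System.out::println);
    }
}
```

The log comes out interleaved (`filter:foo`, `map:foo`, `filter:bar`, `filter:hoge`, ...) rather than as five filter entries followed by three map entries, which is what one would see if `filter` had first built a 3-element collection.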

Incidentally, strictly speaking this apparently should not be called lazy evaluation, but since there is no other suitable name, I will call it lazy evaluation in this article. (Reference: "What is lazy evaluation")

The opposite of lazy evaluation is **strict evaluation**: a value is computed on the spot, even if it turns out not to be needed. This is the more common approach.
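For comparison, here is a rough sketch of the same conversion under strict evaluation. `filterStrict` and `mapStrict` are hypothetical helpers written for this illustration, not part of the standard library; each one builds a complete new list as soon as it is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class StrictEvaluationExample {
    // Hypothetical strict filter: returns a fully built list immediately.
    static <T> List<T> filterStrict(List<T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T elem : source) {
            if (predicate.test(elem)) {
                result.add(elem);
            }
        }
        return result;
    }

    // Hypothetical strict map: also returns a fully built list immediately.
    static <T, R> List<R> mapStrict(List<T> source, Function<? super T, ? extends R> mapper) {
        List<R> result = new ArrayList<>();
        for (T elem : source) {
            result.add(mapper.apply(elem));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("foo", "bar", "hoge", "foo", "fuga");
        // Each step materializes an intermediate collection.
        List<String> filtered = filterStrict(list, s -> s.startsWith("f"));
        List<String> upper = mapStrict(filtered, s -> s.toUpperCase());
        Set<String> result = new HashSet<>(upper);
        System.out.println(filtered.size()); // 3: the intermediate list exists right away
        System.out.println(result.size());   // 2: duplicates removed
    }
}
```

Here two intermediate lists are created before the final Set, which is the cost that lazy evaluation is meant to avoid.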

### Disadvantages of lazy evaluation

While lazy evaluation has benefits, it also has unexpected pitfalls if used carelessly. The following example (not a style I would recommend) shows how lazy evaluation can cause a gap between what the code appears to do and what actually happens when it runs.

```java
// Input and output data
List<String> input = Arrays.asList("foo", "bar", "hoge", "foo", "fuga");
List<String> copy1 = new ArrayList<>();
List<String> copy2 = new ArrayList<>();

// Start the collection operation: call filter
Stream<String> stream = input.stream()
    .filter(s -> {
        copy1.add(s);
        return s.startsWith("f");
    });
System.out.println(copy1.size()); // filter has not actually been evaluated yet, so copy1 is still empty: prints 0
System.out.println(copy2.size()); // copy2 is of course still empty: prints 0

// Next, call map
stream = stream
    .map(s -> {
        copy2.add(s);
        return s.toUpperCase();
    });
System.out.println(copy1.size()); // filter has still not been evaluated: prints 0
System.out.println(copy2.size()); // likewise, map has not been evaluated: prints 0

stream.collect(Collectors.toList());
System.out.println(copy1.size()); // stream.collect finally evaluates filter: prints 5
System.out.println(copy2.size()); // map is evaluated as well: prints 3
```

At first glance, the code above looks as if `copy1` and `copy2` grow when `filter` and `map` are called. In fact, they grow only when `stream.collect` is called. When the apparent timing and the actual evaluation timing differ like this, debugging and finding the cause of a problem can become difficult.

### How to balance

Lazy evaluation risks introducing subtle bugs if misused. On the other hand, if lazy evaluation is not used at all, you risk creating collections needlessly and hurting performance.

In Java's case, it is common to assume that a back end may handle large amounts of data, so the latter risk should be avoided, which means lazy evaluation has to be introduced. What's more, it would be preferable for lazy evaluation to be applied naturally (?), so that strict evaluation is not used by accident.

However, allowing lazy evaluation across a wide range of Java's standard features would be risky. I therefore think the idea was to confine lazy evaluation to one specific type and not use it in any other type.

### Appearance of "Stream"

[Stream](https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%88%E3%83%AA%E3%83%BC%E3%83%A0_(%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0))

A stream is an abstract data type that treats data as "something that flows": data flowing in is handled as input, and data flowing out is handled as output.

As described above, the one type that is allowed to perform lazy evaluation was given the name "**Stream**". The collection operation API is accordingly named the "Stream API", and the idea is that whenever you want to perform a collection operation, you call `stream()`, just as the name suggests.

By doing so, I believe lazy evaluation was made mandatory in order to avoid the risk of performance degradation caused by strict evaluation.
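One consequence of confining lazy evaluation to the Stream type is that a stream can describe a source that could never be materialized as a collection. The following is a minimal sketch using only the standard `Stream.iterate`, `filter`, and `limit`; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamExample {
    // Takes the first `count` multiples of 3 from the unbounded sequence 1, 2, 3, ...
    static List<Integer> firstMultiplesOfThree(int count) {
        return Stream.iterate(1, n -> n + 1)   // conceptually infinite source
            .filter(n -> n % 3 == 0)           // evaluated only for elements that are pulled
            .limit(count)                      // stops pulling once enough elements are found
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstMultiplesOfThree(4)); // [3, 6, 9, 12]
    }
}
```

Under strict evaluation, the `filter` step would have to process the entire infinite sequence before `limit` could run, so this pipeline only terminates because nothing is computed until `collect` asks for it.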

## Object-oriented constraints

Another reason for going through a stream is object-oriented constraints. (Strictly speaking, this is a constraint that comes from how types are handled rather than from object orientation as such, but since functional and object-oriented styles are often contrasted, I use the term "object-oriented" here.) Suppose a default `map` method were defined on the List type:

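```java
// Hypothetical: imagine java.util.List had a default map method
// (the existing List methods are omitted here)
interface List<E> extends Collection<E> {
    default <R> List<R> map(Function<? super E, ? extends R> mapper) {
        List<R> result = new ArrayList<>();
        for (E elem : this) {
            result.add(mapper.apply(elem));
        }
        return result;
    }
}
```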

With this, you can, for the time being, convert a List with `list.map(...)`. If methods such as `filter` and the other collection types were implemented the same way, collections could be converted concisely without going through a stream.

However, this approach has a serious drawback: collection types that are not part of the standard library.

For example, suppose a developer creates `MyList`, which implements the List interface and adds its own method `doSomething`. If a `MyList` is converted with the `map` above, the result is some other List type, and `doSomething` can no longer be called.

```java
MyList<String> myList = new MyList<>(); // element type chosen only for illustration
// (omitted)
myList.doSomething(); // OK
myList.map(x -> new AnotherType(x)).doSomething(); // Compile error: map returns a plain List
```

This is a challenge when bringing functional programming into an object-oriented language. In practice such cases are rare, so one might be tempted not to worry about them, but given the nature of the Java language this was probably unacceptable.
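The loss of the concrete type can be reproduced in runnable form. This is a sketch under stated assumptions: `MyList` here is a hypothetical subclass of `ArrayList`, and the static `map` helper stands in for the hypothetical `List.map` default method, since the real `java.util.List` cannot be modified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class TypeLossExample {
    // Hypothetical user-defined list with an extra method.
    static class MyList<E> extends ArrayList<E> {
        String doSomething() {
            return "did something with " + size() + " element(s)";
        }
    }

    // Stand-in for the hypothetical List.map: it can only promise a plain List.
    static <E, R> List<R> map(List<E> source, Function<? super E, ? extends R> mapper) {
        List<R> result = new ArrayList<>();
        for (E elem : source) {
            result.add(mapper.apply(elem));
        }
        return result;
    }

    public static void main(String[] args) {
        MyList<String> myList = new MyList<>();
        myList.add("foo");
        System.out.println(myList.doSomething()); // fine: the static type is MyList

        List<Integer> mapped = map(myList, String::length);
        // mapped.doSomething();                  // would not compile: List has no doSomething
        System.out.println(mapped instanceof MyList); // false: the result is a plain ArrayList
    }
}
```

The library-side `map` has no way of knowing that the caller wanted a `MyList` back, which is why the choice of result type has to be handed to the caller.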

Scala, for its part, has overcome this difficulty by means of implicit type resolution. This is described in the book introduced in Supplement A below, so please take a look if you are interested.

### Appearance of "Collector"

For the reasons above, once a collection conversion starts, a new object different from the original collection has to be generated, and specifying which collection to generate is the responsibility of the caller, not the library. "**Collector**" takes on this role: through `Stream.collect`, the caller specifies which collection type to convert to. The source code below is the implementation of `Collectors.toList`.

```java
public static <T>
Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}
```

For a collection type of your own such as `MyList`, if you provide a method that creates a Collector instance in the same way, converting from `MyList` to `MyList` becomes possible. This allows collection types created before the Stream API to be used with the Stream API without major changes.
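As a rough sketch of what such a method could look like, the following defines a hypothetical `MyList` and a hypothetical factory `toMyList()` built with the standard `Collector.of`. The names are assumptions for illustration; the structure simply mirrors the supplier, accumulator, and combiner seen in `Collectors.toList`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.stream.Collector;

class MyListCollectorExample {
    // Hypothetical user-defined list with an extra method.
    static class MyList<E> extends ArrayList<E> {
        String doSomething() {
            return "MyList of " + size();
        }

        // Hypothetical counterpart of Collectors.toList() that produces a MyList.
        static <T> Collector<T, ?, MyList<T>> toMyList() {
            return Collector.<T, MyList<T>>of(
                MyList::new,                                            // supplier: empty result container
                MyList::add,                                            // accumulator: add one element
                (left, right) -> { left.addAll(right); return left; }); // combiner: merge partial results
        }
    }

    public static void main(String[] args) {
        MyList<String> source = new MyList<>();
        source.addAll(Arrays.asList("foo", "bar"));

        // The caller chooses the result type by passing the collector.
        MyList<String> upper = source.stream()
            .map(String::toUpperCase)
            .collect(MyList.toMyList());

        System.out.println(upper);               // [FOO, BAR]
        System.out.println(upper.doSomething()); // MyList of 2
    }
}
```

For a simple case like this, the standard `Collectors.toCollection(MyList::new)` would achieve the same result without writing a collector by hand.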

### Incidentally, why the Stream type has no toList method

Gathering collection conversions into the Collector type does make things tidier, but types such as List come up constantly in everyday code. At the very least, it seems it would be fine to be able to write `stream.toList()` instead of `stream.collect(Collectors.toList())`. I think the reason this was not done is probably type dependencies. Collection types necessarily refer to the Stream type, but if the Stream type in turn referred to Collection types, the two would depend on each other, which is undesirable in type design. (Java 16 later added `Stream.toList()`, which returns an unmodifiable list; the discussion here concerns the API before that.)

## Let's chant the magic and use it safely

As described above, I think that after weighing various trade-offs and consistency concerns, things settled into the form `list.stream().map(/* */).collect(Collectors.toList())`, a collection operation that can look redundant.

In a sense, I think it's a **very Java-like conclusion**.

It seems there are mysterious projects out there that forbid the Stream API in Java because it is "dangerous", but since it is designed with safety in mind like this, there is no need to be anxious about using it normally. If you chant `stream` and `collect` according to the standard pattern, no problem will occur unless you do something quite unusual.

## Conclusion

Except for those who are particular about pure functional languages, I think the redundancy is quite tolerable. If you can use functional programming well, you will be able to write highly readable code efficiently. If you haven't used it yet, please give it a try. (Reference: Introduction to Java Stream API)

## Supplement A. Languages other than Java

I'm not very familiar with them, but for reference, here is what some other languages provide.

### C#

LINQ in C# uses lazy evaluation, like Java. Unlike Java, you don't have to call anything like `stream()` to start, and when collecting you often only need to call, for example, `ToList`, so it's much more concise and convenient than Java. (Reference: "[Miscellaneous notes] LINQ and lazy evaluation") (Reference: "C#er naturally knows!? Advantages and disadvantages of LINQ lazy evaluation")

### Scala

Scala is also a functional language, and it lets you choose between lazy and strict evaluation as appropriate. For example, `list.map(...)` converts to another collection by strict evaluation. You can also convert lazily using `view` and `force`, as in `list.view.map(...).filter(...).force`. (Reference: "Create a generator with Scala and evaluate lazy evaluation")

Apparently there was a time long ago when strict and lazy evaluation were hard to tell apart and this caused confusion, but at some point the boundary was clarified, and it seems that only the following two types were kept as targets of lazy evaluation:

- View
- Stream

As for Scala, the book "Scala Scalable Programming" covers this in terrifying detail, so if you are interested, please have a look.

### JavaScript

JavaScript's standard collection operations have no lazy evaluation. `Array.prototype` provides standard collection operation APIs such as `map` and `filter`, but they are all strictly evaluated. This is probably because JavaScript, being used on the client side, does not handle large amounts of data, so lazy evaluation is not needed as standard equipment.

### Haskell

Haskell is a purely functional language, and it seems to be a rather different breed from the languages listed above. Whereas ordinary languages are based on strict evaluation, Haskell is based on lazy evaluation. So the kind of balancing between strict and lazy evaluation that this article worries about does not seem to be an issue there. (Reference: "Strict evaluation and lazy evaluation (details)")

### Others (PHP, Ruby, Python, etc.)

I will look into these at some point.

## Supplement B. Option to use another library

In addition to the Java standard library, there is a collection library called Eclipse Collections. It lets you write concisely what is redundant with the Stream API. (Reference: "I touched Eclipse Collections") (Reference: "Eclipse Collections cheat sheet")

It also provides `ImmutableList`, an immutable List interface, and is a library with a stronger functional flavor. If you want more functional collection operations than the Stream API offers, I think adopting it is an option.

However, completely replacing the Stream API with Eclipse Collections would take a lot of work. When adopting it, the account from a team that actually did so, "[Framework support for instilling Eclipse Collections in the field](https://speakerdeck.com/jflute/how-unext-took-in-eclipse-collections-in-fw)", will be helpful.