[Java] Reasons to avoid for, and alternatives such as map, filter, and reduce

Introduction

I don't know how many times this topic has been rehashed, but I will explain why for (the traditional for statement, plus the enhanced for and forEach, which share some of the same problems) should be avoided, and what the alternatives are. The following is based on my experience writing business-logic code for API servers, mainly in the Web industry, so it is biased toward that field (I have no intention of criticizing every for in the world). I apologize that the title paints with a rather broad brush.

The code examples are mainly written in Java, but the content itself should largely apply to other languages as well.

Reasons to avoid for

First, consider the traditional for statement. The well-known syntax is:

for (int i = 0; i < array.length; ++i) {
  ...
}

This syntax exists in various languages such as C, C++, Java, C#, and JavaScript (details such as types and how the array size is obtained differ).

This syntax has the following maintainability and readability[^1] issues:

  1. It is easy to get the loop subscript wrong, which leads to bugs
  2. High degree of freedom in description
     2.1. Need to write to a variable outside of for
     2.2. More than one intent can be written in a single loop
     2.3. `continue`/`break` is difficult to handle

Each content will be explained below.

1. It is easy to get the loop subscript wrong, which leads to bugs

This is a problem with the traditional for statement. One example is writing `i <= array.length` where `i < array.length` was intended in the code above. With nested loops, it is also easy to mix up the subscripts of the inner and outer loops. I suspect anyone who has actually written for statements has made this kind of mistake.

This problem can be solved by using `for of` in JavaScript or the enhanced for (`for (var item : array)`) in Java. I think similar syntax is generally available in newer programming languages of a similar paradigm (although with the standard features of C ...).
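As a minimal sketch of the difference, the two methods below compute the same sum. The class name `SubscriptExample` and the method names are made up for this illustration and do not come from any real code base.

```java
class SubscriptExample {
    // Traditional for: the bounds and the subscript are written by hand.
    // A typo such as `i <= values.length` would compile, but would throw
    // ArrayIndexOutOfBoundsException on the last iteration.
    static int sumWithIndex(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; ++i) {
            sum += values[i];
        }
        return sum;
    }

    // Enhanced for: there is no subscript, so there is nothing to get wrong.
    static int sumWithEnhancedFor(int[] values) {
        int sum = 0;
        for (var value : values) {
            sum += value;
        }
        return sum;
    }
}
```

The enhanced for removes the subscript mistake only; the other issues discussed in this article still apply to it.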

2. High degree of freedom in description

In the first place, for allows a high degree of freedom in what you write.[^2] Even somewhat irregular processing, such as "writing several loosely related processes in one loop" or "using a list that holds only the elements added up to the point where the loop was interrupted",[^3] can be written relatively easily. I think this ability to do anything shows its bad side and leads to lower maintainability and readability. Each problem is described below.

2.1. Need to write to a variable outside of for

Since some processing is performed in the loop, its result usually has to be output in some form. Leaving aside I/O that does not go through a variable, such as screen output, that output is produced by rewriting a variable defined before the loop. In other words, the code looks like this:

var output = new ArrayList<ExampleData>();
// ※1
for (var item: inputList){
  output.add(new ExampleData(item));
}
// ※2

The problem is that `output` can be reassigned or modified. In other words, you need to worry about the following:

- The result of some other processing written before the for (※1) may already be in `output`
- `output` may have been passed in from outside the method
- `output` may be rewritten by processing written after the for (※2)

In other words, you need to review the code before and after the for to understand `output`.

This would be fine if only this for were extracted as its own unit of processing, but when developing with several people, I don't think it is desirable that code can be written this way. Rewriting and reusing the same variable reduces maintainability and readability.

Another issue is the educational effect on implementers. Because writing to a variable outside the for is the normal data flow when using for, implementers may fall into the habit of writing code whose inputs and outputs are ambiguous.

By using `filter` or `map`, described later, you can force[^4] the processing to be described as input and output for each element of the array (or list).
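Looking ahead slightly, the loop above could be sketched with `map` roughly as follows. This assumes Java 16 or later (for `record`); `MapExample`, `convert`, and the shape of `ExampleData` are hypothetical stand-ins, since the original does not show how `ExampleData` or `inputList` are defined.

```java
import java.util.List;
import java.util.stream.Collectors;

class MapExample {
    // Hypothetical stand-in for the ExampleData type used in the loop above.
    record ExampleData(String value) {}

    // The input arrives as an argument and the output leaves as the return
    // value. No variable outside the pipeline is rewritten, and the returned
    // list is unmodifiable, so later code cannot add to it.
    static List<ExampleData> convert(List<String> inputList) {
        return inputList.stream()
                .map(ExampleData::new)
                .collect(Collectors.toUnmodifiableList());
    }
}
```

With this shape there is no ※1 or ※2 to review: the only way data gets into the result is through the pipeline itself.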

2.2. More than one intent can be written in a single loop

First, here is a code example.

var filteredUserIdList = new ArrayList<Id>();
var existUserError = false;
for (var user : userList){
  if(user.hasError()){
    existUserError  = true;
  }
  
  if(filterMatch(user)){
    filteredUserIdList.add(user.id);
  }
}

This code seeks the following results:

- Whether any user in the user list has an error (`existUserError`)
- The list of IDs of the users in the user list that match the filter (`filterMatch`)

This may not be the best example, as I simply wrote what came to mind. It is meant to show a mixture of processes that have little to do with each other beyond "using the same input data `userList`".

In my experience, once something is written as a for, the body tends to bloat unless someone has a strong will to refactor. In other words, processing with many different intents gets written inside the same for, and it becomes a huge for. From a maintainability standpoint, even when using for, I would like it to be split into multiple fors according to the intent of the processing ...[^5]
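One way to separate the two intents is to give each its own method, here using the Stream operations this article turns to later. This is only a sketch assuming Java 16 or later; the `User` record, its `active` field (standing in for `filterMatch(user)`), and the use of `Integer` in place of `Id` are all assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class SplitByIntentExample {
    // Hypothetical stand-in for the user type; the fields are assumptions.
    record User(int id, boolean hasError, boolean active) {}

    // Intent 1: does any user in the list have an error?
    static boolean existsUserError(List<User> userList) {
        return userList.stream().anyMatch(User::hasError);
    }

    // Intent 2: the IDs of the users that pass the filter.
    // `active` plays the role of filterMatch(user) in the loop above.
    static List<Integer> filteredUserIds(List<User> userList) {
        return userList.stream()
                .filter(User::active)
                .map(User::id)
                .collect(Collectors.toUnmodifiableList());
    }
}
```

The list is traversed twice, which is the trade-off mentioned in footnote 5: where performance constraints are tight, a single loop may be kept deliberately.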

2.3. `continue`/`break` is difficult to handle

As is widely known, `continue` and `break` are statements used to control loops. When writing for, the following situations are common:

- You want to end the loop and move on to special handling when even one element of the array is in an abnormal state
- You want to skip to the next element when an element of the array is incomplete

The following code is an example.

for (var item : list){
  if(item.hasError()){
    //There is an element with an error → Add an error message and exit
    errorList.add(new Error("error!"));
    break;
  }
  if(item.hasNoData()){
    //If there is no data, go to the next element processing
    continue;
  }
  ...
}

This has the following problems:

- The intent cannot be read directly from `continue`/`break`
  - An intent such as "when even one element of the array is in an abnormal state" cannot be conveyed without a comment
- You need to be careful about where `continue`/`break` sits in the processing order
  - For example, if processing that should run for every element of the array is written below a `continue`/`break`, it is no longer guaranteed to run for every element

These problems compound the problem in 2.2, "More than one intent can be written in a single loop". I think the result tends to be code that is hard for humans to follow and hard to write as intended.

In development where the specification is vague and gets settled while the implementation proceeds, it is easy to blur whether the specification is "if the array contains even one error, output ***only one*** error" or "output an error ***for each*** abnormal array element", and therefore whether `break` or `continue` should be used. This may become technical debt later. In other words, the problem is that the implementer may not notice a discrepancy between the specification (what is intended) and the implementation. This may come to light (and in my case it did) when writing unit tests for processing that previously had none.
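To make the two specifications concrete, here is a rough sketch in which each one gets its own method, so the choice is visible in the method name rather than hidden in a `break` or `continue`. It assumes Java 16 or later; the `Item` record, the method names, and the error message strings are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class ErrorSpecExample {
    // Hypothetical item type; hasError mirrors item.hasError() in the loop above.
    record Item(String name, boolean hasError) {}

    // Specification A: if even one item has an error, output only one error.
    static List<String> oneErrorIfAny(List<Item> list) {
        return list.stream().anyMatch(Item::hasError)
                ? List.of("error!")
                : List.of();
    }

    // Specification B: output one error for each abnormal item.
    static List<String> errorPerItem(List<Item> list) {
        return list.stream()
                .filter(Item::hasError)
                .map(item -> "error: " + item.name())
                .collect(Collectors.toUnmodifiableList());
    }
}
```

Written this way, a unit test for either method states which specification was intended, which is where the discrepancy tends to surface.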

Alternatives to for (map, filter, reduce, etc.)

What you want to do with for can be replaced, depending on the situation, by operations such as map, filter, and reduce. Not all of the issues above are solved, but they can be improved to some extent. The names used here are those in Java or JavaScript, but other languages have similar operations (the method names may differ).

Some readers may not get the picture from the description alone, so an example is shown below.

The code in ① (the first block below) can be rewritten as ② (the second block).


var filteredUserIdList = new ArrayList<Id>();
for (var user : userList){
  if(!filterMatch(user)){
    continue;
  }
  filteredUserIdList.add(user.id);
}


// Requires java.util.stream.Collectors (toUnmodifiableList is available from Java 10)
final var filteredUserIdList = userList.stream()
    .filter(u -> filterMatch(u))
    .map(u -> u.id)
    .collect(Collectors.toUnmodifiableList());

I hope the example above gives you a sense that the problems raised earlier are resolved, even if not completely.

Incidentally, once you get used to it, you can write a sequence of processing steps clearly as a method chain.

My guess is that people with little work experience tend to reach for for even when they don't need to (this probably applied to my past self as well). It may be worth passing this on as know-how before they start working on real projects.
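As an illustration of such a chain that also ends in `reduce`, the sketch below totals the amounts of orders that were not cancelled. It assumes Java 16 or later, and the `Order` record and its fields are invented purely for this example.

```java
import java.util.List;

class ChainExample {
    // Hypothetical order type, used only for this illustration.
    record Order(String userId, int amount, boolean cancelled) {}

    // Each step of the chain states one intent:
    // keep the valid orders, take their amounts, add them up.
    static int totalAmount(List<Order> orders) {
        return orders.stream()
                .filter(order -> !order.cancelled())
                .map(Order::amount)
                .reduce(0, Integer::sum);
    }
}
```

For plain sums, `mapToInt(Order::amount).sum()` would be the more common spelling in Java; `reduce` is used here only to show the general form.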

Points to note

- Be careful about the order of processing. A simple one-to-one replacement may not be possible.
- Naturally, the internal behavior differs depending on the programming language and its version, so check the specifications before use.
- In Java, exceptions and Stream do not seem to go well together, so some thought is needed depending on the processing you want to perform.
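On the last point, one common source of friction is that the functional interfaces used by Stream, such as `java.util.function.Function`, do not declare checked exceptions. The sketch below shows one possible workaround, wrapping the checked exception in an unchecked one; `load` and `loadAll` are hypothetical names, and whether wrapping is appropriate depends on the processing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

class StreamExceptionExample {
    // Hypothetical operation that declares a checked exception.
    static String load(String key) throws IOException {
        if (key.isEmpty()) {
            throw new IOException("empty key");
        }
        return "value of " + key;
    }

    // `.map(key -> load(key))` alone does not compile, because the lambda
    // would throw a checked exception that Function.apply does not declare.
    // Here the IOException is caught and rethrown as UncheckedIOException.
    static List<String> loadAll(List<String> keys) {
        return keys.stream()
                .map(key -> {
                    try {
                        return load(key);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toUnmodifiableList());
    }
}
```

When the try/catch makes the lambda harder to read than the loop it replaced, a plain for may be the clearer choice for that piece of processing.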

Depending on the purpose, there is no need to avoid for

Again, the above is based on my experience writing business-logic code for screens and API servers.

Even in the Web industry, you may use for when writing low-level code in which the data structures are chosen with performance (speed and memory constraints) in mind. There are also cases where for is the only option because of performance, environment, or language restrictions, such as in embedded environments. This article argues for avoiding for for the sake of maintainability and readability, but in the end it depends on your purpose.

Also, if the development members cannot read the intent of the code, it cannot be maintained, and the merit of avoiding for is moot. I would think twice about going to unreasonable lengths to avoid it.

I worked in the embedded industry in the past, so I hope that advances in languages and environments, and the choices they make possible, will lead to more fields where people no longer have to fight the bug-proneness of for.

[^1]: Readability in this article means that the intent is conveyed to readers of the code (your future self and other development members).
[^2]: Even with operations such as map and filter, the badly behaved code described later can still be written syntactically if you set out to do so. By "a high degree of freedom" I mean that allowing for tends to mean tolerating such bad manners along with it.
[^3]: Someone with a reasonable understanding can probably write this with reduce or filter.
[^4]: The enforcement is only notational. I have seen badly behaved code that rewrites external variables from inside map, so it needs to be stopped through code review and the like.
[^5]: Depending on the environment, the processing may be deliberately kept in a single for because of constraints such as performance.
