Java performance tuning sounds routine, but in practice it is something many projects stumble over. There are many infrastructure-oriented tuning methods, such as changing the GC policy, so here I will mainly describe tips that can be applied on the application side (from what I remember ...).

Performance problem cases

Abuse of string concatenation

--Problem: OutOfMemoryError, slowdown
--Cause: Mass creation of String instances
--Solution: Compile with the latest JDK, use the mutable string classes (StringBuffer, StringBuilder), etc.

** Details ** Specifically, I often see code like the following.

// additional1 to additional4 are String variables defined elsewhere
String concatenated = "";
concatenated = concatenated + additional1;
concatenated = concatenated + additional2;
concatenated = concatenated + additional3;
concatenated = concatenated + additional4;
System.out.print(concatenated);

(In the first place += could be used here, but quite a lot of people write it this way.)

A few concatenations are fine, but even 5 or 6 can cause problems if the code is executed often enough.

In the above case, a new String instance is created for each concatenation, so a large number of small Java objects are generated, which increases the number of GC runs. (More GC runs ⇒ performance degradation.) Also, if a large number of instances are created inside a loop, they may eat into the Java heap or the native memory area. (Whether this happens depends on the JVM implementation.)

The remedy is as follows.

// additional1 to additional4 are String variables defined elsewhere
String concatenated = "";
StringBuilder sb = new StringBuilder();
sb.append(concatenated)
  .append(additional1)
  .append(additional2)
  .append(additional3)
  .append(additional4);
System.out.print(sb.toString());

(This adds the cost of creating a StringBuilder, but the saving from not creating intermediate String instances is generally larger.)

--** Recent versions of Java often seem to optimize code like the above at compile time. ** Tuning of this kind may become unnecessary in the future (although, depending on how the code is written, it may still not be optimized).
--If the implementation needs to be thread-safe, use StringBuffer instead of StringBuilder. (StringBuffer is a little heavier because it is synchronized.)
--If the number of concatenations is roughly known in advance, setting the initial capacity of StringBuilder may improve performance.
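As a minimal sketch of the points above: concatenation inside a loop is a typical case where a new String is created on every pass, so the example collects the pieces in a single StringBuilder instead. The class name `ConcatDemo`, the method names, and the estimate of 16 characters per item are assumptions made for illustration and do not come from the original code.

```java
import java.util.Arrays;
import java.util.List;

class ConcatDemo {
    // Naive version: every += produces a new String instance.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Builder version: one StringBuilder, pre-sized with a rough estimate.
    static String joinWithBuilder(List<String> parts) {
        // Assumed estimate of about 16 characters per part; tune for real data.
        StringBuilder sb = new StringBuilder(parts.size() * 16);
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c", "d");
        System.out.println(joinWithPlus(parts));    // prints "abcd"
        System.out.println(joinWithBuilder(parts)); // prints "abcd"
    }
}
```

Both methods return the same text; the difference is only in how many intermediate objects are created along the way, which matters when the loop runs many times.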

Abuse of heavy API (mainly Date type)

--Problem: Slowdown
--Cause: Mass creation of Date instances
--Solution: Reduce the number of times a Date is created

** Details ** In business applications, the most commonly used types after strings are probably numbers and dates, and it is dates that often cause problems. In Java, creating a Date with new Date() reads the system clock, which means a call into the OS, so creation is relatively expensive.

Moreover, dates are usually handled through utility classes, and the utility often creates the date (new Date) internally. In a business application with many items, handling a large number of date items can therefore trigger new Date over and over without anyone noticing, and performance suffers.

From personal experience, I have seen cases like the following.

--In an online application with thousands of items in total, several hundred were date items, and initializing these date items alone slowed the response by a few seconds.
--A batch application processed millions of records and updated the business date for each record; creating the system date (new Date) as the base date each time slowed the job by several hours.

For example, the solution in this case is as follows.

--Add the system date as a Date argument to the methods of the date utility class, so that the Date is created outside the utility rather than inside it. If using the same system date throughout is acceptable, this limits Date creation to once.
--If adding an argument is difficult but the utility's input/output variations are in practice heavily skewed, give the utility class a cache: when the arguments are the same, return the result from the cache and skip new Date. (This assumes a case such as three date patterns being generated from the system date according to the product code.)
--In the first place, if you don't need to call the date utility every time, stop calling it ... (for example, when the same date could be set on multiple items but the utility creates it each time).
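The first two countermeasures could look roughly like the sketch below. `DateUtilSketch`, `plusDays`, and `plusDaysCached` are hypothetical names, and the fixed 24-hour day arithmetic is a simplifying assumption (it ignores daylight saving time); a real utility would have its own rules.

```java
import java.util.Date;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DateUtilSketch {
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    // Results cached per (base time, offset in days).
    private static final Map<String, Date> CACHE = new ConcurrentHashMap<>();

    // The base date is passed in by the caller, so this method never calls
    // new Date() (and therefore never reads the system clock) itself.
    static Date plusDays(Date baseDate, int days) {
        return new Date(baseDate.getTime() + days * DAY_MILLIS);
    }

    // Cached variant: identical arguments are computed only once.
    static Date plusDaysCached(Date baseDate, int days) {
        String key = baseDate.getTime() + ":" + days;
        Date cached = CACHE.computeIfAbsent(key, k -> plusDays(baseDate, days));
        // Date is mutable, so hand out a copy; new Date(long) does not read the clock.
        return new Date(cached.getTime());
    }

    public static void main(String[] args) {
        Date systemDate = new Date(); // created once, outside the utility
        for (int i = 0; i < 3; i++) {
            System.out.println(plusDaysCached(systemDate, 30));
        }
    }
}
```

Whether the cache is worth it depends on how skewed the arguments really are; with mostly unique arguments it only adds memory use.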

Slowdown of external communication (mainly external systems such as hosts and DBs)

--Problem: Resource shortage at the external system, slowdown
--Cause: Excessive(?) requests are being sent to the external system
--Solution: Reduce the number of calls by caching

** Details ** When user information or contract information is fetched from the host (legacy system) on every request, the host side may not be able to keep up. In that case, fetch it only once at login and keep it in memory. I think most systems already do this, but problems can arise when this consideration is missed for one specific process, or when a new kind of communication is added ...

Similarly, for DB access, data that can be treated as unchanging for the duration of a login session, such as master data, may be cached. There are two caching patterns: store it in session data so that each user has a copy, or hold it in a singleton class shared by all users.

Both trade memory for speed, so the amount of data to cache needs separate consideration. Also, the DB server used to be a frequent bottleneck, but DB server performance keeps improving, and I think in many cases a caching mechanism is already provided by the framework, as with JPA.
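The singleton pattern mentioned above might be sketched as follows. `MasterDataCache` and the loader function are hypothetical; the loader stands in for whatever host or DB call the real system makes, and there is no expiry or size limit here, which a real cache would probably need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class MasterDataCache {
    private static final MasterDataCache INSTANCE = new MasterDataCache();

    // Shared by all users; ConcurrentHashMap keeps access thread-safe.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    private MasterDataCache() {}

    static MasterDataCache getInstance() {
        return INSTANCE;
    }

    // The loader is only invoked when the key is not cached yet.
    String get(String key, Function<String, String> loader) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger externalCalls = new AtomicInteger();
        // Stand-in for a slow lookup against the host or DB.
        Function<String, String> slowLookup = code -> {
            externalCalls.incrementAndGet();
            return "name-of-" + code;
        };
        MasterDataCache cache = getInstance();
        cache.get("P001", slowLookup);
        cache.get("P001", slowLookup);
        System.out.println("external calls: " + externalCalls.get()); // prints "external calls: 1"
    }
}
```

The per-user variant would hold the same kind of map in session data instead of in a static instance.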

Heavy low-level IO load such as file operations

--Problem: OutOfMemoryError or extreme slowness when handling huge files
--Cause: The API used is inappropriate, or the data volume is so large that the JVM overhead becomes fatal
--Solution: Use a more suitable API, or leave the work to OS-level processing

** Details ** Mainly in Java batch jobs, handling files of several hundred MB to several GB or more can cause performance problems, or the job may not complete at all because of OutOfMemoryError.

The following cases can be considered as countermeasures.

--In the first place, if you are processing line by line with readLine etc., consider processing in bulk (in buffer units) with stream classes (including the nio classes). You may need to tune the buffer size, but if this solves the problem, fine (it means the original code was the problem).
--If that is still not fast enough, executing an OS command from within Java may improve things dramatically. For example, to simply append file B to file A, the code looks like this (using the cat command).

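Runtime runtime = Runtime.getRuntime();
// exec() does not start a shell, so run "sh -c" to make the ">" redirection work
Process process = runtime.exec(new String[] {"sh", "-c", "cat fileA fileB > concatenatedFile"});
process.waitFor(); // wait until cat has finished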

(In addition to concatenation, you can use the diff command for file comparison, awk for string replacement, etc.)

The problem with this approach is that it is OS-dependent and therefore limits portability. I would apply it only in the minimum number of places, not everywhere from the start. (If the test environment is Windows and production is UNIX, for example, the implementation has to be able to switch commands per environment.)

--In the first place, consider whether the process should be a shell batch rather than a Java batch ...
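For the first countermeasure (processing in buffer units rather than line by line), a portable sketch of the same A-plus-B concatenation might look like this. `FileConcat` and the 64 KB buffer size are assumptions; the right buffer size has to be found by measurement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileConcat {
    // Copies a, then b, into out using one reusable buffer, so memory use
    // stays at bufferSize bytes regardless of file size.
    static void concat(Path a, Path b, Path out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path source : new Path[] {a, b}) {
                try (InputStream is = Files.newInputStream(source)) {
                    int read;
                    while ((read = is.read(buffer)) != -1) {
                        os.write(buffer, 0, read);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: FileConcat <fileA> <fileB> <output>");
            return;
        }
        // Assumed starting point of 64 KB; tune based on measurements.
        concat(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]), 64 * 1024);
    }
}
```

Because the data is never decoded into strings or held in memory as a whole, this avoids both the per-line object churn and the OutOfMemoryError risk, while staying OS-independent.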

Unnecessary synchronization processing

--Problem: Performance degrades under concurrent access by multiple users because of Java synchronization
--Cause: Synchronization is applied over an inappropriate range
--Solution: Narrow the synchronization to an appropriate range

** Details ** Suppose a utility class has a method with a synchronized block, for example to handle a singleton. When another method is created by simply copying it, the synchronized block gets copied too, even though the new method does not need synchronization. Or the code is overly defensive: the synchronized block is too wide, so lock contention becomes very likely ...

The fix is simply as follows.

--Remove unnecessary synchronized blocks
--Narrow an overly wide synchronized block to an appropriate range

** However, narrowing a synchronized block too far causes thread-safety problems, so be very careful. ** If it is not actually causing trouble, leaving an unnecessary synchronized block alone is also a reasonable choice ...
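To illustrate what narrowing means, here is a small sketch under the assumption that the expensive part of the method touches no shared state. `NarrowLockSketch` and its methods are hypothetical; whether a given block can be narrowed like this in real code has to be judged case by case.

```java
import java.util.ArrayList;
import java.util.List;

class NarrowLockSketch {
    private final Object lock = new Object();
    private final List<String> shared = new ArrayList<>();

    // Too wide: the formatting runs while the lock is held.
    void addWide(int value) {
        synchronized (lock) {
            String formatted = format(value);
            shared.add(formatted);
        }
    }

    // Narrowed: only the update of the shared list is inside the block.
    void addNarrow(int value) {
        String formatted = format(value); // uses only its argument
        synchronized (lock) {
            shared.add(formatted);
        }
    }

    int size() {
        synchronized (lock) {
            return shared.size();
        }
    }

    // Stand-in for expensive work that does not touch shared state.
    private static String format(int value) {
        return String.format("%08d", value);
    }

    public static void main(String[] args) throws InterruptedException {
        NarrowLockSketch sketch = new NarrowLockSketch();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    sketch.addNarrow(i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(sketch.size()); // prints 4000
    }
}
```

The list update stays inside the lock in both versions; moving that out as well would be the "narrowed too much" case the warning above is about.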

Asynchronization of processing

--Problem: The processing is very heavy and cannot be fixed with small tweaks. However, there is no requirement for it to run synchronously.
--Solution: Shift the timing of the processing

** Details ** This is a change at the architecture level, so I would rather not go this far, but in practice it is sometimes done based on performance test results.

--Shift the processing earlier (load the master data cache at JVM startup, generate an intermediate DB in advance by batch, etc.)
--Shift the processing later (use a messaging system, run the work asynchronously on a separate thread, give up on doing it online and move part of the processing to batch, etc.)
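The "separate thread" option for shifting work later might be sketched like this, using the standard CompletableFuture and ExecutorService. `AsyncSketch`, `heavyReport`, and the order ID are hypothetical placeholders for the real heavy step.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncSketch {
    // Hypothetical heavy step that the caller does not need to wait for.
    static String heavyReport(String orderId) {
        return "report-for-" + orderId;
    }

    // Returns immediately; the heavy step runs on the given executor.
    static CompletableFuture<String> submit(String orderId, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> heavyReport(orderId), executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> pending = submit("A-1", executor);
            System.out.println("request accepted");
            // join() is only here for the demo; a real caller would not block.
            System.out.println(pending.join());
        } finally {
            executor.shutdown();
        }
    }
}
```

A messaging system plays the same role across processes, with the added benefit that queued work survives a JVM restart.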

Processing multiplexing

--Problem: The processing is very heavy, but it can be parallelized, and there are server resources to spare.
--Solution: Multiplex the processing

** Details ** I am mainly thinking of batch here, but there are also cases where this is done online in combination with asynchronous processing. For an application designed with multiplicity in mind, it is just a matter of finding the optimal multiplicity. For an application that was not, this is again an architecture-level change and becomes a major modification ...
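As a rough sketch of a batch with a configurable multiplicity, the example below splits records across a fixed thread pool. `MultiplexSketch`, the squaring in `process`, and the striped partitioning are illustrative assumptions; real batch work would also need to consider transaction boundaries and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MultiplexSketch {
    // Stand-in for the real per-record work.
    static long process(int record) {
        return (long) record * record;
    }

    // Runs the work on "multiplicity" threads and combines the partial results.
    static long processAll(List<Integer> records, int multiplicity) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(multiplicity);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int w = 0; w < multiplicity; w++) {
                final int worker = w;
                tasks.add(() -> {
                    long sum = 0;
                    // Worker w handles records w, w + multiplicity, w + 2 * multiplicity, ...
                    for (int i = worker; i < records.size(); i += multiplicity) {
                        sum += process(records.get(i));
                    }
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            records.add(i);
        }
        System.out.println(processAll(records, 4)); // prints 338350
    }
}
```

Keeping the multiplicity as a parameter makes it possible to search for the optimum during performance testing without changing code.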

Identifying performance issues

Information gathering in logs

Check which processing is heavy using the application's own logs, the web server's access log, the logs of the external systems, and so on. Basically, I think checking logs has the highest priority. (It is also often the hardest part, because sufficient logs may not be output in production ...)

GC log

This is also a log, but in Java, performance degradation caused by GC is the most common case, so it gets special treatment. The details depend on the JVM, but in general the points to check are GC frequency, GC type (whether Full GC occurs), GC time, and so on.
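GC logs themselves are normally enabled through JVM startup options, which differ by JVM and version. Where changing startup options is not possible, the same kind of figures (count and accumulated time per collector) can be read from inside the process through the standard `java.lang.management` API, as in this sketch. `GcStatsSketch` is a hypothetical name, and the collector names printed depend on the JVM and GC policy in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsSketch {
    // Total number of collections so far; -1 ("undefined") values are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated GC time in milliseconds; -1 values are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Sampling these values before and after a heavy process gives a rough idea of how much of its elapsed time went to GC.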

Information gathering at the OS level

Each OS provides some kind of performance monitor, so check CPU, memory, I/O access, and so on. For Java applications, OS-level memory should rarely be the bottleneck, so the main goal is to determine whether the CPU is the bottleneck. Note that even when the CPU is at 100%, the cause may be frequent GC due to insufficient memory, so this alone does not settle anything ...

Information gathering at the JVM level

If you can take a thread dump, take one while the performance problem is occurring to check the internal state of the JVM. For string-related problems, String instances account for most of the objects in the heap; for synchronization problems, many threads are sitting in methods that contain a synchronized block; and so on. Depending on the JVM, more detailed information about Java objects may also be available.
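Thread dumps are usually taken with external tools, but as a small in-process sketch, the standard `ThreadMXBean` can summarize thread states; a large number of BLOCKED threads is one hint of the synchronization problem described earlier. `ThreadStateSketch` is a hypothetical name, and the counts depend entirely on what the JVM is doing at that moment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

class ThreadStateSketch {
    // Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip lock details to keep the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByState());
    }
}
```

Passing `true, true` to `dumpAllThreads` additionally reports which monitors and synchronizers each thread holds, at a higher cost.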

Java app performance tuning