[JAVA] How the JVM JIT compiler works

I will summarize what I learned about the mechanism of the JVM's JIT compiler.

Environment

macOS Mojave 10.14.4
AdoptOpenJDK 1.8
- 64bit
Scala 2.13.1

~/workspace/$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)

What is a JIT compiler?

JIT stands for Just In Time. It is a compiler built into the JVM that, as the name suggests, compiles "what you need, when you need it".

Flow until execution on JVM

Roughly speaking, the flow is as follows.

Source code -> compile -> intermediate code -> execute (compile)

When you run the compiler for a JVM language (javac for Java, sbt compile for Scala, and so on), the source code is not compiled all the way down to native code. First, intermediate code (Java bytecode) is generated. Thanks to this intermediate step, the same code can run on any OS as long as a JVM is available. (You do need to install the JVM build that matches each OS.)

After that, the generated intermediate code is not compiled (converted to native code) all at once. Instead, the interpreter executes the bytecode as the program runs, and compilation happens only where it is needed. (This is the Java interpreter in the above figure.)

There are two reasons for this.

1. Compilation time is wasted if the code is used only once

Compiling takes time, so for code that is called only once, running it in the interpreter gives a shorter total execution time. For frequently called code, on the other hand, compiled code runs faster, so it is worth compiling. The threshold the JVM uses to decide between interpreting and compiling is described later.

2. Information useful for compilation can be collected

While the code runs in the interpreter, the JVM can collect information that helps compilation. Using this information, various optimizations can be applied at compile time, and these optimizations make a difference in execution time even among compiled code.

For example, consider the `equals()` method.

Suppose we have the following code.

`test.scala`


val b = obj1.equals(obj2)

When the interpreter reaches the `equals()` call, it has to look up which `equals()` to run: the one defined by obj1's own class or, for example, the one of the String class. With only an interpreter, this lookup would be repeated every time the `equals()` call is reached, which is a waste of time.

If the interpreter has observed that obj1 is a String object, the `equals()` call can be compiled as a call to the String method. The compiled code no longer spends time on the lookup the interpreter performed, so it runs faster.

In this way, the JIT compiler does not compile the code immediately, because these optimizations are not possible without first executing the code and observing how it behaves.
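To make the lookup problem concrete, here is a small Java sketch. The `Point` class and the `sameValue` helper are hypothetical names I made up for illustration. The point is only that the same `obj1.equals(obj2)` call site runs a different method depending on the runtime class of obj1, which is information that only becomes available once the code actually runs.

```java
class EqualsDispatchDemo {
    // Hypothetical class with its own equals(), for contrast with String.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // The static type is Object, so which equals() runs is decided
    // by the runtime class of obj1, not by this source code.
    static boolean sameValue(Object obj1, Object obj2) {
        return obj1.equals(obj2);
    }

    public static void main(String[] args) {
        System.out.println(sameValue("abc", new String("abc")));         // String.equals -> true
        System.out.println(sameValue(new Point(1, 2), new Point(1, 2))); // Point.equals -> true
        System.out.println(sameValue(new Object(), new Object()));       // Object.equals (identity) -> false
    }
}
```

If a call site like the one in `sameValue` only ever sees String objects at run time, that is the kind of observation the compiler can take advantage of.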

Three types of JIT compiler

There are three types of JIT compiler. Starting with Java 8, the third one, the tiered compiler, is enabled by default.

Client compiler (C1)

Starts compiling at an early stage.

Server compiler (C2)

Gathers information about the behavior of the code before compiling. As mentioned above, it compiles with optimizations based on that information, so the resulting code is faster than the client compiler's.

Tiered compiler

A combination of the client compiler and the server compiler. In the early stages, code is compiled with C1, and once enough optimization information has been gathered (when the code gets hot), it is compiled with C2.

Verification

You can check which compiler is set with java -version. In my case:

JVM version 8
64-bit: Server compiler (tiered compiler)

~/workspace/$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)
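The same information can also be read from inside a running program. The following is a minimal sketch, assuming a HotSpot-based JVM. The class name `JitInfoDemo` is just a name I chose, and the `java.vm.info` property is set by HotSpot but may be missing on other JVMs.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitInfoDemo {
    // Returns the JIT compiler name reported by the JVM,
    // or "none" if this JVM has no compilation system.
    static String compilerName() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        return bean == null ? "none" : bean.getName();
    }

    public static void main(String[] args) {
        // These correspond to the last line of the java -version output.
        System.out.println("VM:       " + System.getProperty("java.vm.name"));
        System.out.println("VM mode:  " + System.getProperty("java.vm.info"));
        System.out.println("Compiler: " + compilerName());
    }
}
```

The exact strings depend on the JVM build, so I have not listed expected output here.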

Each compiler should be chosen according to the application you are building. For example, if you are running a GUI application on the JVM, the client compiler is the better fit, because a short startup time matters more for UX than higher processing speed after the application has been running for a while.

Select the compiler according to the application and what you want to achieve.

Compilation thresholds

I mentioned earlier that bytecode is first executed by the interpreter. So at what point does execution switch from the interpreter to JIT-compiled code? It is decided by two counters, each compared against a threshold.

Invocation counter (standard compilation)

The number of times the target method has been called.

When this number exceeds the threshold, the target method is queued for compilation and compiled.

Back-edge counter

The number of times a loop inside the method has jumped back to its start.

When this number exceeds the threshold, the loop itself becomes the target of compilation and is compiled. This kind of compilation is called OSR (on-stack replacement) compilation. When the compilation is complete, the code running on the stack is replaced with the compiled code, and the compiled code is executed from the next iteration of the loop.
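The difference between the two counters is easiest to see with a method that is called only once but loops many times. Below is a minimal sketch. `OsrDemo` and `longLoop` are names I made up, and the iteration count is an arbitrary value chosen to be large enough for the loop to get hot.

```java
class OsrDemo {
    // Called exactly once from main, so the invocation counter stays tiny.
    // Only the back-edge counter of the loop keeps growing.
    static long longLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A single call with many iterations: a candidate for OSR compilation.
        System.out.println(longLoop(50_000_000));
    }
}
```

If you run this with -XX:+PrintCompilation, OSR compilations are marked with `%` in the attribute column of the log, so you can look for a line containing `OsrDemo::longLoop` with that mark.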

Tuning

The counter thresholds above differ between the client compiler and the server compiler. You have to tune these thresholds properly.

Example

Lowering the standard compilation threshold for the server compiler reduces the amount of information available at compile time, which makes optimization harder and results in slower compiled code.

Still, lowering the threshold also has benefits.

1. Warm-up time can be shortened a little

This one is self-explanatory.

2. Code that would never be compiled at a high threshold gets compiled

You might expect that the invocation counter or back-edge counter will eventually reach the threshold as long as the code keeps running. However, the counter values are decremented over time, so code that runs only occasionally may never reach a high threshold.

As mentioned above, you have to tune properly.
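Before changing any threshold, it helps to know what the running JVM is actually using. Here is a minimal sketch that reads a few flags through `HotSpotDiagnosticMXBean`, assuming a HotSpot JVM. The class name `ThresholdDemo` and the helper `flagValue` are my own, and the list of flags is just a selection. Any flag that the JVM does not know is reported as "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class ThresholdDemo {
    // Returns the current value of a HotSpot flag as a string, or
    // "unavailable" if the flag (or the HotSpot MXBean) does not exist.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "TieredCompilation",
            "CompileThreshold",
            "Tier3InvocationThreshold",
            "Tier3BackEdgeThreshold",
            "Tier4InvocationThreshold",
            "Tier4BackEdgeThreshold"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Running it once with and once without an option such as -XX:CompileThreshold=100 is a quick way to confirm that the option was really picked up.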

Tuning in practice

Let's observe and tune the behavior of the JIT compiler with the following options and code.

You can specify JVM options in .jvmopts as shown below.

`.jvmopts`


-XX:+PrintCompilation
-XX:CompileThreshold=1

-XX:+PrintCompilation

This option prints the compilation log as shown below.

The format is:

timestamp, compile ID, attributes, tier level (shown with tiered compilation), method name, size, deoptimization status


$ sbt run
41    1       3       java.lang.Object::<init> (1 bytes)
42    2       3       java.lang.String::hashCode (55 bytes)
44    3       3       java.lang.String::charAt (29 bytes)
45    4       3       java.lang.String::equals (81 bytes)
45    5     n 0       java.lang.System::arraycopy (native)   (static)
45    6       3       java.lang.Math::min (11 bytes)
45    7       3       java.lang.String::length (6 bytes)
52    8       1       java.lang.Object::<init> (1 bytes)
52    1       3       java.lang.Object::<init> (1 bytes)   made not entrant
53    9       3       java.util.jar.Attributes$Name::isValid (32 bytes)
53   10       3       java.util.jar.Attributes$Name::isAlpha (30 bytes)
・ ・ ・ ・
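When the log gets long, it is handy to split each line into its columns. The following is a rough sketch of such a parser, written against the log lines shown above. `CompilationLogLine` is a hypothetical helper of mine, not part of the JDK, and it assumes that method names always contain `::` and that the first two columns are always the timestamp and the compile ID.

```java
import java.util.Arrays;

class CompilationLogLine {
    final long timestampMillis;
    final int compileId;
    final String attributes; // e.g. "n", "!", "%"; empty if none
    final int tier;          // -1 when the line has no tier column
    final String method;
    final String rest;       // size and any deoptimization note

    private CompilationLogLine(long timestampMillis, int compileId, String attributes,
                               int tier, String method, String rest) {
        this.timestampMillis = timestampMillis;
        this.compileId = compileId;
        this.attributes = attributes;
        this.tier = tier;
        this.method = method;
        this.rest = rest;
    }

    static CompilationLogLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        long timestamp = Long.parseLong(tokens[0]);
        int id = Integer.parseInt(tokens[1]);
        StringBuilder attributes = new StringBuilder();
        int tier = -1;
        int i = 2;
        // Everything before the method name is either an attribute flag or the tier level.
        while (i < tokens.length && !tokens[i].contains("::")) {
            if (tokens[i].matches("\\d+")) {
                tier = Integer.parseInt(tokens[i]);
            } else {
                attributes.append(tokens[i]);
            }
            i++;
        }
        String method = i < tokens.length ? tokens[i] : "";
        int restStart = Math.min(i + 1, tokens.length);
        String rest = String.join(" ", Arrays.copyOfRange(tokens, restStart, tokens.length));
        return new CompilationLogLine(timestamp, id, attributes.toString(), tier, method, rest);
    }

    public static void main(String[] args) {
        CompilationLogLine l =
            parse("45    5     n 0       java.lang.System::arraycopy (native)   (static)");
        // prints: java.lang.System::arraycopy attrs=n tier=0
        System.out.println(l.method + " attrs=" + l.attributes + " tier=" + l.tier);
    }
}
```

With the lines parsed like this, filtering for a particular method or for entries marked "made not entrant" becomes a simple check on the `method` and `rest` fields.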

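-XX:CompileThreshold=1000

This option specifies how many times a method is invoked before it is compiled. Note that the Java 8 documentation for the java command states that this option is ignored when tiered compilation is enabled, so you may need to add -XX:-TieredCompilation to see its effect in isolation.

Try it with the following code.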

`Test.scala`


object Test extends App{
  def compileTest() = {
    for (i <- 0 to 1000) {
      sampleLoop(i)
    }
  }

  def sampleLoop(num: Int) = {
    println(s"loopppp${num}")
  }

  println(compileTest())
}

`.jvmopts`


-XX:+PrintCompilation
-XX:CompileThreshold=1

Result

With -XX:CompileThreshold=1 set, the log shows that the body of the compileTest method (`$anonfun$compileTest$1`) is compiled when this code is executed once. The sampleLoop method is called inside the loop, so it is compiled as well.

9983 9336       3       Test$$$Lambda$3666/873055587::apply$mcVI$sp (5 bytes)
9983 9338       3       Test$::sampleLoop (1 bytes)
9983 9337       3       Test$::$anonfun$compileTest$1 (8 bytes)
9984 9334       4       java.lang.invoke.MethodType::makeImpl (66 bytes)
9986 9339   !   3       scala.Enumeration$Val::toString (55 bytes)
・ ・ ・

The code for compileTest is compiled about 10 seconds after the JVM starts (timestamp 9983 ms).

Next, what happens with the following settings?


object Test extends App{
  def compileTests() = {
    for (i <- 0 to 10) { //Change to 10 loops
      sampleLoop(i)
    }
  }

  def sampleLoop(num: Int) = {
    println(s"loopppp${num}")
  }

  println(compileTests())
}

`.jvmopts`


-XX:+PrintCompilation
# Change threshold to 100 times
-XX:CompileThreshold=100

If you set -XX:CompileThreshold=100 instead, the compileTests method is not compiled when the above code is executed only once. The sampleLoop method is not compiled either, because it is not executed 100 times.
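The effect of warm-up can also be felt without reading the log, by timing the same work over several rounds. This is a minimal sketch and not a proper benchmark. `WarmUpDemo`, `sumOfSquares`, and the round and call counts are arbitrary choices of mine.

```java
class WarmUpDemo {
    // Hypothetical workload: sum of squares from 0 to n-1.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            for (int call = 0; call < 20_000; call++) {
                checksum += sumOfSquares(100);
            }
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + elapsedMicros + " us");
        }
        // Printing the checksum keeps the result in use so the work is not optimized away.
        System.out.println("checksum = " + checksum);
    }
}
```

The first rounds run mostly in the interpreter, so I would expect them to be slower than later rounds, but the actual numbers depend heavily on the machine and the JVM options. Running it together with -XX:+PrintCompilation lets you see when `WarmUpDemo::sumOfSquares` shows up in the log.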

Summary

The JIT compilation process is much easier to understand once you actually watch it happen in the compilation log.

References

- Java Performance

https://www.oracle.com/webfolder/technetwork/jp/javamagazine/Java-MA16-JIT.pdf