[Java] I tried to measure and compare the speed of GraalVM with JMH

9 minute read

Environment

  • macOS Mojava 10.14.4
  • Scala 2.13.1
  • sbt 1.3.8
  • GraalVM 20.1

The rough problem of the previous measurement

Previous article and compared the speed of GraalVm and standard JVM.

However, since the compile time, the warm-up time, and the pure execution time were mixed and measured, the pure speed comparison of Graal itself could not be done.

What I want to do this time is to compare the speed of Graal of GraalVM and C2 compiler of standard JVM.

Therefore, the pure execution time is compared by distinguishing between compilation (class file generation), warm-up, and pure execution time.

Specific problems

1. The compilation (class file generation) time was also included.

When executing with sbt, generate a mixed class file before executing and measure.

$ sbt compile
$ sbt run

2. Warm-up processing and post-compilation processing were messed up.

As described in this article,theJVMfirstexecutestheclassfileontheinterpreter(warm-up) and gathers the information necessary for compilation. The JIT compiler is designed to compile the target method.

**To measure the pure execution speed of Graal or C2 compiler, you have to warm up the target method -> measure the speed after compilation. **

According to this book (https://amzn.to/30GHzYk), the threshold for switching from the JIT compiler’s interpreter to the JIT compiler is 8000 times. In other words, if a method is executed (warmed up) 8000 times or more on the target, that method will be compiled by the JIT compiler, so the speed of the compiled code will be measured.

Speed measurement using JMH

Last time, I measured with the /usr/bin/time command. However, since pure speed comparison cannot be performed with that, JMH is used to output an accurate numerical value.

sbt library is available, so I used it.

Code to measure

A program that reads English sentences from a text file and spits out the top 10 most frequently used words.

├── resources
│  └── GraalTest.txt
└── scala
    └── graal
        └── Performance.scala

text file. I brought a proper English sentence.

Graal.txt


When people talk about Japan, they would always think about how innovative and technological this country gets! Or how pretty and neat the country is! Last but not the least, fashion, Cosplay and hype beast were always a big thing in the city of Japan .Coming to Japan with the intention of tourism would have been a great experience. Different culture.You can find a lot of unique things they sell in Japan! But as you live in Japan, you interact with the locals and everything doesn't seem. the way you thought of Japan.

The size is 525B.

$ll
- rw-r--r-- 1 kinsho staff 525B 6 14 10:18 GraalTest.txt

In the JMH library, you can specify measurement options with annotations. The settings are as follows.

  • ** Number of warm-ups ** Execute 10000 times once for the target method. As mentioned above, if the number of executions exceeds 8,000 times, the target method will be compiled by the JIT compiler, so warm up 10,000 times for the time being.

  • Number of executions Execute 10,000 times 30 times for the target method. Official document wrote that 10 to 20 iterations yield realistic numbers.

Performance.scala



package graal

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

import scala.annotation.tailrec
import scala.io.BufferedSource

// specify the number of forks
// default is 5
@Fork(1)
@Warmup(iterations = 1, batchSize = 10000)
@Measurement(iterations = 30, batchSize = 10000)
// measure the execution time per time
@BenchmarkMode(Array(Mode.SingleShotTime))
// unit of measure is ms
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class GraalTest {

  // make this method a measurement target
  @Benchmark
  def run = {
    val fileName: String = "GraalTest.txt"
    val encode: String = "UTF-8"
    val source: BufferedSource = scala.io.Source.fromResource(fileName)
    val lines: Iterator[String] = source.getLines()
    val sortedTextList = lines.toList.mkString(" ").split(" ").sorted.toList
    val value = createMap(sortedTextList)
    val top10Words = value.toList.sortBy(_._2).reverse.take(10)
  }

  private def createMap(wordList: List[String]): Map[String, Long] = {
    @tailrec
    def loop(list: List[String], acc: Map[String, Long]): Map[String, Long] = {
      list match {
        case head :: tail if acc.isEmpty => {
          loop(tail, acc + (head -> 1L))
        }
        case head :: tail => {
          acc.get(head) match {
            case Some(value) => {
              loop(tail, acc.updated(head, value + 1L))
            }
            case None => {
              loop(tail, acc + (head -> 1L))
            }
          }
        }
        case head :: Nil => {
          acc.get(head) match {
            case Some(value) => {
              acc.updated(head, value + 1L)
            }
            case None => {
              acc + (head -> 1L)
            }
          }
        }
        case Nil => acc
      }
    }
    loop(wordList, Map.empty[String, Long])
  }
}


Run on standard JVM

First, compile

sbt:jitCompiler> jmh:compile
[info] Compiling 3 Scala sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
...
[info] Compiling 5 Java sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
[success] Total time: 5 s, completed 2020/06/14 12:08:28

Actually do it. From [info] # VM version: JDK 1.8.0_232, OpenJDK 64-Bit Server VM, 25.232-b09, you can see that you are running using standard JVM.

sbt:jitCompiler> jmh:run
[info] running (fork) org.openjdk.jmh.Main
[info] # JMH version: 1.21
[info] # VM version: JDK 1.8.0_232, OpenJDK 64-Bit Server VM, 25.232-b09
[info] # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 1 iterations, single-shot each, 10000 calls per op[info] # Measurement: 30 iterations, single-shot each, 10000 calls per op
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread
[info] # Benchmark mode: Single shot invocation time
[info] # Benchmark: graal.GraalTest.run
[info] # Run progress: 0.00% complete, ETA 00:00:00
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 1075.011 ms/op
[info] Iteration   1: 555.518 ms/op
[info] Iteration   2: 634.679 ms/op
[info] Iteration   3: 552.381 ms/op
[info] Iteration   4: 687.170 ms/op
[info] Iteration   5: 538.699 ms/op
[info] Iteration   6: 537.693 ms/op
[info] Iteration   7: 622.123 ms/op
[info] Iteration   8: 801.559 ms/op
[info] Iteration   9: 611.831 ms/op
[info] Iteration  10: 567.022 ms/op
[info] Iteration  11: 590.506 ms/op
[info] Iteration  12: 666.859 ms/op
[info] Iteration  13: 564.290 ms/op
[info] Iteration  14: 552.446 ms/op
[info] Iteration  15: 713.195 ms/op
[info] Iteration  16: 781.514 ms/op
[info] Iteration  17: 685.628 ms/op
[info] Iteration  18: 570.931 ms/op
[info] Iteration  19: 557.543 ms/op
[info] Iteration  20: 559.926 ms/op
[info] Iteration  21: 677.823 ms/op
[info] Iteration  22: 551.438 ms/op
[info] Iteration  23: 564.687 ms/op
[info] Iteration  24: 560.043 ms/op
[info] Iteration  25: 553.672 ms/op
[info] Iteration  26: 548.359 ms/op
[info] Iteration  27: 777.705 ms/op
[info] Iteration  28: 794.434 ms/op
[info] Iteration  29: 917.850 ms/op
[info] Iteration  30: 553.160 ms/op
[info] Result "graal.GraalTest.run":
[info]   N = 30
[info]   mean =    628.356 ±(99.9%) 67.159 ms/op
[info]   Histogram, ms/op:
[info]     [ 500.000,  550.000) = 3
[info]     [ 550.000,  600.000) = 14
[info]     [ 600.000,  650.000) = 3
[info]     [ 650.000,  700.000) = 4
[info]     [ 700.000,  750.000) = 1
[info]     [ 750.000,  800.000) = 3
[info]     [ 800.000,  850.000) = 1
[info]     [ 850.000,  900.000) = 0
[info]     [ 900.000,  950.000) = 1
[info]   Percentiles, ms/op:
[info]       p(0.0000) =    537.693 ms/op
[info]      p(50.0000) =    568.977 ms/op
[info]      p(90.0000) =    793.142 ms/op
[info]      p(95.0000) =    853.890 ms/op
[info]      p(99.0000) =    917.850 ms/op
[info]      p(99.9000) =    917.850 ms/op
[info]      p(99.9900) =    917.850 ms/op
[info]      p(99.9990) =    917.850 ms/op
[info]      p(99.9999) =    917.850 ms/op
[info]     p(100.0000) =    917.850 ms/op
[info] # Run complete. Total time: 00:00:26
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark      Mode  Cnt    Score    Error  Units
[info] GraalTest.run    ss   30  628.356 ± 67.159  ms/op
[success] Total time: 26 s, completed 2020/06/14 12:26:25

標準JVMでの結果

結果がズラーっと記載されているが、まとめると、

ウォームアップ時間:1075.011 ms/op 純粋な速度:628.356 ms/op

ms/opは、対象メソッド一回の実行に対して、どれくらい時間がかかったのかを表す単位。 純粋な速度よりも、ウォームアップ時間の方が時間がかかっているので、10000回のウォームアップ後、正常にコンパイルされていることもわかる。

GraalVMで実行する

今度は本命、GraalVMで測定! まず、コンパイルする。

sbt:jitCompiler> jmh:compile
[info] Compiling 3 Scala sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
...
[info] Compiling 5 Java sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
[success] Total time: 5 s, completed 2020/06/14 12:18:06

実行する。

```shell

sbt:jitCompiler> jmh:run Processing 9 classes from /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes with “reflection” generator Writing out Java source to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/src_managed/jmh and resources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/resource_managed/jmh [info] running (fork) org.openjdk.jmh.Main [info] # JMH version: 1.21 [info] # VM version: JDK 1.8.0_252, OpenJDK 64-Bit Server VM GraalVM CE 20.1.0, 25.252-b09-jvmci-20.1-b02 [info] # VM invoker: /Library/Java/JavaVirtualMachines/graalvm-ce-java8-20.1.0/Contents/Home/jre/bin/java [info] # VM options: [info] # Warmup: 1 iterations, single-shot each, 10000 calls per op [info] # Measurement: 30 iterations, single-shot each, 10000 calls per op [info] # Timeout: 10 min per iteration [info] # Threads: 1 thread [info] # Benchmark mode: Single shot invocation time [info] # Benchmark: graal.GraalTest.run [info] # Run progress: 0.00% complete, ETA 00:00:00 [info] # Fork: 1 of 1 [info] # Warmup Iteration 1: 1012.169 ms/op [info] Iteration 1: 546.073 ms/op [info] Iteration 2: 584.707 ms/op [info] Iteration 3: 547.958 ms/op [info] Iteration 4: 662.077 ms/op [info] Iteration 5: 549.202 ms/op [info] Iteration 6: 490.277 ms/op[info] Iteration 7: 620,376 ms / op [info] Iteration 8: 780,583 ms / op [info] Iteration 9: 576,626 ms / op [info] Iteration 10: 508.686 ms / op [info] Iteration 11: 543.451 ms / op [info] Iteration 12: 589.145 ms / op [info] Iteration 13: 515,498 ms / op [info] Iteration 14: 524.002 ms / op [info] Iteration 15: 717,083 ms / op [info] Iteration 16: 839.723 ms / op [info] Iteration 17: 523.987 ms / op [info] Iteration 18: 519,271 ms / op [info] Iteration 19: 525.304 ms / op [info] Iteration 20: 530,935 ms / op [info] Iteration 21: 634,322 ms / op [info] Iteration 22: 545,711 ms / op [info] Iteration 23: 513,338 ms / op [info] Iteration 24: 518,737 ms / op [info] Iteration 25: 519,130 ms / op [info] Iteration 26: 575.139 ms / op [info] Iteration 27: 745,525 ms / op [info] Iteration 28: 774.832 ms / op [info] Iteration 29: 830,160 ms / op [info] Iteration 30: 549,396 ms / op [info] Result "graal.GraalTest.run": [info] N = 30 [info] mean = 596,708 ± (99.9%) 68,925 ms / op [info] Histogram, ms / at: [info] [400,000, 450,000] = 0 [info] [450,000,500,000) = 1 [info] [500,000, 550,000] = 16 [info] [550,000, 600,000] = 4 [info] [600,000,650,000) = 2 [info] [650,000, 700,000) = 1 [info] [700,000, 750,000) = 2 [info] [750,000, 800,000) = 2 [info] [800,000, 850,000) = 2 [info] Percentiles, ms / at: [info] p (0.0000) = 490.277 ms / op [info] p (50.0000) = 548.580 ms / op [info] p (90.0000) = 780.008 ms / op [info] p (95.0000) = 834.463 ms / op [info] p (99.0000) = 839.723 ms / op [info] p (99.9000) = 839.723 ms / op [info] p (99.9900) = 839.723 ms / op [info] p (99.9990) = 839.723 ms / op [info] p (99.9999) = 839.723 ms / op [info] p (100.0000) = 839.723 ms / op [info] # Run complete. Total time: 00:00:25 [info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on [info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial [info] experiments, perform baseline and negative tests that provide experimental control, make sure [info] the benchmarking environment is safe on JVM / OS / HW level, ask for reviews from the domain experts. [info] Do not assume the numbers tell you what you want them to tell. [info] Benchmark Mode Cnt Score Error Units [info] GraalTest.run ss 30 596.708 ± 68.925 ms / op [success] Total time: 26 s, completed 2020/06/14 12:22:50 ``

GrailVm で の 結果

今 回 も 結果 が ズ ラ ー っ と 記載 さ れ て い る が 、 ま と め る と 、

** ウ ォ ー ム ア ッ プ 時間 : 1012.169 ms / at ** ** 純 粋 な 実 行 時間 : 596,708 ms / at **

全体 結果

測定 結果 、 Grail の 方 が 速 い こ と が わ か っ た。 (こ の 計 測 方法 が 正 し け れ ば。 あ と 、 考慮 す る べ き 点 が ま だ ま だ あ り そ う う そ う う そ そ そ う そ

参考

  • https://bufferings.hatenablog.com/entry/2018/10/14/232631
  • https://www.ibm.com/developerworks/jp/java/library/j-benchmark1.html
  • https://github.com/melix/jmh-gradle-plugin/issues/91
  • https://labs.septeni.co.jp/entry/2015/08/09/045258
  • https://qiita.com/taketora/items/0cee66b5d5552285f5de