I tried to measure and compare the speed of GraalVM with JMH

environment

Rough problem with the previous measurement

[Previous article](https://qiita.com/kinshotomoya/items/39a821dd6a6a52202c0a#%E5%AE%9F%E9%9A%9B%E3%81%AB%E4%BD%BF%E3%81%A3 % E3% 81% A6% E3% 81% BF% E3% 82% 8B), and we compared the speed of GraalVm and standard JVM.

However, since the compile time, warm-up time, and pure execution time were mixed and measured, it was not possible to make a pure speed comparison of Graal itself.

What I want to do this time is speed comparison between Graal of GraalVM and C2 compiler of standard JVM.

Therefore, compile (class file generation), warm-up, and pure execution time are distinguished from each other, and pure execution time is compared.

Specific problems

1. Compile (class file generation) time was also included.

When executing with sbt, before executing, generate a mixing class file and then measure.

$ sbt compile
$ sbt run

2. The warm-up process and the post-compile process were measured in a mess.

As I wrote in this article, the JVM first executes (warms up) the class file on the interprinter and collects the information necessary for compilation. The JIT compiler is designed to compile the target method.

** To measure the pure execution speed of Graal or C2 compiler, you have to warm up the target method-> measure the speed after compilation. ** **

According to this book, the threshold for switching from the JIT compiler's interprinter to the JIT compiler is 8000 times. In other words, if you execute (warm up) a method 8000 times or more, the method will be compiled by the JIT compiler, so you will measure the speed of the compiled code.

Measure speed using JMH

Last time, I measured with the / usr / bin / time command. However, since it is not possible to make a pure speed comparison, JMH is used to obtain accurate values.

There is sbt library, so I used it.

Code to measure

A program that reads English sentences from a text file and spits out the top 10 frequently used words.

├── resources
│   └── GraalTest.txt
└── scala
    └── graal
        └── Performance.scala

text file. I brought a suitable English sentence.

Graal.txt


When people talk about Japan, they would always think about how innovative and technological this country gets! Or how pretty and neat the country is! Last but not the least, fashion, Cosplay and hype beast were always a big thing in the city of Japan. Coming to Japan with the intention of tourism would have been a great experience. Different culture. You can find a lot of unique things they sell in Japan! But as you live in Japan, you interact with the locals and everything doesn’t seem the way you thought of Japan.

The size is 525B.

$ ll
-rw-r--r--  1 kinsho  staff   525B  6 14 10:18 GraalTest.txt

In the JMH library, measurement options can be specified with annotations. It was set as follows.

-** Warm-up count ** Execute 10000 times for the target method once. As mentioned above, if the execution exceeds 8000 times, the target method will be compiled by the JIT compiler, so warm up 10000 times for the time being.

Performance.scala



package graal

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

import scala.annotation.tailrec
import scala.io.BufferedSource

//Specify the number of forks
//The default is 5
@Fork(1)
@Warmup(iterations = 1, batchSize = 10000)
@Measurement(iterations = 30, batchSize = 10000)
//Measure the execution time per execution
@BenchmarkMode(Array(Mode.SingleShotTime))
//Measurement unit is ms
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class GraalTest {

  //Target this method
  @Benchmark
  def run = {
    val fileName: String = "GraalTest.txt"
    val encode: String = "UTF-8"
    val source: BufferedSource = scala.io.Source.fromResource(fileName)
    val lines: Iterator[String] = source.getLines()
    val sortedTextList = lines.toList.mkString(" ").split(" ").sorted.toList
    val value = createMap(sortedTextList)
    val top10Words = value.toList.sortBy(_._2).reverse.take(10)
  }

  private def createMap(wordList: List[String]): Map[String, Long] = {
    @tailrec
    def loop(list: List[String], acc: Map[String, Long]): Map[String, Long] = {
      list match {
        case head :: tail if acc.isEmpty => {
          loop(tail, acc + (head -> 1L))
        }
        case head :: tail => {
          acc.get(head) match {
            case Some(value) => {
              loop(tail, acc.updated(head, value + 1L))
            }
            case None => {
              loop(tail, acc + (head -> 1L))
            }
          }
        }
        case head :: Nil => {
          acc.get(head) match {
            case Some(value) => {
              acc.updated(head, value + 1L)
            }
            case None => {
              acc + (head -> 1L)
            }
          }
        }
        case Nil => acc
      }
    }
    loop(wordList, Map.empty[String, Long])
  }
}


Run in standard JVM

First, compile

sbt:jitCompiler> jmh:compile
[info] Compiling 3 Scala sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
...
[info] Compiling 5 Java sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
[success] Total time: 5 s, completed 2020/06/14 12:08:28

Actually execute. From [info] # VM version: JDK 1.8.0_232, OpenJDK 64-Bit Server VM, 25.232-b09, you can see that it is running using the standard JVM.

sbt:jitCompiler> jmh:run
[info] running (fork) org.openjdk.jmh.Main
[info] # JMH version: 1.21
[info] # VM version: JDK 1.8.0_232, OpenJDK 64-Bit Server VM, 25.232-b09
[info] # VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 1 iterations, single-shot each, 10000 calls per op
[info] # Measurement: 30 iterations, single-shot each, 10000 calls per op
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread
[info] # Benchmark mode: Single shot invocation time
[info] # Benchmark: graal.GraalTest.run
[info] # Run progress: 0.00% complete, ETA 00:00:00
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 1075.011 ms/op
[info] Iteration   1: 555.518 ms/op
[info] Iteration   2: 634.679 ms/op
[info] Iteration   3: 552.381 ms/op
[info] Iteration   4: 687.170 ms/op
[info] Iteration   5: 538.699 ms/op
[info] Iteration   6: 537.693 ms/op
[info] Iteration   7: 622.123 ms/op
[info] Iteration   8: 801.559 ms/op
[info] Iteration   9: 611.831 ms/op
[info] Iteration  10: 567.022 ms/op
[info] Iteration  11: 590.506 ms/op
[info] Iteration  12: 666.859 ms/op
[info] Iteration  13: 564.290 ms/op
[info] Iteration  14: 552.446 ms/op
[info] Iteration  15: 713.195 ms/op
[info] Iteration  16: 781.514 ms/op
[info] Iteration  17: 685.628 ms/op
[info] Iteration  18: 570.931 ms/op
[info] Iteration  19: 557.543 ms/op
[info] Iteration  20: 559.926 ms/op
[info] Iteration  21: 677.823 ms/op
[info] Iteration  22: 551.438 ms/op
[info] Iteration  23: 564.687 ms/op
[info] Iteration  24: 560.043 ms/op
[info] Iteration  25: 553.672 ms/op
[info] Iteration  26: 548.359 ms/op
[info] Iteration  27: 777.705 ms/op
[info] Iteration  28: 794.434 ms/op
[info] Iteration  29: 917.850 ms/op
[info] Iteration  30: 553.160 ms/op
[info] Result "graal.GraalTest.run":
[info]   N = 30
[info]   mean =    628.356 ±(99.9%) 67.159 ms/op
[info]   Histogram, ms/op:
[info]     [ 500.000,  550.000) = 3
[info]     [ 550.000,  600.000) = 14
[info]     [ 600.000,  650.000) = 3
[info]     [ 650.000,  700.000) = 4
[info]     [ 700.000,  750.000) = 1
[info]     [ 750.000,  800.000) = 3
[info]     [ 800.000,  850.000) = 1
[info]     [ 850.000,  900.000) = 0
[info]     [ 900.000,  950.000) = 1
[info]   Percentiles, ms/op:
[info]       p(0.0000) =    537.693 ms/op
[info]      p(50.0000) =    568.977 ms/op
[info]      p(90.0000) =    793.142 ms/op
[info]      p(95.0000) =    853.890 ms/op
[info]      p(99.0000) =    917.850 ms/op
[info]      p(99.9000) =    917.850 ms/op
[info]      p(99.9900) =    917.850 ms/op
[info]      p(99.9990) =    917.850 ms/op
[info]      p(99.9999) =    917.850 ms/op
[info]     p(100.0000) =    917.850 ms/op
[info] # Run complete. Total time: 00:00:26
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark      Mode  Cnt    Score    Error  Units
[info] GraalTest.run    ss   30  628.356 ± 67.159  ms/op
[success] Total time: 26 s, completed 2020/06/14 12:26:25

Results in standard JVM

The results are described in a sloppy manner, but in summary,

** Warm-up time: 1075.011 ms / op ** ** Pure speed: 628.356 ms / op **

ms / op is a unit that expresses how long it took to execute the target method once. Since the warm-up time takes longer than the pure speed, you can also see that it compiles normally after 10,000 warm-ups.

Run on GraalVM

This time, the favorite, measured with GraalVM! First, compile.

sbt:jitCompiler> jmh:compile
[info] Compiling 3 Scala sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
...
[info] Compiling 5 Java sources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes ...
[success] Total time: 5 s, completed 2020/06/14 12:18:06

Execute.


sbt:jitCompiler> jmh:run
Processing 9 classes from /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/classes with "reflection" generator
Writing out Java source to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/src_managed/jmh and resources to /Users/kinsho/workspace/loginfrastructure/jitCompiler/target/scala-2.13/resource_managed/jmh
[info] running (fork) org.openjdk.jmh.Main
[info] # JMH version: 1.21
[info] # VM version: JDK 1.8.0_252, OpenJDK 64-Bit Server VM GraalVM CE 20.1.0, 25.252-b09-jvmci-20.1-b02
[info] # VM invoker: /Library/Java/JavaVirtualMachines/graalvm-ce-java8-20.1.0/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 1 iterations, single-shot each, 10000 calls per op
[info] # Measurement: 30 iterations, single-shot each, 10000 calls per op
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread
[info] # Benchmark mode: Single shot invocation time
[info] # Benchmark: graal.GraalTest.run
[info] # Run progress: 0.00% complete, ETA 00:00:00
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 1012.169 ms/op
[info] Iteration   1: 546.073 ms/op
[info] Iteration   2: 584.707 ms/op
[info] Iteration   3: 547.958 ms/op
[info] Iteration   4: 662.077 ms/op
[info] Iteration   5: 549.202 ms/op
[info] Iteration   6: 490.277 ms/op
[info] Iteration   7: 620.376 ms/op
[info] Iteration   8: 780.583 ms/op
[info] Iteration   9: 576.626 ms/op
[info] Iteration  10: 508.686 ms/op
[info] Iteration  11: 543.451 ms/op
[info] Iteration  12: 589.145 ms/op
[info] Iteration  13: 515.498 ms/op
[info] Iteration  14: 524.002 ms/op
[info] Iteration  15: 717.083 ms/op
[info] Iteration  16: 839.723 ms/op
[info] Iteration  17: 523.987 ms/op
[info] Iteration  18: 519.271 ms/op
[info] Iteration  19: 525.304 ms/op
[info] Iteration  20: 530.935 ms/op
[info] Iteration  21: 634.322 ms/op
[info] Iteration  22: 545.711 ms/op
[info] Iteration  23: 513.338 ms/op
[info] Iteration  24: 518.737 ms/op
[info] Iteration  25: 519.130 ms/op
[info] Iteration  26: 575.139 ms/op
[info] Iteration  27: 745.525 ms/op
[info] Iteration  28: 774.832 ms/op
[info] Iteration  29: 830.160 ms/op
[info] Iteration  30: 549.396 ms/op
[info] Result "graal.GraalTest.run":
[info]   N = 30
[info]   mean =    596.708 ±(99.9%) 68.925 ms/op
[info]   Histogram, ms/op:
[info]     [400.000, 450.000) = 0
[info]     [450.000, 500.000) = 1
[info]     [500.000, 550.000) = 16
[info]     [550.000, 600.000) = 4
[info]     [600.000, 650.000) = 2
[info]     [650.000, 700.000) = 1
[info]     [700.000, 750.000) = 2
[info]     [750.000, 800.000) = 2
[info]     [800.000, 850.000) = 2
[info]   Percentiles, ms/op:
[info]       p(0.0000) =    490.277 ms/op
[info]      p(50.0000) =    548.580 ms/op
[info]      p(90.0000) =    780.008 ms/op
[info]      p(95.0000) =    834.463 ms/op
[info]      p(99.0000) =    839.723 ms/op
[info]      p(99.9000) =    839.723 ms/op
[info]      p(99.9900) =    839.723 ms/op
[info]      p(99.9990) =    839.723 ms/op
[info]      p(99.9999) =    839.723 ms/op
[info]     p(100.0000) =    839.723 ms/op
[info] # Run complete. Total time: 00:00:25
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark      Mode  Cnt    Score    Error  Units
[info] GraalTest.run    ss   30  596.708 ± 68.925  ms/op
[success] Total time: 26 s, completed 2020/06/14 12:22:50

Results with GraalVm

The results are described in a sloppy manner this time as well, but in summary,

** Warm-up time: 1012.169 ms / op ** ** Pure run time: 596.708 ms / op **

Overall result

As a result of the measurement, it was found that Graal was faster. (If this measurement method is correct, there seems to be more points to consider)

reference

Recommended Posts

I tried to measure and compare the speed of GraalVM with JMH
I tried to compare the infrastructure technology of engineers these days with cooking.
I tried to increase the processing speed with spiritual engineering
I tried to summarize the basics of kotlin and java
I tried to solve the problem of "multi-stage selection" with Ruby
I tried to build the environment of PlantUML Server with Docker
I tried to check the operation of gRPC server with grpcurl
I tried to summarize the methods of Java String and StringBuilder
Compare the speed of the for statement and the extended for statement.
I tried to express the result of before and after of Date class with a number line
I tried to summarize the key points of gRPC design and development
I tried to visualize the access of Lambda → Athena with AWS X-Ray
I tried to read and output CSV with Outsystems
05. I tried to stub the source of Spring Boot
I started MySQL 5.7 with docker-compose and tried to connect
I tried to reduce the capacity of Spring Boot
I tried to check the operation of http request (Put) with Talented API Tester
I tried to investigate the mechanism of Emscripten by using it with the Sudoku solver
I tried to verify this and that of Spring @ Transactional
I tried JAX-RS and made a note of the procedure
I tried to summarize the basic grammar of Ruby briefly
Be sure to compare the result of Java compareTo with 0
Sazae-san's rock-paper-scissors I tried to verify the theoretical value and the measured value of the probability of the same hand 5 consecutive times with Ruby
I tried to interact with Java
I tried to explain the method
[For Swift beginners] I tried to summarize the messy layout cycle of ViewController and View
I want to return to the previous screen with kotlin and java!
I tried to implement the image preview function with Rails / jQuery
I tried to solve the problem of Google Tech Dev Guide
I tried to summarize the methods used
I tried to get started with WebAssembly
I tried to implement the Iterator pattern
I tried to summarize the Stream API
I want to display images with REST Controller of Java and Spring!
[Ruby] I want to extract only the value of the hash and only the key
I tried to solve the tribonacci sequence problem in Ruby, with recursion.
I want to pass the argument of Annotation and the argument of the calling method to aspect
I tried to make full use of the CPU core in Ruby
I translated the grammar of R and Java [Updated from time to time]
I tried to figure out the flow when performing image analysis with Vision Framework and Core ML
I tried to verify AdoptOpenJDK 11 (11.0.2) with Docker image
I tried to make Basic authentication with Java
I tried to manage struts configuration with Coggle
I tried to implement a function equivalent to Felica Lite with HCE-F of Android
[Rails] I tried to raise the Rails version from 5.0 to 5.2
I tried to manage login information with JMX
I tried to implement the client using Netty
I tried to organize the session in Rails
What I tried when I wanted to get all the fields of a bean
I tried to link grafana and postgres [docker-compose]
I also tried WebAssembly with Nim and C
I compared the characteristics of Java and .NET
I tried to chew C # (basic of encapsulation)
I want to var_dump the contents of the intent
I tried to link JavaFX and Spring Framework.
I tried to set tomcat to run the Servlet.
I tried to clone a web application full of bugs with Spring Boot
Measure the distance of the maze with breadth-first search
I checked the number of taxis with Ruby
I tried to make a web application that searches tweets with vue-word cloud and examines the tendency of what is written in the associated profile
I tried to express the phone number (landline / mobile phone) with a regular expression in Rails and write validation and test