This is a summary of Chapter 3 of Java Performance (O'Reilly Japan).

- Chapter 1 Introduction (Qiita)
- Chapter 2 Performance Testing Approach (Qiita) ← Previous article
- Chapter 3 Java Performance Toolbox (Qiita) ← This article
- Chapter 4 Mechanism of JIT compiler (Qiita) ← Next article
- Chapter 5 Basics of Garbage Collection (Qiita)

3.1 Tools and analysis that come with the operating system

The first tools to reach for have nothing to do with Java.

On Unix-like systems there are sar, vmstat, iostat (qiita.com/pyon_kiti_jp/items/a44ac6e9229ba7b90c6b), and prstat. On Windows there is typeperf.

CPU usage

CPU usage can be classified into the following two types.

- User time: the time the CPU spends running application code
- System time: the time the CPU spends running OS kernel code

System time is not unrelated to the application. It is spent on work the application asks the kernel to do, such as writing to disk or sending data over the network.

In Windows, system time is called privileged time.

The goal is to maximize CPU usage and shorten execution time.

The CPU may be idle for any of the following reasons:

- The application is waiting for a lock held through a synchronization primitive to be released.
- The application is waiting for something else (for example, a response from the database).
- There is simply no work to do.

Conversely, you may want to limit the CPU usage per program. In such cases, you can force the CPU cycle to idle for a period of time, or lower the priority.

CPU run queue

Both Windows and Unix systems let you monitor the number of threads that are able to run (that is, threads that are not waiting for I/O or sleeping). This is called the run queue. On Windows it is called the processor queue and can be viewed with typeperf.

C:> typeperf -si 1 "\System\Processor Queue Length"
"05/11/2013 19:09:42.678","0.000000"
"05/11/2013 19:09:43.678","0.000000"

Since the Unix run queue includes the thread that is currently running, its length is **1 or more**. The Windows processor queue, on the other hand, does not include the thread that is currently running, so its length is **0 or more**.

Performance suffers when there are more threads to run than there are CPUs. On Unix, the run queue length should be no greater than the number of CPUs, and on Windows, the processor queue length should be 0. This is not an absolute rule, though. If the run queue only grows occasionally, there is no problem. If it stays too long for an extended period, the machine is overloaded, and you need to move some of the work to another machine or optimize your code.
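
The comparison above can also be roughed out from inside a Java process. The following is only a sketch: it uses the one-minute load average as a stand-in for the run queue length, which is an approximation (on Linux the load average also counts threads in uninterruptible I/O wait), and the class name `RunQueueCheck` is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class RunQueueCheck {
    /** Returns true when the observed load is greater than the number of CPUs. */
    static boolean isOverloaded(double load, int cpus) {
        return load > cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        // One-minute load average; negative when the platform does not provide it (e.g. Windows).
        double load = os.getSystemLoadAverage();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("cpus=%d load=%.2f overloaded=%b%n",
                cpus, load, isOverloaded(load, cpus));
    }
}
```

A single reading says little; as noted above, what matters is whether the value stays high over time.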

Disk usage

Disk statistics are a hint that something may be wrong. For example, if `iostat -x` shows a low wMB/s (megabytes written per second) but a high w/s (writes per second), the application is issuing many small writes, and it may be better to batch them. Conversely, if the application writes more than the disk can handle, you know that the disk is the bottleneck.

You can also see whether the system is swapping, which can affect performance. In vmstat, look at si (swap in) and so (swap out).
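
To make the "low wMB/s, high w/s" case concrete, the average size of one write can be worked out from the two columns. This is just arithmetic on hypothetical readings (2 MB/s and 4000 writes/s are invented numbers, and `WriteSize` is a name made up for this sketch).

```java
class WriteSize {
    /** Average kilobytes per write request, from iostat's wMB/s and w/s columns. */
    static double avgWriteKb(double wMBPerSec, double writesPerSec) {
        if (writesPerSec <= 0) {
            return 0.0; // no writes, so there is no meaningful average
        }
        return wMBPerSec * 1024.0 / writesPerSec;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 2 MB/s spread over 4000 writes/s.
        System.out.printf("%.3f KB per write%n", avgWriteKb(2.0, 4000.0));
    }
}
```

With these numbers each write carries only about half a kilobyte, which is the kind of pattern where buffering the writes is worth considering.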

Network utilization

The standard system tools are inadequate for monitoring network traffic. netstat is used on Unix-like systems, and typeperf on Windows.

For network bandwidth, nicstat is used on Unix-like systems.

3.2 Java monitoring tool

- jcmd: Prints information about the specified process.
- jconsole: Shows the behavior of the JVM in a GUI.
- jhat: Lets you browse heap dumps in a web browser.
- jmap: Produces heap dumps and other memory information.
- jinfo: Displays the JVM's system properties.
- jstack: Dumps the stacks of a Java process.
- jstat: Displays information about GC and class loading.
- jvisualvm: Monitors the JVM. Can also analyze heap dumps.

Basic information about the JVM

Execution time

jcmd ${Process ID} VM.uptime

System properties

You can see the same values that System.getProperties() returns (including anything you specified with -Dhogehoge at startup).

jcmd ${Process ID} VM.system_properties

jinfo -sysprops ${Process ID}
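
For comparison, here is a minimal sketch that prints the same properties from inside the process. The class name `ShowSystemProperties` is made up for this example; the keys are sorted only to make the output easier to compare with the jcmd and jinfo output.

```java
import java.util.Properties;
import java.util.SortedSet;
import java.util.TreeSet;

class ShowSystemProperties {
    /** Returns the system property names in alphabetical order. */
    static SortedSet<String> sortedKeys() {
        return new TreeSet<>(System.getProperties().stringPropertyNames());
    }

    public static void main(String[] args) {
        Properties props = System.getProperties();
        for (String key : sortedKeys()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Running it with something like `java -Dhogehoge=1 ShowSystemProperties` should list hogehoge alongside the standard properties.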

JVM version

jcmd ${Process ID} VM.version

JVM command line arguments

jcmd ${Process ID} VM.command_line

JVM tuning flag

jcmd ${Process ID} VM.flags -all

Without -all, only the flags that were specified on the command line or set directly by the JVM are shown.

Show platform-specific default values

java -XX:+PrintFlagsFinal -version

Flags shown with := have a value other than the default.

Dynamic change of flag

First, the following command shows all the flag values that are set.

jinfo -flags ${Process ID}

To see an individual flag:

jinfo -flag PrintGCDetails ${Process ID}

To change a value (the first command turns the flag off, the second displays it to confirm the change):

jinfo -flag -PrintGCDetails ${Process ID}
jinfo -flag PrintGCDetails ${Process ID}
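
On a HotSpot JVM, flags can also be read, and the manageable ones changed, from inside the process through `HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot-based JDK; `HeapDumpOnOutOfMemoryError` is picked only as an example of a manageable flag (it is not from the book), and `ManageableFlags` is a made-up class name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class ManageableFlags {
    static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hs = bean();
        // Example flag only; any manageable flag could be used here.
        String flag = "HeapDumpOnOutOfMemoryError";

        VMOption before = hs.getVMOption(flag);
        System.out.println(flag + "=" + before.getValue()
                + " writeable=" + before.isWriteable());

        // Only manageable (writeable) flags can be changed at run time.
        if (before.isWriteable()) {
            hs.setVMOption(flag, "true");
            System.out.println(flag + "=" + hs.getVMOption(flag).getValue());
        }
    }
}
```

As with jinfo, most flags are not writeable at run time, so the `isWriteable()` check matters.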

Thread information

You can check the number of running threads in the GUI of jconsole or jvisualvm.

How to display the thread stack

jstack ${Process ID}

jcmd ${Process ID} Thread.print
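
A similar dump can be produced from inside the JVM with `ThreadMXBean`. This is a minimal sketch rather than a replacement for jstack (the output format is my own, and `ThreadDump` is a made-up class name).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDump {
    /** Captures the state and stack of every live thread. */
    static ThreadInfo[] dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // false, false: skip locked monitor and synchronizer details to keep the dump cheap.
        return threads.dumpAllThreads(false, false);
    }

    public static void main(String[] args) {
        for (ThreadInfo info : dump()) {
            System.out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```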

Class information

Use jconsole or jstat. jstat also gives you information about compilation.

Dynamic analysis of garbage collection

In jconsole, heap usage can be displayed graphically. You can trigger a GC with jcmd, as you can from jconsole. jmap gives an overview of the heap. jstat provides various views that show what the GC is doing.
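
The numbers these tools show are also exposed inside the JVM through the standard management beans. The sketch below prints heap usage and per-collector counts before and after requesting a GC; `GcSnapshot` is a made-up name, and `System.gc()` is only a request that the JVM is free to ignore.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class GcSnapshot {
    /** Sums the collection counts reported by all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%d KB committed=%d KB%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024);

        long before = totalCollections();
        System.gc(); // a request only; e.g. -XX:+DisableExplicitGC turns it into a no-op
        System.out.println("collections before=" + before + " after=" + totalCollections());

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```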

Ex-post analysis of heap dumps

Heap dumps can be taken with jvisualvm (a GUI tool), jcmd, or jmap. Among the standard tools, they can be analyzed with jvisualvm or jhat. There is also a third-party tool, the Eclipse Memory Analyzer Tool.
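
On a HotSpot JVM a heap dump can also be written programmatically. The following sketch assumes a HotSpot-based JDK; the file name `sample.hprof` and the class name `HeapDumper` are invented for this example. The target file must not already exist.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    /** Writes a heap dump into a fresh temporary directory and returns its path. */
    static Path dumpToTempDir() {
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("sample.hprof");
            HotSpotDiagnosticMXBean hs =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            hs.dumpHeap(file.toString(), true); // true = dump only live (reachable) objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = dumpToTempDir();
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting file can then be opened in jvisualvm or the Eclipse Memory Analyzer Tool.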

3.3 Profiling tool

Sampling profiler

Profilers work in either sampling mode or instrumented mode. Sampling mode has the lowest overhead, so it distorts the application's performance characteristics the least.

However, sampling profilers are prone to errors. For example, a profiler that samples on a periodic timer only sees the thread that happens to be executing when the timer fires, so anything that runs between two samples is missed.
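
To show what timer-based sampling means, here is a toy sampler. It is only an illustration, not how a real profiler is implemented: it samples one worker thread with `Thread.getStackTrace()` at a fixed interval and counts which method is on top of the stack. All names (`TinySampler`, `busyWork`, `profile`) are made up.

```java
import java.util.Map;
import java.util.TreeMap;

class TinySampler {
    static volatile boolean running;
    static volatile long sink;

    /** CPU-bound work for the worker thread to spend its time in. */
    static void busyWork() {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) {
            acc += i % 7;
        }
        sink = acc;
    }

    /** Samples the worker's top stack frame `samples` times and counts hits per method. */
    static Map<String, Integer> profile(int samples, long intervalMillis) {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                busyWork();
            }
        }, "worker");
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> hits = new TreeMap<>();
        try {
            for (int i = 0; i < samples; i++) {
                Thread.sleep(intervalMillis); // the "timer"
                StackTraceElement[] stack = worker.getStackTrace();
                String top = stack.length == 0
                        ? "<no frames>"
                        : stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running = false;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(profile(50, 10));
    }
}
```

Only whatever is on the stack at the instant of each sample gets counted, which is exactly why short-lived methods can go unnoticed.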

In most cases, the method at the top of the profile accounts for only 2-3% of the total time, so even if you work hard and make it twice as fast, the application as a whole may become only about 1% faster.
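
The arithmetic behind that estimate is simple: if a method takes a share p of the total time and becomes s times faster, the time saved is p - p/s. A small sketch (the class and method names are made up):

```java
class HotMethodGain {
    /**
     * Fraction of the total run time saved when a method that accounts for
     * `share` of the time becomes `factor` times faster.
     */
    static double timeSaved(double share, double factor) {
        return share - share / factor;
    }

    public static void main(String[] args) {
        // A method taking 2% of the time, made twice as fast.
        System.out.printf("%.1f%%%n", timeSaved(0.02, 2.0) * 100); // prints "1.0%"
    }
}
```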

Instrumented profiler

Unlike a sampling profiler, an instrumented profiler can also show how many times each method is called and the average number of calls per second. From this you can judge, for example, whether to speed up the implementation of a method or to reduce the number of times it is called.

A profiler in instrumented mode may give an inaccurate picture of performance, because it rewrites the bytecode to insert code that counts the calls. For example, a method's code may grow large enough that the JIT compiler decides not to inline it (details in Chapter 4).

Sampling profilers can only sample threads when they are at a safepoint (essentially, when they are allocating memory), whereas instrumented profilers do not have this restriction.

Timeline of blocked methods and threads

Blocking methods (methods that wait, such as LockSupport#park and Object#wait()) do not consume CPU time (they do not increase CPU usage), so even if they appear at the top of the profiling results, they cannot be optimized. For this reason, most profilers do not show blocked methods by default.

To understand blocking methods, look at the execution state of the threads instead. In VisualVM, this is the Threads tab.
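
The thread state that such a view displays can also be observed directly. In this sketch (all names are made up) a thread parks itself with `LockSupport.park()`, and the main thread polls its state until it is reported as WAITING.

```java
import java.util.concurrent.locks.LockSupport;

class ParkedThreadState {
    static volatile boolean released;

    /** Starts a thread that parks itself and returns the state observed for it. */
    static Thread.State observe() {
        released = false;
        Thread parked = new Thread(() -> {
            while (!released) {
                LockSupport.park(); // park may return spuriously, so loop on the flag
            }
        }, "parked");
        parked.setDaemon(true);
        parked.start();

        Thread.State state = parked.getState();
        try {
            long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 seconds
            while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = parked.getState();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            released = true;
            LockSupport.unpark(parked); // let the thread finish
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println("state of parked thread: " + observe());
    }
}
```

While parked, the thread sits in the WAITING state without using the CPU, which is why it tells you nothing useful in a CPU profile.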

Native profiler

With a native profiler, you can profile the JVM itself.

One of the native profilers is Oracle Solaris Studio. It's named Solaris, but it also works on Linux.

When run on Solaris, you can take advantage of the internal structure of the Solaris kernel to get more information.

That is what the book says, at least. I don't have a Solaris machine, so I couldn't try it.

Data that can only be obtained with native tools includes the time spent on GC.

3.4 Java Mission Control

It ships with the commercial version of Java under the name jmc, not with the open source version. A commercial license is required to use it.

JFR

JFR (Java Flight Recorder) is the key feature of jmc. It lets you see events such as threads being blocked.

You can see events related to garbage collection in the JFR Memory view. When reading Chapters 5 and 6, keep in mind how this tool can help.

In the Overview tab of the JFR Code view, you can see values aggregated per package. Few other tools offer this.

You can also get accurate information about lock inflation, along with other information that cannot be obtained with jcmd or jconsole.

As explained in Chapter 9, a thread that tries to acquire a lock first waits in a simple loop (spinning). If that does not succeed, the lock is acquired using CPU- or OS-specific code. This process is called lock inflation.

Enable JFR

To enable it, add the flags -XX:+UnlockCommercialFeatures -XX:+FlightRecorder to the command line that launches the application.

JFR supports two kinds of recording: recording for a fixed period of time, and continuous recording. Continuous recording uses a ring buffer.

You can specify how to record with parameters, in the form -XX:FlightRecorderOptions=${parameter string}. The options are described here (https://docs.oracle.com/javacomponents/jp/jmc-5-4/jfr-runtime-guide/comline.htm#BABJEIEH).

You can also start a recording on a running JVM with the following command (although the -XX:+FlightRecorder option must have been specified at startup).

jcmd ${Process ID} JFR.start [option]

For continuous recording, the data in the ring buffer can be output to a file with the following command.

jcmd ${Process ID} JFR.dump [option]

Information about the recordings in progress can be printed with the following command. (Incidentally, it seems that one process can run several recordings at once, so this is probably most useful in that case.)

jcmd ${Process ID} JFR.check [verbose]

The following command stops a recording.

jcmd ${Process ID} JFR.stop [option]
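
The same start, stop, and dump cycle can be driven from code through the `jdk.jfr` API. Note the assumption: this API is part of newer JDKs (11 and later), not the commercial JDK setup described above, so treat it as a sketch of the lifecycle rather than what the book covers. The class name, the recording name "sketch", and the choice of the `jdk.GarbageCollection` event are mine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

class InProcessRecording {
    /** Runs a short recording and writes it to a file in a temporary directory. */
    static Path record() {
        try (Recording recording = new Recording()) {
            recording.setName("sketch");
            recording.enable("jdk.GarbageCollection"); // record GC events only
            recording.start();                          // comparable to JFR.start

            System.gc(); // some activity to record; only a request to the JVM

            recording.stop();                           // comparable to JFR.stop
            Path file = Files.createTempDirectory("jfr").resolve("sketch.jfr");
            recording.dump(file);                       // comparable to JFR.dump
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("recording written to " + record());
    }
}
```

The resulting .jfr file can be opened in jmc.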

JFR event selection

JFR can be extended. It seems that you can create your own events. Collecting events inevitably involves overhead, but there are some events you want to collect even at some cost. For example, you can monitor TLAB (Thread Local Allocation Buffer) events to see whether objects are being allocated directly in the old generation.
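
As for creating your own events, here is roughly what that looks like with the `jdk.jfr` API of newer JDKs (11 and later, which is an assumption beyond what the book describes). The event `sketch.OrderProcessed` and its field are hypothetical; the sketch records three of them, dumps the recording, and reads it back to count them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class CustomEventSketch {
    @Name("sketch.OrderProcessed") // hypothetical event name
    @Label("Order Processed")
    static class OrderProcessed extends Event {
        @Label("Order ID")
        long orderId;
    }

    /** Records three custom events and returns how many are found in the dump. */
    static long recordAndCount() {
        try (Recording recording = new Recording()) {
            recording.enable(OrderProcessed.class);
            recording.start();
            for (long id = 1; id <= 3; id++) {
                OrderProcessed event = new OrderProcessed();
                event.begin();      // start timing the event
                event.orderId = id;
                event.commit();     // end timing and write the event
            }
            recording.stop();

            Path file = Files.createTempDirectory("jfr").resolve("custom.jfr");
            recording.dump(file);

            long count = 0;
            for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
                if (e.getEventType().getName().equals("sketch.OrderProcessed")) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("custom events recorded: " + recordAndCount());
    }
}
```

When no recording is running, committing the event does nothing, which keeps the cost of leaving such code in place small.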
