This article is a summary of Chapter 3 of Java Performance (O'Reilly Japan).
- Chapter 1: Introduction (Qiita)
- Chapter 2: Performance Testing Approach (Qiita) ← Previous article
- Chapter 3: Java Performance Toolbox (Qiita) ← This article
- Chapter 4: Mechanism of the JIT Compiler (Qiita) ← Next article
- Chapter 5: Basics of Garbage Collection (Qiita)
The first tools to reach for have nothing to do with Java.
On Unix-based systems there are sar, vmstat, iostat (https://qiita.com/pyon_kiti_jp/items/a44ac6e9229ba7b90c6b), and prstat. On Windows there is typeperf.
CPU usage can be classified into the following two types.
- User time: the time the CPU spends running the application
- System time: the time the CPU spends running the OS kernel

System time is not independent of the application. It is spent when the application writes to disk, sends data over the network, and so on.
The goal is to maximize CPU usage and shorten execution time.
Possible causes of the CPU being idle are:
- The application is waiting for a lock to be released by a synchronization primitive.
- The application is waiting for something else (for example, a response from the database).
- There is simply no work to do.
Conversely, you may want to limit the CPU usage of a particular program. In such cases, you can force it to leave CPU cycles idle for a period of time, or lower its priority.
Both Windows and Unix systems have a mechanism for monitoring runnable threads (threads that are not waiting for I/O or sleeping). This mechanism is called the run queue. On Windows it is called the processor queue and can be viewed with typeperf.
C:> typeperf -si 1 "\System\Processor Queue Length"
"05/11/2013 19:09:42.678","0.000000"
"05/11/2013 19:09:43.678","0.000000"
On Unix, the run queue includes the threads that are currently running, so its length is **1 or more** whenever something is running. The Windows processor queue, on the other hand, does not include the running threads, so its length is **0 or more**.
Performance suffers when there are more runnable threads than CPUs. On Unix, the run queue length should ideally be no greater than the number of CPUs, and on Windows the processor queue length should be 0. However, this is not an absolute rule. If the run queue grows only occasionally, there is no problem. If it stays too long for an extended period, the machine is overloaded, and you need to move some of the work to another machine or optimize your code.
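The rule of thumb above can also be checked from inside a Java process. The following is a minimal sketch, assuming the 1-minute system load average is an acceptable stand-in for the run queue length (it is only a rough proxy: on Linux, for example, it also counts threads in uninterruptible I/O wait). The class and method names are my own, not from the book.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: compares the system load average with the CPU count. */
final class RunQueueCheck {

    /**
     * Returns true when the load average exceeds the number of CPUs.
     * A negative load average means "not available" (for example on Windows),
     * in which case this returns false.
     */
    static boolean looksOverloaded(double loadAverage, int cpus) {
        return loadAverage >= 0 && loadAverage > cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative if unavailable
        int cpus = os.getAvailableProcessors();
        System.out.printf("load average=%.2f, cpus=%d, overloaded=%b%n",
                load, cpus, looksOverloaded(load, cpus));
    }
}
```

A single reading says little. As noted above, it is a sustained excess that indicates an overloaded machine.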
Disk usage gives hints that something is wrong. For example, if `iostat -x` shows a low wMB/s (megabytes written per second) but a high w/s (number of writes per second), it may be better to batch the writes together. Conversely, if the application writes too much, you can tell that the disk is the bottleneck.
Disk monitoring also shows whether the system is swapping, which can hurt performance. In vmstat, look at the si (swap in) and so (swap out) columns.
Standard system tools are inadequate for monitoring network traffic. netstat is used on Unix-like systems and typeperf on Windows.
To see network bandwidth usage, nicstat is used on Unix-like systems.
- jcmd: Gets information about the specified process.
- jconsole: Shows the behavior of the JVM in a GUI.
- jhat: Lets you browse heap dumps in a web browser.
- jmap: Gets heap dumps and memory information.
- jinfo: Displays JVM system properties.
- jstack: Dumps the stacks of a Java process.
- jstat: Displays information about GC and class loading.
- jvisualvm: Monitors the JVM and can analyze heap dumps.
jcmd ${Process ID} VM.uptime
You can see the same values as System.getProperties() (including the ones you specify with -Dhogehoge at startup).
jcmd ${Process ID} VM.system_properties
jinfo -sysprops ${Process ID}
jcmd ${Process ID} VM.version
jcmd ${Process ID} VM.command_line
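For comparison, roughly the same information is available from inside the process through the standard `RuntimeMXBean`. This is a minimal sketch, not what jcmd does internally. The `VmInfo` class is a hypothetical name, and `getInputArguments()` returns only the JVM arguments, not the main class or application arguments that `VM.command_line` also prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

/** Hypothetical helper that reads basic JVM information in-process. */
final class VmInfo {
    private static final RuntimeMXBean RUNTIME = ManagementFactory.getRuntimeMXBean();

    /** Time since the JVM started, in milliseconds (compare: VM.uptime). */
    static long uptimeMillis() {
        return RUNTIME.getUptime();
    }

    /** A single system property (compare: VM.system_properties). */
    static String property(String key) {
        return RUNTIME.getSystemProperties().get(key);
    }

    /** JVM name and version (compare: VM.version). */
    static String version() {
        return RUNTIME.getVmName() + " " + RUNTIME.getVmVersion();
    }

    /** Arguments passed to the JVM itself (compare: VM.command_line). */
    static List<String> jvmArguments() {
        return RUNTIME.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("uptime (ms):   " + uptimeMillis());
        System.out.println("version:       " + version());
        System.out.println("java.version:  " + property("java.version"));
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```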
jcmd ${Process ID} VM.flags -all
If you leave out -all, does it show only the flags that were explicitly set?
java -XX:+PrintFlagsFinal -version
Flags displayed with := have a value other than the default.
First, you can see all the flags that have been set with the following command.
jinfo -flags ${Process ID}
To see an individual setting:
jinfo -flag PrintGCDetails ${Process ID}
To change a setting value (the first command turns the flag off, the second displays the result):
jinfo -flag -PrintGCDetails ${Process ID}
jinfo -flag PrintGCDetails ${Process ID}
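The same kind of read-and-change can be done from Java code through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot-based JVM and uses `HeapDumpOnOutOfMemoryError` as the example flag because it is one of the flags that can be changed at run time; flags that are not "manageable" are rejected. `FlagInspector` is a name I made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for reading and changing JVM flags (HotSpot only). */
final class FlagInspector {
    private static final HotSpotDiagnosticMXBean HOTSPOT =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Returns the current value of a flag as a string, e.g. "true". */
    static String read(String flag) {
        return HOTSPOT.getVMOption(flag).getValue();
    }

    /** Changes a flag at run time; only manageable flags accept this. */
    static void write(String flag, String value) {
        HOTSPOT.setVMOption(flag, value);
    }

    public static void main(String[] args) {
        String flag = "HeapDumpOnOutOfMemoryError"; // a manageable flag
        VMOption before = HOTSPOT.getVMOption(flag);
        System.out.println(before.getName() + "=" + before.getValue()
                + " origin=" + before.getOrigin()
                + " writeable=" + before.isWriteable());
        write(flag, "true");
        System.out.println("after change: " + read(flag));
    }
}
```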
You can check the number of running threads in the GUI of jconsole or jvisualvm.
To display the thread stacks:
jstack ${Process ID}
jcmd ${Process ID} Thread.print
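A similar dump can be produced in-process with the standard `ThreadMXBean`. This is a minimal sketch whose output format is my own and simpler than what jstack prints. `ThreadDump` is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that prints the stacks of all live threads. */
final class ThreadDump {

    /** Number of live threads, as shown in the jconsole/jvisualvm GUI. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Builds a plain-text dump of every live thread, without lock details. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreadCount());
        System.out.print(dump());
    }
}
```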
For class loading information, use jconsole or jstat. jstat also gives you information about compilation.
In jconsole, heap usage can be displayed graphically. You can trigger a GC with jcmd (as you can with jconsole?). jmap gives an overview of the heap, and jstat provides various views that show what the GC is doing.
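The numbers these tools display are also exposed to Java code through the standard management beans. The following is a minimal sketch with hypothetical names. Note that `System.gc()` is only a request, and the JVM may ignore it (for example with `-XX:+DisableExplicitGC`).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that reads heap usage and GC counters in-process. */
final class GcStats {

    /** Bytes of heap currently in use. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Total number of collections so far, summed over all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("used heap (bytes): " + usedHeapBytes());
        System.out.println("collections so far: " + totalCollections());
        System.gc(); // a request only; the JVM is free to ignore it
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```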
Heap dumps can be obtained with jvisualvm (a GUI tool), jcmd, or jmap. Among the standard tools, they can be analyzed with jvisualvm or jhat. There is also a third-party tool called the Eclipse Memory Analyzer Tool.
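A heap dump can also be triggered from code on HotSpot-based JVMs via `HotSpotDiagnosticMXBean.dumpHeap`. This is a minimal sketch under a few assumptions: the target file must not exist yet, recent JDKs require the name to end in `.hprof`, and the temporary directory and class name here are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of the current JVM (HotSpot only). */
final class HeapDumper {

    /** Dumps live objects to a new file in a temp directory; returns the file size. */
    static long dumpToTempFile() {
        try {
            // The dump file must not already exist, so resolve a fresh name.
            Path file = Files.createTempDirectory("heap-demo").resolve("demo.hprof");
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            hotspot.dumpHeap(file.toString(), true); // true = only reachable objects
            System.out.println("heap dump written to " + file);
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("size (bytes): " + dumpToTempFile());
    }
}
```

The resulting file can then be opened with jvisualvm or the Eclipse Memory Analyzer Tool, like a dump taken with jcmd or jmap.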
Profilers have a sampling mode and an instrumented mode. Sampling mode has the least overhead, so it minimizes the change in performance characteristics caused by the profiler itself.
However, many sampling profilers are not accurate. For example, a sampler is driven by a timer that fires periodically, and it only sees the threads that happened to be running when the timer fired.
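To make the timer-based idea concrete, here is a toy sampler. It is only a sketch of the principle, not how any real profiler is implemented: it polls `Thread.getAllStackTraces()` at a fixed interval and counts which method was at the top of each runnable thread's stack. All names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** A toy sampling profiler, for illustration only. */
final class TinySampler {

    /** Burns CPU for roughly the given number of milliseconds. */
    static long busyWork(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long counter = 0;
        while (System.nanoTime() < deadline) {
            counter++;
        }
        return counter;
    }

    /**
     * Takes `samples` snapshots, `intervalMillis` apart, and counts how often
     * each method was the top frame of a RUNNABLE thread other than the caller.
     */
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> hits = new TreeMap<>();
        Thread self = Thread.currentThread();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> entry
                    : Thread.getAllStackTraces().entrySet()) {
                Thread thread = entry.getKey();
                StackTraceElement[] stack = entry.getValue();
                if (thread == self || stack.length == 0
                        || thread.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> busyWork(500), "busy-worker");
        worker.start();
        sample(50, 5).forEach((method, count) ->
                System.out.println(count + "  " + method));
    }
}
```

Anything that runs and finishes between two ticks never shows up in the counts, which is exactly the inaccuracy described above.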
In most cases, the method at the top of the profiling results accounts for only 2-3% of the total time, so even if you work hard to make it twice as fast, the application may become only about 1% faster.
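The arithmetic behind that estimate is simple enough to write down. In this sketch (the names are mine), a method that takes a given share of the run time is made some factor faster, and the function returns the fraction of the total run time saved.

```java
/** Hypothetical helper: how much total time is saved by speeding up one method. */
final class SpeedupEstimate {

    /**
     * @param share  fraction of total run time spent in the method (0.0 to 1.0)
     * @param factor how many times faster the method becomes (2.0 = twice as fast)
     * @return fraction of the total run time that is saved
     */
    static double timeSaved(double share, double factor) {
        return share * (1.0 - 1.0 / factor);
    }

    public static void main(String[] args) {
        // A method with a 2% share made twice as fast saves 1% of the total.
        System.out.printf("2%% share, 2x faster: %.1f%% saved%n", 100 * timeSaved(0.02, 2.0));
        // A method with a 3% share made twice as fast saves 1.5% of the total.
        System.out.printf("3%% share, 2x faster: %.1f%% saved%n", 100 * timeSaved(0.03, 2.0));
    }
}
```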
Unlike sampling profilers, instrumented profilers also show the number of times each method is called and the average number of calls per second. From this you can judge, for example, whether to speed up the implementation of a method or to change the code so that the method is called less often.
A profiler in instrumented mode may give an inaccurate picture of performance, because it rewrites the bytecode to insert the code that counts the calls. For example, the method's code becomes larger, and the JIT compiler may decide not to inline it (detailed in Chapter 4).
Sampling profilers can only sample threads at safepoints (essentially, when memory is being allocated), whereas instrumented profilers are not subject to this restriction.
Blocking methods (methods that wait, such as LockSupport#park() and Object#wait()) do not consume CPU time (they do not increase CPU usage), so even if they appear at the top of the profiling results, they cannot be optimized. For this reason, most profilers do not show blocking methods by default.
For blocking methods, you can see what is happening by looking at the execution state of the threads. In VisualVM, this is the Threads tab.
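The thread state that such a view shows can also be read directly with `Thread.getState()`. This is a small sketch with hypothetical names: it parks a thread with `LockSupport.park()` and reports the state an observer would see, which is `WAITING` rather than `RUNNABLE`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical demo: a parked thread is WAITING and uses no CPU time. */
final class ParkedThreadDemo {

    /** Starts a thread that parks, waits until it has parked, and returns its state. */
    static Thread.State stateOfParkedThread() {
        AtomicBoolean release = new AtomicBoolean(false);
        Thread parked = new Thread(() -> {
            // park() may return spuriously, so re-check the flag in a loop.
            while (!release.get()) {
                LockSupport.park();
            }
        }, "parked-demo");
        parked.setDaemon(true);
        parked.start();

        // Poll until the thread has actually parked (give up after about 2 seconds).
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (parked.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.yield();
        }
        Thread.State observed = parked.getState();

        // Let the thread finish.
        release.set(true);
        LockSupport.unpark(parked);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("state of parked thread: " + stateOfParkedThread());
    }
}
```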
With a native profiler, you can profile the JVM itself.
One of the native profilers is Oracle Solaris Studio. Despite the name, it also works on Linux.
According to the book, when it runs on Solaris it can take advantage of the internal structure of the Solaris kernel to get more information. I don't have a Solaris machine, so I couldn't try it.
Data that can only be obtained with native tools includes the time spent in GC.
3.4 Java Mission Control
It is included in the commercial version of Java under the name jmc, but not in the open source version. A commercial license is required to use it.
JFR
JFR (Java Flight Recorder) is a key feature of jmc. It lets you see events such as thread blocking.
You can see events related to garbage collection in the JFR Memory view. When reading Chapters 5 and 6, keep in mind how useful this tool is.
In the Overview tab of the JFR Code view, you can see the aggregated values for each package. It is unusual to have this feature.
You can also get accurate information about lock inflation. In addition, you can get information that cannot be obtained with jcmd or jconsole.
To enable it, add the following flags to the command line that launches the application.
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
JFR has two recording modes: recording for a fixed period of time, and continuous recording, which uses a ring buffer.
You can specify how to record with parameters such as -XX:FlightRecorderOptions=${parameter string}.
The options are described here: https://docs.oracle.com/javacomponents/jp/jmc-5-4/jfr-runtime-guide/comline.htm#BABJEIEH
You can also configure recording on a running JVM with the following command (although the -XX:+FlightRecorder option must have been specified at startup).
jcmd ${Process ID} JFR.start [option]
For continuous recording, the data in the ring buffer can be output to a file with the following command.
jcmd ${Process ID} JFR.dump [option]
Information about the recordings in progress can be output with the following command. (By the way, it seems that multiple recordings can run in one process, so I suppose this is mostly used in that situation.)
jcmd ${Process ID} JFR.check [verbose]
The following command stops a recording.
jcmd ${Process ID} JFR.stop [option]
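The start, dump, and stop steps above can also be driven from code. Note that this sketch uses the `jdk.jfr` API found in OpenJDK 11 and later, which is not the commercial Java 7/8 setup this chapter describes. The class name, recording name, and temp directory are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

/** Hypothetical demo of starting, stopping, and dumping a JFR recording in code. */
final class JfrRecordingDemo {

    /** Records a short period and writes it to a temp file; returns the file size. */
    static long recordToTempFile() {
        try (Recording recording = new Recording()) {
            recording.setName("demo-recording");
            recording.enable("jdk.GarbageCollection"); // record GC events only
            recording.start();                         // compare: JFR.start
            System.gc();                               // try to provoke a GC event
            recording.stop();                          // compare: JFR.stop
            Path file = Files.createTempDirectory("jfr-demo").resolve("demo.jfr");
            recording.dump(file);                      // compare: JFR.dump
            System.out.println("recording written to " + file);
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("size (bytes): " + recordToTempFile());
    }
}
```

The resulting `.jfr` file can be opened in Java Mission Control in the same way as one produced with jcmd.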
JFR can be extended. It seems that you can create your own events. Collecting events inevitably involves overhead, but there are some events that you want to collect even at some cost. For example, you can monitor TLAB (Thread Local Allocation Buffer) events to see whether objects are being allocated directly in the old generation.
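As an illustration of a custom event, here is a minimal sketch using the `jdk.jfr.Event` API from OpenJDK 11 and later (the commercial Java 7/8 JFR described in this chapter had a different API). The event name, field, and the "checkout" work are all made up. When no recording is running, `commit()` does nothing.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

/** Hypothetical demo of defining and emitting a custom JFR event. */
final class CustomEventDemo {

    @Name("demo.Checkout")   // hypothetical event name
    @Label("Checkout")
    static class CheckoutEvent extends Event {
        @Label("Item Count")
        int itemCount;
    }

    /** Does some stand-in work and records it as a JFR event. */
    static int checkout(int itemCount) {
        CheckoutEvent event = new CheckoutEvent();
        event.begin();                  // start timing the event
        int total = itemCount * 100;    // stand-in for real work
        event.itemCount = itemCount;
        event.commit();                 // written only if a recording is active
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total: " + checkout(3));
    }
}
```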