This article describes some common **Java** performance diagnostic tools and highlights the basic principles and best practices of **JProfiler**.

This blog post is a translation of the English version and was produced partly with machine translation. We would appreciate it if you could point out any translation errors.

Background

Diagnosing performance problems is something software engineers face regularly in their daily work, and improving the performance of an application can bring great benefits. Java is one of the most popular programming languages, and its performance diagnostics have long received attention throughout the industry. In Java applications, many factors can cause performance issues, including thread control, disk I/O, database access, network I/O, and garbage collection (GC). To find these problems, you need a good performance diagnostic tool. This article describes some common Java performance diagnostic tools and then introduces the basic principles and best practices of JProfiler, a representative example of these tools. The version of JProfiler featured in this article is JProfiler 10.1.4.

A brief introduction to Java performance diagnostic tools

There are a variety of performance diagnostic tools in the Java world, from simple command line tools such as jmap and jstat to comprehensive graphical diagnostic tools such as JVisualvm and JProfiler. The following sections briefly describe each of these tools.

Simple command-line tools

The Java Development Kit (JDK) provides many built-in command line tools. These tools help you get information about the target Java Virtual Machine (JVM) from different aspects and different layers.

- jinfo: View and adjust various parameters of the target JVM in real time.
- jstack: Get thread stack information for the target Java process, detect deadlocks, and find infinite loops.
- jmap: Get memory-related information for the target Java process, including the usage of the different Java heap areas, statistics on objects in the Java heap, and loaded classes.
- jstat: A lightweight and versatile monitoring tool. It reports on the loaded classes, just-in-time (JIT) compilation, garbage collection, and memory usage of the target Java process.
- jcmd: More comprehensive than jstat. It reports on the performance statistics, Java Flight Recorder (JFR), memory usage, garbage collection, thread stacks, and JVM runtime of the target Java process.
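Some of the information that jstack reports can also be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch of that idea, not a replacement for jstack; the class name ThreadSnapshot is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadSnapshot {

    /** Returns the number of threads currently involved in a deadlock (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
        return ids == null ? 0 : ids.length;
    }

    /** Prints the name, state, and top stack frame of every live thread. */
    static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // false, false: skip lock details to keep the snapshot cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            System.out.printf("%-40s %-15s %s%n",
                    info.getThreadName(), info.getThreadState(), top);
        }
    }

    public static void main(String[] args) {
        dump();
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Unlike jstack, this code has to run inside the target process, so it is mainly useful for self-monitoring or health checks.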

Comprehensive graphical diagnostic tools

You can use any one or any combination of the above command-line tools to get basic performance information about your target Java application. However, these tools have the following drawbacks:

1. They cannot provide method-level analysis data, such as the call relationships between methods and the frequency and duration of method calls. This data is very important for identifying application performance bottlenecks.
2. To use them, you need to log on to the host machine of the target Java application, which is not very convenient.
3. The analysis data is printed to the terminal, so the results are not presented visually.

Below are some comprehensive graphical performance diagnostic tools.

JVisualvm

JVisualvm is a built-in visual performance diagnostic tool provided by the JDK. It uses various methods such as JMX, jstatd, and the Attach API to acquire analysis data from the target JVM, such as CPU usage, memory usage, threads, heap, and stack. You can also view the number and size of each kind of object in the Java heap, the number of Java method calls, Java method execution times, and more.
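To give a feel for the kind of data that is exposed over JMX, here is a small sketch that reads a few platform MXBeans from within the current JVM. It illustrates the data source only; how JVisualvm itself collects and displays the data is not shown, and the class name JvmTelemetry is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class JvmTelemetry {

    /** Bytes of heap currently in use. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Number of live threads, including daemon threads. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Number of classes currently loaded. */
    static int loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** Total number of collections across all collectors. */
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("live threads:      " + liveThreadCount());
        System.out.println("loaded classes:    " + loadedClassCount());
        System.out.println("GC runs so far:    " + gcCount());
    }
}
```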

JProfiler

JProfiler is a performance diagnostic tool for Java applications developed by ej-technologies. It focuses on four important topics.

1. Method calls: Method call analysis helps you understand what your application is doing and find ways to improve performance.
2. Allocations: By analyzing objects on the heap, reference chains, and garbage collection, this feature allows you to fix memory leaks and optimize memory usage.
3. Threads and locks: JProfiler provides multiple analysis views on threads and locks to help you discover multithreading issues.
4. High-level subsystems: Many performance issues occur at a higher semantic level. For example, for Java Database Connectivity (JDBC) calls, you might want to find out which SQL statement is the slowest. JProfiler supports integrated analysis of these subsystems.

Distributed application performance diagnostics

If you just want to diagnose performance bottlenecks in a standalone Java application, the diagnostic tools above are sufficient. However, as standalone systems gradually evolve into distributed systems and microservices, these tools no longer meet the requirements. In that case, you need the end-to-end tracing capabilities of distributed tracing systems such as Jaeger, [ARMS](https://cn.aliyun.com/product/arms?spm=a2c65.11461447.0.0.4e294d65AIiUSB), and SkyWalking. Many distributed tracing systems are available, but their implementation mechanisms are similar: they record trace information through code instrumentation, send the recorded data to a central processing system via an SDK or agent, and provide a query interface for displaying and analyzing the results. For more information on the principles of distributed tracing systems, see the Jaeger article entitled "OpenTracing Implementation".
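The record, send, and query steps described above can be sketched in a few lines. This is a toy, in-memory illustration only: MiniTracer and all of its members are hypothetical names, the "central system" is just a list in the same process, and no real tracing system's API (Jaeger, ARMS, SkyWalking) is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

final class MiniTracer {

    /** One timed operation that belongs to a trace. */
    static final class Span {
        final String traceId;
        final String operation;
        final long durationNanos;

        Span(String traceId, String operation, long durationNanos) {
            this.traceId = traceId;
            this.operation = operation;
            this.durationNanos = durationNanos;
        }
    }

    // Stand-in for the central processing system; a real agent would send spans over the network.
    private final List<Span> collector = new CopyOnWriteArrayList<>();

    /** Record step: time the body and report a span when it finishes. */
    <T> T trace(String traceId, String operation, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            collector.add(new Span(traceId, operation, System.nanoTime() - start));
        }
    }

    /** Query step: return all spans of one trace, in the order they finished. */
    List<Span> findByTraceId(String traceId) {
        List<Span> result = new ArrayList<>();
        for (Span span : collector) {
            if (span.traceId.equals(traceId)) {
                result.add(span);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MiniTracer tracer = new MiniTracer();
        String traceId = UUID.randomUUID().toString();
        // The same trace ID ties the nested operations together.
        tracer.trace(traceId, "handle-request",
                () -> tracer.trace(traceId, "query-db", () -> "row"));
        for (Span span : tracer.findByTraceId(traceId)) {
            System.out.println(span.operation + " took " + span.durationNanos + " ns");
        }
    }
}
```

In a distributed setting, the trace ID would additionally be propagated between services (for example in request headers) so that spans recorded on different hosts can be joined.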

Introduction to JProfiler

Core components

JProfiler consists of a JProfiler agent for collecting analysis data from the target JVM, a JProfiler UI for visually analyzing the data, and a command line utility that provides various functions. Below is a complete picture of the important interactions between them.

JProfiler agent

The JProfiler agent is implemented as a native library. You can load it at JVM startup with the parameter -agentpath:<path to native library>, or load it while the application is running through the [JVM Attach Mechanism](http://lovestblog.cn/blog/2014/06/18/jvm-attach/?spm=a2c65.11461447.0.0.4e294d65AIiUSB). After the JProfiler agent is loaded, it configures the JVM Tool Interface (JVMTI) environment to monitor all types of events generated by the JVM, such as thread creation and class loading. For example, when it detects a class loading event, the JProfiler agent inserts its own bytecode into the loaded class to perform its measurements.
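JProfiler's agent is native code built on JVMTI, so its internals cannot be shown here. A rough pure-Java analogue of "being notified when a class is loaded and getting a chance to change its bytecode" is the standard java.lang.instrument API. The sketch below only records the names of loaded classes and leaves their bytecode untouched; the class names and the package prefix filter are assumptions made for illustration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ClassLoadRecorder {

    /** Internal names (slash-separated) of the classes seen so far. */
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    static final class RecordingTransformer implements ClassFileTransformer {
        private final String prefix; // e.g. "com/example/" (hypothetical filter)

        RecordingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                SEEN.add(className);
                // A real profiler would rewrite classfileBuffer here to add timing code.
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    /** Entry point used when the JVM is started with -javaagent:<agent jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer("com/example/"));
    }
}
```

To run this as an agent, it would need to be packaged in a JAR whose manifest names it in the Premain-Class attribute.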

JProfiler UI

The JProfiler UI is launched separately and connects to the profiling agent through a socket. This means that it doesn't matter whether the profiled JVM is running on the local machine or on a remote machine: the communication mechanism between the profiling agent and the JProfiler UI is always the same.

From the JProfiler UI, you can instruct the agent to record data, view profiling data in the UI, and save snapshots to disk.

Command line tools

JProfiler provides a set of command line tools for implementing various features.

- jpcontroller: Used to control how the agent collects data. It sends instructions to the agent through the JProfiler MBean registered by the agent.
- jpenable: Used to load the agent into a running JVM.
- jpdump: Used to capture a heap snapshot of a running JVM.
- jpexport and jpcompare: Used to extract data from previously saved snapshots and create HTML reports.

Installation

JProfiler supports performance diagnostics for both local and remote Java applications. If you need to collect and view the analysis data of the remote JVM in real time, complete the following steps:

1. Install the JProfiler UI locally.
2. Install the JProfiler agent on the remote host machine and load it into the target JVM.
3. Connect the JProfiler UI to the agent.

For more information on the installation procedure, see Installing JProfiler and Profiling a JVM.

Best practices

This section shows how to use JProfiler to diagnose the performance of Alibaba Cloud LOG Java Producer (hereinafter, Producer), a LogHub class library. If you experience performance issues with your own application or with Producer, you can take the same approach to find the root cause. If you are new to Producer, we recommend that you first read the article "Alibaba Cloud LOG Java Producer - A powerful tool for migrating logs to the cloud".

The sample code used here is [SamplePerformance.java](https://github.com/aliyun/aliyun-log-producer-sample/blob/master/src/main/java/com/aliyun/openservices/aliyun/log/producer/sample/SamplePerformance.java?spm=a2c65.11461447.0.0.4e294d65AIiUSB&file=SamplePerformance.java).

JProfiler settings

Data collection mode

JProfiler has two data collection methods: sampling and instrumentation.

- Sampling: Suitable for scenarios that do not require high data collection accuracy. The advantage of this method is its low impact on system performance. The downside is that some features, such as method-level statistics, are not supported.
- Instrumentation: A complete data collection mode with high accuracy. The disadvantages are that many classes have to be analyzed and that the impact on application performance is relatively heavy. To reduce this impact, use it together with a filter.

In this example, we need method-level statistics, so we choose the instrumentation method. The filter is configured so that the agent records CPU data only for the classes under the package com.aliyun.openservices.aliyun.log.producer and for the class com.aliyun.openservices.log.Client.
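The difference between the two modes can be illustrated with a small sketch. It does not show how JProfiler implements either mode: instrumentation is imitated here by wrapping each call by hand (JProfiler inserts bytecode instead), and sampling is imitated by periodically looking at a thread's stack. All names are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CollectionModes {

    // Instrumentation style: every call is counted and timed, so the numbers are exact.
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    static <T> T instrumented(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    // Sampling style: only the method on top of the stack at sample time is counted,
    // so short calls between two samples are missed and call counts are unknown.
    static final Map<String, LongAdder> SAMPLES = new ConcurrentHashMap<>();

    static void sampleOnce(Thread target) {
        StackTraceElement[] stack = target.getStackTrace();
        if (stack.length > 0) {
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            SAMPLES.computeIfAbsent(top, k -> new LongAdder()).increment();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < 200_000; i++) {
                instrumented("work", () -> Math.sqrt(System.nanoTime()));
            }
        }, "worker");
        worker.start();
        while (worker.isAlive()) {
            sampleOnce(worker); // cheap, but only a statistical picture
            Thread.sleep(1);
        }
        System.out.println("exact call count: " + CALLS.get("work").sum());
        System.out.println("sampled methods:  " + SAMPLES.keySet());
    }
}
```

The wrapper adds work to every single call, which is why instrumentation is more expensive and why restricting it with a filter helps.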

Application launch mode

You can specify various parameters to the JProfiler agent to control the launch mode of your application.

- Wait for a connection from the JProfiler GUI: The application starts only after the JProfiler GUI has established a connection with the profiling agent and completed the profiling settings. This option allows you to profile the launch phase of your application. To enable this option, use: -agentpath:<path to native library>=port=8849

- Launch now and connect later with the JProfiler GUI: The JProfiler GUI establishes a connection with the profiling agent and sends the profiling settings when needed. This option is flexible, but it does not allow you to profile the launch phase of your application. To enable this option, use: -agentpath:<path to native library>=port=8849,nowait

- Profile offline, JProfiler cannot connect: You need to set triggers that record data and save snapshots, which can be opened later in the JProfiler GUI. To enable this option, use: -agentpath:<path to native library>=offline,id=xxx,config=/config.xml

In a test environment, you need to determine the performance of your application during the launch phase. Therefore, we will use the default WAIT option here.
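If the target JVM is started from a script or a test harness, it can help to build the agent option in one place. The helper below simply assembles the three option strings listed above; the class name, the library path, the JAR name, and the main class are hypothetical placeholders, not values from the original setup.

```java
import java.util.Arrays;
import java.util.List;

final class AgentOptions {

    /** The application waits for the JProfiler GUI to connect before it starts. */
    static String waitForGui(String libraryPath, int port) {
        return "-agentpath:" + libraryPath + "=port=" + port;
    }

    /** The application starts immediately; the GUI may connect later. */
    static String startImmediately(String libraryPath, int port) {
        return "-agentpath:" + libraryPath + "=port=" + port + ",nowait";
    }

    /** Offline profiling driven by triggers from a config file. */
    static String offline(String libraryPath, String sessionId, String configPath) {
        return "-agentpath:" + libraryPath + "=offline,id=" + sessionId + ",config=" + configPath;
    }

    public static void main(String[] args) {
        String library = "/path/to/agent-library"; // placeholder, depends on your installation
        List<String> command = Arrays.asList(
                "java", waitForGui(library, 8849), "-cp", "app.jar", "com.example.Main");
        System.out.println(String.join(" ", command));
    }
}
```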

Diagnose application performance using JProfiler

Once you have configured profiling, you can proceed to diagnosing Producer's performance.

Overview

On the overview page, you can clearly see graphs (telemetry) of various metrics such as memory, GC activity, classes, threads, CPU load, and so on.

Based on this telemetry, we can make the following assumptions:

1. A large number of objects are created while the application is running. The life cycle of these objects is very short, and most of them are quickly reclaimed by the garbage collector, so they do not cause a continuous increase in memory usage.
2. As expected, the number of loaded classes increases rapidly during startup and then stabilizes.
3. Many threads are blocked while the application is running. This issue deserves particular attention.
4. CPU usage is high while the application is starting. We need to find out the cause.

CPU view

The number of executions, execution time, and call relationships of each method in the application are displayed in the CPU view. These will help you find the methods that have the greatest impact on your application's performance.

Call tree

The call tree uses a tree graph to hierarchically display the call relationships between different methods. In addition, JProfiler sorts submethods by total execution time, so you can quickly find the key methods.

For Producer, the method SendProducerBatchTask.run() takes up most of the execution time. If you keep looking down the tree, you'll see that most of that time is spent executing the Client.PutLogs() method.
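The structure of such a call tree can be sketched as follows. This is an illustrative data structure, not JProfiler's; the method names in main are taken from the text above, while otherWork() and all timings are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CallTreeNode {
    final String method;
    final long totalMillis; // time spent in this method including its submethods
    final List<CallTreeNode> children = new ArrayList<>();

    CallTreeNode(String method, long totalMillis) {
        this.method = method;
        this.totalMillis = totalMillis;
    }

    CallTreeNode add(CallTreeNode child) {
        children.add(child);
        return this;
    }

    /** Renders the tree with the children of each node ordered by total time, descending. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(method).append("  ").append(totalMillis).append(" ms\n");
        List<CallTreeNode> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong((CallTreeNode n) -> n.totalMillis).reversed());
        for (CallTreeNode child : sorted) {
            child.render(out, depth + 1);
        }
    }

    public static void main(String[] args) {
        // Invented timings, for illustration only.
        CallTreeNode root = new CallTreeNode("SendProducerBatchTask.run()", 1000)
                .add(new CallTreeNode("otherWork()", 50))
                .add(new CallTreeNode("Client.PutLogs()", 900));
        System.out.print(root.render());
    }
}
```

Because the most expensive child is always listed first, following the first child at each level leads straight to the dominant call path.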

Hotspot

If your application has many methods and many of the submethods execute in a short time, you can use the hotspot view to quickly find performance issues. This view allows you to sort the methods by various criteria, such as individual (self) execution time, total execution time, average execution time, and number of calls. The individual execution time of a method is its total execution time minus the total execution time of all of its submethods.

In this view, you can see that the three methods Client.PutLogs(), LogGroup.toByteArray(), and SamplePerformance$1.run() have the longest individual execution times.
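The definition of individual execution time given above translates directly into code. The sketch below ranks methods by self time; the class and field names are hypothetical, and the numbers in main are invented to show that a method with a large total time can still have a small self time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class HotSpots {

    static final class MethodStat {
        final String method;
        final long totalMillis;    // time in the method including submethods
        final long childrenMillis; // sum of the total times of its direct submethods

        MethodStat(String method, long totalMillis, long childrenMillis) {
            this.method = method;
            this.totalMillis = totalMillis;
            this.childrenMillis = childrenMillis;
        }

        /** Self time = total time minus the time spent in submethods. */
        long selfMillis() {
            return totalMillis - childrenMillis;
        }
    }

    /** Returns a copy of the list ordered by self time, largest first. */
    static List<MethodStat> bySelfTime(List<MethodStat> stats) {
        List<MethodStat> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingLong(MethodStat::selfMillis).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Invented numbers, for illustration only.
        List<MethodStat> stats = Arrays.asList(
                new MethodStat("outer()", 100, 90),
                new MethodStat("middle()", 60, 10),
                new MethodStat("leaf()", 30, 0));
        for (MethodStat stat : bySelfTime(stats)) {
            System.out.println(stat.method + " self=" + stat.selfMillis() + " ms");
        }
    }
}
```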

Call graph

After you have found the key methods, the call graph view lets you see all the methods that are directly related to them. This helps you find a solution to the problem and decide on the best optimization strategy.

Here you can see that most of the execution time of the method Client.PutLogs() is spent serializing objects. Therefore, the key to optimizing performance is to provide a more efficient serialization method.

Live memory

The live memory view gives you detailed memory allocation and usage, which can help you determine if there is a memory leak.

All Objects

The All Objects view shows the number and total size of the various objects in the current heap. As you can see in the following figure, many LogContent objects are created while the application is running.

Allocation Call Tree

The Allocation Call Tree view shows the amount of memory allocated by each method in the form of a tree diagram. As you can see, SamplePerformance$1.run() and SendProducerBatchTask.run() consume a lot of memory.

Allocation Hot Spots

If you have many methods, the Allocation Hot Spots view lets you quickly see which methods allocate the most objects.

Thread History

The Thread History view shows the state of each thread at different times.

Tasks performed by different threads have different characteristics.

- The threads `pool-1-thread-<M>` periodically call the `Producer.send()` method to send data asynchronously. These threads kept running just after the application started, but were blocked most of the time after that. The cause is that Producer sends data more slowly than the data is generated, and the cache size of each Producer instance is limited. After the application was launched, `pool-1-thread-<M>` kept running for a while because Producer had enough memory to cache the data waiting to be sent. This explains why CPU usage was high when the application was launched. Once the cache is full, `pool-1-thread-<M>` must wait for Producer to free up enough space. As a result, a large number of threads are blocked.

- `aliyun-log-producer-0-mover` detects expired batches and sends them to IOThreadPool. Because data accumulates quickly, a producer batch is sent to IOThreadPool by `pool-1-thread-<M>` shortly after its cached data reaches the size limit. As a result, the mover thread remained idle most of the time.

- `aliyun-log-producer-0-io-thread-<N>` sends data from IOThreadPool to the specified logstore and spends most of its time in the network I/O state.

- `aliyun-log-producer-0-success-batch-handler` processes batches that were successfully sent to the logstore. The callback is simple and takes very little time to execute. As a result, the SuccessBatchHandler remained idle most of the time.

- `aliyun-log-producer-0-failure-batch-handler` handles batches that failed to be sent to the logstore. In our case, no data failed to be sent, so this thread remained idle the whole time.

According to our analysis, the status of these threads is within our expectations.
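The blocking behavior of the sending threads is what you get from any bounded buffer whose consumer is slower than its producers. The sketch below demonstrates the effect with a standard ArrayBlockingQueue. It is not Producer's implementation: the class name is hypothetical, and the limit here is a number of entries, whereas the text above describes a limited cache size.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedSendBuffer {
    private final BlockingQueue<String> pending;

    BoundedSendBuffer(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Called by application threads. Blocks for up to timeoutMillis while the
     * buffer is full and returns false if no space became available in time.
     */
    boolean send(String log, long timeoutMillis) {
        try {
            return pending.offer(log, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Called by an I/O thread. Removes the oldest entry, or returns null if the buffer is empty. */
    String takeForUpload() {
        return pending.poll();
    }

    public static void main(String[] args) {
        BoundedSendBuffer buffer = new BoundedSendBuffer(2);
        System.out.println(buffer.send("log-1", 10)); // fits
        System.out.println(buffer.send("log-2", 10)); // fits, buffer is now full
        System.out.println(buffer.send("log-3", 10)); // sender waits, then gives up
        buffer.takeForUpload();                       // the I/O side frees one slot
        System.out.println(buffer.send("log-3", 10)); // fits again
    }
}
```

While a sender waits inside send(), a profiler would typically show its thread as waiting or blocked rather than running, which matches the pattern described for `pool-1-thread-<M>`.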

Detected overhead hotspot

When the application finishes running, JProfiler shows a dialog box that lists frequently called methods with very short execution times. You can reduce JProfiler's impact on your application's performance in later runs by configuring the JProfiler agent to ignore these methods.

Summary

Based on JProfiler's diagnostics, the application has no major performance issues or memory leaks. The next optimization step is to improve the serialization efficiency of the object.
