Basic JVM knowledge was covered in a previous article: [JVM] Necessary knowledge for OOM (Out Of Memory) failure handling.
From here on, this article covers concrete actions to take when you actually run into an OOM problem, along with tools you can use. (See the linked articles for how to use each tool.)
To summarize the incident I dealt with this time: a java process went down about once a week. Checking memory usage with the sar command showed it climbing to nearly 100% just before the crash, and the JVM crash log contained the word OOM.
The following is a summary of the investigation policy, investigation methods, and points to note. While I was investigating, I didn't know how to proceed, missed important information, and read incorrect data in strange ways. The investigation method described here is therefore the best approach I can think of at the moment.
You may want to investigate in the following order.
**To say this upfront: when dealing with OOM problems, the causes may be non-linearly intertwined, so you need the mental fortitude to calmly form hypotheses and verify them one by one.**
**By looking at this log, you can narrow down the OOM problem considerably.** Repeating hypothesis and verification from there will get you to the cause much faster, so examine it with the utmost care. (At first, I was fighting without knowing this.)
By the way, this log is output when the JVM crashes. If nothing is set in the JVM startup options, it is written to the current directory. You can specify the output destination with the following startup option.
java -XX:ErrorFile=/var/log/java/java_error%p.log
Now, let's read the hs_err_pid<pid>.log file (the <pid> part is the process id).
-There is a problem with the Java heap
Exception in thread "main": java.lang.OutOfMemoryError: Java heap space
"Java heap space" indicates that an object could not be allocated in the Java heap. In other words, it can be interpreted as: "I couldn't allocate an object in the Java heap area!" If the Java heap is simply too small, increasing its capacity (e.g. with -Xmx) solves the problem.
However, the message might also indicate that the application is unintentionally holding references to objects. That is: objects stay referenced forever, are never eligible for GC, and memory gradually runs out. In other words, a memory leak. In that case, the fix is to take a heap dump, identify the objects that keep increasing, and repair the source. See **3. If there is a problem with the Java heap, identify the problem from the heap dump** below.
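As a minimal sketch of this kind of leak (illustrative code, not from the actual incident), the following keeps every object reachable from a static list, so the GC can never reclaim them and the heap eventually fills:

import java.util.ArrayList;
import java.util.List;

public class LeakExample {
    // Everything added here stays strongly reachable, so it is never collected.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]); // "cache" 1 MB per loop, never remove it
        }
    }
}

Running it with a small heap, e.g. java -Xmx64m LeakExample, reproduces java.lang.OutOfMemoryError: Java heap space within seconds.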
-There is a problem with the Permanent heap
Exception in thread "main": java.lang.OutOfMemoryError: PermGen space
"PermGen space" indicates that the permanent generation is full. In other words, it can be interpreted as: "There is not enough Permanent area!" The fix is to secure a sufficient area with the -XX:MaxPermSize option.
For reference, classes loaded by the class loader and static variables are stored in this area. (Note that in Java 8 and later the Permanent generation was removed and replaced by Metaspace, which is sized with -XX:MaxMetaspaceSize.)
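For example (the sizes and the MyApp class name are placeholders):

#Java 7 and earlier: enlarge the permanent generation
java -XX:MaxPermSize=256m MyApp
#Java 8 and later: the equivalent limit applies to Metaspace
java -XX:MaxMetaspaceSize=256m MyApp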
-There is a problem with the Java heap.
Exception in thread "main": java.lang.OutOfMemoryError: Requested array size exceeds VM limit
"Requested array size exceeds VM limit" indicates that the application (or an API used by that application) attempted to allocate an array larger than the heap size. In other words, it can be interpreted as: "The array couldn't be allocated in the Java heap area!" Either the Java heap is too small, in which case increasing its capacity solves the problem, or, as above, you take a heap dump, identify the objects that keep increasing, and repair the source. See **3. If there is a problem with the Java heap, identify the problem from the heap dump** below.
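As a minimal sketch that provokes this error on a typical HotSpot JVM (depending on the JVM version and heap settings you may see "Java heap space" instead):

public class HugeArray {
    public static void main(String[] args) {
        // HotSpot caps array length slightly below Integer.MAX_VALUE,
        // so this request exceeds the VM limit regardless of -Xmx.
        long[] a = new long[Integer.MAX_VALUE];
        System.out.println(a.length);
    }
}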
-There is a problem with the C heap
Exception in thread "main": java.lang.OutOfMemoryError: request <size> bytes for <reason>. Out of swap space?
The HotSpot VM reports this apparent exception when an allocation from the native heap fails and the native heap might be close to exhaustion. In other words, it can be interpreted as: "Memory couldn't be allocated from the native heap (C heap)!" Incidentally, this is the problem I was dealing with. When you hit it, you can basically suspect the number of threads first, since each thread's stack lives in native memory. See **4. If there is a problem with non-heap, identify the problem from the number of threads** below.
For details, refer to the following site.
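As a hedged sketch of why the thread count matters here (each thread's stack is allocated from native memory, outside the Java heap), a loop like the following eventually fails with a native-memory error; depending on OS limits the message may instead be "unable to create new native thread":

public class ThreadLeak {
    public static void main(String[] args) {
        while (true) {
            // Every thread gets its own native stack (size controlled by -Xss),
            // so this consumes the C heap, not the Java heap.
            new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // park forever, keeping the stack alive
                } catch (InterruptedException ignored) {
                }
            }).start();
        }
    }
}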
I think the information so far has narrowed down the problem to some extent. From here, we turn the hypothesis into a fact.
As mentioned in the input article, setting -verbose:gc as a startup option gives you the GC log. Looking at this log shows how the heap areas fluctuate as minor GCs and major GCs run. **GC Viewer** is very useful for visualizing it.
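For example, a Java 8-era startup line that writes a log GC Viewer can read might look like this (the log path and MyApp are placeholders; on Java 9 and later the unified option -Xlog:gc*:file=gc.log replaces these flags):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/java/gc.log MyApp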
This article is useful for how to use GC Viewer and read its output: https://qiita.com/i_matsui/items/aabbdaa169c6ae51ecb3
By visualizing and comparing heap dumps with Memory Analyzer, you can see whether there really is a problem with the Java heap. The following article is very useful for how to use Memory Analyzer.
The specific method: take heap dumps (how to get one is described in the input article), compare them in Memory Analyzer, and identify the objects that keep increasing. Once you know which object is the problem, fix the source. If the heap area is simply insufficient, you can instead increase it.
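For reference, two common ways to take a heap dump with standard JDK tools (the output path is arbitrary):

#With jmap
jmap -dump:format=b,file=/tmp/heapdump.hprof <pid>
#With jcmd (newer JDKs)
jcmd <pid> GC.heap_dump /tmp/heapdump.hprof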
You can compare the number of threads by taking thread dumps at different points in time and comparing them.
#Check the pid
jcmd -l
#Get a thread dump
jstack <pid> > threaddump.txt
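Each Java-level thread entry in a jstack dump contains a java.lang.Thread.State line, so you can roughly count the threads like this (VM-internal threads such as GC threads are not counted):

grep -c "java.lang.Thread.State" threaddump.txt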
Also, jconsole can visualize the transition of the thread count, so you can see whether the number of threads grows in proportion to elapsed time.
I haven't used it myself, but TDA (Thread Dump Analyzer) looks very useful, so I'll mention it: https://github.com/irockel/tda
You can see the number of threads and memory usage with the following command, so you can compare whether they increase over time.
ps auxww -L | grep -e java -e PID | grep -v grep
I referred to this article. http://d.hatena.ne.jp/rx7/20101219/p1
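To watch for growth over time, a rough sketch like the following samples the thread count periodically (the 60-second interval is arbitrary):

while true; do
  date
  ps auxww -L | grep java | grep -v grep | wc -l  #one line per thread
  sleep 60
done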
Request: I suspect there are better ways. Please let me know if you know of any.
One of Linux's design ideas is to actively use otherwise free memory, for example for the page cache. The annoying part is that this is not visible from the ps command. For example, ps aux may show the java process using 30% of memory, while sar -r 1 shows overall memory usage at about 90%.
If this is happening, the difference is likely being used for the page cache. Incidentally, in my case about 60% of memory was being used for the page cache.
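You can check how much memory the kernel is using for cache with free (look at the buff/cache column; older versions show separate buffers and cached columns):

free -m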
#Clear the page cache (run as root; sync first so dirty pages are written back)
# sync; echo 1 > /proc/sys/vm/drop_caches
The above deletes the page cache, but it is a questionable fix because it also throws away caches that are still needed. As a last resort, running it periodically from cron works around memory being consumed by the page cache.
Of course, you should also investigate why so much is being cached in the first place. A typical suspect is that a large amount of logs is being written (i.e. heavy file I/O).
The following article will be helpful: https://tech.mercari.com/entry/2015/07/16/170310
A murderer lurks in Linux. To prevent a kernel panic when the system runs out of memory, Linux has a mechanism (the OOM killer) that forcibly kills a process that is using a lot of memory. When a process is killed by the OOM killer, which process was killed is recorded in the following file.
less /var/log/messages
02:53:58 xxxxx1 kernel: Out of memory: Killed process 28036, UID 80, (xmllint).
Also, the name of the hs_err_pid<pid>.log file that is output when the JVM crashes contains a pid, so you can compare it with the pid recorded here to tell whether the killed process was your java process.
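You can also search the kernel ring buffer directly for OOM killer activity:

dmesg | grep -i "killed process"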
The following article will be helpful: https://blog.ybbo.net/2013/07/10/oom-killer%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6/
-If there are environmental differences, e.g. between the local environment and the verification environment, the problem may not reproduce. Behavior also changes with the OS, such as Linux (the problem may lie outside the java process itself).
-Be careful about the timing when you take a heap dump of batch processing. For example, if you dump a periodically running java process while its load happens to be high, there will naturally be too many objects to compare meaningfully. When comparing, take the heap dumps under the same conditions.
-Do not investigate without a hypothesis or policy. I think this is the most important point. It is tempting to poke around wherever you get curious and stare at graphs, but it is almost meaningless, so you should stop. (I still catch myself doing it.)
I have summarized how to identify and solve the specific problem when an OOM occurs. I hope it helps people who are suffering from OOM problems, as I was.
Also, since this is just a summary of what I investigated, I'm sure there are methods and perspectives beyond those introduced here. If you know of any, please let me know.