Organizing the Java GC mechanism

I've recently been working on a memory-intensive Java process and needed some knowledge of GC, so I'll summarize what I've investigated. All of this can be found by searching, but I wrote it down to organize my thoughts. I may have misunderstood some points, so I would appreciate any corrections.

Note: This article is not about the latest GC developments, but about the classic approach.

JVM

First, the basic mechanism of Java. Java processes run on a virtual machine called the JVM. Because JVMs are available for various OSs, compiled Java code (class files) can be executed in different environments without worrying about the differences between them.

There are several types of JVMs, but this article assumes the HotSpot VM used by OpenJDK. (I am not familiar with how other JVMs differ.)

Heap area

Memory is allocated when you start a Java process. The area that is mainly used is called the heap area, and data created dynamically while the program runs is basically allocated here. For applications that keep running in the same process for a long time, it is important to optimize memory usage in this heap area.

You specify the size to allocate with options. For example:

-Xms20m -Xmx100m

With these options, the settings are as follows.

- Initial size of memory allocated to the heap area: 20 MB
- Maximum size of memory allocated to the heap area: 100 MB
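To check that the options took effect, one simple approach is to ask the running JVM what it sees. The following is a minimal sketch using the standard `Runtime` API; the class name `HeapSettings` and the helper `toMegabytes` are my own names for illustration.

```java
// Hypothetical class name; prints the heap sizes the running JVM reports.
class HeapSettings {
    // Converts a byte count to whole megabytes (rounds down).
    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        System.out.println("current heap: " + toMegabytes(rt.totalMemory()) + " MB");
        // maxMemory(): upper bound the heap may grow to (roughly -Xmx).
        System.out.println("max heap:     " + toMegabytes(rt.maxMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap:    " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

On Java 11 or later this can be run directly as `java -Xms20m -Xmx100m HeapSettings.java`. The reported values may not match the options exactly, because the JVM rounds sizes and some collectors exclude part of the Young area from the total.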

Unnecessary memory area

If you write an application without paying attention to memory usage, data that is no longer used (garbage) will be created.

Here I borrow an example from the following article, because it explains garbage in an easy-to-understand way. https://www.oracle.com/webfolder/technetwork/jp/javamagazine/Java-MA16-GC.pdf

For example, suppose you have the following class:

class TreeNode {
    public TreeNode left, right;
    public int data;
    TreeNode(TreeNode l, TreeNode r, int d) {
        left = l; right = r; data = d;
    }
    public void setLeft(TreeNode l) { left = l;}
    public void setRight(TreeNode r) {right = r;}
}

Create TreeNode instances as follows.

TreeNode left = new TreeNode(null, null, 13);
TreeNode right = new TreeNode(null, null, 19);
TreeNode root = new TreeNode(left, right, 17);

This means that the root node is referencing the left and right nodes.

Suppose you add a process to replace the right node here.

root.setRight(new TreeNode(null, null, 21));

Then the node with data = 19, which was originally the right child, is no longer referenced by the tree, and the tree ends up in the state shown in the figure below.

Once nothing else refers to it (for example, after the local variable right goes out of scope), the TreeNode instance with data = 19 cannot be reached from anywhere, so it has become garbage.
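One way to observe this is with a weak reference, which does not keep its target alive. The sketch below repeats the TreeNode class inside a wrapper class (`GarbageDemo`, a name I made up) so it compiles on its own, and builds the tree inside a method so that no local variable still points at the old node. `System.gc()` is only a request, so the last line may print either true or false depending on the JVM.

```java
import java.lang.ref.WeakReference;

class GarbageDemo {
    // Same shape as the TreeNode class above, repeated so the sketch compiles alone.
    static class TreeNode {
        TreeNode left, right;
        int data;
        TreeNode(TreeNode l, TreeNode r, int d) { left = l; right = r; data = d; }
        void setRight(TreeNode r) { right = r; }
    }

    // Builds the tree inside a method so that the locals "left" and "right"
    // stop referencing the child nodes once the method returns.
    static TreeNode buildTree() {
        TreeNode left = new TreeNode(null, null, 13);
        TreeNode right = new TreeNode(null, null, 19);
        return new TreeNode(left, right, 17);
    }

    // Replaces the right child and returns a weak reference to the old one.
    static WeakReference<TreeNode> replaceRight(TreeNode root, int newData) {
        WeakReference<TreeNode> old = new WeakReference<>(root.right);
        root.setRight(new TreeNode(null, null, newData));
        return old;
    }

    public static void main(String[] args) {
        TreeNode root = buildTree();
        WeakReference<TreeNode> old = replaceRight(root, 21);
        System.gc(); // only a hint; the JVM is free to ignore it
        System.out.println("right child now holds " + root.right.data);
        System.out.println("old node collected: " + (old.get() == null));
    }
}
```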

If unused data keeps being generated, wasted memory keeps accumulating and the heap will eventually reach its limit. To prevent this, GC (garbage collection) is needed as a mechanism that automatically releases wasted memory in the heap area.

Role of GC and heap area

As mentioned earlier, GC is a mechanism for freeing memory that is no longer needed. It examines the data in memory, keeps data that is still referenced, and frees data that is no longer referenced, judging it unnecessary. However, examining the entire memory space every time is inefficient, so data is managed internally according to how long it has existed.

Young data belongs to the Young Generation, old data to the Old Generation, and data that is known to be unlikely to change to the Permanent Generation.
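The "keep it if there is a reference" check can be pictured as a walk from a starting object that follows every reference it finds. This is a simplified sketch of that idea, not how HotSpot is implemented; `ReachabilitySketch`, `Node`, and `mark` are names I chose for illustration, and a real GC starts from many roots (stacks, static fields, and so on) rather than a single object.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

class ReachabilitySketch {
    static class Node {
        Node left, right;
        final int data;
        Node(Node l, Node r, int d) { left = l; right = r; data = d; }
    }

    // Returns every node reachable from root by following left/right references.
    static Set<Node> mark(Node root) {
        // Identity-based set: two nodes count as the same only if they are the same object.
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!marked.add(n)) continue; // already visited
            if (n.left != null) pending.push(n.left);
            if (n.right != null) pending.push(n.right);
        }
        return marked;
    }

    public static void main(String[] args) {
        Node left = new Node(null, null, 13);
        Node right = new Node(null, null, 19);
        Node root = new Node(left, right, 17);
        root.right = new Node(null, null, 21); // node 19 is no longer reachable from root

        Set<Node> live = mark(root);
        System.out.println("live nodes: " + live.size());               // prints 3
        System.out.println("node 19 is live: " + live.contains(right)); // prints false
    }
}
```

Anything not in the returned set would be treated as garbage in this toy model.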

[Figure: heap memory (4).png]

Memory allocation happens frequently, but most allocated objects do not live long. Based on this idea, data is divided into new data (Young Generation) and data that has been referenced for a long time (Old Generation). This lets the GC work efficiently by checking only the data in the Young Generation.

There is also an area called Permanent Generation, which stores information about loaded classes that is expected to stay largely unchanged. (In Java 8 and later, HotSpot replaced Permanent Generation with Metaspace, which is allocated in native memory.)

Reference: OpenJDK documentation

GC cycle

There are multiple GC algorithms, such as Serial GC, Parallel GC, and G1GC.

This time, I will explain the method used in Serial GC and Parallel GC.
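To see which collectors a JVM is actually running with, the standard `java.lang.management` API can list them. This is a small sketch; the class name `ShowCollectors` and the method `collectorNames` are hypothetical, and the names printed depend on the JVM and the selected algorithm.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ShowCollectors {
    // Collects the names of the garbage collectors the running JVM exposes.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount()/getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

For example, running it as `java -XX:+UseSerialGC ShowCollectors.java` or `java -XX:+UseParallelGC ShowCollectors.java` (Java 11 or later) should show a different pair of collectors for each option, typically one for the Young area and one for the whole heap.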

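The Young area of the heap is divided into Eden and Survivor as shown in the figure below, and GC is performed by making good use of each area.

[Figure: heap memory (7).png]

Each area has the following roles.

- Eden: where newly created objects are allocated first
- Survivor (1 and 2): where objects that survived a minor GC are kept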
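These areas can also be inspected on a running JVM through memory pool MXBeans. The sketch below lists only the heap pools; `ShowHeapPools` and `describeHeapPools` are names I made up, and the pool names differ by collector, so treat the output as something to explore rather than a fixed format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class ShowHeapPools {
    // Returns one line per heap memory pool with its current usage in KB.
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue; // skip non-heap pools
            MemoryUsage usage = pool.getUsage();             // null if the pool is not valid
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            lines.add(pool.getName() + ": used=" + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```

With Serial GC or Parallel GC, I would expect to see separate pools whose names mention Eden, Survivor, and the Old (tenured) area.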

Minor GC

A GC that targets only the Young generation is called a minor GC. It has the following features.

- Processing time is short
- Occurs when Eden is full
- Objects that have survived GC a certain number of times are moved (promoted) to Old
- The application stops during GC (stop the world)

It is difficult to explain in words, so I will explain it with a figure.

A minor GC occurs when new memory is being allocated and Eden is full. Unreferenced data is deleted, while valid data is copied to the Survivor area. After that, the Eden area is completely empty.

[Figure: heap memory (11).png]

If Eden becomes full again in this state, a minor GC occurs again and the result is as shown in the figure below.

[Figure: heap memory (9).png]

This time, the surviving data ended up in Survivor 2 after the GC. In the Survivor area, data is copied to whichever space is free, so it moves back and forth between 1 and 2. Also, as with Eden, data in the Survivor area that is no longer referenced is deleted.

Next is promotion to Old. The number of GCs each piece of data in Young has survived is recorded, and when it exceeds a certain number, the data moves to Old.

[Figure: heap memory (10).png]

By repeating GC many times in this way, data moves from Young to Old. This number can be specified as an option, so you can control how readily data goes to Old.

-XX:MaxTenuringThreshold=N
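The flow of Eden, the two Survivor spaces, and promotion can be sketched as a toy model. This is only an illustration under my own assumptions, not HotSpot's implementation: `MinorGcSketch`, `Obj`, and the `isLive` predicate (which stands in for the real reference check) are hypothetical, and the real JVM also adjusts the threshold dynamically and promotes early when a Survivor space overflows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class MinorGcSketch {
    static class Obj {
        final String name;
        int age = 0; // number of minor GCs this object has survived
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivorFrom = new ArrayList<>(); // survivor space currently holding data
    List<Obj> survivorTo = new ArrayList<>();   // survivor space currently empty
    final List<Obj> old = new ArrayList<>();

    MinorGcSketch(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    void allocate(String name) { eden.add(new Obj(name)); }

    // One minor GC: live objects in Eden and the occupied survivor space are
    // copied to the empty survivor space, or promoted once they are old enough.
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivorFrom);
        for (Obj o : candidates) {
            if (!isLive.test(o)) continue; // unreferenced: simply not copied
            o.age++;
            if (o.age > maxTenuringThreshold) old.add(o); // promotion to Old
            else survivorTo.add(o);
        }
        eden.clear();
        survivorFrom.clear();
        // The two survivor spaces swap roles for the next GC.
        List<Obj> tmp = survivorFrom;
        survivorFrom = survivorTo;
        survivorTo = tmp;
    }

    public static void main(String[] args) {
        MinorGcSketch heap = new MinorGcSketch(2);
        heap.allocate("long-lived");
        heap.allocate("short-lived");
        Predicate<Obj> isLive = o -> o.name.equals("long-lived");
        for (int i = 1; i <= 3; i++) {
            heap.minorGc(isLive);
            System.out.println("after GC " + i + ": survivor=" + heap.survivorFrom.size()
                    + ", old=" + heap.old.size());
        }
        // after GC 1: survivor=1, old=0
        // after GC 2: survivor=1, old=0
        // after GC 3: survivor=0, old=1
    }
}
```

With a threshold of 2, the long-lived object stays in the Survivor spaces for two GCs and is promoted on the third, while the short-lived object disappears at the first GC.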

Full GC

That explains how data moves from Young to Old, but with this alone, usage of Old keeps growing and will hit its limit at some point. That's where Full GC comes in. Full GC occurs when allocation to Old fails, and it cleans memory in both Old and Young.

[Figure: heap memory (12).png]

This frees space that is no longer needed in the Old area, so the data that was in the Survivor area can be copied there.

As with minor GC, the application stops during Full GC. Moreover, the pause is longer because Old is included, so it seems important to reduce how often Full GC occurs by keeping data within the Young area as much as possible.
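To get a feel for how often collections happen, one option is to compare the collection counts before and after a burst of short-lived allocations. This is a rough sketch: `GcChurn`, `totalCollections`, and `churn` are names I made up, the number of GCs observed depends entirely on the JVM, heap size, and collector, and the JIT compiler may optimize some allocations away.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcChurn {
    // Sums the collection counts of all collectors (minor and full together).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the count is undefined
            if (count > 0) total += count;
        }
        return total;
    }

    // Allocates many arrays that become garbage right away.
    // The checksum only exists so the arrays are actually used.
    static long churn(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] shortLived = new byte[64 * 1024];
            shortLived[0] = (byte) i;
            checksum += shortLived[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(20_000); // about 1.2 GB allocated in total, none of it kept
        long after = totalCollections();
        System.out.println("collections during churn: " + (after - before));
    }
}
```

Running it with a small heap such as `-Xmx100m`, together with `-verbose:gc` to log each collection, is one way to check whether short-lived data is being handled by minor GCs alone.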

Summary

- Minor GC occurs when the Eden area is full
- Minor GC frees the Young area and promotes data to Old if the conditions are met
- Full GC occurs when the Old area is full
- Full GC frees the Old area and secures space for promotion

In other words, these steps form a cycle.

Finally

I tried to organize the basic mechanism of GC in Java. I was confused because there are various types of JVM and each article says something slightly different, but I have tried to summarize it based on the Oracle and OpenJDK sources. G1GC is now the GC algorithm selected by default, so I plan to summarize it next. This article first introduced the underlying mechanism for releasing memory.

References

- OpenJDK Document
- Oracle Document
- JAVAMAGAZINE / Open JDK and new garbage collector
- JVM GC Algorithm and Tuning
