java core: Can't reach a safepoint and hang?!

JVM hangs where no thread dump can be output

It is hard to say in general what causes a hang, but a suspected "JVM hang" usually falls into one of two cases: either kill -QUIT produces no thread dump, or a thread dump is produced but requests never get a response no matter how long you wait. This post covers the case where not even a thread dump is output. If a thread dump is output, please also refer to "Java hang up?".

safepoint

When the Java VM outputs a thread dump or performs a Full GC, it first brings every running Java thread to a point called a safepoint and then performs the requested operation. When the JVM hangs without being able to output a thread dump, the cause is often that this safepoint cannot be reached, and checking the state of each thread helps identify the culprit.
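To make the idea concrete, here is a toy model of cooperative safepoints. It is only a sketch and not how HotSpot is implemented: the class ToySafepoint, the function demo, and the timeout are all made up for illustration, and the thread names are borrowed from the example later in this post. Threads that poll park themselves when a safepoint is requested. A thread that never polls keeps running, and the requester is left waiting for it.

```python
import threading
import time


class ToySafepoint:
    """Toy model of cooperative safepoints; not HotSpot's implementation."""

    def __init__(self):
        self.requested = threading.Event()  # set while the "VM thread" wants a safepoint
        self.released = threading.Event()   # set when parked threads may continue
        self.lock = threading.Lock()
        self.parked = set()                 # names of threads that reached the safepoint

    def poll(self, name):
        """Safepoint poll: park the calling thread if a safepoint is requested."""
        if self.requested.is_set():
            with self.lock:
                self.parked.add(name)
            self.released.wait()

    def synchronize(self, names, timeout):
        """Request a safepoint; return the names that did not arrive in time."""
        self.requested.set()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self.lock:
                if self.parked >= set(names):
                    break
            time.sleep(0.01)
        with self.lock:
            missing = set(names) - self.parked
        self.requested.clear()
        self.released.set()
        return missing


def demo(timeout=1.0):
    sp = ToySafepoint()
    done = threading.Event()

    def polling_loop(name):
        while not done.is_set():
            sp.poll(name)   # checks for a safepoint request on every iteration

    def tight_loop():
        while not done.is_set():
            pass            # never polls, so it never notices the request

    threads = [
        threading.Thread(target=polling_loop, args=("Timer",)),
        threading.Thread(target=tight_loop),  # plays the role of "Worker"
    ]
    for t in threads:
        t.start()
    missing = sp.synchronize(["Timer", "Worker"], timeout)
    done.set()              # let both loops exit so the sketch terminates
    for t in threads:
        t.join()
    return missing
```

Calling demo() returns the set of threads that never arrived, which here is the non-polling one. The timeout exists only so that the sketch terminates. A real JVM keeps waiting, and that wait is the hang described in this post.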

Check SafepointStatus for each Java thread

Pardon the self-promotion, but I wrote a gdb Python script that displays the safepoint state of each JavaThread: https://github.com/takimai39/gdb-java

View the core of the hung JVM

Let's run the code at https://bugs.openjdk.java.net/browse/JDK-8064749 to create a situation where it actually hangs.

$ java -XX:-UseCompilerSafepoints Stuck
^Z
[1]+  Stopped                 java -XX:-UseCompilerSafepoints Stuck
$ kill -QUIT %1

[1]+  Stopped                 java -XX:-UseCompilerSafepoints Stuck
$ fg
java -XX:-UseCompilerSafepoints Stuck

I suspended the process with CTRL+Z, sent kill -QUIT, and resumed it with fg, but no thread dump was printed. Let's take a core and check the safepoint status.

$ jps
29617 Stuck
29657 Jps
$ gcore 29617
...
Saved corefile core.29617
$ gdb /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/bin/java ./core.29617
...
(gdb) source sp.py
(gdb) safepoint
--------------------------------------------------------------------
Status:                 SafepointSynchronize::_synchronizing
waiting to block:       1
LWP     Java Thread Name        Status
--------------------------------------------------------------------
LWP 29618 DestroyJavaVM          ThreadSafepointState::_at_safepoint
LWP 29630 Timer                  ThreadSafepointState::_at_safepoint
LWP 29629 Worker                 ThreadSafepointState::_running
LWP 29627 Service Thread         ThreadSafepointState::_at_safepoint
LWP 29626 C1 CompilerThread1     ThreadSafepointState::_at_safepoint
LWP 29625 C2 CompilerThread0     ThreadSafepointState::_at_safepoint
LWP 29624 Signal Dispatcher      ThreadSafepointState::_at_safepoint
LWP 29623 ?                      ThreadSafepointState::_at_safepoint
LWP 29622 Reference Handler      ThreadSafepointState::_at_safepoint
(gdb)

Since SafepointSynchronize::_state is _synchronizing, the JVM is in the middle of bringing all JavaThreads to a safepoint. However, waiting to block is 1, which means one thread has not reached the safepoint yet. JVM features that also need a safepoint, such as thread dumps, naturally cannot work in this situation. Here the thread that is still _running, LWP 29629 (Worker), is what causes the hang.
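When there are many threads, the same reading of the table can be done mechanically. The following is a minimal sketch that assumes the report has been saved as text in the layout shown above. The names parse_report and stragglers are hypothetical helpers for this post and are not part of the gdb-java script.

```python
import re

# A few lines copied from the `safepoint` output shown above.
SAMPLE = """\
Status:                 SafepointSynchronize::_synchronizing
waiting to block:       1
LWP     Java Thread Name        Status
LWP 29630 Timer                  ThreadSafepointState::_at_safepoint
LWP 29629 Worker                 ThreadSafepointState::_running
LWP 29626 C1 CompilerThread1     ThreadSafepointState::_at_safepoint
"""

# Thread names may contain spaces, so the name is matched lazily up to the state column.
ROW = re.compile(r"^LWP\s+(\d+)\s+(.*?)\s+ThreadSafepointState::(\w+)\s*$")
WAITING = re.compile(r"^waiting to block:\s*(\d+)\s*$")


def parse_report(text):
    """Return (waiting_to_block, rows); rows are (lwp, thread_name, state) tuples."""
    waiting, rows = None, []
    for line in text.splitlines():
        m = WAITING.match(line)
        if m:
            waiting = int(m.group(1))
            continue
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), m.group(3)))
    return waiting, rows


def stragglers(text):
    """Threads whose state is anything other than _at_safepoint."""
    _, rows = parse_report(text)
    return [row for row in rows if row[2] != "_at_safepoint"]
```

For the sample, stragglers(SAMPLE) yields the single Worker row, which matches the waiting to block count of 1. If the two numbers disagree, the report is probably in a different layout than this sketch assumes.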

Check the stack with OpenJDK unwinder

Let's check the stack of the target thread (LWP 29629) using the OpenJDK unwinder described at https://qiita.com/takimai39/items/a78b7a64a501d77efed8.

(gdb) source dbg8.py
Installing openjdk unwinder
(gdb) source sp.py
(gdb) info thread
  Id   Target Id         Frame 
  14   Thread 0x7f373cd10700 (LWP 29630) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7f373ce11700 (LWP 29629) 0x00007f3741109660 in ?? ()
  12   Thread 0x7f373cf12700 (LWP 29628) 0x00007f3756e55cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7f373d013700 (LWP 29627) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7f373d114700 (LWP 29626) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7f373d215700 (LWP 29625) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7f373d316700 (LWP 29624) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7f373d417700 (LWP 29623) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7f373d518700 (LWP 29622) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f373d619700 (LWP 29621) 0x00007f375653ae47 in sched_yield () from /lib64/libc.so.6
  4    Thread 0x7f3740a02700 (LWP 29620) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f3740b03700 (LWP 29619) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f375726d700 (LWP 29618) 0x00007f3756e55945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7f375726e740 (LWP 29617) 0x00007f3756e52f57 in pthread_join () from /lib64/libpthread.so.0
(gdb) thread 13
[Switching to thread 13 (Thread 0x7f373ce11700 (LWP 29629))]
#0  0x00007f3741109660 in ?? ()
(gdb) bt
#0  0x00007f3741109660 in [compiled offset=0x40] Stuck$Worker.run() () at Stuck.java:35
#1  0x00007f37410004e7 in StubRoutines (1) ()
#2  0x00007f3755b12b4a in JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (result=0x7f373ce10dc0, m=<optimized out>, args=<optimized out>, __the_thread__=0x7f37500ea800) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:406
#3  0x00007f3755b0ffe4 in JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*) (__the_thread__=0x7f37500ea800, args=0x7f373ce10d30, method=<error reading variable: access outside bounds of object referenced via synthetic pointer>, result=0x7f373ce10dc0)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:307
#4  0x00007f3755b0ffe4 in JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*) (result=result@entry=0x7f373ce10dc0, spec_klass=..., name=<optimized out>, signature=<optimized out>, args=args@entry=0x7f373ce10d30, __the_thread__=__the_thread__@entry=0x7f37500ea800)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:204
#5  0x00007f3755b105f9 in JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*) (result=result@entry=0x7f373ce10dc0, receiver=..., spec_klass=..., name=<optimized out>, signature=<optimized out>, __the_thread__=__the_thread__@entry=0x7f37500ea800)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/runtime/javaCalls.cpp:210
#6  0x00007f3755b55301 in thread_entry(JavaThread*, Thread*) (thread=<optimized out>, __the_thread__=0x7f37500ea800)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/prims/jvm.cpp:2974
#7  0x00007f3755ef4c72 in JavaThread::thread_main_inner() (this=0x7f37500ea800)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:1710
#8  0x00007f3755d4be12 in java_start(Thread*) (thread=0x7f37500ea800)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:790
#9  0x00007f3756e51e25 in start_thread () at /lib64/libpthread.so.0
#10 0x00007f375655634d in clone () at /lib64/libc.so.6
(gdb)
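Above I scanned the info thread listing by eye to find that LWP 29629 is gdb thread 13. That lookup could also be scripted with gdb's Python API. The following is only a sketch: the command name lwp and the helper find_thread_by_lwp are hypothetical and not part of sp.py or dbg8.py, and it assumes that the second element of ptid holds the LWP for threads loaded from a core.

```python
from collections import namedtuple

try:
    import gdb  # only importable inside gdb's embedded Python
except ImportError:
    gdb = None


def find_thread_by_lwp(threads, lwp):
    """Return the thread whose ptid (pid, lwp, tid) carries the given LWP, else None."""
    for thread in threads:
        if thread.ptid[1] == lwp:
            return thread
    return None


if gdb is not None:
    class LwpCommand(gdb.Command):
        """lwp N: switch to the thread with LWP N (hypothetical helper command)."""

        def __init__(self):
            # Explicit super() arguments keep this working on gdb builds with Python 2.
            super(LwpCommand, self).__init__("lwp", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            thread = find_thread_by_lwp(gdb.selected_inferior().threads(), int(arg))
            if thread is None:
                raise gdb.GdbError("no thread with LWP " + arg)
            thread.switch()
            print("switched to gdb thread %d" % thread.num)

    LwpCommand()
else:
    # Outside gdb: exercise the lookup with stand-in objects.
    FakeThread = namedtuple("FakeThread", "num ptid")
    fakes = [FakeThread(14, (29617, 29630, 0)), FakeThread(13, (29617, 29629, 0))]
    print(find_thread_by_lwp(fakes, 29629).num)
```

Inside gdb the intended usage would be to source the file, type lwp 29629, and then run bt as before. Run as a normal Python script, it only exercises the lookup on the stand-in objects.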

Frame #0 is line 35 of Stuck.java, the tight while loop shown below. The "compiled" tag shows that HotSpot has already compiled it to native code. In other words, this is exactly the kind of code that -XX:-UseCompilerSafepoints affects: with that option turned off, the compiled loop never stops for a safepoint. (The default is +UseCompilerSafepoints, so you usually won't run into this issue.)

 33         @Override
 34         public void run() {
 35             while (!isDone) { // <--here!
 36                 // burn
 37             }
 38         }

When a JVM hang prevents even a thread dump from being output, checking the safepoint state of each JavaThread in this way lets you find the offending thread and what it was executing.
