A memo on experimenting whether a Java thread that has run away in an infinite loop can be forcibly stopped from the outside

- While studying Java programming, a logic bug caused one thread to run away at 100% CPU usage.
  - The bug is triggered only when processing certain special data, so only the one thread that received that data runs away. The other threads keep working normally.
  - So I investigated and experimented, and wrote this memo, to see whether only the runaway thread could be forcibly stopped from the outside.
- This memo does not reproduce the 100% CPU condition. The experiments use a loop that simply ignores the `InterruptedException` thrown by `Thread.sleep()`.
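Why can't such a loop be stopped by the usual means? Here is a minimal sketch. The hypothetical `SwallowedInterruptDemo` runs the same kind of interrupt-swallowing loop used in the experiments on a daemon worker thread. It then checks whether the loop keeps counting after `Future.cancel(true)` and `ExecutorService.shutdownNow()`, both of which only deliver `Thread.interrupt()`. The sleep and wait times are arbitrary assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SwallowedInterruptDemo {
    // Incremented by the loop so the caller can see whether it is still running.
    static final AtomicInteger COUNT = new AtomicInteger();

    // Same shape as the memo's Looper: InterruptedException is caught and ignored.
    static void loopIgnoringInterrupts() {
        while (true) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // Throwing InterruptedException cleared the interrupt flag, and nothing restores it.
            }
            COUNT.incrementAndGet();
        }
    }

    // Sleeps the calling thread without declaring InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the loop kept counting after Future.cancel(true) and shutdownNow(). */
    static boolean survivesCancel() {
        // Daemon worker thread, so this demo JVM can still exit even though the loop never ends.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "looper");
            t.setDaemon(true);
            return t;
        });
        Future<?> f = pool.submit(SwallowedInterruptDemo::loopIgnoringInterrupts);
        pause(200);
        f.cancel(true);     // interrupts the worker thread
        pool.shutdownNow(); // interrupts it again
        int before = COUNT.get();
        pause(300);
        return COUNT.get() > before;
    }

    public static void main(String[] args) {
        System.out.println("still running after cancel: " + survivesCancel());
    }
}
```

`Thread.sleep()` clears the interrupt flag when it throws, and the `catch` block discards the exception, so every interrupt is lost. A CPU-bound loop that never checks the flag behaves the same way.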

Conclusion

Conclusion as of 2018-03-21:

- Unless the JVM was started so that jdb can attach to it later, it seems safest to assume there is no way to terminate, from the outside, only a specific Java thread that has run away in an infinite loop.
- In principle, you could attach gdb and invoke the same processing that jdb uses to throw an exception into the thread, but that would be highly implementation-dependent and not realistic.
- It seems better to accept that the only option is to forcibly terminate the entire Java process, including the other, healthy threads.
- In the first place, make an effort not to write bugs that make threads loop forever, and write threads that terminate properly using the methods Java recommends.
- Reference: https://docs.oracle.com/javase/jp/8/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html
- Assuming an infinite loop may still happen, design the system and architecture with redundancy and fault tolerance, so that the service keeps running even when the Java process is forcibly killed and restarted.
  - Move dangerous processing that is likely to loop infinitely into a separate process, such as a microservice, and make it redundant.
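For comparison, here is a minimal sketch of a loop that terminates the way the Oracle document linked above recommends, by honoring `Thread.interrupt()`. The class name `CooperativeLooper` and the timing values are illustrative assumptions. The key changes from the memo's `Looper` are two: the loop condition checks `isInterrupted()`, and the `catch` block restores the interrupt flag.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CooperativeLooper implements Runnable {
    @Override
    public void run() {
        int cnt = 0;
        // Check the interrupt flag on every iteration, so CPU-bound work would stop too.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                System.out.println("CooperativeLooper count " + cnt);
                Thread.sleep(200);
            } catch (InterruptedException e) {
                // sleep() clears the flag when it throws; restore it so the loop condition sees it.
                Thread.currentThread().interrupt();
            }
            cnt++;
        }
    }

    /** Starts the looper, cancels it, and reports whether the worker thread finished in time. */
    static boolean stopsOnCancel() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(new CooperativeLooper());
        try {
            Thread.sleep(500);
            f.cancel(true); // delivers Thread.interrupt() to the worker
            pool.shutdown();
            return pool.awaitTermination(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated: " + stopsOnCancel());
    }
}
```

With this shape, `Future.cancel(true)` or `ExecutorService.shutdownNow()` is enough to stop the task. No debugger is needed.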

Experiment environment

I experimented on both CentOS 6 and CentOS 7 (both x86_64). I launched the instances on GCP Compute Engine and installed the following packages.

sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
sudo yum groupinstall -y "Development tools"

Java version details (as of 2018-03-21, the output was identical on CentOS 6 and CentOS 7):

$ java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

$ javac -version
javac 1.8.0_161

Infinite Loop Sample: InfiniteLoop.java

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InfiniteLoop {
    static class Looper implements Runnable {
        @Override
        public void run() {
            int cnt = 0;
            while (true) {
                //Infinite loop ignoring interrupts
                try {
                    System.out.println("InifiniteLoop-Looper count " + cnt);
                    Thread.sleep(2000);
                } catch (InterruptedException ignored) {
                }
                cnt++;
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(new Looper());
        try {
            // Wait for the task to finish with Future.get()
            f.get();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

compile:

$ javac InfiniteLoop.java

Experiment 1: Attach with jdb and terminate the thread

reference:

Run the InfiniteLoop class with the JVM option that allows jdb to attach later.

$ java -Xrunjdwp:transport=dt_socket,address=9000,server=y,suspend=n InfiniteLoop
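The only working approach depends on jdb being able to attach later. It may therefore be worth having a service log at startup whether it was launched with the JDWP agent. Here is a sketch, assuming the JVM arguments are visible through `RuntimeMXBean.getInputArguments()`. `-agentlib:jdwp=...` is the newer spelling of the same option, so both forms are checked. `JdwpCheck` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JdwpCheck {
    /** Returns true if any JVM argument loads the JDWP agent (old -Xrunjdwp or newer -agentlib:jdwp form). */
    static boolean isJdwpEnabled(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM itself (not to the main class).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent loaded: " + isJdwpEnabled(jvmArgs));
    }
}
```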

From another terminal, get the PID of the java process and check the Java stack trace with jstack.

$ pidof java
13667

$ jstack 13667
2018-03-21 07:52:59
Full thread dump OpenJDK 64-Bit Server VM (25.161-b14 mixed mode):
(...)

"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007f0d840f3b30 nid=0x356f waiting on condition [0x00007f0d6d8c8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at InfiniteLoop$Looper.run(InfiniteLoop.java:14)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

(...)

The stack trace shows that the thread named pool-1-thread-1 is the one stuck in the infinite loop.
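The memo identifies the thread from outside with jstack. Suppose instead you control the code and can expose a diagnostic hook inside the same JVM; this is an assumption, not something the memo does. In that case the same lookup by name can be done with `Thread.getAllStackTraces()`. `ThreadMXBean` also reports per-thread CPU time, and sampling it twice would show which thread is spinning at 100% CPU. `ThreadFinder` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadFinder {
    /** Returns the live threads with the given name (Java does not require thread names to be unique). */
    static List<Thread> findByName(String name) {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name)) {
                result.add(t);
            }
        }
        return result;
    }

    /** CPU time consumed by the thread in nanoseconds, or -1 if it cannot be measured. */
    static long cpuTimeNanos(Thread t) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1;
        }
        // Also returns -1 if the thread has died or CPU time measurement is disabled.
        return mx.getThreadCpuTime(t.getId());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "main";
        for (Thread t : findByName(name)) {
            System.out.println(t.getName() + " state=" + t.getState() + " cpuNanos=" + cpuTimeNanos(t));
        }
    }
}
```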

Now that we know the thread name, start jdb, attach it to the JVM, and terminate the looping thread.

$ jdb -attach 9000
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...

(Display thread list)
> threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x1b9 Reference Handler cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x1ba  Finalizer         cond. waiting
  (java.lang.Thread)0x1bb                         Signal Dispatcher running
Group main:
  (java.lang.Thread)0x1                           main              cond. waiting
  (java.lang.Thread)0x1bd                         pool-1-thread-1   sleeping

(Confirm that 0x1bd, named "pool-1-thread-1", is the looping thread.)

(Select the thread to stop, 0x1bd: suspend it, step once, then kill it by throwing an Exception)
> thread 0x1bd
pool-1-thread-1[1] suspend 0x1bd
pool-1-thread-1[1] step
>
Step completed: "thread=pool-1-thread-1", InfiniteLoop$Looper.run(), line=16 bci=33
16                    }

pool-1-thread-1[1] kill 0x1bd new java.lang.Exception("kill from jdb")
killing thread: pool-1-thread-1
pool-1-thread-1[1] instance of java.lang.Thread(name='pool-1-thread-1', id=445) killed

(Resume with cont, then exit the debugger)
pool-1-thread-1[1] cont
> exit

The output of InfiniteLoop reflected the jdb operations as follows.

$ java -Xrunjdwp:transport=dt_socket,address=9000,server=y,suspend=n InfiniteLoop
(...)
InifiniteLoop-Looper count 32
InifiniteLoop-Looper count 33
InifiniteLoop-Looper count 34
(After suspend from jdb, the count output stops)
(After kill from jdb, the following is printed)
java.util.concurrent.ExecutionException: java.lang.Exception: kill from jdb
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at InfiniteLoop.main(InfiniteLoop.java:27)
Caused by: java.lang.Exception: kill from jdb
        at InfiniteLoop$Looper.run(InfiniteLoop.java:16)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
(After exiting jdb, the following is printed)
Listening for transport dt_socket at address: 9000
(Stop with Ctrl-C)

Killing the thread from jdb terminates the Runnable task, and the wait in Future.get() is released with an ExecutionException. The cause of that ExecutionException is the exception thrown by jdb's kill command.
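On the caller's side, catching the `ExecutionException` from `Future.get()` and unwrapping it with `getCause()` tells you that the task died and what killed it. Here is a sketch with a hypothetical `CauseOfKill` class. `Runnable.run()` cannot throw a checked `java.lang.Exception` like the one jdb injected, so the simulation uses an unchecked `IllegalStateException` as a stand-in.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CauseOfKill {
    /** Waits for the task; returns the exception that ended it, or null if it completed normally. */
    static Throwable causeOf(Future<?> f) {
        try {
            f.get();
            return null;
        } catch (ExecutionException e) {
            // Whatever escaped run() (e.g. the exception injected by jdb's kill) is the cause.
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for jdb's kill: the task dies with an exception thrown inside run().
        Future<?> f = pool.submit((Runnable) () -> {
            throw new IllegalStateException("simulated kill");
        });
        System.out.println("task ended by: " + causeOf(f));
        pool.shutdown();
    }
}
```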

Experiment 2: Attach with gdb and kill the thread with pthread_kill(3)

- On Linux, Java threads are (apparently) implemented as pthreads.
- If so, as with any pthread, a single thread of a process cannot be terminated by sending a signal from another process.
- However, by attaching gdb to the process, you can call pthread_kill(3) and send the signal "from inside".
- I tried a simple case (SIGKILL = 9) to see whether this approach works for Java threads.

reference:

Run the InfiniteLoop class without any debugging options.

$ java InfiniteLoop

Check the location of the java executable:

$ which java
/usr/bin/java
$ ls -l /usr/bin/java
(...) /usr/bin/java -> /etc/alternatives/java
$ ls -l /etc/alternatives/java
(...) /etc/alternatives/java -> /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
$ ls -l /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
(...) /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

Check the Java PID and get a stack trace.

$ jstack `pidof java`
2018-03-21 08:28:38
Full thread dump OpenJDK 64-Bit Server VM (25.161-b14 mixed mode):
(...)

"pool-1-thread-1" #8 prio=5 os_prio=0 tid=0x00007ffb500ed000 nid=0x6a05 waiting on condition [0x00007ffb54146000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at InfiniteLoop$Looper.run(InfiniteLoop.java:14)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
(...)

On Linux, the `nid=0x...` field in the thread information shown by jstack is the thread's LWP ID (its kernel thread ID) in hexadecimal.

Attach gdb by passing the executable path and the PID, then run the `info threads` command.

$ gdb /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java `pidof java`

(gdb) info threads
(...)
  4 Thread 0x7ffb54248700 (LWP 27140)  0x0000003e2020ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7ffb54147700 (LWP 27141)  0x0000003e2020ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7ffb40fff700 (LWP 27164)  0x0000003e2020eb2d in accept () from /lib64/libpthread.so.0
* 1 Thread 0x7ffb57342700 (LWP 27131)  0x0000003e202082fd in pthread_join () from /lib64/libpthread.so.0

gdb's `info threads` output shows LWP IDs in decimal. The looping thread has `nid=0x6a05`, which is 27141 in decimal. Searching for 27141 shows that it is the third thread from the bottom:

  3 Thread 0x7ffb54147700 (LWP 27141)  0x0000003e2020ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
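The hex-to-decimal step is easy to get wrong by hand. A tiny helper (hypothetical name `NidToLwp`) converts between jstack's `nid` and the decimal LWP ID shown by gdb:

```java
public class NidToLwp {
    /** Converts a jstack nid such as "0x6a05" to the decimal LWP ID shown by gdb's "info threads". */
    static long toLwp(String nid) {
        String hex = nid.startsWith("0x") || nid.startsWith("0X") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    /** Converts a decimal LWP ID back to the nid format used by jstack. */
    static String toNid(long lwp) {
        return "0x" + Long.toHexString(lwp);
    }

    public static void main(String[] args) {
        System.out.println(toLwp("0x6a05")); // 27141
        System.out.println(toNid(27141));    // 0x6a05
    }
}
```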

Next we would run `call pthread_kill(pthread_t, int)`, but **what should be passed as the first argument?** Which value in the matching line of the `info threads` output should be used? **And which signal should be passed as the second argument to get behavior equivalent to Java's `Thread.stop()`?**

**More fundamentally, how can I trigger, from inside a process attached with gdb, exactly the same behavior as when a thread terminates in Java?**

→ This was a complete dead end. Every time I tried calling something as carefully as I could, the Java process died with SIGSEGV, and I never managed to isolate and terminate only the runaway thread.

Suppose gdb could reproduce the in-process exception raising that jdb's kill performs, worked out by reading the Java source code or tracing system calls with strace. It would still probably depend heavily on internal implementation details. Isn't it unlikely to be stable and reproducible in a real production environment?

Summary as of 2018-03-21

- The most reliable way to terminate only a specific Java thread that has run away in an infinite loop is to attach with jdb and throw an exception into it.
- In principle, it seems possible to attach with gdb and terminate the thread by sending a signal from inside with `call pthread_kill()` or `call raise()`. In this experiment, however, I could not work out the correct procedure, so I could not verify it.
- Even if it can be done, it differs from how the JVM itself terminates a thread, so the JVM is very likely to crash.
- Also, even if you replicate the processing that throws an exception from inside, it is likely to depend on the symbol table and memory layout. It is therefore unlikely to become a procedure that can be reused easily in day-to-day operations.
- From the above, I conclude that terminating only a specific runaway thread is difficult unless the Java process could be attached with jdb from the start, and it is better to give up on it.
  - Rather, test properly and avoid introducing bugs that cause runaway threads in the first place.
- As the Java documentation explains, it is also important to write threads that terminate properly.
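The conclusion recommends isolating risky processing in a separate process. That process can then be killed and restarted without touching healthy threads. Here is a minimal sketch of the parent side using only `ProcessBuilder`, `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` (the latter two were added in Java 8). The class name `IsolatedRunner`, the 5-second timeout, and running `InfiniteLoop` as the child are illustrative assumptions, not a production design.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class IsolatedRunner {
    /**
     * Runs a command as a child process and kills it if it does not finish in time.
     * Returns the child's exit code, or -1 if it had to be killed.
     */
    static int runWithTimeout(long timeoutMillis, String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return p.exitValue();
        }
        // Only the child process is terminated; threads in this (parent) JVM keep running.
        p.destroyForcibly();
        p.waitFor();
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical example: run the memo's InfiniteLoop in its own JVM and give up after 5 seconds.
        int rc = runWithTimeout(5000, "java", "InfiniteLoop");
        System.out.println("exit code: " + rc);
    }
}
```

A real deployment would add restart logic and redundancy on top. This only shows that killing the child leaves the parent JVM's threads untouched.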

It isn't neatly organized, but that's all.
