Java hang up?

Introduction

Is Java on the server side hanging vaguely with no response to access from the browser? May be asked. Why are you suspicious of Java VM? I would like to say </ FONT>, but I think that is the reason why I want to proceed with the investigation in the following order.

  • Requires the opinion of the support staff as Java VM or middleware running on it
  • If there is no problem, the answer to that effect is OK
  • With the opinion of the support staff, I would like to contact the network or database staff as needed.

I will post a common example of this situation.

Java thread dump

It can be said that the Java VM itself is not hung because the Java VM function is working properly when the Java thread dump can be obtained, but let's take a look at the contents first.

** Thread dump of problem part **

"default task-4" #116 prio=5 os_prio=0 tid=0x0000000003ba3800 nid=0x7894 runnable [0x00007f58c80c7000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at oracle.net.ns.Packet.receive(Packet.java:311)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:426)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:390)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:202)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:45)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:766)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1026)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
        - locked <0x00000000fcd35220> (a oracle.jdbc.driver.T4CConnection)
        at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:420)
        at org.jboss.jca.adapters.jdbc.WrappedStatement.executeQuery(WrappedStatement.java:397)
        at abc.doGet(abc.java:37)
...Omission...

In this example, when you access the servlet called abc, you will not get a response, but if you look at the stack related to that abc, it looks like the above. From bottom to top, you can see from doExecuteWithTimeout in jdbc to NetInputStream.read and socketRead0, waiting for a read. Waiting a few tens of seconds and getting a thread dump again does not change this situation. It seems that the jdbc timeout has been reached due to somehow doExecuteWithTimeout. In other words, the fifth operation below does not proceed to the next process while waiting.

  1. Request HTTP GET from browser to server
  2. The Servlet throws SQL to oracle with jdbc to create the response content.
  3. The SQL response was not obtained within the query-timeout time.
  4. Issue a cancel to jdbc with statement.cancel as a timeout process
  5. Waiting for the response to the cancellation.
  6. If a response to cancellation is obtained, it will be treated as SQLTimeoutException and the exception will be handled ... Maybe.

** Note: statement.cancel is canceled by issuing it to a DB that responds correctly **

So was there a problem with the database? ?? ?? You might think that.

The view from the database side

From the contents of the thread dump, ask if there is an error on the database side during that time period ... ** No abnormalities are found. ** What do you suspect DB in the first place? I was told </ FONT>.

Is the cancellation packet being sent?

If the database is okay, you're wondering if statement.cancel is really executed and sent as a packet to the database. So, let's reproduce the problem and capture the packet for confirmation.

** On the system side running Java **

statement.Packet corresponding to cancel

13:14:08.948939 IP rhel74.38860 > 192.168.1.25.1521: Flags [P.U], seq 4510:4511, ack 5404, win 44020, urg 1, length 1
..s....
        0x0010:  c0a8 0119 97cc 05f1 f070 d6a2 c3a2 111d  ......-..p......
        0x0020:  5038 abf4 cdeb 0001 21                   P8......!

Notice the Flags [P.U], which have the P.U or PUSH and URG flags set. And the user data is one character of "!". From this content, you can see that statement.cancel is executed and the cancellation request is sent correctly.

** On the receiving side? ** ** When I take a packet capture and check it on the database side system, the packet corresponding to the above cannot be found. It is like that have been discarded somewhere because it has passed through the plurality of network devices Yara NAT.

  • In my experience, I have seen a pattern in which only the TCP URG flag was dropped. The oracle database side does not respond to "!" Packets that do not have a URG set.

In this way, if the statement cancellation to the database is not reached as expected, it will not be possible to time out and cancel the SQL, which takes a very long time, resulting in endless response. I will.

The problem is in the network

I suspect that some mechanism, such as NAT or firewall in the network path, is not handling this URG flagged packet correctly. Therefore, based on the captured packet capture, I think that the person in charge of the network will be inquired that the packet containing this URG flag has not arrived. Also What do you suspect the network? You might be called </ FONT>, but ...

There are many different errors when a problem surface.

In the case of such a pattern waiting for a response from jdbc and cancellation is not effective, it is not always the case that there is only one type of event. For example, after starting a transaction, JTA (Java Transaction API) can be set to recover an incomplete transaction. If the recovery operation is set to try every 30 minutes until 24 hours have passed, even if the recovery operation is tried, it will wait with the same cancellation response. Every 30 minutes, the number of recovery threads increases, and as a result many threads are finally created. Too Many open files error and resource error such as Unable to create new native thread. It may lead to. Also, since it will not give up until 24 hours have passed, the same operation will be repeated even if it is restarted.

Based on these points, I think it is important to check the passage of time and the contents of the thread dump to understand what is happening.

We hope you find this useful.

Recommended Posts