I didn't design or develop the system; at best I got a rough grasp of it by sitting in on reviews. I have no time to read the source now, and the person who developed it is not performance-minded. **This series is about how I have dealt with the grim situation of being summoned when performance suddenly blows up and having to do something within a couple of hours.**
The first part was an introduction to tools (OS): Defense Techniques When You Have to Fight the Performance of an Unfamiliar Application (Part 1) https://qiita.com/nfujita55a/items/3760cd099ca890f5a4d4
**This second part is an introduction to tools (Java)**; the third part will be an introduction to tools (PostgreSQL). Here I write about **what to do after the process identified in the first part** turns out to be Java.
…… I'm sure some parts are off the mark, and I look forward to having them pointed out.
The work pattern and assumed environment are quite legacy; it may be in the cloud (AWS EC2) or on-premises.
[Web server Linux Apache]
|
[AP server Linux Java+Tomcat+WEB application, Java+Batch program]
|
[DB server Linux PostgreSQL]
This is the assumed configuration.
**If you are having performance problems with a Java app, get a thread dump first.** It makes the Java process output the state and stack trace of every thread: how each thread was invoked and what it was executing at the moment of capture.
Output like the following is produced for every thread in the process. This is example output for a thread named HogeWorker; the thread's state appears as "java.lang.Thread.State: RUNNABLE".
"HogeWorer" daemon prio=10 tid=0x00007f5c2706d900 nid=0xdfb runnable [0x00007f5c3d760000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:282)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1741)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
- locked <0x0000000529000980> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:304)
at com.example.a.b.c.DddEee.foo(DddEee.java:234)
at com.example.a.b.c.DddEee.bar(DddEee.java:345)
(snip: about 20 more com.example frames follow)
at com.example.h.i.j.KkkLll.foo(KkkLll.java:123)
at com.example.h.i.j.KkkLll.baz(KkkLll.java:12)
at java.lang.Thread.run(Thread.java:745)
**The important thing: do not hesitate to take thread dumps while the system is overloaded or the processing is slow.** Take them several times (3 to 5 times) at intervals of about 15 seconds, and check whether they all show the same tendency.
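(As an aside, the content of a thread dump can be mimicked from inside a JVM, which is handy for getting a feel for what you are about to read. This is a sketch for illustration only; for a separate running process, use kill -3 or jstack as described below.)

```java
import java.util.Map;

// Rough in-process illustration of what a thread dump contains: the name,
// state, and stack trace of every live thread. Taking several snapshots
// with a pause in between mirrors the "3 to 5 dumps, ~15 s apart" advice.
public class MiniThreadDump {
    static void dumpOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.printf("\"%s\"%s java.lang.Thread.State: %s%n",
                    t.getName(), t.isDaemon() ? " daemon" : "", t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("\tat " + frame);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int snapshots = 3;            // 3 to 5 in practice
        long intervalMillis = 200;    // ~15 000 ms in practice
        for (int i = 1; i <= snapshots; i++) {
            System.out.println("=== dump " + i + " ===");
            dumpOnce();
            if (i < snapshots) Thread.sleep(intervalMillis);
        }
    }
}
```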
If a thread dump cannot be obtained with the methods below (nothing is recorded, or the command never returns), the Java process itself is very often on the verge of OutOfMemoryError and nothing more can be done with it. In that case I think it is better to stop the Java process and then recover. (In many cases it can only be stopped with # kill -KILL <PID>.)
** 1. kill -3 process ID **
This is easy if you know the file to which the Java process's standard output is redirected (typical example: Tomcat's catalina.out). Just do not get the **-3** wrong: if you forget it, or send -HUP, -TERM, or -KILL instead, the Java process will stop.
# ps auxww | grep java
and for the PID of the Java process identified there, run
# kill -3 <PID>
If you run this as the root user or as the user the process runs as, a thread dump is appended to the standard-output redirect file of the Java process.
** 2. jstack command **
Some people may not know where this process's standard output ends up, or may feel uneasy about using the kill command; in that case you can get the dump with the jstack command instead. (This feels like the legitimate way.)
# ps auxww | grep java
note the PID of the Java process, the path to its java binary, and the user the process runs as.
# su - <user running the process>
Switch to that user, then run
$ /path/to/java_home/bin/jps
and check that the PID you saw with the ps command is listed.
(With Tomcat something like "12345 Bootstrap" appears, so confirm that 12345 is the process you want to examine.)
$ /path/to/java_home/bin/jstack <PID>
then prints the thread dump (the stack traces) as its output.
If you cannot switch to the Java process's user, e.g. when it runs as nobody (running as nobody is better for security than running as root or as a user with a login shell):
# ps auxww | grep java
note the PID of the Java process, the path to its java binary, and the user the process runs as, then run
# su - -m <user running the process> -c '/path/to/java_home/bin/jps'
to check that the PID you saw with the ps command is listed, and then
# su - -m <user running the process> -c '/path/to/java_home/bin/jstack <PID>'
outputs the thread dump as the command result. Since the terminal fills up with stack traces, it may be better to redirect to a file with ">" or ">>".
** 3. (Extra edition) Tomcat running as a service on Windows **
With Tomcat running as a Windows service, the methods above cannot be used; instead you can easily get a thread dump by selecting Thread Dump from the Tomcat icon in the system tray.
It is then written to <Tomcat installation directory>\logs\tomcatX-stdout.<today's date>.log.
When reading a thread dump (especially when you are not familiar with the target system but still have to investigate), I look at it in two stages.
** 1. (Just in case) Does the JVM have enough heap memory? **
At the end of the thread dump there is output like the following about the Java heap memory:
Heap
PSYoungGen total 94208K, used 25568K [0x000000094aa90000, 0x000000094da80000, 0x0000000800000000)
eden space 90112K, 30% used [0x000000094aa90000,0x000000094b800310,0x000000094d680000)
from space 4096K, 46% used [0x000000094d880000,0x000000094d970000,0x000000094da80000)
to space 4096K, 0% used [0x000000094d680000,0x000000094d680000,0x000000094d880000)
ParOldGen total 732160K, used 587892K [0x0000000680000000, 0x0000000696580000, 0x000000094aa90000)
object space 732160K, 80% used [0x0000000680000000,0x0000000691f0ea58,0x0000000696580000)
Metaspace used 2425K, capacity 4498K, committed 4864K, reserved 1056768K
class space used 262K, capacity 386K, committed 512K, reserved 1048576K
For typical web applications and batch jobs, a Java OutOfMemoryError means the Old area has run out. The maximum size of the Old area is: (maximum heap set with -Xmx) × -XX:NewRatio ÷ (-XX:NewRatio + 1). (Example: with -Xmx at 4GB and -XX:NewRatio at the server VM default of 2, 4GB × 2/3 ≈ 2.66GB.)
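That arithmetic can be sanity-checked in a couple of lines (the 4 GB and NewRatio=2 figures are simply the example values above):

```java
// Old area maximum = Xmx * NewRatio / (NewRatio + 1)
public class OldGenMax {
    public static void main(String[] args) {
        long xmxMb = 4096;   // -Xmx4g, as in the example
        long newRatio = 2;   // -XX:NewRatio=2 (server VM default)
        long oldMaxMb = xmxMb * newRatio / (newRatio + 1);
        // prints "Old area max ~= 2730 MB" (~2.66 GB)
        System.out.println("Old area max ~= " + oldMaxMb + " MB");
    }
}
```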
**Check whether the `used` value of ParOldGen (587892K above) is approaching this Old-area maximum. If it is close and vmstat shows the CPU busy, judge that this Java process is near OutOfMemoryError and is spending its effort on GC instead of on the actual work.** In that case it is better to stop the Java process once (adding memory to the OS if necessary), raise the heap maximum -Xmx, and re-run. Handling half-finished processing is awkward, but **waiting for things to improve on their own only wastes time.**
If the heap looks fine, move on to investigating the threads.
** 2. Investigating the target threads **
In a web application on Tomcat (Spring Boot embeds Tomcat, so it falls in this category too), a great many threads appear in the dump. First of all, learn which threads can be ignored: the idle pool threads that are merely waiting for work. These are typical examples you can ignore.
"http-bio-8080-exec-19" daemon prio=10 tid=0x00008d13e0003800 nid=0x4894 waiting on condition [0x00008d13fc2b7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000009217dee78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:104)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:32)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
"ajp-bio-8009-exec-123" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00008d13d5ad5000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:312)
at org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:367)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:118)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
- locked <0x000000092f13a8c0> (a org.apache.tomcat.util.net.SocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
What to look for next are the threads worth investigating (I think you will learn to spot them at a glance once you get used to it ...). The following example, the stack trace shown earlier, is a target for investigation because it meets three conditions: its state is RUNNABLE, frames from the project's own package appear in it, and it is not one of the ignorable idle threads (com.example is used here as a stand-in for the project-specific package name).
"HogeWorer" daemon prio=10 tid=0x00007f5c2706d900 nid=0xdfb runnable [0x00007f5c3d760000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:282)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1741)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
- locked <0x0000000529000980> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:304)
at com.example.a.b.c.DddEee.foo(DddEee.java:234)
at com.example.a.b.c.DddEee.bar(DddEee.java:345)
(snip: about 20 more com.example frames follow)
at com.example.h.i.j.KkkLll.foo(KkkLll.java:123)
at com.example.h.i.j.KkkLll.baz(KkkLll.java:12)
at java.lang.Thread.run(Thread.java:745)
**It is an unfamiliar system, so there is little point in staring at the contents of the project-specific com.example code.** Instead, scan the trace from the top, find the first project-specific com.example frame, and look at the frames above it (DddEee.java:234 in the example above).
Frames in a stack trace are listed innermost call first (unless JNI is mixed in), so **infer what is being done from the "java/javax packages and well-known libraries" above the project's own frames**. The example above is very easy to read: the PostgreSQL JDBC driver has executed a statement and the thread is sitting in a socket read. In other words, the thread has sent its SQL (a select, say) to the DB and is waiting for the reply. If the thread dump is full of such threads, the investigation moves on to the DBMS. (Judge that the Java process is waiting on the DB.)
Also, looking closely at this stack trace, no connection-pooling library (Commons DBCP or Tomcat's DBCP) appears. So we also check what is happening with TCP between the AP and DB servers (the DBMS may be struggling with establishing a new TCP connection for each query).
"Java is waiting for the return of the DB" is a typical example that you will come across when dealing with the performance of a WEB application, but I will write two more examples to note.
If the stack trace contains the project-specific package name as shown below, the **thread state is BLOCKED**, and there are many such threads, then the threads are jammed up waiting on a lock.
"ajp-bio-8009-exec-124" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00009d13d5ad5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.example.zzz.YyyXxx.foo(YyyXxx.java:345)
- waiting to lock <0x0000000123456789> (a java.lang.Object)
at com.example.zzz.YyyXxx.bar(YyyXxx.java:567)
(snip: more frames follow)
at java.lang.Thread.run(Thread.java:745)
In that case, search the thread dump for the 0x0000000123456789 in "waiting to lock <0x0000000123456789>": somewhere there is a thread that has "locked <0x0000000123456789>" (deadlocks aside), so examine that thread. If its state is RUNNABLE, investigate what it is doing: it is the head of the traffic jam. If that thread is itself BLOCKED, the lock waits are chained; follow them the same way and you will eventually reach the RUNNABLE thread at the head of the jam. Then either address the reason the leading thread cannot make progress, or modify the program so the exclusion is not needed.
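Following the "locked" / "waiting to lock" pair is mechanical enough to script. A toy sketch (the parsing, thread names, and addresses are all made up for illustration; in practice a simple grep for the address across the dump file does the job):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy helper: given the text of a thread dump, find the thread that holds
// ("locked <addr>") the monitor another thread is "waiting to lock".
// Thread names, frames, and addresses below are made-up sample data.
public class LockOwnerFinder {
    static String ownerOf(String dump, String addr) {
        String owner = null;
        String current = null;
        for (String line : dump.split("\n")) {
            Matcher h = Pattern.compile("^\"([^\"]+)\"").matcher(line);
            if (h.find()) current = h.group(1);               // new thread section
            if (line.contains("- locked <" + addr + ">")) owner = current;
        }
        return owner;
    }

    public static void main(String[] args) {
        String dump =
            "\"exec-124\" daemon\n" +
            "   java.lang.Thread.State: BLOCKED (on object monitor)\n" +
            "\tat com.example.zzz.YyyXxx.foo(YyyXxx.java:345)\n" +
            "\t- waiting to lock <0x0000000123456789> (a java.lang.Object)\n" +
            "\"exec-120\" daemon\n" +
            "   java.lang.Thread.State: RUNNABLE\n" +
            "\tat com.example.zzz.YyyXxx.slowPart(YyyXxx.java:99)\n" +
            "\t- locked <0x0000000123456789> (a java.lang.Object)\n";
        // exec-120 holds the monitor that exec-124 is waiting for
        System.out.println(ownerOf(dump, "0x0000000123456789"));
    }
}
```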
This is a dreadful pattern that kills your appetite the moment you see it: when the thread state is RUNNABLE and **the top of the stack trace is java.util.regex, there is a performance problem with a regular expression.** **Depending on the combination of pattern and input string, a single match can take a tremendous amount of time.** I have seen an innocent-looking regular expression of about 20 characters take 3 hours to match one specific 80-character string. Be especially careful when feeding web-request input into regular expressions; it is a situation close to ReDoS.
If, in addition, no other CPU-consuming processing appears in the stack traces and vmstat on the AP server shows heavy us (user CPU) load, the cause is all but certain. This is territory where the program has to be reworked. (Though one stopgap idea is to block the offending requests with a WAF or the like.)
"ajp-bio-8009-exec-123" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00008d13d5ad5000]
java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match1(Pattern.java:4287)
at java.util.regex.Pattern$Curly.match(Pattern.java:4236)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
(snip)
at com.example.zzz.YyyXxx.foo(YyyXxx.java:345)
(snip)
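As a self-contained illustration of how badly this can blow up (the pattern and inputs here are the textbook example, not the actual regex from the incident above): the nested quantifier in `(a+)+$` backtracks exponentially on an input that almost matches, so each extra character roughly doubles the matching time.

```java
import java.util.regex.Pattern;

// Classic catastrophic-backtracking demo: (a+)+$ against "aaa...a!".
// The input never matches, but the engine tries exponentially many ways
// to split the run of 'a's before giving up. Keep n small: around n = 30
// this already takes minutes on a typical machine.
public class RedosDemo {
    public static void main(String[] args) {
        Pattern evil = Pattern.compile("(a+)+$");
        for (int n = 10; n <= 22; n += 4) {
            String input = "a".repeat(n) + "!";   // near-miss: trailing '!'
            long t0 = System.nanoTime();
            boolean matched = evil.matcher(input).matches();
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("n=" + n + " matched=" + matched + " ms=" + ms);
        }
    }
}
```

In a thread dump this shows up exactly as above: a RUNNABLE thread whose top frames are all java.util.regex.Pattern internals.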
(Everyone's favorite part.) Let me write down two settings that are easy to apply to an unfamiliar system running on Tomcat and that pay off when dealing with performance problems. **It is hard to set these up after being woken up in the middle of the night, so everyone should write them down now.** (A remark addressed to no one in particular.)
Tomcat's access-log setting is like this by default:
xml:Tomcat_8.5.61_server.xml
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log" suffix=".txt"
pattern="%h %l %u %t "%r" %s %b" />
It produces output like this:
192.168.0.1 - - [19/Dec/2020:22:01:02 +0900] "GET /foo/1 HTTP/1.1" 200 12345
192.168.0.1 - - [19/Dec/2020:22:01:05 +0900] "GET /bar/2 HTTP/1.1" 200 23456
192.168.0.1 - - [19/Dec/2020:22:01:07 +0900] "GET /baz/3 HTTP/1.1" 200 34567
192.168.0.1 - - [19/Dec/2020:22:01:14 +0900] "GET /foo/4 HTTP/1.1" 200 11234
192.168.0.1 - - [19/Dec/2020:22:02:09 +0900] "GET /bar/5 HTTP/1.1" 200 22345
192.168.0.1 - - [19/Dec/2020:22:02:12 +0900] "GET /baz/6 HTTP/1.1" 200 33456
I want to add the **session ID, processing time (milliseconds), and thread name**, so I change it like this:
xml:Tomcat_8.5.61_server.xml(After change)
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log" suffix=".txt"
pattern="%h %l %u %t "%r" %s %b %S %D %I" />
The output then changes to:
192.168.0.1 - - [19/Dec/2020:22:01:02 +0900] "GET /foo/1 HTTP/1.1" 200 12345 0123456789ABCDEF0123456789ABCDEF 254 ajp-bio-8009-exec-123
192.168.0.1 - - [19/Dec/2020:22:01:05 +0900] "GET /foo/1 HTTP/1.1" 200 23456 ABCDEF0123456789ABCDEF0123456789 10245 ajp-bio-8009-exec-112
192.168.0.1 - - [19/Dec/2020:22:01:07 +0900] "GET /baz/3 HTTP/1.1" 200 34567 0123456789ABCDEF0123456789ABCDEF 547 ajp-bio-8009-exec-157
192.168.0.1 - - [19/Dec/2020:22:01:14 +0900] "GET /foo/4 HTTP/1.1" 200 11234 0123456789ABCDEF0123456789ABCDEF 286 ajp-bio-8009-exec-114
192.168.0.1 - - [19/Dec/2020:22:02:09 +0900] "GET /baz/3 HTTP/1.1" 200 22345 ABCDEF0123456789ABCDEF0123456789 34578 ajp-bio-8009-exec-118
192.168.0.1 - - [19/Dec/2020:22:02:12 +0900] "GET /baz/6 HTTP/1.1" 200 33456 0123456789ABCDEF0123456789ABCDEF 125 ajp-bio-8009-exec-129
with the session ID, processing time in milliseconds, and thread name appended to each line.
Then, looking at the access log, you can for example pick out the requests with long processing times, follow a particular user's path through the site by session ID, and match the thread name against a thread dump taken at the same moment.
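Picking the slow requests out of the changed format is then easy to script. A sketch (it assumes the changed pattern above, where the last two fields are the processing time in milliseconds and the thread name; the sample lines and the 1000 ms threshold are illustrative):

```java
import java.util.List;

// Scan access-log lines in the "%h %l %u %t \"%r\" %s %b %S %D %I" format
// and report requests slower than a threshold. The %D (milliseconds) field
// is second to last; %I (thread name) is last.
public class SlowRequestScan {
    public static void main(String[] args) {
        List<String> lines = List.of(
            "192.168.0.1 - - [19/Dec/2020:22:01:02 +0900] \"GET /foo/1 HTTP/1.1\" 200 12345 0123456789ABCDEF0123456789ABCDEF 254 ajp-bio-8009-exec-123",
            "192.168.0.1 - - [19/Dec/2020:22:01:05 +0900] \"GET /foo/1 HTTP/1.1\" 200 23456 ABCDEF0123456789ABCDEF0123456789 10245 ajp-bio-8009-exec-112");
        long thresholdMs = 1000;  // arbitrary cutoff for "slow"
        for (String line : lines) {
            String[] f = line.split(" ");
            long ms = Long.parseLong(f[f.length - 2]);   // %D field
            if (ms >= thresholdMs) {
                // f[6] is the request path in this layout
                System.out.println(ms + " ms " + f[6] + " on thread " + f[f.length - 1]);
            }
        }
    }
}
```

The same field positions also let you join on the thread name against a StuckThreadDetectionValve warning or a thread dump.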
That said, you cannot always be there to take a thread dump, so if you run into problems, **add StuckThreadDetectionValve to that Tomcat** (personally I recommend putting it in before you run into problems).
StuckThreadDetectionValve, available since Tomcat 7.0.13, is a wonderful valve that **automatically outputs the stack trace of any request that takes too long to process**.
In server.xml, change it as follows (the Valve goes inside the Host element). The threshold of 20 is the number of seconds beyond which processing is considered to be "taking too long". Note, however, that Tomcat checks for long-running requests roughly every 10 seconds, so with a threshold of 20, stack traces are output for requests that have been running for at least 20 and possibly 30 or more seconds by the time they are detected.
Reference) Detecting request processing delay with Tomcat https://qiita.com/tamura__246/items/dae31a6a7e24c756b756
xml:Tomcat_8.5.61_server.xml
<Host name="localhost" appBase="webapps"
unpackWARs="true" autoDeploy="true">
<!--Abbreviation-->
</Host>
Change it like this:
xml:Tomcat_8.5.61_server.xml(After change)
<Host name="localhost" appBase="webapps"
unpackWARs="true" autoDeploy="true">
<!--Abbreviation-->
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="20" />
</Host>
Then stack traces are output together with the thread name. Since they are logged at Tomcat's WARN level, they appear in catalina.out or wherever Tomcat's logging is directed. From these stack traces, investigate what the problematic requests are doing.
In the third part, we will finally move on to PostgreSQL.