CloudWatch was showing high CPU load whenever the instance started, so I was asked to look into it. There are plenty of ways to do this if you search around, but I'm writing this down as a memo for myself.
Red Hat Enterprise Linux Server release 7.5 (On AWS EC2)
Is the CPU really overloaded, or is it just a CloudWatch quirk? Let's check with the sar command. If sar is not available in your environment, install the sysstat package.
$ sudo yum install sysstat
Incidentally, this happened in the production environment, and since it is rare to be able to log in to production directly, another option is to copy the sa files (under /var/log/sa) from production and read them locally.
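For example, something along these lines copies one day's sa file to a local machine so it can be read with sar (the host name is a placeholder, and sa<DD> is the file for day DD of the month):
$ scp ec2-user@<production host>:/var/log/sa/sa<DD> /tmp/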
$ sar -f <file path>
09:10:02 CPU %user %nice %system %iowait %steal %idle
09:20:01 all 3.97 0.00 0.35 0.00 0.01 95.67
09:30:01 all 4.01 0.00 0.36 0.00 0.01 95.62
09:40:02 all 3.87 0.00 0.42 0.00 0.01 95.69
09:50:01 all 4.05 0.00 0.35 0.00 0.01 95.59
10:00:01 all 4.04 0.00 0.37 0.00 0.01 95.58
10:10:01 all 3.95 0.00 0.35 0.00 0.01 95.69
10:20:02 all 3.98 0.00 0.36 0.00 0.01 95.65
10:30:01 all 4.04 0.00 0.36 0.00 0.01 95.59
10:40:01 all 3.81 0.00 0.41 0.00 0.01 95.76
10:50:02 all 3.96 0.00 0.36 0.00 0.01 95.67
11:00:01 all 4.14 0.00 0.35 0.00 0.01 95.49
11:10:01 all 3.94 0.00 0.35 0.00 0.01 95.71
The output looks like this. Looking at it, the load does not seem particularly high.
Checking CloudWatch, the same behavior shows up in the staging environment, so let's reproduce it there. Also, the 10-minute sampling interval is too coarse, so let's switch to 1 minute.
$ sudo vi /etc/cron.d/sysstat
Change the */10 below to */1:
# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# 0 * * * * root /usr/lib64/sa/sa1 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A
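After the edit, the sa1 line should look like this (the other lines are left as they are):
*/1 * * * * root /usr/lib64/sa/sa1 1 1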
Reboot the OS and check again.
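To look at the new 1-minute samples, point sar at today's sa file (on RHEL 7 the sysstat files are kept under /var/log/sa by default):
$ sar -u -f /var/log/sa/sa$(date +%d)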
19:53:01 CPU %user %nice %system %iowait %steal %idle
19:54:01 all 68.91 0.00 3.04 0.05 0.01 28.00
19:55:01 all 2.93 0.01 0.41 0.01 0.02 96.63
19:56:01 all 8.20 0.00 0.86 0.02 0.01 90.91
19:57:01 all 4.65 0.00 0.34 0.01 0.01 94.99
19:58:01 all 0.26 0.00 0.07 0.00 0.01 99.67
19:59:01 all 4.97 0.01 0.36 0.00 0.02 94.65
20:00:01 all 4.93 0.00 0.39 0.00 0.01 94.67
20:01:01 all 3.11 0.00 0.54 0.00 0.01 96.34
20:02:01 all 4.97 0.00 0.38 0.00 0.01 94.64
20:03:01 all 4.87 0.00 0.34 0.00 0.01 94.78
Sure enough, the CPU is under heavy load right after startup (about 69% user at 19:54).
First, identify the process with the high CPU load.
$ ps aux
You can see the CPU usage in the %CPU column. This time Tomcat was at around 140%. (You can also check with the top command, but its output was hard to read in this case, so I used ps.)
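If the plain ps output is too noisy, you can also let ps sort by CPU usage (procps-ng supports --sort):
$ ps aux --sort=-%cpu | head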
Then check which thread is under load.
$ ps -L aux
Running this command gives output like the following (captured after the CPU load had settled down):
USER PID LWP %CPU NLWP %MEM VSZ RSS TTY STAT START TIME COMMAND
test 1007 1016 0.4 56 16.8 6749028 1299196 ? Sl 16:52 0:01 /usr/bin/java -Djava.util.logging.config.file=/opt/test/tomcat/conf/logging.properties -Djava
test 1007 1029 2.2 56 16.8 6749028 1299196 ? Sl 16:52 0:05 /usr/bin/java -Djava.util.logging.config.file=/opt/test/tomcat/conf/logging.properties -Djava
test 1007 1032 1.6 56 16.8 6749028 1299196 ? Sl 16:52 0:04 /usr/bin/java -Djava.util.logging.config.file=/opt/test/tomcat/conf/logging.properties -Djava
test 1007 1053 1.9 56 16.8 6749028 1299196 ? Sl 16:52 0:04 /usr/bin/java -Djava.util.logging.config.file=/opt/test/tomcat/conf/logging.properties -Djava
test 1007 1057 0.0 56 16.8 6749028 1299196 ? Sl 16:52 0:00 /usr/bin/java -Djava.util.logging.config.file=/opt/test/tomcat/conf/logging.properties -Djava
Make a note of the heavily loaded PID and LWP here.
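To narrow things down to a single process and sort its threads by CPU, something like this should also work (replace <PID> with the Tomcat PID noted above):
$ ps -L -o lwp,pcpu,comm -p <PID> --sort=-pcpu | head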
As expected, the high-load process is Java. So let's take a thread dump.
$ sudo -u test jstack -l <high-load PID> > /tmp/jstack
This gives you a thread dump. Taking one every few minutes lets you see how things change over time.
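For example, a rough loop like this takes a dump every 60 seconds for five minutes (the file naming is just an example):
$ for i in $(seq 1 5); do sudo -u test jstack -l <PID> > /tmp/jstack_$(date +%H%M%S); sleep 60; done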
Looking at /tmp/jstack, the output is as follows:
"localhost-startStop-1" #17 daemon prio=5 os_prio=0 tid=0x00007f6fe0001db0 nid=0x5ee runnable [0x00007f700c062000]
java.lang.Thread.State: RUNNABLE
at org.springframework.asm.Type.getReturnType(Type.java:386)
at org.springframework.core.type.classreading.AnnotationMetadataReadingVisitor.visitMethod(AnnotationMetadataReadingVisitor.java:81)
at org.springframework.asm.ClassReader.readMethod(ClassReader.java:1061)
at org.springframework.asm.ClassReader.accept(ClassReader.java:729)
at org.springframework.asm.ClassReader.accept(ClassReader.java:527)
at org.springframework.core.type.classreading.SimpleMetadataReader.<init>(SimpleMetadataReader.java:64)
at org.springframework.core.type.classreading.SimpleMetadataReaderFactory.getMetadataReader(SimpleMetadataReaderFactory.java:102)
at org.springframework.core.type.classreading.CachingMetadataReaderFactory.getMetadataReader(CachingMetadataReaderFactory.java:102)
- locked <0x00000006f3a459d8> (a org.springframework.core.type.classreading.CachingMetadataReaderFactory$1)
at org.springframework.context.annotation.ClassPathScanningCandidateComponentProvider.findCandidateComponents(ClassPathScanningCandidateComponentProvider.java:287)
at org.springframework.context.annotation.ClassPathBeanDefinitionScanner.doScan(ClassPathBeanDefinitionScanner.java:272)
at org.springframework.context.annotation.ComponentScanAnnotationParser.parse(ComponentScanAnnotationParser.java:135)
at org.springframework.context.annotation.ConfigurationClassParser.doProcessConfigurationClass(ConfigurationClassParser.java:289)
at org.springframework.context.annotation.ConfigurationClassParser.processConfigurationClass(ConfigurationClassParser.java:247)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:200)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:169)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.processConfigBeanDefinitions(ConfigurationClassPostProcessor.java:308)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.postProcessBeanDefinitionRegistry(ConfigurationClassPostProcessor.java:228)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanDefinitionRegistryPostProcessors(PostProcessorRegistrationDelegate.java:272)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanFactoryPostProcessors(PostProcessorRegistrationDelegate.java:92)
at org.springframework.context.support.AbstractApplicationContext.invokeBeanFactoryPostProcessors(AbstractApplicationContext.java:687)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:525)
- locked <0x0000000708be6ac8> (a java.lang.Object)
(Omitted below)
The number after 0x in this nid=0x5ee is the LWP value seen earlier with ps -L aux, converted to hexadecimal.
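The conversion is easy with printf; for example, the nid=0x5ee above corresponds to LWP 1518:
$ printf '%x\n' 1518
5ee
$ printf '%d\n' 0x5ee
1518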
Look for the thread whose nid matches that LWP among the many threads in the dump.
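Grepping the dump for the hex nid is a quick way to jump to the matching thread (using the nid from the example above):
$ grep -A 20 'nid=0x5ee' /tmp/jstack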
See the following page for how to read a thread dump: https://yohei-a.hatenablog.jp/entry/20150101/1420112104
Looking through the thread dump, I couldn't find any suspicious threads. Tomcat simply consumes a fair amount of CPU at startup, and since two Tomcat instances start in this environment, it takes a while for the CPU load to settle down. I'm glad it wasn't anything abnormal, and I learned a lot about how to read thread dumps along the way.