This is the day 16 article of the Defense against dark magic Advent Calendar 2019.
As deployment automation, container-based virtualization, and microservices advance, troubleshooting can unintentionally become harder in cases such as the following:
- Almost no awareness of application settings and access routes in daily development work
- The container is too lightweight to include `ps` or even `netstat`
- Infrastructure as Code (no documentation exists)
- A log management platform exists, but it is useless because what you want to check is something other than logs, or you cannot tell whether the logs you want are even collected
This article discusses how to proceed with troubleshooting in these cases even if you are not familiar with the target system.
You have obtained the information required for SSH login to a Linux server, and the login was successful. You can even switch to the root user! But **you know nothing other than the login information**.
typewriter@server:~ $ sudo su -
root@server:~ # eixt
-bash: eixt: command not found
root@server:~ # exit
logout
typewriter@server:~ $
Meanwhile, you have received a vague inquiry e-mail saying "Accessing a web page causes an error" and are being asked to respond.
Let's move on.
The only way to troubleshoot from scratch is to search and find out for yourself.
**ps** (neither `docker ps` nor `pstree` is used)
First, look for the currently running processes. If there are many of them, the `top` command can also be used.
##Display all processes in BSD format / user-oriented format
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3052 3.2 19.0 1122908 89636 ? Ssl 02:24 0:03 /usr/bin/dockerd --default-ulimit nofile=1024:4096
root 3086 0.4 4.5 1004052 21332 ? Ssl 02:24 0:00 docker-containerd --config /var/run/docker/containerd/containerd.toml
root 3865 0.0 1.1 10632 5332 ? Ss 02:25 0:00 nginx: master process nginx -g daemon off;
101 3914 0.0 0.5 11088 2588 ? S 02:25 0:00 nginx: worker process
nginx may be running in a Docker container. Let's check.
The most direct way to check is `docker ps` or `docker top CONTAINER`. However, you can also display the process tree with the `f` (`--forest`) option of the `ps` command and check the parent-child relationships.
##Tree display of all processes in BSD format and user-oriented format
$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3052 0.4 16.7 1122908 78684 ? Ssl 02:24 0:05 /usr/bin/dockerd --default-ulimit nofile=1024:4096
root 3086 0.4 4.8 1005108 22660 ? Ssl 02:24 0:04 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
root 3829 0.0 0.8 7512 4184 ? Sl 02:25 0:00 | \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/contain
root 3865 0.0 1.1 10632 5332 ? Ss 02:25 0:00 | \_ nginx: master process nginx -g daemon off;
101 3914 0.0 0.5 11088 2588 ? S 02:25 0:00 | \_ nginx: worker process
It's running on Docker!
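The same parent-child relationship can be followed for a single PID through procfs. The following is a minimal sketch, not part of the original procedure: the function name `ancestors` and its optional second argument (an alternative procfs root, only there so the function can be tried against a fixture directory) are assumptions made for illustration.

```shell
# ancestors PID [PROC_ROOT]
# Print "PID<TAB>name" for the process and each of its ancestors,
# following the PPid field of /proc/PID/status.
# PROC_ROOT is a hypothetical parameter for testing; it defaults to /proc.
ancestors() {
  local pid="$1" proc_root="${2:-/proc}" name ppid
  while [ -r "$proc_root/$pid/status" ]; do
    # Only the first word of the process name is kept.
    name=$(awk '/^Name:/ {print $2; exit}' "$proc_root/$pid/status")
    ppid=$(awk '/^PPid:/ {print $2; exit}' "$proc_root/$pid/status")
    printf '%s\t%s\n' "$pid" "$name"
    # Stop at the top of the tree (PPid 0) or when PPid is missing.
    [ "$ppid" -gt 0 ] 2>/dev/null || break
    pid="$ppid"
  done
}
```

For example, `ancestors 3914` on the server above should walk from the nginx worker up through the nginx master and the containerd shim, which is the same chain that `ps auxf` draws as a tree.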
**File descriptor**
In Linux (POSIX), each process has a list of file descriptors, or in short, a "list of open files". This can be seen through procfs (although in most cases you must be the same user as the one that started the process).
Let's take a look at the file descriptor of the nginx process.
$ sudo ls -l /proc/3865/fd
total 0
lr-x------ 1 root root 64 Dec 15 02:25 0 -> 'pipe:[16216]'
l-wx------ 1 root root 64 Dec 15 02:25 1 -> 'pipe:[16217]'
l-wx------ 1 root root 64 Dec 15 02:25 2 -> 'pipe:[16218]'
lrwx------ 1 root root 64 Dec 15 02:25 4 -> 'socket:[75443]'
lrwx------ 1 root root 64 Dec 15 02:25 5 -> 'socket:[75444]'
lrwx------ 1 root root 64 Dec 15 02:25 6 -> 'socket:[16370]'
Only pipes and sockets showed up, and no regular files (as an aside, Linux lets pipes and sockets be treated as special files).
But don't give up. [A process's file descriptor 0 is standard input (stdin), 1 is standard output (stdout), and 2 is standard error (stderr)](https://linuxjm.osdn.jp/html/LDP_man-pages/man3/stdin.3.html#lbAD). Let's try reading the standard output with `cat`.
$ sudo cat /proc/3865/fd/1
##(Access the web page here)
xxx.xx.xxx.xxx - - [15/Dec/2019:06:03:51 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" "-"
## (Stop reading with Ctrl+C)
^C
$
The nginx access log came out. Writing logs to standard output is the standard way of operating a container.
If it is output to a file, you will see the path as follows:
$ sudo ls -l /proc/11178/fd
total 0
lrwx------ 1 root root 64 Dec 15 14:35 0 -> /dev/null
lrwx------ 1 root root 64 Dec 15 14:35 1 -> /dev/null
l-wx------ 1 root root 64 Dec 15 14:35 2 -> /var/log/nginx/error.log
l-wx------ 1 root root 64 Dec 15 14:35 44 -> /var/log/nginx/access.log
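When a process holds many descriptors, it can help to filter out the pipes and sockets and keep only real paths. The following is a minimal sketch under stated assumptions: the function name `list_open_files` and its optional procfs-root argument are hypothetical, added only so the function can be tried against a fixture directory.

```shell
# list_open_files PID [PROC_ROOT]
# Print "FD<TAB>target" for every descriptor of PID that points at a path,
# skipping pipes, sockets and anonymous inodes.
# PROC_ROOT is a hypothetical parameter for testing; it defaults to /proc.
list_open_files() {
  local pid="$1" proc_root="${2:-/proc}" fd target
  for fd in "$proc_root/$pid/fd"/*; do
    # Each entry under fd/ is a symlink; skip anything else
    # (including the unexpanded glob when the directory is empty).
    [ -L "$fd" ] || continue
    target="$(readlink "$fd")"
    case "$target" in
      pipe:*|socket:*|anon_inode:*) continue ;;
    esac
    printf '%s\t%s\n' "${fd##*/}" "$target"
  done
}
```

Run as root (for example `sudo sh -c '. ./fds.sh; list_open_files 11178'`, where `fds.sh` is a hypothetical file holding the function), this should print the two log paths from the listing above along with `/dev/null`.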
The location of the config file is not yet known. Since this is nginx, it is *almost* certainly under `/etc/nginx/`, but let's pretend we don't know that.
The `strace` command can monitor system calls and signals. System calls also include reading and writing files.
$ sudo strace -t -p 3865
strace: Process 3865 attached
05:36:54 rt_sigsuspend([], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
Monitoring has started (end it with Ctrl+C). In this state, make nginx read its configuration file.
nginx reloads the configuration file when it receives the `HUP` signal. For Apache it is the `USR1` signal, and other applications may also be able to reload on a specific signal.
The nginx documentation describes the signal as follows: "HUP: changing configuration, keeping up with a changed time zone (only for FreeBSD and Linux), starting new worker processes with a new configuration, graceful shutdown of old worker processes".
You can send the `HUP` signal with the alarmingly named `kill` command.
$ sudo kill -HUP 3865
05:37:25 --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=53, si_uid=0} ---
05:37:25 rt_sigreturn({mask=[HUP INT QUIT USR1 USR2 ALRM TERM CHLD WINCH IO]}) = -1 EINTR (Interrupted system call)
05:37:25 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
05:37:25 uname({sysname="Linux", nodename="1b01e2a57209", ...}) = 0
05:37:25 openat(AT_FDCWD, "/etc/nginx/nginx.conf", O_RDONLY) = 8
05:37:25 fstat(8, {st_mode=S_IFREG|0644, st_size=643, ...}) = 0
Something moved in the terminal that was monitoring with `strace`. The line to focus on is the `openat` system call. You can see that it opened the `/etc/nginx/nginx.conf` file.
Now we know the location of the config file. Congratulations!
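On a busy process the trace scrolls quickly, so it can be easier to save it and pull out only the files that were opened successfully. The following is a minimal sketch: the function name `opened_paths` is hypothetical, and it assumes the trace lines look like the ones shown above (a quoted path and a non-negative return value).

```shell
# opened_paths
# Read strace output on stdin and print the unique paths of
# open()/openat() calls that returned a file descriptor.
opened_paths() {
  # Keep only open/openat lines whose return value is a plain number,
  # which drops failures such as "= -1 ENOENT (...)".
  grep -E 'open(at)?\(.*\) = [0-9]+$' |
    # The first double-quoted string on the line is the path.
    sed -E 's/^[^"]*"([^"]*)".*$/\1/' |
    LC_ALL=C sort -u
}
```

One possible way to use it is to write the trace to a file with `sudo strace -t -e trace=open,openat -o /tmp/nginx.strace -p 3865` (the output path is an arbitrary choice), send the `HUP` signal, stop `strace`, and then run `opened_paths < /tmp/nginx.strace`.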
It's a little annoying when the problem is likely to be on another server. You can check the reverse proxy settings of nginx, or you may be able to find other servers by the following methods.
Hosts in the same subnet can be found with an ARP scan, although I won't go into detail. If the cached entries are enough, you can check them with the `arp` command.
##Check the cache contents
$ arp -a
ip-172-31-16-1.ap-northeast-1.compute.internal (172.31.16.1) at 06:d0:4e:xx:xx:xx [ether] on eth0
##ARP scan (the arp-scan command needs to be installed separately)
###Calculate the ARP scan range (subnet)
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 172.31.24.219 netmask 255.255.240.0 broadcast 172.31.31.255
###From the above, the subnet (network address/CIDR) is 172.31.16.0/20
$ sudo arp-scan 172.31.16.0/20
Interface: eth0, datalink type: EN10MB (Ethernet)
Starting arp-scan 1.9.2 with 4096 hosts (http://www.nta-monitor.com/tools-resources/security-tools/arp-scan/)
172.31.16.1 06:d0:4e:xx:xx:xx (Unknown)
172.31.26.132 06:4e:7e:xx:xx:xx (Unknown)
172.31.29.38 06:8b:fe:xx:xx:xx (Unknown)
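The subnet calculation in the transcript above (address 172.31.24.219 with netmask 255.255.240.0 giving 172.31.16.0/20) can be sketched in shell arithmetic. The function names `ip_to_int` and `subnet_cidr` are hypothetical, and the prefix length is obtained by counting set bits, which assumes an ordinary contiguous netmask.

```shell
# ip_to_int A.B.C.D
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  local IFS=.
  # Unquoted on purpose: split the address into four positional parameters.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# subnet_cidr ADDRESS NETMASK
# Print the network address and prefix length, e.g. 172.31.16.0/20.
subnet_cidr() {
  local ip mask net m bits=0
  ip=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  # Network address = address AND netmask.
  net=$(( ip & mask ))
  # Prefix length = number of 1 bits in the netmask.
  m=$mask
  while [ "$m" -ne 0 ]; do
    bits=$(( bits + (m & 1) ))
    m=$(( m >> 1 ))
  done
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$bits"
}

# subnet_cidr 172.31.24.219 255.255.240.0   # prints 172.31.16.0/20
```

The result can be passed straight to `arp-scan` as the scan range.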
netstat, iptables (ip_conntrack), nftables (nf_conntrack)
You can use the `netstat` command to see TCP connections and which TCP/UDP ports are listening.
$ sudo netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 3214/sshd
tcp 0 0 172.31.24.219:22 xxx.xx.xxx.xxx:58168 ESTABLISHED 32051/sshd: USERNAME
tcp6 0 0 :::22 :::* LISTEN 3214/sshd
tcp6 0 0 :::80 :::* LISTEN 3003/docker-proxy
This is the result on the server (outside the container) where nginx was started earlier. There is no sign of nginx itself, but that is normal: communication with the Docker container is NATed with iptables (nftables).
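On a server with many connections, the listening sockets are easier to read when separated from the established ones. The following is a minimal sketch: the function name `listeners` is hypothetical, and it assumes the column layout of the `netstat -anp` output shown above (state in the sixth column, PID/program in the seventh), so it only covers TCP lines.

```shell
# listeners
# Read `netstat -anp` output on stdin and print protocol, local address
# and PID/program name for every listening TCP socket.
listeners() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" { print $1, $4, $7 }'
}
```

For example, `sudo netstat -anp | listeners` on the output above would leave only the two `sshd` lines and the `docker-proxy` line on port 80.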
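If you rush to run `netstat` inside the container, you may get `command not found`. But calm down. Let's check from outside the container.
You can check the NAT settings with the `iptables` command. Note that plain `-L` lists the filter table; the port-forwarding (DNAT) rules themselves are in the nat table, which `iptables -t nat -L` shows.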
$ sudo iptables -L
Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER all -- anywhere anywhere
Chain DOCKER (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere ip-172-17-0-2.ap-northeast-1.compute.internal tcp dpt:http
TCP connections NATed by iptables/nftables can be found in procfs under `ip_conntrack` or `nf_conntrack`.
$ sudo cat /proc/net/nf_conntrack
ipv4 2 tcp 6 431997 ESTABLISHED src=xxx.xx.xxx.xxx dst=172.31.24.219 sport=57245 dport=80 src=172.17.0.2 dst=xxx.xx.xxx.xxx sport=80 dport=57245 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 103 TIME_WAIT src=172.17.0.2 dst=xx.xx.xx.xxx sport=46684 dport=8080 src=xx.xx.xx.xxx dst=172.31.24.219 sport=8080 dport=46684 [ASSURED] mark=0 zone=0 use=2
The first line is a connection in which access to TCP port 80 is NATed to the Docker container. The second line is the NAT of a connection from inside the Docker container to TCP port 8080 on an external host.
From the second line, we can infer that nginx may be acting as a reverse proxy to an external host.
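Each conntrack entry carries two address tuples, the original direction first and the reply direction second, and comparing them is what reveals the NAT. The following is a minimal sketch for condensing the entries: the function name `conntrack_summary` is hypothetical, and it assumes every line has the eight `src=`/`dst=`/`sport=`/`dport=` fields in the order shown above.

```shell
# conntrack_summary
# Read /proc/net/nf_conntrack lines on stdin and print, per connection,
# the original source and destination plus the address that replies.
conntrack_summary() {
  awk '{
    n = 0
    for (i = 1; i <= NF; i++) {
      split($i, kv, "=")
      if (kv[1] == "src" || kv[1] == "dst" || kv[1] == "sport" || kv[1] == "dport") {
        n++
        v[n] = kv[2]
      }
    }
    # v[1..4]: src, dst, sport, dport of the original direction
    # v[5..8]: src, dst, sport, dport of the reply direction
    if (n >= 8)
      printf "%s %s:%s -> %s:%s (reply from %s:%s)\n", $3, v[1], v[3], v[2], v[4], v[5], v[7]
  }'
}
```

With `sudo cat /proc/net/nf_conntrack | conntrack_summary`, an entry whose destination and reply address differ (such as `172.31.24.219:80` answered by `172.17.0.2:80`) is one that was rewritten by NAT.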
**From consoles such as AWS and GCP** (load balancers)
If the destination you found is a load balancer, you cannot discover the servers behind it from this server. Unfortunately, the only option is to look at the load balancer settings.
**From everyone's command history**
If you think "someone must have done this before", follow the precedent. The command history shown by the `history` command is saved in `.bash_history` for bash.
$ sudo cat /home/*/.bash_history | grep ssh
ssh -p 11122 example.com
ssh [email protected]
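When several users have long histories, counting how often each `ssh` command appears can hint at which destinations matter most. The following is a minimal sketch: the function name `ssh_history` is hypothetical, and it only looks at lines that begin with `ssh `.

```shell
# ssh_history [FILE...]
# Read shell history (files or stdin) and print "count<TAB>command"
# for each distinct ssh command line, most frequent first.
ssh_history() {
  awk '/^ssh / { count[$0]++ }
       END { for (c in count) printf "%d\t%s\n", count[c], c }' "$@" |
    LC_ALL=C sort -k1,1nr -k2
}
```

For example, `sudo cat /home/*/.bash_history | ssh_history` lists the most frequently used destinations at the top.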
When you get to a suspicious server, check it out. In addition to the commands explained so far, the following commands and elements can also be used.
In addition to file descriptors and NAT sessions, the proc file system provides various other information, such as the command line a process was started with, including its parameters.
Besides the procfs man page, the blog post "linux procfs thorough introduction" (on the blog "SIer but I want to do technology") shows the contents with output examples.
/var/log/{syslog,messages}
Various system logs collected by syslogd (rsyslogd) are written here. It is rare, but when the OOM Killer runs (it kills processes when the system runs out of memory), a record is left here (for the OOM Killer, see The OOM CTF and [Out Of Memory Management](https://www.kernel.org/doc/gorman/html/understand/understand016.html)).
Dec 15 10:51:07 ip-172-31-24-219 kernel: Out of memory: Kill process 6359 (bash) score 635 or sacrifice child
Dec 15 10:51:07 ip-172-31-24-219 kernel: Killed process 6360 (yes) total-vm:114636kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
Dec 15 10:51:07 ip-172-31-24-219 kernel: oom_reaper: reaped process 6360 (yes), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
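To find out quickly whether anything was killed, the log can be reduced to just the victims. The following is a minimal sketch: the function name `oom_kills` is hypothetical, and the pattern assumes the traditional syslog layout shown above (timestamp, host name, `kernel:`, then a `Killed process PID (name)` message).

```shell
# oom_kills [FILE...]
# Read syslog lines (files or stdin) and print the timestamp, PID and
# name of every process reported as killed by the kernel.
oom_kills() {
  sed -n -E 's/^(.*) [^ ]+ kernel: .*Killed process ([0-9]+) \(([^)]*)\).*$/\1 pid=\2 name=\3/p' "$@"
}
```

For example, `sudo cat /var/log/messages | oom_kills` on the log above would print `Dec 15 10:51:07 pid=6360 name=yes`.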
There is also a group of major commands that output memory, CPU, I/O, and other information, partly overlapping with what was covered above. I won't explain them here, but it is worth remembering that `ps -L` can also display LWPs (threads).
## PID: process ID, LWP: LWP (thread) ID, NLWP: number of LWPs (threads)
$ ps -efL
UID PID PPID LWP C NLWP STIME TTY TIME CMD
root 3564 1 3564 0 8 02:24 ? 00:00:00 /usr/libexec/amazon-ecs-init start
root 3564 1 3567 0 8 02:24 ? 00:00:00 /usr/libexec/amazon-ecs-init start
root 3564 1 3568 0 8 02:24 ? 00:00:00 /usr/libexec/amazon-ecs-init start
(omitted)
With the methods described so far, you can now find servers, applications, settings, log files, and see what the server looks like.
You will be able to respond to inquiries such as "Accessing a web page causes an error".
unicorn_err.log:E, [2019-12-15T21:18:43.339882 #10627] ERROR -- : worker=4 PID:14246 timeout (98s > 60s), killing
unicorn_err.log:E, [2019-12-15T21:18:43.339910 #10627] ERROR -- : worker=5 PID:14254 timeout (80s > 60s), killing
Wow, it takes more than 60 seconds to process!?
(not to be continued)
The commands and elements used this time are as follows.
- Processes, services