This article was written by a Linux beginner. Treat it as a reference only; following it carelessly could result in the loss of important data.
CentOS7 / Apache2.4 / PHP5.4 / MariaDB5.5 / Zabbix Server4.4.6 / Sakura's VPS 1G
Server monitoring is normally my main job, but as part of my studies I set up a Zabbix server of my own.
One day, that Zabbix server sent me an alert like this:
Problem started at hh:mm:ss on yyyy.mm.dd
Problem name: /: Disk space is critically low (used > 90%)
Host: Zabbix server
Severity: Average
As you can see, disk usage on the Zabbix server I had left unattended has exceeded 90%.
So first, I check the graph.
The disk is indeed tight, so I log in to the server to investigate.
[root@hostname user]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda4 45G 40G 2.7G 94% /
devtmpfs 484M 0 484M 0% /dev
tmpfs 496M 0 496M 0% /dev/shm
tmpfs 496M 51M 446M 11% /run
tmpfs 496M 0 496M 0% /sys/fs/cgroup
/dev/vda2 477M 103M 345M 23% /boot
tmpfs 100M 0 100M 0% /run/user/1000
Check free disk space with the df command; the -h option prints sizes in human-readable units.
Sure enough, / has almost no free space left.
[root@hostname user]# du -sh /*
0 /bin
101M /boot
0 /dev
34M /etc
84K /home
0 /lib
0 /lib64
16K /lost+found
4.0K /media
4.0K /mnt
8.0K /opt
0 /proc
68K /root
51M /run
0 /sbin
4.0K /srv
0 /sys
48K /tmp
1.6G /usr
24G /var
Check the disk usage of each directory with the du command; the -s option prints only a summary total per argument, and -h makes the sizes human-readable.
The biggest consumer by far is /var, so that seems to be the cause.
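As a small convenience on top of the command above (my own addition, not from the original investigation), du's output can be piped through sort so the biggest consumer stands out immediately; GNU sort's -h flag understands the K/M/G suffixes that du -h emits:

```shell
# Summarize each top-level directory, sort by human-readable size,
# and show the three largest (errors from /proc etc. are discarded)
du -sh /* 2>/dev/null | sort -h | tail -n 3
```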
[root@hostname user]# du -sh /var/*
4.0K /var/account
4.0K /var/adm
120M /var/cache
4.0K /var/crash
20K /var/db
8.0K /var/empty
4.0K /var/games
4.0K /var/gopher
12K /var/kerberos
311M /var/lib
4.0K /var/local
0 /var/lock
27G /var/log
0 /var/mail
4.0K /var/nis
4.0K /var/opt
4.0K /var/preserve
0 /var/run
116K /var/spool
32K /var/tmp
12K /var/www
4.0K /var/yp
Using du the same way, this time on /var, shows that /var/log is the culprit.
[root@hostname user]# du -sh /var/log/*
4.0K /var/log/anaconda
39M /var/log/audit
0 /var/log/boot.log
12K /var/log/boot.log-20200427
137M /var/log/btmp
222M /var/log/btmp-20200501
4.0K /var/log/chrony
128K /var/log/cron
164K /var/log/cron-20200419
160K /var/log/cron-20200426
160K /var/log/cron-20200503
160K /var/log/cron-20200510
36K /var/log/dmesg
2.7M /var/log/httpd
40K /var/log/lastlog
0 /var/log/maillog
0 /var/log/maillog-20200426
4.0K /var/log/maillog-20200503
0 /var/log/maillog-20200510
20K /var/log/mariadb
236K /var/log/messages
276K /var/log/messages-20200419
272K /var/log/messages-20200426
344K /var/log/messages-20200503
280K /var/log/messages-20200510
4.0K /var/log/qemu-ga
4.0K /var/log/rhsm
22M /var/log/sa
55M /var/log/secure
59M /var/log/secure-20200419
44M /var/log/secure-20200426
64M /var/log/secure-20200503
63M /var/log/secure-20200510
0 /var/log/spooler
0 /var/log/spooler-20200426
0 /var/log/spooler-20200503
0 /var/log/spooler-20200510
20K /var/log/tuned
48K /var/log/wtmp
4.0K /var/log/yum.log
23G /var/log/zabbix
Going one level deeper, the cause turns out to be /var/log/zabbix (the Zabbix server's own logs, fittingly). Its usage is clearly in a different league from the rest.
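When one directory dominates like this, it can also help to see which individual files inside it are largest. This is a generic sketch of my own (the article stops at the directory level); the path is the one identified above:

```shell
# List the ten largest files under /var/log/zabbix:
# du -h on each regular file, biggest first via sort -rh
find /var/log/zabbix -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 10
```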
Now that the cause is known, it's time for countermeasures. A little research suggests the tmpwatch command will do the job.
[root@hostname user]# yum -y install tmpwatch
First, install tmpwatch with the yum command.
[root@hostname user]# tmpwatch -d -m 720 /var/log/zabbix
Delete old files with the tmpwatch command. The -d option skips directories (only files are removed), and -m bases the age check on each file's modification time (mtime); 720 is the age threshold, which tmpwatch interprets in hours by default.
If I'm reading the man page right, this should delete files that haven't been modified for more than 720 hours (30 days).
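Before actually deleting anything, it is safer to preview which files would match (tmpwatch also has a --test dry-run option for this). The equivalent preview below is my own hedged sketch, not from the article; note that find's -mtime test counts in days, so 720 hours becomes +30:

```shell
# Preview: files under /var/log/zabbix not modified for more than 30 days,
# roughly what `tmpwatch -d -m 720 /var/log/zabbix` would remove
find /var/log/zabbix -type f -mtime +30 -print 2>/dev/null
```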
Time to check the graph again.
Disk usage has improved somewhat. After running the command, I also confirmed an alert saying usage had dropped back below 90%, so for now I'll call it a success... (though it's still not below 80%).
I forgot to mention: the numbers above were taken without rebooting. Here is the state after a restart.
The old log files were indeed deleted.
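To keep this from happening again, the same cleanup could be scheduled to run automatically. This is a hypothetical sketch of my own, not something the article sets up; the file name and the /usr/sbin path are assumptions (CentOS installs tmpwatch under /usr/sbin):

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/zabbix-log-cleanup:
# run the same tmpwatch invocation daily so the logs never pile up again
/usr/sbin/tmpwatch -d -m 720 /var/log/zabbix
```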
For the time being, that's a relief.