This article was written by a Linux beginner. Treat it as a reference only; following it carelessly could result in the loss of important data.
CentOS7 / Apache2.4 / PHP5.4 / MariaDB5.5 / Zabbix Server4.4.6 / Sakura's VPS 1G
Server monitoring is normally my main job, but as part of my studies I set up a Zabbix server of my own.
One day, that Zabbix server sent me an alert like this:
Problem started at hh:mm:ss on yyyy.mm.dd
Problem name: /: Disk space is critically low (used > 90%)
Host: Zabbix server
Severity: Average
As you can see, disk usage on the Zabbix server I had left unattended has exceeded 90%.
So first, I check the graph.
The disk is indeed tight, so I log in to the server to investigate.
[root@hostname user]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda4 45G 40G 2.7G 94% /
devtmpfs 484M 0 484M 0% /dev
tmpfs 496M 0 496M 0% /dev/shm
tmpfs 496M 51M 446M 11% /run
tmpfs 496M 0 496M 0% /sys/fs/cgroup
/dev/vda2 477M 103M 345M 23% /boot
tmpfs 100M 0 100M 0% /run/user/1000
Check free disk space with the df command; the -h option prints sizes in human-readable units.
Sure enough, / has almost no free space left.
[root@hostname user]# du -sh /*
0 /bin
101M /boot
0 /dev
34M /etc
84K /home
0 /lib
0 /lib64
16K /lost+found
4.0K /media
4.0K /mnt
8.0K /opt
0 /proc
68K /root
51M /run
0 /sbin
4.0K /srv
0 /sys
48K /tmp
1.6G /usr
24G /var
Check the disk usage of each directory with the du command; the -s option prints only a summary total per argument, and -h makes the sizes human-readable.
The biggest consumer by far is /var, so that seems to be the cause.
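As a small convenience on top of the command above (my own addition, not from the original investigation), du's output can be piped through sort so the biggest consumer stands out immediately; GNU sort's -h flag understands the K/M/G suffixes that du -h emits:

```shell
# Summarize each top-level directory, sort by human-readable size,
# and show the three largest (errors from /proc etc. are discarded)
du -sh /* 2>/dev/null | sort -h | tail -n 3
```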
[root@hostname user]# du -sh /var/*
4.0K /var/account
4.0K /var/adm
120M /var/cache
4.0K /var/crash
20K /var/db
8.0K /var/empty
4.0K /var/games
4.0K /var/gopher
12K /var/kerberos
311M /var/lib
4.0K /var/local
0 /var/lock
27G /var/log
0 /var/mail
4.0K /var/nis
4.0K /var/opt
4.0K /var/preserve
0 /var/run
116K /var/spool
32K /var/tmp
12K /var/www
4.0K /var/yp
Using du the same way, this time on /var, shows that /var/log is the culprit.
[root@hostname user]# du -sh /var/log/*
4.0K /var/log/anaconda
39M /var/log/audit
0 /var/log/boot.log
12K /var/log/boot.log-20200427
137M /var/log/btmp
222M /var/log/btmp-20200501
4.0K /var/log/chrony
128K /var/log/cron
164K /var/log/cron-20200419
160K /var/log/cron-20200426
160K /var/log/cron-20200503
160K /var/log/cron-20200510
36K /var/log/dmesg
2.7M /var/log/httpd
40K /var/log/lastlog
0 /var/log/maillog
0 /var/log/maillog-20200426
4.0K /var/log/maillog-20200503
0 /var/log/maillog-20200510
20K /var/log/mariadb
236K /var/log/messages
276K /var/log/messages-20200419
272K /var/log/messages-20200426
344K /var/log/messages-20200503
280K /var/log/messages-20200510
4.0K /var/log/qemu-ga
4.0K /var/log/rhsm
22M /var/log/sa
55M /var/log/secure
59M /var/log/secure-20200419
44M /var/log/secure-20200426
64M /var/log/secure-20200503
63M /var/log/secure-20200510
0 /var/log/spooler
0 /var/log/spooler-20200426
0 /var/log/spooler-20200503
0 /var/log/spooler-20200510
20K /var/log/tuned
48K /var/log/wtmp
4.0K /var/log/yum.log
23G /var/log/zabbix
Going one level deeper, the cause turns out to be /var/log/zabbix (the Zabbix server's own logs, fittingly). Its usage is clearly in a different league from the rest.
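When one directory dominates like this, it can also help to see which individual files inside it are largest. This is a generic sketch of my own (the article stops at the directory level); the path is the one identified above:

```shell
# List the ten largest files under /var/log/zabbix:
# du -h on each regular file, biggest first via sort -rh
find /var/log/zabbix -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 10
```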
Now that the cause is known, it's time for countermeasures. A little research suggests the tmpwatch command will do the job.
[root@hostname user]# yum -y install tmpwatch
First, install tmpwatch with the yum command.
[root@hostname user]# tmpwatch -d -m 720 /var/log/zabbix
Delete old files with the tmpwatch command. The -d option skips directories (only files are removed), and -m bases the age check on each file's modification time (mtime); 720 is the age threshold, which tmpwatch interprets in hours by default.
If I'm reading the man page right, this should delete files that haven't been modified for more than 720 hours (30 days).
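Before actually deleting anything, it is safer to preview which files would match (tmpwatch also has a --test dry-run option for this). The equivalent preview below is my own hedged sketch, not from the article; note that find's -mtime test counts in days, so 720 hours becomes +30:

```shell
# Preview: files under /var/log/zabbix not modified for more than 30 days,
# roughly what `tmpwatch -d -m 720 /var/log/zabbix` would remove
find /var/log/zabbix -type f -mtime +30 -print 2>/dev/null
```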
Time to check the graph again.
Disk usage has improved somewhat. After running the command, I also confirmed an alert saying usage had dropped back below 90%, so for now I'll call it a success... (though it's still not below 80%).
I forgot to mention: the numbers above were taken without rebooting. Here is the state after a restart.
The old log files were indeed deleted.
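To keep this from happening again, the same cleanup could be scheduled to run automatically. This is a hypothetical sketch of my own, not something the article sets up; the file name and the /usr/sbin path are assumptions (CentOS installs tmpwatch under /usr/sbin):

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/zabbix-log-cleanup:
# run the same tmpwatch invocation daily so the logs never pile up again
/usr/sbin/tmpwatch -d -m 720 /var/log/zabbix
```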
For the time being, that's a relief.