The story of how I took down a production service with the hostname command

Inspired by an Advent Calendar, I would like to look back on, and repent for, something I did in a production environment.

Production work

This happened in my second year of working life. At that time, I was in charge of maintaining BtoB systems running in a cloud environment. As I recall, the maintenance team had two infrastructure engineers and three application engineers. I was one of the infrastructure engineers, and the other was the team leader.

On this day, I was scheduled to change a middleware setting on a server. I had told the system owner that the work would not cause an outage, so no maintenance page was put up and no maintenance notice was sent. As usual, I emailed the owner to say that production work was starting, and then began. I proceeded according to the verified procedure. Just as the setting change was finished and I had confirmed that it worked, I was told that the service was down... I had no idea what had happened, and I remember my mind going blank. I asked the leader for help, and the leader started investigating. Then the leader said, "The host name is strange."

↓ Host name at that time

^i

He noticed it from the host name shown in the prompt after logging in to the server. I had not noticed, because I had never logged out of the server or opened a new prompt, so my own prompt still showed the old name.

Cause of failure

When we checked the command history, the following command was there...

hostname ^i

As anyone familiar with the command will see, I had overwritten the host name. I had meant to type "hostname -i" but typed "hostname ^i" by mistake, and did not realize that this had changed the host name. At the time, I did not know that the hostname command could also change the host name.

(Reference) hostname command

Command execution example   Description
hostname                    Show the host name
hostname -i                 Show the IP address
hostname <string>           Change the host name to <string>
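The reference above can be sketched as a small shell function. This is a simplified model only, not the real implementation: hostname_action is a hypothetical helper that never calls the real command, and it assumes that anything starting with "-" is an option and anything else is a new name. The real command has more options than this, some of which also set the name.

```shell
#!/bin/sh
# Simplified model of how `hostname` interprets its first argument.
# hostname_action is a hypothetical helper for illustration; it only prints
# what would happen and never touches the real host name.
hostname_action() {
    if [ "$#" -eq 0 ]; then
        echo "show host name"
        return 0
    fi
    case "$1" in
        -*) echo "treat '$1' as an option" ;;   # e.g. -i shows the IP address
        *)  echo "set host name to '$1'" ;;     # any other word is a new name
    esac
}

hostname_action        # what I normally ran
hostname_action -i     # what I meant to type
hostname_action '^i'   # what I actually typed
```

The only difference between the last two calls is one character, which is why the typo changed the host name without producing any error.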

Disaster recovery

A host name changed with the hostname command does not persist across an OS restart; the name reverts to what it was before the change. We therefore restarted the OS, the host name was restored, and the service recovered.
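The name comes back after a restart because the configured name is stored separately from the running one. As a rough sketch, the two can be compared without changing anything. This was not part of the actual recovery. It assumes the configured name lives in /etc/hostname, which is true on many systemd-based distributions but not everywhere; configured_hostname and report_hostname_drift are hypothetical helper names, and web01 below is a made-up host name.

```shell
#!/bin/sh
# configured_hostname [FILE]: hypothetical helper that prints the first
# non-blank, non-comment line of a host name file.
# Assumption: the file is /etc/hostname unless another path is given.
configured_hostname() {
    file=${1:-/etc/hostname}
    # Drop blank lines and lines starting with '#', keep the first remaining line.
    sed -e '/^[[:space:]]*$/d' -e '/^[[:space:]]*#/d' "$file" | head -n 1
}

# report_hostname_drift RUNNING CONFIGURED: hypothetical helper that compares
# the two names and returns non-zero when they differ.
report_hostname_drift() {
    if [ "$1" = "$2" ]; then
        echo "in sync: $1"
    else
        echo "drift: running '$1', configured '$2'"
        return 1
    fi
}

# Read-only example: compare the running node name with the configured one.
# report_hostname_drift "$(uname -n)" "$(configured_hostname)"
```

In our incident, a check like this would have reported the running name as ^i against the configured name, for example web01.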

Why did the tragedy happen?

- I manually typed a command that was not in the procedure manual. The procedure manual listed only the minimum commands and omitted the commands used for confirmation.
- The work was carried out by one person. Because of limited man-hours, work was basically done alone.

What did we do to prevent the tragedy from happening again?

What was implemented as recurrence prevention measures at that time
- Do not execute any commands other than those in the procedure manual. Write down every necessary step, including the commands for confirmation.
- Always copy and paste commands instead of typing them by hand, to eliminate typing mistakes.
- Use commands that cannot affect the system when checking. Avoid, as far as possible, commands that can change settings.
- Always carry out the work in pairs and double-check. This provides a way to notice typos immediately, and sharing the responsibility between two people leaves some peace of mind.

What I can think of now
- Automate the work itself. Minimize the room for human intervention and so reduce human error.
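As one small illustration of these measures, the check below records the host name before the work and verifies it afterwards, using only a command that cannot change settings. It is a sketch and not what we ran at the time: verify_hostname is a hypothetical helper name, and it relies on `uname -n`, which prints the node name and has no way to set it.

```shell
#!/bin/sh
# verify_hostname EXPECTED [ACTUAL]: hypothetical helper that succeeds only
# if the node name still matches the value recorded before the work began.
# ACTUAL defaults to the output of `uname -n`, a read-only command.
verify_hostname() {
    expected=$1
    actual=${2:-$(uname -n)}
    if [ "$actual" = "$expected" ]; then
        echo "OK: host name is still '$expected'"
    else
        echo "ALERT: host name changed from '$expected' to '$actual'" >&2
        return 1
    fi
}

# Typical use around a maintenance step:
before=$(uname -n)          # record the name before starting
# ... run the steps from the procedure manual here ...
verify_hostname "$before"   # confirm nothing changed after finishing
```

Putting a check like this at the end of the procedure manual means the mistake is caught by the procedure itself, not by someone happening to open a new prompt.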

This is my Misogi. Thank you for reading.
