For one reason or another, I have been working with Linux for a year and a half, so I will write up the mistakes I have made so far ~~(and nearly died from)~~. On-premises you could always fall back to working at the BIOS (console) level, but in the cloud there are many cases where that is impossible and you get stuck. (There are also cases where you can be rescued.)
I would like to prevent these mistakes and keep the damage, including recovery work, to a minimum.
I edited the config file (/etc/selinux/config) to disable SELinux. Where it should have said "SELINUX=disabled", I accidentally wrote "SELINUXTYPE=disabled". A plain typo has the same effect. The details are covered here, so I will omit them. https://qiita.com/daisuke0115/items/4b0ed3a5888cf81efd0a
If you can reach the machine at the BIOS (console) level you are fine, but on a cloud such as AWS you get stuck.
→ You can recover by attaching the affected EBS volume to another EC2 instance, mounting it there, and fixing the file. http://blog.serverworks.co.jp/tech/2020/04/14/post-82871/
However, on something like a domestic cloud where disks cannot easily be moved between servers, you are stuck. You will need to restore from a backup or rebuild.
It is safer to copy and paste the setting rather than type it by hand, and then check the result:
grep '^SELINUX=disabled' /etc/selinux/config
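The grep above only confirms that the right line exists; it does not notice a mode keyword that ended up in SELINUXTYPE. A minimal sketch of a stricter check follows. The function name `check_selinux_config` and its two arguments are my own illustration, not part of any SELinux tool.

```shell
# check_selinux_config FILE WANT
# Hypothetical helper: succeeds only if FILE sets SELINUX to WANT and
# SELINUXTYPE does not contain a mode keyword (the typo described above).
check_selinux_config() (
    file=$1 want=$2
    # "^SELINUX=" does not match SELINUXTYPE= lines or commented-out lines.
    mode=$(sed -n 's/^SELINUX=//p' "$file")
    type=$(sed -n 's/^SELINUXTYPE=//p' "$file")
    if [ "$mode" != "$want" ]; then
        echo "SELINUX is '$mode', expected '$want'" >&2
        return 1
    fi
    case $type in
        enforcing|permissive|disabled)
            echo "SELINUXTYPE holds a mode keyword: '$type'" >&2
            return 1 ;;
    esac
    echo "ok: SELINUX=$mode SELINUXTYPE=$type"
)
```

Running `check_selinux_config /etc/selinux/config disabled` before rebooting would flag both the missing SELINUX line and the misplaced keyword.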
Correct example of an /etc/fstab entry:
/dev/sda3 /home xfs defaults,nofail 0 2
In my case, I forgot to write the xfs field in the middle, rebooted, and could no longer connect over ssh... If you do the same, you can recover in the same way as above. In the cloud environment I was using that method was not available, so I restored from a backup.
sudo mount -a
Reboot only after confirming that there are **no errors**.
I ignored the error and rebooted anyway, carried along by the momentum of the build work...
Add the nofail option as well, as described in "Specify the nofail option when mounting an additional disk on a cloud server!" https://inaba-serverdesign.jp/blog/20170210/cloud_disk_mount_fstab_nofail.html
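As a complement to `sudo mount -a`, a simple field-count check can catch a dropped column such as the missing xfs before anything is mounted. This is only a sketch: the name `check_fstab_fields` is hypothetical, and requiring exactly six fields is a deliberately strict rule of my own (fstab itself allows the last two fields to be omitted). Recent util-linux versions also provide `findmnt --verify`, which is likely the better tool where it is available.

```shell
# check_fstab_fields [FILE]
# Hypothetical helper: reports every entry that does not have exactly six
# fields and fails if any were found. FILE defaults to /etc/fstab.
check_fstab_fields() (
    file=${1:-/etc/fstab}
    awk '
        /^[ \t]*#/ { next }     # skip comment lines
        NF == 0    { next }     # skip blank lines
        NF != 6 {
            printf "line %d has %d fields: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad }
    ' "$file"
)
```

With the entry from this article, dropping xfs leaves five fields, so the check fails and prints the offending line.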
(I think this one is a little different from an editing mistake.)
As a general rule, root login over ssh is disabled on servers. In rare cases it is temporarily enabled for convenience in a closed environment. In my case, purely out of habit, I enabled it on a publicly reachable server that I had set up for verification...
→ The cloud service's (Azure) attack detection flagged it, and I changed the setting back. The password was complex, so nothing came of it.
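A forgotten "temporary" change like this can be caught with a small check. The sketch below reads a single config file and reports the first uncommented PermitRootLogin directive, since sshd uses the first value it obtains for a keyword. The function names are hypothetical, and the parsing is simplified: it ignores Include files, Match blocks, and the `keyword=value` form, so `sshd -T` remains the authoritative way to see the effective setting.

```shell
# root_login_setting [FILE]
# Hypothetical helper: prints the value of the first uncommented
# PermitRootLogin directive. FILE defaults to /etc/ssh/sshd_config.
root_login_setting() (
    file=${1:-/etc/ssh/sshd_config}
    # Keywords are case-insensitive; "#PermitRootLogin" does not match.
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$file"
)

# root_login_disabled [FILE]
# Succeeds only when the file explicitly says "PermitRootLogin no".
root_login_disabled() {
    [ "$(root_login_setting "$1")" = "no" ]
}
```

For example, `root_login_disabled || echo "root ssh login is not disabled"` could run from a periodic job on verification servers.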
I meant to stop a verification server, but when I operated the console I selected terminate instead.
The screen said "terminated", and I turned pale.
Enable termination protection. On top of that, take AMIs and snapshots, and in some cases restrict the operation at the IAM level.
I had done none of this, so I spent about 3 hours rebuilding the server... Even for a verification server, that hurts.
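Termination protection can also be switched on from the AWS CLI with `aws ec2 modify-instance-attribute`. The wrapper below is a sketch: the function name `protect_instance` and the `DRY_RUN` variable are my own, and the instance ID in the usage line is a placeholder.

```shell
# protect_instance INSTANCE_ID
# Hypothetical wrapper: enables EC2 termination protection for one instance.
# With DRY_RUN=1 the aws command is only printed, so it can be reviewed
# before it is run for real.
protect_instance() (
    id=$1
    set -- aws ec2 modify-instance-attribute \
        --instance-id "$id" --disable-api-termination
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
)
```

Usage: `DRY_RUN=1 protect_instance i-0123456789abcdef0` prints the command, and dropping `DRY_RUN=1` runs it with whatever credentials the AWS CLI is configured to use.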
These failures reminded me of the following.
Fail-safe (English: fail safe) means always steering a device or system to the safe side when a fault occurs because of misoperation or malfunction. It is also one of the reliability design approaches built on that method. It rests on the premise that devices and systems "always fail". https://ja.m.wikipedia.org/wiki/%E3%83%95%E3%82%A7%E3%82%A4%E3%83%AB%E3%82%BB%E3%83%BC%E3%83%95
The term comes up often in information processing exams, but it is a necessary idea when designing systems, and it can be carried over to operations as it is. "Humans always fail", so work on the premise that you will fail. That mindset is what matters. (Strictly speaking, some of the measures may belong to a different reliability design category, but I will leave that out.)
In this case, consider the following.
It can also be applied to everyday life, although I cannot think of a good example...
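Applied to the config edits earlier in this article, the fail-safe idea might look like the sketch below: back up the file, apply the edit, run a check, and fall back to the known-good copy if either step fails. The function name `safe_edit` and its argument convention are assumptions of mine for illustration, not an established tool.

```shell
# safe_edit FILE EDIT_CMD CHECK_CMD
# Hypothetical helper. EDIT_CMD and CHECK_CMD are shell snippets that
# receive FILE as "$1". If either one fails, FILE is restored.
safe_edit() (
    file=$1 edit=$2 check=$3
    backup="$file.bak.$$"
    cp -p "$file" "$backup" || return 1
    if sh -c "$edit" sh "$file" && sh -c "$check" sh "$file"; then
        rm -f "$backup"
        echo "applied"
    else
        mv "$backup" "$file"    # roll back to the known-good copy
        echo "rolled back" >&2
        return 1
    fi
)
```

The point of the design is that the safe outcome (the old, working file) is the default, and the new file only survives if it passes the check.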
Cloud services make it easy to start and stop servers, but you often cannot work from the console (BIOS level), which makes things difficult when trouble strikes. As with hiyari-hatto (near-miss) activities in the manufacturing industry, let's prevent the mistakes that can be prevented by learning from failure cases and imagining the risks while operating.
"People who messed up in a production environment" Advent Calendar 2019 https://qiita.com/advent-calendar/2019/yarakashi-production