At work, I was in charge of verifying LifeKeeper and incorporating it into our build procedure manual. This article covers a troubleshooting case from that verification. During the tests that followed the LifeKeeper build, the behavior when an interface was brought down turned out to be inconsistent. I will explain the cause and the countermeasure here.
The test environment was as follows.
- Hypervisor (omitted this time): vSphere 6.7 (ESXi 6.7)
- VM ("server" part): RHEL 6.9
- Shared disk / Quorum disk #1, #2: VMDK
We tested the behavior when the service NIC was brought down on each server, one at a time.
**Command executed**
# ifdown eth0
- **Interface down on server #1**
- **Interface down on server #2**
When we ran this test, the results differed between the two servers.
Service NIC on server #1 → **Automatic recovery after a few seconds**
Service NIC on server #2 → **Stayed in the DOWN state**
That puzzled me, so I investigated.
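To make the difference between the two servers easier to observe, the check after `ifdown` can be scripted. The following is a minimal sketch and was not part of the original test: `wait_for_link_up` is a hypothetical helper name, and it assumes the interface state can be read from `/sys/class/net/<interface>/operstate` (the base directory can be overridden with the third argument).

```shell
#!/bin/sh
# Hypothetical helper: poll an interface until it reports "up" again.
#   $1 = interface name (e.g. eth0)
#   $2 = timeout in seconds (default: 30)
#   $3 = sysfs base directory (default: /sys/class/net; assumption)
wait_for_link_up() {
    iface=$1
    timeout=${2:-30}
    sysfs_root=${3:-/sys/class/net}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # operstate holds values such as "up" or "down"
        state=$(cat "$sysfs_root/$iface/operstate" 2>/dev/null)
        if [ "$state" = "up" ]; then
            echo "$iface recovered after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$iface still down after ${timeout}s"
    return 1
}
```

For example, running `ifdown eth0` followed by `wait_for_link_up eth0 30` on each server would report whether the NIC came back within 30 seconds.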
**On server #2, "NetworkManager" was running.**
I stopped the "NetworkManager" service on server #2 and disabled its automatic start.
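One way to spot this kind of difference is to compare the service's boot-time settings on both servers. The sketch below is only an illustration, not necessarily the procedure used in this investigation: `enabled_runlevels` is a hypothetical helper, and it assumes the usual RHEL 6 `chkconfig --list` output format of a service name followed by `runlevel:on` or `runlevel:off` pairs.

```shell
#!/bin/sh
# Hypothetical helper: read `chkconfig --list <service>` output on stdin
# and print the runlevels in which the service is set to "on".
# Assumed input format: "NetworkManager 0:off 1:off 2:on 3:on 4:on 5:on 6:off"
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")          # kv[1] = runlevel, kv[2] = on/off
            if (kv[2] == "on") {
                out = out (out == "" ? "" : " ") kv[1]
            }
        }
        print out
    }'
}
```

Running `chkconfig --list NetworkManager | enabled_runlevels` on each server prints the runlevels where the service starts automatically; an empty line means it is disabled. `service NetworkManager status` shows whether it is currently running.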
# service NetworkManager stop
# chkconfig NetworkManager off
When I tested it again, I confirmed that server #2 now behaved the same as server #1.
What did you think? This is the first time I have published an article about LifeKeeper. I think this article will be helpful for those who are designing and building a LifeKeeper HA cluster.
This article is the result of an investigation that started by suspecting the middleware (LifeKeeper). I hope you find at least one point helpful.
**Please follow me on Twitter if you like!** https://twitter.com/satton6987
**I mainly tweet about career hacks and technology for infrastructure engineers.**