[LifeKeeper x Linux] When interface-down behavior is inconsistent after building an HA cluster

Introduction

At work, I was in charge of verifying LifeKeeper and writing it into our build procedure manual. This article covers a problem I ran into during that verification. In the tests run after the LifeKeeper cluster was built, bringing an interface down produced inconsistent behavior between the servers. Here I explain the cause and the countermeasure.

Environment

The test environment is as follows.

図.JPG

- Hypervisor (omitted this time): vSphere 6.7 (ESXi 6.7)

- VM (the "server" part): RHEL 6.9

  - NIC: **service NIC x 1**, **communication path NIC x 2**

- Shared disk / Quorum disk #1, #2: VMDK

What happened

We tested what happens when the service NIC is brought down on each server, one server at a time.

**Command executed**

# ifdown eth0

- **Interface down on server #1**

サーバー#1.JPG

- **Interface down on server #2**

サーバー#2.JPG

The two servers gave different results.

Service NIC on server #1 **→ recovered automatically after a few seconds**

Service NIC on server #2 **→ stayed in the DOWN state**
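One way to observe this difference without watching the console is to poll the link state after running `ifdown`. The following is only a minimal sketch: the function name `wait_for_link` is hypothetical, and it assumes the interface state can be read from `/sys/class/net/<iface>/operstate`. It is not the procedure used in the original test.

```shell
#!/bin/sh
# wait_for_link IFACE TRIES [INTERVAL] [STATE_FILE]
# Hypothetical helper: polls the link state up to TRIES times and reports
# whether the interface came back up.
# STATE_FILE defaults to the sysfs operstate file (an assumption; override it
# if your system exposes the state elsewhere).
wait_for_link() {
    iface=$1
    tries=$2
    interval=${3:-1}
    state_file=${4:-/sys/class/net/$iface/operstate}
    state=unknown
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Read the current state; fall back to "unknown" if the file is missing.
        state=$(cat "$state_file" 2>/dev/null || echo unknown)
        if [ "$state" = "up" ]; then
            echo "$iface is up"
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "$iface is still $state"
    return 1
}
```

For example, running `ifdown eth0` followed by `wait_for_link eth0 30` on each server would show whether the service NIC recovers within roughly 30 seconds or stays down.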

This puzzled me, so I investigated.

Cause

**On server #2, the "NetworkManager" service was running.**

On server #1, "NetworkManager" was stopped.
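A difference like this can be checked on each server with the standard RHEL 6 service tools. The sketch below is illustrative rather than the exact procedure used here: `enabled_runlevels` is a hypothetical helper that extracts the runlevels marked `on` from one line of `chkconfig --list` output, and the commands run only if `chkconfig` is present.

```shell
#!/bin/sh
# Hypothetical helper: reads one line of `chkconfig --list <service>` output
# on stdin and prints the runlevels in which the service is enabled.
enabled_runlevels() {
    tr -s ' \t' '\n\n' | sed -n 's/^\([0-6]\):on$/\1/p' | tr '\n' ' ' | sed 's/ $//'
}

# Run the checks only where the SysV tools exist (for example, RHEL 6).
if command -v chkconfig >/dev/null 2>&1; then
    # Is the service running right now?
    service NetworkManager status
    # In which runlevels will it start at boot? Empty output means none.
    echo "enabled runlevels: $(chkconfig --list NetworkManager 2>/dev/null | enabled_runlevels)"
fi
```

Running this on both servers and comparing the output makes a mismatch in the NetworkManager state visible before any failover testing starts.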

Countermeasures

On server #2, I stopped the "NetworkManager" service and disabled its automatic start.

# service NetworkManager stop
# chkconfig NetworkManager off

When I ran the test again, I confirmed that server #2 now behaved the same as server #1.

Finally

What did you think? This is the first article I have published about LifeKeeper. I hope it will be helpful for those who are designing and building a LifeKeeper HA cluster.

This problem was found by first suspecting the middleware (LifeKeeper) and investigating from there. I hope at least one part of it is useful to you.

**Please follow me on Twitter if you like!** https://twitter.com/satton6987

**I mainly tweet about career hacks and technology for infrastructure engineers.**