Try Health Check on Azure App Service.

Introduction

Hello.

Are you using App Service?

App Service is a typical PaaS: you deploy a web application and it just runs. In operation, however, it has a weakness in terms of resilience. Health checking of the app is not enabled when an App Service is deployed, so even if the app stops responding normally, the internal load balancer keeps routing requests to the unhealthy instance. This can cause trouble, such as problems with service health monitoring, or a particular user's requests continuing to be sent to an unhealthy instance.

So this time I will try out "Health Check". It is still in preview, but it can already be used in any Azure environment you can deploy today, so let's verify how it works.

Documentation: https://github.com/projectkudu/kudu/wiki/Health-Check-(Preview)

Preparing for Health Check

Health Check is a feature that checks the HTTP health of each instance of the application and takes an instance out of rotation if its response is invalid. The detection threshold cannot currently be configured: if the HTTP ping fails 5 times in a row, the instance is excluded from request distribution, and the unhealthy instance is automatically restarted. A ping is a GET request to a specific URI, and it counts as successful if the app responds with a 2xx status code within 2 minutes.
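To make the rule above concrete, here is a minimal sketch that models it in plain Java. It only illustrates the behavior as described in this article and is not Azure's actual implementation. The class name PingTracker and its methods are hypothetical, and treating a response at exactly 2 minutes as a success is an assumption.

```java
import java.time.Duration;

/** Illustrative model of the rule described above; not Azure's actual implementation. */
final class PingTracker {
    // Values taken from the description in this article.
    static final int FAILURE_THRESHOLD = 5;
    static final Duration TIMEOUT = Duration.ofMinutes(2);

    private int consecutiveFailures = 0;

    /** A ping succeeds if the status code is 2xx and the response arrived within the timeout. */
    static boolean isSuccess(int statusCode, Duration elapsed) {
        // Assumption: a response at exactly the timeout still counts as success.
        return statusCode >= 200 && statusCode < 300 && elapsed.compareTo(TIMEOUT) <= 0;
    }

    /** Records one ping result; returns true once the instance should be taken out of rotation. */
    boolean record(int statusCode, Duration elapsed) {
        if (isSuccess(statusCode, elapsed)) {
            consecutiveFailures = 0; // any success resets the failure streak
        } else {
            consecutiveFailures++;
        }
        return consecutiveFailures >= FAILURE_THRESHOLD;
    }
}
```

In this model a single successful ping in the middle of a run of failures resets the count, which is what "5 times in a row" implies.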

To use this feature, you need to edit the target App Service resource directly in Azure Resource Explorer (https://resources.azure.com/). (It is somewhat amusing that, for now, you need a preview tool to use a preview feature.)

In Resource Explorer, expand the tree in this order: "subscriptions" -> the target subscription -> "resourceGroups" -> the target resource group -> "providers" -> "Microsoft.Web" -> "sites" -> the target App Service -> "config". Switch the config resource to Edit mode and change "healthCheckPath". It is null by default; setting it to any path enables Health Check. This time, I chose /status.

Next, implement the URI corresponding to /status on the application side. This is a REST API that checks whether the app is working properly and returns a 200 status code if there are no problems. The details vary from app to app, but typically the API checks connectivity to the external services the app uses (Storage Blob, SQL Database, etc.) and returns OK if everything is normal. Since this is only a verification, I implemented a Spring Boot app that lets you specify, via PUT, the status code that GET will return. The sample code is as follows.

StatusController.java


package com.example.springboot;
import java.net.InetAddress;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;

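@RestController
@RequestMapping(value = "/status")
public class StatusController {
    // Status returned by GET /status. Static so it is shared by all requests,
    // volatile so a change made by PUT is visible to other request threads.
    private static volatile HttpStatus status = HttpStatus.OK;

    // Health check endpoint: responds with "<instance IP>/<reason phrase>" and the current status code.
    @GetMapping
    public ResponseEntity<String> get() {
        return new ResponseEntity<>(getLocalhost() + "/" + status.getReasonPhrase(), status);
    }

    // Switches the status that /status returns from now on, e.g. PUT /status?code=500.
    // The PUT response itself is also sent with the newly selected status.
    @PutMapping
    public ResponseEntity<String> put(@RequestParam(name="code", defaultValue = "200") Integer code) {
        switch (code.intValue()) {
            case 200:
                status = HttpStatus.OK;
                break;
            case 500:
                status = HttpStatus.INTERNAL_SERVER_ERROR;
                break;
            case 400:
                status = HttpStatus.BAD_REQUEST;
                break;
            case 502:
                status = HttpStatus.BAD_GATEWAY;
                break;
            default:
                // Any other code falls back to 200 OK.
                status = HttpStatus.OK;
                break;
        }
        return new ResponseEntity<>(getLocalhost() + "/" + status.getReasonPhrase(), status);
    }

    // Returns the IP address of this instance, or "0.0.0.0" if it cannot be resolved.
    private String getLocalhost() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch(Exception e) {
            e.printStackTrace();
            return "0.0.0.0";
        }
    }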
}
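
The controller above returns a fixed status because it exists only for verification. For a real app, where the endpoint checks the external services the app depends on, the core logic might look like the following sketch. It is plain Java with no Spring dependency. The class name DependencyHealth and the check names are hypothetical, and returning 503 on failure is an assumption: any non-2xx code would count as a failed ping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical helper that aggregates dependency checks into one HTTP status code. */
final class DependencyHealth {
    // Checks are kept in registration order and keyed by a descriptive name.
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Registers a named check, e.g. a lightweight query against a database. */
    DependencyHealth register(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    /** Returns 200 if every check passes, otherwise 503. A check that throws counts as failed. */
    int statusCode() {
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false;
            }
            if (!ok) {
                return 503; // assumption: 503 Service Unavailable signals "unhealthy"
            }
        }
        return 200;
    }
}
```

A GET handler for /status could then return the result of statusCode() as its response status. Each check should be cheap, because the endpoint is pinged repeatedly.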

Operation check

After deploying the app to App Service (details omitted), access it in Chrome. It is a simple app, but it displays an IP address (172.16.1.5). This is the internal IP of an App Service instance, so it identifies which instance this session is connected to.

Next, I accessed the app from Edge. Only the address differs from what Chrome shows. App Service's internal load balancer uses sticky sessions, so in principle, even with multiple backend instances, a given client goes to the same instance every time. (Incidentally, the load balancer's distribution weighting seems to be related to the amount of access: it took about 10 reloads in Chrome before it connected to a different instance.)

Now use Postman to access /status. I confirmed that it returns a 200 status code. This request is connected to the instance at 172.16.1.5. Next, let's change the response from /status so that it returns a 500 status code. Because this verification app lets you set the HTTP status code returned from /status dynamically, App Service should detect this as an app failure, take the unhealthy instance out of rotation, and restart it.
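
If you would rather script this step than use Postman, the same PUT can be sent with the HTTP client built into Java 11 and later. This is a minimal sketch: the class name StatusSwitcher is hypothetical, and the base URL is a placeholder that you would replace with your own app's address.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that switches the status code returned by the verification app. */
final class StatusSwitcher {
    /** Builds PUT {baseUrl}/status?code={code} with an empty body. */
    static HttpRequest buildRequest(String baseUrl, int code) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/status?code=" + code))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends the request and returns "<status code> <body>", where the body is "<instance IP>/<reason phrase>". */
    static String send(String baseUrl, int code) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(baseUrl, code), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

Calling StatusSwitcher.send("https://<your-app>.azurewebsites.net", 500) changes the status on whichever instance the load balancer routes the request to. The IP address in the response body shows which instance that was.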

Wait about 10 minutes, then refresh the page in Chrome. The IP has changed to 172.16.1.2, so it appears that the instance was restarted. Just to be sure, I checked with Postman as well, and the destination had likewise changed to 172.16.1.2. This confirms that Health Check works: the instance judged to be unhealthy was taken out of rotation and restarted.
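
The manual comparison above, noticing that the IP in the response changed, can also be automated. The sketch below assumes the response body has the "<IP>/<reason phrase>" shape produced by the verification app. The class name InstanceObserver is hypothetical.

```java
/** Hypothetical helper that notices when responses start coming from a different instance. */
final class InstanceObserver {
    private String lastIp; // null until the first response has been seen

    /** Extracts the IP part of a body such as "172.16.1.5/OK". */
    static String ipOf(String body) {
        int slash = body.indexOf('/');
        return slash < 0 ? body : body.substring(0, slash);
    }

    /** Returns true when this body comes from a different instance than the previous one. */
    boolean changed(String body) {
        String ip = ipOf(body);
        boolean changed = lastIp != null && !lastIp.equals(ip);
        lastIp = ip;
        return changed;
    }
}
```

Feeding each polled response body into changed() reports the switch from 172.16.1.5 to 172.16.1.2 without anyone having to watch the browser.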

I also checked on the Azure portal. In the metrics chart, the orange line is the instance whose HTTP ping response I intentionally made abnormal. The metrics show that the instance was unresponsive from 14:59 to 15:14. After that, it was restarted automatically and service was restored.

Health Check requires an endpoint to be implemented on the application side, but it is an effective feature for improving the recoverability of a service. This time I only checked the basic behavior, so I would like to take some time to look at the detailed behavior later.

That's it for trying Health Check on App Service.
