When building a system out of microservices, it is important to design for fault tolerance. Retry policies, bulkheads, and circuit breakers are key concepts that are also defined as microservices design patterns.

MicroProfile Fault Tolerance provides the functionality you need to build such fault-tolerant services. It is annotation-based and works through CDI interceptors, so the annotated classes must be CDI beans. This keeps the fault-handling code separate from the business logic and makes both easier to implement.

Fault Tolerance policies can also be configured externally, outside the source code, using MicroProfile Config.

Key features included in the Fault Tolerance specification

Feature	Annotation and brief description
1. Timeout:	Use the @Timeout annotation. Defines the maximum time allowed for processing
2. Retry:	Use the @Retry annotation. Configures retries when processing fails
3. Fallback:	Use the @Fallback annotation. Provides an alternative (calls another method) when processing fails
4. Bulkhead:	Use the @Bulkhead annotation. Limits the number of concurrent executions so that, under high load, one overloaded process cannot slow down responses and trigger cascading failures across the whole system
5. Circuit breaker:	Use the @CircuitBreaker annotation. Makes calls fail immediately when the target has failed repeatedly
6. Asynchronous:	Use the @Asynchronous annotation. Runs the processing asynchronously

To apply any of the above policies (several can be combined), you only need to add the annotation to the target class or method.

1. Timeout (@Timeout) policy

Setting a timeout prevents the caller from waiting indefinitely for processing to complete. Without a timeout, a network failure or an overloaded destination that cannot respond promptly can exhaust the caller's connection pool or worker threads, so the load spreads to the caller as well. Therefore, when several microservices cooperate or an external service is called, set a timeout on each call between services.

@Timeout(400) // Time limit for this method call: 400 ms (0.4 sec)
public Connection getConnectionForServiceA() {
   Connection conn = connectionService();
   return conn;
}

The @Timeout annotation can be added at the class or method level. If the time limit is reached, a TimeoutException is thrown.
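To make the effect of a time limit concrete, here is a minimal plain-Java sketch that does not use MicroProfile at all. It runs an action on another thread and gives up waiting after a limit, which is roughly what the caller observes with @Timeout. The names TimeoutSketch, TimeLimitExceededException, callWithTimeout, and slowService are hypothetical and exist only for this illustration; the real interceptor is implemented differently and throws the specification's own TimeoutException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeoutSketch {
    /** Unchecked stand-in (hypothetical) for the exception a timeout policy would throw. */
    static class TimeLimitExceededException extends RuntimeException {
        TimeLimitExceededException(String message) {
            super(message);
        }
    }

    /** Runs the action on another thread and waits at most limitMillis for its result. */
    static <T> T callWithTimeout(Supplier<T> action, long limitMillis) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(action);
        try {
            return future.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Stop waiting; note that cancel() does not interrupt the worker thread
            future.cancel(true);
            throw new TimeLimitExceededException("no result within " + limitMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Hypothetical slow dependency used only for the demonstration. */
    static String slowService(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "late reply";
    }
}
```

Calling TimeoutSketch.callWithTimeout(() -> TimeoutSketch.slowService(2000), 400) gives up after about 400 ms instead of blocking the caller for two seconds.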

2. Retry (@Retry) policy

If there is a transient network failure or no reply arrives from the destination, you can use the @Retry annotation to retry the call.

You can configure the following in the retry policy:

Parameter	Description
maxRetries:	Maximum number of retries
delay:	Interval between retries
delayUnit:	Unit of delay
maxDuration:	Maximum total duration of retrying
durationUnit:	Unit of maxDuration
jitter:	Random variation applied to the retry delay
jitterDelayUnit:	Unit of jitter
retryOn:	Failure types (Exception, Error) that trigger a retry
abortOn:	Failure types (Exception, Error) that stop retrying

The @Retry annotation can be added at the class level or method level, and when added to a class, it will be applied to all methods existing in the class. When attached to a method, only the specified method will be the target. If you add annotations in the class and also in the method, the settings specified in the method will take effect.

Each call is handled as follows:

If the method returns normally, the result is returned as is.
If the thrown exception matches abortOn, it is rethrown.
If the thrown exception matches retryOn, the method call is retried.
Otherwise, the thrown exception is rethrown.
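The order of these checks matters when an exception matches both retryOn and abortOn. The following plain-Java sketch models only that decision. RetryDecisionSketch, Decision, and decide are hypothetical names for this illustration and are not part of the MicroProfile API.

```java
import java.util.List;

class RetryDecisionSketch {
    enum Decision { RETRY, RETHROW }

    /** Applies the checks in the order listed above: abortOn first, then retryOn. */
    static Decision decide(Throwable thrown,
                           List<Class<? extends Throwable>> retryOn,
                           List<Class<? extends Throwable>> abortOn) {
        for (Class<? extends Throwable> type : abortOn) {
            if (type.isInstance(thrown)) {
                return Decision.RETHROW; // abortOn wins even if retryOn also matches
            }
        }
        for (Class<? extends Throwable> type : retryOn) {
            if (type.isInstance(thrown)) {
                return Decision.RETRY;
            }
        }
        return Decision.RETHROW; // matches neither list
    }
}
```

With retryOn = Exception.class and abortOn = IOException.class, an IOException is rethrown even though it is also an Exception, while a RuntimeException is retried.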

It can also be used with other Fault Tolerance annotations.

    /**
     * When callServiceA() throws an exception,
     * retry unless the exception is an IOException.
     */
    @Retry(retryOn = Exception.class, abortOn = IOException.class)
    public void invokeService() {
        callServiceA();
    }

    /**
     * Set the maximum number of retries to 90 and the maximum duration of retrying to 1000 ms.
     * Once the maximum duration is reached, no more retries are performed even if the maximum number of retries has not been reached.
     */
    @Retry(maxRetries = 90, maxDuration= 1000)
    public void serviceB() {
        callServiceB();
    }

    /**
    * With delay = 400 ms and jitter = 400 ms, the delay varies randomly by -400 ms to +400 ms,
    * so each retry is expected after 0 ms (delay - jitter) to 800 ms (delay + jitter).
    * Assuming the maximum delay every time, 3200 / 800 = 4, so at least 4 retries are attempted.
    * maxRetries caps the number of retries at 10.
    */
    @Retry(delay = 400, maxDuration= 3200, jitter= 400, maxRetries = 10)
    public Connection serviceA() {
        return getConnectionForServiceA();
    }
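The arithmetic in the last comment can be checked with a few lines of plain Java. This is only a back-of-the-envelope sketch: RetryDelaySketch and its methods are hypothetical names, the execution time of the method itself is ignored, and clamping the lower bound at zero is an assumption made here for the case where jitter is larger than delay.

```java
class RetryDelaySketch {
    /** Shortest possible wait before a retry (assumed to be clamped at zero). */
    static long minDelay(long delay, long jitter) {
        return Math.max(0, delay - jitter);
    }

    /** Longest possible wait before a retry. */
    static long maxDelay(long delay, long jitter) {
        return delay + jitter;
    }

    /** Retries that fit into maxDuration even if every wait is the longest one, capped by maxRetries. */
    static long guaranteedRetries(long delay, long jitter, long maxDuration, long maxRetries) {
        long longest = maxDelay(delay, jitter);
        if (longest == 0) {
            return maxRetries; // no waiting at all, so only maxRetries limits the count
        }
        return Math.min(maxRetries, maxDuration / longest);
    }
}
```

For delay = 400, jitter = 400, maxDuration = 3200, and maxRetries = 10, the waits range from 0 to 800 ms and 3200 / 800 gives 4 retries in the worst case, which matches the comment on serviceA().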

3. Fallback (@Fallback) policy

The @Fallback annotation is specified at the method level. If the annotated method ends with an exception, the configured fallback (a FallbackHandler or a fallback method) is called.

The @Fallback annotation can be used alone or with other Fault Tolerance annotations. When used with other annotations, the fallback is called after all other Fault Tolerance processing has been done.

For example, if @Retry is also defined, the fallback runs once the maximum number of retries has been exceeded. If @CircuitBreaker is also defined, the fallback is called as soon as the method call fails, and it is also called whenever the circuit is open.

3.1 Implementation example of fallback processing by implementing FallbackHandler

Define a class (ServiceInvocationAFallbackHandler) that implements the FallbackHandler interface, and implement the alternative processing in its handle method.

Here, the handler returns the string configured through MicroProfile Config, either as the app.serviceinvokeA.FallbackReplyMessage property or as the corresponding environment variable.

@Dependent
public class ServiceInvocationAFallbackHandler implements FallbackHandler<String> {

    @Inject // MicroProfile Config values are injected through CDI
    @ConfigProperty(name="app.serviceinvokeA.FallbackReplyMessage", defaultValue = "Unconfigured Default Reply")
    private String replyString;

    @Override
    public String handle(ExecutionContext ec) {
        return replyString;
    }
}

When the BusinessLogicServiceBean#invokeServiceA() method below is called, it throws a RuntimeException, and the call is retried three times. After all retries have failed, ServiceInvocationAFallbackHandler#handle() is called.

@RequestScoped
public class BusinessLogicServiceBean {

    // Add the @Fallback annotation, specifying the FallbackHandler implementation class
    // Once the maximum number of retries (3) is exceeded, the handle() method of the FallbackHandler is called
    @Retry(maxRetries = 3)
    @Fallback(ServiceInvocationAFallbackHandler.class)
    public String invokeServiceA() {
        // Always fails in this example so that the fallback is triggered
        throw new RuntimeException("Connection failed");
    }
}
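The interplay of retry and fallback can be modelled in a few lines of plain Java. This is a rough sketch of the observable behavior only, not of how the interceptors are implemented. RetryWithFallbackSketch and call are hypothetical names, and only RuntimeException is handled to keep the example short.

```java
import java.util.function.Supplier;

class RetryWithFallbackSketch {
    /** Tries the action once plus up to maxRetries more times, then uses the fallback. */
    static <T> T call(Supplier<T> action, int maxRetries, Supplier<T> fallback) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // This attempt failed; loop again if retries remain
            }
        }
        // All attempts failed, so the alternative result is returned instead of an exception
        return fallback.get();
    }
}
```

With maxRetries = 3 and an action that always throws, the action runs four times in total (the initial call plus three retries) before the fallback result is returned.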

3.2 Implementation example of fallback processing with fallbackMethod specified

Specify the name of the alternative method directly in the @Fallback annotation.

Here, the fallbackForServiceB() method is defined as the alternative method.

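@RequestScoped
public class BusinessLogicServiceBean {

    // Counts calls to invokeServiceB(), including retries
    private int counterForInvokingServiceB;

    @Retry(maxRetries = 3)
    @Fallback(fallbackMethod = "fallbackForServiceB")
    public String invokeServiceB() {
        counterForInvokingServiceB++;
        return nameService(); // call to the remote service (defined elsewhere)
    }

    @Inject
    @ConfigProperty(name="app.serviceinvokeB.FallbackReplyMessage", defaultValue = "Unconfigured Default Reply")
    private String replyString;

    // The name must match fallbackMethod; parameters and return type must match invokeServiceB()
    private String fallbackForServiceB() {
        return replyString;
    }
}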

4. Bulkhead (@Bulkhead) policy

The bulkhead pattern prevents a failure in one part of the system from propagating and bringing the whole system down. The MicroProfile implementation limits the number of concurrent requests that access an instance.

The Bulkhead pattern is effective when applied to components that can be called in large numbers or services that cause poor response under heavy load.

The @Bulkhead annotation can be added at the class level or method level, and when added to a class, it will be applied to all methods existing in the class. When attached to a method, only the specified method will be the target. If you add annotations in the class and also in the method, the settings specified in the method will take effect.

The bulkhead can be set in the following two ways.

Thread pool isolation: (when used with the @Asynchronous annotation) Sets the maximum number of concurrent requests and the size of the waiting queue in the thread pool.
Semaphore isolation: (when used without the @Asynchronous annotation) Only the number of concurrent requests can be set.

4.1 Example of separation by thread pool

When used with the @Asynchronous annotation, thread pool isolation is applied. In the example below, up to 5 concurrent requests are allowed and up to 8 further requests can wait in the queue.

//Up to 5 simultaneous requests allowed, up to 8 requests allowed in the wait queue
@Asynchronous
@Bulkhead(value = 5, waitingTaskQueue = 8)
public Future<Connection> invokeServiceA() {
   Connection conn = null;
   counterForInvokingServiceA++;
   conn = connectionService();
   return CompletableFuture.completedFuture(conn);
}

4.2 Example of separation by semaphore

If you do not use the @Asynchronous annotation, simply define the number of simultaneous requests.

@Bulkhead(5) //Up to 5 simultaneous requests are allowed
public Connection invokeServiceA() {
   Connection conn = null;
   counterForInvokingServiceA++;
   conn = connectionService();
   return conn;
}
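The semaphore style can be pictured with java.util.concurrent.Semaphore. The following is a minimal plain-Java sketch of the idea, not the MicroProfile implementation: SemaphoreBulkheadSketch and call are hypothetical names, and IllegalStateException stands in for the BulkheadException that Fault Tolerance throws when the limit is reached.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the action only if a permit is free; otherwise rejects the call immediately. */
    <T> T call(Supplier<T> action) {
        if (!permits.tryAcquire()) {
            // Stand-in for the rejection a full bulkhead would signal
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always free the slot, even if the action throws
        }
    }
}
```

Because rejected callers fail at once instead of queueing, a slow dependency can occupy at most maxConcurrent threads of the caller.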

5. Circuit Breaker (@CircuitBreaker) Policy

The circuit breaker prevents repeated calls to a failing service by making calls to that service or API fail immediately. If a service call fails frequently, the circuit breaker opens and no call to that service is attempted until a certain amount of time has passed.

The @CircuitBreaker annotation can be added at the class level or method level, and when added to a class, it will be applied to all methods existing in the class. When attached to a method, only the specified method will be the target. When annotation is added in the class and added to the method, the setting specified in the method is valid.

Three states of the circuit breaker

Closed: (normal time)

Normally the circuit breaker is closed. It keeps track of the most recent results by recording whether each call succeeded or failed. When the failure rate reaches the failureRatio, the circuit breaker opens.

Open: (when a failure occurs)

While the circuit breaker is open, calls to the service it protects fail immediately with a CircuitBreakerOpenException. After a configurable delay, the circuit breaker transitions to the half-open state.

Half open: (checking whether the service has recovered)

In the half-open state, a configurable number of trial calls to the service are allowed. If any of them fails, the circuit breaker returns to the open state. If all of them succeed, it transitions to the closed state.

Circuit breaker implementation example 1

@CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 4, failureRatio=0.5, delay = 1000)
public Connection serviceA() {
   Connection conn = null;
   counterForInvokingServiceA++;
   conn = connectionService();
   return conn;
}

Parameter	Description
requestVolumeThreshold:	Size of the rolling window used while the circuit breaker is "closed" (the number of calls used as the denominator when calculating the failure ratio)
failureRatio:	Failure ratio within the rolling window at which the circuit breaker "opens"
successThreshold:	Number of consecutive successful trial calls needed to move from "half open" to "closed"
delay and delayUnit:	Time the circuit breaker stays "open"

In the above example, the circuit "opens" if two (4 x 0.5) failures occur within the rolling window of four consecutive calls specified by requestVolumeThreshold. The circuit stays "open" for 1,000 milliseconds and then moves to "half open". After 10 consecutive successful calls in "half open", the circuit is "closed" again.

Request 1 - Success
Request 2 - Failure
Request 3 - Success
Request 4 - Success
Request 5 - Failure
Request 6 - CircuitBreakerOpenException

For the above requests, two of the last four requests (requests 2 to 5) fail, so the failure ratio reaches 0.5 and "Request 5" opens the circuit. "Request 6" then fails immediately with a CircuitBreakerOpenException.
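The request sequence above can be replayed with a small plain-Java model of the rolling window. This sketch covers only the closed-to-open transition and leaves out the delay, the half-open state, and successThreshold. RollingWindowBreakerSketch, record, and isOpen are hypothetical names for this illustration, not MicroProfile API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RollingWindowBreakerSketch {
    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = the call failed
    private boolean open = false;

    RollingWindowBreakerSketch(int requestVolumeThreshold, double failureRatio) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
    }

    /** Records one call result. Returns false if the call was rejected because the circuit is open. */
    boolean record(boolean failed) {
        if (open) {
            return false; // corresponds to CircuitBreakerOpenException
        }
        window.addLast(failed);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst(); // keep only the most recent calls
        }
        if (window.size() == requestVolumeThreshold) {
            long failures = window.stream().filter(f -> f).count();
            if ((double) failures / requestVolumeThreshold >= failureRatio) {
                open = true;
            }
        }
        return true;
    }

    boolean isOpen() {
        return open;
    }
}
```

With a window of 4 and a ratio of 0.5, recording success, failure, success, success leaves the circuit closed (1 failure in 4). The fifth call, a failure, makes the window failure, success, success, failure and opens the circuit, so a sixth call is rejected.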

Specifying which exceptions count as success or failure

The failOn and skipOn parameters determine which exceptions are counted as failures when the circuit breaker decides whether to "open".

@CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 4, failureRatio=0.5, delay = 1000, failOn = {ExceptionA.class, ExceptionB.class}, skipOn = ExceptionBSub.class)
public Connection serviceA() {
   Connection conn = null;
   counterForInvokingServiceA++;
   conn = connectionService();
   return conn;
}

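If an exception specified in failOn occurs, the call is counted as a failure. If an exception specified in skipOn occurs, the call is counted as a success. skipOn takes precedence over failOn, so an exception that matches both is counted as a success.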

6. Asynchronous (@Asynchronous) policy

The main features of Fault Tolerance are those described in the Architecture section of the specification, namely the functions listed in 1 to 5 above, so asynchronous processing is not directly a fault tolerance feature. However, asynchronous processing is very important in distributed systems, and it was incorporated into the specification because the other Fault Tolerance functions work more effectively in combination with it.

Source: Architecture

As mentioned above, the Fault Tolerance specification is to focus on the following aspects:

Timeout: Define a duration for timeout

Retry: Define a criteria on when to retry

Fallback: provide an alternative solution for a failed execution.

CircuitBreaker: offer a way of fail fast by automatically failing execution to prevent the system overloading and indefinite wait or timeout by the clients.

Bulkhead: isolate failures in part of the system while the rest part of the system can still function.

The @Asynchronous annotation can be added at the class level or method level, and when added to a class, it will be applied to all methods existing in the class. When attached to a method, only the specified method will be the target. When annotation is added in the class and added to the method, the setting specified in the method is valid.

A method annotated with @Asynchronous returns a Future or CompletionStage immediately when called, and the method body is executed on a separate thread. The returned Future or CompletionStage is not completed until the asynchronous processing finishes. If an exception is thrown during processing, the Future or CompletionStage completes with that exception. If processing succeeds, it completes with the result of the Future or CompletionStage returned by the method body.

@Asynchronous
public CompletionStage<Connection> serviceA() {
   Connection conn = null;
   counterForInvokingServiceA++;
   conn = connectionService();
   return CompletableFuture.completedFuture(conn);
}

In the above example, the call to the serviceA method is asynchronous. The call to serviceA returns CompletionStage, and the method body is executed in a separate thread.

Caution: When an @Asynchronous method is called from a CDI request scope, the request scope must be active during the asynchronous method call. Methods annotated with @Asynchronous must return a Future or CompletionStage from the java.util.concurrent package. Otherwise, a FaultToleranceDefinitionException is thrown.
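What the caller sees from an asynchronous method can be imitated with CompletableFuture.supplyAsync from the JDK. This plain-Java sketch does not use @Asynchronous. AsyncSketch and its serviceA method are hypothetical stand-ins that return a String instead of a Connection.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

class AsyncSketch {
    /** Returns immediately; the body runs on a thread of the common pool. */
    static CompletionStage<String> serviceA(boolean fail) {
        return CompletableFuture.supplyAsync(() -> {
            if (fail) {
                // The stage completes exceptionally with this exception as the cause
                throw new IllegalStateException("Connection failed");
            }
            return "connection";
        });
    }
}
```

A caller can attach a continuation with thenAccept, or block with toCompletableFuture().join(). In the failing case, join() throws a CompletionException whose cause is the original exception.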

How to override values set in the source code

As shown in each section, Fault Tolerance policies are applied with annotations in most cases. If you want to change a value set in an annotation after the code has been written, you can override it with MicroProfile Config.

The parameters in the annotation can be overridden in the configuration properties using the following naming convention:

<classname>/<methodname>/<annotation>/<parameter>

For example, to override externally the parameters of the @Timeout or @Retry annotation on a specific method, write the following in MicroProfile Config.

com.yoshio3.FaultToleranceService.resilient.ResilienceController/checkTimeout/Timeout/value=2000
com.yoshio3.FaultToleranceService.resilient.ResilienceController/checkTimeout/Retry/maxDuration=3000

To apply a setting to the whole class, omit the method name part as shown below.

com.yoshio3.FaultToleranceService.resilient.ResilienceController/Timeout/value=2000
com.yoshio3.FaultToleranceService.resilient.ResilienceController/Retry/maxDuration=3000

And to apply the same rules to all the code in your project, specify only the annotation and the parameter.

Timeout/value=2000
Retry/maxDuration=3000
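The three key shapes can be read as a lookup from most specific to least specific. The sketch below shows that reading in plain Java over an ordinary Map. It is a simplification that assumes a method-level key beats a class-level key, which in turn beats a global key, and it ignores the specification's finer rules about whether the annotation sits on the class or on the method. FaultToleranceConfigSketch and resolve are hypothetical names, not MicroProfile Config API.

```java
import java.util.Map;

class FaultToleranceConfigSketch {
    /** Returns the first configured value, from most to least specific, or the annotation's own value. */
    static String resolve(Map<String, String> props, String className, String methodName,
                          String annotation, String parameter, String annotationValue) {
        String[] keys = {
            className + "/" + methodName + "/" + annotation + "/" + parameter, // one method
            className + "/" + annotation + "/" + parameter,                    // whole class
            annotation + "/" + parameter                                       // whole application
        };
        for (String key : keys) {
            String value = props.get(key);
            if (value != null) {
                return value;
            }
        }
        return annotationValue; // nothing configured: keep the value from the source code
    }
}
```

For instance, with only Timeout/value=2000 configured, every @Timeout in the application resolves to 2000, while an additional class-level key changes the value for that class alone.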

In closing

Here we have reviewed the code for building a fault-tolerant application that leverages MicroProfile Fault Tolerance. Next, I would like to actually build an application that uses Fault Tolerance and link multiple fault-tolerant services on Azure.
