Introduction

In this article, I describe one aspect of HA/DR support in Couchbase.

Specifically, it covers the case where a standby cluster serves as a hot standby for HA/DR.

As a distributed system, a Couchbase cluster already provides HA at the level of node/server failures. Here I explain the features for the case above, which answer more advanced HA requirements.

Cross Data Center Replication (XDCR)

Couchbase's Cross Data Center Replication (XDCR) is a data replication feature. The name implies replication between different data centers, but what the software actually does is replicate data between clusters. It can of course be used between clusters in the same data center as well. I think the name was chosen (for its appeal) because replication across different topologies is generally the harder problem.

XDCR has various user-customizable functions, such as filtering the data to be replicated, and it is not specialized for the HA/DR requirements discussed here. For this article, it suffices to understand that it supports replication between multiple clusters.

Synchronizing data between multiple clusters is the first step toward DR, but that alone does not make the system highly available. The first idea that comes to mind is to switch between clusters outside the application, using a virtual IP or the like. Couchbase enables HA without such external mechanisms, and therefore without the downtime they entail.

Multi-cluster awareness (MCA)

Couchbase's Java SDK has a feature named Multi-Cluster Awareness (MCA). When a client (typically an application server) uses Couchbase Server as its backend data platform, MCA gives the application transparent access to Couchbase while multiple clusters provide DR.

Java SDK commentary

From here, I introduce Java programming with MCA. This is not a guide to a complete implementation, but I hope that showing actual code gives a concrete picture of what MCA can do, especially for developers.

First, using MCA requires importing the following classes. The class names alone give an overview of what the library covers.

import com.couchbase.client.mc.ClusterSpec;
import com.couchbase.client.mc.MultiClusterClient;
import com.couchbase.client.mc.BucketFacade;
import com.couchbase.client.mc.coordination.Coordinator;
import com.couchbase.client.mc.coordination.Coordinators;
import com.couchbase.client.mc.coordination.IsolatedCoordinator;
import com.couchbase.client.mc.coordination.TopologyBehavior;
import com.couchbase.client.mc.detection.FailureDetectorFactory;
import com.couchbase.client.mc.detection.NodeHealthFailureDetector;
import com.couchbase.client.mc.detection.DisjunctionFailureDetectorFactory;
import com.couchbase.client.mc.detection.FailureDetectorFactories;
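import com.couchbase.client.mc.detection.TrafficMonitoringFailureDetector;
// JDK class, needed for the TimeUnit argument passed to BucketFacade
import java.util.concurrent.TimeUnit;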

Coordinator

First, build a Coordinator with the options you want.

Coordinator coordinator = Coordinators.isolated(new IsolatedCoordinator.Options()
  .clusterSpecs(specs)
  .activeEntries(specs.size())
  .failoverNumNodes(2)
  .gracePeriod(TIMEOUT)
  .topologyBehavior(TopologyBehavior.WRAP_AT_END)
  .serviceTypes(serviceTypes)
);

Failure detection

Next, build the FailureDetector (designed with the factory pattern).


TrafficMonitoringFailureDetector.Options trafficOptions = TrafficMonitoringFailureDetector.options()
  .maxFailedOperations(5)
  .failureInterval(60);
 
FailureDetectorFactory<TrafficMonitoringFailureDetector> traffic = FailureDetectorFactories.trafficMonitoring(coordinator,trafficOptions);
 
NodeHealthFailureDetector.Options healthOptions = NodeHealthFailureDetector.options();
 
FailureDetectorFactory<NodeHealthFailureDetector> health = FailureDetectorFactories.nodeHealth(coordinator,healthOptions);
 
DisjunctionFailureDetectorFactory detector = FailureDetectorFactories.disjunction(traffic,health);
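
To make the idea behind these options more concrete, here is a minimal sketch in plain Java. It is not the MCA implementation, and every type in it (SimpleDetector, SlidingWindowDetector, AnyOfDetector) is a hypothetical name of my own. It assumes that a traffic-based detector counts failed operations inside a time window and that a disjunction reports failure as soon as any one of its parts does. The ">=" threshold and the millisecond unit are assumptions of the sketch as well.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a failure detector; NOT an MCA library type.
interface SimpleDetector {
    boolean hasFailed(long nowMillis);
}

// Reports failure once at least maxFailedOperations failures lie inside the window.
// The ">=" threshold and the millisecond window are assumptions of this sketch.
final class SlidingWindowDetector implements SimpleDetector {
    private final int maxFailedOperations;
    private final long windowMillis;
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    SlidingWindowDetector(int maxFailedOperations, long windowMillis) {
        this.maxFailedOperations = maxFailedOperations;
        this.windowMillis = windowMillis;
    }

    // Called by the application whenever an operation against the cluster fails.
    void recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
    }

    @Override
    public boolean hasFailed(long nowMillis) {
        // Forget failures that are older than the window.
        while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() > windowMillis) {
            failureTimes.removeFirst();
        }
        return failureTimes.size() >= maxFailedOperations;
    }
}

// Disjunction: fails as soon as any one of the wrapped detectors fails.
final class AnyOfDetector implements SimpleDetector {
    private final SimpleDetector[] parts;

    AnyOfDetector(SimpleDetector... parts) {
        this.parts = parts;
    }

    @Override
    public boolean hasFailed(long nowMillis) {
        for (SimpleDetector part : parts) {
            if (part.hasFailed(nowMillis)) {
                return true;
            }
        }
        return false;
    }
}
```

With a threshold of 5 and a 60-second window, four failures leave the sketch detector quiet, the fifth trips it, and it recovers once the failures age out of the window.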

Database abstraction (façade)

MultiClusterClient abstracts access to multiple clusters/databases. Note that the coordinator and detector built earlier are passed in when building the client. Buckets/databases are exposed through the facade pattern.

MultiClusterClient client = new MultiClusterClient(coordinator, detector);
client.authenticate(options.stringValueOf("id"), options.stringValueOf("password"));
bucket = new BucketFacade(client.openBucket(bucketName, null), TIMEOUT, TimeUnit.MILLISECONDS);
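
For readers less familiar with the facade pattern, the following is a generic sketch of the idea and not the BucketFacade implementation. FailoverFacade and its methods are hypothetical names, and each backend is modeled as a plain function from key to value. In this simplified version the facade itself moves to the next backend when a call throws, whereas the real client leaves that decision to the coordinator and detector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical facade over several key-value backends; NOT an MCA library type.
final class FailoverFacade {
    private final List<Function<String, String>> backends;
    private int active = 0;

    FailoverFacade(List<Function<String, String>> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    // The caller sees one get(); which backend answers is hidden behind the facade.
    String get(String key) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < backends.size(); attempt++) {
            try {
                return backends.get(active).apply(key);
            } catch (RuntimeException e) {
                last = e;
                // Move to the next backend, wrapping around at the end of the list.
                active = (active + 1) % backends.size();
            }
        }
        throw new IllegalStateException("all backends failed", last);
    }

    int activeIndex() {
        return active;
    }
}
```

The point of the pattern is that application code keeps calling get() and never names a specific cluster.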

Points to keep in mind when using XDCR + MCA

XDCR supports both unidirectional and bidirectional replication. When combining XDCR with MCA, replication must be configured bidirectionally. This is not surprising, since cutover to the DR environment has to work in both directions (to allow for recovery, of course). I note it here because some readers may picture replication as one-way.
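
As a small illustration of what "bidirectional" means as a configuration property, the sketch below checks that every replication link has a matching link in the opposite direction. It is plain Java and does not call any Couchbase API. ReplicationLinks and the cluster names are hypothetical, and each link is modeled as a two-element array of source and target.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; models replication links as {source, target} pairs.
final class ReplicationLinks {
    // Returns true only if every source -> target link also exists as target -> source.
    static boolean isBidirectional(String[][] links) {
        Set<List<String>> seen = new HashSet<>();
        for (String[] link : links) {
            seen.add(Arrays.asList(link[0], link[1]));
        }
        for (String[] link : links) {
            if (!seen.contains(Arrays.asList(link[1], link[0]))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the pair of links east to west and west to east passes the check, while east to west alone does not.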

Finally

Having read this far, you may have the following question: the application server itself is not immune to disasters, and it is typically located in the same data center as the database, so what is the point of protecting only the database? Under that assumption, the question is a sensible one. In such a situation, you do not need the complex features introduced here to connect your application to Couchbase; you can simply access it in the ordinary way. The front end placed before the application servers and configured across multiple topologies (such as F5 BIG-IP) then controls the traffic in case of a disaster. Here too, you can take advantage of Couchbase's XDCR capabilities (as the stateful data layer). On the other hand, use of the cloud and hybrid use of cloud and data center are not uncommon in today's systems. In such heterogeneous environments, the more advanced functions introduced here (XDCR + MCA) are undoubtedly useful.

Next step: Monitoring

Monitoring is another aspect of system operation, and Couchbase provides various functions for it as well (including integration with other systems). I would like to cover this separately in the future.

Reference information

Intro to Couchbase HA/DR: Java Multi-Cluster Aware Client

HA/DR support in Couchbase (Java SDK commentary)