Introduction

This is the article for day 22 of the Future Advent Calendar 2 2019. The articles for Advent Calendar 1 are here.

This is my fifth year at the company. Every year at this time, I like to look back on the business and technology I worked with over the past year, and this is a good opportunity for that. However, every project gets hectic in December, so I can only write on precious days off, which makes it quite difficult... So, a disclaimer up front: please go easy on the quality of this article.

Overview

In 2019, I led the design and development of applications using NoSQL databases such as KVS (key-value stores). Looking back, I had written only about Cassandra, so this time I would like to focus on gRPC. My background is in infrastructure and middleware design and construction, and I had never seriously worked on API design and development, so I struggled. I hope I can give back to everyone the knowledge I gained through designing and developing with gRPC.

What is gRPC?

gRPC is an open-source RPC framework developed by Google. It serializes data with Protocol Buffers, which lets it achieve faster communication than REST. In gRPC, the API specification is written in a definition file called a proto file, using an IDL (interface definition language). One of its features is that the source templates the client and server need can be generated automatically from the proto file for each language, such as Java, C++, Python and Go, so matching interfaces are available even across different languages. Proto files are written according to the proto3 language specification.

Reasons for adopting gRPC

REST is often the point of comparison when adopting gRPC for an API. The API server I designed and developed this time provides CRUD access to the back-end data store layer. The biggest merit of adopting gRPC for this data store API was that data acquired from the KVS and other stores can be **defined structurally in proto and transmitted faster than with REST**.

Also, aligning API specifications across services is very costly when building a microservice architecture. Because gRPC lets you **keep interfaces under strict rules**, its merit here is greater than that of REST, which allows free-form design.

However, gRPC has no absolute advantage over REST, so I think it is better to decide case by case, based on whether you can actually enjoy the benefits above.

gRPC design and development tips

1. How to manage proto files

1-1. Split proto files when defining deeply nested structured data

Since Cassandra was adopted as the data store, the data we handled was not a flat hierarchy but deeply nested structured data. Such data can be defined in a single proto file, but readability and maintainability were poor, so the proto files were split as shown below.

Example of defining in one file

A multi-layer model can be expressed in a single proto file, but as the nesting reaches the second and third layers, both readability and maintainability deteriorate.


syntax = "proto3";
option java_package = "jp.co.sample.datastore.common.model";

package common;

message ParentModel {
  string parentId    = 1;
  string parentNm    = 2;
  ChildModel child   = 3; // Use ChildModel, defined in the same file, as the field type
}

message ChildModel {
  string childId    = 1;
  string childNm    = 2;
}

Example of defining with multiple files

ChildModel and ParentModel can also be defined in separate files, and this time the files were managed separately for each structural unit. As I'll discuss later, Cassandra lets you define arbitrary structures called user-defined types (UDTs) with DDL, so the proto models were also split per UDT.


syntax = "proto3";
option java_package = "jp.co.sample.datastore.common.model";

package common;

message ChildModel {
  string childId    = 1;
  string childNm    = 2;
}


syntax = "proto3";
option java_multiple_files = true;
option java_package = "jp.co.sample.datastore.common.model";

package common;

import "common/child_model.proto"; // Import the proto file that defines ChildModel

message ParentModel {
  string parentId    = 1;
  string parentNm    = 2;
  ChildModel child   = 3;
}

1-2. Manage proto files together with DDL

In gRPC, the API specification is defined and managed in proto files, so by version-controlling the proto files with Git or a similar tool, the API specification can always be kept up to date.

However, for a data store API, managing proto files only for the parameter definitions of application requests and responses was not enough. Because the data acquired from Cassandra is also handled structurally through proto, the proto files had to stay consistent with the table definitions on the Cassandra side.

DDL changes and updates are common during application development. Cassandra's DDL was managed in table definition files in an in-house standard format, so by generating the proto files automatically from those definitions, consistency between the two is now guaranteed even when a table definition changes.

As development scales up, it becomes difficult to absorb differences between proto and DDL by hand, so it is better to put such a mechanism in place from the beginning.
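The article does not show the in-house table definition format or the generator itself, so the following is only a minimal sketch of the idea, assuming a table definition reduced to a list of (column name, CQL type) pairs. `Column`, `TYPE_MAP` and `toProto` are hypothetical names, and only a few scalar CQL types are mapped; UDTs, collections and options are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProtoGenerator {

    // Hypothetical column definition; the article's in-house format is not shown
    static final class Column {
        final String name;
        final String cqlType;

        Column(String name, String cqlType) {
            this.name = name;
            this.cqlType = cqlType;
        }
    }

    // Assumed mapping from a few scalar CQL types to proto3 scalar types
    private static final Map<String, String> TYPE_MAP = new LinkedHashMap<>();
    static {
        TYPE_MAP.put("text", "string");
        TYPE_MAP.put("varchar", "string");
        TYPE_MAP.put("int", "int32");
        TYPE_MAP.put("bigint", "int64");
        TYPE_MAP.put("boolean", "bool");
        TYPE_MAP.put("double", "double");
        TYPE_MAP.put("float", "float");
    }

    // Builds a proto3 message definition; field numbers follow the column order
    static String toProto(String packageName, String messageName, List<Column> columns) {
        StringBuilder sb = new StringBuilder();
        sb.append("syntax = \"proto3\";\n\n");
        sb.append("package ").append(packageName).append(";\n\n");
        sb.append("message ").append(messageName).append(" {\n");
        int tag = 1;
        for (Column c : columns) {
            String protoType = TYPE_MAP.get(c.cqlType);
            if (protoType == null) {
                // Collections, UDTs and other types are out of scope for this sketch
                throw new IllegalArgumentException("Unsupported CQL type: " + c.cqlType);
            }
            sb.append("  ").append(protoType).append(' ').append(c.name)
              .append(" = ").append(tag++).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Column> user = List.of(
                new Column("user_id", "text"),
                new Column("user_name", "text"),
                new Column("age", "int"));
        System.out.print(toProto("common", "User", user));
    }
}
```

One design point to watch: this sketch numbers fields by column order, so inserting or reordering columns would change field numbers and break wire compatibility with existing clients. A real generator would need to keep field numbers stable, for example by recording them in the table definition.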

1-3. Managing the interface (IF) module

From a proto file, the interface sources required by the client and server can be generated automatically for each language.

However, generating sources from the proto files and committing them every time is very tedious, and it becomes a complicated task as the number of developers grows. So I generated an interface module from the latest proto files and managed the package in a Nexus repository.

Since both the client and server were developed in Java this time, Gradle was configured to fetch the package from Nexus.

2. Implementation of common processing using custom options

In API design, you need to design validation for request parameters. In gRPC, you must always specify a type such as string or int when defining data in a proto file, and collection types such as map and repeated fields can also be defined and handled.

Therefore, type checks on request parameters from the client are unnecessary, but other validations, such as required-field checks and length (digit) checks, still need consideration.

In a proto file, you can annotate files, messages and fields with custom options. You can read the custom options from the gRPC model at runtime and implement any handling you like.

2-1. Custom option definition example


syntax = "proto3";

option java_multiple_files = true;
option java_package = "jp.co.sample.datastore.option.model";

package option;

import "google/protobuf/descriptor.proto";

extend google.protobuf.FieldOptions {
  bool required = 50000; //Mandatory check option
}

extend google.protobuf.FieldOptions {
  int32 strlen = 50001; //Digit check option
}

Apply the custom options defined above to any field.

syntax = "proto3";

option java_multiple_files = true;
option java_package = "jp.co.sample.datastore.common.model";

package common;

import "option/custom_option.proto"; //Import proto with custom options defined

message User {
  string user_id            = 1 [(option.required) = true, (option.strlen) = 8]; // Multiple options can be set on one field
  string user_name          = 2 [(option.required) = true];
}

2-2. How to get custom options (Java)

This sample reads the custom options set on each field of the User message model defined above. User.getDescriptor().getFields() returns the FieldDescriptors, which hold the meta-information for each field of the User model, and the option information can be read from each FieldDescriptor.

import com.google.protobuf.Descriptors;
import java.util.Map;

for (Descriptors.FieldDescriptor fds : User.getDescriptor().getFields()) {
    System.out.println(fds.getName());
    // getAllFields() returns every option set on this field, including custom options
    for (Map.Entry<Descriptors.FieldDescriptor, Object> entry : fds.getOptions().getAllFields().entrySet()) {
        System.out.println("option:" + entry.getKey().getName() + "=" + entry.getValue());
    }
}

/*Output result*/
// user_id
// option:required=true
// option:strlen=8
// user_name
// option:required=true
2-3. Validation implementation example

You can also check whether a given option is set on a field with hasExtension(). This makes it possible to implement arbitrary option-based validation for each field of a gRPC model. In addition, every gRPC model implements the common com.google.protobuf.Message interface, so by handling models as Message and working with their FieldDescriptors, general-purpose processing can be implemented without depending on any specific model.


if (fds.getOptions().hasExtension(CustomOption.required)) {
  // hasExtension: check whether the "required" option is set in the field's meta information

  boolean required = fds.getOptions().getExtension(CustomOption.required); // getExtension: read the option value
  // Validation processing implementation
}
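The original leaves the validation body as a placeholder. As a minimal sketch, assuming the option values have already been read with getExtension() and that `strlen` means a maximum length (the article only calls it a digit check), the checks themselves can be written without depending on any particular model. `FieldRule` and `validate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionValidator {

    // Hypothetical holder for the option values read via getExtension();
    // strlen = 0 means "no length check", matching the int32 default of an unset option
    static final class FieldRule {
        final String fieldName;
        final boolean required;
        final int strlen;

        FieldRule(String fieldName, boolean required, int strlen) {
            this.fieldName = fieldName;
            this.required = required;
            this.strlen = strlen;
        }
    }

    // Returns error messages for one string field; an empty list means the value passed
    static List<String> validate(FieldRule rule, String value) {
        List<String> errors = new ArrayList<>();
        // proto3 strings are never null, so an unset field shows up as ""
        if (rule.required && value.isEmpty()) {
            errors.add(rule.fieldName + " is required");
        }
        // Assumption: strlen is an upper bound on the number of characters (code points)
        if (rule.strlen > 0 && value.codePointCount(0, value.length()) > rule.strlen) {
            errors.add(rule.fieldName + " must be at most " + rule.strlen + " characters");
        }
        return errors;
    }

    public static void main(String[] args) {
        FieldRule userId = new FieldRule("user_id", true, 8);
        System.out.println(validate(userId, ""));          // [user_id is required]
        System.out.println(validate(userId, "123456789")); // [user_id must be at most 8 characters]
        System.out.println(validate(userId, "12345678"));  // []
    }
}
```

In the loop from 2-2, a FieldRule could be built per FieldDescriptor from the two options and applied to `(String) message.getField(fds)` for string fields, which keeps the validator independent of the concrete model class.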

3. Letting gRPC models explicitly handle empty strings and 0

When a string or int field defined in the proto file has no value set, reading that field from the generated model returns the **default value: an empty string for string and 0 for int32/int64**.

For example, when the server receives a gRPC model from a client and uses the field values to update the data store, it cannot tell whether the client intentionally set an empty string or 0 to clear the value, or whether the field was simply not set and holds the gRPC model's default value (in which case no update is needed).

To solve this problem, Protocol Buffers provides wrapper types (well-known types such as StringValue and Int32Value); defining fields with them makes it possible to tell whether a value was set.

3-1. Example of proto file definition using wrapper class


import "google/protobuf/wrappers.proto";

message Test {
  string                      value1 = 1; // Cannot tell an explicit empty string from the default value
  int32                       value2 = 2; // Cannot tell an explicit 0 from the default value
  google.protobuf.StringValue value3 = 3; // Can tell an explicit empty string from an unset field
  google.protobuf.Int32Value  value4 = 4; // Can tell an explicit 0 from an unset field
}

3-2. Value existence check implementation example


import com.google.protobuf.Descriptors;
import com.google.protobuf.Int32Value;
import com.google.protobuf.StringValue;

Test.Builder testBuilder = Test.newBuilder();

// Explicitly set an empty string or 0 on every field
testBuilder
    .setValue1("")
    .setValue2(0)
    .setValue3(StringValue.newBuilder().setValue(""))
    .setValue4(Int32Value.newBuilder().setValue(0));

for (Descriptors.FieldDescriptor fds : Test.getDescriptor().getFields()) {
    // For plain proto3 scalars, hasField() is false whenever the value equals the default
    if (testBuilder.hasField(fds)) {
        System.out.println(fds.getName() + " is set");
    } else {
        System.out.println(fds.getName() + " is not set");
    }
}

/*Output example*/
// value1 is not set
// value2 is not set
// value3 is set
// value4 is set
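To connect this back to the update problem described above, here is a minimal sketch of how a server might turn presence information into a CQL UPDATE. It assumes the fields for which hasField() returned true have already been collected into a map of field name to unwrapped value; `buildUpdate`, `Statement` and the `test` table are hypothetical, and a real implementation would more likely use the driver's QueryBuilder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PresenceAwareUpdate {

    // Result: CQL text with bind markers plus the values to bind, in order
    static final class Statement {
        final String cql;
        final List<Object> values;

        Statement(String cql, List<Object> values) {
            this.cql = cql;
            this.values = values;
        }
    }

    // setFields: only the fields the client explicitly set (hasField() == true)
    // keyFields: primary key columns used in the WHERE clause
    static Statement buildUpdate(String table, Map<String, Object> setFields, Map<String, Object> keyFields) {
        if (setFields.isEmpty() || keyFields.isEmpty()) {
            throw new IllegalArgumentException("need at least one field to set and one key");
        }
        List<Object> values = new ArrayList<>();
        List<String> assignments = new ArrayList<>();
        for (Map.Entry<String, Object> e : setFields.entrySet()) {
            assignments.add(e.getKey() + " = ?");
            values.add(e.getValue()); // "" or 0 here is a deliberate update, not a default
        }
        List<String> conditions = new ArrayList<>();
        for (Map.Entry<String, Object> e : keyFields.entrySet()) {
            conditions.add(e.getKey() + " = ?");
            values.add(e.getValue());
        }
        String cql = "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + String.join(" AND ", conditions);
        return new Statement(cql, values);
    }

    public static void main(String[] args) {
        Map<String, Object> set = new LinkedHashMap<>();
        set.put("value3", ""); // explicitly cleared via StringValue
        set.put("value4", 0);  // explicitly zeroed via Int32Value
        Map<String, Object> key = new LinkedHashMap<>();
        key.put("id", "t1");
        Statement st = buildUpdate("test", set, key);
        System.out.println(st.cql);    // UPDATE test SET value3 = ?, value4 = ? WHERE id = ?
        System.out.println(st.values); // [, 0, t1]
    }
}
```

Because only present fields reach the SET clause, an explicit "" or 0 is written to the table, while a field the client did not set leaves the stored value untouched. Values are passed through bind markers rather than concatenated into the CQL string.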

4. Dynamically generate queries from gRPC models

Since Cassandra was used as the data store, I had to implement CRUD operations on Cassandra tables in its own query language, CQL.

Cassandra's CQL is largely SQL-like, so it is relatively intuitive to write. However, developers must also be aware of queries that cannot be expressed in SQL, such as CAS (compare-and-set) queries for concurrent update control, UPDATE statements on deeply nested items (frozen UDTs), and adding or removing Map and Set elements. To spare them from implementing these, I hid the processing behind a layer that performs CRUD on the data store when given a gRPC model class as an argument (something like an O/R mapper for KVS).

As in the custom option example, the key to generating queries dynamically from a model class is handling FieldDescriptors through the Message type, which makes the processing general-purpose. When designing common processing for your gRPC models, keep the Message type in mind.
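For reference, the CQL forms mentioned above that have no SQL counterpart look roughly like the strings below. This is a minimal string-building sketch, not the author's mapper; the helper names and the `users`/`tags`/`version` columns are hypothetical, and the collection value and key are supplied through bind markers.

```java
public class CqlCollectionUpdates {

    // Adds elements to a set or list column, or entries to a map column.
    // Bind the first ? to a collection (Set, List or Map) and the second to the key value.
    static String addElements(String table, String column, String keyColumn) {
        return "UPDATE " + table + " SET " + column + " = " + column + " + ? WHERE " + keyColumn + " = ?";
    }

    // Removes the given elements from a set or list column.
    static String removeElements(String table, String column, String keyColumn) {
        return "UPDATE " + table + " SET " + column + " = " + column + " - ? WHERE " + keyColumn + " = ?";
    }

    // Turns an UPDATE into a lightweight transaction (CAS): Cassandra applies it
    // only if the current value of conditionColumn equals the bound value.
    static String withCasCondition(String updateCql, String conditionColumn) {
        return updateCql + " IF " + conditionColumn + " = ?";
    }

    public static void main(String[] args) {
        System.out.println(addElements("users", "tags", "user_id"));
        // UPDATE users SET tags = tags + ? WHERE user_id = ?
        System.out.println(withCasCondition(removeElements("users", "tags", "user_id"), "version"));
        // UPDATE users SET tags = tags - ? WHERE user_id = ? IF version = ?
    }
}
```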

4-1. CQL SELECT statement implementation example


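import static com.datastax.driver.core.querybuilder.QueryBuilder.eq;

import com.datastax.driver.core.querybuilder.BuiltStatement;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import com.datastax.driver.core.querybuilder.Select;
import com.google.protobuf.Descriptors;
import com.google.protobuf.Message;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

	public BuiltStatement select(Message message) {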
		BuiltStatement select;
		try {
			//Table name set
			String table = message.getDescriptorForType().getOptions().getExtension(CustomOption.entityOptions)
					.getTableName();

			//CQL generation
			Select.Selection selection = QueryBuilder.select();
			Map<String, Object> partitionKeyMap = new HashMap<>();

			for (Descriptors.FieldDescriptor fds : message.getDescriptorForType().getFields()) {

				//SELECT clause creation
				if (fds.getName().equals("select_enum")) {
					if (message.getRepeatedFieldCount(fds) > 0) {
						IntStream.range(0, message.getRepeatedFieldCount(fds)).forEach(
								i -> selection.column(message.getRepeatedField(fds, i).toString()));
					} else {
						selection.all();
					}
				}

				//Partition key extraction
				if (fds.getOptions().getExtension(CustomOption.attributeOptions).getPartitionKey() > 0
						|| fds.getOptions().getExtension(CustomOption.attributeOptions).getClusteringKey() > 0) {
					partitionKeyMap.put(fds.getName(), message.getField(fds));
				}
			}

			//FROM clause generation
			select = selection.json().from(getTableMetadata(table));

			//WHERE clause creation
			for (Map.Entry<String, Object> entry : partitionKeyMap.entrySet()) {

				Object value = entry.getValue();

				if (value instanceof String) {
					((Select) select).where(eq(entry.getKey(), value));
				}
				// ...Type discrimination processing omitted (further else-if branches per type)
				else {
					logger.debug("Unsupported partition key type");
					throw new RuntimeException("unsupported type");
				}
			}
			return select;
		} catch (Exception e) {
			e.printStackTrace();
			throw new RuntimeException(e);
		}
	}

Besides Cassandra, Elasticsearch is also used as a full-text search engine. Queries sent to Elasticsearch are likewise generated dynamically from the gRPC model through the Message class, so application developers can perform CRUD on the data stores without writing queries directly.

5. Convenient processing tips using the gRPC model

As mentioned briefly above, here are some tips to keep in mind when working with gRPC models. They are implemented with gRPC-Java, so please treat them as a reference when implementing in other languages. (I ran out of time, so I'll add more tips later...)

5-1. Output a gRPC model in JSON format

JsonFormat can print a gRPC model as JSON. With preservingProtoFieldNames(), the field names defined in the proto are output as-is; without it, they are output in lowerCamelCase, so choose according to your purpose.


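import com.google.protobuf.util.JsonFormat;

// model: any gRPC model; print() throws InvalidProtocolBufferException
String protoNames = JsonFormat.printer().preservingProtoFieldNames().print(model); // Field names as defined in the proto
String camelCase = JsonFormat.printer().print(model);                             // Field names in lowerCamelCase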
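The camel-case names come from each field's JSON name. When a proto field has no explicit `json_name` option, the default is derived by dropping every underscore and upper-casing the following character. The sketch below reproduces that default rule for illustration only; `toJsonName` is a hypothetical helper, not part of JsonFormat.

```java
public class JsonNames {

    // Default json_name rule: drop each '_' and upper-case the next character.
    // An explicit json_name option in the proto file would override this.
    static String toJsonName(String protoFieldName) {
        StringBuilder sb = new StringBuilder(protoFieldName.length());
        boolean capitalizeNext = false;
        for (char c : protoFieldName.toCharArray()) {
            if (c == '_') {
                capitalizeNext = true;
            } else if (capitalizeNext) {
                sb.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonName("user_id"));   // userId
        System.out.println(toJsonName("user_name")); // userName
        System.out.println(toJsonName("parentNm"));  // parentNm
    }
}
```

This is why a field declared as `user_id` appears as `userId` in the default output but stays `user_id` with preservingProtoFieldNames().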

5-2. Determining field types in a gRPC model


for (Descriptors.FieldDescriptor fds : model.getDescriptorForType().getFields()) {
    if (fds.isMapField()) {
        // Map field (map<K, V>); check this first because map fields are also repeated
    } else if (fds.isRepeated()) {
        // Repeated field (list-like collection)
    } else {
        // Singular (non-collection) field
    }
}

5-3. Get the value by specifying the field name from the Message class


String val = (String) messageModel.getField(messageModel.getDescriptorForType().findFieldByName("Field name"));

5-4. Merge between gRPC models

An example of merging values from one model into another. With ignoringUnknownFields(), fields that do not exist in the merge destination are ignored instead of causing an error.


// source: model to copy values from; target: Message.Builder of the merge destination model
JsonFormat.parser().ignoringUnknownFields().merge(
		JsonFormat.printer().preservingProtoFieldNames().print(source), target);

[JAVA] I tried to summarize the key points of gRPC design and development