The service "Torch Serve" that can easily convert the model made with PyTorch into an API seemed to be useful, so I actually touched it and made an article. This time, the content is like a tutorial for using the basic functions of Torch Serve.
TorchServe is an open source tool developed jointly by Facebook and AWS. It lets you publish a model created with PyTorch as an API without writing any of the API layer yourself. All you need to serve a model are ordinary files such as the PyTorch model definition and a weight file. In addition to the inference API, APIs for model management and usage metrics are provided automatically.
In this article, based on the TorchServe Quick Start, I explain how to use TorchServe on Azure. (Parts of the Quick Start that do not work as written are corrected here.)
Besides the Quick Start, I also referred to "I tried hosting PyTorch's deep learning model using TorchServe". (Thank you!) To be honest, there is a lot of overlap with that article, so let me first list how this article differs:
--Starts from building the environment on Azure
--Introduces two ways to call the inference API (REST and gRPC)
--Covers all three TorchServe APIs
--Omits the Docker part (I plan to write about it separately)
First, a brief look at TorchServe's architecture. (From the TorchServe Quick Start.)
This article walks through the steps from environment setup to serving a model as an API, in the following order.
-Environment construction on Azure
--Create an Azure VM and install the required packages
-Model Archive
--You must archive your model before you can deploy it with TorchServe.
--Specifically, use the command-line tool torch-model-archiver to convert the model into a model archive file (.mar).
-Start TorchServe server
--Start the TorchServe server and serve the archived model as an API.
-Using the APIs
--Send a request to the API.
-SSL for the API
--Enable SSL so the API can be served over HTTPS.
First, create a VM on Azure. The main settings are as follows. Don't forget to set up SSH access so that you can connect to the VM later.
Setting | Value |
---|---|
Image | Ubuntu Server 18.04 LTS - Gen1 |
Region | South Central US |
size | Standard_NC6_Promo - 6 vcpu |
Other | Default |
Since the detailed steps for creating an Azure VM are outside the scope of this article, I will omit them. Please refer to the Microsoft documentation "Generate and store SSH keys in the Azure portal".
Once you have created the Azure VM, connect to it over SSH. I recommend Visual Studio Code's Remote Explorer for connecting to an Azure VM via SSH. The specific procedure is explained clearly in "Develop on EC2 using VS Code's Remote - SSH feature", so I will leave it to that article. It uses EC2 as the example, but the steps are basically the same for Azure.
After connecting to the Azure VM over SSH, install the CUDA-related packages with the following commands. Reference: NVIDIA download guide
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
After the installation is complete, make sure it works.
input
nvidia-smi
output
azureuser@torch-serve-vm:~/torchserve-examples$ nvidia-smi
Fri Jan 1 09:13:03 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000FD24:00:00.0 Off | 0 |
| N/A 45C P0 71W / 149W | 3737MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 96934 C /usr/bin/python3 1377MiB |
| 0 N/A N/A 96935 C /usr/bin/python3 1257MiB |
| 0 N/A N/A 96936 C /usr/bin/python3 1097MiB |
+-----------------------------------------------------------------------------+
Then install the Java packages needed to run TorchServe.
sudo apt-get install openjdk-11-jdk
update-java-alternatives -l
Next, install the Python-related packages. TorchServe requires Python 3.8 or higher, but the Azure VM comes with Python 3.6 by default, so we will install Python 3.8 and use it inside a virtual environment.
First, install the packages needed to use Python.
sudo apt-get update
sudo apt-get install -y python3.8
sudo apt-get autoremove -y
sudo apt-get install python3-venv python3.8-venv python3.8-dev -y
Next, create a working directory for the TorchServe-related work and move into it.
mkdir torchserve-sample
cd torchserve-sample
Create a virtual environment with the following commands and activate it.
Create and activate the virtual environment
python3.8 -m venv py38ts
source py38ts/bin/activate
The prompt looks like this once the virtual environment is active.
Virtual environment
(py38ts) azureuser@vm2-torchserve:~/torchserve-sample$
Just in case, run Python and confirm that the version is 3.8. If the output looks like the following, the environment has been set up without any problems.
Python version check
(py38ts) azureuser@vm-torchserve:~/torchserve-sample$ python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
After this, we install the various Python packages with pip, but installing sentencepiece will fail without build tools, so install them first with the following command. (Reference: "Stumbled when installing sentencepiece on Ubuntu")
Preparation for sentencepiece installation
sudo apt-get install cmake build-essential pkg-config libgoogle-perftools-dev -y
Install the required Python packages with pip.
Python related package installation
# update pip
python -m pip install -U pip
# install the required Python packages
pip install torch==1.7.1 torchvision==0.8.2 torchtext==0.8.1 torchaudio==0.7.2 sentencepiece psutil future pillow captum packaging transformers
# install TorchServe and the model archiver
pip install torchserve torch-model-archiver
Once these are installed, TorchServe itself is ready to use.
Finally, clone the TorchServe repository to get the sample code used in this tutorial.
Get the TorchServe sample code
git clone https://github.com/pytorch/serve.git
The environment for the steps in this tutorial is now ready.
With TorchServe, the target model must be archived before it can be served as an API. Here, I walk through the model archiving procedure using the sample files published in the TorchServe GitHub repository and a trained PyTorch model.
As the model to archive, we use DenseNet for image classification as the example this time. First, download the densenet weight file published by PyTorch.
Weight file download
wget https://download.pytorch.org/models/densenet161-8d451a50.pth
The densenet weight file is now in the current directory. Running the ls command confirms that the model has been downloaded, as shown below.
input
ls
output
densenet161-8d451a50.pth py38ts
Then archive the densenet161 model. Concretely, this means converting the set of files the model needs for inference into a single .mar file. To archive the model, use the command-line tool torch-model-archiver.
Execute the following command to put the model in a format that can be hosted on TorchServe.
Archive the model
# Create a directory to store the .mar files
mkdir model_store
# Use torch-model-archiver to create a .mar file from the model script and the trained parameters
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--export-path model_store \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
You can check each torch-model-archiver parameter with the following command. They are also described in the TorchServe documentation (https://github.com/pytorch/serve/blob/master/model-archiver/README.md#torch-model-archiver-for-torchserve).
Parameter confirmation
torch-model-archiver -h
The meaning of each parameter set this time is described below.
Parameter | Meaning |
---|---|
--model-name | The name under which TorchServe handles the model. On TorchServe, this model is treated as "densenet161". |
--version | The version of the model to register. |
--model-file | The .py file containing the model class implemented in PyTorch. This time I use the DenseNet class provided in the TorchServe samples. For your own model, specify a model file whose class inherits from PyTorch's torch.nn.Module. When using TorchScript, this parameter does not need to be specified. |
--serialized-file | The trained parameter file of the model to deploy. In most cases a file with the .pth extension, or the .pt extension for TorchScript, is specified. This time I use the densenet161-8d451a50.pth downloaded earlier. |
--export-path | The folder to which the model archive file (.mar) is written. This time the model_store folder created above is specified. |
--handler | A .py file that handles model instantiation, pre- and post-processing of inputs and outputs around inference, and any additional processing at inference time. You can specify one of the default handlers (image_classifier / object_detector / text_classifier / image_segmenter) or create a custom handler (see the sketch after this table). This time the default handler image_classifier is used. If you use a custom handler, specify the path to the handler file. |
--extra-files | Any files the model depends on other than the model file, weight parameters, and handler implemented in PyTorch. This time index_to_name.json is specified: a file that maps the index (number) of the inference result to its class name (string), used by the default handler. |
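For reference, here is a minimal sketch of what a custom handler could look like. This is a hypothetical example (this tutorial uses the built-in image_classifier handler); the method names follow the BaseHandler class shipped with the torchserve package, and the preprocessing shown is only illustrative.
Custom handler sketch (hypothetical)
# custom_handler.py -- hypothetical custom handler sketch (not used in this tutorial)
import io

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class MyImageClassifierHandler(BaseHandler):
    """Decodes image payloads, runs the model, and returns top-5 scores per request."""

    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    def preprocess(self, data):
        # Each request in the batch carries its payload under "data" or "body".
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def postprocess(self, outputs):
        # Must return one result per request in the batch.
        # Keys here are class indices; a real handler would map them to names
        # (for example via index_to_name.json, as the default handler does).
        probs = torch.nn.functional.softmax(outputs, dim=1)
        top5 = torch.topk(probs, 5)
        return [
            {str(idx.item()): val.item() for idx, val in zip(indices, values)}
            for values, indices in zip(top5.values, top5.indices)
        ]

If you archived a model with a handler like this, you would pass the path of this file to --handler instead of the default handler name.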
You can see that densenet161.mar (the densenet161 archive file) has been saved to the model_store directory specified as the export path, as shown below.
input
ls model_store
output
densenet161.mar
Next, start the TorchServe server and actually run inference.
Start the TorchServe server with the following command. This hosts the densenet161 model as an API.
Server startup
torchserve --start --ncs --model-store model_store --models densenet161=densenet161.mar
The meaning of the above command is as follows.
Parameter | Meaning |
---|---|
--start | Starts TorchServe. To stop it, use the --stop option instead. |
--ncs | Short for no-config-snapshots; when set, the server does not save snapshots of its running state. With this option the model specified at startup can be used immediately, which is why it is set this time. (I am not entirely sure of the reason, so I will look into it further.) Details on TorchServe snapshots are described here. |
--model-store | The directory where the .mar files are stored. This time I created a directory called model_store and saved the archive there. |
--models | The model(s) the server loads, specified as model_name=model_file.mar. Multiple models can be specified, separated by spaces. The model name is "densenet161", the name given when archiving the model; the model path is relative to the folder specified with --model-store. |
You can check each torchserve parameter with the following command; they are also described in the TorchServe documentation here.
Help display for torchserve
torchserve -h
With this, the densenet161 model has been served as an API without implementing any of the API code.
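As a quick sanity check that the server is up, you can also call the /ping endpoint of the inference API. Below is a minimal sketch in Python, assuming the requests package is installed (pip install requests); curl http://127.0.0.1:8080/ping works just as well.
Health check against the inference API
# check that TorchServe is healthy before sending inference requests
import requests

print(requests.get("http://127.0.0.1:8080/ping").json())  # expected: {"status": "Healthy"}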
TorchServe provides three APIs, each of which is (by default) distinguished by its port number.
API type | Overview | Default address | Documentation |
---|---|---|---|
Inference API | The endpoint used to run inference with a model. | http://127.0.0.1:8080 | Inference API |
Management API | The API used for model management, such as registering models, checking their status, and setting the number of workers. | http://127.0.0.1:8081 | Management API |
Metrics API | The endpoint for checking the metrics of a specified model. Through this API you can also view model metrics on a dashboard. | http://127.0.0.1:8082 | Metrics API |
For security reasons, these endpoints are accessible only from localhost by default. The settings needed to access them from outside localhost are described later.
Below, I will introduce each of the above three APIs a little more.
First, the inference API. As the name implies, it is the endpoint used to run inference with a model. The default address is http://127.0.0.1:8080. The TorchServe documentation is here (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/inference_api.md).
Since the model we served is an image classifier, first download an image to run inference on.
Image download
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
The downloaded image is a picture of a kitten like this. (From the TorchServe Quick Start.)
TorchServe actually supports two types of inference API: REST and gRPC.
REST
First, the REST API. Here we use curl to send a request to it.
Send a request to the classification API with the following command to classify the image above.
Inference to REST API
curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
Then, the result will be returned in this way.
Inference result
{
"tabby": 0.5237818360328674,
"tiger_cat": 0.18530148267745972,
"lynx": 0.15431325137615204,
"tiger": 0.05681790038943291,
"Egyptian_cat": 0.047028690576553345
}
For details on the parameters available when sending requests to the REST API, refer to the documentation.
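The same request can also be sent from Python instead of curl. Here is a minimal sketch using the requests package (assumed to be installed; not part of the original tutorial), sending the raw image bytes in the request body just like curl -T.
Inference request from Python
import requests

# send the image bytes to the densenet161 prediction endpoint
with open("kitten_small.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/densenet161",
        data=f,  # raw image bytes in the body, equivalent to curl -T
    )

print(response.json())  # e.g. {"tabby": 0.52, "tiger_cat": 0.19, ...}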
gRPC
Next, the gRPC API. If you are wondering what gRPC is in the first place, I recommend the article "I've just started gRPC, so I've summarized it in an easy-to-understand manner". In a nutshell, for the purposes of this article, think of it as a counterpart to REST where you call methods on the API server in the same way as local methods, rather than hitting URLs.
Install the Python package to call the gRPC API from Python.
grpc package installation
pip install -U grpcio protobuf grpcio-tools
Next, use the sample interface definition files (.proto) published in the TorchServe GitHub repository to generate the server and client code.
Generate code for gRPC from torchserve sample
python -m grpc_tools.protoc --proto_path=./serve/frontend/server/src/main/resources/proto/ --python_out=./serve/ts_scripts --grpc_python_out=./serve/ts_scripts ./serve/frontend/server/src/main/resources/proto/inference.proto ./serve/frontend/server/src/main/resources/proto/management.proto
Once the server and client code has been generated, run the client script to perform inference.
Inference execution for gRPC API
python ./serve/ts_scripts/torchserve_grpc_client.py infer densenet161 kitten_small.jpg
The result is the same as when requesting with curl.
Inference result
{
"tabby": 0.5237818360328674,
"tiger_cat": 0.18530148267745972,
"lynx": 0.15431325137615204,
"tiger": 0.05681790038943291,
"Egyptian_cat": 0.047028690576553345
}
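For reference, the sample client script essentially does the following. This is a sketch using the generated stubs; it assumes the generated inference_pb2 / inference_pb2_grpc modules are importable from where you run it (they were generated into serve/ts_scripts), and that the gRPC inference API listens on the default port 7070. Service and field names follow the proto files shipped with TorchServe and may differ between versions.
gRPC inference sketch
import grpc

# generated into serve/ts_scripts by grpc_tools.protoc above;
# run from that directory or adjust sys.path accordingly
import inference_pb2
import inference_pb2_grpc

with open("kitten_small.jpg", "rb") as f:
    image_bytes = f.read()

channel = grpc.insecure_channel("127.0.0.1:7070")  # default gRPC inference port
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
response = stub.Predictions(
    inference_pb2.PredictionsRequest(model_name="densenet161", input={"data": image_bytes})
)
print(response.prediction.decode("utf-8"))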
Next, the management API. It is used for model management tasks such as registering models, checking their status, and setting the number of workers. The default address is http://127.0.0.1:8081. Details can be found in the documentation (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/management_api.md).
First, let's use this API to check which models are currently registered with TorchServe.
Model confirmation
curl http://127.0.0.1:8081/models
You can see that densenet161 is registered, as shown below.
Results of registered models
{
"models": [
{
"modelName": "densenet161",
"modelUrl": "densenet161.mar"
}
]
}
Next, let's register a new model. Download and archive the vgg11 model with the following commands.
Download and archive vgg11
wget https://download.pytorch.org/models/vgg11-bbd30ac9.pth
torch-model-archiver \
--model-name vgg11 \
--version 1.0 \
--model-file ./serve/examples/image_classifier/vgg_11/model.py \
--serialized-file vgg11-bbd30ac9.pth \
--export-path model_store \
--handler ./serve/examples/image_classifier/vgg_11/vgg_handler.py \
--extra-files ./serve/examples/image_classifier/index_to_name.json
After archiving the model, use the management API to register it. Once registered, the newly archived vgg11 can be used for inference.
Model registration
curl -X POST "http://127.0.0.1:8081/models?url=vgg11.mar&initial_workers=1"
Model registration must be done with the POST method. The query parameters set here have the following meanings.
Parameter | Meaning |
---|---|
url | The path of the archive file of the model to register, relative to the directory given with --model-store at server startup (this time the model_store folder). A URL can also be specified if the .mar file is hosted on the web. |
initial_workers | The number of workers to allocate for inference with this model. Note that inference is not possible while the number of workers is 0. |
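These management operations can also be driven from Python. Below is a minimal sketch using the requests package (assumed to be installed), covering the register, scale-workers, and describe endpoints of the management API; the scale and describe calls are extras not shown in the curl examples here.
Management API from Python
import requests

MANAGEMENT = "http://127.0.0.1:8081"

# register the archived vgg11 model with one initial worker (same as the curl call above)
requests.post(f"{MANAGEMENT}/models", params={"url": "vgg11.mar", "initial_workers": 1})

# scale the number of workers for the model later if needed
requests.put(f"{MANAGEMENT}/models/vgg11", params={"min_worker": 2})

# inspect the registered model
print(requests.get(f"{MANAGEMENT}/models/vgg11").json())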
To run inference with the vgg11 model you just registered, change the model name in the URL path and send the request as shown below.
Inference with vgg11
curl http://127.0.0.1:8080/predictions/vgg11 -T kitten_small.jpg
The inference result is as follows.
result
{
"tabby": 0.3414705991744995,
"Egyptian_cat": 0.3293682634830475,
"lynx": 0.1927071064710617,
"tiger_cat": 0.097527414560318,
"Persian_cat": 0.009637275710701942
}
Finally, the metrics API. It lets you retrieve, for example, the time taken for inference and the number of times inference was executed.
Let's actually send a request.
Request to the metrics API
curl http://127.0.0.1:8082/metrics
You can get the cumulative queue time, request count, and inference execution time for densenet161 and vgg11 as follows:
result
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.07813245195E8
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 94.401
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 2.0
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 1.0
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.0795621729599999E8
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 30823.147
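Since the response is plain Prometheus text, it is also easy to consume programmatically. Here is a minimal sketch in Python with the requests package (assumed installed) that picks out just the per-model request counters.
Filtering the metrics from Python
import requests

metrics_text = requests.get("http://127.0.0.1:8082/metrics").text

# print only the per-model inference request counters
for line in metrics_text.splitlines():
    if line.startswith("ts_inference_requests_total"):
        print(line)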
Although not done in this article, installing an additional service makes it possible to display these metrics on a dashboard like the one below. (Quoted from the TorchServe documentation.)
Finally, I will also introduce how to enable SSL for the APIs. If the server is already running, first stop it with the following command.
Stop the server
torchserve --stop
Generate a private key and certificate (and public key) for SSL communication.
Private key and certificate generation
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
This time, I set each value as follows.
Settings when creating a private key and certificate
Country Name (2 letter code) [AU]:JP
State or Province Name (full name) [Some-State]:Tokyo
Locality Name (eg, city) []:Shinagawa
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Org
Organizational Unit Name (eg, section) []:OrgUnit
Common Name (e.g. server FQDN or YOUR name) []:torchserve-sample
Email Address []:[email protected]
Next, create a file called config.properties in the current directory. This is a configuration file describing server-related settings. For details on what can be set in config.properties, refer to the documentation.
Create config.properties
touch config.properties
Configure the config.properties you just created as follows.
config.properties
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
private_key_file=mykey.key
certificate_file=mycert.pem
In the file above, the addresses are changed so that the endpoints are served over SSL: the port numbers are changed to clearly indicate HTTPS, and the private key and certificate created earlier are specified for SSL communication. (If you also need access from outside localhost, change the bind addresses here as well.)
Then, start the torchserve server again with the following command.
Start torch serve
torchserve --start --model-store model_store --models densenet161=densenet161.mar vgg11=vgg11.mar --ts-config config.properties
There are two major differences from when torchserve was started the first time.
-Specifying --ts-config config.properties
--The first time I used the default settings and did not specify a configuration file, but this time the config.properties created earlier is specified, so torchserve starts the server based on those settings.
-Specifying --models densenet161=densenet161.mar vgg11=vgg11.mar
--The first time only densenet161 was loaded, but this time the newly added vgg11 is also served as an API.
Now let's send a request to the SSL-enabled API. The requests for densenet161 and vgg11 are as follows.
Request execution to SSL-enabled API
curl "https://127.0.0.1:8443/predictions/vgg11" -T kitten_small.jpg -k
curl "https://127.0.0.1:8443/predictions/densenet161" -T kitten_small.jpg -k
This article focused on introducing the basic features, so I could not cover the following topics. I would like to write separate articles about them.
--Deployment using Docker
--Deploying the API with authentication
--Deployment to environments other than local (Azure Kubernetes Service, etc.)
--Various logging settings
--TorchServe allows you to deploy a model without implementing the API part
--Three APIs are automatically created: inference, model management, and metrics.
--In particular, the inference API can be used not only with REST but also with gRPC.
--By describing the settings in config.properties, SSL can be enabled.