I tried turning a PyTorch model into an API in an Azure environment using TorchServe

The service "Torch Serve" that can easily convert the model made with PyTorch into an API seemed to be useful, so I actually touched it and made an article. This time, the content is like a tutorial for using the basic functions of Torch Serve.

Introduction

TorchServe is an open source serving tool jointly developed by Facebook and AWS. It lets you publish a model created with PyTorch as an API without implementing any of the API code yourself. To turn a model into an API, all you need are ordinary files: the model implementation in PyTorch and a weight file. In addition to the inference API, APIs for model management and for metrics on model usage are prepared automatically.

In this article, based on the TorchServe Quick Start, I explain how to use TorchServe on Azure. (The parts of the Quick Start that do not work as written have been corrected in this article.)

Referenced page

Besides the Quick Start, I also referred to "I tried hosting PyTorch's deep learning model using TorchServe" (thank you!). To be honest, there is a lot of overlap with that article, so here are the points where this article differs:

- Starts from building the Azure environment
- Introduces two ways to use the inference API
- Covers all three APIs at a high level
- Omits the Docker part (I plan to write about it separately)

TorchServe Architecture Overview

First, here is a brief look at TorchServe's architecture.

architecture.jpg (from the TorchServe Quick Start)

Procedure

This time, we will go from environment construction to turning the model into an API, in the following order.

- Environment construction on Azure
  - Create an Azure VM and install the required packages
- Model archive
  - Before a model can be deployed with TorchServe, it must be archived.
  - Specifically, the command line tool torch-model-archiver converts the model into a model archive file (.mar extension).
- Start the TorchServe server
  - Start the TorchServe server and turn the archived model into an API.
- Use the APIs
  - Send requests to the APIs.
- Enable SSL
  - Set up SSL for the APIs.

Environment construction on Azure

Create an Azure VM

First, create a VM on Azure. The main settings are as follows. Also, don't forget to set things up so that you can connect via SSH.

Setting item: Value
Image: Ubuntu Server 18.04 LTS - Gen1
Region: South Central US
Size: Standard_NC6_Promo (6 vCPUs)
Other: Default

Since it is outside the scope of this article, I will omit the detailed steps for creating an Azure VM. Please refer to "Generate and store SSH keys in the Azure portal" in the Microsoft documentation.

Connect to the created Azure VM with SSH

Once you have created the Azure VM, connect to it with SSH. Visual Studio Code's Remote Explorer is a convenient tool for connecting to an Azure VM over SSH. Again, the specific procedure is explained well in "Develop on EC2 using VS Code's Remote - SSH function", so I will leave it to that article. It uses EC2 as the example, but the steps are basically the same for Azure.

Installation of CUDA related packages

After connecting to the Azure VM over SSH, install the CUDA related packages with the following commands. Reference: NVIDIA download guide

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

After the installation is complete, make sure it works.

input


nvidia-smi

output


azureuser@torch-serve-vm:~/torchserve-examples$ nvidia-smi
Fri Jan  1 09:13:03 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000FD24:00:00.0 Off |                    0 |
| N/A   45C    P0    71W / 149W |   3737MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     96934      C   /usr/bin/python3                 1377MiB |
|    0   N/A  N/A     96935      C   /usr/bin/python3                 1257MiB |
|    0   N/A  N/A     96936      C   /usr/bin/python3                 1097MiB |
+-----------------------------------------------------------------------------+

Installation of Java related packages

Then install the Java packages needed to run TorchServe.

sudo apt-get install openjdk-11-jdk
update-java-alternatives -l

Installation of Python related packages

Install the Python related packages. TorchServe requires Python 3.8 or higher, but the Azure VM created above has Python 3.6 installed by default. Therefore, we will install Python 3.8 and prepare a virtual environment that uses it.

First, install the packages needed to use Python.

sudo apt-get update
sudo apt-get install -y python3.8
sudo apt-get autoremove -y
sudo apt-get install python3-venv python3.8-venv python3.8-dev -y 

Next, create and move a working directory for torchserve-related work.

mkdir torchserve-sample
cd torchserve-sample

Create a virtual environment with the following command and enter the virtual environment.

Activate virtual environment


python3.8 -m venv py38ts
source py38ts/bin/activate

The following is the state of entering the virtual environment.

Virtual environment


(py38ts) azureuser@vm2-torchserve:~/torchserve-sample$ 

Just in case, run Python and make sure the version is 3.8. If it looks like the following, the environment has been built without any problems.

Python version check


(py38ts) azureuser@vm-torchserve:~/torchserve-sample$ python
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

After this, we install various Python packages with pip, but an error occurs when installing sentencepiece, so first install the build tools with the following command. (Reference: "Stumbled when installing sentencepiece on Ubuntu")

Preparation for sentencepiece installation


sudo apt-get install cmake build-essential pkg-config libgoogle-perftools-dev -y 

Pip install the required Python packages.

Python related package installation


# Update pip
python -m pip install -U pip
# Install the Python packages
pip install torch==1.7.1 torchvision==0.8.2 torchtext==0.8.1 torchaudio==0.7.2 sentencepiece psutil future pillow captum packaging transformers
# TorchServe related packages
pip install torchserve torch-model-archiver

Once everything up to this point is installed, TorchServe itself is ready to use.
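As a quick sanity check, you can confirm inside the virtual environment that the key packages import correctly. This is only a minimal sketch; the versions printed simply reflect what pip resolved above.

Package import check


# sanity_check.py -- quick check that the key packages installed correctly.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())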

Finally, git clone the torchserve repository to get sample code to run the tutorial.

Get sample code for torchserve


git clone https://github.com/pytorch/serve.git

You have now created an environment for the steps in this tutorial.

Model archive

With TorchServe, the target model must be archived before it can be turned into an API. Here, I walk through the model archiving procedure using the sample files published on TorchServe's GitHub and a trained PyTorch model.

Download public weight files

For the model to be archived, we will use densenet for image classification as an example this time. First, download the densenet weight file published by PyTorch.

Weight file download


wget https://download.pytorch.org/models/densenet161-8d451a50.pth

The densenet weights file now exists in the current directory. If you execute the ls command, you can confirm that the model has been downloaded as shown below.

input


ls

output


densenet161-8d451a50.pth  py38ts
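If you want to make sure the download is usable, you can try loading it once with torch.load. This is a small sketch and assumes you run it inside the py38ts environment where torch was installed above.

Load the downloaded weights


# check_weights.py -- confirm the downloaded file is a loadable state_dict.
import torch

state_dict = torch.load("densenet161-8d451a50.pth", map_location="cpu")
print(type(state_dict))                 # a dict-like mapping of parameter names to tensors
print(len(state_dict), "entries")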

Convert model to .mar format

Next, archive the densenet161 model. Concretely, this means converting the set of files used for inference with the model into a .mar file.

When archiving the model, use the command line tool torch-model-archiver. Execute the following command to put the model in a format that can be hosted on TorchServe.

Archive the model


# Create a directory to save .mar format files
mkdir model_store
# Use torch-model-archiver to create a .mar file from the model script file and the trained parameters
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--export-path model_store \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier

You can check each parameter of torch-model-archiver with the following command. They are also described in the TorchServe documentation (https://github.com/pytorch/serve/blob/master/model-archiver/README.md#torch-model-archiver-for-torchserve).

Parameter confirmation


torch-model-archiver -h 

The meaning of each parameter set this time is described below.

--model-name: The name under which TorchServe handles the model. On TorchServe, this model is registered as "densenet161".
--version: The version of the model to register.
--model-file: The .py file containing the model class implemented in PyTorch. This time, the densenet class provided as a TorchServe sample is used. When creating your own, specify a model file whose class inherits torch.nn.Module. When using TorchScript, this parameter is not required.
--serialized-file: The trained parameter file of the model to deploy. In most cases a file with the .pth extension (or .pt for TorchScript) is specified. This time, the densenet161-8d451a50.pth downloaded earlier is used.
--export-path: The folder path to which the model archive file (.mar extension) is output. This time, the model_store folder created above is specified.
--handler: The handler performs the following processing: instantiating the model, converting inputs and outputs (pre-processing and post-processing) at inference time, and any additional processing at inference time. You can specify one of the default handlers (image_classifier / object_detector / text_classifier / image_segmenter) or create a custom handler, in which case you set the path to the handler .py file. This time, the default handler image_classifier is used.
--extra-files: Files that the model file, weight parameters, and handler depend on, other than those already specified. This time, index_to_name.json is specified. It is a file that maps the index (numeric value) of the inference result to a class name (string), and is used by the default handler.
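For reference, here is what a minimal custom handler could look like. This is only a sketch under assumptions: the file name custom_handler.py and the class MyImageHandler are hypothetical, and it assumes the class-based style where you subclass ts.torch_handler.base_handler.BaseHandler and override preprocess/postprocess; check the custom handler documentation for the exact contract of your TorchServe version. The default image_classifier handler used in this article already does all of this for you.

Custom handler sketch


# custom_handler.py -- minimal illustration only; names are hypothetical.
import io

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class MyImageHandler(BaseHandler):
    # Resize/crop incoming images the way torchvision classification models expect
    image_processing = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    def preprocess(self, data):
        # data is a list of requests; raw image bytes arrive under "data" or "body"
        images = []
        for row in data:
            raw = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(raw)).convert("RGB")
            images.append(self.image_processing(image))
        return torch.stack(images)

    def postprocess(self, inference_output):
        # Return one entry per request: here, just the index of the top class
        return inference_output.argmax(dim=1).tolist()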

Check archive results

You can confirm that densenet161.mar (the densenet archive file) has been saved in the model_store directory specified as the output destination, as shown below.

input



ls model_store

output


densenet161.mar
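If you are curious what went into the archive, a .mar file is essentially a zip archive, so you can peek inside it from Python. This is only a sketch and assumes the default archive format (zip) was used above.

Peek inside the .mar file


# inspect_mar.py -- list the files packed into the archive (model file, weights, manifest, ...).
import zipfile

with zipfile.ZipFile("model_store/densenet161.mar") as mar:
    for name in mar.namelist():
        print(name)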

Turning the model into an API with TorchServe

Next, start the TorchServe server with the following command and actually make inferences.

Launch TorchServe

Start the TorchServe server with the following command. When you do this, the densenet161 API will be hosted.

Server startup


torchserve --start --ncs --model-store model_store --models densenet161=densenet161.mar

The meaning of the above command is as follows.

--start: Starts TorchServe. Incidentally, to stop it you use the --stop option.
--ncs: Short for no-config-snapshots; when set, the state of the running server is not saved. With this set, the model specified at server startup can be used immediately, which is why it is set this time. (I am not entirely sure why, so I will look into it.) Details on TorchServe snapshots are described here.
--model-store: The directory where the files with the .mar extension are saved. This time, a directory called model_store was created and the archive saved there.
--models: Specifies the model(s) the server loads (files with the .mar extension), in the form <model name>=<model path>. Multiple models can be specified separated by spaces. For the model name, specify "densenet161", the name given when the model was archived; for the model path, specify the path relative to the folder given in --model-store.

You can check each parameter of torchserve with the following command; they are also described in the TorchServe documentation.

Help display for torchserve


torchserve -h

With this, we have turned the densenet161 model into an API without implementing any of the API code ourselves.
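Before sending inference requests, it can be handy to confirm that the server is actually up. Here is a minimal sketch from Python using the inference API's /ping endpoint; it assumes the requests library is available (it is not installed by the steps above, so add it with pip first).

Health check


# health_check.py -- check the server and its workers are ready.
import requests

resp = requests.get("http://127.0.0.1:8080/ping")
print(resp.status_code, resp.json())   # expect 200 and {"status": "Healthy"}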

Use of API

TorchServe has three APIs, each of which (by default) is distinguished by its port number.

Inference API: The endpoint used when making inferences with a model. Default address: http://127.0.0.1:8080 (Inference API documentation)
Management API: The API used for model management, such as registering models, checking their status, and setting the number of workers. Default address: http://127.0.0.1:8081 (Management API documentation)
Metrics API: The endpoint for checking the metrics of a specified model; model metrics can also be displayed on a dashboard via this API. Default address: http://127.0.0.1:8082 (Metrics API documentation)

For security reasons, these endpoints are only accessible from localhost by default. The settings for accessing them from outside localhost are described later.

Below, I will introduce each of the above three APIs a little more.

Inference API

First is the inference API. As the name implies, it is the endpoint used when making inferences with a model. The default address is http://127.0.0.1:8080. The TorchServe documentation is here (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/inference_api.md).

This time, the model for image classification has been converted to API, so first download the image to be inferred.

Image download


curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg

The downloaded image is a picture of a kitten like this.

kitten_small.jpg (from the TorchServe Quick Start)

TorchServe actually supports two types of inference API: REST and gRPC.

REST

First is the REST API. Here we use curl to send requests to it.

Send a request to the classification API with the following command to classify the image above.

Inference to REST API


curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg

Then, the result will be returned in this way.

Inference result


{
  "tabby": 0.5237818360328674,
  "tiger_cat": 0.18530148267745972,
  "lynx": 0.15431325137615204,
  "tiger": 0.05681790038943291,
  "Egyptian_cat": 0.047028690576553345
}

Please refer to this document for details of the parameters available when sending requests to the REST API.
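The same request can of course be sent from Python instead of curl. This is a minimal sketch, assuming the requests library has been installed separately (it is not among the packages installed above).

Inference via Python (requests)


# rest_inference.py -- the Python equivalent of the curl request above.
import requests

with open("kitten_small.jpg", "rb") as f:
    image_bytes = f.read()

# Send the raw image bytes to the prediction endpoint
response = requests.post(
    "http://127.0.0.1:8080/predictions/densenet161",
    data=image_bytes,
)
print(response.json())   # e.g. {"tabby": 0.52, "tiger_cat": 0.19, ...}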

gRPC

Next is the gRPC API. If you are wondering what gRPC is in the first place, I think the article "I've just started gRPC, so I've summarized it in an easy-to-understand manner" is easy to follow. In a nutshell, and only as far as this article needs, think of it as a counterpart to REST in which you call methods on the API server the same way you call local methods, rather than hitting URLs.

Install the Python package to call the gRPC API from Python.

grpc package installation


pip install -U grpcio protobuf grpcio-tools

Next, use the sample interface definition file (.proto extension) published on TorchServe's GitHub to generate the code for the server and client.

Generate code for gRPC from torchserve sample


python -m grpc_tools.protoc --proto_path=./serve/frontend/server/src/main/resources/proto/ --python_out=./serve/ts_scripts --grpc_python_out=./serve/ts_scripts ./serve/frontend/server/src/main/resources/proto/inference.proto ./serve/frontend/server/src/main/resources/proto/management.proto

Once the server and client code has been generated, execute the client code to make inferences.

Inference execution for gRPC API


python ./serve/ts_scripts/torchserve_grpc_client.py infer densenet161 kitten_small.jpg

The result is the same as when requesting with curl.

Inference result


{
  "tabby": 0.5237818360328674,
  "tiger_cat": 0.18530148267745972,
  "lynx": 0.15431325137615204,
  "tiger": 0.05681790038943291,
  "Egyptian_cat": 0.047028690576553345
}
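For reference, the following is a rough sketch of what such a call looks like through the generated stubs. It assumes the inference_pb2 / inference_pb2_grpc modules generated above are importable from the working directory, that the gRPC inference endpoint listens on port 7070 (TorchServe's default), and that the service and field names match the sample inference.proto at the time of writing; the bundled torchserve_grpc_client.py is the authoritative version.

gRPC inference sketch


# grpc_inference.py -- rough sketch of a gRPC inference call; message/field names
# follow the sample inference.proto and may differ between TorchServe versions.
import grpc
import inference_pb2
import inference_pb2_grpc


def grpc_infer(model_name: str, image_path: str) -> str:
    with open(image_path, "rb") as f:
        payload = f.read()

    channel = grpc.insecure_channel("localhost:7070")
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    # The request carries the model name and a map of input name -> raw bytes
    request = inference_pb2.PredictionsRequest(
        model_name=model_name,
        input={"data": payload},
    )
    response = stub.Predictions(request)
    return response.prediction.decode("utf-8")


if __name__ == "__main__":
    print(grpc_infer("densenet161", "kitten_small.jpg"))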

Request to model management API

Next is the model management API. This API is used for model management tasks such as registering models, checking their status, and setting the number of workers. The default address is http://127.0.0.1:8081. Details can also be found in this document (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/management_api.md).

First, let's use this API to check the models currently registered in TorchServe.

Model confirmation


curl http://127.0.0.1:8081/models

Then, you can see that densenet161 is registered, as shown below.

Results of registered models


{
  "models": [
    {
      "modelName": "densenet161",
      "modelUrl": "densenet161.mar"
    }
  ]
}

Next, let's register a new model. Download and archive the vgg11 model with the following command.

Download and archive vgg11


wget https://download.pytorch.org/models/vgg11-bbd30ac9.pth
torch-model-archiver \
--model-name vgg11 \
--version 1.0 \
--model-file ./serve/examples/image_classifier/vgg_11/model.py \
--serialized-file vgg11-bbd30ac9.pth \
--export-path model_store \
--handler ./serve/examples/image_classifier/vgg_11/vgg_handler.py \
--extra-files ./serve/examples/image_classifier/index_to_name.json

After archiving the model, it's time to use the model management API to register the model. By registering the model, you will be able to make inferences using the newly archived vgg11.

Model registration


curl -X POST "http://127.0.0.1:8081/models?url=vgg11.mar&initial_workers=1"

When registering the model, it must be done with the POST method. The meanings of the query parameters set here are as follows.

url: The path to the archive file of the model to register. The path is relative to the directory set with --model-store at server startup (this time, the model_store folder). A URL can also be specified if the .mar file is hosted on the web.
initial_workers: The number of workers to allocate for inference with this model. Note that inference is not possible while the number of workers is 0.
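As a supplement, the same listing and registration calls can be made from Python. This is a minimal sketch using the requests library (an extra install, as above); the routes themselves are the management API endpoints used in this section.

Management API from Python


# manage_models.py -- list and register models via the management API.
import requests

MANAGEMENT = "http://127.0.0.1:8081"

# List the models currently registered
print(requests.get(f"{MANAGEMENT}/models").json())

# Register vgg11.mar (path relative to --model-store) with one initial worker
resp = requests.post(
    f"{MANAGEMENT}/models",
    params={"url": "vgg11.mar", "initial_workers": 1},
)
print(resp.status_code, resp.json())

# To remove it again: requests.delete(f"{MANAGEMENT}/models/vgg11")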

When inferring using the vgg11 model you just registered, change the URL path and send the request as shown below.

Inference with vgg11


curl http://127.0.0.1:8080/predictions/vgg11 -T kitten_small.jpg

The inference result is as follows.

result


{
  "tabby": 0.3414705991744995,
  "Egyptian_cat": 0.3293682634830475,
  "lynx": 0.1927071064710617,
  "tiger_cat": 0.097527414560318,
  "Persian_cat": 0.009637275710701942
}

Request to the metrics API

Finally, the metrics API. It lets you retrieve things like the time spent on inference and the number of times inference was executed.

Let's actually send a request.

Request to the metrics API


curl http://127.0.0.1:8082/metrics

You can get the cumulative queue time, the number of requests, and the inference execution time for densenet161 and vgg11, as shown below:

result


# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.07813245195E8
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 94.401
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 2.0
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 1.0
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.0795621729599999E8
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 30823.147

Although not done in this article, it is also possible to display these metrics on a dashboard, as shown below, by installing an additional service.

metrics.png (quoted from the TorchServe documentation)
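Even without a dashboard, the Prometheus-format text returned by the metrics API is easy to work with from Python. Here is a small sketch that filters the output shown above for one model, assuming the requests library is available.

Filter metrics from Python


# metrics_check.py -- print only the sample lines for a given model.
import requests

metrics_text = requests.get("http://127.0.0.1:8082/metrics").text
for line in metrics_text.splitlines():
    # Skip the "# HELP" / "# TYPE" comment lines and keep densenet161 samples
    if "densenet161" in line and not line.startswith("#"):
        print(line)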

SSL communication settings

Finally, here is the procedure for serving the API over SSL. If the server is already running, first stop it with the following command.

Server outage


torchserve --stop

Generate a private key and certificate (and public key) for SSL communication.

Private key and certificate generation


openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

This time, each setting value is set as follows.

Settings when creating a private key and certificate


Country Name (2 letter code) [AU]:JP
State or Province Name (full name) [Some-State]:Tokyo
Locality Name (eg, city) []:Shinagawa
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Org
Organizational Unit Name (eg, section) []:OrgUnit
Common Name (e.g. server FQDN or YOUR name) []:torchserve-sample
Email Address []:[email protected]

Next, create a file called config.properties in the current directory. This file is a configuration file for describing settings related to the server. For details on the contents that can be set in config.properties, refer to this document.

Create the config.properties file


touch config.properties

Set the config.properties you just created as follows.

config.properties


inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
private_key_file=mykey.key
certificate_file=mycert.pem

In the above file, the addresses are changed so that the API can be accessed from outside over SSL, the port numbers are changed to clearly indicate that SSL is used, and the private key and certificate created earlier are specified for the SSL communication.

Then, start the torchserve server again with the following command.

Start torch serve


torchserve --start --model-store model_store --models densenet161=densenet161.mar vgg11=vgg11.mar --ts-config config.properties

There are two major differences from the first time torchserve was started: the newly archived vgg11 model is also specified in --models, and the config.properties file created above is passed with --ts-config.

Now let's send a request to the SSL-enabled API. When using densenet161 and when using vgg11, they are as follows.

Request execution to SSL-enabled API


curl "https://127.0.0.1:8443/predictions/vgg11" -T kitten_small.jpg -k
curl "https://127.0.0.1:8443/predictions/densenet161" -T kitten_small.jpg -k

Things I could not cover this time

This time I focused on introducing the basic features, so I could not cover the following. I would like to write separate articles about them.

- Deploying with Docker
- Deploying an API with authentication
- Deploying outside the local environment (Azure Kubernetes Service, etc.)
- Various log settings

Summary

- TorchServe lets you deploy a model without implementing the API part yourself
- Three APIs are created automatically: inference, model management, and metrics
- The inference API in particular can be used not only over REST but also over gRPC
- By describing the settings in config.properties, SSL can be enabled
