The service "Torch Serve" that can easily convert the model made with PyTorch into an API seemed to be useful, so I actually touched it and made an article. This time, the content is like a tutorial for using the basic functions of Torch Serve.
TorchServe is an open source tool developed jointly by Facebook and AWS. It lets you publish a model created with PyTorch as an API without writing any of the API layer yourself. All you need to serve a model are ordinary files such as the PyTorch model definition and a weight file. In addition to the inference API, APIs for model management and usage metrics are provided automatically.
In this article, based on the TorchServe Quick Start, I explain how to use TorchServe on Azure. (Parts of the Quick Start that do not work as written are corrected here.)
Besides the Quick Start, I also referred to "I tried hosting PyTorch's deep learning model using TorchServe". (Thank you!) To be honest, there is a lot of overlap with that article, so let me first list how this article differs:
--Starts from building the environment on Azure
--Introduces two ways to call the inference API (REST and gRPC)
--Covers all three TorchServe APIs
--Omits the Docker part (I plan to write about it separately)
First, a brief look at TorchServe's architecture. (From the TorchServe Quick Start.)
This article walks through the steps from environment setup to serving a model as an API, in the following order.
-Environment construction on Azure
--Create an Azure VM and install the required packages
-Model Archive
--You must archive your model before you can deploy it with TorchServe.
--Specifically, use the command-line tool torch-model-archiver to convert the model into a model archive file (.mar).
-Start TorchServe server
--Start the TorchServe server and serve the archived model as an API.
-Using the APIs
--Send a request to the API.
-SSL for the API
--Enable SSL so the API can be served over HTTPS.
First, create a VM on Azure. The main settings are as follows. Don't forget to set up SSH access so that you can connect to the VM later.
Setting | Value |
---|---|
Image | Ubuntu Server 18.04 LTS - Gen1 |
Region | South Central US |
size | Standard_NC6_Promo - 6 vcpu |
Other | Default |
Since the detailed steps for creating an Azure VM are outside the scope of this article, I will omit them. Please refer to the Microsoft documentation "Generate and store SSH keys in the Azure portal".
Once you have created the Azure VM, connect to it over SSH. I recommend Visual Studio Code's Remote Explorer for connecting to an Azure VM via SSH. The specific procedure is explained clearly in "Develop on EC2 using VS Code's Remote - SSH feature", so I will leave it to that article. It uses EC2 as the example, but the steps are basically the same for Azure.
After connecting to the Azure VM over SSH, install the CUDA-related packages with the following commands. Reference: NVIDIA download guide
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
After the installation is complete, make sure it works.
input
nvidia-smi
output
azureuser@torch-serve-vm:~/torchserve-examples$ nvidia-smi
Fri Jan 1 09:13:03 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000FD24:00:00.0 Off | 0 |
| N/A 45C P0 71W / 149W | 3737MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 96934 C /usr/bin/python3 1377MiB |
| 0 N/A N/A 96935 C /usr/bin/python3 1257MiB |
| 0 N/A N/A 96936 C /usr/bin/python3 1097MiB |
+-----------------------------------------------------------------------------+
Then install the Java packages needed to run TorchServe.
sudo apt-get install openjdk-11-jdk
update-java-alternatives -l
Next, install the Python-related packages. TorchServe requires Python 3.8 or higher, but the Azure VM comes with Python 3.6 by default, so we will install Python 3.8 and use it inside a virtual environment.
First, install the packages needed to use Python.
sudo apt-get update
sudo apt-get install -y python3.8
sudo apt-get autoremove -y
sudo apt-get install python3-venv python3.8-venv python3.8-dev -y
Next, create a working directory for the TorchServe-related work and move into it.
mkdir torchserve-sample
cd torchserve-sample
Create a virtual environment with the following commands and activate it.
Create and activate the virtual environment
python3.8 -m venv py38ts
source py38ts/bin/activate
The prompt looks like this once the virtual environment is active.
Virtual environment
(py38ts) azureuser@vm2-torchserve:~/torchserve-sample$
Just in case, run Python and confirm that the version is 3.8. If the output looks like the following, the environment has been set up without any problems.
Python version check
(py38ts) azureuser@vm-torchserve:~/torchserve-sample$ python
Python 3.8.0 (default, Oct 28 2019, 16:14:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
After this, we install the various Python packages with pip, but installing sentencepiece will fail without build tools, so install them first with the following command. (Reference: "Stumbled when installing sentencepiece on Ubuntu")
Preparation for sentencepiece installation
sudo apt-get install cmake build-essential pkg-config libgoogle-perftools-dev -y
Install the required Python packages with pip.
Python related package installation
# update pip
python -m pip install -U pip
# install the required Python packages
pip install torch==1.7.1 torchvision==0.8.2 torchtext==0.8.1 torchaudio==0.7.2 sentencepiece psutil future pillow captum packaging transformers
# install TorchServe and the model archiver
pip install torchserve torch-model-archiver
Once these are installed, TorchServe itself is ready to use.
Finally, clone the TorchServe repository to get the sample code used in this tutorial.
Get the TorchServe sample code
git clone https://github.com/pytorch/serve.git
The environment for the steps in this tutorial is now ready.
With TorchServe, the target model must be archived before it can be served as an API. Here, I walk through the model archiving procedure using the sample files published in the TorchServe GitHub repository and a trained PyTorch model.
As the model to archive, we use DenseNet for image classification as the example this time. First, download the densenet weight file published by PyTorch.
Weight file download
wget https://download.pytorch.org/models/densenet161-8d451a50.pth
The densenet weight file is now in the current directory. Running the ls command confirms that the model has been downloaded, as shown below.
input
ls
output
densenet161-8d451a50.pth py38ts
Then archive the densenet161 model. Concretely, this means converting the set of files the model needs for inference into a single .mar file. To archive the model, use the command-line tool torch-model-archiver.
Execute the following command to put the model in a format that can be hosted on TorchServe.
Archive the model
# Create a directory to store the .mar files
mkdir model_store
# Use torch-model-archiver to create a .mar file from the model script and the trained parameters
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--export-path model_store \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
You can check each torch-model-archiver parameter with the following command. They are also described in the TorchServe documentation (https://github.com/pytorch/serve/blob/master/model-archiver/README.md#torch-model-archiver-for-torchserve).
Parameter confirmation
torch-model-archiver -h
The meaning of each parameter set this time is described below.
Parameter | Meaning |
---|---|
--model-name | The name under which TorchServe handles the model. On TorchServe, this model is treated as "densenet161". |
--version | The version of the model to register. |
--model-file | The .py file containing the model class implemented in PyTorch. This time I use the DenseNet class provided in the TorchServe samples. For your own model, specify a model file whose class inherits from PyTorch's torch.nn.Module. When using TorchScript, this parameter does not need to be specified. |
--serialized-file | The trained parameter file of the model to deploy. In most cases a file with the .pth extension, or the .pt extension for TorchScript, is specified. This time I use the densenet161-8d451a50.pth downloaded earlier. |
--export-path | The folder to which the model archive file (.mar) is written. This time the model_store folder created above is specified. |
--handler | A .py file that handles model instantiation, pre- and post-processing of inputs and outputs around inference, and any additional processing at inference time. You can specify one of the default handlers (image_classifier / object_detector / text_classifier / image_segmenter) or create a custom handler (see the sketch after this table). This time the default handler image_classifier is used. If you use a custom handler, specify the path to the handler file. |
--extra-files | Any files the model depends on other than the model file, weight parameters, and handler implemented in PyTorch. This time index_to_name.json is specified: a file that maps the index (number) of the inference result to its class name (string), used by the default handler. |
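For reference, here is a minimal sketch of what a custom handler could look like. This is a hypothetical example (this tutorial uses the built-in image_classifier handler); the method names follow the BaseHandler class shipped with the torchserve package, and the preprocessing shown is only illustrative.
Custom handler sketch (hypothetical)
# custom_handler.py -- hypothetical custom handler sketch (not used in this tutorial)
import io

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class MyImageClassifierHandler(BaseHandler):
    """Decodes image payloads, runs the model, and returns top-5 scores per request."""

    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    def preprocess(self, data):
        # Each request in the batch carries its payload under "data" or "body".
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def postprocess(self, outputs):
        # Must return one result per request in the batch.
        # Keys here are class indices; a real handler would map them to names
        # (for example via index_to_name.json, as the default handler does).
        probs = torch.nn.functional.softmax(outputs, dim=1)
        top5 = torch.topk(probs, 5)
        return [
            {str(idx.item()): val.item() for idx, val in zip(indices, values)}
            for values, indices in zip(top5.values, top5.indices)
        ]

If you archived a model with a handler like this, you would pass the path of this file to --handler instead of the default handler name.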
You can see that densenet161.mar (the densenet161 archive file) has been saved to the model_store directory specified as the export path, as shown below.
input
ls model_store
output
densenet161.mar
Next, start the TorchServe server and actually run inference.
Start the TorchServe server with the following command. This hosts the densenet161 model as an API.
Server startup
torchserve --start --ncs --model-store model_store --models densenet161=densenet161.mar
The meaning of the above command is as follows.
Parameter | Meaning |
---|---|
--start | Starts TorchServe. To stop it, use the --stop option instead. |
--ncs | Short for no-config-snapshots; when set, the server does not save snapshots of its running state. With this option the model specified at startup can be used immediately, which is why it is set this time. (I am not entirely sure of the reason, so I will look into it further.) Details on TorchServe snapshots are described here. |
--model-store | The directory where the .mar files are stored. This time I created a directory called model_store and saved the archive there. |
--models | The model(s) the server loads, specified as model_name=model_file.mar. Multiple models can be specified, separated by spaces. The model name is "densenet161", the name given when archiving the model; the model path is relative to the folder specified with --model-store. |
You can check each torchserve parameter with the following command; they are also described in the TorchServe documentation here.
Help display for torchserve
torchserve -h
With this, the densenet161 model has been served as an API without implementing any of the API code.
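As a quick sanity check that the server is up, you can also call the /ping endpoint of the inference API. Below is a minimal sketch in Python, assuming the requests package is installed (pip install requests); curl http://127.0.0.1:8080/ping works just as well.
Health check against the inference API
# check that TorchServe is healthy before sending inference requests
import requests

print(requests.get("http://127.0.0.1:8080/ping").json())  # expected: {"status": "Healthy"}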
TorchServe provides three APIs, each of which is (by default) distinguished by its port number.
API type | Overview | Default address | Documentation |
---|---|---|---|
Inference API | The endpoint used to run inference with a model. | http://127.0.0.1:8080 | Inference API |
Management API | The API used for model management, such as registering models, checking their status, and setting the number of workers. | http://127.0.0.1:8081 | Management API |
Metrics API | The endpoint for checking the metrics of a specified model. Through this API you can also view model metrics on a dashboard. | http://127.0.0.1:8082 | Metrics API |
For security reasons, these endpoints are accessible only from localhost by default. The settings needed to access them from outside localhost are described later.
Below, I will introduce each of the above three APIs a little more.
First, the inference API. As the name implies, it is the endpoint used to run inference with a model. The default address is http://127.0.0.1:8080. The TorchServe documentation is here (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/inference_api.md).
Since the model we served is an image classifier, first download an image to run inference on.
Image download
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
The downloaded image is a picture of a kitten like this. (From the TorchServe Quick Start.)
TorchServe actually supports two types of inference API: REST and gRPC.
REST
First, the REST API. Here we use curl to send a request to it.
Send a request to the classification API with the following command to classify the image above.
Inference to REST API
curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
Then, the result will be returned in this way.
Inference result
{
"tabby": 0.5237818360328674,
"tiger_cat": 0.18530148267745972,
"lynx": 0.15431325137615204,
"tiger": 0.05681790038943291,
"Egyptian_cat": 0.047028690576553345
}
For details on the parameters available when sending requests to the REST API, refer to the documentation.
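The same request can also be sent from Python instead of curl. Here is a minimal sketch using the requests package (assumed to be installed; not part of the original tutorial), sending the raw image bytes in the request body just like curl -T.
Inference request from Python
import requests

# send the image bytes to the densenet161 prediction endpoint
with open("kitten_small.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/densenet161",
        data=f,  # raw image bytes in the body, equivalent to curl -T
    )

print(response.json())  # e.g. {"tabby": 0.52, "tiger_cat": 0.19, ...}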
gRPC
Next, the gRPC API. If you are wondering what gRPC is in the first place, I recommend the article "I've just started gRPC, so I've summarized it in an easy-to-understand manner". In a nutshell, for the purposes of this article, think of it as a counterpart to REST where you call methods on the API server in the same way as local methods, rather than hitting URLs.
Install the Python package to call the gRPC API from Python.
grpc package installation
pip install -U grpcio protobuf grpcio-tools
Next, use the sample interface definition files (.proto) published in the TorchServe GitHub repository to generate the server and client code.
Generate code for gRPC from torchserve sample
python -m grpc_tools.protoc --proto_path=./serve/frontend/server/src/main/resources/proto/ --python_out=./serve/ts_scripts --grpc_python_out=./serve/ts_scripts ./serve/frontend/server/src/main/resources/proto/inference.proto ./serve/frontend/server/src/main/resources/proto/management.proto
Once the server and client code has been generated, run the client script to perform inference.
Inference execution for gRPC API
python ./serve/ts_scripts/torchserve_grpc_client.py infer densenet161 kitten_small.jpg
The result is the same as when requesting with curl.
Inference result
{
"tabby": 0.5237818360328674,
"tiger_cat": 0.18530148267745972,
"lynx": 0.15431325137615204,
"tiger": 0.05681790038943291,
"Egyptian_cat": 0.047028690576553345
}
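For reference, the sample client script essentially does the following. This is a sketch using the generated stubs; it assumes the generated inference_pb2 / inference_pb2_grpc modules are importable from where you run it (they were generated into serve/ts_scripts), and that the gRPC inference API listens on the default port 7070. Service and field names follow the proto files shipped with TorchServe and may differ between versions.
gRPC inference sketch
import grpc

# generated into serve/ts_scripts by grpc_tools.protoc above;
# run from that directory or adjust sys.path accordingly
import inference_pb2
import inference_pb2_grpc

with open("kitten_small.jpg", "rb") as f:
    image_bytes = f.read()

channel = grpc.insecure_channel("127.0.0.1:7070")  # default gRPC inference port
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
response = stub.Predictions(
    inference_pb2.PredictionsRequest(model_name="densenet161", input={"data": image_bytes})
)
print(response.prediction.decode("utf-8"))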
Next, the management API. It is used for model management tasks such as registering models, checking their status, and setting the number of workers. The default address is http://127.0.0.1:8081. Details can be found in the documentation (https://github.com/pytorch/serve/blob/6c56b7ddee00a14fcdfab9bedf37f011e11fdece/docs/management_api.md).
First, let's use this API to check which models are currently registered with TorchServe.
Model confirmation
curl http://127.0.0.1:8081/models
You can see that densenet161 is registered, as shown below.
Results of registered models
{
"models": [
{
"modelName": "densenet161",
"modelUrl": "densenet161.mar"
}
]
}
Next, let's register a new model. Download and archive the vgg11 model with the following commands.
Download and archive vgg11
wget https://download.pytorch.org/models/vgg11-bbd30ac9.pth
torch-model-archiver \
--model-name vgg11 \
--version 1.0 \
--model-file ./serve/examples/image_classifier/vgg_11/model.py \
--serialized-file vgg11-bbd30ac9.pth \
--export-path model_store \
--handler ./serve/examples/image_classifier/vgg_11/vgg_handler.py \
--extra-files ./serve/examples/image_classifier/index_to_name.json
After archiving the model, use the management API to register it. Once registered, the newly archived vgg11 can be used for inference.
Model registration
curl -X POST "http://127.0.0.1:8081/models?url=vgg11.mar&initial_workers=1"
Model registration must be done with the POST method. The query parameters set here have the following meanings.
Parameter | Meaning |
---|---|
url | The path of the archive file of the model to register, relative to the directory given with --model-store at server startup (this time the model_store folder). A URL can also be specified if the .mar file is hosted on the web. |
initial_workers | The number of workers to allocate for inference with this model. Note that inference is not possible while the number of workers is 0. |
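These management operations can also be driven from Python. Below is a minimal sketch using the requests package (assumed to be installed), covering the register, scale-workers, and describe endpoints of the management API; the scale and describe calls are extras not shown in the curl examples here.
Management API from Python
import requests

MANAGEMENT = "http://127.0.0.1:8081"

# register the archived vgg11 model with one initial worker (same as the curl call above)
requests.post(f"{MANAGEMENT}/models", params={"url": "vgg11.mar", "initial_workers": 1})

# scale the number of workers for the model later if needed
requests.put(f"{MANAGEMENT}/models/vgg11", params={"min_worker": 2})

# inspect the registered model
print(requests.get(f"{MANAGEMENT}/models/vgg11").json())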
To run inference with the vgg11 model you just registered, change the model name in the URL path and send the request as shown below.
Inference with vgg11
curl http://127.0.0.1:8080/predictions/vgg11 -T kitten_small.jpg
The inference result is as follows.
result
{
"tabby": 0.3414705991744995,
"Egyptian_cat": 0.3293682634830475,
"lynx": 0.1927071064710617,
"tiger_cat": 0.097527414560318,
"Persian_cat": 0.009637275710701942
}
Finally, the metrics API. It lets you retrieve, for example, the time taken for inference and the number of times inference was executed.
Let's actually send a request.
Request to the metrics API
curl http://127.0.0.1:8082/metrics
You can get the cumulative queue time, request count, and inference execution time for densenet161 and vgg11 as follows:
result
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.07813245195E8
ts_queue_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 94.401
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 2.0
ts_inference_requests_total{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 1.0
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="densenet161",model_version="default",} 1.0795621729599999E8
ts_inference_latency_microseconds{uuid="973e4a46-b0d0-4662-bbf4-86d14ccef821",model_name="vgg11",model_version="default",} 30823.147
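Since the response is plain Prometheus text, it is also easy to consume programmatically. Here is a minimal sketch in Python with the requests package (assumed installed) that picks out just the per-model request counters.
Filtering the metrics from Python
import requests

metrics_text = requests.get("http://127.0.0.1:8082/metrics").text

# print only the per-model inference request counters
for line in metrics_text.splitlines():
    if line.startswith("ts_inference_requests_total"):
        print(line)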
Although not done in this article, installing an additional service makes it possible to display these metrics on a dashboard like the one below. (Quoted from the TorchServe documentation.)
Finally, I will also introduce how to enable SSL for the APIs. If the server is already running, first stop it with the following command.
Stop the server
torchserve --stop
Generate a private key and certificate (and public key) for SSL communication.
Private key and certificate generation
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
This time, I set each value as follows.
Settings when creating a private key and certificate
Country Name (2 letter code) [AU]:JP
State or Province Name (full name) [Some-State]:Tokyo
Locality Name (eg, city) []:Shinagawa
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Org
Organizational Unit Name (eg, section) []:OrgUnit
Common Name (e.g. server FQDN or YOUR name) []:torchserve-sample
Email Address []:[email protected]
Next, create a file called config.properties in the current directory. This is a configuration file describing server-related settings. For details on what can be set in config.properties, refer to the documentation.
Create config.properties
touch config.properties
Configure the config.properties you just created as follows.
config.properties
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
private_key_file=mykey.key
certificate_file=mycert.pem
In the file above, the addresses are changed so that the endpoints are served over SSL: the port numbers are changed to clearly indicate HTTPS, and the private key and certificate created earlier are specified for SSL communication. (If you also need access from outside localhost, change the bind addresses here as well.)
Then, start the torchserve server again with the following command.
Start torch serve
torchserve --start --model-store model_store --models densenet161=densenet161.mar vgg11=vgg11.mar --ts-config config.properties
There are two major differences from when torchserve was started the first time.
-Specifying --ts-config config.properties
--The first time I used the default settings and did not specify a configuration file, but this time the config.properties created earlier is specified, so torchserve starts the server based on those settings.
-Specifying --models densenet161=densenet161.mar vgg11=vgg11.mar
--The first time only densenet161 was loaded, but this time the newly added vgg11 is also served as an API.
Now let's send a request to the SSL-enabled API. The requests for densenet161 and vgg11 are as follows.
Request execution to SSL-enabled API
curl "https://127.0.0.1:8443/predictions/vgg11" -T kitten_small.jpg -k
curl "https://127.0.0.1:8443/predictions/densenet161" -T kitten_small.jpg -k
This article focused on introducing the basic features, so I could not cover the following topics. I would like to write separate articles about them.
--Deployment using Docker
--Deploying the API with authentication
--Deployment to environments other than local (Azure Kubernetes Service, etc.)
--Various logging settings
--TorchServe allows you to deploy a model without implementing the API part
--Three APIs are automatically created: inference, model management, and metrics.
--In particular, the inference API can be used not only with REST but also with gRPC.
--By describing the settings in config.properties, SSL can be enabled.