SageMaker is a service that covers a complete set of machine learning workloads. Using data stored in S3 and elsewhere, it provides all the functions a machine learning project needs, such as model development with Jupyter notebooks, code management with Git repositories, creation of training jobs, and hosting of inference endpoints.
I read "Deploying a PyTorch model for large-scale inference using TorchServe" on the Amazon Web Services blog and tried hosting the model using Amazon SageMaker. Below, we will introduce the procedure and the story around it.
Please see this article for the transformation part of the model.
First, create a bucket in S3. This time I created a bucket named torchserve-model. The region is "Asia Pacific (Tokyo)", and everything except the name is left at the default.
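For reference, the same bucket can also be created with boto3 instead of the console; this is just a minimal sketch using the bucket name and region above.
import boto3

# Create the bucket in the Tokyo region (outside us-east-1, a LocationConstraint is required)
s3 = boto3.client('s3', region_name='ap-northeast-1')
s3.create_bucket(
    Bucket='torchserve-model',
    CreateBucketConfiguration={'LocationConstraint': 'ap-northeast-1'}
)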
When you open the Amazon SageMaker console, you'll see a menu in the left pane.
Select Notebook Instance from the Notebook menu and click Create Notebook Instance. Set the following items for instance settings, and set the others as default.
- Notebook instance settings
  - Notebook instance name: sagemaker-sample
- Permissions and encryption
  - IAM Role: Create a new role
On the IAM role creation screen, specify the S3 bucket you created earlier.
After entering the settings, click Create notebook instance. You will be returned to the notebook instance list, so click the name of the created instance to open its details screen. From the IAM role ARN link, open the IAM screen, click "Attach policies", and attach the "AmazonEC2ContainerRegistryFullAccess" policy. This is the policy you will need later to work with ECR.
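If you prefer to attach the policy with code rather than in the console, a rough boto3 sketch looks like this (the role name is the one created above; substitute your own execution role name):
import boto3

iam = boto3.client('iam')
# Attach the ECR full-access policy to the notebook execution role
iam.attach_role_policy(
    RoleName='AmazonSageMaker-ExecutionRole-20200716T140377',  # replace with your execution role name
    PolicyArn='arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess'
)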
When the status becomes In service, start JupyterLab with "Open JupyterLab".
First, start a Terminal from the Other section of the Launcher.
sh-4.2$ ls
anaconda3 Nvidia_Cloud_EULA.pdf sample-notebooks tools
examples README sample-notebooks-1594876987 tutorials
LICENSE SageMaker src
sh-4.2$ ls SageMaker/
lost+found
The explorer on the left side of the screen displays the files under SageMaker/.
Git is also installed.
sh-4.2$ git --version
git version 2.14.5
In the following we will create a notebook and host the model, but you can do the same with the tutorial notebook. You can clone the sample code into SageMaker/ as follows.
sh-4.2$ cd SageMaker
sh-4.2$ git clone https://github.com/shashankprasanna/torchserve-examples.git
All the steps are described in deploy_torchserve.ipynb. When you open the notebook, you will be asked which Python kernel to use, so select conda_pytorch_p36.
First, create a new folder from the folder button in the left pane, and double-click to enter the created folder. Then create a notebook.
Select the notebook with conda_pytorch_p36. Rename the notebook to deploy_torchserve.ipynb.
In a cell, install the library that converts the PyTorch model into the format for deployment.
deploy_torchserve.ipynb
!git clone https://github.com/pytorch/serve.git
!pip install serve/model-archiver/
This time we will host the densenet161 model. Download the trained weights file. The sample model class is included in the repository cloned earlier, so use the weights file and the class to convert the model into the hosting format.
deploy_torchserve.ipynb
!wget -q https://download.pytorch.org/models/densenet161-8d451a50.pth
deploy_torchserve.ipynb
model_file_name = 'densenet161'
!torch-model-archiver --model-name {model_file_name} \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
When executed, densenet161.mar will be output to the current directory.
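As an optional sanity check (not in the original steps), you can confirm from a cell that the archive was created:
import os

# densenet161.mar should appear in the current directory after archiving
print([f for f in os.listdir('.') if f.endswith('.mar')])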
Store the created file in S3.
deploy_torchserve.ipynb
#Create a boto3 session to get region and account information
import boto3, time, json
sess = boto3.Session()
sm = sess.client('sagemaker')
region = sess.region_name
account = boto3.client('sts').get_caller_identity().get('Account')
import sagemaker
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)
#By the way, the contents are as follows.
# print(region, account, role)
# ap-northeast-1
# xxxxxxxxxxxx
# arn:aws:iam::xxxxxxxxxxxx:role/service-role/AmazonSageMaker-ExecutionRole-20200716T140377
deploy_torchserve.ipynb
#Specify the Amazon SageMaker S3 bucket name
bucket_name = 'torchserve-model'
prefix = 'torchserve'
# print(bucket_name, prefix)
# torchserve-model torchserve
deploy_torchserve.ipynb
#Amazon SageMaker assumes the model is packaged as a tar.gz file, so create a tar.gz file by compressing the densenet161.mar file.
!tar cvfz {model_file_name}.tar.gz densenet161.mar
deploy_torchserve.ipynb
#Upload your model to an S3 bucket under your model's directory.
!aws s3 cp {model_file_name}.tar.gz s3://{bucket_name}/{prefix}/models/
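Instead of the AWS CLI, the upload can also be done with the sagemaker_session created above; a minimal sketch using the same bucket and prefix:
#Upload the tar.gz with the SageMaker SDK session instead of the CLI; returns the S3 URI of the uploaded object
model_artifact = sagemaker_session.upload_data(
    path=f'{model_file_name}.tar.gz',
    bucket=bucket_name,
    key_prefix=f'{prefix}/models')
print(model_artifact)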
Then create the container registry with ECR.
deploy_torchserve.ipynb
registry_name = 'torchserve'
!aws ecr create-repository --repository-name torchserve
# {
# "repository": {
# "repositoryArn": "arn:aws:ecr:ap-northeast-1:xxxxxxxxxxxx:repository/torchserve",
# "registryId": "xxxxxxxxxxxx:repository",
# "repositoryName": "torchserve",
# "repositoryUri": "xxxxxxxxxxxx:repository.dkr.ecr.ap-northeast-1.amazonaws.com/torchserve",
# "createdAt": 1594893256.0,
# "imageTagMutability": "MUTABLE",
# "imageScanningConfiguration": {
# "scanOnPush": false
# }
# }
# }
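The same repository could also be created with boto3 instead of the CLI; a sketch (it raises an error if the repository already exists):
ecr = boto3.client('ecr', region_name=region)
#Equivalent to the "aws ecr create-repository" call above
response = ecr.create_repository(repositoryName=registry_name)
print(response['repository']['repositoryUri'])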
Leaving the notebook for a moment, click the "+" button in the left pane and select "Text File" from the Launcher to create a Dockerfile.
Dockerfile
FROM ubuntu:18.04
ENV PYTHONUNBUFFERED TRUE
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
fakeroot \
ca-certificates \
dpkg-dev \
g++ \
python3-dev \
openjdk-11-jdk \
curl \
vim \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
&& curl -O https://bootstrap.pypa.io/get-pip.py \
&& python3 get-pip.py
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1
RUN pip install --no-cache-dir psutil \
--no-cache-dir torch \
--no-cache-dir torchvision
ADD serve serve
RUN pip install ../serve/
COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh
RUN mkdir -p /home/model-server/ && mkdir -p /home/model-server/tmp
COPY config.properties /home/model-server/config.properties
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]
The settings in the Dockerfile are as follows.
- `PYTHONUNBUFFERED TRUE` prevents stdout and stderr from being buffered.
- Setting `DEBIAN_FRONTEND=noninteractive` suppresses interactive prompts during package installation.
- `--no-install-recommends` skips recommended packages that are not required.
- `update-alternatives` [changes the priority](https://codechacha.com/en/change-python-version/) of the python and pip commands to use.
Create dockerd-entrypoint.sh and config.properties in the same way.
dockerd-entrypoint.sh
#!/bin/bash
set -e
if [[ "$1" = "serve" ]]; then
shift 1
printenv
ls /opt
torchserve --start --ts-config /home/model-server/config.properties
else
eval "$@"
fi
# prevent docker exit
tail -f /dev/null
The shell script does the following:
- `set -e`: if a command fails, the shell script stops there.
- `$1`: the first argument.
- `shift 1`: shifts the arguments, so the remaining arguments can be passed to the next command as if they were given from the beginning.
- `printenv`: prints the environment variables. (This output goes to the CloudWatch logs introduced later.)
- `eval "$@"`: expands the arguments as a command and executes it. Used when a command other than serve is given.
- `tail -f /dev/null`: a dummy command to keep the container running.
config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=32
job_queue_size=1000
model_store=/opt/ml/model
load_models=all
Some supplementary notes about the settings. See the TorchServe configuration documentation for more information.
- `number_of_netty_threads`: the total number of frontend threads; defaults to the number of logical processors available to the JVM.
- `job_queue_size`: the number of inference jobs the frontend queues before the backend serves them; defaults to 100.
- `model_store`: the model storage location. (With SageMaker, the model is placed from S3 into /opt/ml/model/.)
- `load_models`: same effect as the `--models` startup option; specifies the models to deploy. With `all`, every model stored in `model_store` is deployed.
Build the container image and push it to the registry. `v1` is the image tag, and `image` is the image name including the tag. When using ECR, the image name must follow the rule `<registry name>/<image name>:<tag>`, where `<registry name>/<image name>` corresponds to the `repositoryUri` returned when the registry was created.
The build took about 15 minutes.
deploy_torchserve.ipynb
image_label = 'v1'
image = f'{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_label}'
# print(image_label, image)
# v1 xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/torchserve:v1
deploy_torchserve.ipynb
!docker build -t {registry_name}:{image_label} .
!$(aws ecr get-login --no-include-email --region {region})
!docker tag {registry_name}:{image_label} {image}
!docker push {image}
# Sending build context to Docker daemon 399.7MB
# Step 1/16 : FROM ubuntu:18.04
# 18.04: Pulling from library/ubuntu
# 5296b23d: Pulling fs layer
# 2a4a0f38: Pulling fs layer
# ...
# 9d6bc5ec: Preparing
# 0faa4f76: Pushed 1.503GB/1.499GB
# v1: digest: sha256:bb75ec50d8b0eaeea67f24ce072bce8b70262b99a826e808c35882619d093b4e size: 3247
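To check that the push succeeded, you can also list the images now stored in the repository with boto3; a small sketch (the tag should include v1):
ecr = boto3.client('ecr', region_name=region)
#List the image tags stored in the torchserve repository
images = ecr.describe_images(repositoryName=registry_name)
print([detail.get('imageTags') for detail in images['imageDetails']])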
It's finally time to host the inference endpoint. Create a model to deploy with the following code.
deploy_torchserve.ipynb
import sagemaker
from sagemaker.model import Model
from sagemaker.predictor import RealTimePredictor
role = sagemaker.get_execution_role()
model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161'
torchserve_model = Model(model_data = model_data,
image = image,
role = role,
predictor_cls=RealTimePredictor,
name = sm_model_name)
Deploy the endpoint with the following code. It took about 5 minutes to deploy.
deploy_torchserve.ipynb
endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
predictor = torchserve_model.deploy(instance_type='ml.m4.xlarge',
initial_instance_count=1,
endpoint_name = endpoint_name)
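While waiting, you can also poll the endpoint status from the notebook with the sm client created earlier; a minimal sketch (the status should move from Creating to InService):
#Poll the endpoint status until it leaves the Creating state
status = sm.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
while status == 'Creating':
    time.sleep(30)
    status = sm.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
print(status)  # InService if the deployment succeeded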
You can see the progress of the deployment in the CloudWatch logs. Open the CloudWatch console, click Log groups in the left pane, and type /aws/sagemaker/Endpoints in the search bar to see the list of endpoint log groups.
You can see the deployment log by clicking a log group to open its details screen and checking the log streams.
If the deployment does not succeed, an error should be output here. Note that when an error occurs, SageMaker keeps retrying the deployment for about an hour, so if you think something is wrong, check the logs as soon as possible.
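If you do not want to open the console, the same logs can be pulled from the notebook with boto3; a rough sketch, assuming the log group follows the /aws/sagemaker/Endpoints/<endpoint name> convention:
logs = boto3.client('logs', region_name=region)
group = f'/aws/sagemaker/Endpoints/{endpoint_name}'
#Fetch the most recent log stream of the endpoint and print its last events
streams = logs.describe_log_streams(logGroupName=group, orderBy='LastEventTime', descending=True)
stream_name = streams['logStreams'][0]['logStreamName']
for event in logs.get_log_events(logGroupName=group, logStreamName=stream_name)['events'][-20:]:
    print(event['message'])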
Make a request to see if it's working properly.
deploy_torchserve.ipynb
!wget -q https://s3.amazonaws.com/model-server/inputs/kitten.jpg
file_name = 'kitten.jpg'
with open(file_name, 'rb') as f:
payload = f.read()
payload = payload
response = predictor.predict(data=payload)
print(*json.loads(response), sep = '\n')
# {'tiger_cat': 0.4693359136581421}
# {'tabby': 0.4633873701095581}
# {'Egyptian_cat': 0.06456154584884644}
# {'lynx': 0.001282821292988956}
# {'plastic_bag': 0.00023323031200561672}
If you have the predictor instance, you can make a request as above, but to make a request from outside you need the SDK. Open a Python interactive shell on an external PC and make a request using boto3.
$ wget -q https://s3.amazonaws.com/model-server/inputs/kitten.jpg
$ python
>>> import json
>>> import boto3
>>> endpoint_name = 'torchserve-endpoint-2020-07-16-13-16-12'
>>> file_name = 'kitten.jpg'
>>> with open(file_name, 'rb') as f:
... payload = f.read()
... payload = payload
>>> client = boto3.client('runtime.sagemaker',
aws_access_key_id='XXXXXXXXXXXXXXXXXXXX',
aws_secret_access_key='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
region_name='ap-northeast-1')
>>> response = client.invoke_endpoint(EndpointName=endpoint_name,
... ContentType='application/x-image',
... Body=payload)
>>> print(*json.loads(response['Body'].read()), sep = '\n')
{'tiger_cat': 0.4693359136581421}
{'tabby': 0.4633873701095581}
{'Egyptian_cat': 0.06456154584884644}
{'lynx': 0.001282821292988956}
{'plastic_bag': 0.00023323031200561672}
I was able to confirm that the response was returned correctly.
You can also check the deployed model, deployment settings, and endpoint information from the console.
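The same information can also be retrieved with the SageMaker client instead of the console; a small sketch using the names defined above:
#The deployed model and endpoint can also be inspected via the API
print(sm.describe_model(ModelName=sm_model_name)['PrimaryContainer']['Image'])
print(sm.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus'])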
How was it? SageMaker is very convenient. If you just want to host some inference on the backend, it should make things a lot easier. If you want to customize the interface, more flexible customization seems possible, but since TorchServe can also be served outside SageMaker (see the previous article), it seems better to develop models in the TorchServe format so they can be reused on AWS.