TensorFlow Serving is a flexible, high-performance machine learning model serving system designed for production environments. With TensorFlow Serving, you can easily host a model created in TensorFlow and expose it through an API.
See the TensorFlow Serving Documentation (https://www.tensorflow.org/tfx/guide/serving) for more information.
This time, I used TensorFlow Serving on AWS EC2 to host a deep learning model built with TensorFlow. At the end of the article, I also try it with Docker.
Enter "Deep Learning AMI" in the AMI search bar to search for the AMI you want to use. This time, I used "Deep Learning AMI (Ubuntu 18.04) Version 30.0 --ami-0b1b56cbf0f8fcea3". I used "p2.xlarge" as the instance type. The security group is set up so that ssh and http can be connected from the development environment, and all other settings are left as default.
Log in to EC2 and build the environment.
~$ ls
LICENSE README examples tools
Nvidia_Cloud_EULA.pdf anaconda3 src tutorials
The installation procedure is described on the official site.
First, add the TensorFlow Serving URI to sources.list.d.
~$ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2943 100 2943 0 0 18166 0 --:--:-- --:--:-- --:--:-- 18166
OK
Perform the installation.
~$ sudo apt-get update && sudo apt-get install tensorflow-model-server
~$ tensorflow_model_server --version
TensorFlow ModelServer: 1.15.0-rc2+dev.sha.1ab7d59
TensorFlow Library: 1.15.2
This completes the installation.
From here, we will create a model to deploy. First, prepare a working directory.
~$ mkdir tfexample
~$ cd tfexample
Start jupyter-lab and build the model.
~/tfexample$ jupyter-lab --no-browser --port=8888 --ip=0.0.0.0 --allow-root
...
http://127.0.0.1:8888/?token=b92a7ceefb20c7ab3e475474dbde66a771870de1d8f5bd70
...
The access URL is printed to standard output; replace the 127.0.0.1 part with the IP address of the instance and open it in your browser.
When JupyterLab starts, select the conda_tensorflow2_py36 kernel, open a new notebook, and rename it to tfmodel.ipynb.
This time I will build a model with Fashion-MNIST.
tfmodel.ipynb
import sys
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import os
import tempfile
print('TensorFlow version: {}'.format(tf.__version__))
# TensorFlow version: 2.1.0
tfmodel.ipynb
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0
# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
# train_images.shape: (60000, 28, 28, 1), of float64
# test_images.shape: (10000, 28, 28, 1), of float64
tfmodel.ipynb
model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(28, 28, 1), filters=8, kernel_size=3,
                        strides=2, activation='relu', name='Conv1'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])
model.summary()
testing = False
epochs = 5
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))
# Model: "sequential"
# _________________________________________________________________
# Layer (type) Output Shape Param #
# =================================================================
# Conv1 (Conv2D) (None, 13, 13, 8) 80
# _________________________________________________________________
# flatten (Flatten) (None, 1352) 0
# _________________________________________________________________
# Softmax (Dense) (None, 10) 13530
# =================================================================
# Total params: 13,610
# Trainable params: 13,610
# Non-trainable params: 0
# _________________________________________________________________
# Train on 60000 samples
# Epoch 1/5
# 60000/60000 [==============================] - 46s 770us/sample - loss: 0.5398 - accuracy: 0.8182
# Epoch 2/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3849 - accuracy: 0.8643
# Epoch 3/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3513 - accuracy: 0.8751
# Epoch 4/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3329 - accuracy: 0.8820
# Epoch 5/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3204 - accuracy: 0.8847
# 10000/10000 [==============================] - 1s 78us/sample - loss: 0.3475 - accuracy: 0.8779
# Test accuracy: 0.8779000043869019
tfmodel.ipynb
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)
print('\nSaved model:')
!ls -l {export_path}
# export_path = /tmp/1
# WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
# Instructions for updating:
# If using Keras pass *_constraint arguments to layers.
# INFO:tensorflow:Assets written to: /tmp/1/assets
# Saved model:
# total 84
# drwxr-xr-x 2 ubuntu ubuntu 4096 Jul 17 10:49 assets
# -rw-rw-r-- 1 ubuntu ubuntu 74970 Jul 17 10:49 saved_model.pb
# drwxr-xr-x 2 ubuntu ubuntu 4096 Jul 17 10:49 variables
The save destination for the model was created with the tempfile module; this time the model is stored in /tmp/1.
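As a quick sanity check (this step is my own addition, not part of the serving workflow), the exported SavedModel can be loaded back and its signatures listed:

loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))
# Expected to include 'serving_default', which TensorFlow Serving uses by default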
Open another terminal, log in to your instance, and start the server.
~$ export MODEL_DIR=/tmp
~$ tensorflow_model_server \
--rest_api_port=8501 \
--model_name=fashion_model \
--model_base_path="${MODEL_DIR}"
The expected structure is a directory indicating the version number under model_base_path, with the SavedModel stored inside it.
model_base_path/
├ 1/
│ ├ assets/
│ ├ variables/
│ └ saved_model.pb
├ 2/
│ ├ (Omitted below)
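For example, a second version of the model could be exported next to the first one like this (a sketch; TensorFlow Serving then serves the highest version number by default):

# Sketch: export the same model again as version 2 under the same base path
export_path_v2 = os.path.join(MODEL_DIR, '2')
tf.keras.models.save_model(model, export_path_v2, overwrite=True)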
Now let's send a request and check the result. Go back to the notebook and make the request.
tfmodel.ipynb
def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28, 28), cmap="gray")
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})
tfmodel.ipynb
import json
data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
# Data: {"signature_name": "serving_default", "instances": ... [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}
tfmodel.ipynb
import requests
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))
Send the data as JSON via POST. The data goes under the `instances` key; since prediction is done in batches, we need to be careful about the shape.
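For example, to predict on a single image, the instance still has to keep the (28, 28, 1) shape; slicing with 0:1 (rather than indexing with 0) keeps the batch dimension. A small sketch:

# Sketch: a request for a single image still needs one instance of shape (28, 28, 1)
single_data = json.dumps({"signature_name": "serving_default",
                          "instances": test_images[0:1].tolist()})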
By the way, the contents of predictions are as follows.
predictions[0]
# [7.71279588e-07,
# 4.52205953e-08,
# 5.55571035e-07,
# 1.59779923e-08,
# 2.27421737e-07,
# 0.00600787532,
# 8.29056205e-07,
# 0.0466650613,
# 0.00145569211,
# 0.945868969]
The probabilities for each class are stored in the list. This is the same output as the following code.
model.predict(test_images[0:3]).tolist()[0]
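As a rough consistency check (my own addition, not part of the original tutorial), the values returned over REST should match the local prediction up to small numerical differences:

local = model.predict(test_images[0:3])[0]
print(np.allclose(local, predictions[0], atol=1e-5))  # expected: True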
Finally, let's try the same thing with Docker.
~$ docker --version
Docker version 19.03.11, build 42e35e61f3
~$ docker pull tensorflow/serving
~$ docker run -d -t --rm -p 8501:8501 -v "/tmp:/models/fashion_model" -e MODEL_NAME=fashion_model tensorflow/serving
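To confirm that the container is serving the model, the model status endpoint of the REST API can be queried (run this from the notebook or any Python shell on the instance):

import requests

# GET /v1/models/<model_name> returns the version status of the served model
status = requests.get('http://localhost:8501/v1/models/fashion_model')
print(status.json())
# Should report the model version with state 'AVAILABLE' once loading has finished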
The entry point is as follows. The RESTful API port is 8501, the gRPC port is 8500, and the model_base_path is ${MODEL_BASE_PATH}/${MODEL_NAME}.
tensorflow_model_server --port=8500 --rest_api_port=8501 \
--model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
The entry point file is stored in /usr/bin/tf_serving_entrypoint.sh
and actually contains the following code:
#!/bin/bash
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
Therefore, when using Docker, all you have to do is mount the host directory where the model is stored onto the container's model_base_path.
To summarize:
- The gRPC interface is supported.
- The model path, maximum batch size, number of threads, and timeout can be specified in a config file.
- The input and output format of the model, called a Signature, can be customized.

Reference: https://qiita.com/t_shimmura/items/1ebd2414310f827ed608