TensorFlow Serving is a flexible, high-performance machine learning model serving system designed for production environments. With TensorFlow Serving, you can easily host a model created in TensorFlow and expose it through an API.
See the TensorFlow Serving Documentation (https://www.tensorflow.org/tfx/guide/serving) for more information.
This time, I used TensorFlow Serving on AWS EC2 to host a deep learning model built with TensorFlow. At the end of the article, I also try it with Docker.
Enter "Deep Learning AMI" in the AMI search bar to search for the AMI you want to use. This time, I used "Deep Learning AMI (Ubuntu 18.04) Version 30.0 --ami-0b1b56cbf0f8fcea3". I used "p2.xlarge" as the instance type. The security group is set up so that ssh and http can be connected from the development environment, and all other settings are left as default.
Log in to EC2 and build the environment.
~$ ls
LICENSE README examples tools
Nvidia_Cloud_EULA.pdf anaconda3 src tutorials
The installation procedure is described on the official site.
First, add the TensorFlow Serving URI to sources.list.d.
~$ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2943 100 2943 0 0 18166 0 --:--:-- --:--:-- --:--:-- 18166
OK
Perform the installation.
~$ sudo apt-get update && sudo apt-get install tensorflow-model-server
~$ tensorflow_model_server --version
TensorFlow ModelServer: 1.15.0-rc2+dev.sha.1ab7d59
TensorFlow Library: 1.15.2
This completes the installation.
From here, we will create a model to deploy. First, prepare a working directory.
~$ mkdir tfexample
~$ cd tfexample
Start jupyter-lab and build the model.
~/tfexample$ jupyter-lab --no-browser --port=8888 --ip=0.0.0.0 --allow-root
...
http://127.0.0.1:8888/?token=b92a7ceefb20c7ab3e475474dbde66a771870de1d8f5bd70
...
The access URL is printed to standard output; replace the 127.0.0.1 part with the IP address of the instance and open it in your browser.
When JupyterLab starts, select the conda_tensorflow2_py36 kernel, open a new notebook, and rename it to tfmodel.ipynb.
This time I will build a model with Fashion-MNIST.
tfmodel.ipynb
import sys
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import os
import tempfile
print('TensorFlow version: {}'.format(tf.__version__))
# TensorFlow version: 2.1.0
tfmodel.ipynb
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0
# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
# train_images.shape: (60000, 28, 28, 1), of float64
# test_images.shape: (10000, 28, 28, 1), of float64
tfmodel.ipynb
model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(28, 28, 1), filters=8, kernel_size=3,
                        strides=2, activation='relu', name='Conv1'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])
model.summary()
testing = False
epochs = 5
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))
# Model: "sequential"
# _________________________________________________________________
# Layer (type) Output Shape Param #
# =================================================================
# Conv1 (Conv2D) (None, 13, 13, 8) 80
# _________________________________________________________________
# flatten (Flatten) (None, 1352) 0
# _________________________________________________________________
# Softmax (Dense) (None, 10) 13530
# =================================================================
# Total params: 13,610
# Trainable params: 13,610
# Non-trainable params: 0
# _________________________________________________________________
# Train on 60000 samples
# Epoch 1/5
# 60000/60000 [==============================] - 46s 770us/sample - loss: 0.5398 - accuracy: 0.8182
# Epoch 2/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3849 - accuracy: 0.8643
# Epoch 3/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3513 - accuracy: 0.8751
# Epoch 4/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3329 - accuracy: 0.8820
# Epoch 5/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3204 - accuracy: 0.8847
# 10000/10000 [==============================] - 1s 78us/sample - loss: 0.3475 - accuracy: 0.8779
# Test accuracy: 0.8779000043869019
tfmodel.ipynb
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)
print('\nSaved model:')
!ls -l {export_path}
# export_path = /tmp/1
# WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
# Instructions for updating:
# If using Keras pass *_constraint arguments to layers.
# INFO:tensorflow:Assets written to: /tmp/1/assets
# Saved model:
# total 84
# drwxr-xr-x 2 ubuntu ubuntu 4096 Jul 17 10:49 assets
# -rw-rw-r-- 1 ubuntu ubuntu 74970 Jul 17 10:49 saved_model.pb
# drwxr-xr-x 2 ubuntu ubuntu 4096 Jul 17 10:49 variables
The save destination for the model was created with the tempfile module; this time the model is stored in /tmp/1.
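As a quick sanity check (this step is my own addition, not part of the serving workflow), the exported SavedModel can be loaded back and its signatures listed:

loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))
# Expected to include 'serving_default', which TensorFlow Serving uses by default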
Open another terminal, log in to your instance, and start the server.
~$ export MODEL_DIR=/tmp
~$ tensorflow_model_server \
--rest_api_port=8501 \
--model_name=fashion_model \
--model_base_path="${MODEL_DIR}"
The expected structure is a directory indicating the version number under model_base_path, with the SavedModel stored inside it.
model_base_path/
├ 1/
│ ├ assets/
│ ├ variables/
│ └ saved_model.pb
├ 2/
│ ├ (Omitted below)
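For example, a second version of the model could be exported next to the first one like this (a sketch; TensorFlow Serving then serves the highest version number by default):

# Sketch: export the same model again as version 2 under the same base path
export_path_v2 = os.path.join(MODEL_DIR, '2')
tf.keras.models.save_model(model, export_path_v2, overwrite=True)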
Now let's send a request and check the result. Go back to the notebook and make the request.
tfmodel.ipynb
def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28, 28), cmap="gray")
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})
tfmodel.ipynb
import json
data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
# Data: {"signature_name": "serving_default", "instances": ... [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}
tfmodel.ipynb
import requests
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))
Send the data as JSON via POST. The data goes under the `instances` key; since prediction is done in batches, we need to be careful about the shape.
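For example, to predict on a single image, the instance still has to keep the (28, 28, 1) shape; slicing with 0:1 (rather than indexing with 0) keeps the batch dimension. A small sketch:

# Sketch: a request for a single image still needs one instance of shape (28, 28, 1)
single_data = json.dumps({"signature_name": "serving_default",
                          "instances": test_images[0:1].tolist()})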
By the way, the contents of predictions are as follows.
predictions[0]
# [7.71279588e-07,
# 4.52205953e-08,
# 5.55571035e-07,
# 1.59779923e-08,
# 2.27421737e-07,
# 0.00600787532,
# 8.29056205e-07,
# 0.0466650613,
# 0.00145569211,
# 0.945868969]
The probabilities for each class are stored in the list. This is the same output as the following code.
model.predict(test_images[0:3]).tolist()[0]
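As a rough consistency check (my own addition, not part of the original tutorial), the values returned over REST should match the local prediction up to small numerical differences:

local = model.predict(test_images[0:3])[0]
print(np.allclose(local, predictions[0], atol=1e-5))  # expected: True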
Finally, let's try the same thing with Docker.
~$ docker --version
Docker version 19.03.11, build 42e35e61f3
~$ docker pull tensorflow/serving
~$ docker run -d -t --rm -p 8501:8501 -v "/tmp:/models/fashion_model" -e MODEL_NAME=fashion_model tensorflow/serving
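To confirm that the container is serving the model, the model status endpoint of the REST API can be queried (run this from the notebook or any Python shell on the instance):

import requests

# GET /v1/models/<model_name> returns the version status of the served model
status = requests.get('http://localhost:8501/v1/models/fashion_model')
print(status.json())
# Should report the model version with state 'AVAILABLE' once loading has finished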
The entry point is as follows. The RESTful API port is 8501, the gRPC port is 8500, and the model_base_path is ${MODEL_BASE_PATH}/${MODEL_NAME}.
tensorflow_model_server --port=8500 --rest_api_port=8501 \
--model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
The entry point file is stored in /usr/bin/tf_serving_entrypoint.sh
and actually contains the following code:
#!/bin/bash
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
Therefore, when using Docker, all you have to do is mount the host directory where the model is stored onto the container's model_base_path.
To summarize:
- The gRPC interface is supported.
- The model path, maximum batch size, number of threads, and timeout can be specified in a config file.
- The input and output format of the model, called a Signature, can be customized.

Reference: https://qiita.com/t_shimmura/items/1ebd2414310f827ed608