Using a model from Keras Applications, I will briefly describe how to convert the model for TensorRT and run inference with it. Installing TensorRT and related software is out of scope, since an NVIDIA GPU Cloud container is used.
Prerequisites:
- Docker is installed
- GPU containers can be used
Start the execution environment as follows. Jupyter is already installed, so you can try the code below there.
docker run -it --rm --gpus all -p 8888:8888 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:20.02-tf2-py3 bash
# (Optional)
jupyter lab
- nvcr.io/nvidia/tensorflow is a container registered in NVIDIA GPU Cloud.
- TensorFlow 2.1, TensorRT 7.0, Jupyter, and so on are already installed in this container (the snippet after this list shows a quick way to verify).
- (Optional) --shm-size, --ulimit: model conversion uses a lot of host memory, so these are set as a countermeasure against memory allocation failures.
- (Optional) Port 8888 is for Jupyter.
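To confirm that the container provides what this article assumes (a visible GPU and a TF 2.x build), a quick check like the following can be run first; the exact version strings are of course environment dependent.
import tensorflow as tf
# TensorFlow version bundled with the container (2.1 in the 20.02-tf2 image)
print(tf.version.VERSION)
# GPUs visible to TensorFlow; this should be non-empty when the container was started with --gpus all
print(tf.config.experimental.list_physical_devices('GPU'))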
Restrict GPU memory allocation as a countermeasure against errors that often occur when using Keras.
Memory limit example
import tensorflow as tf
for dev in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(dev, True)  # allocate GPU memory on demand instead of all at once
First, save the target model. The key point is the **save format**: specify `tf` for `save_format` to save it as a TensorFlow SavedModel (not strictly necessary, since this is the default in TensorFlow 2.x).
Saving the model in Keras
from tensorflow.keras.applications.vgg16 import VGG16
model = VGG16(weights='imagenet')
model.save('./vgg16', save_format='tf')
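As a quick check (not required for the conversion), you can load the SavedModel back and inspect its serving signature; the 'serving_default' key and the 'predictions' output used later come from here. A minimal sketch:
import tensorflow as tf
loaded = tf.saved_model.load('./vgg16')
sig = loaded.signatures['serving_default']
print(sig.inputs[0].shape)      # TensorShape([None, 224, 224, 3])
print(sig.structured_outputs)   # dict whose output key is 'predictions'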
Then convert the model.
TensorRT conversion: single-precision floating point (Float32) version
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS)
converter.convert()
converter.save('./vgg16-tensorrt')
If you want to convert with Float16, change the converter parameters as follows.
converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=trt.TrtPrecisionMode.FP16))
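Depending on the TF-TRT version, TrtGraphConverterV2 also provides a build() method that pre-builds the TensorRT engines for a concrete input shape before saving, so the first inference call does not have to pay the engine-build cost. A sketch under that assumption, reusing the Float16 converter above:
import numpy as np
def build_input_fn():  # representative input with the VGG16 shape
    yield np.random.uniform(size=(1, 224, 224, 3)).astype(np.float32),
converter.convert()
converter.build(input_fn=build_input_fn)  # pre-build engines (if your TF version supports it)
converter.save('./vgg16-tensorrt')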
Calibration is required for 8-bit integers (INT8); you should probably use the data that was used for training. With this VGG16 setup, the data is passed with a shape of $(N, 224, 224, 3)$.
import numpy as np
def calibration_input_fn():  # calibration data generation function
    # don't forget the trailing comma: yield a tuple of inputs
    yield np.random.uniform(size=(5, 224, 224, 3)).astype(np.float32),
converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=trt.TrtPrecisionMode.INT8, use_calibration=True))
converter.convert(calibration_input_fn=calibration_input_fn)
converter.save('./vgg16-tensorrt')
Load the converted model and retrieve the object used for inference. Inference then runs by calling that object as a function.
model = tf.saved_model.load('./vgg16-tensorrt', tags=[tf.saved_model.SERVING])
infer = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
# Dummy input
x = np.random.uniform(size=(3, 224, 224, 3)).astype(np.float32)
# Inference
y = infer(tf.convert_to_tensor(x))['predictions']
For reference, the input shape can be obtained as follows.
infer.inputs[0].shape
>>> TensorShape([None, 224, 224, 3])
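The dummy input above is random; a slightly more realistic sketch (assuming an arbitrary image file, here the hypothetical 'cat.jpg') preprocesses the image the way VGG16 expects and decodes the predicted classes, reusing the infer object from above:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
# 'cat.jpg' is a placeholder path; use any RGB image
img = image.load_img('cat.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
y = infer(tf.convert_to_tensor(x))['predictions']
print(decode_predictions(y.numpy(), top=3))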
Finally, a rough comparison of execution results (TensorRT on top). Even with a setup as rough as this, execution speed improves. Memory usage also decreases, which opens up options such as running multiple models at once, and the size of the benefit will likely vary greatly with the GPU architecture of the execution environment. A minimal timing sketch for producing such a comparison follows below.
Execution environment: GeForce GTX 1080 Ti
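A minimal timing sketch (assuming the './vgg16' and './vgg16-tensorrt' SavedModels saved above; the first call is excluded as a warm-up since it triggers graph tracing and engine building):
import time
import numpy as np
import tensorflow as tf

x = tf.convert_to_tensor(np.random.uniform(size=(8, 224, 224, 3)).astype(np.float32))

def benchmark(saved_model_dir, runs=50):
    model = tf.saved_model.load(saved_model_dir, tags=[tf.saved_model.SERVING])
    infer = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    infer(x)  # warm-up call
    start = time.time()
    for _ in range(runs):
        infer(x)
    return (time.time() - start) / runs  # average seconds per batch

print('original:', benchmark('./vgg16'))
print('tensorrt:', benchmark('./vgg16-tensorrt'))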