This article covers using a TPU in Google Colaboratory. Unlike the GPU, which works just by switching the runtime, the TPU required a few additions to the code, so I am writing them down as a memo.
Google Colaboratory comes with tensorflow 1.15.0 installed.
TPUtest.py
import tensorflow as tf
import distutils
print(distutils.version.LooseVersion(tf.__version__))
#>>1.15.0
We classify MNIST with a CNN. According to Google, the TPU is not optimized for CNNs, so this may put the TPU at a slight disadvantage, but since I don't intend to evaluate performance rigorously, that's fine.
TPUtest.py
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import numpy as np
#Data download
(X_train, y_train), (X_test, y_test) = mnist.load_data()
#Divide by 255
X_train = X_train/255
X_test = X_test/255
#Change the shape of image data
X_train = X_train.reshape(-1,28,28,1).astype(np.float32)
X_test = X_test.reshape(-1,28,28,1).astype(np.float32)
# Convert the labels to one-hot vectors
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
TPUtest.py
from tensorflow.keras.layers import Conv2D, Dense, ReLU, Flatten, Input, MaxPool2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
def getModel():
    model = Sequential()
    model.add(Conv2D(3, 3, input_shape=(28, 28, 1)))
    model.add(MaxPool2D(2))
    model.add(ReLU())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(ReLU())
    model.add(Dense(10, activation="softmax"))
    return model
model = getModel()
#drawing
plot_model(model, show_shapes=True, show_layer_names=False)
#compile
model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])
It is a very ordinary model.
TPUtest.py
%%time
model.fit(X_train, y_train, epochs=10, validation_data=(X_test,y_test))
TPUtest.py
%%time
y_pred = model.predict(X_test)
TPUtest.py
from sklearn.metrics import accuracy_score
import numpy as np
# Convert the one-hot vectors back to class labels
y_pred = np.argmax(y_pred, axis=1)
y_test = np.argmax(y_test, axis=1)
print(accuracy_score(y_pred, y_test))
#>>0.9854
Run this code on each of the three runtimes and compare.
The execution result is as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU | 37s/epoch | 1.49s | 0.9854 |
GPU | 13s/epoch | 0.54s | 0.9859 |
TPU | 37s/epoch | 2.76s | 0.9863 |
... TPU isn't working?
First check the device
TPUtest.py
import os
import tensorflow as tf
import pprint
if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    with tf.Session(tpu_address) as session:
        devices = session.list_devices()

    print('TPU devices:')
    pprint.pprint(devices)
If a list of TPU devices is printed like this, you are good to go.
A little ingenuity is required when creating and compiling the model.
TPUtest.py
def getModel():
    # Body omitted: identical to the model used on the GPU
    return model

# TPU setup
resolver = tf.contrib.cluster_resolver.TPUClusterResolver('grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():  # this scope is the part you need to add
    model = getModel()
    model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])

# Training then proceeds as usual
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
Training clearly proceeds much more comfortably than on the CPU. Let's try prediction as is.
TPUtest.py
y_pred = model.predict(X_test)
Huh? It runs, but isn't it way too slow...? Is the prediction ever going to finish?? The results are as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU (repost) | 37s/epoch | 1.49s | 0.9854 |
GPU (repost) | 13s/epoch | 0.54s | 0.9859 |
TPU | 17.7s/epoch | 15min 15s | 0.9853 |
It turns out that prediction on the TPU is ridiculously slow. Why does the validation during training still run at a sensible speed...?
Since there is no way around it, we train on the TPU and then predict on the CPU.
TPUtest.py
# Train on the TPU
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
model.save_weights("./weight.h5")  # Save the weights to a file

# Predict on the CPU
cpu_model = getModel()  # Build the same model on the CPU
cpu_model.load_weights("./weight.h5")  # Load the saved weights
y_pred = cpu_model.predict(X_test)  # Predict with cpu_model
The final performance is as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU | 37s/epoch | 1.49s | 0.9854 |
GPU | 13s/epoch | 0.54s | 0.9859 |
TPU | 17.7s/epoch | 1.22s (on CPU) | 0.9853 |
To be honest, the GPU feels easier to use, but as mentioned above, CNNs seem to be a weak area for the TPU; with an LSTM, for example, training was more than twice as fast as on the GPU, so it may well be worth choosing between them depending on the situation (a rough sketch of an LSTM setup follows below). You can also simply run two runtimes at the same time.
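As a rough, hypothetical sketch only (this is not the exact model I benchmarked; the layer size and the idea of treating each image row as a timestep are just illustrative assumptions), an LSTM model can be built under the same TPUStrategy setup as above:

from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def getLSTMModel():
    model = Sequential()
    model.add(LSTM(128, input_shape=(28, 28)))  # expects data shaped (samples, 28, 28)
    model.add(Dense(10, activation="softmax"))
    return model

with strategy.scope():  # reuse the TPUStrategy created earlier
    lstm_model = getLSTMModel()
    lstm_model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])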
I ran into a fair number of errors along the way...
Error message
InvalidArgumentError: Cannot assign a device for operation conv2d_1/kernel/IsInitialized/VarIsInitializedOp: node conv2d_1/kernel/IsInitialized/VarIsInitializedOp (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/replica:0/task:0/device:TPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[conv2d_1/kernel/IsInitialized/VarIsInitializedOp]]
I got this error when I built the model with the standalone keras package. It worked once I switched to tensorflow.keras. Quite a trap, isn't it?
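For reference, a minimal sketch of the change (the imported names here are just examples):

# NG: importing from the standalone keras package triggers the device-assignment error above
# from keras.models import Sequential
# from keras.layers import Conv2D, Dense

# OK: use the Keras bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense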
Error message
InvalidArgumentError: Unsupported data type for TPU: double, caused by output IteratorGetNext:0
It seems the TPU does not support the double (float64) type, so convert the data to np.float32 before training.
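In the code above this is already handled in the preprocessing step; as a minimal reminder (same variable names as earlier):

# The MNIST arrays become float64 after dividing by 255,
# so cast them explicitly to float32 before feeding them to the TPU.
X_train = X_train.reshape(-1, 28, 28, 1).astype(np.float32)
X_test = X_test.reshape(-1, 28, 28, 1).astype(np.float32)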
Error message
InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicatedInput' used by node input0_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [T=DT_INT32, N=8]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[input0_1]]
An error that happened only once, seemingly by chance. I simply restarted the runtime and it worked. I don't feel like verifying its reproducibility, so I haven't.
Error message
InvalidArgumentError: Cannot assign a device for operation lstm_1/random_uniform/RandomUniform: node lstm_1/random_uniform/RandomUniform (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/replica:0/task:0/device:TPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[lstm_1/random_uniform/RandomUniform]]
It can't reach the TPU...? Restart the runtime for now.
InternalError Error message
InternalError: Failed to serialize message
This occurred when I fed a large amount of data to an LSTM; it worked once I reduced the amount. A memory error, maybe? (It's a mystery, because the GPU runtime handles the same amount of data fine. Not that it mattered in the end: on the GPU the processing takes so long that it doesn't finish within the 12-hour limit anyway.)
KeyError Error message
KeyError: 'COLAB_TPU_ADDR'
This occurs when the runtime is not set to TPU. Switch the runtime to TPU and run again.
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb#scrollTo=2a5cGsSTEBQD