This article covers using a TPU in Google Colaboratory. Unlike the GPU, which works just by switching the runtime, the TPU required a few additions to the code, so I am writing them down as a memo.
Google Colaboratory comes with tensorflow 1.15.0 installed.
TPUtest.py
import tensorflow as tf
import distutils
print(distutils.version.LooseVersion(tf.__version__))
#>>1.15.0
We classify MNIST with a CNN. According to Google, the TPU is not optimized for CNNs, so this may put the TPU at a slight disadvantage, but since I don't intend to evaluate performance rigorously, that's fine.
TPUtest.py
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import numpy as np
#Data download
(X_train, y_train), (X_test, y_test) = mnist.load_data()
#Divide by 255
X_train = X_train/255
X_test = X_test/255
#Change the shape of image data
X_train = X_train.reshape(-1,28,28,1).astype(np.float32)
X_test = X_test.reshape(-1,28,28,1).astype(np.float32)
# Convert the labels to one-hot vectors
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
TPUtest.py
from tensorflow.keras.layers import Conv2D, Dense, ReLU, Flatten, Input, MaxPool2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
def getModel():
    model = Sequential()
    model.add(Conv2D(3, 3, input_shape=(28, 28, 1)))
    model.add(MaxPool2D(2))
    model.add(ReLU())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(ReLU())
    model.add(Dense(10, activation="softmax"))
    return model
model = getModel()
#drawing
plot_model(model, show_shapes=True, show_layer_names=False)
#compile
model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])
It is a very ordinary model.
TPUtest.py
%%time
model.fit(X_train, y_train, epochs=10, validation_data=(X_test,y_test))
TPUtest.py
%%time
y_pred = model.predict(X_test)
TPUtest.py
from sklearn.metrics import accuracy_score
import numpy as np
# Convert the one-hot vectors back to class labels
y_pred = np.argmax(y_pred, axis=1)
y_test = np.argmax(y_test, axis=1)
print(accuracy_score(y_pred, y_test))
#>>0.9854
Run this code on each of the three runtimes and compare.
The execution result is as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU | 37s/epoch | 1.49s | 0.9854 |
GPU | 13s/epoch | 0.54s | 0.9859 |
TPU | 37s/epoch | 2.76s | 0.9863 |
... TPU isn't working?
First check the device
TPUtest.py
import os
import tensorflow as tf
import pprint
if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    with tf.Session(tpu_address) as session:
        devices = session.list_devices()

    print('TPU devices:')
    pprint.pprint(devices)
If a list of TPU devices is printed like this, you are good to go.
A little ingenuity is required when creating and compiling the model.
TPUtest.py
def getModel():
    # Body omitted: identical to the model used on the GPU
    return model

# TPU setup
resolver = tf.contrib.cluster_resolver.TPUClusterResolver('grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():  # this scope is the part you need to add
    model = getModel()
    model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])

# Training then proceeds as usual
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
Training clearly proceeds much more comfortably than on the CPU. Let's try prediction as is.
TPUtest.py
y_pred = model.predict(X_test)
Huh? It runs, but isn't it way too slow...? Is the prediction ever going to finish?? The results are as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU (repost) | 37s/epoch | 1.49s | 0.9854 |
GPU (repost) | 13s/epoch | 0.54s | 0.9859 |
TPU | 17.7s/epoch | 15min 15s | 0.9853 |
It turns out that prediction on the TPU is ridiculously slow. Why does the validation during training still run at a sensible speed...?
Since there is no way around it, we train on the TPU and then predict on the CPU.
TPUtest.py
# Train on the TPU
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
model.save_weights("./weight.h5")  # Save the weights to a file

# Predict on the CPU
cpu_model = getModel()  # Build the same model on the CPU
cpu_model.load_weights("./weight.h5")  # Load the saved weights
y_pred = cpu_model.predict(X_test)  # Predict with cpu_model
The final performance is as follows.
Runtime | Training time | Prediction time | Accuracy |
---|---|---|---|
CPU | 37s/epoch | 1.49s | 0.9854 |
GPU | 13s/epoch | 0.54s | 0.9859 |
TPU | 17.7s/epoch | 1.22s (on CPU) | 0.9853 |
To be honest, the GPU feels easier to use, but as mentioned above, CNNs seem to be a weak area for the TPU; with an LSTM, for example, training was more than twice as fast as on the GPU, so it may well be worth choosing between them depending on the situation (a rough sketch of an LSTM setup follows below). You can also simply run two runtimes at the same time.
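As a rough, hypothetical sketch only (this is not the exact model I benchmarked; the layer size and the idea of treating each image row as a timestep are just illustrative assumptions), an LSTM model can be built under the same TPUStrategy setup as above:

from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def getLSTMModel():
    model = Sequential()
    model.add(LSTM(128, input_shape=(28, 28)))  # expects data shaped (samples, 28, 28)
    model.add(Dense(10, activation="softmax"))
    return model

with strategy.scope():  # reuse the TPUStrategy created earlier
    lstm_model = getLSTMModel()
    lstm_model.compile(Adam(), loss="categorical_crossentropy", metrics=["acc"])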
I ran into a fair number of errors along the way...
Error message
InvalidArgumentError: Cannot assign a device for operation conv2d_1/kernel/IsInitialized/VarIsInitializedOp: node conv2d_1/kernel/IsInitialized/VarIsInitializedOp (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/replica:0/task:0/device:TPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[conv2d_1/kernel/IsInitialized/VarIsInitializedOp]]
I got this error when I built the model with the standalone keras package. It worked once I switched to tensorflow.keras. Quite a trap, isn't it?
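For reference, a minimal sketch of the change (the imported names here are just examples):

# NG: importing from the standalone keras package triggers the device-assignment error above
# from keras.models import Sequential
# from keras.layers import Conv2D, Dense

# OK: use the Keras bundled with TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense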
Error message
InvalidArgumentError: Unsupported data type for TPU: double, caused by output IteratorGetNext:0
It seems the TPU does not support the double (float64) type, so convert the data to np.float32 before training.
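In the code above this is already handled in the preprocessing step; as a minimal reminder (same variable names as earlier):

# The MNIST arrays become float64 after dividing by 255,
# so cast them explicitly to float32 before feeding them to the TPU.
X_train = X_train.reshape(-1, 28, 28, 1).astype(np.float32)
X_test = X_test.reshape(-1, 28, 28, 1).astype(np.float32)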
Error message
InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicatedInput' used by node input0_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [T=DT_INT32, N=8]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[input0_1]]
An error that happened only once, seemingly by chance. I simply restarted the runtime and it worked. I don't feel like verifying its reproducibility, so I haven't.
Error message
InvalidArgumentError: Cannot assign a device for operation lstm_1/random_uniform/RandomUniform: node lstm_1/random_uniform/RandomUniform (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/replica:0/task:0/device:TPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[lstm_1/random_uniform/RandomUniform]]
It can't reach the TPU...? Restart the runtime for now.
InternalError Error message
InternalError: Failed to serialize message
This occurred when I fed a large amount of data to an LSTM; it worked once I reduced the amount. A memory error, maybe? (It's a mystery, because the GPU runtime handles the same amount of data fine. Not that it mattered in the end: on the GPU the processing takes so long that it doesn't finish within the 12-hour limit anyway.)
KeyError Error message
KeyError: 'COLAB_TPU_ADDR'
This occurs when the runtime is not set to TPU. Switch the runtime to TPU and run again.
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb#scrollTo=2a5cGsSTEBQD