When training a model with TensorFlow, you want to see whether the training is working. By passing the argument `verbose=1` to `model.fit()`, you can check the progress of training.
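For example, a minimal call might look like this (`model`, `x_train`, `y_train`, `x_validate`, and `y_validate` are hypothetical names for a compiled Keras model and its data):
```
# Hypothetical example: any compiled Keras model and dataset work here.
history = model.fit(
    x_train, y_train,
    epochs=5,
    validation_data=(x_validate, y_validate),
    verbose=1  # print a progress bar and metrics for every epoch
)
```
Running this prints progress like the following: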
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 11s 184us/sample - loss: 0.4968 - accuracy: 0.8223 - val_loss: 0.4216 - val_accuracy: 0.8481
Epoch 2/5
60000/60000 [==============================] - 11s 176us/sample - loss: 0.3847 - accuracy: 0.8587 - val_loss: 0.4056 - val_accuracy: 0.8545
Epoch 3/5
60000/60000 [==============================] - 11s 176us/sample - loss: 0.3495 - accuracy: 0.8727 - val_loss: 0.3600 - val_accuracy: 0.8700
Epoch 4/5
60000/60000 [==============================] - 11s 179us/sample - loss: 0.3282 - accuracy: 0.8795 - val_loss: 0.3636 - val_accuracy: 0.8694
Epoch 5/5
60000/60000 [==============================] - 11s 176us/sample - loss: 0.3115 - accuracy: 0.8839 - val_loss: 0.3438 - val_accuracy: 0.8764
However, it is difficult to tell from this output alone whether the training is going well.
Therefore, let's consider visualization with TensorBoard. TensorBoard plots the progress of your training as shown below and helps you see at a glance whether it is succeeding.
Here, the accuracy and loss of a successful and an unsuccessful training run are plotted on the graphs.
In TensorBoard, in addition to accuracy and loss, distributions and histograms can also be visualized.
Visualization of distributions | Visualization of histograms
When monitoring a model during training, accuracy and loss can be refreshed by pressing TensorBoard's reload button, but distributions and histograms cannot. I searched for a way to do this but couldn't find one in Japanese, so I'll write it down here.
Pull the nightly image of TensorFlow with Docker. When I printed the TensorFlow version, it was `2.1.0-dev20200101`.
docker pull tensorflow/tensorflow:nightly-gpu-py3-jupyter
Start it. The important part here is port forwarding: first, forward port 8888 to launch Jupyter. In addition, you must forward port 6006, which TensorBoard uses by default. Use the `-p` option for port forwarding.
docker run --gpus all -it -v `pwd`:/tf -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:nightly-gpu-py3-jupyter
When you run the above command, a link for opening Jupyter Notebook is displayed; access it to open the notebook.
After that, write your TensorFlow program as usual.
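For concreteness, here is a minimal, entirely hypothetical Fashion-MNIST setup that defines the names used later in this article (`datagen_train`, `datagen_validate`, `x_train`, `x_validate`, `batch_size`, `epochs`); your own preprocessing and model will of course differ:
```
import tensorflow as tf

# Load Fashion-MNIST, add a channel axis, and scale pixels to [0, 1].
(x_train, y_train), (x_validate, y_validate) = \
    tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0
x_validate = x_validate[..., None] / 255.0

batch_size = 32
epochs = 5

# Wrap the arrays in generators, matching the fit() call shown later.
gen = tf.keras.preprocessing.image.ImageDataGenerator()
datagen_train = gen.flow(x_train, y_train, batch_size=batch_size)
datagen_validate = gen.flow(x_validate, y_validate, batch_size=batch_size)

# A small classifier; any compiled model works.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```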
It seems you need to load an extension to use TensorBoard. Also, when you try to load the extension, you are asked to install cloud-tpu-client, so I recommend running the following.
!pip install cloud-tpu-client
%load_ext tensorboard
Next, create a callback that writes out the log files to be visualized during training. You don't have to implement it yourself; just create an instance of the class `tensorflow.keras.callbacks.TensorBoard`.
tb_cb = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,      # directory to write the log files to
    histogram_freq=1,     # record histograms/distributions every epoch
    write_images=True     # also log the model weights as images
)
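Note that `log_dir` must be defined beforehand. As an assumption on my part, any directory under the root you later pass to `--logdir` works; a timestamped subdirectory keeps separate runs apart:
```
import datetime

# Hypothetical: one timestamped folder per run, under the "log/"
# directory that --logdir points at below.
log_dir = "log/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
```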
Log files are generated by passing this instance to the `callbacks` argument of `model.fit()`.
history = model.fit(
    datagen_train,
    steps_per_epoch=len(x_train) // batch_size,
    validation_data=datagen_validate,
    validation_steps=len(x_validate) // batch_size,
    epochs=epochs,
    shuffle=False,
    callbacks=[tb_cb]  # the TensorBoard callback created above
)
This is the most important point. Launching TensorBoard in a Jupyter Notebook is easy, but if you don't set the options properly, you won't be able to see distributions or histograms while training. In my case, the following worked fine.
%tensorboard --logdir log --bind_all --port 6006 --reload_multifile true
Basically, TensorBoard can be started by specifying the directory where the log files are saved with the `--logdir` option. In my case (perhaps because of the Docker + Jupyter Notebook combination?), it didn't start without adding `--bind_all` and `--port 6006`. Once started, TensorBoard can be operated inside the output cell.
Now, the crucial part is the final `--reload_multifile true`. Without it, you will not be able to see distributions and histograms while training.
In addition, TensorBoard must be started before `model.fit()`, because Jupyter Notebook can only run one cell at a time.
Finally, let's summarize the overall flow.
Build an environment if necessary
```
docker pull tensorflow/tensorflow:nightly-gpu-py3-jupyter
docker run --gpus all -it -v `pwd`:/tf -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:nightly-gpu-py3-jupyter
```
Load the extensions
```
!pip install cloud-tpu-client
%load_ext tensorboard
```
Data preprocessing and model construction
Since this varies from person to person, it is omitted here.
Create an instance of the class tensorflow.keras.callbacks.TensorBoard
```
tb_cb = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    write_images=True
)
```
Launch TensorBoard
```
%tensorboard --logdir log --bind_all --port 6006 --reload_multifile true
```
Start training
```
history = model.fit(
    datagen_train,
    steps_per_epoch=len(x_train) // batch_size,
    validation_data=datagen_validate,
    validation_steps=len(x_validate) // batch_size,
    epochs=epochs,
    shuffle=False,
    callbacks=[tb_cb]
)
```
That's all.
https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks explains how to launch TensorBoard in a Jupyter Notebook.
https://github.com/tensorflow/tensorboard describes the `--reload_multifile` option in the Frequently Asked Questions section of its README.md.