Use Colab only as an external GPU environment (as of 2020.6 / Mac environment)

This post describes how to use Colab purely as an external GPU environment rather than as an editor.

Benefits of Colab

For Macs that cannot use CUDA (GPU), Google Colaboratory is a good option, but it also has disadvantages.

Disadvantages

-- The editor is poor compared to VS Code.
-- When running a project made up of data files and module files in Colab Pro, setting up the paths is troublesome.
-- Colab has a quirk where mounting the folder sometimes fails.
-- Uploaded files disappear when the runtime is restarted.

Solution

-- So, write the code on the local machine, move the files to Colab, and let Colab only execute them on the GPU.
-- For a project that needs a long computation time, uploading the files is little trouble by comparison, so this is surprisingly effective.

Procedure

-- Put the main Python file, module files, and data files together in one folder.
-- Upload the folder to Google Drive.
-- In Colab, set the runtime to GPU.
-- Mount Google Drive. As of 2020.6 there is a Google Drive mount icon, so clicking it is enough. Alternatively, you can mount it with the following command.

from google.colab import drive 
drive.mount('/content/drive')
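
If the same file should also run on the local machine, the mount call can be guarded so that it only happens inside Colab. The following is a minimal sketch under that assumption; the function name mount_drive_if_colab is hypothetical and not part of the original project, while google.colab.drive.mount is the same call shown above.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; do nothing elsewhere.

    Returns True if the mount call was made, False if google.colab is unavailable.
    (Hypothetical helper for illustration.)
    """
    try:
        # The google.colab package only exists inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


mounted = mount_drive_if_colab()
print("Drive mounted" if mounted else "Not running in Colab; skipping mount")
```

On the local machine the import fails, so the function returns False without touching anything.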

The Google Drive folder will be displayed in the left pane.

-- This time, try running main.py in the face_age_detection directory on Google Drive.
-- First, change the current directory to the folder where main.py is located.


import os
os.chdir('/content/drive/My Drive/colab/face_age_detection/')

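Alternatively, do the following on the command line. (Note that to use cd on the Colab command line, you need to prefix it with % and enclose the path in quotes. A plain !cd runs in a temporary subshell, so it does not change the notebook's current directory.)

%cd "/content/drive/My Drive/colab/face_age_detection/"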

Check that the directory has changed as expected.


!pwd
/content/drive/My Drive/colab/face_age_detection
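
Because the mount sometimes fails, it can help to check the folder and the script before starting a long run. The following is a minimal sketch of the change-directory-and-check steps above; enter_project is a hypothetical helper name, and a temporary folder stands in for the Drive folder so the sketch also runs outside Colab.

```python
import os
import tempfile


def enter_project(project_dir, entry="main.py"):
    """Change into project_dir and confirm that the entry script is there."""
    if not os.path.isdir(project_dir):
        raise FileNotFoundError(f"{project_dir} not found; is Google Drive mounted?")
    os.chdir(project_dir)
    if not os.path.isfile(entry):
        raise FileNotFoundError(f"{entry} not found in {os.getcwd()}")
    return os.getcwd()


# Demonstration with a temporary folder standing in for the Drive folder.
# In Colab the argument would be '/content/drive/My Drive/colab/face_age_detection'.
start_dir = os.getcwd()
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "main.py"), "w").close()
print(enter_project(demo_dir))
os.chdir(start_dir)  # go back so the demonstration has no side effects
```

If the folder is missing, the error message points at the mount step instead of failing later inside main.py.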

Run the Python file.

!python main.py
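
Before starting a long run, it may be worth confirming in a notebook cell that the GPU runtime is actually active. The following is a minimal sketch, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available); visible_gpus is a hypothetical helper name.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices is available in TensorFlow 2.1 and later.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed here")
elif gpus:
    print("GPU runtime is active:", gpus)
else:
    print("No GPU found; check the runtime type setting")
```

An empty list means the script would still run, but on the CPU.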

Test

I tried it on a Keras face-image project; the code below trains an age-prediction model. (By the way, if you double-click a .py file on the Colab screen, an editor opens on the right side, so you can edit the source code directly.) This project has about 20,000 images of 100 × 100 pixels, three Conv layers, and two Dense layers, so the processing is a little heavy.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os, zipfile, io, re
from PIL import Image
from sklearn.model_selection import train_test_split
import tensorflow.keras
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_size = 100

X = np.load("data/X_data.npy" )
Y = np.load("data/Y_data.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X,
    Y,
    random_state = 0, #Random seed. A fixed integer gives the same split on every run; None gives a different split each time.
    test_size = 0.2
)
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

X_train, X_val, y_train, y_val = train_test_split(
    X_train,
    y_train,
    random_state = 0, #The order of the data is shuffled.
    test_size = 0.2
)

def create_model(X, y):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same',input_shape=X.shape[1:], activation='relu'))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Flatten()) #Convert to one-dimensional vector
    model.add(Dense(64, activation='relu'))        
    model.add(Dense(1))
    model.compile(loss='mse',optimizer='Adam',metrics=['mae'])
    return model

#Functions for visualizing the learning process
def plot_history(history):
    plt.plot(history.history['loss'],"o-",label="loss",)
    plt.plot(history.history['val_loss'],"o-",label="val_loss")
    plt.title('model loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(loc='upper right')
    plt.show()
    
# Data Augmentation
datagen = ImageDataGenerator(
    featurewise_center = False,
    samplewise_center = False,
    featurewise_std_normalization = False,
    samplewise_std_normalization = False,
    zca_whitening = False,
    rotation_range = 0,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    horizontal_flip = True,
    vertical_flip = False
)

# EarlyStopping
early_stopping = EarlyStopping(
    monitor = 'val_loss',
    patience = 10,
    verbose = 1
)

# reduce learning rate
reduce_lr = ReduceLROnPlateau(
    monitor = 'val_loss',
    factor = 0.1,
    patience = 3,
    verbose = 1
)

model = create_model(X_train, y_train)
history = model.fit( #fit_generator is deprecated since TensorFlow 2.1; fit accepts generators directly
    datagen.flow(X_train, y_train, batch_size = 128),
    steps_per_epoch = X_train.shape[0] // 128,
    epochs = 50,
    validation_data = (X_val, y_val),
    callbacks = [early_stopping, reduce_lr],
    shuffle = True,
    verbose = 1)

model.save('age_prediction.hdf5')
plot_history(history) 

model = load_model('age_prediction.hdf5')

preds=model.predict(X_test[0:30])

plt.figure(figsize=(16, 6))
for i in range(30):
    plt.subplot(3, 10, i+1)
    plt.axis("off")
    pred = round(preds[i][0],1)
    true = y_test[i]
    if abs(pred - true) < 3.0:
        plt.title(str(true) + '\n' + str(pred))
    else:
        plt.title(str(true) + '\n' + str(pred), color = "red")
    plt.imshow(X_test[i])
plt.show()
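
The script above loads data/X_data.npy relative to the current directory, which is why the directory has to be changed first. One possible alternative is to build paths from the location of the script itself. The following is a minimal sketch under that assumption; project_file is a hypothetical helper that is not part of the original project.

```python
from pathlib import Path


def project_file(script_path, *parts):
    """Return an absolute path inside the folder that contains script_path."""
    return Path(script_path).resolve().parent.joinpath(*parts)


# Inside main.py this would be project_file(__file__, "data", "X_data.npy").
# A literal script path is used here so the sketch also runs outside a script file.
example = project_file(
    "/content/drive/My Drive/colab/face_age_detection/main.py",
    "data", "X_data.npy",
)
print(example)
```

With this approach np.load(project_file(__file__, "data", "X_data.npy")) would find the data regardless of the current directory.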

When I actually ran it, the results were as follows.

Local machine (CPU: 2.4 GHz 8-core Core i9): 116 minutes
Colab (GPU): 28 minutes

Colab was about 4 times faster than the CPU (116 / 28 ≈ 4.1).
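
The ratio between the two measurements above can be computed directly. This is only arithmetic on the two reported times; the variable names are chosen for illustration.

```python
cpu_minutes = 116  # local machine, from the measurement above
gpu_minutes = 28   # Colab GPU, from the measurement above

speedup = cpu_minutes / gpu_minutes
saved = cpu_minutes - gpu_minutes

print(f"speedup: {speedup:.1f}x, time saved: {saved} minutes")
# prints "speedup: 4.1x, time saved: 88 minutes"
```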
