I tried processing and transforming images to augment data for machine learning

Overview

Deep learning requires a large amount of data for effective training. However, when it is difficult to prepare a large amount of data, a technique called Data Augmentation can be used to inflate the small amount of data at hand and use it for training.

When the data is images, the data is augmented with image processing that combines transformations such as **translation**, **rotation**, **scaling**, **vertical flipping**, **horizontal flipping**, and **brightness adjustment**. f1ex_arr.png

In a TensorFlow (2.x) + Keras environment, a class called `ImageDataGenerator` is provided for data augmentation. Using it, it is relatively easy to generate data with random image processing applied (called **augmented images** in this article). This article describes data augmentation with this `ImageDataGenerator`.

Also, without using `ImageDataGenerator`, we manually perform **translation, rotation, and scaling** with the affine-transformation functions of **OpenCV** and **tf.keras** (`cv2.warpAffine` and `tensorflow.keras.preprocessing.image.apply_affine_transform`) to augment the data. We also compare the processing speed of the two libraries (the result is an overwhelming win for OpenCV).

Execution environment

Execution was checked in a Google Colab environment.

opencv-python      4.1.2.30
tensorflow         2.1.0rc1

Augmenting image data with ImageDataGenerator

Preparation: Import library

Switch the TensorFlow version and import the libraries. We also use matplotlib to look at the images before and after processing, so import that as well.

Preparation: Import library


%tensorflow_version 2.x
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

Preparation: Preparation of image data

Acquire the target data for processing. Here, we will use the training data of "** CIFAR-10 **". CIFAR-10 is a dataset consisting of 10 types of images: "airplane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", and "truck".

Preparation: Preparation of image data


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
img_cifar10 = x_train/255.
print(img_cifar10.shape) # -> (50000, 32, 32, 3)  32x32 3ch RGB 

`img_cifar10` holds $50,000$ RGB ($3$ ch) images of $32 \times 32$ px as a `numpy.ndarray`.
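As a quick sanity check (my addition, not in the original flow), you can confirm that the values are floats in the range $0$ to $1$ after the division by 255, which matters later when `channel_shift_range` is given on a 0-1 scale.

Sanity check of the value range (sketch)


print(img_cifar10.dtype)                     # -> float64
print(img_cifar10.min(), img_cifar10.max())  # -> 0.0 1.0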

Preparation: Definition of image display function

Create functions to display images. Define `showImage(...)` to display a single image and `showImageArray(...)` to display multiple images at once, given an image array.

Preparation: Definition of image display function


def showImage(img,title=None):
  plt.figure(figsize=(3, 3))
  plt.gcf().patch.set_facecolor('white')
  plt.xticks([])
  plt.yticks([])
  plt.title(title)
  plt.imshow(img)
  plt.show()
  plt.close()

def showImageArray(img_arry):
  n_cols = 8
  n_rows = ((len(img_arry)-1)//n_cols)+1
  fig, ax = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(10, 1.25*n_rows))
  fig.patch.set_facecolor('white')
  for i,ax in enumerate( ax.flatten() ):
    if i < len(img_arry):
      ax.imshow(img_arry[i])
      ax.set_xticks([])
      ax.set_yticks([])
    else :
      ax.axis('off') # Hide unused subplots
  plt.show()
  plt.close()

Call these functions as follows:

Calling the image display function


# Display the 12th image of img_cifar10
showImage(img_cifar10[12],title='CIFAR-10 Train Data [12]')

# Display images 0 through 39 (40 images)
showImageArray(img_cifar10[:40])

For `showImage(...)`, give a `numpy.ndarray` with shape (32,32,3) as the argument. For `showImageArray(...)`, give a `numpy.ndarray` with shape (arr_len, 32, 32, 3), where `arr_len` is the length of the image array.

The execution result is as follows. f1.png f2.png

Augmenting image data with ImageDataGenerator

The `ImageDataGenerator` class can load image files from a specified directory and augment the data. Here, we use it purely for data augmentation.
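Although directory loading is not used in this article, it looks roughly like the sketch below. This is my addition; the directory path `data/train` and its one-subfolder-per-class layout are assumptions for illustration.

Loading images from a directory (sketch, not used in this article)


from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumes a hypothetical layout data/train/<class_name>/*.png
gen = ImageDataGenerator(rescale=1./255, rotation_range=20)
train_iter = gen.flow_from_directory(
    'data/train',           # hypothetical directory
    target_size=(32, 32),   # resize loaded images to 32x32
    batch_size=32,
    class_mode='categorical')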

When using it for data augmentation, give parameters at initialization time that specify **what kind of processing (shift, rotate, zoom, ...) is applied randomly and at what intensity**. For example, initialize it with the following parameters.

ImageDataGenerator initialization


ImageDataGenerator = tf.keras.preprocessing.image.ImageDataGenerator
image_data_generator = ImageDataGenerator(
  rotation_range=20,       # Randomly rotate within ±20 degrees
  width_shift_range=8,     # Randomly shift left/right within ±8 px
  height_shift_range=4,    # Randomly shift up/down within ±4 px
  zoom_range=(0.8, 1.2),   # Randomly zoom in the range 0.8x to 1.2x
  horizontal_flip=True,    # Randomly flip horizontally
  channel_shift_range=0.2) # Random shift range of channel values (brightness)

If you give `width_shift_range` and `height_shift_range` as decimal values less than $1$, the random range is specified as a fraction of the image size. Also, although not used this time, you can add vertical flipping with `vertical_flip=True`, as in the sketch below.
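The following is a minimal sketch of such an initialization (my addition; the parameter values are arbitrary examples).

Fractional shift ranges and vertical flip (sketch)


# Fractional shift ranges are interpreted as fractions of the image width/height
image_data_generator2 = ImageDataGenerator(
    width_shift_range=0.25,   # randomly shift left/right within ±25% of the width
    height_shift_range=0.125, # randomly shift up/down within ±12.5% of the height
    vertical_flip=True)       # randomly flip upside down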

Generate a single augmented image

Using this generator, let's augment the "horse" image `img_cifar10[12]` (first, only **one** augmented image is generated). When you run the following code, image processing is applied randomly within the parameter ranges given at initialization above.

Generate a single augmented image


org_img = img_cifar10[12].copy() # The target is the 12th image, a "horse"
ex_img  = image_data_generator.flow( org_img.reshape(1,32,32,3), batch_size=1)[0][0]
print(ex_img.shape) # -> (32, 32, 3)
showImage(ex_img,title='CIFAR-10 Train Data [12] Ex')

Use the `flow(...)` method to get the augmented image. As the argument, give an **array** of original images (converted to a one-element array with `.reshape(1,32,32,3)`, because a single image is not accepted). The return value is a `NumpyArrayIterator`, so evaluating `[0][0]` retrieves the 0th element, which is stored in `ex_img`.
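As an aside (my addition), instead of `.reshape(1,32,32,3)` you can add the leading batch axis with `np.newaxis`, which works regardless of the image size.

Adding the batch axis with np.newaxis (sketch)


batch  = org_img[np.newaxis, ...]  # shape (1, 32, 32, 3)
ex_img = image_data_generator.flow(batch, batch_size=1)[0][0]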

The execution result is as follows. The image is flipped horizontally, shifted, and brighter overall (the result changes with each run). f1ex.png

By the way, rotating or shifting the image creates gaps, but by default the edges of the image are stretched to fill them (giving a natural-looking result). If you do not want this, specify `fill_mode='constant'` at initialization; the gaps are then filled with black as shown below (a sketch of the initialization follows). You can also specify `fill_mode='reflect'`, `fill_mode='wrap'`, or `fill_mode='nearest'` (the default). f1ex2.png
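Here is a minimal sketch of that initialization (my addition; same random parameters as above, with `cval` setting the fill value, 0 = black).

Initialization with fill_mode='constant' (sketch)


# Fill gaps with a constant color (black) instead of stretching the edges
image_data_generator_const = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=8,
    height_shift_range=4,
    zoom_range=(0.8, 1.2),
    horizontal_flip=True,
    channel_shift_range=0.2,
    fill_mode='constant',  # fill gaps with cval
    cval=0.0)              # 0.0 = black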

Generate multiple augmented images

Next, 39 augmented images are generated from the "horse" image `img_cifar10[12]`. `image_data_generator.flow(...)` returns a `NumpyArrayIterator`, so use `next()` to retrieve the images one at a time.

Generate multiple augmented images


org_img = img_cifar10[12].copy()
ex_img = np.empty([40, 32, 32, 3]) # Array that can hold 40 images including the original
ex_img[0,:,:,:] = org_img          # Store the original at index 0
iter_ = image_data_generator.flow( org_img.reshape(1,32,32,3), batch_size=1)
for i in range(1,40):
  ex_img[i,:,:,:] = iter_.next()[0] # Store the generated images one by one
showImageArray(ex_img)

The execution result is as follows. f1ex_arr.png

Generate one augmented image for each of the 0th to 23rd CIFAR-10 images

In the previous section, we generated multiple augmented images from a single image. This time, for each of the 24 images from the 0th to the 23rd of the CIFAR-10 training data, we generate one augmented image.

Generate one augmented image for each image in an array


showImageArray(img_cifar10[:24]) # Display the original CIFAR-10 images 0 through 23

ex_img = np.empty([24, 32, 32, 3])
ex_img = image_data_generator.flow(img_cifar10[:24], batch_size=24, shuffle=False)[0]
showImageArray(ex_img)           # Display the augmented images 0 through 23

The execution results are as follows. First, here are the original images 0 through 23 of the CIFAR-10 training data. cifar10.png

Next, here are the images augmented with ImageDataGenerator. You can see that a different combination of processing (shift, rotation, scaling, flipping, etc.) is applied to each image. cifar10ex.png

If you do not specify `shuffle=False` as an argument of `flow(...)`, the order of the output image array is shuffled.
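The following sketch (my addition) shows the difference: without `shuffle=False`, the batch comes back in a random order.

Effect of shuffle=False (sketch)


batch_shuffled = image_data_generator.flow(img_cifar10[:24], batch_size=24)[0]
batch_ordered  = image_data_generator.flow(img_cifar10[:24], batch_size=24,
                                           shuffle=False)[0]
showImageArray(batch_shuffled)  # images appear in a shuffled order
showImageArray(batch_ordered)   # images keep the original 0..23 order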

Comparison of processing by OpenCV and tf.keras

We manually specify the shift amount, rotation angle, and magnification, and generate augmented images with the OpenCV and tf.keras libraries, comparing the processing time.

Rotation

Rotate the image **45 degrees counterclockwise**. The axis of rotation is the center of the image.

A program using the OpenCV library looks like this. In OpenCV, counterclockwise is the "positive" direction, so the rotation angle is given as 45 as-is. `cv2.warpAffine(...)` does the actual work. If you do not specify `borderMode=cv2.BORDER_REPLICATE`, the gaps are filled with black.

OpenCV version: rotating 50,000 CIFAR-10 images


import time
import cv2
deg = 45  # Counterclockwise is "positive"
w, h = 32, 32  # Image size
m = cv2.getRotationMatrix2D((w/2,h/2), deg, 1) # Transformation matrix
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage; same as np.empty([50000,32,32,3])
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = cv2.warpAffine(img, m, (w,h), borderMode=cv2.BORDER_REPLICATE)
  img_cifar10_ex[i,:,:,:] = img
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24]) # Display only the first 24 images

The execution result is as follows. The processing time was **1.3 [sec]**. cv_r.png

On the other hand, the program using the tf.keras library is as follows. Here, counterclockwise is the "negative" direction, so the rotation angle is given as -45.

tf.keras version: rotating 50,000 CIFAR-10 images


import time
from tensorflow.keras.preprocessing.image import apply_affine_transform
deg = -45  # Counterclockwise is "negative"
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = apply_affine_transform(img, theta=deg)
  img_cifar10_ex[i,:,:,:] = img
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24])

The execution result is as follows. The processing time was **16.2 [sec]**. OpenCV was overwhelmingly **faster**. tf_r.png

Translation

The image is shifted **2 px to the right** and **5 px down**.

First, the OpenCV version. The processing is the same as the previous "Rotation"; only the contents of the transformation matrix `m` differ.

OpenCV version: translating 50,000 CIFAR-10 images


import time
import cv2
tx, ty = 2, 5  # Shift right by 2 px, down by 5 px
w, h = 32, 32  # Image size
m = np.float32([[1,0,tx],[0,1,ty]]) # Transformation matrix
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = cv2.warpAffine(img, m, (w,h), borderMode=cv2.BORDER_REPLICATE)
  img_cifar10_ex[i,:,:,:] = img
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24])

The execution result is as follows. The processing time was **1.3 [sec]** (since the essential processing is the same as "Rotation", the execution time is unchanged). cv_m.png

Next, the tf.keras version. I do not understand why it is designed this way, but to shift 2 px to the right and 5 px down you must specify `apply_affine_transform(img, tx=-5, ty=-2)`. A mystery (see the sketch after the reference link below).

- Reference: [Keras ImageDataGenerator apply_transform() method shifts the image in opposite direction](https://stackoverflow.com/questions/56580076/keras-imagedatagenerator-apply-transform-method-shifts-the-image-in-opposite-d)
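To convince yourself of the sign convention, here is a minimal sketch (my addition) that applies the same 2 px right / 5 px down shift with both libraries to a single image and compares them visually (border handling differs, so the results match only approximately).

Checking the sign convention on a single image (sketch)


import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import apply_affine_transform

img = img_cifar10[12].copy()

# OpenCV: positive tx moves right, positive ty moves down
m = np.float32([[1, 0, 2], [0, 1, 5]])
img_cv = cv2.warpAffine(img, m, (32, 32), borderMode=cv2.BORDER_REPLICATE)

# tf.keras: tx is the (negated) shift along rows, ty along columns
img_tf = apply_affine_transform(img, tx=-5, ty=-2)

showImage(img_cv, title='OpenCV shift')   # compare the two results visually
showImage(img_tf, title='tf.keras shift')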

tf.keras version: translating 50,000 CIFAR-10 images


import time
from tensorflow.keras.preprocessing.image import apply_affine_transform
tx, ty = 2, 5  # Shift right by 2 px, down by 5 px
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = apply_affine_transform(img, tx=-ty, ty=-tx) # Note how the arguments are given
  img_cifar10_ex[i,:,:,:] = img
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24])

The execution result is as follows. The processing time was **13.9 [sec]**, roughly 10 times longer than OpenCV, as before. tf_m.png

Enlargement

The image is enlarged 1.2 times (the image size stays the same after enlargement).

First, the OpenCV version. The image is enlarged with `resize` and then cropped back to $32 \times 32$ px.

OpenCV version: enlarging 50,000 CIFAR-10 images


import time
import cv2
f = 1.2
w, h = 32, 32               # Image size
w2, h2 = int(w*f), int(h*f) # Enlarged image size
tx, ty = int((w2-w)/2),int((h2-h)/2) # Crop start coordinates
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = cv2.resize(img,(w2,h2))
  img_cifar10_ex[i,:,:,:] = img[tx:tx+w,ty:ty+h,:] # Crop 32x32 from the enlarged image
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24])

The execution result is as follows. The processing time was **1.2 [sec]**. cv_f.png

Next, the tf.keras version. It is confusing as before: as in `apply_affine_transform(img, zx=1/f, zy=1/f)`, the arguments `zx` and `zy` are given the **reciprocal of the magnification**.

tf.keras version: enlarging 50,000 CIFAR-10 images


import time
from tensorflow.keras.preprocessing.image import apply_affine_transform
f = 1.2
img_cifar10_ex = np.empty_like(img_cifar10) # Result storage
t1 = time.time()
for i,img in enumerate( img_cifar10 ) :
  img = apply_affine_transform(img, zx=1/f, zy=1/f) # Note: reciprocal of the magnification
  img_cifar10_ex[i,:,:,:] = img
t2 = time.time()
print(f'processing time {t2-t1:.1f} [sec]')
showImageArray(img_cifar10_ex[:24])

The execution result is as follows. The processing time was **13.4 [sec]**. tf_f.png
