New Data Augmentation? [Grid Mix]

Introduction

You might think, "Grid Mix? I've never heard of it." That's right: it is **an augmentation I made on my own**, inspired by Grid Mask and CutMix, as shown below. I ran a small experiment to see whether it works, so I'm leaving the results here as a memo. [Figure: mixed_img.png]

Overview

Purpose: what I did

- **Checked the effect of the Grid Mix described above on CIFAR-10.**
- **Compared it with CutMix, an augmentation of the same family.**

Conclusion: how it went

- **Accuracy: the proposed method (Grid Mix) is slightly better.**
- **Convergence: the existing method (CutMix) is better.**
- **Tuning: the proposed method (Grid Mix) may be more troublesome.**

This was only a casual experiment, so I can't be sure, but I was able to confirm that the method has at least some potential.

Background

Introducing Grid Mask

Grid Mask is one of the recently proposed data augmentation methods. As shown in the figure below, it masks the image in a grid pattern, and it is reported to outperform conventional methods such as Cutout.
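To make the idea of grid masking concrete, here is a minimal sketch in NumPy. It is a simplification and not the published Grid Mask algorithm: it has no rotation, and the function name `grid_mask` and the parameters `d` (cell size) and `drop_ratio` (side of the masked square relative to the cell) are hypothetical names chosen for illustration.

```python
import numpy as np

def grid_mask(img, d=8, drop_ratio=0.5, rng=None):
    """Zero out one square block inside every d x d cell (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = img.shape
    block = int(d * drop_ratio)          # side length of the masked square
    off_y = rng.integers(0, d)           # random phase of the grid
    off_x = rng.integers(0, d)
    rows = ((np.arange(h) + off_y) % d) < block   # rows inside a masked band
    cols = ((np.arange(w) + off_x) % d) < block   # columns inside a masked band
    drop = rows[:, None] & cols[None, :]          # masked where both bands overlap
    return img * (~drop)[:, :, np.newaxis]

# Example on a dummy 32x32 image of ones
dummy = np.ones((32, 32, 3), dtype=np.float32)
masked = grid_mask(dummy, d=8, drop_ratio=0.5)
```

With these settings and a 32x32 input, a quarter of the pixels are zeroed regardless of the random offset, because 32 is a multiple of the cell size.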

Introducing CutMix

Many people have already covered this on Qiita and elsewhere, so I will skip the details. In short, CutMix randomly cuts out part of one image, pastes it onto another image, and assigns the label according to the area ratio. [Figure: 3.png] Source paper: https://arxiv.org/abs/1905.04899
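The cut-paste-and-relabel step can be sketched as follows. This is a minimal sketch for a single pair of images, not the reference implementation: the function name `cutmix_pair` and the returned `keep_rate` are hypothetical names, and the box is sized so that its area is roughly `1 - lam` of the image, with `lam` drawn from a Beta distribution.

```python
import numpy as np

def cutmix_pair(img_1, img_2, lam, rng=None):
    """Paste a random box from img_2 onto img_1 (sketch of the CutMix idea)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = img_1.shape
    # Box whose area is about (1 - lam) of the whole image
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy = rng.integers(0, h)              # random box center
    cx = rng.integers(0, w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_1.copy()
    mixed[top:bottom, left:right] = img_2[top:bottom, left:right]
    # Recompute the ratio from the actual box, since it may be clipped at the border
    keep_rate = 1.0 - (bottom - top) * (right - left) / (h * w)
    return mixed, keep_rate

rng = np.random.default_rng(0)
zeros = np.zeros((32, 32, 3), dtype=np.float32)
ones = np.ones((32, 32, 3), dtype=np.float32)
lam = rng.beta(0.7, 0.7)                 # alpha = beta = 0.7, as in the experiment here
mixed, keep_rate = cutmix_pair(zeros, ones, lam, rng)
```

The label for `mixed` would then be `keep_rate` times the label of the first image plus `1 - keep_rate` times the label of the second.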

CutMix ⇒ Motivation for GridMix

I have had some doubts about CutMix for a while. The region near the center of an image seems to carry more information, so I wonder whether it is really appropriate to decide the label purely by area ratio.

For example, in the figure below, half of the area is a cat and half is a dog, but splitting the label 50/50 feels wrong to me. It looks like a dog to me. [Figure: 2.png]

Approach

Using a common model, I compare accuracy when training on the CIFAR-10 dataset in the following three cases.

  1. No Augmentation
  2. CutMix Augmentation (existing method)
  3. GridMix (Proposed method)

Model to use

A shallow CNN with 8 convolutional layers (not pretrained). Input shape: 32x32x3

GridMix Augmentation

The proposed method is like a child of CutMix and GridMask: it mixes two images using a grid of a suitable size. **The mask is basically a checkered pattern, but a mesh pattern and a no-mix mask are also generated stochastically.**

The figure below shows, from left to right, the checkered pattern, the mesh pattern, and no mix. [Figure: gridmiximg.png]

**When only the checkered pattern was used, the mix ratio stayed constant at about 0.5 and convergence was poor**, so I made the task easier in some cases. Adding the mesh pattern also makes it possible to reproduce something similar to the existing CutMix.

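import numpy as np

def grid_mixer(img_1, img_2, interval_h, interval_w, thresh=0.3):
    # Make the checkerboard: h_grid and w_grid are 0/1 stripes with a random phase
    h, w, _ = img_1.shape
    interval_h, interval_w = int(interval_h), int(interval_w)  # grid spacing must be an integer
    h_start = np.random.randint(0, 2*interval_h)
    w_start = np.random.randint(0, 2*interval_w)
    h_grid = ((np.arange(h_start, h_start+h)//interval_h)%2).reshape(-1,1)
    w_grid = ((np.arange(w_start, w_start+w)//interval_w)%2).reshape(1,-1)
    checkerboard = np.abs(h_grid-w_grid)

    # With probability thresh each, also fill the cells where both stripes are 1
    # and/or the cells where both are 0 (one fill: mesh pattern, both fills: no mix)
    if np.random.rand()<thresh:
        checkerboard += h_grid*w_grid
    if np.random.rand()<thresh:
        checkerboard += (1-h_grid)*(1-w_grid)

    # Mix images: mask value 1 takes the pixel from img_1, 0 takes it from img_2
    mixed_img = img_1*checkerboard[:, :, np.newaxis]+img_2*(1-checkerboard[:, :, np.newaxis])
    mix_rate = np.sum(checkerboard)/(h*w)  # fraction of pixels taken from img_1
    return mixed_img, mix_rate

# Usage: img_1 and img_2 are two images of the same shape (H, W, C)
h, w, _ = img_1.shape
interval_h = int(h//np.random.uniform(2, 4))  # divide the image into roughly 2 to 4 parts
interval_w = int(w//np.random.uniform(2, 4))
img, mix_rate = grid_mixer(img_1, img_2, interval_h, interval_w, 0.3)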
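The returned `mix_rate` is presumably used to build the soft label in the same way CutMix uses its area ratio. The following is a minimal sketch under that assumption; the helper name `mix_labels` and the integer class labels are hypothetical and do not come from the original code.

```python
import numpy as np

def mix_labels(label_1, label_2, mix_rate, num_classes=10):
    """Blend two one-hot labels by the share of pixels taken from each image."""
    one_hot_1 = np.eye(num_classes)[label_1]
    one_hot_2 = np.eye(num_classes)[label_2]
    # mix_rate is the fraction of pixels that came from the first image
    return mix_rate * one_hot_1 + (1.0 - mix_rate) * one_hot_2

# Example: 75% of the pixels come from a class-3 image, 25% from a class-5 image
soft_label = mix_labels(3, 5, 0.75)
```

The resulting vector sums to 1 and can be fed to a cross-entropy loss that accepts soft targets.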

As shown below, the drawback is that there are several parameters to tune.

**Grid spacing:** If the grid is too fine, I suspect it can only be picked up by the shallow layers (CIFAR-10 images are only 32x32 by default), so I set the spacing so that the image is divided into 2 to 4 parts vertically and horizontally. I feel this also depends on the model. The aspect ratio of the grid cells is randomized as well, but I have not confirmed whether that has any effect.

**Threshold for switching between the checkered and mesh patterns:** With thresh = 0.3, each of the two extra fill steps in the code is applied independently with 30% probability. As a result, the mask is a checkered pattern 49% of the time (0.7 x 0.7), a mesh pattern 42% of the time (2 x 0.3 x 0.7), and no mix the remaining 9% of the time (0.3 x 0.3). In the end, this plays the same role as adjusting the Beta distribution used in CutMix and similar methods.
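The pattern probabilities follow from two independent draws against the threshold, which can be checked with a few lines of Python. The helper name `pattern_probabilities` is hypothetical, and the nominal mix rates per pattern (0.5, 0.75, 1.0) are an idealisation that assumes equal-width stripes; the actual `mix_rate` varies with the random offset and grid spacing.

```python
def pattern_probabilities(thresh=0.3):
    """Probability of each mask type when two fill steps fire independently."""
    no_fill = (1.0 - thresh) ** 2             # neither fill step fires
    one_fill = 2.0 * thresh * (1.0 - thresh)  # exactly one fill step fires
    both_fills = thresh ** 2                  # both fire, mask is all ones
    return {"checkered": no_fill, "mesh": one_fill, "no_mix": both_fills}

# Idealised share of pixels taken from the first image for each pattern
NOMINAL_MIX_RATE = {"checkered": 0.5, "mesh": 0.75, "no_mix": 1.0}

probs = pattern_probabilities(0.3)
expected_rate = sum(probs[name] * NOMINAL_MIX_RATE[name] for name in probs)
for name, p in probs.items():
    print(f"{name}: {p:.2f}")
print(f"expected mix rate: {expected_rate:.2f}")
```

Under this idealisation the average mix rate is about 0.65, so raising `thresh` shifts the samples toward easier, less mixed images.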

Training conditions

Result evaluation

The table below shows averages over three runs, after tuning the learning rate and schedule parameters.

| Case | Epochs | Val_Accuracy | Val_Loss |
|---|---|---|---|
| No Augmentation | 25 | 0.805 | 0.710 |
| CutMix (beta=alpha=0.7) | 32 | 0.841 | 0.505 |
| GridMix | 45 | 0.852 | 0.463 |

Grid Mix is slow to converge... It may be worth leaving it out for the first few epochs. Still, the accuracy is a little better. This is only a single case, but I see some potential.

Summary

In conclusion, **mixing images in a grid pattern may work better than regular CutMix**. The verification is insufficient, so this is only a possibility, and I can't say anything definite without more experiments. If anyone feels like trying it, I would be happy to the point of tears. If it doesn't work at all, I will apologize in tears.
