This time, I wrote a Python program that applies image data augmentation (data expansion) in batch, using the transformations provided by ImageDataGenerator in Keras. I will also briefly explain data augmentation and point out some things to watch out for. The source code is here.
Building an image recognition or classification model with deep learning requires a large amount of high-quality training data. In practice, however, the amount of training data is often small. You can increase it by brute force (for example, by photographing more of the target objects), but that takes a lot of time and effort. Data augmentation is a technique that helps when training data is scarce: it increases the number of samples by applying transformations such as flipping and enlarging / reducing to the original training images.
Many kinds of data augmentation exist, but this article covers the following. To show the result of each transformation, I used a cat image ([One day wormwood by sabamiso](http://www.igosso.net/se.cgi?q=%E3%81%82%E3%82%8B%E6%97%A5%E3%81%AE%E3%82%88%E3%82%82%E3%81%8E&sa=%E6%A4%9C%E7%B4%A2&lid=1&lia=1&lib=1&lic=1)) found through "Commercial free photo search".
-- Rotation: rotates the image by an arbitrary angle. This presumably covers cases where the cat is photographed at different angles.
-- Translation in horizontal and vertical directions: shifts the subject of the image horizontally and vertically. This presumably covers cases where the cat appears toward the left or right, or toward the top or bottom, of the frame.
-- Enlarge / Reduce: enlarges or reduces the subject of the image. This presumably covers cases where the cat is photographed from near or far.
-- Color tone change: brightens or darkens the whole image. This presumably covers bright and dark shooting environments.
-- Flip horizontal: mirrors the image left to right. This presumably covers cats facing left or right.
-- Random Erasing: masks (covers) part of the image with a rectangle. Three things are decided at random: 1. whether to mask at all, 2. the size of the rectangle, and 3. the aspect ratio of the rectangle.
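Random Erasing is not one of the options of ImageDataGenerator, so it may help to see the three random decisions written out. The following is a minimal sketch using NumPy, not the code from main.py: the function name `random_erasing`, its parameters, and the default ranges are assumptions chosen for illustration, and the rectangle is filled with random pixel values as one possible choice.

```python
import numpy as np


def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3), rng=None):
    """Return a copy of img (H, W) or (H, W, C) with one random rectangle masked."""
    if rng is None:
        rng = np.random.default_rng()
    out = img.copy()
    # 1. Decide whether to mask at all (masking happens with probability p).
    if rng.random() >= p:
        return out
    h, w = out.shape[:2]
    for _ in range(100):  # retry until the sampled rectangle fits in the image
        # 2. Size of the rectangle, as a fraction of the image area.
        area = rng.uniform(*area_range) * h * w
        # 3. Aspect ratio of the rectangle (height / width).
        aspect = rng.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            top = int(rng.integers(0, h - eh + 1))
            left = int(rng.integers(0, w - ew + 1))
            # Fill the rectangle with random values in the 0-255 pixel range.
            noise = rng.uniform(0, 255, size=(eh, ew) + out.shape[2:])
            out[top:top + eh, left:left + ew] = noise
            return out
    return out  # no fitting rectangle found; image left unmasked
```

Because the function takes one image array and returns an array of the same shape, one way to combine it with the Keras transformations would be to pass it as the `preprocessing_function` argument of ImageDataGenerator; whether main.py does it that way is not stated here.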
Running "main.py" applies a combination of the transformations listed under "Data expansion processing handled this time" and creates training images. You can adapt it to your needs by setting the variables in the following source code.
Part of main.py
from keras.preprocessing.image import ImageDataGenerator  # import path may differ by Keras / TensorFlow version

# ------------------- Variables to set individually (start) -------------------
input_dir = 'trainImg'  # Folder containing the original training images
output_dir = "output"  # Output folder for the augmented images
num = 10  # Number of augmented images to generate per original image
generator = ImageDataGenerator(
    rotation_range=90,  # Rotate randomly within +/-90 degrees
    width_shift_range=0.1,  # Shift horizontally by up to 10% of the width
    height_shift_range=0.1,  # Shift vertically by up to 10% of the height
    zoom_range=0.3,  # Scale randomly between 0.7x and 1.3x
    channel_shift_range=50.0,  # Add a random value (up to +/-50) to the pixel values
    horizontal_flip=False,  # Random left-right flip (disabled here)
    vertical_flip=True  # Random top-bottom flip (enabled here)
)
# ------------------- Variables to set individually (end) -------------------
If "input_dir" contains 5 original training images and "main.py" is run with the variables above, 5 x 10 = 50 images are generated in "output_dir".
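The "images x num" behaviour can be sketched as a simple loop over the input folder. This is only an illustration under assumptions, not the actual main.py: the helper names (`list_images`, `output_name`, `augment_folder`), the accepted extensions, and the output naming scheme are all hypothetical, and the Keras import path may differ depending on the installed version.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set of input formats


def list_images(input_dir):
    """Return the image file names in input_dir, sorted for reproducibility."""
    return sorted(f for f in os.listdir(input_dir) if f.lower().endswith(IMAGE_EXTS))


def output_name(name, i):
    """Hypothetical naming scheme: cat.jpg -> cat_aug000.jpg, cat_aug001.jpg, ..."""
    stem = os.path.splitext(name)[0]
    return f"{stem}_aug{i:03d}.jpg"


def augment_folder(generator, input_dir, output_dir, num):
    """Save num randomly transformed copies of every image in input_dir."""
    # Imported here so the helpers above work without Keras installed;
    # the import path may differ between Keras / TensorFlow versions.
    from keras.preprocessing.image import array_to_img, img_to_array, load_img

    os.makedirs(output_dir, exist_ok=True)
    written = 0
    for name in list_images(input_dir):
        # Load one original image as an (H, W, C) float array.
        x = img_to_array(load_img(os.path.join(input_dir, name)))
        for i in range(num):
            # One random draw from the ranges configured on the generator.
            y = generator.random_transform(x)
            path = os.path.join(output_dir, output_name(name, i))
            array_to_img(y, scale=False).save(path)
            written += 1
    return written  # equals (number of input images) * num
```

With the variables shown above, `augment_folder(generator, input_dir, output_dir, num)` would be the call; for 5 input images and `num = 10` it would write 50 files.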
Data augmentation is not something you should apply in every case. For example, suppose you are building a character recognition model for hiragana. Applying data augmentation to character images (see: https://lab.ndl.go.jp/cms/hiragana73) causes the following problems.
-- Rotating the hiragana "i" (い) (left: original image, right: processed image): the processed image resembles the character "ko" (こ), so it may be misrecognized.
-- Flipping the hiragana "u" (う) horizontally (left: original image, right: processed image): the processed image is a character that does not exist, so training on it is meaningless and lowers the accuracy of the character recognition model.
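With these two failure cases in mind, settings for a character dataset might look like the following. This is a hedged sketch: the name `char_aug_kwargs` and every number in it are illustrative guesses, not tuned or recommended values.

```python
# Hypothetical augmentation settings for hiragana character images.
# All values are illustrative assumptions, not tuned results.
char_aug_kwargs = dict(
    rotation_range=10,       # small tilt only, so "i" never comes close to "ko"
    width_shift_range=0.1,   # slight horizontal shift
    height_shift_range=0.1,  # slight vertical shift
    zoom_range=0.1,          # mild enlarge / reduce
    horizontal_flip=False,   # mirrored characters do not exist
    vertical_flip=False,     # upside-down characters do not exist either
)

# The dictionary would be passed to Keras like this:
# generator = ImageDataGenerator(**char_aug_kwargs)
```

The point is the contrast with the cat example: the same parameters are used, but the rotation range is narrowed and both flips are switched off.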
In this article, I briefly explained data augmentation and how to write a Python program that performs it. Data augmentation is easy to apply and, when it works well, improves accuracy. However, depending on what the training images depict, it can also produce meaningless training images. You therefore need to consider, for each target, which transformations to use (rotation, etc.) and how strongly to apply them (for rotation, how many degrees).