CNN 1 Image Recognition Basics

Aidemy 2020/10/2

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled at the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that so many people read my previous summary article. Thank you! This is my first post on CNNs. Nice to meet you.

What to learn this time
・About image recognition
・Implementation of CNN
・Hyperparameters of CNN

About image recognition

What is image recognition?

・Image recognition is a technology that detects "things" and "features", such as characters and faces, that appear in images and videos.
・This time, we will learn about CNN (Convolutional Neural Network), a deep learning method widely used in image recognition.

About CNN

・A CNN is a neural network that extracts features using a layer called the "convolution layer", whose structure resembles the visual cortex of the human brain. It performs particularly well in image recognition.
・The convolution layer of a CNN can process image data while keeping it two-dimensional, so it is excellent at extracting 2D features such as lines and corners.
・In addition to the convolution layer, a CNN has a "pooling layer". The pooling layer reduces the information obtained from the convolution layer. After several convolution and pooling layers, fully connected layers finally classify the image.

About the convolution layer

・The convolution layer focuses on one part of the input data at a time and examines the image features in that part (this operation is called convolution).
・Which features to extract is learned automatically from the training data by minimizing the value of the loss function.
・Each feature (for a human face, the nose, the mouth, and so on) is represented inside the program as a weight matrix called a filter (kernel), and one filter is used per feature.
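To make the "slide a filter over part of the image" idea concrete, here is a minimal pure-Python sketch of what a single filter in a convolution layer computes. It assumes one grayscale channel, stride 1, no padding and no bias. convolve2d is a hypothetical helper written for illustration (not a Keras function), and the edge filter is hand-made, whereas a real CNN learns its filter weights.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and, at each
    position, sum the element-wise products. Like deep learning libraries,
    this does not flip the kernel (strictly speaking, cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-made vertical-edge filter (a trained CNN would learn such weights).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The resulting feature map responds only in the middle column, where the dark-to-bright edge is, which is what "extracting a feature" means for this filter.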

About the pooling layer

・The pooling layer reduces (compresses) the output of the convolution layer to cut down the amount of data. The reason is that in image data the same features tend to be clustered in one place, while regions without features are widely spread out, so keeping the convolution output as it is would leave a lot of redundant data.
・Common reduction (compression) methods are taking the maximum value of each region (max pooling) or taking its average (average pooling).
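A small pure-Python sketch of the two reduction methods, assuming a 2x2 pooling window moved with stride 2 (leftover rows or columns that do not fill a whole window are dropped). pool2d is a hypothetical helper for illustration, not the Keras API.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: the window size is also used as the stride."""
    output = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))                 # max pooling
            else:
                row.append(sum(window) / len(window))   # average pooling
        output.append(row)
    return output

feature_map = [
    [1, 3, 0, 0],
    [2, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
]
print(pool2d(feature_map, mode="max"))      # [[4, 1], [0, 8]]
print(pool2d(feature_map, mode="average"))  # [[2.5, 0.25], [0.0, 6.5]]
```

Either way, the 4x4 map shrinks to 2x2: a quarter of the data, while the strongest responses (max pooling) or the overall level of each region (average pooling) are kept.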

CNN implementation

・We implement the CNN with Keras + TensorFlow, a combination widely used in practice.
・The procedure is the same as in the earlier "supervised learning" posts: first create a Sequential instance, then add layers one at a time with the add method, and finally compile with the compile method.

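from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Create an instance
model = Sequential()
# Add a convolution layer (input assumed to be 28x28 grayscale images, as in MNIST)
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation="relu", input_shape=(28, 28, 1)))
# Add a pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the 2D feature maps into a vector for the fully connected layers
model.add(Flatten())
# Add fully connected layers (parameters will be described later)
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))  # 10 classes, e.g. the digits 0-9
# Compile
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])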

Classification of handwritten digits

・Using a CNN, we classify MNIST, a dataset of handwritten digits. The details are omitted here, but the procedure described in the previous section can be used as is.
・Besides handwritten digits, we also classified "CIFAR-10", a dataset of images of vehicles and animals.

Hyperparameters

Overview of CNN hyperparameters

・Hyperparameters of the convolution layer (Conv)
 - filters: the number of features to extract
 - kernel_size: the size of the kernel
 - strides: the distance the kernel moves at each step
 - padding: the width of the margin added around the outside of the image
・Hyperparameters of the pooling layer (Pool)
 - pool_size: the range pooled at one time
 - strides: the distance the pooling range moves (the pooling interval)
 - padding: the width of the margin added around the outside of the image
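kernel_size (or pool_size), strides and padding together decide how large the output of a layer is. Along one axis, the standard relation is out = floor((n + 2p - k) / s) + 1, where n is the input length, k the kernel or pool size, s the stride and p the padding added on each side. The sketch below assumes the same padding on both sides; output_size is a hypothetical helper, not a Keras function.

```python
def output_size(n, kernel, stride=1, pad=0):
    """Output length along one axis for input length n, kernel (or pool) size,
    stride, and padding width added on each side."""
    return (n + 2 * pad - kernel) // stride + 1

print(output_size(28, 3))            # 26: 3x3 kernel, stride 1, no padding
print(output_size(28, 3, pad=1))     # 28: one pixel of padding keeps the size
print(output_size(28, 3, stride=2))  # 13: a larger stride shrinks the output
print(output_size(26, 2, stride=2))  # 13: 2x2 pooling with stride 2
```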

Hyperparameters of the convolution layer (Conv)

filters
・filters specifies the number of feature maps to generate, that is, the number of kinds of features to extract.
・If filters is too small, features cannot be extracted sufficiently; if it is too large, the model gains many parameters and becomes prone to overfitting.
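One way to see why a large filters value invites overfitting is to count a convolution layer's trainable parameters: each filter holds kernel_height x kernel_width x input_channels weights plus one bias. This sketch assumes one bias per filter (the Keras Conv2D default, use_bias=True); conv2d_param_count is a hypothetical helper for illustration.

```python
def conv2d_param_count(filters, kernel_size, in_channels):
    """Weights plus biases of a 2D convolution layer."""
    kh, kw = kernel_size
    # Each filter: kh * kw * in_channels weights, plus one bias.
    return filters * (kh * kw * in_channels + 1)

for filters in (16, 64, 256):
    print(filters, conv2d_param_count(filters, (3, 3), in_channels=1))
# 16 160
# 64 640
# 256 2560
```

The count grows linearly with filters, and a following layer that receives 64 channels needs 64 * (3 * 3 * 64 + 1) = 36928 parameters, so extra filters quickly add capacity the training data may not justify.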

kernel_size
・kernel_size specifies the size of the kernel.
・The kernel is the weight matrix used in convolution; it acts like a lens that detects features.
・If kernel_size is too small, features larger than the kernel do not fit in the lens and cannot be detected well. Conversely, if it is too large, features that should be detected as separate small features are extracted together as one large feature.
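The "too large a lens" effect can be shown with a toy example. The sketch below stands in for a learned filter with an all-ones box kernel (an assumption made purely for illustration) and reports the strongest response over all window positions; max_box_response is a hypothetical helper.

```python
def max_box_response(image, k):
    """Largest sum of pixels inside any k x k window (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    best = 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            s = sum(image[i + di][j + dj] for di in range(k) for dj in range(k))
            best = max(best, s)
    return best

# A 6x6 image with two small, separate features (single bright pixels).
image = [[0] * 6 for _ in range(6)]
image[2][1] = 1
image[2][4] = 1

print(max_box_response(image, 2))  # 1: no 2x2 window covers both features
print(max_box_response(image, 4))  # 2: one 4x4 window covers both at once
```

With the 2x2 kernel each bright pixel produces its own response, while the 4x4 kernel has a position that sees both, so the two small features get merged into one larger response.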

strides
・strides specifies the interval at which features are extracted, that is, the distance the kernel moves at each step.
・The smaller the strides value, the shorter the interval, so features can be extracted in finer detail. On the other hand, neighboring windows overlap more, so the same region is examined repeatedly.
・Even so, smaller strides are generally better. The default is the smallest value, (1, 1).
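To see how strides controls overlap, this sketch lists where a 3-wide kernel starts along one axis of length 6, assuming no padding. window_starts is a hypothetical helper for illustration.

```python
def window_starts(n, kernel, stride):
    """Start positions of the kernel along one axis (no padding)."""
    return list(range(0, n - kernel + 1, stride))

for stride in (1, 2, 3):
    print(stride, window_starts(6, 3, stride))
# 1 [0, 1, 2, 3]  -> 4 positions, neighboring windows share 2 columns
# 2 [0, 2]        -> 2 positions, windows share 1 column
# 3 [0, 3]        -> 2 positions, no overlap
```

Note that with stride 2 the windows cover columns 0-2 and 2-4, so the last column (index 5) is never examined at all.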

padding
・padding specifies the width of the margin added around the outside of the image. The margin prevents the image from shrinking unintentionally during convolution. It has other merits as well: features at the edges of the image are taken into account, the edge data is used (and therefore updated) more often, and the number of input/output units of each layer can be adjusted.
・Generally, the added pixels are set to 0. This is called "zero padding".
・The margin width can be given explicitly, for example padding=(1, 1) in Keras's ZeroPadding2D layer. In the Conv2D layer, padding is instead set to "valid" or "same": "valid" adds no padding, while "same" adds a margin so that the output size matches the input size (when strides is 1).
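A pure-Python sketch of zero padding, together with the margin width that keeps the output the same size as the input for an odd kernel size and stride 1. zero_pad and same_padding are hypothetical helpers for illustration; in Keras, padding="same" takes care of this automatically.

```python
def zero_pad(image, p):
    """Surround a 2D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded

def same_padding(kernel):
    """Margin that keeps output size == input size (odd kernel, stride 1)."""
    return (kernel - 1) // 2

image = [[1, 2], [3, 4]]
for row in zero_pad(image, same_padding(3)):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

A 3x3 kernel slid over this 4x4 padded input fits in 4 - 3 + 1 = 2 positions per axis, so the output is 2x2, the same size as the original image, and the corner pixels are now covered by several windows instead of one.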

Hyperparameters of the pooling layer (Pool)

pool_size
・pool_size is the range pooled at one time. Like the kernel, it acts like a lens.
・A larger pool_size makes the output more robust to position: the output does not change even if a feature shifts slightly. In practice, however, (2, 2) is generally sufficient.
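The robustness to small shifts can be checked directly: in the sketch below, the same single-pixel feature is moved by one pixel inside a 2x2 pooling window, and max pooling gives identical output. max_pool_2x2 is a hypothetical helper that assumes even height and width and stride 2.

```python
def max_pool_2x2(fm):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, len(fm[0]) - 1, 2)]
            for i in range(0, len(fm) - 1, 2)]

original = [[1, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 1, 0, 0],  # the same feature, moved one pixel diagonally
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
print(max_pool_2x2(original))  # [[1, 0], [0, 0]]
print(max_pool_2x2(shifted))   # [[1, 0], [0, 0]]
```

If the feature moved into a neighboring window instead (for example to row 0, column 2), the output would change; a larger pool_size tolerates larger shifts, at the cost of discarding more detail.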

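strides
・strides is the distance the pooling range moves at each step. It behaves the same way as the strides of the convolution kernel. In Keras's MaxPooling2D, if strides is not specified it defaults to pool_size, so the pooling windows do not overlap.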

padding
・padding works the same way as in the convolution layer and is set the same way, so refer to that section.

Summary

・In image recognition, deep learning is performed with a CNN (convolutional neural network).
・A CNN has a "convolution layer (Conv)", which examines parts of the image to find features, and a "pooling layer (Pool)", which reduces the extracted information to cut redundant data and improve accuracy. It also uses the "Dense" layer learned in "Supervised Learning", which performs the final classification.
・CNNs also have hyperparameters. The convolution layer has filters (the number of features to extract), kernel_size (the size of the kernel, the lens that extracts features), strides (the distance the kernel moves), and padding (the width of the margin added around the image).
・The pooling layer (Pool) has pool_size (the range pooled at one time), strides (the distance the pooling range moves), and padding (the width of the margin added around the image).

That's all for this time. Thank you for reading to the end.
