Hello, this is Licht. Continuing from the previous chapter, Deep Learning Tutorial Chapter 2 describes how to build a Deep Learning prediction model by machine learning. How neural networks actually perform the learning is explained in a later chapter; here we focus on practical usage.
First, download the source code from the GitHub repository and place hiraganaNN.py and the other files in the same directory as the image dataset. In Chapter 2 we use hiraganaNN.py, dataArgs.py, and hiragana_unicode.csv.
Move to the HIRAGANA_NN directory in the terminal (command prompt) and start the script with:
python hiraganaNN.py
This starts the Deep Learning training. It takes a while to learn, so let me explain a few things while we wait. The HIRAGANA_NN directory contains images (110 x 110 pixels) organized into one subdirectory per hiragana. For example, the 305e directory holds images of the hiragana "zo" in various fonts and handwriting styles.
In general terms, the purpose of machine learning here is to learn from many different "zo" images that "all of these are zo" (training), so that when the model is shown a new image it can say "this is zo, isn't it?" (identification / prediction / recognition).
That sounds easy, but machines are simple-minded and make mistakes humans would never expect. For example, after learning from the "zo" images above, the model may flatly declare that the image below is not "zo" (because it is slightly tilted). Mistakes like this, caused by excessive learning so that nothing except the training "zo" images is recognized as "zo", are called "overfitting" (a loss of generality). In addition, wasting effort on aspects that have nothing to do with recognizing the character, such as its color or ink density, degrades learning efficiency and is another cause of poor performance.
To avoid performance problems such as overfitting and poor learning efficiency, we apply "preprocessing". There are many kinds of preprocessing. Examples include data augmentation (rotation, translation, elastic distortion, noise, etc.), which teaches the model to handle whatever "zo" comes its way, and data normalization (grayscale conversion, whitening, batch normalization, etc.), which simplifies the problem and improves learning efficiency.
Starting from the first lines of hiraganaNN.py:
unicode2number = {}
import csv
train_f = open('./hiragana_unicode.csv', 'rb')
train_reader = csv.reader(train_f)
train_row = train_reader
hiragana_unicode_list = []
counter = 0
for row in train_reader:
    for e in row:
        unicode2number[e] = counter
        counter = counter + 1
Here each hiragana is assigned a number: the character whose Unicode code point is 304a (o) becomes number 0, 304b (ka) becomes number 1, and so on.
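For instance, assuming 304a appears first and 304b second in hiragana_unicode.csv, the resulting dictionary can be used like this (a minimal illustration, not part of hiraganaNN.py):

```python
# Illustrative lookups; the actual values depend on the order of the CSV rows
print(unicode2number['304a'])  # -> 0, the class label for "o"
print(unicode2number['304b'])  # -> 1, the class label for "ka"
```

Next, the image data is read in: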
files = os.listdir('./')
for file in files:
    if len(file) == 4:
        # Hiragana directory
        _unicode = file
        imgs = os.listdir('./' + _unicode + '/')
        counter = 0
        for img in imgs:
            if img.find('.png') > -1:
                if len(imgs) - counter != 1:
                    ...
Here the images are read in as input (training) data; only the last image in each directory is loaded for testing instead. When an image is read, it is stored as follows:
x_train.append(src)
y_train.append(unicode2number[_unicode])
Image data is stored in x_train (or x_test), and the correct label (0-83) is stored in y_train (or y_test). The following part augments each single input image:
for x in xrange(1, 10):
    dst = dargs.argumentation([2, 3])
    ret, dst = cv2.threshold(
        dst, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    x_train.append(dst)
    y_train.append(unicode2number[_unicode])
Here each image is randomly rotated and translated to expand it into 10 images in total. The call dargs.argumentation([2, 3]) is admittedly hard to read, but the numbers mean, in order, 2: rotation (rotation with three-dimensional depth) and 3: translation, so a rotation is applied first and then a translation. This expansion is applied to the images of each directory.
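To make the idea concrete, here is a minimal sketch of what a rotation-plus-translation augmentation step could look like with OpenCV. This is not the actual dataArgs.py implementation; the function name, the angle and shift ranges, and the white border value are assumptions made for the example:

```python
import cv2
import numpy as np

# Illustrative rotation-then-translation augmentation (not the real dataArgs.py)
def augment(src, max_angle=15, max_shift=8):
    h, w = src.shape[:2]
    # Random rotation around the image center
    angle = np.random.uniform(-max_angle, max_angle)
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    dst = cv2.warpAffine(src, rot, (w, h), borderValue=255)
    # Random translation (shift) of up to max_shift pixels
    dx = np.random.uniform(-max_shift, max_shift)
    dy = np.random.uniform(-max_shift, max_shift)
    shift = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(dst, shift, (w, h), borderValue=255)
```

Applying such a function nine times to each source image and keeping the original would give the tenfold expansion described above.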
Next, the expanded data is converted to NumPy arrays and normalized:
x_train = np.array(x_train).astype(np.float32).reshape(
    (len(x_train), 1, IMGSIZE, IMGSIZE)) / 255
y_train = np.array(y_train).astype(np.int32)
A grayscale image has pixel values from 0 to 255; dividing by 255 normalizes them to values between 0 and 1. This improves learning efficiency.
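As a quick sanity check (illustrative only; IMGSIZE is defined in hiraganaNN.py, and whether the 110 x 110 source images are resized is not shown here), the normalized arrays should look like this:

```python
print(x_train.shape)                 # (number of images, 1, IMGSIZE, IMGSIZE)
print(x_train.min(), x_train.max())  # pixel values now lie between 0 and 1
print(y_train[:5])                   # integer class labels, e.g. [0 0 0 0 0]
```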
That completes the preparation of the training data (x_train, y_train); the test data (x_test, y_test) is prepared in the same way. The last image in each directory is read for testing and used to verify the accuracy of the trained model. The test images are also expanded, and the reason for that is explained around Chapter 7.
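To make the train/test split concrete, here is a rough sketch of the idea rather than the actual hiraganaNN.py code; the use of sorted() and cv2.IMREAD_GRAYSCALE here are my own assumptions:

```python
import os
import cv2

x_test, y_test = [], []
for _unicode in [d for d in os.listdir('./') if len(d) == 4]:  # hiragana directories
    pngs = sorted(f for f in os.listdir('./' + _unicode + '/') if f.endswith('.png'))
    if not pngs:
        continue
    # Hold out the last image of each directory as a test sample
    src = cv2.imread('./' + _unicode + '/' + pngs[-1], cv2.IMREAD_GRAYSCALE)
    x_test.append(src)
    y_test.append(unicode2number[_unicode])
```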
The code that follows defines the actual structure of the Deep Learning network, so I will leave its explanation to a later chapter.
While we were going through all of that, the training progress should have started appearing in the terminal:
('epoch', 1)
COMPUTING...
train mean loss=3.53368299341, accuracy=0.161205981514
test mean loss=1.92266467359, accuracy=0.506097565337
('epoch', 2)
COMPUTING...
train mean loss=1.66657279936, accuracy=0.518463188454
test mean loss=1.0855880198, accuracy=0.701219529277
.
.(A warning will appear, but you can ignore it for the time being.)
('epoch', 16)
COMPUTING...
train mean loss=0.198548029516, accuracy=0.932177149753
test mean loss=0.526535278777, accuracy=0.844512195849
.
.
('epoch', 23)
COMPUTING...
train mean loss=0.135178960405, accuracy=0.954375654268
test mean loss=0.686121761981, accuracy=0.814024389154
Think of loss as the error between the output predicted by Deep Learning and the correct answer, and accuracy as the percentage of correct answers. In machine learning, the goal is to reduce the loss on the test data.
As training progresses, both the train loss and the test loss go down, but after epoch 16 the train loss keeps falling while the test loss tends to rise, which is a sign of overfitting. Once this happens the model has reached its limit, so training is stopped. For now, the test loss = 0.526 at epoch 16 is the best result this model achieves. (Efforts to improve this accuracy are discussed in a later chapter.)
The learning result of each epoch is saved in the same directory as the source code, so keep the file 'model16', which gave the best result. (You can delete the other model files.)
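For reference, a common pattern for keeping track of the best epoch looks roughly like the sketch below. The run_training_epoch and run_test_epoch helpers are hypothetical, and whether hiraganaNN.py actually uses Chainer's serializers.save_npz or another saving method is an assumption on my part:

```python
from chainer import serializers

best_loss = float('inf')
for epoch in range(1, n_epoch + 1):
    train_loss, train_acc = run_training_epoch(model)  # hypothetical helper
    test_loss, test_acc = run_test_epoch(model)        # hypothetical helper
    print('epoch', epoch, 'test mean loss=', test_loss, 'accuracy=', test_acc)
    if test_loss < best_loss:
        best_loss = test_loss
        # Save the model that performs best on the test data so far
        serializers.save_npz('model%d' % epoch, model)
```

With a loop like this, the file written at epoch 16 would correspond to the best test loss seen in the log above.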
In the next Chapter 3, we will make actual predictions using this model.
Chapter | Title |
---|---|
Chapter 1 | Building a Deep Learning environment based on chainer |
Chapter 2 | Creating a Deep Learning Predictive Model by Machine Learning |
Chapter 3 | Character recognition using a model |
Chapter 4 | Improvement of recognition accuracy by expanding data |
Chapter 5 | Introduction to neural networks and explanation of source code |
Chapter 6 | Improvement of learning efficiency by selecting Optimizer |
Chapter 7 | TTA, Improvement of learning efficiency by Batch Normalization |