Introduction to Deep Learning for the first time (Chainer) Japanese character recognition Chapter 3 [Character recognition using a model]

Hello, this is Licht. Continuing from the previous chapters, Chapter 3 of this Deep Learning tutorial covers character recognition using a trained model.


We will use model16, the model produced by training in Chapter 2. First, prepare an image of a hiragana character. Ideally it should be in a font that was not in the training data, but such fonts are hard to find, so a textbook-style typeface will do.

If you have trouble finding one, download and use the image below (I wrote あ in OneNote and cropped it as an image): a.png


Place the image of あ (a.png) in the same directory as model16 and lets_recognize.py.

Enter the following command in the terminal:

python lets_recognize.py --img a.png --model model16

The output then looks like this:

Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:10, Unicode:304b,Hiragana:か
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:78, Unicode:308f,Hiragana:わ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**

The prediction results are in. か (ka) and わ (wa) appear among the candidates, but the final judgment is あ (a), so recognition succeeded. Each candidate is the recognition result for one of the 14 augmented images generated from the input image. The final judgment is based on the average of those recognition results.

This is the same idea as TTA (Test Time Augmentation): instead of judging from a single image, recognition accuracy is improved by judging from several variations of it. Let's try recognizing some other images as well.

Recognition of a2.png:

python lets_recognize.py --img a2.png --model model16


**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**

Recognition of a3.png:

python lets_recognize.py --img a3.png --model model16


**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**

Both are recognized correctly.

Handwritten hiragana character recognition

Since those were recognized correctly, let's try something a little harder: recognizing a handwritten あ. a_tegaki.png

To be honest, this is asking too much of the model, but let's try it anyway.

python lets_recognize.py --img a_tegaki.png --model model16

Running this starts recognition:

Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
**Final judgment Neuron number:32, Unicode:3061,Hiragana:ち**

It failed! (lol) あ does appear a few times among the candidates, but the final judgment was ち (chi), so no good. Well, model16 has a loss of 0.526 and was trained on printed characters, so this is about what you would expect. However, if you try a model whose loss has been reduced to 0.237 by applying some of the improvements from the following chapters:

python lets_recognize.py --img a_tegaki.png --model loss237model
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:10, Unicode:304b,Hiragana:か
**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**

It is recognized correctly (if only barely)! We have basically used only printed characters as training data, yet the model can recognize handwriting to some extent. And since most of the individual candidates are wrong while the final judgment is right, you can see how useful TTA is.

By the way, there are 3 あ and 4 ふ (fu) among the candidates, so why is the final judgment あ? If you wondered about that, well spotted! The final judgment is not a simple majority vote; it takes into account how confident each prediction was. In other words, all four ふ predictions were made with little confidence, whereas the three あ predictions were confident, so the final judgment was あ.
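
To see how this differs from a majority vote, here is a minimal Python 3 sketch. The scores are made-up numbers for just two classes, and the function names are hypothetical; it only imitates the way the script adds up the outputs of forward over the augmented images.

```python
from collections import Counter

# Invented per-image scores for illustration (not real model output).
# Three images favor "あ" strongly; four favor "ふ" only slightly.
variant_scores = [
    {"あ": 0.95, "ふ": 0.05},
    {"あ": 0.90, "ふ": 0.10},
    {"あ": 0.92, "ふ": 0.08},
    {"あ": 0.45, "ふ": 0.55},
    {"あ": 0.48, "ふ": 0.52},
    {"あ": 0.46, "ふ": 0.54},
    {"あ": 0.47, "ふ": 0.53},
]


def top_label(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def majority_vote(all_scores):
    """Pick the label that wins the most individual images."""
    votes = Counter(top_label(scores) for scores in all_scores)
    return votes.most_common(1)[0][0]


def summed_score_decision(all_scores):
    """Add up the scores over all images, then pick the largest total."""
    totals = {}
    for scores in all_scores:
        for label, value in scores.items():
            totals[label] = totals.get(label, 0.0) + value
    return top_label(totals)
```

With these numbers, majority_vote(variant_scores) picks ふ (4 votes to 3), while summed_score_decision(variant_scores) picks あ, because the three confident predictions outweigh the four hesitant ones.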

Source code overview

Finally, here is an overview of the source code. Most of it is the same as the code used for training.

def forward(x_data, train=False):
    x = chainer.Variable(x_data, volatile=not train)
    h = F.max_pooling_2d(F.relu(model.bn1(model.conv1(x))), 2)
    h = F.max_pooling_2d(F.relu(model.bn2(model.conv2(h))), 2)
    h = F.max_pooling_2d(F.relu(model.conv3(h)), 2)
    h = F.dropout(F.relu(model.fl4(h)), train=train)
    y = model.fl5(h)
    # Assumed: the excerpt omits the return; callers need the raw scores as an array.
    return y.data

This defines the structure of the deep learning neural network. It must be identical to the forward function used when the model was trained.

src = cv2.imread(args.img, 0)
src = cv2.copyMakeBorder(
    src, 20, 20, 20, 20, cv2.BORDER_CONSTANT, value=255)
src = cv2.resize(src, (IMGSIZE, IMGSIZE))

This reads the input image as grayscale, adds a white margin of 20 pixels on every side, and resizes the result to 64 x 64. With the current model, recognition does not work well unless the margin is about the right size.
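
A quick bit of arithmetic shows why the margin matters. The sketch below (Python 3, hypothetical helper name, assuming a square input image) computes how many of the 64 pixels the original content still spans after the 20-pixel margin is added on both sides and the image is resized.

```python
def glyph_extent(src_size, margin=20, imgsize=64):
    """Pixels per side that the original image content spans after
    padding by `margin` on each side and resizing to `imgsize`."""
    padded = src_size + 2 * margin  # size after copyMakeBorder
    return src_size * imgsize / padded  # content size after resize
```

For a 40-pixel-wide crop, glyph_extent(40) gives 32.0, so the character region shrinks to half the frame. For a 400-pixel-wide crop it still fills about 58 pixels, so the same fixed margin has a very different effect depending on the size of the source image.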

for x in xrange(0, 14):
    dst = dargs.argumentation([2, 3])
    ret, dst = cv2.threshold(dst,
    #For image confirmation
    #cv2.imshow('ARGUMENTATED', dst)

    xtest = np.array(dst).astype(np.float32).reshape(
        (1, 1, IMGSIZE, IMGSIZE)) / 255
    if result is None:
        result = forward(xtest)
    else:
        # Accumulate the scores over all augmented images
        result = result + forward(xtest)

After augmenting the input image and binarizing it, the pixel values are normalized to the range 0 to 1 and passed to the forward function for prediction. The outputs for all the augmented images are accumulated in result.
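
The binarize-and-normalize step can be sketched in plain Python 3 as follows. This is only an illustration on nested lists: the real code uses cv2.threshold and NumPy, and the threshold of 128, the binary thresholding rule, and the function names here are assumptions, not values taken from the original script.

```python
def binarize(pixels, threshold=128):
    """Map each 0-255 gray value to 255 if above the threshold, else 0.
    The threshold value is a hypothetical choice for this sketch."""
    return [[255 if p > threshold else 0 for p in row] for row in pixels]


def normalize(pixels):
    """Scale 0-255 values to floats in the range 0.0 to 1.0."""
    return [[p / 255 for p in row] for row in pixels]


def to_batch(pixels):
    """Wrap a 2-D image as (batch=1, channel=1, height, width)."""
    return [[pixels]]


# A tiny 2 x 2 "image" standing in for the 64 x 64 input.
tiny = [[0, 100], [200, 255]]
batch = to_batch(normalize(binarize(tiny)))
```

Here batch has the same nesting as the (1, 1, IMGSIZE, IMGSIZE) array in the loop above, with every value equal to either 0.0 or 1.0.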

tmp = np.argmax(forward(xtest))
for strunicode, number in unicode2number.iteritems():
    if number == tmp:
        hiragana = unichr(int(strunicode, 16))
        print 'Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}'.format(number, strunicode, hiragana.encode('utf_8'))
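
For reference, a Python 3 sketch of the same lookup might look like this. The three table entries are the pairs that appear in the output earlier in this chapter; the real unicode2number table covers every class, and number2unicode and describe are hypothetical names. Inverting the table once avoids scanning it for every prediction.

```python
# Excerpt only: the pairs below are the ones visible in this chapter's output.
unicode2number = {"3042": 1, "304b": 10, "308f": 78}

# Invert once so each lookup is a dictionary access instead of a scan.
number2unicode = {number: code for code, number in unicode2number.items()}


def describe(number):
    """Format one candidate line for the given neuron number."""
    code = number2unicode[number]
    hiragana = chr(int(code, 16))  # hex code point string -> character
    return "Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}".format(
        number, code, hiragana)
```

Calling describe(1) returns the same text as the first candidate line shown for a.png.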

The recognition result of the neural network (a number from 0 to 82) is converted to a Unicode code point and output as a hiragana character. Chapter 3 ends here. In Chapter 4, we will expand each image into 3500 augmented images and see how the accuracy changes.

Chapters in this series
Chapter 1 Building a Deep Learning environment based on Chainer
Chapter 2 Creating a Deep Learning Predictive Model by Machine Learning
Chapter 3 Character recognition using a model
Chapter 4 Improvement of recognition accuracy by expanding data
Chapter 5 Introduction to neural networks and explanation of source code
Chapter 6 Improvement of learning efficiency by selecting Optimizer
Chapter 7 TTA, improvement of learning efficiency by Batch Normalization
