I tried to build handwritten kana recognition, Part 2/3: Data creation and training

Overview

Last time (1/3): https://qiita.com/tfull_tf/items/6015bee4af7d48176736

Next time (3/3): https://qiita.com/tfull_tf/items/d9fe3ab6c1e47d1b2e1e

Code: https://github.com/tfull/character_recognition

To build the kana recognition system, I first built a model with a CNN and checked how accurate it was on MNIST. Next, I prepare image data for kana, build a model in the same way, and improve it.

Data creation

I couldn't think of any existing image data for kana, so I generate it automatically. (It seems that some datasets are in fact publicly available.)

I use ImageMagick for the automatic generation. The convert command can draw text onto an image, so I first create a black image and then write a single white character on it.

Ways to multiply the data

To get many images for each character, I prepared several ways of multiplying the data.

1: Use multiple fonts

Writing in different fonts gives images of the same character in different typefaces.

The following command lists the available fonts, so pick the ones that look usable.

convert -list font

Note that not all of the fonts support Japanese, so trying to draw kana with some of them may produce nothing.
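As a rough sketch of how the candidates could be narrowed down, the snippet below pulls font names out of text shaped like the output of convert -list font. The sample text, the helper name list_font_names, and the "CJK" substring filter are my own assumptions for illustration; the real output depends on the ImageMagick version and the installed fonts, and a name containing "CJK" is only a hint that kana may be supported.

```python
def list_font_names(listing):
    """Collect the names from lines that look like 'Font: <name>'."""
    names = []
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("Font:"):
            names.append(line[len("Font:"):].strip())
    return names

# Hypothetical excerpt. In practice the text could come from
# subprocess.run(["convert", "-list", "font"], capture_output=True, text=True).stdout
sample_listing = """
  Font: DejaVu-Sans
    family: DejaVu Sans
  Font: Noto-Sans-CJK-JP-Thin
    family: Noto Sans CJK JP
  Font: Noto-Serif-CJK-JP
    family: Noto Serif CJK JP
"""

# Keep only names that hint at CJK coverage (an assumed heuristic)
cjk_fonts = [name for name in list_font_names(sample_listing) if "CJK" in name]
print(cjk_fonts)
```

Drawing one test character with each remaining font and looking at the result is still the reliable check.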

The macOS 10.15 machine I mainly work on didn't have any fonts that looked good, so I generated the images on Ubuntu. The following fonts were installed from the start, so I decided to use them.

font_list = [
    "Noto-Sans-CJK-JP-Thin",
    "Noto-Sans-CJK-JP-Medium",
    "Noto-Serif-CJK-JP"
]

2: Change the font size

You get different images depending on whether a character fills the whole image or is drawn smaller. This time, I drew each character at gradually increasing sizes, from about half the image size up to just below the full image size.

3: Shift the letters

Drawing a character small leaves blank space above, below, left, and right of it, so the character can be shifted vertically and horizontally. For example, shifting by blank / 2 and blank / 3 in each direction gives 5 positions per axis (two each way plus no shift), so 5 x 5 different images can be generated.
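The 5 x 5 count can be sketched as follows. This is only an illustration: the helper shift_offsets is hypothetical, and I assume that "blank" means the margin on one side of a centered character and that the point size roughly equals the glyph size in pixels, which is not exact for real fonts.

```python
image_size = 256

def shift_offsets(pointsize):
    """Offsets of 1/2 and 1/3 of the blank margin in both directions, plus no shift."""
    # Assumed: margin on one side of a centered character
    blank = (image_size - pointsize) // 2
    return [-(blank // 2), -(blank // 3), 0, blank // 3, blank // 2]

offsets = shift_offsets(128)
# Every combination of a horizontal and a vertical offset
positions = [(dx, dy) for dx in offsets for dy in offsets]
print(offsets)         # [-32, -21, 0, 21, 32]
print(len(positions))  # 25
```

When the character nearly fills the image, the margin shrinks toward zero and several offsets coincide, so fewer distinct images result.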

4: Rotate the character

convert can also rotate the characters. Rotating slightly clockwise or counterclockwise increases the number of images.

(Unused) Add blur

Preparing blurred images would increase the number of images, but I wasn't sure that half-blurred images would really be useful, so I didn't do it. Methods 1 to 4 already give enough images.

(Unused) Add noise

Adding noise such as small dots would not only increase the number of images but might also make the model more robust to noise. I didn't do it because I couldn't find an easy way to add good-looking noise, but it could be a good future task.

Result

I generate the character images by combining methods 1 to 4 (the counts multiply). At 256 px in height and width, this gave more than 4000 images per character. The number of images can be changed by adjusting the parameters used by each method. Hiragana (0x3041 ~ 0x3093) and katakana (0x30A1 ~ 0x30F6) together make 169 characters, so the data takes up quite a lot of space.
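The count of 169 follows directly from the two code point ranges. A minimal check, where the variable names are my own:

```python
# Both ranges are inclusive, hence the + 1
hiragana = [chr(code) for code in range(0x3041, 0x3093 + 1)]
katakana = [chr(code) for code in range(0x30A1, 0x30F6 + 1)]
characters = hiragana + katakana

print(len(hiragana), len(katakana), len(characters))  # 83 86 169
```

The position of a character in this list can serve as its class number (0 to 168).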

Code


import subprocess

data_directory = "/path/to/data"
image_size = 256

# Create the black background image
def make_template():
    res = subprocess.call([
        "convert",
        "-size", "{s}x{s}".format(s = image_size),
        "xc:black",
        "{}/tmp.png".format(data_directory)
    ])

# Draw one white character on the black image and save it to path
def generate(path, font, pointsize, character, rotation, dx, dy):
    res = subprocess.call([
        "convert",
        "-gravity", "Center",
        "-font", font,
        "-pointsize", str(pointsize),
        "-fill", "White",
        "-annotate", format_t(rotation, dx, dy), character,
        "{}/tmp.png".format(data_directory), path
    ])

# Build the -annotate argument: rotation angle followed by signed x/y offsets
def format_t(rotation, x, y):
    xstr = "+" + str(x) if x >= 0 else str(x)
    ystr = "+" + str(y) if y >= 0 else str(y)
    return "{r}x{r}{x}{y}".format(r = rotation, x = xstr, y = ystr)

The black image is created only once at the start. The white-character images are then created in a loop that varies font, pointsize, character, rotation, dx, and dy.
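The loop itself might look roughly like the sketch below. It only builds the parameter tuples and does not call convert. The grids for point size, rotation, and offset are hypothetical values chosen for illustration (the values actually used are not listed here), and parameter_sets is a made-up helper name.

```python
import itertools

image_size = 256
font_list = ["Noto-Sans-CJK-JP-Thin", "Noto-Sans-CJK-JP-Medium", "Noto-Serif-CJK-JP"]

# Hypothetical parameter grids, for illustration only
pointsizes = range(image_size // 2, image_size, 32)  # 128, 160, 192, 224
rotations = [-10, 0, 10]                             # degrees
offsets = [-16, 0, 16]                               # pixels, used for both dx and dy

def parameter_sets(character):
    """Yield one (font, pointsize, character, rotation, dx, dy) tuple per image."""
    for font, pointsize, rotation, dx, dy in itertools.product(
            font_list, pointsizes, rotations, offsets, offsets):
        yield font, pointsize, character, rotation, dx, dy

jobs = list(parameter_sets("あ"))
print(len(jobs))  # 3 fonts * 4 sizes * 3 rotations * 3 * 3 offsets = 324
print(jobs[0])
```

Each tuple lines up with the arguments of generate after path, so a real run could call generate(path, *job) with a distinct output path for every job.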

Model building

Now that the images are ready, I build the model in the same way as for MNIST, but at first it didn't work. The cross-entropy error was the same for every batch, and when I inspected the values inside the layers during training for debugging, they included absolute values in the hundreds or thousands, and the output was always the same. Inserting Batch Normalization fixed this and greatly improved the accuracy.

import torch.nn as nn

class Model(nn.Module):
    def __init__(self, image_size, output):
        super(Model, self).__init__()
        n = ((image_size - 4) // 2 - 4) // 2

        self.conv1 = nn.Conv2d(1, 4, 5)
        self.relu1 = nn.ReLU()
        self.normal1 = nn.BatchNorm2d(4)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout2d(0.3)
        self.conv2 = nn.Conv2d(4, 16, 5)
        self.relu2 = nn.ReLU()
        self.normal2 = nn.BatchNorm2d(16)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.dropout2 = nn.Dropout2d(0.3)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(n * n * 16, 1024)
        self.relu3 = nn.ReLU()
        self.normal3 = nn.BatchNorm1d(1024)
        self.dropout3 = nn.Dropout(0.3)
        self.linear2 = nn.Linear(1024, 256)
        self.relu4 = nn.ReLU()
        self.normal4 = nn.BatchNorm1d(256)
        self.dropout4 = nn.Dropout(0.3)
        self.linear3 = nn.Linear(256, output)
        self.softmax = nn.Softmax(dim = 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.normal1(x)
        x = self.pool1(x)
        x = self.dropout1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.normal2(x)
        x = self.pool2(x)
        x = self.dropout2(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.relu3(x)
        x = self.normal3(x)
        x = self.dropout3(x)
        x = self.linear2(x)
        x = self.relu4(x)
        x = self.normal4(x)
        x = self.dropout4(x)
        x = self.linear3(x)
        x = self.softmax(x)
        return x
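The expression for n in the constructor follows the spatial size through the two convolution and pooling stages: a 5 x 5 convolution without padding removes 4 pixels, and a 2 x 2 max pool halves the size. A small sketch of that arithmetic, where flattened_features is a helper name I made up:

```python
def flattened_features(image_size, kernel = 5, pool = 2, channels = 16):
    """Return (n, inputs to linear1) after two conv + max-pool stages."""
    n = image_size
    for _ in range(2):
        # Convolution without padding shrinks by kernel - 1, pooling divides by pool
        n = (n - (kernel - 1)) // pool
    return n, n * n * channels

print(flattened_features(256))  # (61, 59536)
print(flattened_features(28))   # (4, 256)
```

For the 256 px kana images this gives n = 61, so linear1 receives 59536 inputs, compared with 256 for a 28 px MNIST image.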

Training

Training basically follows the same procedure as for MNIST. I used Cross Entropy Loss and Adam (learning rate = 0.001).
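One thing that may be worth double-checking, assuming "Cross Entropy Loss" means PyTorch's nn.CrossEntropyLoss: that loss applies log-softmax to its input itself, so it expects raw scores rather than probabilities. Model already ends with nn.Softmax, so under that assumption softmax would be applied twice. The plain-Python sketch below is only an illustration of that arithmetic, not the training code, and all names in it are my own.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_scores(scores, target):
    """Cross entropy that applies softmax internally: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])

classes = 169
perfect = [1.0] + [0.0] * (classes - 1)  # ideal probability output for class 0
uniform = [1.0 / classes] * classes      # completely undecided output

# Feeding probabilities (already softmaxed) into a loss that softmaxes again
best = cross_entropy_from_scores(perfect, 0)
uniform_loss = cross_entropy_from_scores(uniform, 0)
print(round(best, 2), round(uniform_loss, 2))  # 4.14 5.13
```

Under this assumption even a perfect prediction cannot push the loss below about 4.14, which may slow training. Returning the output of linear3 directly and applying softmax only at prediction time would be the usual way to avoid this, though its effect on this model is untested.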

Points to note when training

Because the images were generated by looping over the parameters, training on them in generation order would probably bias the data, so I shuffle the image order. On the other hand, I want every character to be learned evenly, so the characters themselves are visited in order.

If you train while reading images one at a time in a loop, each batch holds only one image. However, there is so much image data that reading it all at once may run out of memory. To avoid both, I decided to use yield to read the data in chunks.

# Yield the pairs from the double loop over a1 and a2, chunk pairs at a time
def double_range(a1, a2, chunk = 100):
    records = []

    for x1 in a1:
        for x2 in a2:
            records.append((x1, x2))
            if len(records) >= chunk:
                yield records
                records = []

    if len(records) > 0:
        yield records

This function takes two arrays and returns the pairs from their double loop in lists of chunk pairs. It is then used in a for loop.

Pseudo code


# The strings below stand in for real data
for indices in double_range("shuffled image numbers 1 to N", "numbers assigned to the characters (0 to 168)"):
    inputs = []
    # double_range yields (image number, character number) pairs
    for i_image, i_character in indices:
        inputs.append("load image number i_image of character number i_character")

    model.train(inputs) # train on this chunk

This way, each iteration loads and trains on only one batch of images, which keeps memory usage down.
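A small runnable version of this loop, with tiny hypothetical sizes instead of the real image and character counts, shows how the chunks come out. The function double_range is repeated so the example is self-contained. Loading and training are replaced by a print.

```python
import random

def double_range(a1, a2, chunk = 100):
    records = []
    for x1 in a1:
        for x2 in a2:
            records.append((x1, x2))
            if len(records) >= chunk:
                yield records
                records = []
    if len(records) > 0:
        yield records

n_images = 6      # hypothetical, stands in for the images per character
n_characters = 4  # hypothetical, stands in for 169

image_numbers = list(range(n_images))
random.shuffle(image_numbers)  # shuffle the image order, keep the characters in order

batches = list(double_range(image_numbers, range(n_characters), chunk = 5))
for batch in batches:
    # A real loop would load each (image number, character number) pair and train here
    print(batch)
```

With 6 x 4 = 24 pairs and chunk = 5, this gives four batches of 5 pairs and a final batch of 4, and every character appears once for each image number.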

Model performance

I started the experiment after creating image data of 4236 [images / character] ✕ 169 [characters]. Using 5% of the total as test data, I trained for 2 epochs and measured the accuracy on the test data. It was about 71.4%. At first, because of a mistake in the program, the model was choosing among 4236 classes instead of 169, and mysteriously the accuracy at that time was about 80%. I would like to improve the performance a little more, but it looks good enough to build a recognition system and run it for the time being.
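For a sense of scale, the dataset size implied by these numbers can be worked out as below. Rounding the 5% down and counting 1 byte per pixel for uncompressed grayscale images are my assumptions, so the figures are rough.

```python
images_per_character = 4236
characters = 169
image_size = 256

total_images = images_per_character * characters
test_images = total_images * 5 // 100   # 5% held out, rounded down (assumed)
train_images = total_images - test_images
# Assumed: 1 byte per pixel, uncompressed
raw_bytes = total_images * image_size * image_size

print(total_images, train_images, test_images)  # 715884 680090 35794
print(round(raw_bytes / 1e9, 1), "GB")          # 46.9 GB
```

Held in memory as raw pixels, the full set would need tens of gigabytes, which is consistent with reading the data in chunks rather than all at once.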
