This article is the day-15 entry of the Fujitsu Systems Web Technology Advent Calendar. (The usual disclaimer: the content of this article is my own opinion and does not represent the organization I belong to.)
In this article, I summarize **the procedure and results of trying handwritten rune recognition with scikit-learn, Python's machine learning library**. While the previous authors of this Advent Calendar have posted great features and know-how that look genuinely useful in real work, I'm just making something that is simply fun for me, and I can't help it. I hope you can take a quick look.
Also, since the author of this article is a beginner in machine learning, the content may be rough in places. For those who are about to try Python and scikit-learn, I will do my best to provide useful information. Thank you.
I don't touch machine learning at all in my daily work, but I'm interested in it and wanted to learn the basics of how training and prediction work, so I tried scikit-learn. I could have used one of the datasets bundled with the library, but I wanted to know "what can I do with data I prepare myself?", so I decided to start from preparing the data.
**(Digression)** Runes are kind of cool, aren't they?
--Anaconda: A distribution that bundles Python itself with commonly used libraries.
--scikit-learn: An open-source library that makes machine learning, including neural networks, easy to use from Python. This time we will use a model called MLPClassifier, which performs "classification".
--E-cutter: Free software for splitting images. Used to split the sheets of handwritten characters into individual character images.
This time, we will focus on the Common Germanic runes (24 characters, also known as the Elder Futhark).
There seems to be no convenient "rune character dataset for machine learning", so I prepared my own images, using the following method.
(1) Handwrite the characters lined up at regular intervals and save them as one image per character. (It is convenient later if each image file is named after the single character it contains, e.g. "ᚠ.png".)
(2) Split the image into equal parts with E-cutter (free software). The split images are saved into the specified folder as "[original file name]_[sequence number].png", which was very convenient. (A Python sketch of the same split is shown below.)
For a start, this produced 18 images for each type of rune character.
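(Aside) If you'd rather not install separate software, the same equal-part split can be done in a few lines of Python with PIL. This is a minimal sketch, not what I actually used: the file name and the 6x3 grid are assumptions, so adjust them to your own sheet.
from PIL import Image
import os

sheet_path = "ᚠ.png"  # assumed: one sheet per character, named after it
cols, rows = 6, 3      # assumed layout: 6 x 3 = 18 cells

sheet = Image.open(sheet_path)
cell_w, cell_h = sheet.width // cols, sheet.height // rows
base = os.path.splitext(os.path.basename(sheet_path))[0]

for r in range(rows):
    for c in range(cols):
        box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        # Save as "[original file name]_[sequence number].png", like E-cutter
        sheet.crop(box).save(f"{base}_{r * cols + c + 1}.png")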
From here on, we work in Python. First, load the images.
import cv2  # image transformation library (used later for augmentation)
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os, glob

# List to store the image data
X = []
# List to store the character (label) corresponding to each image
Y = []

# Directory containing the training images
dir = "[Image data storage directory]"
files = glob.glob(dir + "\\*.png")

# Width and height to which each image is resized (pixels)
image_size = 50

# Read each file in the directory and add it to the training data
for i, file in enumerate(files):
    image = Image.open(file)
    # Convert to 8-bit grayscale
    image = image.convert("L")
    image = image.resize((image_size, image_size))
    data = np.asarray(image).flatten()
    X.append(data)
    # The label is the part of the file name before the "_"
    moji = file.split("\\")[-1].split("_")[0]
    Y.append(moji)

X = np.array(X)
Y = np.array(Y)
Let's display the loaded image.
# Visualize the first data point
showimage = np.reshape(X[0], (50, 50))  # reshape back into a 50x50 2D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.show()
The image has been read as 50x50 (= 2500-dimensional) data.
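For reference, you can also check the shapes of the arrays directly. Assuming all 24 x 18 images loaded successfully, it should look like this:
print(X.shape)  # (432, 2500): 432 images, each 50*50 = 2500 pixels
print(Y.shape)  # (432,): one label (character) per image
print(Y[0])     # the character for the first image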
The number of images is quite small, but let's try training and classifying with this data (24 characters x 18 images) first!
from sklearn import model_selection
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Split the data into training and test sets
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=0.1, random_state=0)

# Train: 2500 input pixels -> one hidden layer with 200 nodes -> 24 output classes
clf = MLPClassifier(hidden_layer_sizes=(200,))
clf.fit(x_train, y_train)

# Classify the test data
y_pred = clf.predict(x_test)

# View the results
print("---Expected answers---")
print(y_test)
print("---Answers given by the model---")
print(y_pred)
print("---Accuracy---")
print(accuracy_score(y_test, y_pred))
**The accuracy is very low...!! (7.1%) orz**
Most of the test images were classified as "ᚱ", and the other predictions were wrong too... However, the characters classified as something other than "ᚱ" were at least mistaken for runes with similar shapes. I'm a little happy to glimpse a sprout of intelligence there (even though the answers were still wrong).
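If you want to see exactly which rune is being confused with which, scikit-learn's confusion matrix is handy. A quick sketch using the variables from above (rows are true labels, columns are predictions):
from sklearn.metrics import confusion_matrix

labels = sorted(set(Y))
# Large off-diagonal values show which runes get mixed up (here, mostly "ᚱ")
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(labels)
print(cm)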
It seems the training data is simply insufficient, but adding more handwritten character data is tedious. Let's try data augmentation instead: since [number of character types] x 18 images is apparently not enough to train on, we transform the existing data to increase its amount.
The following article was very helpful for data augmentation. https://products.sint.co.jp/aisia/blog/vol1-7#toc-3
Typical methods include "adding noise", "flipping", "shifting", and "deforming". This time, we will apply "deformation" and "rotation".
# Deform the images
# X and Y were converted to NumPy arrays above, so turn them back into
# lists before appending the augmented data
X = list(X)
Y = list(Y)

for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]
    # Invert the bits so the strokes become white on black
    image_array = cv2.bitwise_not(np.array(image))

    ## Deformation ①
    # Define the perspective transform by mapping four source points to
    # four slightly shifted destination points
    pts1 = np.float32([[0, 0], [0, 100], [100, 100], [100, 0]])
    pts2 = np.float32([[0, 0], [0, 98], [102, 102], [100, 0]])
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst1 = cv2.warpPerspective(image_array, M, (50, 50))
    X.append(dst1.flatten())
    Y.append(moji)

    ## Deformation ②
    # Same source points, a different set of destination points
    pts2 = np.float32([[0, 0], [0, 102], [98, 98], [100, 0]])
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst2 = cv2.warpPerspective(image_array, M, (50, 50))
    X.append(dst2.flatten())
    Y.append(moji)

# Display the last processed image
showimage = np.reshape(image_array, (50, 50))
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(dst2), plt.title('Output')
plt.show()
I was able to generate an image that was slightly deformed compared to the original image. Two deformed images are generated for each original image and added to the training data.
To increase the data further, also create images rotated 15 degrees from the original and add them.
# Rotate the images
for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]
    # Invert the bits so the strokes become white on black
    image = cv2.bitwise_not(np.array(image))

    # 1. Rotate 15 degrees clockwise
    angle = -15.0   # rotation angle
    scale = 1.0     # scaling factor
    # Build the rotation matrix (arguments: center, angle, scale)
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    # Apply the affine transformation
    image1 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image1.flatten())
    Y.append(moji)

    # 2. Rotate 15 degrees counterclockwise
    angle = 15.0
    scale = 1.0
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    image2 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image2.flatten())
    Y.append(moji)

# Display the last processed image and its two rotations
showimage = np.reshape(image, (50, 50))
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image1), plt.title('Output')
plt.show()

plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image2), plt.title('Output')
plt.show()
Rotating the original 15 degrees to the left and to the right generated one image each. These are also added to the training data.
With this, the amount of training data is 5 times the original (original, deformation ①, deformation ②, clockwise rotation, counterclockwise rotation).
Let's try training and classifying again with the augmented data!
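The second run reuses the same code as the first attempt; roughly like this, assuming X and Y now hold the augmented lists from above:
# Convert the augmented lists back to arrays, re-split, and retrain
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    np.array(X), np.array(Y), test_size=0.1, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(200,))
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))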
**The accuracy has improved...! (86.9%)**
So the result is "I increased the data and the accuracy improved! Hooray!". Still, I was curious how much each kind of augmented image contributed, so I roughly varied the breakdown of the training data and checked the accuracy for each pattern. I could confirm that the accuracy was higher the more variation the images had.
I would like to keep verifying whether other methods ("cropping", "noise", ...) improve the accuracy further. I also want to try varying the number of nodes in the neural network's hidden layer, which was fixed at 200 this time.
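For example, "adding noise" might look something like the following sketch, which I have not verified here; the noise level (standard deviation 20) is an arbitrary assumption:
# Hypothetical sketch: create one noisy copy of each existing image
rng = np.random.default_rng(0)
for data, moji in list(zip(X, Y)):  # snapshot so the loop doesn't grow forever
    noisy = data.astype(np.float64) + rng.normal(0, 20, size=data.shape)
    X.append(np.clip(noisy, 0, 255).astype(np.uint8))
    Y.append(moji)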
Thank you for reading!