This article is the day-15 entry of the Fujitsu Systems Web Technology Advent Calendar. (The usual disclaimer: the content of this article is my own opinion and does not represent the organization I belong to.)
In this article, I summarize **the procedure and results of trying handwritten rune recognition with scikit-learn, Python's machine learning library**. While the previous authors of this Advent Calendar have posted great features and know-how that look genuinely useful in real work, I'm just making something that is simply fun for me, and I can't help it. I hope you can take a quick look.
Also, since the author of this article is a beginner in machine learning, the content may be rough in places. For those who are about to try Python and scikit-learn, I will do my best to provide useful information. Thank you.
I don't touch machine learning at all in my daily work, but I'm interested in it and wanted to learn the basics of how training and prediction work, so I tried scikit-learn. I could have used one of the datasets bundled with the library, but I wanted to know "what can I do with data I prepare myself?", so I decided to start from preparing the data.
**(Digression)** Runes are kind of cool, aren't they?
--Anaconda: A distribution that bundles Python itself with commonly used libraries.
--scikit-learn: An open-source library that makes machine learning, including neural networks, easy to use from Python. This time we will use a model called MLPClassifier, which performs "classification".
--E-cutter: Free software for splitting images. Used to split the sheets of handwritten characters into individual character images.
This time, we will focus on the Common Germanic runes (24 characters, also known as the Elder Futhark).
There seems to be no convenient "rune character dataset for machine learning", so I prepared my own images, using the following method.
(1) Handwrite the characters lined up at regular intervals and save them as one image per character. (It is convenient later if each image file is named after the single character it contains, e.g. "ᚠ.png".)
(2) Split the image into equal parts with E-cutter (free software). The split images are saved into the specified folder as "[original file name]_[sequence number].png", which was very convenient. (A Python sketch of the same split is shown below.)
For a start, this produced 18 images for each type of rune character.
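(Aside) If you'd rather not install separate software, the same equal-part split can be done in a few lines of Python with PIL. This is a minimal sketch, not what I actually used: the file name and the 6x3 grid are assumptions, so adjust them to your own sheet.
from PIL import Image
import os

sheet_path = "ᚠ.png"  # assumed: one sheet per character, named after it
cols, rows = 6, 3      # assumed layout: 6 x 3 = 18 cells

sheet = Image.open(sheet_path)
cell_w, cell_h = sheet.width // cols, sheet.height // rows
base = os.path.splitext(os.path.basename(sheet_path))[0]

for r in range(rows):
    for c in range(cols):
        box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        # Save as "[original file name]_[sequence number].png", like E-cutter
        sheet.crop(box).save(f"{base}_{r * cols + c + 1}.png")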
From here on, we work in Python. First, load the images.
import cv2  # image transformation library (used later for augmentation)
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os, glob

# List to store the image data
X = []
# List to store the character (label) corresponding to each image
Y = []

# Directory containing the training images
dir = "[Image data storage directory]"
files = glob.glob(dir + "\\*.png")

# Width and height to which each image is resized (pixels)
image_size = 50

# Read each file in the directory and add it to the training data
for i, file in enumerate(files):
    image = Image.open(file)
    # Convert to 8-bit grayscale
    image = image.convert("L")
    image = image.resize((image_size, image_size))
    data = np.asarray(image).flatten()
    X.append(data)
    # The label is the part of the file name before the "_"
    moji = file.split("\\")[-1].split("_")[0]
    Y.append(moji)

X = np.array(X)
Y = np.array(Y)
Let's display the loaded image.
# Visualize the first data point
showimage = np.reshape(X[0], (50, 50))  # reshape back into a 50x50 2D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.show()
The image has been read as 50x50 (= 2500-dimensional) data.
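For reference, you can also check the shapes of the arrays directly. Assuming all 24 x 18 images loaded successfully, it should look like this:
print(X.shape)  # (432, 2500): 432 images, each 50*50 = 2500 pixels
print(Y.shape)  # (432,): one label (character) per image
print(Y[0])     # the character for the first image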
The number of images is quite small, but let's try training and classifying with this data (24 characters x 18 images) first!
from sklearn import model_selection
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Split the data into training and test sets
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=0.1, random_state=0)

# Train: 2500 input pixels -> one hidden layer with 200 nodes -> 24 output classes
clf = MLPClassifier(hidden_layer_sizes=(200,))
clf.fit(x_train, y_train)

# Classify the test data
y_pred = clf.predict(x_test)

# View the results
print("---Expected answers---")
print(y_test)
print("---Answers given by the model---")
print(y_pred)
print("---Accuracy---")
print(accuracy_score(y_test, y_pred))
**The accuracy is very low...!! (7.1%) orz**
Most of the test images were classified as "ᚱ", and the other predictions were wrong too... However, the characters classified as something other than "ᚱ" were at least mistaken for runes with similar shapes. I'm a little happy to glimpse a sprout of intelligence there (even though the answers were still wrong).
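If you want to see exactly which rune is being confused with which, scikit-learn's confusion matrix is handy. A quick sketch using the variables from above (rows are true labels, columns are predictions):
from sklearn.metrics import confusion_matrix

labels = sorted(set(Y))
# Large off-diagonal values show which runes get mixed up (here, mostly "ᚱ")
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(labels)
print(cm)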
It seems the training data is simply insufficient, but adding more handwritten character data is tedious. Let's try data augmentation instead: since [number of character types] x 18 images is apparently not enough to train on, we transform the existing data to increase its amount.
The following article was very helpful for data augmentation. https://products.sint.co.jp/aisia/blog/vol1-7#toc-3
Typical methods include "adding noise", "flipping", "shifting", and "deforming". This time, we will apply "deformation" and "rotation".
# Deform the images
# X and Y were converted to NumPy arrays above, so turn them back into
# lists before appending the augmented data
X = list(X)
Y = list(Y)

for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]
    # Invert the bits so the strokes become white on black
    image_array = cv2.bitwise_not(np.array(image))

    ## Deformation ①
    # Define the perspective transform by mapping four source points to
    # four slightly shifted destination points
    pts1 = np.float32([[0, 0], [0, 100], [100, 100], [100, 0]])
    pts2 = np.float32([[0, 0], [0, 98], [102, 102], [100, 0]])
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst1 = cv2.warpPerspective(image_array, M, (50, 50))
    X.append(dst1.flatten())
    Y.append(moji)

    ## Deformation ②
    # Same source points, a different set of destination points
    pts2 = np.float32([[0, 0], [0, 102], [98, 98], [100, 0]])
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst2 = cv2.warpPerspective(image_array, M, (50, 50))
    X.append(dst2.flatten())
    Y.append(moji)

# Display the last processed image
showimage = np.reshape(image_array, (50, 50))
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(dst2), plt.title('Output')
plt.show()
I was able to generate an image that was slightly deformed compared to the original image. Two deformed images are generated for each original image and added to the training data.
To increase the data further, also create images rotated 15 degrees from the original and add them.
# Rotate the images
for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]
    # Invert the bits so the strokes become white on black
    image = cv2.bitwise_not(np.array(image))

    # 1. Rotate 15 degrees clockwise
    angle = -15.0   # rotation angle
    scale = 1.0     # scaling factor
    # Build the rotation matrix (arguments: center, angle, scale)
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    # Apply the affine transformation
    image1 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image1.flatten())
    Y.append(moji)

    # 2. Rotate 15 degrees counterclockwise
    angle = 15.0
    scale = 1.0
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    image2 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image2.flatten())
    Y.append(moji)

# Display the last processed image and its two rotations
showimage = np.reshape(image, (50, 50))
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image1), plt.title('Output')
plt.show()

plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image2), plt.title('Output')
plt.show()
Rotating the original 15 degrees to the left and to the right generated one image each. These are also added to the training data.
With this, the amount of training data is 5 times the original (original, deformation ①, deformation ②, clockwise rotation, counterclockwise rotation).
Let's try training and classifying again with the augmented data!
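The second run reuses the same code as the first attempt; roughly like this, assuming X and Y now hold the augmented lists from above:
# Convert the augmented lists back to arrays, re-split, and retrain
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    np.array(X), np.array(Y), test_size=0.1, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(200,))
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))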
**The accuracy has improved...! (86.9%)**
So the result is "I increased the data and the accuracy improved! Hooray!". Still, I was curious how much each kind of augmented image contributed, so I roughly varied the breakdown of the training data and checked the accuracy for each pattern. I could confirm that the accuracy was higher the more variation the images had.
I would like to keep verifying whether other methods ("cropping", "noise", ...) improve the accuracy further. I also want to try varying the number of nodes in the neural network's hidden layer, which was fixed at 200 this time.
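For example, "adding noise" might look something like the following sketch, which I have not verified here; the noise level (standard deviation 20) is an arbitrary assumption:
# Hypothetical sketch: create one noisy copy of each existing image
rng = np.random.default_rng(0)
for data, moji in list(zip(X, Y)):  # snapshot so the loop doesn't grow forever
    noisy = data.astype(np.float64) + rng.normal(0, 20, size=data.shape)
    X.append(np.clip(noisy, 0, 255).astype(np.uint8))
    Y.append(moji)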
Thank you for reading!