I tried reading 6-digit numbers with a number recognition app made with Python

Nice to meet you. My name is dev.

This is my first time posting to Qiita. I am writing an article in the hope that this post will be useful to someone.

This time, I would like to introduce a number recognition application that uses OCR.

About the app

It's very simple: a web application that reads the text in an input image with OCR (Google Cloud Vision API) and returns the result.
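The flow described above (upload an image, run OCR, show the result) could be sketched in Flask roughly as follows. This is only a minimal sketch, not the app's actual code: `create_app`, `is_allowed_image`, the `image` form field, and the `recognize` callback are hypothetical names, and the OCR call itself is passed in from outside.

```python
import html
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed upload types

FORM_HTML = """
<form method="post" enctype="multipart/form-data">
  <input type="file" name="image" accept="image/*">
  <button type="submit">Read</button>
</form>
"""


def is_allowed_image(filename: str) -> bool:
    """Accept only file names with a known image extension."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def create_app(recognize):
    """Build the Flask app; `recognize` maps image bytes to the text that was read."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        if request.method == "GET":
            return FORM_HTML
        upload = request.files.get("image")
        if upload is None or not is_allowed_image(upload.filename or ""):
            return "Please choose a PNG or JPEG image.", 400
        text = recognize(upload.read())
        # Escape the OCR result before putting it into the page.
        return FORM_HTML + f"<p>Result: {html.escape(text)}</p>"

    return app
```

For a quick local check, something like `create_app(lambda data: "123456").run(debug=True)` would start the page with a stub in place of the real OCR call.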

Reason for creation

It would be convenient to be able to read an analog meter, such as a car speedometer, automatically. Services like that apparently already exist, but it all started when I thought, "I want to build something a bit closer to home."

But reading a meter needle seems difficult. Since this is my first app, I decided to start with "Let's read the digits of an analog odometer first!" and took on the challenge of recognizing numbers.

Basic functions

When you select an image of a 6-digit number and click, the recognized digits are returned to the web page.
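OCR returns whatever text it finds, so one way to get just the 6-digit number out of the result is a regular expression. The sketch below is an assumption about how that step could look, not the app's code; `find_six_digit_number` is a hypothetical helper.

```python
import re

# Exactly six digits, not part of a longer run of digits.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def find_six_digit_number(ocr_text: str):
    """Return the first standalone 6-digit run in the OCR text, or None."""
    match = SIX_DIGITS.search(ocr_text)
    return match.group() if match else None


print(find_six_digit_number("ODO 012345 km"))  # 012345
print(find_six_digit_number("1234567"))        # None
```

The result is kept as a string so that leading zeros, which are common on an odometer, are not lost.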

For example, it can read an image like this.

What I felt while making it: with the Google Cloud Vision API, you can easily build a high-accuracy app like this!

It's easy, yet the accuracy is good!

Moreover, it can recognize not only numbers but also letters.

So images like these are OK too.

図1.png あ1.png

But what can it be used for?

It's a toy app, so it isn't useful for anything as it is.

As a more advanced form, I think it could also be used for reading serial numbers on documents and for reading slips.

You can also implement the OCR function with Tesseract or other free software instead of the Google Cloud Vision API.

Implementation environment

HTML, CSS, Flask

Options considered

To read the numbers, I considered the following two approaches.
・ A model trained on MNIST
・ OCR

Training with the MNIST dataset

With MNIST, training is relatively easy. However, MNIST images contain a single digit each, so to handle a multi-digit number you first need to detect each digit (the first digit, the second digit, and so on) as a separate object.
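To illustrate the digit-detection step mentioned above, here is a minimal sketch of one simple approach: splitting the image wherever there is a blank column. It is an assumption for illustration, not something the app does, and it only works when the digits do not touch each other. `split_digit_columns` is a hypothetical helper that works on plain nested lists.

```python
def split_digit_columns(image, threshold=0.5):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    `image` is a list of rows of grayscale values in [0, 1], with bright ink
    on a dark background (the MNIST convention).
    """
    if not image:
        return []
    width = len(image[0])
    # A column "has ink" if any pixel in it is brighter than the threshold.
    has_ink = [any(row[x] > threshold for row in image) for x in range(width)]

    ranges = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a digit begins
        elif not ink and start is not None:
            ranges.append((start, x))    # a digit ends
            start = None
    if start is not None:                # digit touching the right edge
        ranges.append((start, width))
    return ranges


# Tiny 3x7 example with two "digits" separated by blank columns.
tiny = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
print(split_digit_columns(tiny))  # [(1, 3), (5, 6)]
```

Each column range would still have to be cropped and resized to 28x28 before it could be passed to a model trained on MNIST.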

The trained model can be saved as follows. ■ [Reference]

`Sample code for MNIST training`


# Train a small CNN on MNIST and save the trained model.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, Dropout, Reshape
from keras.utils import to_categorical
import numpy as np
# Load the data and scale pixel values from 0-255 to 0-1.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.array(X_train) / 255
X_test = np.array(X_test) / 255
# One-hot encode the labels (digits 0-9).
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# Build the model: convolution blocks followed by a dense classifier.
model = Sequential()
model.add(Reshape((28, 28, 1), input_shape=(28, 28)))  # add a channel axis
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(16, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(784))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(Dense(10))  # one output per digit class
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
# Train for one epoch, holding out 10% of the training data for validation.
hist = model.fit(X_train, y_train, batch_size=200,
                 verbose=1, epochs=1, validation_split=0.1)
# Evaluate on the test set and save the trained model.
score = model.evaluate(X_test, y_test, verbose=1)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
model.save("C:/test/mnist_main.h5")

Recognizing with OCR

OCR is the quickest way. Since it uses Google's API, it is highly accurate and you don't have to build a model yourself. It also reads numbers without you having to worry about the number of digits.

Depending on the purpose, it is more than usable!

For how to implement the Google Cloud Vision API, refer to here.
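As a rough illustration of what such a call involves, here is a minimal sketch that uses the API's REST endpoint (`images:annotate`) with an API key and only the standard library. It is an assumption about one possible implementation, not necessarily how this app calls the API; `build_request_body`, `extract_text`, and `recognize` are hypothetical helper names.

```python
import base64
import json
import urllib.request

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the full detected text, or "" if nothing was detected."""
    annotations = response["responses"][0].get("textAnnotations", [])
    # The first annotation holds the whole detected text; the rest are single words.
    return annotations[0]["description"].strip() if annotations else ""


def recognize(image_path: str, api_key: str) -> str:
    """Send one image file to the API and return the text that was read."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_request_body(f.read())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

Keeping the request building and response parsing in separate functions makes them easy to check without sending a real (billable) request.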

Accuracy

To check the accuracy, I tested an image containing 1,000 digits.
![1000.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/969670/e4dba485-359e-b3d8-c1e5-e5c1c32ff511.png)

■ Conditions
Image size: 1,024x768
Font: Yu Gothic
Font size: 16x23 pixels (W x H)
Character spacing: 5 pixels
Line spacing: 11 pixels

■ Results
Accuracy: 100%

Even if the font size was halved, it was 100%.
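For reference, a character-level accuracy like the one reported here could be computed along the following lines. This is a minimal sketch and an assumption, not the procedure actually used for the numbers in this article; `character_accuracy` is a hypothetical helper built on the standard `difflib` module.

```python
from difflib import SequenceMatcher


def character_accuracy(expected: str, recognized: str) -> float:
    """Share of expected characters that also appear, in order, in the OCR output."""
    # Ignore whitespace and line breaks, which OCR output tends to add or drop.
    expected = "".join(expected.split())
    recognized = "".join(recognized.split())
    if not expected:
        raise ValueError("expected text must not be empty")
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


truth = "0123456789" * 100                   # 1,000 digits
ocr = truth[:500] + "I" + truth[501:]        # one digit misread as "I"
print(f"{character_accuracy(truth, ocr):.1%}")  # 99.9%
```

Aligning the two strings instead of comparing them position by position keeps a single dropped or extra character from counting every following character as wrong.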

I was able to get high accuracy, so next I tried changing the font under the same conditions.

■ Results
MS P Gothic: 100%
MS P Mincho: 100%
Fugaz One: 100%
Ink Free: 99.9%
np b: 99.6%

All are highly accurate, but the "np b" font is less accurate than the others. Why?

The cause is the shape of "1". In some places it was recognized as "|" (pipe) or "I" (capital i).
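When the input is known to contain only digits, confusions like this can be corrected after the fact. The following is a minimal sketch of such a post-processing step, which is my own assumption and was not part of the test above; `normalize_digits` is a hypothetical helper.

```python
# Characters that were returned in place of "1" in the test above.
LOOKALIKES = str.maketrans({"|": "1", "I": "1"})


def normalize_digits(ocr_text: str) -> str:
    """Map characters that look like "1" back to "1" (digits-only input assumed)."""
    return ocr_text.translate(LOOKALIKES)


print(normalize_digits("I23|56"))  # 123156
```

This is only safe when letters are not expected in the result, since a genuine "I" would also be turned into "1".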

■ Other 1
Ink Free is a handwriting-style font like the one below, yet the accuracy was as high as 99.9%, so standard fonts can probably be handled without problems.

■ Other 2
I also tried the following characters with 1-pixel character spacing and line spacing, and the result was still 100%.

Character size: 12x14pixel (WH)

Yup. It's plenty usable!

There is a lot of information on the web about accuracy, so please search for it.

Articles that I referred to
[Differences in character identification by images](https://qiita.com/saken649/items/4bfd215bf943c36a52ab)
[Throughput by image size](https://qiita.com/se_fy/items/963b295bbd13101c044b)

About Google Cloud Vision API

The setup itself is very easy. You can use it once you get an API key.
* Please note that if you do not enable billing, an error will be returned and you will not be able to use it.

The price itself seems to be quite cheap. The first 1,000 units per month are free. After that, it costs $1.50 per 1,000 units. The unit price changes depending on the usage tier.
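As a worked example of the figures above, the sketch below estimates a monthly bill. It is a rough assumption: it models only the free allowance and the $1.50 rate quoted here, ignores the other usage tiers, and charges partial blocks proportionally. `estimate_monthly_cost` is a hypothetical helper, and the current official price list should be checked before relying on it.

```python
FREE_UNITS_PER_MONTH = 1_000   # free allowance quoted above
USD_PER_1000_UNITS = 1.50      # rate quoted above for usage beyond the free allowance


def estimate_monthly_cost(units: int) -> float:
    """Estimate the monthly charge in USD for the given number of units."""
    billable = max(0, units - FREE_UNITS_PER_MONTH)
    return billable / 1000 * USD_PER_1000_UNITS


print(estimate_monthly_cost(800))    # 0.0
print(estimate_monthly_cost(5_000))  # 6.0
```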

It seems that you can run it on pocket money. Still, set up a billing alert just in case, to prevent accidents.

Future development

I think it would be useful to implement a function that reads document numbers to fit a given business workflow. I will keep studying in my free time.