This article continues the series on solving Sudoku with Python. It follows [Recognize the grid from images with Python (Sudoku)](https://qiita.com/hatt_takumi/items/47a46d5e85223a41afa4).

Recognizing single-digit numbers with OCR in Python

To recognize and solve Sudoku puzzles in Python, I tried several OCR options. The input is 81 PNG images, each either blank or containing a single digit. I tried three options:

- scikit-learn
- Google Cloud Vision API
- Tesseract

Image to recognize

This time, the input is 81 images cut out of a Sudoku puzzle, one per cell. They are stored in a folder called "img".

scikit-learn

I started with scikit-learn because it looked like an easy way to recognize digits. It appears to be a fairly standard machine learning library. Every blank cell came back as 1, and I could not find a way to have a blank reported as "no match". I might be able to manage it with more investigation, but I am not a machine learning specialist, and the match rate for the non-blank cells is not high either, so I am leaving it as is for now.

Source code

`scikit-learn-ocr.py`


import numpy as np
from PIL import Image
from sklearn import datasets, svm

# Train an SVM once on scikit-learn's bundled 8x8 handwritten digit images.
digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001)
clf.fit(digits.data, digits.target)

row_list = []
res_list = []
for x in range(1, 82):
    # Load each cell as grayscale and shrink it to the 8x8 size of the training data.
    image = Image.open("img/{}.png".format(x)).convert('L')
    image = image.resize((8, 8), Image.LANCZOS)
    img = np.asarray(image, dtype=float)
    # Invert and rescale 0-255 (dark ink on white) to the dataset's 0-16 range (bright ink on dark).
    img = np.floor(16 - 16 * (img / 256))
    img = img.flatten()
    # predict() always returns one of the labels 0-9, so a blank cell is never reported as "-".
    text = int(clf.predict([img])[0])
    row_list.append(text)
    # Every 9 cells make up one row of the grid.
    if x % 9 == 0:
        res_list.append(row_list)
        row_list = []
for l in res_list:
    print(l)

Result

[7, 1, 1, 1, 4, 1, 1, 1, 1]
[9, 1, 1, 1, 7, 7, 1, 1, 9]
[1, 1, 1, 9, 9, 1, 1, 7, 1]
[1, 9, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 7, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 4, 1, 1, 1]
[1, 8, 1, 1, 1, 1, 1, 1, 1]
[2, 1, 1, 4, 8, 9, 1, 4, 4]
[1, 1, 1, 1, 6, 1, 1, 1, 1]
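
The 1s in the blank positions above appear because the classifier has to pick one of its ten labels for every input. One possible workaround, which the script above does not do, is to screen out blank cells before classification by measuring how much dark ink a cell contains. The following is a minimal sketch under that assumption: the function name `is_blank`, the 15% margin (to ignore grid lines at the cell border), and the 2% ink cutoff are all hypothetical values that would need tuning against the real images.

```python
import numpy as np
from PIL import Image, ImageDraw


def is_blank(image, margin=0.15, ink_threshold=0.02):
    """Return True when the inner part of the cell has almost no dark pixels.

    margin and ink_threshold are illustrative guesses, not tuned values.
    """
    gray = np.asarray(image.convert("L"), dtype=float)
    h, w = gray.shape
    dy, dx = int(h * margin), int(w * margin)
    # Ignore the border so that leftover grid lines do not count as ink.
    inner = gray[dy:h - dy, dx:w - dx]
    dark_fraction = np.mean(inner < 128)
    return bool(dark_fraction < ink_threshold)


# Synthetic stand-ins for a blank cell and a cell with something written in it.
blank = Image.new("L", (50, 50), 255)
marked = Image.new("L", (50, 50), 255)
ImageDraw.Draw(marked).rectangle([20, 10, 30, 40], fill=0)

print(is_blank(blank), is_blank(marked))
# True False
```

With a check like this, a cell that passes `is_blank` could be recorded as "-" and only the remaining cells handed to `clf.predict`.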

Google Cloud Vision API

When it comes to OCR, this is the service that comes to mind, so I tried it. For this task, however, it leads to the inelegant design of calling the API 81 times in a single run. To run it, you need to obtain a credentials file (a service account key) and set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, as described in the official "Optical Character Recognition (OCR)" documentation.

I called it with the default settings, but the recognition results were poor. Specifying the request JSON in more detail might raise the recognition rate a little. It would be nice to pass a hint such as "the target is a single digit", but there appears to be no such option. The unrecognized digits are a concern, and in addition a "." sometimes appears after a digit. I do not know what it is being mistaken for.

Source code

`cloud-vision-ocr.py`


from pathlib import Path

from google.cloud import vision

client = vision.ImageAnnotatorClient()
row_list = []
res_list = []

for x in range(1, 82):
    p = Path(__file__).parent / "img/{}.png".format(x)
    with p.open('rb') as image_file:
        content = image_file.read()
    # google-cloud-vision 2.0 and later; 1.x releases used vision.types.Image.
    image = vision.Image(content=content)
    # One API request per cell, 81 requests in total.
    response = client.text_detection(image=image)
    if len(response.text_annotations) == 0:
        # Nothing detected: treat the cell as blank.
        row_list.append("-")
    else:
        # The first annotation holds the full detected text.
        row_list.append(response.text_annotations[0].description)
    # Every 9 cells make up one row of the grid.
    if x % 9 == 0:
        res_list.append(row_list)
        row_list = []

for l in res_list:
    print(l)

Result

['-', '-', '-', '-', '-', '-', '-', '-', '-']
['5', '-', '-', '8.', '2', '-', '-', '-', '9.']
['-', '-', '-', '3', '-', '-', '-', '-', '-']
['-', '3', '-', '6.', '-', '-', '-', '-', '-']
['-', '-', '-', '-', '-', '-', '2', '-', '-']
['-', '-', '-', '-', '-', '-', '-', '-', '-']
['-', '-', '-', '-', '-', '2', '-', '-', '-']
['2', '-', '-', '9.', '-', '3', '-', '4', '-']
['-', '-', '-', '-', '-', '-', '-', '-', '-']
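
The stray "." in values such as '8.' can be cleaned up after the fact. The sketch below is one possible post-processing step, not part of the original script: it keeps the first digit from 1 to 9 found in the returned string and treats everything else as a blank. The helper name `extract_digit` is hypothetical.

```python
import re


def extract_digit(description):
    """Return the first digit 1-9 in an OCR result string, or "-" if there is none."""
    match = re.search(r"[1-9]", description)
    return match.group(0) if match else "-"


print([extract_digit(s) for s in ["8.", "9.\n", "3", "", "-"]])
# ['8', '9', '3', '-', '-']
```

This only tidies up digits that were detected; it does nothing for the cells the API failed to recognize at all.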

Tesseract

Finally, I tried Tesseract through pyocr. Tesseract is an open-source OCR engine whose development Google sponsored for many years. In short, this one worked, and I got a recognition rate of 100%. It did not recognize the digits well with the default settings, though, so I will write about that as well.

Source code

`tesseract-ocr.py`


import sys

from PIL import Image
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)

# Use the first available tool (Tesseract in this article).
tool = tools[0]
lang = 'eng'

row_list = []
res_list = []

for x in range(1, 82):
    text = tool.image_to_string(
        Image.open("img/{}.png".format(x)),
        lang=lang,
        # DigitBuilder did not work here, so TextBuilder is used with layout 6.
        # builder=pyocr.builders.DigitBuilder()
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )
    # An empty string means nothing was recognized, so treat the cell as blank.
    if text == "":
        row_list.append("-")
    else:
        row_list.append(text)
    # Every 9 cells make up one row of the grid.
    if x % 9 == 0:
        res_list.append(row_list)
        row_list = []

for l in res_list:
    print(l)

Result

['7', '-', '-', '-', '9', '-', '-', '-', '-']
['5', '1', '-', '8', '2', '7', '-', '-', '9']
['-', '-', '-', '3', '5', '-', '-', '7', '-']
['-', '3', '-', '6', '-', '-', '-', '-', '1']
['-', '-', '7', '-', '-', '-', '2', '-', '-']
['1', '-', '-', '-', '-', '4', '-', '6', '-']
['-', '6', '-', '-', '4', '2', '-', '-', '-']
['2', '-', '-', '9', '8', '3', '-', '4', '6']
['-', '-', '-', '-', '6', '-', '-', '-', '7']
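
To put a number on "recognition rate", the grids can be compared cell by cell. The helper below is a small sketch with a hypothetical name, `recognition_rate`. As an illustration it treats the first two rows of the Tesseract result above as the correct answer (the article reports it as 100% correct) and compares the same two rows of the Cloud Vision result against them. Note that a blank correctly reported as "-" counts as a match here.

```python
def recognition_rate(recognized, expected):
    """Return the fraction of cells in which the two grids agree."""
    total = 0
    matches = 0
    for rec_row, exp_row in zip(recognized, expected):
        for rec, exp in zip(rec_row, exp_row):
            total += 1
            # Compare as strings so that 7 and '7' count as the same cell.
            if str(rec) == str(exp):
                matches += 1
    return matches / total if total else 0.0


# First two rows of the Tesseract and Cloud Vision results shown in this article.
tesseract_rows = [
    ['7', '-', '-', '-', '9', '-', '-', '-', '-'],
    ['5', '1', '-', '8', '2', '7', '-', '-', '9'],
]
vision_rows = [
    ['-', '-', '-', '-', '-', '-', '-', '-', '-'],
    ['5', '-', '-', '8.', '2', '-', '-', '-', '9.'],
]

print("{:.2f}".format(recognition_rate(vision_rows, tesseract_rows)))
# 0.67
```

Because '8.' and '9.' are compared as raw strings, they count as mismatches; stripping the trailing "." first would raise the score for those cells.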

About tesseract-ocr

It worked in the end, but I ran into trouble with the settings for the following parameters, so I will write about them:

- lang
- builder
- tesseract_layout

For lang, "eng" recognizes digits better than "jpn", probably because "eng" has a smaller character set to choose from.

For builder, the options are:

- TextBuilder: recognizes strings
- WordBoxBuilder: recognizes text word by word
- LineBoxBuilder: recognizes text line by line
- DigitBuilder: recognizes numbers / symbols
- DigitLineBoxBuilder: recognizes numbers / symbols

DigitBuilder looks like the best fit for this task, but apparently it does not work with the new engine in version 4.0 or later.

Regarding tesseract_layout (pagesegmode), the values are:

- 0 = Orientation and script detection (OSD) only.
- 1 = Automatic page segmentation with OSD.
- 2 = Automatic page segmentation, but no OSD, or OCR
- 3 = Fully automatic page segmentation, but no OSD. (Default)
- 4 = Assume a single column of text of variable sizes.
- 5 = Assume a single uniform block of vertically aligned text.
- 6 = Assume a single uniform block of text.
- 7 = Treat the image as a single text line.
- 8 = Treat the image as a single word.
- 9 = Treat the image as a single word in a circle.
- 10 = Treat the image as a single character.

I used 6 this time.
