I tried several OCR options in Python for recognizing a Sudoku puzzle so that it can then be solved. The task is to recognize 81 PNG images, each of which is either blank or contains a single digit. I tried the following three:
・ scikit-learn
・ Google Cloud Vision API
・ Tesseract
The input is 81 images, one per cell, cut out of the Sudoku puzzle. They are stored in a folder called "img" and named 1.png to 81.png.
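The three scripts below open the files in order from 1.png to 81.png and start a new row after every ninth image. That assumes the cells are numbered row by row from the top-left corner. Here is a minimal sketch of that mapping. The helper names `cell_position` and `to_grid` are hypothetical and not part of any library.

```python
def cell_position(index):
    """Map a 1-based file number (1..81) to a 0-based (row, col), row-major."""
    return divmod(index - 1, 9)


def to_grid(cells):
    """Split a flat list of 81 per-cell results into 9 rows of 9."""
    if len(cells) != 81:
        raise ValueError("expected 81 cells, got {}".format(len(cells)))
    return [cells[i:i + 9] for i in range(0, 81, 9)]
```

You could collect all 81 results in one flat list and call `to_grid` once at the end. That would replace the `row_list`/`copy.deepcopy` bookkeeping the scripts use.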
scikit-learn
This looked like the easiest way to recognize digits, so I tried it first. It seems to be a fairly standard machine learning library. Every blank cell came out as 1, and I could not work out how to get a "blank" (no match) result. An SVM classifier always returns one of the labels it was trained on, here 0 to 9. I might manage it with a bit more investigation, but I don't specialize in machine learning. The accuracy on the other cells was not very high either, so I'll leave it at that for now.
scikit-learn-ocr.py
import copy
from sklearn import datasets, svm
import numpy as np
from PIL import Image

# Train once on scikit-learn's bundled 8x8 digits (values 0-16, 0 = background)
digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001)
clf.fit(digits.data, digits.target)

row_list = []
res_list = []
for x in range(1, 82):
    # Shrink each cell to 8x8 grayscale to match the training data
    image = Image.open("img/{}.png".format(x)).convert('L')
    image = image.resize((8, 8), Image.LANCZOS)  # Image.ANTIALIAS (same filter) was removed in Pillow 10
    img = np.asarray(image, dtype=float)
    # Invert and rescale 0-255 (white paper) to 0-16 (0 = background)
    img = np.floor(16 - 16 * (img / 256))
    img = img.flatten()
    text = int(clf.predict([img])[0])
    # Never true: the SVC always returns one of the labels 0-9,
    # which is why every blank cell comes out as a digit
    if text == "":
        row_list.append("-")
    else:
        row_list.append(text)
    # 9 cells per row
    if x % 9 == 0:
        res_list.append(copy.deepcopy(row_list))
        row_list = []
for l in res_list:
    print(l)
[7, 1, 1, 1, 4, 1, 1, 1, 1]
[9, 1, 1, 1, 7, 7, 1, 1, 9]
[1, 1, 1, 9, 9, 1, 1, 7, 1]
[1, 9, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 7, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 4, 1, 1, 1]
[1, 8, 1, 1, 1, 1, 1, 1, 1]
[2, 1, 1, 4, 8, 9, 1, 4, 4]
[1, 1, 1, 1, 6, 1, 1, 1, 1]
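The blank cells come out as 1 because the classifier has no "none of the above" class. One workaround is to decide whether a cell is blank before classifying it, by counting dark pixels. This sketch rests on my own assumptions and is not part of the script above. The helper names and both thresholds (`dark_threshold=128`, `max_ink_ratio=0.02`) are hypothetical values you would tune for your images. `to_digits_scale` simply restates the script's `np.floor(16 - 16 * (img / 256))` formula in plain Python.

```python
def is_blank(pixels, dark_threshold=128, max_ink_ratio=0.02):
    """Hypothetical blank check.

    pixels: non-empty iterable of 8-bit grayscale values (0 = black, 255 = white).
    A cell counts as blank when at most max_ink_ratio of its pixels are dark.
    Both thresholds are assumptions to tune (grid lines at the cell edge count as ink).
    """
    pixels = list(pixels)
    ink = sum(1 for p in pixels if p < dark_threshold)
    return ink / len(pixels) <= max_ink_ratio


def to_digits_scale(pixels):
    """Map 8-bit grayscale (255 = white paper) to the 0-16 scale of load_digits,
    where 0 is background and 16 is full ink: floor(16 - 16 * (p / 256)).
    int() equals floor here because the value is always positive."""
    return [int(16 - 16 * (p / 256)) for p in pixels]
```

In the loop, you could check `is_blank(Image.open(path).convert("L").getdata())` on the full-size image first. If it returns true, append "-" and skip `clf.predict` for that cell.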
Google Cloud Vision API
When it comes to OCR, this is the obvious choice, so I tried it. For this task, though, it ends up as an inelegant design that calls the API 81 times per run. To run it, you need to obtain a credentials (service account key) file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it, as described in Google's "Optical Character Recognition (OCR)" documentation.
I called it with the default settings, but the recognition results were poor. Specifying the request JSON in more detail might improve the recognition rate a little. It would be nice to pass a hint such as "each image is a single digit," but there does not seem to be any such option. Apart from the digits it failed to recognize, it sometimes appends a "." after a digit (for example "8." and "9."). I have no idea what it is mistaking for a period.
cloud-vision-ocr.py
import copy
from pathlib import Path
from google.cloud import vision

# Credentials are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable
client = vision.ImageAnnotatorClient()
row_list = []
res_list = []
for x in range(1, 82):
    p = Path(__file__).parent / "img/{}.png".format(x)
    with p.open('rb') as image_file:
        content = image_file.read()
    # google-cloud-vision 2.x; older 1.x releases used vision.types.Image
    image = vision.Image(content=content)
    # One API request per cell, i.e. 81 requests per run
    response = client.text_detection(image=image)
    if len(response.text_annotations) == 0:
        row_list.append("-")
    else:
        # The first annotation holds the full detected text
        row_list.append(response.text_annotations[0].description)
    if x % 9 == 0:
        res_list.append(copy.deepcopy(row_list))
        row_list = []
for l in res_list:
    print(l)
['-', '-', '-', '-', '-', '-', '-', '-', '-']
['5', '-', '-', '8.', '2', '-', '-', '-', '9.']
['-', '-', '-', '3', '-', '-', '-', '-', '-']
['-', '3', '-', '6.', '-', '-', '-', '-', '-']
['-', '-', '-', '-', '-', '-', '2', '-', '-']
['-', '-', '-', '-', '-', '-', '-', '-', '-']
['-', '-', '-', '-', '-', '2', '-', '-', '-']
['2', '-', '-', '9.', '-', '3', '-', '4', '-']
['-', '-', '-', '-', '-', '-', '-', '-', '-']
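Some of the noise here is formatting rather than recognition: digits such as '8.' and '9.' carry a trailing period. Below is a small post-processing sketch built around a hypothetical `normalize_cell` function. It strips whitespace and trailing periods, and maps anything that is not a single digit from 1 to 9 to "-". It cannot recover the digits the API missed entirely.

```python
def normalize_cell(text):
    """Hypothetical cleanup for one OCR result: '8.' -> '8', '' -> '-'.
    Anything other than a single digit 1-9 is treated as blank."""
    cleaned = text.strip().rstrip(".").strip()
    if len(cleaned) == 1 and cleaned in "123456789":
        return cleaned
    return "-"
```

For example, `[normalize_cell(c) for c in row]` applied to the second row above yields `['5', '-', '-', '8', '2', '-', '-', '-', '9']`.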
Tesseract
Finally, I tried Tesseract through pyocr. Tesseract is an open-source OCR engine, and Google sponsored its development for many years. This one worked in the end, and I got a 100% recognition rate on these images. It did not recognize the digits well with the default settings, though, so I'll also describe what I changed.
tesseract-ocr.py
import copy
import sys
from PIL import Image
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]  # first available tool (Tesseract here)
lang = 'eng'
row_list = []
res_list = []
for x in range(1, 82):
    text = tool.image_to_string(
        Image.open("img/{}.png".format(x)),
        lang=lang,
        # builder=pyocr.builders.DigitBuilder()  # does not work with Tesseract 4.0+
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)  # 6 = single uniform block of text
    )
    if text == "":
        row_list.append("-")
    else:
        row_list.append(text)
    if x % 9 == 0:
        res_list.append(copy.deepcopy(row_list))
        row_list = []
for l in res_list:
    print(l)
['7', '-', '-', '-', '9', '-', '-', '-', '-']
['5', '1', '-', '8', '2', '7', '-', '-', '9']
['-', '-', '-', '3', '5', '-', '-', '7', '-']
['-', '3', '-', '6', '-', '-', '-', '-', '1']
['-', '-', '7', '-', '-', '-', '2', '-', '-']
['1', '-', '-', '-', '-', '4', '-', '6', '-']
['-', '6', '-', '-', '4', '2', '-', '-', '-']
['2', '-', '-', '9', '8', '3', '-', '4', '6']
['-', '-', '-', '-', '6', '-', '-', '-', '7']
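Before you pass a recognized grid to a solver, a cheap sanity check is to look for digits that repeat within a row, column, or 3x3 box, since a legal puzzle never has such repeats. The sketch below uses a hypothetical `find_conflicts` helper that is not part of the original scripts. It expects a 9x9 list of strings like the output above, with "-" for blanks. The scikit-learn result would need `str()` applied to each cell first.

```python
def find_conflicts(grid):
    """Hypothetical sanity check for a recognized Sudoku grid.

    grid: 9 rows of 9 cells, each "1"-"9" or "-" (blank).
    Returns (unit_name, digit) pairs for every digit that appears more than
    once in the same row, column or 3x3 box; an empty list means no conflicts.
    """
    units = []
    for i in range(9):
        units.append(("row {}".format(i), grid[i]))
        units.append(("col {}".format(i), [grid[r][i] for r in range(9)]))
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
            units.append(("box {},{}".format(br // 3, bc // 3), box))

    conflicts = []
    for name, cells in units:
        filled = [c for c in cells if c != "-"]
        for d in sorted(set(filled)):
            if filled.count(d) > 1:
                conflicts.append((name, d))
    return conflicts
```

An empty result does not prove every digit was read correctly. It only shows the grid is not obviously impossible. The scikit-learn grid above, by contrast, fails immediately because row 0 contains several 1s.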
It worked in the end, but I had trouble with the settings of these parameters, so I'll go over them:
・ lang
・ builder
・ tesseract_layout
For lang, "eng" works better than "jpn" for recognizing digits, probably because the English model has far fewer characters to choose from.
For builder, pyocr offers:
・ TextBuilder: recognizes strings
・ WordBoxBuilder: recognizes characters word by word
・ LineBoxBuilder: recognizes characters line by line
・ DigitBuilder: recognizes digits and symbols
・ DigitLineBoxBuilder: recognizes digits and symbols line by line
DigitBuilder looks better suited to this task, but apparently it does not work with the new engine in Tesseract 4.0 and later.
For tesseract_layout, the value is Tesseract's page segmentation mode (pagesegmode). The values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
This time I used 6.