Introduction

While looking for articles that use Selenium, I found one about automating the sushi typing game (寿司打, Sushida). The approaches are basically the following:
・ Once the game starts, keep pressing every key
・ Once the game starts, take a screenshot and type the character string obtained by OCR

Because the sushi game draws its screen on a Canvas element, you cannot read the character string directly from the page.

This time I tried the OCR part, with simple image processing in OpenCV as pre-processing.

Preparation

Installing tesseract

tesseract is an OCR engine. This time I run it from Python through the pyocr module. Installation is done with the following command:

$ brew install tesseract

Trained data for Japanese is not included by default, so download it from the following URL: https://github.com/tesseract-ocr/tessdata
↑ Download jpn.traineddata from this URL into /usr/local/share/tessdata/
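To check whether the Japanese data was picked up, one option is to ask pyocr which languages it can see. This is only a sketch: the helper name available_ocr_languages is made up for illustration, and it returns an empty list when pyocr or an OCR tool is not installed.

```python
def available_ocr_languages():
    """Return the languages reported by the first OCR tool pyocr finds, or [] if none."""
    try:
        import pyocr
    except ImportError:
        # pyocr itself is not installed
        return []
    tools = pyocr.get_available_tools()
    if not tools:
        # pyocr is installed but no OCR engine (e.g. tesseract) was found
        return []
    return list(tools[0].get_available_languages())


langs = available_ocr_languages()
print(langs)
print("jpn" in langs)  # True once jpn.traineddata is in the tessdata directory
```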

Install pyocr and OpenCV

Run the following commands in the terminal to complete the installation:

$ pip3 install pyocr
$ pip3 install opencv-python

Trying OCR first

Image preparation

The test image is shown below. ↓ It is then trimmed.

Save the trimmed version as test.png
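Since OpenCV represents images as NumPy arrays, the trimming can also be done with array slicing instead of an image editor. The following is a minimal sketch: the crop helper and the coordinates are hypothetical, and a blank array stands in for the real screenshot.

```python
import numpy as np


def crop(img, top, left, height, width):
    """Return the rectangular region of img starting at (top, left)."""
    return img[top:top + height, left:left + width]


# Stand-in for a screenshot loaded with cv2.imread (rows, columns, BGR channels)
screenshot = np.zeros((420, 500, 3), dtype=np.uint8)

# Hypothetical coordinates of the area that contains the text
text_area = crop(screenshot, top=230, left=100, height=40, width=300)
print(text_area.shape)  # (40, 300, 3)
```

With a real screenshot, the result could then be saved with cv2.imwrite("test.png", text_area).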

OCR with pyocr

import sys

import pyocr
from PIL import Image

image = "test.png"

# Use the first OCR tool that pyocr can find
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

# Run OCR on the trimmed image
res = tool.image_to_string(
    Image.open(image),
    lang="eng",
)

print(res)

Execution result: the text was not recognized correctly at all. It seems that pre-processing is necessary after all.

Trying out OpenCV

I want to do the pre-processing with OpenCV, but I'm new to it, so I'll play with it first by processing my own icon image.

import cv2

image = "test_1.png"
name = "test_1"

# original
img = cv2.imread(image)

# grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"1_{name}_gray.png", img)

# Gaussian blur with a 5x5 kernel to smooth the image
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite(f"2_{name}_gaussian.png", img)

# adaptive threshold (block size 11, constant C = 2)
img = cv2.adaptiveThreshold(
    img,
    255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    11,
    2,
)
cv2.imwrite(f"3_{name}_threshold.png", img)

The image at each stage of processing looks like this (figure: 画像処理.png).
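To get a feel for what the block size (11) and the constant (2) passed to cv2.adaptiveThreshold control, here is a simplified NumPy sketch of the idea. It is an assumption-laden illustration, not OpenCV's implementation: it uses a plain local mean instead of Gaussian weights, and the function name adaptive_threshold_mean and the sample image are made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_threshold_mean(gray, block_size=11, c=2):
    """Set a pixel to 255 if it is brighter than (mean of its neighborhood - c), else 0."""
    pad = block_size // 2
    # Repeat the edge pixels so that every pixel has a full neighborhood
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    windows = sliding_window_view(padded, (block_size, block_size))
    local_mean = windows.mean(axis=(-2, -1))
    return np.where(gray > local_mean - c, 255, 0).astype(np.uint8)


# Sample image: dark left half, bright right half, one darker "stroke" pixel in each
gray = np.full((20, 40), 60, dtype=np.uint8)
gray[:, 20:] = 200
gray[10, 5] = 20     # stroke on the dark background
gray[10, 34] = 120   # stroke on the bright background

adaptive = adaptive_threshold_mean(gray)
global_th = np.where(gray > 128, 255, 0).astype(np.uint8)

# The adaptive version separates both strokes from their local background
print(adaptive[10, 5], adaptive[10, 8])    # 0 255
print(adaptive[10, 34], adaptive[10, 30])  # 0 255
# A single global threshold turns the whole dark half black, stroke included
print(global_th[10, 5], global_th[10, 8])  # 0 0
```

Pixels right next to the boundary between the two halves can still come out black, because their neighborhood mixes dark and bright values.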

OpenCV + OCR

Pre-process the image used for OCR with OpenCV, then try OCR again. The pre-processing below is grayscale conversion → thresholding → color inversion.

import sys

import cv2
import pyocr
from PIL import Image

image = "test.png"
name = "test"

# original
img = cv2.imread(image)
cv2.imwrite(f"1_{name}_original.png", img)

# grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"2_{name}_gray.png", img)

# threshold: pixels brighter than th become 255, the rest become 0
th = 140
img = cv2.threshold(
    img,
    th,
    255,
    cv2.THRESH_BINARY,
)[1]
cv2.imwrite(f"3_{name}_threshold_{th}.png", img)

# invert black and white
img = cv2.bitwise_not(img)
cv2.imwrite(f"4_{name}_bitwise.png", img)

# image that is passed to OCR
cv2.imwrite("target.png", img)

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]
res = tool.image_to_string(
    Image.open("target.png"),
    lang="eng",
)

print(res)

(Figure: 事前処理.png, the pre-processing results)
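To make the thresholding and inversion steps concrete, here is a small NumPy sketch of what they compute on a tiny array. It is an illustration written for this article, not OpenCV's implementation; the function name binarize_and_invert and the sample values are made up.

```python
import numpy as np


def binarize_and_invert(gray, th=140):
    """Mimic cv2.threshold(..., cv2.THRESH_BINARY) followed by cv2.bitwise_not on uint8 data."""
    # THRESH_BINARY: strictly brighter than th -> 255, otherwise 0
    binary = np.where(gray > th, 255, 0).astype(np.uint8)
    # bitwise_not on 8-bit pixels is the same as 255 - value
    inverted = 255 - binary
    return binary, inverted


gray = np.array([[30, 140, 141],
                 [200, 90, 255]], dtype=np.uint8)
binary, inverted = binarize_and_invert(gray)
print(binary.tolist())    # [[0, 0, 255], [255, 0, 255]]
print(inverted.tolist())  # [[255, 255, 0], [0, 255, 0]]
```

Note that a pixel exactly equal to the threshold (140 here) ends up as 0 after thresholding, since the comparison is strict.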