A record of my struggle to get something done with Python and OCR (Tesseract-OCR is used as the OCR engine)

First, I tried reading a whole sheet of the target document at once. The engine tries hard to read things like stamps and photos as well, so the output makes no sense. Even where the text is read correctly, I cannot tell what each piece of data is or where one item ends and the next begins, so the result is useless. I therefore decided to cut out only the necessary parts and read those.

With the following code, you can cut out any region of an image file and save the cropped image under a different name.

from PIL import Image

# Open the image with PIL
img_trim = Image.open('original image file name')
# Cut out the specified coordinates and save the result
img_trim.crop((x1, y1, x2, y2)).save('Save name of cropped image')
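As a small illustration of how the crop box is interpreted, the sketch below crops an in-memory image instead of a file. The image size and the coordinates are arbitrary values chosen for the example, not values from the original program.

```python
from PIL import Image

# Build a small in-memory image so the example needs no file on disk.
# The size and colour are arbitrary illustration values.
img = Image.new("RGB", (200, 100), "white")

# The crop box is (left, upper, right, lower) in pixels,
# measured from the top-left corner of the image.
x1, y1, x2, y2 = 20, 10, 120, 60
cropped = img.crop((x1, y1, x2, y2))

# The result is (x2 - x1) pixels wide and (y2 - y1) pixels high.
print(cropped.size)  # (100, 50)
```

Note that the origin is the top-left corner, which matters later when coordinates come from a GUI toolkit whose origin is at the bottom-left.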

At this rate, I would have to adjust the coordinates over and over to cut out the desired part. To streamline the tedious work of finding coordinates, let's create an application that obtains them from mouse operations. There are various ways to create a GUI, such as C# or VB, but this time I will try Kivy, a framework for building GUIs in Python. As for installation, I believe I simply installed Kivy with pip... (details omitted)

UI part definition (main.kv):

#:import hex_color kivy.utils.get_color_from_hex

<ImageWidget>:
    canvas.before:
        Color:
            rgb: 1,1,1
        Rectangle:
            pos: self.pos
            size: self.size
    BoxLayout:
        orientation: 'horizontal'
        height: root.height
        width: root.width

        Image:
            id: img
            allow_stretch: True
            source: root.image_src

        BoxLayout:
            size: root.size
            orientation: 'vertical'
            width: 200

            Label:
                id: lbl_file_name
                color: 0, 0, 0, 1
                font_size: 20
                background_color: hex_color('#000000')
            Label:
                id: lbl_result
                color: 0, 0, 0, 1
                font_size: 20

It reads like a simplified version of HTML. Next, the source of the main program (main.py):

from kivy.app import App
from kivy.core.text import LabelBase, DEFAULT_FONT  # added
from kivy.config import Config
from kivy.resources import resource_add_path  # added
from kivy.properties import StringProperty
from kivy.uix.widget import Widget
from kivy.graphics import Line
from kivy.graphics import Color
from kivy.utils import get_color_from_hex
from PIL import Image
import math
import os
import pyocr
import pyocr.builders

# Register a Japanese font so that Japanese text can be displayed
resource_add_path('c:/Windows/Fonts')  # added
LabelBase.register(DEFAULT_FONT, 'msgothic.ttc')  # added

Config.set('graphics', 'width', '1224')
Config.set('graphics', 'height', '768')  # 16:9

class ImageWidget(Widget):
    image_src = StringProperty('')

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.image_src = 'read_img/0112-3.png'
        self.ids.lbl_file_name.text = "filename:\n{}".format(self.image_src)
        # Canvas instructions of the selection frame currently drawn
        self.lines = []

    def on_touch_down(self, touch):
        # Remember the drag start point and reset the end point
        self.x1 = touch.x
        self.y1 = touch.y
        self.x2 = None
        self.y2 = None

    def on_touch_move(self, touch):
        img = self.ids.img
        # Keep the drag end point within the Image widget horizontally
        if touch.x > img.width:
            self.x2 = img.width
        else:
            self.x2 = touch.x
        if touch.y > img.height:
            self.y2 = 0
        else:
            self.y2 = touch.y

        # Remove the frame drawn on the previous move event
        for line in self.lines:
            self.canvas.remove(line)
        self.lines = []

        with self.canvas:
            # Settings for the red line (Kivy colour components range from 0 to 1)
            Color(1, 0, 0)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    close=True)
            self.lines.append(touch.ud['line'])

            # Settings for making a dashed line
            Color(1, 1, 1)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    dash_offset=5, dash_length=3,
                                    close=True)
            self.lines.append(touch.ud['line'])

    def on_touch_up(self, touch):
        # Exit if no touch_move event has occurred
        if self.x2 is None:
            return

        # Initialization: get the Image widget
        img = self.ids.img

        # Find the size of the resized (displayed) image
        vs = img.norm_image_size

        # Open the image with PIL
        img_trim = Image.open(self.image_src)

        # Get the size of the original image
        rs = img_trim.size

        # Calculate the image scale ratio
        ratio = rs[0] / vs[0]

        # Find the amount of padding applied
        # MEMO: assuming center alignment, (Image widget size - displayed size) / 2
        px = 0
        py = 0
        if img.width > vs[0]:
            px = (img.width - vs[0]) / 2
        if img.height > vs[1]:
            py = (img.height - vs[1]) / 2

        # Remove the padding of the Image widget and scale to the original image.
        # Kivy's y axis points up while PIL's points down, so y is flipped.
        x1 = (self.x1 - px) * ratio
        x2 = (self.x2 - px) * ratio
        y1 = (img.height - self.y1 - py) * ratio
        y2 = (img.height - self.y2 - py) * ratio

        # Sort the coordinates of the cutout position from small to large
        if x1 < x2:
            real_x1 = math.floor(x1)
            real_x2 = math.ceil(x2)
        else:
            real_x1 = math.floor(x2)
            real_x2 = math.ceil(x1)
        if y1 < y2:
            real_y1 = math.floor(y1)
            real_y2 = math.ceil(y2)
        else:
            real_y1 = math.floor(y2)
            real_y2 = math.ceil(y1)

        # Cut out the specified coordinates
        img_trim.crop((real_x1, real_y1, real_x2, real_y2)).save('write_img/test.png')

        # Read text from the cropped image
        self.read_image_to_string()

    def read_image_to_string(self):
        try:
            # 1. Add the installed Tesseract path to PATH
            path_tesseract = r"C:\Program Files\Tesseract-OCR"
            if path_tesseract not in os.environ["PATH"].split(os.pathsep):
                os.environ["PATH"] += os.pathsep + path_tesseract

            # 2. Get the OCR engine
            tools = pyocr.get_available_tools()
            tool = tools[0]

            # 3. Read the cropped image
            img = Image.open("write_img/test.png")

            # 4. Run OCR
            builder = pyocr.builders.TextBuilder(tesseract_layout=6)
            result = tool.image_to_string(img, lang="jpn", builder=builder)

            self.ids.lbl_result.text = f"Reading result:\n{result}"
            print(result)
        except Exception as ex:
            print(ex)
            self.ids.lbl_result.text = "Reading result:\nFailure"

class MainApp(App):
    def __init__(self, **kwargs):
        super(MainApp, self).__init__(**kwargs)
        self.title = 'test'

    def build(self):
        return ImageWidget()


if __name__ == '__main__':
    app = MainApp()
    app.run()

Program flow:
1. When the mouse button is pressed, the clicked point is recorded.
2. While dragging, the rectangular frame is redrawn continuously.
3. When the button is released, the coordinates of the end point are taken.
4. The image is cropped and the cropped image is passed to OCR.

Points to note: The coordinates clicked on the screen cannot be applied to the original image as they are. The actual coordinates must be calculated taking the scale ratio between the displayed image and the original image into account. The Image object on the GUI also includes padding around the displayed image, which has to be subtracted. Finally, the mouse may be dragged toward the upper left, so the start point is not always the top-left corner of the selection.
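The three points above can be sketched as one pure-Python function that needs neither Kivy nor PIL. The function name `widget_to_image_box` and its parameters are hypothetical and introduced only for this illustration. Like `on_touch_up`, it assumes the image is centered in the widget, that the widget sits at the window origin, and that y grows upward on screen but downward in the image.

```python
import math


def widget_to_image_box(start, end, widget_size, shown_size, image_size):
    """Convert a drag in widget coordinates to a (left, upper, right, lower) box.

    Hypothetical helper: start/end are (x, y) points with the origin at the
    bottom-left of the widget; the sizes are (width, height) tuples.
    """
    widget_w, widget_h = widget_size
    shown_w, shown_h = shown_size

    # Scale ratio between the original image and the displayed image
    ratio = image_size[0] / shown_w

    # Padding around the displayed image, assuming center alignment
    pad_x = max(widget_w - shown_w, 0) / 2
    pad_y = max(widget_h - shown_h, 0) / 2

    # Remove the padding, flip the y axis, and scale to image pixels
    xs = [(x - pad_x) * ratio for x in (start[0], end[0])]
    ys = [(widget_h - y - pad_y) * ratio for y in (start[1], end[1])]

    # Sort so the box is valid whichever direction the mouse was dragged
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


# A 2000x1000 image shown as 1000x500 inside a 1000x800 widget,
# dragged from the lower right toward the upper left.
box = widget_to_image_box((600, 300), (100, 500),
                          widget_size=(1000, 800),
                          shown_size=(1000, 500),
                          image_size=(2000, 1000))
print(box)  # (200, 300, 1200, 700)
```

The sketch does not clamp the box to the image bounds, so a drag that starts in the padding area can produce coordinates outside the image, just as in the program above.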

As for the OCR results: accuracy is not very good when the program is run without any adjustment. There seems to be room to tune parameters and try various things. The performance of the OCR engine itself may also be a limiting factor. Going forward, I would like to compare and select OCR engines, among other things.
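One adjustment that is often tried is to preprocess the cropped image before passing it to the OCR engine. The following is only a sketch: it was not part of the program above, the helper name `preprocess_for_ocr` is hypothetical, and the `scale` and `threshold` values are arbitrary assumptions that would need tuning per image. Whether it improves the result at all depends on the input.

```python
from PIL import Image


def preprocess_for_ocr(img, scale=2, threshold=128):
    """Return an enlarged, black-and-white copy of img (hypothetical helper)."""
    # Convert to 8-bit grayscale
    gray = img.convert("L")
    # Enlarge the image so that small characters cover more pixels
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Binarize: pixels brighter than the threshold become white, the rest black
    return big.point(lambda p: 255 if p > threshold else 0)


# Usage with a synthetic image; in the application this would be the crop
sample = Image.new("RGB", (40, 20), (200, 200, 200))
result = preprocess_for_ocr(sample)
print(result.size, result.mode)  # (80, 40) L
```

In the application, the returned image could be passed to `tool.image_to_string` in place of the image opened from `write_img/test.png`.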
