A record of struggling to do something with Python and OCR (Tesseract-OCR is used for OCR)
First, I tried reading a whole sheet of the target document at once. The OCR also tried hard to read things like stamps and photos, so the output made no sense. Even where the text was read correctly, I could not tell which value was which or where one piece of data ended and the next began, so the result was useless. Therefore, I decided to cut out only the necessary parts and read those.
With code like the following, you can cut out any region of an image file and save the cropped image under a different name.
from PIL import Image
# Open the image with PIL
img_trim = Image.open('original image file name')
# Cut out the specified coordinates (left, upper, right, lower) and save the result
img_trim.crop((x1, y1, x2, y2)).save('Save name of cropped image')
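As a quick check of how the crop box is interpreted, here is a minimal sketch that uses a blank in-memory image instead of a real file. The image size and the box values are made up for illustration.

```python
from PIL import Image

# A blank 200x100 stand-in image, so no file on disk is needed.
img = Image.new("RGB", (200, 100), "white")

# crop() takes (left, upper, right, lower) in pixels;
# the origin is the top-left corner of the image.
box = (20, 10, 120, 60)
cropped = img.crop(box)

# The cropped size is (right - left, lower - upper).
print(cropped.size)
```

Note that the y axis points downward here, which is the opposite of Kivy's window coordinates.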
At this rate, the coordinates have to be adjusted many times before the desired part is cut out. To streamline the work of hunting for coordinates, let's create an application that acquires them through mouse operations. There are various ways to create a GUI, such as C# and VB, but this time I will try Kivy, a framework for building GUIs in Python. As for installation, I believe I simply installed Kivy with pip... (details omitted)
UI part definition (main.kv):
#:import hex_color kivy.utils.get_color_from_hex

<ImageWidget>:
    canvas.before:
        Color:
            rgb: 1,1,1
        Rectangle:
            pos: self.pos
            size: self.size
    BoxLayout:
        orientation: 'horizontal'
        height: root.height
        width: root.width
        Image:
            id: img
            allow_stretch: True
            source: root.image_src
        BoxLayout:
            size: root.size
            orientation: 'vertical'
            width: 200
            Label:
                id: lbl_file_name
                color: 0, 0, 0, 1
                font_size: 20
                background_color: hex_color('#000000')
            Label:
                id: lbl_result
                color: 0, 0, 0, 1
                font_size: 20
The kv file reads like a simplified version of HTML. Next, the source of the main program (main.py):
from kivy.app import App
from kivy.core.text import LabelBase, DEFAULT_FONT  # Addition
from kivy.config import Config
from kivy.resources import resource_add_path  # Addition
from kivy.properties import StringProperty
from kivy.uix.widget import Widget
from kivy.graphics import Line
from kivy.graphics import Color
from kivy.utils import get_color_from_hex
from PIL import Image
import math
import os
import pyocr
import pyocr.builders

# Register a Japanese font so that Japanese text is rendered correctly
resource_add_path('c:/Windows/Fonts')  # Addition
LabelBase.register(DEFAULT_FONT, 'msgothic.ttc')  # Addition

Config.set('graphics', 'width', '1224')
Config.set('graphics', 'height', '768')  # 16:9


class ImageWidget(Widget):
    image_src = StringProperty('')

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.image_src = 'read_img/0112-3.png'
        self.ids.lbl_file_name.text = "filename:\n{}".format(self.image_src)
        self.lines = []

    def on_touch_down(self, touch):
        # Remember where the drag started; the end point is not known yet
        self.x1 = touch.x
        self.y1 = touch.y
        self.x2 = None
        self.y2 = None

    def on_touch_move(self, touch):
        img = self.ids.img
        # Clamp the drag point to the image widget's area
        if touch.x > img.width:
            self.x2 = img.width
        else:
            self.x2 = touch.x
        if touch.y > img.height:
            self.y2 = img.height
        else:
            self.y2 = touch.y

        # Remove the frame drawn for the previous mouse position
        for line in self.lines:
            self.canvas.remove(line)
        self.lines = []

        with self.canvas:
            # Settings for the red line
            Color(1, 0, 0)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    close=True)
            self.lines.append(touch.ud['line'])

            # Settings for the white dashed line drawn on top of the red one
            Color(1, 1, 1)
            touch.ud['line'] = Line(points=[self.x1, self.y1, self.x2, self.y1,
                                            self.x2, self.y2, self.x1, self.y2],
                                    dash_offset=5, dash_length=3,
                                    close=True)
            self.lines.append(touch.ud['line'])

    def on_touch_up(self, touch):
        # Exit if the touch_move event has not occurred
        if self.x2 is None:
            return

        # Initialization process:
        # Get the Image widget and the size at which the picture is displayed
        img = self.ids.img
        vs = img.norm_image_size

        # Open the image with PIL and get its real size
        img_trim = Image.open(self.image_src)
        rs = img_trim.size

        # Calculate the scale between the real image and the displayed image
        ratio = rs[0] / vs[0]

        # Padding between the widget edge and the displayed picture
        px = 0
        py = 0
        if img.width > vs[0]:
            px = (img.width - vs[0]) / 2
        if img.height > vs[1]:
            py = (img.height - vs[1]) / 2

        # Convert to image coordinates (the y axis is flipped: Kivy's origin
        # is the bottom-left corner, PIL's is the top-left corner)
        x1 = (self.x1 - px) * ratio
        x2 = (self.x2 - px) * ratio
        y1 = (img.height - self.y1 - py) * ratio
        y2 = (img.height - self.y2 - py) * ratio

        # Sort the coordinates so the box is valid for any drag direction
        if x1 < x2:
            real_x1 = math.floor(x1)
            real_x2 = math.ceil(x2)
        else:
            real_x1 = math.floor(x2)
            real_x2 = math.ceil(x1)
        if y1 < y2:
            real_y1 = math.floor(y1)
            real_y2 = math.ceil(y2)
        else:
            real_y1 = math.floor(y2)
            real_y2 = math.ceil(y1)

        # Cut out the specified coordinates
        img_trim.crop((real_x1, real_y1, real_x2, real_y2)).save('write_img/test.png')
        self.read_image_to_string()

    def read_image_to_string(self):
        try:
            # Make sure the Tesseract install directory is on PATH
            path_tesseract = r"C:\Program Files\Tesseract-OCR"
            if path_tesseract not in os.environ["PATH"].split(os.pathsep):
                os.environ["PATH"] += os.pathsep + path_tesseract

            tools = pyocr.get_available_tools()
            tool = tools[0]
            img = Image.open("write_img/test.png")
            builder = pyocr.builders.TextBuilder(tesseract_layout=6)
            result = tool.image_to_string(img, lang="jpn", builder=builder)
            self.ids.lbl_result.text = f"Reading result:\n{result}"
            print(result)
        except Exception as ex:
            print(ex)
            self.ids.lbl_result.text = "Reading result:\nFailure"


class MainApp(App):
    def __init__(self, **kwargs):
        super(MainApp, self).__init__(**kwargs)
        self.title = 'test'

    def build(self):
        return ImageWidget()


if __name__ == '__main__':
    app = MainApp()
    app.run()
Program flow:
1. Get the coordinates of the point where the mouse button is pressed.
2. Keep drawing a rectangular frame while dragging.
3. Get the end coordinates when the mouse button is released.
4. Cut out that region of the image and have the OCR engine read it.
Points to note:
The coordinates clicked on the screen cannot be applied to the original image as they are. The actual coordinates must be calculated by taking the display scale into account.
The Image object on the GUI includes padding around the displayed picture, and this must be taken into account as well.
The mouse is not always dragged from the upper left to the lower right, so the coordinates must be sorted so that the first corner of the box is the upper left.
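The three points above can be sketched as a single pure function that does not depend on Kivy. The function name `widget_box_to_image_box` and its parameters are hypothetical names chosen for this illustration, and the final clamping to the image bounds is an extra safety step that the program above does not perform.

```python
import math


def widget_box_to_image_box(p1, p2, widget_size, shown_size, image_size):
    """Convert two drag points in widget coordinates to a PIL crop box.

    p1, p2      -- (x, y) drag start/end, origin at the bottom-left (Kivy style)
    widget_size -- (width, height) of the Image widget
    shown_size  -- (width, height) of the picture as drawn inside the widget
    image_size  -- (width, height) of the original image file in pixels
    """
    # Scale between the original image and the displayed picture
    ratio = image_size[0] / shown_size[0]

    # Padding on each side when the picture is smaller than the widget
    pad_x = max(widget_size[0] - shown_size[0], 0) / 2
    pad_y = max(widget_size[1] - shown_size[1], 0) / 2

    # Remove the padding, flip the y axis, and scale to image pixels
    xs = [(p[0] - pad_x) * ratio for p in (p1, p2)]
    ys = [(widget_size[1] - p[1] - pad_y) * ratio for p in (p1, p2)]

    # min/max makes the result independent of the drag direction.
    # Clamping to the image bounds is an assumption added for this sketch.
    left = max(math.floor(min(xs)), 0)
    upper = max(math.floor(min(ys)), 0)
    right = min(math.ceil(max(xs)), image_size[0])
    lower = min(math.ceil(max(ys)), image_size[1])
    return left, upper, right, lower


# A 1600x600 image shown at 800x300 inside an 800x600 widget,
# dragged from the upper right toward the lower left.
box = widget_box_to_image_box((300, 400), (100, 200),
                              (800, 600), (800, 300), (1600, 600))
print(box)  # (200, 100, 600, 500)
```

Keeping the arithmetic outside the widget class makes it possible to check the conversion with plain numbers before wiring it to mouse events.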
As for the OCR results: they are not very accurate when the program is run without any adjustment. There seems to be room to adjust the parameters and try various things. Then again, the problem may lie in the performance of the OCR engine itself. In the future, I would like to look into choosing a different OCR engine, among other things.
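One parameter that could be adjusted is the `tesseract_layout` value passed to the builder, which corresponds to Tesseract's page segmentation mode. The following is only a sketch of how several values might be compared side by side. `ocr_with_layouts`, `make_builder`, and `FakeTool` are hypothetical names introduced here, and `FakeTool` merely stands in for a pyocr tool so that the snippet runs without Tesseract installed.

```python
def ocr_with_layouts(tool, image, make_builder, layouts=(6, 7, 11), lang="jpn"):
    """Run OCR once per page segmentation mode and collect the text."""
    results = {}
    for layout in layouts:
        builder = make_builder(layout)
        results[layout] = tool.image_to_string(image, lang=lang, builder=builder)
    return results


class FakeTool:
    """Hypothetical stand-in for a pyocr tool; it only echoes the builder."""

    def image_to_string(self, image, lang=None, builder=None):
        return f"layout={builder}"


# With the stand-in, the "builder" is simply the layout number itself.
demo = ocr_with_layouts(FakeTool(), image=None, make_builder=lambda n: n)
for layout, text in demo.items():
    print(layout, repr(text))
```

With the real library, the same function could be called with `tool = pyocr.get_available_tools()[0]`, the cropped PIL image, and `make_builder=lambda n: pyocr.builders.TextBuilder(tesseract_layout=n)`. In Tesseract, mode 6 assumes a single uniform block of text, mode 7 treats the image as a single text line, and mode 11 looks for sparse text, so a small cropped field may read better with a mode other than 6.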