I wondered whether I could easily build an OCR application that runs on the desktop, so I built one.
・ macOS Catalina 10.15.4 ・ Visual Studio Code ・ Python 3.8.1
1. PySimpleGUI (GUI)
pip install pysimplegui
2. Tesseract OCR (OCR engine)
brew install tesseract
sudo pip3 install pyocr
sudo pip3 install pillow
● For what these libraries are, and for the OCR part in general, I relied heavily on the following article (thank you): "Try simple OCR with Tesseract + PyOCR"
● This one may be easier to follow for Windows users: "[Python] How to transcribe an image and convert it to text (tesseract-OCR, pyocr)"
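Before moving on, it can help to confirm that everything installed above is actually visible to Python. The following is a minimal sketch using only the standard library; the function name `check_ocr_environment` is my own, and it assumes the Tesseract command is called `tesseract` and is expected on the PATH (which is what a Homebrew install normally provides).

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report whether each piece of the setup can be found.

    Returns a dict that maps a component name to True or False.
    """
    report = {
        # Tesseract is an external command, so look for it on the PATH.
        "tesseract": shutil.which("tesseract") is not None,
    }
    # Import names can differ from pip package names (pillow is imported as PIL).
    for module_name in ("PySimpleGUI", "pyocr", "PIL"):
        report[module_name] = importlib.util.find_spec(module_name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_ocr_environment().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

If any line prints "missing", the corresponding install step above is the one to revisit.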
Now let's look at the code that reads the characters from an image (the imports are included in the full source code at the end, so I omit them here).
def scan_file_to_str(file_path, language):
    """scan_file_to_str

    Generate a string from an image file.

    Args:
        file_path (str): Path of the image file to read
        language (str): 'jpn' or 'eng'

    Returns:
        str: The text read from the image
    """
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    tool = tools[0]
    text = tool.image_to_string(
        # Open the file passed as an argument
        Image.open(file_path),
        # Use the language passed as an argument ('jpn' or 'eng')
        lang=language,
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )
    # Finally, return the string read from the image
    return text
It's really surprising that you can read a character string from an image in just 15 lines. I was impressed.
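One thing to keep in mind once this function sits behind a GUI: `sys.exit(1)` ends the whole process, so the window would simply disappear if no OCR tool is found. As a minimal sketch (the names `OcrToolNotFoundError` and `pick_first_tool` are hypothetical and not part of pyocr), the check could raise an exception instead, which leaves the decision about what to show the user to the caller:

```python
class OcrToolNotFoundError(RuntimeError):
    """Hypothetical exception: no OCR backend could be found."""


def pick_first_tool(available_tools):
    """Return the first available OCR tool, or raise instead of exiting.

    `available_tools` is expected to be the list returned by
    pyocr.get_available_tools(); any list works for this sketch.
    """
    if not available_tools:
        raise OcrToolNotFoundError(
            "No OCR tool found. Is Tesseract installed and on the PATH?"
        )
    return available_tools[0]
```

In the function above, `tool = pick_first_tool(pyocr.get_available_tools())` would take the place of the `len(tools) == 0` check, and the GUI code could catch the exception and display its message in a popup.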
Next, I put this behind a GUI. tkinter is probably the best-known choice for Python GUIs, and I started out writing the code with it, but while researching I came across the following article.
[If you use Tkinter, try using PySimpleGUI](https://qiita.com/dario_okazaki/items/656de21cab5c81cabe59#exe%E5%8C%96%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6)
I was also impressed by the fact that the GUI could be implemented with simple code, so I decided to use it.
Here is the code for the GUI part.
# Set the theme (there are many themes to choose from)
sg.theme('Light Grey1')
# What to place where (it is easier to assemble once you see that the layout is built row by row)
layout = [
    # 1st row (Text: a text label)
    [sg.Text('File to read(Multiple selections possible)', font=('IPA Gothic', 16))],
    # 2nd row (InputText: text box, FilesBrowse: file dialog)
    [sg.InputText(font=('IPA Gothic', 14), size=(70, 10),), sg.FilesBrowse('Select files', key='-FILES-'),],
    # 3rd row (Text: text label, Radio: radio button x 2)
    [sg.Text('Language to read', font=('IPA Gothic', 16)),
     sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
     sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],
    # 4th row (Button: button)
    [sg.Button('Read execution'),],
    # 5th row (MLine: textarea of 100 columns x 30 rows)
    [sg.MLine(font=('IPA Gothic', 14), size=(100,30), key='-OUTPUT-'),]
]
# Create the window (the arguments of Window are "title, layout")
window = sg.Window('Easy OCR', layout)
# Now run an infinite loop and wait for events such as button clicks.
while True:
    event, values = window.read()
    # None means the "✕" button of the window was pressed; break out of the loop and close the window.
    if event is None:
        break
    # When the 'Read execution' button is pressed
    if event == 'Read execution':
        # Get the list of file names from the ';'-separated value stored under key='-FILES-'.
        # The list is rebuilt on every click so that files from earlier runs are not read again.
        files = values['-FILES-'].split(';')
        # If the radio button values['-jpn-'] is selected the language is 'jpn', otherwise 'eng'
        language = 'jpn' if values['-jpn-'] else 'eng'
        text = ''
        # Loop once per file
        for i in range(len(files)):
            if not i == 0:
                # Put a delimiter line between files
                text += '================================================================================================\n'
            # Receive the string read by the scan_file_to_str function defined earlier
            text += scan_file_to_str(files[i], language)
            if language == 'jpn':
                # Japanese results contained a lot of extra spaces, so remove them
                text = text.replace(' ', '')
            # Leave two line breaks before the text of the next file
            text += '\n\n'
        # Display the read data (= text) in the MLine specified by key='-OUTPUT-'
        window['-OUTPUT-'].update(text)
        # Announce completion with a pop-up window
        sg.Popup('Has completed')
window.close()
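The handling of the file list and the assembly of the output do not depend on the GUI, so they can be pulled out and tried on their own. The sketch below is only an illustration: `split_selected_files` and `join_results` are hypothetical helper names, and the 80-character separator length is arbitrary. It also skips empty entries, because splitting an empty string (no file selected) yields `['']`, which would otherwise be passed to `Image.open`.

```python
SEPARATOR = "=" * 80  # arbitrary length; any visible divider works


def split_selected_files(raw, delimiter=";"):
    """Turn the ';'-separated string from the file dialog into a list of paths.

    Empty entries are dropped, so an empty selection gives an empty list.
    """
    return [path.strip() for path in raw.split(delimiter) if path.strip()]


def join_results(texts, separator=SEPARATOR):
    """Join per-file OCR results with a separator line between files.

    Mirrors the loop above: every result is followed by two line breaks,
    and every result except the first is preceded by the separator line.
    """
    if not texts:
        return ""
    return f"\n\n{separator}\n".join(texts) + "\n\n"


if __name__ == "__main__":
    print(split_selected_files("/tmp/a.png;/tmp/b.png"))
    print(join_results(["first file text", "second file text"], separator="-----"))
```

With these helpers, the button handler would reduce to splitting `values['-FILES-']`, calling `scan_file_to_str` for each path, and passing the resulting list to `join_results`.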
For the GUI, I also relied heavily on a few other resources, so I list them here.
・ Learning Notes for K-TechLabo Seminar → The PDF text is very easy to understand.
・ Create a UI that replaces VBA with PySimpleGUI (file dialog, list, log output) → Written by the same author as the article introduced earlier. I learned a lot from this one as well.
import sys
from PIL import Image
import PySimpleGUI as sg
import pyocr
import pyocr.builders


def scan_file_to_str(file_path, language):
    """scan_file_to_str

    Generate a string from an image file.

    Args:
        file_path (str): Path of the image file to read
        language (str): 'jpn' or 'eng'

    Returns:
        str: The text read from the image
    """
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    tool = tools[0]
    text = tool.image_to_string(
        Image.open(file_path),
        lang=language,
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )
    return text


# Set the theme
sg.theme('Light Grey1')
layout = [
    # 1st row
    [sg.Text('File to read(Multiple selections possible)', font=('IPA Gothic', 16))],
    # 2nd row
    [sg.InputText(font=('IPA Gothic', 14), size=(70, 10),), sg.FilesBrowse('Select files', key='-FILES-'),],
    # 3rd row
    [sg.Text('Language to read', font=('IPA Gothic', 16)),
     sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
     sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],
    # 4th row
    [sg.Button('Read execution'),],
    # 5th row
    [sg.MLine(font=('IPA Gothic', 14), size=(100,30), key='-OUTPUT-'),]
]
# Create the window
window = sg.Window('Easy OCR', layout)
while True:
    event, values = window.read()
    if event is None:
        break
    if event == 'Read execution':
        # Rebuild the file list on every click so earlier selections are not read again
        files = values['-FILES-'].split(';')
        language = 'jpn' if values['-jpn-'] else 'eng'
        text = ''
        for i in range(len(files)):
            if not i == 0:
                text += '================================================================================================\n'
            text += scan_file_to_str(files[i], language)
            if language == 'jpn':
                text = text.replace(' ', '')
            text += '\n\n'
        window['-OUTPUT-'].update(text)
        sg.Popup('Has completed')
window.close()
Let's have the app read two images.
[English, 1st image (from The White House Building)]
[English, 2nd image]
☟
【result】
English is read quickly and with a high degree of accuracy.
[Japanese (from Aozora Bunko)]
☟
【result】
Japanese takes longer to read. Still, the accuracy seems good enough to be usable.
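On the extra spaces in Japanese results: the code above deletes every half-width space, which would also glue together any English words that appear in a Japanese document. A gentler alternative, sketched below under my own assumptions, removes spaces only when both neighbours are Japanese characters. The function name and the Unicode ranges (CJK punctuation, hiragana, katakana, common kanji, full-width forms) are illustrative choices, not something the OCR libraries provide.

```python
import re

# Character ranges treated as "Japanese" in this sketch:
# CJK punctuation, hiragana and katakana, common kanji, full-width forms.
_JP = r"\u3001-\u303F\u3040-\u30FF\u4E00-\u9FFF\uFF00-\uFFEF"

# One or more half-width or full-width spaces with Japanese on both sides.
_SPACE_BETWEEN_JP = re.compile(rf"(?<=[{_JP}])[ \u3000]+(?=[{_JP}])")


def remove_spaces_between_japanese(text):
    """Delete spaces that sit between two Japanese characters.

    Spaces next to ASCII letters or digits, and all line breaks, are kept.
    """
    return _SPACE_BETWEEN_JP.sub("", text)


if __name__ == "__main__":
    print(remove_spaces_between_japanese("吾輩 は 猫 で ある。"))
    print(remove_spaces_between_japanese("Python 3 で OCR"))
```

Swapping this in for `text.replace(' ', '')` would be a one-line change in the loop; whether it improves real output depends on the documents being read.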
I actually wanted to turn this app into an executable file that runs on the desktop on Mac or Windows, but neither PyInstaller nor py2app worked, so I decided to publish the article in its current state. If I get that working in the future, I will update it.
Also, if you have any suggestions or opinions, such as "Isn't this part wrong?" or "There is another way to do this," please feel free to write them in the comment section.