I tried to make an OCR application with PySimpleGUI

■ Background

I wanted to see whether I could easily build an OCR application that runs on the desktop, so I built one.

■ Environment

・macOS Catalina 10.15.4
・Visual Studio Code
・Python 3.8.1

■ Library installation

1. PySimpleGUI (GUI creation)
pip install pysimplegui

2. Tesseract OCR (OCR engine)
brew install tesseract

3. pyocr (OCR tool wrapper for Python)
sudo pip3 install pyocr

4. Pillow (image loading)
sudo pip3 install pillow
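
After installing, it can help to confirm that everything is actually visible from Python before writing any OCR code. The following is a minimal sketch using only the standard library. The function names `parse_tesseract_langs` and `check_environment` are my own, not part of any of the libraries above. It assumes the `tesseract` command is on the PATH and that `tesseract --list-langs` prints one language code per line after a header line.

```python
import importlib.util
import shutil
import subprocess


def parse_tesseract_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def check_environment(modules=("PySimpleGUI", "pyocr", "PIL")):
    """Report which modules are importable and which OCR languages are installed."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    tesseract_path = shutil.which("tesseract")
    report["tesseract"] = tesseract_path is not None
    report["languages"] = []
    if tesseract_path:
        result = subprocess.run(
            [tesseract_path, "--list-langs"],
            capture_output=True, text=True, check=False,
        )
        # Some Tesseract versions print the list to stderr, so read both streams
        report["languages"] = parse_tesseract_langs(result.stdout + "\n" + result.stderr)
    return report


if __name__ == "__main__":
    print(check_environment())
```

If 'jpn' does not appear under "languages", the Japanese language data probably needs to be installed separately before the Japanese option in this app will work.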

● For what these libraries are and how to use them for OCR, I relied heavily on the following article (thank you): "Try simple OCR with Tesseract + PyOCR"

● Windows users may find this article easier to follow: "[Python] How to transcribe an image and convert it to text (tesseract-OCR, pyocr)"

Now let's look at the code that reads characters from an image. (The imports are included in the complete source code at the end, so I omit them here.)

def scan_file_to_str(file_path, language):
    """Generate a string from an image file.

    Args:
        file_path (str): Path of the image file to read
        language (str): 'jpn' or 'eng'

    Returns:
        str: The text read from the image
    """
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)

    # Use the first OCR tool that pyocr found
    tool = tools[0]

    text = tool.image_to_string(
        # Open the file passed as an argument
        Image.open(file_path),
        # Language passed as an argument ('jpn' or 'eng')
        lang=language,
        # tesseract_layout=6 selects Tesseract's page segmentation mode 6
        # (treat the image as a single uniform block of text)
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )
    # Finally, return the string read from the image
    return text

It is remarkable that you can read text from an image in only about 15 lines of code. I was impressed.

Next, I put this behind a GUI. tkinter is probably the best-known GUI toolkit for Python, and I started out writing the code with it, but while researching I came across the following article.

[If you use Tkinter, try using PySimpleGUI](https://qiita.com/dario_okazaki/items/656de21cab5c81cabe59#exe%E5%8C%96%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6)

I was impressed that a GUI could be implemented with such simple code, so I decided to use PySimpleGUI.

Here is the code for the GUI part.

# Set the theme (many themes are available)
sg.theme('Light Grey1')

# What to place where (it is easier to assemble once you know the layout is built row by row)
layout = [
    # Row 1 (Text: a text label)
    [sg.Text('File to read(Multiple selections possible)', font=('IPA Gothic', 16))],

    # Row 2 (InputText: text box, FilesBrowse: file dialog)
    [sg.InputText(font=('IPA Gothic', 14), size=(70, 10),), sg.FilesBrowse('Select files', key='-FILES-'),],

    # Row 3 (Text: label, Radio: radio button x 2)
    [sg.Text('Language to read', font=('IPA Gothic', 16)),
     sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
     sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],

    # Row 4 (Button: button)
    [sg.Button('Read execution'),],

    # Row 5 (MLine: text area of 100 columns x 30 rows)
    [sg.MLine(font=('IPA Gothic', 14), size=(100,30), key='-OUTPUT-'),]
]

# Create the window (the arguments of Window are the title and the layout)
window = sg.Window('Easy OCR', layout)

# List that holds the files to read
files = []

# Run an infinite loop and wait for events such as button clicks
while True:
    event, values = window.read()
    # event is None when the window's "✕" button is pressed: leave the loop and close the window
    if event is None:
        break

    # When the 'Read execution' button is pressed
    if event == 'Read execution':
        # values['-FILES-'] holds the selected file names separated by ';', so split it into a list
        files.extend(values['-FILES-'].split(';'))
        # language is 'jpn' if the '-jpn-' radio button is selected, otherwise 'eng'
        language = 'jpn' if values['-jpn-'] else 'eng'
        text = ''
        # Loop over the files
        for i in range(len(files)):
            if not i == 0:
                # Put a separator between files
                text += '================================================================================================\n'
            # Receive the string read by scan_file_to_str, defined earlier
            text += scan_file_to_str(files[i], language)

            if language == 'jpn':
                # Japanese results contained many extra spaces, so remove them
                text = text.replace(' ', '')
            # Leave two blank lines before the next file's text
            text += '\n\n'
        # Display the result (text) in the MLine whose key is '-OUTPUT-'
        window['-OUTPUT-'].update(text)
        # Announce completion with a pop-up window
        sg.Popup('Has completed')

window.close()
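
The text handling inside the event loop (splitting the ';'-separated file list, inserting a separator between files, and removing spaces for Japanese) can also be pulled out into small functions that are testable without a GUI or an OCR engine. This is only a sketch of that idea. The names `split_file_list`, `join_ocr_results`, and `SEPARATOR` are hypothetical, and the 80-character separator width is an arbitrary choice for the example.

```python
# A line of '=' characters placed between the results of two files
SEPARATOR = "=" * 80 + "\n"


def split_file_list(raw):
    """Split the ';'-separated string from the file dialog, dropping empty entries."""
    return [path for path in raw.split(";") if path.strip()]


def join_ocr_results(texts, language):
    """Join per-file OCR results in the same way as the event loop."""
    combined = ""
    for index, text in enumerate(texts):
        if index != 0:
            # Separator between files, but not before the first one
            combined += SEPARATOR
        combined += text
        if language == "jpn":
            # Japanese results tend to contain extra spaces, so remove them
            combined = combined.replace(" ", "")
        # Two line breaks before the next file's text
        combined += "\n\n"
    return combined
```

With helpers like these, the button handler would only need to call `scan_file_to_str` for each path returned by `split_file_list` and pass the resulting list to `join_ocr_results`. Dropping empty entries also avoids trying to open an empty path when no file has been selected.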

For the GUI, I also drew heavily on the following resources:

・Learning Notes for K-TechLabo Seminar → The PDF text is very easy to understand.
・Create a UI that replaces VBA with PySimpleGUI (file dialog, list, log output) → Written by the same author as the article introduced earlier. I learned a lot from this one too.

■ Source code (complete)

import os
import sys
from PIL import Image

import PySimpleGUI as sg
import pyocr
import pyocr.builders


def scan_file_to_str(file_path, language):
   """Generate a string from an image file.

   Args:
      file_path (str): Path of the image file to read
      language (str): 'jpn' or 'eng'

   Returns:
      str: The text read from the image
   """
   tools = pyocr.get_available_tools()
   if len(tools) == 0:
      print("No OCR tool found")
      sys.exit(1)

   tool = tools[0]

   text = tool.image_to_string(
      Image.open(file_path),
      lang=language,
      builder=pyocr.builders.TextBuilder(tesseract_layout=6)
   )
   return text


#Set theme
sg.theme('Light Grey1')

layout = [
   #The first line
   [sg.Text('File to read(Multiple selections possible)', font=('IPA Gothic', 16))],
   #2nd line
   [sg.InputText(font=('IPA Gothic', 14), size=(70, 10),), sg.FilesBrowse('Select files', key='-FILES-'),],
   #3rd line
   [sg.Text('Language to read', font=('IPA Gothic', 16)), 
   sg.Radio('Japanese', 1, key='-jpn-', font=('IPA Gothic', 10)),
   sg.Radio('English', 1, key='-eng-', font=('IPA Gothic', 10))],
   #4th line
   [sg.Button('Read execution'),],
   #5th line
   [sg.MLine(font=('IPA Gothic', 14), size=(100,30), key='-OUTPUT-'),]
]

#Get window
window = sg.Window('Easy OCR', layout)

files = []

while True:
   event, values = window.read()
   if event is None:
      break

   if event == 'Read execution':
      files.extend(values['-FILES-'].split(';'))
      language = 'jpn' if values['-jpn-'] else 'eng'
      text = ''
      for i in range(len(files)):
         if not i == 0:
            text += '================================================================================================\n'
         text += scan_file_to_str(files[i], language)
         if language == 'jpn':
            text = text.replace(' ', '')
         text += '\n\n'
      window['-OUTPUT-'].update(text)
      sg.Popup('Has completed')

window.close()

Screenshot 2020-05-06 22.45.44.png

Let's have it read two images.

[English, image 1 (from The White House Building)] Screenshot 2020-05-06 22.47.50.png

[English, image 2] Screenshot 2020-05-06 22.48.00.png

[Result] Screenshot 2020-05-06 22.59.50.png

English is read quickly and, I think, with high accuracy.

[Japanese (from Aozora Bunko)] Screenshot 2020-05-06 22.56.34.png

[Result] Screenshot 2020-05-06 22.58.45.png

Japanese takes longer to read. Even so, the accuracy seems good enough to be usable.
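
To put a number on "quick" and "takes time", one option is to wrap the OCR call in a small timer. The following is a sketch using only the standard library. The helper name `timed` is my own, and the file name in the usage comment is a placeholder.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return its result together with the elapsed time in seconds."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with the OCR function from this article (placeholder file name):
#   text, seconds = timed(scan_file_to_str, 'sample.png', 'jpn')
#   print(f'{seconds:.2f} s')
```

Running this for the same image with 'eng' and 'jpn' would show how large the difference is on a given machine.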

■ Finally

Actually, I wanted to turn this app into an executable that runs on a Mac or Windows desktop, but I could not get either PyInstaller or py2app to work, so I decided to publish the article as it stands. If I manage it later, I will update this article.

Also, if you have suggestions or opinions, such as "Isn't this part wrong?" or "There is a better way to do this," please feel free to leave a comment.
