OCR is a technology for extracting character strings from images. The Google Translate feature that translates whatever your smartphone camera is pointed at is one example: it extracts text from the captured image and then applies natural language processing to it.
You may be thinking, "I get that it pulls text out of an image, but what can I actually use it for?" For example, you can scan leaflets you received, or handouts from your company or school, and turn them into Word files. You can also convert what is written on a blackboard or whiteboard into text without having to copy it down by hand.
・ Installation of Python3
・ Installation of pyocr
・ Installation of Pillow
・ Installation of Tesseract OCR
Installing Python3 takes a while to explain, so I will omit it. I'm a Mac user, so I'll only explain the Mac side.
For Windows users, please refer to the article linked below.
https://qiita.com/henjiganai/items/7a5e871f652b32b41a18
Now, for Mac.
pip install Pillow
Or
pip3 install pillow
pip install pyocr
Or
pip3 install pyocr
brew install tesseract
That's it. Note that, depending on your environment, the pip commands may fail unless you put sudo in front.
This script has only been tried with PNG images. I have not checked whether other formats work.
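If you want to confirm that the installation worked before moving on, a small check like the following may help. This is only a sketch using the standard library: the function name check_prerequisites is made up for this example, and it assumes the tesseract command installed by Homebrew is on your PATH.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict saying whether each prerequisite appears to be installed."""
    return {
        # Pillow is imported under the name PIL
        'Pillow': importlib.util.find_spec('PIL') is not None,
        'pyocr': importlib.util.find_spec('pyocr') is not None,
        # Look for the tesseract command-line program on PATH
        'tesseract': shutil.which('tesseract') is not None,
    }


if __name__ == '__main__':
    for name, found in check_prerequisites().items():
        print(f"{name}: {'OK' if found else 'not found'}")
```

Anything reported as "not found" is worth reinstalling before running the OCR script.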
import glob

import pyocr
import pyocr.builders
from PIL import Image


# Receives the file name of an image and returns the recognized text
class OCRs:
    def __init__(self):
        self.tools = pyocr.get_available_tools()
        print(self.tools)
        # res is True only when at least one OCR engine was found
        self.res = len(self.tools) != 0
        if self.res:
            # Use the first engine and the first language it reports
            self.tool = self.tools[0]
            self.langs = self.tool.get_available_languages()
            self.lang = self.langs[0]

    def read(self, file_name):
        if not self.res:
            return 'error'
        else:
            txt = self.tool.image_to_string(
                Image.open(file_name),
                lang=self.lang,
                builder=pyocr.builders.TextBuilder()
            )
            return txt
Let's set aside the jokey class name OCRs and go through the contents. First, import the modules to be used.
glob is a module for getting the paths of the files in a directory. pyocr is a module that bridges Python and an OCR engine called Tesseract so that OCR can be performed from Python. PIL (installed as Pillow) is a module required to load images.
__init__ sets up things such as tool and lang that only need to be prepared once. It runs automatically when the instance is created, so there is no need to call it yourself. res is False if no OCR engine is available and True if one is.
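The class above simply takes the first language that the engine reports, which is not necessarily the one you want. As a rough sketch, a helper like the following could choose a preferred language when it is installed. The name pick_language and the default preference order are assumptions made for this example; 'jpn' and 'eng' are Tesseract's codes for Japanese and English.

```python
def pick_language(available, preferred=('jpn', 'eng')):
    """Return the first preferred language that is available.

    Falls back to the first available language, which is what the
    class above does, when none of the preferred ones are installed.
    """
    for lang in preferred:
        if lang in available:
            return lang
    return available[0]


# Hypothetical lists in the style of tool.get_available_languages()
print(pick_language(['eng', 'jpn', 'osd']))  # prints jpn
print(pick_language(['fra', 'osd']))         # prints fra
```

Inside __init__, this would replace self.langs[0] with pick_language(self.langs). Like the original, it assumes the list of languages is not empty.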
Next is the read function, the core of the class. It receives a file name as an argument, runs OCR on the image (extracts the character strings), and returns the result as text.
First, it checks whether an OCR engine is available. If not, the string 'error' is returned. Otherwise, it passes the image, the language, and so on to the engine, receives the text in txt, and returns it.
Now, let's move on to the main part.
if __name__ == '__main__':
    # Creating the instance runs __init__ automatically
    cl = OCRs()
    file_names = glob.glob('/Users/sa/Desktop/programming/target_folder/*')
    for file_name in file_names:
        # Run OCR once per image and reuse the result
        txt = cl.read(file_name)
        if txt == 'error':
            print('OCR software was not found.')
            break
        else:
            print(txt)
Let's take a look. First, create an instance of the class from earlier and assign it to cl. This runs __init__, so the initial setup is complete. Then, use glob to specify the folder of images you want to OCR. Directory paths can be confusing at first, so here is a simplified version of my own setup. Too basic, you say?
# Put in the directory (folder) you want to target.
filenames = glob.glob('hogehoge/*')
# Now you have all the file names in hogehoge.
Then, a for loop passes every file name to the read function from earlier. If 'error' is returned, no OCR software is installed.
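Since a pattern like 'hogehoge/*' also picks up files that are not images, it may be safer to keep only the PNG files before looping. The following is a minimal sketch of that idea; list_png_files is a hypothetical helper name, not part of glob or pyocr.

```python
import glob
import os


def list_png_files(folder):
    """Return the sorted paths of the .png files directly inside folder."""
    paths = glob.glob(os.path.join(folder, '*'))
    return sorted(
        p for p in paths
        # Keep regular files only, and compare the extension
        # case-insensitively so that .PNG is accepted as well
        if os.path.isfile(p) and os.path.splitext(p)[1].lower() == '.png'
    )


if __name__ == '__main__':
    # 'hogehoge' is the placeholder folder name used above
    for file_name in list_png_files('hogehoge'):
        print(file_name)
```

In the main loop, glob.glob(...) could then be swapped for list_png_files(...), so that only PNG files reach cl.read.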
That's all. If you want to read just one specific image, call it as follows.
cl.read(filename)
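If you would rather keep the result than just print it, one option is to write the text to a file next to the image. This is only a sketch: save_text is a name made up for this example, and the text argument is assumed to be whatever cl.read returned.

```python
import os


def save_text(image_path, text):
    """Write text to a .txt file next to image_path and return its path."""
    # e.g. memo.png -> memo.txt in the same folder
    txt_path = os.path.splitext(image_path)[0] + '.txt'
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(text)
    return txt_path
```

It could be used as save_text(filename, cl.read(filename)), after checking that the result is not 'error'.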