This is a memo on using pyOCR to convert the explanatory text on a card into text.
- Install pyOCR, Tesseract, and jpn.traineddata

Quoted from: Convert image data to text with pyOCR in Mac environment

> Install PyOCR
>
> ```
> $ sudo pip install pyocr
> ```
>
> Install Tesseract, the OCR engine
>
> ```
> $ brew install tesseract
> $ ls /usr/local/Cellar/tesseract/4.1.0/share/tessdata/
> ```
>
> Get jpn.traineddata
>
> ```
> $ wget https://github.com/tesseract-ocr/tessdata/raw/4.00/jpn.traineddata
> $ mv jpn.traineddata /usr/local/Cellar/tesseract/4.1.0/share/tessdata/
> ```
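As a quick sanity check after the steps above, one could confirm that the language file actually landed in the tessdata directory. The following is a minimal sketch; the helper name `find_traineddata` is hypothetical, and it only assumes that the file is named `<lang>.traineddata`, as in the `mv` command above.

```python
from pathlib import Path


def find_traineddata(tessdata_dir, lang="jpn"):
    """Return the path to <lang>.traineddata in tessdata_dir, or None if missing.

    Hypothetical helper for illustration; it only checks that the file exists,
    not that Tesseract can actually load it.
    """
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None
```

With the path used in this memo, the call would be `find_traineddata("/usr/local/Cellar/tesseract/4.1.0/share/tessdata/")`. The version directory (`4.1.0` here) depends on the installed Tesseract, so adjust it to match the output of `ls`.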
- Card

(Example) We will use the following card.
Source: [Yu-Gi-Oh! Duel Links Strategy](https://www.google.com/url?sa=i&url=https%3A%2F%2Fkamigame.jp%2F%25E9%2581%258A%25E6%2588%25AF%25E7%258E%258B%25E3%2583%2587%25E3%2583%25A5%25E3%2582%25A8%25E3%2583%25AB%25E3%2583%25AA%25E3%2583%25B3%25E3%2582%25AF%25E3%2582%25B9%2F%25E3%2582%25AB%25E3%2583%25BC%25E3%2583%2589%2F%25E9%259D%2592%25E7%259C%25BC%25E3%2581%25AE%25E7%2599%25BD%25E9%25BE%258D.html&psig=AOvVaw3wIPO8FpnvpxrtFSCtCIN2&ust=1587370256244000&source=images&cd=vfe&ved=0CA0QjhxqFwoTCPiXxpiF9OgCFQAAAAAdAAAAABAD)
![card1.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/344880/8f50d346-efbd-5f03-0021-4e59f56d5df1.png)
# code
#### **`sample.py`**
```py
from PIL import Image
import sys

# Placeholder path; only needed if pyocr lives outside the default import path.
sys.path.append('/path/to/dir')

import pyocr
import pyocr.builders

# Find the OCR tools installed on this machine (e.g. Tesseract).
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)

# Use the first tool that was found.
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))

# List the languages whose trained data is installed.
langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))

# Run OCR on the card image with the Japanese model.
txt = tool.image_to_string(
    Image.open('card.png'),
    lang='jpn',
    builder=pyocr.builders.TextBuilder()
)
print(txt)
```

```
$ python sample.py
Will use tool 'Tesseract (sh)'
Available languages: eng, jpn, jpn_vert, osd, snum
An invading E-Lagon that boasts a prosperous attack. What kind of grandchildren
The monument, the anchor of the anchor, is immeasurable.
```
How can it be detected with higher accuracy?
Problem: No OCR result is output.

Countermeasure: The characters in the image may be too small to analyze. Enlarging the image made the analysis possible. (If other preprocessing is needed, such as background noise removal, apply that as well.)
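The countermeasure above, enlarging the image when nothing comes back, could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `enlarge`, `binarize`, and `ocr_with_enlarging`, the scale factors `(1, 2, 3)`, and the threshold `128` are hypothetical choices, not values from this memo. `run_ocr` stands for any function that takes a PIL image and returns a string.

```python
from PIL import Image


def enlarge(img, scale):
    """Return a copy of img resized by the given factor (assumed helper)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    return img.resize(size, resample=Image.LANCZOS)


def binarize(img, threshold=128):
    """Crude noise removal: pixels brighter than threshold become white, others black.

    The threshold value is an assumption and usually needs tuning per image.
    """
    return img.convert("L").point(lambda p: 255 if p > threshold else 0)


def ocr_with_enlarging(run_ocr, img, scales=(1, 2, 3)):
    """Run OCR at increasing sizes and stop at the first non-empty result.

    Returns (text, scale), or ("", None) if every attempt comes back empty.
    """
    for scale in scales:
        text = run_ocr(enlarge(img, scale)).strip()
        if text:
            return text, scale
    return "", None
```

With the `tool` object from `sample.py`, `run_ocr` could be a small wrapper such as `lambda im: tool.image_to_string(im, lang='jpn', builder=pyocr.builders.TextBuilder())`, and `img` would be `Image.open('card.png')`. Whether `binarize` helps depends on the card's background, so it is left as an optional step to apply before calling `run_ocr`.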
- [Yu-Gi-Oh! Duel Links Strategy](https://www.google.com/url?sa=i&url=https%3A%2F%2Fkamigame.jp%2F%25E9%2581%258A%25E6%2588%25AF%25E7%258E%258B%25E3%2583%2587%25E3%2583%25A5%25E3%2582%25A8%25E3%2583%25AB%25E3%2583%25AA%25E3%2583%25B3%25E3%2582%25AF%25E3%2582%25B9%2F%25E3%2582%25AB%25E3%2583%25BC%25E3%2583%2589%2F%25E9%259D%2592%25E7%259C%25BC%25E3%2581%25AE%25E7%2599%25BD%25E9%25BE%258D.html&psig=AOvVaw3wIPO8FpnvpxrtFSCtCIN2&ust=1587370256244000&source=images&cd=vfe&ved=0CA0QjhxqFwoTCPiXxpiF9OgCFQAAAAAdAAAAABAD)
- Convert image data to text with pyOCR in Mac environment
- How to execute OCR in Python