This article summarizes how to recognize handwritten characters using the optical character recognition (OCR) feature available on GCP (Google Cloud Platform). It is aimed at GCP beginners and anyone who wants to try GCP in the future.
The goal is to recognize the handwritten characters in an image using GCP's OCR feature.
- macOS Catalina 10.15.6
- Python 3.8.1
- Before you start
- Prepare the input data
- Implementation
- Execution
You need to create a Google account to use GCP services. If you do not have a Google account yet, please refer to here to create one.
After creating a Google account, open the GCP Console and refer to [here](https://cloud.google.com/vision/docs/before-you-begin) to set up a Cloud project and authentication credentials.
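As a minimal sketch of one way to point the client library at the credentials (assuming a service-account key file downloaded from the console and the google-cloud-vision package installed with pip; the key path is a placeholder):

```python
import os

from google.cloud import vision

# The client library looks up the GOOGLE_APPLICATION_CREDENTIALS environment
# variable to find a service-account key file. The path below is a placeholder.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

# If no usable credentials can be found, constructing the client raises
# google.auth.exceptions.DefaultCredentialsError.
client = vision.ImageAnnotatorClient()
print("Client created - authentication settings were found")
```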
Before starting the implementation, first prepare the handwritten image you want to recognize. I prepared an image like the following.
I created the code by referring to the [tutorial](https://cloud.google.com/vision/docs/handwriting?apix_params=%7B%22alt%22%3A%22json%22%2C%22%24.xgafv%22%3A%221%22%2C%22prettyPrint%22%3Atrue%2C%22resource%22%3A%7B%7D%7D#vision-document-text-detection-python). The created code is as follows. The file name is detect.py.
```python
import os
import io

from google.cloud import vision


def detect_document(path):
    """Detect handwritten text in a local image file and print the results."""
    client = vision.ImageAnnotatorClient()

    # Read the image file as raw bytes and wrap it in a Vision API Image.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)

    # Run document text detection (handwriting-capable OCR).
    response = client.document_text_detection(image=image)

    # Walk the page -> block -> paragraph -> word -> symbol hierarchy.
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))


if __name__ == "__main__":
    path = 'sample.png'
    detect_document(os.path.abspath(path))
```
The execution command is as follows.

```
python3 detect.py
```

The recognition results were output as shown below.

```
Block confidence: 0.8999999761581421
Paragraph confidence: 0.8999999761581421
Word text:I(confidence: 0.9800000190734863)
Symbol:I(confidence: 0.9800000190734863)
Word text:of(confidence: 0.9900000095367432)
Symbol:of(confidence: 0.9900000095367432)
Word text:name(confidence: 0.9300000071525574)
Symbol:Name(confidence: 0.8600000143051147)
Symbol:Before(confidence: 1.0)
Word text:Is(confidence: 0.9900000095367432)
Symbol:Is(confidence: 0.9900000095367432)
Word text: KOTARO (confidence: 0.8299999833106995)
Symbol: K (confidence: 0.4099999964237213)
Symbol: O (confidence: 0.8299999833106995)
Symbol: T (confidence: 0.8600000143051147)
Symbol: A (confidence: 0.9900000095367432)
Symbol: R (confidence: 0.9900000095367432)
Symbol: O (confidence: 0.949999988079071)
Word text:is(confidence: 0.9399999976158142)
Symbol:so(confidence: 0.9399999976158142)
Symbol:Su(confidence: 0.949999988079071)
Word text: 。 (confidence: 0.9900000095367432)
Symbol: 。 (confidence: 0.9900000095367432)
Block confidence: 0.9200000166893005
Paragraph confidence: 0.9200000166893005
Word text:of(confidence: 0.9200000166893005)
Symbol:of(confidence: 0.9200000166893005)
Block confidence: 0.9300000071525574
Paragraph confidence: 0.9300000071525574
Word text: Python (confidence: 0.9700000286102295)
Symbol: P (confidence: 0.9800000190734863)
Symbol: y (confidence: 0.9800000190734863)
Symbol: t (confidence: 0.9100000262260437)
Symbol: h (confidence: 0.9900000095367432)
Symbol: o (confidence: 0.9900000095367432)
Symbol: n (confidence: 0.9900000095367432)
Word text:But(confidence: 0.9700000286102295)
Symbol:But(confidence: 0.9700000286102295)
Word text:Like(confidence: 0.8999999761581421)
Symbol:Good(confidence: 0.9399999976158142)
Symbol:Ki(confidence: 0.8600000143051147)
Word text:is(confidence: 0.8500000238418579)
Symbol:so(confidence: 0.7799999713897705)
Symbol:Su(confidence: 0.9300000071525574)
Word text: 。 (confidence: 0.8799999952316284)
Symbol: 。 (confidence: 0.8799999952316284)
Block confidence: 0.949999988079071
Paragraph confidence: 0.949999988079071
Word text:Everyone(confidence: 0.9900000095367432)
Symbol:Mi(confidence: 0.9900000095367432)
Symbol:Hmm(confidence: 1.0)
Symbol:Nana(confidence: 1.0)
Word text: 、 (confidence: 0.699999988079071)
Symbol: 、 (confidence: 0.699999988079071)
Word text:Follow(confidence: 0.9300000071525574)
Symbol:Fu(confidence: 0.8899999856948853)
Symbol:Oh(confidence: 0.9200000166893005)
Symbol:B(confidence: 0.9399999976158142)
Symbol:-(confidence: 1.0)
Word text:Shi(confidence: 1.0)
Symbol:Shi(confidence: 1.0)
Word text:hand(confidence: 1.0)
Symbol:hand(confidence: 1.0)
Word text:Ne(confidence: 0.9900000095367432)
Symbol:Ne(confidence: 0.9900000095367432)
Word text: 。 (confidence: 0.9900000095367432)
Symbol: 。 (confidence: 0.9900000095367432)
python3 detect.py 0.82s user 0.42s system 2% cpu 57.861 total
```
The image file was 8.7 MB and the execution time was 0.82 s. The results were considerably more accurate than the model I had trained myself. As expected of Google...
Let's take a brief look at the code inside the detect_document function.
```python
client = vision.ImageAnnotatorClient()

with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
```
In this part, authentication and loading of the image are performed. If the authentication settings are not configured properly, an error will occur on the first line.
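As a side note, the image does not have to be passed as raw bytes; the same client library also accepts an image referenced by URI, for example an object in Cloud Storage. A small sketch under that assumption (the bucket and object name are placeholders):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Reference an image stored in Cloud Storage instead of local bytes.
image = vision.types.Image(
    source=vision.types.ImageSource(image_uri='gs://your-bucket/sample.png')
)
response = client.document_text_detection(image=image)
```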
Next is the recognition part.

```python
response = client.document_text_detection(image=image)
```
This is the only line that actually performs the recognition. The result of applying the image specified in `image` to a model trained in advance by Google is returned in `response`.
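If only the recognized string is needed, the whole text is also available in a single field of the response (a small sketch using the `response` object above):

```python
# full_text_annotation.text holds the entire recognized text as one string,
# so there is no need to walk the page/block/paragraph/word hierarchy.
print(response.full_text_annotation.text)
```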
```python
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        print('\nBlock confidence: {}\n'.format(block.confidence))

        for paragraph in block.paragraphs:
            print('Paragraph confidence: {}'.format(
                paragraph.confidence))

            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                print('Word text: {} (confidence: {})'.format(
                    word_text, word.confidence))
```
The results are displayed in this part. A block is a collection of words, and the confidence of the entire block can be accessed through `block.confidence`. The sentences (paragraphs) recognized in a block are accessed through `block.paragraphs`, the words recognized in a paragraph through `paragraph.words`, and the individual characters (symbols) in a word through `word.symbols`.
If you want to do something with the recognition results, this part shows how to access each of them.
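For example, one way to post-process the results is to walk the same hierarchy and collect only the words whose confidence clears a threshold (a sketch written for illustration; the function name and threshold are arbitrary choices, not part of the API):

```python
def collect_words(response, min_confidence=0.9):
    """Collect (word, confidence) pairs whose confidence meets the threshold."""
    words = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = ''.join(symbol.text for symbol in word.symbols)
                    if word.confidence >= min_confidence:
                        words.append((text, word.confidence))
    return words


# Example: print only confidently recognized words.
for text, confidence in collect_words(response):
    print('{} ({:.2f})'.format(text, confidence))
```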
The accuracy and processing speed were impressive, as expected. I would also like to try out various other features.
Thank you for reading to the end. I am still inexperienced, so please feel free to contact me if you have any suggestions or questions about this article.