This article summarizes how to recognize handwritten characters using the optical character recognition (OCR) feature available on GCP (Google Cloud Platform). It is aimed at GCP beginners and anyone who wants to try GCP in the future.
The goal is to recognize the handwritten characters in an image using GCP's OCR feature.
- macOS Catalina 10.15.6
- Python 3.8.1
- Before you start
- Prepare the input data
- Implementation
- Execution
You need to create a Google account to use GCP services. If you do not have a Google account yet, please refer to here to create one.
After creating a Google account, open the GCP Console and refer to [here](https://cloud.google.com/vision/docs/before-you-begin) to set up a Cloud project and authentication credentials.
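As a minimal sketch of one way to point the client library at the credentials (assuming a service-account key file downloaded from the console and the google-cloud-vision package installed with pip; the key path is a placeholder):

```python
import os

from google.cloud import vision

# The client library looks up the GOOGLE_APPLICATION_CREDENTIALS environment
# variable to find a service-account key file. The path below is a placeholder.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

# If no usable credentials can be found, constructing the client raises
# google.auth.exceptions.DefaultCredentialsError.
client = vision.ImageAnnotatorClient()
print("Client created - authentication settings were found")
```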
Before starting the implementation, first prepare the handwritten image you want to recognize. I prepared an image like the following.
I created the code by referring to the [tutorial](https://cloud.google.com/vision/docs/handwriting?apix_params=%7B%22alt%22%3A%22json%22%2C%22%24.xgafv%22%3A%221%22%2C%22prettyPrint%22%3Atrue%2C%22resource%22%3A%7B%7D%7D#vision-document-text-detection-python). The created code is as follows. The file name is detect.py.
```python
import os
import io

from google.cloud import vision


def detect_document(path):
    """Detect handwritten text in a local image file and print the results."""
    client = vision.ImageAnnotatorClient()

    # Read the image file as raw bytes and wrap it in a Vision API Image.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)

    # Run document text detection (handwriting-capable OCR).
    response = client.document_text_detection(image=image)

    # Walk the page -> block -> paragraph -> word -> symbol hierarchy.
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))


if __name__ == "__main__":
    path = 'sample.png'
    detect_document(os.path.abspath(path))
```
The execution command is as follows.

```
python3 detect.py
```

The recognition results were output as shown below.

```
Block confidence: 0.8999999761581421
Paragraph confidence: 0.8999999761581421
Word text:I(confidence: 0.9800000190734863)
Symbol:I(confidence: 0.9800000190734863)
Word text:of(confidence: 0.9900000095367432)
Symbol:of(confidence: 0.9900000095367432)
Word text:name(confidence: 0.9300000071525574)
Symbol:Name(confidence: 0.8600000143051147)
Symbol:Before(confidence: 1.0)
Word text:Is(confidence: 0.9900000095367432)
Symbol:Is(confidence: 0.9900000095367432)
Word text: KOTARO (confidence: 0.8299999833106995)
Symbol: K (confidence: 0.4099999964237213)
Symbol: O (confidence: 0.8299999833106995)
Symbol: T (confidence: 0.8600000143051147)
Symbol: A (confidence: 0.9900000095367432)
Symbol: R (confidence: 0.9900000095367432)
Symbol: O (confidence: 0.949999988079071)
Word text:is(confidence: 0.9399999976158142)
Symbol:so(confidence: 0.9399999976158142)
Symbol:Su(confidence: 0.949999988079071)
Word text: 。 (confidence: 0.9900000095367432)
Symbol: 。 (confidence: 0.9900000095367432)
Block confidence: 0.9200000166893005
Paragraph confidence: 0.9200000166893005
Word text:of(confidence: 0.9200000166893005)
Symbol:of(confidence: 0.9200000166893005)
Block confidence: 0.9300000071525574
Paragraph confidence: 0.9300000071525574
Word text: Python (confidence: 0.9700000286102295)
Symbol: P (confidence: 0.9800000190734863)
Symbol: y (confidence: 0.9800000190734863)
Symbol: t (confidence: 0.9100000262260437)
Symbol: h (confidence: 0.9900000095367432)
Symbol: o (confidence: 0.9900000095367432)
Symbol: n (confidence: 0.9900000095367432)
Word text:But(confidence: 0.9700000286102295)
Symbol:But(confidence: 0.9700000286102295)
Word text:Like(confidence: 0.8999999761581421)
Symbol:Good(confidence: 0.9399999976158142)
Symbol:Ki(confidence: 0.8600000143051147)
Word text:is(confidence: 0.8500000238418579)
Symbol:so(confidence: 0.7799999713897705)
Symbol:Su(confidence: 0.9300000071525574)
Word text: 。 (confidence: 0.8799999952316284)
Symbol: 。 (confidence: 0.8799999952316284)
Block confidence: 0.949999988079071
Paragraph confidence: 0.949999988079071
Word text:Everyone(confidence: 0.9900000095367432)
Symbol:Mi(confidence: 0.9900000095367432)
Symbol:Hmm(confidence: 1.0)
Symbol:Nana(confidence: 1.0)
Word text: 、 (confidence: 0.699999988079071)
Symbol: 、 (confidence: 0.699999988079071)
Word text:Follow(confidence: 0.9300000071525574)
Symbol:Fu(confidence: 0.8899999856948853)
Symbol:Oh(confidence: 0.9200000166893005)
Symbol:B(confidence: 0.9399999976158142)
Symbol:-(confidence: 1.0)
Word text:Shi(confidence: 1.0)
Symbol:Shi(confidence: 1.0)
Word text:hand(confidence: 1.0)
Symbol:hand(confidence: 1.0)
Word text:Ne(confidence: 0.9900000095367432)
Symbol:Ne(confidence: 0.9900000095367432)
Word text: 。 (confidence: 0.9900000095367432)
Symbol: 。 (confidence: 0.9900000095367432)
python3 detect.py 0.82s user 0.42s system 2% cpu 57.861 total
```
The image file was 8.7 MB and the execution time was 0.82 s. The results were considerably more accurate than the model I had trained myself. As expected of Google...
Let's take a brief look at the code inside the detect_document function.
```python
client = vision.ImageAnnotatorClient()

with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
```
In this part, authentication and loading of the image are performed. If the authentication settings are not configured properly, an error will occur on the first line.
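As a side note, the image does not have to be passed as raw bytes; the same client library also accepts an image referenced by URI, for example an object in Cloud Storage. A small sketch under that assumption (the bucket and object name are placeholders):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Reference an image stored in Cloud Storage instead of local bytes.
image = vision.types.Image(
    source=vision.types.ImageSource(image_uri='gs://your-bucket/sample.png')
)
response = client.document_text_detection(image=image)
```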
Next is the recognition part.

```python
response = client.document_text_detection(image=image)
```
This is the only line that actually performs the recognition. The result of applying the image specified in `image` to a model trained in advance by Google is returned in `response`.
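If only the recognized string is needed, the whole text is also available in a single field of the response (a small sketch using the `response` object above):

```python
# full_text_annotation.text holds the entire recognized text as one string,
# so there is no need to walk the page/block/paragraph/word hierarchy.
print(response.full_text_annotation.text)
```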
```python
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        print('\nBlock confidence: {}\n'.format(block.confidence))

        for paragraph in block.paragraphs:
            print('Paragraph confidence: {}'.format(
                paragraph.confidence))

            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                print('Word text: {} (confidence: {})'.format(
                    word_text, word.confidence))
```
The results are displayed in this part. A block is a collection of words, and the confidence of the entire block can be accessed through `block.confidence`. The sentences (paragraphs) recognized in a block are accessed through `block.paragraphs`, the words recognized in a paragraph through `paragraph.words`, and the individual characters (symbols) in a word through `word.symbols`.
If you want to do something with the recognition results, this part shows how to access each of them.
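For example, one way to post-process the results is to walk the same hierarchy and collect only the words whose confidence clears a threshold (a sketch written for illustration; the function name and threshold are arbitrary choices, not part of the API):

```python
def collect_words(response, min_confidence=0.9):
    """Collect (word, confidence) pairs whose confidence meets the threshold."""
    words = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = ''.join(symbol.text for symbol in word.symbols)
                    if word.confidence >= min_confidence:
                        words.append((text, word.confidence))
    return words


# Example: print only confidently recognized words.
for text, confidence in collect_words(response):
    print('{} ({:.2f})'.format(text, confidence))
```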
The accuracy and processing speed were impressive, as expected. I would also like to try out various other features.
Thank you for reading to the end. I am still inexperienced, so please feel free to contact me if you have any suggestions or questions about this article.