About this article

This article shows how to use GCP's Vision API from Python while skipping the difficult parts. Even if this is your first time, you should be able to get it working by following the screenshots.

Reference page

Google Vision API: https://cloud.google.com/vision/docs/ocr/?hl=ja (I was able to get this working without installing the Google Cloud SDK, perhaps because I installed google-cloud-vision with pip.)

Preliminary work

Windows 10
Install Python 3.7
Create a GCP account
Prepare the image you want to read

Start work

Library installation

First, install the Google Vision API client library. Enter the following command.

pip install --upgrade google-cloud-vision

After installation, test it. Start Python and run the following.

from google.cloud import vision

If this does not raise an error, the installation is fine. If the library is not installed correctly, an error like the following is displayed.

Python 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import vision
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'vision' from 'google.cloud' (unknown location)
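The same check can be scripted instead of typed into the interpreter. The following is only a minimal sketch: the helper name `vision_is_installed` is made up for this illustration, and it confirms only that the import works, not that credentials are set up.

```python
def vision_is_installed():
    """Return True if google-cloud-vision can be imported, else False."""
    try:
        # Same import as in the interactive test above.
        from google.cloud import vision  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if vision_is_installed():
        print("google-cloud-vision is installed.")
    else:
        print("Not installed. Run: pip install --upgrade google-cloud-vision")
```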

Creating a GCP project

For this article, we will set up a new GCP project. Click New Project and give the project a descriptive name.

API activation

After creating the project, enable the Vision API. Click the items in the order numbered in the screenshot.

Enter "google vision api" in the search window. Click the displayed item. Click the Enable button.

Creating a service account

Select the items in the order numbered in the screenshot.
Enter a suitable service account name. The ID is filled in automatically, so you can leave it as it is.
Select an appropriate role.
Leave the next screen as it is and finish.
The service account has now been added. Click the action icon for the account and select "Create Key".
Select JSON and press the Finish button.
Move the downloaded file to a location of your choice.
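The downloaded JSON key can also be handed to the client directly in code, as an alternative to the environment variable described in the next section. This is a minimal sketch that assumes the library is installed; `make_client` and its `key_path` argument are hypothetical names for this illustration, while `from_service_account_json` is a class method provided by the client.

```python
def make_client(key_path):
    """Create a Vision client from a service account key file (hypothetical helper)."""
    # Imported here so this file can be loaded even without the library.
    from google.cloud import vision

    # key_path is the JSON file downloaded when the key was created.
    return vision.ImageAnnotatorClient.from_service_account_json(key_path)
```

Passing the path explicitly can be convenient for a quick test, but the environment variable keeps the key location out of the source code.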

Setting environment variables

Set the following environment variable.
Variable: GOOGLE_APPLICATION_CREDENTIALS
Value: the location and name of the downloaded JSON file (example: C:\Users\xxxxxx\Desktop\xxxxxxx.json)

The setting procedure is as follows. Proceed in the order numbered in the screenshot.
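Before calling the API, it can help to confirm that the variable is visible to Python and points at a real key file. The following is a rough sketch, not part of the official sample: `check_credentials` is a hypothetical helper, and it relies only on the fact that a service account key is a JSON file containing `"type": "service_account"` and a `project_id` field.

```python
import json
import os


def check_credentials(env=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setting."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, "file not found: {}".format(path)
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False, "not valid JSON: {}".format(path)
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, "not a service account key: {}".format(path)
    return True, "service account key for project {}".format(key.get("project_id"))


if __name__ == "__main__":
    ok, message = check_credentials()
    print("OK:" if ok else "Problem:", message)
```

If the variable appears to be unset right after you saved it, open a new command prompt: environment variable changes on Windows apply only to processes started afterwards.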

Source code

Modify the sample source code taken from GitHub as follows.

`detext.py`


"""Detects text in the file."""
import io

from google.cloud import vision

# Path to the image to read (replace with your own file).
path = "C:\\Users\\xxxx\\Desktop\\gcptest\\xxxxx.png"

# The client reads its credentials from GOOGLE_APPLICATION_CREDENTIALS.
client = vision.ImageAnnotatorClient()

with io.open(path, 'rb') as image_file:
    content = image_file.read()

# google-cloud-vision 2.0 and later expose Image directly;
# versions before 2.0 used vision.types.Image instead.
image = vision.Image(content=content)

response = client.text_detection(image=image)

# Check for an API error before reading the results.
if response.error.message:
    raise Exception(
        '{}\nFor more info on error messages, check: '
        'https://cloud.google.com/apis/design/errors'.format(
            response.error.message))

texts = response.text_annotations
print('Texts:')

for text in texts:
    print('\n"{}"'.format(text.description))

    # Format the four corners of the bounding box as "(x,y)" strings.
    vertices = (['({},{})'.format(vertex.x, vertex.y)
                for vertex in text.bounding_poly.vertices])

    print('bounds: {}'.format(','.join(vertices)))

To read a different image, change the `path = "C:\\Users\\xxxx\\Desktop\\gcptest\\xxxxx.png"` part on the 7th line of the source code. The following image is used as an example.
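Two details of that path line are easy to get wrong: backslashes must be doubled (or the string written as a raw string), and a space accidentally left inside the quotes, for example after `.png`, becomes part of the file name. The following is a small sketch of both points; `image_path_from_args` is a hypothetical helper that also lets you pass the image path on the command line instead of editing the script.

```python
import sys

# Two equivalent ways to write a Windows path in Python source.
escaped = "C:\\Users\\xxxx\\Desktop\\gcptest\\xxxxx.png"
raw = r"C:\Users\xxxx\Desktop\gcptest\xxxxx.png"


def image_path_from_args(argv, default=raw):
    """Return the first command-line argument, or the default path."""
    path = argv[1] if len(argv) > 1 else default
    # Stray spaces from copy and paste would make open() fail, so trim them.
    return path.strip()


if __name__ == "__main__":
    print(image_path_from_args(sys.argv))
```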

Output result

"Do your best Japan Meiji chocolate snack Mountain of mushrooms Fragrant strawberry flavor only TOKTO 0 20 ES Child) ©Tokyo 2020 "
bounds: (1,66),(745,66),(745,954),(1,954)

"Kanbare"
bounds: (72,80),(351,73),(353,152),(74,159)

"Nippon"
bounds: (353,73),(619,66),(621,145),(355,152)

"Meiji"
bounds: (208,151),(305,150),(305,190),(208,191)

"chocolate"
bounds: (307,151),(405,150),(405,189),(307,190)

"snack"
bounds: (394,156),(535,155),(535,184),(394,185)

"mushroom"
bounds: (33,167),(471,157),(475,354),(37,364)

"of"
bounds: (473,157),(569,155),(573,352),(477,354)

"Mountain"
bounds: (571,155),(735,151),(739,348),(575,352)

"Kaoru"
bounds: (131,403),(314,390),(322,502),(139,516)

"Strawberry"
bounds: (315,390),(596,370),(604,482),(323,503)

"taste"
bounds: (598,370),(671,365),(679,476),(606,482)

"only"
bounds: (637,551),(736,527),(745,564),(646,588)

"TOKTO"
bounds: (236,697),(260,690),(263,701),(239,708)

"0"
bounds: (262,691),(277,687),(280,696),(265,701)

"20"
bounds: (1,804),(34,804),(34,829),(1,829)

"ES"
bounds: (1,828),(16,828),(16,844),(1,844)

"Child"
bounds: (1,910),(18,910),(18,932),(1,932)

")"
bounds: (19,912),(24,912),(24,927),(19,927)

"©Tokyo"
bounds: (72,924),(150,921),(151,951),(73,954)

"2020"
bounds: (157,921),(210,919),(211,949),(158,951)

As expected, a mountain of mushrooms
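In the output above, the first entry holds the entire detected text and the remaining entries are the individual words, each with four corner points. The following is a minimal sketch of how that structure might be handled in code. The helpers `bounding_rect`, `split_annotations`, and `fake_annotation` are hypothetical names, and the stand-in objects only mimic the `description` and `bounding_poly.vertices` attributes used by the script so that the example runs without calling the API.

```python
from types import SimpleNamespace


def bounding_rect(vertices):
    """Return (left, top, right, bottom) enclosing the given vertices."""
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def split_annotations(texts):
    """Split annotations into (full_text, words); the first entry is the whole text."""
    if not texts:
        return "", []
    return texts[0].description, [t.description for t in texts[1:]]


def fake_annotation(description, points):
    """Stand-in for an API annotation (hypothetical, for this demo only)."""
    vertices = [SimpleNamespace(x=x, y=y) for x, y in points]
    return SimpleNamespace(description=description,
                           bounding_poly=SimpleNamespace(vertices=vertices))


# Values copied from the output above (only two of the word entries).
texts = [
    fake_annotation(
        "Do your best Japan Meiji chocolate snack Mountain of mushrooms "
        "Fragrant strawberry flavor only TOKTO 0 20 ES Child) ©Tokyo 2020 ",
        [(1, 66), (745, 66), (745, 954), (1, 954)]),
    fake_annotation("Kanbare", [(72, 80), (351, 73), (353, 152), (74, 159)]),
    fake_annotation("Meiji", [(208, 151), (305, 150), (305, 190), (208, 191)]),
]

full_text, words = split_annotations(texts)
print(words)
for t in texts[1:]:
    print(t.description, bounding_rect(t.bounding_poly.vertices))
```

With a real response, `response.text_annotations` would take the place of the hand-built `texts` list.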

Let's touch Google's Vision API from Python for the time being