This article is the Day 12 entry of the Fujitsu Systems Web Technology Advent Calendar. (The usual disclaimer) The content of this article is my own opinion and does not represent the organization to which I belong.

Introduction

This article summarizes the minimum steps needed to use Google's image recognition API, the Cloud Vision API. At the end, I try it out on some food photos.

Digression

My personal hobby is Meshitero [^ 1]: I post food photos on my LINE timeline to make people think "I'm hungry". I wondered whether an image recognition API could help me take more delicious-looking photos and raise the quality of my Meshitero, so this time I started by trying the Google Cloud Vision API.

What I used this time

Google Cloud Platform: the service required to use the Cloud Vision API
Anaconda: a package containing Python itself and commonly used libraries

Preparation for API use

Registration for Google Cloud Platform

To use the Cloud Vision API, first sign up for Google Cloud Platform.

Click "Start for free" on the following site to begin the registration procedure. You can sign up with your Google account.

Cloud Computing Service | Google Cloud

A credit card must be registered even if you only use the free tier.

Creating a project

When registration is complete, you will see a screen like the one below, and a default project will already have been created. You can create a new project from the area marked with the red frame. The API can also be used in the default project, but here I created a "Meshitero" project.

[Screenshot: console top]

Enable Cloud Vision API

If you enter "Vision API" in the search form, the Cloud Vision API appears in the results, so click it. On the next screen, select "Enable".

[Screenshot: console top, search]

[Screenshot: Vision API top]

As an aside, selecting "Try this API" lets you try a demo right on the screen.

When enabled, you will see a screen like this.

Issuance of API key

Since the API will be called from Python this time, create credentials. Click "Create Credentials" on the Cloud Vision API screen to move to the credentials page under "APIs and Services". Selecting "API Key" from the "Create Credentials" pull-down issues an API key.

[Screenshot: API key]

Restrict the key to prevent unauthorized use. Clicking "Restrict Keys" on the screen above takes you to a screen where you can set restrictions. Set them as needed. This time, I restricted usage by IP address and, for now, limited the usable APIs to the Cloud Vision API.

[Screenshot: restrict key]

This completes the settings for using the API.

API call

Installation of Anaconda

To call the API from Python, I installed Anaconda by following the guide below.

Installation of Anaconda (Windows version)

Creating the source

Write the source code by referring to the API reference.

Creation of request data to be passed to API

The image data must be passed as a base64-encoded string. In "features", specify the type of image analysis and the maximum number of results to return. This time I specify label detection (LABEL_DETECTION).

Other types of image analysis include face detection, logo detection, and landmark detection.

The request body sent to the API must be in JSON format, so the request is also serialized to JSON.

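from base64 import b64encode
import json

img_request = []
# Read the image as bytes and base64-encode it into a string
with open(filename, 'rb') as f:
    ctxt = b64encode(f.read()).decode()
    img_request.append({
        'image': {'content': ctxt},
        'features': [{
            'type': 'LABEL_DETECTION',  # type of image analysis
            'maxResults': 10            # maximum number of results returned
        }]
    })
# Serialize the request to JSON and encode it as bytes
request_data = json.dumps({"requests": img_request}).encode()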
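Since "features" is a list, several analysis types can be requested for the same image in one call. The following is a minimal sketch of that idea, not the code used in this article: the helper name `build_request_data` is my own, and the placeholder bytes stand in for a real image file.

```python
import json
from base64 import b64encode


def build_request_data(image_bytes, feature_types, max_results=10):
    """Build the JSON request body for images:annotate from raw image bytes."""
    # The API expects the image content as a base64-encoded string.
    content = b64encode(image_bytes).decode("ascii")
    # One entry per analysis type; maxResults caps the results of each type.
    features = [{"type": t, "maxResults": max_results} for t in feature_types]
    body = {"requests": [{"image": {"content": content}, "features": features}]}
    return json.dumps(body).encode("utf-8")


# Usage with placeholder bytes instead of a real image file.
request_data = build_request_data(
    b"not a real image", ["LABEL_DETECTION", "FACE_DETECTION"]
)
decoded = json.loads(request_data)
print([f["type"] for f in decoded["requests"][0]["features"]])
# prints ['LABEL_DETECTION', 'FACE_DETECTION']
```

Keeping the construction in one function makes it easy to switch analysis types later without touching the code that sends the request.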

API call part

Send the API request, specifying the request data built in the previous step and the API key.

import requests

API_URL = 'https://vision.googleapis.com/v1/images:annotate'

# The API key is sent as the "key" query parameter
response = requests.post(API_URL,
                         data=request_data,
                         params={'key': api_key},
                         headers={'Content-Type': 'application/json'})
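To make it clearer what this call sends over the wire, here is a rough sketch that builds the same POST request with only the standard library (`urllib`) instead of `requests`. This is a swapped-in technique for illustration; the function name `build_http_request` and the dummy key are my own, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_http_request(request_data, api_key):
    """Build (but do not send) the POST request for images:annotate."""
    # The API key travels as the "key" query parameter,
    # which is what params= does in requests.post.
    url = API_URL + "?" + urlencode({"key": api_key})
    return Request(
        url,
        data=request_data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Dummy body and key, for illustration only.
req = build_http_request(b'{"requests": []}', "DUMMY_KEY")
print(req.get_method(), req.full_url)
# prints POST https://vision.googleapis.com/v1/images:annotate?key=DUMMY_KEY
```

Because the key ends up in the URL, it is probably wise not to log full request URLs. Actually sending the request would be a matter of passing `req` to `urllib.request.urlopen` with a timeout.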

Result output

Print the data returned from the API.

# Print the annotation result of each image as indented JSON
for resp in response.json()['responses']:
    print(json.dumps(resp, indent=2))
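Dumping the whole JSON is fine for a first look, but in practice only the label names and scores matter. The sketch below pulls those out and keeps labels at or above a threshold. The function name `labels_above` and the 0.9 threshold are my own choices, and the sample is a shortened copy of the oyster result shown later in this article.

```python
def labels_above(response_json, min_score=0.9):
    """Return (description, score) pairs whose score is at least min_score."""
    results = []
    for resp in response_json.get("responses", []):
        for label in resp.get("labelAnnotations", []):
            if label["score"] >= min_score:
                results.append((label["description"], label["score"]))
    return results


# Shortened copy of the oyster result from this article (three labels only).
sample = {"responses": [{"labelAnnotations": [
    {"mid": "/m/0_cp5", "description": "Oyster",
     "score": 0.9910632, "topicality": 0.9910632},
    {"mid": "/m/02wbm", "description": "Food",
     "score": 0.9903261, "topicality": 0.9903261},
    {"mid": "/m/0fbdv", "description": "Shellfish",
     "score": 0.6510715, "topicality": 0.6510715},
]}]}

for description, score in labels_above(sample):
    print(f"{description}: {score:.2f}")
# prints "Oyster: 0.99" and "Food: 0.99"
```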

Run

Run the script, passing the API key issued by GCP and the image path as arguments. I tried it on several food images.

$ python Meshitero.py [API key] [Image path]
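Passing the API key on the command line leaves it in the shell history, so one possible variation is to read it from an environment variable when one is set. The following is only a sketch: the variable name `VISION_API_KEY` and the function `parse_args` are hypothetical and not something Google defines.

```python
import os

ENV_NAME = "VISION_API_KEY"  # hypothetical variable name, not defined by Google


def parse_args(argv, environ=None):
    """Return (api_key, image_path) from the environment and command line."""
    environ = os.environ if environ is None else environ
    args = argv[1:]
    key = environ.get(ENV_NAME)
    if key and len(args) == 1:
        # Key comes from the environment, so only the image path is given.
        return key, args[0]
    if len(args) == 2:
        # Same form as in this article: API key first, then the image path.
        return args[0], args[1]
    raise SystemExit(
        f"usage: python Meshitero.py [API key] [Image path] (or set {ENV_NAME})"
    )


# Both call styles, using dummy values.
print(parse_args(["Meshitero.py", "oyster.jpg"], {ENV_NAME: "DUMMY_KEY"}))
print(parse_args(["Meshitero.py", "DUMMY_KEY", "sushi.jpg"], {}))
```

The two-argument form keeps the original usage working, so the environment variable is purely optional.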

Whole source (Meshitero.py)


from base64 import b64encode
from sys import argv
import json
import requests

API_URL = 'https://vision.googleapis.com/v1/images:annotate'

if __name__ == '__main__':
    # Command-line arguments: API key, then image path
    api_key = argv[1]
    filename = argv[2]

    # Build the request: base64-encoded image plus the analysis type
    img_request = []
    with open(filename, 'rb') as f:
        ctxt = b64encode(f.read()).decode()
        img_request.append({
            'image': {'content': ctxt},
            'features': [{
                'type': 'LABEL_DETECTION',
                'maxResults': 10
            }]
        })

    request_data = json.dumps({"requests": img_request}).encode()

    # Send the request; the API key goes in the "key" query parameter
    response = requests.post(API_URL,
                             data=request_data,
                             params={'key': api_key},
                             headers={'Content-Type': 'application/json'})

    # On failure, print the raw error; otherwise print each result as JSON
    if response.status_code != 200 or response.json().get('error'):
        print(response.text)
    else:
        for resp in response.json()['responses']:
            print(json.dumps(resp, indent=2))
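The script checks the HTTP status and a top-level "error" field. As far as I understand the API reference, each element of "responses" can also carry its own "error" object even when the HTTP status is 200, for example when one image in a batch cannot be processed. The sketch below gathers both kinds of error; the function name `collect_errors` and the sample payloads are hand-written for illustration, not real API output.

```python
def collect_errors(status_code, payload):
    """Return a list of human-readable error messages found in an API reply."""
    errors = []
    # A failed call usually comes with a top-level "error" object.
    top = payload.get("error")
    if status_code != 200 or top:
        message = (top or {}).get("message", "no message")
        errors.append(f"HTTP {status_code}: {message}")
    # Even with HTTP 200, each per-image response may carry its own "error".
    for index, resp in enumerate(payload.get("responses", [])):
        if "error" in resp:
            message = resp["error"].get("message", "no message")
            errors.append(f"image {index}: {message}")
    return errors


# Hand-written sample payloads (not real API output).
ok_payload = {"responses": [{"labelAnnotations": []}]}
denied_payload = {"error": {"code": 403, "message": "denied"}}
partial_payload = {"responses": [{}, {"error": {"code": 3, "message": "Bad image data."}}]}

print(collect_errors(200, ok_payload))
print(collect_errors(403, denied_payload))
print(collect_errors(200, partial_payload))
```

In the script, this would be called as `collect_errors(response.status_code, response.json())` before printing the results.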

Execution result (oyster)

  {
    "labelAnnotations": [
      {
        "mid": "/m/0_cp5",
        "description": "Oyster",
        "score": 0.9910632,
        "topicality": 0.9910632
      },
      {
        "mid": "/m/02wbm",
        "description": "Food",
        "score": 0.9903261,
        "topicality": 0.9903261
      },
      {
        "mid": "/m/06nwz",
        "description": "Seafood",
        "score": 0.9609892,
        "topicality": 0.9609892
      },
      {
        "mid": "/m/01cqy9",
        "description": "Bivalve",
        "score": 0.9138548,
        "topicality": 0.9138548
      },
      {
        "mid": "/m/02q08p0",
        "description": "Dish",
        "score": 0.8472096,
        "topicality": 0.8472096
      },
      {
        "mid": "/m/01ykh",
        "description": "Cuisine",
        "score": 0.811229,
        "topicality": 0.811229
      },
      {
        "mid": "/m/07xgrh",
        "description": "Ingredient",
        "score": 0.8011539,
        "topicality": 0.8011539
      },
      {
        "mid": "/m/088kg2",
        "description": "Oysters rockefeller",
        "score": 0.70525026,
        "topicality": 0.70525026
      },
      {
        "mid": "/m/0fbdv",
        "description": "Shellfish",
        "score": 0.6510715,
        "topicality": 0.6510715
      },
      {
        "mid": "/m/0ffhy",
        "description": "Clam",
        "score": 0.6364975,
        "topicality": 0.6364975
      }
    ]
  }

Execution result (sushi)

  {
    "labelAnnotations": [
      {
        "mid": "/m/02q08p0",
        "description": "Dish",
        "score": 0.9934035,
        "topicality": 0.9934035
      },
      {
        "mid": "/m/01ykh",
        "description": "Cuisine",
        "score": 0.9864208,
        "topicality": 0.9864208
      },
      {
        "mid": "/m/02wbm",
        "description": "Food",
        "score": 0.97343695,
        "topicality": 0.97343695
      },
      {
        "mid": "/m/048wsd",
        "description": "Gimbap",
        "score": 0.96859926,
        "topicality": 0.96859926
      },
      {
        "mid": "/m/07030",
        "description": "Sushi",
        "score": 0.9650486,
        "topicality": 0.9650486
      },
      {
        "mid": "/m/0cjyd",
        "description": "Sashimi",
        "score": 0.9185767,
        "topicality": 0.9185767
      },
      {
        "mid": "/m/04q6ng",
        "description": "Comfort food",
        "score": 0.8544887,
        "topicality": 0.8544887
      },
      {
        "mid": "/m/07xgrh",
        "description": "Ingredient",
        "score": 0.8450334,
        "topicality": 0.8450334
      },
      {
        "mid": "/m/05jrv",
        "description": "Nori",
        "score": 0.8431285,
        "topicality": 0.8431285
      },
      {
        "mid": "/m/027lnr6",
        "description": "Sakana",
        "score": 0.8388547,
        "topicality": 0.8388547
      }
    ]
  }

Execution result (hamburger)

  {
    "labelAnnotations": [
      {
        "mid": "/m/02q08p0",
        "description": "Dish",
        "score": 0.9934035,
        "topicality": 0.9934035
      },
      {
        "mid": "/m/02wbm",
        "description": "Food",
        "score": 0.9903261,
        "topicality": 0.9903261
      },
      {
        "mid": "/m/01ykh",
        "description": "Cuisine",
        "score": 0.9864208,
        "topicality": 0.9864208
      },
      {
        "mid": "/m/0h55b",
        "description": "Junk food",
        "score": 0.9851551,
        "topicality": 0.9851551
      },
      {
        "mid": "/m/01_bhs",
        "description": "Fast food",
        "score": 0.97022384,
        "topicality": 0.97022384
      },
      {
        "mid": "/m/0cdn1",
        "description": "Hamburger",
        "score": 0.9571771,
        "topicality": 0.9571771
      },
      {
        "mid": "/m/0cc7bks",
        "description": "Buffalo burger",
        "score": 0.94575346,
        "topicality": 0.94575346
      },
      {
        "mid": "/m/03f476",
        "description": "Veggie burger",
        "score": 0.9283731,
        "topicality": 0.9283731
      },
      {
        "mid": "/m/0bp3f6m",
        "description": "Fried food",
        "score": 0.9257971,
        "topicality": 0.9257971
      },
      {
        "mid": "/m/02y6n",
        "description": "French fries",
        "score": 0.92217153,
        "topicality": 0.92217153
      }
    ]
  }

Execution result (fried shrimp)

  {
    "labelAnnotations": [
      {
        "mid": "/m/02q08p0",
        "description": "Dish",
        "score": 0.9934035,
        "topicality": 0.9934035
      },
      {
        "mid": "/m/02wbm",
        "description": "Food",
        "score": 0.9903261,
        "topicality": 0.9903261
      },
      {
        "mid": "/m/01ykh",
        "description": "Cuisine",
        "score": 0.9864208,
        "topicality": 0.9864208
      },
      {
        "mid": "/m/0g9vs81",
        "description": "Steamed rice",
        "score": 0.9271187,
        "topicality": 0.9271187
      },
      {
        "mid": "/m/07xgrh",
        "description": "Ingredient",
        "score": 0.9207317,
        "topicality": 0.9207317
      },
      {
        "mid": "/m/0bp3f6m",
        "description": "Fried food",
        "score": 0.9098738,
        "topicality": 0.9098738
      },
      {
        "mid": "/m/0dxjn",
        "description": "Deep frying",
        "score": 0.9049985,
        "topicality": 0.9049985
      },
      {
        "mid": "/m/0f99t",
        "description": "Tonkatsu",
        "score": 0.901048,
        "topicality": 0.901048
      },
      {
        "mid": "/m/0krfg",
        "description": "Meal",
        "score": 0.81980187,
        "topicality": 0.81980187
      },
      {
        "mid": "/m/04q6ng",
        "description": "Comfort food",
        "score": 0.8160322,
        "topicality": 0.8160322
      }
    ]
  }

Results of judging food photos with Cloud Vision API

For the oyster, sushi, and hamburger photos, the API identified not only that the subject is food but also the specific type of dish. For the fried shrimp, it could tell that the photo shows fried food, but it apparently could not identify it as "fried shrimp". I also tried photos not shown in this article, and in most cases the API seemed able to identify the type of dish. A broad genre such as fried food is easy to recognize, but identifying the specific dish seems difficult for photos taken under conditions like the fried shrimp one, where the shrimp itself is hard to make out in the image.
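One rough way to see this in the data is to drop the labels that several photos share (Food, Dish, and so on) and look at what remains for each photo. The sketch below does that with the first eight label descriptions of three results from this article, listed in score order. The function name `specific_labels` and the "appears in only one photo" rule are my own simplification, not a feature of the API.

```python
from collections import Counter

# First eight label descriptions per photo, copied from the results
# in this article in descending score order.
RESULTS = {
    "oyster": ["Oyster", "Food", "Seafood", "Bivalve", "Dish",
               "Cuisine", "Ingredient", "Oysters rockefeller"],
    "sushi": ["Dish", "Cuisine", "Food", "Gimbap", "Sushi",
              "Sashimi", "Comfort food", "Ingredient"],
    "fried shrimp": ["Dish", "Food", "Cuisine", "Steamed rice", "Ingredient",
                     "Fried food", "Deep frying", "Tonkatsu"],
}


def specific_labels(results):
    """For each photo, keep only the labels that no other photo shares."""
    counts = Counter(desc for labels in results.values() for desc in labels)
    return {
        photo: [desc for desc in labels if counts[desc] == 1]
        for photo, labels in results.items()
    }


for photo, labels in specific_labels(RESULTS).items():
    print(photo, "->", labels)
# oyster -> ['Oyster', 'Seafood', 'Bivalve', 'Oysters rockefeller']
# sushi -> ['Gimbap', 'Sushi', 'Sashimi', 'Comfort food']
# fried shrimp -> ['Steamed rice', 'Fried food', 'Deep frying', 'Tonkatsu']
```

The oyster and sushi photos keep labels that name the dish, while nothing related to shrimp remains for the fried shrimp photo.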

Finally

I tried label detection this time, but the Cloud Vision API can apparently also detect the color tones of an image. If I can grasp the color tendencies of food photos, I may be able to understand what kinds of colors make food look delicious. Image recognition APIs are also offered by providers other than Google, so I would like to try those as well in the future.
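As a starting point for the color idea, my understanding of the API reference is that requesting the feature type IMAGE_PROPERTIES returns dominant colors under "imagePropertiesAnnotation". The sketch below converts such a result into hex color strings. I have not run it against the real API: the function name `dominant_colors` is my own, and the sample response is hand-written to match the documented shape, not real output.

```python
def dominant_colors(response_json, top=3):
    """Return the top colors of the first image as (hex_string, pixel_fraction)."""
    annotation = response_json["responses"][0].get("imagePropertiesAnnotation", {})
    colors = annotation.get("dominantColors", {}).get("colors", [])
    # Rank by the fraction of pixels each color occupies.
    ranked = sorted(colors, key=lambda c: c.get("pixelFraction", 0), reverse=True)
    result = []
    for entry in ranked[:top]:
        rgb = entry.get("color", {})
        # A channel missing from the JSON is treated as 0.
        hex_string = "#{:02x}{:02x}{:02x}".format(
            int(rgb.get("red", 0)), int(rgb.get("green", 0)), int(rgb.get("blue", 0))
        )
        result.append((hex_string, entry.get("pixelFraction", 0)))
    return result


# Hand-written sample in the documented shape (not real API output).
sample = {"responses": [{"imagePropertiesAnnotation": {"dominantColors": {"colors": [
    {"color": {"red": 255, "green": 255, "blue": 255}, "score": 0.2, "pixelFraction": 0.3},
    {"color": {"red": 200, "green": 120, "blue": 40}, "score": 0.6, "pixelFraction": 0.5},
    {"color": {"green": 128}, "score": 0.1, "pixelFraction": 0.2},
]}}}]}

print(dominant_colors(sample))
# prints [('#c87828', 0.5), ('#ffffff', 0.3), ('#008000', 0.2)]
```

To request this analysis, the "features" entry of the request would use 'type': 'IMAGE_PROPERTIES' in place of 'LABEL_DETECTION'.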

[^ 1]: The act of making viewers hungry by posting photos of delicious-looking food late at night or at similar times.

Trying to identify food photos using the Google Cloud Vision API