This is the Day 12 article of the Fujitsu Systems Web Technology Advent Calendar. (The usual disclaimer: the content of this article is my own opinion and does not represent the organization to which I belong.)
This article summarizes the minimum steps needed to use Google's image recognition API, the Cloud Vision API. At the end, I try it out on some food photos.
My personal hobby is Meshitero [^1], that is, posting food photos on my LINE timeline to make the people who see them think "I'm hungry". I wondered whether an image recognition API could help me take more delicious-looking pictures and improve the quality of my Meshitero, so as a first step I tried the Google Cloud Vision API.
First, sign up for Google Cloud Platform, which is required to use the Cloud Vision API.
Click "Start for free" on the following site to begin the registration procedure. You can sign up with your Google account.
Cloud Computing Service | Google Cloud
When registration is complete, you will see a screen like this, and a default project will already have been created. You can create a new project from the area in the red frame. The API can also be used in the default project, but here I created a "Meshitero" project.
If you enter "Vision API" in the search form, the Cloud Vision API will appear in the results, so click it. On the next screen, select "Enable".
Digression: if you select "Try this API", you can try a demo right on the screen.
When enabled, you will see a screen like this.
Since we will call the API from Python this time, create credentials. Click "Create Credentials" on the Cloud Vision API screen to move to the "APIs and Services" credentials page. If you select "API Key" from the "Create Credentials" pull-down, an API key will be issued.
Restrict the key to prevent unauthorized use. Clicking "Restrict Keys" on the above screen takes you to a screen where you can restrict the key. Set restrictions as needed. This time, I restricted usage by IP address and also restricted the APIs that the key can call to the Cloud Vision API only.
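Because the key gives access to a billable API, it may also be worth keeping it out of source files. The following is only a minimal sketch of one way to do that: the environment variable name `VISION_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something GCP defines.

```python
import os


def load_api_key(env_var="VISION_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The script later in this article takes the key as a command-line argument instead; either approach works with the same request code.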
This completes the settings for using the API.
To call the API from Python, I installed Anaconda by referring to the following.
Installation of Anaconda (Windows version)
Create the source code by referring to the API reference.
The image data apparently needs to be passed as a base64-encoded string. In addition, specify the type of image analysis and the maximum number of results to return in "features". This time, specify label detection (LABEL_DETECTION).
The request is passed to the API in JSON format, so the request body is also encoded as JSON.
img_request = []
with open(filename, 'rb') as f:
    # The image content must be sent as a base64-encoded string
    ctxt = b64encode(f.read()).decode()
    img_request.append({
        'image': {'content': ctxt},
        'features': [{
            'type': 'LABEL_DETECTION',
            'maxResults': 10
        }]
    })
request_data = json.dumps({"requests": img_request}).encode()
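The same request body could be wrapped in a small function so that several images go into one call. This is only a sketch: `build_annotate_request` is a hypothetical helper name, and it takes raw image bytes rather than file names so that it can be tried without any files on disk.

```python
import json
from base64 import b64encode


def build_annotate_request(images, feature_type="LABEL_DETECTION", max_results=10):
    """Build the JSON body for images:annotate from a list of raw image bytes."""
    img_requests = []
    for data in images:
        img_requests.append({
            "image": {"content": b64encode(data).decode()},
            "features": [{"type": feature_type, "maxResults": max_results}],
        })
    return json.dumps({"requests": img_requests}).encode()


# Example with placeholder bytes instead of a real photo.
body = build_annotate_request([b"fake-image-bytes"])
print(json.loads(body)["requests"][0]["features"])
```

The returned bytes can be passed as the request body in the same way as `request_data`.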
Specify the request data and the API key issued in the previous step, and send the API request.
API_URL = 'https://vision.googleapis.com/v1/images:annotate'
response = requests.post(API_URL,
                         data=request_data,
                         params={'key': api_key},
                         headers={'Content-Type': 'application/json'})
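Even when the HTTP status is 200, an individual entry in `responses` may carry an `error` object (for example, when the image data cannot be decoded). Below is a hedged sketch of separating the two cases; `split_responses` is a hypothetical helper, and the sample payload is made up for illustration rather than captured from the API.

```python
def split_responses(payload):
    """Split a parsed images:annotate payload into (ok, failed) lists."""
    ok, failed = [], []
    for index, resp in enumerate(payload.get("responses", [])):
        if "error" in resp:
            # Keep the position so the failure can be matched to its image
            failed.append((index, resp["error"].get("message", "")))
        else:
            ok.append((index, resp))
    return ok, failed


# Made-up payload: the first image succeeded, the second did not.
sample = {
    "responses": [
        {"labelAnnotations": [{"description": "Food", "score": 0.99}]},
        {"error": {"code": 3, "message": "Bad image data."}},
    ]
}
ok, failed = split_responses(sample)
print(len(ok), "succeeded;", failed)
```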
Output the data returned from the API.
for resp in response.json()['responses']:
    print(json.dumps(resp, indent=2))
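To work with the labels rather than just print them, the `labelAnnotations` list can be reduced to (description, score) pairs. This is a small sketch: `top_labels` and its `min_score` threshold are names chosen here for illustration, and the sample data is an excerpt of the oyster result shown later in this article.

```python
def top_labels(response, min_score=0.9):
    """Return (description, score) pairs whose score is at least min_score."""
    labels = response.get("labelAnnotations", [])
    return [(l["description"], l["score"]) for l in labels if l["score"] >= min_score]


sample = {
    "labelAnnotations": [
        {"mid": "/m/0_cp5", "description": "Oyster", "score": 0.9910632},
        {"mid": "/m/02wbm", "description": "Food", "score": 0.9903261},
        {"mid": "/m/0ffhy", "description": "Clam", "score": 0.6364975},
    ]
}
for description, score in top_labels(sample):
    print(f"{description}: {score:.2f}")
```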
Pass the API key issued by GCP and the image path to the script and run it. Try it with some food images.
$ python Meshitero.py [API key] [Image path]
Meshitero.py
from base64 import b64encode
from sys import argv
import json
import requests

API_URL = 'https://vision.googleapis.com/v1/images:annotate'

if __name__ == '__main__':
    api_key = argv[1]
    filename = argv[2]

    # Build the request body: base64-encoded image plus the requested feature
    img_request = []
    with open(filename, 'rb') as f:
        ctxt = b64encode(f.read()).decode()
        img_request.append({
            'image': {'content': ctxt},
            'features': [{
                'type': 'LABEL_DETECTION',
                'maxResults': 10
            }]
        })
    request_data = json.dumps({"requests": img_request}).encode()

    # Send the request, authenticating with the API key
    response = requests.post(API_URL,
                             data=request_data,
                             params={'key': api_key},
                             headers={'Content-Type': 'application/json'})

    if response.status_code != 200 or response.json().get('error'):
        print(response.text)
    else:
        # Print the annotation result for each image in the request
        for resp in response.json()['responses']:
            print(json.dumps(resp, indent=2))
{
  "labelAnnotations": [
    {
      "mid": "/m/0_cp5",
      "description": "Oyster",
      "score": 0.9910632,
      "topicality": 0.9910632
    },
    {
      "mid": "/m/02wbm",
      "description": "Food",
      "score": 0.9903261,
      "topicality": 0.9903261
    },
    {
      "mid": "/m/06nwz",
      "description": "Seafood",
      "score": 0.9609892,
      "topicality": 0.9609892
    },
    {
      "mid": "/m/01cqy9",
      "description": "Bivalve",
      "score": 0.9138548,
      "topicality": 0.9138548
    },
    {
      "mid": "/m/02q08p0",
      "description": "Dish",
      "score": 0.8472096,
      "topicality": 0.8472096
    },
    {
      "mid": "/m/01ykh",
      "description": "Cuisine",
      "score": 0.811229,
      "topicality": 0.811229
    },
    {
      "mid": "/m/07xgrh",
      "description": "Ingredient",
      "score": 0.8011539,
      "topicality": 0.8011539
    },
    {
      "mid": "/m/088kg2",
      "description": "Oysters rockefeller",
      "score": 0.70525026,
      "topicality": 0.70525026
    },
    {
      "mid": "/m/0fbdv",
      "description": "Shellfish",
      "score": 0.6510715,
      "topicality": 0.6510715
    },
    {
      "mid": "/m/0ffhy",
      "description": "Clam",
      "score": 0.6364975,
      "topicality": 0.6364975
    }
  ]
}
{
  "labelAnnotations": [
    {
      "mid": "/m/02q08p0",
      "description": "Dish",
      "score": 0.9934035,
      "topicality": 0.9934035
    },
    {
      "mid": "/m/01ykh",
      "description": "Cuisine",
      "score": 0.9864208,
      "topicality": 0.9864208
    },
    {
      "mid": "/m/02wbm",
      "description": "Food",
      "score": 0.97343695,
      "topicality": 0.97343695
    },
    {
      "mid": "/m/048wsd",
      "description": "Gimbap",
      "score": 0.96859926,
      "topicality": 0.96859926
    },
    {
      "mid": "/m/07030",
      "description": "Sushi",
      "score": 0.9650486,
      "topicality": 0.9650486
    },
    {
      "mid": "/m/0cjyd",
      "description": "Sashimi",
      "score": 0.9185767,
      "topicality": 0.9185767
    },
    {
      "mid": "/m/04q6ng",
      "description": "Comfort food",
      "score": 0.8544887,
      "topicality": 0.8544887
    },
    {
      "mid": "/m/07xgrh",
      "description": "Ingredient",
      "score": 0.8450334,
      "topicality": 0.8450334
    },
    {
      "mid": "/m/05jrv",
      "description": "Nori",
      "score": 0.8431285,
      "topicality": 0.8431285
    },
    {
      "mid": "/m/027lnr6",
      "description": "Sakana",
      "score": 0.8388547,
      "topicality": 0.8388547
    }
  ]
}
{
  "labelAnnotations": [
    {
      "mid": "/m/02q08p0",
      "description": "Dish",
      "score": 0.9934035,
      "topicality": 0.9934035
    },
    {
      "mid": "/m/02wbm",
      "description": "Food",
      "score": 0.9903261,
      "topicality": 0.9903261
    },
    {
      "mid": "/m/01ykh",
      "description": "Cuisine",
      "score": 0.9864208,
      "topicality": 0.9864208
    },
    {
      "mid": "/m/0h55b",
      "description": "Junk food",
      "score": 0.9851551,
      "topicality": 0.9851551
    },
    {
      "mid": "/m/01_bhs",
      "description": "Fast food",
      "score": 0.97022384,
      "topicality": 0.97022384
    },
    {
      "mid": "/m/0cdn1",
      "description": "Hamburger",
      "score": 0.9571771,
      "topicality": 0.9571771
    },
    {
      "mid": "/m/0cc7bks",
      "description": "Buffalo burger",
      "score": 0.94575346,
      "topicality": 0.94575346
    },
    {
      "mid": "/m/03f476",
      "description": "Veggie burger",
      "score": 0.9283731,
      "topicality": 0.9283731
    },
    {
      "mid": "/m/0bp3f6m",
      "description": "Fried food",
      "score": 0.9257971,
      "topicality": 0.9257971
    },
    {
      "mid": "/m/02y6n",
      "description": "French fries",
      "score": 0.92217153,
      "topicality": 0.92217153
    }
  ]
}
{
  "labelAnnotations": [
    {
      "mid": "/m/02q08p0",
      "description": "Dish",
      "score": 0.9934035,
      "topicality": 0.9934035
    },
    {
      "mid": "/m/02wbm",
      "description": "Food",
      "score": 0.9903261,
      "topicality": 0.9903261
    },
    {
      "mid": "/m/01ykh",
      "description": "Cuisine",
      "score": 0.9864208,
      "topicality": 0.9864208
    },
    {
      "mid": "/m/0g9vs81",
      "description": "Steamed rice",
      "score": 0.9271187,
      "topicality": 0.9271187
    },
    {
      "mid": "/m/07xgrh",
      "description": "Ingredient",
      "score": 0.9207317,
      "topicality": 0.9207317
    },
    {
      "mid": "/m/0bp3f6m",
      "description": "Fried food",
      "score": 0.9098738,
      "topicality": 0.9098738
    },
    {
      "mid": "/m/0dxjn",
      "description": "Deep frying",
      "score": 0.9049985,
      "topicality": 0.9049985
    },
    {
      "mid": "/m/0f99t",
      "description": "Tonkatsu",
      "score": 0.901048,
      "topicality": 0.901048
    },
    {
      "mid": "/m/0krfg",
      "description": "Meal",
      "score": 0.81980187,
      "topicality": 0.81980187
    },
    {
      "mid": "/m/04q6ng",
      "description": "Comfort food",
      "score": 0.8160322,
      "topicality": 0.8160322
    }
  ]
}
For the oysters, sushi, and hamburger, the API identified not only that the photo shows food but also what kind of food it is. For the fried shrimp, it could tell that the dish is fried food, but apparently could not identify it as "fried shrimp" (the closest label returned was "Tonkatsu"). I also tried photos not shown in this article, and in general the API seemed able to identify the type of food. The broad genre of fried food is easy to recognize, but identifying the specific dish seems difficult for photos like the fried shrimp one, where the shrimp itself is hard to make out in the image.
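This kind of check can be automated by searching the returned descriptions for an expected keyword. The sketch below uses a hypothetical helper `has_label` and an excerpt of what appears to be the fried shrimp result above.

```python
def has_label(annotations, keyword):
    """Return True if any label description contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in a["description"].lower() for a in annotations)


# Excerpt of the labels returned for the fried shrimp photo.
fried_shrimp = [
    {"description": "Fried food", "score": 0.9098738},
    {"description": "Deep frying", "score": 0.9049985},
    {"description": "Tonkatsu", "score": 0.901048},
]
print(has_label(fried_shrimp, "fried"))   # True: the genre is detected
print(has_label(fried_shrimp, "shrimp"))  # False: the specific dish is not
```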
I only tried label detection this time, but the Cloud Vision API can apparently also detect the colors in an image. If I can grasp the color tendencies of food photos, I may be able to work out which colors make food look delicious. Image recognition APIs are also offered by providers other than Google, so I would like to try those as well in the future.
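As a starting point for that idea, the request would use the `IMAGE_PROPERTIES` feature type instead of `LABEL_DETECTION`, and the dominant colors come back under `imagePropertiesAnnotation`. The sketch below has not been run against the live API: `image_properties_feature` and `dominant_colors` are hypothetical helper names, and the sample response is invented to illustrate the shape described in the API reference.

```python
def image_properties_feature():
    """Feature entry to put in the 'features' list of a request."""
    return {"type": "IMAGE_PROPERTIES"}


def dominant_colors(response):
    """Return ((red, green, blue), pixel_fraction) tuples from one response."""
    colors = (response.get("imagePropertiesAnnotation", {})
                      .get("dominantColors", {})
                      .get("colors", []))
    result = []
    for entry in colors:
        rgb = entry.get("color", {})
        # A missing channel is treated as 0 in this sketch
        result.append(((rgb.get("red", 0), rgb.get("green", 0), rgb.get("blue", 0)),
                       entry.get("pixelFraction", 0.0)))
    return result


# Invented sample response, not real API output.
sample = {
    "imagePropertiesAnnotation": {
        "dominantColors": {
            "colors": [
                {"color": {"red": 200, "green": 120, "blue": 60},
                 "score": 0.4, "pixelFraction": 0.25},
            ]
        }
    }
}
print(dominant_colors(sample))
```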
[^1]: The act of making viewers hungry by posting pictures of delicious meals late at night and the like.