Object recognition with IBM Watson Developer Cloud Visual Recognition

Detecting objects in an image is called object recognition. In this post, we use Visual Recognition, one of the services provided by IBM Watson Developer Cloud, to perform object recognition.

Get a username and password for the API

You need to get a username and password to use the Visual Recognition web API.

Create an application from the IBM Bluemix admin screen, then add the Visual Recognition service to that application.

After adding it, click "View Credentials" for that service to see your username and password.
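
The scripts in this post hand the username and password to requests through the auth= argument, which requests sends as HTTP Basic authentication. As a rough illustration of what that means on the wire, here is a minimal sketch that builds the same kind of header by hand. The helper name basic_auth_header and the 'user' / 'pass' values are placeholders for illustration, not part of the service.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP Basic authentication."""
    # Basic auth is simply "username:password", base64-encoded.
    token = '{}:{}'.format(username, password).encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(token).decode('ascii')}


# Placeholder credentials; use the values from "View Credentials" instead.
print(basic_auth_header('user', 'pass'))
```

In practice there is no need to build this header yourself; passing the (username, password) tuple to requests does the equivalent.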


Get the labels

The result of object recognition is a set of labels, each with a score. First, fetch the list of labels that the service uses.


#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Get label information with Visual Recognition in IBM Watson Developer Cloud
"""
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/labels'
res = requests.get(url, auth=auth_token, headers={'content-type': 'application/json'})
if res.status_code == requests.codes.ok:
    labels = json.loads(res.text)
    print('label groups({}): {}'.format(len(labels['label_groups']), labels['label_groups']))
    print()
    print('labels({}): {}'.format(len(labels['labels']), labels['labels']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

The response is returned as JSON. label_groups is a list of label groups and labels is a list of labels.

Analyze the image

The object recognition API expects images to be sent as a multipart request. An image can be a PNG, a JPEG, or a zip archive. The following example sends one PNG image.


#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Perform object recognition with Visual Recognition of IBM Watson Developer Cloud
"""
import os
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'

filepath = 'var/images/first/2015-04-12-11.47.01.png'  # path to image file
filename = os.path.basename(filepath)

with open(filepath, 'rb') as fp:  # close the image file once the upload finishes
    res = requests.post(
        url, auth=auth_token,
        files={
            'imgFile': (filename, fp),
            }
        )
if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print('    {:30}: {}'.format(label['label_name'], label['label_score']))

else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

This analyzes a single image. The output is as follows.


$ python analyze_image.py
0 - 2015-04-12-11.47.01.png
    Outdoors                      : 0.714211
    Nature Scene                  : 0.671271
    Winter Scene                  : 0.669832
    Vertebrate                    : 0.635903
    Boat                          : 0.61398
    Animal                        : 0.610709
    Water Vehicle                 : 0.607173
    Placental Mammal              : 0.580503
    Snow Scene                    : 0.571422
    Fabric                        : 0.563129
    Gray                          : 0.56078
    Water Sport                   : 0.555034
    Person                        : 0.533461
    Mammal                        : 0.515725
    Surface Water Sport           : 0.511447

The returned data, parsed into a Python dict, is as follows.


{'images': [{'image_id': '0', 'labels': [{'label_score': '0.714211', 'label_name': 'Outdoors'}, {'label_score': '0.671271', 'label_name': 'Nature Scene'}, {'label_score': '0.669832', 'label_name': 'Winter Scene'}, {'label_score': '0.635903', 'label_name': 'Vertebrate'}, {'label_score': '0.61398', 'label_name': 'Boat'}, {'label_score': '0.610709', 'label_name': 'Animal'}, {'label_score': '0.607173', 'label_name': 'Water Vehicle'}, {'label_score': '0.580503', 'label_name': 'Placental Mammal'}, {'label_score': '0.571422', 'label_name': 'Snow Scene'}, {'label_score': '0.563129', 'label_name': 'Fabric'}, {'label_score': '0.56078', 'label_name': 'Gray'}, {'label_score': '0.555034', 'label_name': 'Water Sport'}, {'label_score': '0.533461', 'label_name': 'Person'}, {'label_score': '0.515725', 'label_name': 'Mammal'}, {'label_score': '0.511447', 'label_name': 'Surface Water Sport'}], 'image_name': '2015-04-12-11.47.01.png'}]}
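
Note that each label_score in the data above is a string, not a number. The following is a minimal sketch of converting the scores to floats and keeping only the labels above a cutoff. The function name top_labels and the 0.6 threshold are choices made for this illustration, and the sample data is a shortened copy of the response shown above.

```python
def top_labels(image, threshold=0.6):
    """Return (label_name, score) pairs at or above `threshold`, best first."""
    # Scores arrive as strings, so convert them before comparing.
    pairs = [(label['label_name'], float(label['label_score']))
             for label in image['labels']]
    kept = [pair for pair in pairs if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Shortened copy of the response shown above.
data = {'images': [{'image_id': '0',
                    'image_name': '2015-04-12-11.47.01.png',
                    'labels': [
                        {'label_score': '0.714211', 'label_name': 'Outdoors'},
                        {'label_score': '0.671271', 'label_name': 'Nature Scene'},
                        {'label_score': '0.61398', 'label_name': 'Boat'},
                        {'label_score': '0.511447', 'label_name': 'Surface Water Sport'},
                    ]}]}

for img in data['images']:
    print(img['image_name'], top_labels(img))
```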

Send multiple images in one request and analyze them all at once

By adding more files to the multipart request, you can analyze multiple files with one request.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Perform object recognition with Visual Recognition of IBM Watson Developer Cloud

Include 3 files in 1 request
"""
import os
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'

filepaths = [
    'var/images/first/2015-04-12-11.47.01.png',
    'var/images/first/2015-04-12-11.44.42.png',
    'var/images/first/2015-04-12-11.46.11.png',
    ]
# Map each file name to a (file name, open file object) pair.
files = {os.path.basename(filepath): (os.path.basename(filepath), open(filepath, 'rb'))
         for filepath in filepaths}

res = requests.post(
    url, auth=auth_token,
    files=files,
    )

for key, (filename, fp) in files.items():
    fp.close()

if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print('    {:30}: {}'.format(label['label_name'], label['label_score']))

else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

In the returned JSON, the 'images' key holds a list with one element for each image you sent. The output is as follows.


$ python analyze_image_multi.py
0 - 2015-04-12-11.44.42.png
    Gray                          : 0.735805
    Winter Scene                  : 0.7123
    Nature Scene                  : 0.674336
    Water Scene                   : 0.668881
    Outdoors                      : 0.658805
    Natural Activity              : 0.643865
    Vertebrate                    : 0.603751
    Climbing                      : 0.566247
    Animal                        : 0.537788
    Mammal                        : 0.518001
1 - 2015-04-12-11.46.11.png
    Gray                          : 0.719819
    Vertebrate                    : 0.692607
    Animal                        : 0.690942
    Winter Scene                  : 0.683918
    Mammal                        : 0.669149
    Snow Scene                    : 0.664266
    Placental Mammal              : 0.663866
    Outdoors                      : 0.66335
    Nature Scene                  : 0.656991
    Climbing                      : 0.645557
    Person                        : 0.557965
    Person View                   : 0.528335
2 - 2015-04-12-11.47.01.png
    Outdoors                      : 0.714211
    Nature Scene                  : 0.671271
    Winter Scene                  : 0.669832
    Vertebrate                    : 0.635903
    Boat                          : 0.61398
    Animal                        : 0.610709
    Water Vehicle                 : 0.607173
    Placental Mammal              : 0.580503
    Snow Scene                    : 0.571422
    Fabric                        : 0.563129
    Gray                          : 0.56078
    Water Sport                   : 0.555034
    Person                        : 0.533461
    Mammal                        : 0.515725
    Surface Water Sport           : 0.511447

Even when I included 30 files in one request, they were processed normally. It can probably handle more.
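
When there are more images than you want to put in one request, one option is to split them into batches. The following is a minimal sketch under stated assumptions: the names chunked, build_files and recognize_in_batches are made up for this illustration, the default batch size of 30 is simply the number that worked above (not a documented limit), and the actual HTTP call is passed in as the post argument so the batching logic stays separate from the request.

```python
import os
from contextlib import ExitStack


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_files(stack, filepaths):
    """Open each path in binary mode and register it with `stack` for closing.

    Returns {basename: (basename, file object)}, the same shape the
    multi-file script above passes to requests.post(files=...).
    """
    files = {}
    for filepath in filepaths:
        name = os.path.basename(filepath)
        files[name] = (name, stack.enter_context(open(filepath, 'rb')))
    return files


def recognize_in_batches(filepaths, post, batch_size=30):
    """Call `post(files)` once per batch and collect whatever it returns."""
    results = []
    for batch in chunked(filepaths, batch_size):
        with ExitStack() as stack:  # closes every file, even if post() raises
            results.append(post(build_files(stack, batch)))
    return results
```

With the variables from the script above, this could be called as recognize_in_batches(filepaths, lambda files: requests.post(url, auth=auth_token, files=files)), which returns one response per batch.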

Scripts I wrote

Cut image

https://gist.github.com/TakesxiSximada/ca1b5aac871ec7167ff9

Recognize objects and save the result in a JSON file

https://gist.github.com/TakesxiSximada/996dbbfae5fa3bbab61d

Convert the resulting JSON file to CSV

https://gist.github.com/TakesxiSximada/d451221dc2a280b7e35d
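
To give an idea of what the CSV conversion involves, here is a minimal sketch that flattens a recognition response into one row per image and label. It is not the code from the gist above; the function name labels_to_csv and the column layout are assumptions made for this illustration, and the sample data is a shortened copy of the response shown earlier.

```python
import csv
import io


def labels_to_csv(data, out):
    """Write one CSV row per (image, label) pair found in `data`."""
    writer = csv.writer(out)
    writer.writerow(['image_id', 'image_name', 'label_name', 'label_score'])
    for img in data['images']:
        for label in img['labels']:
            writer.writerow([img['image_id'], img['image_name'],
                             label['label_name'], label['label_score']])


# Shortened copy of the response shown earlier in this post.
data = {'images': [{'image_id': '0',
                    'image_name': '2015-04-12-11.47.01.png',
                    'labels': [
                        {'label_score': '0.714211', 'label_name': 'Outdoors'},
                        {'label_score': '0.671271', 'label_name': 'Nature Scene'},
                    ]}]}

buf = io.StringIO()
labels_to_csv(data, buf)
print(buf.getvalue())
```

To write to a real file instead of a StringIO buffer, open it with newline='' as the csv module documentation recommends.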

Caution

This time I use a third-party Python package called pit to read the username and password from a config file. However, as of April 12, 2015, pit does not support Python 3, so a plain pip install pit under Python 3 fails with an error. There is a fork of the pit repository with a branch that supports Python 3. Please install and use pit from there.

https://github.com/TakesxiSximada/pit/archive/fix/sximada/py3k.zip
https://github.com/TakesxiSximada/pit/tree/fix/sximada/py3k

... or rather, I should stop slacking off and send a pull request (note to self).
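
If installing pit is inconvenient, one alternative is to read the credentials from environment variables. This is only a sketch of that idea, not what the scripts in this post do; the function name load_credentials and the variable names WATSON_VR_USERNAME and WATSON_VR_PASSWORD are hypothetical.

```python
import os


def load_credentials(environ=None):
    """Return (username, password) read from hypothetical environment variables."""
    if environ is None:
        environ = os.environ
    try:
        return environ['WATSON_VR_USERNAME'], environ['WATSON_VR_PASSWORD']
    except KeyError as err:
        # Exit with a readable message instead of a traceback.
        raise SystemExit('missing environment variable: {}'.format(err.args[0]))
```

The returned tuple can take the place of auth_token in the scripts above, for example auth_token = load_credentials().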
