Introduction

To build an image recognition program, we first need a large amount of image data for training. This article therefore creates a program that collects image data automatically.

Reference materials: [Image judgment AI application development / Part 1] Introduction to image judgment AI application development made with TensorFlow / Python / Flask

Image collection site: Flickr https://www.flickr.com/

Flickr API: https://www.flickr.com/services/api/

Preparation

See "1. api registration" in https://qiita.com/Saayaman/items/a3066697a108a7e7fc39. You need to obtain an API key and a secret key (Flickr labels them "Key" and "Secret").

Installation method

https://stuvel.eu/flickrapi

pip install flickrapi

Source code

The program, called download.py, saves the images returned by a search into a specified folder. For example, if you run

python download.py monkey

the images found by searching for "monkey" are stored in a folder called monkey.
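As a minimal sketch of how the command-line keyword could be turned into a folder name, the snippet below uses argparse instead of reading sys.argv directly, so that a missing keyword produces a usage message rather than an IndexError. The function names parse_args and save_dir_for are hypothetical helpers introduced only for this illustration; they are not part of the original download.py.

```python
import argparse
import os


def parse_args(argv):
    """Parse the command line: exactly one search keyword is required."""
    parser = argparse.ArgumentParser(
        description="Download Flickr search results into a folder."
    )
    parser.add_argument(
        "keyword", help="search term, also used as the folder name"
    )
    return parser.parse_args(argv)


def save_dir_for(keyword):
    """Return the folder path for a keyword, relative to the current directory."""
    return os.path.join(".", keyword)


# In download.py this would be parse_args(sys.argv[1:]).
args = parse_args(["monkey"])
print(save_dir_for(args.keyword))
```

Passing the argument list explicitly keeps the parsing easy to try out without actually running the script from a shell.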

Imports

from flickrapi import FlickrAPI
from urllib.request import urlretrieve
from pprint import pprint
import os, time, sys

API key information

#API key information
key = "<<String>>"
secret = "<<String>>"
wait_time = 1

Store the API key and secret key you obtained in key and secret. As the name suggests, wait_time is the wait time, in seconds. Flickr may deny access if you send requests too frequently, so the program waits 1 second after each image it downloads.
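If you would rather not write the key and secret into the source file, one option is to read them from environment variables. This is only a sketch under assumptions: the variable names FLICKR_KEY and FLICKR_SECRET and the helper load_credentials are made up for this example and are not defined by Flickr or by the flickrapi package.

```python
import os


def load_credentials(env=os.environ):
    """Return (key, secret) read from the environment.

    FLICKR_KEY and FLICKR_SECRET are hypothetical variable names
    chosen for this sketch.
    """
    key = env.get("FLICKR_KEY")
    secret = env.get("FLICKR_SECRET")
    if not key or not secret:
        raise RuntimeError("Set FLICKR_KEY and FLICKR_SECRET before running")
    return key, secret
```

The returned pair could then be assigned with key, secret = load_credentials() in place of the hard-coded strings.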

Image information acquisition

#Specify save folder
animalname = sys.argv[1]
savedir = "./" + animalname

flickr = FlickrAPI(key, secret, format='parsed-json')
result = flickr.photos.search(
    text = animalname,
    per_page = 400,
    media = 'photos',
    sort = 'relevance',
    safe_search = 1,
    extras = 'url_q, license'
)

Pass the key and secret key obtained in "Preparation" to FlickrAPI, together with the response format (parsed JSON this time), to create a flickr instance, then call the flickr.photos.search method. The result contains data such as the following.

{'photos': {'page': 1,
            'pages': 541,
            'perpage': 400,
            'photo': [{'farm': 66,
                       'height_q': 150,
                       'id': '49823614651',
                       'isfamily': 0,
                       'isfriend': 0,
                       'ispublic': 1,
                       'owner': '14136614@N03',
                       'secret': '888c8a381a',
                       'server': '65535',
                       'title': 'LEGO Minifigures Series 19 Rainbow Bear',
                       'url_q': 'https://live.staticflickr.com/65535/49823614651_888c8a381a_q.jpg',
                       'width_q': 150},
...
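To make the structure above concrete, here is a small sketch that walks a dictionary of the same shape and collects the photo IDs and thumbnail URLs. The sample data is trimmed from the output shown above; the second entry, which has no url_q field, is a hypothetical one added to show how such an entry could be skipped. The helper name list_downloads is likewise an assumption for this example.

```python
sample_result = {
    "photos": {
        "page": 1,
        "pages": 541,
        "perpage": 400,
        "photo": [
            {
                "id": "49823614651",
                "title": "LEGO Minifigures Series 19 Rainbow Bear",
                "url_q": "https://live.staticflickr.com/65535/49823614651_888c8a381a_q.jpg",
            },
            # Hypothetical entry without url_q, to exercise the skip path.
            {"id": "0000000000", "title": "no thumbnail"},
        ],
    }
}


def list_downloads(result):
    """Return (photo_id, url) pairs for every entry that has a url_q field."""
    return [
        (photo["id"], photo["url_q"])
        for photo in result["photos"]["photo"]
        if "url_q" in photo
    ]


for photo_id, url in list_downloads(sample_result):
    print(photo_id, url)
```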

Image download

Download the images using the information contained in result. The URL of each image is available as photo['url_q'], and the download itself is done with urlretrieve.

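photos = result['photos']

# Create the save folder if it does not exist yet; urlretrieve will not create it
os.makedirs(savedir, exist_ok=True)

for i, photo in enumerate(photos['photo']):
    print(i)
    url_q = photo['url_q']
    filepath = savedir + '/' + photo['id'] + '.jpg'
    # Skip images that have already been downloaded
    if os.path.exists(filepath):
        continue
    urlretrieve(url_q, filepath)
    # Wait between downloads so Flickr does not deny access
    time.sleep(wait_time)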
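The download loop can also be wrapped in a function that tolerates a few common failure cases. The following is a sketch, not the original author's code: download_photos and its fetch parameter are hypothetical names, and the choice to skip entries without url_q and to continue after a failed download is an assumption about what is convenient when collecting training data.

```python
import os
import time
from urllib.request import urlretrieve


def download_photos(photos, savedir, fetch=urlretrieve, wait_time=1):
    """Download each photo's url_q into savedir and return the new file paths."""
    os.makedirs(savedir, exist_ok=True)  # create the folder if it is missing
    saved = []
    for photo in photos:
        url = photo.get("url_q")
        if url is None:
            continue  # no thumbnail URL for this entry
        filepath = os.path.join(savedir, photo["id"] + ".jpg")
        if os.path.exists(filepath):
            continue  # already downloaded on an earlier run
        try:
            fetch(url, filepath)
        except OSError as err:  # URLError and HTTPError derive from OSError
            print("skipped", url, err)
            continue
        saved.append(filepath)
        time.sleep(wait_time)  # pause between downloads
    return saved
```

With the variables from this article it could be called as download_photos(result['photos']['photo'], savedir, wait_time=wait_time). The fetch parameter exists only so that the function can be tried with a stand-in downloader instead of real network access.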

Verification

python download.py monkey

When you run this, the images are saved in the monkey folder. However, the results also include images that are not monkeys (for example, drawings of monkeys, or images that have nothing to do with monkeys), so these need to be removed manually.

Image download with Flickr API