To create an image recognition program, we first need a large amount of image data to train on. This article therefore builds a program that collects image data automatically.
Reference materials: [Image judgment AI application development / Part 1] Introduction to image judgment AI application development made with TensorFlow / Python / Flask
Flickr Image collection site https://www.flickr.com/
API https://www.flickr.com/services/api/
See "1. API registration" in https://qiita.com/Saayaman/items/a3066697a108a7e7fc39. You need to obtain an API key and its secret.
https://stuvel.eu/flickrapi
pip install flickrapi
Save the images found by a search in a specified folder. Create a program called download.py. For example, running
python download.py monkey
searches for "monkey" and stores the resulting images in a folder called monkey.
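The command-line usage described above could also be handled with argparse, so that a missing keyword produces a usage message instead of an IndexError. This is only a sketch: the function name parse_args and the argument name keyword are illustrative and not part of the original script, which reads sys.argv[1] directly.

```python
import argparse


def parse_args(argv=None):
    """Parse the search keyword from the command line.

    argparse prints a usage message and exits if the keyword is missing.
    """
    parser = argparse.ArgumentParser(
        description="Download Flickr images for a search keyword."
    )
    # "keyword" is a hypothetical argument name chosen for this sketch.
    parser.add_argument("keyword", help="search term, also used as the folder name")
    return parser.parse_args(argv)


# Passing an explicit list mimics running "python download.py monkey".
args = parse_args(["monkey"])
print(args.keyword)
```

In a real script, calling parse_args() with no argument would read the keyword from the actual command line.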
from flickrapi import FlickrAPI
from urllib.request import urlretrieve
from pprint import pprint
import os, time, sys
#API key information
key = "<<String>>"
secret = "<<String>>"
wait_time = 1
Store the key and secret you obtained in key and secret. As the name suggests, wait_time is the wait time in seconds. Accessing Flickr too frequently can get your access denied, so the script waits 1 second after each image it downloads.
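As an alternative to writing the key and secret directly into the source file, they could be read from environment variables, which keeps them out of any code you share. The following is a minimal sketch under that assumption; the variable names FLICKR_API_KEY and FLICKR_API_SECRET and the helper load_credentials are made up for this example and are not defined by Flickr or flickrapi.

```python
import os


def load_credentials(env=os.environ):
    """Return (key, secret) read from the given mapping, or raise if one is missing."""
    key = env.get("FLICKR_API_KEY")        # hypothetical variable name
    secret = env.get("FLICKR_API_SECRET")  # hypothetical variable name
    if not key or not secret:
        raise RuntimeError("Set FLICKR_API_KEY and FLICKR_API_SECRET first")
    return key, secret


# A plain dict stands in for the real environment in this demonstration.
key, secret = load_credentials(
    {"FLICKR_API_KEY": "my-key", "FLICKR_API_SECRET": "my-secret"}
)
print(key, secret)
```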
# Specify the save folder
animalname = sys.argv[1]
savedir = "./" + animalname
os.makedirs(savedir, exist_ok=True)  # create the folder if it does not exist yet
flickr = FlickrAPI(key, secret, format='parsed-json')
result = flickr.photos.search(
    text = animalname,
    per_page = 400,
    media = 'photos',
    sort = 'relevance',
    safe_search = 1,
    extras = 'url_q,license'
)
Pass the key and secret obtained in "Preparation", together with the response format (parsed JSON this time), to FlickrAPI to create a flickr instance, then call the flickr.photos.search method.
result contains data like the following.
{'photos': {'page': 1,
            'pages': 541,
            'perpage': 400,
            'photo': [{'farm': 66,
                       'height_q': 150,
                       'id': '49823614651',
                       'isfamily': 0,
                       'isfriend': 0,
                       'ispublic': 1,
                       'owner': '14136614@N03',
                       'secret': '888c8a381a',
                       'server': '65535',
                       'title': 'LEGO Minifigures Series 19 Rainbow Bear',
                       'url_q': 'https://live.staticflickr.com/65535/49823614651_888c8a381a_q.jpg',
                       'width_q': 150},
                      ...
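To make the shape of this response concrete, the sketch below pulls (id, URL) pairs out of a dictionary with the same layout. It assumes that some entries may come back without a 'url_q' field and skips those; the helper name extract_urls and the second sample entry are invented for illustration.

```python
def extract_urls(result):
    """Return (photo_id, url) pairs for photos that include a 'url_q' entry."""
    pairs = []
    for photo in result.get("photos", {}).get("photo", []):
        url = photo.get("url_q")
        if url:  # skip entries that lack the requested thumbnail URL
            pairs.append((photo["id"], url))
    return pairs


# Trimmed-down sample mirroring the structure shown above.
sample = {
    "photos": {
        "page": 1,
        "photo": [
            {
                "id": "49823614651",
                "url_q": "https://live.staticflickr.com/65535/49823614651_888c8a381a_q.jpg",
            },
            {"id": "0000000000"},  # made-up entry without 'url_q'
        ],
    }
}

for photo_id, url in extract_urls(sample):
    print(photo_id, url)
```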
Download the images using the information contained in result. The URL of each image is available as photo['url_q'], and the download itself is performed by urlretrieve.
photos = result['photos']
for i, photo in enumerate(photos['photo']):
    print(i)
    url_q = photo['url_q']
    filepath = savedir + '/' + photo['id'] + '.jpg'
    if os.path.exists(filepath): continue  # skip images that were already downloaded
    urlretrieve(url_q, filepath)
    time.sleep(wait_time)  # wait between downloads
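A slightly more defensive variant of this loop might look like the following sketch. It assumes that a photo entry can lack 'url_q' and that an individual download can fail, and it keeps going in both cases. The function name download_all and its fetch parameter are introduced here for illustration only; fetch defaults to urlretrieve and exists so the loop can be tried without network access.

```python
import os
import time
from urllib.request import urlretrieve


def download_all(photos, savedir, wait_time=1.0, fetch=urlretrieve):
    """Download each photo's 'url_q' into savedir and return the number saved."""
    os.makedirs(savedir, exist_ok=True)  # make sure the target folder exists
    saved = 0
    for photo in photos:
        url = photo.get("url_q")
        if not url:
            continue  # no thumbnail URL for this entry
        filepath = os.path.join(savedir, photo["id"] + ".jpg")
        if os.path.exists(filepath):
            continue  # already downloaded on an earlier run
        try:
            fetch(url, filepath)
        except OSError as err:  # urllib's URLError is a subclass of OSError
            print(f"skipped {url}: {err}")
            continue
        saved += 1
        time.sleep(wait_time)  # pause between downloads
    return saved
```

With the variables from the script, this would be called as download_all(result['photos']['photo'], savedir, wait_time).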
Running
python download.py monkey
saves the downloaded images in the monkey folder.
However, the results also include images that are not actual monkeys (such as illustrations of monkeys or images that have nothing to do with monkeys), so these need to be removed manually.