I collected images with the Bing Image Search API to use for deep learning in TensorFlow. The program is built so that a large number of images can be collected. I ran it in the following environment.
Type | Version | Remarks |
---|---|---|
OS | Windows 10 | Pro 64-bit |
Anaconda | Anaconda 4.4.0 | I create a virtual environment in Anaconda |
Python | 3.5 | 3.5, so that it can coexist with TensorFlow |
requests library | 2.18.4 | Used to call the Bing API. I used it because of my own lack of skill and knowledge; it should be possible without it |
API | Bing Image Search API v5 | I'm using v5 because v7 had not been released yet. I have not compared it with Google. |
First, run the program without any parameters.
python get_images_via_bing_20170822.py
Progress is printed as the program runs. Two "Get 3 images from offset X" lines appear because the Bing Image Search API is called twice. The API returns at most 150 search results per call, so the program calls it several times to retrieve a larger number of results.
Cute cats have been collected in the folder.
Next, run the program with parameters. This gets 80 results in a single call for the search term "cat".
python get_images_via_bing_20170822.py --query "cat" --call_count 1 --image_count 80
Image types other than jpeg and png are not saved; a "Not saved file type:" line is only written to the log.
Heart-pounding cats get collected :smile_cat:
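The script in this article filters image types with imghdr, which works on Python 3.5 but was deprecated in Python 3.11 and removed in 3.13. As a rough sketch of the same jpeg/png filter without imghdr, the leading bytes of the downloaded content can be checked directly. The helper names `detect_image_type` and `should_save` are hypothetical and not part of the original script.

```python
# Hypothetical helpers (not in the original script): detect JPEG/PNG from the
# leading "magic" bytes of the downloaded content, without using imghdr.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'   # fixed 8-byte PNG file signature
JPEG_SIGNATURE = b'\xff\xd8\xff'       # start-of-image marker plus next marker byte


def detect_image_type(content):
    """Return 'png', 'jpeg', or None for any other content."""
    if content.startswith(PNG_SIGNATURE):
        return 'png'
    if content.startswith(JPEG_SIGNATURE):
        return 'jpeg'
    return None


def should_save(content):
    """True only for the two types the script keeps."""
    return detect_image_type(content) in ('jpeg', 'png')
```

Anything that returns None here would be logged as "Not saved file type:" and skipped, the same way the script treats types other than jpeg and png.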
Click the Free Account link on the Azure Cognitive Services page (https://azure.microsoft.com/en-us/services/cognitive-services/).
Click the "Get Started for Free" link
I already had an account, so I signed in with it. After this, I entered my personal information (my memory of this step is hazy).
A Welcome email is sent to your account email address. Click the link in it to go to the portal screen. Click the "+" icon and click "create" under Bing Search APIs.
I selected the free version of Bing Search APIs and created it.
Click "Show access keys" under Manage keys and make a note of the keys. Two keys are displayed, and it seems that either one is fine (I am using the first one; I have not tried the second).
I installed the requests library because it was not in my environment. Skip this step if you have already installed it. Also, as mentioned at the beginning, I think the same result can be achieved without it. Install it in Anaconda's virtual environment "tensorflow121". Start Terminal from Anaconda Navigator.
I installed it with pip. Nothing particularly difficult was involved.
pip install requests
I designed the program as follows.
- **Up to 150 images for the search term can be acquired with one Bing API call** (API limit)
- **Call the API multiple times in one program execution** to get many images
- Save the retrieved files locally
- Retrieval can start from the middle of the search results

The program takes the following parameters.
- --image_count: Number of image files to get with one Bing API call
- --call_count: Number of Bing API calls in one program execution (image_count x call_count = total number of acquired images)
- --off_set_start: Offset in the search results to start from, for starting partway through
- --output_path: Image output directory
- --query: Search term
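To make the relationship between these parameters concrete, here is a small sketch of the offset arithmetic the script performs for each call. The function name `page_offsets` is hypothetical; the formula itself (off_set_start + step * image_count) is the one used in the code below.

```python
# Hypothetical helper: list the offset used for each Bing API call,
# using the same formula as the script (off_set_start + step * image_count).
def page_offsets(off_set_start, image_count, call_count):
    return [off_set_start + step * image_count for step in range(call_count)]


# Default run: 2 calls of 3 images each, starting at 0
print(page_offsets(0, 3, 2))        # [0, 3]

# Start partway through the results: 3 calls of 150 images starting at 150
print(page_offsets(150, 150, 3))    # [150, 300, 450]
```

In the second example the run requests 150 x 3 = 450 images in total, covering offsets 150 through 599 of the search results.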
Replace the "Enter your Subscription Key here" part of the code below with the Subscription Key you created in the Azure portal.
import argparse, requests, urllib.parse, os, io, imghdr
#Basic model parameters
FLAGS = None
#end point
kEndPoint = 'https://api.cognitive.microsoft.com/bing/v5.0/images/search'
#http request header
kHeaders = { 'Ocp-Apim-Subscription-Key': 'Enter your Subscription Key here' }
#Get a list of image URLs for search results
def GetImageUrls():
    print('Start getting %d images from offset %d' % (FLAGS.image_count, FLAGS.off_set_start))
    image_list = []
    #The Bing API returns at most 150 results per call, so loop call_count times
    for step in range(FLAGS.call_count):
        #Offset for this call
        off_set = FLAGS.off_set_start + step * FLAGS.image_count
        #http request parameters
        params = urllib.parse.urlencode({
            'count': FLAGS.image_count,
            'offset': off_set,
            'imageType': 'Photo',
            'q': FLAGS.query,
            # 'mkt': 'ja-JP',
        })
        #bing API call
        res = requests.get(kEndPoint, headers=kHeaders, params=params)
        #Fail with a clear HTTP error (e.g. wrong key) instead of a KeyError below
        res.raise_for_status()
        body = res.json()
        if step == 0:
            print('Total Estimated Matches: %s' % body['totalEstimatedMatches'])
        vals = body['value']
        print('Get %d images from offset %d' % (len(vals), off_set))
        #Store the resulting image URLs
        for val in vals:
            image_list.append(val["contentUrl"])
    return image_list
#Get an image and save it locally
def fetch_images(image_list):
    print('total images:%d' % len(image_list))
    #Create the output directory if it does not exist yet
    os.makedirs(FLAGS.output_path, exist_ok=True)
    for i, image_url in enumerate(image_list):
        #Progress output for every 100 cases
        if i % 100 == 0:
            print('Start getting and saving each image:%d' % i)
        try:
            #Image acquisition
            response = requests.get(image_url, timeout=5)
        #Since an error may occur depending on the acquisition source, just log and continue
        except requests.exceptions.RequestException:
            print('%d:Error occurs :%s' % (i, image_url))
            continue
        #Filter by image type (detect once, while the buffer is still open)
        with io.BytesIO(response.content) as fh:
            image_type = imghdr.what(fh)
        if image_type != 'jpeg' and image_type != 'png':
            print('Not saved file type:%s' % image_type)
            continue
        #Save image locally
        with open('{}/image.{}.{}'.format(FLAGS.output_path, i, image_type), 'wb') as f:
            f.write(response.content)
#Runs only when executed directly (skipped when imported)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--image_count',
        type=int,
        default=3,
        help='number of image files to get per API call.'
    )
    parser.add_argument(
        '--call_count',
        type=int,
        default=2,
        help='number of API calls.'
    )
    parser.add_argument(
        '--off_set_start',
        type=int,
        default=0,
        help='offset to start from.'
    )
    parser.add_argument(
        '--output_path',
        type=str,
        default='./images',
        help='image files output directory.'
    )
    parser.add_argument(
        '--query',
        type=str,
        default='Cat',
        help='search query.'
    )
    #Parameter acquisition and execution
    FLAGS, unparsed = parser.parse_known_args()
    fetch_images(GetImageUrls())
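As an alternative to editing the key into the source, the header could be built from an environment variable so the key never ends up in a shared file. This is only a sketch: the variable name `BING_SUBSCRIPTION_KEY` and the function `build_headers` are my own assumptions, not something the script or the Bing API defines.

```python
import os


# Hypothetical helper: build the request header from an environment variable.
# BING_SUBSCRIPTION_KEY is an assumed name chosen for this example.
def build_headers(env=os.environ):
    key = env.get('BING_SUBSCRIPTION_KEY')
    if not key:
        raise RuntimeError('Set BING_SUBSCRIPTION_KEY to your Subscription Key')
    return {'Ocp-Apim-Subscription-Key': key}
```

With this in place, kHeaders could be assigned from `build_headers()` after setting the variable in the Anaconda terminal before running the program.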
I did not have time for the following. I gave up on them because they looked time-consuming for a Python beginner.
- A function that stops automatically when the total number of search results is reached: ordinary coding would do it, but it is tedious...
- Not using the requests library: I wanted to implement this with only a standard library such as urllib, but I couldn't. Coming from a non-open SAP shop, non-standard libraries feel somewhat uncomfortable (a matter of familiarity?).
- Because the subscription is a free trial, I could only get up to about 1000 results, even for search terms such as "cat" that should have many results. No matter what I do, the total stays low... I give up :raised_hand:
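For reference, here is a minimal sketch of how the first two items might look: the request is built with urllib from the standard library, and the loop stops once the offset reaches totalEstimatedMatches or a call returns no results. It reuses the endpoint, header name, and response fields from the script above; the function names and the `search_func` parameter are hypothetical, and I have not run this against the live API. Note that totalEstimatedMatches is only an estimate, so the stop condition is approximate.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = 'https://api.cognitive.microsoft.com/bing/v5.0/images/search'


def build_search_request(query, count, offset, subscription_key):
    """Build (but do not send) a GET request using only the standard library."""
    params = urllib.parse.urlencode({
        'count': count,
        'offset': offset,
        'imageType': 'Photo',
        'q': query,
    })
    return urllib.request.Request(
        ENDPOINT + '?' + params,
        headers={'Ocp-Apim-Subscription-Key': subscription_key},
    )


def search(query, count, offset, subscription_key):
    """Send the request and return the decoded JSON body (assumes UTF-8)."""
    req = build_search_request(query, count, offset, subscription_key)
    with urllib.request.urlopen(req, timeout=10) as res:
        return json.loads(res.read().decode('utf-8'))


def collect_urls(query, image_count, call_count, subscription_key,
                 search_func=search):
    """Collect contentUrl values, stopping early when results run out."""
    urls = []
    offset = 0
    for _ in range(call_count):
        body = search_func(query, image_count, offset, subscription_key)
        vals = body['value']
        urls.extend(val['contentUrl'] for val in vals)
        offset += image_count
        # Stop when a page comes back empty or the estimated total is reached
        if not vals or offset >= body['totalEstimatedMatches']:
            break
    return urls
```

Passing the search function as a parameter is a design choice made for this sketch: it lets the stopping logic be checked with a fake response before spending any of the free-trial quota.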
Here are the sites I referred to.
Site | Comment |
---|---|
Official test tool | I made the code while looking here |
API official document | I referred to it for the parameters that can be used. By the way, where did OData go? |
Automatically collect images using the Bing Image Search API | I learned a lot about the Bing API here |
The story of migrating from Bing Search API v2 to v5 | I especially referred to the logic of the image acquisition part |
bing_image_getter.py | First I changed it based on this source |