Image collection using Google Custom Search API

From now on, I want to study deep learning seriously. First, though, I need to figure out where to get the large amount of training data it requires.

One option I came up with is collecting images from Twitter image bots. Another is using image search engines such as Google and Bing. Finding a good bot will take some time, so let's start with a search API. The Bing Search API apparently ends at the end of this year, so I'll use Google this time.

Search engine settings

Create a new search engine in Custom Search with the following settings:

① Turn on image search.
② Select "Search the entire web".
③ Delete the entry under the sites to search.
④ Copy the ID shown as the search engine ID.

The ID has the form "digits:letters". The digit string appears to be the user ID and the letter string the engine ID.

Getting a Custom Search API key

Enable the Custom Search API in the Google Cloud Platform Console (https://console.cloud.google.com/apis) and create an API key under Credentials.
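The script in this post hardcodes the key. As a hedged alternative, here is a minimal sketch that reads the API key and the search engine ID from environment variables instead, so they stay out of code you share on GitHub. The variable names GOOGLE_API_KEY and GOOGLE_CSE_ID and the function load_credentials are hypothetical choices, not anything the API requires.

```python
import os


def load_credentials():
    """Return (api_key, cx) read from the environment.

    GOOGLE_API_KEY and GOOGLE_CSE_ID are hypothetical variable names;
    use whatever names fit your setup.
    """
    try:
        return os.environ["GOOGLE_API_KEY"], os.environ["GOOGLE_CSE_ID"]
    except KeyError as e:
        # Fail with a readable message instead of a bare KeyError
        raise RuntimeError("environment variable %s is not set" % e.args[0]) from None
```

With this in place, API_KEY, CUSTOM_SEARCH_ENGINE = load_credentials() could replace the two hardcoded constants.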

Creating a Python Script

You can run a search with a request of the form

https://www.googleapis.com/customsearch/v1?key=[API_KEY]&cx=[CUSTOM_SEARCH_ENGINE]&q=[search_item]

Add searchType=image to search for images, and use num=xx&start=yy to page through the results when you need a large number of images. According to the reference (https://developers.google.com/custom-search/json-api/v1/reference/cse/list?hl=ja), num is an integer from 1 to 10, so each request returns at most 10 results.
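A sketch of that pagination: page_params splits a request for total_num results into (start, num) pairs with num at most 10, and build_query assembles one image-search URL, letting urllib.parse.urlencode escape the parameters. Both function names are hypothetical. The reference also states that the API never returns more than 100 results for a single query; the max_results=100 cap models that, but treat the exact boundary as an assumption.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def page_params(total_num, page_size=10, max_results=100):
    """Split a request for total_num results into (start, num) pairs.

    num is limited to 1-10 per request; max_results is an assumed
    per-query ceiling on how many results the API will return.
    """
    total_num = min(total_num, max_results)
    pages = []
    start = 1  # start is a 1-based result index
    while start <= total_num:
        num = min(page_size, total_num - start + 1)
        pages.append((start, num))
        start += num
    return pages


def build_query(api_key, cx, query, start, num):
    """Build one image-search request URL; urlencode escapes every value."""
    params = {"key": api_key, "cx": cx, "q": query,
              "searchType": "image", "start": start, "num": num}
    return ENDPOINT + "?" + urlencode(params)
```

For example, page_params(25) gives [(1, 10), (11, 10), (21, 5)], i.e. three requests, and each pair can be passed to build_query.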

The script is based on tukiyo3's code.

get_image.py


# -*- coding: utf-8 -*-
# 2016/11/21
import json
import os
import urllib.request
from urllib.parse import urlencode, urlparse

import httplib2  # third-party: pip install httplib2

API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CUSTOM_SEARCH_ENGINE = "12345648954985648968:xxxxxxxxx"


def getImageUrl(search_item, total_num):
    """Return up to total_num image URLs for search_item, 10 per request."""
    img_list = []
    i = 0
    while i < total_num:
        params = {
            "key": API_KEY,
            "cx": CUSTOM_SEARCH_ENGINE,
            "num": min(10, total_num - i),  # num must be 1-10
            "start": i + 1,                 # 1-based index of the first result
            "q": search_item,
            "searchType": "image",
        }
        query_img = "https://www.googleapis.com/customsearch/v1?" + urlencode(params)
        print(query_img)
        res = urllib.request.urlopen(query_img)
        data = json.loads(res.read().decode("utf-8"))
        items = data.get("items", [])  # "items" is absent when there are no results
        if not items:
            break
        for item in items:
            img_list.append(item["link"])
        i += 10
    return img_list


def getImage(search_item, img_list):
    """Download each URL in img_list to <search_item><index><ext>."""
    http = httplib2.Http(".cache")
    for i, url in enumerate(img_list):
        try:
            # Take the extension from the URL path only, ignoring any query string
            ext = os.path.splitext(urlparse(url).path)[1]
            print(url)
            response, content = http.request(url)
            if response.status != 200:
                raise ValueError("HTTP status %d" % response.status)
            with open(search_item + str(i) + ext, "wb") as f:
                f.write(content)
        except Exception as e:
            print("failed to download image:", url, e)


if __name__ == "__main__":
    img_list = getImageUrl("dog", 5)
    print(img_list)
    getImage("dog", img_list)

The code isn't pretty, but I'll share it anyway. I also put it on GitHub.

Conclusion

The Google Custom Search API is convenient, but the free tier is limited to 100 requests per day, and I used 70 of them just testing the script. For real use you have to pay. Since I want to collect images for free, I'll try other methods (such as Twitter).
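The quota arithmetic is simple: with at most 10 results per request, 100 free requests a day yield at most 1,000 image URLs. Below is a minimal sketch for checking whether a planned batch of searches fits in one day's free quota. The function names are hypothetical, and the 100-per-day figure is the one quoted above, which Google may change.

```python
import math

FREE_REQUESTS_PER_DAY = 100  # free-tier limit quoted in this post
MAX_NUM = 10                 # num is at most 10 per request


def requests_needed(total_num):
    """Number of API requests needed to fetch total_num results."""
    return math.ceil(total_num / MAX_NUM)


def fits_free_tier(image_counts):
    """image_counts: how many images you want for each search term."""
    return sum(requests_needed(n) for n in image_counts) <= FREE_REQUESTS_PER_DAY
```

For instance, 100 images for each of 10 search terms needs exactly 100 requests and just fits; an 11th term does not.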

Update (2016/11/24): I found a good way to collect images: http://d.hatena.ne.jp/shi3z/20160309/1457480722. I modified the Python script from that post to support Python 3 and put it on GitHub.

Recommended Posts

Scraping google search (image)
Collect machine learning training image data on your own (Google Custom Search API Pikachu)
Collect large numbers of images using Bing's image search API
Image collection by calling Bing Image Search API v5 from Python
[Python3] Google translate google translate without using api
Create an application that just searches using the Google Custom Search API with Python 3.3.1 in Bottle
Category estimation using docomo's image recognition API
Speech transcription procedure using Google Cloud Speech API
Save dog images from Google image search
Image segment using Oxford_iiit_pet on Google Colab
Creating Google Spreadsheet using Python / Google Data API
FX data collection using OANDA REST API
Record custom events using the Shotgun API
[Python] Download original images from Google Image Search
Get image URL using Flickr API in Python
Image collection method
I tried using the Google Cloud Vision API
Get Google Image Search images in original size
Google App Engine Datastore and Search API integration
[Google Cloud Platform] Use Google Cloud API using API Client Library
A story about a Python beginner trying to get Google search results using the API
Upload JPG file using Google Drive API in Python
How to analyze with Google Colaboratory using Kaggle API
Speech transcription procedure using Python and Google Cloud Speech API
Using the National Diet Library Search API in Python
Speech file recognition by Google Speech API v2 using Python
Image segmentation using U-net
Search Twitter using Python
I tried to search videos using Youtube Data API (beginner)
Image recognition with API from zero knowledge using AutoML Vision
Try a similar search for Image Search using the Python SDK [Search]
[Python scraping] I tried google search top10 using Beautifulsoup & selenium
Try to determine food photos using Google Cloud Vision API
Let's publish the super resolution API using Google Cloud Platform
The story of creating a database using the Google Analytics API
Output product information to csv using Rakuten product search API [Python]
Aggregate and analyze product prices using Rakuten Product Search API [Python]
Play with YouTube Data API v3 using Google API Python Client