google_images_download is a convenient tool for collecting training images for image recognition with TensorFlow and similar frameworks.
As of June 2020, the version of google_images_download installed with pip no longer collects images correctly, so I switched to a patched fork. Running the fork's Python file, however, raised a UnicodeDecodeError, so this memo records how I worked around it.
Not covered here: to collect more than 100 images with google_images_download, you also need to set up chromedriver separately.
Environment: macOS Catalina 10.15.3, Python 3.5.3, pip 20.1.1
You can install google image download with pip.
pip install google_images_download
However, as of June 2020, running it does not download any images:
~ $ ./google_images_download/google_images_download.py --keywords "cat"
Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...
Unfortunately all 100 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Errors: 0
Everything downloaded!
Total errors: 0
Total time taken: 1.4127511978149414 Seconds
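A patched fork of google_images_download is published at the URL below. Get its patch-1 branch with git clone (for example, git clone -b patch-1 https://github.com/Joeclinton1/google-images-download.git gid-joeclinton).
https://github.com/Joeclinton1/google-images-download/tree/patch-1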
When I executed the obtained python file, I got the following error.
~ $ python ./gid-joeclinton/google_images_download/google_images_download.py -k cat
Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...
Traceback (most recent call last):
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 1019, in <module>
    main()
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 1008, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 844, in download
    paths, errors = self.download_executor(arguments)
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 962, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 765, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 754, in _get_image_objects
    object_decode = bytes(object_raw, "utf-8").decode("unicode_escape")
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 123085: \ at end of string
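The message says that the bytes handed to the unicode_escape decoder end with a lone backslash (0x5c), so the final escape sequence is incomplete. The same exception can be reproduced in isolation with a short string. This is only a minimal sketch: the variable names and the sample string are made up for illustration and do not come from google_images_download.py.

```python
# Minimal reproduction of the error: the input ends with a lone backslash,
# so the unicode_escape decoder finds an incomplete escape sequence.
raw = "abc\\"                 # four characters: a, b, c and one backslash
data = bytes(raw, "utf-8")    # same conversion as in _get_image_objects

error = None
try:
    data.decode("unicode_escape")   # strict error handling is the default
except UnicodeDecodeError as exc:
    error = exc
    # exc.start is the byte offset of the offending backslash
    print("decode failed at byte {}: {}".format(exc.start, exc.reason))
```

In the traceback above, the corresponding offset is 123085, which is where the decoder ran out of input.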
I fixed google_images_download.py by passing "ignore" as the errors argument of bytes.decode() in _get_image_objects.
    # Getting all links with the help of '_images_get_next_image'
    def _get_image_objects(self,s):
        start_line = s.find("AF_initDataCallback({key: \\'ds:1\\'") - 10
        start_object = s.find('[', start_line + 1)
        end_object = s.find('</script>', start_object + 1) - 4
        object_raw = str(s[start_object:end_object])
        object_decode = bytes(object_raw, "utf-8").decode("unicode_escape","ignore")
        image_objects = json.loads(object_decode)[31][0][12][2]
        image_objects = [x for x in image_objects if x[0]==1]
        return image_objects
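Because "ignore" silently drops whatever the decoder cannot handle, it can be useful to see where decoding fails before falling back. The following is a rough sketch under stated assumptions: decode_escaped is a hypothetical helper that is not part of google_images_download, and the context parameter is likewise invented for illustration. It tries a strict decode first, prints the failing offset with nearby bytes, and only then retries with "ignore". It avoids f-strings so that it also runs on Python 3.5.

```python
def decode_escaped(text, context=20):
    """Decode backslash escapes in text, reporting where strict decoding fails.

    Hypothetical helper for illustration; not part of google_images_download.
    """
    data = bytes(text, "utf-8")
    try:
        # Strict decoding: raises UnicodeDecodeError on a malformed escape
        return data.decode("unicode_escape")
    except UnicodeDecodeError as exc:
        # Show the failing offset and a few surrounding bytes for diagnosis
        lo = max(exc.start - context, 0)
        snippet = data[lo:exc.end + context]
        print("unicode_escape failed at byte {}: {} near {!r}".format(
            exc.start, exc.reason, snippet))
        # Fall back to the workaround: drop the part that cannot be decoded
        return data.decode("unicode_escape", "ignore")


well_formed = decode_escaped("a\\u0041b")   # valid escape, decoded strictly
truncated = decode_escaped("abc\\")         # trailing backslash, falls back
print(repr(well_formed), repr(truncated))
```

If something like this were used in place of the one-line decode, the printed snippet would show what surrounds position 123085 in the page data, which is a starting point for finding the real cause.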
Some images still cannot be downloaded because of URLError and similar errors, but image collection now works.
$ python ./gid-joeclinton/google_images_download/google_images_download.py -k cat
Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...
Completed Image ====> 1.XXXX.png
Completed Image ====> 2.XXXX.jpg
~ (output abridged) ~
Unfortunately all 100 could not be downloaded because some images were not downloadable. 65 is all we got for this search filter!
Errors: 35
Everything downloaded!
Total errors: 35
Total time taken: 173.5407509803772 Seconds
This fix only suppresses the error rather than resolving its cause, but since I was able to collect the images I wanted, I stopped there. When I have time, I would like to investigate the root cause and fix it properly.