How to deal with UnicodeDecodeError when executing google image download

0. Introduction

google image download is a convenient tool for collecting training images for image recognition with TensorFlow and similar frameworks.

As of this writing (2020/06), the google image download that pip installs cannot collect images correctly, so I obtained a patched version instead. However, running the patched Python file raised a UnicodeDecodeError. This memo records how I dealt with it.

Although it is outside the scope of this post, you also need a separate ChromeDriver to collect 101 or more images with google image download.

1. Environment

macOS Catalina 10.15.3
Python 3.5.3
pip 20.1.1

2. Install with pip

You can install google image download with pip.

pip install google_images_download

However, as of this writing (2020/06), running it does not collect any images: the search below downloads 0 of 100.

~ $ ./google_images_download/google_images_download.py --keywords "cat"

Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...


Unfortunately all 100 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0


Everything downloaded!
Total errors: 0
Total time taken: 1.4127511978149414 Seconds

3. Fixed version installation

A patched version of google image download is published at the URL below. Get it with git clone (the commands in this post assume it was cloned into a directory named gid-joeclinton).

https://github.com/Joeclinton1/google-images-download/tree/patch-1

Running the downloaded Python file produced the following error.

~ $ python ./gid-joeclinton/google_images_download/google_images_download.py -k cat

Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...
Traceback (most recent call last):
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 1019, in <module>
    main()
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 1008, in main
    paths,errors = response.download(arguments)  #wrapping response in a variable just for consistency
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 844, in download
    paths, errors = self.download_executor(arguments)
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 962, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments)    #get all image items and download images
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 765, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "./gid-joeclinton/google_images_download/google_images_download.py", line 754, in _get_image_objects
    object_decode = bytes(object_raw, "utf-8").decode("unicode_escape")
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 123085: \ at end of string
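
The last line of the traceback says that the text handed to the unicode_escape codec ends with a lone backslash (byte 0x5c). The following is a minimal sketch that reproduces the same kind of error outside the library. The string `raw` is a made-up stand-in for the page data, not the actual content that Google returned.

```python
# Hypothetical input: ordinary text that ends with a single backslash.
raw = "abc\\"  # the four characters a, b, c and \

try:
    # Same call pattern as line 754 of google_images_download.py.
    bytes(raw, "utf-8").decode("unicode_escape")
    message = None
except UnicodeDecodeError as exc:
    # The codec cannot finish an escape sequence that has nothing after "\".
    message = str(exc)

print(message)
```

This only shows what the error message means. It does not explain why the extracted page text ended in a backslash in the first place.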

4. UnicodeDecodeError workaround

I modified google_images_download.py, passing "ignore" as the errors argument of bytes.decode() in _get_image_objects.

    # Getting all links with the help of '_images_get_next_image'
    def _get_image_objects(self,s):
        start_line = s.find("AF_initDataCallback({key: \\'ds:1\\'") - 10
        start_object = s.find('[', start_line + 1)
        end_object = s.find('</script>', start_object + 1) - 4
        object_raw = str(s[start_object:end_object])
        object_decode = bytes(object_raw, "utf-8").decode("unicode_escape","ignore")
        image_objects = json.loads(object_decode)[31][0][12][2]
        image_objects = [x for x in image_objects if x[0]==1]
        return image_objects
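
To see what the "ignore" argument changes, here is a small sketch on a made-up string (the value of `raw` is an assumption for illustration, not real page data). Valid escape sequences are still decoded, and the bytes that the codec cannot decode are silently dropped instead of raising an exception.

```python
# Hypothetical input: a valid \u00e9 escape, a space, then a dangling backslash.
raw = "caf\\u00e9 \\"

# With errors="ignore", the undecodable trailing backslash is skipped.
decoded = bytes(raw, "utf-8").decode("unicode_escape", "ignore")

print(repr(decoded))
```

Because the dropped bytes are lost without any warning, this is a workaround rather than a real fix. If the discarded part were needed to keep the JSON well formed, json.loads could still fail afterwards.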

Some images still cannot be retrieved because of URLError and similar errors, but images are now collected normally.

$ python ./gid-joeclinton/google_images_download/google_images_download.py -k cat

Item no.: 1 --> Item name = cat
Evaluating...
Starting Download...
Completed Image ====> 1.XXXX.png
Completed Image ====> 2.XXXX.jpg

~ output omitted ~

Unfortunately all 100 could not be downloaded because some images were not downloadable. 65 is all we got for this search filter!

Errors: 35


Everything downloaded!
Total errors: 35
Total time taken: 173.5407509803772 Seconds

5. Conclusion

I only suppressed the error, but since I was able to collect the images I wanted, I did not investigate the cause any further. When I have time, I would like to find the root cause and apply a proper fix.
