Serial number rename of scraped images

Introduction

This post adds one post-processing step to the scraping described in "Scraping from google images with google-images-download": renaming the downloaded images to serial numbers. It's nothing difficult.

Source code


from google_images_download import google_images_download
import glob
import os
from PIL import Image

Compared with the previous post, the only added line is "from PIL import Image".

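config = {
    "Records": [
        {
            "keywords": "Sho Hirano",
            "limit": 10,
            "no_numbering": True,
            # the key must be spelled "output_directory"; a misspelled key is ignored
            # and google-images-download falls back to its default "downloads" folder
            "output_directory": "downloads",
            "image_directory": "Sho Hirano",
            # raw string so the backslashes in the Windows path are not treated as escapes
            "chromedriver": r"C:\[path to chromedriver]\chromedriver\chromedriver.exe",
        }
    ]
}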

response = google_images_download.googleimagesdownload()
for rc in config["Records"]:
    response.download(rc)
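Because "Records" is a list, several people can be downloaded in one run. The sketch below builds one record per keyword; make_record is a hypothetical helper that only reuses the keys from the config above, "Another Name" is a placeholder keyword, and the chromedriver path stays a placeholder.

```python
def make_record(keyword, limit=10, output_directory="downloads",
                chromedriver=r"C:\[path to chromedriver]\chromedriver\chromedriver.exe"):
    """Hypothetical helper: build one google-images-download argument dict."""
    return {
        "keywords": keyword,
        "limit": limit,
        "no_numbering": True,
        "output_directory": output_directory,
        # store each person's images in a folder named after the keyword
        "image_directory": keyword,
        "chromedriver": chromedriver,
    }


# "Another Name" is only a placeholder for a second keyword
config = {"Records": [make_record(name) for name in ["Sho Hirano", "Another Name"]]}
```

Each record can then be passed to response.download(rc) in the same loop as before.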

gif_imgs = glob.glob(os.path.join("downloads", "*", "*.gif"))
print(f"removing gif files: {len(gif_imgs)} files")
for f in gif_imgs:
    os.remove(f)
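Deleting GIFs removes one known unwanted format; if other formats such as .png or .webp also show up, keeping a whitelist may be simpler. A minimal sketch with pathlib, assuming only .jpg files should survive because the renaming step only looks at *.jpg; the function name remove_non_jpg and the keep tuple are assumptions.

```python
from pathlib import Path


def remove_non_jpg(root, keep=(".jpg",)):
    """Delete every file under root/<keyword>/ whose extension is not in keep.

    Returns the list of removed paths. keep=(".jpg",) is an assumption that
    matches the rename step, which only globs *.jpg.
    """
    removed = []
    for path in Path(root).glob("*/*"):
        # compare lowercased suffixes so "photo.JPG" is kept as well
        if path.is_file() and path.suffix.lower() not in keep:
            path.unlink()
            removed.append(path)
    return removed
```

Calling remove_non_jpg("downloads") would take the place of the GIF-only cleanup.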

Up to this point, the code is the same as last time. Add the following code after it; it renames each image file by using enumerate in a for loop.

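img_dir = os.path.join("downloads", "Sho Hirano")
# glob order is OS-dependent; sort case-insensitively so the numbering is reproducible
files = sorted(glob.glob(os.path.join(img_dir, "*.jpg")), key=str.lower)
for i, old_name in enumerate(files, start=1):
    new_name = os.path.join(img_dir, "shohirano_{0:03d}.jpg".format(i))
    os.rename(old_name, new_name)
    print(old_name + " → " + new_name)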
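Before renaming in place, it can help to compute the old → new mapping first and inspect it as a dry run. A minimal sketch; plan_renames, apply_renames and the prefix parameter are hypothetical names, and the case-insensitive sort is an assumption chosen so the numbering does not depend on the order glob returns.

```python
import glob
import os


def plan_renames(img_dir, prefix, ext=".jpg"):
    """Return [(old_path, new_path), ...] numbering files as prefix_001.jpg, prefix_002.jpg, ..."""
    # case-insensitive sort: glob's order depends on the OS and file system
    files = sorted(glob.glob(os.path.join(img_dir, "*" + ext)), key=str.lower)
    return [
        (old, os.path.join(img_dir, "{0}_{1:03d}{2}".format(prefix, i, ext)))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(plan):
    """Rename files according to plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if old == new:
            continue  # already has its serial-number name
        if os.path.exists(new):
            # os.rename would silently overwrite on POSIX (and fail on Windows), so stop here
            raise FileExistsError(new)
        os.rename(old, new)
        print(old + " → " + new)
```

For example, the list returned by plan_renames(os.path.join("downloads", "Sho Hirano"), "shohirano") can be printed and checked before it is passed to apply_renames. Running it a second time maps every file to its current name, so nothing is renamed twice.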

The printed output is as follows.

C:\[path to image]\280full.jpg → C:\[path to dir]\shohirano_001.jpg
C:\[path to image]\2966d3c610bf42015a1f853410848b5c.jpg → C:\[path to dir]\shohirano_002.jpg
C:\[path to image]\300px-Hirano_Sho-p2.jpg → C:\[]\shohirano_003.jpg
C:\[path to image]\4f3340a3005c32ffcc64728b75b70792.jpg → C:\[path to dir]\shohirano_004.jpg
C:\[path to image]\5o0JZc.jpg → C:\[path to dir]\shohirano_005.jpg
C:\[path to image]\d8ea5b8f0f2ae32dbf5a62c00c8c3c3e.jpg → C:\[path to dir]\shohirano_006.jpg
C:\[path to image]\ed909d1bd55e96e5bab12881b350f28964e30aa3.jpg → C:\[path to dir]\shohirano_007.jpg
C:\[path to image]\MV5BODk2YzAyNGUtNTI0Yi00MDllLWFlNDUtNGJlMjAwMThmM2Q5XkEyXkFqcGdeQXVyNDQxNjcxNQ@@._V1_UY1200_CR565,0,630,1200_AL_.jpg → C:\[path to dir]\shohirano_008.jpg
C:\[path to image]\NEOBK-2298759.jpg → C:\[path to dir]\shohirano_009.jpg
C:\[path to image]\Sho_Hirano-p2.jpg → C:\[path to dir]\shohirano_010.jpg

Resulting files

(Screenshot of the folder with the renamed files)

In this way, you can rename the images to serial numbers. By the way, I chose Sho Hirano; he's a handsome guy. That's all. (Minami Tanaka lost his mind)

File structure

scrapy
├── downloads
│   ├── Sho Hirano
│   │   ├── shohirano_001.jpg
│   │   └── ...
│   └── ...
└── scrapy.ipynb

If you run the notebook with another person's name as the keyword, that person's images also accumulate in downloads.
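When several keywords accumulate in downloads, every folder could be renamed in one pass. A sketch assuming a hypothetical naming rule that drops spaces and lowercases the folder name, which reproduces "shohirano" from "Sho Hirano"; prefix_from_folder and rename_all are illustrative names, not part of the original code.

```python
import os


def prefix_from_folder(folder_name):
    """Hypothetical naming rule: 'Sho Hirano' -> 'shohirano'."""
    return "".join(folder_name.split()).lower()


def rename_all(root="downloads", ext=".jpg"):
    """Number the .jpg files in every root/<keyword>/ folder with that folder's own prefix."""
    with os.scandir(root) as it:
        folders = sorted((e for e in it if e.is_dir()), key=lambda e: e.name.lower())
    for folder in folders:
        prefix = prefix_from_folder(folder.name)
        with os.scandir(folder.path) as it:
            files = sorted(
                (f.path for f in it if f.is_file() and f.name.lower().endswith(ext)),
                key=str.lower,
            )
        for i, old in enumerate(files, start=1):
            new = os.path.join(folder.path, "{0}_{1:03d}{2}".format(prefix, i, ext))
            if old == new:
                continue  # already renamed on a previous run
            if os.path.exists(new):
                raise FileExistsError(new)  # never overwrite an existing image
            os.rename(old, new)
```

Each person keeps an independent sequence (shohirano_001.jpg, anothername_001.jpg, ...), so adding a new keyword does not shift the numbers of existing folders.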

Utilization

If you increase the number of images (the limit value in the config) and save only the cropped faces as files, you can use them as training data, for example for face recognition.
