Recursively unzip zip files with python

Added on June 1, 2017: Corrected based on comments from @kota9 and @pashango2.

Introduction

This is my first time writing an article. By way of background: the data I downloaded for my research was a zip file that contained more zip files, which in turn contained zip files again, so I wrote a script that expands them all automatically. For beginners, I explain the code in the order I worked it out. If you are short on time, just look at the final code.

Commentary

1. Aim with this script

- Extract zip files recursively
- Accept a directory as the argument, not only a zip file
- Automatically delete each zip file after it has been extracted

2. Process flow

Here is the general flow of processing.

(1) Check if the command line argument is a zip file or a directory
(2) Exception handling
(3) If it is a zip file
  (3.1) Extract the specified zip file
  (3.2) Delete the zip file after extracting
(4) If it is a directory
  (4.1) Perform (3) for the zip files in the directory
  (4.2) Repeat (4) for the directories in the directory

Step (4) is a little hard to follow, but the idea is: extract the zip files first, and if an entry is a directory, search it again for zip files. Because extracting a zip file may produce a new directory, I extract the zip files in a directory before searching its subdirectories, so that any newly created directories are searched too (see below).

3. Actual code

(1) Check if the command line argument is a zip file or a directory

The script is intended to be run in one of two ways:

$ python expand_zip.py ZIP_FILE_NAME

or

$ python expand_zip.py DIR_NAME

So the processing has to branch depending on which kind of argument was given.
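The branching could also be factored into a small helper. This is only an illustrative sketch: the name `classify_target` and its return values are hypothetical, and unlike the article's script (which treats anything that is not a directory as a zip file), it uses `zipfile.is_zipfile`, which inspects the file's contents rather than its extension.

```python
import os
import zipfile


def classify_target(path):
    """Hypothetical helper: decide how the command-line argument should be handled."""
    if os.path.isdir(path):
        return "dir"  # goes to the directory branch
    if os.path.isfile(path) and zipfile.is_zipfile(path):
        return "zip"  # goes to the zip-file branch
    return "other"  # neither: the caller should print a usage/error message
```

With this, the main block would dispatch on the returned string and report anything classified as "other" instead of trying to open it.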

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys


if __name__ == "__main__":
    args = sys.argv
    if os.path.isdir(args[1]):
        # A directory was given
        pass
    else:
        # A zip file was given
        pass

(2) Exception handling

Since the script accepts input from the user, it needs exception handling. I am not very used to this either, so please point out any mistakes.

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            # A directory was given
            pass
        else:
            # A zip file was given
            pass
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])
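
One detail worth checking: in Python 3, `IOError` is simply an alias of `OSError`, so `except IOError` also catches `FileNotFoundError`. A file that exists but is not a valid zip archive, however, makes `zipfile.ZipFile` raise `zipfile.BadZipFile`, which neither handler above catches. The sketch below, assuming Python 3, shows what an extra handler could look like; the function names `open_error_message` and `demo` are hypothetical.

```python
import os
import tempfile
import zipfile


def open_error_message(path):
    """Hypothetical helper: try to open path as a zip and describe any failure."""
    try:
        with zipfile.ZipFile(path, "r"):
            return None  # opened without problems
    except zipfile.BadZipFile:
        # Not an OSError subclass, so `except IOError` would not catch it.
        return 'BadZipFile: "%s" is not a zip file' % path
    except IOError:  # the same class as OSError in Python 3
        return 'IOError: Couldn\'t open "%s"' % path


def demo():
    """Trigger both failure modes inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        not_zip = os.path.join(tmp, "notes.zip")
        with open(not_zip, "w") as f:
            f.write("plain text, not a zip archive")
        missing = os.path.join(tmp, "missing.zip")
        return open_error_message(not_zip), open_error_message(missing)
```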

(3) If it is a zip file

(3.1) Extract the specified zip file

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys
import zipfile


def unzip(filename):
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            # A directory was given (handled in step (4))
            pass
        else:
            unzip(args[1])
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])

zf.extractall(path) extracts every member of the zip file into path. Here, the files are extracted into the directory where the zip file itself is located.
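To see where `extractall(path=os.path.dirname(filename))` puts the files, here is a small self-contained check that builds a throwaway archive in a temporary directory. The names `sample.zip` and `data/hello.txt` are made up for the example; member paths inside the archive, such as `data/`, are recreated under the target directory.

```python
import os
import tempfile
import zipfile


def unzip(filename):
    # Same call as in the article: extract next to the zip file itself.
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))


def demo():
    """Build sample.zip in a temporary directory, extract it, and read the result."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sample.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("data/hello.txt", "hello")
        unzip(archive)
        extracted = os.path.join(tmp, "data", "hello.txt")
        with open(extracted) as f:
            # The archive is still there because this version does not delete it yet.
            return os.path.exists(archive), f.read()
```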

(3.2) Delete the zip file after extracting

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys
import zipfile


def unzip(filename):
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    delete_zip(filename)


def delete_zip(zip_file):
    os.remove(zip_file)


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            # A directory was given (handled in step (4))
            pass
        else:
            unzip(args[1])
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])
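
Because `delete_zip` is called only after the `with` block finishes, a zip file that fails to extract is never deleted: the exception propagates before `os.remove` runs. A quick sketch to confirm this, using a valid archive and a deliberately broken file (the names `good.zip` and `broken.zip` are made up):

```python
import os
import tempfile
import zipfile


def unzip(filename):
    # As in the article: extract, then delete only if extraction succeeded.
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    os.remove(filename)


def demo():
    """Return (good.zip still exists?, broken.zip still exists?) after unzip()."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.zip")
        with zipfile.ZipFile(good, "w") as zf:
            zf.writestr("a.txt", "a")
        broken = os.path.join(tmp, "broken.zip")
        with open(broken, "w") as f:
            f.write("not really a zip")

        unzip(good)
        try:
            unzip(broken)
        except zipfile.BadZipFile:
            pass  # extraction failed, so os.remove was never reached
        return os.path.exists(good), os.path.exists(broken)
```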

(4) If it is a directory

(4.1) Perform (3) for the zip file in the directory

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys
import zipfile
import glob


def unzip(filename):
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    delete_zip(filename)


def delete_zip(zip_file):
    os.remove(zip_file)


def walk_in_dir(dir_path):
    for filename in glob.glob(os.path.join(dir_path, "*.zip")):
        unzip(filename)  # glob already returns paths that include dir_path


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            walk_in_dir(args[1])
        else:
            unzip(args[1])
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])

~~In the for statement, os.listdir(dir_path) gets all the files and directories in dir_path, then if os.path.isfile(os.path.join(dir_path, f)) keeps only the files among them, and finally if u".zip" in f keeps the ones with the .zip extension.~~

Added on June 1, 2017: Following @pashango2's comment, I changed

for filename in (f for f in os.listdir(dir_path) if os.path.isfile(os.path.join(dir_path, f)) if u".zip" in f):

to

for filename in glob.glob(os.path.join(dir_path, "*.zip")):

End of addendum.
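A detail to keep in mind with `glob.glob`: unlike `os.listdir`, it returns paths that already include `dir_path`, so its results should not be joined with `dir_path` a second time. With a relative `dir_path`, `os.path.join(dir_path, filename)` would double the prefix and point at a file that does not exist (with an absolute `dir_path` it happens to be harmless, because `os.path.join` discards earlier parts when a later part is absolute). The sketch below demonstrates this in a temporary working directory; the folder name `data` is just an example.

```python
import glob
import os
import tempfile


def demo():
    """Compare glob results with and without an extra os.path.join."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            os.mkdir("data")
            open(os.path.join("data", "a.zip"), "w").close()  # empty placeholder file

            found = glob.glob(os.path.join("data", "*.zip"))
            doubled = [os.path.join("data", f) for f in found]
            return found, doubled, [os.path.exists(p) for p in doubled]
        finally:
            os.chdir(old_cwd)  # leave the directory before it is cleaned up
```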

(4.2) Repeat (4) for the directories in the directory

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys
import zipfile
import glob


def unzip(filename):
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    delete_zip(filename)


def delete_zip(zip_file):
    os.remove(zip_file)


def walk_in_dir(dir_path):
    for filename in glob.glob(os.path.join(dir_path, "*.zip")):
        unzip(filename)  # glob already returns paths that include dir_path

    for dirname in (d for d in os.listdir(dir_path) if os.path.isdir(os.path.join(dir_path, d))):
        walk_in_dir(os.path.join(dir_path, dirname))


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            walk_in_dir(args[1])
        else:
            unzip(args[1])
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])

The zip files in a directory are extracted first, and only then are all of its subdirectories processed recursively. This order matters because extracting a zip file may produce a new directory, and processing the zip files first ensures that directory is searched as well.

Also, with this code, when the command-line argument is a zip file, it is only extracted in (3) and the script ends there, so the extracted result also has to be made a target of the recursion.
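One case the order described above does not cover: if a zip file has another zip file at its top level, that inner zip is extracted into the same directory after `glob.glob` has already built its list, so it stays unextracted (it is a file, not a directory, so the recursion does not pick it up either). A possible variant, not the author's code, is to repeat the glob in the same directory until no zip files remain. The sketch assumes trusted, well-formed archives; `outer.zip`, `inner.zip`, and `hello.txt` are made-up fixture names.

```python
import glob
import io
import os
import tempfile
import zipfile


def unzip(filename):
    # As in the article: extract next to the zip file, then delete it.
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    os.remove(filename)


def walk_in_dir(dir_path):
    """Variant: re-scan dir_path until extraction stops producing new zip files."""
    pattern = os.path.join(dir_path, "*.zip")
    zips = glob.glob(pattern)
    while zips:
        for filename in zips:
            unzip(filename)
        zips = glob.glob(pattern)  # pick up zips that were just extracted here

    # Then recurse into the subdirectories, as in the original.
    for name in os.listdir(dir_path):
        sub = os.path.join(dir_path, name)
        if os.path.isdir(sub):
            walk_in_dir(sub)


def demo():
    """outer.zip holds inner.zip at its top level; inner.zip holds hello.txt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as inner:
        inner.writestr("hello.txt", "hi")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(os.path.join(tmp, "outer.zip"), "w") as outer:
            outer.writestr("inner.zip", buf.getvalue())
        walk_in_dir(tmp)
        return sorted(os.listdir(tmp))
```

With a single glob pass, the same fixture would end with `inner.zip` still in place; the loop keeps going until only `hello.txt` is left.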

So the final code is as follows.

expand_zip.py


# -*- coding: utf-8 -*-
import os
import sys
import zipfile
import glob


def unzip(filename):
    with zipfile.ZipFile(filename, "r") as zf:
        zf.extractall(path=os.path.dirname(filename))
    delete_zip(filename)


def delete_zip(zip_file):
    os.remove(zip_file)


def walk_in_dir(dir_path):
    # Extract the zip files here first, since they may produce new directories
    for filename in glob.glob(os.path.join(dir_path, "*.zip")):
        unzip(filename)  # glob already returns paths that include dir_path

    # Then recurse into every subdirectory
    for dirname in (d for d in os.listdir(dir_path) if os.path.isdir(os.path.join(dir_path, d))):
        walk_in_dir(os.path.join(dir_path, dirname))


if __name__ == "__main__":
    args = sys.argv
    try:
        if os.path.isdir(args[1]):
            walk_in_dir(args[1])
        else:
            unzip(args[1])
            # Assumes the archive unpacks into a directory named after it
            name, _ = os.path.splitext(args[1])
            if os.path.isdir(name):
                walk_in_dir(name)
    except IndexError:
        print('IndexError: Usage "python %s ZIPFILE_NAME" or "python %s DIR_NAME"' % (args[0], args[0]))
    except IOError:
        print('IOError: Couldn\'t open "%s"' % args[1])

Summary

This was my first post, and writing it was a good opportunity to organize what I know, so I would like to keep posting in the future. If you find any mistakes in this article, please let me know.
