Handle zip files with Japanese filenames in Python 3

Although recent versions of the zip specification allow filenames to be stored in UTF-8, many archives still store them in a legacy, environment-dependent encoding. For Japanese, this is usually Shift-JIS (cp932), following Windows.

In Python 2, the zipfile module returned filenames as byte strings, so a cp932 filename came back as the raw cp932 bytes. In Python 3, strings are unified as Unicode, so zipfile decodes each filename when it reads the archive and returns it as a str. Naturally, the default behavior does not follow Japanese conventions, so cp932 filenames come back garbled.
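The garbling can be reproduced without any zip file at all. The following is a minimal sketch that simulates the effect; the file name "日本語.txt" is a hypothetical example, and the snippet assumes the archive tool stored the name as cp932 bytes without setting the UTF-8 flag.

```python
# Simulate what happens to a legacy cp932 file name: the raw bytes stored
# in the archive end up being decoded with a codec other than cp932.
original = "日本語.txt"            # hypothetical file name, for illustration only
raw = original.encode("cp932")     # bytes as a Windows tool would typically store them
garbled = raw.decode("cp437")      # decoded with the historical ZIP codec instead

print(garbled)                     # mojibake, not the original name
print(garbled == original)         # False
```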

Reading the source of the zipfile module in Python 3.4, the relevant part looks like this:

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

Couldn't decoding a cp932-encoded byte string as cp437 raise a UnicodeDecodeError?

>>> len(bytes(range(256)).decode('cp437'))
256

cp437 appears to decode every byte value to exactly one character, so decoding never fails. That means the original bytes can be recovered by re-encoding the name with cp437, and then decoded correctly as cp932.
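The length check alone does not show that the mapping is reversible. As a small additional check, the sketch below confirms that all 256 bytes decode to distinct characters and that re-encoding gives back the same bytes; the variable names are only for illustration.

```python
# Check that cp437 is a lossless one-to-one mapping over all byte values.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("cp437")

# Every byte maps to a different character ...
distinct = len(set(as_text)) == 256
# ... and encoding the text again restores the original bytes.
round_trip = as_text.encode("cp437") == all_bytes

print(distinct, round_trip)
```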

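import zipfile

# Open the archive and undo zipfile's cp437 decoding for each name
with zipfile.ZipFile('foo.zip') as zf:
    for name in zf.namelist():
        # Recover the raw bytes via cp437, then decode them as cp932
        print(name.encode('cp437').decode('cp932'))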
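One caveat with the loop above is that it also re-decodes names that were stored as UTF-8 (flag 0x800 set), which would either fail or garble them. A slightly more defensive sketch is shown here; `fix_name` and `list_names` are hypothetical helper names, and falling back to the unmodified name on a decoding error is an assumption about the desired behavior, not something zipfile prescribes.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def fix_name(name, flag_bits, encoding="cp932"):
    """Return the name re-decoded as `encoding` unless it was stored as UTF-8."""
    if flag_bits & UTF8_FLAG:
        # zipfile already decoded this name correctly as UTF-8.
        return name
    try:
        # Undo the cp437 decoding, then decode the raw bytes as cp932.
        return name.encode("cp437").decode(encoding)
    except UnicodeError:
        # Not valid in the assumed encoding: keep the name zipfile gave us.
        return name


def list_names(file, encoding="cp932"):
    """List the member names of a zip archive with legacy names repaired."""
    with zipfile.ZipFile(file) as zf:
        return [fix_name(info.filename, info.flag_bits, encoding)
                for info in zf.infolist()]
```

Calling `list_names('foo.zip')` would then return the repaired names. On Python 3.11 and later, `zipfile.ZipFile` also accepts a `metadata_encoding` argument for reading, which may make the manual round trip unnecessary.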
