[Python] [chardet] Automatically detecting a file's character encoding

I looked into whether Python can automatically detect the character encoding of a file, and wrote down what I found.

It was easy to do with a package called chardet.

Usage — chardet 2.3.0 documentation

Example of use

`test.py`


from chardet.universaldetector import UniversalDetector

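def check_encoding(file_path):
    detector = UniversalDetector()
    # Open in binary mode: chardet works on raw, undecoded bytes.
    with open(file_path, mode='rb') as f:
        for binary in f:
            detector.feed(binary)
            # Stop reading once the detector has reached a decision.
            if detector.done:
                break
    detector.close()
    # Print each value on its own line, matching the output example.
    print(detector.result)
    print(detector.result['encoding'])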

def main():
    check_encoding('/path/to/sjis.txt')
    check_encoding('/path/to/utf8.txt')

if __name__ == '__main__':
    main()

`Output example`


$ python test.py
{'encoding': 'CP932', 'confidence': 0.99}
CP932
{'encoding': 'utf-8', 'confidence': 0.99}
utf-8
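
Once the encoding is known, a natural next step is to decode the file with it. The following is only a minimal sketch: `read_text_auto` and its `fallback` parameter are hypothetical names that do not appear in the original article, and it uses the module-level `chardet.detect()` function, which reads the whole file into memory rather than feeding it incrementally.

```python
import chardet


def read_text_auto(file_path, fallback='utf-8'):
    """Decode a file using the encoding guessed by chardet (hypothetical helper)."""
    with open(file_path, mode='rb') as f:
        raw = f.read()
    guess = chardet.detect(raw)
    # chardet reports None when it cannot guess (for example, an empty file),
    # so fall back to an assumed default in that case.
    encoding = guess['encoding'] or fallback
    # errors='replace' keeps a wrong guess from raising UnicodeDecodeError.
    return raw.decode(encoding, errors='replace'), encoding
```

The detected encoding is only a guess, so it is worth checking the `confidence` value before trusting the decoded text.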

Note that detection may take some time for a large file. (The `UniversalDetector` used above appears to stop as soon as it has reached a decision.)
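
One way to bound the time spent on a large file is to feed the detector fixed-size chunks and stop after a byte limit. This is a sketch under assumptions, not something from the original article: `check_encoding_limited`, `max_bytes`, and `chunk_size` are hypothetical names, and the default sizes are arbitrary.

```python
from chardet.universaldetector import UniversalDetector


def check_encoding_limited(file_path, max_bytes=1024 * 1024, chunk_size=64 * 1024):
    """Guess the encoding from at most max_bytes of the file (hypothetical helper)."""
    detector = UniversalDetector()
    bytes_read = 0
    with open(file_path, mode='rb') as f:
        # Stop when the detector is done or the byte budget is used up.
        while bytes_read < max_bytes and not detector.done:
            chunk = f.read(min(chunk_size, max_bytes - bytes_read))
            if not chunk:
                break  # end of file
            detector.feed(chunk)
            bytes_read += len(chunk)
    detector.close()
    return detector.result
```

Because only the beginning of the file is examined, the result can be less reliable when the distinguishing bytes appear later in the file.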

References

Encoding judgment in Python - Qiita

Usage — chardet 2.3.0 documentation
