[Python] [chardet] Automatic detection of character code of file

I looked into whether Python can automatically determine a file's character encoding, and wrote up this note.

It turned out to be easy with a package called chardet.

Usage — chardet 2.3.0 documentation

Example of use

test.py


from chardet.universaldetector import UniversalDetector

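def check_encoding(file_path):
    detector = UniversalDetector()
    # Open in binary mode: the detector works on raw bytes
    with open(file_path, mode='rb') as f:
        for binary in f:
            detector.feed(binary)
            # Stop reading once the detector is confident
            if detector.done:
                break
    # close() finalizes the guess and fills in detector.result
    detector.close()
    print(detector.result)
    print(detector.result['encoding'])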

def main():
    check_encoding('/path/to/sjis.txt')
    check_encoding('/path/to/utf8.txt')

if __name__ == '__main__':
    main()

Output example


$ python test.py
{'encoding': 'CP932', 'confidence': 0.99}
CP932
{'encoding': 'utf-8', 'confidence': 0.99}
utf-8
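
Once the encoding is known, the natural next step is to decode the file with it. The following is a minimal sketch, not part of chardet: `read_text` and its `fallback` parameter are hypothetical names introduced here for illustration. It takes a result dictionary shaped like the output above and falls back to a default when the `'encoding'` value is `None`, which is what chardet reports when it cannot make a guess.

```python
def read_text(file_path, result, fallback='utf-8'):
    """Decode a file using a chardet-style result dict (hypothetical helper)."""
    # 'encoding' is None when detection failed, so fall back to a default
    encoding = result.get('encoding') or fallback
    with open(file_path, mode='rb') as f:
        raw = f.read()
    # errors='replace' avoids an exception if the guess turns out to be wrong
    return raw.decode(encoding, errors='replace')
```

For example, `read_text('/path/to/sjis.txt', detector.result)` would return the file contents as a `str`. Using `errors='replace'` is a design choice for robustness; switch to the default strict mode if a wrong guess should fail loudly instead.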

Note that detection can take some time for a large file. (The `UniversalDetector` loop above stops reading as soon as the detector has reached a decision.)
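
One way to bound the cost on large files is to feed the detector fixed-size chunks and give up after a byte limit. This is only a sketch under assumptions: `detect_encoding`, `chunk_size`, `max_bytes` and the `detector` parameter are names invented for this example, and the default sizes are arbitrary rather than values recommended by chardet. The only chardet API it relies on is `feed()`, `done`, `close()` and `result`, the same ones used in test.py.

```python
def detect_encoding(file_path, chunk_size=4096, max_bytes=1024 * 1024, detector=None):
    """Feed a file to a detector in chunks, reading at most about max_bytes.

    chunk_size and max_bytes are illustrative defaults, not chardet settings.
    """
    if detector is None:
        # Imported lazily so a stand-in detector can be passed in instead
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    bytes_read = 0
    with open(file_path, mode='rb') as f:
        # Stop at the byte limit or as soon as the detector is confident
        while bytes_read < max_bytes and not detector.done:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            detector.feed(chunk)
            bytes_read += len(chunk)
    # close() finalizes the guess based on whatever was fed so far
    detector.close()
    return detector.result
```

Reading fixed-size chunks instead of iterating line by line also avoids loading a huge "line" into memory when the file contains no newlines. The trade-off is that a guess based on a truncated sample may be less reliable, so the reported confidence is worth checking.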

Reference

Encoding judgment in Python - Qiita
Usage — chardet 2.3.0 documentation
