Automatically detect and handle the encoding of a text file

To find out the encoding of a text file, it seems the way to go is to try decoding its contents with each candidate encoding in turn and use the first one that decodes successfully.

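```python
def conv_encoding(data):
    """Decode bytes by trying each candidate encoding in order.

    Returns a tuple of (decoded text, name of the encoding that worked).
    """
    # Order matters: the first encoding that decodes without error wins.
    # latin_1 accepts any byte sequence, so it acts as a catch-all and
    # the entries after it are never reached.
    lookup = ('utf_8', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213',
              'shift_jis', 'shift_jis_2004', 'shift_jisx0213',
              'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_3',
              'iso2022_jp_ext', 'latin_1', 'ascii')
    for encoding in lookup:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise LookupError('no candidate encoding could decode the data')

# Read the file and detect its encoding.
# Python 3: open in binary mode so that decoding is done explicitly.
# path: path to the target file (assumed to be defined by the caller)
with open(path, 'rb') as fp:
    text, encoding = conv_encoding(fp.read())

# Edit content
# ...<Arbitrary code>

# Write the file back in its original encoding
with open(path, 'wb') as fp:
    fp.write(text.encode(encoding))
```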
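One caveat with this trial-decoding approach is that the order of the candidates affects the result. The following is a minimal sketch under my own assumptions, not part of the original code: the helper `first_successful_decode` and the two candidate tuples are hypothetical names introduced only for illustration. It shows that EUC-JP and Shift_JIS bytes are identified correctly when `utf_8` is tried first, while ISO-2022-JP bytes, which are 7-bit only, are accepted by the UTF-8 decoder and come back as raw escape sequences.

```python
def first_successful_decode(data, candidates):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper for illustration; it mirrors the trial-decoding idea.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise LookupError("no candidate encoding could decode the data")


sample = "こんにちは"

# Two candidate orders (assumed for this sketch)
utf8_first = ("utf_8", "euc_jp", "shift_jis", "iso2022_jp")
iso_first = ("iso2022_jp", "utf_8", "euc_jp", "shift_jis")

# EUC-JP and Shift_JIS bytes are not valid UTF-8, so utf_8 fails
# and the matching encoding is found further down the list.
euc_text, euc_name = first_successful_decode(sample.encode("euc_jp"), utf8_first)
sjis_text, sjis_name = first_successful_decode(sample.encode("shift_jis"), utf8_first)

# ISO-2022-JP uses only 7-bit bytes and escape sequences, so the UTF-8
# decoder accepts it and returns the escape sequences as plain text.
jis_bytes = sample.encode("iso2022_jp")
wrong_text, wrong_name = first_successful_decode(jis_bytes, utf8_first)
right_text, right_name = first_successful_decode(jis_bytes, iso_first)

print(euc_name, sjis_name, wrong_name, right_name)
```

In this sketch, trying `iso2022_jp` before `utf_8` recovers the original text, at the cost that a plain ASCII file is then reported as `iso2022_jp`.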
