Check the character encoding of every file under a directory with Python and output the results

Overview

For every file under [DIR_NAME], check whether it is a text file in one of the character encodings listed in [TARGET_ENCODING_LIST], and write the result to the file named by [OUTPUT_NAME]. A file whose encoding cannot be determined is reported as binary.

Environment

Windows 8 + Python 2.6 series

Code

`check_encoding.py`


#!/usr/bin/python
# -*- coding: utf-8 -*-
# vim: fileencoding=utf-8

import os
import sys

DIR_NAME = 'C:\\Program Files\\'
OUTPUT_NAME = 'result_file_encoding_list.txt'

# Encodings are tried in this order; the first one that decodes wins.
TARGET_ENCODING_LIST = [
	'utf-8',
	'shift-jis',
	'euc-jp',
	'iso2022-jp'
]

# Set to False to stop echoing each result line to stdout.
FLAG_STDOUT = True
#FLAG_STDOUT = False

write = sys.stdout.write

def guess_charset(data):
	"""Return the first encoding that decodes data without error, else 'binary'."""
	for enc in TARGET_ENCODING_LIST:
		try:
			data.decode(enc)
			return enc
		except UnicodeDecodeError:
			pass
	return 'binary'

with open(OUTPUT_NAME, 'w') as out:
	for dirpath, dirs, files in os.walk(DIR_NAME):
		for fn in files:
			path = os.path.join(dirpath, fn)
			try:
				# Read raw bytes so that newline translation cannot alter the data.
				with open(path, 'rb') as fobj:
					data = fobj.read()
			except (IOError, OSError):
				# Skip files that cannot be opened or read.
				continue
			line = path + ',' + guess_charset(data) + '\n'
			try:
				if FLAG_STDOUT:
					write(line)
				out.write(line)
			except (IOError, UnicodeError):
				# Skip lines that cannot be written.
				continue
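
Because the first encoding that decodes without error wins, the order of TARGET_ENCODING_LIST affects what is reported. The following is a minimal sketch of that behaviour on in-memory bytes; it is not part of the script above. `first_decodable` and the sample byte strings are hypothetical names introduced only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: first_decodable is a hypothetical helper that mirrors guess_charset
# but takes the encoding list as a parameter.

def first_decodable(data, encodings):
    """Return the first encoding that decodes data without error, else 'binary'."""
    for enc in encodings:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            pass
    return 'binary'

ORDER = ['utf-8', 'shift-jis', 'euc-jp', 'iso2022-jp']

utf8_bytes = u'\u65e5\u672c\u8a9e'.encode('utf-8')       # "日本語" as UTF-8
jis_bytes = u'\u65e5\u672c\u8a9e'.encode('iso2022-jp')   # the same text as ISO-2022-JP
not_text = b'\xff\x00\xff'                                # decodes in none of the four

print(first_decodable(utf8_bytes, ORDER))                   # utf-8
print(first_decodable(jis_bytes, ORDER))                    # utf-8
print(first_decodable(not_text, ORDER))                     # binary
print(first_decodable(jis_bytes, ['iso2022-jp'] + ORDER))   # iso2022-jp
```

ISO-2022-JP uses only 7-bit bytes, so such data also decodes as UTF-8 and is reported as utf-8 when utf-8 is tried first. Moving iso2022-jp to the front would instead label plain ASCII files as iso2022-jp, so neither order removes the ambiguity.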

Supplement

The exception handling is rough: most errors are simply skipped. File names that contain Japanese characters will appear garbled in the output.
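
One possible way to reduce garbled file names, assuming the cause is that byte-string paths are written without an explicit encoding, is to walk the directory with a unicode path and write the output as UTF-8. The sketch below shows only the path-listing part; `write_path_list` is a hypothetical helper, not part of the script above, and it has not been verified on the Windows 8 + Python 2.6 environment described here.

```python
# -*- coding: utf-8 -*-
import io
import os
import sys

def write_path_list(dir_name, output_name):
    """Write every file path under dir_name to output_name as UTF-8, one per line.

    Returns the number of paths written. Hypothetical helper for illustration.
    """
    # On Python 2, a byte-string directory makes os.walk yield byte-string paths;
    # decoding it first makes os.walk yield unicode paths instead.
    if isinstance(dir_name, bytes):
        dir_name = dir_name.decode(sys.getfilesystemencoding())
    count = 0
    # io.open with an explicit encoding is available on Python 2.6+ and Python 3.
    with io.open(output_name, 'w', encoding='utf-8') as out:
        for dirpath, dirs, files in os.walk(dir_name):
            for fn in files:
                out.write(os.path.join(dirpath, fn) + u'\n')
                count += 1
    return count
```

For example, `write_path_list(u'.', 'paths.txt')` lists the current directory. The resulting file should be opened as UTF-8; echoing the same paths to a console can still garble them if the console uses a different encoding.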
