Check the character encoding of every file under a directory with Python and output the results

Overview

For every file under [DIR_NAME], the script checks whether it is a text file in one of the encodings listed in [TARGET_ENCODING_LIST] and writes the result to the file named by [OUTPUT_NAME]. A file that matches none of the listed encodings is reported as binary.

Environment

Windows 8 + Python 2.6.x

Code

check_encoding.py


#!/usr/bin/python
# -*- coding: utf-8 -*-
# vim: fileencoding=utf-8

import os
import sys

DIR_NAME = 'C:\\Program Files\\'
OUTPUT_NAME = 'result_file_encoding_list.txt'

TARGET_ENCODING_LIST = [
	'utf-8',
	'shift-jis',
	'euc-jp',
	'iso2022-jp'
]

FLAG_STDOUT = True
#FLAG_STDOUT = False

write = sys.stdout.write

def guess_charset(data):
	# Try each candidate in order; the first one that decodes cleanly wins.
	for enc in TARGET_ENCODING_LIST:
		try:
			data.decode(enc)
			return enc
		except UnicodeDecodeError:
			# Not valid in this encoding; try the next candidate.
			pass
	return 'binary'

out = open(OUTPUT_NAME, 'w')
for dirpath, dirs, files in os.walk(DIR_NAME):
	for fn in files:
		path = os.path.join(dirpath, fn)
		try:
			# Read the raw bytes unchanged; 'rU' would translate line endings first.
			fobj = open(path, 'rb')
			data = fobj.read()
			fobj.close()
		except (IOError, OSError):
			# Skip files that cannot be opened or read (locked, no permission).
			continue
		enc = guess_charset(data)
		line = path + ',' + enc + '\n'
		if FLAG_STDOUT:
			write(line)
		out.write(line)
out.close()
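
To see how the trial-decoding behaves, here is a small sketch that runs the same idea on in-memory byte strings instead of files. The helper name guess and the sample strings are made up for illustration and are not part of the script above. Note that the order of the candidate list matters: ISO-2022-JP uses only 7-bit bytes, so with utf-8 listed first such data is reported as utf-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration; it mirrors guess_charset but takes
# the candidate list as a parameter so the ordering can be varied.
CANDIDATES = ['utf-8', 'shift-jis', 'euc-jp', 'iso2022-jp']


def guess(data, candidates=CANDIDATES):
    """Return the first candidate that decodes data cleanly, else 'binary'."""
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return 'binary'


text = u'\u65e5\u672c\u8a9e'  # the word "Nihongo" in kanji

results = {
    'utf-8 bytes': guess(text.encode('utf-8')),
    'shift-jis bytes': guess(text.encode('shift-jis')),
    # 7-bit only, so the utf-8 attempt succeeds before iso2022-jp is tried.
    'iso2022-jp bytes': guess(text.encode('iso2022-jp')),
    'plain ascii': guess(b'hello'),
    'invalid bytes': guess(b'\xff\xff\xff'),
}
for label in sorted(results):
    print('%s -> %s' % (label, results[label]))
```

If telling ISO-2022-JP apart is important, one option is to try it before utf-8, for example guess(data, ['iso2022-jp', 'utf-8', 'shift-jis', 'euc-jp']). The trade-off is that plain ASCII files would then be labelled iso2022-jp as well.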

Supplement

The exception handling is rough. If a file name contains Japanese characters, those characters are garbled in the output.
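
One possible way around the garbled file names, as a sketch only: on Python 2, os.walk returns unicode names when it is given a unicode path, and io.open can then write them in an explicit encoding. This assumes the garbling comes from byte-string paths being written in the system code page, which I have not confirmed for every setup. The function name list_files and the choice of UTF-8 for the output are my own assumptions.

```python
# -*- coding: utf-8 -*-
import io
import os


def list_files(dir_name, output_name):
    """Write one path per line to output_name, encoded as UTF-8.

    Hypothetical helper: dir_name should be a unicode string so that
    os.walk yields unicode file names on Python 2 as well.
    """
    out = io.open(output_name, 'w', encoding='utf-8')
    try:
        for dirpath, dirs, files in os.walk(dir_name):
            for fn in files:
                # Both parts are unicode, so the joined path is unicode too.
                out.write(os.path.join(dirpath, fn) + u'\n')
    finally:
        out.close()
```

It would be called like list_files(u'C:\\Program Files\\', u'result_file_list.txt'). The resulting file is UTF-8, so it needs to be opened as UTF-8 in the editor.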
