Overview

For every file under [DIR_NAME], check whether it is a text file in one of the character encodings listed in [TARGET_ENCODINGS]. If it is, search it for [SEARCH_WORD] and write each matching line to the file named [OUTPUT_NAME].

Environment

Windows 8 + Python 2.6 series

Code

`find_directory.py`


#!/usr/bin/python
# -*- coding: utf-8 -*-
# vim: fileencoding=utf-8

import os, sys, codecs

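# Root directory to search (all subdirectories are included)
DIR_NAME = 'C:\\html\\HOGE\\'
# CSV file that receives the matching lines
OUTPUT_NAME = 'result_find_file_list.csv'

# String to look for in each text file
SEARCH_WORD = '<font'

# Candidate encodings, tried in this order
TARGET_ENCODINGS = [
	'utf-8',
	'shift-jis',
	'euc-jp',
	'iso2022-jp'
]

# Set to True to echo each matching row to standard output as well
FLAG_STDOUT = True
#FLAG_STDOUT = False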

write = sys.stdout.write

def guess_charset(data):
	# Try each candidate encoding in order; the first one that decodes cleanly wins
	for enc in TARGET_ENCODINGS:
		try:
			data.decode(enc)
			return enc
		except UnicodeError:
			pass
	# No candidate fit, so treat the file as binary
	return 'binary'

# Open the report as Shift-JIS and write the CSV header (CRLF, like the data rows)
out = codecs.open(OUTPUT_NAME, 'w', 'shift-jis')
out.write('path,line_number,search,target_line\r\n')

for dirpath, dirs, files in os.walk(DIR_NAME):
	for fn in files:
		path = os.path.join(dirpath, fn)
		# Read the raw bytes once so the encoding can be guessed
		fobj = open(path, 'rb')
		data = fobj.read()
		fobj.close()
		try:
			enc = guess_charset(data)
		except:
			continue
		if enc == 'binary':
			continue
		try:
			# Re-read the file as text in the detected encoding and scan it line by line
			for count, l in enumerate(codecs.open(path, 'r', enc), 1):
				if SEARCH_WORD in l:
					try:
						# One quoted CSV row; double quotes in the line become single quotes
						output = '"' + path + '","' + str(count) + '","' + SEARCH_WORD + '","' + l.replace('"',"'").replace('\r','').replace('\n','') + '"\r\n'
					except Exception:
						continue
					if FLAG_STDOUT:
						write(output)
					out.write(output)
		except Exception:
			continue

out.close()
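
One thing to be aware of with first-match detection is that the order of TARGET_ENCODINGS matters. ISO-2022-JP uses only 7-bit bytes plus escape sequences, so such data also decodes as UTF-8 without error and is reported as 'utf-8' when that candidate comes first. For an ASCII search word like '<font' the match itself still works, but the reported target_line would then carry raw escape sequences instead of Japanese text. The following is a minimal sketch of that effect, not the script above: this `guess_charset` takes the encoding list as a parameter, and `POST_ORDER` and `ALT_ORDER` are hypothetical names introduced only for the comparison. Whether the reordered list suits other inputs is not checked here.

```python
# -*- coding: utf-8 -*-

def guess_charset(data, encodings):
    """Return the first encoding in `encodings` that decodes `data`, else 'binary'."""
    for enc in encodings:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            pass
    return 'binary'

# Order used in the post
POST_ORDER = ['utf-8', 'shift-jis', 'euc-jp', 'iso2022-jp']
# Hypothetical alternative: try the 7-bit encoding first
ALT_ORDER = ['iso2022-jp', 'utf-8', 'shift-jis', 'euc-jp']

# Japanese text encoded as ISO-2022-JP: escape sequences and 7-bit bytes only
sample = u'日本語'.encode('iso2022-jp')

print(guess_charset(sample, POST_ORDER))  # utf-8 (the bytes are also valid UTF-8)
print(guess_charset(sample, ALT_ORDER))   # iso2022-jp
```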

Supplement

As usual, the exception handling is rough. There is room for refactoring, but I want to put this to real use tomorrow, so I am posting it as is.
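
As one possible direction for that refactoring, the sketch below separates the search from the report writing and leaves the quoting to the standard `csv` module, so double quotes in a matched line do not have to be replaced. It is written for Python 3 rather than the Python 2.6 used above, and it is only an illustration under assumptions: the names `search_tree` and `write_report` are hypothetical, each file is decoded once instead of being reopened, and the report defaults to UTF-8 rather than Shift-JIS.

```python
import csv
import io
import os

def search_tree(root, word, encodings):
    """Yield (path, line_number, line) for every line under root that contains word."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as fobj:
                    data = fobj.read()
            except (IOError, OSError):
                continue  # unreadable file: skip it
            for enc in encodings:
                try:
                    text = data.decode(enc)
                except UnicodeDecodeError:
                    continue  # try the next candidate encoding
                break
            else:
                continue  # no candidate fit: treat the file as binary
            for number, line in enumerate(text.splitlines(), 1):
                if word in line:
                    yield path, number, line

def write_report(rows, out_path, word, encoding='utf-8'):
    """Write the rows as a fully quoted CSV file with the same columns as the post."""
    with io.open(out_path, 'w', encoding=encoding, newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(['path', 'line_number', 'search', 'target_line'])
        for path, number, line in rows:
            writer.writerow([path, number, word, line])
```

With the settings from the post, a call would look like `write_report(search_tree(DIR_NAME, SEARCH_WORD, TARGET_ENCODINGS), OUTPUT_NAME, SEARCH_WORD)`. Because `search_tree` is a generator, rows are written as they are found instead of being collected first.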
