I made a tool quickly with python, so a memorandum. Since the source is that of the company, remember the important points

Directory recursive

Function to get all files

Here was very helpful.

def get_all_files(directory):
    for path, dirs, files in os.walk(directory):
        for file in files:
            yield os.path.join(path, file)

Function to get all directories

Borrowed from here.

def get_all_files(directory):
    for path, dirs, files in os.walk(directory):
        yield path

Function to get all directories and files

Borrowed from here.

def get_all_files(directory):
    for path, dirs, files in os.walk(directory):
        yield path
        for file in files:
            yield os.path.join(path, file)

When to use

for file in get_all_files('/tmp/test'):
    print file

About yield

Here was very helpful. To put it simply, it seems that the contents of the process can be left without retrun. So, you can turn it in a loop or get it with next (). In other words

def test():
	yield 'a'
	yield 'b'
	yield 'c'

for i in test():
	print i

When you run

a
b
c

Is output.

File processing

Get the filename from the full path

I use this often

file_name = os.path.basename(file)

Split extension and others from full path

I wanted to see the extension

file_title, file_ext = os.path.splitext(file_name)

Open the file in UTF-8 and read all lines

You need to be careful about unicode when opening in UTF-8

f = codecs.open(file, 'r', 'utf-8')
list = f.readlines()
f.close()

Open the file in UTF-8 and write

If the file does not exist, create it. If string is unicode, it can be written in Japanese. When printing, use print (string.encode ('utf-8')). If you don't add \ n, the line will not be broken when you open it with CRLF line break.

result = codecs.open(outfile, 'a', 'utf-8')
result.write(format("string --> %s\n" % string))
result.close()

When comparing the read files, you have to unify the character code I was quite addicted to it. .. ..

String comparison

Whether any string is included

line = "1abcdefg23456789"
target_string = "1a"

if (target_string in line):

As the number of processes increased, I sometimes felt that it was Python. Python is good for quick writing. I like it.

Python Memorandum 2