**What this article explains**
Sample code for the following tasks:
- Create a list of the files under a specified directory
- Check whether the text in a file contains a particular string
- Extract the text enclosed between two specific strings in a file
Development environment
- Python 2.7 or later
import os

def generate_file_list(dirpath_to_search):
    # Walk the tree and collect the full path of every file found
    file_list = []
    for dirpath, dirnames, filenames in os.walk(dirpath_to_search):
        for filename in filenames:
            file_list.append(os.path.join(dirpath, filename))
    return file_list
An example that recursively collects the names of the files under sample1, given the following directory structure.
Sample directory structure
sample1/
├── dir01
│ ├── dir11
│ │ └── file21.txt
│ └── file11.txt
├── file01.txt
└── file02.txt
How to use
file_list = generate_file_list('sample1')
for file in file_list:
    print(file)
#output
# sample1/file01.txt
# sample1/file02.txt
# sample1/dir01/file11.txt
# sample1/dir01/dir11/file21.txt
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generates the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
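As a small illustration of how the (dirpath, dirnames, filenames) tuples can be used, the sketch below keeps only the files with a given extension and sorts the result so that the order is predictable. The function name list_files_with_extension and its parameters are hypothetical and are not part of the original sample.

```python
import os


def list_files_with_extension(dirpath_to_search, extension):
    """Return the sorted paths of files under dirpath_to_search that end with extension."""
    matched = []
    for dirpath, dirnames, filenames in os.walk(dirpath_to_search):
        # filenames holds only the files directly inside dirpath
        for filename in filenames:
            if filename.endswith(extension):
                matched.append(os.path.join(dirpath, filename))
    # The order of names within a directory is not guaranteed, so sort explicitly
    return sorted(matched)
```

With the sample1 tree above, list_files_with_extension('sample1', '.txt') would return the four .txt paths in sorted order.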
def contain_text_in_file(filepath, text):
    # Read the file line by line and stop at the first line that contains text
    with open(filepath) as f:
        return any(text in line for line in f)
An example where the files contain.txt and not_contain.txt exist with the contents shown below, and you want to know which of them includes "2020/02/02".
contain.txt
Update date: 2020/02/02
This article is about python file operations.
not_contain.txt
Update date: 2019/10/15
This article is about python file operations.
How to use
filepath1 = './contain.txt'
text = '2020/02/02'
result1 = contain_text_in_file(filepath1, text)
print(result1) # True
filepath2 = './not_contain.txt'
text = '2020/02/02'
result2 = contain_text_in_file(filepath2, text)
print(result2) # False
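The two functions shown so far can be combined to search a whole directory tree for files that contain a string. The following is a minimal sketch: find_files_containing is a hypothetical helper name, and it assumes every file under the directory is a text file that can be read with the default encoding.

```python
import os


def generate_file_list(dirpath_to_search):
    file_list = []
    for dirpath, dirnames, filenames in os.walk(dirpath_to_search):
        for filename in filenames:
            file_list.append(os.path.join(dirpath, filename))
    return file_list


def contain_text_in_file(filepath, text):
    with open(filepath) as f:
        return any(text in line for line in f)


def find_files_containing(dirpath_to_search, text):
    """Hypothetical helper: sorted paths of the files whose content includes text."""
    return sorted(
        path for path in generate_file_list(dirpath_to_search)
        if contain_text_in_file(path, text)
    )
```

If the directory may also hold binary files, reading them as text can raise a decoding error in Python 3, so a real script would probably need to filter or handle those files first.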
open()
Opens a file and returns the corresponding file object.
any(iterable)
Returns True if any element of iterable is true. Returns False if iterable is empty.
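The Python documentation describes any() as equivalent to a simple loop that returns as soon as a true element is found. The sketch below reproduces that loop under the name any_equivalent, which is chosen here only to avoid shadowing the built-in.

```python
def any_equivalent(iterable):
    """Behaves like the built-in any(); renamed to avoid shadowing it."""
    for element in iterable:
        if element:
            return True  # stop at the first true element
    return False  # also reached when the iterable is empty


lines = [
    'Update date: 2020/02/02\n',
    'This article is about python file operations.\n',
]
print(any_equivalent('2020/02/02' in line for line in lines))  # True
print(any_equivalent([]))  # False
```

Because the loop returns early, contain_text_in_file stops reading the file once a matching line is found.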
import re

def extract_text_in_file(filepath, pattern_prev, pattern_next):
    extracted_text_array = []
    # Capture whatever lies between pattern_prev and pattern_next
    pattern = pattern_prev + '(.*)' + pattern_next
    with open(filepath) as f:
        lines = f.readlines()
        for line in lines:
            # findall returns the text captured by the group for each match
            tmp_extracted_text_array = re.findall(pattern, line)
            extracted_text_array.extend(tmp_extracted_text_array)
    return extracted_text_array
An example where a file called file.txt exists with the contents shown below, and you want to extract the date part enclosed between "Update date:" and " by".
file.txt
Update date:2020/02/01 by taro
This article is about python file operations.
Update date:2020/02/02 by jiro
This article is about python file operations.
How to use
filepath = './file.txt'
pattern_prev = 'Update date:'
pattern_next = ' by'
extracted_text_array = extract_text_in_file(filepath, pattern_prev, pattern_next)
for extracted_text in extracted_text_array:
    print(extracted_text)
#output
# 2020/02/01
# 2020/02/02
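Note that extract_text_in_file treats both arguments as regular expressions, and '(.*)' is greedy, so a line containing " by" more than once would be captured up to the last occurrence. The following is a sketch of one possible variant, not the original method: extract_between is a hypothetical name, it takes an iterable of lines instead of a file path, it treats the delimiters as literal text via re.escape, and it uses the non-greedy '(.*?)' to stop at the first closing delimiter.

```python
import re


def extract_between(lines, text_prev, text_next):
    """Return the shortest text found between text_prev and text_next in each line."""
    # re.escape makes the delimiters match literally;
    # (.*?) stops at the first occurrence of text_next
    pattern = re.compile(re.escape(text_prev) + '(.*?)' + re.escape(text_next))
    extracted = []
    for line in lines:
        extracted.extend(pattern.findall(line))
    return extracted


lines = ['Update date:2020/02/01 by taro, reviewed by jiro']
print(extract_between(lines, 'Update date:', ' by'))  # ['2020/02/01']
```

Since the function accepts any iterable of lines, an open file object can be passed in directly.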
re.findall(pattern, string, flags=0)
Returns all non-overlapping matches of pattern in string, as a list of strings or tuples. The string is scanned from left to right, and matches are returned in the order they are found. If the pattern contains exactly one group, a list of the text captured by that group is returned; if it contains multiple groups, a list of tuples is returned. Empty matches are included in the result.
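To make the effect of groups on the return value concrete, here is a short sketch that runs three patterns against one line of the sample file. The variable names are illustrative only.

```python
import re

line = 'Update date:2020/02/01 by taro'

# No group: each item is the whole matched text
no_group = re.findall(r'\d{4}/\d{2}/\d{2}', line)
print(no_group)  # ['2020/02/01']

# One group: each item is the text captured by that group
one_group = re.findall(r'Update date:(.*) by', line)
print(one_group)  # ['2020/02/01']

# Two groups: each item is a tuple with one element per group
two_groups = re.findall(r'Update date:(.*) by (\w+)', line)
print(two_groups)  # [('2020/02/01', 'taro')]
```

This is why extract_text_in_file, whose pattern has a single group, returns only the date and not the surrounding "Update date:" and " by".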