CSV files are often split by time period or by attribute, so I wrote a function that reads all of them with a single line of code. The requirements were as follows.
- Read all CSV files in a folder.
- Optionally read only the CSV files whose names contain a specified string.
- Optionally include subdirectories.
- Windows 10 64-bit
- Python 3.8.3
- pandas 0.25.3
- seaborn 0.11.0
Split the iris dataset (150 rows) into four parts and save each part as a CSV file in a "main" folder directly under the E: drive. Then save the same four parts again, under the names iris4.csv to iris7.csv, in a "sub" folder inside "main".
import os
import seaborn as sns

data = sns.load_dataset('iris')

# Write the four parts to E:\main as iris0.csv ... iris3.csv
os.makedirs(r'E:\main', exist_ok=True)
for i in range(4):
    st = int(0 if i==0 else (len(data)/4)*i)
    en = int((len(data)/4)*(i+1))
    data.iloc[st:en].to_csv(r'E:\main\iris{}.csv'.format(i), encoding='cp932', index=False)

# Write the same four parts to E:\main\sub as iris4.csv ... iris7.csv
os.makedirs(r'E:\main\sub', exist_ok=True)
for i in range(4):
    st = int(0 if i==0 else (len(data)/4)*i)
    en = int((len(data)/4)*(i+1))
    data.iloc[st:en].to_csv(r'E:\main\sub\iris{}.csv'.format(i+4), encoding='cp932', index=False)
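Because 150 is not divisible by 4, the int() truncation in the loops above produces parts of unequal size. The following is a small check of the slice boundaries using only the row count; the helper name quarter_bounds is made up for this illustration.

```python
def quarter_bounds(n_rows, n_parts=4):
    """Return (start, end) slice bounds computed the same way as the loops above."""
    step = n_rows / n_parts  # 37.5 for 150 rows
    return [(int(step * i), int(step * (i + 1))) for i in range(n_parts)]


bounds = quarter_bounds(150)
sizes = [en - st for st, en in bounds]
print(bounds)  # [(0, 37), (37, 75), (75, 112), (112, 150)]
print(sizes)   # [37, 38, 37, 38]
```

So iris1.csv and iris3.csv (and iris5.csv and iris7.csv in "sub") hold 38 rows each, while the other files hold 37.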
In the end, I implemented it as the following function.
import glob
import pandas as pd

def read_csv(path, encode, sub_check=False, target_name=None):
    # Get the paths of all csv files in the folder as a list
    # If sub_check=True, subfolders are searched as well
    target_files = glob.glob(path+r'\**\*.csv', recursive=True) if sub_check else glob.glob(path+r'\*.csv')
    # Holds the merged result
    merged_file = pd.DataFrame()
    # Combine all target csv files
    for filepath in target_files:
        # Skip files whose names do not contain the specified characters
        filename = filepath.split('\\')[-1]
        if target_name is not None and target_name not in filename:
            continue
        # Read one csv file
        input_file = pd.read_csv(filepath, encoding=encode, sep=",", engine='python')
        # Append it to the csv files read so far
        merged_file = pd.concat([merged_file, input_file], axis=0)
    # Reset the index of the DataFrame after joining
    merged_file = merged_file.reset_index(drop=True)
    return merged_file
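The function above builds its paths with backslashes, so it is tied to Windows, and it calls pd.concat once per file. As a sketch of one possible alternative (not the version used for the results in this post), the same idea can be written with pathlib and a single concat. The name read_csv_folder and its parameter names are made up here.

```python
from pathlib import Path

import pandas as pd


def read_csv_folder(folder, encoding="utf-8", include_subdirs=False, name_contains=None):
    """Read every matching CSV under `folder` into one DataFrame."""
    folder = Path(folder)
    # rglob descends into subfolders; glob looks at the folder itself only
    paths = folder.rglob("*.csv") if include_subdirs else folder.glob("*.csv")
    frames = []
    for csv_path in sorted(paths):
        # Skip files whose name lacks the requested substring
        if name_contains is not None and name_contains not in csv_path.name:
            continue
        frames.append(pd.read_csv(csv_path, encoding=encoding))
    if not frames:
        # Nothing matched: return an empty frame instead of letting concat raise
        return pd.DataFrame()
    # One concat at the end, with a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

With the test data from this post, read_csv_folder(r'E:\main', encoding='cp932', include_subdirs=True) would play the role of the sub_check=True call.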
With the default arguments, all CSV files directly inside the folder were read (150 rows).
With target_name set to "1", only the CSV file whose name contains "1" was read (38 rows).
With sub_check=True, all CSV files were read, including those in the "sub" subfolder (300 rows).
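The 300-row figure depends on how glob treats `**`: with recursive=True it matches zero or more directory levels, so the pattern picks up the files directly in "main" as well as those in "sub". Below is a minimal check using empty files in a temporary folder; the layout mirrors the one in this post, but the file contents are omitted.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as main:
    sub = os.path.join(main, "sub")
    os.makedirs(sub)
    # Create empty iris0-3.csv in main and iris4-7.csv in sub
    for i in range(4):
        open(os.path.join(main, "iris{}.csv".format(i)), "w").close()
        open(os.path.join(sub, "iris{}.csv".format(i + 4)), "w").close()

    # Non-recursive pattern: only files directly in main
    flat = glob.glob(os.path.join(main, "*.csv"))
    # Recursive pattern: files in main and in every subfolder
    deep = glob.glob(os.path.join(main, "**", "*.csv"), recursive=True)
    flat_names = sorted(os.path.basename(p) for p in flat)
    deep_names = sorted(os.path.basename(p) for p in deep)

print(len(flat_names), len(deep_names))  # 4 8
```

Of these eight names, only iris1.csv contains "1", which is why the target_name="1" call returns a single file's worth of rows.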
Thank you for browsing.