Overview

CSV files are often split by time period or by attribute, so I wrote a function that reads all such files with a single call. The requirements were as follows.
- Read all csv files in a folder.
- Optionally read only the csv files whose names contain a specified string.
- Optionally include subdirectories.

Execution environment

- Windows 10 64bit
- Python 3.8.3
- pandas 0.25.3
- seaborn 0.11.0

Implementation

1. Data preparation

Split the iris data (150 rows) into four parts and save each part as a csv file in the "main" folder directly under the E drive (iris0.csv to iris3.csv). Then save the same four parts in a "sub" folder inside "main" (iris4.csv to iris7.csv).

import os

import seaborn as sns

# Load the iris dataset (150 rows) as a DataFrame
data = sns.load_dataset('iris')

# Write the four parts to E:\main (iris0.csv to iris3.csv)
os.makedirs(r'E:\main', exist_ok=True)
for i in range(4):
    st = int(len(data) / 4 * i)
    en = int(len(data) / 4 * (i + 1))
    data.iloc[st:en].to_csv(r'E:\main\iris{}.csv'.format(i), encoding='cp932', index=False)

# Write the same four parts to E:\main\sub (iris4.csv to iris7.csv)
os.makedirs(r'E:\main\sub', exist_ok=True)
for i in range(4):
    st = int(len(data) / 4 * i)
    en = int(len(data) / 4 * (i + 1))
    data.iloc[st:en].to_csv(r'E:\main\sub\iris{}.csv'.format(i + 4), encoding='cp932', index=False)
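
Because 150 rows do not divide evenly by 4, the int() truncation in the loops above produces parts of unequal size. The following is a minimal sketch of the boundary arithmetic only; the helper name chunk_bounds is made up for illustration and nothing is written to disk.

```python
def chunk_bounds(n_rows, n_chunks):
    """Return (start, end) row positions using the same int() truncation as the loops above."""
    step = n_rows / n_chunks  # 37.5 for 150 rows and 4 parts
    return [(int(step * i), int(step * (i + 1))) for i in range(n_chunks)]


bounds = chunk_bounds(150, 4)
sizes = [en - st for st, en in bounds]
print(bounds)  # [(0, 37), (37, 75), (75, 112), (112, 150)]
print(sizes)   # [37, 38, 37, 38]
```

So iris1.csv and iris3.csv (and iris5.csv and iris7.csv in "sub") hold 38 rows each, while the other files hold 37.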

2. Read csv file

In the end, I implemented it as the following function.

import glob
import os

import pandas as pd


def read_csv(path, encode, sub_check=False, target_name=None):
    # Collect the paths of all csv files in the folder as a list.
    # If sub_check=True, subfolders are searched as well.
    if sub_check:
        target_files = glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
    else:
        target_files = glob.glob(os.path.join(path, '*.csv'))

    # DataFrames read from the individual csv files
    frames = []

    # Sort the paths so the row order does not depend on the file system
    for filepath in sorted(target_files):
        # Skip files whose name does not contain the specified characters
        filename = os.path.basename(filepath)
        if target_name is not None and target_name not in filename:
            continue

        # Read one csv file
        frames.append(pd.read_csv(filepath, encoding=encode, sep=",", engine='python'))

    # Return an empty DataFrame when no file matched, since pd.concat rejects an empty list
    if not frames:
        return pd.DataFrame()

    # Combine all files in one step and reset the index
    return pd.concat(frames, axis=0, ignore_index=True)
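
To see which files the two glob patterns and the name filter actually select, here is a small sketch that uses only the standard library and a temporary folder. The helper list_csv and the file names are hypothetical and exist only for this illustration; it mirrors the file selection of the function above, not the reading or merging.

```python
import glob
import os
import tempfile


def list_csv(path, sub_check=False, target_name=None):
    """Hypothetical helper: return the sorted names of the csv files that would be selected."""
    if sub_check:
        # '**' with recursive=True matches the folder itself and every subfolder
        pattern = os.path.join(path, '**', '*.csv')
    else:
        pattern = os.path.join(path, '*.csv')
    names = [os.path.basename(p) for p in glob.glob(pattern, recursive=sub_check)]
    if target_name is not None:
        names = [n for n in names if target_name in n]
    return sorted(names)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ['iris0.csv', 'iris1.csv', 'notes.txt', os.path.join('sub', 'iris4.csv')]:
        with open(os.path.join(root, rel), 'w') as f:
            f.write('a,b\n1,2\n')

    top_only = list_csv(root)
    with_sub = list_csv(root, sub_check=True)
    only_1 = list_csv(root, target_name='1')

print(top_only)  # ['iris0.csv', 'iris1.csv']
print(with_sub)  # ['iris0.csv', 'iris1.csv', 'iris4.csv']
print(only_1)    # ['iris1.csv']
```

Note that the filter is applied to the file name only, so characters that appear in a parent folder name do not cause a match.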

Operation check

1. Read csv file in one folder

All csv files in the folder were read (150 rows in total).

2. Read only the csv files in one folder whose names contain the specified characters

When "1" was specified for target_name, only the csv file with "1" in its name (iris1.csv, 38 rows) was read.

3. Read files including subdirectories

With sub_check=True, all csv files, including those in the "sub" subfolder, were read (300 rows in total).
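
The three checks can be reproduced roughly as follows. This is a sketch under a few assumptions: it uses a temporary folder instead of the E drive, a 150-row stand-in DataFrame instead of the iris data (so no dataset download is needed), and a condensed restatement of the function from section 2.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv(path, encode, sub_check=False, target_name=None):
    # Condensed restatement of the function from section 2
    pattern = os.path.join(path, '**', '*.csv') if sub_check else os.path.join(path, '*.csv')
    frames = []
    for filepath in sorted(glob.glob(pattern, recursive=sub_check)):
        if target_name is not None and target_name not in os.path.basename(filepath):
            continue
        frames.append(pd.read_csv(filepath, encoding=encode))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Stand-in for the iris data (assumption): 150 rows with made-up columns
data = pd.DataFrame({'row_id': range(150), 'value': [i * 0.5 for i in range(150)]})

with tempfile.TemporaryDirectory() as main:
    sub = os.path.join(main, 'sub')
    os.makedirs(sub)
    for i in range(4):
        st, en = int(len(data) / 4 * i), int(len(data) / 4 * (i + 1))
        part = data.iloc[st:en]
        part.to_csv(os.path.join(main, 'iris{}.csv'.format(i)), encoding='cp932', index=False)
        part.to_csv(os.path.join(sub, 'iris{}.csv'.format(i + 4)), encoding='cp932', index=False)

    n_all = len(read_csv(main, 'cp932'))                    # check 1: one folder
    n_name = len(read_csv(main, 'cp932', target_name='1'))  # check 2: names containing "1"
    n_sub = len(read_csv(main, 'cp932', sub_check=True))    # check 3: including "sub"

print(n_all, n_name, n_sub)  # 150 38 300
```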

Thank you for reading.