Read all csv files in the folder

Overview

CSV data is sometimes split into several files by time period or attribute, so I implemented a function that reads all of them in a single call. The requirements were:
- Read all csv files in the folder.
- Optionally target only the csv files whose names contain specified characters.
- Optionally include subdirectories.

Execution environment

- Windows 10 64bit
- Python 3.8.3
- Pandas 0.25.3
- Seaborn 0.11.0

Implementation

1. Data preparation

Split the iris data (150 rows) into 4 parts and save each part as a csv file in the "main" folder directly under the E drive (iris0.csv to iris3.csv). Then save the same 4 parts in a "sub" folder inside "main" (iris4.csv to iris7.csv).

import os
import seaborn as sns

#Load the iris dataset (150 rows)
data = sns.load_dataset('iris')

#Save 4 chunks in "main" (iris0-3.csv) and the same 4 chunks in "main\sub" (iris4-7.csv)
for folder, offset in [(r'E:\main', 0), (r'E:\main\sub', 4)]:
    os.makedirs(folder, exist_ok=True)
    for i in range(4):
        #Chunk boundaries are 0, 37, 75, 112, 150 (int() truncates 37.5 and 112.5)
        st = int(len(data) / 4 * i)
        en = int(len(data) / 4 * (i + 1))
        data.iloc[st:en].to_csv(os.path.join(folder, 'iris{}.csv'.format(i + offset)), encoding='cp932', index=False)
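Because 150 is not divisible by 4, int() truncates the boundaries (37.5 becomes 37, 112.5 becomes 112), so the chunks are not all the same size. The following sketch recomputes the same boundaries as the data-preparation loop without loading the dataset; the helper name chunk_bounds is hypothetical.

```python
def chunk_bounds(n_rows, n_chunks):
    """Return (start, end) pairs using the same int() truncation as the loop above."""
    return [(int(n_rows / n_chunks * i), int(n_rows / n_chunks * (i + 1)))
            for i in range(n_chunks)]

bounds = chunk_bounds(150, 4)
sizes = [en - st for st, en in bounds]
print(bounds)  # [(0, 37), (37, 75), (75, 112), (112, 150)]
print(sizes)   # [37, 38, 37, 38]
```

This is why iris1.csv holds 38 rows, which matches the count reported in the second operation check.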

2. Read the csv files

The resulting function is shown below.

import glob
import os
import pandas as pd

def read_csv(path, encode, sub_check=False, target_name=None):
    #Get the paths of all csv files in the folder as a list (sorted for a stable order)
    #If sub_check=True, subfolders are searched as well
    if sub_check:
        target_files = sorted(glob.glob(os.path.join(path, '**', '*.csv'), recursive=True))
    else:
        target_files = sorted(glob.glob(os.path.join(path, '*.csv')))

    #Collect the DataFrames in a list and concatenate them once at the end
    frames = []
    for filepath in target_files:

        #Skip files whose name does not contain the specified characters
        filename = os.path.basename(filepath)
        if target_name is not None and target_name not in filename:
            continue

        #Read one csv file
        frames.append(pd.read_csv(filepath, encoding=encode, sep=',', engine='python'))

    #Return an empty DataFrame if no file matched
    if not frames:
        return pd.DataFrame()

    #Combine all files and renumber the index from 0 (same as reset_index(drop=True))
    return pd.concat(frames, axis=0, ignore_index=True)
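The sub_check switch relies on how glob treats `**`: with recursive=True it matches zero or more directory levels, so the recursive pattern also picks up the files directly in the folder, not only those in subfolders. A small sketch that builds a throwaway folder tree with tempfile (the file names a.csv, b.txt and c.csv are illustrative) makes this visible:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build root/a.csv, root/b.txt and root/sub/c.csv
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ['a.csv', 'b.txt', os.path.join('sub', 'c.csv')]:
        with open(os.path.join(root, rel), 'w') as f:
            f.write('x\n1\n')

    flat = glob.glob(os.path.join(root, '*.csv'))
    deep = glob.glob(os.path.join(root, '**', '*.csv'), recursive=True)

    # Compare paths relative to root, with '/' as separator, so output is platform-independent
    flat_names = sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in flat)
    deep_names = sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in deep)

print(flat_names)  # ['a.csv']
print(deep_names)  # ['a.csv', 'sub/c.csv']
```

This is also why the third operation check returns 300 rows rather than 150: the four files in "main" are counted together with the four in "sub".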

Operation check

1. Read all csv files in one folder

All csv files in the folder were read (150 rows in total).

2. Read only the specified csv files in one folder

With target_name set to "1", only the csv files whose names contain "1" were read (38 rows).
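target_name is a plain substring test on the file name, not a pattern match. That is enough for this example, but with more files it can select more than intended: a hypothetical iris10.csv would also contain "1". The sketch below isolates the filter rule (the helper name name_matches and the file list are illustrative) and uses os.path.basename, so a folder name such as "data1" does not cause a match.

```python
import os

def name_matches(filepath, target_name=None):
    """Same rule as read_csv: keep the file if target_name is None or occurs in its file name."""
    filename = os.path.basename(filepath)
    return target_name is None or target_name in filename

# Hypothetical file list; iris10.csv and the data1 folder only illustrate the substring rule
files = [os.path.join('main', 'iris0.csv'),
         os.path.join('main', 'iris1.csv'),
         os.path.join('main', 'iris10.csv'),
         os.path.join('data1', 'iris2.csv')]
selected = [os.path.basename(p) for p in files if name_matches(p, '1')]
print(selected)  # ['iris1.csv', 'iris10.csv']
```

If exact selection matters, a stricter rule (for example comparing the whole file name) would be needed; the original function intentionally keeps the simple substring check.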

3. Read files including subdirectories

With sub_check=True, all csv files including those in the "sub" subfolder were read (300 rows in total).
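The three checks can be reproduced without an E drive. This sketch is a stand-in under stated assumptions rather than the original run: it writes the files into a temporary folder from tempfile, uses a synthetic 150-row DataFrame instead of downloading iris through seaborn, and repeats a compact copy of read_csv so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

def read_csv(path, encode, sub_check=False, target_name=None):
    # Compact copy of the function above, repeated so this block runs on its own
    pattern = os.path.join(path, '**', '*.csv') if sub_check else os.path.join(path, '*.csv')
    frames = [pd.read_csv(f, encoding=encode, sep=',', engine='python')
              for f in sorted(glob.glob(pattern, recursive=sub_check))
              if target_name is None or target_name in os.path.basename(f)]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Synthetic stand-in for the 150-row iris data (assumption: avoids a download)
data = pd.DataFrame({'row': range(150)})

with tempfile.TemporaryDirectory() as main:
    # Same layout as in "Data preparation": iris0-3 in main, iris4-7 in main/sub
    for folder, offset in [(main, 0), (os.path.join(main, 'sub'), 4)]:
        os.makedirs(folder, exist_ok=True)
        for i in range(4):
            st, en = int(len(data) / 4 * i), int(len(data) / 4 * (i + 1))
            data.iloc[st:en].to_csv(os.path.join(folder, 'iris{}.csv'.format(i + offset)),
                                    encoding='cp932', index=False)

    n_all = len(read_csv(main, 'cp932'))                   # check 1
    n_one = len(read_csv(main, 'cp932', target_name='1'))  # check 2
    n_sub = len(read_csv(main, 'cp932', sub_check=True))   # check 3

print(n_all, n_one, n_sub)  # 150 38 300
```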

Thank you for reading.
