Consolidate a large number of CSV files in folders with python (data without header)

Introduction

--Merge a large number of CSV files in a folder into a single CSV file.

Advance preparation

--Prepare CSV files that have no header row.
--Collect the CSV files you want to merge in one folder.
--Decide on the output file name for the merged result.
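The folder name headerRemoved used in the code suggests that the header rows were stripped beforehand. If your files still have one, a minimal sketch like the following could produce headerless copies. It assumes each source file has exactly one header line; the function name remove_header and the folder arguments are hypothetical.

```python
import csv
import os


def remove_header(src_folder, dst_folder):
    """Hypothetical helper: copy every .csv file from src_folder to
    dst_folder, dropping the first row (assumed to be the only header row)."""
    os.makedirs(dst_folder, exist_ok=True)
    for name in sorted(os.listdir(src_folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip non-CSV entries, including sub-folders
        src = os.path.join(src_folder, name)
        dst = os.path.join(dst_folder, name)
        with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
            reader = csv.reader(fin)
            next(reader, None)  # discard the header row (no-op for an empty file)
            csv.writer(fout).writerows(reader)  # copy the remaining rows unchanged
```

Assuming the original files sit directly in csv_folder, calling remove_header(os.path.join(".", "csv_folder"), os.path.join(".", "csv_folder", "headerRemoved")) would fill the folder the merge script reads from.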

Code


import csv
import os

import pandas as pd

# (1) Folder that contains the headerless CSV files
csv_folder_path = os.path.join(".", "csv_folder", "headerRemoved")

# List the file names in the folder; keep only .csv files and sort them,
# because os.listdir returns names in arbitrary order
csv_files_list = sorted(f for f in os.listdir(csv_folder_path) if f.lower().endswith(".csv"))

# List that collects the rows of all CSV files
csv_rows = []

# Open each file in turn and append all of its rows to csv_rows.
# "with" closes every file (the original close() only closed the last one).
for csv_filename in csv_files_list:
    with open(os.path.join(csv_folder_path, csv_filename), newline="") as csv_file_obj:
        reader_obj = csv.reader(csv_file_obj)
        for row in reader_obj:
            csv_rows.append(row)

# Convert the list of rows to a DataFrame
df = pd.DataFrame(csv_rows)

# (3) Export range: keep columns 0 to 43 only (the first 44 columns)
df = df.iloc[:, 0:44]

# (2) Output file name: write to CSV without the index column,
# and without a header line of column numbers, since the data has no header
df.to_csv(os.path.join(".", "merged_file.csv"), index=False, header=False)
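As an alternative, pandas can read each headerless file directly with read_csv(header=None) and stack the results with pd.concat. This is a sketch, not the method used above; the function name merge_headerless_csvs is hypothetical. Unlike the csv.reader version, read_csv infers numeric types instead of keeping every value as a string, and it raises a ParserError when a later row in a file has more fields than the first row (and an EmptyDataError on an empty file), so the csv.reader approach is more forgiving with ragged data.

```python
import glob
import os

import pandas as pd


def merge_headerless_csvs(folder, output_path, n_cols=44):
    """Hypothetical helper: merge all headerless .csv files in folder into
    one CSV at output_path, keeping only the first n_cols columns."""
    # sorted() gives a reproducible file order; glob alone does not guarantee one
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # header=None: the first line of each file is data, not column names
    frames = [pd.read_csv(p, header=None) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    df = pd.concat(frames, ignore_index=True)
    # Positional slice; unlike iloc[:, range(0, 44)] it does not raise
    # an IndexError when the data has fewer than n_cols columns
    df = df.iloc[:, :n_cols]
    # header=False: do not add a row of column numbers to headerless output
    df.to_csv(output_path, index=False, header=False)
    return df
```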

Commentary

--Get the list of file names in the folder that contains the CSV files.
--For each file name, open the file, create a reader object from it, and read the file row by row.
--Repeat this for all files so that every row from every file ends up in a single list.
--Convert the list to a DataFrame before exporting.
--Select the range of columns you need before exporting.
--Finally, write the result to a CSV file under the specified file name with df.to_csv. Passing index=False keeps the row index out of the file; note that to_csv still writes the column numbers (0, 1, 2, ...) as a header line unless header=False is also passed.
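Because csv.reader returns every field as a string, the DataFrame built from csv_rows holds strings, and rows with fewer fields are padded with missing values. A small sketch with made-up rows shows both points, plus pd.to_numeric for converting the columns before doing arithmetic or plotting.

```python
import pandas as pd

# Made-up rows as csv.reader would return them: all values are strings,
# and the second row is one field short
csv_rows = [["1", "2", "3"], ["4", "5"]]

df = pd.DataFrame(csv_rows)
print(df.shape)                # (2, 3): the width of the longest row
print(df.isna().sum().sum())   # 1: the padded cell in the short row

# Convert every column from strings to numbers; errors="coerce" turns
# anything unparseable (including the missing cell) into NaN
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes.tolist())
```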

Impressions

--Since the data is now a DataFrame, you can also pull out any columns you like. For example, the following selects the columns at zero-based positions 38 and 12, in that order.

df.iloc[:,[38,12]]
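On a DataFrame built from a list of rows, the column labels are the integers 0, 1, 2, ..., so positional selection with iloc and label selection with [] pick the same columns as long as the columns have not been reordered. A toy frame with made-up values shows how the two diverge once they have been.

```python
import pandas as pd

# Toy frame with made-up values; column labels default to 0, 1, 2, 3
df = pd.DataFrame([[10, 11, 12, 13], [20, 21, 22, 23]])

# Positions 3 and 1, in that order; the original labels are kept
picked = df.iloc[:, [3, 1]]
print(picked.columns.tolist())    # [3, 1]

# Same columns by label, because labels still equal positions in df
same = df[[3, 1]]

# In `picked`, position 0 now holds the column labelled 3
print(picked.iloc[:, 0].tolist())  # [13, 23]
print(picked[1].tolist())          # [11, 21]
```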

--Next, I want to plot various graphs from df.
