Process a csv file in groups
During my graduation research, I had to process a reasonably large csv file. Specifically, I needed to group the rows by the value of the id column and, within each group, get the row with the smallest value in the distance column. The method I used at the time is described below.
The code is as follows.
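import pandas as pd

data = pd.read_csv("Path to the original data file")

# Split the data into one sub-DataFrame per id value.
dic = {}
for name, group in data.groupby('id'):
    dic[name] = group

# Collect the selected rows here and build the output once at the end.
rows = []
ids = data['id'].unique()
for i in ids:
    k = dic[i]
    # Index label of the row with the smallest distance in this group.
    l = k['distance'].idxmin()
    # idxmin returns a label, so select the row with loc rather than iloc.
    m = data.loc[[l]]
    rows.append(m)

# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
# If there are no rows, fall back to an empty DataFrame with the same columns as data.
df = pd.concat(rows) if rows else pd.DataFrame(columns=data.columns)
df.to_csv("Path to the output file you want to save")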
There is probably a more concise way to write this. Still, the techniques used here are useful in quite a variety of situations, such as creating a DataFrame that inherits the column names of the original data and getting the index of the row that minimizes the value of a certain column.
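As one possible shorter version, the same result can probably be obtained without the explicit loop by calling idxmin on the grouped distance column and selecting the resulting labels with loc. The snippet below is only a minimal sketch: the sample data (including the name column) is made up for illustration and stands in for the original csv file.

```python
import pandas as pd

# Hypothetical sample data standing in for the original csv file.
data = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "distance": [5.0, 2.0, 7.0, 9.0, 1.0],
    "name": ["a", "b", "c", "d", "e"],
})

# For each id, idxmin gives the index label of the row with the smallest distance.
min_labels = data.groupby("id")["distance"].idxmin()

# Select those rows by label; the result keeps all the original columns.
result = data.loc[min_labels]
print(result)
```

When several rows in a group share the same smallest distance, idxmin returns the first of them, and rows whose distance is missing are skipped by default. To write the result to a file, call result.to_csv with the output path as in the code above.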