Grouping a csv file and getting the row with the minimum value in each group (pandas)

Content of this article

- Processing a csv file group by group

Preface

While working on my graduation research, I had to process a fairly large csv file. Specifically, I needed to group the rows by the value of the id column and, in each group, get the row with the smallest value in the distance column. Below is the method I used at the time.
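Stripped of pandas, this is a "minimum per key" problem. The following minimal sketch uses a few made-up records (only the field names id and distance come from the article; x is a hypothetical extra column) and keeps, for each id, the record with the smallest distance. This is the result the pandas code in the next section has to produce.

```python
# Made-up sample records; only the field names "id" and "distance" come from the article.
records = [
    {"id": "a", "distance": 3.2, "x": 1},
    {"id": "a", "distance": 1.5, "x": 2},
    {"id": "b", "distance": 0.7, "x": 3},
    {"id": "b", "distance": 2.9, "x": 4},
    {"id": "c", "distance": 5.0, "x": 5},
]

def min_per_id(rows):
    """Return {id: record}, keeping for each id the record with the smallest distance."""
    best = {}
    for row in rows:
        key = row["id"]
        # Keep the first record seen for an id; replace it only with a strictly smaller distance.
        if key not in best or row["distance"] < best[key]["distance"]:
            best[key] = row
    return best

result = min_per_id(records)
for key, row in result.items():
    print(key, row["distance"], row["x"])
```

On ties the first record wins, which matches how pandas' idxmin picks the first occurrence of the minimum.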

Main subject

The code is as follows.

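import pandas as pd

data = pd.read_csv("Path to the original data file")
df = pd.DataFrame(columns=data.columns)
### This will be the final output. It has the same column names as data and is empty at this stage.

dic = {}

for name, group in data.groupby('id'):
    dic[name] = group
### dic maps each id value to the DataFrame of its rows

ids = list(dic.keys())
### the distinct id values (dic['id'] would raise a KeyError, since 'id' is a column name, not a key of dic)

rows = []
for i in ids:
    k = dic[i]
    l = k['distance'].idxmin()
    ### idxmin returns the index label of the smallest distance in group i, so select with .loc, not .iloc
    m = data.loc[[l], :]
    rows.append(m)

if rows:
    df = pd.concat(rows)
    ### DataFrame.append was removed in pandas 2.0; pd.concat combines the one-row DataFrames instead

df.to_csv("Path of the output csv file")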
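To try the procedure without a real file, the following sketch feeds a small made-up CSV through the same steps (the values are invented; id and distance are the article's columns, x is a hypothetical extra column). It reads the ids from the dictionary keys and combines the rows with pd.concat, since DataFrame.append no longer exists in pandas 2.0 and later.

```python
import io

import pandas as pd

# Small made-up CSV standing in for "Path to the original data file".
csv_text = """id,distance,x
a,3.2,1
a,1.5,2
b,0.7,3
b,2.9,4
c,5.0,5
"""

data = pd.read_csv(io.StringIO(csv_text))

# One DataFrame per id, as in the article's dic.
dic = {name: group for name, group in data.groupby('id')}

rows = []
for i in dic:
    k = dic[i]
    l = k['distance'].idxmin()      # index label of the smallest distance in group i
    rows.append(data.loc[[l], :])   # the whole row, as a one-row DataFrame

df = pd.concat(rows)
print(df)
```

The printed result keeps the original index labels (1, 2 and 4), which is why to_csv also writes an index column unless index=False is passed.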

There is probably a more concise way to write this. Still, the two techniques used here, creating an empty DataFrame that inherits the column names of the original data and getting the index of the row that minimizes the value of a given column, are useful in a wide variety of situations.
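As one possible shorter variant (an alternative, not the method used in the research), groupby('id')['distance'].idxmin() returns the index label of the minimum row for every id at once, and .loc selects those rows in one step. Because idxmin returns labels rather than positions, .loc is the correct indexer; .iloc only gives the same result when the index is the default 0, 1, 2, ... that read_csv creates. The sketch also shows data.iloc[:0] as another way to get an empty DataFrame with the original column names (and dtypes). The data is the same made-up sample as above.

```python
import io

import pandas as pd

# Made-up sample; id and distance are the article's columns, x is hypothetical.
csv_text = """id,distance,x
a,3.2,1
a,1.5,2
b,0.7,3
b,2.9,4
c,5.0,5
"""
data = pd.read_csv(io.StringIO(csv_text))

# For each id, the index label of the row with the smallest distance (first one on ties).
labels = data.groupby('id')['distance'].idxmin()

# Select those rows by label.
df = data.loc[labels]

# An empty DataFrame that inherits the column names and dtypes of data.
empty = data.iloc[:0]

print(df)
print(list(empty.columns), len(empty))
```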