When you add a column to a Pandas DataFrame, it is appended at the end, but sometimes you want to control the column order, for example before writing the DataFrame out to a file. The usual sorting methods, sort_values() and sort_index(), can sort along either rows or columns, but they only sort by values or by labels; neither lets you arrange the columns in an arbitrary order of your choosing. I don't need this often, so I'm writing it down as a memo for when I forget.
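To make the starting problem concrete, here is a small sketch using a hypothetical toy DataFrame. Plain assignment always appends the new column at the end. pandas' DataFrame.insert can place a single column at a given position, but it handles one column at a time rather than reordering the whole table.

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Assigning a new column appends it at the end
df["total"] = df["a"] + df["b"]
print(df.columns.tolist())  # ['a', 'b', 'total']

# DataFrame.insert places one column at a given position (0 = first)
df.insert(0, "id", [100, 101])
print(df.columns.tolist())  # ['id', 'a', 'b', 'total']
```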
Just pass a list of the column names, in the order you want, to loc (pandas.DataFrame.loc).
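A minimal sketch of the idea on a hypothetical three-column DataFrame: the first argument of loc selects rows (":" means all rows) and the second is the list of column labels in the desired order.

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Any order works, it does not have to be sorted
new_order = ["c", "a", "b"]
df_reordered = df.loc[:, new_order]

print(df_reordered.columns.tolist())  # ['c', 'a', 'b']
```

Note that loc returns only the columns you list, so any column left out of the list is dropped from the result, and a label that does not exist raises a KeyError.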
Index labels can be passed to loc in any order as well, so the same trick reorders rows.
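For rows, the list goes in the first argument of loc instead. This sketch, again on hypothetical toy data, reorders rows and columns in a single call.

```python
import pandas as pd

# Hypothetical toy data with string index labels
df = pd.DataFrame({"p": [10, 20, 30], "q": [1, 2, 3]}, index=["x", "y", "z"])

# Rows in the order z, x, y and columns in the order q, p
df_both = df.loc[["z", "x", "y"], ["q", "p"]]

print(df_both.index.tolist())    # ['z', 'x', 'y']
print(df_both.columns.tolist())  # ['q', 'p']
print(df_both["p"].tolist())     # [30, 10, 20]
```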
from sklearn.datasets import load_iris
import pandas as pd
#Use iris data as a sample
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
# Compute the mean of each column and add it as a new column with the suffix "_mean"
for col in df_iris.columns:
    df_iris[col + "_mean"] = df_iris[col].mean()
df_iris.head()
# Build a list of column names in the desired order
# Here, we take the DataFrame's column names and sort them in ascending order
list_col_sorted = df_iris.columns.to_list()
list_col_sorted.sort()
list_col_sorted
['petal length (cm)',
'petal length (cm)_mean',
'petal width (cm)',
'petal width (cm)_mean',
'sepal length (cm)',
'sepal length (cm)_mean',
'sepal width (cm)',
'sepal width (cm)_mean']
# Pass the sorted list to loc
df_iris.loc[:, list_col_sorted].head()
Because the columns are now sorted by name, each "_mean" column sits immediately after the column it was computed from.
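As an aside, when the target order is plain alphabetical order, as in the iris example, sort_index(axis=1) should give the same result, since it sorts the column labels. The loc approach matters when the order is something a sort cannot express. This sketch checks the equivalence on a hypothetical two-column subset shaped like the iris columns, so it runs without scikit-learn.

```python
import pandas as pd

# Hypothetical toy data mimicking the iris column names
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9],
    "petal length (cm)": [1.4, 1.4],
})

# Same "_mean" columns as in the article (iterate over a copy of the labels)
for col in list(df.columns):
    df[col + "_mean"] = df[col].mean()

by_loc = df.loc[:, sorted(df.columns)]
by_sort_index = df.sort_index(axis=1)

print(by_loc.equals(by_sort_index))  # True
print(by_loc.columns.tolist())
```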
In machine learning, you may generate hundreds or thousands of features from a few dozen original columns. When related columns end up far apart, for example after adding missing-value flag columns, the table becomes hard to read, so this technique is worth remembering for situations like that.
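Here is a sketch of the missing-value flag case, assuming a hypothetical "_isnull" suffix and toy data. Instead of sorting alphabetically, it builds the order explicitly so that each flag follows its source column while the original column order is preserved. A plain sort would not do this, because it would move "city" ahead of "income".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "income": [300, 450, np.nan],
    "city": ["A", "B", None],
})

original_cols = df.columns.to_list()

# Add a 0/1 missing-value flag for each original column
# (the "_isnull" suffix is a hypothetical naming convention)
for col in original_cols:
    df[col + "_isnull"] = df[col].isnull().astype(int)

# Build the order explicitly: each column followed by its flag
ordered = []
for col in original_cols:
    ordered += [col, col + "_isnull"]

df = df.loc[:, ordered]
print(df.columns.tolist())
# ['age', 'age_isnull', 'income', 'income_isnull', 'city', 'city_isnull']
```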