Arbitrarily rearrange the column order of a pandas.DataFrame

Introduction

When you add a column to a pandas DataFrame, it is appended at the end, but there are times when you want to control the column order, for example when writing the DataFrame to a file. The usual sorting methods, sort_values() and sort_index(), can sort along either the row or the column axis, but only by values or by labels; neither lets you specify an arbitrary column order. I don't need this often, so I'm writing it down as a memo for when I forget.

Method

Just pass a list of column names, in the order you want, as the column indexer of loc (see pandas.DataFrame.loc).

The same approach works for the row index: pass a list of index labels in the order you want.
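As a minimal sketch of both cases, the following uses a small hypothetical DataFrame (the column names "a", "b", "c" and index labels "x", "y", "z" are made up for illustration and are not part of the iris example below). It reorders columns, rows, and both at once with loc.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the idea
df = pd.DataFrame(
    {"b": [1, 2, 3], "a": [4, 5, 6], "c": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Columns in an explicit order (all rows kept)
df_cols = df.loc[:, ["c", "a", "b"]]

# Rows in an explicit order (all columns kept)
df_rows = df.loc[["z", "x", "y"], :]

# Rows and columns reordered in one call
df_both = df.loc[["z", "x", "y"], ["c", "a", "b"]]

print(df_both)
```

loc returns a new DataFrame here, so the original df keeps its column and row order unless you assign the result back.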

Data preparation

from sklearn.datasets import load_iris
import pandas as pd

# Use the iris dataset as sample data
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)

# Compute the mean of each column and store it in a new column named with the "_mean" suffix
for col in df_iris.columns:
    df_iris[col + "_mean"] = df_iris[col].mean()

df_iris.head()

iris_01.png

Sorting

# Build a list of column names in the desired order
# Here, take the DataFrame's column names and sort them in ascending order
list_col_sorted = df_iris.columns.to_list()
list_col_sorted.sort()
list_col_sorted
['petal length (cm)',
 'petal length (cm)_mean',
 'petal width (cm)',
 'petal width (cm)_mean',
 'sepal length (cm)',
 'sepal length (cm)_mean',
 'sepal width (cm)',
 'sepal width (cm)_mean']

Result

# Pass the ordered list as the column indexer of loc
df_iris.loc[:, list_col_sorted].head()

iris_03.png
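The example above happens to use alphabetical order, but the list passed to loc can be in any order. As a rough sketch, assuming a hypothetical DataFrame with columns "id", "height", "weight", and "label" (names chosen only for illustration), this moves one column to the front and keeps the others as they were.

```python
import pandas as pd

# Hypothetical toy data, not from the iris example
df = pd.DataFrame(
    {
        "id": [1, 2],
        "height": [1.5, 1.8],
        "weight": [50, 70],
        "label": ["a", "b"],
    }
)

# Columns to bring to the front
first = ["label"]

# Remaining columns, kept in their current order
rest = [c for c in df.columns if c not in first]

# Concatenate the two lists and pass the result to loc
df_reordered = df.loc[:, first + rest]

print(df_reordered.columns.to_list())
```

Building the list from df.columns rather than typing every name by hand means the code keeps working if other columns are added later.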

Conclusion

Sorting by column name places each "_mean" column immediately after its original column.

In machine learning, hundreds or thousands of features may be generated from a few dozen original items. When related columns end up far apart, for example after adding missing-value flags, the data is hard to read, so this technique is worth remembering for such cases.
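The missing-value flag case might look like the following sketch. The data and the "_isnull" suffix are assumptions made for illustration; any suffix works as long as each flag column sorts right after its source column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values
df = pd.DataFrame(
    {
        "age": [20.0, np.nan, 35.0],
        "income": [np.nan, 300.0, 450.0],
    }
)

# Add one flag column per original column; new columns are appended at the end
for col in list(df.columns):
    df[col + "_isnull"] = df[col].isna()

# Column order at this point: age, income, age_isnull, income_isnull

# Sort the column names so each flag sits next to its source column
df_sorted = df.loc[:, sorted(df.columns)]

print(df_sorted.columns.to_list())
```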
