Standardize by group with pandas

Introduction

While preparing data for machine learning with pandas, I wanted to standardize some columns within each group rather than across the whole data set. The group-name column itself must not be standardized, but it has to be retained in the result. This is just a memo.

Execution environment

pandas = 0.25.3
numpy = 1.18.0

Code to standardize by group in pandas

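Standardize the numeric columns within each class of a table like the one below, keeping the class column as it is. The sample code builds a table of the same shape, with columns col_0, col_1, col_2 and the values 0 to 11.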

	class	a	b	c
	a	1.0	2.0	3.0
	a	4.0	5.0	6.0
	b	7.0	8.0	9.0
	b	10.0	11.0	12.0


import numpy as np
import pandas as pd

# Make the data set (float dtype, so standardized values fit without upcasting)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=['col_0', 'col_1', 'col_2'],
                  index=['row_0', 'row_1', 'row_2', 'row_3'])
df["class"] = ["a", "a", "b", "b"]

# Standardize each group separately; the "class" column is never overwritten
value_cols = df.columns.drop("class")
for name in df["class"].unique():
    mask = df["class"] == name           # rows that belong to this group
    df_tmp = df.loc[mask, value_cols]    # numeric columns of this group only
    df.loc[mask, value_cols] = (df_tmp - df_tmp.mean()) / df_tmp.std()
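
As one possible answer to the "better way" question, the same per-group standardization can probably be written without an explicit loop by using groupby with transform. The following is only a minimal sketch under the same assumptions as the sample above (a label column named "class" and float value columns); the names value_cols, standardized and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as above, created as floats
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=["col_0", "col_1", "col_2"],
                  index=["row_0", "row_1", "row_2", "row_3"])
df["class"] = ["a", "a", "b", "b"]

# Columns to standardize: everything except the group label
value_cols = [c for c in df.columns if c != "class"]

# transform applies the function to each column of each group and
# returns a frame aligned with the original index
standardized = df.groupby("class")[value_cols].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Write the standardized values into a copy; "class" stays untouched
result = df.copy()
result[value_cols] = standardized
print(result)
```

Note that std() in pandas uses the sample standard deviation (ddof=1) by default. If the population standard deviation is wanted instead, as scikit-learn's StandardScaler uses, x.std(ddof=0) should work in the lambda.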

This is my first post, and it is just a memo. Please let me know if there is a better way.
