Writing cleaner code when adding statistics as machine learning features

This post shows how to write code that computes statistics across each row of numerical data (row-wise) and adds them as features.
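To make "row-wise" concrete, here is a minimal sketch on a made-up toy DataFrame (the columns `a`, `b`, `c` are hypothetical). With `axis=1`, pandas aggregates across the columns of each row, so you get one value per sample.

```python
import pandas as pd

# Toy data (hypothetical): three numeric feature columns, two rows (samples)
df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 9]})

# axis=1 aggregates across the columns of each row,
# producing one value per sample
row_sum = df.sum(axis=1)
row_max = df.max(axis=1)

print(row_sum.tolist())  # [6, 18]
print(row_max.tolist())  # [3, 9]
```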

Common feature generation

First, the common way to write it:

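# Snapshot of the original numeric columns, so that the new
# statistic columns are not fed back into later aggregations
base = df.select_dtypes("number")

# How to add one by one
df["sum"] = base.sum(axis=1)
df["max"] = base.max(axis=1)
df["min"] = base.min(axis=1)
df["mean"] = base.mean(axis=1)
df["median"] = base.median(axis=1)
# DataFrame.mad was removed in pandas 2.0; compute the mean absolute deviation directly
df["mad"] = base.sub(base.mean(axis=1), axis=0).abs().mean(axis=1)
df["var"] = base.var(axis=1)
df["std"] = base.std(axis=1)
df["skew"] = base.skew(axis=1)
df["kurt"] = base.kurt(axis=1)

df.head()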

This works, but it is awkward to extend and maintain: every new statistic needs another near-identical line.

Smart feature generation

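# Snapshot of the original numeric columns, so that the new
# statistic columns are not fed back into later aggregations
base = df.select_dtypes("number")

# "mad" is omitted: DataFrame.mad was removed in pandas 2.0
func_list = ["sum", "max", "min", "mean", "median",
             "var", "std", "skew", "kurt"]

for func in func_list:
    df[func] = getattr(base, func)(axis=1)  # e.g. base.sum(axis=1)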

You can also add your own functions, such as lambdas.

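# Snapshot of the original numeric columns
base = df.select_dtypes("number")

# The lambda receives the sub-DataFrame and returns a Series
# ("xxx" and "yyy" stand for your own column names)
f_diff = lambda d: d["xxx"] - d["yyy"]

# Strings name a DataFrame reduction method; callables are called directly
func_dict = {"sum": "sum", "max": "max", "min": "min", "mean": "mean",
             "median": "median", "var": "var", "std": "std",
             "skew": "skew", "kurt": "kurt", "diff": f_diff}

for name, func in func_dict.items():
    df[name] = func(base) if callable(func) else getattr(base, func)(axis=1)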
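For reference, here is a small sketch of how a lambda behaves when passed to `apply` with `axis=1`, using toy data under the placeholder column names `xxx` and `yyy` from the text. Each row arrives as a Series, so the lambda should index the row it receives rather than the whole DataFrame. For simple arithmetic like this, the vectorized column expression gives the same result and is usually much faster.

```python
import pandas as pd

# Toy data; "xxx" and "yyy" are the placeholder column names from the text
df = pd.DataFrame({"xxx": [10, 20, 30], "yyy": [1, 5, 7]})

# With axis=1, apply passes each row to the lambda as a Series
row_wise = df.apply(lambda row: row["xxx"] - row["yyy"], axis=1)

# The vectorized form computes the same column in one operation
vectorized = df["xxx"] - df["yyy"]

print(row_wise.tolist())  # [9, 15, 23]
```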
    

Machine learning involves many ad hoc experiments, so throwaway code tends to pile up.

Keeping that code as clean as possible makes it easier to reuse and extend.
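One way to package this for reuse is a small helper. The sketch below is an illustration, not part of pandas: `add_row_stats`, its parameters, and the `range` feature are hypothetical names. It also collects all new columns and joins them with a single `pd.concat`, rather than inserting them one at a time.

```python
import pandas as pd


def add_row_stats(df, cols, funcs):
    """Hypothetical helper (not a pandas API).

    Returns a copy of df with one row-wise statistic column per entry in funcs.
    cols  -- the feature columns to aggregate over
    funcs -- dict mapping a new column name to either the name of a
             DataFrame reduction method (e.g. "mean") or a callable that
             takes the sub-DataFrame and returns a Series
    """
    base = df[cols]
    stats = {}
    for name, func in funcs.items():
        if callable(func):
            stats[name] = func(base)
        else:
            stats[name] = getattr(base, func)(axis=1)
    # Join all new columns at once instead of inserting them one by one
    return pd.concat([df, pd.DataFrame(stats, index=df.index)], axis=1)


# Usage on toy data
df = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 5.0], "c": [3.0, 9.0]})
funcs = {"mean": "mean", "max": "max",
         "range": lambda d: d.max(axis=1) - d.min(axis=1)}
out = add_row_stats(df, ["a", "b", "c"], funcs)
print(out[["mean", "max", "range"]].values.tolist())
# [[2.0, 3.0, 2.0], [6.0, 9.0, 5.0]]
```

Adding a new feature then means adding one dictionary entry, and the original `df` is left unchanged.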
