This post shows how to write code that computes row-wise statistics of numerical data and adds them as features.
First, the way this is commonly written:
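# Add the statistics one by one
# Fix the input columns up front so features added below are not fed into later statistics
num_cols = df.select_dtypes(include="number").columns
df["sum"] = df[num_cols].sum(axis=1)
df["max"] = df[num_cols].max(axis=1)
df["min"] = df[num_cols].min(axis=1)
df["mean"] = df[num_cols].mean(axis=1)
df["median"] = df[num_cols].median(axis=1)
# DataFrame.mad was removed in pandas 2.0, so compute the mean absolute deviation by hand
df["mad"] = df[num_cols].sub(df["mean"], axis=0).abs().mean(axis=1)
df["var"] = df[num_cols].var(axis=1)
df["std"] = df[num_cols].std(axis=1)
df["skew"] = df[num_cols].skew(axis=1)
df["kurt"] = df[num_cols].kurt(axis=1)
df.head()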
This works, but it is awkward to extend and maintain. Looping over a list of function names is more compact:
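func_list = ["sum", "max", "min", "mean", "median",
             "var", "std", "skew", "kurt"]  # "mad" omitted: DataFrame.mad was removed in pandas 2.0
# Compute every statistic from the same original columns
num_cols = df.select_dtypes(include="number").columns
for func in func_list:
    # Call the DataFrame method of that name, e.g. df[num_cols].sum(axis=1)
    df[func] = getattr(df[num_cols], func)(axis=1)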
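To make the loop concrete, here is a minimal sketch on toy data. The column names `a`, `b`, `c`, the values, and the `naive` frame are made up for illustration. It also shows why it may be worth freezing the input columns first: if features are written into the same frame they are read from, later statistics also see the earlier feature columns.

```python
import pandas as pd

# Toy data; the column names and values are made up for illustration
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 5.0], "c": [5.0, 11.0]})

# Freeze the input columns before adding any feature
num_cols = list(df.columns)

func_list = ["sum", "max", "min", "mean"]
for func in func_list:
    # getattr(frame, "sum")(axis=1) is the same call as frame.sum(axis=1)
    df[func] = getattr(df[num_cols], func)(axis=1)

print(df)

# For comparison: reading from and writing to the same frame
naive = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 5.0], "c": [5.0, 11.0]})
naive["sum"] = naive.sum(axis=1)
naive["max"] = naive.max(axis=1)  # the "sum" column is now part of the input
print(naive)
```

In the second frame, `max` equals `sum` in every row, because the row sum of positive values is always the largest entry once it has been added as a column.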
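You can also mix in lambda functions. With axis=1, apply passes each row to the lambda as a Series, so index x rather than df:
f_diff = lambda x: x["xxx"] - x["yyy"]
f_diff.__name__ = "diff"  # used as the column name below
func_list = ["sum", "max", "min", "mean", "median",
             "var", "std", "skew", "kurt", f_diff]  # "mad" omitted: removed in pandas 2.0
num_cols = df.select_dtypes(include="number").columns
for func in func_list:
    if isinstance(func, str):
        df[func] = getattr(df[num_cols], func)(axis=1)
    else:
        df[func.__name__] = df[num_cols].apply(func, axis=1)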
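When several custom functions are involved, one possible way to keep the column names readable is a mapping from feature name to function. The following is only a sketch under assumptions: `xxx` and `yyy` are the placeholder columns from the text, while `zzz`, the toy values, the `features` dict, and the `range` feature are made up for illustration. The `mad` entry recomputes the mean absolute deviation by hand, since `DataFrame.mad` was removed in pandas 2.0.

```python
import pandas as pd

# Toy data; "xxx" and "yyy" follow the placeholders in the text, "zzz" is made up
df = pd.DataFrame({"xxx": [10.0, 7.0], "yyy": [4.0, 9.0], "zzz": [1.0, 2.0]})
num_cols = list(df.columns)

# Feature name -> function that takes one row (a Series) and returns a scalar
features = {
    "diff": lambda row: row["xxx"] - row["yyy"],
    "range": lambda row: row.max() - row.min(),
    "mad": lambda row: (row - row.mean()).abs().mean(),
}

for name, func in features.items():
    # Always read from the original columns, never from the new features
    df[name] = df[num_cols].apply(func, axis=1)

print(df)
```

Row-wise lambdas are called once per row, so they are slower than the built-in methods on large data. They are convenient for quick experiments, though.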
Machine learning involves many ad hoc experiments, so throwaway code tends to pile up.
Keeping the code as clean as possible makes it easier to reuse and extend.
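As one illustration of reuse, the loop could be wrapped in a small helper. This is a sketch, not a definitive implementation: the function name `add_row_stats`, its `prefix` parameter, the `value_range` function, and the toy data are all hypothetical.

```python
import pandas as pd

def add_row_stats(df, cols, funcs, prefix="row_"):
    """Return a copy of df with one row-wise statistic column per entry in funcs.

    funcs may contain DataFrame method names (str) or named callables
    that take one row (a Series) and return a scalar.
    """
    out = df.copy()
    src = df[list(cols)]  # statistics are always computed from these columns only
    for func in funcs:
        if isinstance(func, str):
            out[prefix + func] = getattr(src, func)(axis=1)
        else:
            out[prefix + func.__name__] = src.apply(func, axis=1)
    return out

def value_range(row):
    # Hypothetical custom feature: spread between largest and smallest value
    return row.max() - row.min()

# Toy data; column names and values are made up for illustration
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 5.0], "c": [5.0, 11.0]})
result = add_row_stats(df, ["a", "b", "c"], ["sum", "mean", value_range])
print(result)
```

Returning a copy leaves the input frame untouched, so the same helper can be applied to training and test data without side effects.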