Feature Engineering for Machine Learning, Part 5: Feature Selection

Introduction

This article explains feature selection. It is mainly based on "Feature Engineering for Machine Learning". Please check that book out if you are interested.

What is feature selection?

Feature selection is a technique for removing features that are not useful for model prediction. Useless features increase the model's training time and can reduce its accuracy.

Filter method

The filter method reduces features by looking only at the dataset, independent of any model. How useful each feature is for prediction is quantified with a statistical measure, and the features to actually use are selected based on it. Common measures include **Pearson's correlation coefficient**, the **chi-square test**, and **ANOVA**. For example, you can drop one of a pair of features that are too highly correlated with each other, or drop features whose correlation with the target variable is too low. However, because this method ignores the model entirely, it may delete features that would have been effective for a particular model.
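As a minimal sketch of the filter method, the following uses scikit-learn's `SelectKBest` with the ANOVA F-test on a synthetic dataset (the dataset and the choice of `k=3` are illustrative assumptions, not from the original article):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which only 3 are informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

# Filter method: score each feature with the ANOVA F-test,
# then keep only the 3 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (200, 3)
print(selector.get_support())  # boolean mask of the kept features
```

No model is involved at any point: the selection depends only on each feature's statistical relationship with the target.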

Wrapper method

The wrapper method takes subsets of features from the dataset and trains the model on each, repeating this process to determine which features are effective. Because features are selected by actually feeding them into the model, it does not, unlike the filter method, discard in advance features that might be effective for that model. However, the computational cost becomes enormous.
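A common concrete instance of the wrapper method is recursive feature elimination (RFE), which repeatedly trains the model and drops the weakest feature. The sketch below assumes a logistic-regression estimator and a target of 3 features; both choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

# Wrapper method: RFE trains the model, removes the feature with
# the smallest coefficient, and repeats until 3 features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)  # True for the 3 selected features
print(rfe.ranking_)  # 1 = selected; larger = eliminated earlier
```

Note that the model is retrained once per eliminated feature, which is why the cost grows quickly with the number of features.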

Embedded method

The embedded method incorporates feature selection into the model's training itself. A decision tree, for example, computes feature importances while it is being trained, so after training you simply keep the features with high importance. The embedded method's selections are somewhat lower quality than the wrapper method's, but it is a well-balanced approach: its computational cost is far lower than the wrapper method's, while its selections are more model-aware than the filter method's.
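The embedded method can be sketched with scikit-learn's `SelectFromModel` wrapped around a tree ensemble; the random-forest estimator and the `"mean"` importance threshold here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

# Embedded method: the forest computes feature importances as a
# by-product of training; keep features whose importance exceeds
# the mean importance across all features.
model = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(model, threshold="mean")
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)  # (200, n_kept), with n_kept < 10
```

Only one training run is needed, which is the cost advantage over the wrapper method.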

Finally

I'm thinking of posting videos about IT on YouTube. Likes, channel subscriptions, and high ratings motivate my YouTube and Qiita updates, so please consider them. YouTube: https://www.youtube.com/channel/UCywlrxt0nEdJGYtDBPW-peg Twitter: https://twitter.com/tatelabo

Reference

https://qiita.com/shimopino/items/5fee7504c7acf044a521
