The full list of plots follows, but PyCaret automates all of them, so they can be produced in a single line:

```python
evaluate_model(tuned_model)
```
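Besides the interactive widget from `evaluate_model`, each plot can also be produced individually with `plot_model` and its `plot` argument. The short-name strings below are from PyCaret's classification docstring as I remember them; check the docstring of your installed version before relying on them.

```python
# Short names accepted by plot_model's `plot` argument in pycaret.classification
# (per its docstring; verify against your installed version).
PLOTS = {
    'auc': 'AUC',
    'threshold': 'Threshold',
    'pr': 'Precision Recall',
    'confusion_matrix': 'Confusion Matrix',
    'error': 'Error',
    'boundary': 'Decision Boundary',
    'learning': 'Learning Curve',
    'vc': 'Validation Curve',
    'feature': 'Feature Importance',
    'manifold': 'Manifold Learning',
    'dimension': 'Dimensions',
}

# Usage would look like: plot_model(tuned_model, plot='confusion_matrix')
for name, title in PLOTS.items():
    print(name, '->', title)
```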
## Confusion Matrix

The familiar confusion matrix, rendered as a heat map. For binary classification it looks a little sparse, but in multi-class classification you can see the various ways the model gets confused.
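As a reminder of what the heat map encodes, here is the confusion-matrix computation itself in plain Python (illustrative only, with made-up labels; PyCaret draws this for you):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """counts[i][j] = number of samples with actual class i predicted as class j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

# Made-up binary labels: 1 = default (Positive), 0 = no default (Negative).
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
print(cm)  # rows = actual class, columns = predicted class -> [[4, 1], [1, 2]]
```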
## Error

For each actual class, the plot shows how many samples were predicted as Positive and how many as Negative. Again, this is even more useful in multi-class classification.
## Decision Boundary

The decision boundary. The **Credit dataset** ("will the customer default or not?") is imbalanced: the Positive class is **very small**, so the boundary is hard to see here.

With a reasonably balanced dataset, the decision boundary can be confirmed even for multi-class classification. The plot below is the LightGBM decision boundary, where you can see the jagged boundaries typical of **tree-based algorithms**.

As a bonus, comparing decision boundaries is a good way to understand the characteristics of each algorithm.
| | Logistic Regression | K Nearest Neighbour | Gaussian Process |
|---|---|---|---|
| Boundary | (plot) | (plot) | (plot) |
| Characteristic | A linear algorithm, so the decision boundary is also a straight line | Boundaries formed by grouping nearby points | A smooth curved surface reflecting the Gaussian (bell curve) |
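The "grouping nearby points" behaviour can be sketched with a toy 1-nearest-neighbour classifier in plain Python (illustrative only, made-up data, not PyCaret code): labelling a grid of points by their nearest training point produces the piecewise, cell-like boundaries characteristic of k-NN.

```python
# Toy 1-nearest-neighbour classifier (illustrative only, made-up data).
def nn_predict(points, labels, x, y):
    """Assign (x, y) the label of its nearest training point."""
    nearest = min(range(len(points)),
                  key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2)
    return labels[nearest]

# Two training points per class: class 0 along y = 0, class 1 along y = 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
labels = [0, 0, 1, 1]

# Label a 3x3 grid; each cell takes the class of its nearest training point,
# which is what gives k-NN its piecewise, cell-like boundaries.
grid = [[nn_predict(points, labels, x / 2, y / 2) for x in range(3)]
        for y in range(3)]
for row in grid:
    print(row)
```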
## Threshold

## Precision Recall
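What the Threshold and Precision Recall plots are built from can be sketched in a few lines of plain Python (illustrative, made-up scores; PyCaret computes this from the model's predicted probabilities): sweeping the decision threshold trades recall against precision.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting Positive for score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up predicted probabilities of "default" and the true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
y_true = [0,   0,   1,    1,   0,   0,   1,   1]

# Sweeping the threshold changes the precision/recall balance.
for th in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, scores, th)
    print(th, round(p, 2), round(r, 2))
```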
## Learning Curve

## Validation Curve
Train-set and CV-set scores are plotted against a regularization-related parameter of each model. For LightGBM, the horizontal axis is max_depth (which controls the depth of the trees).
For this model:

* When max_depth = 4, generalization performance (the CV score) is highest.
* Beyond that, generalization performance does not improve, while the train set is (slightly) overfitted.
* Therefore it seems better to cap max_depth.

The curve can be used for judgments like these.
The horizontal axis differs by algorithm, because the parameter that controls regularization differs by model. In logistic regression, for example, the regularization parameter is **C**, so the horizontal axis is C. The horizontal-axis parameter for each algorithm is summarized below; see the source code for details (https://github.com/pycaret/pycaret/blob/master/classification.py#L2871-L2941). LDA is not supported.
| Algorithm | Horizontal axis | Algorithm | Horizontal axis |
|---|---|---|---|
| Decision Tree / Random Forest / Gradient Boosting / Extra Trees Classifier / Extreme Gradient Boosting / Light Gradient Boosting / CatBoost Classifier | max_depth | Logistic Regression / SVM (Linear) / SVM (RBF) | C |
| Multi Level Perceptron (MLP) / Ridge Classifier | alpha | AdaBoost | n_estimators |
| K Nearest Neighbour (knn) | n_neighbors | Gaussian Process (GP) | max_iter_predict |
| Quadratic Disc. Analysis (QDA) | reg_param | Naive Bayes | var_smoothing |
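The train-vs-CV idea behind the validation curve can be sketched without PyCaret at all. Below is a toy NumPy version (not PyCaret's implementation) where polynomial degree plays the role of the complexity knob, analogous to max_depth: train error keeps shrinking as complexity grows, while held-out error eventually worsens.

```python
import numpy as np

# Toy validation-curve sketch (not PyCaret's implementation): model
# complexity on the x-axis, train / held-out error on the y-axis.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Simple split: even indices train, odd indices held out (the "CV" set).
x_tr, y_tr = x[::2], y[::2]
x_cv, y_cv = x[1::2], y[1::2]

degrees = [1, 2, 3, 5, 9]
train_err, cv_err = [], []
for d in degrees:
    coef = np.polyfit(x_tr, y_tr, d)                      # fit on train only
    train_err.append(float(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)))
    cv_err.append(float(np.mean((np.polyval(coef, x_cv) - y_cv) ** 2)))

# Train error can only shrink as complexity grows; held-out error typically
# bottoms out at a moderate degree and worsens once the model overfits.
for d, tr, cv in zip(degrees, train_err, cv_err):
    print(d, round(tr, 3), round(cv, 3))
```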
## Feature Importance

## Manifold Learning

## Dimensions
```python
from pycaret.datasets import get_data

# Load the credit dataset.
# With profile=True, an EDA report is generated by pandas-profiling.
data = get_data('credit', profile=False)

from pycaret.classification import *
exp1 = setup(data, target='default')

compare_models(sort="AUC")
tuned_model = tune_model(estimator='lightgbm')
```
* The algorithms that can be specified as `estimator` are listed below; they can also be checked in the docstring.
Algorithm | `estimator` value | Algorithm | `estimator` value |
---|---|---|---|
Logistic Regression | 'lr' | Random Forest | 'rf' |
K Nearest Neighbour | 'knn' | Quadratic Disc. Analysis | 'qda' |
Naive Bayes | 'nb' | AdaBoost | 'ada' |
Decision Tree | 'dt' | Gradient Boosting | 'gbc' |
SVM (Linear) | 'svm' | Linear Disc. Analysis | 'lda' |
SVM (RBF) | 'rbfsvm' | Extra Trees Classifier | 'et' |
Gaussian Process | 'gpc' | Extreme Gradient Boosting | 'xgboost' |
Multi Level Perceptron | 'mlp' | Light Gradient Boosting | 'lightgbm' |
Ridge Classifier | 'ridge' | CatBoost Classifier | 'catboost' |
# Summary

Having covered each visualization separately above, let me finish by organizing them by purpose. Assuming the flow input data -> modeling -> results, the visualizations can be grouped under the following five purposes:

* A) Understand the input data and the features themselves.
* B) Understand which features the model looks at.
* C) Judge the model's learning status (underfitting, overfitting).
* D) Examine the model's predictive characteristics and the thresholds at which the objectives can be achieved.
* E) Understand the model's prediction performance and prediction results.
| Purpose | Perspective | Visualization |
|---|---|---|
| A) Understand the input data and the features themselves | Is Positive/Negative data separable? | Manifold Learning |
| | Same as above | Dimensions |
| B) Understand which features the model looks at | Which features are important? | Feature Importance |
| C) Judge the model's learning status (underfitting, overfitting) | Can prediction performance be improved with more training data? | Learning Curve |
| | Is overfitting suppressed by regularization? | Validation Curve |
| D) Examine the model's predictive characteristics and usable thresholds | Which threshold gives the desired prediction characteristics? | Threshold |
| | What is the relationship between Precision and Recall? | Precision Recall |
| E) Understand the model's prediction performance and results | What is the AUC (prediction performance)? | AUC |
| | Understand the decision boundaries of the results | Decision Boundary |
| | Understand how the model makes mistakes | Confusion Matrix |
| | Same as above | Error |
# Finally

* Thank you for reading to the end.
* If you liked it, a like or a share would make me happy.
* If there is enough of a response, I will write a longer follow-up (parameter explanations, etc.).