When studying PyCaret, unseen data is easily confused with test data, but the two are distinct. In detail, the flow is:
Create a predictive model with the training data. Create the final prediction model by retraining on the training data combined with the test data. Finally, feed the unseen data into that final model to check its accuracy.
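The three datasets above can be sketched with plain scikit-learn (this is an illustration of the splits, not PyCaret internals; all sizes and variable names here are made up):

```python
# Illustrative sketch of the train / test / unseen split described above.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

# 1) Hold back "unseen" data that the modeling workflow never touches.
X_model, X_unseen, y_model, y_unseen = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2) Split the rest into training and test data
#    (PyCaret's setup() performs this split internally).
X_train, X_test, y_train, y_test = train_test_split(
    X_model, y_model, test_size=0.3, random_state=0)

print(len(X_train), len(X_test), len(X_unseen))  # 560 240 200
```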
This is the continuation of "Machine learning experience in just a few lines (part 1). Explaining PyCaret in detail. From dataset preparation to accuracy comparison of multiple models." Last time, we went from preparing the dataset through comparing the accuracy of multiple models.
In part 2, we will create a model, plot it, and build the final model.
The purpose of compare_models() is not to produce final trained models, but to evaluate performance and narrow down model candidates. This time, we will train the model using a random forest.
code.py
rf = create_model('rf')
tune_model() performs a random grid search over hyperparameters. By default, it is set to optimize Accuracy.
code.py
tuned_rf = tune_model(rf)
For example, in a random forest, if you want to create a model with a high AUC value, the code would look like this:
code.py
tuned_rf_auc = tune_model(rf, optimize = 'AUC')
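Conceptually, a random grid search like tune_model() can be reproduced with plain scikit-learn. The sketch below shows the idea; the parameter grid is made up for illustration and is not PyCaret's actual search space:

```python
# Random grid search over random-forest hyperparameters, optimizing AUC,
# analogous in spirit to tune_model(rf, optimize='AUC').
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=123)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=123),
    param_distributions={          # hypothetical search space
        'n_estimators': [50, 70, 100, 150],
        'max_depth': [5, 10, None],
        'min_samples_split': [2, 5, 10],
    },
    n_iter=10,
    scoring='roc_auc',             # optimize AUC, like optimize='AUC'
    random_state=123,
)
search.fit(X, y)
print(search.best_params_)
```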
The model created with tune_model() is 1.45% more accurate, so we will use it.
AUC Plot
code.py
plot_model(tuned_rf, plot = 'auc')
Precision-Recall Curve
code.py
plot_model(tuned_rf, plot = 'pr')
Feature Importance Plot
code.py
plot_model(tuned_rf, plot='feature')
You can also browse all of these plots interactively with evaluate_model().
code.py
evaluate_model(tuned_rf)
Confusion Matrix
code.py
plot_model(tuned_rf, plot = 'confusion_matrix')
Before finalizing the predictive model, use the test data to confirm that the trained model is not overfitted. If the gap in accuracy between training and test were large, we would need to reconsider, but this time there is no big difference, so we proceed.
code.py
predict_model(tuned_rf);
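The overfitting check above can be sketched conceptually with scikit-learn: compare cross-validated training accuracy against hold-out (test) accuracy, and treat a large gap as a warning sign (the dataset and model here are placeholders):

```python
# Conceptual overfitting check: training-side CV score vs. hold-out score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

rf = RandomForestClassifier(random_state=123).fit(X_train, y_train)
cv_acc = cross_val_score(rf, X_train, y_train, cv=5).mean()  # training-side estimate
test_acc = rf.score(X_test, y_test)                          # hold-out accuracy

# A small difference suggests the model is not badly overfitted.
print(round(cv_acc, 3), round(test_acc, 3))
```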
Finally, we complete the final version of the prediction model. The model here is retrained on the combination of training and test data.
code.py
final_rf = finalize_model(tuned_rf)
print(final_rf)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=10, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=2, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, n_estimators=70,
                       n_jobs=None, oob_score=False, random_state=123,
                       verbose=0, warm_start=False)
code.py
predict_model(final_rf);
The Accuracy and AUC are high. This is because the test data was folded into training, so the model is now being scored on data it has already seen.
Finally, we will use the unseen data (the 1,200 records set aside earlier) to evaluate the predictive model.
code.py
unseen_predictions = predict_model(final_rf, data=data_unseen)
unseen_predictions.head()
Label and Score columns have been added to the dataset. Label is the class predicted by the model, and Score is the predicted probability.
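Given the Label column, accuracy on the unseen data can be checked by comparing it against the true target column. A minimal sketch (the toy DataFrame and the 'default' column name are placeholders standing in for the tutorial's dataset):

```python
# Compare the model's Label column against the true target column.
import pandas as pd
from sklearn.metrics import accuracy_score

# Toy stand-in for unseen_predictions as returned by predict_model().
unseen_predictions = pd.DataFrame({
    'default': [0, 1, 0, 1, 1],   # true target (hypothetical column name)
    'Label':   [0, 1, 0, 0, 1],   # model prediction added by predict_model()
})

acc = accuracy_score(unseen_predictions['default'], unseen_predictions['Label'])
print(acc)  # 0.8
```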
When more new data to predict arrives later, it would be a pain to start over. PyCaret provides save_model(), which saves the trained pipeline and model.
code.py
save_model(final_rf,'Final RF Model')
Transformation Pipeline and Model Successfully Saved
To load the model, do the following:
code.py
saved_final_rf = load_model('Final RF Model')
Transformation Pipeline and Model Successfully Loaded
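Under the hood, saving and loading amounts to serializing the fitted model to disk. A minimal sketch of the same idea using joblib (with a plain scikit-learn model and a made-up filename, not PyCaret's exact mechanism):

```python
# Serialize a fitted model to disk and restore it, analogous to
# save_model() / load_model().
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

joblib.dump(model, 'final_rf_model.pkl')      # save
restored = joblib.load('final_rf_model.pkl')  # load

# The restored model makes identical predictions.
print((restored.predict(X) == model.predict(X)).all())
```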
We use the unseen data from earlier. The result is the same as before, so it is omitted here.
code.py
new_prediction = predict_model(saved_final_rf, data=data_unseen)
new_prediction.head()
I ran through the Level Beginner tutorial while explaining each step. I'm surprised how much can be done in a dozen or so lines. I feel that the barrier to machine learning has become even lower.
If you have any suggestions, please comment. Thank you for reading.