Machine learning experience in just a few lines (Part 2). Explain PyCaret in detail. Model building and evaluation analysis.

Regarding unseen data

When studying PyCaret, it is easy to mistake unseen data for test data, but the two play different roles. Unseen data is set aside from the start and is never used while the model is being built. In detail, the flow is as follows.

First, create a predictive model with the training data. Next, create the final prediction model by combining the training data with the test data. Finally, feed the unseen data into the model to check its accuracy.
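The difference between the three kinds of data can be sketched in plain Python. This is only an illustration using the standard library, not PyCaret's own splitting code. The 24,000-row stand-in dataset, the 5% unseen fraction, the 70/30 train/test ratio, and the helper name split_rows are all assumptions made for the example.

```python
import random


def split_rows(rows, unseen_frac=0.05, train_frac=0.7, seed=123):
    """Split rows into (train, test, unseen) lists.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # Hold back the unseen rows first; the model never touches them until the end.
    n_unseen = round(len(shuffled) * unseen_frac)
    unseen, modeling = shuffled[:n_unseen], shuffled[n_unseen:]

    # Split the remaining rows into training data and test (hold-out) data.
    n_train = round(len(modeling) * train_frac)
    train, test = modeling[:n_train], modeling[n_train:]
    return train, test, unseen


rows = list(range(24000))  # stand-in for a dataset of 24,000 records (assumed size)
train, test, unseen = split_rows(rows)
print(len(train), len(test), len(unseen))
```

The training and test rows are later merged for the final model, while the unseen rows stay out of every training step.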

Review of last time

This article continues from "Machine learning experience in just a few lines (first part). Explain PyCaret in detail. From dataset preparation to accuracy comparison of multiple models." Last time, we covered everything from preparing the dataset to comparing the accuracy of multiple models.

Goal of this part

In Part 2, we will create a model, plot its performance, and build the final model.

Create a model using training data

The purpose of compare_models() is not to create a trained model, but to compare the performance of many models and pick candidates. This time, we will train a random forest model.

code.py


# Assumes the setup() call from Part 1 has already been run in this session
rf = create_model('rf')

image.png

tune_model() tunes the hyperparameters with a random grid search. By default, it optimizes Accuracy.

code.py


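# PyCaret 1.x takes the model ID string; PyCaret 2.x and later expects the model object: tune_model(rf)
tuned_rf = tune_model('rf')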

image.png
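The idea behind a random grid search can be sketched with the standard library alone. This is not PyCaret's implementation. The search space, the toy_score function (a stand-in for a cross-validated metric such as Accuracy or AUC), and the helper name random_search are all hypothetical.

```python
import random

# Hypothetical search space; PyCaret's real grid for random forest is not reproduced here.
param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Stand-in for a cross-validated metric; a real search would train a model here."""
    depth = params["max_depth"] or 20
    return 0.7 + 0.001 * params["n_estimators"] ** 0.5 - 0.002 * abs(depth - 10)


def random_search(grid, score_fn, n_iter=10, seed=123):
    """Try n_iter randomly drawn combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


best_params, best_score = random_search(param_grid, toy_score)
print(best_params, round(best_score, 4))
```

Swapping the scoring function is what changing the optimized metric amounts to: the search loop stays the same and only the number being maximized changes.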

For example, to tune the random forest for a high AUC instead of Accuracy, the code looks like this:

code.py


tuned_rf_auc = tune_model('rf', optimize = 'AUC')

The model created with tune_model() is 1.45% more accurate, so I will use tuned_rf from here on.

Plot the performance of the model

AUC Plot

code.py


plot_model(tuned_rf, plot = 'auc')

image.png
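As a reminder of what the AUC number on this plot means, here is a minimal sketch that computes it by hand. It uses the pairwise-ranking definition (the share of positive/negative pairs in which the positive example gets the higher score), not the plotting code PyCaret runs. The function name roc_auc and the four sample points are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]            # made-up true classes
scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities
auc_value = roc_auc(labels, scores)
print(auc_value)  # 0.75: three of the four positive/negative pairs are ordered correctly
```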

Precision-Recall Curve

code.py


plot_model(tuned_rf, plot = 'pr')

image.png
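Each point on a precision-recall curve comes from one decision threshold. The sketch below computes two such points by hand. The helper name precision_recall_at and the sample data are made up, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen for this example.

```python
def precision_recall_at(labels, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)

    # Convention assumed here: precision is 1.0 when no positives are predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [0, 0, 1, 1]            # made-up true classes
scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities

low_threshold = precision_recall_at(labels, scores, 0.2)
high_threshold = precision_recall_at(labels, scores, 0.5)
print(low_threshold)   # recall is 1.0, but one negative is wrongly flagged
print(high_threshold)  # precision is 1.0, but one positive is missed
```

Raising the threshold trades recall for precision, which is the trade-off the curve visualizes.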

Feature Importance Plot

code.py


plot_model(tuned_rf, plot='feature')

image.png

code.py


evaluate_model(tuned_rf)

image.png

Confusion Matrix

code.py


plot_model(tuned_rf, plot = 'confusion_matrix')

image.png
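The four cells of a binary confusion matrix are simple counts, as the sketch below shows. It is a hand-rolled illustration rather than the code behind plot_model, and the function name and sample labels are made up.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    counts = Counter(zip(actual, predicted))
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # actual 0: true negatives, false positives
        [counts[(1, 0)], counts[(1, 1)]],  # actual 1: false negatives, true positives
    ]


actual = [0, 0, 1, 1, 1, 0]      # made-up true classes
predicted = [0, 1, 1, 0, 1, 0]   # made-up model predictions
matrix = confusion_matrix(actual, predicted)
print(matrix)  # [[2, 1], [1, 2]]
```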

Create a prediction model by combining training data and test data

Before finalizing the predictive model, use the test data to check that the trained model is not overfitted. If the accuracy on the test data differed greatly from the accuracy seen during training, we would need to investigate the cause, but this time there is no big difference, so we will proceed.

code.py


predict_model(tuned_rf);

image.png
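The overfitting check described here boils down to comparing two accuracy values. The sketch below shows one way to express it. The 0.05 tolerance, the function names, and the accuracy figures are hypothetical and are not results from this article.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def looks_overfitted(train_accuracy, holdout_accuracy, tolerance=0.05):
    """Flag a model whose training accuracy exceeds hold-out accuracy by more than tolerance.

    The default tolerance is an arbitrary value chosen for this example.
    """
    return (train_accuracy - holdout_accuracy) > tolerance


holdout_accuracy = accuracy([0, 1, 1, 0], [0, 1, 0, 0])  # 3 of 4 predictions are correct
print(holdout_accuracy)

# Hypothetical accuracy pairs: a small gap is acceptable, a large gap needs a closer look.
small_gap = looks_overfitted(0.82, 0.81)
large_gap = looks_overfitted(0.95, 0.80)
print(small_gap, large_gap)
```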

Now we complete the final version of the prediction model. finalize_model() retrains the model on the training data and the test data combined.

code.py


final_rf = finalize_model(tuned_rf)
print(final_rf)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=10, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=2, min_samples_split=10, min_weight_fraction_leaf=0.0, n_estimators=70, n_jobs=None, oob_score=False, random_state=123, verbose=0, warm_start=False)

code.py


predict_model(final_rf);

image.png

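The accuracy and AUC are higher than before. This is because the final model was trained on the test data as well, so it is now being scored on data it has already seen. These numbers therefore overstate how the model will perform on new data.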

Model evaluation using unseen data

Finally, we will use the unseen data (1,200 records) to evaluate the predictive model.

code.py


unseen_predictions = predict_model(final_rf, data=data_unseen)
unseen_predictions.head()

image.png

The Label and Score columns have been added to the dataset. Label is the class predicted by the model, and Score is the probability of the prediction.

Save model

When more new data arrives to predict on, starting over from scratch is tedious. PyCaret provides save_model() so that you can save the model.

code.py


save_model(final_rf,'Final RF Model')

Transformation Pipeline and Model Succesfully Saved

Loading the saved model

To load the model, do the following:

code.py


saved_final_rf = load_model('Final RF Model')

Transformation Pipeline and Model Sucessfully Loaded

Predict on the unseen data from earlier with the loaded model. The result is the same as before, so I will omit it.

code.py


new_prediction = predict_model(saved_final_rf, data=data_unseen)

code.py


new_prediction.head()
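The save-and-reload round trip can be pictured with Python's standard pickle module. This is a conceptual sketch, not what save_model() and load_model() do internally. A plain dictionary stands in for the trained pipeline, and the file name is made up.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; the two settings echo the printed final_rf parameters.
model = {"name": "rf", "n_estimators": 70, "max_depth": 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "final_rf_model.pkl")  # hypothetical file name

    # Save: serialize the object to disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Load: rebuild an equal object from the file, with no retraining.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)  # True
```

Because the restored object equals the saved one, predictions made with it match those made before saving, which is why the reloaded model above gives the same result.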

In conclusion

I worked through the Beginner-level tutorial and explained each step along the way. I'm surprised that this much can be done in about a dozen lines. I feel that the barrier to entry for machine learning has become even lower.

If you have any suggestions, please comment. Thank you for reading.
