I tried out PyCaret, a machine learning library that was released the other day. It automates data feature analysis and performance comparison across multiple models, so I think it will significantly reduce the time data scientists have had to spend on this work.
Run the command below to install it. I use Anaconda, and I created and activated a virtual environment dedicated to PyCaret before installing. Installing into an existing Conda-managed environment may cause errors, probably because of conflicts between pip and conda.
pip install pycaret
This time we will use the Wine Quality dataset.
The dataset consists of 11 explanatory variables and 1 objective variable (quality) that represents the quality of the wine.
**Explanatory variables**
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
**Objective variable**
12 - quality (score between 0 and 10)
The dataset can be downloaded from the following site.
http://archive.ics.uci.edu/ml/datasets/Wine+Quality
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
import pandas as pd

# The file is semicolon-separated, so sep=";" must be passed explicitly.
dataset = pd.read_csv('winequality-white.csv', sep=";", encoding="utf-8")
dataset.head()
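The sep=";" argument matters because the file separates fields with semicolons rather than commas. As a minimal sketch using only the standard csv module, the snippet below parses a small inline sample with both delimiters. The sample is a simplified stand-in with four columns and made-up layout details, not the real file.

```python
import csv
import io

# Simplified, illustrative sample in a semicolon-separated layout.
# It is NOT the real file: only four columns are shown.
SAMPLE = (
    "fixed acidity;volatile acidity;alcohol;quality\n"
    "7;0.27;8.8;6\n"
    "6.3;0.3;9.5;6\n"
)

def parse(text, delimiter):
    """Return the rows of a delimited text as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

rows_ok = parse(SAMPLE, ";")   # correct delimiter: one key per column
rows_bad = parse(SAMPLE, ",")  # wrong delimiter: everything fused into one column

print("columns with ';':", len(rows_ok[0]))
print("columns with ',':", len(rows_bad[0]))
```

With the wrong delimiter every row collapses into a single text column, which is the symptom to look for if dataset.head() shows one wide column.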
Next, split the data: 95% is used for modeling, and the remaining 5% is held out as test data (called Unseen Data).
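# Sample 95% of the rows for modeling; keep the original index labels for now.
data = dataset.sample(frac=0.95, random_state=786)
# Drop by the original index labels BEFORE resetting them, so the two sets do not overlap.
data_unseen = dataset.drop(data.index).reset_index(drop=True)
data = data.reset_index(drop=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))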
result
Data for Modeling: (4653, 12)
Unseen Data For Predictions: (245, 12)
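The row counts above can be checked with a little arithmetic: 4653 + 245 = 4898 rows in total, and 95% of 4898 rounds to 4653. The sketch below reproduces the split sizes with the standard library and confirms that the two sets are disjoint. The helper name holdout_split is my own, and because Python's random module is not the generator pandas uses, the selected rows differ from those of dataset.sample(); only the sizes are comparable.

```python
import random

def holdout_split(n_rows, frac, seed):
    """Split row positions 0..n_rows-1 into a modeling set and a disjoint hold-out set."""
    rng = random.Random(seed)
    n_model = round(n_rows * frac)
    model_idx = set(rng.sample(range(n_rows), n_model))
    # Every position that was not sampled goes to the hold-out set.
    holdout_idx = [i for i in range(n_rows) if i not in model_idx]
    return sorted(model_idx), holdout_idx

model_idx, holdout_idx = holdout_split(4898, 0.95, seed=786)

print("modeling rows:", len(model_idx))
print("unseen rows:  ", len(holdout_idx))
print("overlap:      ", len(set(model_idx) & set(holdout_idx)))
```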
Use setup() to preprocess the data. Pass quality as the target argument to specify the objective variable.
from pycaret.classification import *
exp_clf101 = setup(data = data, target = 'quality', session_id=123)
Result (up to 10 items)
Use compare_models() to train multiple classification models on the dataset and summarize the results in a table. This is a very useful feature when deciding which classification model to use.
PyCaret provides more than 10 classification models, which are listed at the link below.
https://pycaret.org/classification/
compare_models()
The Extra Trees Classifier achieved an accuracy of 65.95%. For comparison, an analysis that skips these functions and uses only an SVM from the start gives an accuracy of 32.03% (see "Wine quality data, SVM analysis case").
result
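To make the idea behind the comparison table concrete, here is a minimal sketch of the general technique: score each candidate model with k-fold cross-validation and sort by mean accuracy. It is not PyCaret's implementation. The toy data, the two toy models, and all function names are my own assumptions for illustration.

```python
from statistics import mean

# Toy one-feature dataset (hypothetical): class 0 clusters low, class 1 high.
X = list(range(12)) + list(range(20, 28))
y = [0] * 12 + [1] * 8

def majority_model(train_X, train_y):
    """Always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label

def nearest_neighbor_model(train_X, train_y):
    """Predict the label of the closest training point (1-NN)."""
    def predict(x):
        nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
        return train_y[nearest]
    return predict

def cv_accuracy(fit, X, y, k=5):
    """Mean accuracy over k folds; row i is assigned to fold i % k."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)

candidates = {"majority": majority_model, "1-nn": nearest_neighbor_model}

# Rank the candidates by cross-validated mean accuracy, best first.
leaderboard = sorted(
    ((name, cv_accuracy(fit, X, y)) for name, fit in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, acc in leaderboard:
    print(f"{name}: {acc:.4f}")
```

The point of the table is the same as in this sketch: a weak baseline and a better model are scored under identical folds, so the ranking is a fair comparison.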
Select a classification model and train it with create_model(). This time, we will use the Light Gradient Boosting Machine model.
lightgbm = create_model('lightgbm')
result
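The model is also tuned using tune_model(). The model ID string is passed here, as in the PyCaret release used for this article; as far as I know, releases from 2.0 onward expect the trained model object instead, i.e. tune_model(lightgbm).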
tuned_lightgbm = tune_model('lightgbm')
result
The average accuracy before tuning was 0.6393, and the average accuracy after tuning was 0.6414.
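As I understand it, this kind of tuning is a random search: a fixed number of hyperparameter combinations is drawn from a grid, each is scored, and the best one is kept. The sketch below shows that idea only. The grid values, the toy_score function (a stand-in, not a real model evaluation), and the function names are all hypothetical.

```python
import random

# Hypothetical search space; the names resemble common LightGBM options,
# but the values are chosen only for illustration.
GRID = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.5],
    "num_leaves": [10, 20, 31, 50, 100],
}

def toy_score(params):
    """Stand-in for a cross-validated accuracy (NOT a real model evaluation)."""
    return (
        0.65
        - abs(params["learning_rate"] - 0.1)
        - abs(params["num_leaves"] - 31) / 1000
    )

def random_search(grid, score, n_iter=10, seed=123):
    """Try n_iter random combinations and return the best one with its score."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in grid.items()}
        history.append((params, score(params)))
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history

best_params, best_score, history = random_search(GRID, toy_score)
print("tried:", len(history), "combinations")
print("best: ", best_params, round(best_score, 4))
```

Because only a handful of combinations are sampled, a small gain such as 0.6393 to 0.6414 is plausible; more iterations or a wider grid may or may not improve on it.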
Visualize the analysis results using plot_model().
First, plot the ROC curves with their AUC values.
plot_model(tuned_lightgbm, plot = 'auc')
result
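For reference, the AUC of a single class can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pair counting for a one-vs-rest case. The labels and scores are made up, and the function name roc_auc is my own.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical one-vs-rest example: 1 means "quality == 6", 0 means any other score.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]  # predicted probability of the positive class

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which helps when reading the per-class curves in the plot.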
Then plot the confusion matrix.
plot_model(tuned_lightgbm, plot = 'confusion_matrix')
result
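A confusion matrix simply counts how often each true class was predicted as each class. The following sketch builds one from two label lists with the standard library; the quality scores are invented for illustration, and confusion_matrix here is my own helper, not a library function.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix); matrix[i][j] counts true labels[i] predicted as labels[j]."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix

# Hypothetical quality scores, for illustration only.
y_true = [5, 5, 6, 6, 6, 7, 7, 5]
y_pred = [5, 6, 6, 6, 5, 7, 6, 5]

labels, matrix = confusion_matrix(y_true, y_pred)

# The diagonal holds the correct predictions, so accuracy is its sum over the total.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

print("labels:", labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

Off-diagonal cells next to the diagonal show confusion between neighboring quality scores, which is the pattern worth checking in the plotted matrix.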
You can run multiple evaluations at once using evaluate_model().
evaluate_model(tuned_lightgbm)
Clicking a button in the yellow frame displays the corresponding evaluation result.
result
After finalizing the model with finalize_model(), make predictions with predict_model(). For the predictions, we use the test data (here, data_unseen).
final_lightgbm = finalize_model(tuned_lightgbm)
unseen_predictions = predict_model(final_lightgbm, data=data_unseen)
unseen_predictions.head()
The Label column at the end holds the predictions.
result
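To summarize the predictions on the unseen data, the share of rows where the predicted label equals the true quality can be computed. The sketch below does this on a few hypothetical rows shaped like the prediction output (true value in quality, prediction in Label). The function name and the rows are assumptions, and the column name is a parameter because other PyCaret releases may name the prediction column differently.

```python
# Hypothetical rows shaped like the prediction output: the true target
# in "quality" and the predicted value in "Label".
rows = [
    {"quality": 6, "Label": 6},
    {"quality": 5, "Label": 6},
    {"quality": 7, "Label": 7},
    {"quality": 6, "Label": 6},
    {"quality": 5, "Label": 5},
]

def holdout_accuracy(rows, target="quality", predicted="Label"):
    """Fraction of rows whose predicted label matches the true target."""
    if not rows:
        raise ValueError("no rows to evaluate")
    matches = sum(row[target] == row[predicted] for row in rows)
    return matches / len(rows)

acc = holdout_accuracy(rows)
print(f"hold-out accuracy: {acc:.2f}")
```

With a pandas DataFrame the same check would be a comparison of the two columns followed by a mean, assuming the column names match those shown in the table above.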
I wrote a follow-up article. If you like, please read it as well: **I tried to predict Titanic survival with PyCaret**
1. PyCaret Home Page, http://www.pycaret.org/
2. Wine Quality Dataset, http://archive.ics.uci.edu/ml/datasets/Wine+Quality
3. PyCaret Classification, https://pycaret.org/classification/
4. I tried using PyCaret at the fastest speed, https://qiita.com/s_fukuzawa/items/5dd40a008dac76595eea
5. [Python] Judging the quality of wine by machine learning, https://ymgsapo.com/2019/01/06/ai-wine-quality/