The other day I tried PyCaret, a machine learning library that was released recently, and realized that anyone can build models easily. It really was easy! You can go from preprocessing to tuning and prediction in fewer than 10 lines of code. There are still many parts I haven't figured out, such as the arguments, but I decided to write this PyCaret article first. If you notice anything, please leave a comment.
Run the code below to install it. This is just my impression, but it took only a few minutes. When I installed it locally I got an error, so I gave up on that for now.
! pip install pycaret
This time we will use the Boston housing dataset. You can load the data with the following code.
from pycaret.datasets import get_data
boston_data = get_data('boston')
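get_data() returns a pandas DataFrame, so the usual inspection methods work on it. The quick check below is just an illustration, not part of the original steps.
# Quick sanity check (illustrative): boston_data is a pandas DataFrame
print(boston_data.shape)
boston_data.head()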
Perform preprocessing.
The data and the target variable are defined and initialized in setup().
Since we are solving a regression problem this time, we import from pycaret.regression. For classification problems, use pycaret.classification instead.
You can also perform tasks such as natural language processing and clustering.
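For reference, here is a minimal sketch of what the classification module would look like. The 'diabetes' dataset and its 'Class variable' target column come from PyCaret's bundled examples and are my assumptions here, not something used in this post.
# Minimal classification sketch (assumption: 'diabetes' sample dataset with
# target column 'Class variable')
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

clf_data = get_data('diabetes')
clf_exp = setup(clf_data, target='Class variable')
compare_models()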
setup() handles missing values, encoding of categorical data, the train-test split, and so on. For more information, see here.
from pycaret.regression import *
exp1 = setup(boston_data, target = 'medv')
Run it to complete the setup.
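setup() also takes optional arguments; to the best of my knowledge it accepts things like the train/test ratio and a random seed, as in the sketch below. The specific values are my own, not from the original call.
# Illustrative only: fix the train/test split ratio and the random seed
exp1 = setup(boston_data, target='medv', train_size=0.7, session_id=123)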
Let's compare models and pick one. You can compare models with the single line below; it took a few minutes. Being able to check the evaluation metrics in one table is convenient! By default, 10-fold cross-validation is used, and you can specify the number of folds and the metric to sort by through the arguments.
compare_models()
The execution results are displayed as a table of models sorted by their evaluation metrics.
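As an illustration, the fold count and sort metric can be passed as arguments; the values below are my own choices, and the exact metric names may vary by version.
# Illustrative only: 5-fold CV, sorted by RMSE instead of the default metric
compare_models(fold=5, sort='RMSE')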
Select a model and train it. This time I'm using Random Forest (chosen purely by feel). This function returns a table of k-fold scores along with the trained model object. Being able to check the standard deviation as well is very convenient!
rf = create_model('rf')
By typing a period after the trained object, you can inspect its attributes and methods.
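For example, assuming the returned object is a standard scikit-learn RandomForestRegressor (my understanding, not something stated in this post), attributes like the following are available.
# Illustrative only: attributes of the underlying scikit-learn estimator
rf.n_estimators          # number of trees
rf.feature_importances_  # impurity-based feature importances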
Tuning can also be done in one line.
tuned_rf = tune_model('rf')
You can get the hyperparameters with the following.
tuned_rf.get_params()
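tune_model() also takes arguments that control the search; as far as I know, n_iter sets the number of random-search iterations. The sketch below is illustrative only.
# Illustrative only: run more random-search iterations during tuning
tuned_rf_more = tune_model('rf', n_iter=50)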
Let's visualize the model's accuracy. The regression plot is shown below; for classification problems, you can choose the output according to the metric. I somewhat regret not picking a classification problem here, since there are many more visualization variations for classification.
plot_model(tuned_rf)
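Other regression plot types can be requested through the plot argument; the names below are taken from the library's documented options as I understand them and may differ by version.
# Illustrative only: other plot types for regression
plot_model(tuned_rf, plot='error')    # prediction error plot
plot_model(tuned_rf, plot='feature')  # feature importance plot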
The model is interpreted using SHAP. Check the SHAP GitHub repository for how to read the plots and how to interpret the model.
interpret_model(tuned_rf)
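interpret_model() also supports other SHAP-based plots via the plot argument, as far as I know; the example below is illustrative and version-dependent.
# Illustrative only: other SHAP-based interpretation plots
interpret_model(tuned_rf, plot='correlation')            # dependence-style plot
interpret_model(tuned_rf, plot='reason', observation=0)  # explanation for one row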
Predictions on the test data are written as follows. The execution returns predictions for the 30% hold-out test set created by the train-test split in setup().
rf_holdout_pred = predict_model(rf)
When making predictions on new data, pass the dataset to the data argument.
predictions = predict_model(rf, data=boston_data)
The prediction results are added as a new column on the far right of the returned dataframe.
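Assuming the prediction column is named 'Label' (which is my assumption and may differ by version), you can pull it out like this.
# Illustrative only: 'Label' is assumed to be the prediction column name
predictions[['medv', 'Label']].head()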
Thank you for reading to the end. If you have any questions, please leave a comment.