I tried using PyCaret

1. Environment and version

2. Installation

pip install pycaret

From JupyterLab, prefix the command with an exclamation mark:

!pip install pycaret

3. Possible errors

Be careful: you will get an error if the installed version of scikit-learn is not 0.22.
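To see which scikit-learn version is installed before running anything, a small check like the one below can help. It is only a sketch using the standard library; the function name check_sklearn_version is my own, and the required prefix "0.22" is taken from the note above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_sklearn_version(required_prefix="0.22"):
    """Return (installed_version, matches) for scikit-learn.

    installed_version is None when scikit-learn is not installed.
    The function name and the default prefix are illustrative choices.
    """
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return None, False
    # Accept "0.22" itself as well as patch releases such as "0.22.2".
    matches = installed == required_prefix or installed.startswith(required_prefix + ".")
    return installed, matches


installed, matches = check_sklearn_version()
print("scikit-learn:", installed, "| matches 0.22:", matches)
```

If the versions do not match, reinstalling with pip install scikit-learn==0.22 is one way to line them up.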

4. Data

This time we will use the diabetes dataset that ships with PyCaret. get_data returns it directly as a data frame.

from pycaret.datasets import get_data
df = get_data("diabetes")

5. Preprocessing

For regression problems

from pycaret.regression import *

For classification problems

from pycaret.classification import *

setup() encodes categorical data, handles missing values, and splits the data (train_test_split). Specify the target column with target=.

experiment = setup(df, target="Class variable")
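To get a feel for what those three steps mean, here is a rough hand-written equivalent using pandas and scikit-learn. This is not PyCaret's internal code, just my approximation of the idea; the toy data frame, its column names, and the 75/25 split are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
    "sex": ["F", "M", "M", "F", "F", "M", "F", "M"],
    "label": [0, 1, 0, 1, 1, 0, 0, 1],
})

X = df.drop(columns="label")
y = df["label"]

# Split the data first so the fill value is learned from training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0, stratify=y
)

# Handle missing values: fill numeric gaps with the training mean.
age_mean = X_train["age"].mean()
X_train = X_train.assign(age=X_train["age"].fillna(age_mean))
X_test = X_test.assign(age=X_test["age"].fillna(age_mean))

# Encode categories: one-hot encode, then align test columns to the training columns.
X_train = pd.get_dummies(X_train, columns=["sex"], dtype=int)
X_test = pd.get_dummies(X_test, columns=["sex"], dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)

print(X_train.shape, X_test.shape)
```

Seeing it written out makes it clear how much one call to setup() saves.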

6. Model comparison

You can compare models with just the following call, which is convenient. It uses 10-fold cross-validation by default, and you can specify the number of folds and other options.

compare_models()

The best results are highlighted in yellow.
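For comparison, the same idea written by hand in scikit-learn looks roughly like this. It is a sketch of 10-fold cross-validated model comparison in general, not a reproduction of what compare_models does internally; the synthetic data and the three candidate models are my own picks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# A hand-picked set of candidates; the short names are just dictionary keys.
candidates = {
    "lr": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "ada": AdaBoostClassifier(random_state=0),
}

# Mean accuracy over 10 folds for each candidate.
results = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```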

7. Modeling

This is also easy; a model can be created in one line:

model = create_model("ada")

This time I chose the AdaBoost classifier.

8. Tuning

Tuning is also a single line. It's almost too easy.

tuned_model = tune_model("ada")

You can also get the parameters of the tuned model.

tuned_model.get_params()
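As a point of reference, hyperparameter tuning by hand in scikit-learn could look like the sketch below. I am not claiming this is what tune_model runs internally; it is a plain randomized search, and the parameter grid, the synthetic data, and the number of iterations are my own assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hand-picked search space; not taken from PyCaret.
param_distributions = {
    "n_estimators": [10, 50, 100],
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
}

search = RandomizedSearchCV(
    AdaBoostClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # try 5 random combinations
    cv=5,              # 5-fold cross-validation per combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

# get_params() returns the estimator's settings as a dictionary.
best_params = search.best_estimator_.get_params()
print(best_params["n_estimators"], best_params["learning_rate"])
```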

9. Model evaluation, visualization and interpretation

Running the following three calls in order evaluates, visualizes and interprets the model. You can get a different graph by passing the plot argument, for example plot = "boundary" for plot_model; interpret_model takes a plot argument as well.

evaluate_model(tuned_model)
plot_model(tuned_model)
interpret_model(tuned_model)

10. Prediction

model_pred = predict_model(tuned_model)

Called without data, it returns the predicted values for the hold-out data that setup() split off.

predictions = predict_model(tuned_model,data=df)

You can make predictions on new data by passing it with data=.
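The difference between the two calls can be illustrated with plain scikit-learn: score on the held-out split first, then predict on a separate data frame. This is only a sketch of the idea; the synthetic data, the column names f0 to f4, and the prediction column name are my own and are not what predict_model returns.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
features = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])

# Keep 30% aside as hold-out data.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    features, y, test_size=0.3, random_state=0
)
model = AdaBoostClassifier(random_state=0).fit(X_train, y_train)

# Case 1: evaluate on the hold-out split.
holdout_accuracy = model.score(X_holdout, y_holdout)
print(f"hold-out accuracy: {holdout_accuracy:.4f}")

# Case 2: predict on "new" data and attach the result as a column.
# Here the new data is just a sample of existing rows, for illustration.
new_data = features.sample(10, random_state=1)
scored = new_data.assign(prediction=model.predict(new_data))
print(scored.head())
```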

Reference site

I tried using PyCaret at the fastest speed
