I want to display the number of boosting rounds (num_boost_round) actually run when early_stopping is applied, using an XGBoost callback (not achieved)

Premise

I'm a complete beginner, so I'm leaving this as a memo. If I've made a mistake, please point it out gently, because I'm easily discouraged. This memo makes the code from the reference site easier to understand. The environment is Azure ML, and I'm running Optuna to search for hyperparameters.

Prerequisite knowledge

- num_boost_round is the number of gradient boosting iterations.
- early_stopping ends training when the prediction accuracy on the validation data has not improved for the specified number of rounds.
- A callback is a debug-like feature built into XGBoost (my understanding here is vague).
- Reference: https://xgboost.readthedocs.io/en/latest/python/python_api.html#callback-api
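The relationship between num_boost_round and early stopping described above can be sketched in plain Python. This is a simplified illustration of the general idea, not XGBoost's own implementation: the function name `simulate_early_stopping` and the MAE values are made up for the example. When early stopping is actually used, XGBoost records the best round on the trained model as `best_iteration`.

```python
def simulate_early_stopping(scores, early_stopping_rounds):
    """Return (best_iteration, rounds_run) for a metric where lower is better.

    scores[i] is the validation score after boosting round i (0-based).
    len(scores) plays the role of num_boost_round, the maximum number of rounds.
    """
    best_score = float("inf")
    best_iteration = -1
    for iteration, score in enumerate(scores):
        if score < best_score:
            best_score = score
            best_iteration = iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` rounds in a row: stop.
            return best_iteration, iteration + 1
    # The round budget ran out before early stopping triggered.
    return best_iteration, len(scores)


# Hypothetical MAE values, one per round (not taken from a real run).
mae = [10.0, 9.0, 8.5, 8.6, 8.7, 8.8, 8.4, 8.3]
best_iteration, rounds_run = simulate_early_stopping(mae, early_stopping_rounds=3)
print(best_iteration, rounds_run)
```

With a patience of 3, training stops after round index 5 even though later rounds would have improved, which is why the number of rounds actually run can be smaller than num_boost_round.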

Implementation

Minimal implementation

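def return_callback():
    # The inner function receives an env object on each boosting round.
    def print_num_boost_round(env):
        iteration = env.iteration
        # evaluation_result_list holds (metric_name, value) tuples
        msg = '\t'.join([str(x) for x in env.evaluation_result_list])
        print(iteration, msg)
    # Return the inner function so it can be passed to XGBoost as a callback
    return print_num_boost_round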

This produces output like:

0  ('validation_0-mae', 2657.650391)
1  ('validation_0-mae', 2657.609375)
0  ('validation_0-mae', 2624.649658)
2  ('validation_0-mae', 2657.425049)
1  ('validation_0-mae', 2624.609131)

Next, change the code to print env itself:

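def return_callback():
    def print_num_boost_round(env):
        # Print the whole env object to see which fields are available
        print(env)
    return print_num_boost_round

This prints:
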
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=0, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.623047)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=1, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.463379)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f7a8224c208>, cvfolds=None, iteration=0, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2624.622314)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=2, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.411377)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f7a8224c208>, cvfolds=None, iteration=1, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2624.467285)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=3, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.355957)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f0ced02c208>, cvfolds=None, iteration=0, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2639.834229)])
XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f7a8224c208>, cvfolds=None, iteration=2, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2624.416016)])

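This shows that the current iteration number is available as env.iteration. Note that the Booster addresses differ between lines, so the log appears to interleave output from several separate training runs.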
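Since env.iteration is available inside the callback, one way to keep the round count instead of only printing it is to have the callback append to a list. The sketch below is a minimal illustration that does not import XGBoost: `FakeEnv` is a hypothetical stand-in that models only some of the fields seen in the printed env, and `make_recording_callback` is a name invented for this example.

```python
from collections import namedtuple

# Stand-in for the env object printed above; only the fields used here are modelled.
FakeEnv = namedtuple("FakeEnv", ["iteration", "end_iteration", "evaluation_result_list"])


def make_recording_callback(history):
    """Return a callback that appends (iteration, metric, value) tuples to `history`."""
    def record(env):
        for name, value in env.evaluation_result_list:
            history.append((env.iteration, name, value))
    return record


history = []
callback = make_recording_callback(history)

# Simulate three rounds with values taken from the log above.
for i, mae in enumerate([2657.623047, 2657.463379, 2657.411377]):
    callback(FakeEnv(iteration=i, end_iteration=100,
                     evaluation_result_list=[("validation_0-mae", mae)]))

last_iteration = history[-1][0]
print("rounds seen by the callback:", last_iteration + 1)
```

After training finishes, the last entry in `history` gives the final iteration the callback saw, which is the number this memo was trying to display.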

Reference (https://kunsen.net/2020/05/02/post-3199/)

Try tuning num_boost_round with Optuna

param_list['num_boost_round'] = trial.suggest_int("num_boost_round", 100, 500)

First, let Optuna choose num_boost_round from the range 100 to 500.

Specified parameters

The parameters chosen by Optuna when run as is:

{
 'max_depth': 20,
 'eta': 0.22613771945050443,
 'num_leaves': 2560,
 'lambda': 6.0425529841148486e-05,
 'alpha': 6.69043393720362e-07,
 'num_boost_round': 236,
 'colsample_bytree': 0.9727432424922707,
 'min_child_weight': 239.6173703091301
}

num_boost_round is 236. It is not the same every time because it depends on what Optuna samples; when I ran it again, it was 253. So what does 236 mean? Is training actually running 236 rounds in the first place? The output is:

0 ('validation_0-mae', 2657.650391)
1  ('validation_0-mae', 2657.609375)
0  ('validation_0-mae', 2624.649658)
2  ('validation_0-mae', 2657.425049)
1  ('validation_0-mae', 2624.609131)

This is the output, but iteration only goes up to 100, as end_iteration shows. Next, I searched for the minimum value by hand. The minimum was 135.56956, so I counted the number of lines in which that value appeared. The result was 482.
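The manual search for the minimum value and the line count can also be done with a short script. This is a rough sketch that assumes each log line has the form printed by the callback above, with a single metric per line; the helper name `parse_line` is invented for the example, and the sample lines are the five shown earlier rather than the full log.

```python
import ast
from collections import Counter


def parse_line(line):
    """Split a line such as "1  ('validation_0-mae', 2657.609375)" into its parts."""
    iteration_text, result_text = line.split(maxsplit=1)
    metric, value = ast.literal_eval(result_text)
    return int(iteration_text), metric, value


log_lines = [
    "0  ('validation_0-mae', 2657.650391)",
    "1  ('validation_0-mae', 2657.609375)",
    "0  ('validation_0-mae', 2624.649658)",
    "2  ('validation_0-mae', 2657.425049)",
    "1  ('validation_0-mae', 2624.609131)",
]

parsed = [parse_line(line) for line in log_lines]
values = [value for _, _, value in parsed]

minimum = min(values)                              # smallest metric value in the log
occurrences = Counter(values)[minimum]             # number of lines showing that value
max_iteration = max(iteration for iteration, _, _ in parsed)
print(minimum, occurrences, max_iteration)
```

Run against the full log, `minimum` and `occurrences` would correspond to the two numbers found by hand.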

Conclusion

Looking closely, the same iteration number does not mean the same value. It might have been easier to understand if I had read the XGBoost paper first and had that as prerequisite knowledge. Is there no choice but to brute-force it for now?
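The observation that equal iteration numbers carry different values fits with several training runs writing to one log. The sketch below groups the printed env lines by the Booster address. Treating the address as a run identifier is an assumption made for illustration only, since addresses can repeat across processes. The regular expression is written against the exact print format shown earlier.

```python
import re
from collections import defaultdict

# Captures: Booster address, iteration, metric name, metric value.
PATTERN = re.compile(
    r"Booster object at (0x[0-9a-f]+)>.*?iteration=(\d+),.*?"
    r"evaluation_result_list=\[\('([^']+)', ([0-9.]+)\)\]"
)

log_lines = [
    "XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=0, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.623047)])",
    "XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7fa972703208>, cvfolds=None, iteration=1, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2657.463379)])",
    "XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f7a8224c208>, cvfolds=None, iteration=0, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2624.622314)])",
    "XGBoostCallbackEnv(model=<xgboost.core.Booster object at 0x7f7a8224c208>, cvfolds=None, iteration=1, begin_iteration=0, end_iteration=100, rank=0, evaluation_result_list=[('validation_0-mae', 2624.467285)])",
]

runs = defaultdict(list)  # address -> list of (iteration, value)
for line in log_lines:
    match = PATTERN.search(line)
    if match:
        address, iteration, metric, value = match.groups()
        runs[address].append((int(iteration), float(value)))

for address, rows in runs.items():
    last_iteration = max(i for i, _ in rows)
    best_iteration, best_value = min(rows, key=lambda row: row[1])
    print(address, "last iteration:", last_iteration, "best:", best_iteration, best_value)
```

Separated this way, each run has its own iteration sequence, and the last and best iteration can be read per run.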
