One of the nice features of LightGBM, a library popular on Kaggle, is that you can optimize a custom objective simply by supplying the gradient of the function you want to minimize. Choosing this objective function well is one of the tricks to building a good model, and many objectives are implemented out of the box. In addition, LightGBM lets you pass a Python function that computes the gradient. In this article, I would like to show how the built-in objectives work by recreating equivalent objectives in Python while referring to the official LightGBM implementation.
This article assumes that you perform regression or binary classification by passing an lgb.Dataset to lgb.train. When creating an objective for multi-class classification, I think [Mr. Tawara's article (forcibly performing Multi-Task (!?) Regression with LightGBM)](https://tawara.hatenablog.com/entry/2020/05/14/120016) is easy to follow. Also, this article does not explain why grad and hess are needed, so please refer to other materials for that.
Now for the main subject. Since the core of LightGBM is implemented in C++ for speed, the objective part is also written in C++. Let's read the code (https://github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp) to see what happens when objective = "l2". The part that computes the gradient is implemented in GetGradients().
```cpp:github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp
  void GetGradients(const double* score, score_t* gradients,
                    score_t* hessians) const override {
    if (weights_ == nullptr) {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        gradients[i] = static_cast<score_t>(score[i] - label_[i]);
        hessians[i] = 1.0f;
      }
    } else {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        gradients[i] = static_cast<score_t>((score[i] - label_[i]) * weights_[i]);
        hessians[i] = static_cast<score_t>(weights_[i]);
      }
    }
  }
```
It's not practical because it's slow, but reproducing this in Python looks like the following.
```python
import numpy as np

def l2_loss(pred, data):
    true = data.get_label()
    # gradient of 0.5 * (pred - true)^2 with respect to pred
    grad = pred - true
    # the hessian is constant 1
    hess = np.ones(len(grad))
    return grad, hess
```
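For reference, here is a minimal sketch of how such a function can be plugged into training. X and y are placeholder data I made up for illustration; also note that passing the function as the fobj argument is the LightGBM 3.x interface, while 4.x expects the callable in params["objective"] instead.

```python
import numpy as np
import lightgbm as lgb

# placeholder training data; replace with your own features / target
X = np.random.rand(1000, 10)
y = np.random.rand(1000)

dtrain = lgb.Dataset(X, label=y)
params = {"learning_rate": 0.1, "metric": "l2"}

# equivalent to objective="l2", but grad/hess are computed in python
booster = lgb.train(params, dtrain, num_boost_round=100, fobj=l2_loss)
```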
Next, let's look at objective = "poisson", which minimizes the Poisson loss. Reading the metric for the Poisson loss, it is as follows.
```cpp:github.com/microsoft/LightGBM/blob/master/src/metric/regression_metric.hpp
class PoissonMetric: public RegressionMetric<PoissonMetric> {
 public:
  explicit PoissonMetric(const Config& config) :RegressionMetric<PoissonMetric>(config) {
  }

  inline static double LossOnPoint(label_t label, double score, const Config&) {
    const double eps = 1e-10f;
    if (score < eps) {
      score = eps;
    }
    return score - label * std::log(score);
  }

  inline static const char* Name() {
    return "poisson";
  }
};
```
And when I read objective, it has the following implementation.
```cpp:github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp
  void GetGradients(const double* score, score_t* gradients,
                    score_t* hessians) const override {
    if (weights_ == nullptr) {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        gradients[i] = static_cast<score_t>(std::exp(score[i]) - label_[i]);
        hessians[i] = static_cast<score_t>(std::exp(score[i] + max_delta_step_));
      }
    } else {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        gradients[i] = static_cast<score_t>((std::exp(score[i]) - label_[i]) * weights_[i]);
        hessians[i] = static_cast<score_t>(std::exp(score[i] + max_delta_step_) * weights_[i]);
      }
    }
  }
```
...did you notice? The score passed to this objective is not the predicted value itself but the exponent x when the prediction is written as prediction = e^x. You can confirm this by differentiating the loss yourself (WolframAlpha works fine): for L = e^x - label * x, the gradient is e^x - label, which matches the code above. Therefore, when you write a Poisson objective yourself (and likewise gamma or tweedie), the metric has to be computed with predicted value = e^pred.
```python
def poisson_metric(pred, data):
    true = data.get_label()
    # score - label * log(score), with score = exp(pred)
    loss = np.exp(pred) - true * pred
    return "poisson", np.mean(loss), False

def poisson_object(pred, data):
    poisson_max_delta_step = 0.7
    true = data.get_label()
    grad = np.exp(pred) - true
    hess = np.exp(pred + poisson_max_delta_step)
    return grad, hess
```
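To make the e^x relationship concrete, here is a minimal sketch of training with these functions on placeholder data and converting the raw scores back into predictions. With a custom objective the booster outputs the raw score x (LightGBM does not know the link function), so you have to apply np.exp yourself. As before, fobj/feval is the 3.x-style interface; 4.x takes the objective callable via params["objective"].

```python
import numpy as np
import lightgbm as lgb

# placeholder data: non-negative count-like targets for a poisson-style objective
X = np.random.rand(1000, 10)
y = np.random.poisson(lam=3.0, size=1000).astype(float)

dtrain = lgb.Dataset(X, label=y)
# "metric": "None" disables the built-in metric so only the custom one is reported
params = {"learning_rate": 0.1, "metric": "None"}

booster = lgb.train(params, dtrain, num_boost_round=100,
                    valid_sets=[dtrain],
                    fobj=poisson_object, feval=poisson_metric)

# the booster returns the raw score x; the actual prediction is e^x
raw_score = booster.predict(X)
prediction = np.exp(raw_score)
```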
Pushing on, let's also look at the objective for binary (two-class) classification. The binary metric is as follows.
```cpp:github.com/microsoft/LightGBM/blob/master/src/metric/binary_metric.hpp
class BinaryLoglossMetric: public BinaryMetric<BinaryLoglossMetric> {
 public:
  explicit BinaryLoglossMetric(const Config& config) :BinaryMetric<BinaryLoglossMetric>(config) {}

  inline static double LossOnPoint(label_t label, double prob) {
    if (label <= 0) {
      if (1.0f - prob > kEpsilon) {
        return -std::log(1.0f - prob);
      }
    } else {
      if (prob > kEpsilon) {
        return -std::log(prob);
      }
    }
    return -std::log(kEpsilon);
  }

  inline static const char* Name() {
    return "binary_logloss";
  }
};
```
For the objective, note that sigmoid_ = 1, label_val_ = [-1, 1], and label_weights_ = [1, 1] when is_unbalance = False.
```cpp:github.com/microsoft/LightGBM/blob/master/src/objective/binary_objective.hpp
  void GetGradients(const double* score, score_t* gradients, score_t* hessians) const override {
    if (!need_train_) {
      return;
    }
    if (weights_ == nullptr) {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        // get label and label weights
        const int is_pos = is_pos_(label_[i]);
        const int label = label_val_[is_pos];
        const double label_weight = label_weights_[is_pos];
        // calculate gradients and hessians
        const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
        const double abs_response = fabs(response);
        gradients[i] = static_cast<score_t>(response * label_weight);
        hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight);
      }
    } else {
      #pragma omp parallel for schedule(static)
      for (data_size_t i = 0; i < num_data_; ++i) {
        // get label and label weights
        const int is_pos = is_pos_(label_[i]);
        const int label = label_val_[is_pos];
        const double label_weight = label_weights_[is_pos];
        // calculate gradients and hessians
        const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
        const double abs_response = fabs(response);
        gradients[i] = static_cast<score_t>(response * label_weight * weights_[i]);
        hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight * weights_[i]);
      }
    }
  }
```
As with Poisson, the score is not the probability itself: the predicted probability = sigmoid(score). Checking the derivative with WolframAlpha as before, the gradient of -log(1 - 1/(1 + e^-x)) for label = 0 and of -log(1/(1 + e^-x)) for label = 1 both match the response computed in the code above, so written in Python the objective looks like this.
```python
def binary_metric(pred, data):
    true = data.get_label()
    # binary logloss with probability = sigmoid(pred)
    prob = 1 / (1 + np.exp(-pred))
    loss = -(true * np.log(prob) + (1 - true) * np.log(1 - prob))
    return "binary", np.mean(loss), False

def binary_objective(pred, data):
    true = data.get_label()
    # map labels {0, 1} to {-1, 1} as in the C++ implementation
    label = 2 * true - 1
    response = -label / (1 + np.exp(label * pred))
    abs_response = np.abs(response)
    grad = response
    hess = abs_response * (1 - abs_response)
    return grad, hess
```
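The same caveat applies here: with a custom objective the booster returns the raw score, so you have to apply the sigmoid yourself to get probabilities. A minimal sketch with placeholder data, again using the 3.x-style fobj/feval arguments:

```python
import numpy as np
import lightgbm as lgb

# placeholder data: binary labels in {0, 1}
X = np.random.rand(1000, 10)
y = (np.random.rand(1000) > 0.5).astype(float)

dtrain = lgb.Dataset(X, label=y)
params = {"learning_rate": 0.1, "metric": "None"}

booster = lgb.train(params, dtrain, num_boost_round=100,
                    valid_sets=[dtrain],
                    fobj=binary_objective, feval=binary_metric)

# convert the raw score into a probability with the sigmoid
raw_score = booster.predict(X)
prob = 1 / (1 + np.exp(-raw_score))
```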
This time I reproduced LightGBM's official objective implementations in Python. Understanding the basics introduced here should make it easier to create your own custom objective. I would like to introduce the objectives I actually implemented in competitions in another article, so please look forward to it.