The pitfall of RMSE (Root Mean Square Error), the evaluation metric for regression!

Introduction

RMSE (Root Mean Squared Error) is often used to evaluate machine learning regression models. Introductory material usually warns that RMSE is sensitive to outliers, yet it is often used without any thought about this. To stamp out this anti-pattern, let's ask: "So how sensitive is it, really?" To state the conclusion first: **as a rule of thumb, be careful if the data contains values 10 or more times the average. In that case, it is better to use RMSLE, which works on a logarithmic scale.**

① What is RMSE (Root Mean Square Error)?

As the name suggests: the error, that is, the difference between the actual value and the predicted value, is squared, averaged, and then the square root is taken.

\textrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}}

The formula is very close to that of the standard deviation, with the deviation from the mean replaced by the prediction error. Used well, it gives an intuitive sense that "this model's errors are typically about this large".
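A minimal sketch of the formula in NumPy, assuming NumPy is available; the helper name `rmse` and the sample values are illustrative, not from the original. It also checks the analogy with the standard deviation: if the "prediction" for every record is simply the mean, RMSE reduces to the population standard deviation.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, following the formula above (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y = np.array([3.0, 5.0, 7.0, 9.0])

# Predict the mean for every record: the errors become deviations from the mean,
# so RMSE equals the population standard deviation (np.std with the default ddof=0).
print(rmse(y, np.full_like(y, y.mean())))  # sqrt(5), about 2.236
print(np.std(y))                           # same value
```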

② When does RMSE go wrong? The scale of values

**RMSE** is popular and widely used, but it has the drawback that **its value is easily dragged around by outliers**, and many people seem not to worry about this. Because only the size of each error enters the calculation, predicting **20 for a true value of 10** and predicting **110 for a true value of 100** are evaluated as exactly the same error. Conversely, if a single prediction of **900 for a true value of 1,000** appears among a few dozen records, that one record ends up with **a completely different weight** from all the others.

import numpy as np
a = np.array([10]*100)         # 100 errors, each of size 10
print(np.sqrt((a**2).mean()))  # RMSE is, of course, 10.0
a = np.append(a, [100])        # add one error of 100 as the 101st record
print(np.sqrt((a**2).mean()))  # RMSE jumps to about 14.07!
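To see where the 14.07 comes from, assume (purely for illustration) n errors of size e plus one outlier of size ke. The RMSE formula then becomes:

\textrm{RMSE} = \sqrt{\frac{n e^{2} + (ke)^{2}}{n+1}} = e\sqrt{\frac{n + k^{2}}{n+1}}

With n = 100, e = 10 and k = 10 this gives 10\sqrt{200/101} \approx 14.07. Once k^2 reaches n, that single outlier contributes as much squared error as all of the other n records combined.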

Because the errors are squared, an error 10 times larger carries 100 times the weight in the metric. An error 100 times the average error carries 10,000 times the weight, as much as 10,000 ordinary records. And since outliers are also harder for machine learning models to predict, the evaluation inevitably becomes unstable when outliers are present.
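A small sketch makes this concrete. The helper `outlier_share` is hypothetical, and the setup (100 ordinary errors of size 10 plus one error k times larger) is an assumption chosen to mirror the snippet above. It reports the RMSE and the fraction of the total squared error that comes from the single outlier.

```python
import numpy as np

def outlier_share(n_normal, k, e=10.0):
    """n_normal errors of size e plus one outlier error of size k*e (hypothetical setup)."""
    errors = np.array([e] * n_normal + [k * e])
    sq = errors ** 2
    rmse = float(np.sqrt(sq.mean()))
    share = float(sq[-1] / sq.sum())  # fraction of total squared error from the outlier
    return rmse, share

for k in (1, 10, 100):
    rmse, share = outlier_share(n_normal=100, k=k)
    print(f"k={k:>3}: RMSE={rmse:8.2f}, outlier share of squared error={share:.1%}")
```

With k = 10 the outlier already accounts for half of the squared error; with k = 100 it accounts for about 99% and the RMSE becomes 100, so the metric is essentially reporting that one record.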

③ Then what should the evaluation metric be?

Personally, I think **RMSLE**, which can be read as an error in ratios, or **MAE** is a good choice. I believe a machine learning model should, **first of all, get the ordering right**. **RMSLE** is RMSE computed on logarithms: you get RMSLE by taking the log of y (commonly log(1 + y), so that zeros are allowed). A convenient approach is to train with RMSE on the log-transformed target and apply exp at the end. **MAE** averages absolute values instead of squares. This keeps large errors from being amplified and makes it more robust to outliers. However, it is harder to use as a loss function when training a model, since it is not differentiable at zero and its gradient does not shrink as the error gets smaller. Of course, if **"there are almost no outliers" or "mistakes on large values are exactly what matters"**, there is no problem with using RMSE.
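A minimal NumPy sketch of the alternatives, assuming the common log(1 + y) convention for RMSLE and non-negative targets; the helpers `rmse`, `rmsle`, `mae` and all sample values are illustrative, not from the original. It shows that RMSE treats the "10 vs 20" and "100 vs 110" errors from ② identically while RMSLE does not, that MAE moves much less than RMSE on the outlier example from ②, and the "train in log space, then exp" idea for the simplest possible model, a constant prediction.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsle(y_true, y_pred):
    # RMSE computed on log(1 + y); assumes non-negative values
    return rmse(np.log1p(y_true), np.log1p(y_pred))

def mae(y_true, y_pred):
    # Mean of absolute errors: no squaring, so large errors are not amplified
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Same absolute error of 10: RMSE treats both cases identically, RMSLE does not
print(rmse([10], [20]), rmse([100], [110]))    # 10.0 10.0
print(rmsle([10], [20]), rmsle([100], [110]))  # about 0.65 and 0.09

# The outlier example from section 2: 100 errors of 10 plus one error of 100
errors = np.array([10.0] * 100 + [100.0])
zeros = np.zeros_like(errors)
print(rmse(errors, zeros), mae(errors, zeros))  # about 14.07 vs about 10.89

# "Fit with RMSE on log1p(y), then apply expm1": for a constant model,
# the RMSE-optimal constant is the mean of the (transformed) target.
y = np.array([10.0, 12.0, 11.0, 1000.0])
const_raw = y.mean()                      # 258.25, dragged toward the outlier
const_log = np.expm1(np.log1p(y).mean())  # about 35, fitted in log space and mapped back
print(const_raw, const_log)
```

In practice, scikit-learn also provides `mean_absolute_error` and `mean_squared_log_error` in `sklearn.metrics`, so hand-written helpers like these are only needed for illustration.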

④ A real-world example

A recent competition, ProbSpace's Real Estate Transaction Price Prediction Competition, used RMSE, and its data contained values more than 100 times the average. As a result, the rankings shifted dramatically. If you are interested, see the competition topic analyzing how much RMSE changes with differences in scale (login required). Even in real projects, I have seen cases where people knew RMSE but had not considered whether it was appropriate for evaluating their model, given the possibility of outliers.

Conclusion

In conclusion, it all comes down to this: **"Don't use RMSE when data on very different scales may appear! Use RMSLE!"** Relying only on RMSE is as dangerous as relying only on accuracy in classification. By the way, the ProbSpace real estate price prediction competition mentioned above has been reopened with RMSLE as the evaluation metric, with a deadline of August 11, 2020. I recommend it for beginners, since you can start from the solutions shared by participants in the previous round. I'm sorry to use it as an example of an anti-pattern, but I'm glad it was reopened like this. As shown above, ignoring the evaluation metric can land you in a big pitfall; conversely, simply paying attention to it can earn you a good ranking in a competition, so it is well worth keeping in mind.
