This article is a continuation of the following article.
・ I wrote code that exceeds a 100% recovery rate in horse racing prediction using LightGBM (1)
In Part 1, I wrote about the model in a burst of enthusiasm. In Part 2, I report the results of actually predicting future races and finally publish the code.
In fact, I have been publishing predictions on note since July. However, I kept improving the prediction code while publishing those notes, and the code released this time is only the basic part, so the predictions in the notes listed here do not necessarily match the output of the published code.
・ [Horse Racing Forecast] July 25, 2020
・ [Horse Racing Forecast] July 26, 2020
・ [Horse Racing Forecast] August 01, 2020
・ [[Horse Racing Forecast] August 08, 2020](https://note.com/km_takao/n/n9d2acf507e60)
・ [Horse Racing Forecast] August 09, 2020
・ [Horse Racing Forecast] August 15, 2020
・ [Horse Racing Forecast] August 22, 2020
(I could not make predictions for August 2, 16, and 23, 2020 for scheduling reasons.)
The recovery rates when these predictions are bought as place (double win) bets are as follows. For the stake, following Mr. 卍's method explained in Part 1, I use "total budget × 0.01 / odds 30 minutes before the race", with a total budget of 100,000 yen.
Race date | Total amount bet | Refund amount | Recovery rate |
---|---|---|---|
July 25, 2020 | 7,500 yen | 9,440 yen | 125% |
July 26, 2020 | 6,700 yen | 7,350 yen | 109% |
August 01, 2020 | 10,100 yen | 10,110 yen | 100% |
August 08, 2020 | 23,700 yen | 23,200 yen | 98% |
August 09, 2020 | 14,900 yen | 15,210 yen | 102% |
August 15, 2020 | 23,200 yen | 26,260 yen | 113% |
August 22, 2020 | 31,000 yen | 30,540 yen | 99% |
After the improvements, the number of bets increased, but the recovery rate got worse (as a side note, races have been held at three racecourses since the 15th). I am still examining whether such races simply happened to fall in this period or whether further improvements are needed.
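The recovery rate in the first table above is simply the refund divided by the amount bet. As a minimal sketch, the following recomputes it over all seven days combined; the dictionary layout and the function name `recovery_rate` are assumptions made for illustration, while the amounts are copied from the table.

```python
# (amount bet, refund) in yen per race day, copied from the first table above.
place_results = {
    "2020-07-25": (7_500, 9_440),
    "2020-07-26": (6_700, 7_350),
    "2020-08-01": (10_100, 10_110),
    "2020-08-08": (23_700, 23_200),
    "2020-08-09": (14_900, 15_210),
    "2020-08-15": (23_200, 26_260),
    "2020-08-22": (31_000, 30_540),
}


def recovery_rate(bet: int, refund: int) -> float:
    """Return the refund as a percentage of the amount bet."""
    if bet <= 0:
        raise ValueError("bet must be positive")
    return 100.0 * refund / bet


# Aggregate over all days instead of averaging the daily percentages,
# so that days with larger stakes carry proportionally more weight.
total_bet = sum(bet for bet, _ in place_results.values())
total_refund = sum(refund for _, refund in place_results.values())
overall = recovery_rate(total_bet, total_refund)

print(f"bet {total_bet} yen, refund {total_refund} yen, recovery {overall:.1f}%")
```

Summing the amounts first matters because the later days have much larger stakes than the earlier ones.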
Similarly, the recovery rates for win bets are as follows.
Race date | Total amount bet | Refund amount | Recovery rate |
---|---|---|---|
July 25, 2020 | 2,800 yen | 4,390 yen | 156% |
July 26, 2020 | 1,900 yen | 1,580 yen | 83% |
August 01, 2020 | 4,700 yen | 4,410 yen | 93% |
August 08, 2020 | 9,800 yen | 7,600 yen | 78% |
August 09, 2020 | 4,500 yen | 3,380 yen | 75% |
August 15, 2020 | 9,600 yen | 15,060 yen | 157% |
August 22, 2020 | 12,700 yen | 13,900 yen | 109% |
The number of bets is increasing here as well, but there are days when the recovery rate dropped. For comparison, here are the win results when always betting a flat 100 yen regardless of the budget, instead of using Mr. 卍's betting method.
Race date | Total amount bet | Refund amount | Recovery rate |
---|---|---|---|
July 25, 2020 | 1,600 yen | 1,850 yen | 115% |
July 26, 2020 | 1,100 yen | 1,080 yen | 98% |
August 01, 2020 | 1,600 yen | 3,500 yen | 218% |
August 08, 2020 | 3,900 yen | 11,610 yen | 297% |
August 09, 2020 | 2,400 yen | 8,190 yen | 341% |
August 15, 2020 | 3,700 yen | 4,530 yen | 122% |
August 22, 2020 | 4,800 yen | 5,980 yen | 125% |
In other words, the model can predict wins by longshots (穴馬, anauma), but with Mr. 卍's betting method the maximum odds you can bet on shrinks with the budget, so those longshots cannot be bet on. As a result, only popular horses with low odds receive bets, which appears to be one factor lowering the recovery rate. Conversely, on days without upsets, weighting the stakes as in Mr. 卍's method is a factor that raises the recovery rate. With a budget of 100,000 yen, the effect is small when the odds are low (around 5 at most), as with place (double win) bets. For win bets, however, the minimum stake is 100 yen, so once the odds reach about 10 or more the computed stake falls below the minimum and the horse cannot be bet on, which makes win bets particularly affected. The constant in the stake formula (0.01 here), your own budget, and the model's predicted values all need to be considered together.
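The interaction between the stake formula and the 100-yen minimum can be sketched as follows. This is a minimal sketch, assuming the raw stake is rounded down to 100-yen units (the article does not state the rounding rule); the function names `stake` and `max_bettable_odds` are hypothetical.

```python
MIN_STAKE = 100  # yen; the minimum stake mentioned in the text


def stake(budget: int, odds: float, coef: float = 0.01, unit: int = MIN_STAKE) -> int:
    """Stake in yen for one horse: budget * coef / odds.

    Assumption of this sketch: the raw value is rounded down to a multiple
    of `unit`. A result of 0 means the horse cannot be bet on.
    """
    if odds <= 0:
        raise ValueError("odds must be positive")
    raw = budget * coef / odds
    return int(raw // unit) * unit


def max_bettable_odds(budget: int, coef: float = 0.01, unit: int = MIN_STAKE) -> float:
    """Largest odds at which the raw stake still reaches one unit."""
    return budget * coef / unit


budget = 100_000
for odds in (2.5, 5.0, 10.0, 10.1, 30.0):
    print(f"odds {odds:>5}: stake {stake(budget, odds)} yen")
print("maximum bettable odds:", max_bettable_odds(budget))
```

With a budget of 100,000 yen the cutoff falls at odds of 10, which is consistent with the observation above that win bets at about 10 times or more are the ones affected. Raising the budget or the constant moves the cutoff up proportionally.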
The code is published on note. A detailed explanation of the code is given in the note and in the comments in the notebook. Here, I explain only the overall flow.
First, past race results, odds, and other data are scraped from netkeiba's database to build the model. As I wrote in Part 1, the scraping is based on "How to scrape horse racing data using pandas read_html". The scraped features include information on each runner, such as finishing position and jockey name; information on the race itself, such as distance, track condition, and weather; and each horse's odds before the start of the race.
The code I am publishing is the foundation of code that I am still improving, and I think accuracy can be raised further by, for example, creating your own features or ensembling with other algorithms. Of course, the published code also creates new features from the scraped ones by aggregating past results.
One example of a created feature is an aggregate of each horse's past results. The aggregation must not include results that were still in the future at the time of each race, so the data is sorted by race_id in ascending order and only races before the one in question are aggregated. For example, consider the aggregated results for Almond Eye.
In this way, only past data is aggregated and added as new features. (Note that only data from 2018 to 2020 is used here, in order to check the published code.)
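The idea of aggregating only past results can be sketched as follows. This is not the published notebook code: the rows, the horse names, and the feature names `past_starts` and `past_wins` are hypothetical, and race_id is assumed to be a fixed-length string that sorts chronologically.

```python
from collections import defaultdict

# Hypothetical rows: (race_id, horse, finishing position).
rows = [
    ("201805020811", "Horse A", 1),
    ("201905010111", "Horse B", 3),
    ("201806050811", "Horse A", 2),
    ("201905021211", "Horse A", 1),
    ("201806010101", "Horse B", 1),
]


def add_past_stats(rows):
    """Attach each horse's number of earlier starts and wins to every row."""
    starts = defaultdict(int)
    wins = defaultdict(int)
    features = []
    # Ascending race_id order means earlier races are processed first.
    for race_id, horse, position in sorted(rows, key=lambda r: r[0]):
        # Record the counts BEFORE updating them, so the current race
        # (and anything later) never leaks into its own features.
        features.append({
            "race_id": race_id,
            "horse": horse,
            "position": position,
            "past_starts": starts[horse],
            "past_wins": wins[horse],
        })
        starts[horse] += 1
        if position == 1:
            wins[horse] += 1
    return features


for row in add_past_stats(rows):
    print(row)
```

Each horse's first race gets zero for both counts, and a win only appears in `past_wins` from that horse's next race onward.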
As the title suggests, the model is built with LightGBM. The hyperparameters are tuned automatically with optuna. I think accuracy could be improved further by adding ensembling and cross-validation at this stage.
The pre-race information that the model predicts on is fetched not from the netkeiba database but from the race information page (/top/?rf=navi). The basic code is almost the same as for scraping past results.
Predicted values are displayed for each race. A stake computed in the same way as Mr. 卍's calculation, using the odds at the time of scraping, is also shown in a column.
For example, take the output for Niigata 4R on Saturday, August 8, 2020.
In this case, you would bet on the horses whose predict value exceeds a certain threshold, or on the three horses with the largest values.
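The two selection rules described above can be sketched as follows. The predict values below are made up for illustration and are not model output; the function names and the threshold of 0.3 are likewise assumptions.

```python
def pick_by_threshold(predictions: dict, threshold: float) -> list:
    """Return the horse numbers whose predicted value exceeds `threshold`."""
    return [horse for horse, score in predictions.items() if score > threshold]


def pick_top_n(predictions: dict, n: int = 3) -> list:
    """Return the `n` horse numbers with the largest predicted values."""
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return ranked[:n]


# Hypothetical predict values keyed by horse number.
predictions = {1: 0.12, 2: 0.45, 3: 0.08, 4: 0.31, 5: 0.27, 6: 0.02}

print("above threshold:", pick_by_threshold(predictions, 0.3))
print("top three:", pick_top_n(predictions))
```

The threshold rule can select no horses at all in a race, while the top-three rule always bets, so the two rules lead to different numbers of bets per day.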
The full code is available here, together with more detailed explanations.