At an event I had a chance to try TensorFlow, and someone asked me whether machine learning could predict boat races. It sounded interesting, so I gave it a try.
Environment: Ubuntu 16.04 + Python 2.7.12 + TensorFlow 0.7.1
In a boat race, six boats compete for finishing position. People who buy betting tickets predict the finishing order based on the racers' past results. This time I take on the "nirentan" (exacta) bet, which predicts the 1st- and 2nd-place finishers in the correct order.
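To make the prediction target concrete: with six boats there are 6 × 5 = 30 possible ordered (1st, 2nd) pairs, so a uniform random guess hits about 3.3% of the time. The sketch below enumerates that label space. The class-index mapping is an assumption for illustration, not necessarily how the original program encoded its labels.

```python
from itertools import permutations

BOATS = range(1, 7)  # boat (frame) numbers 1..6

# Every ordered (1st, 2nd) pair of distinct boats is one possible outcome.
EXACTA_OUTCOMES = list(permutations(BOATS, 2))

# Hypothetical class index per outcome, e.g. for a 30-way classifier.
OUTCOME_TO_CLASS = {pair: i for i, pair in enumerate(EXACTA_OUTCOMES)}


def random_hit_rate():
    """Hit rate of a uniform random guess over all ordered pairs."""
    return 1.0 / len(EXACTA_OUTCOMES)
```

This baseline is useful when reading the hit rates reported later.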
Past race results are published as text files at the following site: http://www1.mbrace.or.jp/od2/K/pindex.html I downloaded the results from 2014 to the present (2016/11) in bulk and loaded them into a database using Python + SQLite3.
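The article does not show the database layout, so the following is only a sketch of how race results might be stored with Python's `sqlite3` module. The table names, column names, and the `insert_race` helper are all hypothetical, and parsing of the downloaded text files is omitted because their format is not described here.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: one row per race, one row per boat."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS race (
            race_id   INTEGER PRIMARY KEY,
            race_date TEXT NOT NULL,       -- ISO date, e.g. '2016-05-01'
            venue     INTEGER NOT NULL     -- venue code
        );
        CREATE TABLE IF NOT EXISTS result (
            race_id    INTEGER NOT NULL REFERENCES race(race_id),
            frame      INTEGER NOT NULL,   -- boat number 1..6
            racer_id   INTEGER NOT NULL,
            course     INTEGER,            -- entry course actually taken
            start_time REAL,               -- start timing in seconds
            finish_pos INTEGER,            -- NULL if the boat did not finish
            PRIMARY KEY (race_id, frame)
        );
    """)


def insert_race(conn, race_id, race_date, venue, rows):
    """rows: iterable of (frame, racer_id, course, start_time, finish_pos)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO race VALUES (?, ?, ?)",
                     (race_id, race_date, venue))
        conn.executemany(
            "INSERT INTO result VALUES (?, ?, ?, ?, ?, ?)",
            [(race_id,) + tuple(row) for row in rows],
        )
```

Keeping one row per boat makes it straightforward to query a racer's history later when building features.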
I computed the features that are fed to the network for training. The features are as follows.

- Race venue
- Whether the entry course is fixed
- Each racer's entry-course distribution
- Each racer's finishing-position distribution by frame (boat number)
- Each racer's start-timing distribution
- Each racer's distribution of winning techniques (kimarite)

The racer features are computed from the preceding one and a half months. Races that include a racer with extremely little history are excluded from prediction. Some people apparently also consider the motor assigned to each racer when making their predictions, but I left that out this time.
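As an illustration of one of these features, the sketch below computes a racer's finishing-position distribution for a given frame over the preceding window. It assumes Python 3, takes "one and a half months" as 45 days, and uses a made-up minimum-history cutoff; the input layout and both constants are assumptions rather than the original implementation.

```python
from collections import Counter
from datetime import date, timedelta

WINDOW_DAYS = 45  # "one and a half months", taken as 45 days (assumption)
MIN_RACES = 5     # hypothetical cutoff for "extremely little history"


def finish_distribution(history, race_day, frame):
    """Return [P(1st), ..., P(6th)] for one racer starting from `frame`.

    history: list of (race_date, frame, finish_pos) tuples for that racer.
    Returns None when the racer has too few races in the window, so the
    caller can exclude the whole race from prediction.
    """
    start = race_day - timedelta(days=WINDOW_DAYS)
    recent = [(f, pos) for day, f, pos in history
              if start <= day < race_day and pos is not None]
    if len(recent) < MIN_RACES:
        return None
    positions = [pos for f, pos in recent if f == frame]
    if not positions:
        return [0.0] * 6  # no starts from this frame in the window
    counts = Counter(positions)
    return [counts.get(p, 0) / len(positions) for p in range(1, 7)]
```

The other distributions (entry course, start timing, kimarite) could be built the same way by counting a different column.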
I implemented the network by referring to the following article. [Machine learning (TensorFlow) + Lotto 6] http://qiita.com/yai/items/a128727ffdd334a4bc57
I trained on 97,200 races from January 2014 to March 2016, for 300 steps. The resulting hit rate on the training data was about 20%. Boat races do seem to be hard to predict.
I tested on six months of races, from May 2016 to October 2016. For each race, the simulation buys a 100-yen ticket on the outcome with the highest output value (that is, the predicted result). Because of how the program is written, races in which five or fewer boats finished, because of fouls or racers falling off their boats, are excluded from the test cases. The simulation also ignores the drop in odds that buying a ticket would cause. Please note, therefore, that results such as the hit rate shown below may be slightly better than they would be in reality.
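The betting rule described above can be sketched as follows. This is a minimal illustration, not the original program: the input layout (a dict of model outputs per race, the actual result, and the payout per 100-yen ticket) is an assumption.

```python
STAKE = 100  # yen bet on each race


def simulate(races):
    """Bet STAKE yen on the top-scoring outcome of every race.

    races: iterable of (probs, result, payout) where
      probs  : dict mapping (1st, 2nd) -> model output for that outcome
      result : the actual (1st, 2nd) pair
      payout : yen paid per 100-yen ticket on the winning pair
    Returns (races_bet, hits, balance_yen).
    """
    bets = hits = balance = 0
    for probs, result, payout in races:
        pick = max(probs, key=probs.get)  # outcome with the highest output
        bets += 1
        balance -= STAKE
        if pick == result:
            hits += 1
            balance += payout
    return bets, hits, balance
```

The payout is fixed per race here, which mirrors the simplification that our own purchase does not move the odds.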
First, I bet on every predicted race during the period.
Period | Predicted races | Hits | Hit rate | Balance (yen) |
---|---|---|---|---|
2016/5 | 4178 | 856 | 0.204 | -63,010 |
2016/6 | 3589 | 723 | 0.201 | -54,460 |
2016/7 | 3940 | 752 | 0.190 | -75,450 |
2016/8 | 4336 | 816 | 0.188 | -61,120 |
2016/9 | 3598 | 672 | 0.186 | -64,610 |
2016/10 | 3750 | 688 | 0.183 | -74,940 |
Total | 23391 | 4507 | 0.192 | -393,590 |
It's a disappointing result. The hit rate is low, and the hits come mostly from low-odds races, so the balance is deeply negative.
Next, I bet only on races where the highest output value exceeds a certain threshold (0.45 this time). The idea is to concentrate on the races the model is confident about.
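The filtering step can be sketched like this. The 0.45 threshold comes from the text; the function names and the dict-based input are assumptions for illustration.

```python
THRESHOLD = 0.45  # threshold value used in this experiment


def confident_pick(probs, threshold=THRESHOLD):
    """Return the top outcome if its output exceeds the threshold, else None.

    probs: dict mapping (1st, 2nd) -> model output for that outcome.
    """
    pick = max(probs, key=probs.get)
    return pick if probs[pick] > threshold else None


def select_races(all_probs, threshold=THRESHOLD):
    """Return {race_id: pick} for the races that pass the threshold."""
    picks = {}
    for race_id, probs in all_probs.items():
        pick = confident_pick(probs, threshold)
        if pick is not None:
            picks[race_id] = pick
    return picks
```

Raising the threshold trades the number of races bet for a higher expected hit rate, which is the effect visible in the next table.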
Period | Predicted races | Hits | Hit rate | Balance (yen) |
---|---|---|---|---|
2016/5 | 55 | 28 | 0.509 | +190 |
2016/6 | 53 | 24 | 0.452 | +1,050 |
2016/7 | 63 | 29 | 0.460 | +790 |
2016/8 | 47 | 24 | 0.510 | +530 |
2016/9 | 30 | 13 | 0.433 | -170 |
2016/10 | 30 | 14 | 0.466 | +450 |
Total | 278 | 132 | 0.474 | +2,840 |
The hit rate exceeds 40%, and the balance is slightly positive in 5 of the 6 months. It still looks as though mostly low-odds races are hit, but the high hit rate makes up for that. Considering that the average return rate for boat racing is 75%, this seems like a decent result.
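The comparison with the 75% average can be checked from the totals in the two tables. The sketch below assumes Python 3 and a flat 100-yen bet per race, as in the simulation; the function names are illustrative.

```python
STAKE = 100  # yen bet on each race


def return_rate(races_bet, balance):
    """Fraction of the total stake that came back as payouts."""
    staked = races_bet * STAKE
    return (staked + balance) / staked


def average_payout(races_bet, hits, balance):
    """Average payout in yen per winning 100-yen ticket."""
    return (races_bet * STAKE + balance) / hits


# Totals taken from the two result tables.
all_races = dict(races_bet=23391, hits=4507, balance=-393590)
thresholded = dict(races_bet=278, hits=132, balance=2840)

for name, t in [("all races", all_races), ("threshold 0.45", thresholded)]:
    rate = return_rate(t["races_bet"], t["balance"])
    payout = average_payout(t["races_bet"], t["hits"], t["balance"])
    print("%s: return rate %.1f%%, average payout %.0f yen" % (name, rate * 100, payout))
```

By this arithmetic, betting on every race returns about 83% of the stake and the thresholded strategy about 110%, while the average payout per hit falls from roughly 430 yen to roughly 230 yen, which is consistent with the hits coming mostly from low-odds races.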
Because a boat race is a contest between people, there are many irregular factors, and predicting the finishing order itself by machine learning seems difficult. The fact that I am an amateur at both machine learning and boat racing is surely part of the reason as well. The approach might still be useful for picking out, from a large number of races, the so-called "solid" races in which there is an overwhelming difference in ability between the racers. As mentioned earlier, this simulation was run under conditions that differ from reality, so I cannot say whether it would work in actual races.