Photo by Ronnie Macdonald
People have been saying for a while now that **AI will rob humans of their work, but lately it's also said that AI has entered its trough of disillusionment**. Thanks to that, my work hasn't been taken, and I'm still swaying on a packed commuter train every day. If it's going to rob me, I wish it would hurry up and rob me.

While that kind of AI still seems some way off, the environment for learning it has become much easier to get. Once you actually touch it, it feels like anyone could reach disillusionment firsthand. So, to see for myself what **AI** can do, I decided to try deep learning.

I tried a number of things, but the main thing I want to convey here is that **even starting from total ignorance, you can have this much fun with deep learning**. The program is no model of good code, so I'm publishing it for a fee, only for those who actually want to see it.
First, I studied a little of the mathematics underlying deep learning. Honestly, it was rough going: formula after formula after formula. You can write the code without it, but **knowing the mathematical meaning deepens your understanding, so I think learning at least the outline is worthwhile**.

After that light study, to confirm the power of deep learning, I tried the famous machine learning theme of **"Titanic survivor prediction"**: predicting which passengers survived from attributes such as age and sex.

The environment is Google Colaboratory, with TensorFlow, which seemed the easiest to get started with. Just by following Google's tutorial, it was easy to build. Upload your predictions to Kaggle and a score comes back.
**Accuracy: 76.5%**. Nearly 80% correct for a beginner's model. Tuning the parameters and the data would push it higher still.

I had originally imagined puzzling through things by trial and error, like "were women and children given priority in the rescue?", but **deep learning did the whole thing for me**. Deep learning really is something.
I wanted to keep exploring, but while making this prediction I ran into a big problem.

It's boring... Not deep learning itself: **the theme is boring. Predicting the life and death of passengers on a ship that sank overseas more than a hundred years ago is not exciting at all!**

There's no **"James... I thought you were dead... you're alive!"** or **"Reina... why did you have to die!!"** moment, is there? I don't know any of these people. They're all long gone. And Kaggle doesn't reveal the correct answers in the first place.
I wanted a more exciting theme... so I settled on a money-related theme I'd always wanted to try: horse racing.

Horse racing reportedly has a takeout of about 20%, so the average return starts at 80%, and pushing the recovery rate past 100% should be fairly hard. But there's plenty of past data, and deep learning will do the work for me, right? I gave it a try with that expectation.

The goal: **"a recovery rate over 100% on place bets"** (fukushō, tickets that pay out if the horse finishes in the top 3).

Given how horse racing works, it might be more efficient to aim for bet types with bigger payouts than place bets, but it wouldn't be fun without a ticket that's easy to hit, so I narrowed it down to place bets.
Training data: 2010-2017. Validation data: 2018-2019 (through early November).

The aim is to exceed 100% on the validation data. First, I scraped the web and prepared the following data.
| Classification | Item |
|---|---|
| Horse information | Horse number |
| | Post (frame) number |
| | Age |
| | Sex |
| | Body weight (current) |
| | Body weight (change from previous race) |
| | Carried weight |
| Race-day information | Racetrack |
| | Number of runners |
| | Course distance |
| | Course type |
| | Course surface (dirt/turf/jump) |
| | Weather |
| | Track condition (going) |
| Horse's past races (× 5 runs) | Odds |
| | Popularity (favorite rank) |
| | Finishing position |
| | Time (seconds) |
| | Margin |
| | Days since previous race |
| | Course distance |
| | Course type |
| | Course surface (dirt/turf/jump) |
| | Weather |
| | Track condition (going) |
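As a sketch of how scraped rows like these might become model input, the categorical fields can be one-hot encoded and the numeric ones min-max scaled. The column names and values below are my own placeholders, not the actual scraped schema:

```python
import pandas as pd

# Hypothetical mini-sample of scraped rows; the real data had one row per runner.
raw = pd.DataFrame({
    "horse_no":    [1, 2, 3],
    "age":         [4, 5, 3],
    "sex":         ["M", "F", "M"],            # categorical
    "weight":      [480, 466, 502],
    "weight_diff": [2, -4, 0],
    "course_type": ["turf", "dirt", "turf"],   # categorical
    "distance":    [1600, 1200, 2000],
})

# One-hot encode the categorical columns.
features = pd.get_dummies(raw, columns=["sex", "course_type"])

# Scale numeric columns to [0, 1] so no single feature dominates training.
numeric = ["age", "weight", "weight_diff", "distance"]
features[numeric] = (features[numeric] - features[numeric].min()) / (
    features[numeric].max() - features[numeric].min())

print(list(features.columns))
```

In the real data, the past-five-runs block would repeat these kinds of columns five times per row.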
Using this data as input, deep learning predicts **whether the horse finishes within the top 3**. **The predicted value runs from 0 to 100 (call it the "top-3 index")**: the larger the value, the likelier a top-3 finish.

Incidentally, this is all the code there is for the model-building part, the core of the deep learning:
```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Two hidden layers with L2 regularization and dropout to curb overfitting
    tf.keras.layers.Dense(300,
                          kernel_regularizer=tf.keras.regularizers.l2(0.001),
                          activation=tf.nn.relu,
                          input_dim=len(train_df.columns)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(300,
                          kernel_regularizer=tf.keras.regularizers.l2(0.001),
                          activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    # Sigmoid output: probability of finishing in the top 3
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

fit = model.fit(train_df, train_labels,
                validation_data=(valid_df, valid_labels),
                epochs=30, batch_size=32)
```
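The sigmoid output is a probability between 0 and 1, so turning it into a 0-to-100 index is just a scaling step. A minimal sketch of that post-processing (my guess at the idea, not the author's exact code):

```python
import numpy as np

# Hypothetical raw outputs from model.predict(...): one sigmoid value per runner
probs = np.array([[0.12], [0.58], [0.91]])

# Scale to 0-100; higher means a likelier top-3 finish
top3_index = (probs.ravel() * 100).round(1)
print(top3_index.tolist())  # [12.0, 58.0, 91.0]
```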
With the model built, try buying the place ticket for the horse **with the highest top-3 index in each race**. Here is the result of simulating every race from 2018 onward.
| Item | Result |
|---|---|
| Target races (*) | 3,639 |
| Target records | 41,871 |
| Tickets bought | 3,639 |
| Hits | 1,976 |
| Hit rate | 54.3% |
| Recovery rate | 82.7% |
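A simulation like this can be sketched with pandas: pick the highest-index horse in each race, then tally hits and payouts. Everything below (column names, toy payouts) is illustrative, not the actual data:

```python
import pandas as pd

# Hypothetical prediction output: one row per runner.
# payout = place-bet return per 100 yen if the horse finished top-3, else 0.
preds = pd.DataFrame({
    "race_id":    [1, 1, 1, 2, 2, 2],
    "top3_index": [72, 55, 30, 64, 61, 20],
    "payout":     [130, 0, 0, 0, 210, 0],
})

# One bet per race: the runner with the highest top-3 index.
bets = preds.loc[preds.groupby("race_id")["top3_index"].idxmax()]

hit_rate = (bets["payout"] > 0).mean()
recovery_rate = bets["payout"].sum() / (100 * len(bets))  # 100 yen per bet
print(hit_rate, recovery_rate)
```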
It hits reasonably often, but **the recovery rate doesn't budge.** I got curious about the relationship between the top-3 index, the hit rate (how often the horse actually finished in the top 3), and the recovery rate.

The relationship between the top-3 index and the hit rate looked like this.

**The higher the top-3 index, the higher the hit rate**, so the model does work as a predictor of top-3 finishes. Then, **if I buy only horses with a high index, will the recovery rate exceed 100%?**
Let's add the average recovery rate to the graph above.
The recovery rate stays around 80-90% regardless of the hit rate. In other words, **the higher the top-3 index (the likelier the horse is to place), the smaller the return.**

That makes the index-versus-hit-rate relationship feel like the **relationship between odds and hit rate** (the lower the odds, the smaller the return). So, let's look at the relationship between the top-3 index and the average odds...

An inversely proportional curve: **horses with a high top-3 index also have low odds.** Even predicting without looking at the odds at all, it comes out like this. Fascinating, and it seems to be exactly what makes horse racing hard.
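The binned curves behind graphs like these can be reproduced along the following lines (toy numbers and hypothetical column names, just to show the aggregation):

```python
import pandas as pd

preds = pd.DataFrame({
    "top3_index": [15, 25, 35, 45, 55, 65, 75, 85],
    "hit":        [0,  0,  0,  1,  0,  1,  1,  1],   # finished top-3?
    "odds":       [40.0, 22.0, 9.5, 6.0, 4.2, 2.8, 1.9, 1.4],
})

# Bucket the index into 10-point bins, then average hit rate and odds per bin.
bins = (preds["top3_index"] // 10) * 10
summary = preds.groupby(bins).agg(hit_rate=("hit", "mean"),
                                  mean_odds=("odds", "mean"))
print(summary)
```

On the real data, the hit-rate column rises with the index while the mean-odds column falls, giving the inverse-proportional shape described above.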
So the top-3 index and the odds are almost inversely proportional, but that holds only for the average odds. Looking at individual horses, there should be some **"horses with high odds despite a high top-3 index"**. To raise the recovery rate, exploiting such **distortions in the odds** looks like the way to go.

However, if you simply buy every ticket with a top-3 index of 70 or more and odds of 100 or more, the result is disastrous.
| Item | Result |
|---|---|
| Tickets bought | 73 |
| Hits | 1 |
| Hit rate | 1.37% |
| Recovery rate | 16.9% |
Horses with extremely high odds rarely hit even when the index is high; there is probably a good reason the odds are that long. On the other hand, if the odds are too low, the payout is naturally small and the recovery rate won't rise. If so, **the sweet spot is in the middle.**
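That "aim for the middle" idea is easy to express as a filter. The thresholds below are purely illustrative, not the band the article actually settles on:

```python
import pandas as pd

# Hypothetical predictions with place odds and payout per 100 yen (0 = miss).
preds = pd.DataFrame({
    "top3_index": [72, 65, 61, 58, 80],
    "odds":       [3.5, 12.0, 150.0, 8.0, 1.2],
    "payout":     [0, 1400, 0, 0, 130],
})

# Keep only the "middle" band: high index, odds neither tiny nor extreme.
band = preds[(preds["top3_index"] >= 60) & preds["odds"].between(2.0, 50.0)]

recovery = band["payout"].sum() / (100 * len(band))
print(len(band), recovery)
```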
(Since first publishing this, readers pointed out an issue with the prediction target, so the results below have been revised to cover the expanded target.)

Narrowing the 2018 predictions to part of the range where the top-3 index is 60 or more and the odds are not too high (a band around 55 to 60), **the recovery rate comes to 213%**. That feels pretty good.
| Item | Result |
|---|---|
| Tickets bought | 44 |
| Hits | 10 |
| Hit rate | 22.7% |
| Recovery rate | 213.6% |
Of course, a range hand-picked for 2018 will naturally look good in 2018. But simulating 2019 with the same prediction rule also gave a **recovery rate of 194%**. Here is the combined total for the two years (about 22 months).
| Item | Result |
|---|---|
| Tickets bought | 99 |
| Hits | 19 |
| Hit rate | 19.2% |
| Recovery rate | 202.63% |
With that, **the target recovery rate of 100% was cleared.** Incidentally, here is the **profit-and-loss graph from 2018 onward** when you bet 100 yen each time according to this rule: it climbs steadily with no major drawdown.

The chance that these two years just happened to go well isn't zero, but with results this consistent, it seems believable to a degree. The number of bets is small (about four a month), but **not picking fights you can't win** may be a precondition for winning.

I could stop here, but while I'm at it, let me try a different pattern from the top-3 prediction: forecasting the **"expected dividend value"**.
This time the odds are included in the input data fed to deep learning, and the predictions are graphed the same way as before. **Horses likely to place even at high odds** come out with higher expected values. (The horizontal axis of the graph extends further right but is cut off.)

Here is the relationship between the expected value and the recovery rate / hit rate.

Since the recovery rate keeps rising, **it looks as though buying ever-higher-expected-value tickets would keep raising the recovery rate**, but the sample there is **small and unstable over a span of only one or two years**.

Buying every horse in the well-performing band on the graph, **expected value 390-450**, does clear 100% for now, but it looks like a local bump, so I don't expect it to stay stable.
| Item | Result |
|---|---|
| Tickets bought | 275 |
| Hits | 37 |
| Hit rate | 13.5% |
| Recovery rate | 131.1% |
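One plausible reading of "expected dividend value" (my interpretation: predicted top-3 probability times the payout per 100 yen, not necessarily the article's exact definition) makes the band-buying rule look like this:

```python
# Hypothetical runners: predicted top-3 probability and place odds.
runners = [
    {"prob": 0.60, "odds": 7.0},   # strong pick at decent odds
    {"prob": 0.90, "odds": 1.1},   # near-certain but tiny return
    {"prob": 0.05, "odds": 88.0},  # longshot
]

# Expected yen returned per 100-yen bet.
for r in runners:
    r["expected"] = r["prob"] * r["odds"] * 100

# Buy only tickets whose expected value falls in the chosen band.
buys = [r for r in runners if 390 <= r["expected"] <= 450]
print([round(r["expected"]) for r in runners], len(buys))
```

Note how the rule favors both the well-priced favorite and the longshot with a genuine chance, while skipping the near-certain horse whose payout is too small.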
The profit-and-loss swings are also larger than with the top-3 index. Since this approach targets higher odds, the return on a hit is big.

That's everything I tried.

The Python program that builds and validates the top-3 index prediction model is published experimentally, for a fee, on the page below. It's not clean enough to serve as a textbook, so please look only if you're curious and have money and patience to spare.
With deep learning, the recovery rate can exceed 100% in horse racing (program)
[Click here for later talks] (https://note.mu/yossymura/n/na3d0a471193c)
Recently, mobile payments have finally spread, driven by the barrage of QR payment campaigns. Watching cashback and coupons get thrown at anything to make it catch on, I felt once again that what moves people is, in the end, **"money"**, and since I'd been thinking of writing a money-related article next, I'm glad it worked out this way.

The input for this prediction includes neither the **horse's name** nor the **jockey's name**. In other words, factoring in **pedigree, the jockey's record, and horse-jockey compatibility** should allow more accurate predictions. Beyond that, **imputing missing data**, **batch normalization**, and plain **parameter tuning** on the deep learning side should all help, and **bet types other than place bets** are also promising. In short, there is still plenty of room to grow.

On the other hand, convenient as it was, the one thing that nagged at me is that predicting horse racing with AI spoils the **"fun of handicapping with your own head"**. The joy of winning with a horse you chose yourself, out of intuition, attachment, and every other feeling that wells up in you, is something I don't think AI can provide.

When everyone uses AI for every prediction, will people still gather at the racetrack, lose themselves, and keep cheering? How much longer will we see those clouds of betting tickets dancing in the air? I want to use AI well, so that the approaching wave robs me of neither my work nor my heart.