[Python & SQLite] I analyzed the expected value of a race with horses with a win of 1x ②

Overview

This is a sequel to the previous post [Python & SQLite] I analyzed the expected value of a race with horses in the 1x win range ①. Last time, I summarized the scraped data into SQL and calculated the expected win value with Python (Jupiter Notebook).

Continuing from the last time, the theme is "How to buy a race in which horses with 1x wins run".

Last reflection

I couldn't find a way to buy a winning betting ticket that exceeds the expected value of 100 because I calculated it without narrowing down the conditions such as the racetrack and distance. This time, let's narrow down the racetrack, distance, turf / dirt, and ticket type for detailed analysis.

Search for conditions that are likely to have high expectations by narrowing down the racetrack and ticket types

This time, I will write the results focusing on "Tokyo Racecourse". Other racecourses are fine, but I chose Tokyo Racecourse because it is a historic racecourse and I think it is easy for strong horses to demonstrate their abilities.

In addition, the ticket type will be analyzed by Maren (winning if two horses selected for either 1st or 2nd place are included). As you can see in the previous post, the most popular wins in the 1x win range have a nearly 50% chance of winning. We hypothesized that Maren has a good balance between ease of winning and payouts. (There is also a reason that there may be noise from horses that are not popular in the triple system.)

`TokyoRacecource`


#Shiba course Search for delicious betting tickets at Maren
#Find out the solidarity rate of the most popular horses in the 1x win range
cur.execute("SELECT i.distance, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0 AND r.order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
GROUP BY i.distance ORDER BY i.distance")
rows = cur.fetchall()

print('Number of solidarity(Turf)')
for row in rows:
    print(row)
    
cur.execute("SELECT i.distance, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0 AND r.order_of_finish NOT IN ('1', '2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%'\
AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
GROUP BY i.distance ORDER BY i.distance")
rows2 = cur.fetchall()

print('Number of 3 or less(Turf)')
for row2 in rows2:
    print(row2)

rentai1800 = round(rows[2][1] / (rows[2][1] + rows2[2][1]) * 100, 2)
print('1800m solidarity rate: %f percent' %rentai1800)
rentai2000 = round(rows[3][1] / (rows[3][1] + rows2[3][1]) * 100, 2)
print('2000m solidarity rate: %f percent' %rentai2000)

----------result----------
Number of solidarity(Turf)
(1400, 66)
(1600, 89)
(1800, 90)
(2000, 84)
(2300, 2)
(2400, 40)
(3400, 1)
Number of 3 or less(Turf)
(1400, 31)
(1600, 59)
(1800, 38)
(2000, 31)
(2300, 2)
(2400, 22)
1800m solidarity rate: 70.310000 percent
2000m solidarity rate: 73.040000 percent

The results of 1800m and 2000m are especially good. The 2400m where the Japan Derby, Oaks, and Japan Cup are held is delicate at about 64%.

By the way, 2500m is not displayed because there was no horse with a win of 1x in the past.

Let's dig deeper into the conditions of turf 2000m, which has the highest solidarity rate.

Tokyo Shiba 2000m Let's dig deeper into the conditions when the most popular is solidarity

Another solidarity horse [winning popularity]

`Another solidarity horse`


#Try to squeeze at 2000m
#1 Another popularity when popularity is solidarity

cur.execute("SELECT r.popularity, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.order_of_finish IN ('1','2') AND r.popularity != '1' \
AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id WHERE odds<2.0 AND popularity='1' \
AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' AND i.distance=2000) \
GROUP BY r.popularity ORDER BY r.popularity")
rows = cur.fetchall()

print('(Popular,Number of times)')
for row in rows:
    print(row)

----------result----------
(Popular,Number of times)
(2, 21)
(3, 31)
(4, 12)
(5, 6)
(6, 7)
(7, 4)
(8, 2)
(11, 1)

Surprisingly, there were more solidarities of 1st and 3rd popularity than solidarity of 1st and 2nd popularity. Is it the result of other horses marking the second most popular horse, which is more likely to win than the most popular horse?

By the way, [when the most popular horse with a win of 1x was not solidarity] and [the most number of solidarity] was also the 3rd most popular. Whether the most popular wins or moss, buying from the second most seems to be less interesting.

Another solidarity horse [Position at the 4th corner]

`4cornerPosition`


cur.execute("SELECT substr(r.pass, -2), count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.order_of_finish IN ('1','2') AND r.popularity != '1' \
AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id WHERE odds<2.0 AND popularity='1' \
AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' AND i.distance=2000) \
GROUP BY substr(r.pass, -2) ORDER BY substr(r.pass, -2)")
rows = cur.fetchall()

print('(Count,Number of times)')
for row in rows:
    print(row)

----------result----------
(Count,Number of times)
('-1', 14)
('-2', 15)
('-3', 14)
('-4', 4)
('-5', 10)
('-6', 6)
('-7', 4)
('-8', 4)
('-9', 2)
('10', 3)
('11', 3)
('12', 2)
('13', 3)

At the last straight entrance, horses that were in 5th place seem to be in solidarity. It is possible that this is because the horse with the leading leg is sticky, and because there was spare capacity in the first place, it is in the front.

A popular horse with a win of 1x is running in the Tokyo Shiba 2000m, and if the 3rd most popular horse is the leading leg, the expected value seems to be high. It reminds me of the 2019 Tenno Sho Autumn ...!

Try to calculate the expected value of Maren (information acquisition from 3 tables)

From here, we will utilize three tables obtained by netkeiba-scraper. Therefore, I did [Nesting subqueries to narrow down the conditions].

Winning No. 1 Popular-Winning No. 2 Popular Maren

First of all, it is the expected value when you purchase the [1st most popular-2nd most popular] Maren, which seems to have a low expected value.

`1-2 Popular Maren Expected Value`


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which one horse was solidarity] and [the race in which another horse was the second most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='2' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races a horse with a win of 1x has entered
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
6740.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
58.61

After all it is low. It doesn't make sense to buy another popular horse, as one horse seems to be tough. In this case, it is better to buy the most popular win.

Winning No. 1 Popular-Winning No. 3 Popular Maren

Next, it is the expected value when you purchase the [1st most popular-3rd most popular] Maren, which you can expect the most expected value.

`1-3 Popular Maren Expected Value`


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which one horse was solidarity] and [the race in which another horse was the third most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='3' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races a horse with a win of 1x has entered
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
18460.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
160.52

The expected value has finally exceeded 100! If you knew this data, you would have bought [Almond Eye and Danon Premium Maren] in the fall of the 2019 Tenno Sho.

Winning No. 1 Popular-Winning No. 4 Popular Maren

Just in case, let's take a look at the expected value when purchasing the [1st most popular-4th most popular] Maren.

`1-4 Popular Maren Expected Value`


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which 1x horses were solidarity] [race in which another horse was the 4th most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='4' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races 1x horses have run
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
10030.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
87.22

Although this does not exceed 100, it was found that the expected value is higher than [1-2 Popular Maren].

Summary

By narrowing down the racetracks, courses, and ticket types, the expected value calculation has become deeper. I'm getting used to handling SQL. If you change the conditions, you can analyze it at other racetracks, so why not try calculating the expected value of the racetrack near your place of residence?

The next goal is to capture WIN5 (a betting ticket that wins all the winning horses in the set 5 races. The maximum dividend is hundreds of millions of yen)! It will be more difficult because it is necessary to analyze the horse field reading and development reading of horse racing, but I will try it!

Thank you for reading this far.