I tried to predict horse racing by doing everything from data collection to deep learning

Overview

I am a student majoring in information systems at a certain T university. When I was looking at various articles on Qiita, I found this article.

- If you have deep learning, you can exceed a 100% recovery rate in horse racing

The 100% recovery rate claimed in that article is based on a small number of simulated ticket purchases, so it is unclear whether it would hold up over other periods. The source code is also paywalled, so I don't know the details of the method. Still, I thought it would be interesting to try predicting horse racing myself, so I gave it a go as a learning exercise.

Doing everything yourself, from data collection through analysis to prediction, makes for a good learning experience.

Why horse racing?

Part of me hoped it might make money, but horse racing has a high takeout rate, so I can't expect much. The main reason is that deep learning has been a hot topic recently and I wanted to try it.

Other reasons for choosing horse racing:

- Race results are not much influenced by the spectators
- With enough explanatory variables, it seems possible to make reasonably accurate predictions

Stocks might also seem like a good theme, but prices move with the decisions of many people, so it is apparently hard to predict them accurately without incorporating information such as the news that traders watch. In addition, many institutional investors place orders automatically according to algorithms, and prices likely depend on those as well.

For these reasons, I figured stocks would not be easy with the technology I have now, and that horse racing was a better fit for deep learning.

The number of runners varies from race to race in horse racing, whereas in boat racing the number of entrants is apparently fixed, so machine learning might be even easier there if detailed data could be obtained.

Explanation for those who are new to horse racing

"Horse racing (horse racing) is a race in which horses with horses compete, and a gambling that predicts the order of arrival" (quote: [Horse Racing-Wikipedia](https: //) ja.wikipedia.org/wiki/horse racing)).

I knew very little about horse racing before analyzing this data, so I will summarize the knowledge I thought was necessary to read this article.

First, as basic knowledge, learn about the types of betting tickets. It's fine if you only read about the win (tansho) and double win (fukusho) bets. Reference: [Types of betting tickets: JRA for first-time users](https://www.jra.go.jp/kouza/beginner/baken/)

For other terms, refer to the following

- Odds: the multiple showing how many times your stake you get back if the bet hits
- Rise (agari): the closing stage of a race or of a training run
- Horse number (umaban): a number uniquely assigned to each runner in a race
- Frame number: numbered 1 to 8; at the start there is roughly one frame number for every two gates
- Order of arrival: the order in which the horses reach the finish
- Central horse racing: racing held by the Japan Racing Association (JRA) at 10 courses: Sapporo, Hakodate, Fukushima, Niigata, Nakayama, Tokyo, Chukyo, Kyoto, Hanshin, and Kokura
- Local horse racing: unlike central racing, racing hosted by local governments

Reference: Horse Racing Glossary JRA

I'm still not very familiar with horse racing, so please let me know if I've gotten something wrong...

Domain knowledge is said to be important in machine learning, so it will be necessary to become familiar with horse racing in order to improve prediction accuracy.

Rough procedure

Even if you predict horse racing, there are a lot of things to think about and do. The procedure can be roughly divided as follows.

  1. Data collection (crawling / scraping)
  2. Data shaping (pandas, SQL, etc.)
  3. Modeling (machine learning)

The first big hurdle for anyone who wants to predict horse racing is collecting and shaping the data. In competitions like Kaggle this part is easy because a dataset is provided from the start, but here we need to begin by collecting the data ourselves.

Modeling is also hard because so many approaches are possible. These days libraries make gradient boosting, deep learning, and so on easy to use, but you still need to try various methods to improve prediction accuracy.

Prerequisite knowledge

- Basic knowledge of HTML, CSS, etc.
- Basic usage of Selenium
- Basic usage of BeautifulSoup
- Basic usage of pandas
- Basic usage of Keras

Summary of results

Data used

- Training data: January 2008 - July 23, 2017
- Test data: July 23, 2017 - November 2019

Results

- Win (tansho) accuracy: 0.2450
- Double win (fukusho) accuracy: 0.5434

I managed to build a model that predicts better than I can as a horse racing beginner.

Let's start by collecting data

You can't do machine learning without data, so let's start with crawling and scraping.

First, get information on past race results and horses from the target site.

The data obtained here should be as close to the raw data as possible, and the data will be formatted later for learning.

Target site

The target is netkeiba.com, the largest horse racing information site in Japan. From past race data to pedigree information, you can get fairly detailed data for free.

It seems that more detailed data can be obtained by becoming a paid member. It is effective when you want to improve the accuracy of the model.

Collected data

This time I decided to focus on collecting the race results of the central (JRA) racecourses, which have a large amount of information in a unified format.

There is a lot of data available, so collecting and using more of it should make a better model. However, gathering pedigree information and data on owners and trainers is quite a hassle, so I skipped it this time. Adding such data would probably improve prediction accuracy.

First, get the URLs of all the races

From the Detailed Race Search Screen on the site, use Selenium to get all the URLs to the race results.

The reason for not using requests and BeautifulSoup, the usual tools for crawling and scraping in Python, is that both the search page and the search results share the same URL, [https://db.netkeiba.com/?pid=race_search_detail](https://db.netkeiba.com/?pid=race_search_detail); the URL does not change with the results.

If the screen is dynamically generated by JavaScript or PHP, you cannot get the desired data by simply downloading the html.

With Selenium, page transitions are driven by actual browser operations, so you can crawl even sites where the display changes when a button is clicked, or sites that require login. (Note that many sites requiring login prohibit crawling in their terms of service.)

First of all, prepare what you need

import time

from selenium import webdriver
from selenium.webdriver.support.ui import Select,WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')    #In headless mode
driver = webdriver.Chrome(chrome_options=options) 
wait = WebDriverWait(driver,10)

Fill in the form input

Fill in the required fields on the form. After submitting, wait until the search results are displayed. (Screenshot: the detailed race search form.)

URL = "https://db.netkeiba.com/?pid=race_search_detail"
driver.get(URL)
time.sleep(1)
wait.until(EC.presence_of_all_elements_located)

#Search by month
year = 2019
month = 1

#Select a period
start_year_element = driver.find_element_by_name('start_year')
start_year_select = Select(start_year_element)
start_year_select.select_by_value(str(year))
start_mon_element = driver.find_element_by_name('start_mon')
start_mon_select = Select(start_mon_element)
start_mon_select.select_by_value(str(month))
end_year_element = driver.find_element_by_name('end_year')
end_year_select = Select(end_year_element)
end_year_select.select_by_value(str(year))
end_mon_element = driver.find_element_by_name('end_mon')
end_mon_select = Select(end_mon_element)
end_mon_select.select_by_value(str(month))

#Check the checkboxes for the 10 central racecourses
for i in range(1,11):
    terms = driver.find_element_by_id("check_Jyo_"+ str(i).zfill(2))
    terms.click()
        
#Select the number of results per page (the maximum, 100, out of 20 / 50 / 100)
list_element = driver.find_element_by_name('list')
list_select = Select(list_element)
list_select.select_by_value("100")

#Submit form
frm = driver.find_element_by_css_selector("#db_search_detail_form > form")
frm.submit()
time.sleep(5)
wait.until(EC.presence_of_all_elements_located)

For simplicity, this example fetches the URLs for January 2019 only. If you want a wider range of data, do one of the following (a minimal loop sketch follows below):

- Do not fill out the year / month form
- Get the URLs for each year and month in a loop
- Change the range of selected years

(The code on GitHub collects the race data from 2008 onward that has not yet been acquired.)
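For reference, here is a minimal sketch of such a year/month loop. It assumes the form-filling and pagination code above has been wrapped in a hypothetical helper called save_urls_for_month; the name and the wrapping are my own assumptions, not the code from the repository.

#Minimal sketch of looping over every year and month
#save_urls_for_month() is a hypothetical wrapper around the form-filling and pagination code above
for year in range(2008, 2020):
    for month in range(1, 13):
        save_urls_for_month(driver, year, month)
        time.sleep(5)  #be polite to the server between searches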

If you leave the racetrack selection blank, races held overseas will also be included, so make sure to check the 10 central racecourses.

I decided not to use data from outside the central racecourses this time, since those races may have few runners or incomplete data.

Save the URLs while paginating

Use Selenium to click the "Next" button and save the URLs, which are shown 100 at a time. (Screenshot: the paginated search results.)

with open(str(year)+"-"+str(month)+".txt", mode='w') as f:
    while True:
        time.sleep(5)
        wait.until(EC.presence_of_all_elements_located)
        all_rows = driver.find_element_by_class_name('race_table_01').find_elements_by_tag_name("tr")
        for row in range(1, len(all_rows)):
            race_href=all_rows[row].find_elements_by_tag_name("td")[4].find_element_by_tag_name("a").get_attribute("href")
            f.write(race_href+"\n")
        try:
            target = driver.find_elements_by_link_text("Next")[0]
            driver.execute_script("arguments[0].click();", target) #Click processing with javascript
        except IndexError:
            break

Open a file and write the obtained URLs one per line. The race URL is in the 5th column of the table, and since Python indices start at 0, we select it with find_elements_by_tag_name("td")[4].

Paging is done in a while loop; try/except catches the exception raised when there is no "Next" button on the last page.

In the try block, the click is done with driver.execute_script("arguments[0].click();", target). With a plain target.click(), an ElementClickInterceptedException occurred in headless mode; apparently the element was treated as overlapped by another and could not be clicked directly. A page I found offered a workaround, but clicking via JavaScript as above worked fine.

Get html based on the obtained URL

The race result pages themselves do not seem to rely much on PHP or JavaScript for rendering, so here we can finally use requests. I fetch and save the HTML for each URL collected above; waiting a few seconds per page, this takes quite a long time.

import os
import requests

save_dir = "html"+"/"+str(year)+"/"+str(month)
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
        
with open(str(year)+"-"+str(month)+".txt", "r") as f:
    urls = f.read().splitlines()
    for url in urls:
        parts = url.split("/")  #avoid shadowing the built-in list
        race_id = parts[-2]
        save_file_path = save_dir+"/"+race_id+'.html'
        response = requests.get(url)
        response.encoding = response.apparent_encoding
        html = response.text
        time.sleep(5)
        with open(save_file_path, 'w') as file:
            file.write(html)

Because of the character encoding, fetching the page naively can produce garbled text. Setting response.encoding = response.apparent_encoding fixed it. Reference: Correct garbled characters when handling Japanese in Requests

Parse html and create csv

The race details and the information on each runner are saved to CSV files with the following layouts.

- Race details
  - Race ID
  - Round number
  - Race title
  - Course information

- Horse details
  - Race ID
  - Finishing position
  - Horse ID
  - Horse number
  - Frame number
  - Sex and age
  - Burden weight
  - Weight and weight change
  - Time
  - Margin
  - Final-stage (agari) time
  - Odds
  - Popularity

There is other information that can be obtained. It seems that paid members can also get what is called a speed index.

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

CSV_DIR = "csv"
if not os.path.isdir(CSV_DIR):
    os.makedirs(CSV_DIR)
save_race_csv = CSV_DIR+"/race-"+str(year)+"-"+str(month)+".csv"
horse_race_csv = CSV_DIR+"/horse-"+str(year)+"-"+str(month)+".csv"

# race_data_columns and horse_data_columns are long, so their definitions are omitted
race_df = pd.DataFrame(columns=race_data_columns )
horse_df = pd.DataFrame(columns=horse_data_columns )

html_dir = "html"+"/"+str(year)+"/"+str(month)
if os.path.isdir(html_dir):
    file_list = os.listdir(html_dir)
    for file_name in file_list:
        with open(html_dir+"/"+file_name, "r") as f:
            html = f.read()
            parts = file_name.split(".")  #avoid shadowing the built-in list
            race_id = parts[-2]
            race_list, horse_list_list = get_rade_and_horse_data_by_html(race_id, html) #Omitted because it will be long
            for horse_list in horse_list_list:
                horse_se = pd.Series( horse_list, index=horse_df.columns)
                horse_df = horse_df.append(horse_se, ignore_index=True)
            race_se = pd.Series(race_list, index=race_df.columns )
            race_df = race_df.append(race_se, ignore_index=True )
            
race_df.to_csv(save_race_csv, header=True, index=False)
horse_df.to_csv(horse_race_csv, header=True, index=False)

For each race, add the details of the race, information on each racehorse, etc. to the list and add one line to the pandas data frame.

The get_rade_and_horse_data_by_html function, race_data_columns, and horse_data_columns are long, so they are not included here. Briefly, get_rade_and_horse_data_by_html uses BeautifulSoup to extract the desired data from the HTML and return it as lists, and race_data_columns / horse_data_columns are the column names of the data to collect.
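As a rough idea of what the parsing looks like, here is a minimal sketch that pulls a few columns per horse out of the result table with BeautifulSoup. The table class name and column positions are assumptions based on the search-result table used earlier; the real (omitted) function extracts many more columns.

from bs4 import BeautifulSoup

def parse_horse_rows_sketch(race_id, html):
    #Minimal sketch: extract a few fields per horse from a race result page
    #(table class and column positions are assumptions; the real function extracts more)
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="race_table_01")
    horse_rows = []
    for tr in table.find_all("tr")[1:]:  #skip the header row
        tds = tr.find_all("td")
        rank = tds[0].get_text(strip=True)          #order of arrival
        horse_number = tds[2].get_text(strip=True)  #horse number
        horse_name = tds[3].get_text(strip=True)    #horse name
        horse_rows.append([race_id, rank, horse_number, horse_name])
    return horse_rows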

Other notes

When crawling, be sure to leave time between requests so that you do not hammer the server.

Others have summarized the legal precautions in detail, so if you actually do this, please refer to articles such as "Web scraping precautions list - Qiita".

Once the data is obtained, we will carry out shaping and analysis.

Now that we have the data in csv format, let's clean it up so that it is easy to handle.

Next, think about what kind of model to make while looking at the state of the data. After that, let's create train data according to the model you want to create.

Format data to be easy to handle

Let's format the data so that it is easy to handle.

For example, dates and numbers stored as strings are converted to datetime objects or ints. Also, since it is easier later if each column holds a single simple value, the combined sex-and-age column is split into two columns (a short sketch follows the code block below). There is a lot of this kind of work.

It might have been better to do it at the same time as scraping, but since the scraping code seemed to be complicated, I decided to do it separately this time.

Below are some of them.

#Extract the start time, combine it with the date, and convert to a datetime object
#(the scraped strings are Japanese: 発走 = race start, 時 = hour, 分 = minute)
race_df["time"] = race_df["time"].str.replace(r'発走 : (\d\d):(\d\d)(.|\n)*', r'\1時\2分')
race_df["date"] = race_df["date"] + race_df["time"]
race_df["date"] = pd.to_datetime(race_df['date'], format='%Y年%m月%d日%H時%M分')
#The original time column is no longer needed, so drop it
race_df.drop(['time'], axis=1, inplace=True)

#Strip the extra "R", spaces, and line breaks from the race round column
race_df['race_round'] = race_df['race_round'].str.strip('R \n')
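As another example, the combined sex-and-age string (e.g. "牡3") can be split into two columns roughly like this; the column name sex_and_age is an assumption:

#Sketch: split a combined sex/age string such as "牡3" into separate columns
#(the column name "sex_and_age" is an assumption)
horse_df["sex"] = horse_df["sex_and_age"].str[0]               #first character: sex
horse_df["age"] = horse_df["sex_and_age"].str[1:].astype(int)  #remaining characters: age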

Data analysis

Analyze the formatted data and roughly check its distributions. Since a model should be trained on data that is as unbiased as possible, this also matters for deciding how to frame the prediction problem.

Data analysis also matters when designing features. With deep learning you apparently do not need to obsess over feature engineering quite as much, but with non-deep methods such as gradient boosting (e.g. LightGBM) you need to think carefully about what the features should be.

In Kaggle too, finding a good feature seems to greatly improve your chances of ranking highly.
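For example, a few quick pandas checks of the distributions might look like this (the column names are assumptions):

#Rough checks of the data distribution (column names are assumptions)
print(horse_df["rank"].value_counts())        #how often each finishing position appears
print(horse_df["odds"].describe())            #spread of the win odds
print(race_df["course_type"].value_counts())  #balance of course types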

Creating train data

After deciding what kind of model to make while referring to the data analysis mentioned earlier, let's create train data.

The input data is roughly as follows (a sketch of assembling it appears below).

- Information on the race you want to predict
  - Horse number
  - Frame number
  - Age
  - Burden weight
  - Weight
  - Weight change from the previous race
  - Burden weight / weight

The odds of the race being predicted keep fluctuating until just before post time, so they are not included in the input.
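A minimal sketch of assembling that input matrix; the column names are assumptions and the real code does more:

#Sketch: build the input features listed above (column names are assumptions)
feature_columns = ["horse_number", "frame_number", "age",
                   "burden_weight", "weight", "weight_change"]
X = horse_df[feature_columns].copy()
X["burden_per_weight"] = X["burden_weight"] / X["weight"]  #burden weight / weight
X = X.fillna(0)  #missing values filled with 0 (see "Areas for improvement" below)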

Finally model creation (deep learning)

First, an overview: this time I do deep learning with Keras. Using the data for a single horse as input, I built two models:

- a model that predicts the probability that the horse finishes first
- a model that predicts the probability that the horse finishes in the top three

How I decided on the model

First I had to decide whether to treat this as a classification problem or a regression problem.

As a regression problem, you would predict something like the finishing position as a continuous value (allowing values like 1.2) or the race time.

As a classification problem, you would predict the finishing position as a class (a natural number from 1 to 16), whether the horse finishes first, whether it finishes near the top, and so on.

Times and speeds vary greatly between racetracks and courses, so predicting them would be hard without handling each course separately. This time I simply predict "whether the horse finishes near the top" as a classification problem.
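Concretely, the two labels can be made from the order of arrival, roughly like this (the column name rank is an assumption and is assumed to already hold an integer):

#Sketch: binary labels for the two classification models
#("rank" is an assumed column name holding the finishing position as an integer)
horse_df["is_first"] = (horse_df["rank"] == 1).astype(int)  #finished first
horse_df["is_top3"] = (horse_df["rank"] <= 3).astype(int)   #finished in the top three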

What I did in model creation and how I dealt with overfitting

I will write about various things I tried when creating the model.

When building a model, it is essential both to take measures against overfitting and to verify whether overfitting is actually happening. Even if a model scores well on your own data, it may not predict other data accurately.

Split the dataset for training and testing

First of all, from the basics. There is no point in creating a model unless you can evaluate whether it is good or not.

80% of the collected and formatted data was used as training data and the remaining 20% as test data. Concretely:

- Training data: January 2008 - July 23, 2017
- Test data: July 23, 2017 - November 2019

The accuracy figures at the beginning of this article are measured on this test data.

During training, the training data was further split into a train part and a validation part.
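Because this is time series data, the train/test split is done by date rather than at random. A minimal sketch using the cutoff date above; df stands for the formatted dataset and the date column name is an assumption:

#Sketch: time-based train/test split (the "date" column is assumed to be datetime)
cutoff = pd.Timestamp("2017-07-23")
train_df = df[df["date"] < cutoff]   #roughly the oldest 80% of races
test_df = df[df["date"] >= cutoff]   #the newest 20%, used for the accuracy above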

Weight regularization and dropout

Weight regularization and dropout are ways to suppress overfitting, and Keras makes both easy to use.

Weight regularization adds a cost that depends on the weights to the network's loss function, while dropout randomly drops some of a layer's outputs during training.

We used L2 regularization for weight regularization.

Reference: Learn about overfitting and underfitting | TensorFlow Core


import tensorflow as tf

#df_columns_len is the number of input feature columns
model = tf.keras.Sequential([
        tf.keras.layers.Dense(300, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation=tf.nn.relu, input_dim=df_columns_len), #l2 Regularized layer
        tf.keras.layers.Dropout(0.2), #Drop out
        tf.keras.layers.Dense(100, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation=tf.nn.relu), #l2 Regularized layer
        tf.keras.layers.Dropout(0.2), #Drop out
        tf.keras.layers.Dense(1, activation=tf.nn.sigmoid) 
    ])

Cross-validation

A simple holdout validation on a single specific period may just happen to give good results because the model overfits that period.

So, as is commonly done in competitions like Kaggle, let's use cross-validation to check how good the model is with the data at hand.

The problem is that this is time series data, so you cannot simply split it with KFold. With time series data, if future information ends up in train and past information in validation, the results can look better than they should. In fact, I made this mistake at first and trained with future data leaking in, and the double win prediction accuracy exceeded 70%.

So this time I used the splitting method for time series cross-validation, sklearn's [TimeSeriesSplit](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html).

Roughly speaking, as in the figure below, the dataset is split into progressively longer time spans, with the portion just after each span used as validation data.

(Figure: TimeSeriesSplit with three expanding training windows, each followed by a validation window.)

With this setup, training happens three times. However, each split uses less training data, so with a small dataset a simple holdout may be better.

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=3)
for train_index, val_index in tscv.split(X_train,Y_train):
    train_data=X_train[train_index]
    train_label=Y_train[train_index]
    val_data=X_train[val_index]
    val_label=Y_train[val_index]
    model = train_model(train_data,train_label,val_data,val_label,target_name)

Hyperparameter tuning

Hyperparameters matter in machine learning. In deep learning, for example, the larger the hidden layers, the more intermediate variables there are, and the easier it is to overfit when training data is scarce. Conversely, if the layers are too small, the model may not be flexible enough to learn properly even with plenty of data.

There are many opinions about how to tune them, and the approach seems to vary from person to person.

This time I used a library called hyperas, which automates hyperparameter tuning for Keras. It was relatively intuitive and easy to understand.

In its simplest form, you pass optim.minimize a data preparation function and a function that trains a model and returns the value you want to minimize.

Specify the search space with choice for discrete values and uniform for real-valued ranges.

For details, refer to here: https://github.com/maxpumperla/hyperas

import keras
from keras.callbacks import EarlyStopping
from keras.callbacks import CSVLogger
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform
def prepare_data_is_hukusyo():
    """
I will prepare various data here
    """
    return X_train, Y_train, X_test, Y_test

def create_model_is_hukusyo(X_train, Y_train, X_test, Y_test):
    train_size = int(len(Y_train) * 0.8)
    train_data = X_train[0:train_size]
    train_label = Y_train[0:train_size]
    val_data = X_train[train_size:len(Y_train)]
    val_label = Y_train[train_size:len(Y_train)]

    callbacks = []
    callbacks.append(EarlyStopping(monitor='val_loss', patience=2))

    model = Sequential()
    model.add(Dense({{choice([512,1024])}}, kernel_regularizer=keras.regularizers.l2(0.001), activation="relu", input_dim=train_data.shape[1]))
    model.add(Dropout({{uniform(0, 0.3)}}))
    model.add(Dense({{choice([128, 256, 512])}}, kernel_regularizer=keras.regularizers.l2(0.001), activation="relu"))
    model.add(Dropout({{uniform(0, 0.5)}}))

    #Optionally add a third hidden layer; writing the choice once avoids creating
    #two independent hyperparameters for the same decision
    if {{choice(['three', 'four'])}} == 'four':
        model.add(Dense(8, kernel_regularizer=keras.regularizers.l2(0.001), activation="relu"))
        model.add(Dropout({{uniform(0, 0.5)}}))

    model.add(Dense(1, activation="sigmoid"))

    model.compile(
        loss='binary_crossentropy',
        optimizer=keras.optimizers.Adam(),
        metrics=['accuracy'])

    history = model.fit(train_data,
        train_label,
        validation_data=(val_data, val_label),
        epochs=30,
        batch_size=256,
        callbacks=callbacks)

    val_loss, val_acc = model.evaluate(X_test, Y_test, verbose=0)
    print('Best validation loss of epoch:', val_loss)
    return {'loss': val_loss, 'status': STATUS_OK, 'model': model}

#Actually adjust with hyperas
best_run, best_model = optim.minimize(model=create_model_is_hukusyo,
                                     data=prepare_data_is_hukusyo,
                                     algo=tpe.suggest,
                                     max_evals=15,
                                     trials=Trials())

Blending the results

You may be able to make more accurate predictions by mixing the outputs of different models.

By averaging the predictions of the first-place model and the top-three model, I got slightly better accuracy than either prediction on its own.

The characteristics of horses likely to finish first and of horses likely to finish near the top are probably slightly different, so mixing the two should allow a more accurate prediction.

For example, a horse that might win but is eased off when things go wrong mid-race, and a horse that consistently finishes near the top, probably have somewhat different characteristics.
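The blending itself is just a simple average of the two predicted probabilities; a minimal sketch with hypothetical model names:

#Sketch: blend the two models' predicted probabilities by simple averaging
#(model_first and model_top3 are hypothetical names for the two trained models)
pred_first = model_first.predict(X_test).flatten()  #probability of finishing first
pred_top3 = model_top3.predict(X_test).flatten()    #probability of finishing in the top three
pred_blend = (pred_first + pred_top3) / 2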

Results

In the end, I made a model with higher accuracy than myself, a horse racing beginner.

- Win (tansho) accuracy: 0.2450
- Double win (fukusho) accuracy: 0.5434

There is still more information that seems to be important in horse racing, so there seems to be room for improvement.

The balance over time when you keep buying a win ticket on the predicted first-place horse is shown below; I plotted it quickly using pandas.

(Plot: cumulative balance for win bets.)

For double win bets it looked like this.

(Plot: cumulative balance for double win bets.)

It's a big loss. It gets a little better if you only bet when the predicted probability is high, or skip horses with very low odds.
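The balance curve can be computed with pandas roughly like this; the DataFrame, the column names, and the flat 100 yen stake are assumptions:

#Sketch: cumulative balance when betting 100 yen per race on the predicted winner
#(bets_df is assumed to hold one row per race with "date" and "payout" columns;
# payout is the return in yen, 0 when the bet misses)
bets = bets_df.sort_values("date").copy()
bets["profit"] = bets["payout"] - 100   #100 yen stake per race
bets["balance"] = bets["profit"].cumsum()
bets.plot(x="date", y="balance")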

Other tips

Here are a few things I tried while working on this prediction that are not directly related to the main topic.

Use GCP

GCP's free credits were about to expire around the end of November, so my second goal was to consume them.

You can kick off a job before going to bed and check the results when you wake up in the morning.

Free instances don't have enough memory for CSV creation and deep learning, so be careful if you use GCP.

Notify with LINE Notify

On GCP, I had the programs send a LINE Notify message when they finished or when an error occurred.

That way I could check the results and start the next job right away, which saved a lot of effort.
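Sending the notification is a single POST to the LINE Notify API; a minimal sketch (the access token is your own personal token, and the function name is my own):

import requests

def notify_line(message, token):
    #Send a message via LINE Notify using a personal access token
    requests.post(
        "https://notify-api.line.me/api/notify",
        headers={"Authorization": "Bearer " + token},
        data={"message": message},
    )

notify_line("Training finished", token="YOUR_ACCESS_TOKEN")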

A few words at the end

This is just a casual student project, so people familiar with the field will probably find plenty to criticize. Mistakes are part of the learning, so I would be grateful if you could kindly point them out in the comments or on Twitter.

Twitter ID (I don't tweet much): @634kami

Source code

The code is published on GitHub. I focused on getting something that works for now, so it is not really fit for public viewing, but feel free to look if you don't mind that.

The code on Qiita has been partially modified to make it easier to read.

Areas for improvement / things I want to do

- Missing input values are filled with 0; restrict prediction to complete data instead
- Add pedigree data to the input
- Add jockey data to the input
- Try gradient boosting such as LightGBM
- Scraping failed for races with tied finishes, so fix those and make the data complete

Other links I referred to or that may be useful

- If you have deep learning, you can exceed 100% recovery rate in horse racing
- Horse Racing Prediction with Deep Learning
- Story of winning the Teio Sho by machine learning at Oi Horse Racing
- I tried to predict horse racing
- 7th: Method and Evaluation for Solving Horse Racing Prediction by Machine Learning
- Various ways to cut validation (summary of sklearn functions) [kaggle Advent Calendar Day 4]

Postscript

I did the following and added the results below.

- Removed races with 7 or fewer runners from the train data
- Removed obstacle (jump) races from the train data
- Calculated the accuracy for each field size (number of runners); a rough sketch of this calculation follows
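A rough sketch of the per-field-size calculation: pick the model's top horse in each race and group the hits by the number of runners (all column names here are assumptions):

#Sketch: accuracy by field size, betting on the highest-probability horse in each race
#("race_id", "n_horses", "pred_first", "is_first", "is_top3" are assumed column names)
picks = test_df.loc[test_df.groupby("race_id")["pred_first"].idxmax()]
summary = picks.groupby("n_horses").agg(
    tansyo_accuracy=("is_first", "mean"),
    hukusyo_accuracy=("is_top3", "mean"),
)
print(summary)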

Below are the results.

total: 8, random tansyo accuracy:0.125, hukusyo accuracy:0.375
tansyo accuracy: 0.3497536945812808
hukusyo accuracy: 0.7044334975369458

total: 9, random tansyo accuracy:0.1111111111111111, hukusyo accuracy:0.3333333333333333
tansyo accuracy: 0.2693726937269373
hukusyo accuracy: 0.6568265682656826

total: 10, random tansyo accuracy:0.1, hukusyo accuracy:0.3
tansyo accuracy: 0.30563002680965146
hukusyo accuracy: 0.6407506702412868

total: 11, random tansyo accuracy:0.09090909090909091, hukusyo accuracy:0.2727272727272727
tansyo accuracy: 0.2582278481012658
hukusyo accuracy: 0.5468354430379747

total: 12, random tansyo accuracy:0.08333333333333333, hukusyo accuracy:0.25
tansyo accuracy: 0.2600806451612903
hukusyo accuracy: 0.5826612903225806

total: 13, random tansyo accuracy:0.07692307692307693, hukusyo accuracy:0.23076923076923078
tansyo accuracy: 0.2894736842105263
hukusyo accuracy: 0.5855263157894737

total: 14, random tansyo accuracy:0.07142857142857142, hukusyo accuracy:0.21428571428571427
tansyo accuracy: 0.23014586709886548
hukusyo accuracy: 0.5380875202593193

total: 15, random tansyo accuracy:0.06666666666666667, hukusyo accuracy:0.2
tansyo accuracy: 0.2525399129172714
hukusyo accuracy: 0.532656023222061

In each case, the accuracy rate was better than the completely random selection method.
