Predict horse racing with machine learning and aim for a recovery rate of 100%.
In this article we scrape all 2019 race results from netkeiba.com. Conveniently, any data held in a table tag can be scraped in a single line with pandas' read_html.
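import pandas as pd

# read_html returns a list of DataFrames, one per table on the page; [0] is the first table
pd.read_html("https://db.netkeiba.com/race/201902010101")[0]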
netkeiba.com assigns a race_id to every race, so we write a function that takes a list of race_ids, scrapes the result table of each race, and returns them all in a dictionary keyed by race_id.
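import pandas as pd
import time
from tqdm.notebook import tqdm

def scrape_race_results(race_id_list, pre_race_results=None):
    # Start from a copy of earlier results (a mutable default argument would be shared between calls)
    race_results = {} if pre_race_results is None else dict(pre_race_results)
    for race_id in tqdm(race_id_list):
        # Skip races that were already scraped
        if race_id in race_results:
            continue
        try:
            url = "https://db.netkeiba.com/race/" + race_id
            race_results[race_id] = pd.read_html(url)[0]
            # Wait one second between requests so as not to overload the server
            time.sleep(1)
        except IndexError:
            # No result table for this race_id, so move on to the next one
            continue
        except Exception as e:
            # Any other error: report it, stop, and return what has been scraped so far
            print(e)
            break
    return race_results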
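The skip-and-resume behaviour of this function can be checked without touching the network. The following is only a minimal sketch under assumptions: `scrape_with` and `fake_fetch` are hypothetical names that do not appear in the original code, and `fake_fetch` stands in for `pd.read_html(url)[0]`, pretending that any race_id ending in "03" has no result table.

```python
import time

def scrape_with(fetch, race_id_list, pre_race_results=None, wait=0.0):
    """Same control flow as scrape_race_results, with the download step passed in as `fetch`."""
    race_results = dict(pre_race_results or {})
    for race_id in race_id_list:
        if race_id in race_results:
            continue  # already scraped in an earlier run
        try:
            race_results[race_id] = fetch(race_id)
            time.sleep(wait)
        except IndexError:
            continue  # treated as "this race does not exist"
        except Exception:
            break  # stop and return the partial results
    return race_results

calls = []  # records which race_ids were actually fetched

def fake_fetch(race_id):
    # Hypothetical stand-in for pd.read_html(url)[0]
    calls.append(race_id)
    if race_id.endswith("03"):
        raise IndexError("no result table")
    return f"table for {race_id}"

first = scrape_with(fake_fetch, ["201901010101", "201901010102"])
second = scrape_with(
    fake_fetch,
    ["201901010101", "201901010102", "201901010103", "201901010104"],
    first,
)
print(sorted(second))
print(calls)
```

On the second call the two race_ids already present in `first` are not fetched again, the id without a table is skipped, and only the remaining id is added. Passing earlier results back in this way is how an interrupted scrape could be resumed.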
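This time I want the results of every race in 2019, so I build a list of all candidate 2019 race_ids. Each race_id is the year followed by two-digit, zero-padded numbers for the place, kai, day, and race number.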
race_id_list = []
for place in range(1, 11):
    for kai in range(1, 6):
        for day in range(1, 9):
            for r in range(1, 13):
                race_id = (
                    "2019"
                    + str(place).zfill(2)
                    + str(kai).zfill(2)
                    + str(day).zfill(2)
                    + str(r).zfill(2)
                )
                race_id_list.append(race_id)
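To make the structure of a race_id easier to see, here is a small sketch that builds the same list with `itertools.product` and splits an id back into its parts. `build_race_ids` and `split_race_id` are hypothetical helper names, and the field names simply reuse the loop variables above (place, kai, day, r); they are not an official netkeiba.com specification.

```python
from itertools import product

def build_race_ids(year, places=range(1, 11), kais=range(1, 6),
                   days=range(1, 9), races=range(1, 13)):
    """Return every candidate race_id for one year, in the same order as the nested loops."""
    return [
        f"{year}{place:02d}{kai:02d}{day:02d}{r:02d}"
        for place, kai, day, r in product(places, kais, days, races)
    ]

def split_race_id(race_id):
    """Split a 12-character race_id into its zero-padded fields."""
    return {
        "year": race_id[:4],
        "place": race_id[4:6],
        "kai": race_id[6:8],
        "day": race_id[8:10],
        "race": race_id[10:12],
    }

ids = build_race_ids(2019)
print(len(ids))          # 10 * 5 * 8 * 12 candidates
print(ids[0], ids[-1])
print(split_race_id("201902010101"))
```

The list holds 4,800 candidates. Not every combination needs to correspond to a race that was actually held, which is why the scraping function skips ids that have no result table.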
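After scraping, combine the results into a single pandas DataFrame and save it as a pickle file. The index of each table is set to its race_id so that every row can still be traced back to its race after concatenation.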
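results = scrape_race_results(race_id_list)
for key in results:
    # Use the race_id as the index of every row in that race's table
    results[key].index = [key] * len(results[key])
# Stack all per-race tables into one DataFrame
results = pd.concat([results[key] for key in results], sort=False)
results.to_pickle('results.pickle')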
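The effect of the indexing step can be seen on toy data. This is a sketch only: the two small tables and their column names (`rank`, `horse`) are made up for illustration and are not the columns netkeiba.com returns.

```python
import pandas as pd

# Hypothetical stand-ins for two scraped result tables
results = {
    "201901010101": pd.DataFrame({"rank": [1, 2], "horse": ["A", "B"]}),
    "201901010102": pd.DataFrame({"rank": [1, 2, 3], "horse": ["C", "D", "E"]}),
}

# Same steps as above: index every row by its race_id, then concatenate
for key in results:
    results[key].index = [key] * len(results[key])
df = pd.concat([results[key] for key in results], sort=False)

# All rows of one race can now be selected by race_id
print(df.loc["201901010102"])
print(df.index.nunique())  # number of distinct races
```

Because the race_id is repeated on every row, `df.loc[race_id]` returns the full result table of that race. A file saved with `to_pickle` can later be loaded back with `pd.read_pickle('results.pickle')`.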
The next article uses BeautifulSoup to scrape more detailed data, such as race dates and weather. All of this is also explained in detail in the video series "Data analysis and machine learning starting with horse racing prediction".