How to scrape horse racing data using pandas read_html

Purpose

Predict horse racing results with machine learning, aiming for a recovery rate (payout relative to the amount wagered) of 100%.

What to do this time

This time we scrape all 2019 race results from netkeiba.com. Data held in an HTML table tag can be scraped in a single line with pandas read_html, which is convenient.

pd.read_html("https://db.netkeiba.com/race/201902010101")[0]
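To see what read_html returns without touching the network, here is a minimal sketch that parses a small inline HTML string instead of a netkeiba.com page. The table contents and column names are made up for illustration. It assumes an HTML parser that pandas can use (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for a race result page.
html = """
<html><body>
<table>
  <tr><th>rank</th><th>horse</th></tr>
  <tr><td>1</td><td>Alpha</td></tr>
  <tr><td>2</td><td>Bravo</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds,
# which is why the one-liner above takes element [0].
tables = pd.read_html(StringIO(html))
first = tables[0]

print(len(tables))             # 1
print(first.columns.tolist())  # ['rank', 'horse']
print(first.shape)             # (2, 2)
```

Because the return value is always a list, indexing with [0] picks the first table on the page.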

Source code

netkeiba.com assigns a race_id to each race, so we create a function that takes a list of race_ids, scrapes the result table of each race, and returns them together as a dictionary keyed by race_id.

import pandas as pd
import time
from tqdm.notebook import tqdm

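def scrape_race_results(race_id_list, pre_race_results=None):
    # Start from earlier results if given, so an interrupted run can be resumed.
    # A None default avoids sharing one dict between separate calls.
    race_results = {} if pre_race_results is None else pre_race_results
    for race_id in tqdm(race_id_list):
        # Skip races that have already been scraped.
        if race_id in race_results:
            continue
        try:
            url = "https://db.netkeiba.com/race/" + race_id
            # The first table on the page is used as the race result.
            race_results[race_id] = pd.read_html(url)[0]
            # Wait one second between requests to go easy on the server.
            time.sleep(1)
        except IndexError:
            # Skip race_ids that do not yield a result table.
            continue
        except Exception as e:
            # On any other error, stop and return what has been collected.
            print(race_id, e)
            break
    return race_results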
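The control flow of this function (skip what is already scraped, continue on IndexError, stop on anything else) can be tried offline. The sketch below is not the scraper itself: scrape_with and fake_fetch are hypothetical names, and the fetch step is passed in as a parameter so that a fake can replace pd.read_html.

```python
def scrape_with(fetch, race_id_list, pre_race_results=None):
    """Same loop structure as scrape_race_results, with the fetch step injected."""
    # Copy the earlier results so the caller's dict is left untouched.
    race_results = {} if pre_race_results is None else dict(pre_race_results)
    for race_id in race_id_list:
        if race_id in race_results:
            continue  # already scraped in an earlier run
        try:
            race_results[race_id] = fetch(race_id)
        except IndexError:
            continue  # nothing usable for this race_id, try the next one
        except Exception as exc:
            print("stopping at", race_id, "because of", repr(exc))
            break  # keep what we have and stop
    return race_results


calls = []

def fake_fetch(race_id):
    """Hypothetical stand-in for pd.read_html(url)[0]."""
    calls.append(race_id)
    if race_id == "201901010103":
        raise IndexError("no table")
    if race_id == "201901010105":
        raise ConnectionError("network down")
    return "table for " + race_id


ids = ["2019010101%02d" % n for n in range(1, 7)]
first_run = scrape_with(fake_fetch, ids, {"201901010101": "cached"})
print(sorted(first_run))
```

Race 01 is taken from the earlier results and never fetched, race 03 is skipped, and the simulated network error at race 05 ends the loop before race 06 is tried. Passing first_run back in as pre_race_results would resume from where the run stopped.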

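This time I want the results of every race in 2019, so I build a list of all candidate 2019 race_ids. Each race_id is the year followed by two-digit, zero-padded numbers for the place, the kai (meeting number), the day, and the race number. The loops below try every combination of place (1 to 10), kai (1 to 5), day (1 to 8), and race (1 to 12).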

race_id_list = []
for place in range(1, 11, 1):
    for kai in range(1, 6, 1):
        for day in range(1, 9, 1):
            for r in range(1, 13, 1):
                race_id = (
                    "2019"
                    + str(place).zfill(2)
                    + str(kai).zfill(2)
                    + str(day).zfill(2)
                    + str(r).zfill(2)
                )
                race_id_list.append(race_id)
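As a cross-check on the nested loops, here is a sketch that builds the same list with itertools.product and splits an ID back into its fields. split_race_id is a hypothetical helper introduced only for this illustration.

```python
from itertools import product

YEAR = "2019"

# Same ranges as the nested loops: place 1-10, kai 1-5, day 1-8, race 1-12.
race_id_list = [
    f"{YEAR}{place:02d}{kai:02d}{day:02d}{r:02d}"
    for place, kai, day, r in product(
        range(1, 11), range(1, 6), range(1, 9), range(1, 13)
    )
]


def split_race_id(race_id):
    """Split a 12-character race_id into its fields (hypothetical helper)."""
    return {
        "year": race_id[0:4],
        "place": race_id[4:6],
        "kai": race_id[6:8],
        "day": race_id[8:10],
        "race": race_id[10:12],
    }


print(len(race_id_list))  # 4800 candidate IDs (10 * 5 * 8 * 12)
print(race_id_list[0])    # 201901010101
print(split_race_id(race_id_list[-1]))
```

The list holds every combination in these ranges, so it contains 4800 candidates whether or not a race was actually held for each one.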

After scraping, set the index of each table to its race_id, concatenate the tables into a single pandas DataFrame, and save it as a pickle file.

results = scrape_race_results(race_id_list)
for key in results:
    results[key].index = [key] * len(results[key])
results = pd.concat([results[key] for key in results], sort=False)
results.to_pickle('results.pickle')
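Setting the index to the race_id is what makes it possible to pull a single race back out of the combined table. The sketch below shows this on two made-up miniature tables. The column names and values are invented, and the real tables come from scrape_race_results.

```python
import pandas as pd

# Made-up miniature result tables keyed by race_id.
results = {
    "201901010101": pd.DataFrame({"rank": [1, 2], "horse": ["A", "B"]}),
    "201901010102": pd.DataFrame({"rank": [1, 2, 3], "horse": ["C", "D", "E"]}),
}

# Label every row of a table with the race_id it belongs to.
for key in results:
    results[key].index = [key] * len(results[key])

combined = pd.concat([results[key] for key in results], sort=False)

# All rows of one race can now be selected by its race_id.
one_race = combined.loc["201901010102"]

print(combined.shape)              # (5, 2)
print(one_race["horse"].tolist())  # ['C', 'D', 'E']
print(combined.index.nunique())    # 2
```

The index is intentionally not unique: every row of a race shares that race's ID, so .loc returns the whole result table for that race.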

The next article uses BeautifulSoup to scrape detailed data such as race dates and weather. The video series "Data analysis and machine learning starting with horse racing prediction" also explains this in detail.
