Predict the second round of summer Koshien 2016 with scikit-learn

Introduction

This is the second installment of my summer vacation independent study project for trying out scikit-learn. The first is here. As usual, the content is beginner-level, so please bear with me.

This time, I tried to predict how many runs each school would score in the second round by training on the first-round match results together with the local tournament statistics. This is the machine learning I have long wanted to do (sorry, this post is mostly me wanting to give it a try).

Training data

Starting from the batting results of the local tournaments used last time, I looked up the first-match pairing for each representative school and joined the opposing school's pitching results onto each row. The model then learns the runs scored in the first match from this combined training data.

The training data I created is here. Since there are 49 schools, Morioka Dai Fuzoku appears twice as an opponent in the first match.
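To make the joining step concrete, here is a minimal sketch, assuming three small tables: batting statistics per school, pitching statistics per school, and the first-match pairings with the runs scored. The school labels, the column names (`school`, `opponent`, `batting_avg`, `runs_per_game`, `era`, `opp_era`) and all numbers are made up for illustration; they are not the columns of the actual CSV.

```python
import pandas as pd

# Hypothetical toy tables; names and numbers are illustrative assumptions.
batting = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "batting_avg": [0.310, 0.285, 0.342, 0.298],
    "runs_per_game": [6.2, 4.8, 7.5, 5.1],
})
pitching = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "era": [1.8, 2.5, 0.9, 3.1],
})
matchups = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "opponent": ["B", "A", "D", "C"],
    "score": [5, 3, 8, 1],  # runs scored by "school" in its first match
})

def build_training_rows(batting, pitching, matchups):
    """Attach each school's batting stats and its opponent's pitching stats."""
    rows = matchups.merge(batting, on="school", how="left")
    # Rename so the pitching table joins on the opponent, not on the school itself
    opp = pitching.rename(columns={"school": "opponent", "era": "opp_era"})
    return rows.merge(opp, on="opponent", how="left")

train = build_training_rows(batting, pitching, matchups)
print(train)
```

Each row then describes one school's offense against one opponent's pitching, which is the shape a regression on the score needs.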

Learning and prediction

As described above, all the batting statistics and the opponent's pitching statistics from the local tournaments are used as explanatory variables. The objective variable is the number of runs scored in the first match at Koshien.

The learning algorithm is linear regression. As usual, I don't have the knowledge to choose anything else...

Then, as prediction data to check what was learned, I looked up the second-round pairings and built a file in the same format as the training data (round2-game-2016.csv).
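As a rough illustration of what linear regression does here, the sketch below fits scikit-learn's `LinearRegression` to synthetic, noise-free data and checks that a prediction is just the intercept plus a weighted sum of the explanatory variables. The data, `true_coef`, and the variable names are assumptions made up for this example, not the Koshien data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: 50 rows, 3 explanatory variables (assumed, not real stats)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(50, 3))
true_coef = np.array([4.0, 2.0, -3.0])
y_demo = 1.5 + X_demo @ true_coef  # noise-free target

model = LinearRegression().fit(X_demo, y_demo)

# A prediction is intercept + sum(coefficient * feature)
manual = model.intercept_ + X_demo[:5] @ model.coef_
print(np.allclose(manual, model.predict(X_demo[:5])))
print(model.coef_, model.intercept_)
```

Because the output is an unbounded weighted sum, nothing stops the model from predicting values that are impossible as a score, such as negative runs.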

# coding: utf-8
import pandas as pd
from sklearn import linear_model

# Learn from the results of the first match
df = pd.read_csv('round1-result-2016.csv')
# Drop the identifier columns and the target; the rest are explanatory variables
X = df.drop(['Prefecture','PrefectureNo','school name','Battle school','score'], axis=1)
# as_matrix() was removed in pandas 1.0; to_numpy() replaces it
Y = df['score'].to_numpy()

clf = linear_model.LinearRegression()
clf.fit(X, Y)

# Predict the second round
df_round2 = pd.read_csv('round2-game-2016.csv')
X_round2 = df_round2.drop(['Prefecture','PrefectureNo','school name','Battle school'], axis=1)
round2_pred = clf.predict(X_round2)

print(round2_pred)
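The script prints only a bare array of numbers. One way to line the predictions up with the schools is sketched below, assuming the prediction file has the `Prefecture`, `school name` and `Battle school` columns that the script drops. The helper `label_predictions` and the two-row demo frame are hypothetical and stand in for `df_round2` and `round2_pred`.

```python
import pandas as pd

def label_predictions(df_games, predictions):
    """Return the identifier columns with the predicted score attached."""
    out = df_games[["Prefecture", "school name", "Battle school"]].copy()
    out["score"] = predictions
    return out

# Two-row stand-in for df_round2 and round2_pred
demo_games = pd.DataFrame({
    "Prefecture": ["Nara", "Tokushima"],
    "school name": ["Chiben Gakuen", "Naruto"],
    "Battle school": ["Naruto", "Chiben Gakuen"],
})
demo_pred = [3.62097786, 5.76513128]

table = label_predictions(demo_games, demo_pred)
print(table.to_string(index=False))
```

With the real data, the call would be `label_predictions(df_round2, round2_pred)`.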

Result

Prefecture  School name             Opponent                Predicted score
Iwate       Morioka Dai Fuzoku      Soshi Gakuen                 2.37607605
Nara        Chiben Gakuen           Naruto                       3.62097786
Tokushima   Naruto                  Chiben Gakuen                5.76513128
Yamanashi   Yamanashi Gakuin        Inabe Sogo                   3.88857396
Mie         Inabe Sogo              Yamanashi Gakuin             5.36922697
Ibaraki     Joso Gakuin             Chukyo                       5.14173416
Gifu        Chukyo                  Joso Gakuin                  7.22823584
Aichi       Toho                    Hachinohe Gakuin Kosei       8.83172441
Aomori      Hachinohe Gakuin Kosei  Toho                         1.28556647
Kanagawa    Yokohama                Riseisha                     7.68159192
Osaka       Riseisha                Yokohama                     4.58766162
Wakayama    Ichi Wakayama           Nichinan Gakuen              2.27939976
Miyazaki    Nichinan Gakuen         Ichi Wakayama                4.78286132
Kagoshima   Shonan                  Hanasaki Tokuharu           -1.30671611
Saitama     Hanasaki Tokuharu       Shonan                       1.90896096
Hiroshima   Hiroshima Shinjo        Toyama Daiichi               1.28968031
Toyama      Toyama Daiichi          Hiroshima Shinjo             2.03399291

As of August 13th, when I wrote this, the result for Morioka Dai Fuzoku was different from the prediction, but I was a little surprised that the predictions for Chiben Gakuen and Naruto were on target. One prediction came out negative; I think that is because the opposing school allowed 0 runs in its local tournament and had an ERA of 0.

This is just playing with numbers in a data-science-like way, so I make no claims about its accuracy. Please don't misunderstand: I have no intention of promoting or criticizing the actual games or players.
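A negative score is possible because linear regression output is not bounded below. One simple post-processing step, which is my assumption and not something done in this post, is to clip predictions at zero with `numpy.clip`. The three values are taken from the result table above.

```python
import numpy as np

# Three predictions from the result table, including the negative one
raw_pred = np.array([2.37607605, -1.30671611, 1.90896096])

# Runs cannot be negative, so floor the predictions at 0 (no upper bound)
clipped = np.clip(raw_pred, 0, None)
print(clipped)
```

Clipping only hides the symptom. The underlying cause is that an opponent ERA of 0 likely lies at the edge of the range seen in training, where a linear fit extrapolates poorly.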
