This is the second installment of my summer-vacation free study project for trying out scikit-learn. The first is here. As usual, the content is beginner level, so please bear with me.
This time, I tried training on the first-round match results, using each school's local tournament record as input, and then predicting how many runs each school would score in the second round. This is the machine learning I had long wanted to do (sorry, I was simply itching to try it).
Starting from the local tournament batting results I used last time, I looked up each representative school's first-match pairing, combined the batting results with the pitching results of the opposing school to form the training data, and had the model learn the score of that first match.
The training data I created is here (round1-result-2016.csv). Since there are 49 schools, Morioka Dai Fuzoku appears twice as a first-match opponent.
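As a rough illustration of how such a training table might be assembled, here is a minimal sketch with pandas. The schools, the stat columns (`batting_avg`, `runs_per_game`, `era`, `runs_allowed_per_game`) and all values are made up for this example and are not the columns of the real CSV. Only `school name`, `Battle school` and `score` follow the names used in this post.

```python
import pandas as pd

# Hypothetical batting stats from the local tournaments (one row per school).
batting = pd.DataFrame({
    "school name": ["A High", "B High", "C High"],
    "batting_avg": [0.350, 0.310, 0.290],
    "runs_per_game": [8.2, 6.5, 5.1],
})

# Hypothetical pitching stats from the local tournaments (one row per school).
pitching = pd.DataFrame({
    "school name": ["A High", "B High", "C High"],
    "era": [1.20, 2.50, 0.00],
    "runs_allowed_per_game": [1.0, 2.3, 0.0],
})

# First-match pairings and the runs each school scored (made-up values).
# "A High" appears twice as an opponent, as happens with an odd number of schools.
games = pd.DataFrame({
    "school name": ["A High", "B High", "C High"],
    "Battle school": ["B High", "A High", "A High"],
    "score": [5, 2, 1],
})

# Attach each school's own batting stats.
train = games.merge(batting, on="school name", how="left")

# Attach the opponent's pitching stats, keyed on "Battle school".
opp_pitching = pitching.rename(columns={
    "school name": "Battle school",
    "era": "opp_era",
    "runs_allowed_per_game": "opp_runs_allowed_per_game",
})
train = train.merge(opp_pitching, on="Battle school", how="left")

print(train)
```

Each row ends up with the school's own batting numbers, the opponent's pitching numbers, and the score to be learned.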
As described above, all of the local tournament batting results and the opponent's pitching results are used as explanatory variables. The objective variable is the score in the first match at Koshien.
The learning algorithm is linear regression. As usual, I don't know enough to choose anything else ...
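For reference, here is a small sketch of what linear regression does under the hood: `LinearRegression` fits an intercept and one coefficient per explanatory variable by ordinary least squares. The data below is synthetic (random numbers, not baseball stats), and the comparison with `np.linalg.lstsq` is only there to show that both give the same fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 40 rows, 3 explanatory variables.
X = rng.normal(size=(40, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=40)

# Fit with scikit-learn.
model = LinearRegression()
model.fit(X, y)

# Same fit via ordinary least squares with an explicit intercept column.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("sklearn:", model.intercept_, model.coef_)
print("lstsq:  ", beta[0], beta[1:])
```

Looking at `coef_` after fitting is also a quick way to see which explanatory variables push the predicted score up or down.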
Then, as prediction data for checking what was learned, I looked up the second-round pairings and put them in the same format as the training data (round2-game-2016.csv).
```python
# coding: utf-8
import pandas as pd
from sklearn import linear_model

# Columns that identify a game rather than describe it
ID_COLS = ['Prefecture', 'PrefectureNo', 'school name', 'Battle school']

# Learn the results of the first match
df = pd.read_csv('round1-result-2016.csv')
X = df.drop(ID_COLS + ['score'], axis=1)
Y = df['score'].to_numpy()  # as_matrix() was removed in pandas 1.0
clf = linear_model.LinearRegression()
clf.fit(X, Y)

# Second round prediction
df_round2 = pd.read_csv('round2-game-2016.csv')
X_round2 = df_round2.drop(ID_COLS, axis=1)
round2_pred = clf.predict(X_round2)
print(round2_pred)
```
Prefecture | school name | Battle school | score |
---|---|---|---|
Iwate | Morioka Dai Fuzoku | Soshi Gakuen | 2.37607605 |
Nara | Chiben Gakuen | Naruto | 3.62097786 |
Tokushima | Naruto | Chiben Gakuen | 5.76513128 |
Yamanashi | Yamanashi Gakuin | Inabe Sogo | 3.88857396 |
Mie | Inabe Sogo | Yamanashi Gakuin | 5.36922697 |
Ibaraki | Joso Gakuin | Chukyo | 5.14173416 |
Gifu | Chukyo | Joso Gakuin | 7.22823584 |
Aichi | Toho | Hachinohe Gakuin Kosei | 8.83172441 |
Aomori | Hachinohe Gakuin Kosei | Toho | 1.28556647 |
Kanagawa | Yokohama | Riseisha | 7.68159192 |
Osaka | Riseisha | Yokohama | 4.58766162 |
Wakayama | Ichi Wakayama | Nichinan Gakuen | 2.27939976 |
Miyazaki | Nichinan Gakuen | Ichi Wakayama | 4.78286132 |
Kagoshima | Shonan | Hanasaki Tokuharu | -1.30671611 |
Saitama | Hanasaki Tokuharu | Shonan | 1.90896096 |
Hiroshima | Hiroshima Shinjo | Toyama Daiichi | 1.28968031 |
Toyama | Toyama Daiichi | Hiroshima Shinjo | 2.03399291 |
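`print(round2_pred)` only shows a bare array, so a table like the one above needs the predictions lined up with the school names. A minimal sketch of one way to do that follows. The two-row `df_round2` here is a stand-in built by hand (the real one comes from the CSV), with the two values copied from the table above.

```python
import numpy as np
import pandas as pd

# Stand-in for df_round2 (identifier columns only) and for round2_pred.
df_round2 = pd.DataFrame({
    "Prefecture": ["Nara", "Tokushima"],
    "school name": ["Chiben Gakuen", "Naruto"],
    "Battle school": ["Naruto", "Chiben Gakuen"],
})
round2_pred = np.array([3.62097786, 5.76513128])

# Keep the identifying columns and add the predicted score next to them.
result = df_round2[["Prefecture", "school name", "Battle school"]].copy()
result["score"] = round2_pred

print(result.to_string(index=False))
```

This works because `predict` returns one value per input row, in the same order as the rows of `df_round2`.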
As of August 13th, when I am writing this, the prediction for Morioka Dai Fuzoku turned out to be off, but I was a little surprised that the predictions for Chiben Gakuen and Naruto were on target. One prediction comes out negative, though ... I think that is because the opposing school allowed 0 runs and had an ERA of 0 in its local tournament.
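A negative score can come out of a linear model whenever an input lies outside the range seen in training, because the fitted line simply keeps going. The toy example below reproduces that effect with a single made-up feature (`opp_era`) and made-up scores. It is not the real data, and clipping at zero is just one simple workaround I am assuming here, not something the model above does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: runs scored rise with the opponent's ERA (made-up values).
opp_era = np.array([[1.0], [2.0], [3.0], [4.0]])
runs = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(opp_era, runs)

# An opponent with ERA 0 lies outside the training range, so the fitted
# line is extended below the smallest observed score.
raw = model.predict(np.array([[0.0]]))

# Runs cannot be negative, so clip the prediction at zero.
clipped = np.clip(raw, 0, None)

print(raw, clipped)
```

Clipping only tidies up the output. It does not make the underlying prediction for such an extreme opponent any more trustworthy.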
This is just data-science-flavored number play, so I can't vouch for the results. Please don't misunderstand: I have no intention of promoting or criticizing the actual games or players.