Predicting the number of titles won by Souta Fujii 7th Dan with gradient boosting

Overview

It has already been three years since Souta Fujii, the prodigy who made his professional debut as the youngest player in history and then won an unprecedented 29 consecutive games, set off "Fujii fever" across the country. His first title was thought to be only a matter of time, but last year he fell one win short of a title challenge, and as of April 2020 he has still not secured one. Meanwhile, the shogi world has changed dramatically over the last few years. The "Habu generation", including Yoshiharu Habu 9th Dan, who has held titles for nearly 30 years, is being pushed aside by younger players, and with the development of AI-based shogi software the field has become a fierce rivalry among many strong players, in which it would not be surprising for anyone to take a title. Even for Fujii 7th Dan, winning is always hard, and a title is a high wall. I therefore built a model of the number of titles won from the records of players who appeared in title matches over the last 3 years, used it to predict Fujii 7th Dan's number, and evaluated how close he is to a title. [^1] The code is available on GitHub.

Data set

As training data, I collected from the Japan Shogi Association website, as of April 1, 2020, the following for players in Class B1 or above and for players who appeared in a title match from 2017 to 2019: ① age, ② age at turning professional, ③ ranking battle class [^2], ④ Ryuo tournament class [^3], ⑤ years in the Oi League [^4], ⑥ years in the Osho League [^5], ⑦ wins, ⑧ games played, and ⑨ general tournament wins [^6]. I also recorded each player's number of title match appearances and titles won, and saved everything in a TSV file. This gave 28 players. (Ideally every player would be included, but that takes a lot of time and effort, so I narrowed the set down.)

Items ③ to ⑧ are totals over the past 3 years. For the ranking battle, each year adds 5 points for Class A, 4 for B1, 3 for B2, 2 for C1, and 1 for C2. For the Ryuo tournament, each year adds 5 points for Group 1, 4 for Group 2, 3 for Group 3, 2 for Group 4, 1 for Group 5, and 0 for Group 6 (a maximum of 15 points in both cases). For the Oi and Osho Leagues, each year of membership adds 1 point (a maximum of 3 points).
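To make the point scheme above concrete, here is a minimal sketch of how the 3-year class scores could be computed. The class labels, the per-year list format, and the helper names (`ranking_score`, `ryuo_score`, `league_score`) are hypothetical; the actual TSV only stores the resulting totals, and the example player is made up.

```python
# Hypothetical sketch of the 3-year class scoring described above.
# Class labels and the per-year history format are assumptions for illustration.

RANKING_POINTS = {"A": 5, "B1": 4, "B2": 3, "C1": 2, "C2": 1}
RYUO_POINTS = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1, 6: 0}  # group 1 is the top class


def ranking_score(classes_by_year):
    """Sum ranking-battle points over the seasons given (at most 3 -> max 15)."""
    return sum(RANKING_POINTS[c] for c in classes_by_year)


def ryuo_score(groups_by_year):
    """Sum Ryuo-tournament points over the seasons given (at most 3 -> max 15)."""
    return sum(RYUO_POINTS[g] for g in groups_by_year)


def league_score(in_league_by_year):
    """One point per season spent in the Oi or Osho League (max 3)."""
    return sum(1 for member in in_league_by_year if member)


# A made-up player: C2 -> C1 -> B2 in the ranking battle,
# Group 6 -> 5 -> 3 in the Ryuo tournament, one season in a league.
print(ranking_score(["C2", "C1", "B2"]))   # 1 + 2 + 3
print(ryuo_score([6, 5, 3]))               # 0 + 1 + 3
print(league_score([False, False, True]))
```

Summing per-season points like this rewards both the level a player has reached and how long he has stayed there.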

Method

This time I used gradient boosting with LightGBM, which is popular on Kaggle, taking variables ① to ⑨ as features and predicting either the number of titles won or the number of title match appearances. For an explanation of gradient boosting, please refer to here. In short, it is an ensemble of decision trees in which each new tree is fit, in the manner of gradient descent, to reduce the error left by the trees before it. For comparison, I also made predictions with linear regression and a neural network from scikit-learn.
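The "each tree corrects the previous ones" idea can be sketched from scratch for squared error, where the negative gradient is simply the residual. This is an illustrative toy built on scikit-learn's `DecisionTreeRegressor`, not LightGBM's implementation (which adds histogram binning, leaf-wise growth, regularization, and more); the function names and the synthetic data are assumptions for the example.

```python
# Minimal gradient boosting for squared error, for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """Return the initial constant prediction and the list of fitted trees."""
    f0 = float(np.mean(y))              # start from the mean of the target
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # For L = (y - F)^2 / 2, the negative gradient with respect to F is y - F.
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)           # the new tree models the remaining error
        pred += learning_rate * tree.predict(X)  # small step toward the target
        trees.append(tree)
    return f0, trees


def predict_gradient_boosting(f0, trees, X, learning_rate=0.1):
    """Initial value plus the shrunken contributions of all trees."""
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)


# Synthetic data (not the shogi data): y depends non-linearly on two features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = np.sin(6 * X[:, 0]) + X[:, 1] ** 2

for n in (1, 10, 100):
    f0, trees = fit_gradient_boosting(X, y, n_trees=n)
    mse = np.mean((y - predict_gradient_boosting(f0, trees, X)) ** 2)
    print(n, round(mse, 4))  # training error shrinks as trees are added
```

The `learning_rate` (shrinkage) is the same knob that is set in `params` later; a smaller value means each tree corrects less, so more trees are needed.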

Implementation

As before, I implement this in Google Colaboratory.

Library import



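%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
!pip install japanize-matplotlib
import japanize_matplotlib  # lets matplotlib render Japanese labels
import seaborn as sns
sns.set(font='IPAexGothic')  # keep the Japanese font after seaborn resets the style
import lightgbm as lgb
from lightgbm import LGBMRegressor
from IPython.display import display  # built into Colab; imported explicitly for clarity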

matplotlib cannot render Japanese text out of the box, so I install and import japanize-matplotlib (reference).

Training data

Read training data


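from google.colab import drive
drive.mount('/content/drive')  # the TSV files are stored on Google Drive
train_path = "/content/drive/My Drive/Colab Notebooks/Deliverables/Shogi title/Go player analysis(2017_2019) - train.tsv"
train = pd.read_csv(train_path, delimiter='\t')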

Because I compiled the data myself, I put it on Google Drive, mount the drive, and read the file from there. Let's look at the data.

Training data overview


train.head(len(train))


The rows follow the Japan Shogi Association's seating order (kamiza): Ryuo, then Master, then other title holders, then players qualified for lifetime titles, then by rank (in order of player number). Roughly speaking, the higher a player is listed, the stronger he is. In terms of titles won, Ryuo/Master Toyoshima and triple-crown holder Watanabe stand out in recent years, followed by double-crown holder Nagase and Habu 9th Dan. Let's plot the variables for the players above who have won titles.

Visualization


# one bar chart per numeric column, for players who have won at least one title
train[train['get_titles'] > 0].plot.bar(x='name', figsize=(20,20), sharex=True, subplots=True, layout=(4,3))


The top three players in particular have several things in common: they are in their 20s or 30s, they turned professional as teenagers, and they have many wins and many games played (since winning leads to more games). This matches intuition.

Test data

Read test data


test_path = "/content/drive/My Drive/Colab Notebooks/Deliverables/Shogi title/Go player analysis(2017_2019) - test.tsv"
test = pd.read_csv(test_path, delimiter='\t')
test.head(len(test))


Next, let's look at the test data. Besides Fujii 7th Dan, I included four young players whom I personally consider promising.

Visualization


test.plot.bar(x='name', figsize=(15,10), sharex=True, subplots=True, layout=(3,3))


Daichi Sasaki 5th Dan is also impressive, but Fujii 7th Dan's numbers of games and wins show just how much he has been winning.

Training



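# Features ① to ⑨ are the contiguous columns from 'age' to 'champions'
train_x = train.loc[:, 'age' : 'champions']
train_y = train['get_titles']  # target: number of titles won
test_x = test.loc[:, 'age' : 'champions']
params = {
    'learning_rate' : 0.01,   # small steps because there are only 28 training rows
    'min_child_samples' : 0,  # the default (20) would block every split with 28 rows
}
model = LGBMRegressor(**params)
model.fit(train_x, train_y)
y_pred = model.predict(test_x)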

Now let's train. First, the target is the number of titles won. The hyperparameters passed to LGBMRegressor are set in params. Since the training set is very small, I used a small learning rate and minimized min_child_samples (the minimum number of samples required in a leaf, i.e. a final prediction node).

Result

Display the results.

Result display


names = test["name"]
display(pd.DataFrame(y_pred, index=names, columns=['Number of titles won']))

The model predicts about 4 titles for Fujii 7th Dan, a number already comparable to the top players. Next come Masuda 6th Dan with about 3 and Daichi Sasaki 5th Dan with about 1.5. In other words, all of them are active enough that the model predicts title wins. We can also check which variables the trained model relied on.

Feature importance


features = test.loc[:, 'age' : 'champions']
display(pd.DataFrame(model.feature_importances_, index=features.columns, columns=['importance']).sort_values('importance', ascending=False))

The most important variable was age. This is convincing: shogi generally favors younger players, and young players have been very successful in recent years. Next come the number of wins (the more you win, the closer you get to a title), the age at turning professional (stronger players generally turn pro younger), and the number of games played (the more you win, the more games you play). It is no surprise that Fujii 7th Dan, who stands out on all of these, is rated highly. On the other hand, membership in the Oi League or the Osho League was judged unimportant. Note that feature_importances_ in LightGBM's scikit-learn interface counts, by default, how many times each feature is used in a split; since these league variables take only a few distinct values (0 to 3), they offer few candidate split points, which may be why they rank low.

Predicting the number of title match appearances



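train_y2 = train['titles']  # target: number of title match appearances
model.fit(train_x, train_y2)  # refit the same model on the new target
y_pred2 = model.predict(test_x)
display(pd.DataFrame(y_pred2, index=names, columns=['Number of title match appearances']))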

I also tried setting the target to the number of title match appearances instead of the number of titles won. The ranking should come out roughly the same, but the values should be larger than the numbers of titles won.

Interestingly, Fujii 7th Dan and Masuda 6th Dan came out level. Daichi Sasaki 5th Dan, Yuki Sasaki 7th Dan, and Aoshima 5th Dan also received identical predictions. Combined with the previous result, Fujii 7th Dan would have appeared in title matches about five times and won four of them (scary...).

Feature importance


display(pd.DataFrame(model.feature_importances_, index=features.columns, columns=['importance']).sort_values('importance', ascending=False))

This time, the most important variable was the age at turning professional. Perhaps this reflects Habu 9th Dan, who turned professional at 15 and has appeared in a great many title matches.

Linear regression

How does linear regression compare with gradient boosting? Linear regression fits the target with a linear function of the features:

$$
y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_N x_N, \qquad b_0, b_1, \ldots, b_N \in \mathbb{R},\quad x_1, x_2, \ldots, x_N:\ \text{features}
$$
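Fitting this equation means choosing $b_0, \ldots, b_N$ that minimize the sum of squared errors. As a minimal sketch on synthetic data (not the shogi data), the same coefficients can be obtained directly with NumPy's least-squares solver and compared with scikit-learn's `LinearRegression`, whose `intercept_` corresponds to $b_0$ and `coef_` to $b_1, \ldots, b_N$.

```python
# Sketch: ordinary least squares with NumPy vs scikit-learn (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_b = np.array([2.0, -1.0, 0.5])
y = 1.5 + X @ true_b + rng.normal(0, 0.1, size=50)

# Prepend a column of ones so the intercept b0 is estimated together with b1..bN.
X1 = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)  # minimizes sum((y - X1 @ b) ** 2)

reg = LinearRegression().fit(X, y)
print(b[0], reg.intercept_)  # b0 vs intercept_
print(b[1:], reg.coef_)      # b1..bN vs coef_
```

Because the fitted function is linear and unbounded, nothing stops it from predicting a negative number of titles.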

Linear regression


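from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(train_x, train_y)  # target: number of titles won
lr_pred = reg.predict(test_x)
display(pd.DataFrame(lr_pred, index=names, columns=['Number of titles won']))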

Fujii 7th Dan is still predicted at about 4 titles, but the values for the other players changed, and Aoshima 5th Dan's prediction even became negative. Linear regression fits an unbounded linear function, so negative predictions are possible, but I was surprised the results diverged this much. For reference, the coefficients and intercept of this model are as follows.

Coefficients and intercept


display(pd.DataFrame(reg.coef_, index=features.columns, columns=['coef']))
print('intercept = {}'.format(reg.intercept_))

A negative coefficient means that, with the other variables held fixed, a lower value of that variable raises the prediction. The negative coefficients for age and age at turning professional make sense, but it is puzzling that the ranking battle score and the number of games are also negative. For the ranking battle, this may reflect veterans who are in Class A or B1 but have not been involved in title matches.

Neural network

Finally, let's make a prediction with a neural network.

Neural network


from sklearn.neural_network import MLPRegressor
nn = MLPRegressor()  # default settings: one hidden layer of 100 units
nn.fit(train_x, train_y)  # target: number of titles won
nn_pred = nn.predict(test_x)
display(pd.DataFrame(nn_pred, index=names, columns=['Number of titles won']))

Yuki Sasaki 7th Dan came out higher than Fujii 7th Dan, and Daichi Sasaki 5th Dan came out at the bottom, both of which run counter to intuition. I had thought a neural network might suit this task because it makes predictions through non-linear transformations, but at least with the default settings it did not work as well as the other methods.
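One likely reason the default MLPRegressor struggles is that neural networks are sensitive to feature scale: 3-year totals such as wins and games played are probably much larger than league years (0 to 3). A common remedy, not tried in this article, is to standardize the features first. The sketch below uses synthetic data, and the hyperparameters (`hidden_layer_sizes=(16,)`, `max_iter=2000`) are illustrative assumptions, not tuned values.

```python
# Sketch: standardizing features before an MLP (synthetic data, untuned settings).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(15, 70, 28),   # an age-like feature
    rng.uniform(50, 300, 28),  # a games-played-like feature
    rng.integers(0, 4, 28),    # a league-years-like feature
])
y = 0.02 * X[:, 1] - 0.05 * X[:, 0] + rng.normal(0, 0.3, 28)

scaled_nn = make_pipeline(
    StandardScaler(),  # rescale each feature to zero mean and unit variance
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
scaled_nn.fit(X, y)
print(scaled_nn.predict(X[:5]))
```

Fixing `random_state` also makes the result reproducible; with the defaults, the weight initialization (and so the ranking of players) can change from run to run.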

Summary

This time, I tried to predict the number of titles won by Fujii 7th Dan from data on the new era of shogi. Despite his remarkable results he has not yet won a title, but compared with the players who have, the model predicts about 4 titles over the last 3 years, which suggests he is very close. He is also currently advancing in many tournaments, including ones not captured by the variables used here, so his chances of winning a title seem to be increasing. As a shogi fan, I am looking forward to it. I did not compare model performance quantitatively (for example, by the sum of squared differences between the true and predicted values), because every player in the test data has won 0 titles; judged against my personal intuition, gradient boosting gave the most plausible predictions. If possible, I would like to collect more data and try another analysis.
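For reference, the sum of squared differences mentioned above is

$$
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

where $y_i$ is the actual number of titles won by player $i$, $\hat{y}_i$ the model's prediction, and $n$ the number of players. Because every $y_i$ in the test data is 0, SSE would reduce to $\sum_i \hat{y}_i^2$, which simply rewards predicting small numbers rather than measuring accuracy.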

References

[Summary of the ages at which shogi players turned professional](https://depalma01.com/2018/11/23/%E5%B0%86%E6%A3%8B%E3%81%AE%E6%A3%8B%E5%A3%AB%E3%81%8C%E3%83%97%E3%83%AD%E5%85%A5%E3%82%8A%E3%81%97%E3%81%9F%E5%B9%B4%E9%BD%A2%E3%81%BE%E3%81%A8%E3%82%81%E3%80%90%E6%9C%80%E5%B9%B4%E5%B0%91%E3%83%BB/)

Pandas graph drawing

[When all LightGBM predictions are the same value](https://kiseno-log.com/2019/12/08/lightgbm%E3%81%A7%E3%81%AE%E4%BA%88%E6%B8%AC%E5%80%A4%E3%81%8C%E3%81%99%E3%81%B9%E3%81%A6%E5%90%8C%E3%81%98%E5%80%A4%E3%81%AB%E3%81%AA%E3%82%8B%E3%81%A8%E3%81%8D%E3%81%AE%E5%8E%9F%E5%9B%A0%E3%81%A8/)

[^1]: For readers unfamiliar with shogi: there are currently eight titles. For each title, a single challenger is decided through a tournament held over the year, and the challenger then plays a title match against the current holder; the winner takes the title. Winning even one title is a great achievement.

[^2]: The ranking battle is divided into five classes, from Class A down to Class C2. Players compete within their class over the year; the top finishers are promoted one class and the bottom finishers are demoted one class. Class A, the highest, has only 10 players and is proof of being a top player; its winner challenges the Master, the most traditional title in shogi. Fujii 7th Dan is currently in Class B2, and the current Master is Masayuki Toyoshima.

[^3]: The Ryuo tournament is divided into six classes, Group 1 to Group 6. Players play a knockout tournament within their class; the top players are promoted one class and the lowest are demoted one class. The challenger for the Ryuo, the highest title in shogi, is then decided among the top finishers of each class. The big difference from the ranking battle is that only Class A players can challenge for the Master title, whereas players in any class have a chance at the Ryuo. Fujii 7th Dan is currently in Group 3, and the current Ryuo is Masayuki Toyoshima.

[^4]: The Oi (throne) League decides the challenger for the Oi title and is split into two groups, Red and White. A player cannot challenge without getting through tough preliminaries and entering the league. Fujii 7th Dan is currently in the Oi League, and the current Oi is Kazuki Kimura.

[^5]: The Osho (King) League decides the challenger for the Osho title. A player cannot challenge without getting through tough preliminaries and entering the league, and with so few slots it is said to be one of the hardest leagues in shogi. Fujii 7th Dan is currently in the Osho League, and the current Osho is Akira Watanabe.

[^6]: Besides the eight major titles, there are other tournaments, such as televised ones, that decide a winner each year. Unlike a title, winning one does not make you a defending holder: to win again the following year, you must win the whole tournament again. Fujii 7th Dan has won the Asahi Cup Shogi Open twice in a row, and has also won the Shinjin-O (Rookie King) tournament, which only young players can enter.
