I wanted to try machine learning, so I picked Pokemon as the subject. Every Pokemon species has fixed base stats ("race values"), so I figured it would be a treasure trove of data.
However, this article is a strictly inferior version of the article that comes up first when you search Google for "Pokemon Machine Learning", so if you want to follow along, please refer to the original instead: Machine learning with Pokemon
OS: Windows 10 Home, IDE: VS Code, Language: Python 3.7.3 (64-bit)
Using a Pokemon database covering up to generation 7, I extracted the Flying and Psychic Pokemon (labeled "flight" and "Esper" in the data) and tried classifying them into the two types with logistic regression. For reference, the number of Pokemon of each type (up to generation 7) is as follows:
Type | Label in the code | Count |
---|---|---|
Normal | normal | 116 |
Fighting | Fighting | 63 |
Poison | Doku | 69 |
Ground | Jimen | 75 |
Flying | flight | 113 |
Bug | insect | 89 |
Rock | Iwa | 67 |
Ghost | ghost | 55 |
Steel | Steel | 58 |
Fire | Fire | 72 |
Water | Mizu | 141 |
Electric | Denki | 60 |
Grass | Kusa | 103 |
Ice | Ice | 43 |
Psychic | Esper | 100 |
Dragon | Dragon | 59 |
Dark | Evil | 59 |
Fairy | Fairy | 54 |
Water was the most common type and Ice the rarest. Well, Freeze-Dry does hit Water for double damage.
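The per-type counts above tally a Pokemon once for each of its two types. As a minimal sketch of that counting, here is a compact pandas version on a made-up four-row table; the rows and names are placeholders, and only the column names `Type 1` and `Type 2` and the type labels are taken from the script in this article.

```python
import pandas as pd

# Toy stand-in for the real CSV; these rows are invented for illustration.
toy = pd.DataFrame({
    "Pokemon name": ["A", "B", "C", "D"],
    "Type 1": ["Mizu", "Mizu", "Ice", "flight"],
    "Type 2": ["Ice", None, None, "Mizu"],
})

# Stack both type columns into one Series, drop the empty second types,
# and count how often each label appears.
type_counts = pd.concat([toy["Type 1"], toy["Type 2"]]).dropna().value_counts()
print(type_counts)
```

Because dual-type Pokemon are counted under both labels, the counts add up to more than the number of Pokemon.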
The full script is below.
lr_pokemon.py
import pandas as pd
import codecs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# Read the data with pandas (the CSV is Shift-JIS encoded)
with codecs.open("data/pokemon_status.csv", "r", "Shift-JIS", "ignore") as file:
    df = pd.read_table(file, delimiter=",")
# print(df.head(15))
p_type = ["normal","Fighting","Doku","Jimen","flight","insect","Iwa","ghost","Steel","Fire","Mizu","Denki","Kusa","Ice","Esper","Dragon","Evil","Fairy"]
print(len(p_type))
# Helper functions
def count_type(p_type):
    """Print how many Pokemon have p_type as either of their two types."""
    list1 = df[df['Type 1'] == p_type]
    list2 = df[df['Type 2'] == p_type]
    lists = pd.concat([list1, list2])
    print("%s Pokemon: %d" % (p_type, len(lists)))

def type_to_num(p_type):
    """Encode a type label: 1 for "flight" (Flying), 0 for anything else."""
    if p_type == "flight":
        return 1
    else:
        return 0

# Count the Pokemon of each type
for i in p_type:
    count_type(i)
# Build the "flight" (Flying) subset: the label may sit in either type slot
sky1 = df[df['Type 1'] == "flight"]
sky2 = df[df['Type 2'] == "flight"]
sky = pd.concat([sky1, sky2])
# Build the "Esper" (Psychic) subset the same way
psycho1 = df[df['Type 1'] == "Esper"]
psycho2 = df[df['Type 2'] == "Esper"]
psycho = pd.concat([psycho1, psycho2])
# Note: a Pokemon that is both "flight" and "Esper" ends up in df_s_p twice
df_s_p = pd.concat([sky, psycho], ignore_index=True)
# Label: 1 if either type is "flight", otherwise 0 (i.e. "Esper" only)
type1 = df_s_p['Type 1'].apply(type_to_num)
type2 = df_s_p['Type 2'].apply(type_to_num)
df_s_p['type_num'] = type1 + type2
print(df_s_p)
# Features: columns 7-12 of the CSV (presumably the six base stats)
X = df_s_p.iloc[:, 7:13].values
y = df_s_p['type_num'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
lr = LogisticRegression(C=1.0)
lr.fit(X_train, y_train)
# Show scores
print("train_score: %.3f" % lr.score(X_train, y_train))
print("test_score: %.3f" % lr.score(X_test, y_test))
i = 0
error1 = 0
success1 = 0
error2 = 0
success2 = 0
print("[List of Pokemon judged to be flying type]")
print("----------------------------------------")
print("")
# Predict every row (training and test rows alike) and tally the outcomes
while i < len(df_s_p):
    y_pred = lr.predict(X[i].reshape(1, -1))
    if y_pred == 1:
        print(df_s_p.loc[i, ["Pokemon name"]])
        if df_s_p.loc[i, ["type_num"]].values == 1:
            # Predicted Flying and it really is Flying
            success1 += 1
            print("It's a Flying type, isn't it?")
            print("")
        else:
            # Predicted Flying but it is actually Esper
            error1 += 1
            print("I thought it was a Flying type")
            print("")
    else:
        print(df_s_p.loc[i, ["Pokemon name"]])
        if df_s_p.loc[i, ["type_num"]].values == 0:
            # Predicted Esper and it really is Esper: count it as a success
            success2 += 1
            print("It's an Esper type, isn't it?")
            print("")
        else:
            # Predicted Esper but it is actually Flying: count it as an error
            error2 += 1
            print("I thought it was an Esper type")
            print("")
    i += 1
print("----------------------------------------")
print("Number of Pokemon correctly judged to be Flying type: %d" % success1)
print("Number of Pokemon correctly judged to be Esper type: %d" % success2)
print("Number of Pokemon mistakenly judged to be Flying type: %d" % error1)
print("Number of Pokemon mistakenly judged to be Esper type: %d" % error2)
print("")
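The four counters tallied by the loop are the cells of a confusion matrix, so they can also be obtained in one call. The following is only a sketch under stated assumptions: the data is synthetic (random numbers standing in for the six stat columns, with two columns shifted by class), not the real Pokemon CSV, and `max_iter=1000` is my addition to avoid convergence warnings on unscaled features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the six stat columns; not real Pokemon data.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                 # 1 = "flight", 0 = "Esper"
X = rng.normal(loc=70, scale=20, size=(n, 6))
X[:, 1] += 15 * y                              # shift two columns by class
X[:, 3] -= 15 * y                              # so there is something to learn

lr = LogisticRegression(C=1.0, max_iter=1000)
lr.fit(X, y)
y_pred = lr.predict(X)

# Rows are the true label, columns the predicted label, ordered as in `labels`.
cm = confusion_matrix(y, y_pred, labels=[1, 0])
success1, error2 = cm[0]   # truly flight: predicted flight / predicted Esper
error1, success2 = cm[1]   # truly Esper:  predicted flight / predicted Esper
print(cm)
print("accuracy: %.3f" % ((success1 + success2) / cm.sum()))
```

The diagonal of the matrix holds the correct calls, so dividing its sum by the total gives the same accuracy that `lr.score` reports on the same rows.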
The resulting accuracy was about 75%. That is low, and not a figure I would call usable for machine learning.
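A figure like 75% is easier to judge next to a baseline, and a single 70/30 split on a couple of hundred rows gives a noisy estimate. Here is a rough sketch of both checks using cross-validation and a majority-class baseline. Everything in it is an assumption for illustration: the data is synthetic, the class sizes simply borrow the 113 and 100 from the table (ignoring dual-type overlap), and the `StandardScaler` step is my addition, not part of the script above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 113 rows labeled 1 ("flight") and 100 labeled 0 ("Esper").
rng = np.random.default_rng(0)
n = 213
y = (np.arange(n) < 113).astype(int)
X = rng.normal(70, 20, size=(n, 6))
X[:, 1] += 20 * y
X[:, 3] -= 20 * y

# Baseline: always predict the more frequent class.
baseline = DummyClassifier(strategy="most_frequent")
# Model: standardize the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))

# 5-fold cross-validation (stratified by default for classifiers).
baseline_scores = cross_val_score(baseline, X, y, cv=5)
model_scores = cross_val_score(model, X, y, cv=5)
print("baseline: %.3f" % baseline_scores.mean())
print("model:    %.3f +/- %.3f" % (model_scores.mean(), model_scores.std()))
```

With two classes of nearly equal size, always guessing the larger one already gets a bit over half right, which is the number the model has to beat.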
I had expected better numbers, because I assumed that Flying types are mostly physical attackers and Psychic ("Esper") types are mostly special attackers. Reality is not that simple.

Still, when I looked at the Pokemon that were actually misclassified, the mistakes seemed understandable. For example, "Thunder" (Zapdos) and "Freezer" (Articuno) were mistaken for Esper even though they are Flying, which is fair enough given how high their stats are (both have higher Special Attack than Attack). Even I would get those wrong at first glance.

In the other direction, "Abra" and "Ralts" were mistaken for Flying even though they are Esper, but I think that can't be helped, because their base stats are low and it is hard for the numbers to differ much in the low range. Their evolutions "Hoodin" (Alakazam), "Gardevoir", and "Gallade" were all correctly classified as Esper, so I'm relieved.
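One way to check the physical-versus-special hypothesis is to look at the fitted coefficients. The sketch below is only an illustration under loud assumptions: the data is synthetic and deliberately built so that class 1 ("flight") has higher Attack and lower Sp. Atk, and the stat names are hypothetical labels for the six columns. On the real data the signs and sizes could look quite different.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical names for the six stat columns.
stat_names = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Synthetic data with the hypothesis built in; not real Pokemon data.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)     # 1 = "flight", 0 = "Esper"
X = rng.normal(70, 20, size=(n, 6))
X[:, 1] += 20 * y                  # class 1 gets higher Attack
X[:, 3] -= 20 * y                  # class 1 gets lower Sp. Atk

# Standardize so the coefficients are comparable across stats.
X_std = StandardScaler().fit_transform(X)
lr = LogisticRegression(C=1.0).fit(X_std, y)

# Positive weights push toward "flight", negative toward "Esper".
coef = pd.Series(lr.coef_[0], index=stat_names).sort_values()
print(coef)
```

If the hypothesis held on the real data, Attack would carry a clearly positive weight and Sp. Atk a clearly negative one, with the other stats near zero.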
Gallade! You of all Pokemon could have been mistaken for a Flying type!!
In the end, judging Pokemon by base stats alone is a mistake.