Winner of the 2016 buzzword award, **Pokemon GO**! Did you all catch and play a lot of Pokemon?
Well, in Pokemon GO, **every Pokemon you catch has its own feature values, the so-called individual values (they differ from one individual to the next).** I've always wondered how these individual values are distributed and how they relate to one another. Well, I just wanted to find out (sweat)
So in this article, **using the individual value data of Magikarp that I actually caught, I check whether the CP, weight, and height parameters are correlated with one another (or whether we can say they are not), using a test of no correlation (also called an uncorrelated test).** :fish:
This article is written with the aim of showing, in a fun way, that **you can do statistical analysis on familiar data, so I will avoid difficult terms and ideas as much as possible.** Data science has become popular lately, and I expect some readers are interested in this kind of analysis, so I hope this serves as an opportunity to study statistics.
This analysis could actually be done in Excel, but **since I'm at it, I decided to write a script in Python.** The Python version is 3.5.0.
Any development environment should work; I mainly used Sublime Text 3, which I'm used to, and a terminal.
This time I used data on the Magikarp ($n = 100$) that I caught around my home and around Kagurazaka, Tokyo, from summer to autumn 2016 :fishing_pole_and_fish: The data was collected by the following method.
It was an analog method, so it was quite a chore (laughs). Syncing the screenshots to a computer with Google Photos or Dropbox is convenient; I then manually entered the individual values from the images collected this way (I wish deep learning could read the values automatically ...)
The entered data is saved in CSV format. If you want to use the data I collected, it is available [here](http://tmp.imaizu.me/pokestat/magikarp.csv). The column structure of the CSV data is as follows.
Only the **CP, Weight, and Height columns** are used in this analysis.
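As a minimal sketch of restricting the data to those three columns: the snippet below uses a tiny inline CSV as a stand-in for magikarp.csv. The `Name` column and every value in it are made up for illustration; only the CP, Weight, and Height column names come from this article.

```python
import io

import pandas as pd

# Stand-in for magikarp.csv. The "Name" column and all values are invented
# for illustration; only CP, Weight and Height are named in the article.
sample_csv = io.StringIO(
    "Name,CP,Weight,Height\n"
    "Magikarp,10,8.2,0.81\n"
    "Magikarp,54,11.5,0.95\n"
    "Magikarp,120,9.9,0.88\n"
)

raw = pd.read_csv(sample_csv)

# Keep only the three columns used in the analysis.
data = raw[["CP", "Weight", "Height"]]
print(data.dtypes)
```

Selecting the columns up front keeps any non-numeric columns out of the later plots and correlation calculations.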
Strictly speaking, statistical methods require various preconditions (assumptions) to hold, but this time I ignore many of them and write in the spirit of "just try it for now", so please bear with me.
Now for the main subject, the analysis. First, let's load the CSV data and draw scatter plots :scales: The data read in is converted to a DataFrame using the Python library Pandas.
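import matplotlib.pyplot as plt
import pandas as pd
# scatter_matrix lives in pandas.plotting in current pandas
# (older releases exposed it as pandas.tools.plotting).
from pandas.plotting import scatter_matrix

# Load the catch data and print summary statistics for each column.
data = pd.read_csv("magikarp.csv")
print(data.describe())

# Draw pairwise scatter plots (histograms on the diagonal) and save them.
scatter_matrix(data)
plt.savefig("image.png")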
The scatter plot for each variable looks like this.
For Magikarp, weight and height show a fairly clean linear relationship. From quite small to giant Magikarp, they seem to be distributed realistically, much like real fish :smile:
CP, on the other hand, is harder to read ... Looking at the histogram, CP 10 stands out as far more frequent, while the other CP ranges contain roughly similar numbers of individuals. In Pokemon GO the lowest CP is 10, and for a weak Pokemon like Magikarp, CP10 individuals do seem to appear often, which matches how it feels when actually playing :droplet:
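To put a number on the impression that CP10 dominates, one could count how often each CP value occurs. This is only a sketch: the CP values below are made up for illustration and are not the article's data.

```python
import pandas as pd

# Made-up CP values for illustration only (not the article's data).
cp = pd.Series([10, 10, 10, 35, 62, 10, 118, 87, 10, 150], name="CP")

# Frequency of each distinct CP value, ordered by CP.
counts = cp.value_counts().sort_index()

# Fraction of individuals sitting at the minimum CP of 10.
share_cp10 = (cp == 10).mean()

print(counts)
print("Share of CP10 individuals: {:.0%}".format(share_cp10))
```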
Next, compute the correlation coefficient ($\alpha$) between these variables.
**This value is an index of whether there is a linear relationship between variables: the closer its absolute value is to 1, the stronger the linear relationship between the individual values.**
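For readers who want to see what goes into this number, here is a minimal sketch of Pearson's correlation coefficient computed by hand and compared with NumPy's `np.corrcoef`. The helper name `pearson_r` and the height/weight numbers are made up for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up measurements for illustration only.
heights = [0.70, 0.80, 0.90, 1.00, 1.10]
weights = [7.1, 8.4, 9.8, 10.9, 12.5]

r_manual = pearson_r(heights, weights)
r_numpy = np.corrcoef(heights, weights)[0, 1]
print(r_manual, r_numpy)
```

The two values should agree up to floating-point rounding, since both compute the same quantity.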
To compute the correlation coefficients, use the DataFrame's `corr` method. It is a handy function that calculates the correlation between every pair of variables in the DataFrame.
print(data.corr())
#>               CP    Weight    Height
#> CP      1.000000  0.010724  0.086286
#> Weight  0.010724  1.000000  0.865564
#> Height  0.086286  0.865564  1.000000
The values are as expected from the plots above. **The correlation coefficient between weight and height is 0.866, which is quite strong.** The correlation coefficients involving CP, on the other hand, are small at first glance, and it is not very convincing to call CP "correlated" with the other variables.
So, finally, check whether these correlation coefficients are significant with a **test of no correlation**. This test sets up the hypothesis (the null hypothesis) that the true correlation coefficient is 0, and computes the probability (the significance probability, or p-value) of obtaining, purely by chance, a correlation coefficient at least as far from 0 as the observed one if that hypothesis were true. If this probability is extremely low, we conclude that the correlation coefficient is really meaningful (significant). This time the hypotheses are

Null hypothesis $H_0: \alpha = 0$, alternative hypothesis $H_1: \alpha \neq 0$.
SciPy has a function `pearsonr` that performs the test using Pearson's product-moment correlation coefficient (there are several other kinds of tests of no correlation), so run it on each combination of variables.
Given two paired variables, it returns the correlation coefficient $r$ and the significance probability $p$.
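Before running it, here is a rough sketch of what such a test computes. It assumes the textbook form of the test, in which $t = r\sqrt{(n-2)/(1-r^2)}$ follows a t distribution with $n-2$ degrees of freedom under $H_0$; this is not necessarily how SciPy computes the value internally. The helper name `no_correlation_p_value` and the random data are made up for illustration.

```python
import numpy as np
from scipy import stats


def no_correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, given r and sample size n."""
    # Textbook t statistic for a sample correlation coefficient.
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    # Probability of a |t| at least this large under H0 (both tails).
    return float(2.0 * stats.t.sf(abs(t), df=n - 2))


# Synthetic paired data for illustration only (not the article's data).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.3 * x + rng.normal(size=100)

r, p_scipy = stats.pearsonr(x, y)
p_manual = no_correlation_p_value(r, len(x))
print(r, p_scipy, p_manual)
```

The hand-computed p-value should match the one from `pearsonr` up to numerical precision, which shows that $p$ depends only on $r$ and the sample size.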
from scipy.stats import pearsonr
...
r, p = pearsonr(data.Height, data.Weight)  # Height and weight
# r, p = pearsonr(data.Height, data.CP)  # Height and CP
# r, p = pearsonr(data.Weight, data.CP)  # Weight and CP
print('Correlation coefficient r: {r}'.format(r=r))
print('Significance probability p: {p}'.format(p=p))
print('Significance probability p> 0.05: {result}'.format(result=(p > 0.05)))
The results of the test are as follows.
Here, if the significance probability $p$ is greater than $0.05$ (`True` in the result), $H_0$ ("$\alpha = 0$", i.e. no correlation) is not rejected, so we cannot say there is a correlation; otherwise $H_0$ is rejected.
Weight and height
>Correlation coefficient r: 0.8655637883468845
>Significance probability p: 1.7019782502122307e-31
>Significance probability p> 0.05: False #Significant
Again as expected, weight and height turned out to be significantly correlated.
Height and CP
>Correlation coefficient r: 0.0862864395740605
>Significance probability p: 0.39090582918188466
>Significance probability p> 0.05: True #Not significant
Weight and CP
>Correlation coefficient r: 0.01072432286085844
>Significance probability p: 0.915233564101408
>Significance probability p> 0.05: True
CP, on the other hand, also behaved as expected to the end: neither of its correlations is significant. Leaving aside whether it even makes sense to examine the correlation between CP and the other variables, and although this was only a simple example, I hope it shows that methods like this let you infer, to some extent, how a game's parameters are generated.
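The three pairwise tests can also be run in one loop instead of commenting lines in and out. The following is a sketch on synthetic data: the `demo` DataFrame, its values, and the `ALPHA` constant are made up for illustration, with only the column names and the 0.05 threshold taken from this article.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

ALPHA = 0.05  # significance level used in this article

# Synthetic stand-in data: Weight is built to depend on Height,
# while CP is drawn independently of both.
rng = np.random.default_rng(42)
height = rng.normal(0.9, 0.1, size=100)
demo = pd.DataFrame({
    "CP": rng.integers(10, 200, size=100),
    "Weight": 10.0 * height + rng.normal(0.0, 0.5, size=100),
    "Height": height,
})

# Test every pair of columns and collect the results in one table.
results = []
for a, b in combinations(demo.columns, 2):
    r, p = pearsonr(demo[a], demo[b])
    results.append({
        "pair": "{}-{}".format(a, b),
        "r": r,
        "p": p,
        "significant": p < ALPHA,
    })

summary = pd.DataFrame(results)
print(summary)
```

Swapping `demo` for the DataFrame loaded from the CSV (restricted to the three columns) would give the same table for the real data.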
So, although it was very simple, I tried a correlation analysis using Pokemon data. Since the data here follows the distribution of the game's parameters, it could be interesting to keep records for other Pokemon or other games and try things like estimating their parameters. The distribution of individual values may well differ considerably for Pokemon other than Magikarp.
This time I used a test of no correlation, but I would like to try other similar analyses, so I hope to write a follow-up somewhere. I'll have to study more statistics before then ...