I don't have much time, but I will proceed little by little.
Yesterday I got as far as correlation. Yes, the Pearson correlation coefficient.
pearsonr
import scipy as sp
import scipy.stats  # make sure the sp.stats submodule is loaded
sp.stats.pearsonr(student_data_math.G1, student_data_math.G3)
# result: (0.8014679320174141, 9.001430312276602e-90)
The first value, 0.801, is the correlation coefficient: the closer it is to 1, the stronger the positive correlation between the two variables (and the closer it is to -1, the stronger the negative correlation).
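To get a feel for what that 0.801 measures, here is a minimal sketch that computes the Pearson coefficient straight from its definition: the sum of products of deviations from the mean, divided by the square roots of the two sums of squared deviations. The function name pearson_r and the toy scores are my own (hypothetical, not the book's student_data_math data), and this is only the same formula, not how scipy implements pearsonr internally.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (proportional to the standard deviations)
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy scores
g1 = [10, 12, 8, 15, 11]
g3 = [11, 13, 9, 16, 10]
print(pearson_r(g1, g3))               # strong positive correlation, close to 1
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly linear and increasing: about 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly linear and decreasing: about -1.0
```

Dividing by the two spreads is what keeps the value between -1 and 1 regardless of the units of the scores.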
So what is the second value, 9.001...e-90? Let's check the reference.
Returns
r : float
    Pearson's correlation coefficient
p-value : float
    2-tailed p-value
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.
Well, I'm still not sure, so I'll fall back on an explanation in Japanese.
According to it, the p-value is the "significance probability", so let's investigate further.
Significance probability: in a statistical hypothesis test, this value is compared with a threshold to decide whether to reject the null hypothesis and adopt the alternative hypothesis. That threshold is called the significance level; 5% and 1% are generally used.
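Hmm. Is this even Japanese? It's as unclear as I feared, but my reading is this: if the p-value is below 5%, it is unlikely that uncorrelated data would produce a correlation coefficient this strong by chance, so the coefficient can be trusted (the null hypothesis of "no correlation" is rejected). Here the p-value is about 9.0e-90, far below 5%. I'm not confident that my understanding is correct, though.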
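The scipy description above (the probability that an uncorrelated system produces a correlation at least as extreme) can be made concrete with a permutation test. This is a different technique from the one pearsonr uses by default (as far as I know, it computes the p-value analytically under an assumption of normally distributed data), so treat this as an illustration of the idea, not a reproduction of scipy's numbers. Shuffling ys destroys any pairing with xs, so each shuffled dataset plays the role of "data from an uncorrelated system". The helper names and the data are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient from its definition (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_resamples=2000, seed=0):
    """Two-sided p-value estimated by shuffling ys (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # break the pairing: behaves like uncorrelated data
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_resamples + 1)


xs = list(range(20))
ys_strong = [x + (1 if x % 2 else -1) for x in xs]  # almost a straight line
ys_weak = [(7 * x) % 5 for x in xs]                 # barely any linear trend

print(pearson_r(xs, ys_strong), permutation_p_value(xs, ys_strong))  # tiny p: significant
print(pearson_r(xs, ys_weak), permutation_p_value(xs, ys_weak))      # large p: not significant
```

With 2,000 shuffles the smallest p-value this can report is 1/2001, so it can never show something like 9.0e-90; values that small need an analytic formula like the one scipy uses.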
Note, however, that pearsonr only measures linear correlation, so it is not useful when the relationship is non-linear; pearsonr is not always the right tool. That will probably come up in later chapters.
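A quick way to see this limitation: in the sketch below (hypothetical data, plain Python instead of scipy), y is completely determined by x through y = x², yet the Pearson coefficient comes out as exactly 0, because the relationship is symmetric and not linear at all.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from its definition (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y depends entirely on x, but not linearly
print(pearson_r(xs, ys))   # prints 0.0
```

So a coefficient near 0 only says "no linear relationship", not "no relationship"; plotting the data (as with the pairplot below) is a good way to catch cases like this.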
pairplot
The syntax is as follows
seaborn.pairplot(DataFrame)
This plots the pairwise relationships between the numeric columns of the DataFrame. In the example above, 4 columns of the DataFrame are displayed.
The diagonal cells (where a variable meets itself) show a histogram of that variable, and every other cell shows a scatter plot of the two variables, so you can see how they correlate.
When I ran pairplot on the example's DataFrame without any preprocessing, it looked like this:
It was too big to capture properly in a screenshot. By the way, saving the displayed figure to a file only took this:
import seaborn as sns
plot = sns.pairplot(DataFrame)
plot.savefig("output.png")
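When I looked up how to do this, I got stuck on examples that call get_figure() before savefig. It seems those examples were written for plots that return a matplotlib Axes (get_figure() is an Axes method); pairplot returns a PairGrid, which has no get_figure(), so it raises an error.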
The details come in later chapters, so for now I just want to understand the terminology.
Objective variable: the variable whose value we want to obtain (predict).
Explanatory variable: a variable used to obtain the objective variable, i.e. the variable used to explain it.
Simple regression analysis seems to work by assuming an equation that relates the objective variable to just one explanatory variable, such as a straight line y = ax + b, and solving for its coefficients.
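As a sketch of what "solving" a simple regression means, assuming the usual least-squares criterion (which is what sklearn's LinearRegression uses), the slope and intercept of y = ax + b have a closed form: a is the sum of products of deviations divided by the sum of squared deviations of x, and b = mean(y) - a * mean(x). The function name and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = a * x + b with one explanatory variable (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


xs = [1, 2, 3, 4]   # explanatory variable
ys = [3, 5, 7, 9]   # objective variable, exactly y = 2x + 1
a, b = fit_simple_regression(xs, ys)
print(a, b)  # prints 2.0 1.0
```

The slope formula uses the same numerator as the Pearson coefficient, which is why a strong correlation and a good straight-line fit tend to go together.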
To do this, we will use sklearn (scikit-learn).
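A minimal sketch of the sklearn side, assuming scikit-learn is installed: LinearRegression from sklearn.linear_model, fitted on hypothetical data with a single explanatory variable. Note that X has to be two-dimensional (one row per sample, one column per explanatory variable) even when there is only one variable.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical data: explanatory variable as a single column, objective variable as a flat list
X = [[1], [2], [3], [4]]
y = [3, 5, 7, 9]  # exactly y = 2x + 1

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # slope and intercept
print(model.predict([[5]]))           # prediction for x = 5
```

coef_ holds the slope and intercept_ the intercept; for this data they should come out as 2 and 1, up to floating-point rounding.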
I've got a rough idea, but tomorrow I'll look over the whole exercise again. It's slow going, but it can't be helped.