Examine the relationship between two variables (2)

Suppose I am standing in the kitchen making a Chinese-style fried cabbage soup. I stir it well and then taste a small spoonful to judge how the whole pot has turned out.

Or, in a company recruitment interview, a face-to-face meeting of just a few tens of minutes decides whether the candidate is suitable as an employee.

Or, after dating for just a few months or a year, I decide to marry someone as my companion for the rest of my life.

Inferring the whole population from a small sample in this way is the essence of inferential statistics.

Sampling

When you collect values for the two variates, you are extracting a sample from the population. As explained before, there are various sampling methods.

In the previous example, we focused on 10 students in a high school class and extracted their sports test results.

This does not tell us the results of all high school students. However, it is possible to infer the whole with a certain degree of accuracy from the statistics of such a sample. In other words, sampling is not an end in itself, but a means of grasping the whole.
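As a rough illustration of this idea, here is a minimal sketch in Python. The population is synthetic (10,000 randomly generated grip strength values with an assumed mean of 35 kg), so it is not real student data; the point is only that the mean of a 10-person random sample usually lands reasonably near the population mean.

```python
import random
import statistics

# Synthetic population: grip strength (kg) of 10,000 hypothetical students.
# The mean (35.0) and spread (6.0) are assumptions made for this sketch.
random.seed(0)
population = [random.gauss(35.0, 6.0) for _ in range(10_000)]

# Simple random sample of 10 students, drawn without replacement.
sample = random.sample(population, 10)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Changing the seed gives a different sample and a slightly different sample mean, which is exactly the "certain degree of accuracy" mentioned above.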

Correlation

In the previous example, the scatter plot of grip strength against ball throwing seemed to rise somewhat toward the upper right.

Its correlation coefficient was 0.53, which indicates a moderate positive correlation.

The value of the correlation coefficient r(x, y) ranges from -1 to 1, and the closer its absolute value is to 1, the stronger the correlation.
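To make the definition concrete, the following sketch computes r(x, y) as the covariance divided by the product of the standard deviations. The function name `correlation` and the two lists are hypothetical; the numbers are made up for illustration and are not the measurements from the previous example.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r(x, y) of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all dividing by n.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data: grip strength (kg) and ball throw distance (m).
grip = [28, 46, 39, 25, 34, 29, 38, 23, 42, 27]
throw = [25, 20, 26, 16, 23, 15, 30, 16, 29, 20]

r = correlation(grip, throw)
print(f"r = {r:.2f}")
```

Points lying exactly on a rising line give r = 1, and points on a falling line give r = -1.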

Regression line

Now consider again the two variates x and y.

Variate	Values
x	x_1, x_2, ..., x_n
y	y_1, y_2, ..., y_n

Among the straight lines passing through the center O'(\overline{x}, \overline{y}) of the scatter plot of the variates x and y,

y=a(x-\overline{x})+\overline{y}

consider the one that lies closest to the n points

P_1(x_1,y_1), P_2(x_2,y_2), ..., P_n(x_n,y_n)

The regression line of y on x is given as follows.

\frac {y-\overline{y}} {\sigma(y)} = r(x,y) \frac {x-\overline{x}} {\sigma(x)}
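Rearranging this equation gives a slope of a = r(x, y) σ(y) / σ(x) for a line through (\overline{x}, \overline{y}). The sketch below computes that slope and the corresponding intercept; the function name `regression_line` is hypothetical, and the standard deviations are taken in the population form (dividing by n), which I am assuming matches σ here.

```python
import statistics

def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line of y on x."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sd_x = statistics.pstdev(xs)  # population standard deviation of x
    sd_y = statistics.pstdev(ys)  # population standard deviation of y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    # (y - mean_y) / sd_y = r * (x - mean_x) / sd_x, solved for y.
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Because the line is forced through the center of the data, only the slope has to be determined; the intercept follows from it.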

I explained linear regression before. Let's recall the least squares method once more.

When the correlation coefficient approaches 1 (r(x, y) → 1), the least squares minimum S_0 → 0, so the points in the scatter plot come to lie ever closer to a straight line.
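This behavior can be checked numerically. In the sketch below I assume that S_0 means the minimum of the sum of squared vertical deviations from the line y = a(x - \overline{x}) + \overline{y}; under that assumption it equals n σ(y)^2 (1 - r(x, y)^2), which tends to 0 as r approaches 1. The function name `least_squares_minimum` and both data sets are hypothetical.

```python
import statistics

def least_squares_minimum(xs, ys):
    """Return (r, s0, closed_form) for the line through the data center."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sd_x = statistics.pstdev(xs)
    sd_y = statistics.pstdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    slope = r * sd_y / sd_x  # slope of the regression line of y on x
    # Sum of squared vertical deviations from the regression line.
    s0 = sum((y - (slope * (x - mean_x) + mean_y)) ** 2
             for x, y in zip(xs, ys))
    # Assumed closed form: n * sigma(y)^2 * (1 - r^2).
    closed_form = n * sd_y ** 2 * (1 - r ** 2)
    return r, s0, closed_form

xs = [1, 2, 3, 4, 5]
loose = [2, 1, 4, 3, 6]            # hypothetical, widely scattered
tight = [2.1, 2.9, 4.0, 5.1, 5.9]  # hypothetical, nearly on a line

r_loose, s0_loose, cf_loose = least_squares_minimum(xs, loose)
r_tight, s0_tight, cf_tight = least_squares_minimum(xs, tight)
print(f"loose: r = {r_loose:.3f}, S_0 = {s0_loose:.3f}")
print(f"tight: r = {r_tight:.3f}, S_0 = {s0_tight:.3f}")
```

The nearly linear data set has the larger r and the smaller S_0, in line with the statement above.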

Reference

Statistical analysis learned from scratch http://www.amazon.co.jp/dp/4061546562