https://bellcurve.jp/statistics/course/9208.html
According to the URL above, the chi-square distribution is the distribution of the sum of the squares of independent random variables that follow the standard normal distribution. Just looking at a plot of the distribution, though, I couldn't get a real feel for it, so I tried it out in Jupyter.
Following the definition, I generated random variables that follow N(0,1), repeated the trial of taking the sum of their squares many times, and checked the resulting distribution. In the figure below, the left panel shows a KDE (kernel density estimate) and the right panel shows a histogram.
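The notebook I actually used is linked at the end; what follows is not that notebook but a minimal sketch of the same experiment, assuming NumPy is available. The function names (`chi_square_samples`, `chi_square_pdf`, `histogram_density`), the seed, the number of trials, the bin edges and the set of degrees of freedom are my own illustrative choices. For each trial it draws k independent N(0,1) values, sums their squares, and then compares the simulated values with the textbook chi-square density. Plotting is left out; the histogram is scaled to a density (counts divided by the number of samples times the bin width) so that it sits on the same vertical scale as the pdf, similar to what `density=True` does in `numpy.histogram` or `matplotlib.pyplot.hist`.

```python
import math

import numpy as np


def chi_square_samples(k, n_trials, rng):
    """Return n_trials draws of Z_1^2 + ... + Z_k^2 with Z_i i.i.d. N(0, 1)."""
    z = rng.standard_normal(size=(n_trials, k))  # each row holds k independent N(0,1) draws
    return (z ** 2).sum(axis=1)                   # one sum of squares per row (trial)


def chi_square_pdf(x, k):
    """Textbook chi-square density with k degrees of freedom, for x > 0."""
    x = np.asarray(x, dtype=float)
    norm = 2 ** (k / 2) * math.gamma(k / 2)
    return x ** (k / 2 - 1) * np.exp(-x / 2) / norm


def histogram_density(samples, edges):
    """Histogram scaled to a density: counts / (total samples * bin width)."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / (len(samples) * np.diff(edges))


rng = np.random.default_rng(0)          # arbitrary fixed seed, for reproducibility
edges = np.linspace(0.0, 20.0, 81)      # 80 bins of width 0.25 (illustrative choice)
centers = (edges[:-1] + edges[1:]) / 2
mask = centers >= 1.0                   # skip bins near 0, where the k = 1 density diverges

results = {}
for k in (1, 2, 3, 5, 10):
    s = chi_square_samples(k, 100_000, rng)
    dens = histogram_density(s, edges)
    pdf = chi_square_pdf(centers, k)
    max_diff = float(np.abs(dens[mask] - pdf[mask]).max())
    results[k] = (float(s.mean()), float(s.var()), max_diff)
    print(f"k={k:2d}  mean={s.mean():6.3f} (theory {k})  "
          f"var={s.var():6.3f} (theory {2 * k})  max|hist - pdf|={max_diff:.4f}")
```

To get the two panels of the figure, `s` could be passed to a KDE or histogram plotting function for each k; increasing the number of trials shrinks the gap between the histogram and the pdf.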
In the left panel, the simulated curves reproduce almost exactly the shape of the theoretical distribution. The right panel also has roughly the same shape. It looks slightly off, but it would probably get closer if I increased the number of samples or adjusted the vertical and horizontal axes.
From this, I came to understand a little of what "the chi-square distribution is the sum of the squares of random variables that follow the standard normal distribution" means. With 1 degree of freedom the value is often close to 0, but as the degrees of freedom increase, more squared terms are added together, so the peak of the distribution gradually shifts to the right. The mean for 1 degree of freedom is 1 (hard to read off the figure, but it follows from E[Z^2] = Var(Z) = 1 for a standard normal Z), and the degrees of freedom equal the number of independent standard normal variables being summed, so the expected value equals the degrees of freedom. That makes sense.
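The "expected value equals the degrees of freedom" argument can be written out in one line. This is the standard derivation, using only linearity of expectation and Var(Z) = E[Z^2] - E[Z]^2, where Z_1, ..., Z_k denote the k independent standard normal variables being summed:

```latex
\mathbb{E}\left[\chi^2_k\right]
  = \mathbb{E}\left[\sum_{i=1}^{k} Z_i^2\right]
  = \sum_{i=1}^{k} \mathbb{E}\left[Z_i^2\right]
  = \sum_{i=1}^{k} \left(\operatorname{Var}(Z_i) + \mathbb{E}[Z_i]^2\right)
  = \sum_{i=1}^{k} (1 + 0)
  = k
```

For k ≥ 2 the density also has its peak at x = k - 2, which matches the rightward shift of the peak seen in the figure.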
That still left the question, "So what?" After looking into it, I found the following explanation easy to understand.
https://atarimae.biz/archives/13511
For example, suppose you roll a die 120 times and get only 1s and 6s. Looking at the sample mean alone, you cannot conclude that "this is hard to explain as coincidence": if each face comes up 60 times, the sample mean is 3.5, exactly the expected value of a fair die.
Naturally, the bias of a sample cannot be captured by its mean alone, so the mean cannot expose a result that "looks fine on average but is clearly biased." The idea for resolving this is to check the distribution of the sum of squares (≈ the variance) of the sample, and the chi-square distribution is the tool for doing that check.
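To make the dice example concrete, here is a sketch of Pearson's chi-square goodness-of-fit test, the standard way the chi-square distribution is used for this kind of check; the linked article may present it differently. The counts are hypothetical: 120 rolls split as 60 ones and 60 sixes, versus an evenly spread sample. For each face it squares the deviation of the observed count from the fair-die expectation of 20, divides by that expectation, and sums. For a fair die this statistic approximately follows a chi-square distribution with 6 - 1 = 5 degrees of freedom, whose upper 5% point is about 11.07 (a standard table value, hard-coded here rather than computed).

```python
import numpy as np


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over categories of (observed - expected)^2 / expected."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())


faces = np.arange(1, 7)
biased_counts = np.array([60, 0, 0, 0, 0, 60])   # hypothetical: only 1s and 6s
fair_counts = np.array([18, 22, 19, 21, 20, 20])  # hypothetical: evenly spread rolls
expected = np.full(6, 120 / 6)                     # 20 per face for a fair die

sample_mean = float((faces * biased_counts).sum() / biased_counts.sum())
biased_stat = pearson_chi_square(biased_counts, expected)
fair_stat = pearson_chi_square(fair_counts, expected)
CRITICAL_5PCT_DF5 = 11.07  # upper 5% point of chi-square with 5 dof (table value)

print(sample_mean)                      # 3.5: same as a fair die's expected value
print(biased_stat)                      # 240.0
print(biased_stat > CRITICAL_5PCT_DF5)  # True: the bias shows up in the sum of squares
print(fair_stat < CRITICAL_5PCT_DF5)    # True: nothing unusual
```

The biased sample passes the "mean" check but its statistic lands far beyond the 5% point, which is exactly the contradiction the mean alone could not point out.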
Until now I had only a superficial understanding, but I feel my understanding of the chi-square distribution has deepened.
The notebook I used is here:
https://github.com/takotaketako/public-notebook/blob/master/%E3%82%AB%E3%82%A4%E4%BA%8C%E4%B9%97%E5%88%86%E5%B8%83.ipynb