Analysis workflow for quantification type I
- Convert each qualitative variable into dummy variables and fit a multiple regression model that treats the dummy variables as quantitative variables.
- Compute the coefficient of determination adjusted for degrees of freedom (adjusted R^2) and evaluate how well the fitted regression equation performs.
- Perform variable selection to keep only the useful explanatory variables.
- Examine the residuals and leverage to judge the validity of the fitted regression equation.
- Using the fitted regression equation, estimate the population regression at arbitrarily specified values of the explanatory variables, and predict the values of data to be obtained in the future.
How to handle qualitative variables
A qualitative variable is one that is not originally numerical, such as "excellent", "good", or "acceptable". For the analysis it is quantified with the values 0 and 1.
Here, instead of assigning scores like this:

| Qualitative variable | Quantitative variable |
|---|---|
| Excellent | 3 |
| Good | 2 |
| Acceptable | 1 |

we convert the grade into three dummy variables as follows:
x_{1\left(1\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is excellent}\\
0 & \text{if the grade is not excellent}
\end{array}\right.
x_{1\left(2\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is good}\\
0 & \text{if the grade is not good}
\end{array}\right.
x_{1\left(3\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is acceptable}\\
0 & \text{if the grade is not acceptable}
\end{array}\right.
This is because scores such as 3, 2, 1 would assume that adjacent grades are equally spaced, whereas the difference between "excellent" and "good", the difference between "excellent" and "acceptable", and the difference between "good" and "acceptable" cannot be expressed quantitatively in advance.
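The conversion can be sketched in plain Python. This is only an illustration: the names `CATEGORIES` and `to_dummies` are hypothetical, and the English labels stand in for the three grades.

```python
# Categories in the order used for x_{1(1)}, x_{1(2)}, x_{1(3)}.
CATEGORIES = ("excellent", "good", "acceptable")


def to_dummies(grade):
    """Return the tuple (x_1(1), x_1(2), x_1(3)) for one grade label."""
    if grade not in CATEGORIES:
        raise ValueError(f"unknown grade: {grade!r}")
    # Exactly one position is 1: the one matching the grade.
    return tuple(1 if grade == c else 0 for c in CATEGORIES)


for g in CATEGORIES:
    print(g, to_dummies(g))
```

Unlike the 3/2/1 scoring, this encoding says nothing about how far apart the grades are; that is left for the regression coefficients to estimate.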
Practical example of quantification type I
The following data are used as a concrete example.
Original data

| No | Math grade | Overall grade |
|---|---|---|
| 1 | Excellent | 96 |
| 2 | Excellent | 88 |
| 3 | Excellent | 77 |
| 4 | Excellent | 89 |
| 5 | Good | 80 |
| 6 | Good | 71 |
| 7 | Good | 77 |
| 8 | Acceptable | 78 |
| 9 | Acceptable | 70 |
| 10 | Acceptable | 62 |
Data after conversion from the qualitative variable to dummy variables

| Sample | Math grade | x_{1(1)} | x_{1(2)} | x_{1(3)} | Overall grade |
|---|---|---|---|---|---|
| 1 | Excellent | 1 | 0 | 0 | 96 |
| 2 | Excellent | 1 | 0 | 0 | 88 |
| 3 | Excellent | 1 | 0 | 0 | 77 |
| 4 | Excellent | 1 | 0 | 0 | 89 |
| 5 | Good | 0 | 1 | 0 | 80 |
| 6 | Good | 0 | 1 | 0 | 71 |
| 7 | Good | 0 | 1 | 0 | 77 |
| 8 | Acceptable | 0 | 0 | 1 | 78 |
| 9 | Acceptable | 0 | 0 | 1 | 70 |
| 10 | Acceptable | 0 | 0 | 1 | 62 |
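As a rough sketch, the converted data can be generated from the original data in a few lines of Python. The variable names (`data`, `categories`, `rows`) are illustrative, and the English labels stand in for the three grades. The last step checks that the three dummy values of every sample sum to 1, which means any one of them is determined by the other two.

```python
# Original data: (math grade, overall grade) for the 10 samples.
data = [
    ("excellent", 96), ("excellent", 88), ("excellent", 77), ("excellent", 89),
    ("good", 80), ("good", 71), ("good", 77),
    ("acceptable", 78), ("acceptable", 70), ("acceptable", 62),
]
categories = ("excellent", "good", "acceptable")

# One row per sample: the three dummy values followed by the overall grade.
rows = [tuple(int(g == c) for c in categories) + (y,) for g, y in data]

for no, row in enumerate(rows, start=1):
    print(no, *row)

# Each sample belongs to exactly one category, so the dummies sum to 1.
row_sums = [sum(row[:3]) for row in rows]
print(row_sums)
```

Because of this redundancy, a regression model with an intercept can use only two of the three dummy variables; the omitted category becomes the baseline.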
Perform multiple regression analysis
The underlying theory is laid out below, but honestly, I don't think you need to force yourself to understand it.
In practice the calculation is done in Python, and after working through about 20 problems you will get a feel for it.
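- Multiple regression model (the dummy variable x_{1\left(1\right)} for "excellent" is left out because the three dummy variables always sum to 1, so "excellent" acts as the baseline)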
y_{i}=\beta_{0}+\beta_{1\left(2\right)}x_{i1\left(2\right)}+\beta_{1\left(3\right)}x_{i1\left(3\right)}+\epsilon_{i}
- Error (assuming it follows a normal distribution)
\epsilon_{i}\sim N\left(0,\ \sigma^{2}\right)
- Predicted value
\hat{y_{i}}=\hat{\beta_{0}}+\hat{\beta_{1\left(2\right)}}x_{i1\left(2\right)}+\hat{\beta_{1\left(3\right)}}x_{i1\left(3\right)}
- Estimates of the regression coefficients
\displaystyle \left[\begin{array}{l}
\hat{\beta_{1\left(2\right)}}\\\\
\hat{\beta_{1\left(3\right)}}
\end{array}\right]=\frac{1}{S_{11}S_{22}-S_{12}^{2}}\left[\begin{array}{l}
S_{22}S_{1y}-S_{12}S_{2y}\\\\
-S_{12}S_{1y}+S_{11}S_{2y}
\end{array}\right]
- Sums of squared deviations and sums of products of deviations for each variable
S_{11}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}^{2}-\frac{1}{n}\left(\sum_{i=1}^{n}x_{i1\left(2\right)}\right)^{2}
S_{22}=\displaystyle \sum_{i=1}^{n}x_{i1\left(3\right)}^{2}-\frac{1}{n}\left(\sum_{i=1}^{n}x_{i1\left(3\right)}\right)^{2}
S_{12}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}x_{i1\left(3\right)}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}\sum_{i=1}^{n}x_{i1\left(3\right)}
S_{1y}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}y_{i}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}\sum_{i=1}^{n}y_{i}
S_{2y}=\displaystyle \sum_{i=1}^{n}x_{i1\left(3\right)}y_{i}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(3\right)}\sum_{i=1}^{n}y_{i}
- Estimate of the intercept (from the normal equations)
\hat{\beta_{0}}=\overline{y}-\hat{\beta_{1\left(2\right)}}\overline{x_{1\left(2\right)}}-\hat{\beta_{1\left(3\right)}}\overline{x_{1\left(3\right)}}
- Mean of each variable
\displaystyle \overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}
\displaystyle \overline{x_{1\left(2\right)}}=\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}
\displaystyle \overline{x_{1\left(3\right)}}=\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(3\right)}
- The formulas above are handled as a black box by the program, so, as mentioned earlier, you don't need to force yourself to memorize them.
However, nothing is lost by understanding them, so I have written them out accurately.
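To make the formulas concrete, here is a minimal sketch that applies them to the example data using only the Python standard library. The helper names (`mean`, `s_dev`) and variable names (`b0`, `b2`, `b3`, `y_hat`) are illustrative choices, not part of the original text; "excellent" is taken as the baseline, as in the model above.

```python
# Dummy variables x_{1(2)} ("good") and x_{1(3)} ("acceptable");
# "excellent" is the baseline category.
x2 = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
x3 = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y = [96, 88, 77, 89, 80, 71, 77, 78, 70, 62]


def mean(v):
    return sum(v) / len(v)


def s_dev(a, b):
    """Sum of products of deviations: sum(a*b) - sum(a)*sum(b)/n."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)


# Sums of squares and products of deviations.
S11, S22, S12 = s_dev(x2, x2), s_dev(x3, x3), s_dev(x2, x3)
S1y, S2y = s_dev(x2, y), s_dev(x3, y)

# Slope estimates from the 2x2 closed-form solution.
det = S11 * S22 - S12 ** 2
b2 = (S22 * S1y - S12 * S2y) / det
b3 = (-S12 * S1y + S11 * S2y) / det

# Intercept estimate.
b0 = mean(y) - b2 * mean(x2) - b3 * mean(x3)

# Predicted value for every sample.
y_hat = [b0 + b2 * u + b3 * v for u, v in zip(x2, x3)]

print(f"b0={b0:.2f}, b2={b2:.2f}, b3={b3:.2f}")
```

For this data the estimates come out at approximately 87.5, -11.5, and -17.5. The intercept is the mean overall grade of the "excellent" group (87.5), and each slope is the difference between the mean of its group (76 for "good", 70 for "acceptable") and that baseline.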
Calculation of various constants
References
Yasushi Nagata and Masahiko Munechika, Introduction to Multivariate Analysis (Library New Math)