In this article, I would like to roughly estimate the average score and standard deviation of all examinees based on the examinee data published on the Sapporo Medical University website.
The data published by the university give the average, highest, and lowest scores of successful applicants, but for the examinees as a whole only their number is available. I therefore assumed that examinees' scores follow a normal distribution and used the published data to estimate its parameters $\mu, \sigma$.
First, the data used for the estimate are summarized in the table. The rank of the lowest-scoring successful applicant is taken to be 75 every year: the general entrance examination of Sapporo Medical University admits 75 people, and I assume that even when additional successful applicants are admitted, the score of the 75th-ranked successful applicant is the one announced as the lowest score.
The analysis method is very simple. Just solve the following simultaneous equations.
$$
\left\{
\begin{split}
\text{Percentage of successful applicants} &= \int_{\text{Lowest point}}^{\infty} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) dx \\
\text{Average score of successful applicants} &= \frac{\int_{\text{Lowest point}}^{\infty} \frac{x}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) dx}{\int_{\text{Lowest point}}^{\infty} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) dx}
\end{split}
\right.
$$
A brief explanation of each equation. The first equation states

$$
\text{Percentage of successful applicants} = \frac{\text{Rank of the lowest-scoring successful applicant}}{\text{Number of examinees}} = \int_{\text{Lowest point}}^{\infty} (\text{normal distribution})\, dx
$$

that is, integrating the normal density from the lowest passing score to infinity gives the fraction of examinees who passed.
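To make the first equation concrete, here is a minimal sketch that evaluates its right-hand side for one pair $(\mu, \sigma)$ in two ways: by numerically integrating the normal density with `scipy.integrate.quad`, and with `scipy.stats.norm.sf` (the survival function, $1 - \text{CDF}$). The values $\mu = 970$, $\sigma = 70$ are arbitrary assumptions chosen only for illustration; the lowest score 1023 and the 321 examinees are the FY2018 figures from the article's data.

```python
import numpy as np
from scipy import integrate
import scipy.stats as st

# Illustrative (assumed) parameters, not estimates
mu, sigma = 970.0, 70.0
lowest = 1023        # FY2018 lowest score of successful applicants
n_examinees = 321    # FY2018 number of examinees
pass_n = 75          # rank of the lowest-scoring successful applicant

def pdf(x):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Right-hand side of the first equation, by numerical integration ...
ratio_quad = integrate.quad(pdf, lowest, np.inf)[0]
# ... and by the survival function (1 - CDF), which needs no explicit integration
ratio_sf = st.norm.sf(lowest, loc=mu, scale=sigma)

# Left-hand side: the observed percentage of successful applicants (as a fraction)
observed_ratio = pass_n / n_examinees

print(f"model: {ratio_quad:.4f} (quad), {ratio_sf:.4f} (sf); observed: {observed_ratio:.4f}")
```

Both model values agree; fitting $\mu$ and $\sigma$ means making them match the observed ratio.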
The second equation states

$$
\text{Average score of successful applicants} = \text{Expected score of successful applicants} = \int_{\text{Lowest point}}^{\infty} C \times x \times (\text{normal distribution})\, dx = \frac{\int_{\text{Lowest point}}^{\infty} x \times (\text{normal distribution})\, dx}{\int_{\text{Lowest point}}^{\infty} (\text{normal distribution})\, dx}
$$

The normalizing constant $C$ is the constant that makes $\int_{\text{Lowest point}}^{\infty} C \times (\text{normal distribution})\, dx = 1$, namely $C = 1 / \int_{\text{Lowest point}}^{\infty} (\text{normal distribution})\, dx$.
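For reference, both integrals can be expressed with the standard normal density $\phi$ and CDF $\Phi$. This is the standard result for a truncated normal distribution, added as a supplement rather than taken from the original analysis. Substituting $x = \mu + \sigma z$ and writing $z_L = (\text{Lowest point} - \mu)/\sigma$:

$$
\begin{aligned}
\int_{\text{Lowest point}}^{\infty} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) dx &= \int_{z_L}^{\infty} \phi(z)\, dz = 1 - \Phi(z_L) \\
\int_{\text{Lowest point}}^{\infty} \frac{x}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x - \mu)^2}{2 \sigma^2}\right) dx &= \int_{z_L}^{\infty} (\mu + \sigma z)\, \phi(z)\, dz = \mu \bigl(1 - \Phi(z_L)\bigr) + \sigma \phi(z_L)
\end{aligned}
$$

The last step uses $\phi'(z) = -z\phi(z)$, so $\int_{z_L}^{\infty} z\,\phi(z)\,dz = \phi(z_L)$. Dividing the second line by the first, the average score of successful applicants becomes

$$
\mu + \sigma \frac{\phi(z_L)}{1 - \Phi(z_L)}
$$

$\Phi$ itself has no elementary closed form, so $\mu$ and $\sigma$ still have to be found numerically.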
At first glance this is just a system of two equations in two unknowns, $\mu$ and $\sigma$, so it seems all that is left is to solve it. The catch is that the integral in the first equation cannot be written in closed form in terms of $\mu$ and $\sigma$ (the normal CDF has no elementary antiderivative), so the system cannot be solved algebraically as it stands. I therefore gave up on an exact analytical solution and instead substituted many $(\mu, \sigma)$ pairs to find the ones that fit best. That is far too tedious to do by hand, so Python does the work.
First, import the required libraries and modules.
```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy import integrate
import japanize_matplotlib  # third-party; only needed to render Japanese text in plot labels
```
Next, set up the data and the search grid needed for the estimate.
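```python
# Search ranges for mu and sigma (lower and upper bounds)
mu_I = [950, 1000]
sigma_I = [60, 90]
year = [2018, 2019, 2020]
n = [321, 267, 281]                    # number of examinees
pass_n = 75                            # rank of the lowest-scoring successful applicant
pass_ratio = [pass_n / i for i in n]   # percentage of successful applicants (as a fraction)
pass_average = [1063, 1073, 1072]      # average score of successful applicants
worst = [1023, 1029, 1022]             # lowest score of successful applicants
# Grid of candidate (mu, sigma) values
mu_points = np.linspace(mu_I[0], mu_I[1], 100)
sigma_points = np.linspace(sigma_I[0], sigma_I[1], 60)
# Tolerances: pass ratio within ±0.005 (0.5 percentage points), average within ±1 point
pass_ratio_err = 0.005
pass_average_err = 1
```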
Also define the function used to calculate the expected score of successful applicants.
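```python
def norm(x, mu, sigma):
    # x times the normal density: integrating this from the lowest score to infinity
    # gives the numerator of the second equation
    return (x / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-(x - mu) ** 2 / (2 * (sigma ** 2)))
```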
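Despite its name, `norm` returns $x$ times the normal density, so integrating it gives the numerator of the second equation. The sketch below checks this against the truncated-normal closed form $\mu(1 - \Phi(z_L)) + \sigma\phi(z_L)$ with $z_L = (\text{Lowest point} - \mu)/\sigma$; the parameter values are illustrative assumptions, not estimates.

```python
import numpy as np
import scipy.stats as st
from scipy import integrate

def norm(x, mu, sigma):
    # x times the normal density (same definition as in the article)
    return (x / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

mu, sigma, lowest = 970.0, 70.0, 1023  # illustrative (assumed) values
z = (lowest - mu) / sigma

# Numerator of the second equation, by numerical integration
numerator = integrate.quad(norm, lowest, np.inf, args=(mu, sigma))[0]
# Same quantity from the closed form
tail = st.norm.sf(z)                   # 1 - Phi(z_L)
closed_form = mu * tail + sigma * st.norm.pdf(z)

# Second equation: numerator divided by the pass ratio
mean_of_passers = numerator / tail
print(numerator, closed_form, mean_of_passers)
```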
Then, with the following code, each grid point $(\mu, \sigma)$ is classified into one of three groups:

- the percentage of successful applicants is reproduced to within $\pm 0.5$ percentage points, but the average score of successful applicants is not reproduced to within $\pm 1$ point (yellow);
- the average score of successful applicants is reproduced to within $\pm 1$ point, but the percentage of successful applicants is not reproduced to within $\pm 0.5$ percentage points (blue);
- both are reproduced within these tolerances (green).

Points that satisfy neither condition are discarded.
```python
# For each year, store grid points as [list of mu values, list of sigma values]
ratio_average = []   # both conditions satisfied (green)
ratio_only = []      # pass ratio only (yellow)
average_only = []    # average score only (blue)
for i in range(len(year)):
    ratio_average.append([[], []])
    ratio_only.append([[], []])
    average_only.append([[], []])
    for mu_point in mu_points:
        for sigma_point in sigma_points:
            mu = mu_point
            sigma = sigma_point
            # Predicted pass ratio: probability of scoring above the lowest score
            cdf = st.norm.cdf(worst[i], mu, sigma)
            # Numerator of the second equation
            int_pdf = integrate.quad(norm, worst[i], np.inf, args=(mu, sigma))[0]
            calculate_pass_ratio = 1 - cdf
            calculate_pass_average = int_pdf / calculate_pass_ratio
            if np.abs(calculate_pass_ratio - pass_ratio[i]) < pass_ratio_err:
                if np.abs(calculate_pass_average - pass_average[i]) < pass_average_err:
                    ratio_average[i][0].append(mu)
                    ratio_average[i][1].append(sigma)
                else:
                    ratio_only[i][0].append(mu)
                    ratio_only[i][1].append(sigma)
            elif np.abs(calculate_pass_average - pass_average[i]) < pass_average_err:
                average_only[i][0].append(mu)
                average_only[i][1].append(sigma)
            # Points satisfying neither condition are discarded
```
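The double loop calls `quad` 6,000 times per year. As an alternative sketch (not the author's code), the same yellow/blue/green classification can be done on the whole grid at once with NumPy broadcasting, replacing the numerical integral with the truncated-normal closed form for the average score, $\mu + \sigma\phi(z)/(1 - \Phi(z))$ with $z = (\text{lowest} - \mu)/\sigma$. Only FY2018 is shown; the grid and tolerances are the article's.

```python
import numpy as np
import scipy.stats as st

# FY2018 inputs and the article's search grid / tolerances
n, pass_n = 321, 75
pass_ratio = pass_n / n
pass_average, worst = 1063, 1023
mu_points = np.linspace(950, 1000, 100)
sigma_points = np.linspace(60, 90, 60)
pass_ratio_err, pass_average_err = 0.005, 1

# 2-D grid: rows index sigma, columns index mu
MU, SIGMA = np.meshgrid(mu_points, sigma_points)
z = (worst - MU) / SIGMA

calc_ratio = st.norm.sf(z)                               # 1 - CDF at the lowest score
calc_average = MU + SIGMA * st.norm.pdf(z) / calc_ratio  # truncated-normal mean

ratio_ok = np.abs(calc_ratio - pass_ratio) < pass_ratio_err
average_ok = np.abs(calc_average - pass_average) < pass_average_err

green = ratio_ok & average_ok     # both conditions
yellow = ratio_ok & ~average_ok   # pass ratio only
blue = ~ratio_ok & average_ok     # average score only

print("green points:", green.sum())
print("mu range:", MU[green].min(), MU[green].max())
print("sigma range:", SIGMA[green].min(), SIGMA[green].max())
```

Up to floating-point differences between `quad` and the closed form, this should mark the same grid points as the loop.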
Finally, the classified points were color coded and plotted on the graph.
```python
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for i, ax in enumerate(axes):
    # Pass ratio is stored as a fraction, so convert it to percent for the label
    ax.scatter(ratio_only[i][0], ratio_only[i][1], c='y', s=2,
               label='Percentage of successful applicants {:.1f} $\\pm$ {}%'.format(pass_ratio[i] * 100, pass_ratio_err * 100))
    ax.scatter(average_only[i][0], average_only[i][1], c='b', s=2,
               label='Average score of successful applicants {} $\\pm$ {} points'.format(pass_average[i], pass_average_err))
    ax.scatter(ratio_average[i][0], ratio_average[i][1], c='g', s=2,
               label='Satisfies both of the above conditions')
    ax.set_xlim(mu_I[0], mu_I[1])
    ax.set_ylim(sigma_I[0], sigma_I[1])
    ax.set_xlabel('$\\mu$')
    ax.set_ylabel('$\\sigma$')
    ax.legend(loc='best')
    ax.set_title('FY{}'.format(year[i]))
plt.show()
```
The execution result is as shown in the graph below.
The table below shows approximate values of $\mu$ and $\sigma$ read from the green area of each graph.
This year (FY2020), the average score of successful applicants was higher than in FY2018, but the estimated mean score of all examinees was lower than in FY2018. The estimated standard deviation also increases year by year, which suggests that the exam is being set so as to spread examinees' scores further apart.
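As an independent cross-check of the values read from the green area (a sketch, not the author's method), the two equations can also be solved directly under the same normal-distribution assumption. With $z_L = (\text{Lowest point} - \mu)/\sigma$, the first equation fixes $z_L = \Phi^{-1}(1 - \text{pass ratio})$, which `scipy.stats.norm.isf` evaluates numerically. The second equation becomes $\text{average} = \mu + \sigma\lambda$ with $\lambda = \phi(z_L)/(1 - \Phi(z_L))$, and together with $\text{Lowest point} = \mu + \sigma z_L$ this is a linear system in $\mu$ and $\sigma$. The helper name `solve_mu_sigma` is hypothetical.

```python
import scipy.stats as st

years = [2018, 2019, 2020]
n = [321, 267, 281]                 # number of examinees
pass_n = 75                         # rank of the lowest-scoring successful applicant
pass_average = [1063, 1073, 1072]   # average score of successful applicants
worst = [1023, 1029, 1022]          # lowest score of successful applicants

def solve_mu_sigma(ratio, average, lowest):
    """Hypothetical helper: solve both equations for (mu, sigma), assuming normal scores."""
    z = st.norm.isf(ratio)          # z_L such that 1 - Phi(z_L) = ratio
    lam = st.norm.pdf(z) / ratio    # phi(z_L) / (1 - Phi(z_L))
    sigma = (average - lowest) / (lam - z)   # lam > z always, so sigma > 0
    mu = lowest - sigma * z
    return mu, sigma

results = {}
for y, n_y, avg, low in zip(years, n, pass_average, worst):
    mu, sigma = solve_mu_sigma(pass_n / n_y, avg, low)
    results[y] = (mu, sigma)
    print(f"FY{y}: mu = {mu:.1f}, sigma = {sigma:.1f}")
```

If the grid search and this direct solution are consistent, each printed pair should fall inside the corresponding green area.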