Test Mathematics Part 2 (Mathematical models of item response theory)

This is a continuation of Test Mathematics Part 1 (Question Setting and Data Generation).

The previous article covered problem setting and data generation. This article covers the mathematical models used in item response theory.

Item characteristic curve

As I wrote last time, the goal here is to **estimate the ability of the test takers and the difficulty of the questions** from given test results. To do so, let's focus on one particular question and graph how likely a test taker of a given ability is to answer it correctly. In an extreme case, the graph looks like this:

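import numpy as np
import matplotlib.pyplot as plt

# Ability values on the horizontal axis
x = np.linspace(-4, 4, 41)
# Step-shaped curve: probability 1 above the threshold 1.3, otherwise 0
y = (x > 1.3).astype(float)
plt.step(x, y)
plt.xlabel("learner's skill")
plt.ylabel("probability that the learner answers correctly")
plt.show()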

image.png

The horizontal axis is a numerical value representing the examinee's ability, and the vertical axis is the probability that a learner of that ability answers the question correctly. Such a graph is called an **item characteristic curve**. In this example, a learner always answers correctly when their ability exceeds 1.3 and always answers incorrectly otherwise. If you have such a question, you can include it in an exam to measure whether a learner's ability is above 1.3.

A natural question is what the learner's ability value means here. In short, the absolute value of this number has no meaning; only relative comparisons are meaningful. For example, suppose that in addition to Question 1 above there is another Question 2 with the following item characteristic curve.

image.png

In this case, Q1 can be judged to be more difficult than Q2. In fact, suppose the following results are obtained:

| | Examinee 1 | Examinee 2 | Examinee 3 |
|---|---|---|---|
| Question 1 | Wrong | Wrong | Correct |
| Question 2 | Wrong | Correct | Correct |

Then the ability values are ordered as Examinee 1 < Examinee 2 < Examinee 3. Also, in this situation, the pattern "Q1 correct and Q2 wrong" never occurs.
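The claim that "Q1 correct and Q2 wrong" cannot occur can be checked with a minimal sketch. The threshold 1.3 for Question 1 comes from the graph above; the threshold 0.0 for Question 2 and the three ability values are assumptions chosen only for illustration.

```python
import numpy as np

# Thresholds of the two step-shaped item characteristic curves.
# 1.3 is the value used for Question 1 above; 0.0 for Question 2 is assumed.
thresholds = np.array([1.3, 0.0])

# Hypothetical ability values for examinees 1, 2 and 3.
abilities = np.array([-1.0, 0.5, 2.0])

# responses[i, j] is True when examinee j answers question i correctly.
responses = abilities[np.newaxis, :] > thresholds[:, np.newaxis]
print(responses)

# With deterministic curves, answering the harder Q1 correctly implies
# answering the easier Q2 correctly, whatever the ability value is.
grid = np.linspace(-4, 4, 801)
q1_correct = grid > thresholds[0]
q2_correct = grid > thresholds[1]
impossible_pattern = bool(np.any(q1_correct & ~q2_correct))
print(impossible_pattern)  # False
```

With these assumed values the response matrix reproduces the table above, and no ability value on the grid yields the pattern "Q1 correct and Q2 wrong".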

This situation is somewhat extreme, because whether an examinee answers correctly or incorrectly is completely deterministic. In actual statistical processing, correct and incorrect answers are expected to be probabilistic to some extent. In particular, examinees whose ability is close to the borderline will sometimes answer correctly and sometimes not. For this reason, the item characteristic curve actually used looks, for example, like this:

image.png

For well-behaved questions (questions whose correct answer rate does not reverse with respect to ability), the cumulative distribution function of a probability distribution is a suitable model. In item response theory, the logistic distribution is often used because it is mathematically easy to handle. A model using the logistic distribution is called a **logistic model**. Depending on the number of parameters per question, the 1-, 2-, and 3-parameter logistic models are well known [^1]. The following sections describe these models.
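Before moving on, the probabilistic behaviour described above can be illustrated with a minimal sketch that draws simulated answers from a logistic item characteristic curve. The function name `icc_logistic`, the slope 3, the threshold 1.3, and the number of trials are assumptions for illustration, not values taken from this article's data.

```python
import numpy as np

def icc_logistic(theta, a=3.0, b=1.3):
    """Logistic curve: probability of a correct answer at ability theta (a, b assumed)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n_trials = 10_000

for theta in (0.0, 1.3, 2.6):
    p = icc_logistic(theta)
    # Each trial is a Bernoulli draw: correct with probability p.
    answers = rng.random(n_trials) < p
    print(f"theta={theta}: p={p:.3f}, observed rate={answers.mean():.3f}")
```

An examinee exactly at the threshold answers correctly about half of the time, while examinees far below or far above it behave almost as in the deterministic case.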

1 parameter logistic model (1PL model, Rasch model)

The 1-parameter logistic model is one of the simplest models used for the item characteristic curve and is expressed by the following formula.

\Pr\{u_{ij} = 1|\theta, a, b\} = \frac{1}{1 + \exp(-a(\theta_j - b_i))}

Here, as in the previous article, parameters related to questions are subscripted with $i$ and parameters related to test takers with $j$. $u_{ij}$ is a random variable indicating whether test taker $j$ answers question $i$ correctly. If the total number of questions is $I$ and the total number of test takers is $J$, this model has $I$ parameters for the questions ($b_i$), $J$ for the test takers ($\theta_j$), and one for the whole test ($a$), for $I + J + 1$ in total. The item characteristic curve is drawn as follows.

a = 3  # discrimination shared by all questions

def L1P(b, x, a=a):
    """1PL curve: probability of a correct answer at ability x for difficulty b."""
    return 1 / (1 + np.exp(-a * (x - b)))

x = np.linspace(-4, 4, 41)
for b in np.linspace(-2, 2, 5):
    y = L1P(b, x)
    plt.plot(x, y, label=f"a={a}, b={b:.1f}")
plt.xlabel("learner's skill")
plt.ylabel("probability that the learner answers correctly")
plt.legend()
plt.show()

image.png

As you can see, all the curves have the same slope and differ only by a horizontal shift. In other words, every question discriminates between examinees equally well. Because $a$ determines how sharply a question separates examinees, it is called the **discriminating power** (discrimination). Its range is the positive real numbers, and the larger it is, the more sharply the question discriminates. $b$ is called the **difficulty** because it represents how difficult the question is. Its range is all real numbers [^2], and the larger it is, the more difficult the question. This model is also known as the Rasch model because it was studied by the Danish mathematician Rasch in the early 1960s.
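Why a larger $a$ gives a sharper curve can be seen from the slope. Writing $P(\theta)$ for the right-hand side of the 1PL formula (a shorthand introduced here for convenience), differentiating with respect to $\theta$ gives:

```latex
\frac{dP}{d\theta} = a\,P(\theta)\,\bigl(1 - P(\theta)\bigr),
\qquad
\left.\frac{dP}{d\theta}\right|_{\theta = b_i} = \frac{a}{4}
```

The slope is steepest at $\theta = b_i$, where $P = 1/2$, and its value there is proportional to $a$.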

2 parameter logistic model (2PL model)

The 2-parameter logistic model is a standard model and is the only [^3] model included in the Python package pyirt. Its formula is almost the same as that of the Rasch model:

\Pr\{u_{ij} = 1|\theta, a, b\} = \frac{1}{1 + \exp(-a_i(\theta_j - b_i))}

This means that the discriminating power $a$, which was common to all questions, now depends on the question ($a \rightarrow a_i$). This model has $2I$ parameters for the questions ($a_i, b_i$) and $J$ for the test takers ($\theta_j$), for $2I + J$ in total. The item characteristic curve is drawn as follows.

def L2P(a, b, x):
    """2PL curve with per-question discrimination a and difficulty b."""
    return 1 / (1 + np.exp(-a * (x - b)))

x = np.linspace(-4, 4, 41)
for idx in range(5):
    a = 2 * (idx + 1) / 5  # discrimination: 0.4, 0.8, ..., 2.0
    b = -2.0 + idx         # difficulty: -2, -1, ..., 2
    y = L2P(a, b, x)
    plt.plot(x, y, label=f"{a=}, {b=}")
plt.xlabel("learner's skill")
plt.ylabel("probability that the learner answers correctly")
plt.legend()
plt.show()

image.png

3 parameter logistic model (3PL model)

The 3-parameter logistic model is the 2PL model plus a quantity called **guessing**. In exams such as TOEIC, the questions are multiple choice (3-choice, 4-choice, and so on). What does an examinee do when they lack the ability to answer such a question correctly? They choose at random. In that case, for a 4-choice question for example, a correct answer rate of about 25% is secured regardless of ability. This 25% is the guessing parameter. Expressed as a formula:

\Pr\{u_{ij} = 1|\theta, a, b, c\} = c_i + \frac{1 - c_i}{1 + \exp(-a_i(\theta_j - b_i))}

Here, $c_i$ is the guessing parameter of question $i$, and its range is $0 \leq c_i \leq 1$. This model has $3I$ parameters for the questions ($a_i, b_i, c_i$) and $J$ for the test takers ($\theta_j$), for $3I + J$ in total. The item characteristic curve is drawn as follows.

def L3P(a, b, c, x):
    """3PL curve: guessing c is the floor, the logistic part fills the rest."""
    return c + (1 - c) / (1 + np.exp(-a * (x - b)))

x = np.linspace(-4, 4, 41)
for idx in range(5):
    a = 2 * (idx + 1) / 5  # discrimination: 0.4, 0.8, ..., 2.0
    b = -2.0 + idx         # difficulty: -2, -1, ..., 2
    c = (4 - idx) / 10     # guessing: 0.4, 0.3, ..., 0.0
    y = L3P(a, b, c, x)
    plt.plot(x, y, label=f"{a=}, {b=}, {c=}")
plt.xlabel("learner's skill")
plt.ylabel("probability that the learner answers correctly")
plt.legend()
plt.show()

image.png
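The three models are nested, which a minimal sketch can make explicit: setting $c_i = 0$ in the 3PL formula gives the 2PL model, and additionally sharing one $a$ across all questions gives the 1PL model. The function name `prob_correct` and all parameter values below are assumptions for illustration only.

```python
import numpy as np

def prob_correct(theta, a, b, c=0.0):
    """3PL probabilities as a matrix of shape (I questions, J examinees).

    theta: abilities, shape (J,). a, b, c: question parameters, each either
    a scalar (shared by all questions) or an array of shape (I,).
    """
    theta = np.asarray(theta, dtype=float).reshape(1, -1)  # (1, J)
    a = np.asarray(a, dtype=float).reshape(-1, 1)          # (I, 1) or (1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    c = np.asarray(c, dtype=float).reshape(-1, 1)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.array([-1.0, 0.0, 2.0])  # hypothetical abilities, J = 3
b = np.array([-0.5, 1.0])           # hypothetical difficulties, I = 2
a = np.array([0.8, 2.0])            # hypothetical discriminations
c = np.array([0.25, 0.2])           # hypothetical guessing parameters

p_1pl = prob_correct(theta, a=3.0, b=b)     # shared a, no guessing
p_2pl = prob_correct(theta, a=a, b=b)       # per-question a, no guessing
p_3pl = prob_correct(theta, a=a, b=b, c=c)  # per-question a and c

print(p_3pl.shape)  # (2, 3)
# At theta = b_i the 3PL curve sits halfway between c_i and 1.
print(prob_correct(b, a=a, b=b, c=c).diagonal())
# For very low ability the probability approaches the guessing parameter c_i.
print(prob_correct([-50.0], a=a, b=b, c=c).ravel())
```

Returning the whole $I \times J$ matrix at once relies on NumPy broadcasting, which keeps the function the same regardless of which of the three models is being evaluated.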

Next time

Next time I will introduce a parameter estimation method for the 3PL model: Test Mathematics Part 3 (3PL model optimization).

References

[^1]: There is also a 4-parameter logistic model, which expresses that the correct answer rate reaches only $d_i$ ($< 1$) no matter how high the examinee's ability is, but I omit it because it is rarely seen.
[^2]: In practice, it is limited to an appropriate range for reasons such as numerical computation.
[^3]: As of September 16, 2020.
