There is something called an error propagation formula. Below is a description of the error propagation formula. Since COV (a, b) is a covariance, it will be zero if the errors of a and b are independent.
y(a,b)Then the error σ of y_{y}Is,Error σ of a_{a}And b error σ_{b}Will be as follows using.
σ_{y}^2 = (\frac{∂y}{∂a})^2 σ_{a}^2 + (\frac{∂y}{∂b})^2 σ_{b}^2+ 2COV(a,b)\frac{∂y}{∂a}\frac{∂y}{∂b}
This time, I considered whether the error propagation formula can be applied to the standard error.
Below is a textbook explanation of statistics about what standard error is in the first place.
Suppose you want to estimate the parameters of the population distribution from some sample data. (For example, if the population distribution is normal, the parameters are mean and standard deviation.) Here, we also consider a confidence interval using the standard error derived during parameter estimation.
By the way, I will omit the details about how to derive the standard error, When the parameters are estimated by the maximum likelihood method, the log-likelihood function can be tampered with (differentiate and take the expected value, etc.). You can derive the standard error.
Considering the 95% confidence interval (= estimated parameter +/- 2 x standard error) Repeat the above 95% confidence interval estimate 100 times with new sample data sampled from the population distribution.
After 100 trials, 95 include the true parameter of the population distribution in the 95% confidence interval, and 5 do not. Standard error is used in the above sense. Roughly speaking, it's like an indicator of how reliable the estimated parameters are.
A similar term is standard deviation σ, which refers to the distribution in which the data can actually be combined.
Standard error is an indicator of how reliable the data is, and standard deviation is the range of data. Now let's get back to the first title. I often used the error propagation formula for standard deviation (or error), but what I wondered this time was "Is it applicable to standard error?" I thought that there would be no problem if I applied it as it was, but I tried numerical simulation with care.
As a flow, (1) Generate the data according to the following and estimate the parameters a, b, and c by the maximum likelihood method. Where ε is normally distributed random noise.
y = a+bx^2+cx^{-2} + ε
(2) After estimating b and c, calculate the parameter d = (c / b) ^ (1/4). (3) Calculate the standard error of d from the standard errors of b and c using the error propagation formula. This ①②③ is repeated multiple times, and the variation in d calculated in ② is considered to be the standard error of d. If this matches the calculated value in (3), the error propagation formula can be applied to the standard error. Since multiple calculations in (3) are also performed, the average value is used.
Below is the actual python code and output I think that the variation in (2) and the calculation result in (3) match.
main.py
import numpy as np
from numpy.random import *
from scipy.optimize import curve_fit
import statistics
import math
def create_toydata(x_list):
"""
x_with list as an argument,Y with error_create list
:param x_list:List of x
:return y_list:List of y that is a function of x
:return true_d:True d
"""
#Parameter true value Number is appropriate
a = 2e2
b = 2e-5
c = 2e7
y_list = [None] * len(x_list)
for i, x in enumerate(x_list):
#Add a normal distribution error to the data. I decided the number 2 appropriately
noize = 2 * randn()
y_list[i] = func(x, a, b, c) + noize
# c,Compute d, which is a function of b
true_d = (c/b)**(1/4)
return y_list, true_d
def func(x, a, b, c):
"""Define a function to fit. The function itself was decided appropriately"""
return a + b * x**2 + c * x**(-2)
def cal_se(C, B, C_se, B_se, COV):
"""
From the error propagation law when there is a correlation(C/B)^(1/4)Find the error of
:param C:Parameter C
:param B:Parameter B
:param C_se:C standard error
:param B_se:B standard error
:param COV: C,B covariance
:return: (C/B)^(1/4)Standard error
"""
del_c = (1/4) * C**(-3/4) * B**(-1/4)
del_b = (-1/4) * C**(1/4) * B**(-5/4)
return math.sqrt((del_c* C_se)**2 + (del_b * B_se)**2 + 2 * COV * del_c * del_b)
if __name__ == '__main__':
#Number of trials
n = 1000
a_list = [None] * len(range(n))
b_list = [None] * len(range(n))
c_list = [None] * len(range(n))
d_list = [None] * len(range(n))
d_se_list = [None] * len(range(n))
# cnt:95%The number of true values within the confidence interval
cnt = 0
#Repeat the following trial n times
for i in range(n):
#From toydata, the maximum likelihood method is "y"= a + x^2 + c^-Coefficient of 2 ”a,b,Find c
# b,Compute d, which is a function of c.
# b,Calculate the standard error of d from the standard error of c.
#Numbers are appropriate
x_st = 500
x_end = 2500
x_num = 200
x_list = np.linspace(x_st, x_end, x_num).tolist()
y_list, true_d = create_toydata(x_list=x_list)
popt, pcov = curve_fit(func, x_list, y_list)
# #Calculate d
d = (popt[2]/popt[1])**(1/4)
d_list[i] = d
#Standard error of d,b,Calculate from the standard error of c
d_se = cal_se(popt[2], popt[1], math.sqrt(pcov[2][2]), math.sqrt(pcov[1][1]), pcov[1][2])
d_se_list[i] = d_se
# 95%Find the confidence interval
d_low, d_up = d - 2 * d_se, d + 2 * d_se
# 95%Find out how many true values are in the confidence interval
if d_low < true_d and true_d < d_up:
cnt += 1
#Find the standard deviation of d tried n times(Consider this as the standard error of d)
true_d_se = statistics.pstdev(d_list)
print("Standard error of d:", true_d_se)
#The standard error of d is calculated for each number of trials, so the final result is averaged.
cal_d_se = statistics.mean(d_se_list)
print("Mean of calculated standard errors of d:", cal_d_se)
# 95%Find out the rate at which the standard error falls within the confidence interval.95%0 from the confidence interval definition.Should be 95
print("95%Percentage of standard errors within the confidence interval:", cnt/n)
The following is the execution result. The first line is the variation of ② and the second line is the calculation result of ③.
Standard error of d: 2.2655785060979126
Mean of calculated standard errors of d: 2.2443534560700673
95%Percentage of standard errors within the confidence interval: 0.947
Recommended Posts