Derivation of the certainty of an effect in A/B testing

When running an A/B test, it is important not only to estimate the effect of an intervention but also to quantify how reliable that estimate is. This article therefore derives the estimation error of the effect and its confidence interval.

(Figure: example ATE estimate with a two-sided 95% confidence interval)

Problem setting

We want to estimate how much a campaign changes the product purchase rate on an e-commerce site. Call the intervention group targeted by the campaign $T$ and the control group excluded from the campaign $C$; their sample sizes are $n$ and $m$ and their population means are $p_T$ and $p_C$, respectively. User behavior $X_T$, $X_C$ in the two groups follows a Bernoulli distribution: 1 if the user purchased the product, 0 otherwise. Users are assigned to A or B independently, and each user's behavior is independent.

X_{T1}, X_{T2}, \ldots X_{Tn} \sim Bern(p_{T}) \\
X_{C1}, X_{C2}, \ldots X_{Cm} \sim Bern(p_{C})

Estimating the distribution of effects

The effect that the campaign has on the purchase rate is expressed as $ATE$ (average treatment effect). Below we derive its distribution, mean, and variance.

Central limit theorem

Suppose we have a sample drawn independently from a probability distribution $F$ with mean $\mu$ and variance $\sigma^2$:

X_1, X_2, \ldots, X_n \overset{iid}{\sim} F(\mu, \sigma^2)

By the central limit theorem, the following convergence in distribution holds for the sample mean $\bar{X}$.

\lim_{n\rightarrow\infty}P \left( \frac{\sqrt n (\bar{X}-\mu)}{\sigma} \leq x \right) = \Phi(x)

In other words, the distribution of the sample mean of independent draws from any probability distribution with finite mean and variance asymptotically approaches a normal distribution. Since the population variance of a Bernoulli distribution with population mean $p$ is $p(1-p)$, when the sample sizes are large the sample means of the intervention and control groups, $\bar{X_T}$ and $\bar{X_C}$, follow the normal distributions below.

\bar{X_T} =  \frac{1}{n}\sum^n X_{Ti} \\
\bar{X_C} = \frac{1}{m}\sum^m X_{Ci} \\
\bar{X_T} \sim N(p_T, \frac{1}{n}p_{T}(1-p_{T})) \\
\bar{X_C} \sim N(p_C, \frac{1}{m}p_{C}(1-p_{C})) \\
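As a quick sanity check of this approximation, here is a minimal simulation sketch (the rate $p = 0.3$ and the sample size are arbitrary choices of mine): the sample means of repeated Bernoulli samples spread out just as the normal distribution above predicts.

import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.3, 1000, 10000  # arbitrary illustration values

# Each trial is a Bernoulli sample of size n; binomial(n, p) / n is its sample mean
sample_means = rng.binomial(n, p, size=trials) / n

print(sample_means.mean())       # close to p = 0.3
print(sample_means.std())        # close to the theoretical value below
print(np.sqrt(p * (1 - p) / n))  # sqrt(p(1-p)/n), roughly 0.0145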

Definition of effect

The effect $ ATE $ is defined as the difference between the expected values of two potential outcome variables.

ATE = E[X_T - X_C] = E[X_T] - E[X_C] = p_T - p_C

It is the average difference between "the outcome if the entire user population were assigned to the intervention group" and "the outcome if the entire user population were assigned to the control group".

In reality, each user can be assigned only to the intervention or to the control. For example, when a user is targeted by the campaign, we can observe whether they purchase the product afterwards; but since that user was assigned to the campaign, we cannot know whether they would have purchased had they not been targeted. $ATE$ therefore cannot be computed directly from the data.

Therefore, an A/B test usually makes the assignment completely random. If the assignment is random, the effect can be estimated without bias using the difference between the observed group means. Let $\hat{ATE}$ denote the estimator of $ATE$.

\begin{align}
\hat{ATE} &= \frac{1}{n}\sum^nX_{T,i} - \frac{1}{m}\sum^mX_{C,i} \\
&= \bar{X_T} - \bar{X_C} \\
E[\hat{ATE}] &= E[\bar{X_T}] - E[\bar{X_C}] = p_T - p_C = ATE \\
\end{align}
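A short simulation can back this up (a sketch; the rates 0.12 and 0.10 and the sample sizes are hypothetical): the average of $\hat{ATE}$ over many replications sits at $p_T - p_C$, and its spread matches the variance derived in the next subsection.

import numpy as np

rng = np.random.default_rng(0)
p_t, p_c, n, m = 0.12, 0.10, 5000, 4000  # hypothetical true rates and sample sizes

# Replicate the whole A/B test 20000 times and compute ATE-hat each time
ate_hats = rng.binomial(n, p_t, size=20000) / n - rng.binomial(m, p_c, size=20000) / m

print(ate_hats.mean())                            # close to p_t - p_c = 0.02 (unbiased)
print(ate_hats.var())                             # close to the theoretical variance
print(p_t * (1 - p_t) / n + p_c * (1 - p_c) / m)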

Derivation of the probability distribution that the estimator of effect follows

The effect is estimated as the purchase rate of the intervention group minus the purchase rate of the control group. Since each group's purchase rate follows a normal distribution, a statistic computed from them also follows a normal distribution.

\hat{ATE} = \bar{X_T} - \bar{X_C} \sim N(p_T - p_C, \frac{1}{n}p_{T}(1-p_{T})+\frac{1}{m}p_{C}(1-p_{C}))

The parameters of the probability distribution followed by the statistic $\hat{ATE}$, the estimated effect of the measure, are

\begin{align}
\mu &= p_T - p_C \\
\sigma^2 &= \frac{1}{n}p_{T}(1-p_{T})+\frac{1}{m}p_{C}(1-p_{C}) \\
\end{align}

Organizing the terminology of inferential statistics

So far I have used various symbols without explanation, so let me organize the terminology once.

Statistical surveys and inferential statistics

A statistical survey collects data to examine the characteristics of a population and processes it to derive information useful for judgment and decision making. When it is difficult to survey the entire population, we sample from it and make inferences about the population.

In inferential statistics, a probability model is assumed for the population, and the data are treated as realizations of random variables following that probability distribution. The samples $X_1, X_2, \ldots, X_n$ following the distribution are random variables, and their realizations $x_1, x_2, \ldots, x_n$ are the data. $n$ is called the sample size.

Parameters, statistics, and estimators

A population parameter is a characteristic of the population, such as its mean $\mu$ or variance $\sigma^2$. A function of the sample $X_1, X_2, \ldots, X_n$ that does not involve the population parameters is called a statistic, and its probability distribution is called the sampling distribution. For example, the single observation $X_1$ and the sample mean $\bar{X_T}$ are statistics, and $\hat{ATE}$, which is computed from statistics, is itself a statistic. In inferential statistics, population parameters are estimated by computing statistics. A statistic used to estimate a population parameter is called an estimator.

About estimators and estimates

An estimator is a random variable: it does not take a fixed value. An estimate (estimated value) is computed by plugging in the numbers actually observed in the data.

(Figure: the difference between an estimator and an estimate)

Estimating the parameter of the probability distribution of the effect

We estimate the population parameters using the actually observed data.

Estimating the population mean

The population mean $\mu$ is a population parameter, but it is rarely feasible to compute it directly from all the data in the population. Instead, after observing $n$ data points, we compute the sample mean $\bar{X} = \frac{1}{n}\sum^n X_i$. The expected value of the sample mean equals the population mean: $E[\bar{X}] = \mu$. The population mean can therefore be estimated through the sample mean.

Given the data $(x_1, x_2, \ldots, x_n)$, the estimate is $\bar{x} = \frac{1}{n}\sum^n x_i$.

Uncertainty of the estimate of the population mean

However, since a statistic (= estimator) is a random variable, it carries uncertainty. A sample mean computed from 10 data points and one computed from 10,000 data points have very different uncertainty. The standard error is computed to make this uncertainty explicit.

Difference between standard deviation and standard error

For a random variable $X$ we can define the mean $\mu = E[X]$ and the variance $\sigma^2 = Var(X) = E[(X-\mu)^2]$. The square root of the variance is called the standard deviation, $\sigma = \sqrt{Var(X)}$.

Since statistics are also random variables, their standard deviation can be defined in the same way. Statistics are often constructed so that their expected value equals the population parameter to be estimated. The standard deviation of such a statistic, which expresses the uncertainty of the estimate relative to the true value, is called the standard error.

For example, the standard deviation of the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is $\frac{\sigma}{\sqrt{n}}$. Since $\bar{X}$ is the statistic used to estimate the population mean $\mu$, $\frac{\sigma}{\sqrt{n}}$ is its standard error.
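To make the distinction concrete, here is a minimal sketch (the exponential distribution and its parameters are arbitrary choices): the spread of the data stays near $\sigma$, while the spread of the sample mean shrinks like $\sigma/\sqrt{n}$.

import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0  # the standard deviation of Exponential(scale=2.0)

for n in (10, 1000):
    # 5000 samples of size n; the spread of their sample means is the standard error
    means = rng.exponential(scale=2.0, size=(5000, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))  # empirical vs. theoretical sigma/sqrt(n)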

Standard deviation estimation

When the expected value of an estimator equals the population parameter, the estimator is called unbiased. For example, the sample mean is an unbiased estimator of the population mean.

The unbiased estimator $V^2$ of the population variance $\sigma^2$ is:

V^2 = \frac{1}{n-1} \sum^n (X_i - \bar{X})^2 \\
E[V^2] = \sigma^2 \\

$V^2$ is called the unbiased variance. Whereas the sample mean divides the sum of the data by $n$, the unbiased variance divides by $n-1$. By default, the pandas methods df.var() and df.std() compute the unbiased variance and its square root (ddof=1).
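A quick check of those pandas defaults (the sample values are arbitrary):

import numpy as np
import pandas as pd

x = pd.Series([1, 0, 0, 1, 1])

print(x.var())               # divides by n-1 (ddof=1): the unbiased variance
print(x.var(ddof=0))         # divides by n
print(np.var(x.to_numpy()))  # NumPy's default is ddof=0, i.e. dividing by n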

The unbiased variance can be computed from the definition above, but using the fact that the variance of a Bernoulli distribution is $p(1-p)$, the unbiased variance $V^2$ can also be obtained from the sample mean $\bar{X}$ and the plug-in variance $\bar{X}(1-\bar{X})$.

\begin{align}
E[\bar{X}(1-\bar{X})] &= E[\bar{X} - \bar{X}^2] \\
&= E[\bar{X}] - E[\bar{X}^2] \\
&= E[\bar{X}] - (E[\bar{X}]^2 + Var(\bar{X})) \\
&= p - p^2 - \frac{1}{n^2} \sum^n Var(X_i) \\
&= p(1 - p) - \frac{1}{n} p(1 - p) \\
&= \frac{n-1}{n} p (1 - p) \\
E \left[ \frac{n}{n-1}\bar{X}(1-\bar{X}) \right] &= \sigma^2 \\
\end{align}

Therefore, the unbiased variance of $X$ is $V^2 = \frac{n}{n-1}\bar{X}(1-\bar{X})$.
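For 0/1 data the definition and the shortcut give exactly the same number, as a quick check shows (the sample values are arbitrary):

import numpy as np

x = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])  # arbitrary 0/1 data
n, xbar = len(x), x.mean()

v2_definition = ((x - xbar) ** 2).sum() / (n - 1)  # unbiased variance by definition
v2_bernoulli = n / (n - 1) * xbar * (1 - xbar)     # Bernoulli shortcut

print(v2_definition, v2_bernoulli)  # identical for binary data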

The unbiased variances of the intervention and control groups are as follows.

V_T^2 = \frac{n}{n-1} \bar{X_T} (1 - \bar{X_T}) \\
V_C^2 = \frac{m}{m-1} \bar{X_C} (1 - \bar{X_C}) \\

We compute the sample mean $\bar{x}$ as the estimate of the population mean $\mu$ of $X$. The estimate $v^2$ of the population variance $\sigma^2 = p(1-p)$ of $X$ is as follows.

\bar{x} = \frac{1}{n} \sum^n x_i \\
v^2 = \frac{n}{n-1} \bar{x} (1 - \bar{x}) \\

The standard deviation estimate is $ v $.

Standard error estimation

The estimated variance of the sample mean $\bar{X}$ (the square of its standard error) is as follows.

SE^2 = \frac{V^2}{n}

The variances of the sample means of the intervention and control groups are as follows.

SE_T^2 = \frac{V_T^2}{n} \\
SE_C^2 = \frac{V_C^2}{m} \\

Therefore, the variance of the effect estimator is as follows.

V_{ATE}^2 = SE_T^2 + SE_C^2 = \frac{V_T^2}{n} + \frac{V_C^2}{m}

Pooled variance

If the intervention and control groups can be assumed to follow distributions with equal variance $V_{TC}^2$, the variance of $\hat{ATE}$ can be reduced (i.e., the estimation accuracy improves). Whether homoscedasticity actually holds needs to be checked in advance.

\begin{align}
V_T^2 &= \frac{n}{n-1}\bar{X_T}(1-\bar{X_T}) \\
V_C^2 &= \frac{m}{m-1}\bar{X_C}(1-\bar{X_C}) \\
V_{TC}^2 &= \frac{(n-1)V_T^2 + (m-1)V_C^2}{(n-1)+(m-1)} \\
&= \frac{(n-1)V_T^2 + (m-1)V_C^2}{n+m-2} \\
V_{ATE,pool}^2 &= \frac{V_{TC}^2}{n} + \frac{V_{TC}^2}{m} \\
\end{align}

Summary: estimating the parameters of the probability distribution of the effect

I will summarize the formulas that have appeared so far.

The probability distributions of the intervention and control groups

X_{T1}, X_{T2}, \ldots X_{Tn} \sim Bern(p_{T}) \\
X_{C1}, X_{C2}, \ldots X_{Cm} \sim Bern(p_{C}) \\
\bar{X_T} \sim N(p_T, \frac{1}{n}p_{T}(1-p_{T})) \\
\bar{X_C} \sim N(p_C, \frac{1}{m}p_{C}(1-p_{C})) \\
\hat{ATE} \sim N(p_T - p_C, \frac{1}{n}p_{T}(1-p_{T})+\frac{1}{m}p_{C}(1-p_{C}))

Estimators of the parameters of the effect

Estimators of the mean parameters

\bar{X_T} = \frac{1}{n}\sum^n X_{Ti} \\
\bar{X_C} = \frac{1}{m}\sum^m X_{Ci} \\
\hat{ATE} = \bar{X_T} - \bar{X_C}

Estimators of the variance parameters

V_T^2 = \frac{n}{n-1}\bar{X_T}(1-\bar{X_T}) \\
V_C^2 = \frac{m}{m-1}\bar{X_C}(1-\bar{X_C}) \\
V_{ATE}^2 = \frac{V_T^2}{n} + \frac{V_C^2}{m} \\
V_{TC}^2 = \frac{(n-1)V_T^2 + (m-1)V_C^2}{n+m-2} \\
V_{ATE,pool}^2 = \frac{V_{TC}^2}{n} + \frac{V_{TC}^2}{m} \\

Estimates of the parameters of the effect

The estimates, i.e. the realizations of the statistics, are computed from the observed data. Since they mirror the estimators above, some are omitted.

\hat{ate} = \bar{x_T} - \bar{x_C} \\
v_{ATE}^2 = \frac{v_T^2}{n} + \frac{v_C^2}{m} \\
v_{ATE,pool}^2 = \frac{v_{TC}^2}{n} + \frac{v_{TC}^2}{m} \\
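Putting the summary into code, here is a sketch implementing these estimators (the function name ab_estimates and its interface are my own; the inputs are assumed to be 0/1 NumPy arrays):

import numpy as np

def ab_estimates(x_t, x_c, pooled=False):
    """Return (ate_hat, estimated variance of ate_hat) from 0/1 outcome arrays."""
    n, m = len(x_t), len(x_c)
    ate_hat = x_t.mean() - x_c.mean()
    v_t = n / (n - 1) * x_t.mean() * (1 - x_t.mean())  # unbiased variance, group T
    v_c = m / (m - 1) * x_c.mean() * (1 - x_c.mean())  # unbiased variance, group C
    if pooled:
        v_tc = ((n - 1) * v_t + (m - 1) * v_c) / (n + m - 2)  # pooled variance
        return ate_hat, v_tc / n + v_tc / m
    return ate_hat, v_t / n + v_c / m

# Usage on simulated data (hypothetical rates)
rng = np.random.default_rng(0)
x_t = rng.binomial(1, 0.12, size=5000)
x_c = rng.binomial(1, 0.10, size=4000)
print(ab_estimates(x_t, x_c))
print(ab_estimates(x_t, x_c, pooled=True))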

Calculation of Confidence Intervals for Estimated Effects

Finally, we compute the confidence interval of $\hat{ATE}$ to quantify the uncertainty of the estimated effect.

Hypothesis test and confidence interval

In a hypothesis test, two opposing hypotheses are set up: the null hypothesis and the alternative hypothesis. The null hypothesis typically states that the population mean $\mu$ equals some value $\mu_0$; the hypothesis that the population mean is not $\mu_0$ is called the alternative hypothesis.

By observing the realization $x$ of the random variable $X$, we compute how unlikely such a realization would be if the null hypothesis were correct, and based on that we accept or reject the hypothesis.

Suppose the random variable follows a normal distribution whose variance $\sigma_0^2$ is known:

X \sim N(\mu, \sigma_0^2)

Set the null hypothesis and the alternative hypothesis as follows.

H_0: \mu = \mu_0 \\
H_1: \mu \neq \mu_0 \\

The rejection region and acceptance region of the null hypothesis are as follows.

R = \left\{ x \in \mathbb{R} \middle| \frac{|x - \mu_0|}{\sigma_0} > z_{\alpha/2}\right\} \\
A = \left\{ x \in \mathbb{R} \middle| \frac{|x - \mu_0|}{\sigma_0} \leq z_{\alpha/2}\right\}

$z_{\alpha/2}$ is a quantile of the standard normal distribution: the probability of drawing a value of $z_{\alpha}$ or more from the standard normal distribution $N(0,1)$ is $\alpha$.

(Figure: how to read the standard normal table)

Here, if we invert the test, regarding the acceptance region as a set of values of $\mu_0$ instead of $x$, we obtain

\begin{align}
A &= \left\{ \mu_0 \in \mathbb{R} \middle| \frac{|x - \mu_0|}{\sigma_0} \leq z_{\alpha/2}\right\} \\
  &= \left\{ \mu_0 \in \mathbb{R} \middle| |x - \mu_0| \leq \sigma_0 z_{\alpha/2}\right\} \\
  &= \left\{ \mu_0 \in \mathbb{R} \middle| x - \sigma_0 z_{\alpha/2} \leq \mu_0 \leq x + \sigma_0 z_{\alpha/2} \right\}
\end{align}

In other words, the interval $[x - \sigma_0 z_{\alpha/2}, x + \sigma_0 z_{\alpha/2}]$, constructed from the observed value $x$, contains the true mean $\mu$ with probability $1-\alpha$ over repeated sampling. This is called the confidence interval with confidence level $1-\alpha$.
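The meaning of the confidence level can be verified by simulation (a sketch assuming a known $\sigma_0$ and an arbitrarily chosen true mean): over many repetitions, roughly a fraction $1-\alpha$ of the intervals contain the true mean.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma0, z = 5.0, 2.0, 1.96  # arbitrary true mean, known std, z_{0.025}

x = rng.normal(mu, sigma0, size=100000)  # one observation per repetition
covered = (x - z * sigma0 <= mu) & (mu <= x + z * sigma0)
print(covered.mean())  # close to 0.95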

Calculation of Confidence Intervals for Estimator of Effect

Now that we know how to construct confidence intervals, let us actually compute the confidence interval of the effect.

Parameter of the probability distribution that the effect follows

As derived above, the parameters of the probability distribution followed by $\hat{ATE}$, the estimator of the effect of the measure, are:

\begin{align}
\mu &= ATE = p_T - p_C \\
\sigma &= \sqrt{\frac{1}{n}p_{T}(1-p_{T})+\frac{1}{m}p_{C}(1-p_{C})} \\
\end{align}


Calculate parameter estimate from data

We compute the estimate of the mean. Since the true value of the standard deviation is unknown, we substitute the square root of the unbiased variance estimate.

\begin{align}
\hat{\mu} &= \hat{ate} = \bar{x_T} - \bar{x_C} \\
\sigma_0 &\approx v_{ATE} = \sqrt{\frac{1}{n-1}\bar{x_T}(1-\bar{x_T})+\frac{1}{m-1}\bar{x_C}(1-\bar{x_C})} \\
\end{align}

If the distribution of $\hat{ATE}$ is normal (see the digression below), the $1-\alpha$ confidence interval of the effect $ATE$ is

[\hat{\mu} - v_{ATE} \, z_{\alpha/2}, \hat{\mu} + v_{ATE} \, z_{\alpha/2}]


The relationship between $\alpha$ and $z_{\alpha/2}$ is:

\begin{align}
\alpha = 0.05 &\leftrightarrow z_{\alpha/2} = 1.96 \\
\alpha = 0.1 &\leftrightarrow z_{\alpha/2} = 1.645 \\
\end{align}
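These z values can be looked up with scipy instead of a printed normal table (a small sketch):

from scipy.stats import norm

for alpha in (0.05, 0.1):
    print(alpha, norm.ppf(1 - alpha / 2))  # 1.9599..., 1.6448...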

So, for example, with $\alpha = 0.05$, the two-sided 95% confidence interval is

[\hat{\mu} - 1.96v_{ATE}, \hat{\mu} + 1.96v_{ATE}]


"Sample mean ± 2 x standard error" is an approximation of the 95% confidence interval on both sides. For practical purposes, I think it's a good idea to start by using this.

Code

import numpy as np
import pandas as pd
import plotly.express as px

## Problem setting
t_imp = 10000
c_imp = 50000
t_click = 100
c_click = 350

t_ctr = t_click / t_imp
c_ctr = c_click / c_imp


## Estimates of the population mean and variance of ATE
# Mean
ate_mean = t_ctr - c_ctr

# Variance (without assuming homoscedasticity)
ate_var = (t_ctr * (1 - t_ctr) / (t_imp - 1)) + (c_ctr * (1 - c_ctr) / (c_imp - 1))
ate_se = np.sqrt(ate_var)

# Variance (assuming homoscedasticity)
t_var = (t_ctr * (1 - t_ctr)) * (t_imp / (t_imp - 1))  # unbiased variance of the intervention group
c_var = (c_ctr * (1 - c_ctr)) * (c_imp / (c_imp - 1))  # unbiased variance of the control group
tc_var_pool = ((t_imp - 1) * t_var + (c_imp - 1) * c_var) / (t_imp + c_imp - 2)  # pooled variance
ate_var_pool = tc_var_pool * (1 / t_imp + 1 / c_imp)  # variance of ATE-hat
ate_se_pool = np.sqrt(ate_var_pool)


## Visualization
df_plot = pd.DataFrame({
    "mean": [ate_mean, ate_mean],
    "2se": [2 * ate_se, 2 * ate_se_pool],  # approximate two-sided 95% confidence interval
    "type": ["equal_var=False", "equal_var=True"]
})
px.bar(df_plot, x="type", y="mean", error_y="2se", height=400, width=400)

We can see that the estimation error is slightly smaller when homoscedasticity is assumed.

Impressions etc.

Thanks to various reading and research, I was able to organize the concepts of inferential statistics:

  1. We want to know the parameters that determine the shape of the probability distribution assumed for the process that generates an event.
  2. Since the population parameters are unobservable, an (unbiased) estimator is derived from the sample.
  3. Since the estimator is a random variable, its certainty is expressed as a standard deviation or standard error.
  4. Estimates such as the mean and the standard error are computed from the actually observed data.

The terminology is easy to confuse.


(Digression) Normal distribution and t distribution

We assumed a normal distribution when deriving the confidence intervals, but when the variance is unknown, $\hat{ATE}$ follows a t distribution rather than a normal distribution. In this problem setting, the product purchase rate of an e-commerce site, enough data is collected that the t distribution and the normal distribution have almost the same shape, so the details are omitted. If you want more rigorous tests and confidence intervals based on the t distribution, replace $z_{\alpha/2}$ with $t_{\alpha/2}$. In practice, it is easier to apply a package such as scipy.stats.ttest_ind directly to the data than to compute the t value by hand.
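As a sketch of that approach (the simulated rates below are hypothetical), scipy.stats.ttest_ind with equal_var=False runs Welch's t-test directly on the 0/1 outcomes:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x_t = rng.binomial(1, 0.010, size=10000)  # hypothetical intervention outcomes
x_c = rng.binomial(1, 0.007, size=50000)  # hypothetical control outcomes

# Welch's t-test; equal_var=False avoids the homoscedasticity assumption
t_stat, p_value = stats.ttest_ind(x_t, x_c, equal_var=False)
print(t_stat, p_value)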
