If you run an EC site, you have probably checked how the CVR changed after taking some measure, or how sales changed after distributing coupons. But it often ends with "CVR went up... or is it just a coincidence?" or "It's significant! (probably)". In this article I'll explain how to check significance using statistics, and provide simple tools you can copy and paste into your Jupyter notebook. The explanation is not mathematically rigorous.
This article is for people who:

- don't know much statistics
- have used Python and Jupyter notebook
- want to check significance right away without worrying about the theory (the tools can be copied and used as they are)
Mr. A's company sells apples on its website. Mr. A was told by his boss to calculate the CVR. CVR is the number of units sold divided by PV. That day, 30 units sold at 1,000 PV, so the CVR was 3.0%. The next day, 28 units sold at 1,000 PV, so the CVR was 2.8%, and when Mr. A reported 3.0% again his boss got angry at him for lying. CVR fluctuates, so it should be reported as a range that roughly covers the fluctuation. For now "2.5% to 3.5%" would do, but let's think about it more statistically.
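In code, the calculation above is just division (the numbers are Mr. A's two days):

```python
# CVR = number of units sold / PV, using the two days from the story above
pv = 1000
sold_day1 = 30
sold_day2 = 28

cvr_day1 = sold_day1 / pv
cvr_day2 = sold_day2 / pv

print(f'Day 1 CVR: {cvr_day1:.1%}')  # Day 1 CVR: 3.0%
print(f'Day 2 CVR: {cvr_day2:.1%}')  # Day 2 CVR: 2.8%
```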
A visitor to the site makes one of two choices: "buy" or "not buy". This kind of phenomenon is modeled by the Bernoulli distribution, and the total number of purchases by the binomial distribution. In the graph below, the horizontal axis is the number of units sold and the vertical axis is its probability. Selling 30 units is the most likely outcome, and selling 20 or fewer looks quite unlikely.
import numpy as np
from scipy.stats import binom, norm
import pandas as pd
p = 0.03
N = 1000
k = np.arange(100)
pd.Series(binom.pmf(k, N, p), name='binom').plot(figsize=(12,6), legend=True)
From this, it seems we can pick the region where the probability is large and report a range that roughly matches. This range is called a **confidence interval**, and the probability of the true value falling inside it is called the **confidence coefficient**. Use the tool below to get the range for a 95% confidence coefficient. (The number of samples is the number of PV.)
# Confidence interval for the population proportion
import ipywidgets as widgets
import math
import numpy as np
from scipy.stats import binom, norm
def calc(v):
    input_N = w.value
    input_cvr = w2.value
    input_conf = w3.value
    p = input_cvr / 100
    # Using the binomial distribution
    max_index = binom.isf((100 - input_conf) / 100, input_N, p)
    min_index = binom.isf(input_conf / 100, input_N, p)
    # When approximating with a normal distribution
    # max_index = norm.isf((100 - input_conf)/100, loc=input_N*p, scale=np.sqrt(input_N*p*(1-p)))
    # min_index = norm.isf(input_conf/100, loc=input_N*p, scale=np.sqrt(input_N*p*(1-p)))
    print(f'{math.floor(min_index / input_N * 10000) / 100}(%) <= CVR <= {math.ceil(max_index / input_N * 10000) / 100}(%)')
button = widgets.Button(description="Calculation")
button.on_click(calc)
w = widgets.IntText(
    value=1000,
    description='The number of samples:',
    disabled=False
)
w2 = widgets.BoundedFloatText(
    value=1,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w3 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)
output = widgets.Output()
display(w, output)
display(w2, output)
display(w3, output)
display(button, output)
When you run it, it looks like the image below. You can enter any values you like.

2.11(%) <= CVR <= 3.89(%) - that is quite wide. Lowering the confidence factor from 95% to 90% narrows the range. It is like guessing someone's age: "teens, twenties, thirties, forties, or fifty and over" is almost certainly right but tells you little, while "teens to twenties" is more informative but more likely to be wrong. Increasing the number of samples also narrows the range.
The commented-out part of the source approximates the binomial distribution with a [normal distribution](https://ja.wikipedia.org/wiki/%E6%AD%A3%E8%A6%8F%E5%88%86%E5%B8%83). If the number of samples is large enough, the two give almost the same result; running it suggests that 1,000 samples is already large enough. The approximation is not strictly necessary here, but assuming a normal distribution often makes the statistics easier to handle.
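As a quick check of that claim (an illustration of mine, not part of the tool), you can compare the binomial quantiles with their normal approximation for the same N = 1000 and CVR = 3%:

```python
import numpy as np
from scipy.stats import binom, norm

N, p = 1000, 0.03
mean = N * p                    # binomial mean: 30
sd = np.sqrt(N * p * (1 - p))   # binomial standard deviation: ~5.39

# Quantiles from the exact binomial vs. the normal approximation
upper_binom = binom.isf(0.05, N, p)
upper_norm = norm.isf(0.05, loc=mean, scale=sd)
lower_binom = binom.isf(0.95, N, p)
lower_norm = norm.isf(0.95, loc=mean, scale=sd)
print(f'upper: binomial {upper_binom:.1f}, normal {upper_norm:.1f}')
print(f'lower: binomial {lower_binom:.1f}, normal {lower_norm:.1f}')
```

The two agree closely at this sample size, which is why the approximation is considered good enough here.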
The product wasn't selling well, so I tried showing a hypnotic image in a modal, and the CVR rose from 2.0% to 3.0% (both at 1,000 PV). A 1.5x increase looks significant, but let's verify (= test) it statistically.
A test calls a result "significant" when something happened that should rarely happen if there were no real difference. The threshold probability is called the significance level, and it is decided in advance. In this case, with a significance level of 5%, the increase from 2.0% to 3.0% can be called "significant" if the probability of such a difference arising by chance alone is 5% or less.
I will skip the details, but the Bernoulli (binomial) distribution can be approximated by a normal distribution when the number of samples is large enough, and the difference of two normal distributions is again normally distributed (reference: [reproductive property of the normal distribution](https://bellcurve.jp/statistics/course/7799.html)). For accurate results it is better to have about 1,000 samples or more.
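To make the idea of "arising by chance" concrete, here is a small Monte Carlo sketch of mine (not part of the tool below): assume there is no real difference and both pages share the pooled CVR of 2.5% (the pooled rate from the two observed CVRs at equal PV), then count how often a gap of one percentage point or more appears anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
p_pooled = (0.02 * N + 0.03 * N) / (2 * N)  # 0.025, assuming no real difference

# Simulate many pairs of pages that share the same true CVR
trials = 100_000
a = rng.binomial(N, p_pooled, size=trials) / N
b = rng.binomial(N, p_pooled, size=trials) / N

# How often does a gap of >= 1 percentage point show up by chance?
p_value = np.mean(np.abs(b - a) >= 0.01)
print(f'P(|difference| >= 1%) ~ {p_value:.3f}')  # roughly 0.15-0.17, well above 0.05
```

At 1,000 PV a 1.0% gap appears by chance far more often than 5% of the time, so this increase alone cannot be called significant.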
# Test of the difference in population proportions
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import math
from scipy.stats import norm
def calc_plot(v):
    input_N1 = w_n1.value
    input_cvr1 = w_cvr1.value
    input_N2 = w_n2.value
    input_cvr2 = w_cvr2.value
    input_conf = w3.value
    p1 = input_cvr1 / 100
    p2 = input_cvr2 / 100
    N1 = input_N1
    N2 = input_N2
    p = (N1 * p1 + N2 * p2) / (N1 + N2)
    z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))
    min_index = norm.isf(1 - (100 - input_conf) / (2 * 100), loc=0, scale=1)
    max_index = norm.isf((100 - input_conf) / (2 * 100), loc=0, scale=1)
    if min_index <= z <= max_index:
        print('Not significant')
        print(f'|{z}| <= {max_index}')
    else:
        print('There is a significant difference!')
        print(f'{max_index} <= |{z}|')
    xlimit = max(math.ceil(abs(z)), 5)
    x = np.arange(-xlimit * 100, xlimit * 100) / 100
    y = norm.pdf(x)
    plt.figure(figsize=(15, 7))
    plt.vlines([min_index, max_index], y.min(), y.max(), "red", linestyles='dashed', label='rejection')
    plt.vlines([z], y.min(), y.max(), "black", linestyles='dashed', label='statistics')
    plt.plot(x, y, 'b-', lw=1, label='norm pdf')
    plt.legend()
button = widgets.Button(description="Calculation")
button.on_click(calc_plot)
w_n1 = widgets.IntText(
    value=10000,
    description='The number of samples:',
    disabled=False
)
w_n2 = widgets.IntText(
    value=12000,
    description='The number of samples:',
    disabled=False
)
w_cvr1 = widgets.BoundedFloatText(
    value=2,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w_cvr2 = widgets.BoundedFloatText(
    value=3,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w3 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',  # 100 - significance level
    disabled=False
)
w_a = widgets.VBox([widgets.Label('A'), w_n1, w_cvr1])
w_b = widgets.VBox([widgets.Label('B'), w_n2, w_cvr2])
whbox = widgets.HBox([w_a, widgets.Label(' '), w_b])
output = widgets.Output()
display(whbox, output)
display(w3, output)
display(button, output)
Running the code above gives the result shown in the image below.

If the black line falls between the two red lines, the difference is within the range of ordinary random fluctuation and the result is not significant; if it falls outside, it is significant. Running the A/B test longer helps: at 10,000 PV each, an increase from 2.0% to 3.0% does come out significant. The difference is still 1.0%, but more samples reduce the random fluctuation, so the same difference can no longer be dismissed as coincidence. Two caveats. First, "not significant" does not mean "there is no difference"; it only means we could not conclude there is one. Second, how far apart the black and red lines are does not make a result "very" significant or "barely" significant: the test answers only significant or not. With that in mind, I feel the confidence interval carries more information than the test result.
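The effect of sample size on the same 1.0% difference can be sketched directly with the z statistic that calc_plot above computes (same pooled-proportion formula, fixed inputs instead of widgets):

```python
import math
from scipy.stats import norm

def z_statistic(n1, p1, n2, p2):
    # Same pooled-proportion z statistic as in calc_plot
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    return (p2 - p1) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

critical = norm.isf(0.025)  # two-sided threshold at the 5% significance level (~1.96)
for n in (1000, 5000, 10000):
    z = z_statistic(n, 0.02, n, 0.03)
    print(f'N={n}: z={z:.2f}, significant={abs(z) > critical}')
```

The 2.0% vs 3.0% comparison is not significant at 1,000 PV per page but is at 10,000 PV, matching the observation above.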
In the case above we tested whether there was any difference at all, but in practice we usually only want to know whether a measure increased the CVR. In that case, use a **one-sided test** (the case above is called a two-sided test). To run a one-sided test, replace calc_plot in the Python code above with the code below.
def calc_plot(v):
    input_N1 = w_n1.value
    input_cvr1 = w_cvr1.value
    input_N2 = w_n2.value
    input_cvr2 = w_cvr2.value
    input_conf = w3.value
    p1 = input_cvr1 / 100
    p2 = input_cvr2 / 100
    N1 = input_N1
    N2 = input_N2
    p = (N1 * p1 + N2 * p2) / (N1 + N2)
    z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))
    max_index = norm.isf((100 - input_conf) / 100, loc=0, scale=1)
    if z <= max_index:
        print('Not significant')
        print(f'{z} <= {max_index}')
    else:
        print('There is a significant difference!')
        print(f'{max_index} <= {z}')
    xlimit = max(math.ceil(abs(z)), 5)
    x = np.arange(-xlimit * 100, xlimit * 100) / 100
    y = norm.pdf(x)
    plt.figure(figsize=(15, 7))
    plt.vlines([max_index], y.min(), y.max(), "red", linestyles='dashed', label='rejection')
    plt.vlines([z], y.min(), y.max(), "black", linestyles='dashed', label='statistics')
    plt.plot(x, y, 'b-', lw=1, label='norm pdf')
    plt.legend()
In this case, the result is significant if the black line lies to the right of the red line. Because we only test whether the CVR went up (not merely whether it changed), significance can be detected with fewer samples.
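The difference between the two tests comes down to the threshold the statistic has to clear. A quick comparison at the usual 5% significance level (an illustration, not part of the tool):

```python
from scipy.stats import norm

conf = 95  # confidence factor (%), i.e. a 5% significance level
two_sided = norm.isf((100 - conf) / (2 * 100))  # splits 5% across both tails
one_sided = norm.isf((100 - conf) / 100)        # puts all 5% in the upper tail
print(f'two-sided threshold: {two_sided:.2f}')  # ~1.96
print(f'one-sided threshold: {one_sided:.2f}')  # ~1.64
```

The one-sided threshold is lower, so the same z statistic crosses it with a smaller difference or fewer samples.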
Even if the CVR goes up, the average purchase price may go down (for example, when a coupon comes with a higher minimum spend). Let's estimate how much the average purchase price fluctuates. Annual income is generally known to follow a lognormal distribution, and income correlates positively with spending, so on an EC site handling a variety of products the purchase price can be assumed to follow a similar distribution. (Reference: Example of lognormal distribution and mean, variance)
# Lognormal distribution
from scipy.stats import lognorm
x = np.arange(1, 100000)
A = 1.5000e+04       # mean purchase price
B = 1.5000e+04 ** 2  # variance
s = np.sqrt(np.log(B / (A ** 2) + 1))
mu = np.log(A) - (s ** 2 / 2)
y = pd.Series(lognorm.pdf(x, s, 0, np.exp(mu)))
y.index = x
y.plot(figsize=(12, 6))
Taking the mean of samples drawn from this distribution gives, again, a normal distribution. The code below plots a histogram of 100,000 repetitions of averaging 500 draws. By the Central Limit Theorem, the histogram of means looks like this even when the underlying distribution is not lognormal, as long as there are enough samples. How many samples are enough depends on the distribution, but about 1,000 seems to be safe.
means = []
n = 500
for i in range(100000):
    means.append(lognorm.rvs(s, 0, np.exp(mu), size=n).mean())
pd.Series(means).hist(bins=100)
The plot below is called a Q-Q plot; if the blue points lie close to the red straight line, the distribution is close to normal.
import pylab
from scipy import stats
stats.probplot(means, dist="norm", plot=pylab)
The code below calculates the confidence interval for the mean using the normal distribution. Note that the "number of samples" here is the number of purchases, not the PV.
#Confidence interval for average purchase price
import ipywidgets as widgets
import math
import numpy as np
import pandas as pd
from scipy.stats import binom, norm
def calc(v):
    n = w.value
    mu = w2.value
    sigma = w3.value
    input_conf = w4.value
    max_index = norm.isf((100 - input_conf) / 100, loc=mu, scale=sigma / np.sqrt(n))
    min_index = norm.isf(input_conf / 100, loc=mu, scale=sigma / np.sqrt(n))
    print(f'{min_index} <= Average purchase price <= {max_index}')
button = widgets.Button(description="Calculation")
button.on_click(calc)
w = widgets.IntText(
    value=1000,
    description='The number of samples:',
    disabled=False
)
w2 = widgets.FloatText(
    value=15000,
    description='Average purchase price:',
    disabled=False
)
w3 = widgets.FloatText(
    value=15000,
    description='standard deviation:',
    disabled=False
)
w4 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)
output = widgets.Output()
display(w, output)
display(w2, output)
display(w3, output)
display(w4, output)
display(button, output)
Sales = PV x CVR x average purchase price, so we multiply the confidence interval for the CVR by the confidence interval for the average purchase price.
# Confidence interval for sales
import ipywidgets as widgets
import math
import numpy as np
from scipy.stats import binom, norm
def calc(v):
    n = w_n.value
    cvr = w_cvr.value
    mu = w_mu.value
    sigma = w_s.value
    input_conf = w_conf.value
    p = cvr / 100
    min_index_sales = norm.isf(1 - (100 - input_conf) / (2 * 100), loc=mu, scale=sigma / np.sqrt(n * p))
    max_index_sales = norm.isf((100 - input_conf) / (2 * 100), loc=mu, scale=sigma / np.sqrt(n * p))
    max_index = norm.isf((100 - input_conf) / 100, loc=n * p, scale=np.sqrt(n * p * (1 - p))) / n
    min_index = norm.isf(input_conf / 100, loc=n * p, scale=np.sqrt(n * p * (1 - p))) / n
    print(f'{n * min_index * min_index_sales} <= Sales <= {n * max_index * max_index_sales}')
button = widgets.Button(description="Calculation")
button.on_click(calc)
w_n = widgets.IntText(
    value=10000,
    description='Number of samples (PV):',
    disabled=False
)
w_cvr = widgets.BoundedFloatText(
    value=12.4,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w_mu = widgets.FloatText(
    value=18303,
    description='average:',
    disabled=False
)
w_s = widgets.FloatText(
    value=15217,
    description='standard deviation:',
    disabled=False
)
w_conf = widgets.BoundedIntText(
    value=90,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)
output = widgets.Output()
display(w_n, output)
display(w_cvr, output)
display(w_mu, output)
display(w_s, output)
display(w_conf, output)
display(button, output)
In reality, CVR and average purchase price are not necessarily independent. If the CVR falls as the average purchase price rises, I suspect the confidence interval for sales would come out even narrower, but I ran out of steam before examining this in detail...