Simple statistics for analyzing the effect of measures on an EC site, with code you can use in Jupyter Notebook

Introduction

If you run a typical EC site, you will often check how the CVR changed after taking some measure, or how sales changed after distributing coupons. However, the discussion often ends with "the CVR went up, but isn't that just a coincidence?" or "it's significant! (somehow)". In this article, I explain how to check significance using statistics and provide simple tools you can copy and paste into a Jupyter notebook. The explanation is not very rigorous.

Intended audience

- You don't know much about statistics
- You have used Python and Jupyter Notebook before
- You just want to check significance for now; never mind the theory (there are tools you can copy and use as they are)

Fluctuation in CVR

Mr. A's company sells apples on its website. Mr. A was told by his boss to calculate the CVR. CVR is the number of items sold divided by the number of PVs. That day, 30 apples were sold for 1,000 PV, so the CVR was 3.0%. The next day, 28 were sold for 1,000 PV, so the CVR was 2.8%, and his boss got angry and accused him of lying. CVR fluctuates, so you need to report it as a range that roughly covers it. For now you could say 2.5% to 3.5%, but let's think about it a little more statistically.

The situation in which each visitor to the site makes one of two choices, "buy" or "not buy", is modeled by the Bernoulli distribution, and the total number of buyers by the binomial distribution. In the graph below, the horizontal axis is the number of items sold and the vertical axis is the probability of selling that many. Selling around 30 is the most likely outcome, and selling 20 or fewer looks quite unlikely.

import numpy as np
from scipy.stats import binom, norm
import pandas as pd

p = 0.03  # CVR
N = 1000  # number of PVs
k = np.arange(100)  # possible numbers of items sold
# probability mass function of the binomial distribution
pd.Series(binom.pmf(k, N, p), name='binom').plot(figsize=(12,6), legend=True)

(Figure: probability mass function of the binomial distribution for N=1000, p=3%)

With this, it seems we can pick out the region with large probabilities and report a range that is roughly right. This range is called the **confidence interval**, and the probability that the true value falls inside it is called the **confidence coefficient**. Use the tool below to get a range that is correct with 95% probability. (The number of samples is the number of PVs.)

# Confidence interval for the population ratio (CVR)
import ipywidgets as widgets
import math
import numpy as np
from scipy.stats import binom, norm


def calc(v):
    input_N = w.value        # number of samples (PVs)
    input_cvr = w2.value     # observed CVR (%)
    input_conf = w3.value    # confidence coefficient (%)
    p = input_cvr / 100

    # Using the binomial distribution
    max_index = binom.isf((100 - input_conf) / 100, input_N, p)
    min_index = binom.isf(input_conf / 100, input_N, p)
    # When approximating with a normal distribution
    # max_index = norm.isf((100 - input_conf)/100, loc=input_N*p, scale=np.sqrt(input_N*p*(1-p)))
    # min_index = norm.isf(input_conf/100, loc=input_N*p, scale=np.sqrt(input_N*p*(1-p)))
    print(f'{math.floor(min_index / input_N * 10000) / 100}(%) <= CVR <= {math.ceil(max_index / input_N * 10000) / 100}(%)')


button = widgets.Button(description="Calculation")
button.on_click(calc)

w = widgets.IntText(
    value=1000,
    description='The number of samples:',
    disabled=False
)
w2 = widgets.BoundedFloatText(
    value=1,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w3 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)

output = widgets.Output()
display(w, output)
display(w2, output)
display(w3, output)
display(button, output)

When you run it, it looks like the image below, and you can enter any values you like.

(Figure: the input widgets and the Calculation button)

2.11 (%) <= CVR <= 3.89 (%)! That's quite wide. Lowering the confidence coefficient from 95% to 90% narrows the range. It is like saying "this person is somewhere between their teens and their 50s or older" versus "this person is in their teens or 20s": the broad statement is very likely to be correct but not very informative, while the narrow one is more likely to miss. Increasing the number of samples also makes the range smaller.
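
To see the effect of the sample size directly, here is a quick check of my own (it reuses the same binom.isf call as the tool above, with a 95% confidence coefficient): the width of the interval shrinks roughly in proportion to 1/sqrt(N), so ten times the PV gives an interval about a third as wide.

# Quick check: interval width vs. number of samples (same convention as the tool above)
from scipy.stats import binom

p, conf = 0.03, 95
for N in (1000, 10000):
    upper = binom.isf((100 - conf) / 100, N, p) / N * 100
    lower = binom.isf(conf / 100, N, p) / N * 100
    print(f'N={N}: {lower:.2f}(%) <= CVR <= {upper:.2f}(%)')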

The commented-out part of the source uses the fact that the binomial distribution can be approximated by a [normal distribution](https://ja.wikipedia.org/wiki/%E6%AD%A3%E8%A6%8F%E5%88%86%E5%B8%83) when the number of samples is large enough, in which case both give almost the same result. When I run it, 1,000 samples already seem to be large enough. The approximation is not necessary here, but assuming a normal distribution often makes the statistics easier to handle.
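
As a small sanity check on that claim (this snippet is mine, not part of the tool), the binomial quantile and its normal approximation can be computed side by side for N = 1,000 and p = 3%; the two come out almost identical.

# Compare the binomial quantile with its normal approximation
import numpy as np
from scipy.stats import binom, norm

N, p, alpha = 1000, 0.03, 0.05
print(binom.isf(alpha, N, p))                                      # about 39 items sold
print(norm.isf(alpha, loc=N * p, scale=np.sqrt(N * p * (1 - p))))  # about 38.9 items sold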

Test of CVR difference

The product was not selling well, so I displayed a hypnotic image in a modal, and the CVR went up from 2.0% to 3.0%. Both periods had 1,000 PV. A 1.5x increase feels significant, but let's verify (= test) it statistically.

A test calls the result "significant" when something happened that should rarely happen if there were actually no effect. How rare "rarely" is is called the significance level, and it is decided in advance. Here, with the significance level set to 5%, we can call the increase from 2.0% to 3.0% "significant" if the probability of seeing a difference this large, assuming there is really no difference, is 5% or less.

I will omit the details, but when the number of samples is large enough, the binomial distribution can be approximated by a normal distribution, and the difference of two normal distributions is again normally distributed (reference: [reproductive property of the normal distribution](https://bellcurve.jp/statistics/course/7799.html)), so the difference can be tested with a z statistic. It is better to have about 1,000 samples or more for accurate results.
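
Before turning to the widget tool below, the numbers in this example (2.0% vs. 3.0%, 1,000 PV each) can be checked directly with the same pooled-ratio formula the tool uses; the statistic comes out around 1.4, below the two-sided 5% cutoff of about 1.96, so at 1,000 PV per group this difference is not yet significant.

# z statistic for the 2.0% -> 3.0% example, 1,000 PV in each group
import math
from scipy.stats import norm

N1, p1 = 1000, 0.02
N2, p2 = 1000, 0.03
p = (N1 * p1 + N2 * p2) / (N1 + N2)                         # pooled ratio
z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))
print(z, norm.isf(0.025))                                   # about 1.43 vs. the cutoff 1.96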

# Test of the difference in population ratios
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import math
from scipy.stats import norm

def calc_plot(v):
    input_N1 = w_n1.value
    input_cvr1 = w_cvr1.value
    input_N2 = w_n2.value
    input_cvr2 = w_cvr2.value
    input_conf = w3.value

    p1 = input_cvr1 / 100
    p2 = input_cvr2 / 100
    N1 = input_N1
    N2 = input_N2

    # pooled ratio and z statistic
    p = (N1 * p1 + N2 * p2) / (N1 + N2)
    z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))

    # two-sided rejection region
    min_index = norm.isf(1 - (100 - input_conf)/(2*100), loc=0, scale=1)
    max_index = norm.isf((100 - input_conf)/(2*100), loc=0, scale=1)

    if min_index <= z <= max_index:
        print('Not significant')
        print(f'|z| = {abs(z)} <= {max_index}')
    else:
        print('There is a significant difference!')
        print(f'|z| = {abs(z)} > {max_index}')

    xlimit = np.array([math.ceil(abs(z)), 5]).max()

    x = np.arange(-xlimit * 100, xlimit * 100) / 100
    y = norm.pdf(x)
    plt.figure(figsize=(15, 7))
    plt.plot(x, y, 'b-', lw=1, label='norm pdf')
    plt.vlines([min_index, max_index], y.min(), y.max(), "red", linestyles='dashed', label='rejection')
    plt.vlines([z], y.min(), y.max(), "black", linestyles='dashed', label='statistics')
    plt.legend()
    

button = widgets.Button(description="Calculation")
button.on_click(calc_plot)
w_n1 = widgets.IntText(
    value=10000,
    description='The number of samples:',
    disabled=False
)
w_n2 = widgets.IntText(
    value=12000,
    description='The number of samples:',
    disabled=False
)
w_cvr1 = widgets.BoundedFloatText(
    value=2,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w_cvr2 = widgets.BoundedFloatText(
    value=3,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w3 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',  # = 100 - significance level
    disabled=False
)

w_a = widgets.VBox([widgets.Label('A'), w_n1, w_cvr1])
w_b = widgets.VBox([widgets.Label('B'), w_n2, w_cvr2])
whbox = widgets.HBox([w_a, widgets.Label('  '), w_b])

output = widgets.Output()
display(whbox, output)
display(w3, output)
display(button, output)

When I ran the code above, I got the result shown in the image below.

(Figure: standard normal density with the rejection region (red dashed lines) and the test statistic (black dashed line))

If the black line falls between the two red lines, the difference is within the range of ordinary fluctuation and the result is not significant; if it falls outside, it is significant. After running the A/B test for a while, if the increase from 2.0% to 3.0% still holds at 10,000 PV, we can say there is a significant difference: the fluctuation shrinks as the number of samples grows, so a 1.0% difference can no longer be dismissed as coincidence. Two things to keep in mind: "not significant" does not mean "there is no difference", and a black line far outside the red lines does not mean the result is "more significant"; a test only answers significant or not. With that in mind, I feel the confidence interval carries more information than the result of a test.
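
As a quick cross-check of the 10,000 PV claim (my own calculation with the same pooled formula, not the widget): with 10,000 PV in each group the statistic comes out around 4.5, far outside the 1.96 cutoff.

# The same 2.0% -> 3.0% difference, but with 10,000 PV in each group
import math
N1 = N2 = 10000
p1, p2 = 0.02, 0.03
p = (N1 * p1 + N2 * p2) / (N1 + N2)
z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))
print(z)  # about 4.5 > 1.96, so the difference is significant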

Whether B is larger than A

In the case above we tested whether there was any difference at all, but in practice we usually only want to know whether the CVR increased after taking a measure. In that case, use a **one-sided test**. (The test above is called a two-sided test.) To run a one-sided test, replace calc_plot in the Python code above with the version below.

def calc_plot(v):
    input_N1 = w_n1.value
    input_cvr1 = w_cvr1.value
    input_N2 = w_n2.value
    input_cvr2 = w_cvr2.value
    input_conf = w3.value

    p1 = input_cvr1 / 100
    p2 = input_cvr2 / 100
    N1 = input_N1
    N2 = input_N2

    # pooled ratio and z statistic
    p = (N1 * p1 + N2 * p2) / (N1 + N2)
    z = (p2 - p1) / math.sqrt(p * (1 - p) * (1 / N1 + 1 / N2))
    # one-sided rejection region: only the upper tail
    max_index = norm.isf((100 - input_conf)/100, loc=0, scale=1)

    if z <= max_index:
        print('Not significant')
        print(f'z = {z} <= {max_index}')
    else:
        print('There is a significant difference!')
        print(f'z = {z} > {max_index}')

    xlimit = np.array([math.ceil(abs(z)), 5]).max()

    x = np.arange(-xlimit * 100, xlimit * 100) / 100
    y = norm.pdf(x)
    plt.figure(figsize=(15, 7))
    plt.plot(x, y, 'b-', lw=1, label='norm pdf')
    plt.vlines([max_index], y.min(), y.max(), "red", linestyles='dashed', label='rejection')
    plt.vlines([z], y.min(), y.max(), "black", linestyles='dashed', label='statistics')
    plt.legend()

(Figure: one-sided test with a single rejection boundary (red dashed line) and the test statistic (black dashed line))

In this case, the result is significant if the black line is to the right of the red line. Because we only ask whether B is higher than A, the cutoff is lower than in the two-sided test, and it is easier to reach significance with fewer samples.
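
For reference, the two cutoffs at the 5% significance level can be compared directly (a couple of lines of my own, not part of the tool); the one-sided cutoff of about 1.64 is lower than the two-sided 1.96, which is why the same data reaches significance with fewer samples.

# 5% cutoffs: one-sided vs. two-sided
from scipy.stats import norm
print(norm.isf(0.05))   # one-sided: about 1.64
print(norm.isf(0.025))  # two-sided: about 1.96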

Confidence interval for average purchase price

Even if the CVR goes up, the average purchase price may go down (for example, when a coupon that requires a higher minimum spend is distributed). Let's estimate how much the average purchase price fluctuates. Annual income is generally said to follow a lognormal distribution, and since income and purchase price are positively correlated, the purchase price on an EC site that handles a variety of products is likely to follow a similar distribution. (Reference: Example of lognormal distribution and mean, variance)

# Lognormal distribution of the purchase price
from scipy.stats import lognorm
x = np.arange(1, 100000)
A = 1.5000e+04        # average purchase price
B = 1.5000e+04 ** 2   # variance
# convert the mean and variance into the lognormal parameters
s = np.sqrt(np.log(B/(A ** 2) + 1))
mu = np.log(A) - (s ** 2 / 2)
y = pd.Series(lognorm.pdf(x, s, 0, np.exp(mu)))
y.index = x
y.plot(figsize=(12, 6))

(Figure: probability density of the lognormal purchase-price distribution)

If you draw many samples from this distribution and take their average, the average again follows (approximately) a normal distribution. The code below plots a histogram of 100,000 repetitions of averaging 500 samples. By the central limit theorem, the histogram of the averages looks like this whenever there are enough samples, even if the underlying distribution is not lognormal. How many samples are "enough" depends on the distribution, but around 1,000 seems to be plenty.

# Repeat 100,000 times: draw 500 purchase prices and record their mean
means = []
n = 500
for i in range(0, 100000):
    means.append(lognorm.rvs(s, 0, np.exp(mu), size=n).mean())

pd.Series(means).hist(bins=100)

(Figure: histogram of the 100,000 sample means)

The plot below is called a Q-Q plot; the closer the blue points are to the red straight line, the closer the distribution is to a normal distribution.

import pylab
from scipy import stats
stats.probplot(means, dist="norm", plot=pylab)

(Figure: Q-Q plot of the sample means against the normal distribution)

The code below calculates the confidence interval using the normal distribution. Note that the "number of samples" here is not the number of PVs but the number of purchases.

#Confidence interval for average purchase price
import ipywidgets as widgets
import math
import numpy as np
import pandas as pd
from scipy.stats import binom, norm

def calc(v):
    n = w.value            # number of samples (purchases)
    mu = w2.value          # observed average purchase price
    sigma = w3.value       # standard deviation of the purchase price
    input_conf = w4.value  # confidence coefficient (%)

    # the sample mean is approximately normal with standard error sigma / sqrt(n)
    max_index = norm.isf((100 - input_conf)/100, loc=mu, scale=sigma / np.sqrt(n))
    min_index = norm.isf(input_conf/100, loc=mu, scale=sigma / np.sqrt(n))
    print(f'{min_index} <= Average purchase price <= {max_index}')


button = widgets.Button(description="Calculation")
button.on_click(calc)

w = widgets.IntText(
    value=1000,
    description='The number of samples:',
    disabled=False
)
w2 = widgets.FloatText(
    value=15000,
    description='Average purchase price:',
    disabled=False
)
w3 = widgets.FloatText(
    value=15000,
    description='standard deviation:',
    disabled=False
)
w4 = widgets.BoundedIntText(
    value=95,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)

output = widgets.Output()
display(w, output)
display(w2, output)
display(w3, output)
display(w4, output)
display(button, output)

(Figure: widgets and output of the average purchase price confidence interval)

Confidence interval for sales

Sales = number of PVs x CVR x average purchase price, so we multiply the confidence interval of the CVR by the confidence interval of the average purchase price (and by the number of PVs).

# Confidence interval for sales
import ipywidgets as widgets
import math
import numpy as np
from scipy.stats import binom, norm


def calc(v):
    n = w_n.value              # number of samples (PVs)
    cvr = w_cvr.value          # CVR (%)
    mu = w_mu.value            # average purchase price
    sigma = w_s.value          # standard deviation of the purchase price
    input_conf = w_conf.value  # confidence coefficient (%)
    p = cvr / 100

    # confidence interval for the average purchase price (about n*p purchases)
    min_index_sales = norm.isf(1 - (100 - input_conf)/(2*100), loc=mu, scale=sigma / np.sqrt(n*p))
    max_index_sales = norm.isf((100 - input_conf)/(2*100), loc=mu, scale=sigma / np.sqrt(n*p))
    # confidence interval for the CVR (normal approximation of the binomial distribution)
    max_index = norm.isf((100 - input_conf)/100, loc=n*p, scale=np.sqrt(n*p*(1-p))) / n
    min_index = norm.isf(input_conf/100, loc=n*p, scale=np.sqrt(n*p*(1-p))) / n
    print(f'{n * min_index * min_index_sales} <= Sales <= {n * max_index * max_index_sales}')


button = widgets.Button(description="Calculation")
button.on_click(calc)

w_n = widgets.IntText(
    value=10000,
    description='Number of samples (PV):',
    disabled=False
)
w_cvr = widgets.BoundedFloatText(
    value=12.4,
    min=0,
    max=100.0,
    description='CVR(%):',
    disabled=False
)
w_mu = widgets.FloatText(
    value=18303,
    description='average:',
    disabled=False
)
w_s = widgets.FloatText(
    value=15217,
    description='standard deviation:',
    disabled=False
)
w_conf = widgets.BoundedIntText(
    value=90,
    min=0,
    description='Confidence factor (%):',
    disabled=False
)

output = widgets.Output()
display(w_n, output)
display(w_cvr, output)
display(w_mu, output)
display(w_s, output)
display(w_conf, output)
display(button, output)

(Figure: widgets and output of the sales confidence interval)

In reality, the CVR and the average purchase price are not necessarily independent. If the CVR tends to go down as the average purchase price goes up, I suspect the actual confidence interval for sales is even narrower, but I ran out of energy before examining that in detail...
