"Bayesian Inference Experienced in Python"
The problem: using Bayesian inference, decide which site is better, Site A or Site B.
Let $p_A$ be the probability that a user who sees Site A eventually takes a profitable action, such as requesting materials or making a purchase (this is called a conversion). Suppose $N$ people view Site A and $n$ of them convert. Then you might think that
$$p_A = \frac{n}{N},$$
but that is not correct. We do not know whether $p_A$ and $n/N$ are equal. There is a difference between the observed frequency and the true frequency, and it is the true frequency that can be interpreted as the probability that the event occurs. In other words, just because users saw Site A, we cannot tell the true probability that viewing it leads to a conversion.
For example, with dice: the true frequency of rolling a 1 with a fair die is $1/6$, but the frequency observed over a finite number of rolls will generally differ from it.
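As a quick illustration, here is a minimal NumPy sketch of that gap (the 100-roll sample size is an arbitrary choice):

import numpy as np

# Simulate 100 rolls of a fair die with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100)  # integers 1..6 inclusive
print("observed frequency:", np.mean(rolls == 1))
print("true frequency:    ", 1 / 6)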
First, we have to choose a prior distribution for the unknown $p_A$. What should we believe about $p_A$ before seeing any data? Since we are not sure yet, let's assume that $p_A$ follows a uniform distribution on $[0, 1]$.
We give the variable `p` a uniform distribution from 0 to 1 with `pm.Uniform`.
import pymc3 as pm

# The parameters are the bounds of the Uniform.
with pm.Model() as model:
    p = pm.Uniform('p', lower=0, upper=1)
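As an optional sanity check (assuming PyMC3 3.5+, where `pm.sample_prior_predictive` is available), we can draw samples from this prior before seeing any data and confirm it is flat:

# Draw samples from the prior; a Uniform(0, 1) prior should have mean ~0.5.
with model:
    prior_samples = pm.sample_prior_predictive(samples=1000)
print(prior_samples["p"].mean())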
For example, let $p_A = 0.05$ and $N = 1500$, and let's simulate how many users converted (purchased or requested materials after viewing the site) out of those who viewed Site A. In reality we don't know $p_A$, but we pretend to know it for now and use it to generate observation data.
Since there are $N$ people, we use the [Bernoulli distribution](https://ja.wikipedia.org/wiki/%E3%83%99%E3%83%AB%E3%83%8C%E3%83%BC%E3%82%A4%E5%88%86%E5%B8%83) to simulate the $N$ trials. Since the Bernoulli distribution is a probability distribution over binary variables (variables that take either 0 or 1), it is a natural choice for modeling whether each user converts or not.
$$X \sim \text{Ber}(p)$$
Here, $X$ takes the value $1$ with probability $p$ and $0$ with probability $1-p$.
The Python script that simulates this is:
import scipy.stats as stats
import numpy as np
#set constants
p_true = 0.05 # remember, this is unknown.
N = 1500
# sample N Bernoulli random variables from Ber(0.05).
# each random variable has a 0.05 chance of being a 1.
# this is the data-generation step
occurrences = stats.bernoulli.rvs(p_true, size=N)
print(occurrences) # Remember: Python treats True == 1, and False == 0
print(np.sum(occurrences))
#[0 0 0 ... 0 0 0]
#67
# occurrences.mean() is equal to n/N.
print("What is the observed frequency in Group A? %.4f" % np.mean(occurrences))
print("Does this equal the true frequency? %s" % (np.mean(occurrences) == p_true))
#What is the observed frequency in Group A? 0.0447
#Does this equal the true frequency? False
Now that we have the observation data (`occurrences`), we pass it to PyMC3 as the observed value of the `obs` variable, run the inference algorithm, and plot the posterior distribution of the unknown $p_A$.
# include the observations, which are Bernoulli
with model:
    obs = pm.Bernoulli("obs", p, observed=occurrences)
    # To be explained in chapter 3
    step = pm.Metropolis()
    trace = pm.sample(18000, step=step, cores=1)
# Discard the first 1000 samples as burn-in.
burned_trace = trace[1000:]
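Before plotting, a quick numerical sanity check never hurts (the exact value will vary from run to run):

# The posterior mean of p should land near the observed frequency n/N.
print(burned_trace["p"].mean())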
import matplotlib.pyplot as plt

plt.title("Posterior distribution of $p_A$, the true effectiveness of site A")
plt.vlines(p_true, 0, 90, linestyle="--", color="r", label="true $p_A$ (unknown)")
plt.hist(burned_trace["p"], bins=25, histtype="bar", density=True)
plt.legend()
plt.show()
The resulting graph looks like this.
While the prior was flat over $[0, 1]$, the posterior shown in the graph concentrates around the true value. If you repeat the simulation, you will get a slightly different graph each time, because the generated data change.
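As a cross-check that is not part of the original workflow: with a Uniform prior and Bernoulli observations, the posterior is known in closed form as a Beta distribution, so we can compare the MCMC result against the exact answer (using the variables from the script above):

# With a Uniform(0, 1) prior and n conversions out of N Bernoulli trials,
# the exact posterior is Beta(1 + n, 1 + N - n).
n = occurrences.sum()
exact_posterior = stats.beta(1 + n, 1 + N - n)
print("exact posterior mean:", exact_posterior.mean())
print("MCMC posterior mean: ", burned_trace["p"].mean())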
If Site B is analyzed in the same way, the posterior distribution of $p_B$ can be obtained. So let's also compute $\text{delta} = p_A - p_B$.
As with $p_A$, the true value of $p_B$ is unknown. Let's assume that $p_B = 0.04$; then $\text{delta} = 0.01$. We compute $p_B$ and $\text{delta}$ in the same way as $p_A$. Here is the whole calculation again from the start.
# -*- coding: utf-8 -*-
import pymc3 as pm
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
#these two quantities are unknown to us.
true_p_A = 0.05
true_p_B = 0.04
#notice the unequal sample sizes -- no problem in Bayesian analysis.
N_A = 1500
N_B = 750
#generate some observations
observations_A = stats.bernoulli.rvs(true_p_A, size=N_A)
observations_B = stats.bernoulli.rvs(true_p_B, size=N_B)
print("Obs from Site A: ", observations_A[:30], "...")
print("Obs from Site B: ", observations_B[:30], "...")
#Obs from Site A: [0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] ...
#Obs from Site B: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] ...
print(np.mean(observations_A))
#0.04666666666666667
print(np.mean(observations_B))
#0.03866666666666667
# Set up the pymc3 model. Again assume Uniform priors for p_A and p_B.
with pm.Model() as model:
p_A = pm.Uniform("p_A", 0, 1)
p_B = pm.Uniform("p_B", 0, 1)
# Define the deterministic delta function. This is our unknown of interest.
delta = pm.Deterministic("delta", p_A - p_B)
# Set of observations, in this case we have two observation datasets.
obs_A = pm.Bernoulli("obs_A", p_A, observed=observations_A)
obs_B = pm.Bernoulli("obs_B", p_B, observed=observations_B)
# To be explained in chapter 3.
step = pm.Metropolis()
trace = pm.sample(20000, step=step,cores=1)
burned_trace=trace[1000:]
p_A_samples = burned_trace["p_A"]
p_B_samples = burned_trace["p_B"]
delta_samples = burned_trace["delta"]
#histogram of posteriors
ax = plt.subplot(311)
plt.xlim(0, .1)
plt.hist(p_A_samples, histtype='stepfilled', bins=25, alpha=0.85,
label="posterior of $p_A$", color="#A60628", density=True)
plt.vlines(true_p_A, 0, 80, linestyle="--", label="true $p_A$ (unknown)")
plt.legend(loc="upper right")
plt.title("Posterior distributions of $p_A$, $p_B$, and delta unknowns")
ax = plt.subplot(312)
plt.xlim(0, .1)
plt.hist(p_B_samples, histtype='stepfilled', bins=25, alpha=0.85,
label="posterior of $p_B$", color="#467821", density=True)
plt.vlines(true_p_B, 0, 80, linestyle="--", label="true $p_B$ (unknown)")
plt.legend(loc="upper right")
ax = plt.subplot(313)
plt.hist(delta_samples, histtype='stepfilled', bins=30, alpha=0.85,
label="posterior of delta", color="#7A68A6", density=True)
plt.vlines(true_p_A - true_p_B, 0, 60, linestyle="--",
label="true delta (unknown)")
plt.vlines(0, 0, 60, color="black", alpha=0.2)
plt.legend(loc="upper right");
plt.show()
# Count the number of samples less than 0, i.e. the area under the curve
# before 0, which represents the probability that site A is worse than site B.
print("Probability site A is WORSE than site B: %.3f" % \
np.mean(delta_samples < 0))
print("Probability site A is BETTER than site B: %.3f" % \
np.mean(delta_samples > 0))
#Probability site A is WORSE than site B: 0.201
#Probability site A is BETTER than site B: 0.799
The resulting graph looks like this.
You can see that the posterior distribution of $p_B$ is slightly wider at the base. This is because Site B has less data: with fewer observations, we are less confident in the value of $p_B$.
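We can quantify that spread directly from the samples (a small addition, not in the original script):

# The posterior standard deviation is larger for p_B because Site B
# has half as many observations as Site A.
print("posterior std of p_A: %.4f" % p_A_samples.std())
print("posterior std of p_B: %.4f" % p_B_samples.std())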
The third graph shows that most of the posterior mass of $\text{delta} = p_A - p_B$ lies above 0. This means that Site A most likely converts users better than Site B (with probability about 0.8, per the output above).
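The samples also give interval estimates for free; for example, a 95% credible interval for delta (again a small addition, using the variables from the script above):

# Central 95% credible interval from the posterior samples of delta.
lower, upper = np.percentile(delta_samples, [2.5, 97.5])
print("95%% credible interval for delta: [%.4f, %.4f]" % (lower, upper))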
You can get a better feel for this by changing the values of $p_A$ and $p_B$ and the amounts of data.
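For such experiments, it may help to wrap the whole pipeline in a small helper. The function below is a hypothetical convenience wrapper (the name and parameters are illustrative, not from the original post):

def simulate_ab(true_p_a, true_p_b, n_a, n_b, draws=20000, burn=1000):
    """Rerun the A/B analysis for the given true rates and sample sizes."""
    obs_a = stats.bernoulli.rvs(true_p_a, size=n_a)
    obs_b = stats.bernoulli.rvs(true_p_b, size=n_b)
    with pm.Model():
        p_a = pm.Uniform("p_A", 0, 1)
        p_b = pm.Uniform("p_B", 0, 1)
        pm.Deterministic("delta", p_a - p_b)
        pm.Bernoulli("obs_A", p_a, observed=obs_a)
        pm.Bernoulli("obs_B", p_b, observed=obs_b)
        trace = pm.sample(draws, step=pm.Metropolis(), cores=1)
    return trace[burn:]

# For example, collecting five times as much data for Site B should
# visibly narrow its posterior and sharpen the delta estimate:
t = simulate_ab(0.05, 0.04, 1500, 3750)
print("P(site A better):", np.mean(t["delta"] > 0))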