Environment: Mac, Python 3.5
pip install pymc
mymodel.py
# Import relevant modules
import pymc
import numpy as np
# Some data
n = 5 * np.ones(4, dtype=int)
x = np.array([-.86, -.3, -.05, .73])
# Priors on unknown parameters
alpha = pymc.Normal('alpha', mu=0, tau=.01)
beta = pymc.Normal('beta', mu=0, tau=.01)
# Arbitrary deterministic function of parameters
@pymc.deterministic
def theta(a=alpha, b=beta):
"""theta = logit^{-1}(a+b)"""
return pymc.invlogit(a + b * x)
# Binomial likelihood for data
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0., 1., 3., 5.]),
                  observed=True)
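Note that pymc.Normal is parametrized by the precision tau = 1/sigma**2 rather than the standard deviation, so tau=.01 in the priors above corresponds to sigma = 10, i.e., deliberately broad priors. A quick check (using the np imported above):
python
# tau = 1 / sigma**2  =>  sigma = 1 / sqrt(tau)
print(1 / np.sqrt(0.01))  # 10.0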
main.py
import pymc
import mymodel
S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)
python main.py
[-----------------65%---- ] 6510 of 10000 complete in 0.5 sec
[-----------------100%-----------------] 10000 of 10000 complete in 0.8 sec
Plotting beta
Plotting alpha
Plotting theta_0
Plotting theta_1
Plotting theta_2
Plotting theta_3
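The plots are not the only way to inspect the results; the fitted sampler can also be queried numerically. A minimal sketch, assuming the S object from main.py above (trace() and stats() are PyMC2 methods):
python
# Post-burn-in, thinned samples for alpha as a numpy array
alpha_samples = S.trace('alpha')[:]
print(alpha_samples.mean(), alpha_samples.std())
# stats() returns summary statistics (mean, quantiles, ...) per variable
print(S.stats()['alpha'])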
pip install triangle-plot
pip install corner
jupyter notebook
Start by creating fake data. Use a linear regression model with two predictors (x, y) and one outcome variable (z).
python
from numpy import *
Nobs = 20
x_true = random.uniform(0,10, size=Nobs)
y_true = random.uniform(-1,1, size=Nobs)
alpha_true = 0.5
beta_x_true = 1.0
beta_y_true = 10.0
eps_true = 0.5
z_true = alpha_true + beta_x_true*x_true + beta_y_true*y_true
z_obs = z_true + random.normal(0, eps_true, size=Nobs)
python
%matplotlib inline
from matplotlib import pyplot as plt
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.scatter(x_true, z_obs, c=y_true, marker='o')
plt.colorbar()
plt.xlabel('X')
plt.ylabel('Z')
plt.subplot(1,2,2)
plt.scatter(y_true, z_obs, c=x_true, marker='o')
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('Z')
python
import pymc
# define the parameters with their associated priors
alpha = pymc.Uniform('alpha', -100,100, value=median(z_obs))
betax = pymc.Uniform('betax', -100,100, value=std(z_obs)/std(x_true))
betay = pymc.Uniform('betay', -100,100, value=std(z_obs)/std(y_true))
eps = pymc.Uniform('eps', 0, 100, value=0.01)
# Now define the model
@pymc.deterministic
def model(alpha=alpha, betax=betax, betay=betay, x=x_true, y=y_true):
    return alpha + betax*x + betay*y
# pymc parametrizes the width of the normal distribution by tau=1/sigma**2
@pymc.deterministic
def tau(eps=eps):
    return power(eps, -2)
# Lastly relate the model/parameters to the data
data = pymc.Normal('data', mu=model, tau=tau, value=z_obs, observed=True)
Once the definitions are in place, pass the pymc objects to pymc.MCMC to create the sampler. Here we change the step method with use_step_method to AdaptiveMetropolis, a smarter variant of the default Metropolis-Hastings: it builds a covariance matrix from the previous samples and uses it to explore the parameter space more efficiently.
python
sampler = pymc.MCMC([alpha,betax,betay,eps,model,tau,z_obs,x_true,y_true])
sampler.use_step_method(pymc.AdaptiveMetropolis, [alpha,betax,betay,eps],
                        scales={alpha:0.1, betax:0.1, betay:1.0, eps:0.1})
sampler.sample(iter=10000)
[-----------------100%-----------------] 10000 of 10000 complete
Plot the results:
python
pymc.Matplot.plot(sampler)
Only the plot for alpha is pasted here. For each stochastic variable, pymc plots the trace (upper left panel), the autocorrelation (lower left panel), and the sample histogram (right panel). You can see that not much happens at first; after a while, the adaptive MH sampler finds a better covariance matrix and the chain starts to mix.
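If you want a single panel rather than Matplot's composite figure, a trace can also be plotted directly; a minimal sketch, assuming the sampler and matplotlib imports above:
python
# Hand-rolled version of the trace panel for alpha
plt.figure()
plt.plot(alpha.trace())
plt.xlabel('iteration')
plt.ylabel('alpha')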
Next, let's get the median value of each variable from the samples. The MCMC sampler stores a trace for each variable, which we can access via its trace() method.
python
m_alpha = median(alpha.trace())
m_betax = median(betax.trace())
m_betay = median(betay.trace())
m_eps = median(eps.trace())
The median is more robust to asymmetric distributions (e.g., the distribution of eps). This time we plot z against x with the y term subtracted out, and vice versa, so you can check whether the right trends were recovered. You can also indicate the intrinsic variability; below it is drawn as dashed lines (a shaded-band variant follows the code).
python
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.plot(x_true, z_obs-m_alpha-m_betay*y_true, 'o')
plt.xlabel('X')
plt.ylabel('Z - alpha - beta_y y')
# Now plot the model
xx = array([x_true.min(), x_true.max()])
plt.plot(xx, xx*m_betax)
plt.plot(xx, xx*m_betax + m_eps, '--', color='k')
plt.plot(xx, xx*m_betax - m_eps, '--', color='k')
plt.subplot(1,2,2)
plt.plot(y_true, z_obs-m_alpha-m_betax*x_true, 'o')
plt.xlabel('Y')
plt.ylabel('Z - alpha - beta_x x')
yy = array([y_true.min(), y_true.max()])
plt.plot(yy, yy*m_betay)
plt.plot(yy, yy*m_betay + m_eps, '--', color='k')
plt.plot(yy, yy*m_betay - m_eps, '--', color='k')
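As mentioned above, the intrinsic scatter can also be drawn as a shaded band rather than dashed lines; a minimal sketch using matplotlib's fill_between, which would replace the two dashed plt.plot calls in the left panel:
python
# Shaded +/- eps band around the model line in the left panel
plt.fill_between(xx, xx*m_betax - m_eps, xx*m_betax + m_eps,
                 color='k', alpha=0.2)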
Examine the covariances between parameters with the triangle plot package. It expects the Markov chain as a two-dimensional array indexed by [i, p] (sample i, parameter p), so we stack all the traces into a single array and transpose it.
python
samples = array([alpha.trace(),betax.trace(),betay.trace(),eps.trace()]).T
print(samples.shape)
import triangle
triangle.corner(samples, labels=['alpha','betax','betay','eps'],
                truths=[alpha_true, beta_x_true, beta_y_true, eps_true])
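The triangle package was later renamed corner (which is why both were installed earlier), and the same figure can be produced with it. A minimal sketch:
python
import corner
corner.corner(samples, labels=['alpha','betax','betay','eps'],
              truths=[alpha_true, beta_x_true, beta_y_true, eps_true])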
References:
https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/jupyter/bayesian-logistic-regression/ieegPymc3Version1.ipynb
https://users.obs.carnegiescience.edu/cburns/ipynbs/PyMC.html
https://pymc-devs.github.io/pymc/INSTALL.html#dependencies