Environment: Mac, Python 3.5
pip install pymc
mymodel.py
# Import relevant modules
import pymc
import numpy as np
# Some data
n = 5 * np.ones(4, dtype=int)
x = np.array([-.86, -.3, -.05, .73])
# Priors on unknown parameters
alpha = pymc.Normal('alpha', mu=0, tau=.01)
beta = pymc.Normal('beta', mu=0, tau=.01)
# Arbitrary deterministic function of parameters
@pymc.deterministic
def theta(a=alpha, b=beta):
"""theta = logit^{-1}(a+b)"""
return pymc.invlogit(a + b * x)
# Binomial likelihood for data
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0., 1., 3., 5.]),
                  observed=True)
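Note that pymc.Normal is parametrized by the precision tau = 1/sigma**2 rather than the standard deviation, so tau=.01 in the priors above corresponds to sigma = 10, i.e., deliberately broad priors. A quick check (using the np imported above):
python
# tau = 1 / sigma**2  =>  sigma = 1 / sqrt(tau)
print(1 / np.sqrt(0.01))  # 10.0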
main.py
import pymc
import mymodel
S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)
python main.py
[-----------------65%---- ] 6510 of 10000 complete in 0.5 sec
[-----------------100%-----------------] 10000 of 10000 complete in 0.8 sec
Plotting beta
Plotting alpha
Plotting theta_0
Plotting theta_1
Plotting theta_2
Plotting theta_3
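The plots are not the only way to inspect the results; the fitted sampler can also be queried numerically. A minimal sketch, assuming the S object from main.py above (trace() and stats() are PyMC2 methods):
python
# Post-burn-in, thinned samples for alpha as a numpy array
alpha_samples = S.trace('alpha')[:]
print(alpha_samples.mean(), alpha_samples.std())
# stats() returns summary statistics (mean, quantiles, ...) per variable
print(S.stats()['alpha'])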
pip install triangle-plot
pip install corner
jupyter notebook
Start by creating fake data. Use a linear regression model with two predictors (x, y) and one outcome variable (z).
python
from numpy import *
Nobs = 20
x_true = random.uniform(0,10, size=Nobs)
y_true = random.uniform(-1,1, size=Nobs)
alpha_true = 0.5
beta_x_true = 1.0
beta_y_true = 10.0
eps_true = 0.5
z_true = alpha_true + beta_x_true*x_true + beta_y_true*y_true
z_obs = z_true + random.normal(0, eps_true, size=Nobs)
python
%matplotlib inline
from matplotlib import pyplot as plt
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.scatter(x_true, z_obs, c=y_true, marker='o')
plt.colorbar()
plt.xlabel('X')
plt.ylabel('Z')
plt.subplot(1,2,2)
plt.scatter(y_true, z_obs, c=x_true, marker='o')
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('Z')
python
import pymc
# define the parameters with their associated priors
alpha = pymc.Uniform('alpha', -100,100, value=median(z_obs))
betax = pymc.Uniform('betax', -100,100, value=std(z_obs)/std(x_true))
betay = pymc.Uniform('betay', -100,100, value=std(z_obs)/std(y_true))
eps = pymc.Uniform('eps', 0, 100, value=0.01)
# Now define the model
@pymc.deterministic
def model(alpha=alpha, betax=betax, betay=betay, x=x_true, y=y_true):
    return alpha + betax*x + betay*y
# pymc parametrizes the width of the normal distribution by tau=1/sigma**2
@pymc.deterministic
def tau(eps=eps):
    return power(eps, -2)
# Lastly relate the model/parameters to the data
data = pymc.Normal('data', mu=model, tau=tau, value=z_obs, observed=True)
Once the definitions are in place, pass the pymc objects to pymc.MCMC to create the sampler. Here we change the step method with use_step_method to AdaptiveMetropolis, a smarter variant of the default Metropolis-Hastings: it builds a covariance matrix from the previous samples and uses it to explore the parameter space more efficiently.
python
sampler = pymc.MCMC([alpha,betax,betay,eps,model,tau,z_obs,x_true,y_true])
sampler.use_step_method(pymc.AdaptiveMetropolis, [alpha,betax,betay,eps],
                        scales={alpha:0.1, betax:0.1, betay:1.0, eps:0.1})
sampler.sample(iter=10000)
[-----------------100%-----------------] 10000 of 10000 complete
Plot the results:
python
pymc.Matplot.plot(sampler)
Only the plot for alpha is pasted here. For each stochastic variable, pymc plots the trace (upper left panel), the autocorrelation (lower left panel), and the sample histogram (right panel). You can see that not much happens at first; after a while, the adaptive MH sampler finds a better covariance matrix and the chain starts to mix.
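If you want a single panel rather than Matplot's composite figure, a trace can also be plotted directly; a minimal sketch, assuming the sampler and matplotlib imports above:
python
# Hand-rolled version of the trace panel for alpha
plt.figure()
plt.plot(alpha.trace())
plt.xlabel('iteration')
plt.ylabel('alpha')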
Next, let's get the median value of each variable from the samples. The MCMC sampler stores a trace for each variable, which we can access via its trace() method.
python
m_alpha = median(alpha.trace())
m_betax = median(betax.trace())
m_betay = median(betay.trace())
m_eps = median(eps.trace())
The median is more robust to asymmetric distributions (e.g., the distribution of eps). This time we plot z against x with the y term subtracted out, and vice versa, so you can check whether the right trends were recovered. You can also indicate the intrinsic variability; below it is drawn as dashed lines (a shaded-band variant follows the code).
python
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.plot(x_true, z_obs-m_alpha-m_betay*y_true, 'o')
plt.xlabel('X')
plt.ylabel('Z - alpha - beta_y y')
# Now plot the model
xx = array([x_true.min(), x_true.max()])
plt.plot(xx, xx*m_betax)
plt.plot(xx, xx*m_betax + m_eps, '--', color='k')
plt.plot(xx, xx*m_betax - m_eps, '--', color='k')
plt.subplot(1,2,2)
plt.plot(y_true, z_obs-m_alpha-m_betax*x_true, 'o')
plt.xlabel('Y')
plt.ylabel('Z - alpha - beta_x x')
yy = array([y_true.min(), y_true.max()])
plt.plot(yy, yy*m_betay)
plt.plot(yy, yy*m_betay + m_eps, '--', color='k')
plt.plot(yy, yy*m_betay - m_eps, '--', color='k')
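As mentioned above, the intrinsic scatter can also be drawn as a shaded band rather than dashed lines; a minimal sketch using matplotlib's fill_between, which would replace the two dashed plt.plot calls in the left panel:
python
# Shaded +/- eps band around the model line in the left panel
plt.fill_between(xx, xx*m_betax - m_eps, xx*m_betax + m_eps,
                 color='k', alpha=0.2)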
Examine the covariances between parameters with the triangle plot package. It expects the Markov chain as a two-dimensional array indexed by [i, p] (sample i, parameter p), so we stack all the traces into a single array and transpose it.
python
samples = array([alpha.trace(),betax.trace(),betay.trace(),eps.trace()]).T
print(samples.shape)
import triangle
triangle.corner(samples, labels=['alpha','betax','betay','eps'],
                truths=[alpha_true, beta_x_true, beta_y_true, eps_true])
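The triangle package was later renamed corner (which is why both were installed earlier), and the same figure can be produced with it. A minimal sketch:
python
import corner
corner.corner(samples, labels=['alpha','betax','betay','eps'],
              truths=[alpha_true, beta_x_true, beta_y_true, eps_true])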
References:
https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/jupyter/bayesian-logistic-regression/ieegPymc3Version1.ipynb
https://users.obs.carnegiescience.edu/cburns/ipynbs/PyMC.html
https://pymc-devs.github.io/pymc/INSTALL.html#dependencies