Sample Adaptive MCMC

1. Overview

I came across a new MCMC method called "Sample Adaptive MCMC $^{[1]}$" in the Proceedings of NeurIPS 2019, so I experimented with it. It is already implemented in NumPyro; if you are interested, please take a look at the paper. Tuning the proposal distribution is important in MCMC, and this method adaptively adjusts the parameters (mean and covariance matrix) of a parametric proposal distribution (e.g. a normal distribution) based on the sample values. It also tends to yield a large effective sample size.

[1] Michael Zhu, Sample Adaptive MCMC (https://papers.nips.cc/paper/9107-sample-adaptive-mcmc)
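Since it is already implemented in NumPyro, a minimal usage sketch looks roughly like the following; the toy model and its numbers are made up purely for illustration, and `numpyro.infer.SA` is the kernel that corresponds to this algorithm.

# Minimal NumPyro sketch (illustrative toy model)
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, SA

def model(y):
    # toy normal location model, just to exercise the kernel
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    numpyro.sample("obs", dist.Normal(mu, 1.0), obs=y)

y = jnp.array([0.3, -0.1, 0.8, 0.2])
kernel = SA(model)                                   # Sample Adaptive kernel
mcmc = MCMC(kernel, num_warmup=2000, num_samples=10000)
mcmc.run(random.PRNGKey(0), y=y)
mcmc.print_summary()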
2. Algorithm

The outline of the algorithm is as follows: compute the mean $μ$ and covariance matrix $Σ$ from $N$ sample values, draw a new sample value $θ_{N+1}$ from the parametric proposal distribution that uses this $μ$ and $Σ$ as its parameters, probabilistically discard one of the $N+1$ sample values, and recompute the mean $μ$ and covariance matrix $Σ$ of the newly formed set of $N$ sample values. This procedure is repeated, and the union of the finally obtained sample sets is an approximation of the target distribution. The algorithm is given below. Here the sampling target $θ$ is a multidimensional vector.

__*Input*__: target distribution $p(θ)$, initial proposal distribution $q_0(\cdot)$, parametric proposal distribution $q(\cdot|μ, Σ)$, number of particles $N$, number of samples $K$, burn-in length $κ$

Step.1 Draw $N$ sample values $S_0 = \{θ_1, θ_2, \dots, θ_N\}$ from the initial proposal distribution $q_0(\cdot)$. Set $k = 1$.

Step.2
- Set $S = S_{k-1}$ and compute the mean $μ$ and covariance matrix $Σ$ from the sample values $S$.
- Draw a new sample value $θ_{N+1}$ from the parametric proposal distribution $q(\cdot|μ, Σ)$ and add it to $S$.
- For each $n$, form the set $S_{-n}$ obtained by removing the $n$-th sample value from $S$, and compute its mean $μ_{-n}$ and covariance matrix $Σ_{-n}$; this gives $N+1$ such sets.
- Compute $λ_n = q(θ_n|μ_{-n}, Σ_{-n}) / p(θ_n)$ for all $n (= 1, \dots, N+1)$ and normalize so that the values sum to $1$.
- Draw the index $n$ with $z_n = 1$ from the categorical distribution in which $λ_n$ is the probability of $z_n = 1$, and set $S_k = S_{-n}$.

Step.3 Repeat __*Step.2*__ until the number of iterations reaches the number of samples $K$.

__*Output*__: the union of the elements $θ$ of $S_k (k = κ+1, \dots, K)$
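As a direct (non-optimized) translation of Step.2, one update could be written roughly as follows. Here `log_p` stands for the log density of the target distribution, and the leave-one-out means and covariances are recomputed naively, whereas the implementation in the next section updates the mean incrementally.

import numpy as np

def gauss_logpdf(mu, cov, theta):
    # log N(theta | mu, cov) up to an additive constant (the constant cancels after normalization)
    return -0.5 * np.linalg.slogdet(cov)[1] - 0.5 * (theta - mu) @ np.linalg.solve(cov, theta - mu)

def sa_step(S, log_p, rng):
    """One Step.2 update. S: (N, d) array of particles, log_p: log density of the target."""
    N, d = S.shape
    mu, cov = S.mean(axis=0), np.cov(S.T)
    L = np.linalg.cholesky(cov)
    theta_new = mu + L @ rng.standard_normal(d)        # draw theta_{N+1} from q(.|mu, cov)
    S_ext = np.vstack([S, theta_new])                  # the N+1 candidates

    log_lam = np.empty(N + 1)
    for n in range(N + 1):
        S_minus = np.delete(S_ext, n, axis=0)          # leave the n-th sample out
        log_lam[n] = gauss_logpdf(S_minus.mean(axis=0), np.cov(S_minus.T), S_ext[n]) - log_p(S_ext[n])

    lam = np.exp(log_lam - log_lam.max())
    lam /= lam.sum()                                   # normalize so the lambdas sum to 1
    drop = rng.choice(N + 1, p=lam)                    # categorical draw: index to discard
    return np.delete(S_ext, drop, axis=0)              # the new set S_k

Repeating `S = sa_step(S, log_p, rng)` for $K$ iterations and collecting the sets obtained after burn-in gives the output described in Step.3.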

The figure below is a schematic diagram of the algorithm for $N = 3$, taken from the paper.

If $θ$ is multidimensional, using the full covariance matrix makes the computation expensive. Speed can therefore be gained by approximating the off-diagonal components of the covariance matrix as $0$; a small sketch of this is shown below. It is also possible to use a Gaussian mixture as the parametric proposal distribution.
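For example, the diagonal approximation amounts to keeping only a vector of per-dimension variances. The following is a minimal sketch of that idea (not part of the implementation below):

import numpy as np

# Diagonal approximation: only per-dimension variances are kept, so evaluating the
# proposal density is O(M) per sample instead of involving an MxM covariance matrix.
def proposal_logpdf_diag(mu, var, theta):
    return np.sum(-0.5 * np.log(var) - 0.5 * (theta - mu) ** 2 / var)  # up to a constant

def sampling_diag(mu, var, rng):
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

# mu and var would be estimated from the current particle set S of shape (N, d):
# mu, var = S.mean(axis=0), S.var(axis=0, ddof=1)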

3. Implementation

I implemented it in Python to aid understanding. Here a linear model is assumed, so the design matrix, the observed values, and the measurement precision are also given as inputs.
# SA-MCMC
import numpy as np
from numba import jit

# Log probability density of the unnormalized target distribution p
@jit("f8(f8[:,:],f8[:],f8[:],f8)", nopython=True)
def target(X, y, w, ram):
    mu_w0 = 0.0
    ram_w0 =1.0
    logp1 = np.sum(-0.5 * ram_w0 * (w - mu_w0) ** 2)# logP(w)+C
    logp2 = np.sum(-0.5 * ram * (y - X @ w) ** 2)   # logP(y|X, w, ram)+C
    logp = logp1 + logp2
    return logp

# Log probability density of the unnormalized proposal distribution q
@jit("f8(f8[:],f8[:,:],f8[:])", nopython=True)
def proposal(mu, cov, sita):
    A = -0.5 * np.linalg.slogdet(cov)[1]
    B = -0.5 * (sita - mu).T @ np.linalg.solve(cov, (sita - mu))
    return A+B

# Sampling from the proposal distribution q
@jit("f8[:](f8[:],f8[:,:],i8)",nopython=True)
def sampling(mu, cov, w_dim):
    L = np.linalg.cholesky(cov)
    t = np.random.randn(w_dim)
    return L @ t + mu

#Sequential update of average value
@jit("f8[:](f8[:],f8[:],f8[:],i8)",nopython=True)
def update_mean(mean, new_s, old_s, N):
    mu = mean + 1/N * (new_s - old_s)
    return mu

@jit("f8[:,:](f8[:,:],f8[:],f8,i8,i8,i8,f8[:])",nopython=True)
def SA_MCMC(X, y, ram, N, K, paradim, U):
    """
    X      : design matrix
    y      : observed values
    ram    : measurement precision
    N      : number of particles
    K      : number of samples
    paradim: number of estimated parameters
    U      : uniform random numbers in (0, 1); passed as an argument because of numba restrictions
    """
    # Step.1:Initialization
    #Variable definition
    np.random.seed(8)
    chain = np.zeros((K, paradim))
    S = np.random.randn(N+1, paradim)
    S_mean = np.zeros(paradim)    
    S_replace = np.zeros((N+1, paradim))
    q = np.zeros(N+1)                     
    r = np.zeros(N+1)
    
    # Initial values of the mean, the covariance matrix, and the log target density
    for d in range(paradim):
        S_mean[d] = S[:-1, d].mean()
    S_cov = np.cov(S[:-1, :].T)
    for n in range(N):
        q[n] = target(X, y, S[n, :], ram)

    # Step.2:sampling
    cnt = 0
    for i in range(K):
        s = sampling(S_mean, S_cov, paradim)
        q[-1] = target(X, y, s, ram)
        S[-1, :] = s  # keep the new sample in the last row so the n = N case below uses the correct value
        for n in range(N+1):
            S_replace[:, :] = S[:, :]
            S_replace[n, :] = s
            S_mean_replace = update_mean(S_mean, s, S[n,:], N)
            S_cov_replace = np.cov(S_replace[:-1, :].T)
            p = proposal(S_mean_replace, S_cov_replace, sita=S[n, :])
            r[n] = p - q[n]
        rmax = r.max()
        r = np.exp(r - rmax)
        r = r / np.sum(r)
        
        # Discard step: draw the index J of the sample to discard from the categorical distribution;
        # the new sample is adopted unless it is the one discarded (J == N)
        c = 0
        u = U[i]
        for j in range(N+1):
            c += r[j]
            if c > u:
                J = j
                break
        S_mean = update_mean(S_mean, s, S[J,:], N)
        S[J, :] = s
        S_cov = np.cov(S[:-1, :].T)
        q[J] = q[-1]
        if J != N:  # the new sample was kept
            chain[cnt] = s
            cnt += 1

    # Step.3: equivalent to taking the union of the sample sets
    chain = chain[:cnt, :]
    return chain
4. Experiment

This experiment targets Bayesian linear regression.

(a) Creating the experimental data
# make data
def makedata(n, w_dim, vmin, vmax, seed, ram=0.0, Train=False):
    np.random.seed(seed)
    x = np.linspace(vmin, vmax, n)
    w = np.random.uniform(-1, 1, w_dim) 
    X = np.zeros((n, w_dim))
    for d in range(w_dim):
        X[:, d] = x ** d 
    y = X @ w
    if Train:
        y += np.random.randn(n) / ram ** 0.5
    return x, w, X, y

w_dim = 4
vmin = -2
vmax = 2
seed = 29
ram = 5
x_true, w_true, X_true, y_true = makedata(2000, w_dim, vmin-2, vmax+2, seed, Train=False)
x_train, _, X_train, y_train = makedata(30, w_dim, vmin, vmax, seed, ram, Train=True)
print(w_true)
# [ 0.72751997 -0.43018807 -0.85348722  0.52647441]

(Figure: plot of the generated data.)

(b) SA-MCMC

%%time
sample_n = 10000
U1 = np.random.uniform(size=sample_n)
chain1 = SA_MCMC(X_train, y_train, ram, 100, sample_n, w_dim, U1)
U2 = np.random.uniform(size=sample_n)
chain2 = SA_MCMC(X_train, y_train, ram, 100, sample_n, w_dim, U2)
U3 = np.random.uniform(size=sample_n)
chain3 = SA_MCMC(X_train, y_train, ram, 100, sample_n, w_dim, U3)
# Wall time: 19.2 s

The parameter estimation results are as follows (burn-in: 2000). The effective sample size (ESS) is very large. The target density is evaluated only once per sampling step, but with $N$ particles and a parameter dimension of $M$, each step requires an additional $O(NM^2)$ of computation beyond the likelihood evaluation. Setting the off-diagonal components of the covariance matrix to $0$ reduces this additional cost to $O(NM)$, but in that case the performance seems to drop considerably.
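For reference, here is a sketch of how the summary statistics and ESS can be computed from the three chains with ArviZ (assuming ArviZ is installed, and using the burn-in of 2000 mentioned above):

import numpy as np
import arviz as az

burnin = 2000
# The chains may have slightly different lengths (only accepted samples are stored),
# so truncate to the shortest one before stacking into (chain, draw, parameter).
n = min(len(chain1), len(chain2), len(chain3))
posterior = np.stack([chain1[:n], chain2[:n], chain3[:n]])[:, burnin:, :]
print(az.summary(posterior))  # mean, sd, ess_bulk, ess_tail, r_hat per parameter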

(Figure: parameter estimation results.)

5. Summary

I implemented Sample Adaptive MCMC. How was it?
