Simple posterior simulation with gam fits
Description

GAM coefficients can be simulated directly from the Gaussian approximation to the posterior for the coefficients, or using a simple Metropolis-Hastings sampler. See also ginla.
Usage

gam.mh(b,ns=10000,burn=1000,t.df=40,rw.scale=.25,thin=1)
Arguments

b         a fitted model object produced by gam().
ns        the number of samples to generate.
burn      the length of any initial burn-in period to discard (in addition to ns).
t.df      degrees of freedom for the static multivariate t proposal. Lower for heavier-tailed proposals.
rw.scale  factor by which to scale the posterior covariance matrix when generating random walk proposals. Negative or non-finite to skip the random walk step.
thin      retain only every thin-th sample.
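For example (assuming b is a fitted gam object, as in the examples below), the following call generates 2000 samples after a 1000-sample burn-in and retains every 2nd one:

## illustrative call; `b` is a hypothetical fitted gam object...
br <- gam.mh(b, ns=2000, burn=1000, thin=2, rw.scale=.25)$bs
dim(br) ## 1000 retained draws by length(coef(b)) columns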
Details

Posterior simulation is particularly useful for making inferences about non-linear functions of the model coefficients: simulate random draws from the posterior, compute the function for each draw, and you have draws from the posterior of the function. In many cases the Gaussian approximation to the posterior of the model coefficients is accurate, and samples generated from it can be treated as samples from the posterior for the coefficients. See the example code below. This approach is computationally very efficient.
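For instance, here is a minimal sketch of this recipe, assuming the model b from the first example below (y~s(x0)+s(x1)+s(x2)+s(x3)); the non-linear functional, the x0 location at which the fitted curve peaks, is chosen purely for illustration:

## draws from the Gaussian approximation to the posterior...
br <- rmvn(1000, coef(b), vcov(b))
## evaluate the linear predictor over a grid in x0 (other covariates fixed)...
pd <- data.frame(x0=seq(0,1,length=200), x1=.5, x2=.5, x3=.5)
X <- predict(b, newdata=pd, type="lpmatrix")
ff <- X %*% t(br) ## each column is one posterior draw of the curve
## non-linear functional per draw: location of the curve maximum...
xmax <- pd$x0[apply(ff, 2, which.max)]
quantile(xmax, c(.025,.975)) ## 95% credible interval for the peak location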
In other cases the Gaussian approximation can become poor. A typical example is a spatial model with a log or logit link when there is a large area of observations containing only zeroes: the linear predictor is then poorly identified and the Gaussian approximation can become useless (an example is provided below). In that case it can sometimes be useful to simulate from the posterior using a Metropolis-Hastings sampler. A simple approach alternates fixed proposals, based on the Gaussian approximation to the posterior, with random walk proposals, based on a shrunken version of the approximate posterior covariance matrix. gam.mh implements this. The fixed proposal often promotes rapid mixing, while the random walk component ensures that the chain does not become stuck in regions for which the fixed Gaussian proposal density is much lower than the posterior density.
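The following is an illustrative sketch of such a sampler, not gam.mh's actual implementation (which uses a multivariate t fixed proposal with t.df degrees of freedom; a Gaussian fixed proposal is used here for brevity). lpost is a user-supplied log posterior; mu and V come from the Gaussian approximation:

mh.sketch <- function(lpost, mu, V, ns=1000, rw.scale=.25) {
  R <- chol(V) ## so V = t(R) %*% R
  p <- length(mu)
  ldq <- function(b) { ## fixed proposal log density, up to a constant
    z <- backsolve(R, b - mu, transpose=TRUE)
    -sum(z^2)/2
  }
  beta <- mu; lp <- lpost(beta)
  bs <- matrix(0, ns, p)
  for (i in 1:ns) {
    ## 1. fixed (independence) proposal from N(mu,V): q does not cancel...
    bp <- mu + drop(t(R) %*% rnorm(p))
    lpp <- lpost(bp)
    if (runif(1) < exp(lpp - lp + ldq(beta) - ldq(bp))) { beta <- bp; lp <- lpp }
    ## 2. random walk proposal from N(beta, rw.scale*V): symmetric, q cancels...
    bp <- beta + sqrt(rw.scale) * drop(t(R) %*% rnorm(p))
    lpp <- lpost(bp)
    if (runif(1) < exp(lpp - lp)) { beta <- bp; lp <- lpp }
    bs[i,] <- beta
  }
  bs
}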
The function reports the acceptance rate of the two types of step. If the random walk acceptance probability is higher than about a quarter then rw.scale should probably be increased; if it is too low, rw.scale should be decreased. The random walk steps can be turned off altogether (see above), but it is important to check the chains for stuck sections if this is done.
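For example (the returned entry names accept and rw.accept are assumptions based on the Value description below):

res <- gam.mh(b, ns=2000, rw.scale=.25)
res$rw.accept ## random walk acceptance rate (assumed entry name)
## if well above a quarter, widen the random walk proposal and re-run...
if (res$rw.accept > .25) res <- gam.mh(b, ns=2000, rw.scale=.5)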
Value

A list containing the retained simulated coefficients in matrix bs and two entries for the acceptance probabilities.
Author(s)

Simon N. Wood simon.wood@r-project.org
References

Wood, S.N. (2015) Core Statistics. Cambridge University Press.
Examples

library(mgcv)
set.seed(3);n <- 400

############################################
## First example: simulated Tweedie model...
############################################

dat <- gamSim(1,n=n,dist="poisson",scale=.2)
dat$y <- rTweedie(exp(dat$f),p=1.3,phi=.5) ## Tweedie response
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=tw(),
         data=dat,method="REML")

## simulate directly from Gaussian approximate posterior...
br <- rmvn(1000,coef(b),vcov(b))

## Alternatively use MH sampling...
br <- gam.mh(b,thin=2,ns=2000,rw.scale=.15)$bs
## If 'coda' installed, can check effective sample size
## require(coda);effectiveSize(as.mcmc(br))

## Now compare simulation results and Gaussian approximation for
## smooth term confidence intervals...
x <- seq(0,1,length=100)
pd <- data.frame(x0=x,x1=x,x2=x,x3=x)
X <- predict(b,newdata=pd,type="lpmatrix")
par(mfrow=c(2,2))
for(i in 1:4) {
  plot(b,select=i,scale=0,scheme=1)
  ii <- b$smooth[[i]]$first.para:b$smooth[[i]]$last.para
  ff <- X[,ii]%*%t(br[,ii]) ## posterior curve sample
  fq <- apply(ff,1,quantile,probs=c(.025,.16,.84,.975))
  lines(x,fq[1,],col=2,lty=2);lines(x,fq[4,],col=2,lty=2)
  lines(x,fq[2,],col=2);lines(x,fq[3,],col=2)
}

###############################################################
## Second example, where Gaussian approximation is a failure...
###############################################################

y <- c(rep(0, 89), 1, 0, 1, 0, 0, 1, rep(0, 13), 1, 0, 0, 1,
       rep(0, 10), 1, 0, 0, 1, 1, 0, 1, rep(0,4), 1, rep(0,3),
       1, rep(0, 3), 1, rep(0, 10), 1, rep(0, 4), 1, 0, 1, 0, 0,
       rep(1, 4), 0, rep(1, 5), rep(0, 4), 1, 1, rep(0, 46))
set.seed(3);x <- sort(c(0:10*5,rnorm(length(y)-11)*20+100))
b <- gam(y ~ s(x, k = 15),method = 'REML', family = binomial)
br <- gam.mh(b,thin=2,ns=2000,rw.scale=.4)$bs
X <- model.matrix(b)
par(mfrow=c(1,1))
plot(x, y, col = rgb(0,0,0,0.25), ylim = c(0,1))
ff <- X%*%t(br) ## posterior curve sample
linv <- b$family$linkinv
## Get intervals for the curve on the response scale...
fq <- linv(apply(ff,1,quantile,probs=c(.025,.16,.5,.84,.975)))
lines(x,fq[1,],col=2,lty=2);lines(x,fq[5,],col=2,lty=2)
lines(x,fq[2,],col=2);lines(x,fq[4,],col=2)
lines(x,fq[3,],col=4)
## Compare to the Gaussian posterior approximation
fv <- predict(b,se=TRUE)
lines(x,linv(fv$fit))
lines(x,linv(fv$fit-2*fv$se.fit),lty=3)
lines(x,linv(fv$fit+2*fv$se.fit),lty=3)
## ... Notice the useless 95% CI (black dotted) based on the
## Gaussian approximation!