Markov Chain Monte Carlo for the Hierarchical Binomial Linear Regression Model using the logit link function
MCMChlogit generates a sample from the posterior distribution of a Hierarchical Binomial Linear Regression Model using the logit link function and Algorithm 2 of Chib and Carlin (1999). This model uses a multivariate Normal prior for the fixed effects parameters, an Inverse-Wishart prior on the random effects variance matrix, and an Inverse-Gamma prior on the variance modelling over-dispersion. The user supplies data and priors, and a sample from the posterior distribution is returned as an mcmc object, which can be subsequently analyzed with functions provided in the coda package.
MCMChlogit( fixed, random, group, data, burnin = 5000, mcmc = 10000, thin = 10, verbose = 1, seed = NA, beta.start = NA, sigma2.start = NA, Vb.start = NA, mubeta = 0, Vbeta = 1e+06, r, R, nu = 0.001, delta = 0.001, FixOD = 0, ... )
fixed |
A two-sided linear formula of the form 'y~x1+...+xp' describing the fixed-effects part of the model, with the response on the left of a '~' operator and the p fixed terms, separated by '+' operators, on the right. Response variable y must be 0 or 1 (Binomial process). |
random |
A one-sided formula of the form '~x1+...+xq' specifying the model for the random effects part of the model, with the q random terms, separated by '+' operators. |
group |
String indicating the name of the grouping variable in
|
data |
A data frame containing the variables in the model. |
burnin |
The number of burnin iterations for the sampler. |
mcmc |
The number of Gibbs iterations for the sampler. Total number of
Gibbs iterations is equal to |
thin |
The thinning interval used in the simulation. The number of mcmc iterations must be divisible by this value. |
verbose |
A switch (0,1) which determines whether or not the progress of the sampler is printed to the screen. Default is 1: a progress bar is printed, indicating the step (in %) reached by the Gibbs sampler. |
seed |
The seed for the random number generator. If NA, the Mersenne Twister generator is used with default seed 12345; if an integer is passed it is used to seed the Mersenne twister. |
beta.start |
The starting values for the β vector. This can either be a scalar or a p-length vector. The default value of NA will use the OLS β estimate of the corresponding Gaussian Linear Regression without random effects. If this is a scalar, that value will serve as the starting value mean for all of the betas. |
sigma2.start |
Scalar for the starting value of the residual error variance. The default value of NA will use the OLS estimates of the corresponding Gaussian Linear Regression without random effects. |
Vb.start |
The starting value for variance matrix of the random effects. This must be a square q-dimension matrix. Default value of NA uses an identity matrix. |
mubeta |
The prior mean of β. This can either be a scalar or a p-length vector. If this takes a scalar value, then that value will serve as the prior mean for all of the betas. The default value of 0 will use a vector of zeros for an uninformative prior. |
Vbeta |
The prior variance of β. This can either be a scalar or a square p-dimension matrix. If this takes a scalar value, then that value times an identity matrix serves as the prior variance of beta. Default value of 1.0E6 will use a diagonal matrix with very large variance for an uninformative flat prior. |
r |
The shape parameter for the Inverse-Wishart prior on variance matrix for the random effects. r must be superior or equal to q. Set r=q for an uninformative prior. See the NOTE for more details |
R |
The scale matrix for the Inverse-Wishart prior on variance matrix for the random effects. This must be a square q-dimension matrix. Use plausible variance regarding random effects for the diagonal of R. See the NOTE for more details |
nu |
The shape parameter for the Inverse-Gamma prior on the residual
error variance. Default value is |
delta |
The rate (1/scale) parameter for the Inverse-Gamma prior on the
residual error variance. Default value is |
FixOD |
A switch (0,1) which determines whether or not the variance for
over-dispersion (sigma2) should be fixed (1) or not (0). Default is 0,
parameter sigma2 is estimated. If FixOD=1, sigma2 is fixed to the value
provided for |
... |
further arguments to be passed |
MCMChlogit
simulates from the posterior distribution sample using the
blocked Gibbs sampler of Chib and Carlin (1999), Algorithm 2. The simulation
is done in compiled C++ code to maximize efficiency. Please consult the coda
documentation for a comprehensive list of functions that can be used to
analyze the posterior sample.
The model takes the following form:
y_i \sim \mathcal{B}ernoulli(θ_i)
With latent variables φ(θ_i), φ being the logit link function:
φ(θ_i) = X_i β + W_i b_i + \varepsilon_i
Where each group i have k_i observations.
Where the random effects:
b_i \sim \mathcal{N}_q(0,V_b)
And the over-dispersion terms:
\varepsilon_i \sim \mathcal{N}(0, σ^2 I_{k_i})
We assume standard, conjugate priors:
β \sim \mathcal{N}_p(μ_{β},V_{β})
And:
σ^{2} \sim \mathcal{IG}amma(ν, 1/δ)
And:
V_b \sim \mathcal{IW}ishart(r, rR)
See Chib and Carlin (1999) for more details.
NOTE: We do not provide default parameters for the priors on
the precision matrix for the random effects. When fitting one of
these models, it is of utmost importance to choose a prior that
reflects your prior beliefs about the random effects. Using the
dwish
and rwish
functions might be useful in choosing
these values.
mcmc |
An mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package. The posterior sample of the deviance D, with D=-2\log(∏_i P(y_i|θ_i)), is also provided. |
theta.pred |
Predictive posterior mean for the inverse-logit of the latent variables. The approximation of Diggle et al. (2004) is used to marginalized with respect to over-dispersion terms: E[θ_i|β,b_i,σ^2]=φ^{-1}((X_iβ+W_ib_i)/√{(16√{3}/15π)^2σ^2+1}) |
Ghislain Vieilledent <ghislain.vieilledent@cirad.fr>
Siddhartha Chib and Bradley P. Carlin. 1999. “On MCMC Sampling in Hierarchical Longitudinal Models.” Statistics and Computing. 9: 17-26.
Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.lsa.umich.edu.
Andrew D. Martin and Kyle L. Saunders. 2002. “Bayesian Inference for Political Science Panel Data.” Paper presented at the 2002 Annual Meeting of the American Political Science Association.
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. “Output Analysis and Diagnostics for MCMC (CODA)”, R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.
Diggle P., Heagerty P., Liang K., and Zeger S. 2004. “Analysis of Longitudinal Data.” Oxford University Press, 2sd Edition.
## Not run: #======================================== # Hierarchical Binomial Linear Regression #======================================== #== inv.logit function inv.logit <- function(x, min=0, max=1) { p <- exp(x)/(1+exp(x)) p <- ifelse( is.na(p) & !is.na(x), 1, p ) # fix problems with +Inf return(p*(max-min)+min) } #== Generating data # Constants nobs <- 1000 nspecies <- 20 species <- c(1:nspecies,sample(c(1:nspecies),(nobs-nspecies),replace=TRUE)) # Covariates X1 <- runif(n=nobs,min=-10,max=10) X2 <- runif(n=nobs,min=-10,max=10) X <- cbind(rep(1,nobs),X1,X2) W <- X # Target parameters # beta beta.target <- matrix(c(0.3,0.2,0.1),ncol=1) # Vb Vb.target <- c(0.5,0.05,0.05) # b b.target <- cbind(rnorm(nspecies,mean=0,sd=sqrt(Vb.target[1])), rnorm(nspecies,mean=0,sd=sqrt(Vb.target[2])), rnorm(nspecies,mean=0,sd=sqrt(Vb.target[3]))) # Response theta <- vector() Y <- vector() for (n in 1:nobs) { theta[n] <- inv.logit(X[n,]%*%beta.target+W[n,]%*%b.target[species[n],]) Y[n] <- rbinom(n=1,size=1,prob=theta[n]) } # Data-set Data <- as.data.frame(cbind(Y,theta,X1,X2,species)) plot(Data$X1,Data$theta) #== Call to MCMChlogit model <- MCMChlogit(fixed=Y~X1+X2, random=~X1+X2, group="species", data=Data, burnin=5000, mcmc=1000, thin=1,verbose=1, seed=NA, beta.start=0, sigma2.start=1, Vb.start=1, mubeta=0, Vbeta=1.0E6, r=3, R=diag(c(1,0.1,0.1)), nu=0.001, delta=0.001, FixOD=1) #== MCMC analysis # Graphics pdf("Posteriors-MCMChlogit.pdf") plot(model$mcmc) dev.off() # Summary summary(model$mcmc) # Predictive posterior mean for each observation model$theta.pred # Predicted-Observed plot(Data$theta,model$theta.pred) abline(a=0,b=1) ## #Not run ## #You can also compare with lme4 results ## #== lme4 resolution ## library(lme4) ## model.lme4 <- lmer(Y~X1+X2+(1+X1+X2|species),data=Data,family="binomial") ## summary(model.lme4) ## plot(fitted(model.lme4),model$theta.pred,main="MCMChlogit/lme4") ## abline(a=0,b=1) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.