Stochastic search variable selection for quantile regression
This function uses stochastic search to select promising regression models at a fixed quantile τ. Indicator variables γ are used to represent whether a predictor is included in the model or not. The user supplies the data and the prior distribution on the model size. A list is returned containing the posterior sample of γ and the associated regression parameters β.
SSVSquantreg(
  formula,
  data = NULL,
  tau = 0.5,
  include = NULL,
  burnin = 1000,
  mcmc = 10000,
  thin = 1,
  verbose = 0,
  seed = sample(1:1e+06, 1),
  pi0a0 = 1,
  pi0b0 = 1,
  ...
)
formula |
Model formula. |
data |
Data frame. |
tau |
The quantile of interest. Must be between 0 and 1. The default value of 0.5 corresponds to median regression model selection. |
include |
The predictor(s) that should definitely appear in the model. Can be specified by name, or their position in the formula (taking into account the intercept). |
burnin |
The number of burn-in iterations for the sampler. |
mcmc |
The number of MCMC iterations after burnin. |
thin |
The thinning interval used in the simulation. The number of MCMC iterations must be divisible by this value. |
verbose |
A switch which determines whether or not the progress of the
sampler is printed to the screen. If |
seed |
The seed for the random number generator. If NA, the Mersenne
Twister generator is used with default seed 12345; if an integer is passed,
it is used to seed the Mersenne Twister. The default value for this argument
is a random integer between 1 and 1,000,000. This default ensures
that if the function is run again with a different value of
τ, it is extremely unlikely that the seed will be identical.
The user can also pass a list of length two to use the L'Ecuyer random
number generator, which is suitable for parallel computation. The first
element of the list is the L'Ecuyer seed, which is a vector of length six or
NA (if NA a default seed of |
pi0a0, pi0b0 |
Hyperparameters of the beta prior on π_0, the prior probability of including a predictor. Default values of (1,1) are equivalent to a uniform distribution. |
... |
Further arguments |
SSVSquantreg implements stochastic search variable selection over a set of potential predictors to obtain promising models. The models considered take the following form:
Q_{τ}(y_i|x_{iγ}) = x_{iγ} ' β_{γ},
where Q_{τ}(y_i|x_{iγ}) denotes the conditional τth quantile of y_i given x_{iγ}, x_{iγ} denotes x_i with those predictors x_{ij} for which γ_j=0 removed and β_{γ} denotes the model specific regression parameters.
The likelihood is formed based on the assumption of independent asymmetric Laplace distributions on the y_i with skewness parameter τ and location parameters x_{iγ} ' β_{γ}. This assumption ensures that the likelihood function is maximised by the τth conditional quantile of the response variable.
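The link between the asymmetric Laplace likelihood and the τth quantile can be checked numerically. The sketch below assumes a unit scale parameter and an intercept-only model, so that the negative log-likelihood equals the summed check loss up to a constant. The names check_loss and ald_negloglik are illustrative and not part of MCMCpack.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Negative log-likelihood of independent asymmetric Laplace observations
# with unit scale and common location mu, up to an additive constant
ald_negloglik <- function(mu, y, tau) sum(check_loss(y - mu, tau))

set.seed(42)
y <- rexp(501)   # skewed sample; n * tau is not an integer, so the minimiser is unique
tau <- 0.9

# Minimise the negative log-likelihood over the location parameter
fit <- optimize(ald_negloglik, interval = range(y), y = y, tau = tau)

fit$minimum                  # location that maximises the likelihood
quantile(y, tau, type = 1)   # empirical 0.9 quantile, for comparison
```

The two printed values should agree up to the tolerance of optimize, which illustrates why this likelihood targets the τth quantile rather than the mean.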
The prior on each β_j is
(1-γ_j)δ_0 + γ_j Cauchy(0,1),
where δ_0 denotes a degenerate distribution with all mass at 0. A standard Cauchy distribution is chosen conditional on γ_j=1. This allows for a wider range of nonzero values of β_j than a standard Normal distribution, improving the robustness of the method. Each of the indicator variables γ_j is independently assigned a Bernoulli prior, with prior probability of inclusion π_0. This in turn is assigned a beta distribution, resulting in a beta-binomial prior on the model size. The user can supply the hyperparameters for the beta distribution. Starting values are randomly generated from the prior distribution.
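To make the prior hierarchy concrete, the following sketch draws (γ, β) pairs from it in base R. The helper draw_prior is hypothetical and only mirrors the description above; it is not the sampler used inside SSVSquantreg.

```r
# Draw one (gamma, beta) pair from the prior described above.
# draw_prior is a hypothetical helper, not an MCMCpack function.
draw_prior <- function(p, pi0a0 = 1, pi0b0 = 1) {
  pi0   <- rbeta(1, pi0a0, pi0b0)              # prior inclusion probability
  gamma <- rbinom(p, size = 1, prob = pi0)     # Bernoulli inclusion indicators
  beta  <- ifelse(gamma == 1, rcauchy(p), 0)   # standard Cauchy if included, else 0
  list(gamma = gamma, beta = beta)
}

set.seed(1)
n_draws <- 5000
sizes <- replicate(n_draws, sum(draw_prior(p = 10)$gamma))

# Empirical prior distribution of the model size
size_freq <- table(factor(sizes, levels = 0:10)) / n_draws
round(size_freq, 2)
```

With the default hyperparameters (1, 1), the beta-binomial prior puts equal mass on every model size from 0 to 10, so each frequency should be close to 1/11.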
It is recommended to standardise any non-binary predictors in order to
compare these predictors on the same scale. This can be achieved using the
scale function.
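One possible way to standardise only the non-binary predictors is sketched below. The data frame dat and its column names are invented for illustration, and treating any column with more than two distinct values as non-binary is an assumption of this sketch.

```r
# Hypothetical data: two continuous predictors and one binary predictor
set.seed(3)
dat <- data.frame(
  age    = rnorm(50, mean = 40, sd = 10),
  income = rnorm(50, mean = 30000, sd = 8000),
  smoker = rbinom(50, 1, 0.3)
)

# Flag predictors that take more than two distinct values
non_binary <- sapply(dat, function(col) length(unique(col)) > 2)

# Centre and scale those columns only; binary columns are left unchanged
dat[non_binary] <- as.data.frame(scale(dat[non_binary]))

colMeans(dat[non_binary])     # approximately 0
sapply(dat[non_binary], sd)   # 1 for each standardised column
```

The standardised data frame can then be passed through the data argument.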
If the user is certain that one or more predictors should be included and specifies them via include, these predictors are moved to the first positions for computational convenience. The regression parameters associated with these predictors are given independent improper priors. Users may notice a small speed advantage when they specify such predictors, particularly for models with many predictors and many observations.
A list containing:
gamma |
The posterior sample of γ. This has associated summary and plot methods. |
beta |
The posterior sample of the associated regression parameters β. This can be analysed with functions from the coda package. |
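A posterior sample of γ is commonly summarised by marginal inclusion probabilities and by the median probability model, which keeps the predictors included in at least half of the draws. The sketch below uses a simulated 0/1 matrix as a stand-in for such a sample; the matrix layout (draws in rows, predictors in columns) and the predictor names are assumptions, not the structure of the returned object.

```r
# Hypothetical stand-in for a posterior sample of gamma:
# 1000 draws (rows) of inclusion indicators for an intercept and 3 predictors
set.seed(7)
incl_prob <- c("(Intercept)" = 0.2, x1 = 0.95, x2 = 0.1, x3 = 0.7)
gamma_draws <- sapply(incl_prob, function(p) rbinom(1000, 1, p))

# Marginal posterior inclusion probability of each predictor
mip <- colMeans(gamma_draws)

# Median probability model: inclusion probability of at least 0.5
median_model <- names(mip)[mip >= 0.5]
median_model
```

For an actual fit, the summary method for the gamma component is the intended route to these quantities.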
Craig Reed
Craig Reed, David B. Dunson and Keming Yu. 2010. "Bayesian Variable Selection for Quantile Regression" Technical Report.
Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.2. http://scythe.lsa.umich.edu.
Keming Yu and Jin Zhang. 2005. "A Three Parameter Asymmetric Laplace Distribution and its extensions." Communications in Statistics - Theory and Methods, 34, 1867-1879.
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2006. "Output Analysis and Diagnostics for MCMC (CODA)", R News. 6(1): 7-11. https://CRAN.R-project.org/doc/Rnews/Rnews_2006-1.pdf.
## Not run:
set.seed(1)
epsilon <- rnorm(100)
set.seed(2)
x <- matrix(rnorm(1000), 100, 10)
y <- x[, 1] + x[, 10] + epsilon
qrssvs <- SSVSquantreg(y ~ x)
model.50pc <- SSVSquantreg(y ~ x)
model.90pc <- SSVSquantreg(y ~ x, tau = 0.9)
summary(model.50pc) ## Intercept not in median probability model
summary(model.90pc) ## Intercept appears in median probability model
## End(Not run)