Cross-validated variable selection (varsel)
Perform cross-validation for the projective variable selection for a generalized linear model or a generalized linear and additive multilevel model.
cv_varsel(object, ...)

## Default S3 method:
cv_varsel(object, ...)

## S3 method for class 'refmodel'
cv_varsel(
  object,
  method = NULL,
  cv_method = NULL,
  ndraws = NULL,
  nclusters = NULL,
  ndraws_pred = NULL,
  nclusters_pred = NULL,
  cv_search = TRUE,
  nterms_max = NULL,
  intercept = NULL,
  penalty = NULL,
  verbose = TRUE,
  nloo = NULL,
  K = NULL,
  lambda_min_ratio = 1e-05,
  nlambda = 150,
  thresh = 1e-06,
  regul = 1e-04,
  validate_search = TRUE,
  seed = NULL,
  search_terms = NULL,
  ...
)
| Argument | Description |
|---|---|
| object | Same as in varsel. |
| ... | Additional arguments to be passed to the |
| method | Same as in varsel. |
| cv_method | The cross-validation method, either 'LOO' or 'kfold'. Default is 'LOO'. |
| ndraws | Number of posterior draws used for selection. Ignored if nclusters is provided or if method='L1'. |
| nclusters | Number of clusters used for selection. Default is 1; ignored if method='L1' (L1 search always uses one cluster). |
| ndraws_pred | Number of posterior draws used for prediction (after selection). Ignored if nclusters_pred is given. |
| nclusters_pred | Number of clusters used for prediction (after selection). Default is 5. |
| cv_search | Same as in varsel. |
| nterms_max | Same as in varsel. |
| intercept | Same as in varsel. |
| penalty | Same as in varsel. |
| verbose | Whether to print some information during the validation. Default is TRUE. |
| nloo | Number of observations used to compute the LOO validation (anything between 1 and the total number of observations). Smaller values lead to faster computation but higher uncertainty (larger error bars) in the accuracy estimation. Default is to use all observations, but for faster experimentation, one can set this to a small value such as 100. Only applicable if |
| K | Number of folds in the K-fold cross-validation. Default is 5 for genuine reference models and 10 for datafits (that is, for penalized maximum likelihood estimation). |
| lambda_min_ratio | Same as in varsel. |
| nlambda | Same as in varsel. |
| thresh | Same as in varsel. |
| regul | Amount of regularization in the projection. Usually no regularization is needed, but for some models the projection can be ill-behaved, and some regularization must be added to avoid numerical problems. |
| validate_search | Whether to cross-validate the selection process as well, that is, whether to perform the selection separately for each fold. Default is TRUE, and we strongly recommend not setting this to FALSE, because doing so is known to bias the accuracy estimates for the selected submodels. However, setting this to FALSE can sometimes be useful, because comparing the results to the case where this parameter is TRUE gives an idea of how strongly the feature selection is (over)fitted to the data (the difference corresponds to the search degrees of freedom, or the effective number of parameters introduced by the selection process). |
| seed | Random seed used in the subsampled LOO. By default, a fixed seed is used. |
| search_terms | User-defined list of terms to consider for selection. |
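The K argument sets how many folds the data are split into. As a rough illustration of what K-fold cross-validation does, the base R sketch below randomly assigns observations to folds, fits on the training folds only, and scores each held-out fold. The helper `assign_folds()` is hypothetical (not part of projpred), and the plug-in `lm()`/`dnorm()` predictive density is a simplification of the Bayesian predictive density that projpred evaluates.

```r
# Hypothetical helper (not part of projpred): randomly assign n observations
# to K folds of (nearly) equal size.
assign_folds <- function(n, K) {
  # rep_len() cycles through 1..K until length n; sample() permutes the labels
  sample(rep_len(seq_len(K), n))
}

set.seed(1)
n <- 30
d <- 5
x <- matrix(rnorm(n * d), nrow = n)
y <- x[, 1] + 0.5 * rnorm(n)
dat <- data.frame(x, y)

K <- 5
folds <- assign_folds(n, K)
elpd_i <- numeric(n)
for (k in seq_len(K)) {
  test <- folds == k
  # Fit on the K - 1 training folds only
  fit_k <- lm(y ~ X1, data = dat[!test, ])
  mu <- predict(fit_k, newdata = dat[test, ])
  # Plug-in Gaussian log predictive density for the held-out observations
  elpd_i[test] <- dnorm(dat$y[test], mean = mu, sd = summary(fit_k)$sigma,
                        log = TRUE)
}
table(folds)  # 5 folds of 6 observations each
sum(elpd_i)   # K-fold estimate of the total log predictive density
```

Every observation is scored exactly once, by a model that never saw it. With validate_search = TRUE, the selection itself would also be repeated inside each fold, rather than only the final model fit as in this sketch.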
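Setting nloo below the number of observations computes the LOO estimate from a subsample, which is cheaper but noisier. The base R sketch below illustrates that trade-off using simple random subsampling of brute-force pointwise LOO values for a linear model. The helper `subsample_elpd()` and the plug-in predictive density are assumptions for illustration; they are not necessarily the estimator projpred uses internally.

```r
set.seed(2)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- dat$x + 0.5 * rnorm(n)

# Exact pointwise LOO by refitting without observation i (plug-in Gaussian
# predictive density, a simplification of a full Bayesian LOO computation)
loo_i <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = dat[-i, ])
  mu <- as.numeric(predict(fit_i, newdata = dat[i, , drop = FALSE]))
  dnorm(dat$y[i], mean = mu, sd = summary(fit_i)$sigma, log = TRUE)
}, numeric(1))

# Hypothetical helper: estimate the LOO total from a simple random subsample
# of nloo observations, together with the standard error of that estimate
subsample_elpd <- function(pointwise, nloo, seed = 1234) {
  set.seed(seed)
  n <- length(pointwise)
  idx <- sample.int(n, nloo)
  est <- n * mean(pointwise[idx])
  # Finite-population correction: the SE shrinks to 0 when nloo == n
  se <- n * sqrt((1 - nloo / n) * var(pointwise[idx]) / nloo)
  c(estimate = est, se = se)
}

full <- sum(loo_i)
sub20 <- subsample_elpd(loo_i, nloo = 20)
sub_all <- subsample_elpd(loo_i, nloo = n)
rbind(sub20, sub_all)
```

The subsample of 20 costs a fifth of the pointwise evaluations but carries a nonzero standard error, which corresponds to the "larger error bars" mentioned for nloo. Using all observations recovers the full total with zero sampling error.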
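The regul argument adds regularization to the projection. The following is a generic illustration of why a small penalty can help, not projpred's projection code. With perfectly collinear predictors, the least-squares normal equations are singular, but adding `regul` times the identity matrix (assumed here to be an L2, ridge-type penalty) makes them solvable. The helper `ridge_ls()` is hypothetical.

```r
# Hypothetical helper: least squares with an L2 penalty `regul` on all
# coefficients, i.e. solve (X'X + regul * I) beta = X'y.
ridge_ls <- function(X, y, regul) {
  drop(solve(crossprod(X) + regul * diag(ncol(X)), crossprod(X, y)))
}

set.seed(3)
n <- 20
x1 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)
# The last two columns are identical, so X'X is singular
X <- cbind(1, x1, x1)

# Without regularization, solve() typically stops with a singularity error
unreg <- try(solve(crossprod(X), crossprod(X, y)), silent = TRUE)

# A small penalty yields a finite, stable solution that splits the effect of
# x1 equally between the two identical columns
beta <- ridge_ls(X, y, regul = 1e-4)
beta
```

Because regul is small relative to the entries of X'X, the fitted values are practically the same as those of an unpenalized fit on x1 alone. The penalty only resolves the direction in which the data carry no information.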
An object of type vsel that contains information about the feature selection. The fields are not meant to be accessed directly by the user but instead via the helper functions (see the vignettes or type ?projpred to see the main functions in the package).
if (requireNamespace('rstanarm', quietly=TRUE)) {
  ### Usage with stanreg objects
  n <- 30
  d <- 5
  x <- matrix(rnorm(n*d), nrow=n)
  y <- x[,1] + 0.5*rnorm(n)
  data <- data.frame(x,y)
  fit <- rstanarm::stan_glm(y ~ X1 + X2 + X3 + X4 + X5, gaussian(), data=data,
                            chains=2, iter=500)
  cvs <- cv_varsel(fit)
  plot(cvs)
}
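The arguments documented above can be combined to switch to K-fold cross-validation or to diagnose how much the search overfits. The sketch below reuses the simulated data setup from the example and assumes that the projpred and rstanarm packages are installed. Apart from `refresh = 0` (an rstanarm sampling option that only silences progress output), it uses only arguments listed in the usage section. Results depend on the random seed and on MCMC sampling, so no output is shown.

```r
if (requireNamespace("rstanarm", quietly = TRUE) &&
    requireNamespace("projpred", quietly = TRUE)) {
  library(projpred)
  set.seed(1)
  n <- 30
  d <- 5
  x <- matrix(rnorm(n * d), nrow = n)
  y <- x[, 1] + 0.5 * rnorm(n)
  data <- data.frame(x, y)
  fit <- rstanarm::stan_glm(y ~ X1 + X2 + X3 + X4 + X5, gaussian(),
                            data = data, chains = 2, iter = 500, refresh = 0)

  # K-fold CV instead of the default LOO; typically slower, since the
  # reference model has to be refitted for each of the K folds
  cvs_kfold <- cv_varsel(fit, cv_method = "kfold", K = 5)

  # LOO without cross-validating the search (a diagnostic only): comparing
  # this with cv_varsel(fit) indicates how strongly the selection overfits
  cvs_noval <- cv_varsel(fit, validate_search = FALSE)

  plot(cvs_kfold)
  plot(cvs_noval)
}
```

As the validate_search description warns, the second call is meant for comparison only. Its accuracy estimates for the selected submodels are optimistically biased.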