projpred: suggest_size.vsel – R documentation


suggest_size.vsel

Suggest model size

Description

This function suggests an appropriate model size based on a default decision rule. The decision rules are heuristic and should be interpreted as guidelines. The user should study the results via varsel_plot and/or summary and make the final decision based on what is most appropriate for the given problem.

Usage

suggest_size(object, ...)

## S3 method for class 'vsel'
suggest_size(
  object,
  stat = "elpd",
  alpha = 0.32,
  pct = 0,
  type = "upper",
  baseline = NULL,
  warnings = TRUE,
  ...
)

Arguments

`object`	The object returned by varsel or cv_varsel.
`...`	Currently ignored.
`stat`	Statistic used for the decision. Default is 'elpd'. See `summary` for other possible choices.
`alpha`	A number indicating the significance level of the credible intervals on which the decision is based; the intervals contain probability mass 1-alpha. For example, `alpha=0.32` corresponds to 68% probability mass within the intervals (approximately one-standard-error intervals). See details for more information.
`pct`	Number indicating the proportion of the utility difference between the baseline model and the null model that one is willing to sacrifice. See details for more information.
`type`	Either 'upper' (default) or 'lower' determining whether the decisions are based on the upper or lower credible bounds. See details for more information.
`baseline`	Either 'ref' or 'best' indicating whether the baseline is the reference model or the best submodel found. Default is 'ref' when the reference model exists, and 'best' otherwise.
`warnings`	Whether to give warnings if the automatic suggestion fails; mainly for internal use. Default is TRUE, and there is usually no reason to set it to FALSE.
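To make the role of `alpha` concrete, the following base R snippet computes the interval coverage, the probability mass in each tail, and the corresponding interval half-width in standard errors. It assumes a normal approximation to the uncertainty of the statistic; that assumption and the variable names are part of this illustration only, not taken from projpred.

```r
# Illustration only: ASSUMES a normal approximation, so the credible interval
# is the estimate +/- z standard errors.
alpha <- 0.32
coverage  <- 1 - alpha             # probability mass inside the interval
tail_mass <- alpha / 2             # probability mass in each tail
z <- qnorm(1 - alpha / 2)          # interval half-width in standard errors
round(c(coverage = coverage, tail_mass = tail_mass, z = z), 3)
```

With the default `alpha=0.32`, `z` is about 0.994, which is why these intervals are described as roughly one-standard-error intervals.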

Details

The suggested model size is the smallest model for which either the lower or upper (depending on argument type) credible bound of the submodel utility u_k with significance level alpha falls above

u_base - pct*(u_base - u_0)

Here u_base denotes the utility for the baseline model and u_0 the null model utility. The baseline is either the reference model or the best submodel found (see argument baseline). The lower and upper bounds are defined to contain the submodel utility with probability 1-alpha (each tail has mass alpha/2).
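The decision rule can be sketched in base R as follows. This is not the projpred implementation: the function `suggest_size_sketch` and its inputs (vectors `u` and `se` of utility estimates and standard errors, one entry per model size starting at size 0) are hypothetical, and the credible bounds are assumed to come from a normal approximation u_k +/- z*se_k.

```r
# Minimal sketch of the rule in Details; NOT the projpred implementation.
# ASSUMPTIONS: u[i] and se[i] are the utility estimate and its standard error
# for model size i - 1 (so u[1] is the size-0 model), and the credible bounds
# are normal approximations u +/- z * se.
suggest_size_sketch <- function(u, se, u_base, u_0, alpha = 0.32, pct = 0,
                                type = c("upper", "lower")) {
  type <- match.arg(type)
  z <- qnorm(1 - alpha / 2)                  # each tail has mass alpha / 2
  bound <- if (type == "upper") u + z * se else u - z * se
  cutoff <- u_base - pct * (u_base - u_0)    # level the bound must exceed
  ok <- which(bound > cutoff)
  if (length(ok) == 0) return(NA_integer_)   # no size satisfies the rule
  ok[1] - 1L                                 # convert index to model size
}

# Made-up elpd estimates and standard errors for sizes 0 to 4.
u  <- c(-60, -45, -41, -40.5, -40.2)
se <- c(  5,   4,   3,   3,    3)
size_upper <- suggest_size_sketch(u, se, u_base = -40, u_0 = -60)
size_lower <- suggest_size_sketch(u, se, u_base = -40, u_0 = -60,
                                  type = "lower")
size_lower_pct <- suggest_size_sketch(u, se, u_base = -40, u_0 = -60,
                                      pct = 0.25, type = "lower")
```

With these made-up numbers, the default upper-bound rule picks size 2, the stricter lower-bound rule finds no qualifying size when `pct = 0`, and `pct = 0.25` lowers the cutoff to -40 - 0.25 * 20 = -45, so size 2 qualifies again.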

By default pct=0, alpha=0.32 and type='upper', which means that we select the smallest model whose upper credible bound exceeds the baseline model utility, that is, which is better than the baseline model with probability more than 0.16 (and consequently, worse with probability less than 0.84). In other words, the estimated difference between the baseline model and submodel utilities is at most about one standard error away from zero, so the two utilities are considered to be close.
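The probability statement for the default settings can be checked numerically. Assuming, for illustration only, that the submodel utility is normally distributed with mean u_k and standard deviation se_k, a submodel whose upper bound lies exactly at u_base beats the baseline with probability alpha/2 = 0.16, and the gap u_base - u_k is about one standard error. All numbers below are made up.

```r
# ASSUMPTION: submodel utility ~ Normal(u_k, se_k); the values are made up.
alpha  <- 0.32
z      <- qnorm(1 - alpha / 2)
u_base <- -40
se_k   <- 3
u_k    <- u_base - z * se_k      # upper bound u_k + z * se_k equals u_base
# Probability that the submodel utility exceeds the baseline utility.
p_better <- pnorm(u_base, mean = u_k, sd = se_k, lower.tail = FALSE)
p_worse  <- 1 - p_better
gap_in_se <- (u_base - u_k) / se_k   # distance to the baseline in SE units
```

For the same se_k, any submodel whose upper bound lies above u_base has a larger u_k than this boundary case and therefore a probability above 0.16 of beating the baseline.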

NOTE: Loss statistics like RMSE and MSE are converted to utilities by multiplying them by -1, so a call such as suggest_size(object, stat='rmse', type='upper') should be interpreted as finding the smallest model whose upper credible bound of the negative RMSE exceeds the cutoff level (or, equivalently, whose lower credible bound of the RMSE falls below the cutoff level). This is done so that the argument type has the same interpretation regardless of the argument stat.
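The equivalence stated in the NOTE can be verified for a single submodel. The sketch below assumes symmetric normal-approximation bounds (estimate +/- z standard errors), under which the upper bound of -RMSE is exactly minus the lower bound of the RMSE; the RMSE, its standard error and the cutoff are made-up values.

```r
# ASSUMPTION: symmetric normal-approximation bounds; all values are made up.
alpha <- 0.32
z <- qnorm(1 - alpha / 2)
rmse <- 1.10; se <- 0.05; rmse_cutoff <- 1.08
# Convert the loss to a utility by negating it, as the NOTE describes.
u <- -rmse
u_cutoff <- -rmse_cutoff
upper_of_utility <- u + z * se       # upper credible bound of -RMSE
lower_of_rmse    <- rmse - z * se    # lower credible bound of RMSE
passes_utility_rule <- upper_of_utility > u_cutoff
passes_rmse_rule    <- lower_of_rmse < rmse_cutoff
```

Both checks give the same answer because negation flips the inequality and swaps the roles of the upper and lower bounds.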

Examples

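if (requireNamespace('rstanarm', quietly=TRUE)) {
  ### Usage with stanreg objects
  library(projpred)
  set.seed(1)                      # make the simulated data reproducible
  n <- 30                          # number of observations
  d <- 5                           # number of candidate predictors
  x <- matrix(rnorm(n*d), nrow=n)
  y <- x[,1] + 0.5*rnorm(n)        # only the first predictor is relevant
  data <- data.frame(x,y)          # columns X1, ..., X5 and y
  fit <- rstanarm::stan_glm(y ~ X1 + X2 + X3 + X4 + X5, gaussian(),
           data=data, chains=2, iter=500)
  vs <- cv_varsel(fit)             # cross-validated variable selection
  suggest_size(vs)                 # default rule: stat='elpd', alpha=0.32
}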

projpred

Projection Predictive Feature Selection

v2.0.2

GPL-3

Authors

Juho Piironen [aut], Markus Paasiniemi [aut], Alejandro Catalina [cre, aut], Aki Vehtari [aut], Jonah Gabry [ctb], Marco Colombo [ctb], Paul-Christian Bürkner [ctb]
