Bayesian regularized linear but big models via Stan
stan_biglm(
  biglm,
  xbar,
  ybar,
  s_y,
  ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL,
  prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL
)

stan_biglm.fit(
  b,
  R,
  SSR,
  N,
  xbar,
  ybar,
  s_y,
  has_intercept = TRUE,
  ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL,
  prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank", "optimizing"),
  adapt_delta = NULL,
  importance_resampling = TRUE,
  keep_every = 1
)
biglm |
The list output by |
xbar |
A numeric vector of column means in the implicit design matrix excluding the intercept for the observations included in the model. |
ybar |
A numeric scalar indicating the mean of the outcome for the observations included in the model. |
s_y |
A numeric scalar indicating the unbiased sample standard deviation of the outcome for the observations included in the model. |
... |
Further arguments passed to the function in the rstan
package ( |
prior |
Must be a call to |
prior_intercept |
Either Note: If using a dense representation of the design matrix
—i.e., if the |
prior_PD |
A logical scalar (defaulting to |
algorithm |
A string (possibly abbreviated) indicating the
estimation approach to use. Can be |
adapt_delta |
Only relevant if |
b |
A numeric vector of OLS coefficients, excluding the intercept |
R |
A square upper-triangular matrix from the QR decomposition of the design matrix, excluding the intercept |
SSR |
A numeric scalar indicating the sum-of-squared residuals for OLS |
N |
An integer scalar indicating the number of included observations |
has_intercept |
A logical scalar indicating whether to add an intercept to the model when estimating it. |
importance_resampling |
Logical scalar indicating whether to use
importance resampling when approximating the posterior distribution with
a multivariate normal around the posterior mode, which only applies
when |
keep_every |
Positive integer, which defaults to 1, but can be higher
in order to thin the importance sampling realizations and also only
applies when |
The stan_biglm function is intended to be used in the same circumstances as
the biglm function in the biglm package but with an informative prior on the
R^2 of the regression. Like biglm, the memory required to estimate the model
depends largely on the number of predictors rather than the number of
observations. However, stan_biglm and stan_biglm.fit have additional required
arguments that are not necessary in biglm, namely xbar, ybar, and s_y.
If any observations have any missing values on any of the predictors or the
outcome, such observations do not contribute to these statistics.
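When the data are too large to hold in memory, these statistics can be accumulated one chunk at a time. The following is a minimal sketch in base R, not part of rstanarm: the helper name chunk_stats and its arguments are hypothetical, and the chunks are assumed to arrive as a list of data frames. It drops incomplete rows and merges per-chunk means and sums of squared deviations with the standard pairwise update.

```r
# Hypothetical helper (not an rstanarm function): accumulate N, xbar, ybar
# and s_y over chunks, using only rows that are complete on the outcome
# and on every predictor.
chunk_stats <- function(chunks, outcome, predictors) {
  n <- 0
  xbar <- setNames(numeric(length(predictors)), predictors)
  ybar <- 0
  M2 <- 0  # running sum of squared deviations of the outcome
  for (chunk in chunks) {
    keep <- complete.cases(chunk[, c(outcome, predictors), drop = FALSE])
    n_k <- sum(keep)
    if (n_k == 0) next  # nothing usable in this chunk
    X_k <- as.matrix(chunk[keep, predictors, drop = FALSE])
    y_k <- chunk[[outcome]][keep]
    n_new <- n + n_k
    # merge the chunk means into the running means
    xbar <- xbar + (colMeans(X_k) - xbar) * n_k / n_new
    delta <- mean(y_k) - ybar
    ybar <- ybar + delta * n_k / n_new
    # merge the sums of squared deviations
    M2 <- M2 + sum((y_k - mean(y_k))^2) + delta^2 * n * n_k / n_new
    n <- n_new
  }
  list(N = n, xbar = xbar, ybar = ybar, s_y = sqrt(M2 / (n - 1)))
}

# Usage: treat mtcars as if it arrived in four chunks of eight rows
chunks <- split(mtcars, rep(1:4, each = 8))
stats <- chunk_stats(chunks, "mpg", c("wt", "qsec", "am"))
```

The resulting N, xbar, ybar, and s_y are computed over the same complete rows, which is what the arguments above require.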
The output of both stan_biglm and stan_biglm.fit is an object of
stanfit-class rather than stanreg-objects. A stanfit object is more limited
and less convenient, but this is necessitated by the fact that stan_biglm does
not bring the full design matrix into memory. Without the full design matrix,
some of the elements of a stanreg-objects object, such as residuals, cannot be
calculated. Thus, the functions in the rstanarm package that take
stanreg-objects as input, such as posterior_predict, cannot be used.
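Predictions can still be formed by hand from the raw draws. The sketch below assumes a Gaussian linear model and takes the intercept, coefficient, and residual standard deviation draws as plain R objects. The helper predict_from_draws is hypothetical, and the draws here are simulated only so the code runs without Stan. With a real fit, the draws would come from the stanfit object (for example via as.matrix(post)), and the relevant column names should be inspected rather than assumed.

```r
# Hypothetical helper (not an rstanarm function): posterior predictive draws
# computed directly from parameter draws of a Gaussian linear model.
#   alpha: numeric vector of S intercept draws
#   beta:  S x K matrix of coefficient draws
#   sigma: numeric vector of S residual standard deviation draws
#   X_new: M x K matrix of new predictor values
predict_from_draws <- function(alpha, beta, sigma, X_new) {
  # S x M matrix of linear predictors; alpha is recycled down each column,
  # so row s uses alpha[s]
  mu <- alpha + beta %*% t(X_new)
  S <- nrow(mu)
  M <- ncol(mu)
  # add Gaussian noise; sd is recycled column-wise, so row s uses sigma[s]
  mu + matrix(rnorm(S * M, mean = 0, sd = sigma), nrow = S, ncol = M)
}

# Simulated stand-ins for real posterior draws (illustrative values only)
set.seed(1)
S <- 200
alpha <- rnorm(S, mean = 10, sd = 0.5)
beta <- cbind(wt = rnorm(S, -3, 0.3), qsec = rnorm(S, 1, 0.1))
sigma <- abs(rnorm(S, mean = 2, sd = 0.1))
X_new <- rbind(c(wt = 3.0, qsec = 18), c(wt = 2.5, qsec = 17))

y_rep <- predict_from_draws(alpha, beta, sigma, X_new)
```

Each row of y_rep is one posterior predictive draw and each column corresponds to one row of X_new.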
# create inputs
ols <- lm(mpg ~ wt + qsec + am, data = mtcars, # all rows are complete so ...
          na.action = na.exclude)              # not necessary in this case
b <- coef(ols)[-1]
R <- qr.R(ols$qr)[-1,-1]
SSR <- crossprod(ols$residuals)[1]
not_NA <- !is.na(fitted(ols))
N <- sum(not_NA)
xbar <- colMeans(mtcars[not_NA,c("wt", "qsec", "am")])
y <- mtcars$mpg[not_NA]
ybar <- mean(y)
s_y <- sd(y)
post <- stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, prior = R2(.75),
                       # the next line is only to make the example go fast
                       chains = 1, iter = 500, seed = 12345)
cbind(lm = b, stan_lm = rstan::get_posterior_mean(post)[13:15,]) # shrunk
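As a sanity check on the inputs built above, dropping the intercept row and column from the R factor should leave a matrix whose cross-product equals that of the centered predictors. The sketch below verifies this with base R only. The names X_c, y_c, and b_check are introduced here for illustration, and the check assumes that lm() did not pivot any columns, which holds for this model.

```r
# Rebuild the OLS inputs from the mtcars example
ols <- lm(mpg ~ wt + qsec + am, data = mtcars)
b <- coef(ols)[-1]
R <- qr.R(ols$qr)[-1, -1]
SSR <- sum(residuals(ols)^2)

# Centered predictors and outcome
X_c <- scale(as.matrix(mtcars[, c("wt", "qsec", "am")]),
             center = TRUE, scale = FALSE)
y_c <- mtcars$mpg - mean(mtcars$mpg)

# R'R should reproduce the cross-product of the centered predictors
same_crossprod <- isTRUE(all.equal(crossprod(R), crossprod(X_c),
                                   check.attributes = FALSE))

# The slopes solve the normal equations built from R and the centered data
b_check <- drop(solve(crossprod(R), crossprod(X_c, y_c)))
same_slopes <- isTRUE(all.equal(unname(b), unname(b_check)))

# SSR matches the residual sum of squares of the centered regression
same_ssr <- isTRUE(all.equal(SSR, sum((y_c - X_c %*% b)^2)))
```

If any of the three flags is FALSE for a different model, the likely causes are column pivoting in the QR decomposition or statistics computed over different sets of rows.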