Standardization (Z-scoring)
Performs a standardization of data (z-scoring), i.e., centering and scaling,
so that the data is expressed in terms of standard deviation (i.e., mean = 0,
SD = 1) or Median Absolute Deviation (median = 0, MAD = 1). When applied to a
statistical model, this function extracts the dataset, standardizes it, and
refits the model with this standardized version of the dataset. The
normalize() function can also be used to scale all numeric variables to
the 0-1 range.
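As a rough illustration of the arithmetic behind these three transformations, the base R sketch below computes a classic z-score (mean/SD), a robust z-score (median/MAD) and a 0-1 rescaling of a numeric vector. The helper names z_classic(), z_robust() and rescale_01() are hypothetical, and the sketch assumes the MAD is computed with stats::mad() and its default consistency constant (1.4826); the package's own implementation may differ in details such as weighting and NA handling.

```r
# Hypothetical helpers illustrating the formulas (not the package's code)
z_classic  <- function(x) (x - mean(x)) / sd(x)               # mean = 0, SD = 1
z_robust   <- function(x) (x - median(x)) / mad(x)            # median = 0, MAD = 1
rescale_01 <- function(x) (x - min(x)) / (max(x) - min(x))    # range 0 to 1

x <- c(2, 4, 4, 5, 7, 9, 30)   # illustrative data with one outlier

z <- z_classic(x)
c(mean = mean(z), sd = sd(z))

r <- z_robust(x)
c(median = median(r), mad = mad(r))

range(rescale_01(x))
```

The robust version is less affected by the outlier (30) because neither the median nor the MAD is pulled towards it.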
standardize(x, robust = FALSE, two_sd = FALSE, weights = NULL, verbose = TRUE, ...)

## S3 method for class 'numeric'
standardize(x, robust = FALSE, two_sd = FALSE, weights = NULL, verbose = TRUE, ...)

## S3 method for class 'data.frame'
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  select = NULL,
  exclude = NULL,
  remove_na = c("none", "selected", "all"),
  force = FALSE,
  append = FALSE,
  suffix = "_z",
  ...
)

## Default S3 method:
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = TRUE,
  verbose = TRUE,
  include_response = TRUE,
  ...
)

unstandardize(
  x,
  center = NULL,
  scale = NULL,
  reference = NULL,
  robust = FALSE,
  two_sd = FALSE,
  ...
)
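x: A data frame, a vector or a statistical model (for …)
robust: Logical, if TRUE, centering is done by subtracting the median from the variables and dividing by the median absolute deviation (MAD). If FALSE, variables are standardized by subtracting the mean and dividing by the standard deviation (SD).
two_sd: If TRUE, the variables are scaled by two times the deviation (SD, or MAD if robust = TRUE).
weights: Can be …
verbose: Toggle warnings and messages on or off.
...: Arguments passed to or from other methods.
select: Character vector of column names. If …
exclude: Character vector of column names to be excluded from selection.
remove_na: How should missing values (NA) be treated; one of "none", "selected" or "all" (…)
force: Logical, if …
append: Logical, if …
suffix: Character value, will be appended to variable (column) names of …
include_response: For a model, if …
center, scale, reference: Used by unstandardize() …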
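The two_sd option and the center/scale arguments of unstandardize() can be pictured with base R's scale(), which records the center and scale it used as attributes. The sketch below assumes that two_sd = TRUE simply doubles the divisor and that unstandardizing means applying z * scale + center; it illustrates the idea and is not the package's implementation.

```r
x <- c(12, 15, 9, 20, 14)   # illustrative data

# Ordinary z-scores; scale() stores the values it used as attributes
z <- scale(x)
m <- as.numeric(attr(z, "scaled:center"))   # the mean of x
s <- as.numeric(attr(z, "scaled:scale"))    # the SD of x

# Assumed two_sd behaviour: divide by 2 * SD instead of SD
z2 <- (x - m) / (2 * s)

# Assumed inverse transformation: recover the original values
x_back <- as.vector(z) * s + m
x_back
```

Keeping the center and scale alongside the standardized values is what makes the back-transformation possible without the original data.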
The standardized object (either a standardized data frame or a statistical model fitted on standardized data).
If x is a model object, standardization is done by completely refitting the
model on the standardized data. Hence, this approach is equivalent to
standardizing the variables before fitting the model, and it returns a new
model object. This method is particularly recommended for complex
models that include interactions or transformations (e.g., polynomial or
spline terms). The robust argument (default FALSE) enables a robust
standardization of data, i.e., based on the median and MAD instead of the
mean and SD. See standardize_parameters() for other methods of
standardizing model coefficients.
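A base R approximation of refitting a model on standardized data is to standardize every variable with scale() and call lm() again. The sketch reuses the swiss model from the examples; whether the package standardizes the response and rebuilds interaction terms in exactly this way is an assumption here. Because the interaction is recomputed from the standardized variables, the refitted model spans the same predictor space as the original, so its back-transformed fitted values match the original model's.

```r
data(swiss)

# Original model on the raw data
model <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)

# Standardize every (numeric) column, then refit the same formula;
# the interaction term is recomputed from the standardized variables
swiss_z <- as.data.frame(scale(swiss))
model_z <- lm(Infant.Mortality ~ Education * Fertility, data = swiss_z)
coef(model_z)

# Map fitted values back to the original response scale and compare
y <- swiss$Infant.Mortality
fitted_back <- fitted(model_z) * sd(y) + mean(y)
all.equal(unname(fitted_back), unname(fitted(model)))
```

Standardizing the product term after fitting would not give the same coefficients, which is why refitting is preferred for models with interactions.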
When the model's formula contains transformations (e.g., y ~ exp(X)), the
transformation effectively takes place after standardization (e.g.,
exp(scale(X))). Some transformations are undefined for negative values,
such as log() and sqrt(). To avoid dropping these values, the
standardized data is shifted to Z - min(Z) + 1 or Z - min(Z),
respectively (the extra 1 for log() avoids log(0) = -Inf).
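The reason for the shift can be checked directly: after z-scoring, values below the mean become negative, so log() and sqrt() return NaN for them. The sketch applies both shifts by hand to an illustrative vector; it shows the arithmetic only and does not call the package.

```r
x <- c(1, 3, 5, 8, 13)   # illustrative data
z <- as.vector(scale(x))

# Transforming z-scores directly: negative values give NaN (with a warning)
suppressWarnings(log(z))

# Shifts for log() and sqrt() respectively
z_log  <- z - min(z) + 1   # minimum becomes 1, so log() is 0 there
z_sqrt <- z - min(z)       # minimum becomes 0, which sqrt() accepts

log(z_log)
sqrt(z_sqrt)
```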
When standardizing coefficients of a generalized linear model (GLM, GLMM, etc.), only the predictors are standardized, which maintains the interpretability of the coefficients (e.g., in a binomial model, the exponentiated standardized coefficient is the odds ratio (OR) for a change of 1 SD in the predictor).
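For a logistic regression, standardizing only the predictor multiplies its coefficient by the predictor's SD, so exponentiating it gives the odds ratio for a 1-SD increase. The sketch uses the built-in mtcars data with am (0/1) as the outcome and wt as the predictor, a choice made purely for illustration; as described above, the response is left untouched.

```r
data(mtcars)

# Raw predictor: exp(coef) is the odds ratio per 1-unit (1000 lb) change in wt
fit_raw <- glm(am ~ wt, data = mtcars, family = binomial)

# Standardize only the predictor; the 0/1 response stays as is
mtcars_z <- transform(mtcars, wt = as.vector(scale(wt)))
fit_z <- glm(am ~ wt, data = mtcars_z, family = binomial)

# Odds ratio for a 1-SD change in wt, computed two equivalent ways
or_1sd <- exp(coef(fit_z)[["wt"]])
or_1sd_from_raw <- exp(coef(fit_raw)[["wt"]] * sd(mtcars$wt))
c(or_1sd, or_1sd_from_raw)
```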
When x is a vector, or a data frame with remove_na = "none", missing values
are preserved, so the return value has the same length / number of rows as
the original input.
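For a single vector, this behaviour can be mimicked in base R: the mean and SD are computed with na.rm = TRUE, and the NA simply propagates through the arithmetic, so its position and the vector's length are kept. This is a sketch of the described behaviour, not the package's code.

```r
x <- c(4, NA, 7, 10, 1)   # illustrative data with one missing value

# The statistics ignore the missing value...
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# ...but the NA stays in place, so z has the same length as x
z
```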
Other transform utilities: change_scale(), normalize(), ranktransform()

Other standardize: standardize_info(), standardize_parameters()
# Data frames
summary(standardize(swiss))

# Models
model <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)
coef(standardize(model))