effectsize: standardize – R documentation

standardize

Standardization (Z-scoring)

Description

Performs a standardization of data (z-scoring), i.e., centering and scaling, so that the data are expressed in terms of standard deviations (i.e., mean = 0, SD = 1) or Median Absolute Deviations (median = 0, MAD = 1). When applied to a statistical model, this function extracts the dataset, standardizes it, and refits the model on the standardized dataset. The normalize() function can also be used to scale all numeric variables to the 0 - 1 range.
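
The arithmetic behind the two schemes can be sketched in base R. This is an illustration only, not the package's internal code; it assumes the MAD is computed as by stats::mad(), which applies the 1.4826 consistency constant by default. The vector x and the result names are made up for the example.

```r
# Toy data with one outlier (30)
x <- c(2, 4, 4, 5, 7, 9, 30)

# Classic z-score: result has mean = 0 and SD = 1
z <- (x - mean(x)) / sd(x)

# Robust variant: result has median = 0 and MAD = 1
z_robust <- (x - median(x)) / mad(x)

# Rescaling to the 0 - 1 range, which is what normalize() is described as doing
x01 <- (x - min(x)) / (max(x) - min(x))

c(mean = mean(z), sd = sd(z))
c(median = median(z_robust), mad = mad(z_robust))
range(x01)
```

The outlier inflates the SD much more than the MAD, so the robust scores of the non-outlying values are more spread out than their classic z-scores.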

Usage

standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'numeric'
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'data.frame'
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  select = NULL,
  exclude = NULL,
  remove_na = c("none", "selected", "all"),
  force = FALSE,
  append = FALSE,
  suffix = "_z",
  ...
)

## Default S3 method:
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = TRUE,
  verbose = TRUE,
  include_response = TRUE,
  ...
)

unstandardize(
  x,
  center = NULL,
  scale = NULL,
  reference = NULL,
  robust = FALSE,
  two_sd = FALSE,
  ...
)

Arguments

`x`	A data frame, a vector, or a statistical model (`unstandardize()` does not accept a model).
`robust`	Logical. If `TRUE`, variables are standardized by subtracting the median and dividing by the median absolute deviation (MAD). If `FALSE`, variables are standardized by subtracting the mean and dividing by the standard deviation (SD).
`two_sd`	If `TRUE`, the variables are scaled by two times the deviation (SD or MAD depending on `robust`). This method can be useful to obtain model coefficients of continuous parameters comparable to coefficients related to binary predictors, when applied to the predictors (not the outcome) (Gelman, 2008).
`weights`	Can be `NULL` (for no weighting), or: for models, if `TRUE` (default), a weighted standardization is carried out; for `data.frame`s, a numeric vector of weights, or a character string naming the column in the `data.frame` that contains the weights; for numeric vectors, a numeric vector of weights.
`verbose`	Toggle warnings and messages on or off.
`...`	Arguments passed to or from other methods.
`select`	Character vector of column names. If `NULL` (the default), all variables will be selected.
`exclude`	Character vector of column names to be excluded from selection.
`remove_na`	How should missing values (`NA`) be treated: if `"none"` (default): each column's standardization is done separately, ignoring `NA`s. Else, rows with `NA` in the columns selected with `select` / `exclude` (`"selected"`) or in all columns (`"all"`) are dropped before standardization, and the resulting data frame does not include these cases.
`force`	Logical, if `TRUE`, forces standardization of factors and dates as well. Factors are converted to numerical values, with the lowest level being the value `1` (unless the factor has numeric levels, which are converted to the corresponding numeric value).
`append`	Logical, if `TRUE` and `x` is a data frame, standardized variables will be added as additional columns; if `FALSE`, existing variables are overwritten.
`suffix`	Character value, will be appended to variable (column) names of `x`, if `x` is a data frame and `append = TRUE`.
`include_response`	For a model, if `TRUE` (default), the response value will also be standardized. If `FALSE`, only the predictors will be standardized. Note that for certain models (logistic regression, count models, ...), the response value will never be standardized, to make re-fitting the model work. (For `mediate` models, only applies to the y model; m model's response will always be standardized.)
`center, scale, reference`	Used by `unstandardize()`; `center` and `scale` correspond to the center (the mean / median) and the scale (SD / MAD) of the original non-standardized data (for data frames, they should be named, or their order should correspond to the order of the numeric columns). Alternatively, the original data can be provided directly through `reference`, from which the center and the scale are computed (according to `robust` and `two_sd`). Finally, if the input carries the attributes `center` and `scale` (as does the output of `standardize()`), these are used when the other arguments are absent.
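
As a rough base-R sketch of what `two_sd`, `center` and `scale` mean, the snippet below standardizes a vector by hand and then reverses the operation. It does not call `standardize()` or `unstandardize()`; the names `ctr` and `scl` are placeholders chosen for this example.

```r
x <- c(10, 12, 15, 19, 24)

ctr <- mean(x)  # plays the role of `center`
scl <- sd(x)    # plays the role of `scale`

# two_sd = FALSE: divide by one deviation
z <- (x - ctr) / scl

# two_sd = TRUE: divide by two deviations, so the result has SD = 0.5
z_two_sd <- (x - ctr) / (2 * scl)

# Reversing the operation: multiply by the scale, then add the center back
x_back <- z * scl + ctr
x_back_two_sd <- z_two_sd * (2 * scl) + ctr

all.equal(x_back, x)
all.equal(x_back_two_sd, x)
sd(z_two_sd)
```

Recovering the original values requires the same center, scale and `two_sd` setting that were used for standardization, which is why these can be supplied explicitly or derived from `reference`.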

Value

The standardized object (either a standardized data frame or a statistical model fitted on standardized data).

Model Standardization

If x is a model object, standardization is done by completely refitting the model on the standardized data. Hence, this approach is equivalent to standardizing the variables before fitting the model, and it returns a new model object. This method is particularly recommended for complex models that include interactions or transformations (e.g., polynomial or spline terms). The robust argument (default FALSE) enables a robust standardization of the data, i.e., based on the median and MAD instead of the mean and SD. See standardize_parameters() for other methods of standardizing model coefficients.
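
A minimal base-R sketch of the refitting idea, using the built-in swiss data: the variables are z-scored first and the same formula is then fitted again. This mimics the described behaviour with scale() and lm() and is not the package's own code path; the object names are illustrative.

```r
# Standardize every column of the data, then refit the same formula
swiss_z <- as.data.frame(scale(swiss))

model   <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)
model_z <- lm(Infant.Mortality ~ Education * Fertility, data = swiss_z)

coef(model_z)

# Why refitting matters for interactions: the interaction term in the
# refitted model is the product of the standardized predictors, which
# differs from the standardized product of the raw predictors.
inter_of_z <- swiss_z$Education * swiss_z$Fertility
z_of_inter <- as.numeric(scale(swiss$Education * swiss$Fertility))
isTRUE(all.equal(inter_of_z, z_of_inter))

# The fit itself is unchanged; only the scale of the coefficients differs
all.equal(summary(model)$r.squared, summary(model_z)$r.squared)
```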

Transformed Variables

When the model's formula contains transformations (e.g. y ~ exp(X)) the transformation effectively takes place after standardization (e.g., exp(scale(X))). Some transformations are undefined for negative values, such as log() and sqrt(). To avoid dropping these values, the standardized data is shifted by Z - min(Z) + 1 or Z - min(Z) (respectively).
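
The shift described above can be illustrated by hand in base R. This is a sketch under the assumption that Z is an ordinary z-scored vector; the variable names are made up for the example.

```r
x <- c(1, 3, 4, 8, 20)
z <- as.numeric(scale(x))  # contains negative values

# log() and sqrt() of the raw z-scores would produce NaN for negatives
any(is.nan(suppressWarnings(log(z))))

# Shift so that the minimum is 1 (for log) or 0 (for sqrt)
z_for_log  <- z - min(z) + 1
z_for_sqrt <- z - min(z)

log(z_for_log)    # smallest value is log(1) = 0
sqrt(z_for_sqrt)  # smallest value is sqrt(0) = 0
```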

Generalized Linear Models

When standardizing coefficients of a generalized model (GLM, GLMM, etc.), only the predictors are standardized, maintaining the interpretability of the coefficients (e.g., in a binomial model, the exponent of the standardized parameter is the odds ratio (OR) for a change of 1 SD in the predictor).
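
A hand-rolled base-R sketch of the binomial case, using the built-in mtcars data: only the predictor is z-scored, the 0/1 response is left as is, and the exponentiated slope is read as the odds ratio per 1 SD. The model and the column name wt_z are chosen for illustration and are not taken from the package.

```r
d <- mtcars
d$wt_z <- as.numeric(scale(d$wt))  # standardize the predictor only

# The binary response `am` is not standardized
fit_raw <- glm(am ~ wt,   data = d, family = binomial())
fit_z   <- glm(am ~ wt_z, data = d, family = binomial())

# Odds ratio for a 1-SD increase in wt
or_per_sd <- exp(coef(fit_z)[["wt_z"]])

# Same quantity from the raw fit: raw slope multiplied by the SD of wt
or_from_raw <- exp(coef(fit_raw)[["wt"]] * sd(d$wt))

c(or_per_sd = or_per_sd, or_from_raw = or_from_raw)
```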

Note

When x is a vector or a data frame (with remove_na = "none"), missing values are preserved, so the return value has the same length / number of rows as the original input.
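
The difference between preserving and dropping missing values can be sketched in base R. The code below imitates the described behaviour by hand and does not call standardize(); the data are invented for the example.

```r
# Vector case: NA is ignored when computing mean / SD and kept in the output
x <- c(1, NA, 3, 5)
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
length(z) == length(x)
is.na(z)

# Data frame case, analogous to remove_na = "all":
# rows with any NA are dropped before standardizing
df <- data.frame(a = c(1, NA, 3, 5), b = c(2, 4, NA, 8))
df_complete <- df[complete.cases(df), ]
df_z <- as.data.frame(scale(df_complete))
nrow(df_z)
```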

Examples

library(effectsize)

# Data frames
summary(standardize(swiss))

# Models
model <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)
coef(standardize(model))

effectsize

Indices of Effect Size and Standardized Parameters

v0.4.4-1

GPL-3

Authors

Mattan S. Ben-Shachar [aut, cre] (<https://orcid.org/0000-0002-4287-4801>), Dominique Makowski [aut] (<https://orcid.org/0000-0001-5375-9967>), Daniel Lüdecke [aut] (<https://orcid.org/0000-0002-8895-3206>), Indrajeet Patil [ctb] (<https://orcid.org/0000-0003-1995-6531>, @patilindrajeets), Ken Kelley [ctb], David Stanley [ctb]

Initial release