fpc: distancefactor – R documentation


distancefactor

Factor for dissimilarity of mixed type data

Description

Computes a factor that can be used to standardise ordinal categorical variables, and the binary dummy variables that code the categories of nominal variables, for Euclidean dissimilarity computation in mixed-type data. See Hennig and Liao (2013).

Usage

distancefactor(cat,n=NULL, catsizes=NULL,type="categorical",
               normfactor=2,qfactor=ifelse(type=="categorical",1/2,
                             1/(1+1/(cat-1))))

Arguments

`cat`	integer. Number of categories of the variable to be standardised. Note that for `type="categorical"` the number of categories of the original nominal variable is required, even though the factor is applied to the dummy variables that code these categories.
`n`	integer. Number of data points.
`catsizes`	vector of integers giving the numbers of observations per category. One of `n` and `catsizes` must be supplied. If `catsizes=NULL`, `rep(round(n/cat),cat)` is used, i.e. equal category sizes are assumed. This may be appropriate even if the categories have unequal numbers of observations, if the researcher decides that the dissimilarity measure should not be influenced by empirical category sizes.
`type`	`"categorical"` if the factor is used for the dummy variables belonging to a nominal variable, `"ordinal"` if the factor is used for an ordinal variable in standard Likert coding.
`normfactor`	numeric. Factor on which standardisation is based. The default, 2, is `E(X_1-X_2)^2` for independent, identically distributed unit-variance variables `X_1` and `X_2`.
`qfactor`	numeric. Factor q in Hennig and Liao (2013) to adjust for clumping effects due to discreteness.
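The defaults described in this argument list can be checked directly in base R. The short sketch below evaluates them for illustrative values (`n = 20` with three categories for the `catsizes` rule, two to seven categories for the ordinal `qfactor`); it does not call fpc, and the names `ncat`, `q_ordinal` and `msd` are hypothetical.

```r
# Default category sizes when catsizes = NULL: rep(round(n/cat), cat).
# Illustrative values; "ncat" stands in for the argument cat.
n <- 20
ncat <- 3
rep(round(n / ncat), ncat)   # 7 7 7: equal sizes, which need not sum to n

# Default qfactor for type = "ordinal" is 1/(1 + 1/(cat - 1)), which equals
# (cat - 1)/cat and increases towards 1 with the number of categories.
k <- 2:7
q_ordinal <- 1 / (1 + 1 / (k - 1))
round(q_ordinal, 3)          # 0.500 0.667 0.750 0.800 0.833 0.857

# Default normfactor 2 = E(X1 - X2)^2 for independent unit-variance
# variables; Monte Carlo check with standard normal draws.
set.seed(1)
msd <- mean((rnorm(1e5) - rnorm(1e5))^2)
msd                          # close to 2
```

Note that the rounding in the `catsizes` rule means the assumed category sizes can sum to slightly more or less than `n`.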

Value

A factor by which to multiply the variable (for `type="categorical"`, each of its dummy variables) so that it is comparable to a continuous unit-variance variable when variables are aggregated in Euclidean fashion for dissimilarity computation: the expected effective squared difference between two realisations of the rescaled variable equals `qfactor*normfactor`.
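The R sketch below spells out the quantity described above under one stated assumption: the two realisations are drawn independently, with replacement, from the category proportions `catsizes/sum(catsizes)`. It is not the fpc source code; `distancefactor_sketch` is a hypothetical name, and the real `distancefactor` may compute the expectation differently (for example over pairs of distinct observations), so its results are not guaranteed to match.

```r
# Hypothetical sketch, NOT the fpc implementation. Returns f such that the
# expected squared Euclidean difference between two rescaled realisations
# (summed over the dummies in the categorical case) is qfactor * normfactor.
# Assumption: realisations drawn independently, with replacement.
distancefactor_sketch <- function(cat, n = NULL, catsizes = NULL,
                                  type = "categorical", normfactor = 2,
                                  qfactor = ifelse(type == "categorical", 1/2,
                                                   1/(1 + 1/(cat - 1)))) {
  if (is.null(catsizes)) {
    if (is.null(n)) stop("one of n and catsizes must be supplied")
    catsizes <- rep(round(n / cat), cat)  # documented default
  }
  p <- catsizes / sum(catsizes)           # category proportions
  if (type == "categorical") {
    # Observations in different categories differ in exactly two dummy
    # variables, so their squared Euclidean distance on the dummies is 2.
    esq <- 2 * (1 - sum(p^2))
  } else {
    # Ordinal variable in Likert coding 1, 2, ...:
    # E(X1 - X2)^2 = sum over i, j of p_i * p_j * (i - j)^2
    k <- seq_along(p)
    esq <- sum(outer(p, p) * outer(k, k, "-")^2)
  }
  sqrt(qfactor * normfactor / esq)
}

# Usage, mirroring the calls in the Examples section
distancefactor_sketch(5, 20, type = "categorical")  # sqrt(0.625), about 0.791
distancefactor_sketch(4, 20, type = "ordinal")      # sqrt(0.6), about 0.775
```

With equal category sizes the categorical case reduces to `P(different) = 1 - 1/cat`, and the ordinal case to twice the variance of a discrete uniform variable on `1, ..., cat`.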

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en

References

Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification. Journal of the Royal Statistical Society, Series C (Applied Statistics), 62, 309-369.

Examples

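library(fpc)
set.seed(776655)
d1 <- sample(1:5, 20, replace = TRUE)   # nominal variable with 5 categories
d2 <- sample(1:4, 20, replace = TRUE)   # ordinal variable, Likert coding 1-4
ldata <- cbind(d1, d2)
# Replace variable 1 (d1) by one binary dummy variable per category;
# d2 follows as column 6 (assuming all five categories of d1 occur)
lc <- cat2bin(ldata, categorical = 1)$data
# Standardise the five dummies, using the number of categories of d1
lc[, 1:5] <- lc[, 1:5] * distancefactor(5, 20, type = "categorical")
# Standardise the ordinal variable d2
lc[, 6] <- lc[, 6] * distancefactor(4, 20, type = "ordinal")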

fpc

Flexible Procedures for Clustering

v2.2-9

GPL

Authors

Christian Hennig <christian.hennig@unibo.it>

Initial release

2020-12-06