Fitting mixed Gaussian/multinomial mixtures with flexmix
flexmixedruns fits a latent class mixture (clustering) model in which some variables are continuous and modelled within the mixture components by Gaussian distributions, and some variables are categorical and modelled within components by independent multinomial distributions. The fit is by maximum likelihood estimation computed with the EM algorithm. The number of components can be estimated by the BIC.
Note that at least one categorical variable is needed, but it is possible to use data without continuous variables.
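As a reading aid, the model described above can be written as a mixture density. The notation here is an assumption introduced for illustration and does not come from the package: x^{(c)} is the continuous part of an observation, x^{(d)}_j its j-th categorical variable, K the number of components, \pi_k the mixing proportions, \varphi the Gaussian density, and \tau_{kj}(\cdot) the category probabilities of variable j in component k.

```latex
f(x) \;=\; \sum_{k=1}^{K} \pi_k \,
  \varphi\!\left(x^{(c)};\, \mu_k, \Sigma_k\right)
  \prod_{j=1}^{q} \tau_{kj}\!\left(x^{(d)}_j\right),
\qquad
\sum_{k=1}^{K} \pi_k = 1,
\quad
\sum_{l} \tau_{kj}(l) = 1 .
```

The product over j expresses that the categorical variables are modelled as independent within a component. If there are no continuous variables, the Gaussian factor is simply absent.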
flexmixedruns(x, diagonal=TRUE, xvarsorted=TRUE,
              continuous, discrete, ppdim=NULL, initial.cluster=NULL,
              simruns=20, n.cluster=1:20, verbose=TRUE, recode=TRUE,
              allout=TRUE, control=list(minprior=0.001), silent=TRUE)
x |
data matrix or data frame. The data need to be organised case-wise, i.e., if there are categorical variables only, and 15 cases with values c(1,1,2) on the 3 variables, the data matrix needs 15 rows with values 1 1 2. (Categorical variables could take numbers or strings or anything that can be coerced to factor levels as values.) |
diagonal |
logical. If |
xvarsorted |
logical. If |
continuous |
vector of integers giving positions of the
continuous variables. If |
discrete |
vector of integers giving positions of the
categorical variables. If |
ppdim |
vector of integers specifying the number of (in the data)
existing categories for each categorical variable. If
|
initial.cluster |
this corresponds to the |
simruns |
integer. Number of starts of the EM algorithm with random initialisation, used to improve the chance of finding a good optimum. |
n.cluster |
vector of integers, numbers of components (the optimum one is found by minimising the BIC). |
verbose |
logical. If |
recode |
logical. If |
allout |
logical. If |
control |
list of control parameters for |
silent |
logical. This is passed on to the
|
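The case-wise layout required for x can be produced from an aggregated table of patterns and counts. The following is a minimal base R sketch under that assumption. The data frame agg and its column names are hypothetical and only serve as illustration.

```r
# Hypothetical aggregated table: one row per distinct pattern plus a count.
agg <- data.frame(d1 = c(1, 2), d2 = c(1, 3), d3 = c(2, 1),
                  count = c(15, 5))

# Repeat each row index 'count' times, then drop the count column,
# so that every case occupies its own row.
casewise <- agg[rep(seq_len(nrow(agg)), times = agg$count),
                c("d1", "d2", "d3")]
rownames(casewise) <- NULL

dim(casewise)
```

The pattern 1 1 2 then occupies 15 rows of casewise, matching the layout described for x.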
Sometimes flexmix produces errors because of degenerating covariance matrices, too small clusters etc. flexmixedruns tolerates these and treats them as non-optimal runs. (Higher simruns or different control may be required to get a valid solution.)
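The idea of tolerating failed runs can be sketched in base R with try. This is an illustration of the general pattern only, not the code used inside flexmixedruns. The function one_run is a hypothetical stand-in for a single randomly initialised EM run and returns made-up BIC values.

```r
set.seed(1)

# Hypothetical stand-in for one EM run: fails at random,
# otherwise returns a made-up BIC value.
one_run <- function() {
  if (runif(1) < 0.3) stop("degenerate covariance matrix")
  list(bic = rnorm(1, mean = 500, sd = 10))
}

simruns <- 20
best <- NULL
errcount <- 0
for (i in seq_len(simruns)) {
  # try() catches the error so that the loop continues.
  fit <- try(one_run(), silent = TRUE)
  if (inherits(fit, "try-error")) {
    errcount <- errcount + 1   # count the failed run and skip it
    next
  }
  # Keep the run with the smallest BIC seen so far.
  if (is.null(best) || fit$bic < best$bic) best <- fit
}

errcount
best$bic
```

If every run fails, best stays NULL, which corresponds to the situation where more starts or different control settings are needed.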
General documentation on flexmix can be found in Friedrich Leisch's "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R", https://CRAN.R-project.org/package=flexmix
A list with components
optsummary |
summary object for |
optimalk |
optimal number of components. |
errcount |
vector with numbers of EM runs for each number of components that led to flexmix errors. |
flexout |
if
If |
bicvals |
vector of values of the BIC for each number of components. |
ppdim |
vector of categorical variable-wise numbers of categories. |
discretelevels |
list of levels of the categorical variables
belonging to what is treated by |
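To make the relation between bicvals and optimalk concrete, the sketch below computes BIC values from log-likelihoods and picks the number of components with the smallest BIC. It assumes the usual definition BIC = -2 logLik + npar log(n) and the standard parameter count for the model described above. The helpers n_par and bic are hypothetical, the log-likelihood values are made up, and the package may store its results differently.

```r
# Free parameters for k components, p continuous variables and
# categorical variables with ppdim categories (hypothetical helper).
n_par <- function(k, p, ppdim, diagonal = TRUE) {
  cov_par <- if (diagonal) p else p * (p + 1) / 2
  (k - 1) +                 # mixing proportions
    k * (p + cov_par) +     # means and (co)variances
    k * sum(ppdim - 1)      # category probabilities
}

# BIC in the "smaller is better" form.
bic <- function(loglik, npar, n) -2 * loglik + npar * log(n)

n <- 100
n.cluster <- 2:3
loglik <- c(-560, -548)     # made-up log-likelihoods, one per k
npar <- sapply(n.cluster, n_par, p = 2, ppdim = c(5, 4))
bicvals <- bic(loglik, npar, n)

# The optimal number of components minimises the BIC.
optimalk <- n.cluster[which.min(bicvals)]
optimalk
```

With these made-up numbers the gain in log-likelihood from a third component does not outweigh the penalty for the additional parameters.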
Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification, Journal of the Royal Statistical Society, Series C Applied Statistics, 62, 309-369.
options(digits=3)
set.seed(776655)
v1 <- rnorm(100)
v2 <- rnorm(100)
d1 <- sample(1:5,100,replace=TRUE)
d2 <- sample(1:4,100,replace=TRUE)
ldata <- cbind(v1,v2,d1,d2)
fr <- flexmixedruns(ldata,
    continuous=2,discrete=2,simruns=2,n.cluster=2:3,allout=FALSE)
print(fr$optimalk)
print(fr$optsummary)
print(fr$flexout@cluster)
print(fr$flexout@components)