Mixture Model ML for Clusterwise Linear Regression
Computes an ML-estimator for clusterwise linear regression under a
regression mixture model with Normal errors. The parameters are the
cluster proportions, the regression coefficients and the error
variances; all are independent of the values of the independent
variables, and all may differ between clusters. Computation is by the
EM-algorithm. The number of clusters is estimated via the Bayesian
Information Criterion (BIC). Note that the package flexmix has more
sophisticated tools for the same task and is recommended; the
functions here are kept only for compatibility reasons.
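As a rough sketch (not package code; the argument names eps, coef and
var below simply mirror the output components described under Value):
the density of a response y given a covariate row x under this model
is a proportion-weighted mixture of Normal densities, and the
loglikelihood maximized by the EM-algorithm is the sum of its
logarithm over all points.

mixdens <- function(x, y, eps, coef, var) {
  # cluster-wise means: intercept row plus covariate effects
  means <- coef[1, ] + drop(x %*% coef[-1, , drop = FALSE])
  # proportion-weighted Normal densities, summed over clusters
  sum(eps * dnorm(y, mean = means, sd = sqrt(var)))
}
# loglikelihood: sum over observations i of
#   log(mixdens(indep[i, ], dep[i], eps, coef, var))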
regmix(indep, dep, ir=1, nclust=1:7, icrit=1.e-5, minsig=1.e-6,
       warnings=FALSE)

regem(indep, dep, m, cln, icrit=1.e-5, minsig=1.e-6, warnings=FALSE)
indep: numerical matrix or vector. Independent variables.

dep: numerical vector. Dependent variable.

ir: positive integer. Number of iteration runs for every number of
    clusters.

nclust: vector of positive integers. Numbers of clusters.

icrit: positive numerical. Stopping criterion for the iterations
    (difference of loglikelihoods).

minsig: positive numerical. Minimum value for the variance parameters
    (the likelihood is unbounded if variances are allowed to converge
    to 0).

warnings: logical. If TRUE, warnings about numerical problems during
    the EM iteration are given (see Details).

cln: positive integer. (Single) number of clusters.

m: matrix of positive numericals. The number of columns must be cln
    and the number of rows must equal the number of data points; each
    row should sum to 1 (compare the example below). Initial
    configuration of a posteriori probabilities of the points for the
    clusters (compare the output component z).
The result of the EM iteration depends on the initial configuration,
which is generated randomly by randcmatrix for regmix. regmix calls
regem. To provide the initial configuration manually, use the
parameter m of regem directly. See the example below for how to
generate m if you want to specify initial parameters.
The original paper, DeSarbo and Cron (1988), suggests the AIC for
estimating the number of clusters. The use of the BIC is advocated by
Wedel and DeSarbo (1995). The BIC is defined here as
2*loglik - log(n)*((p+3)*cln-1), where p is the number of independent
variables; the larger, the better.
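To illustrate this definition (a hedged sketch; rmt below is assumed
to be the result of regem for cln clusters fitted to n points with p
independent variables): each cluster contributes p+1 regression
coefficients and one variance, and there are cln-1 free proportions,
which gives (p+3)*cln-1 parameters in total.

# (p+1)*cln coefficients + cln variances + (cln-1) proportions
npar <- (p + 3) * cln - 1
bic  <- 2 * rmt$loglik - log(n) * npar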
See the entry for the input parameter warnings
for the
treatment of several numerical problems.
regmix returns a list containing the components clnopt, loglik, bic,
coef, var, eps, z, g.

regem returns a list containing the components loglik, coef, var, z,
g, warn.
clnopt: optimal number of clusters according to the BIC.

loglik: loglikelihood for the optimal model.

bic: vector of BIC values for all numbers of clusters in nclust.

coef: matrix of regression coefficients. First row: intercept
    parameter. Second row: parameter of the first independent
    variable, and so on. Columns correspond to clusters.

var: vector of error variance estimators for the clusters.

eps: vector of cluster proportion estimators.

z: matrix of estimated a posteriori probabilities of the points
    (rows) to be generated by the clusters (columns). Compare input
    argument m.

g: integer vector of estimated cluster numbers for the points (via
    argmax over z).

warn: logical. TRUE if numerical problems occurred during the EM
    iteration (see the input parameter warnings).
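For illustration only (rmt below is assumed to be a result of regem
or, for the optimal model, of regmix): g is simply the row-wise
argmax of z, so

all(rmt$g == apply(rmt$z, 1, which.max))   # should be TRUE (up to ties)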
DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology for clusterwise linear regression, Journal of Classification 5, 249-282.
Wedel, M. and DeSarbo, W. S. (1995) A mixture likelihood approach for generalized linear models, Journal of Classification 12, 21-56.
Regression mixtures can also (and probably better) be computed with
the flexmix package, see flexmix. (When I first wrote the regmix
function, flexmix didn't exist.)

fixreg for fixed point clusters for clusterwise linear regression.

EMclust for Normal mixture model fitting (non-regression).
## Not run:
# This apparently gives slightly different
# but data-analytically fine results
# on some versions of R.
set.seed(12234)
data(tonedata)
attach(tonedata)
rmt1 <- regmix(stretchratio, tuned, nclust=1:2)
# nclust=1:2 makes the example fast;
# a more serious application would rather use the default.
rmt1$g
round(rmt1$bic, digits=2)

# start with initial parameter values
cln <- 3
n <- 150
initcoef <- cbind(c(2,0), c(0,1), c(0,2.5))
initvar <- c(0.001, 0.0001, 0.5)
initeps <- c(0.4, 0.3, 0.3)
# computation of m from initial parameters
m <- matrix(nrow=n, ncol=cln)
stm <- numeric(0)
for (i in 1:cln)
  for (j in 1:n){
    m[j,i] <- initeps[i]*dnorm(tuned[j], mean=initcoef[1,i]+
                                 initcoef[2,i]*stretchratio[j],
                               sd=sqrt(initvar[i]))
  }
for (j in 1:n){
  stm[j] <- sum(m[j,])
  for (i in 1:cln)
    m[j,i] <- m[j,i]/stm[j]
}
rmt2 <- regem(stretchratio, tuned, m, cln)
## End(Not run)