The funFEM algorithm for the clustering of functional data.
The funFEM algorithm clusters time series or, more generally, functional data. It is based on a discriminative functional mixture (DFM) model that performs the clustering in a single, discriminative functional subspace. This model has the advantage of being parsimonious and can therefore handle long time series.
funFEM(fd, K=2:6, model = "AkjBk", crit = "bic", init = "hclust", Tinit = c(), maxit = 50, eps = 1e-06, disp = FALSE, lambda = 0, graph = FALSE)
fd: a functional data object produced by the fda package.

K: an integer vector specifying the numbers of mixture components (clusters) among which the model selection criterion will choose the most appropriate number of groups. The default is 2:6.

model: a vector of discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". The option "all" runs the funFEM algorithm on all 12 models and selects the one maximizing the model selection criterion.

crit: the criterion to be used for model selection ('bic', 'aic' or 'icl'). 'bic' is the default.

init: the initialization type ('random', 'kmeans' or 'hclust'). 'hclust' is the default.

Tinit: an n x K matrix of posterior probabilities used to initialize the algorithm (each row corresponds to an individual).

maxit: the maximum number of iterations before the Fisher-EM algorithm stops.

eps: the threshold on likelihood differences below which the Fisher-EM algorithm stops.

disp: if TRUE, messages are printed during the clustering. The default is FALSE.

lambda: the l0 penalty (between 0 and 1) for the sparse version. See Bouveyron et al. (2014) for details. The default is 0.

graph: if TRUE, the evolution of the log-likelihood is plotted. The default is FALSE.
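As an illustration of how these arguments combine, the sketch below (assuming the fda package and its CanadianWeather data are available, as in the examples further down) compares all 12 DLM models over several values of K and selects by ICL rather than the default BIC; the specific argument values are illustrative choices, not recommendations.

```r
library(fda)     # provides create.bspline.basis, smooth.basis, CanadianWeather
library(funFEM)

# Build a functional data object from the Canadian daily temperature curves
basis <- create.bspline.basis(c(0, 365), nbasis = 21, norder = 4)
fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], basis)$fd

# Fit all 12 DLM models for K in 2:4 and keep the best one according to ICL
res <- funFEM(fdobj, K = 2:4, model = "all", crit = "icl", init = "hclust")
res$model  # name of the retained model
res$K      # retained number of groups
```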
A list is returned, with the following components:

model: the name of the selected model.

K: the number of groups.

cls: the group membership of each individual, as estimated by the Fisher-EM algorithm.

P: the posterior probabilities of each individual for each group.

prms: the model parameters.

U: the orientation of the functional subspace with respect to the basis functions.

aic: the value of the Akaike information criterion.

bic: the value of the Bayesian information criterion.

icl: the value of the integrated completed likelihood criterion.

loglik: the log-likelihood values computed at each iteration of the FEM algorithm.

ll: the log-likelihood value obtained at the last iteration of the FEM algorithm.

nbprm: the number of free parameters in the model.

call: the call of the function.

plot: information to pass to the plot.fem function.

crit: the model selection criterion used.
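To make the returned components concrete, a short sketch of how they are typically inspected, assuming `res` is the result of a funFEM call such as `res <- funFEM(fdobj, K = 4)` from the examples below:

```r
# res is assumed to be the output of a funFEM call on a functional data object fdobj
table(res$cls)             # cluster sizes from the hard partition
head(res$P)                # posterior probabilities (one row per individual)
c(res$bic, res$aic, res$icl)  # model selection criteria for the fitted model
res$nbprm                  # number of free parameters

# Project the curves' basis coefficients into the discriminative subspace via U
scores <- t(fdobj$coefs) %*% res$U
plot(scores, col = res$cls, pch = 19)
```

This projection onto `res$U` mirrors the discriminative-subspace visualization shown in the examples.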
Charles Bouveyron
C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.
# Clustering the well-known "Canadian temperature" data (Ramsay & Silverman)
basis <- create.bspline.basis(c(0, 365), nbasis=21, norder=4)
fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], basis,
                      fdnames=list("Day", "Station", "Deg C"))$fd
res = funFEM(fdobj, K=4)

# Visualization of the partition and the group means
par(mfrow=c(1,2))
plot(fdobj, col=res$cls, lwd=2, lty=1)
fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
plot(fdmeans, col=1:max(res$cls), lwd=2)

## DO NOT RUN
# # Load the velib data and smooth it
# data(velib)
# basis <- create.fourier.basis(c(0, 181), nbasis=25)
# fdobj <- smooth.basis(1:181, t(velib$data), basis)$fd
#
# # Clustering with funFEM
# res = funFEM(fdobj, K=6, model='AkjBk', init='kmeans', lambda=0, disp=TRUE)
#
# # Visualization of group means
# fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
# plot(fdmeans, col=1:res$K, xaxt='n', lwd=2)
# axis(1, at=seq(5,181,6), labels=velib$dates[seq(5,181,6)], las=2)
#
# # Choice of K
# res = funFEM(fdobj, K=2:20, model='AkjBk', init='kmeans', lambda=0, disp=TRUE)
# plot(2:20, res$plot$bic, type='b', xlab='K', main='BIC')
#
# # Computation of the closest stations to the group means
# par(mfrow=c(3,2))
# for (i in 1:res$K) {
#   matplot(t(velib$data[which.max(res$P[,i]),]), type='l', lty=i, col=i, xaxt='n',
#           lwd=2, ylim=c(0,1))
#   axis(1, at=seq(5,181,6), labels=velib$dates[seq(5,181,6)], las=2)
#   title(main=paste('Cluster', i, ' - ', velib$names[which.max(res$P[,i])]))
# }
#
# # Visualization in the discriminative subspace (projected scores)
# par(mfrow=c(1,1))
# plot(t(fdobj$coefs) %*% res$U, col=res$cls, pch=19, main='Discriminative space')
# text(t(fdobj$coefs) %*% res$U, labels=velib$names)
#
# # Spatial visualization of the clustering (with library ggmap)
# library(ggmap)
# Mymap = get_map(location = 'Paris', zoom = 12, maptype = 'terrain')
# ggmap(Mymap) + geom_point(data=velib$position, aes(longitude, latitude),
#                           colour = I(res$cl), size = I(3))
#
# # funFEM clustering with sparsity
# res2 = funFEM(fdobj, K=res$K, model='AkjBk', init='user', Tinit=res$P,
#               lambda=0.01, disp=TRUE)
#
# # Visualization of group means and the selected functional bases
# split.screen(c(2,1))
# fdmeans = fdobj; fdmeans$coefs = t(res2$prms$my)
# screen(1); plot(fdmeans, col=1:res2$K, xaxt='n', lwd=2)
# axis(1, at=seq(5,181,6), labels=velib$dates[seq(5,181,6)], las=2)
# basis$dropind = which(rowSums(abs(res2$U))==0)
# screen(2); plot(basis, col=1, lty=1, xaxt='n', xlab='Disc. basis functions')
# axis(1, at=seq(5,181,6), labels=velib$dates[seq(5,181,6)], las=2)
# close.screen(all=TRUE)