Estimate Rank for NMF Models
A critical parameter in NMF algorithms is the factorization rank r. It defines the number of basis effects used to approximate the target matrix. The function nmfEstimateRank helps in choosing an optimal rank by implementing simple approaches proposed in the literature.
Note that from version 0.7, one can equivalently call the function nmf with a range of ranks.
In the plot generated by plot.NMF.rank, each curve represents a summary measure over the range of ranks in the survey. The colours correspond to the type of data to which the measure is related: coefficient matrix, basis component matrix, best fit, or consensus matrix.
nmfEstimateRank(x, range, method = nmf.getOption("default.algorithm"),
    nrun = 30, model = NULL, ..., verbose = FALSE, stop = FALSE)

## S3 method for class 'NMF.rank'
plot(x, y = NULL,
    what = c("all", "cophenetic", "rss", "residuals", "dispersion", "evar",
        "sparseness", "sparseness.basis", "sparseness.coef",
        "silhouette", "silhouette.coef", "silhouette.basis",
        "silhouette.consensus"),
    na.rm = FALSE, xname = "x", yname = "y",
    xlab = "Factorization rank", ylab = "", main = "NMF rank survey", ...)
x |
For For |
range |
a |
method |
A single NMF algorithm, in one of the
formats accepted by the function |
nrun |
a |
model |
model specification passed to each
|
verbose |
toggle verbosity. This parameter only
affects the verbosity of the outer loop over the values
in |
stop |
logical flag for running the estimation
process with fault tolerance. When |
... |
For For |
y |
reference object of class |
what |
a |
na.rm |
single logical that specifies if the rank
for which the measures are NA values should be removed
from the graph or not (default to |
xname,yname |
legend labels for the curves
corresponding to measures from |
xlab |
x-axis label |
ylab |
y-axis label |
main |
main title |
Given an NMF algorithm and the target matrix, a common way of estimating r is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).
The function nmfEstimateRank performs this estimation procedure. It runs NMF multiple times for each factorization rank in a range and, for each rank, returns a set of quality measures together with the associated consensus matrix.
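As an illustration of one such quality measure, the cophenetic correlation coefficient proposed by Brunet et al. (2004) can be computed from a consensus matrix with base R alone. The sketch below is not the package's own implementation: the helper name cophenetic_corr and the toy consensus matrix C are made up for this example, and average linkage is assumed for the hierarchical clustering.

```r
# Minimal sketch in base R (stats package only); not the NMF package's code.
# cophenetic_corr() is a hypothetical helper name used for illustration.
cophenetic_corr <- function(consensus) {
  d  <- as.dist(1 - consensus)         # samples that always co-cluster are at distance 0
  hc <- hclust(d, method = "average")  # average-linkage hierarchical clustering (assumed)
  cor(d, cophenetic(hc))               # agreement between original and tree-induced distances
}

# Made-up 4-sample consensus matrix with two fairly stable clusters: {1, 2} and {3, 4}
C <- matrix(c(1.0, 0.9, 0.1, 0.0,
              0.9, 1.0, 0.2, 0.1,
              0.1, 0.2, 1.0, 0.8,
              0.0, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)

rho <- cophenetic_corr(C)
print(rho)
```

Values close to 1 indicate that the samples cluster consistently across runs. Brunet et al. (2004) suggest choosing the rank at which this coefficient begins to fall.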
In order to avoid overfitting, it is recommended to run the same procedure on randomized data. The results on the original and the randomized data may be plotted on the same plots, using argument y.
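To make the idea of randomized data concrete, here is a minimal base R sketch that shuffles the entries of each column independently, which destroys the structure across rows while keeping each column's distribution of values. The helper name randomize_columns and the toy matrix are hypothetical. The sketch assumes this is broadly what the package's randomize function does, so check its help page before relying on that.

```r
# Minimal sketch in base R; randomize_columns() is a hypothetical helper,
# assumed (not verified) to mimic the behaviour of randomize().
randomize_columns <- function(x) {
  # Permute the entries of each column independently.
  # Assumes x has at least two rows, so that apply() returns a matrix.
  apply(x, 2, function(col) col[sample.int(length(col))])
}

set.seed(1)                        # arbitrary seed, for reproducibility only
V  <- matrix(runif(12), nrow = 4)  # small non-negative toy matrix
rV <- randomize_columns(V)

# Each column of rV holds the same values as the matching column of V,
# in shuffled order.
print(rV)
```

A rank whose quality measures look no better on the original data than on such a randomized copy is likely fitting noise.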
nmfEstimateRank returns an S3 object (i.e. a list) of class NMF.rank with the following elements:
measures |
a |
consensus |
a |
fit |
a |
Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424, <URL: http://dx.doi.org/10.1073/pnas.0308531101>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/15016911>.
Hutchins LN, Murphy SM, Singh P and Graber JH (2008). "Position-dependent motif characterization using non-negative matrix factorization." _Bioinformatics (Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811, <URL: http://dx.doi.org/10.1093/bioinformatics/btn526>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/18852176>.
if( !isCHECK() ){

set.seed(123456)
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m)

# Use a seed that will be set before each first run
res <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)
# or equivalently
res <- nmf(V, seq(2,5), method='brunet', nrun=10, seed=123456)

# plot all the measures
plot(res)
# or only one: e.g. the cophenetic correlation coefficient
plot(res, 'cophenetic')

# run same estimation on randomized data
rV <- randomize(V)
rand <- nmfEstimateRank(rV, seq(2,5), method='brunet', nrun=10, seed=123456)
plot(res, rand)

}