Algorithms and Framework for Nonnegative Matrix Factorization (NMF)

Provides a framework to perform Non-negative Matrix Factorization (NMF). The package implements a set of already published algorithms and seeding methods, and provides a framework to test, develop and plug new/custom algorithms. Most of the built-in algorithms have been optimized in C++, and the main interface function provides an easy way of performing parallel computations on multicore machines.


Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.

Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results.

The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications.

Documentation, source code and sample data are available from:

  • Latest stable release from CRAN: http://cran.r-project.org/package=NMF

  • Development versions:

    • project: http://r-forge.r-project.org/projects/nmf
      _NOTE: due to some unknown problem on R-forge build system, the package is not being built. Please use the MyCRAN repository until the issue is fixed._

    • MyCRAN: personal CRAN-like repository at http://web.cbio.uct.ac.za/~renaud/CRAN

Travis check:

News


Changes in version 0.20.6


FIXES o fixed new NOTEs in R CMD check (about requireNamespace) o fixed error in heatmaps due to new version of stringr (>= 1.0.0)


Changes in version 0.20


NEW FEATURES o aheatmap gains an argument txt that enables displaying text in each cell of the heatmap.

FIXES o now all examples, vignettes and unit tests comply with CRAN policies on the maximum number of cores to be used when checking packages (2).


Changes in version 0.18


NEW FEATURES o aheatmap gains distfun methods 'spearman', 'kendall', and, for completeness, 'pearson' (for which 'correlation' is now an alias), which specifies the correlation method to use when computing the distance matrix.

CHANGES o In order to fully comply with CRAN policies, internals of the aheatmap function slightly changed. In particular, this brings an issue when directly plotting to PDF graphic devices, where a first blank page may appear. See the dedicated section in the man page ?aheatmap for a work around.


Changes in version 0.17.3


NEW FEATURES o add silhouette computation for NMF results, which can be performed on samples, features or consensus matrix. The average silhouette width is also computed by the summary methods. (Thanks to Gordon Robertson for this suggestion)

o New runtime option 'shared.memory' (or 'm') for toggling usage of shared 
memory (requires package synchronicity).

CHANGES o some plots are -- finally -- generated using ggplot2 This adds an Imports dependency to ggplot2 and reshape2.

BUG FIXES o fix a bug when running parallel NMF computation with algorithms defined in other packages (this most probably only affected my own packages, e.g., CellMix)


Changes in version 0.17.1


CHANGES o Computations seeded with an NMF object now set slot @seed to 'NMF' o Added unit tests on seeding with an NMF object o Removed some obsolete comments

BUG FIXES o an error was thrown when running multiple sequential NMF computations with nrun>=50: object '.MODE_SEQ' not found. (reported by Kenneth Lopiano)


Changes in version 0.16.5


CHANGES o Now depends on pkgmaker 0.16 o Disabled shared memory on Mac machines by default, due to un-resolved bug that occurs with big-matrix descriptors. It can be restored using nmf.options(shared.memory=TRUE) o Package version is now shown on startup message o Re-enabled unit test checks on CRAN, using the new function pkgmaker::isCHECK, that definitely identifies if tests are run under R CMD check.


  •                   Changes in version 0.15.2                       *
    

NEW FEATURES

o New function nmfObject() to update old versions of NMF objects, 
eg., saved on disk.
o NMF strategies can now define default values for their parameters,
eg., `seed` to associate them with a seeding method that is not the 
default method, `maxIter` to make default runs with more iterations 
(for algorithms defined as NMFStrategyIterative), or any other
algorithm-specific parameter.
o new general utility function hasArg2 that is identical to hasArg
but takes the argument name as a character string, so that no 
check NOTE is thrown.

CHANGES

o All vignettes are now generated using knitr.
o New dependency to pkgmaker which incorporated many of the 
general utility functions, initially defined for the NMF package.
o example check is faster (as requested by CRAN)
o the function selectMethodNMF was renamed into selectNMFMethod to 
be consistent with the other related *NMFMethod functions (get, set, etc..)

FIXES

o Fix computation of R-squared in profplot/corplot: now compute from 
a linear fit that includes an intercept.

  •                   Changes in version 0.8.6                        *
    

NEW FEATURES

o Formula based NMF models that can incorporate fixed terms, which may 
be used to correct for covariates or define group specific offsets.

CHANGES

o Subsetting an NMF object with a single index now returns an NMF object, except if argument drop is not missing (i.e. either TRUE or FALSE).


  •                   Changes in version 0.6.03                       *
    

WARNING Due to the major changes made in the internal structure of the standard NMF models, previous NMF fits are not compatible with this version.

NEW FEATURES

o The factory function nmfModel has been enhanced and provides new 
methods that makes more easier the creation of NMF objects.
See ?nmfModel.

o A new heatmap drawing function 'aheatmap' (for annotated heatmap) is 
now used to generate the different heatmaps (basismap, coefmap and consensusmap). 
It is a enhancement/fork of the function pheatmap from package pheatmap and draw 
-- I think -- very nice heatmaps, providing a convenient and flexible 
interface to add annotation tracks to both the columns and rows, with 
sensible automatic legends.

CHANGES

o Method nmfModel when called with no arguments does not return anymore
the list of available NMF models, but an empty NMF model.
To list the available models, directly call `nmfModels()`.

o The function `rmatrix` is now a S4 generic function.
It gains methods for generating random matrices based on a template matrix 
or an NMF model.
See ?rmatrix.

o The function `rnmf` gains a method to generate a random NMF model
given numerical dimensions.

o The function nmfEstimateRank now returns the fits for each value of the 
rank in element 'fit' of the result list. 
See ?nmfEstimateRank.

  •                   Changes in version 0.5.3                        *
    

NEW FEATURES

o The state of the random number generator is systematically stored in the 
'NMFfit' object returned by function 'nmf'.
It is stored in a new slot 'rng.seed' and can be access via the new method 
'rngSeed' of class 'NMFfit'. 
See ?rngSeed for more details.

o The number of cores to use in multicore computations can now also be
 specified by the 'p - parallel' runtime option (e.g. 'p4' to use 4 cores).
 Note that the value specified in option 'p' takes precedence on the one 
 passed via argument '.pbackend'.
 See section 'Runtime options' in ?nmf for more details. 

o Function 'nmfApply' now allows values 3 and 4 for argument 'MARGIN' to 
apply a given function to the basis vectors (i.e. columns of the basis matrix W) 
and basis profiles (i.e. rows of the mixture coefficients H) respectively.
See ?nmfApply for more details.

o New S4 generic 'basiscor' and 'profcor' to compute the 
correlation matrices of the basis vectors and basis profiles respectively 
from two NMF models or from an NMF model and a given compatible matrix. 
See ?basiscor or ?profcor for more details.

o New S4 generic 'fitcmp' to compare the NMF models fitted by two 
different runs of function 'nmf' (i.e. two 'NMFfit' objects). 
See ?fitcmp for more details.

o New S4 generic 'canFit' that tells if an NMF method is able to 
exactly or partly fit a given NMF model. 
See ?canFit for more details.

o New function 'selectMethodNMF' that selects an appropriate NMF method to 
fit a given NMF model. 
See ?selectMethodNMF for more details.
    
o The verbosity level can be controlled more finely by the 'v - verbose' 
runtime option (e.g. using .options='v1' or 'v2'). The greater is 
the level the more information is printed.
The verbose outputs have been cleaned-up and should be more consistent 
across the run mode (sequential or multicore).
See section 'Runtime options' in ?nmf for more details.

CHANGES

o The standard update equations have been optimised further, by making 
them modify the factor matrices in place. This speeds up the computation 
and greatly improves the algorithms' memory footprint.

o The package NMF now depends on the package digest that is used to display 
the state of random number generator in a concise way.

o The methods 'metaHeatmap' are split into 3 new S4 generic 'basismap' to 
plot a heatmap of the basis matrix [formerly plotted by the call : 
'metaHeatmap(object.of.class.NMF, 'features', ...) ], 'coefmap' to plot a 
heatmap of the mixture coefficient matrix [formerly plotted by the call : 
'metaHeatmap(object.of.class.NMF, 'samples', ...) ], and 'consensusmap' to 
plot a heatmap of the consensus matrix associated with multiple runs of NMF 
[formerly plotted by the call : 'metaHeatmap(object.of.class.NMFfitX, ...) ].

BUG FIX

o In factory method 'nmfModel': the colnames (resp. rownames) of 
matrix W (resp. H) are now correctly set when the rownames of H 
(resp. the colnames of W) are not null.

NEWLY DEPRECATED CLASSES, METHODS, FUNCTIONS, DATA SETS

o Deprecated Generics/Methods
   1) 'metaHeatmap,NMF' and 'metaHeatmap,NMFfitX' - S4 methods remain 
   with .Deprecated message.
   They are replaced by the definition of 3 new S4 generic 'basismap', 
   'coefmap' and 'consensusmap'. See the related point in section 
   CHANGES above.
   They will be completely removed from the package in the next version.      

  •                   Changes in version 0.5.1                        *
    

BUG FIX

o fix a small bug in method 'rss' to allow argument 'target' to be a 
'data.frame'. This was generating errors when computing the summary 
measures and especially when the function 'nmfEstimateRank' was called 
on a 'data.frame'. Thanks to Pavel Goldstein for reporting this.

  •                   Changes in version 0.5                          *
    

NEW FEATURES

o Method 'fcnnls' provides a user-friendly interface for the internal 
function '.fcnnls' to solve non-nengative linear least square problems, 
using the fast combinatorial approach from Benthem and Keenan (2004).
See '?fcnnls' for more details.

o New argument 'callback' in method 'nmf' allows to provide a callback 
function when running multiple runs in default mode, that only keeps the 
best result. The callback function is applied to the result of each run 
before it possibly gets discarding. The results are stored in the 
miscellaneous slot '.callback' accessible via the '$' operator (e.g. res$.callback).
See '?nmf' for more details.

o New method 'niter' to retrieve the number of iterations performed to fit 
a NMF model. It is defined on objects of class 'NMFfit'.

o New function 'isNMFfit' to check if an object is a result from NMF fitting.

o New function 'rmatrix' to easily generate random matrices, and allow to specify the 
distribution from which the entries are drawn. For example:

    * 'rmatrix(100, 10)' generates a 100x10 matrix whose entries are drawn 
    from the uniform distribution
     
    * 'rmatrix(100, 10, rnorm)' generates a 100x10 matrix whose entries are drawn 
    from the standard Normal distribution.
            
o New methods 'basisnames' and 'basisnames<-' to retrieve and set the basis vector 
names. See '?basisnames'. 

CHANGES

o Add a CITATION file that provides the bibtex entries for citing the 
BMC Bioinformatics paper for the package 
(http://www.biomedcentral.com/1471-2105/11/367), the vignette and manual.
See 'citation('NMF')' for complete references.

o New argument 'ncol' in method 'nmfModel' to specify the target dimensions more 
easily by calls like 'nmfModel(r, n, p)' to create a r-rank NMF model that fits a 
n x p target matrix.

o The subset method '[]' of objects of class 'NMF' has been changed to be more convenient.
See  

BUG FIX

o Method 'dimnames' for objects of class 'NMF' now correctly sets the 
names of each dimension.

REMOVED CLASSES, METHODS, FUNCTIONS, DATA SETS

o Method 'extra' has completely been removed.  

  •                   Changes in version 0.4.8                        *
    

BUG FIX

o When computing NMF with the SNMF/R(L) algorithms, an error could 
occur if a restart (recomputation of the initial value) for the H 
matrix (resp. W matrix) occured at the first iteration.
Thanks to Joe Maisog for reporting this.     

  •                   Changes in version 0.4.7                        *
    

BUG FIX

o When computing the cophenetic correlation coefficient of a diagonal 
matrix, the 'cophcor' method was returning 'NA' with a warning from 
the 'cor' function. It now correctly returns 1.
Thanks to Joe Maisog for reporting this.

  •                   Changes in version 0.4.4                        *
    

CHANGES

o The major change is the explicit addition of the synchronicity package into 
the suggested package dependencies. Since the publication of versions 4.x 
of the bigmemory package, it is used by the NMF package to provide the mutex 
functionality required by multicore computations.
Note This is relevant only for Linux/Mac-like platforms as the multicore 
package is not yet supported on MS Windows.
Users using a recent version of the bigmemory package (i.e. >=4.x) 
DO NEED to install the synchronicity package to be able to run multicore NMF 
computations.
Versions of bigmemory prior to 4.x include the mutex functionality.

o Minor enhancement in error messages

o Method 'nmfModel' can now be called with arguments 'rank' and 'target' 
swapped. This is for convenience and ease of use.

BUG FIX

o Argument 'RowSideColors' of the 'metaHeatmap' function is now correctly 
subset according to the value of argument 'filter'. However the argument 
must be named with its complete name 'RowSideColors', not assuming 
partial match. See KNOWN ISSUES. 

KNOWN ISSUES

o In the 'metaHeatmap' function, when argument 'filter' is not 'FALSE': 
arguments 'RowSideColors', 'labRow', 'Rowv' that are intended to the 
'heatmap.plus'-like function (and are used to label the rows) should be 
named using their complete name (i.e. not assuming partial match), 
otherwise the filtering is not applied to these argument and an error 
is generated. 
This issue will be fixed in a future release.

  •                   Changes in version 0.4.3                        *
    

CHANGES

o function 'nmfEstimateRank' has been enhanced: 
   -run options can be passed to each internal call to the 'nmf' function. 
   See ?nmfEstiamteRank and ?nmf for details on the supported options.
   
   - a new argument 'stop' allows to run the estimation with fault tolerance,
   skipping runs that throw an error. Summary measures for these runs are set 
   to NAs and a warning is thrown with details about the errors.
   
o in function  'plot.NMF.rank': a new argument 'na.rm' allows to remove from 
the plots the ranks for which the measures are NAs (due to errors during the 
estimation process with 'nmfEstimateRank').

BUG FIX

o Method 'consensus' is now exported in the NAMESPACE file.
Thanks to Gang Su for reporting this.

o Warnings and messages about loading packages are now suppressed.
This was particularly confusing for users that do not have the packages 
and/or platform required for parallel computation: warnings were printed 
whereas the computation was -- sequentially -- performed without 
problem.
Thanks to Joe Maisog for reporting this.

  •                   Changes in version 0.4.1                        *
    

BUG FIX

o The 'metaHeatmap' function was not correctly handling row labels 
when argument filter was not FALSE. All usual row formating in heatmaps 
(label and ordering) are now working as expected.
Thanks to Andreas Schlicker, from The Netherlands Cancer Institute for 
reporting this. 

o An error was thrown on some environments/platforms (e.g. Solaris) 
where not all the packages required for parallel computation were not 
available -- even when using option 'p' ('p' in lower case), which 
should have switched the computation to sequential. 
This is solved and the error is only thrown when running NMF with 
option 'P' (capital 'P').

o Not all the options were passed (e.g. 't' for tracking) in sequential 
mode (option '-p').

o verbose/debug global nmf.options were not restored if a numerical random 
seed were used.

CHANGES

o The 'metaHeatmap' function nows support passing the name of a filtering 
method in argument 'filter', which is passed to method 'extractFeatures'.
See ?metaHeatmap.

o Verbose and debug messages are better handled. When running a parallel 
computation of multiple runs, verbose messages from each run are shown only
in debug mode.

  •                   Changes in version 0.4                          *
    

NEW FEATURES

o Part of the code has been optimised in C++ for speed and memory efficiency:
    
    - the multiplicative updates for reducing the KL divergence and the euclidean 
    distance have been optimised in C++. This significantly reduces the 
    computation time of the methods that make use of them: 'lee', 'brunet', 
    'offset', 'nsNMF' and 'lnmf'.
    Old R version of the algorithm are still accessible with the suffix '.R#'. 
    
    - the computation of euclidean distance and KL divergence are implemented 
    in C++, and do not require the duplication of the original matrices as done 
    in R.

o Generic 'dimnames' is now defined for objects of class 'NMF' and returns
a list with 3 elements: the row names of the basis matrix, the column names 
of the mixture coefficient matrix , and the column names of the basis matrix. 
This implies that methods 'rownames' and 'columnames' are also available for 
'NMF' objects.

o A new class structure has been developed to handle the results of multiple 
NMF runs in a cleaner and more structured way:

    - Class 'NMFfitX' defines a common interface for multiple NMF runs of a 
      single algorithm.

    - Class 'NMFfitX1' handles the sub-case where only the best fit is returned.
      In particular, this class allows to handle such results as if coming 
      from a single run.

    - Class 'NMFfitXn' handles the sub-case where the list of all the fits 
      is returned. 

    - Class 'NMFList' handles the case of heterogeneous NMF runs (different 
      algorithms, different factorization rank, different dimension, etc...)
      
o The vignette contains more examples and details about the use of package.

o The package is compatible with both versions 3.x and 4.x of the bigmemory
package. This package is used when running multicore parallel computations. 
With version 4.x of bigmemory, the synchronicity package is also required as 
it provides the mutex functionality that used to be provided by bigmemory 3.x.

BUG FIX

o Running in multicore mode from the GUI on MacOS X is not allowed anymore as 
it is not safe and were throwing an error ['The process has forked and ...'].
Thanks to Stephen Henderson from the UCL Cancer Institute (UK) for reporting this.

o Function 'nmf' now restores the random seed to its original value as before its call 
with a numeric seed. This behaviour can be disabled with option 'restore.seed=FALSE' or '-r'

NEWLY DEPRECATED CLASSES, METHODS, FUNCTIONS, DATA SETS

o Deprecated Generics/Methods
   1) 'errorPlot' - S4 generic/methods remains with .Deprecated message.
   It is replaced by a definition of the 'plot' method for signatures 
   'NMFfit,missing' and 'NMFList,missing'
   It will be completely removed from the package in the next version.
   
o Deprecated Class
   1) 'NMFSet' - S4 class remains for backward compatibility, but is not 
   used anymore. It is replaced by the classes 'NMFfitX1', 'NMFfitXn', 
   'NMFList'.

  •                   Changes in version 0.3                          *
    

NEW FEATURES

o Now requires R 2.10

o New list slot 'misc' in class 'NMF' to be able to define new NMF models, 
without having to extend the S4 class. Access is done by methods '$' and '$<-'.

o More robust and convenient interface 'nmf'

o New built-in algorithm : PE-NMF [Zhang (2008)]

o The vignette and documentation have been enriched with more examples and 
details on the functionalities.

o When possible the computation is run in parallel on all the available cores.
See option 'p' or 'parallel=TRUE' in argument '.options' of function 'nmf'.

o Algorithms have been optimized and run faster

o Plot for rank estimation: quality measure curves can be plotted together 
with a set of reference measures.
The reference measures could for example come from the rank estimation of 
randomized data, to investigate overfitting.

o New methods '$' and '$<-' to access slot 'extra' in class 'NMFfit'
These methods replace method 'extra' that is now defunct.

o Function 'randomize' allows to randomise a target matrix or 
'ExpressionSet' by permuting the entries within each columns using a
different permutation for each column. It is useful when checking for
over-fitting. 

CHANGES

o The 'random' method of class 'NMF' is renamed 'rnmf', but is still 
accessible through name 'random' via the 'seed' argument in the 
interface method 'nmf'. 

NEWLY DEFUNCT CLASSES, METHODS, FUNCTIONS, DATA SETS

o Defunct Generics/Methods
   1) 'extra' - S4 generic/methods remains with .Defunct message.
   It will be completely removed from the package in the next version.

  •                   Changes in version 0.2.4                        *
    

CHANGES

o Class 'NMFStrategy' has a new slot 'mixed', that specify if the algorithm 
can handle mixed signed input matrices.

o Method 'nmf-matrix,numeric,function' accepts a new parameter 'mixed' to 
specify if the custom algorithm can handle mixed signed input matrices.

  •                   Changes in version 0.2.3                        *
    

NEW FEATURES

o Package 'Biobase' is not required anymore, and only suggested.
The definition and export of the NMF-BioConductor layer is done at 
loading.

o 'nmfApply' S4 generic/method: a 'apply'-like method for objects of 
class 'NMF'.

o 'predict' S4 method for objects of class 'NMF': the method replace the now
deprecated 'clusters' method.

o 'featureNames' and 'sampleNames' S4 method for objects of class 'NMFSet'.

o sub-setting S4 method '[' for objects of class 'NMF': row subsets are 
applied to the rows of the basis matrix, column subsets are applied to 
the columns of the mixture coefficient matrix.

CHANGES

o method 'featureScore' has a new argument 'method' to allow choosing between 
different scoring schema.

o method 'extractFeatures' has two new arguments: 'method' allows choosing 
between different scoring and selection methods; 'format' allows to specify 
the output format.
NOTE: the default output format has changed. The default result is now a 
list whose elements are integer vectors (one per basis component) that 
contain the indices of the basis-specific features. It is in practice what 
one is usually interested in.  

BUG FIXES

o Methods 'basis<-' and 'coef<-' were not exported by file NAMESPACE.

o Method 'featureNames' was returning the column names of the mixture 
coefficient matrix, instead of the row names of the basis matrix.

o 'metaHeatmap' with argument 'class' set would throw an error.

NEWLY DEPRECATED CLASSES, METHODS, FUNCTIONS, DATA SETS

o Deprecated Generics/Methods
   1) clusters - S4 generic/methods remain with .Deprecated message

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("NMF")

0.20.6 by Renaud Gaujoux, 2 years ago


http://renozao.github.io/NMF


Report a bug at http://github.com/renozao/NMF/issues


Browse source code at https://github.com/cran/NMF


Authors: Renaud Gaujoux, Cathal Seoighe


Documentation:   PDF Manual  


GPL (>= 2) license


Imports graphics, stats, stringr, digest, grid, grDevices, gridBase, colorspace, RColorBrewer, foreach, doParallel, ggplot2, reshape2

Depends on methods, utils, pkgmaker, registry, rngtools, cluster

Suggests RcppOctave, fastICA, doMPI, bigmemory, synchronicity, corpcor, xtable, devtools, knitr, bibtex, RUnit, mail, Biobase


Imported by Seurat, diceR, hNMF, phantom.

Depended on by IntNMF, InterSIM.

Suggested by PAC, dendextend, igraph.


See at CRAN