Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable implementation of classification and regression forests, as described by Breiman (2001).


Changes in 0.1-9:

  • Option 'nThread' limits OpenMP parallelization to maximum number of threads.

  • Option 'oob' specifies an out-of-bag constraint for prediction.

  • Row sampling now implemented using 'Rcpp', in place of 'RcppArmadillo'.

  • Package 'data.table' now implements block decomposition of 'data.frame'.
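The out-of-bag constraint mentioned in the 0.1-9 notes can be illustrated with a minimal sketch. It assumes the usual bagging convention: each tree trains on a bootstrap sample of rows, and an out-of-bag prediction for a row uses only the trees that never saw that row. The function names and the `tree_preds` layout are hypothetical and chosen for illustration; this is not Rborist's internal code or its R interface.

```python
import random


def bootstrap_rows(n_rows, rng):
    """Sample n_rows row indices with replacement (one tree's bag)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def oob_rows(n_rows, bag):
    """Rows never drawn into the bag are out-of-bag for that tree."""
    in_bag = set(bag)
    return [r for r in range(n_rows) if r not in in_bag]


def oob_predict(row, tree_preds, bags):
    """Average predictions for `row` over trees whose bag excludes it.

    tree_preds[t][row] is tree t's prediction for the row (hypothetical
    layout). Returns None when every tree saw the row during training.
    """
    votes = [
        preds[row]
        for preds, bag in zip(tree_preds, bags)
        if row not in set(bag)
    ]
    return sum(votes) / len(votes) if votes else None


if __name__ == "__main__":
    rng = random.Random(0)
    bag = bootstrap_rows(10, rng)
    print("bag:", bag)
    print("out-of-bag rows:", oob_rows(10, bag))
```

Restricting prediction to out-of-bag trees keeps a row's estimate independent of the trees that were fitted to it, which is why out-of-bag error is commonly used as a built-in validation estimate.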

Changes in 0.1-8:

  • Command 'Validate' enables separate execution of out-of-bag validation.

  • Command 'Streamline' shrinks trained Rborist objects by emptying unused fields.

  • Option 'maxLeaf' prunes trees during training to a maximum number of leaves.

Changes in 0.1-4:

  • Sparse 'dgCMatrix' matrices accepted, if encoded in 'i/p' format.

  • Autocompression conserves space on a per-predictor basis.

  • Space-saving 'thinLeaves' option suppresses creation of summary data.

  • 'splitQuantile' option allows fine tuning of split-point placement for numerical predictors.

  • Improved scaling with row count.
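For readers unfamiliar with the 'i/p' encoding named in the 0.1-4 notes, the following is a minimal sketch of the column-compressed layout used by 'dgCMatrix' objects: 'i' holds the zero-based row index of each nonzero entry, 'p' holds the offset at which each column's entries begin, and 'x' holds the values. The helper names `to_csc` and `column` are hypothetical and are not part of Rborist or the 'Matrix' package.

```python
def to_csc(dense):
    """Encode a dense row-major matrix as (i, p, x) column-compressed arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    i, p, x = [], [0], []
    for col in range(n_cols):
        for row in range(n_rows):
            value = dense[row][col]
            if value != 0:
                i.append(row)    # zero-based row index of this nonzero
                x.append(value)  # the nonzero value itself
        p.append(len(i))         # offset where the next column starts
    return i, p, x


def column(i, p, x, col, n_rows):
    """Expand one column back to dense form using only its slice of i and x."""
    dense_col = [0] * n_rows
    for k in range(p[col], p[col + 1]):
        dense_col[i[k]] = x[k]
    return dense_col


if __name__ == "__main__":
    dense = [
        [1, 0, 0],
        [0, 0, 2],
        [3, 0, 4],
    ]
    i, p, x = to_csc(dense)
    print("i =", i)
    print("p =", p)
    print("x =", x)
    print("column 2 =", column(i, p, x, 2, n_rows=3))
```

Because 'p' delimits each column's entries, a single predictor can be read without touching the rest of the matrix, which suits per-predictor processing.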

Changes in 0.1-2:

  • Improved scaling with predictor count.

  • Improved conformance with Caret package.

  • 'minNode' default lowered to reflect uniqueness of indices referenced within a node.

  • Name change: PreTrain deprecated in favor of PreFormat.

  • Minor reorganization to support sparse internal representation planned for next release.

Changes in 0.1-1:

  • Significant reductions in memory footprint.

  • Default predictor-selection mode changed to 'predFixed' (like 'mTry') for small predictor counts. 'predProb' remains the default at higher counts.

  • Binary classification now employs faster, weight-based algorithm.

  • Training produces rich internal state by default. In particular, quantile validation and prediction can be performed without having to train specially for them.

  • ForestFloorExport objects can be produced from training state for use by 'forestFloor' feature-analysis package.

  • PreTrain method produces pre-sorted predictor format, saving recomputation when retraining iteratively, such as during a Caret session.

  • OMP parallelization now performed per node/predictor pair, rather than per predictor.

  • Optional 'regMono' vector enforces monotonic constraints on numeric regressors.
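The two predictor-selection modes named in the 0.1-1 notes differ in how splitting candidates are drawn. The sketch below assumes that 'predFixed' draws a fixed number of distinct predictors per split, in the manner of 'mTry', and that 'predProb' includes each predictor independently with a given probability. The function names are hypothetical and the code illustrates the idea only; it is not taken from the package.

```python
import random


def select_fixed(n_pred, pred_fixed, rng):
    """Draw exactly pred_fixed distinct predictor indices (mTry-style)."""
    return sorted(rng.sample(range(n_pred), pred_fixed))


def select_prob(n_pred, pred_prob, rng):
    """Include each predictor independently with probability pred_prob."""
    return [j for j in range(n_pred) if rng.random() < pred_prob]


if __name__ == "__main__":
    rng = random.Random(42)
    print("fixed count:", select_fixed(10, 3, rng))
    print("per-predictor probability:", select_prob(10, 0.3, rng))
```

Under the fixed mode the candidate count is the same at every split, whereas under the probabilistic mode it varies around n_pred * pred_prob and can occasionally be zero, so an implementation would need to handle an empty draw.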


0.2-3 by Mark Seligman, 2 years ago

http://www.suiji.org/arborist, https://github.com/suiji/Arborist

Report a bug at https://github.com/suiji/Arborist/issues

Browse source code at https://github.com/cran/Rborist

Authors: Mark Seligman

Documentation:   PDF Manual  

Task views: High-Performance and Parallel Computing with R, Machine Learning & Statistical Learning

License: MPL (>= 2) | GPL (>= 2) | file LICENSE

Imports Rcpp, data.table, digest

Suggests testthat, knitr, rmarkdown

Linking to Rcpp

System requirements: g++ (>= 4.8)

Imported by VSURF, mob.
