A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class 'gwaa.data' (R package 'GenABEL') and 'dgCMatrix' (R package 'Matrix') can be directly analyzed.


News

Version 0.9.0
  • Add bias-corrected impurity importance (actual impurity reduction, AIR)
  • Add quantile prediction as in quantile regression forests
  • Add treeInfo() function to extract human readable tree structure
  • Add standard error estimation with the infinitesimal jackknife (now the default)
  • Add impurity importance for survival forests
  • Faster aggregation of predictions
  • Fix memory issues on Windows 7
  • Bug fixes
Version 0.8.0
  • Handle sparse data of class Matrix::dgCMatrix
  • Add prediction of standard errors to predict()
  • Allow devtools::install_github() without subdir and on Windows
  • Bug fixes
Version 0.7.0
  • Add randomized splitting (extraTrees)
  • Better formula interface: Support interactions terms and faster computation
  • Split at mid-point between candidate values
  • Improvements in holdoutRF and importance p-value estimation
  • Drop unused factor levels in outcome before growing
  • Add predict.all for probability and survival prediction
  • Bug fixes
Version 0.6.0
  • Set write.forest=TRUE by default
  • Add num.trees option to predict()
  • Faster version of getTerminalNodeIDs(), included in predict()
  • Handle new factor levels in 'order' mode
  • Use unadjusted p-value for 2 categories in maxstat splitting
  • Bug fixes
Version 0.5.0
  • Add Windows multithreading support for new toolchain
  • Add splitting by maximally selected rank statistics for survival and regression forests
  • Faster method for unordered factor splitting
  • Add p-values for variable importance
  • Runtime improvement for regression forests on classification data
  • Bug fixes
Version 0.4.0
  • Reduce memory usage of savest forest objects (changed child.nodeIDs interface)
  • Add keep.inbag option to track in-bag counts
  • Add option sample.fraction for fraction of sampled observations
  • Add tree-wise split.select.weights
  • Add predict.all option in predict() to get individual predictions for each tree for classification and regression
  • Add case-specific random forests
  • Add case weights (weighted bootstrapping or subsampling)
  • Remove tuning functions, please use mlr or caret
  • Catch error of outdated gcc not supporting C++11 completely
  • Bug fixes
Version 0.3.0
  • Allow the user to interrupt computation from R
  • Transpose classification.table and rename to confusion.matrix
  • Respect R seed for prediction
  • Memory improvements for variable importance computation
  • Fix bug: Probability prediction for single observations
  • Fix bug: Results not identical when using alternative interface
Version 0.2.7
  • Small fixes for Solaris compiler
Version 0.2.6
  • Add C-index splitting
  • Fix NA SNP handling
Version 0.2.5
  • Fix matrix and gwaa alternative survival interface
  • Version submitted to JSS
Version 0.2.4
  • Small changes in documentation
Version 0.2.3
  • Preallocate memory for splitting
Version 0.2.2
  • Remove recursive splitting
Version 0.2.1
  • Allow matrix as input data in R version
Version 0.2.0
  • Fix prediction of classification forests in R
Version 0.1.9
  • Speedup growing for continuous covariates
  • Add memory save option to save memory for very large datasets (but slower)
  • Remove memory mode option from R version since no performance gain
Version 0.1.8
  • Fix problems when using Rcpp <0.11.4
Version 0.1.7
  • Add option to split on unordered categorical covariates
Version 0.1.6
  • Optimize memory management for very large survival forests
Version 0.1.5
  • Set required Rcpp version to 0.11.2
  • Fix large $call objects when using BatchJobs
  • Add details and example on GenABEL usage to documentation
  • Minor changes to documentation
Version 0.1.4
  • Speedup for survival forests with continuous covariates
  • R version: Generate seed from R. It is no longer necessary to set the seed argument in ranger calls.
Version 0.1.3
  • Windows support for R version (without multithreading)
Version 0.1.2
  • Speedup growing of regression and probability prediction forests
  • Prediction forests are now handled like regression forests: MSE used for prediction error and permutation importance
  • Fixed name conflict with randomForest package for "importance"
  • Fixed a bug: prediction function is now working for probability prediction forests
  • Slot "predictions" for probability forests now contains class probabilities
  • importance function is now working even if randomForest package is loaded after ranger
  • Fixed a bug: Split selection weights are now working as expected
  • Small changes in documentation

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.