Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

`MachineShop` is a meta-package for statistical and machine learning
with a unified interface for model fitting, prediction, performance
assessment, and presentation of results. Support is provided for
predictive modeling of numerical, categorical, and censored
time-to-event outcomes and for resample (bootstrap, cross-validation,
and split training-test sets) estimation of model performance. This
vignette introduces the package interface with a survival data analysis
example, followed by supported methods of variable specification;
applications to other response variable types; available performance
metrics, resampling techniques, and graphical and tabular summaries; and
modeling strategies.

- Unified and concise interface for model fitting, prediction, and performance assessment.
- Current support for 49 established models from 25 **R** packages.
- Ensemble modeling with stacked regression and super learners.
- Modeling of response variable types: binary factors, multi-class nominal and ordinal factors, numeric vectors and matrices, and censored time-to-event survival.
- Model specification with traditional formulas and with flexible pre-processing recipes.
- Resample estimation of predictive performance, including cross-validation, bootstrap resampling, and split training-test set validation.
- Parallel execution of resampling algorithms.
- Choices of performance metrics: accuracy, areas under ROC and precision recall curves, Brier score, coefficient of determination (R²), concordance index, cross entropy, F score, Gini coefficient, unweighted and weighted Cohen’s kappa, mean absolute error, mean squared error, mean squared log error, positive and negative predictive values, precision and recall, and sensitivity and specificity.
- Graphical and tabular performance summaries: calibration curves, confusion matrices, partial dependence plots, performance curves, lift curves, and variable importance.
- Model tuning over automatically generated grids of parameter values and randomly sampled grid points.
- Model selection and comparisons for any combination of models and model parameter values.
- User-definable models and performance metrics.
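
To make a few of the listed classification metrics concrete, the following base R sketch computes accuracy, sensitivity, specificity, and the Brier score by hand. The data values and the 0.5 cutoff are made up for illustration, and the code does not call MachineShop; it only shows what these metrics measure, not how the package implements them.

```r
# Hypothetical observed classes and predicted probabilities of the "yes" class
obs  <- factor(c("yes", "yes", "no", "no", "yes", "no"), levels = c("no", "yes"))
prob <- c(0.9, 0.4, 0.2, 0.6, 0.8, 0.1)

# Classify with an assumed 0.5 probability cutoff
pred <- factor(ifelse(prob > 0.5, "yes", "no"), levels = levels(obs))

# Confusion matrix: rows are predicted classes, columns are observed classes
conf <- table(Predicted = pred, Observed = obs)

# Proportion of all cases classified correctly
accuracy <- sum(diag(conf)) / sum(conf)

# Proportion of observed "yes" cases predicted as "yes"
sensitivity <- conf["yes", "yes"] / sum(conf[, "yes"])

# Proportion of observed "no" cases predicted as "no"
specificity <- conf["no", "no"] / sum(conf[, "no"])

# Mean squared difference between predicted probability and 0/1 outcome
brier <- mean((prob - as.numeric(obs == "yes"))^2)

print(conf)
print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, brier = brier))
```
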

    # Current release from CRAN
    install.packages("MachineShop")

    # Development version from GitHub
    # install.packages("devtools")
    devtools::install_github("brian-j-smith/MachineShop", ref = "develop")

    # Development version with vignettes
    devtools::install_github("brian-j-smith/MachineShop", ref = "develop", build_vignettes = TRUE)

Once installed, the following `R` commands will load the package and
display its help system documentation. Online documentation and examples
are available at the MachineShop main website.

    library(MachineShop)

    # Package help summary
    ?MachineShop

    # Vignette
    RShowDoc("Introduction", package = "MachineShop")
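
As a rough sketch of the unified interface described above, the following fits a model, predicts, and estimates performance by cross-validation. The `mtcars` data, the formula, and the choice of `LMModel` with five folds are assumptions made for illustration, and argument names may differ between package versions; consult the package help for the installed release.

```r
library(MachineShop)

# Illustrative formula and data; mtcars ships with base R
fo <- mpg ~ wt + hp

# Fit a linear model through the common fit() interface
model_fit <- fit(fo, data = mtcars, model = LMModel)

# Predict the response for new data (here, the first rows of mtcars)
pred <- predict(model_fit, newdata = head(mtcars))
print(pred)

# Estimate predictive performance with 5-fold cross-validation
res <- resample(fo, data = mtcars, model = LMModel,
                control = CVControl(folds = 5))
summary(res)
```

Swapping `LMModel` for another model constructor should leave the rest of the calls unchanged, which is the point of the common interface.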

- Implement metrics: `auc`, `fnr`, `fpr`, `rpp`, `tnr`, `tpr`.
- Implement performance curves, including ROC and precision recall.
- Implement `SurvMatrix` classes for predicted survival events and probabilities to eliminate need for separate `times` arguments in calibration, confusion, metrics, and performance functions.
- Add calibration curves for predicted survival means.
- Add lift curves for predicted survival probabilities.
- Add recipe support for survival and matrix outcomes.
- Rename `MLControl` argument `surv_times` to `times`.
- Fix identification of recipe `case_weight` and `case_strata` variables.
- Launch package website.
- Bring Introduction vignette up to date with package features.

- Implement model: `BARTModel`.
- Implement model tuning over automatically generated grids of parameter values and random sampling of grid points.
- Add metrics for predicted survival times: `accuracy`, `f_score`, `kappa2`, `npv`, `ppv`, `pr_auc`, `precision`, `recall`, `roc_index`, `sensitivity`, `specificity`.
- Add metrics for predicted survival means: `cindex`, `gini`, `mae`, `mse`, `msle`, `r2`, `rmse`, `rmsle`.
- Add `performance` and metric methods for `ConfusionMatrix`.
- Add confusion matrices for predicted survival times.
- Standardize predict functions to return mean survival when times are not specified.
- Replace `MLModel` slot and constructor argument `nvars` with `design`.

- Implement models: `BARTMachineModel`, `LARSModel`.
- Implement performance metrics: `gini`, multi-class `pr_auc` and `roc_auc`, multivariate `rmse`, `msle`, `rmsle`.
- Implement smooth calibration curves.
- Implement `MLMetric` class for performance metrics.
- Add `as.data.frame` method for `ModelFrame`.
- Add `expand.model` function.
- Add `label` slot to `MLModel`.
- Expand `metricinfo/modelinfo` support for mixed argument types.
- Rename `calibration` argument `n` to `breaks`.
- Rename `modelmetrics` function to `performance`.
- Rename `ModelMetrics/Diff` classes to `Performance/Diff`.
- Change `MLModelTune` slot `resamples` to `performance`.

- Implement models: `AdaBagModel`, `AdaBoostModel`, `BlackBoostModel`, `EarthModel`, `FDAModel`, `GAMBoostModel`, `GLMBoostModel`, `MDAModel`, `NaiveBayesModel`, `PDAModel`, `RangerModel`, `RPartModel`, `TreeModel`.
- Implement user-specified performance metrics in `modelmetrics` function.
- Implement metrics: `accuracy`, `brier`, `cindex`, `cross_entropy`, `f_score`, `kappa2`, `mae`, `mse`, `npv`, `ppv`, `pr_auc`, `precision`, `r2`, `recall`, `roc_auc`, `roc_index`, `sensitivity`, `specificity`, `weighted_kappa2`.
- Add `cutoff` argument to `confusion` function.
- Add `modelinfo` and `metricinfo` functions.
- Add `modelmetrics` method for `Resamples`.
- Add `ModelMetrics` class with `print` and `summary` methods.
- Add `response` method for `recipe`.
- Export `Calibration` constructor.
- Export `Confusion` constructor.
- Export `Lift` constructor.
- Extend `calibration` arguments to observed and predicted responses.
- Extend `confusion` arguments to observed and predicted responses.
- Extend `lift` arguments to observed and predicted responses.
- Extend `metrics` and `stats` function arguments to accept function names.
- Extend `Resamples` to arguments with multiple models.
- Change `CoxModel`, `GLMModel`, and `SurvRegModel` constructor definitions so that model control parameters are specified directly instead of with a separate `control` argument/structure.
- Change `predict(..., times = numeric())` function calls to survival model fits to return predicted values in the same direction as survival times.
- Change `predict(..., times = numeric())` function calls to `CForestModel` fits to return predicted means instead of medians.
- Change `tune` function argument `metrics` to be defined in terms of a user-specified metric or metrics.
- Deprecate `MLControl` arguments `cutoff`, `cutoff_index`, `na.rm`, and `summary`.

- Implement linear models (`LMModel`), linear discriminant analysis (`LDAModel`), and quadratic discriminant analysis (`QDAModel`).
- Implement confusion matrices.
- Support matrix response variables.
- Support user-specified stratification variables for resampling via the `strata` argument of `ModelFrame` or the role of `"case_strata"` for recipe variables.
- Support user-specified case weights for model fitting via the role of `"case_weight"` for recipe variables.
- Provide fallback for models with undefined variable importance.
- Update the importing of `prepper` due to its relocation from `rsample` to `recipes`.

- Implement partial dependence, calibration, and lift estimation and plotting.
- Implement k-nearest neighbors model (`KNNModel`), stacked regression models (`StackedModel`), super learner models (`SuperModel`), and extreme gradient boosting (`XGBModel`).
- Implement resampling constructors for training resubstitution (`TrainControl`) and split training and test sets (`SplitControl`).
- Implement `ModelFrame` class for general model formula and dataset specification.
- Add multi-class Brier score to `modelmetrics()`.
- Extend `predict()` to automatically preprocess recipes and to use training data as the `newdata` default.
- Extend `tune()` to lists of models.
- Extend `summary()` argument `stats` to functions.
- Fix survival probability calculations in `GBMModel` and `GLMNetModel`.
- Change `MLControl` argument `na.rm` default from `FALSE` to `TRUE`.
- Remove `na.rm` argument from `modelmetrics()`.

- Initial public release