# Partial Dependence Plots

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.

## Overview

Complex nonparametric models (such as neural networks, random forests, and support vector machines) are more common than ever in predictive analytics, especially when dealing with large observational databases that don't adhere to the strict assumptions imposed by traditional statistical techniques (e.g., multiple linear regression, which assumes linearity, homoscedasticity, and normality). Unfortunately, it can be challenging to understand the results of such models and explain them to management. Partial dependence plots offer a simple solution: they are low-dimensional graphical renderings of the prediction function $\widehat{f}\left(\boldsymbol{x}\right)$ that make the relationship between the outcome and the predictors of interest easier to understand. These plots are especially useful for explaining the output from black box models. The pdp package offers a general framework for constructing partial dependence plots for various types of fitted models in R.
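To make the idea concrete: for a predictor of interest $x_s$, the partial dependence function is commonly estimated by averaging predictions over the training data, $\bar{f}_s\left(x_s\right) = \frac{1}{n}\sum_{i=1}^{n}\widehat{f}\left(x_s, \boldsymbol{x}_{i,c}\right)$, where $\boldsymbol{x}_{i,c}$ holds the observed values of the remaining predictors for observation $i$. The base R sketch below illustrates that brute-force average. It is not pdp's internal implementation; the linear model, the mtcars data, and the helper name pd_brute_force are illustrative assumptions only.

```r
# Illustrative model: any fitted object with a predict() method would do
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper (not part of pdp): brute-force partial dependence
pd_brute_force <- function(object, pred.var, train, grid.resolution = 20) {
  x <- train[[pred.var]]
  grid <- seq(min(x), max(x), length.out = grid.resolution)
  yhat <- vapply(grid, function(value) {
    temp <- train
    temp[[pred.var]] <- value              # fix the predictor for every row
    mean(predict(object, newdata = temp))  # average over the other predictors
  }, numeric(1))
  out <- data.frame(grid, yhat)
  names(out) <- c(pred.var, "yhat")
  out
}

pd_wt <- pd_brute_force(fit, pred.var = "wt", train = mtcars)
plot(pd_wt$wt, pd_wt$yhat, type = "l", xlab = "wt", ylab = "yhat")
```

For a linear model the resulting curve is a straight line whose slope equals the fitted coefficient of wt, which gives a quick sanity check of the averaging logic.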

A detailed introduction to pdp has been published in The R Journal: "pdp: An R Package for Constructing Partial Dependence Plots", https://journal.r-project.org/archive/2017/RJ-2017-016/index.html. You can track development at https://github.com/bgreenwell/pdp. To report bugs or issues, contact the main author directly or submit them to https://github.com/bgreenwell/pdp/issues.

As of right now, pdp exports four functions:

• partial - compute partial dependence functions (i.e., objects of class "partial") from various fitted model objects;
• plotPartial - plot partial dependence functions (i.e., objects of class "partial") using lattice graphics;
• autoplot - plot partial dependence functions (i.e., objects of class "partial") using ggplot2 graphics;
• topPredictors - extract the most "important" predictors from various types of fitted models.

## Installation

The pdp package is currently listed on CRAN and can be installed with install.packages("pdp").

## Random forest example

As a first example, we'll fit a random forest to the famous Boston housing data included with the package (see ?boston for details). In fact, the original motivation for this package was to be able to compute two-predictor partial dependence plots from random forest models in R.
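The example code itself is not reproduced in this text, so the following is only a sketch of what such a workflow might look like. It assumes the randomForest package is installed and that the response in boston is cmedv; the seed, the choice of lstat and rm as predictors, and the reduced grid.resolution (to keep the run time short) are arbitrary choices made for illustration.

```r
library(pdp)           # partial(), plotPartial()
library(randomForest)  # randomForest()

data(boston)   # corrected Boston housing data shipped with pdp
set.seed(101)  # arbitrary seed, for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Two-predictor partial dependence of cmedv on lstat and rm;
# chull = TRUE restricts the grid to the convex hull of the training values
pd <- partial(boston.rf, pred.var = c("lstat", "rm"),
              chull = TRUE, grid.resolution = 20)

# plotPartial() returns a lattice object; print() it explicitly in scripts
print(plotPartial(pd))
```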

Next, we'll fit a classification model to the Pima Indians Diabetes data.

## Support vector machine (SVM) example

As a second example, we'll fit an SVM to the Pima Indians diabetes data included with the package (see ?pima for details). Note that for some fitted model objects (e.g., "ksvm" objects) it is necessary to supply the original training data via the train argument in the call to partial.
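A minimal sketch of this workflow follows, assuming the kernlab package is installed and that the response in pima is diabetes. The kernel, the cost parameter, and the choice of glucose as the predictor are illustrative, and incomplete rows are dropped up front here simply to keep the sketch free of missing-value handling.

```r
library(pdp)      # partial(), plotPartial()
library(kernlab)  # ksvm()

data(pima)                       # Pima Indians diabetes data shipped with pdp
pima.complete <- na.omit(pima)   # assumption: work with complete cases only

pima.svm <- ksvm(diabetes ~ ., data = pima.complete, type = "C-svc",
                 kernel = "rbfdot", C = 0.5, prob.model = TRUE)

# For "ksvm" objects the training data must be supplied via train
pd.glucose <- partial(pima.svm, pred.var = "glucose", train = pima.complete)
print(plotPartial(pd.glucose))
```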

# NEWS for pdp package

### Changes for version 0.6.0

• Properly registered native routines and disabled symbol search.
• Fixed a bug for gbm models using the multinomial distribution.
• Refactored code to improve structure.
• partial gained three new options: inv.link (experimental), ice, and center. The latter two have to do with constructing individual conditional expectation (ICE) curves and centered ICE (c-ICE) curves. The inv.link option is for transforming predictions from models that can use non-Gaussian distributions (e.g., glm, gbm, and xgboost). Note that these options were added for convenience and the same results (plus much more) can still be obtained using the flexible pred.fun argument (#36).
• plotPartial gained five new options: center, plot.pdp, pdp.col, pdp.lwd, and pdp.lty; see ?plotPartial for details.
• Fixed default y-axis label for autoplot with two numeric predictors (#48).
• Added CITATION file.
• Better support for neural networks from the nnet package.
• Fixed a bug for nnet::multinom models with binary response.

### Changes for version 0.5.2

• Fixed minor pandoc conversion issue with README.md.
• Added subdirectory called tools to hold figures for README.md.

### Changes for version 0.5.1

• Registered native routines and disabled symbol search.

### Changes for version 0.5.0

• Added support for MASS::lda, MASS::qda, and mda::mars.
• New arguments quantiles, probs, and trim.outliers in partial. These arguments make it easier to construct PDPs over the relevant range of a numeric predictor without having to specify pred.grid, especially when outliers are present in the predictors (which can distort the plotted relationship).
• The train argument can now accept matrices; in particular, objects of class "matrix" or "dgCMatrix". This is useful, for example, when working with XGBoost models (i.e., objects of class "xgb.Booster").
• New logical argument prob indicating whether or not partial dependence values for classification problems should be returned on the original probability scale, rather than the centered logit; details for the centered logit can be found on page 370 in the second edition of The Elements of Statistical Learning.
• Fixed some typos in NEWS.md.
• New function autoplot for automatically creating ggplot2 graphics from "partial" objects.

### Changes for version 0.4.0

• partial is now much faster with "gbm" objects due to a call to gbm::plot.gbm whenever pred.grid is not explicitly given by the user. (gbm::plot.gbm exploits a computational shortcut that does not involve any passes over the training data.)
• New (experimental) function topPredictors for extracting the names of the most "important" predictors. This should make it one step easier (in most cases) to construct PDPs for the most "important" features in a fitted model.
• A new argument, pred.fun, allows the user to supply their own prediction function. Hence, it is possible to obtain PDPs based on the median, rather than the mean. It is also possible to obtain PDPs for classification problems on the probability scale. See ?partial for examples.
• Minor bug fixes and documentation tweaks.

### Changes for version 0.3.0

• The ... argument in the call to partial now refers to additional arguments to be passed on to stats::predict rather than plyr::aaply. For example, using partial with "gbm" objects will require specification of n.trees, which can now simply be passed to partial via the ... argument.
• Added the following arguments to partial: progress (plyr-based progress bars), parallel (plyr/foreach-based parallel execution), and paropts (list of additional arguments passed on to foreach when parallel = TRUE).
• Various bug fixes.
• partial now throws an informative error message when the pred.grid argument refers to predictors not in the original training data.
• The column name for the predicted value has been changed from "y" to "yhat".

### Changes for version 0.2.0

• randomForest is no longer imported.
• Added support for the caret package (i.e., objects of class "train").
• Added example data sets: boston (corrected Boston housing data) and pima (corrected Pima Indians diabetes data).
• Fixed error that sometimes occurred when chull = TRUE causing the convex hull to not be computed.
• Refactored plotPartial to be more modular.
• Added gbm support for most non-"binomial" families.

### Changes for version 0.1.0

• randomForest is now imported.

### Changes for version 0.0.6

• Fixed a non-canonical CRAN URL in the README file.

### Changes for version 0.0.5

• partial now makes sure each column of pred.grid has the correct class, levels, etc.
• partial gained a new option, levelplot, which defaults to TRUE. The original option, contour, has changed and now specifies whether or not to add contour lines whenever levelplot = TRUE.

### Changes for version 0.0.4

• Fixed a number of URLs.
• More thorough documentation.

### Changes for version 0.0.2

• Fixed a couple of URLs and typos.
• Added more thorough documentation.
• Added support for C5.0, Cubist, nonlinear least squares, and XGBoost models.

### Changes for version 0.0.1

• Initial release.

# Reference manual

install.packages("pdp")

0.6.0 by Brandon Greenwell, 8 months ago

https://github.com/bgreenwell/pdp

Report a bug at https://github.com/bgreenwell/pdp/issues

Browse source code at https://github.com/cran/pdp

Authors: Brandon Greenwell [aut, cre]

Task views: Machine Learning & Statistical Learning

GPL (>= 2) license

Imported by DALEX.
