Partial Dependence Plots

A general framework for constructing partial dependence (i.e., marginal effect) plots from various types of machine learning models in R.



Complex nonparametric models---like neural networks, random forests, and support vector machines---are more common than ever in predictive analytics, especially when dealing with large observational databases that don't adhere to the strict assumptions imposed by traditional statistical techniques (e.g., multiple linear regression which assumes linearity, homoscedasticity, and normality). Unfortunately, it can be challenging to understand the results of such models and explain them to management. Partial dependence plots offer a simple solution. Partial dependence plots are low-dimensional graphical renderings of the prediction function $\widehat{f}\left(\boldsymbol{x}\right)$ so that the relationship between the outcome and predictors of interest can be more easily understood. These plots are especially useful in explaining the output from black box models. The pdp package offers a general framework for constructing partial dependence plots for various types of fitted models in R.
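Concretely, if $\boldsymbol{x}_s$ denotes the subset of predictors of interest and $\boldsymbol{x}_c$ the remaining predictors, the partial dependence of the response on $\boldsymbol{x}_s$ is estimated by averaging predictions over the training data:

$$\widehat{f}_s\left(\boldsymbol{x}_s\right) = \frac{1}{n} \sum_{i = 1}^{n} \widehat{f}\left(\boldsymbol{x}_s, \boldsymbol{x}_{ic}\right),$$

where $\boldsymbol{x}_{ic}$ denotes the observed values of the remaining predictors for the $i$th training observation.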

A detailed introduction to pdp has been published in The R Journal: "pdp: An R Package for Constructing Partial Dependence Plots". Development takes place on GitHub; to report bugs or issues, contact the main author directly or submit them to the issue tracker.

The pdp package currently exports four functions:

  • partial - compute partial dependence functions (i.e., objects of class "partial") from various fitted model objects;
  • plotPartial - plot partial dependence functions (i.e., objects of class "partial") using lattice graphics;
  • autoplot - plot partial dependence functions (i.e., objects of class "partial") using ggplot2 graphics;
  • topPredictors - extract the most "important" predictors from various types of fitted models.
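
A typical workflow chains these functions together. The sketch below assumes a fitted random forest named boston.rf, like the one constructed in the example further down:

```r
# Minimal sketch of the pdp workflow (assumes the Boston housing random
# forest, boston.rf, fit in the example below)
library(pdp)

imp <- topPredictors(boston.rf, n = 2)    # names of the top two predictors
pd <- partial(boston.rf, pred.var = imp)  # compute partial dependence
plotPartial(pd)                           # lattice display
autoplot(pd)                              # ggplot2 display
```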


The pdp package is currently listed on CRAN and can easily be installed:

# Install the latest stable release from CRAN
install.packages("pdp")

# Alternatively, install the development version from GitHub
# install.packages("devtools")
devtools::install_github("bgreenwell/pdp")

Random forest example

As a first example, we'll fit a random forest to the famous Boston housing data included with the package (see ?boston for details). In fact, the original motivation for this package was to be able to compute two-predictor partial dependence plots from random forest models in R.

# Fit a random forest to the Boston housing data
library(pdp)           # for partial, plotPartial, and autoplot
library(gridExtra)     # for grid.arrange (used below)
library(randomForest)  # install.packages("randomForest")
data(boston)  # load the boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)
# Partial dependence of cmedv on lstat and rm
pd <- partial(boston.rf, pred.var = c("lstat", "rm"), chull = TRUE)
head(pd)  # print first 6 rows
#>     lstat      rm     yhat
#> 1  7.5284 3.66538 24.13683
#> 2  8.2532 3.66538 23.24916
#> 3  8.9780 3.66538 23.13119
#> 4  9.7028 3.66538 22.13531
#> 5 10.4276 3.66538 20.62331
#> 6 11.1524 3.66538 20.51258
# Lattice version
p1 <- plotPartial(pd, main = "lattice version")
# ggplot2 version
p2 <- autoplot(pd, contour = TRUE, main = "ggplot2 version", 
               legend.title = "Partial\ndependence")
# Show both plots in one figure
grid.arrange(p1, p2, ncol = 2)


Support vector machine (SVM) example

As a second example, we'll fit an SVM to the Pima Indians diabetes data included with the package (see ?pima for details). Note that for some fitted model objects (e.g., "ksvm" objects) it is necessary to supply the original training data via the train argument in the call to partial.

# Fit an SVM to the Pima Indians diabetes data
library(kernlab)  # install.packages("kernlab")
data(pima)  # load the Pima Indians diabetes data
pima.svm <- ksvm(diabetes ~ ., data = pima, type = "C-svc", kernel = "rbfdot",
                 C = 0.5, prob.model = TRUE)
# Partial dependence of diabetes test result on glucose (default is logit scale)
pd.glucose <- partial(pima.svm, pred.var = "glucose", train = pima)
# Partial dependence of diabetes test result on glucose (probability scale)
pd.glucose.prob <- partial(pima.svm, pred.var = "glucose", prob = TRUE, 
                           train = pima)
# Show both plots in one figure
grid.arrange(autoplot(pd.glucose, main = "Logit scale"), 
             autoplot(pd.glucose.prob, main = "Probability scale"), 
             ncol = 2)


NEWS for pdp package

Changes for version 0.6.0

  • Properly registered native routines and disabled symbol search.
  • Fixed a bug for gbm models using the multinomial distribution.
  • Refactored code to improve structure.
  • partial gained three new options: (experimental), ice, and center. The latter two have to do with constructing individual conditional expectation (ICE) curves and centered ICE (c-ICE) curves. The experimental option is for transforming predictions from models that can use non-Gaussian distributions (e.g., glm, gbm, and xgboost). Note that these options were added for convenience and the same results (plus much more) can still be obtained using the flexible argument. (#36)
  • plotPartial gained five new options: center, plot.pdp, pdp.col, pdp.lwd, and pdp.lty; see ?plotPartial for details.
  • Fixed default y-axis label for autoplot with two numeric predictors (#48).
  • Added CITATION file.
  • Better support for neural networks from the nnet package.
  • Fixed a bug for nnet::multinom models with binary response.
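
For instance, the new ice and center options can be used to draw ICE and centered ICE curves. A sketch, assuming the Boston housing random forest (boston.rf) from the README example above:

```r
# Sketch: ICE and c-ICE curves with pdp >= 0.6.0 (assumes boston.rf from
# the README example)
library(pdp)

# One curve per training observation, rather than their average
ice.curves <- partial(boston.rf, pred.var = "lstat", ice = TRUE)
plotPartial(ice.curves, alpha = 0.3)

# Centered ICE curves: each curve is pinned to zero at the left edge,
# making heterogeneous effects easier to spot
c.ice <- partial(boston.rf, pred.var = "lstat", ice = TRUE, center = TRUE)
plotPartial(c.ice, alpha = 0.3)
```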

Changes for version 0.5.2

  • Fixed minor pandoc conversion issue with
  • Added subdirectory called tools to hold figures for

Changes for version 0.5.1

  • Registered native routines and disabled symbol search.

Changes for version 0.5.0

  • Added support for MASS::lda, MASS::qda, and mda::mars.
  • New arguments quantiles, probs, and trim.outliers in partial. These arguments make it easier to construct PDPs over the relevant range of a numeric predictor without having to specify pred.grid, especially when outliers are present in the predictors (which can distort the plotted relationship).
  • The train argument can now accept matrices; in particular, objects of class "matrix" or "dgCMatrix". This is useful, for example, when working with XGBoost models (i.e., objects of class "xgb.Booster").
  • New logical argument prob indicating whether or not partial dependence values for classification problems should be returned on the original probability scale, rather than the centered logit; details for the centered logit can be found on page 370 in the second edition of The Elements of Statistical Learning.
  • Fixed some typos in
  • New function autoplot for automatically creating ggplot2 graphics from "partial" objects.
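
As an illustration of the new grid-construction arguments, the sketch below (assuming the Boston housing random forest, boston.rf, from the README example) restricts the plot to the non-outlying range of a predictor:

```r
# Sketch: controlling the prediction grid (assumes boston.rf from the
# README example)
library(pdp)

# Evaluate partial dependence at sample quantiles of lstat
pd.q <- partial(boston.rf, pred.var = "lstat", quantiles = TRUE,
                probs = seq(0.05, 0.95, by = 0.05))

# Or simply trim outliers from the equispaced grid
pd.t <- partial(boston.rf, pred.var = "lstat", trim.outliers = TRUE)

autoplot(pd.q)
```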

Changes for version 0.4.0

  • partial is now much faster with "gbm" objects due to a call to gbm::plot.gbm whenever pred.grid is not explicitly given by the user. (gbm::plot.gbm exploits a computational shortcut that does not involve any passes over the training data.)
  • New (experimental) function topPredictors for extracting the names of the most "important" predictors. This should make it one step easier (in most cases) to construct PDPs for the most "important" features in a fitted model.
  • A new argument allows the user to supply their own prediction function. Hence, it is possible to obtain PDPs based on the median, rather than the mean. It is also possible to obtain PDPs for classification problems on the probability scale. See ?partial for examples.
  • Minor bug fixes and documentation tweaks.

Changes for version 0.3.0

  • The ... argument in the call to partial now refers to additional arguments to be passed onto stats::predict rather than plyr::aaply. For example, using partial with "gbm" objects will require specification of n.trees which can now simply be passed to partial via the ... argument.
  • Added the following arguments to partial: progress (plyr-based progress bars), parallel (plyr/foreach-based parallel execution), and paropts (list of additional arguments passed onto foreach when parallel = TRUE).
  • Various bug fixes.
  • partial now throws an informative error message when the pred.grid argument refers to predictors not in the original training data.
  • The column name for the predicted value has been changed from "y" to "yhat".
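
To illustrate passing predict arguments through ..., the sketch below fits a hypothetical "gbm" object (boston.gbm is an assumed name, not from the examples above) and forwards n.trees to the underlying predict method:

```r
# Sketch: forwarding predict() arguments via ... (boston.gbm is a
# hypothetical gbm fit on the Boston housing data)
library(gbm)
library(pdp)

boston.gbm <- gbm(cmedv ~ ., data = boston, distribution = "gaussian",
                  n.trees = 1000, interaction.depth = 3, shrinkage = 0.05)

# n.trees is passed through ... to gbm's predict method; progress requests
# a plyr-based text progress bar
pd <- partial(boston.gbm, pred.var = "lstat", n.trees = 1000,
              progress = "text")
```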

Changes for version 0.2.0

  • randomForest is no longer imported.
  • Added support for the caret package (i.e., objects of class "train").
  • Added example data sets: boston (corrected Boston housing data) and pima (corrected Pima Indians diabetes data).
  • Fixed error that sometimes occurred when chull = TRUE causing the convex hull to not be computed.
  • Refactored plotPartial to be more modular.
  • Added gbm support for most non-"binomial" families.

Changes for version 0.1.0

  • randomForest is now imported.
  • Added examples.

Changes for version 0.0.6

  • Fixed a non-canonical CRAN URL in the README file.

Changes for version 0.0.5

  • partial now makes sure each column of pred.grid has the correct class, levels, etc.
  • partial gained a new option, levelplot, which defaults to TRUE. The original option, contour, has changed and now specifies whether or not to add contour lines whenever levelplot = TRUE.

Changes for version 0.0.4

  • Fixed a number of URLs.
  • More thorough documentation.

Changes for version 0.0.2

  • Fixed a couple of URLs and typos.
  • Added more thorough documentation.
  • Added support for C5.0, Cubist, nonlinear least squares, and XGBoost models.

Changes for version 0.0.1

  • Initial release.
