| Title: | Individual Conditional Expectation Plot Toolbox |
|---|---|
| Description: | Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman's partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent heterogeneities may exist. |
| Authors: | Alex Goldstein [aut], Adam Kapelner [aut, cre] (ORCID: <https://orcid.org/0000-0001-5985-6792>), Justin Bleich [aut] |
| Maintainer: | Adam Kapelner <[email protected]> |
| License: | GPL-2 \| GPL-3 |
| Version: | 1.2 |
| Built: | 2026-05-13 09:13:05 UTC |
| Source: | https://github.com/kapelner/icebox |
This function creates a lineup plot to assess the additivity of a predictor's effect. It uses a nonparametric bootstrap approach to generate null plots.
additivityLineup( backfit_obj, fitMethod, realICE, figs = 10, colorvecfcn, usecolorvecfcn_inreal = FALSE, null_predictfcn, ... )
backfit_obj |
An object of class |
fitMethod |
A function that accepts |
realICE |
The |
figs |
The total number of plots in the lineup (including the real one). Default is 10. |
colorvecfcn |
Optional function to generate a color vector for the curves. |
usecolorvecfcn_inreal |
If |
null_predictfcn |
Optional prediction function for the null models. |
... |
Additional arguments passed to |
An object of class additivityLineup (invisibly).
Fits a model of the form using backfitting.
backfitter( X, y, predictor, fitMethod, predictfcn, eps = 0.01, iter.max = 10, verbose = TRUE, ... )
X |
The design matrix. |
y |
The response vector. |
predictor |
The name or index of the predictor of interest ( |
fitMethod |
A function that accepts |
predictfcn |
A function that accepts |
eps |
Convergence threshold. |
iter.max |
Maximum number of iterations. |
verbose |
If |
... |
Additional arguments passed to |
An object of class backfitter.
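Backfitting in general alternates between fitting one component to the partial residuals left by the other. The following is a minimal base-R sketch of that idea, not the package's implementation. It assumes an additive form `y = alpha + g1(x1) + g2(x2)`, uses `loess` and `lm` as stand-in fit methods, and treats `eps` as a threshold on the root-mean-square change in the fitted values between iterations; all of these choices are illustrative assumptions.

```r
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + 2 * x2 + rnorm(n, sd = 0.1)

alpha    <- mean(y)       # intercept; both components are kept centered
g1       <- numeric(n)    # component for the predictor of interest
g2       <- numeric(n)    # component for the remaining predictor
eps      <- 0.01          # assumed convergence threshold
iter.max <- 10

for (iter in seq_len(iter.max)) {
  # Fit g1 to the partial residuals that exclude g2
  r1     <- y - alpha - g2
  g1_new <- fitted(loess(r1 ~ x1, span = 0.3))
  g1_new <- g1_new - mean(g1_new)

  # Fit g2 to the partial residuals that exclude the updated g1
  r2     <- y - alpha - g1_new
  g2_new <- fitted(lm(r2 ~ x2))
  g2_new <- g2_new - mean(g2_new)

  # Root-mean-square change in the additive fit since the last pass
  delta <- sqrt(mean((g1_new + g2_new - g1 - g2)^2))
  g1 <- g1_new
  g2 <- g2_new
  if (delta < eps) break
}

mse <- mean((y - alpha - g1 - g2)^2)
```

Because `x1` and `x2` are simulated independently here, the loop settles within a few passes; with correlated predictors more iterations may be needed.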
Clustering of ICE and d-ICE curves by kmeans. All curves are centered to have mean 0 and then kmeans is applied to the curves with the specified number of clusters.
clusterICE( ice_obj, nClusters, plot = TRUE, plot_margin = 0.05, colorvec, plot_pdp = FALSE, x_quantile = FALSE, avg_lwd = 3, centered = FALSE, plot_legend = FALSE, main = NULL, num_cores = 1, ... )
ice_obj |
Object of class |
nClusters |
Number of clusters to find. |
plot |
If |
plot_margin |
Extra margin to pass to |
colorvec |
Optional vector of colors to use for each cluster. |
plot_pdp |
If |
x_quantile |
If |
avg_lwd |
Average line width to use when plotting the cluster means. Line width is proportional to the cluster's size. |
centered |
If |
plot_legend |
If |
main |
Optional title for the plot. |
num_cores |
Integer number of cores to use for parallel operations. Default is 1. |
... |
Additional arguments for plotting. |
A list with the following elements:
cl |
The output of the |
plot |
The ggplot object used for plotting (if |
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) #has Boston Housing data, Pima
    data(Boston) #Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL
    ## build a RF:
    bh_rf = randomForest(X, y)
    ## Create an 'ice' object for the predictor "age":
    bh.ice = ice(object = bh_rf, X = X, y = y, predictor = "age", frac_to_build = .1)
    ## cluster the curves into 2 groups.
    clusterICE(bh.ice, nClusters = 2, plot_legend = TRUE)
    ## cluster the curves into 3 groups, start all at 0.
    clusterICE(bh.ice, nClusters = 3, plot_legend = TRUE, centered = TRUE)
    ## End(Not run)
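The center-then-cluster step described above can be sketched in base R without the package. The curve matrix below is synthetic (five rising and five falling curves with arbitrary vertical offsets), and the variable names are illustrative rather than taken from ICEbox.

```r
set.seed(42)
gridpts <- seq(0, 1, length.out = 20)

# Rows are curves: a random vertical offset plus a rising or falling trend
rising  <- t(sapply(1:5, function(i) rnorm(1, sd = 5) + gridpts + rnorm(20, sd = 0.01)))
falling <- t(sapply(1:5, function(i) rnorm(1, sd = 5) - gridpts + rnorm(20, sd = 0.01)))
curves  <- rbind(rising, falling)

# Center each curve to mean 0 so that clustering reflects shape, not level
centered_curves <- curves - rowMeans(curves)

# k-means on the centered curves; nstart guards against poor random starts
cl <- kmeans(centered_curves, centers = 2, nstart = 10)
```

Without the centering step the large offsets would dominate the distances, and the clusters would group curves by level rather than by shape.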
Efficient Column Standard Deviations
colSds_cpp(x, n_cores = 1L)
x |
Numeric Matrix |
n_cores |
Number of cores to use |
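For reference, column standard deviations can be computed in plain R as below. This sketch assumes the usual sample standard deviation (denominator n - 1); whether `colSds_cpp` uses the same denominator is not stated here.

```r
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), nrow = 4)

# Straightforward version: apply sd() to every column
col_sds <- apply(x, 2, sd)

# Vectorised version: subtract column means, square, sum, divide by n - 1
deviations  <- sweep(x, 2, colMeans(x))
col_sds_vec <- sqrt(colSums(deviations^2) / (nrow(x) - 1))
```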
Computes the first derivative using centered differences, mirroring sfsmisc::D1tr.
derivative_cpp(x, gridpts, n_cores = 1L)
x |
Numeric Matrix (smoothed values) |
gridpts |
Grid points corresponding to columns of x |
n_cores |
Number of cores to use |
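A centered-difference derivative on a possibly uneven grid can be sketched in base R as follows. The helper name `centered_diff` is hypothetical, and the one-sided differences used at the two endpoints are an assumption about how the boundary is handled.

```r
centered_diff <- function(y, grid) {
  n <- length(y)
  stopifnot(n >= 3, length(grid) == n)
  # Interior points: (y[i+1] - y[i-1]) / (grid[i+1] - grid[i-1]);
  # endpoints: one-sided differences
  dy <- c(y[2] - y[1], y[3:n] - y[1:(n - 2)], y[n] - y[n - 1])
  dx <- c(grid[2] - grid[1], grid[3:n] - grid[1:(n - 2)], grid[n] - grid[n - 1])
  dy / dx
}

gridpts <- c(0, 0.5, 1.5, 2, 4)

# Two example curves stored row-wise: a line and a parabola
curves <- rbind(2 * gridpts + 1, gridpts^2)

# Differentiate each row with respect to the grid
d_curves <- t(apply(curves, 1, centered_diff, grid = gridpts))
```

The line has slope 2 everywhere, so its row of `d_curves` is constant; the parabola's row only approximates `2 * gridpts` because the grid is coarse and uneven.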
Estimates the partial derivative function for each curve in an ice object.
See Goldstein et al (2013) for further details.
dice( ice_obj, DerivEstimator = NULL, use_supsmu = FALSE, verbose = TRUE, num_cores = 1, sg_poly_order = 2, sg_window_size = NULL )
ice_obj |
Object of class |
DerivEstimator |
Optional function with a single argument |
use_supsmu |
If |
verbose |
If |
num_cores |
Integer number of cores to use for parallel derivative estimation. Defaults to 1. |
sg_poly_order |
Polynomial order for Savitzky-Golay filter. Default is 2. |
sg_window_size |
Window size for Savitzky-Golay filter. Default is 30% of the grid. |
A list of class dice with the following elements. Most are passed directly through
from ice_object and exist to enable various plotting facilities.
d_ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_deriv |
Vector of length |
sd_deriv |
Vector of length |
logodds |
Passed from |
gridpts |
Passed from |
predictor |
Passed from |
xlab |
Passed from |
nominal_axis |
Passed from |
range_y |
Passed from |
Xice |
Passed from |
dpdp |
The estimated partial derivative of the PDP. |
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking
Inside the Black Box: Visualizing Statistical Learning With Plots of
Individual Conditional Expectation. (2014) Journal of Computational
and Graphical Statistics, in press
    ## Not run:
    # same examples as for 'ice', but now create a derivative estimate as well.
    require(ICEbox)
    require(randomForest)
    require(MASS) #has Boston Housing data, Pima
    ######## regression example
    data(Boston) #Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL
    ## build a RF:
    bhd_rf_mod = randomForest(X, y)
    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
    # make a dice object:
    bhd.dice = dice(bhd.ice)
    #### classification example
    data(Pima.te) #Pima Indians diabetes classification
    y = Pima.te$type
    X = Pima.te
    X$type = NULL
    ## build a RF:
    pima_rf = randomForest(x = X, y = y)
    ## Create an 'ice' object for the predictor "skin":
    # For classification we plot the centered log-odds. If we pass a predict
    # function that returns fitted probabilities, setting logodds = TRUE instructs
    # the function to set each ice curve to the centered log-odds of the fitted
    # probability.
    pima.ice = ice(object = pima_rf, X = X, predictor = "skin", logodds = TRUE,
                   predictfcn = function(object, newdata){
                     predict(object, newdata, type = "prob")[, 2]
                   })
    # make a dice object:
    pima.dice = dice(pima.ice)
    ## End(Not run)
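To see why the spread of the derivative curves is informative, consider the toy sketch below. It uses synthetic curves and plain forward differences rather than the package's smoother, so it illustrates the idea only. When the model is additive in the predictor, all derivative curves coincide and their pointwise standard deviation is zero; when the predictor interacts with another variable `z`, the derivatives differ across observations.

```r
gridpts <- seq(0, 1, length.out = 11)
z       <- c(-1, 0, 1, 2)   # value of a second covariate for four observations

# Rows are ICE-style curves, one per observation
additive_curves <- t(sapply(z, function(zi) gridpts^2 + zi))  # f = x^2 + z
interact_curves <- t(sapply(z, function(zi) gridpts * zi))    # f = x * z

# Forward-difference derivative of each row (the grid is evenly spaced)
row_deriv <- function(curves) t(apply(curves, 1, diff)) / diff(gridpts)[1]

# Pointwise standard deviation of the derivative across observations
sd_additive <- apply(row_deriv(additive_curves), 2, sd)
sd_interact <- apply(row_deriv(interact_curves), 2, sd)
```

Here `sd_additive` is zero up to rounding, while `sd_interact` equals `sd(z)` at every grid point.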
Creates an ice object with individual conditional expectation curves
for the passed model object, X matrix, predictor, and response. See
Goldstein et al (2013) for further details.
ice( object, X, y, predictor, predictfcn, verbose = TRUE, frac_to_build = 1, indices_to_build = NULL, num_grid_pts, logodds = FALSE, probit = FALSE, num_cores = 1, ... )
object |
The fitted model to estimate ICE curves for. |
X |
The design matrix we wish to estimate ICE curves for. Rows are observations, columns are
predictors. Typically this is taken to be |
y |
Optional vector of the response values |
predictor |
The column number or variable name in |
predictfcn |
Optional function that accepts two arguments, |
verbose |
If |
frac_to_build |
Number between 0 and 1, with 1 as default. For large |
indices_to_build |
Vector of indices, |
num_grid_pts |
Optional number of values in the range of |
logodds |
If |
probit |
If |
num_cores |
Integer number of cores to use for parallel prediction. Defaults to 1. |
... |
Other arguments to be passed to |
A list of class ice with the following elements:
gridpts |
Sorted values of |
ice_curves |
Matrix of dimension |
xj |
The actual values of |
actual_prediction |
Vector of length |
xlab |
String with the predictor name corresponding to |
nominal_axis |
If |
range_y |
If |
sd_y |
If |
Xice |
A matrix containing the subset of |
pdp |
A vector of size |
predictor |
Same as the argument, see argument description. |
logodds |
Same as the argument, see argument description. |
indices_to_build |
Same as the argument, see argument description. |
frac_to_build |
Same as the argument, see argument description. |
predictfcn |
Same as the argument, see argument description. |
Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, in press
plot.ice, print.ice, summary.ice
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) #has Boston Housing data, Pima
    ######## regression example
    data(Boston) #Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL
    ## build a RF:
    bhd_rf_mod = randomForest(X, y)
    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
    ## End(Not run)
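The computation behind an ICE curve can be sketched by hand in base R: for every grid value, overwrite the predictor column for all observations and predict. The example below uses `lm` on the built-in `mtcars` data as a stand-in model. The variable names mirror the elements listed above, but this is an illustration and not the package's code.

```r
X <- mtcars[, c("wt", "hp")]
y <- mtcars$mpg
fit <- lm(y ~ wt * hp, data = X)   # any model with a predict method would do

predictor <- "wt"
gridpts   <- sort(unique(X[[predictor]]))

# One column per grid value, one row per observation
ice_curves <- sapply(gridpts, function(v) {
  X_mod <- X
  X_mod[[predictor]] <- v          # hold the other covariates at observed values
  predict(fit, newdata = X_mod)
})

# The partial dependence curve is the pointwise average of the ICE curves
pdp <- colMeans(ice_curves)

# Each curve passes through the observation's actual fitted value
actual_prediction <- predict(fit, newdata = X)
```

Because the model includes a `wt * hp` interaction, the curves have different slopes, which is the kind of heterogeneity an averaged partial dependence plot would hide.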
Efficiently converts a matrix to a long-format vector (row-major order) for plotting.
melt_ice_curves_cpp(x, n_cores = 1L)
x |
Numeric Matrix |
n_cores |
Number of cores to use |
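In base R, a row-major flattening of a matrix is a transpose followed by `as.vector`, since R stores matrices column by column. A small sketch:

```r
x <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

# Transposing first makes as.vector() walk the original rows in order
long <- as.vector(t(x))

# Matching row index for each element, handy when building a long data frame
curve_id <- rep(seq_len(nrow(x)), each = ncol(x))
```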
Plotting of dice objects.
    ## S3 method for class 'dice'
    plot( x, plot_margin = 0.05, frac_to_plot = 1, plot_sd = TRUE, plot_orig_pts_deriv = TRUE, pts_preds_size = 1.5, colorvec, color_by = NULL, x_quantile = TRUE, plot_dpdp = TRUE, rug_quantile = seq(from = 0, to = 1, by = 0.1), verbose = TRUE, ... )
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_sd |
If |
plot_orig_pts_deriv |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name (or column number) in |
x_quantile |
If |
plot_dpdp |
If |
rug_quantile |
If not null, tick marks are drawn on the x-axis corresponding to the vector of quantiles specified by this parameter.
Forced to |
verbose |
If |
... |
Additional plotting arguments. |
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
plot |
The ggplot object used for plotting. |
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) #has Boston Housing data, Pima
    data(Boston) #Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL
    ## build a RF:
    bhd_rf_mod = randomForest(X, y)
    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
    # estimate derivatives, then plot.
    bhd.dice = dice(bhd.ice)
    plot(bhd.dice)
    ## End(Not run)
Plotting of ice objects.
    ## S3 method for class 'ice'
    plot( x, plot_margin = 0.05, frac_to_plot = 1, plot_points_indices = NULL, plot_orig_pts_preds = TRUE, pts_preds_size = 1.5, colorvec, color_by = NULL, x_quantile = TRUE, plot_pdp = TRUE, centered = FALSE, prop_range_y = TRUE, rug_quantile = seq(from = 0, to = 1, by = 0.1), centered_percentile = 0, point_labels = NULL, point_labels_size = NULL, prop_type = "sd", verbose = TRUE, num_cores = 1, ... )
x |
Object of class |
plot_margin |
Extra margin to pass to |
frac_to_plot |
If |
plot_points_indices |
If not |
plot_orig_pts_preds |
If |
pts_preds_size |
Size of points to make if |
colorvec |
Optional vector of colors to use for each curve. |
color_by |
Optional variable name in |
x_quantile |
If |
plot_pdp |
If |
centered |
If |
prop_range_y |
When |
rug_quantile |
If not |
centered_percentile |
The percentile of |
point_labels |
If not |
point_labels_size |
If not |
prop_type |
Scaling factor for the right vertical axis in centered plots if |
verbose |
If |
num_cores |
Used for parallel plotting speedup. Default is 1. |
... |
Other arguments to be passed to the |
A list with the following elements.
plot_points_indices |
Row numbers of |
legend_text |
If the |
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) #has Boston Housing data, Pima
    data(Boston) #Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL
    ## build a RF:
    bhd_rf_mod = randomForest(X, y)
    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
    ## plot
    plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1)
    ## centered plot
    plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1, centered = TRUE)
    ## color the curves by high and low values of 'rm'.
    # First create an indicator variable which is 1 if the number of
    # rooms is greater than the median:
    median_rm = median(X$rm)
    bhd.ice$Xice$I_rm = ifelse(bhd.ice$Xice$rm > median_rm, 1, 0)
    plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
         x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = "I_rm")
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = 1)
    plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
         x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = y)
    ## End(Not run)
Prints a summary of a dice object.
    ## S3 method for class 'dice'
    print(x, ...)
x |
Object of class |
... |
Ignored for now. |
Prints a summary of an ice object.
    ## S3 method for class 'ice'
    print(x, ...)
x |
Object of class |
... |
Ignored for now. |
Centers each row of a matrix by subtracting the row mean.
rowCenter_cpp(x, n_cores = 1L)
x |
Numeric Matrix |
n_cores |
Number of cores to use |
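The same row centering can be written in one line of base R, which is useful as a reference when checking results. The sketch relies on R recycling the vector of row means down the columns of the matrix.

```r
x <- matrix(c(1, 2,
              3, 6,
              5, 10), nrow = 2)   # rows are (1, 3, 5) and (2, 6, 10)

# rowMeans(x) has one entry per row; recycling subtracts it from every column
x_centered <- x - rowMeans(x)
```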
Smooths each row of a matrix using a Savitzky-Golay filter.
sg_smooth_cpp(x, window_size, order, deriv, n_cores = 1L)
x |
Matrix to smooth row-wise |
window_size |
Size of the filter window (must be odd) |
order |
Polynomial order |
deriv |
Derivative order (0=smooth, 1=first deriv, etc.) |
n_cores |
Number of cores to use |
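A Savitzky-Golay filter fits a local polynomial by least squares inside each window and reads off the fitted value (or derivative) at the window center. The base-R sketch below illustrates that construction under stated assumptions: the helper names `sg_weights` and `sg_smooth_row` are hypothetical, the grid is taken to have unit spacing, and points closer to the edge than half a window are left as `NA`, which may differ from how `sg_smooth_cpp` treats the boundaries.

```r
sg_weights <- function(window_size, order, deriv = 0) {
  stopifnot(window_size %% 2 == 1, order < window_size, deriv <= order)
  half    <- (window_size - 1) / 2
  offsets <- -half:half
  # Design matrix with columns offsets^0, offsets^1, ..., offsets^order
  A <- outer(offsets, 0:order, `^`)
  # Least-squares solution operator: coefficients = pinv %*% window values
  pinv <- solve(crossprod(A), t(A))
  # The deriv-th derivative at the window center is deriv! times coefficient deriv
  factorial(deriv) * pinv[deriv + 1, ]
}

sg_smooth_row <- function(y, window_size, order, deriv = 0) {
  n    <- length(y)
  half <- (window_size - 1) / 2
  stopifnot(n >= window_size)
  w   <- sg_weights(window_size, order, deriv)
  out <- rep(NA_real_, n)
  for (i in (half + 1):(n - half)) {
    out[i] <- sum(w * y[(i - half):(i + half)])
  }
  out
}

grid_x   <- 1:11
y        <- grid_x^2
smoothed <- sg_smooth_row(y, window_size = 5, order = 2, deriv = 0)
slope    <- sg_smooth_row(y, window_size = 5, order = 2, deriv = 1)
```

Since `y` is exactly quadratic and the filter order is 2, the interior of `smoothed` reproduces `y` and the interior of `slope` reproduces `2 * grid_x`.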
Alias of the print method for dice objects.
    ## S3 method for class 'dice'
    summary(object, ...)
object |
Object of class |
... |
Ignored for now. |
Alias of the print method for ice objects.
    ## S3 method for class 'ice'
    summary(object, ...)
object |
Object of class |
... |
Ignored for now. |
Efficiently applies logodds or probit transformation to a matrix.
transform_ice_curves_cpp(x, method, n_cores = 1L)
x |
Numeric Matrix (probabilities) |
method |
1 for centered logodds, 2 for probit |
n_cores |
Number of cores to use |
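The two transformations can be sketched in base R as follows. The helper `transform_probs` is hypothetical, and the centered log-odds formula used for method 1, `log(p) - (1/2) * (log(p) + log(1 - p))`, is an assumption about the two-class definition intended here; the probit branch is the standard normal quantile function.

```r
transform_probs <- function(p, method) {
  stopifnot(all(p > 0 & p < 1))
  if (method == 1) {
    # Assumed two-class centered log-odds: half of the usual logit
    log(p) - 0.5 * (log(p) + log(1 - p))
  } else if (method == 2) {
    # Probit: inverse standard normal CDF
    qnorm(p)
  } else {
    stop("method must be 1 (centered log-odds) or 2 (probit)")
  }
}

p <- matrix(c(0.1, 0.5, 0.9,
              0.25, 0.5, 0.75), nrow = 2, byrow = TRUE)

centered_logodds <- transform_probs(p, method = 1)
probit_scale     <- transform_probs(p, method = 2)
```

Probabilities of exactly 0 or 1 map to infinite values under either transformation, so the sketch rejects them up front.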
The WhiteWine data frame has 4898 rows and 12 columns and concerns white wines from a region in Portugal. The response variable, quality, is a wine quality metric, taken to be the median preference score of three blind tasters on a scale of 1-10. The 11 covariates are physicochemical metrics of wine quality such as citric acid content, sulphates, etc.
data(WhiteWine)
A data frame of 4898 cases on 12 variables.
K Bache and M Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml