Title: | Individual Conditional Expectation Plot Toolbox |
---|---|
Description: | Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. ICE plots refine Friedman's partial dependence plot by graphing the functional relationship between the predicted response and a covariate of interest for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate of interest, suggesting where and to what extent heterogeneity may exist. |
Authors: | Alex Goldstein, Adam Kapelner, Justin Bleich |
Maintainer: | Adam Kapelner |
License: | GPL-2 \| GPL-3 |
Version: | 1.1.5 |
Built: | 2024-11-25 05:44:21 UTC |
Source: | https://github.com/cran/ICEbox |
clusterICE
Clustering of ICE and d-ICE curves by kmeans. All curves are centered to have mean 0, and then kmeans is applied to the curves with the specified number of clusters.
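The two steps described above (center each curve, then run kmeans) can be sketched in base R. The curve matrix is simulated stand-in data rather than an `ice` object, and `nstart = 10` is an illustrative choice, not necessarily what `clusterICE` uses internally:

```r
# Sketch of the clustering idea in base R (simulated curves, not an ice object).
set.seed(42)
gridpts <- seq(0, 1, length.out = 20)
slopes <- c(rep(-2, 15), rep(3, 15))                 # two groups with opposite trends
offsets <- rnorm(30, sd = 5)                         # per-curve vertical shifts
noise <- matrix(rnorm(30 * 20, sd = 0.1), nrow = 30)
ice_curves <- outer(slopes, gridpts) + offsets + noise   # 30 curves x 20 grid points

# Step 1: center each curve to mean 0, so clusters reflect shape rather than level.
centered_curves <- ice_curves - rowMeans(ice_curves)

# Step 2: run kmeans on the centered curves (each row is one point in R^20).
km <- kmeans(centered_curves, centers = 2, nstart = 10)

# Cross-tabulate cluster labels against the simulated groups.
table(cluster = km$cluster, slope = slopes)
```

Without the centering step, the large random offsets would dominate the distances and kmeans would tend to group curves by level instead of by shape.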
clusterICE(ice_obj, nClusters, plot = TRUE, plot_margin = 0.05, colorvec, plot_pdp = FALSE, x_quantile = FALSE, avg_lwd = 3, centered = FALSE, plot_legend = FALSE, ...)
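Argument | Description |
---|---|
`ice_obj` | Object of class `ice` or `dice` to cluster. |
`nClusters` | Number of clusters to find. |
`plot` | If `TRUE`, the clusters are plotted. |
`plot_margin` | Extra margin to pass to `ylim`, as a fraction of the plotted range. |
`colorvec` | Optional vector of colors to use for each cluster. |
`plot_pdp` | If `TRUE`, the PDP is also plotted. |
`x_quantile` | If `TRUE`, the x-axis is drawn on the quantile scale of the predictor; if `FALSE`, on its original scale. |
`avg_lwd` | Average line width to use when plotting the cluster means. Line width is proportional to the cluster's size. |
`centered` | If `TRUE`, the cluster means are shifted to start at 0; if `FALSE`, they are plotted as estimated. |
`plot_legend` | If `TRUE`, a legend is added to the plot. |
`...` | Additional arguments for plotting. |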
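The `avg_lwd` description fixes two properties: the widths average to `avg_lwd`, and each width is proportional to its cluster's size. One formula with both properties is sketched below; it is an assumption, not necessarily the package's exact rule, and `cluster_lwds` is a hypothetical helper:

```r
# Hypothetical helper: widths that average to avg_lwd and scale with cluster size.
cluster_lwds <- function(cluster_sizes, avg_lwd = 3) {
  avg_lwd * cluster_sizes / mean(cluster_sizes)
}

lwds <- cluster_lwds(c(10, 30, 20), avg_lwd = 3)
lwds   # 1.5 4.5 3.0
```

Given a `kmeans` result `km`, the cluster sizes are available as `km$size`, so `cluster_lwds(km$size)` would give one width per cluster.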
Returns the output of the `kmeans` call (a list of class `kmeans`).
See also: `ice`, `dice`.
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) # has Boston Housing data, Pima
    data(Boston) # Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL

    ## build a RF:
    bh_rf = randomForest(X, y)

    ## Create an 'ice' object for the predictor "age":
    bh.ice = ice(object = bh_rf, X = X, y = y, predictor = "age", frac_to_build = .1)

    ## cluster the curves into 2 groups.
    clusterICE(bh.ice, nClusters = 2, plot_legend = TRUE)

    ## cluster the curves into 3 groups, start all at 0.
    clusterICE(bh.ice, nClusters = 3, plot_legend = TRUE, centered = TRUE)
    ## End(Not run)
dice
Estimates the partial derivative function for each curve in an `ice` object. See Goldstein et al. (2013) for further details.
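A minimal base-R sketch of turning ICE curves into derivative (d-ICE) curves with plain finite differences on the grid. This swaps in a simple estimator and is not the package's default `DerivEstimator`; `d_ice_sketch` and the toy curves are hypothetical. Real ICE curves are often noisy, so smoothing before differencing is usually advisable:

```r
# Hypothetical helper: finite-difference derivative of each ICE curve (one per row)
# over a possibly uneven grid; returns a matrix with the same shape as ice_curves.
d_ice_sketch <- function(gridpts, ice_curves) {
  G <- length(gridpts)
  stopifnot(G >= 2, ncol(ice_curves) == G)
  d <- matrix(NA_real_, nrow(ice_curves), G)
  # one-sided differences at the two ends of the grid
  d[, 1] <- (ice_curves[, 2] - ice_curves[, 1]) / (gridpts[2] - gridpts[1])
  d[, G] <- (ice_curves[, G] - ice_curves[, G - 1]) / (gridpts[G] - gridpts[G - 1])
  # central differences at interior grid points
  if (G > 2) {
    inner <- 2:(G - 1)
    d[, inner] <- (ice_curves[, inner + 1, drop = FALSE] - ice_curves[, inner - 1, drop = FALSE]) /
      rep(gridpts[inner + 1] - gridpts[inner - 1], each = nrow(ice_curves))
  }
  d
}

gridpts <- c(0, 0.5, 1, 2, 4)
ice_curves <- rbind(2 * gridpts, gridpts^2)   # two toy curves
d_ice <- d_ice_sketch(gridpts, ice_curves)    # first row is 2 at every grid point
```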
dice(ice_obj, DerivEstimator)
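Argument | Description |
---|---|
`ice_obj` | Object of class `ice`. |
`DerivEstimator` | Optional function with a single argument `y` that returns the estimated partial derivative of a function sampled at the points (`ice_obj$gridpts`, `y`). If omitted, a default estimator is used. |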
Returns a list of class `dice` with the following elements. Most are passed directly through from `ice_obj` and exist to enable various plotting facilities.
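Element | Description |
---|---|
`d_ice_curves` | Matrix of dimension `nrow(Xice)` by `length(gridpts)`; each row is one observation's d-ICE curve, estimated at the values in `gridpts`. |
`xj` | The actual values of `predictor` observed in the data, in the order of `Xice`. |
`actual_deriv` | Vector of length `nrow(Xice)` containing the estimated partial derivative at each observation's actual value of `predictor`. |
`sd_deriv` | Vector of length `length(gridpts)` containing the cross-observation standard deviation of the derivative estimates at each grid point. |
`logodds` | Passed from `ice_obj`. |
`gridpts` | Passed from `ice_obj`. |
`predictor` | Passed from `ice_obj`. |
`xlab` | Passed from `ice_obj`. |
`nominal_axis` | Passed from `ice_obj`. |
`range_y` | Passed from `ice_obj`. |
`Xice` | Passed from `ice_obj`. |
`dpdp` | The estimated partial derivative of the PDP. |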
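Assuming `sd_deriv` is the column-wise standard deviation of the d-ICE curves, the derivative summaries can be sketched as follows. With a linear derivative estimator such as forward differences, differentiating the PDP (the column mean of the ICE curves) gives the same result as averaging the d-ICE curves; a nonlinear smoother would not guarantee this. The toy curves and `fwd_diff` are hypothetical:

```r
set.seed(7)
gridpts <- seq(0, 1, length.out = 11)
ice_curves <- outer(runif(25, -1, 1), gridpts^2)   # 25 toy ICE curves, one per row

# Forward differences: one value per interval, so one fewer column than gridpts.
fwd_diff <- function(y) diff(y) / diff(gridpts)

d_ice_curves <- t(apply(ice_curves, 1, fwd_diff))  # 25 x 10 matrix of d-ICE values
sd_deriv <- apply(d_ice_curves, 2, sd)             # spread across observations, per column
pdp <- colMeans(ice_curves)                        # partial dependence estimate
dpdp <- fwd_diff(pdp)                              # derivative of the PDP
```

A large `sd_deriv` at some grid value indicates that the curves disagree about the slope there, which is the kind of heterogeneity that ICE plots are meant to reveal.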
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, in press, 2014.
Maechler, M. et al. sfsmisc: Utilities from Seminar fuer Statistik ETH Zurich. R package version 1.0-24.
See also: `plot.dice`, `print.dice`, `summary.dice`.
    ## Not run:
    # same examples as for 'ice', but now create a derivative estimate as well.
    require(ICEbox)
    require(randomForest)
    require(MASS) # has Boston Housing data, Pima

    ######## regression example
    data(Boston) # Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL

    ## build a RF:
    bhd_rf_mod = randomForest(X, y)

    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)

    # make a dice object:
    bhd.dice = dice(bhd.ice)

    #### classification example
    data(Pima.te) # Pima Indians diabetes classification
    y = Pima.te$type
    X = Pima.te
    X$type = NULL

    ## build a RF:
    pima_rf = randomForest(x = X, y = y)

    ## Create an 'ice' object for the predictor "skin":
    # For classification we plot the centered log-odds. If we pass a predict
    # function that returns fitted probabilities, setting logodds = TRUE instructs
    # the function to set each ice curve to the centered log-odds of the fitted
    # probability.
    pima.ice = ice(object = pima_rf, X = X, predictor = "skin", logodds = TRUE,
                   predictfcn = function(object, newdata){
                     predict(object, newdata, type = "prob")[, 2]
                   })

    # make a dice object:
    pima.dice = dice(pima.ice)
    ## End(Not run)
ice
Creates an `ice` object with individual conditional expectation curves for the passed model `object`, `X` matrix, `predictor`, and response. See Goldstein et al. (2013) for further details.
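At its core, an ICE curve is obtained by sweeping the predictor over a grid while holding each observation's other covariates fixed, and the PDP is the average of those curves. A brute-force base-R sketch under that reading, with an `lm` fit standing in for a black-box model; `ice_sketch` is hypothetical and omits the package's subsetting (`frac_to_build`), row ordering, and log-odds options:

```r
# Hypothetical helper (not ICEbox's ice()): brute-force ICE curves for any model
# whose predictions can be obtained as predictfcn(object, newdata).
ice_sketch <- function(object, X, predictor,
                       gridpts = sort(unique(X[[predictor]])),
                       predictfcn = function(object, newdata) predict(object, newdata)) {
  ice_curves <- matrix(NA_real_, nrow = nrow(X), ncol = length(gridpts))
  for (g in seq_along(gridpts)) {
    X_mod <- X
    X_mod[[predictor]] <- gridpts[g]             # same predictor value for every row
    ice_curves[, g] <- predictfcn(object, X_mod) # other covariates keep observed values
  }
  list(gridpts = gridpts, ice_curves = ice_curves,
       pdp = colMeans(ice_curves))               # PDP = average of the ICE curves
}

# Toy data with an x1:x2 interaction, so the ICE curves for x1 are not parallel.
set.seed(1)
X <- data.frame(x1 = runif(50), x2 = runif(50))
y <- 3 * X$x1 + 2 * X$x1 * X$x2 + rnorm(50, sd = 0.1)
fit <- lm(y ~ x1 * x2, data = X)
res <- ice_sketch(fit, X, "x1")
```

The cost is one prediction call per grid point on the full data, which is why subsetting options such as `frac_to_build` matter for large `X` or slow models.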
ice(object, X, y, predictor, predictfcn, verbose = TRUE, frac_to_build = 1, indices_to_build = NULL, num_grid_pts, logodds = FALSE, probit = FALSE, ...)
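Argument | Description |
---|---|
`object` | The fitted model to estimate ICE curves for. |
`X` | The design matrix we wish to estimate ICE curves for. Rows are observations, columns are predictors. Typically this is taken to be the training data of `object`, but this is not strictly necessary. |
`y` | Optional vector of the response values `object` was trained on. It is used to compute y-axis ranges that are useful for plotting. |
`predictor` | The column number or variable name in `X` of the predictor of interest. |
`predictfcn` | Optional function that accepts two arguments, `object` and `newdata`, and returns a vector of `object`'s predicted responses for `newdata`. If omitted, the generic `predict` function for `class(object)` is used. |
`verbose` | If `TRUE`, prints messages about the procedure's progress. |
`frac_to_build` | Number between 0 and 1, with 1 as default. For large `X` matrices or models that are slow to predict, a value below 1 builds curves for only a subset of the observations, chosen so that their values of `predictor` are spread evenly over the quantiles of `X[, predictor]`. |
`indices_to_build` | Vector of row indices of `X` specifying which observations to build ICE curves for. This is an alternative to `frac_to_build`, so the two should not both be specified. |
`num_grid_pts` | Optional number of values in the range of `predictor` at which to estimate each curve. If missing, the curves are estimated at each unique value of `predictor` among the observations used. |
`logodds` | If `TRUE`, for classification the curves show the centered log-odds implied by the fitted probabilities; the predict function is assumed to return probabilities. |
`probit` | If `TRUE`, for classification the curves show the probit transform of the fitted probabilities. |
`...` | Other arguments to be passed to the generic `predict` function of `object`. |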
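When `frac_to_build` is below 1, curves are built for only a subset of the observations. A minimal sketch of one way to choose that subset, assuming the aim is to spread the chosen rows evenly over the sorted predictor values (that is, over its empirical quantiles); `evenly_spaced_indices` is a hypothetical helper, not an ICEbox function:

```r
# Hypothetical helper: pick about frac_to_build * n rows whose predictor values are
# evenly spaced along the sorted predictor, i.e. across its empirical quantiles.
evenly_spaced_indices <- function(x, frac_to_build) {
  n <- length(x)
  n_build <- max(1, round(frac_to_build * n))
  ord <- order(x)                                          # rows sorted by predictor value
  picks <- unique(round(seq(1, n, length.out = n_build)))  # evenly spaced ranks
  ord[picks]                                               # map ranks back to row numbers
}

x_demo <- c(5, 1, 4, 2, 3, 10, 8, 9, 6, 7)
idx <- evenly_spaced_indices(x_demo, frac_to_build = 0.4)
x_demo[idx]   # 1 4 7 10
```

Compared with a simple random sample, this keeps both tails of the predictor represented even when the subset is small.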
Returns a list of class `ice` with the following elements.
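Element | Description |
---|---|
`gridpts` | Sorted values of `predictor` at which each curve is estimated. |
`ice_curves` | Matrix of dimension `nrow(Xice)` by `length(gridpts)`; each row is one observation's ICE curve, estimated at the values in `gridpts`. |
`xj` | The actual values of `predictor` observed in the data, in the order of `Xice`. |
`actual_predictions` | Vector of length `nrow(Xice)` containing the model's predictions at each observation's actual value of `predictor`, in the order of `Xice`. |
`xlab` | String with the predictor name corresponding to `predictor`. |
`nominal_axis` | If `TRUE`, the plotting functions treat the x-axis as nominal. |
`range_y` | If `y` was passed, the range of the response; otherwise a range computed from the ICE curves. |
`sd_y` | If `y` was passed, the standard deviation of the response; otherwise one computed from the predictions. |
`Xice` | A matrix containing the subset of `X` for which ICE curves are estimated. |
`pdp` | A vector of size `length(gridpts)` that numerically approximates the partial dependence function corresponding to the estimated ICE curves. |
`predictor` | Same as the argument of the same name. |
`logodds` | Same as the argument of the same name. |
`indices_to_build` | Same as the argument of the same name. |
`frac_to_build` | Same as the argument of the same name. |
`predictfcn` | Same as the argument of the same name. |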
Friedman, J. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, in press, 2014.
See also: `plot.ice`, `print.ice`, `summary.ice`.
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) # has Boston Housing data, Pima

    ######## regression example
    data(Boston) # Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL

    ## build a RF:
    bhd_rf_mod = randomForest(X, y)

    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)

    #### classification example
    data(Pima.te) # Pima Indians diabetes classification
    y = Pima.te$type
    X = Pima.te
    X$type = NULL

    ## build a RF:
    pima_rf_mod = randomForest(x = X, y = y)

    ## Create an 'ice' object for the predictor "skin":
    # For classification we plot the centered log-odds. If we pass a predict
    # function that returns fitted probabilities, setting logodds = TRUE instructs
    # the function to set each ice curve to the centered log-odds of the fitted
    # probability.
    pima.ice = ice(object = pima_rf_mod, X = X, predictor = "skin", logodds = TRUE,
                   predictfcn = function(object, newdata){
                     predict(object, newdata, type = "prob")[, 2]
                   })
    ## End(Not run)
plot.dice
Plotting of `dice` objects.
    ## S3 method for class 'dice'
    plot(x, plot_margin = 0.05, frac_to_plot = 1, plot_sd = TRUE, plot_orig_pts_deriv = TRUE, pts_preds_size = 1.5, colorvec, color_by = NULL, x_quantile = TRUE, plot_dpdp = TRUE, rug_quantile = seq(from = 0, to = 1, by = 0.1), ...)
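Argument | Description |
---|---|
`x` | Object of class `dice` to plot. |
`plot_margin` | Extra margin to pass to `ylim`, as a fraction of the plotted range. |
`frac_to_plot` | If `frac_to_plot` is less than 1, only a random `frac_to_plot` fraction of the curves is plotted. |
`plot_sd` | If `TRUE`, the cross-observation standard deviation of the derivatives (`sd_deriv`) is also plotted. |
`plot_orig_pts_deriv` | If `TRUE`, each curve is marked at the derivative estimate at the observation's actual value of the predictor. |
`pts_preds_size` | Size of points to make if `plot_orig_pts_deriv` is `TRUE`. |
`colorvec` | Optional vector of colors to use for each curve. |
`color_by` | Optional variable name (or column number) in `Xice` to color the curves by. |
`x_quantile` | If `TRUE`, the x-axis is drawn on the quantile scale of the predictor; if `FALSE`, on its original scale. |
`plot_dpdp` | If `TRUE`, the estimated derivative of the PDP is also plotted. |
`rug_quantile` | If not `NULL`, tick marks are drawn on the x-axis corresponding to the vector of quantiles specified by this parameter. Forced to `NULL` when `x_quantile` is `TRUE`. |
`...` | Additional plotting arguments. |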
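The `x_quantile` and `rug_quantile` options can be illustrated without drawing anything. This sketch assumes the quantile scale maps each grid point to its empirical cumulative proportion via `ecdf`; the plotting code's exact mapping may differ, and the toy values are invented:

```r
# Toy grid and observed predictor values (invented for illustration).
gridpts <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89)
xj <- c(2, 3, 3, 5, 8, 8, 13, 21, 34, 89)

# Assumed quantile scale: each grid point is placed at its empirical cumulative
# proportion, so the 10 grid points become evenly spaced at 0.1, 0.2, ..., 1.
x_on_quantile_scale <- ecdf(gridpts)(gridpts)

# rug_quantile on the original scale: tick positions at the requested quantiles of xj.
rug_positions <- quantile(xj, probs = seq(from = 0, to = 1, by = 0.1))
```

On an existing plot drawn on the original scale, `rug(rug_positions)` would add the ticks; on a quantile-scaled axis the ticks would carry no extra information, since the axis itself is already in quantile units.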
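Returns a list with the following elements.
Element | Description |
---|---|
`plot_points_indices` | Row numbers of `Xice` for the observations shown in the plot. |
`legend_text` | If the `color_by` argument was used, a legend describing the mapping between the `color_by` variable and the curve colors. |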
See also: `dice`.
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) # has Boston Housing data, Pima
    data(Boston) # Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL

    ## build a RF:
    bhd_rf_mod = randomForest(X, y)

    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)

    # estimate derivatives, then plot.
    bhd.dice = dice(bhd.ice)
    plot(bhd.dice)
    ## End(Not run)
plot.ice
Plotting of `ice` objects.
    ## S3 method for class 'ice'
    plot(x, plot_margin = 0.05, frac_to_plot = 1, plot_points_indices = NULL, plot_orig_pts_preds = TRUE, pts_preds_size = 1.5, colorvec, color_by = NULL, x_quantile = TRUE, plot_pdp = TRUE, centered = FALSE, prop_range_y = TRUE, rug_quantile = seq(from = 0, to = 1, by = 0.1), centered_percentile = 0, point_labels = NULL, point_labels_size = NULL, prop_type, ...)
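Argument | Description |
---|---|
`x` | Object of class `ice` to plot. |
`plot_margin` | Extra margin to pass to `ylim`, as a fraction of the plotted range. |
`frac_to_plot` | If `frac_to_plot` is less than 1, only a random `frac_to_plot` fraction of the curves is plotted. |
`plot_points_indices` | If not `NULL`, only the curves for these observations are plotted. |
`plot_orig_pts_preds` | If `TRUE`, each curve is marked at the observation's actual fitted value. |
`pts_preds_size` | Size of points to make if `plot_orig_pts_preds` is `TRUE`. |
`colorvec` | Optional vector of colors to use for each curve. |
`color_by` | Optional variable name in `Xice` to color the curves by. |
`x_quantile` | If `TRUE`, the x-axis is drawn on the quantile scale of the predictor; if `FALSE`, on its original scale. |
`plot_pdp` | If `TRUE`, the PDP is also plotted. |
`centered` | If `TRUE`, all curves are shifted to be 0 at the `centered_percentile` percentile of the predictor (a centered ICE plot). |
`prop_range_y` | When `TRUE` and `centered = TRUE`, a right vertical axis shows the centered values on a relative scale (see `prop_type`). |
`centered_percentile` | The percentile of the predictor at which all curves are set to 0 when `centered = TRUE`. |
`point_labels` | If not `NULL`, labels to draw at the observations' points. |
`point_labels_size` | If not `NULL`, the size of the point labels. |
`rug_quantile` | If not `NULL`, tick marks are drawn on the x-axis corresponding to the vector of quantiles specified by this parameter. |
`prop_type` | Scaling factor for the right vertical axis in centered plots if `prop_range_y` is `TRUE`. |
`...` | Other arguments to be passed to the `plot` function. |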
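Centering (a c-ICE plot) subtracts from each curve its own value at a reference point, so curves are compared by shape rather than by level. A sketch assuming `centered_percentile` is a probability in [0, 1] and that the reference is the grid point closest to that quantile of the observed predictor values; `center_ice` is hypothetical, and the package may select the reference point differently:

```r
# Hypothetical helper: "pinch" every ICE curve to 0 at a chosen percentile of the predictor.
center_ice <- function(ice_curves, gridpts, xj, centered_percentile = 0) {
  anchor_x <- quantile(xj, probs = centered_percentile, names = FALSE)
  anchor_col <- which.min(abs(gridpts - anchor_x))   # grid point closest to that percentile
  ice_curves - ice_curves[, anchor_col]              # subtract each curve's own value there
}

gridpts <- c(0, 1, 2, 3)
ice_curves <- rbind(c(5, 6, 7, 8), c(1, 3, 2, 0))
c_ice <- center_ice(ice_curves, gridpts, xj = gridpts)  # default 0: pinch at the minimum
```

With the default `centered_percentile = 0` from the usage signature, every curve starts at 0 on the left edge, so vertical spread at larger predictor values reflects differences in how the curves change rather than where they start.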
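Returns a list with the following elements.
Element | Description |
---|---|
`plot_points_indices` | Row numbers of `Xice` for the observations shown in the plot. |
`legend_text` | If the `color_by` argument was used, a legend describing the mapping between the `color_by` variable and the curve colors. |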
See also: `ice`.
    ## Not run:
    require(ICEbox)
    require(randomForest)
    require(MASS) # has Boston Housing data, Pima
    data(Boston) # Boston Housing data
    X = Boston
    y = X$medv
    X$medv = NULL

    ## build a RF:
    bhd_rf_mod = randomForest(X, y)

    ## Create an 'ice' object for the predictor "age":
    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)

    ## plot
    plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1)

    ## centered plot
    plot(bhd.ice, x_quantile = TRUE, plot_pdp = TRUE, frac_to_plot = 1, centered = TRUE)

    ## color the curves by high and low values of 'rm'.
    # First create an indicator variable which is 1 if the number of
    # rooms is greater than the median:
    median_rm = median(X$rm)
    bhd.ice$Xice$I_rm = ifelse(bhd.ice$Xice$rm > median_rm, 1, 0)
    plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
         x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = "I_rm")

    bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = 1)
    plot(bhd.ice, frac_to_plot = 1, centered = TRUE, prop_range_y = TRUE,
         x_quantile = TRUE, plot_orig_pts_preds = TRUE, color_by = y)
    ## End(Not run)
print.dice
Prints a summary of a `dice` object.
    ## S3 method for class 'dice'
    print(x, ...)
Argument | Description |
---|---|
`x` | Object of class `dice`. |
`...` | Ignored for now. |
print.ice
Prints a summary of an `ice` object.
    ## S3 method for class 'ice'
    print(x, ...)
Argument | Description |
---|---|
`x` | Object of class `ice`. |
`...` | Ignored for now. |
summary.dice
Alias of the `print` method for `dice` objects.
    ## S3 method for class 'dice'
    summary(object, ...)
Argument | Description |
---|---|
`object` | Object of class `dice`. |
`...` | Ignored for now. |
summary.ice
Alias of the `print` method for `ice` objects.
    ## S3 method for class 'ice'
    summary(object, ...)
Argument | Description |
---|---|
`object` | Object of class `ice`. |
`...` | Ignored for now. |
WhiteWine
The WhiteWine data frame has 4898 rows and 12 columns and concerns white wines from a region in Portugal. The response variable, quality, is a wine quality metric, taken to be the median preference score of three blind tasters on a scale of 1 to 10. The 11 covariates are physicochemical measurements of each wine, such as citric acid content and sulphates.
data(WhiteWine)
A data frame of 4898 cases on 12 variables.
Bache, K. and Lichman, M. UCI Machine Learning Repository, 2013. http://archive.ics.uci.edu/ml