Create an explainer object with Shapley weights for test data.
shapr(x, model, n_combinations = NULL, group = NULL)
x: Numeric matrix or data.frame/data.table. Contains the data used to estimate the (conditional) distributions for the features needed to properly estimate the conditional expectations in the Shapley formula.
model: The model whose predictions we want to explain. Run shapr:::get_supported_models() for a table of the models that shapr supports natively.
n_combinations: Integer. The number of feature combinations to sample. If NULL, the exact method is used and all combinations are considered. The maximum number of combinations equals 2^ncol(x).
group: List. If NULL, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed. group then has length equal to the number of groups. Each list element is a character vector with the names of the features included in that group.
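As a rough illustration of how n_combinations and group relate to the number of combinations, the base-R sketch below does not call shapr. The helper uses_exact is hypothetical; its rule (exact when n_combinations is NULL or at least 2^m) is inferred from the message printed in the example output, and the check that every feature sits in exactly one group is an assumption about what a valid group list looks like.

```r
# Hypothetical helper (not part of shapr): would the exact method be used
# for m features, given n_combinations?
uses_exact <- function(n_combinations, m) {
  is.null(n_combinations) || n_combinations >= 2^m
}

uses_exact(NULL, 4)  # TRUE: no sampling requested
uses_exact(1e3, 4)   # TRUE: 1000 >= 2^4 = 16
uses_exact(1e3, 13)  # FALSE: 1000 < 2^13 = 8192

# A group list: one named element per group, each a character vector of features.
x_var <- c("lstat", "rm", "dis", "indus")
group <- list(A = x_var[1:2], B = x_var[3:4])

# Assumption: every feature belongs to exactly one group.
stopifnot(setequal(unlist(group), x_var), !anyDuplicated(unlist(group)))

# With groups, combinations are formed over groups rather than single features.
n_group_combinations <- 2^length(group)  # 4
```

Because the number of combinations doubles with every additional feature, grouping or sampling becomes attractive quickly as ncol(x) grows.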
Named list that contains the following items:
exact: Boolean. Equals TRUE if n_combinations = NULL or n_combinations >= 2^ncol(x), otherwise FALSE.
n_features: Positive integer. The number of columns in x.
S: Binary matrix. The number of rows equals the number of unique combinations, and the number of columns equals the total number of features. For example, with three features there are 2^3 = 8 unique combinations. If the entry in the i-th row and j-th column equals 1, the j-th feature is present in the i-th combination. Otherwise it equals 0.
W: Matrix. This matrix is equal to the matrix R_D in Equation 7 of the reference listed in the documentation of explain. The Shapley value for a test observation equals the matrix-vector product of W and the contribution vector.
X: data.table. Returned object from feature_combinations.
x_train: data.table. x converted to a data.table.
feature_list: List. The updated_feature_list output from preprocess_data.
In addition to the items above, model and n_combinations are also present in the returned object.
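To make the binary matrix and the weight matrix W described above more concrete, here is a minimal base-R sketch for three features. It does not call shapr. The row ordering, the large finite weight 1e6 standing in for the infinite Shapley kernel weights of the empty and full combinations, and the toy contribution vector v are all assumptions made for illustration. The weighted least squares form used for W is the usual Kernel SHAP formulation, which is assumed here to correspond to R_D.

```r
m <- 3  # number of features

# Binary matrix: one row per combination, one column per feature.
S <- as.matrix(expand.grid(rep(list(0:1), m)))
dimnames(S) <- NULL
S <- S[order(rowSums(S)), , drop = FALSE]  # sort by combination size (illustrative)

# Shapley kernel weight per combination size k. It is infinite for k = 0 and
# k = m, so a large finite weight is substituted (assumption).
k <- rowSums(S)
w <- (m - 1) / (choose(m, k) * k * (m - k))
w[!is.finite(w)] <- 1e6

# Weighted least squares: W = (Z' D Z)^{-1} Z' D, with D = diag(w).
Z <- cbind(1, S)
W <- solve(t(Z) %*% (w * Z), t(w * Z))

# Toy contribution vector from an additive game:
# v(combination) = 10 + sum of the effects of the features present.
effects <- c(2, -1, 0.5)
v <- 10 + as.vector(S %*% effects)

# Matrix-vector product: first entry is the baseline, then one value per feature.
phi <- as.vector(W %*% v)
round(phi, 3)  # approximately 10, 2, -1, 0.5
```

For an additive game like this one, the per-feature values recover the individual effects, which gives a quick sanity check on the construction of W.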
if (requireNamespace("MASS", quietly = TRUE)) {
  # Load example data
  data("Boston", package = "MASS")
  df <- Boston
  # Example using the exact method
  x_var <- c("lstat", "rm", "dis", "indus")
  y_var <- "medv"
  df0 <- df[, x_var]
  model <- lm(medv ~ lstat + rm + dis + indus, data = df)
  explainer <- shapr(df0, model)
  print(nrow(explainer$X))
  # 16 (which equals 2^4)
  # Example using approximation
  y_var <- "medv"
  model <- lm(medv ~ ., data = df)
  explainer <- shapr(df, model, n_combinations = 1e3)
  print(nrow(explainer$X))
  # Example using approximation where n_combinations > 2^m
  x_var <- c("lstat", "rm", "dis", "indus")
  y_var <- "medv"
  model <- lm(medv ~ lstat + rm + dis + indus, data = df)
  explainer <- shapr(df0, model, n_combinations = 1e3)
  print(nrow(explainer$X))
  # 16 (which equals 2^4)
  # Example using groups
  group <- list(A = x_var[1:2], B = x_var[3:4])
  explainer_group <- shapr(df0, model, group = group)
  print(nrow(explainer_group$X))
  # 4 (which equals 2^(#groups))
}
#> [1] 16
#>
#> Success with message:
#> The columns(s) medv is not used by the model and thus removed from the data.
#> [1] 572
#>
#> Success with message:
#> n_combinations is larger than or equal to 2^m = 16.
#> Using exact instead.
#> [1] 16
#> [1] 4