Create an explainer object with Shapley weights for test data.

shapr(x, model, n_combinations = NULL, group = NULL)

Arguments

x

Numeric matrix or data.frame/data.table. Contains the data used to estimate the (conditional) distributions for the features needed to properly estimate the conditional expectations in the Shapley formula.

model

The model whose predictions we want to explain. Run shapr:::get_supported_models() for a table of which models shapr supports natively.

n_combinations

Integer. The number of feature combinations to sample. If NULL, the exact method is used and all combinations are considered. The maximum number of combinations equals 2^ncol(x).

group

List. If NULL, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed, and group must have length equal to the number of groups. Each list element is a character vector with the names of the features in that group.
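As a rough illustration of how n_combinations and group relate to the number of combinations, the sketch below uses two hypothetical helpers (uses_exact and n_group_combinations are not part of shapr). The rule for falling back to the exact method is inferred from the message printed in the Examples output, and the group count follows the 2^(#groups) comment there; treat both as assumptions rather than as the package's implementation.

```r
# Hypothetical helper, not part of shapr: TRUE when all combinations would be
# enumerated, i.e. n_combinations is NULL or at least 2^m (m = number of features)
uses_exact <- function(n_combinations, m) {
  is.null(n_combinations) || n_combinations >= 2^m
}

# Hypothetical helper, not part of shapr: number of combinations when
# Shapley values are computed group-wise
n_group_combinations <- function(group) {
  2^length(group)
}

uses_exact(NULL, 4)   # no sampling requested
uses_exact(1e3, 4)    # 1000 >= 2^4 = 16
uses_exact(1e3, 13)   # 1000 <  2^13 = 8192

# Each list element is a character vector of feature names
group <- list(A = c("lstat", "rm"), B = c("dis", "indus"))
n_group_combinations(group)
```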

Value

Named list that contains the following items:

exact

Boolean. Equals TRUE if n_combinations = NULL or n_combinations >= 2^ncol(x), otherwise FALSE.

n_features

Positive integer. The number of columns in x.

S

Binary matrix. The number of rows equals the number of unique combinations, and the number of columns equals the total number of features. For example, with three features there are 2^3 = 8 unique combinations, so S has 8 rows and 3 columns. If the element in row i and column j equals 1, the j-th feature is present in the i-th combination; otherwise it equals 0.

W

Matrix. This matrix is equal to the matrix R_D in Equation 7 in the reference of explain. The Shapley values for a test observation are equal to the matrix-vector product of W and the contribution vector.

X

data.table. The object returned by feature_combinations.

x_train

data.table. Transformed x into a data.table.

feature_list

List. The updated_feature_list output from preprocess_data.

In addition to the items above, model and n_combinations are also present in the returned object.
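To make the roles of S and W more concrete, here is a minimal base R sketch for three features. It assumes W has the weighted least squares form commonly used in Kernel SHAP, W = (Z' D Z)^{-1} Z' D, with Shapley kernel weights on the diagonal of D and a large finite weight for the empty and full combinations. The weight value 1e6, the row order of S and the toy contribution vector v are illustrative assumptions, not taken from shapr.

```r
m <- 3  # number of features in this toy case

# S: one row per combination, one column per feature (row order is arbitrary here)
S <- unname(as.matrix(expand.grid(rep(list(0:1), m))))
size <- rowSums(S)

# Assumed Shapley kernel weights; the empty and full combinations get a
# large finite weight so that the fit is (almost) exact for them
w <- rep(1e6, nrow(S))
inner <- size > 0 & size < m
w[inner] <- (m - 1) / (choose(m, size[inner]) * size[inner] * (m - size[inner]))

# Assumed weighted least squares form: W = (Z' D Z)^{-1} Z' D
Z <- cbind(1, S)
D <- diag(w)
W <- solve(t(Z) %*% D %*% Z, t(Z) %*% D)

# Toy additive contribution vector: v(S) = 10 + sum of a_j over features in S
a <- c(2, -1, 0.5)
v <- 10 + as.vector(S %*% a)

# Matrix-vector product of W and the contribution vector
phi <- as.vector(W %*% v)
round(phi, 6)
```

Because the toy contribution vector is additive, phi recovers the intercept 10 followed by the per-feature effects 2, -1 and 0.5, up to numerical error. In this sketch W has m + 1 = 4 rows (one for the intercept) and 2^m = 8 columns.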

Author

Nikolai Sellereite

Examples

if (requireNamespace("MASS", quietly = TRUE)) {
  # Load example data
  data("Boston", package = "MASS")
  df <- Boston

  # Example using the exact method
  x_var <- c("lstat", "rm", "dis", "indus")
  y_var <- "medv"
  df0 <- df[, x_var]
  model <- lm(medv ~ lstat + rm + dis + indus, data = df)
  explainer <- shapr(df0, model)

  print(nrow(explainer$X))
  # 16 (which equals 2^4)

  # Example using approximation
  y_var <- "medv"
  model <- lm(medv ~ ., data = df)
  explainer <- shapr(df, model, n_combinations = 1e3)

  print(nrow(explainer$X))

  # Example using approximation where n_combinations >= 2^m
  x_var <- c("lstat", "rm", "dis", "indus")
  y_var <- "medv"
  model <- lm(medv ~ lstat + rm + dis + indus, data = df)
  explainer <- shapr(df0, model, n_combinations = 1e3)

  print(nrow(explainer$X))
  # 16 (which equals 2^4)

  # Example using groups
  group <- list(A = x_var[1:2], B = x_var[3:4])

  explainer_group <- shapr(df0, model, group = group)
  print(nrow(explainer_group$X))
  # 4 (which equals 2^(#groups))
}
#> [1] 16
#> 
#> Success with message:
#> The columns(s) medv is not used by the model and thus removed from the data.
#> [1] 572
#> 
#> Success with message:
#> n_combinations is larger than or equal to 2^m = 16. 
#> Using exact instead.
#> [1] 16
#> [1] 4