`R/explanation.R`

`explain.Rd`

Explain the output of machine learning models with more accurately estimated Shapley values

```
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  ...
)

# S3 method for independence
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  ...
)

# S3 method for empirical
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  w_threshold = 0.95,
  type = "fixed_sigma",
  fixed_sigma_vec = 0.1,
  n_samples_aicc = 1000,
  eval_max_aicc = 20,
  start_aicc = 0.1,
  cov_mat = NULL,
  ...
)

# S3 method for gaussian
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  mu = NULL,
  cov_mat = NULL,
  ...
)

# S3 method for copula
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  ...
)

# S3 method for ctree
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  mincriterion = 0.95,
  minsplit = 20,
  minbucket = 7,
  sample = TRUE,
  ...
)

# S3 method for combined
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples = 1000,
  n_batches = 1,
  seed = 1,
  mu = NULL,
  cov_mat = NULL,
  ...
)

# S3 method for ctree_comb_mincrit
explain(
  x,
  explainer,
  approach,
  prediction_zero,
  n_samples,
  n_batches = 1,
  seed = 1,
  mincriterion,
  ...
)
```

- x
A matrix or data.frame. Contains the features whose predictions ought to be explained (test data).

- explainer
An `explainer` object to use for explaining the observations. See `shapr`.

- approach
Character vector of length `1` or `n_features`. `n_features` equals the total number of features in the model. All elements should be either `"gaussian"`, `"copula"`, `"empirical"`, `"ctree"`, or `"independence"`. See details for more information.

- prediction_zero
Numeric. The prediction value for unseen data, typically equal to the mean of the response.

- n_samples
Positive integer. The maximum number of samples to use in the Monte Carlo integration for every conditional expectation. See also details.

- n_batches
Positive integer. Specifies how many batches the total number of feature combinations should be split into when calculating the contribution function for each test observation. The default value is 1. Increasing the number of batches may significantly reduce the RAM allocation for models with many features. This typically comes with a small increase in computation time.

- ...
Additional arguments passed to `prepare_and_predict`.

- seed
Positive integer. If `NULL`, the seed will be inherited from the calling environment.

- w_threshold
Numeric vector of length 1, with `0 < w_threshold <= 1`, representing the minimum proportion of the total empirical weight that the data samples should account for. If e.g. `w_threshold = .8`, the `K` samples with the largest weights are chosen so that the sum of their weights accounts for 80% of the total weight. `w_threshold` is the \(\eta\) parameter in equation (15) of Aas et al. (2021).

- type
Character. Should be equal to either `"independence"`, `"fixed_sigma"`, `"AICc_each_k"` or `"AICc_full"`.

- fixed_sigma_vec
Numeric. Represents the kernel bandwidth. Note that this argument is only applicable when `approach = "empirical"` and `type = "fixed_sigma"`.

- n_samples_aicc
Positive integer. Number of samples to consider in AICc optimization. Note that this argument is only applicable when `approach = "empirical"` and `type` is equal to either `"AICc_each_k"` or `"AICc_full"`.

- eval_max_aicc
Positive integer. Maximum number of iterations when optimizing the AICc. Note that this argument is only applicable when `approach = "empirical"` and `type` is equal to either `"AICc_each_k"` or `"AICc_full"`.

- start_aicc
Numeric. Start value of `sigma` when optimizing the AICc. Note that this argument is only applicable when `approach = "empirical"` and `type` is equal to either `"AICc_each_k"` or `"AICc_full"`.

- cov_mat
Numeric matrix. (Optional) Containing the covariance matrix of the data generating distribution. `NULL` means it is estimated from the data if needed (in the Gaussian approach).

- mu
Numeric vector. (Optional) Containing the mean of the data generating distribution. If `NULL`, the expected values are estimated from the data. Note that this is only used when `approach = "gaussian"`.

- mincriterion
Numeric value or vector, where the length of the vector is the number of features in the model. The value is equal to 1 - alpha, where alpha is the nominal level of the conditional independence tests. If it is a vector, it indicates which mincriterion to use when conditioning on various numbers of features.

- minsplit
Numeric value. Equal to the value that the sum of the left and right daughter nodes needs to exceed.

- minbucket
Numeric value. Equal to the minimum sum of weights in a terminal node.

- sample
Boolean. If `TRUE`, the method always samples `n_samples` observations from the leaf (with replacement). If `FALSE` and the number of observations in the leaf is less than `n_samples`, the method takes all observations in the leaf. If `FALSE` and the number of observations in the leaf is more than `n_samples`, the method samples `n_samples` observations (with replacement). This means that there will always be sampling in the leaf unless `sample = FALSE` and the number of observations in the node is less than `n_samples`.
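The selection rule described for `w_threshold` can be illustrated with a few lines of base R. This is only a sketch of the rule as worded in the argument description, not shapr's internal code: the function name `select_by_weight` and the toy weights are hypothetical, and capping the result at `n_samples` is an assumption based on the description of `n_samples` as the maximum number of observations used.

```r
# Hypothetical helper: return the indices of the K largest weights whose
# cumulative share of the total weight first reaches `w_threshold`,
# using at most `n_samples` observations (assumed cap).
select_by_weight <- function(weights, w_threshold = 0.95, n_samples = 1000) {
  stopifnot(w_threshold > 0, w_threshold <= 1, all(weights >= 0))
  ord <- order(weights, decreasing = TRUE)
  cum_weight <- cumsum(weights[ord])
  # Normalise by the last cumulative value so the final share is exactly 1
  cum_share <- cum_weight / cum_weight[length(cum_weight)]
  k <- which(cum_share >= w_threshold)[1]
  ord[seq_len(min(k, n_samples))]
}

# Toy weights (made up for illustration)
w <- c(1, 4, 2, 3)
kept <- select_by_weight(w, w_threshold = 0.8)
print(kept)
print(sum(w[kept]) / sum(w))
```

With `w_threshold = 1` every observation with positive weight is kept unless the `n_samples` cap applies first.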

Object of class `c("shapr", "list")`. Contains the following items:

- dt
data.table

- model
Model object

- p
Numeric vector

- x_test
data.table

Note that the returned items `model`, `p` and `x_test` are mostly added due to the implementation of `plot.shapr`. If you only want to look at the numerical results, it is sufficient to focus on `dt`. `dt` is a data.table where the number of rows equals the number of observations you'd like to explain, and the number of columns equals `m + 1`, where `m` equals the total number of features in your model.

If `dt[i, j + 1] > 0`, it indicates that the j-th feature increased the prediction for the i-th observation. Likewise, if `dt[i, j + 1] < 0`, it indicates that the j-th feature decreased the prediction for the i-th observation. The magnitude of the value is also important to notice. E.g. if `dt[i, k + 1]` and `dt[i, j + 1]` are greater than `0`, where `j != k`, and `dt[i, k + 1] > dt[i, j + 1]`, this indicates that features `j` and `k` both increased the value of the prediction, but that the effect of the k-th feature was larger than that of the j-th feature.

The first column in `dt`, called `none`, is the prediction value not assigned to any of the features (\(\phi_0\)). It is equal for all observations and set by the user through the argument `prediction_zero`. In theory this value should be the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable.
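As a small illustration of how `dt` can be read, the sketch below rebuilds the table printed in the example output at the end of this page as a plain data.frame (the real object is a data.table) and inspects it with base R. The variable names `phi`, `top_feature` and `implied_prediction` are made up for this sketch. The row sums rely on the efficiency property of Shapley values, under which \(\phi_0\) plus the feature contributions adds up to the prediction; this is a property of the theory and has not been checked against the fitted model here.

```r
# Values copied from the example output for the "empirical" approach
dt <- data.frame(
  none  = c(22.55229, 22.55229, 22.55229),
  lstat = c(4.936125, 4.319264, 3.821546),
  rm    = c(4.227149, 2.689601, -1.891280),
  dis   = c(0.9360181, 1.1058710, 1.1444728),
  indus = c(-0.80703737, -0.42851521, 0.08623941)
)

# Feature contributions only (drop the `none` column)
phi <- as.matrix(dt[, -1])

# +1: the feature increased the prediction, -1: it decreased it
print(sign(phi))

# Feature with the largest absolute contribution for each observation
top_feature <- colnames(phi)[apply(abs(phi), 1, which.max)]
print(top_feature)

# Under the efficiency property, each row sum should match the prediction
implied_prediction <- rowSums(dt)
print(implied_prediction)
```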

The most important thing to notice is that `shapr` has implemented five different approaches for estimating the conditional distributions of the data, namely `"empirical"`, `"gaussian"`, `"copula"`, `"ctree"` and `"independence"`. In addition, the user also has the option of combining these approaches. E.g., if you have trained a model that consists of 10 features, and you'd like to use the `"gaussian"` approach when you condition on a single feature, the `"empirical"` approach if you condition on 2-5 features, and the `"copula"` approach if you condition on more than 5 features, this can be done by simply passing `approach = c("gaussian", rep("empirical", 4), rep("copula", 5))`. That is, `approach[i] = "gaussian"` means that you'd like to use the `"gaussian"` approach when conditioning on `i` features.
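The mapping between the position in `approach` and the number of conditioned features can be made explicit with a short base R sketch. It only builds and inspects the vector from the 10-feature example above; `n_features` and `approach_map` are names chosen for this illustration.

```r
# Number of features in the (hypothetical) model from the example above
n_features <- 10

# Element i is the approach used when conditioning on i features
approach <- c("gaussian", rep("empirical", 4), rep("copula", 5))

# A combined approach needs one entry per possible number of conditioned features
stopifnot(length(approach) == n_features)

approach_map <- data.frame(
  n_conditioned = seq_len(n_features),
  approach = approach
)
print(approach_map)
```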

For `approach = "ctree"`, `n_samples` corresponds to the number of samples from the leaf node (see an exception related to the `sample` argument). For `approach = "empirical"`, `n_samples` is the \(K\) parameter in equations (14-15) of Aas et al. (2021), i.e. the maximum number of observations (with largest weights) that is used; see also the `w_threshold` argument.
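The interaction between `n_samples` and `sample` for `approach = "ctree"` can be sketched in base R. This is a minimal illustration of the rule as documented, not shapr's internal code: `draw_from_leaf` and `leaf_obs` are hypothetical names, and treating a leaf with exactly `n_samples` observations like the smaller case is an assumption, since the text does not cover it.

```r
# Hypothetical helper: decide which observations from a leaf node are used.
# `leaf_obs` holds the row indices of the training observations in the leaf.
draw_from_leaf <- function(leaf_obs, n_samples, sample = TRUE) {
  n_leaf <- length(leaf_obs)
  if (!sample && n_leaf <= n_samples) {
    # No sampling: keep every observation in the leaf
    return(leaf_obs)
  }
  # Otherwise draw n_samples observations with replacement
  leaf_obs[sample.int(n_leaf, size = n_samples, replace = TRUE)]
}

set.seed(1)
small_leaf <- 101:105
large_leaf <- 201:250

print(length(draw_from_leaf(small_leaf, n_samples = 10, sample = FALSE)))
print(length(draw_from_leaf(small_leaf, n_samples = 10, sample = TRUE)))
print(length(draw_from_leaf(large_leaf, n_samples = 10, sample = FALSE)))
```

Only the first call returns the leaf unchanged; the other two return `n_samples` draws.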

Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence, 298, 103502.

```
library(shapr)

if (requireNamespace("MASS", quietly = TRUE)) {
  # Load example data
  data("Boston", package = "MASS")

  # Split data into test- and training data
  x_train <- head(Boston, -3)
  x_test <- tail(Boston, 3)

  # Fit a linear model
  model <- lm(medv ~ lstat + rm + dis + indus, data = x_train)

  # Create an explainer object
  explainer <- shapr(x_train, model)

  # Use the mean of the training response as prediction_zero
  p <- mean(x_train$medv)

  # Empirical approach
  explain1 <- explain(x_test, explainer,
    approach = "empirical",
    prediction_zero = p, n_samples = 1e2
  )

  # Gaussian approach
  explain2 <- explain(x_test, explainer,
    approach = "gaussian",
    prediction_zero = p, n_samples = 1e2
  )

  # Gaussian copula approach
  explain3 <- explain(x_test, explainer,
    approach = "copula",
    prediction_zero = p, n_samples = 1e2
  )

  # ctree approach
  explain4 <- explain(x_test, explainer,
    approach = "ctree",
    prediction_zero = p
  )

  # Combined approach: one entry per number of conditioned features
  approach <- c("gaussian", "gaussian", "empirical", "empirical")
  explain5 <- explain(x_test, explainer,
    approach = approach,
    prediction_zero = p, n_samples = 1e2
  )

  # Print the Shapley values
  print(explain1$dt)

  # Plot the results
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    plot(explain1)
  }

  # Group-wise explanations
  group <- list(A = c("lstat", "rm"), B = c("dis", "indus"))
  explainer_group <- shapr(x_train, model, group = group)
  explain_groups <- explain(
    x_test,
    explainer_group,
    approach = "empirical",
    prediction_zero = p,
    n_samples = 1e2
  )
  print(explain_groups$dt)
}
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#> none lstat rm dis indus
#> 1: 22.55229 4.936125 4.227149 0.9360181 -0.80703737
#> 2: 22.55229 4.319264 2.689601 1.1058710 -0.42851521
#> 3: 22.55229 3.821546 -1.891280 1.1444728 0.08623941
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#>
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, rad, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#> none A B
#> 1: 22.55229 8.718123 0.5741309
#> 2: 22.55229 6.838269 0.8479526
#> 3: 22.55229 1.842581 1.3183971
```