check_setup
setup(
x_train,
x_explain,
approach,
prediction_zero,
output_size = 1,
n_combinations,
group,
n_samples,
n_batches,
seed,
keep_samp_for_vS,
feature_specs,
MSEv_uniform_comb_weights = TRUE,
type = "normal",
horizon = NULL,
y = NULL,
xreg = NULL,
train_idx = NULL,
explain_idx = NULL,
explain_y_lags = NULL,
explain_xreg_lags = NULL,
group_lags = NULL,
timing,
verbose,
is_python = FALSE,
...
)
Matrix or data.frame/data.table. Contains the data used to estimate the (conditional) distributions for the features needed to properly estimate the conditional expectations in the Shapley formula.
Matrix or data.frame/data.table. Contains the features whose predictions ought to be explained.
Character vector of length 1 or one less than the number of features. All elements should be either "gaussian", "copula", "empirical", "ctree", "vaeac", "categorical", "timeseries", "independence", "regression_separate", or "regression_surrogate". The two regression approaches cannot be combined with any other approach. See details for more information.
Numeric. The prediction value for unseen data, i.e. an estimate of the expected prediction without conditioning on any features. Typically we set this value equal to the mean of the response variable in our training data, but other choices such as the mean of the predictions in the training data are also reasonable.
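As a minimal sketch of the typical choice described above, the snippet below computes prediction_zero as the mean of the response in a training set. The built-in airquality data, the response Ozone, and the variable names are assumptions made for illustration only.

```r
# Illustrative data (an assumption): built-in airquality, complete cases only.
data <- airquality[complete.cases(airquality), ]
y_train <- data$Ozone

# Typical choice: the mean of the response variable in the training data.
prediction_zero <- mean(y_train)
```

The mean of the model's predictions on the training data could be substituted in the last line if that choice is preferred.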
TODO: Document
Integer. If group = NULL, n_combinations represents the number of unique feature combinations to sample. If group != NULL, n_combinations represents the number of unique group combinations to sample. If n_combinations = NULL, the exact method is used and all combinations are considered. The maximum number of combinations equals 2^m, where m is the number of features.
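To make the 2^m bound concrete, the following base R sketch enumerates every feature subset for a small, assumed m and confirms the count. It is not how shapr generates combinations internally; the variable names are illustrative.

```r
m <- 4                   # number of features (illustrative)
max_combinations <- 2^m  # upper bound on the number of combinations

# Enumerate all subsets of the feature indices 1..m, including the empty set.
non_empty <- unlist(
  lapply(seq_len(m), function(k) combn(m, k, simplify = FALSE)),
  recursive = FALSE
)
subsets <- c(list(integer(0)), non_empty)

# The number of enumerated subsets matches the bound.
length(subsets) == max_combinations
```

Because the count doubles with every added feature, sampling a smaller n_combinations is usually the practical option once m grows beyond a handful of features.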
List. If NULL, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed. group then has length equal to the number of groups. Each list element is a character vector with the features included in that group.
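A group list of the shape described above might look like the sketch below. The feature names and the group names A and B are hypothetical, and the check that every feature appears in exactly one group is an assumption about what a sensible grouping looks like, not a documented requirement taken from this page.

```r
# Hypothetical feature names for illustration.
features <- c("Solar.R", "Wind", "Temp", "Month")

# One list element per group; each element is a character vector of features.
group <- list(
  A = c("Temp", "Month"),
  B = c("Wind", "Solar.R")
)

# Assumed sanity check: the groups partition the feature set.
grouped <- unlist(group, use.names = FALSE)
is_partition <- setequal(grouped, features) && anyDuplicated(grouped) == 0

length(group)  # number of groups
```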
Positive integer. The maximum number of samples to use in the Monte Carlo integration for every conditional expectation. See also details.
Positive integer (or NULL). Specifies how many batches the total number of feature combinations should be split into when calculating the contribution function for each test observation. The default value is NULL, which uses a reasonable trade-off between RAM allocation and computation speed that depends on approach and n_combinations. For models with many features, increasing the number of batches reduces the RAM allocation significantly. This typically comes with a small increase in computation time.
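The idea of splitting combinations into batches can be sketched in base R as follows. This only illustrates the concept with assumed sizes; it does not reproduce how shapr assigns combinations to batches or how it picks the default.

```r
n_combinations <- 16  # illustrative number of feature combinations
n_batches <- 4        # illustrative number of batches

# Assign each combination index to a batch, keeping batches contiguous.
batch_id <- sort(rep(seq_len(n_batches), length.out = n_combinations))
batches <- split(seq_len(n_combinations), batch_id)

# Each batch is processed separately, so only one batch needs to be
# held in memory at a time.
lengths(batches)
```

More batches mean fewer combinations per batch and therefore less memory in use at once, at the cost of some per-batch overhead.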
Positive integer. Specifies the seed that is set before any code involving randomness is run. If NULL, the seed is inherited from the calling environment.
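The effect of fixing a seed can be illustrated with base R alone. The sketch below uses set.seed() and rnorm() as stand-ins for any randomness-based step; it makes no claim about where shapr sets the seed internally.

```r
seed <- 1  # illustrative seed value

# Setting the same seed before each draw reproduces the same numbers.
set.seed(seed)
first_draw <- rnorm(3)

set.seed(seed)
second_draw <- rnorm(3)

identical(first_draw, second_draw)
```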
Logical. Indicates whether the samples used in the Monte Carlo estimation of v_S should be returned (in internal$output).
List. The output from get_model_specs() or get_data_specs(). Contains the 3 elements:
Character vector with the names of each feature.
Character vector with the class of each feature.
Character vector with the levels for any categorical features.
Logical. If TRUE (default), the function weights the combinations uniformly when computing the MSEv criterion. If FALSE, the function uses the Shapley kernel weights to weight the combinations when computing the MSEv criterion. Note that the Shapley kernel weights are replaced by the sampling frequency when not all combinations are considered.
Character. Either "normal" or "forecast", depending on which function setup() is called from and, correspondingly, the type of explanation that should be generated.
Numeric. The forecast horizon to explain. Passed to the predict_model function.
Matrix, data.frame/data.table or a numeric vector. Contains the endogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained.
Matrix, data.frame/data.table or a numeric vector. Contains the exogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained. As exogenous variables are used contemporaneously when producing a forecast, this item should contain nrow(y) + horizon rows.
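The row requirement for xreg can be checked before calling the function. The sketch below builds toy y and xreg matrices with assumed sizes and column names and verifies that xreg has nrow(y) + horizon rows.

```r
horizon <- 3  # illustrative forecast horizon
n_obs <- 20   # illustrative number of observed time points

# Toy endogenous and exogenous series (random values, for shape only).
y <- matrix(rnorm(n_obs), ncol = 1, dimnames = list(NULL, "y1"))
xreg <- matrix(rnorm(n_obs + horizon), ncol = 1, dimnames = list(NULL, "x1"))

# xreg must cover the forecast period as well as the observed period.
rows_ok <- nrow(xreg) == nrow(y) + horizon
rows_ok
```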
Numeric vector. The row indices in data and reg denoting points in time to use when estimating the conditional expectations in the Shapley value formula. If train_idx = NULL (default), all indices not selected to be explained will be used.
Numeric vector. The row indices in data and reg denoting points in time to explain.
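The default relationship between train_idx and explain_idx can be sketched as a set difference. This is a simplified illustration with assumed sizes; the actual default in shapr may additionally account for the rows needed to build lags.

```r
n_obs <- 10              # illustrative number of time points
explain_idx <- c(9, 10)  # time points to explain
train_idx <- NULL        # not supplied by the user

# Simplified default: use every index that is not being explained.
if (is.null(train_idx)) {
  train_idx <- setdiff(seq_len(n_obs), explain_idx)
}
train_idx
```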
Numeric vector. Denotes the number of lags that should be used for each variable in y when making a forecast.
Numeric vector. If xreg != NULL, denotes the number of lags that should be used for each variable in xreg when making a forecast.
Logical. If TRUE, all lags of each variable are grouped together and explained as a group. If FALSE, all lags of each variable are explained individually.
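What grouping lags by variable means can be illustrated with a small base R sketch. The lagged feature names and the "variable.lag" naming pattern are hypothetical and chosen only for this example; shapr's internal naming may differ.

```r
# Hypothetical lagged feature names following a "variable.lag" pattern.
lagged_features <- c("Temp.1", "Temp.2", "Wind.1", "Wind.2")

# Strip the lag suffix to recover the underlying variable name.
variable <- sub("\\.[0-9]+$", "", lagged_features)

# One group per variable, each containing all of that variable's lags.
lag_groups <- split(lagged_features, variable)
lag_groups
```

With grouping, each variable receives a single Shapley value covering all of its lags. Without grouping, each of the four lagged features above would be explained separately.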
Logical. Whether the timing of the different parts of explain() should be saved in the model object.
An integer specifying the level of verbosity. If 0, shapr will stay silent. If 1, it will print information about performance. If 2, some additional information will be printed out. Use 0 (default) for no verbosity, 1 for low verbosity, and 2 for high verbosity.
TODO: Make this clearer when we end up fixing this and if they should force a progressr bar.
Logical. Indicates whether the function is called from the Python wrapper. Default is FALSE, which is never changed when calling the function via explain() in R. The parameter is later used to disallow running the AICc versions of the empirical approach, as they require data-based optimization.
Further arguments passed to specific approaches.