Explain a Forecast from Time Series Models with Dependence-Aware (Conditional/Observational) Shapley Values
Source:R/explain_forecast.R
explain_forecast.RdComputes dependence-aware Shapley values for observations in explain_idx from the specified
model by using the method specified in approach to estimate the conditional expectation.
See
Aas, et. al (2021)
for a thorough introduction to dependence-aware prediction explanation with Shapley values.
For an overview of the methodology and capabilities of the shapr package, see the software paper
Jullum et al. (2025), or the pkgdown site at
norskregnesentral.github.io/shapr/.
Usage
explain_forecast(
model,
y,
xreg = NULL,
train_idx = NULL,
explain_idx,
explain_y_lags,
explain_xreg_lags = explain_y_lags,
horizon,
approach,
phi0,
max_n_coalitions = NULL,
iterative = NULL,
group_lags = TRUE,
group = NULL,
n_MC_samples = 1000,
seed = NULL,
predict_model = NULL,
get_model_specs = NULL,
verbose = "basic",
extra_computation_args = list(),
iterative_args = list(),
output_args = list(),
...
)Arguments
- model
Model object. The model whose predictions you want to explain. Run
get_supported_models()for a table of which modelsexplainsupports natively. Unsupported models can still be explained by passingpredict_modeland (optionally)get_model_specs, see details for more information.- y
Matrix, data.frame/data.table or a numeric vector. Contains the endogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained.
- xreg
Matrix, data.frame/data.table or a numeric vector. Contains the exogenous variables used to estimate the (conditional) distributions needed to properly estimate the conditional expectations in the Shapley formula including the observations to be explained. As exogenous variables are used contemporaneously when producing a forecast, this item should contain nrow(y) + horizon rows.
- train_idx
Numeric vector. The row indices in data and reg denoting points in time to use when estimating the conditional expectations in the Shapley value formula. If
train_idx = NULL(default) all indices not selected to be explained will be used.- explain_idx
Numeric vector. The row indices in data and reg denoting points in time to explain.
- explain_y_lags
Numeric vector. Denotes the number of lags that should be used for each variable in
ywhen making a forecast.- explain_xreg_lags
Numeric vector. If
xreg != NULL, denotes the number of lags that should be used for each variable inxregwhen making a forecast.- horizon
Numeric. The forecast horizon to explain. Passed to the
predict_modelfunction.- approach
Character vector of length
1or one less than the number of features. All elements should either be"gaussian","copula","empirical","ctree","vaeac","categorical","timeseries","independence","regression_separate", or"regression_surrogate". The two regression approaches cannot be combined with any other approach. See details for more information.- phi0
Numeric. The prediction value for unseen data, i.e., an estimate of the expected prediction without conditioning on any features. Typically set this equal to the mean of the response in the training data, but alternatives such as the mean of the training predictions are also reasonable.
- max_n_coalitions
Integer. Upper limit on the number of unique feature/group coalitions to use in the iterative procedure (if
iterative = TRUE). Ifiterative = FALSE, it represents the number of feature/group coalitions to use directly. The quantity refers to the number of unique feature coalitions ifgroup = NULL, and group coalitions ifgroup != NULL.max_n_coalitions = NULLcorresponds to2^n_features.- iterative
Logical or NULL. If
NULL(default), set toTRUEif there are more than 5 features/groups, andFALSEotherwise. IfTRUE, Shapley values are estimated iteratively for faster, sufficiently accurate results. First an initial number of coalitions is sampled, then bootstrapping estimates the variance of the Shapley values. A convergence criterion determines if the variances are sufficiently small. If not, additional samples are added. The process repeats until the variances are below the threshold. Specifics for the iterative process and convergence criterion are set viaiterative_args.- group_lags
Logical. If
TRUEall lags of each variable are grouped together and explained as a group. IfFALSEall lags of each variable are explained individually.- group
List. If
NULL, regular feature-wise Shapley values are computed. If provided, group-wise Shapley values are computed.groupthen has length equal to the number of groups. Each list element contains the character vectors with the features included in the corresponding group. See Jullum et al. (2021) for more information on group-wise Shapley values.- n_MC_samples
Positive integer. For most approaches, it indicates the maximum number of samples to use in the Monte Carlo integration of every conditional expectation. For
approach="ctree",n_MC_samplescorresponds to the number of samples from the leaf node (see an exception related to thectree.sampleargument insetup_approach.ctree()). Forapproach="empirical",n_MC_samplesis the \(K\) parameter in equations (14-15) of Aas et al. (2021), i.e. the maximum number of observations (with largest weights) that is used, see also theempirical.etaargumentsetup_approach.empirical().- seed
Positive integer. Specifies the seed before any code involving randomness is run. If
NULL(default), no seed is set in the calling environment.- predict_model
Function. Prediction function to use when
modelis not natively supported. (Runget_supported_models()for a list of natively supported models.) The function must have two arguments,modelandnewdata, which specify the model and a data.frame/data.table to compute predictions for, respectively. The function must give the prediction as a numeric vector.NULL(the default) uses functions specified internally. Can also be used to override the default function for natively supported model classes.- get_model_specs
Function. An optional function for checking model/data consistency when
modelis not natively supported. (Runget_supported_models()for a list of natively supported models.) The function takesmodelas an argument and provides a list with 3 elements:- labels
Character vector with the names of each feature.
- classes
Character vector with the class of each feature.
- factor_levels
Character vector with the levels for any categorical features.
If
NULL(the default), internal functions are used for natively supported model classes, and checking is disabled for unsupported model classes. Can also be used to override the default function for natively supported model classes.- verbose
String vector or NULL. Controls verbosity (printout detail level) via one or more of
"basic","progress","convergence","shapley"and"vS_details"."basic"(default) displays basic information about the computation and messages about parameters/checks."progress"displays where in the calculation process the function currently is."convergence"displays how close the Shapley value estimates are to convergence (only wheniterative = TRUE)."shapley"displays intermediate Shapley value estimates and standard deviations (only wheniterative = TRUE), and the final estimates."vS_details"displays information about the v(S) estimates, most relevant forapproach %in% c("regression_separate", "regression_surrogate", "vaeac").NULLmeans no printout. Any combination can be used, e.g.,verbose = c("basic", "vS_details").- extra_computation_args
Named list. Specifies extra arguments related to the computation of the Shapley values. See
get_extra_comp_args_default()for description of the arguments and their default values.- iterative_args
Named list. Specifies the arguments for the iterative procedure. See
get_iterative_args_default()for description of the arguments and their default values.- output_args
Named list. Specifies certain arguments related to the output of the function. See
get_output_args_default()for description of the arguments and their default values.- ...
Arguments passed on to
setup_approach.categorical,setup_approach.copula,setup_approach.ctree,setup_approach.empirical,setup_approach.gaussian,setup_approach.independence,setup_approach.timeseries,setup_approach.vaeaccategorical.joint_prob_dtData.table. (Optional) Containing the joint probability distribution for each combination of feature values.
NULLmeans it is estimated from thex_trainandx_explain.categorical.epsilonNumeric value. (Optional) If
categorical.joint_prob_dtis not supplied, probabilities/frequencies are estimated usingx_train. If certain observations occur inx_explainand NOT inx_train, then epsilon is used as the proportion of times that these observations occur in the training data. In theory, this proportion should be zero, but this causes an error later in the Shapley computation.internalList. Not used directly, but passed through from
explain().ctree.mincriterionNumeric scalar or vector. Either a scalar or vector of length equal to the number of features in the model. The value is equal to 1 - \(\alpha\) where \(\alpha\) is the nominal level of the conditional independence tests. If it is a vector, this indicates which value to use when conditioning on various numbers of features. The default value is 0.95.
ctree.minsplitNumeric scalar. Determines the minimum value that the sum of the left and right daughter nodes must reach for a split. The default value is 20.
ctree.minbucketNumeric scalar. Determines the minimum sum of weights in a terminal node required for a split. The default value is 7.
ctree.sampleBoolean. If
TRUE(default), then the method always samplesn_MC_samplesobservations from the leaf nodes (with replacement). IfFALSEand the number of observations in the leaf node is less thann_MC_samples, the method will take all observations in the leaf. IfFALSEand the number of observations in the leaf node is more thann_MC_samples, the method will samplen_MC_samplesobservations (with replacement). This means that there will always be sampling in the leaf unlesssample = FALSEand the number of obs in the node is less thann_MC_samples.empirical.typeCharacter. (default =
"fixed_sigma") Must be one of"independence","fixed_sigma","AICc_each_k", or"AICc_full". Note:"empirical.type = independence"is deprecated; useapproach = "independence"instead."fixed_sigma"uses a fixed bandwidth (set throughempirical.fixed_sigma) in the kernel density estimation."AICc_each_k"and"AICc_full"optimize the bandwidth using the AICc criterion, with respectively one bandwidth per coalition size and one bandwidth for all coalition sizes.empirical.etaNumeric scalar. Needs to be
0 < eta <= 1. The default value is 0.95. Represents the minimum proportion of the total empirical weight that data samples should use. For example, ifeta = .8, we choose theKsamples with the largest weights so that the sum of the weights accounts for 80\etais the \(\eta\) parameter in equation (15) of Aas et al. (2021).empirical.fixed_sigmaPositive numeric scalar. The default value is 0.1. Represents the kernel bandwidth in the distance computation used when conditioning on all different coalitions. Only used when
empirical.type = "fixed_sigma"empirical.n_samples_aiccPositive integer. Number of samples to consider in AICc optimization. The default value is 1000. Only used when
empirical.typeis either"AICc_each_k"or"AICc_full".empirical.eval_max_aiccPositive integer. Maximum number of iterations when optimizing the AICc. The default value is 20. Only used when
empirical.typeis either"AICc_each_k"or"AICc_full".empirical.start_aiccNumeric. Start value of the
sigmaparameter when optimizing the AICc. The default value is 0.1. Only used whenempirical.typeis either"AICc_each_k"or"AICc_full".empirical.cov_matNumeric matrix. (Optional) The covariance matrix of the data generating distribution used to define the Mahalanobis distance.
NULLmeans it is estimated fromx_train.gaussian.muNumeric vector. (Optional) Containing the mean of the data generating distribution.
NULLmeans it is estimated from thex_train.gaussian.cov_matNumeric matrix. (Optional) Containing the covariance matrix of the data generating distribution.
NULLmeans it is estimated from thex_train.timeseries.fixed_sigmaPositive numeric scalar. Represents the kernel bandwidth in the distance computation. The default value is 2.
timeseries.boundsNumeric vector of length two. Specifies the lower and upper bounds of the timeseries. The default is
c(NULL, NULL), i.e. no bounds. If one or both of these bounds are notNULL, we restrict the sampled time series to be between these bounds. This is useful if the underlying time series are scaled between 0 and 1, for example.vaeac.depthPositive integer (default is
3). The number of hidden layers in the neural networks of the masked encoder, full encoder, and decoder.vaeac.widthPositive integer (default is
32). The number of neurons in each hidden layer in the neural networks of the masked encoder, full encoder, and decoder.vaeac.latent_dimPositive integer (default is
8). The number of dimensions in the latent space.vaeac.lrPositive numeric (default is
0.001). The learning rate used in thetorch::optim_adam()optimizer.vaeac.activation_functionAn
torch::nn_module()representing an activation function such as, e.g.,torch::nn_relu()(default),torch::nn_leaky_relu(),torch::nn_selu(), ortorch::nn_sigmoid().vaeac.n_vaeacs_initializePositive integer (default is
4). The number of different vaeac models to initiate in the start. Pick the best performing one aftervaeac.extra_parameters$epochs_initiation_phaseepochs (default is2) and continue training that one.vaeac.epochsPositive integer (default is
100). The number of epochs to train the final vaeac model. This includesvaeac.extra_parameters$epochs_initiation_phase, where the default is2.vaeac.extra_parametersNamed list with extra parameters to the
vaeacapproach. Seevaeac_get_extra_para_default()for description of possible additional parameters and their default values.
Value
Object of class c("shapr", "list"). Contains the following items:
shapley_values_estdata.table with the estimated Shapley values with explained observation in the rows and features along the columns. The column
noneis the prediction not devoted to any of the features (given by the argumentphi0)shapley_values_sddata.table with the standard deviation of the Shapley values reflecting the uncertainty in the coalition sampling part of the kernelSHAP procedure. These are, by definition, 0 when all coalitions are used. Only present when
extra_computation_args$compute_sd=TRUE, which is the default wheniterative = TRUE.internalList with the different parameters, data, functions and other output used internally.
pred_explainNumeric vector with the predictions for the explained observations.
MSEvList with the values of the MSEv evaluation criterion for the approach. See the MSEv evaluation section in the general usage vignette for details.
timingList containing timing information for the different parts of the computation.
summarycontains the time stamps for the start and end time in addition to the total execution time.overall_timing_secsgives the time spent on different parts of the explanation computation.main_computation_timing_secsfurther decomposes the main computation time into different parts of the computation for each iteration of the iterative estimation routine, if used.
Details
This function explains a forecast of length horizon. The argument train_idx
is analogous to x_train in explain(), however, it just contains the time indices of where
in the data the forecast should start for each training sample. In the same way explain_idx
defines the time index (indices) which will precede a forecast to be explained.
As any autoregressive forecast model will require a set of lags to make a forecast at an
arbitrary point in time, explain_y_lags and explain_xreg_lags define how many lags
are required to "refit" the model at any given time index. This allows the different
approaches to work in the same way they do for time-invariant models.
See the forecasting section of the general usage vignette for further details. See also the software paper Jullum et al. (2025, Sec. 6) for a more detailed introduction to the methodology, and additional examples.
Examples
# \donttest{
# Load example data
data("airquality")
data <- data.table::as.data.table(airquality)
# Fit an AR(2) model.
model_ar_temp <- ar(data$Temp, order = 2)
# Calculate the zero prediction values for a three step forecast.
p0_ar <- rep(mean(data$Temp), 3)
# Empirical approach, explaining forecasts starting at T = 152 and T = 153.
explain_forecast(
model = model_ar_temp,
y = data[, "Temp"],
train_idx = 2:151,
explain_idx = 152:153,
explain_y_lags = 2,
horizon = 3,
approach = "empirical",
phi0 = p0_ar,
group_lags = FALSE
)
#>
#> ── Starting `shapr::explain_forecast()` at 2025-10-24 16:07:21 ─────────────────
#> ℹ Feature names extracted from the model contain `NA`.
#> Consistency checks between model and data are therefore disabled.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 4`, and is
#> therefore set to `2^n_features = 4`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <ar>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: empirical
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 2
#> • Number of observations to explain: 2
#> • Computations (temporary) saved at: /tmp/RtmpqTpcyz/shapr_obj_73d67003190c.rds
#>
#> ── Main computation started ──
#>
#> ℹ Using 4 of 4 coalitions.
#> explain_idx horizon none Temp.1 Temp.2
#> <int> <int> <num> <num> <num>
#> 1: 152 1 77.9 -0.397 -1.391
#> 2: 153 1 77.9 -6.618 -0.184
#> 3: 152 2 77.9 -0.329 -1.203
#> 4: 153 2 77.9 -6.021 -0.337
#> 5: 152 3 77.9 -0.291 -1.055
#> 6: 153 3 77.9 -5.212 -0.255
# }