pyshapr

pyshapr is a Python wrapper for the R package shapr, using the rpy2 Python library to access R from within Python.

Renamed: This package was previously published as shaprpy. It has been renamed to pyshapr. The old shaprpy package remains available on PyPI for a transition period and simply forwards to pyshapr. Please switch to pip install pyshapr and import pyshapr.

Note: This wrapper is not as comprehensively tested as the R package. rpy2 has limited support on Windows, and the same therefore applies to pyshapr. pyshapr has only been tested on Linux (including WSL, the Windows Subsystem for Linux), and the instructions below assume a Linux environment.

Requirement: Python 3.11 or later is required to use pyshapr.

Changelog

For a list of changes and updates to the pyshapr package, see the pyshapr CHANGELOG.

Installation

These instructions assume you already have pip and R installed and available to the Python environment in which you want to run pyshapr.

Official instructions for installing pip can be found here.
Official instructions for installing R can be found here.

On Debian/Ubuntu-based systems, R can also be installed via:

sudo apt update
sudo apt install r-base r-base-dev -y

1. Install the R package `shapr`

pyshapr requires the R package shapr (version 1.0.5 or newer). Install the latest version from CRAN by running the following in a terminal:

Rscript -e 'install.packages("shapr", repos="https://cran.rstudio.com")'
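
To confirm from Python that the installed shapr meets the 1.0.5 minimum, a check along the following lines could be used. This is only a sketch: it assumes Rscript is on the PATH, and the helper names (parse_r_version, meets_minimum, installed_shapr_version) are illustrative and not part of pyshapr.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_SHAPR = "1.0.5"  # minimum shapr version stated in this README


def parse_r_version(version: str) -> Tuple[int, ...]:
    """Turn an R version string such as '1.0.5' or '1.0-5' into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def meets_minimum(installed: str, minimum: str = MIN_SHAPR) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_r_version(installed) >= parse_r_version(minimum)


def installed_shapr_version() -> Optional[str]:
    """Ask R for the installed shapr version; None if R or shapr is unavailable."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", 'cat(as.character(packageVersion("shapr")))'],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # R reported an error, e.g. shapr is not installed
    return result.stdout.strip() or None


version = installed_shapr_version()
if version is None:
    print("Could not determine the shapr version (is R installed, with shapr?)")
else:
    status = "OK" if meets_minimum(version) else "too old"
    print(f"shapr {version}: {status}")
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "1.0.10" as older than "1.0.5".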

2. Ensure R is discoverable (R_HOME and PATH)

Sometimes rpy2 (which pyshapr relies on) cannot automatically locate your R installation. To ensure proper detection, verify that:

R is available in your system PATH, or
The R_HOME environment variable is set to your R installation directory.

Example:

export R_HOME="$(R RHOME)"
export PATH="$PATH:$R_HOME/bin"
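
The same two conditions can be checked from Python before importing pyshapr. The sketch below is a hypothetical helper (find_r_home is not part of pyshapr or rpy2) that simply mirrors the two options listed above: it looks at R_HOME first and otherwise asks the R executable on the PATH.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_r_home() -> Optional[str]:
    """Return the R home directory, or None if R cannot be located.

    R_HOME is used when set; otherwise fall back to `R RHOME`, which
    only works when the R executable is on the PATH.
    """
    r_home = os.environ.get("R_HOME")
    if r_home:
        return r_home
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    result = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


r_home = find_r_home()
if r_home:
    print(f"R home: {r_home}")
else:
    print("R not found: set R_HOME or add R to your PATH")
```

If this prints the "not found" message, setting the environment variables as in the example above is the first thing to try.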

3. Install the Python wrapper

Install directly from PyPI with:

pip install pyshapr

Local development install (for contributors)

If you have cloned the repository and want to install in development mode for local changes, navigate to the ./python directory and run:

pip install -e .

The -e flag installs in editable mode, allowing local code changes to be reflected immediately.

Quick Demo

from sklearn.ensemble import RandomForestRegressor
from pyshapr import explain
from pyshapr.datasets import load_california_housing

# Load example data
dfx_train, dfx_explain, dfy_train, dfy_explain = load_california_housing()

# Fit a model
model = RandomForestRegressor()
model.fit(dfx_train, dfy_train.values.flatten())

# Explain predictions
explanation = explain(
    model=model,
    x_train=dfx_train,
    x_explain=dfx_explain,
    approach="gaussian",
    phi0=dfy_train.mean().item(),
    seed=1
)

explanation.print() # Print the Shapley values

# Get a summary object with computation details
summary = explanation.summary()
print(summary)  # Displays a formatted summary (also available directly via explanation.summary())

# Access specific summary attributes (available with tab-completion in Jupyter)
summary['approach']     # Approach used
summary['timing_summary']['total_time_secs']  # Total computation time

# Extract one or more specific result objects directly
explanation.get_results("proglang") # Programming language used (Python/R)
explanation.get_results("approach") # Approach used
explanation.get_results().keys()  # All available result objects

# Plotting (requires the 'shap' library)
# Convert to a SHAP Explanation object
shap_exp = explanation.to_shap()

import shap
shap.plots.waterfall(shap_exp[0]) # Plot the first observation

Supported Models

pyshapr can explain predictions from models built with:

scikit-learn
keras (Sequential API)
xgboost

For other model types, you can pass the following to pyshapr.explain:

A custom predict_model function
(Optionally) a custom get_model_specs function
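
As a rough sketch of what a custom predict_model might look like, the block below wraps a toy model. It assumes the function is called with the model and the feature data and should return one prediction per row as a one-dimensional array; check the pyshapr.explain docstring for the exact contract. TinyLinearModel and its numbers are made up for illustration, and the explain call is shown only as a comment because it needs a working R setup.

```python
import numpy as np


class TinyLinearModel:
    """Hypothetical stand-in for a model type pyshapr does not support natively."""

    def __init__(self, weights, bias=0.0):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def forward(self, features):
        # Plain linear prediction: one value per row of `features`
        return features @ self.weights + self.bias


def predict_model(model, x):
    """Return one prediction per row of x as a 1-D NumPy array.

    Assumption: x arrives as a pandas DataFrame or another array-like.
    """
    features = np.asarray(x, dtype=float)
    return np.asarray(model.forward(features), dtype=float).ravel()


model = TinyLinearModel(weights=[2.0, -1.0], bias=0.5)
preds = predict_model(model, [[1.0, 1.0], [0.0, 3.0]])
print(preds)

# With R and shapr available, the function would then be passed along, e.g.:
# explanation = pyshapr.explain(model=model, x_train=x_train, x_explain=x_explain,
#                               approach="gaussian", phi0=phi0,
#                               predict_model=predict_model)
```

Calling the function on a few rows by hand, as done here, is a cheap way to confirm the output shape before starting a longer explanation run.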

Supported Approaches

pyshapr forwards all approach-specific arguments to shapr::explain(). Commonly used approaches include:

"arf", "categorical", "copula", "ctree", "empirical", "gaussian", "regression_separate", "regression_surrogate", "timeseries", "vaeac"
"independence" (not recommended)

"arf", "ctree", "regression_separate", "regression_surrogate" and "vaeac" support mixed numerical/categorical feature sets, "categorical" supports categorical features only, while "copula", "empirical", "gaussian", "timeseries" and "independence" support numerical features only.

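The feature-type support listed above can be written down as a small lookup, which may help when choosing an approach programmatically. This is only an illustrative sketch: compatible_approaches is a hypothetical helper, not a pyshapr function, and it assumes that approaches supporting mixed feature sets also accept purely numerical or purely categorical data.

```python
# Feature types each approach supports, transcribed from the paragraph above.
MIXED = {"arf", "ctree", "regression_separate", "regression_surrogate", "vaeac"}
CATEGORICAL_ONLY = {"categorical"}
NUMERICAL_ONLY = {"copula", "empirical", "gaussian", "timeseries", "independence"}


def compatible_approaches(has_numerical: bool, has_categorical: bool) -> list:
    """List the approaches usable for the given mix of feature types."""
    if not (has_numerical or has_categorical):
        raise ValueError("at least one feature type is required")
    # Assumption: mixed-capable approaches handle any combination of types
    candidates = set(MIXED)
    if has_numerical and not has_categorical:
        candidates |= NUMERICAL_ONLY
    if has_categorical and not has_numerical:
        candidates |= CATEGORICAL_ONLY
    return sorted(candidates)


print(compatible_approaches(has_numerical=True, has_categorical=True))
```

With a pandas DataFrame, the two flags could be derived from the column dtypes before calling the helper.
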
SAGE Values

pyshapr can also compute SAGE (Shapley Additive Global importancE) values, which explain the model’s global loss rather than individual predictions. Pass scope="global" together with the observed responses y_explain. By default the loss is log-loss for binary 0/1 responses and MSE otherwise; a custom Python loss can be supplied via extra_computation_args={"global_loss_func": my_loss}, where my_loss(y, pred) returns a single number. The per-observation Shapley values computed alongside the SAGE values are available through explanation.get_shap_values_est().

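import pyshapr

# model, x_train, x_explain, y_train and y_explain are assumed to be defined already
explanation = pyshapr.explain(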
    model=model,
    x_train=x_train,
    x_explain=x_explain,
    approach="gaussian",
    phi0=y_train.mean().item(),
    scope="global",
    y_explain=y_explain,
)
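
A custom loss of the form my_loss(y, pred) could look like the sketch below, which uses mean absolute error as an example. The function name and the sample numbers are illustrative; the only requirement taken from the description above is that the function returns a single number.

```python
import numpy as np


def mean_absolute_error(y, pred):
    """Custom loss: average absolute difference, returned as a plain float."""
    y = np.asarray(y, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    return float(np.mean(np.abs(y - pred)))


# Quick check on made-up numbers before handing the function to explain()
print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 4.0, 2.0]))

# Supplied as described above:
# extra_computation_args={"global_loss_func": mean_absolute_error}
```

Flattening both inputs with ravel() keeps the function working whether the responses arrive as a list, a NumPy array, or a single-column DataFrame.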

Examples

See the examples folder on GitHub for runnable examples, including:

Basic usage with scikit-learn models
Usage with xgboost models
Usage with keras models
A custom PyTorch model
Usage of the Shapr class and its associated ShaprSummary class for exploring and extracting explanation results
Plotting functionality for the Shapley values through the shap package
ARF and VAEAC examples for both numerical and mixed categorical feature sets
The regression paradigm described in Olsen et al. (2024), which shows:
- How to specify the regression model
- How to enable automatic cross-validation of hyperparameters
- How to apply pre-processing steps before fitting regression models