pyshapr
pyshapr is a Python wrapper for the R package shapr, using the rpy2 Python library to access R from within Python.
Renamed: This package was previously published as shaprpy. It has been renamed to pyshapr. The old shaprpy package remains available on PyPI for a transition period and simply forwards to pyshapr. Please switch to pip install pyshapr and import pyshapr.
Note: This wrapper is not as comprehensively tested as the R package.
rpy2 has limited support on Windows, and the same therefore applies to pyshapr. pyshapr has only been tested on Linux (and WSL - Windows Subsystem for Linux), and the instructions below assume a Linux environment.
Requirement: Python 3.11 or later is required to use pyshapr.
Installation
These instructions assume you already have pip and R installed and available to the Python environment in which you want to run pyshapr.
- Official instructions for installing pip can be found here.
- Official instructions for installing R can be found here.
On Debian/Ubuntu-based systems, R can also be installed via:
1. Install the R package shapr
pyshapr requires the R package shapr (version 1.0.5 or newer). In your R environment, install the latest version from CRAN using install.packages("shapr").
2. Ensure R is discoverable (R_HOME and PATH)
Sometimes rpy2 (which pyshapr relies on) cannot automatically locate your R installation. To ensure proper detection, verify that:
- R is available in your system PATH, or
- The R_HOME environment variable is set to your R installation directory.
Example:
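Before importing pyshapr, it can help to confirm from Python which of the two conditions above holds. The following is a minimal sketch using only the standard library; the helper name find_r_installation is hypothetical, and the lookup order (R_HOME first, then PATH) is an assumption about how R is typically located rather than a description of rpy2's internals.

```python
import os
import shutil


def find_r_installation(env=None):
    """Report how R could be located: via R_HOME, via PATH, or not at all.

    Hypothetical helper for illustration; it does not call rpy2 or R.
    """
    env = os.environ if env is None else env

    # Condition 1: the R_HOME environment variable is set.
    r_home = env.get("R_HOME")
    if r_home:
        return {"source": "R_HOME", "location": r_home}

    # Condition 2: an executable named "R" is found on PATH.
    r_binary = shutil.which("R", path=env.get("PATH"))
    if r_binary:
        return {"source": "PATH", "location": r_binary}

    # Neither condition holds, so rpy2 may fail to find R.
    return {"source": None, "location": None}


if __name__ == "__main__":
    print(find_r_installation())
```

If the reported source is None, set R_HOME or add the directory containing the R executable to PATH before starting Python.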
Quick Demo
from sklearn.ensemble import RandomForestRegressor
from pyshapr import explain
from pyshapr.datasets import load_california_housing
# Load example data
dfx_train, dfx_explain, dfy_train, dfy_explain = load_california_housing()
# Fit a model
model = RandomForestRegressor()
model.fit(dfx_train, dfy_train.values.flatten())
# Explain predictions
explanation = explain(
model=model,
x_train=dfx_train,
x_explain=dfx_explain,
approach="gaussian",
phi0=dfy_train.mean().item(),
seed=1
)
explanation.print() # Print the Shapley values
# Get a summary object with computation details
summary = explanation.summary()
print(summary) # Displays a formatted summary (also available directly via explanation.summary())
# Access specific summary attributes (available with tab-completion in Jupyter)
summary['approach'] # Approach used
summary['timing_summary']['total_time_secs'] # Total computation time
# Extract one or more specific result objects directly
explanation.get_results("proglang") # Programming language used (Python/R)
explanation.get_results("approach") # Approach used
explanation.get_results().keys() # All available result objects
# Plotting (requires the 'shap' library)
# Convert to a SHAP Explanation object
shap_exp = explanation.to_shap()
import shap
shap.plots.waterfall(shap_exp[0]) # Plot the first observation
Supported Models
pyshapr can explain predictions from models built with:
- scikit-learn
- keras (Sequential API)
- xgboost
For other model types, you can supply the following to pyshapr.explain:
- A custom predict_model function
- (Optionally) a custom get_model_specs function
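As a rough illustration of what a custom prediction function might look like, the sketch below wraps a toy model that is not one of the supported types. The LinearRule class is hypothetical, and the two-argument signature (the model, then a pandas DataFrame of features, returning a one-dimensional NumPy array) is an assumption that should be checked against the pyshapr documentation.

```python
import numpy as np
import pandas as pd


class LinearRule:
    """Hypothetical stand-in for a model type pyshapr does not recognise."""

    def __init__(self, weights, intercept):
        self.weights = np.asarray(weights, dtype=float)
        self.intercept = float(intercept)

    def score(self, features):
        # features: 2-D array with one row per observation
        return features @ self.weights + self.intercept


def predict_model(model, x):
    """Assumed signature: (model, pandas.DataFrame) -> 1-D numpy array."""
    return np.asarray(model.score(x.to_numpy(dtype=float))).reshape(-1)


# Small self-check on two observations with features "a" and "b".
x_demo = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, -1.0]})
model = LinearRule(weights=[2.0, 4.0], intercept=1.0)
predictions = predict_model(model, x_demo)

# Assumed usage (keyword name taken from the list above, not verified here):
# explain(model=model, x_train=..., x_explain=..., approach="gaussian",
#         phi0=..., predict_model=predict_model)
```

Keeping the wrapper free of side effects and returning a flat array makes it easy to test in isolation before handing it to explain.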
Supported Approaches
pyshapr forwards all approach-specific arguments to shapr::explain(). Commonly used approaches include:
- "arf", "categorical", "copula", "ctree", "empirical", "gaussian", "regression_separate", "regression_surrogate", "vaeac"
- "independence" (not recommended)
"arf", "ctree", "regression_separate", "regression_surrogate" and "vaeac" support mixed numerical/categorical feature sets, "categorical" supports categorical features only, while "copula", "empirical", "gaussian" and "independence" support numerical features only.
Examples
See the examples folder on GitHub for runnable examples, including:
- Basic usage with scikit-learn models
- Usage with xgboost models
- Usage with keras models
- A custom PyTorch model
- Usage of the Shapr class and associated ShaprSummary class for exploration and extraction of explanation results
- Plotting functionality for the Shapley values through the shap package
- ARF and VAEAC examples for both numerical and mixed categorical feature sets
- The regression paradigm described in Olsen et al. (2024), which shows:
  - How to specify the regression model
  - How to enable automatic cross-validation of hyperparameters
  - How to apply pre-processing steps before fitting regression models