causallib.evaluation.plots.plots module

Copyright 2019 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on Aug 22, 2018

causallib.evaluation.plots.plots.calibration_curve(y_true, y_prob, bins=5)[source]

Compute the calibration curve of a classifier given its output scores and the true label assignment.

Parameters
  • y_true (pd.Series) – True binary label assignment.

  • y_prob (pd.Series) – Predicted probability of each sample being the positive label.

  • bins (int | list | np.ndarray | pd.Series) – If int, it defines the number of equal-width bins in the given range (5, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

Returns

empirical_prob, predicted_prob, bin_counts

empirical_prob: The fraction of positive labels in each bin.
predicted_prob: The average predicted probability in each bin.
bin_counts: The number of samples falling in each bin.

Return type

(pd.Series, pd.Series, pd.Series)

References

[1] Zadrozny, B., & Elkan, C. (2002, July). Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
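
A minimal usage sketch, based only on the signature documented above; the data is synthetic and for illustration only:

    import numpy as np
    import pandas as pd
    from causallib.evaluation.plots.plots import calibration_curve

    rng = np.random.default_rng(0)
    y_prob = pd.Series(rng.uniform(size=1000))       # predicted probability of the positive label
    y_true = pd.Series(rng.binomial(1, y_prob))      # labels drawn to roughly match those probabilities

    # Five equal-width bins (the default); a sequence of bin edges is also accepted:
    empirical_prob, predicted_prob, bin_counts = calibration_curve(y_true, y_prob, bins=5)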

causallib.evaluation.plots.plots.get_subplots(n_features, max_cols=5, fig_size=(16, 16), sharex=False, sharey=False)[source]

Initializes the grid of subplots and returns the figure and axes.

Parameters
  • n_features (int) – The total number of features to plot

  • max_cols (int) – The maximal number of subplots in each row

  • fig_size (tuple[int, int]) – Passed on to matplotlib

  • sharex (str | bool) – Passed on to matplotlib's plt.subplots

  • sharey (str | bool) – Passed on to matplotlib's plt.subplots

Returns

the figure and the array of axes

Return type

tuple[Figure, np.ndarray]
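
A small sketch of the intended use, assuming only the signature above (the exact shape of the returned axes array is not specified, so it is flattened before indexing):

    from causallib.evaluation.plots.plots import get_subplots

    # 12 features with at most 5 columns per row -> a 3-row by 5-column grid
    fig, axes = get_subplots(n_features=12, max_cols=5, fig_size=(16, 16), sharex=True)
    first_ax = axes.ravel()[0]   # axes is a numpy array of Axes; ravel() works for any grid shape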

causallib.evaluation.plots.plots.lookup_name(name: str) → Callable[source]

Lookup function for plot name.

Canonical plot names are defined in this file as globals. Incorrect names will raise KeyError.

Parameters

name (str) – plot name to lookup

Returns

plot function

Return type

Callable
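
For example (the plot name "calibration" below is an assumed illustration, not a verified canonical name; check the module-level globals for the actual names):

    from causallib.evaluation.plots.plots import lookup_name

    try:
        plot_fn = lookup_name("calibration")   # assumed name, for illustration only
    except KeyError:
        plot_fn = None                         # unrecognized names raise KeyError, as documented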

causallib.evaluation.plots.plots.plot_calibration(predictions, targets, n_bins=10, plot_se=True, plot_rug=False, plot_histogram=True, quantile=False, ax=None)[source]
causallib.evaluation.plots.plots.plot_calibration_folds(predictions, targets, cv, n_bins=10, plot_se=True, plot_rug=False, plot_histogram=False, quantile=False, ax=None)[source]

Plot calibration curves for multiple models (presumably in folds)

Parameters
  • predictions (list[pd.Series]) – List (one entry per fold) of probability (“score”) predictions.

  • targets (pd.Series) – True labels to calibrate against, for the overall data (not divided into folds).

  • cv (list[np.array]) – List (one entry per fold) of the row indices of the samples in that fold.

  • n_bins (int) – number of bins to evaluate in the plot

  • plot_se (bool) – Whether to plot standard errors around the mean bin-probability estimation.

  • plot_rug (bool) – Whether to add a rug plot of the predicted probabilities.

  • plot_histogram (bool) – Whether to plot a histogram of the predicted probabilities.

  • quantile (bool) – If True, the binning of the calibration curve is by quantiles. Default is False.

  • ax (plt.Axes) – Optional

Note

One of plot_propensity or plot_model must be True.

Returns:
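
A sketch with two synthetic folds (the data is fabricated for illustration, and aligning each fold's predictions to its cv indices is an assumption about the expected input):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from causallib.evaluation.plots.plots import plot_calibration_folds

    rng = np.random.default_rng(0)
    targets = pd.Series(rng.binomial(1, 0.5, size=200))    # true labels for the overall data
    cv = [np.arange(0, 100), np.arange(100, 200)]          # row indices of each fold
    predictions = [pd.Series(rng.uniform(size=len(fold)), index=fold) for fold in cv]

    fig, ax = plt.subplots()
    plot_calibration_folds(predictions, targets, cv, n_bins=10, plot_se=True, ax=ax)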

causallib.evaluation.plots.plots.plot_continuous_prediction_accuracy(predictions, y, a, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_continuous_prediction_accuracy_folds(predictions, y, a, cv, alpha_by_density=True, plot_residuals=False, ax=None)[source]
causallib.evaluation.plots.plots.plot_counterfactual_common_support(prediction, a, ax=None)[source]
causallib.evaluation.plots.plots.plot_counterfactual_common_support_folds(predictions, hue_by, cv, alpha_by_density=True, ax=None)[source]

Plot a scatter plot of y0 vs. y1 for multiple scoring results, colored by treatment assignment

Parameters
  • predictions (list[pd.Series]) – List, the size of number of folds, of outcome prediction values.

  • hue_by (pd.Series) – Group assignment (as in treatment assignment) of the entire dataset. (indices from cv will be used to slice this vector)

  • cv (list[np.array]) – List, the size of the number of folds, of row indices (as in iloc locations) - the indices of the samples participating in the fold.

  • alpha_by_density (bool) – Whether to calculate each point's alpha (transparency) value using density estimation. This can take some time to compute for a large number of points. If False, a simple, fast heuristic is used instead.

  • ax (plt.Axes) – The axes on which the plot will be displayed. Optional.

causallib.evaluation.plots.plots.plot_mean_features_imbalance_love_folds(table1_folds, cv=None, aggregate_folds=True, thresh=None, plot_semi_grid=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_mean_features_imbalance_scatter_plot(table1_folds, aggregate_folds=True, thresh=None, label_imbalanced=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_mean_features_imbalance_slope_folds(table1_folds, cv=None, thresh=None, label_imbalanced=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_precision_recall_curve_folds(curve_data, ax=None, plot_folds=False, label_folds=False, label_std=False, **kwargs)[source]
causallib.evaluation.plots.plots.plot_propensity_score_distribution(propensity, treatment, reflect=True, kde=False, cumulative=False, norm_hist=True, ax=None)[source]

Plot the distribution of propensity scores

Parameters
  • propensity (pd.Series) – Propensity score values.

  • treatment (pd.Series) – Treatment assignment.

  • reflect (bool) – Whether to plot the second treatment group mirrored on the opposite side of the x-axis. This only works when there are exactly two groups.

  • kde (bool) – Whether to plot kernel density estimation

  • cumulative (bool) – Whether to plot cumulative distribution.

  • norm_hist (bool) – If False, use raw counts on the y-axis. If kde=True, then norm_hist should be True as well.

  • ax (plt.Axes | None) –

Returns:
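
A usage sketch with synthetic data, assuming only the signature above:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from causallib.evaluation.plots.plots import plot_propensity_score_distribution

    rng = np.random.default_rng(0)
    treatment = pd.Series(rng.binomial(1, 0.4, size=500))     # two treatment groups
    propensity = pd.Series(
        np.clip(0.4 + 0.2 * treatment + rng.normal(0, 0.1, size=500), 0.01, 0.99)
    )

    fig, ax = plt.subplots()
    plot_propensity_score_distribution(propensity, treatment, reflect=True, kde=True, ax=ax)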

causallib.evaluation.plots.plots.plot_propensity_score_distribution_folds(predictions, hue_by, cv, reflect=True, kde=False, cumulative=False, norm_hist=True, ax=None)[source]
Parameters
  • predictions (list[pd.Series]) – List (one entry per fold) of propensity score predictions.

  • hue_by (pd.Series) – Group (treatment) assignment of the entire dataset (indices from cv will be used to slice this vector).

  • cv (list[np.array]) – List (one entry per fold) of the row indices of the samples in that fold.

  • reflect (bool) – Whether to plot the second treatment group mirrored on the opposite side of the x-axis. This only works when there are exactly two groups.

  • kde (bool) – Whether to plot kernel density estimation

  • cumulative (bool) – Whether to plot cumulative distribution.

  • norm_hist (bool) – If False, use raw counts on the y-axis. If kde=True, then norm_hist should be True as well.

  • ax (plt.Axes) –

Returns:

causallib.evaluation.plots.plots.plot_residual(predictions, y, a, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_residual_folds(predictions, y, a, cv, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_roc_curve_folds(curve_data, ax=None, plot_folds=False, label_folds=False, label_std=False, **kwargs)[source]
causallib.evaluation.plots.plots.slope_graph(left, right, thresh=None, label_imbalanced=True, color_below='C0', color_above='C1', marker='o', ax=None)[source]