causallib.evaluation.plots.plots module

Copyright 2019 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on Aug 22, 2018

causallib.evaluation.plots.plots.calibration_curve(y_true, y_prob, bins=5)[source]

Compute the calibration curve of a classifier given its output scores and the true label assignment.

Parameters
  • y_true (pd.Series) – True binary label assignment.

  • y_prob (pd.Series) – Predicted probability of each sample being the positive label.

  • bins (int | list | np.ndarray | pd.Series) – If int, it defines the number of equal-width bins in the given range (5, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

Returns

empirical_prob, predicted_prob, bin_counts

empirical_prob: The fraction of positive labels in each bin.
predicted_prob: The average predicted probability in each bin.
bin_counts: The number of samples falling in each bin.

Return type

(pd.Series, pd.Series, pd.Series)

References

[1] Zadrozny, B., & Elkan, C. (2002, July). Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
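
A minimal usage sketch, based only on the signature documented above; the data is synthetic and for illustration only:

    import numpy as np
    import pandas as pd
    from causallib.evaluation.plots.plots import calibration_curve

    rng = np.random.default_rng(0)
    y_prob = pd.Series(rng.uniform(size=1000))       # predicted probability of the positive label
    y_true = pd.Series(rng.binomial(1, y_prob))      # labels drawn to roughly match those probabilities

    # Five equal-width bins (the default); a sequence of bin edges is also accepted:
    empirical_prob, predicted_prob, bin_counts = calibration_curve(y_true, y_prob, bins=5)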

causallib.evaluation.plots.plots.get_subplots(n_features, max_cols=5, fig_size=(16, 16), sharex=False, sharey=False)[source]

Initializes the grid of subplots and returns the figure and axes.

Parameters
  • n_features (int) – The total number of features to plot

  • max_cols (int) – The maximal number of subplots in each row

  • fig_size (tuple[int, int]) – Passed on to matplotlib

  • sharex (str | bool) – Passed on to matplotlib's plt.subplots

  • sharey (str | bool) – Passed on to matplotlib's plt.subplots

Returns

the figure and the array of axes

Return type

tuple[Figure, np.ndarray]
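
A small sketch of the intended use, assuming only the signature above (the exact shape of the returned axes array is not specified, so it is flattened before indexing):

    from causallib.evaluation.plots.plots import get_subplots

    # 12 features with at most 5 columns per row -> a 3-row by 5-column grid
    fig, axes = get_subplots(n_features=12, max_cols=5, fig_size=(16, 16), sharex=True)
    first_ax = axes.ravel()[0]   # axes is a numpy array of Axes; ravel() works for any grid shape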

causallib.evaluation.plots.plots.lookup_name(name: str) → Callable[source]

Lookup function for plot name.

Canonical plot names are defined in this file as globals. Incorrect names will raise KeyError.

Parameters

name (str) – plot name to lookup

Returns

plot function

Return type

Callable
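
For example (the plot name "calibration" below is an assumed illustration, not a verified canonical name; check the module-level globals for the actual names):

    from causallib.evaluation.plots.plots import lookup_name

    try:
        plot_fn = lookup_name("calibration")   # assumed name, for illustration only
    except KeyError:
        plot_fn = None                         # unrecognized names raise KeyError, as documented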

causallib.evaluation.plots.plots.plot_calibration(predictions, targets, n_bins=10, plot_se=True, plot_rug=False, plot_histogram=True, quantile=False, ax=None)[source]
causallib.evaluation.plots.plots.plot_calibration_folds(predictions, targets, cv, n_bins=10, plot_se=True, plot_rug=False, plot_histogram=False, quantile=False, ax=None)[source]

Plot calibration curves for multiple models (presumably in folds)

Parameters
  • predictions (list[pd.Series]) – List (one entry per fold) of probability (“score”) predictions.

  • targets (pd.Series) – True labels to calibrate against, for the overall data (not divided into folds).

  • cv (list[np.array]) – List (one entry per fold) of the row indices of the samples in that fold.

  • n_bins (int) – number of bins to evaluate in the plot

  • plot_se (bool) – Whether to plot standard errors around the mean bin-probability estimation.

  • plot_rug (bool) – Whether to add a rug plot of the predicted probabilities.

  • plot_histogram (bool) – Whether to plot a histogram of the predicted probabilities.

  • quantile (bool) – If True, the binning of the calibration curve is by quantiles. Default is False.

  • ax (plt.Axes) – Optional

Note

One of plot_propensity or plot_model must be True.

Returns:
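
A sketch with two synthetic folds (the data is fabricated for illustration, and aligning each fold's predictions to its cv indices is an assumption about the expected input):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from causallib.evaluation.plots.plots import plot_calibration_folds

    rng = np.random.default_rng(0)
    targets = pd.Series(rng.binomial(1, 0.5, size=200))    # true labels for the overall data
    cv = [np.arange(0, 100), np.arange(100, 200)]          # row indices of each fold
    predictions = [pd.Series(rng.uniform(size=len(fold)), index=fold) for fold in cv]

    fig, ax = plt.subplots()
    plot_calibration_folds(predictions, targets, cv, n_bins=10, plot_se=True, ax=ax)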

causallib.evaluation.plots.plots.plot_continuous_prediction_accuracy(predictions, y, a, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_continuous_prediction_accuracy_folds(predictions, y, a, cv, alpha_by_density=True, plot_residuals=False, ax=None)[source]
causallib.evaluation.plots.plots.plot_counterfactual_common_support(prediction, a, ax=None)[source]
causallib.evaluation.plots.plots.plot_counterfactual_common_support_folds(predictions, hue_by, cv, alpha_by_density=True, ax=None)[source]

Plot a scatter plot of y0 vs. y1 for multiple scoring results, colored by treatment assignment

Parameters
  • predictions (list[pd.Series]) – List, the size of number of folds, of outcome prediction values.

  • hue_by (pd.Series) – Group assignment (as in treatment assignment) of the entire dataset. (indices from cv will be used to slice this vector)

  • cv (list[np.array]) – List, the size of the number of folds, of row indices (as in iloc locations) - the indices of the samples participating in the fold.

  • alpha_by_density (bool) – Whether to calculate each point's alpha (transparency) value using density estimation. This can take some time to compute for a large number of points. If False, a simple, fast heuristic is used instead.

  • ax (plt.Axes) – The axes on which the plot will be displayed. Optional.

causallib.evaluation.plots.plots.plot_mean_features_imbalance_love_folds(table1_folds, cv=None, aggregate_folds=True, thresh=None, plot_semi_grid=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_mean_features_imbalance_scatter_plot(table1_folds, aggregate_folds=True, thresh=None, label_imbalanced=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_mean_features_imbalance_slope_folds(table1_folds, cv=None, thresh=None, label_imbalanced=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_precision_recall_curve_folds(curve_data, ax=None, plot_folds=False, label_folds=False, label_std=False, **kwargs)[source]
causallib.evaluation.plots.plots.plot_propensity_score_distribution(propensity, treatment, reflect=True, kde=False, cumulative=False, norm_hist=True, ax=None)[source]

Plot the distribution of propensity scores

Parameters
  • propensity (pd.Series) – Propensity score values.

  • treatment (pd.Series) – Treatment assignment.

  • reflect (bool) – Whether to plot the second treatment group mirrored on the opposite side of the x-axis. This only works when there are exactly two groups.

  • kde (bool) – Whether to plot kernel density estimation

  • cumulative (bool) – Whether to plot cumulative distribution.

  • norm_hist (bool) – If False, use raw counts on the y-axis. If kde=True, then norm_hist should be True as well.

  • ax (plt.Axes | None) –

Returns:
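
A usage sketch with synthetic data, assuming only the signature above:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from causallib.evaluation.plots.plots import plot_propensity_score_distribution

    rng = np.random.default_rng(0)
    treatment = pd.Series(rng.binomial(1, 0.4, size=500))     # two treatment groups
    propensity = pd.Series(
        np.clip(0.4 + 0.2 * treatment + rng.normal(0, 0.1, size=500), 0.01, 0.99)
    )

    fig, ax = plt.subplots()
    plot_propensity_score_distribution(propensity, treatment, reflect=True, kde=True, ax=ax)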

causallib.evaluation.plots.plots.plot_propensity_score_distribution_folds(predictions, hue_by, cv, reflect=True, kde=False, cumulative=False, norm_hist=True, ax=None)[source]
Parameters
  • predictions (list[pd.Series]) – List (one entry per fold) of propensity score predictions.

  • hue_by (pd.Series) – Group (treatment) assignment of the entire dataset (indices from cv will be used to slice this vector).

  • cv (list[np.array]) – List (one entry per fold) of the row indices of the samples in that fold.

  • reflect (bool) – Whether to plot the second treatment group mirrored on the opposite side of the x-axis. This only works when there are exactly two groups.

  • kde (bool) – Whether to plot kernel density estimation

  • cumulative (bool) – Whether to plot cumulative distribution.

  • norm_hist (bool) – If False, use raw counts on the y-axis. If kde=True, then norm_hist should be True as well.

  • ax (plt.Axes) –

Returns:

causallib.evaluation.plots.plots.plot_residual(predictions, y, a, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_residual_folds(predictions, y, a, cv, alpha_by_density=True, ax=None)[source]
causallib.evaluation.plots.plots.plot_roc_curve_folds(curve_data, ax=None, plot_folds=False, label_folds=False, label_std=False, **kwargs)[source]
causallib.evaluation.plots.plots.slope_graph(left, right, thresh=None, label_imbalanced=True, color_below='C0', color_above='C1', marker='o', ax=None)[source]