causallib.evaluation.plots.curve_data_makers module

Functions that calculate curve data for cross validation plots.

causallib.evaluation.plots.curve_data_makers.calculate_curve_data_binary_outcome(folds_predictions, targets, curve_metric, area_metric, stratify_by=None)[source]

Calculate data for performance (ROC or PR) curves.

Parameters
  • folds_predictions (list[pd.Series]) – Predictions for each fold.

  • targets (pd.Series) – True labels.

  • curve_metric (callable) – Performance metric returning three output vectors: metric1, metric2, and thresholds, where metric1 and metric2 depict the curve when plotted on the x-axis and y-axis, respectively.

  • area_metric (callable) – Performance metric of the area under the curve.

  • stratify_by (pd.Series) – Group assignment to stratify by.

Returns

Evaluation of the metric for each fold and for each curve. One curve is produced for each group level in stratify_by. In general: {curve_name: {metric1: [evaluation_fold_1, …]}}. For example: {“Treatment=1”: {“FPR”: [FPR_fold_1, FPR_fold_2, FPR_fold_3]}}.

Return type

dict[str, dict[str, list[np.ndarray]]]
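As an illustration (a minimal sketch, not taken from the library’s own examples), the function can be driven by sklearn’s roc_curve and roc_auc_score on synthetic three-fold predictions; index alignment between targets, stratify_by, and each fold’s predictions is assumed:

    import numpy as np
    import pandas as pd
    from sklearn.metrics import roc_curve, roc_auc_score
    from causallib.evaluation.plots.curve_data_makers import (
        calculate_curve_data_binary_outcome,
    )

    rng = np.random.default_rng(0)
    index = pd.RangeIndex(300)
    targets = pd.Series(rng.integers(0, 2, len(index)), index=index)
    treatment = pd.Series(rng.integers(0, 2, len(index)), index=index, name="Treatment")

    # One predicted-probability Series per cross-validation fold:
    folds_predictions = [
        pd.Series(rng.uniform(size=len(fold)), index=fold)
        for fold in np.array_split(index, 3)
    ]

    curve_data = calculate_curve_data_binary_outcome(
        folds_predictions,
        targets,
        curve_metric=roc_curve,
        area_metric=roc_auc_score,
        stratify_by=treatment,
    )
    # Per the docstring, curve_data is shaped {curve_name: {metric: [per-fold
    # arrays]}}, e.g. a "Treatment=1" curve holding per-fold FPR arrays.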

causallib.evaluation.plots.curve_data_makers.calculate_curve_data_propensity(fold_predictions: List[causallib.evaluation.weight_predictor.PropensityPredictions], targets, curve_metric, area_metric)[source]

Calculate data for performance (ROC or PR) curves.

Parameters
  • fold_predictions (list[PropensityPredictions]) – Predictions for each fold.

  • targets (pd.Series) – True labels.

  • curve_metric (callable) – Performance metric returning three output vectors: metric1, metric2, and thresholds, where metric1 and metric2 depict the curve when plotted on the x-axis and y-axis, respectively.

  • area_metric (callable) – Performance metric of the area under the curve.

  • **kwargs

Returns

Evaluation of the metric for each fold and for each curve. Three curves are produced:

  • “unweighted” (regular)

  • “weighted” (weighted by inverse propensity)

  • “expected” (duplicated population, weighted by propensity)

In general: {curve_name: {metric1: [evaluation_fold_1, …]}}. For example: {“weighted”: {“FPR”: [FPR_fold_1, FPR_fold_2, FPR_fold_3]}}.

Return type

dict[str, dict[str, list[np.ndarray]]]
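Constructing PropensityPredictions objects by hand is involved, so this hedged sketch assumes they were already produced by causallib’s cross-validation evaluation machinery and only shows the call itself (the wrapper function is hypothetical):

    from typing import List

    import pandas as pd
    from sklearn.metrics import roc_curve, roc_auc_score
    from causallib.evaluation.plots.curve_data_makers import (
        calculate_curve_data_propensity,
    )
    from causallib.evaluation.weight_predictor import PropensityPredictions

    def propensity_curve_data(
        fold_predictions: List[PropensityPredictions],  # one per fold, assumed given
        targets: pd.Series,                             # true treatment assignment
    ) -> dict:
        # Per the docstring, the result maps "unweighted"/"weighted"/"expected"
        # to {metric: [per-fold arrays]} dictionaries.
        return calculate_curve_data_propensity(
            fold_predictions,
            targets,
            curve_metric=roc_curve,
            area_metric=roc_auc_score,
        )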

causallib.evaluation.plots.curve_data_makers.calculate_performance_curve_data_on_folds(folds_predictions, folds_targets, sample_weights=None, area_metric=<function roc_auc_score>, curve_metric=<function roc_curve>, pos_label=None)[source]

Calculates performance curves of the predictions across folds.

Parameters
  • folds_predictions (list[pd.Series]) – Score predictions (continuous classifier output, e.g. from predict_proba or decision_function) for every fold.

  • folds_targets (list[pd.Series]) – True labels for every fold.

  • sample_weights (list[pd.Series] | None) – Weights for each sample, for every fold.

  • area_metric (callable) – Performance metric of the area under the curve.

  • curve_metric (callable) – Performance metric returning three output vectors: metric1, metric2, and thresholds, where metric1 and metric2 depict the curve when plotted on the x-axis and y-axis, respectively.

  • pos_label – What label in targets is considered the positive label.

Returns

For every fold, the calculated metric1 and metric2 (the curves), the thresholds and the area calculations.

Return type

(list[np.ndarray], list[np.ndarray], list[np.ndarray], list[float])
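Since the documented defaults are sklearn’s roc_curve and roc_auc_score, a minimal self-contained sketch on synthetic folds might look like this (the four return lists follow the documented order):

    import numpy as np
    import pandas as pd
    from causallib.evaluation.plots.curve_data_makers import (
        calculate_performance_curve_data_on_folds,
    )

    rng = np.random.default_rng(42)
    folds_targets = [pd.Series(rng.integers(0, 2, 100)) for _ in range(3)]
    folds_predictions = [pd.Series(rng.uniform(size=100)) for _ in range(3)]

    fprs, tprs, thresholds, aucs = calculate_performance_curve_data_on_folds(
        folds_predictions, folds_targets
    )
    # With the roc_curve default, metric1/metric2 are FPR/TPR; each list holds
    # one entry per fold, and aucs is a list of floats.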

causallib.evaluation.plots.curve_data_makers.calculate_pr_curve(curve_data, targets)[source]

Calculates precision-recall curve on the folds.

Parameters
  • curve_data (dict) – Dict of curves produced by BaseEvaluationPlotDataExtractor.calculate_curve_data.

  • targets (pd.Series) – True labels.

Returns

Keys are “Precision”, “Recall”, and “AP” (PR metrics), and values are lists, one entry per fold, holding each fold’s evaluation. An additional “prevalence” key, holding the positive-label prevalence, is added for use by the chance curve.

Return type

dict[str, list[np.ndarray]]
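A hedged sketch of the intended pipeline, reusing the synthetic folds_predictions and targets from the first sketch above: compute curve data with sklearn’s PR metrics, then relabel it with this function:

    from sklearn.metrics import average_precision_score, precision_recall_curve
    from causallib.evaluation.plots.curve_data_makers import (
        calculate_curve_data_binary_outcome,
        calculate_pr_curve,
    )

    curve_data = calculate_curve_data_binary_outcome(
        folds_predictions,  # from the first sketch above
        targets,
        curve_metric=precision_recall_curve,
        area_metric=average_precision_score,
    )
    pr_data = calculate_pr_curve(curve_data, targets)
    # Per the docstring: "Precision", "Recall", and "AP" keys with one entry
    # per fold, plus a "prevalence" key used by the chance curve.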

causallib.evaluation.plots.curve_data_makers.calculate_roc_curve(curve_data)[source]

Calculates ROC curve on the folds.

Parameters

curve_data (dict) – Dict of curves produced by BaseEvaluationPlotDataExtractor.calculate_curve_data.

Returns

Keys are “FPR”, “TPR”, and “AUC” (ROC metrics), and values are lists, one entry per fold, holding each fold’s evaluation.

Return type

dict[str, list[np.ndarray]]
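Mirroring the PR example above, a hedged sketch that computes ROC curve data and relabels it (again reusing folds_predictions and targets from the first sketch):

    from sklearn.metrics import roc_auc_score, roc_curve
    from causallib.evaluation.plots.curve_data_makers import (
        calculate_curve_data_binary_outcome,
        calculate_roc_curve,
    )

    curve_data = calculate_curve_data_binary_outcome(
        folds_predictions,  # from the first sketch above
        targets,
        curve_metric=roc_curve,
        area_metric=roc_auc_score,
    )
    roc_data = calculate_roc_curve(curve_data)
    # Per the docstring: "FPR", "TPR", and "AUC" keys, one list entry per fold.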