Module causallib.survival
This module allows estimating counterfactual outcomes in a setting of right-censored data
(also known as survival analysis, or time-to-event modeling).
In addition to the standard inputs of X (baseline covariates), a (treatment assignment) and y (outcome indicator),
a new variable t is introduced, measuring the time from the beginning of the observation period to the occurrence of an event.
An event may be right-censoring (where y=0) or an outcome of interest, or “death” (where y=1, which is also considered as censoring).
Each of the methods listed below uses an underlying machine learning model of choice, and can also integrate with the
lifelines survival analysis Python package (https://github.com/CamDavidsonPilon/lifelines).
Additional methods will be added incrementally.
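To make the roles of t and y concrete, the following is a minimal sketch of a non-parametric Kaplan-Meier estimate computed from durations and event indicators using only the standard library. It illustrates how right-censored rows (y=0) leave the risk set without counting as outcomes; the function name kaplan_meier and the toy data are assumptions for illustration, not part of causallib.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return {time: survival probability} for every observed time point."""
    deaths = Counter(t for t, y in zip(durations, events) if y == 1)
    exits = Counter(durations)  # subjects leaving the risk set (outcome or censoring)
    at_risk = len(durations)
    survival, curve = 1.0, {}
    for time in sorted(exits):
        if deaths[time]:
            # Only outcome events (y=1) reduce the survival estimate
            survival *= 1.0 - deaths[time] / at_risk
        curve[time] = survival
        # Both outcomes and censored subjects stop being at risk afterwards
        at_risk -= exits[time]
    return curve


t = [1, 2, 2, 3, 4]  # follow-up durations
y = [1, 0, 1, 1, 0]  # 1 = outcome observed, 0 = right-censored
km_curve = kaplan_meier(t, y)
```

The subject censored at time 2 still counts in the denominator at time 2, but contributes no event, which is the behavior that distinguishes censoring from an outcome.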
Available Methods
The methods that are currently available are:
- Weighting: causallib.survival.WeightedSurvival - uses causallib’s WeightEstimator (e.g., IPW) to generate a weighted pseudo-population for survival analysis.
- Standardization (parametric g-formula): causallib.survival.StandardizedSurvival - fits a parametric hazards model that includes baseline covariates.
- Weighted Standardization: causallib.survival.WeightedStandardizedSurvival - combines the two above-mentioned methods.
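The weighting idea can be sketched without any library: each subject is counted as a fractional or multiple "pseudo-subject" according to an inverse probability weight, and a survival curve is then computed per treatment arm. The helper names and the propensity scores below are hypothetical, and this is only an illustration of the concept, not causallib's internal implementation.

```python
def ip_weights(treatment, propensity):
    """Inverse probability weights: 1/p for treated, 1/(1-p) for untreated."""
    return [1.0 / p if a == 1 else 1.0 / (1.0 - p)
            for a, p in zip(treatment, propensity)]


def weighted_kaplan_meier(durations, events, weights):
    """Kaplan-Meier where each subject counts as `weight` pseudo-subjects."""
    at_risk = sum(weights)
    survival, curve = 1.0, {}
    for time in sorted(set(durations)):
        rows = [(y, w) for t, y, w in zip(durations, events, weights) if t == time]
        deaths = sum(w for y, w in rows if y == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
        curve[time] = survival
        at_risk -= sum(w for _, w in rows)
    return curve


a = [1, 1, 0, 0]
t = [1, 2, 1, 3]
y = [1, 1, 0, 1]
p = [0.5, 0.25, 0.5, 0.75]  # hypothetical propensity scores P(a=1 | X)
w = ip_weights(a, p)

# One weighted curve per treatment arm
curves = {}
for arm in (0, 1):
    idx = [i for i, ai in enumerate(a) if ai == arm]
    curves[arm] = weighted_kaplan_meier(
        [t[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx]
    )
```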
Example: Weighted survival analysis with Inverse Probability Weighting
from sklearn.linear_model import LogisticRegression
from causallib.survival import WeightedSurvival
from causallib.estimation import IPW
from causallib.datasets import load_nhefs_survival
ipw = IPW(learner=LogisticRegression())
weighted_survival_estimator = WeightedSurvival(weight_model=ipw)
X, a, t, y = load_nhefs_survival()
weighted_survival_estimator.fit(X, a)
population_averaged_survival_curves = weighted_survival_estimator.estimate_population_outcome(X, a, t, y)
Example: Standardized survival (parametric g-formula)
from causallib.survival import StandardizedSurvival
standardized_survival = StandardizedSurvival(survival_model=LogisticRegression())
standardized_survival.fit(X, a, t, y)
population_averaged_survival_curves = standardized_survival.estimate_population_outcome(X, a, t)
individual_survival_curves = standardized_survival.estimate_individual_outcome(X, a, t)
Submodules
- causallib.survival.base_survival module
- causallib.survival.marginal_survival module
- causallib.survival.regression_curve_fitter module
- causallib.survival.standardized_survival module
- causallib.survival.survival_utils module
- causallib.survival.univariate_curve_fitter module
- causallib.survival.weighted_standardized_survival module
- causallib.survival.weighted_survival module
Module contents
Causal Survival Analysis Models
- class causallib.survival.MarginalSurvival(survival_model: Optional[Any] = None)[source]
Bases:
causallib.survival.weighted_survival.WeightedSurvival
Marginal (un-adjusted) survival estimator. Essentially, it is a degenerate WeightedSurvival instance without a weight model.
- Parameters
survival_model – Three alternatives:
- None - compute non-parametric Kaplan-Meier survival curve
- Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model
- lifelines UnivariateFitter - use lifelines fitter to compute survival curves from events and durations
- class causallib.survival.RegressionCurveFitter(learner: sklearn.base.BaseEstimator)[source]
Bases:
object
Default implementation of a parametric survival curve fitter with covariates (pooled regression). API follows ‘lifelines’ convention for regression models, see here for example: https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html#lifelines.fitters.coxph_fitter.CoxPHFitter.fit
- Parameters
learner – scikit-learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.
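The memory cost mentioned above comes from the person-time expansion: one row per subject becomes one row per subject per time-step at risk. The sketch below shows a generic version of that expansion, plus the conversion of discrete-time hazards into a survival curve, S(k) = prod over j <= k of (1 - h(j)). It assumes integer durations counted from step 1; the function names are illustrative, and causallib's actual table layout may differ.

```python
def to_person_time(durations, events):
    """Expand subjects into (subject_id, time_step, event_at_step) rows."""
    rows = []
    for subject, (t, y) in enumerate(zip(durations, events)):
        for step in range(1, t + 1):
            # The event flag is set only on the subject's last step, and only
            # if that subject had an outcome (y=1) rather than censoring (y=0)
            rows.append((subject, step, int(y == 1 and step == t)))
    return rows


def survival_from_hazards(hazards):
    """Cumulative product of (1 - hazard) over consecutive time-steps."""
    curve, survival = [], 1.0
    for h in hazards:
        survival *= 1.0 - h
        curve.append(survival)
    return curve


rows = to_person_time(durations=[2, 3], events=[1, 0])
curve = survival_from_hazards([0.5, 0.5, 0.0])
```

Two subjects with durations 2 and 3 already produce five rows, so the table grows with the sum of all follow-up durations. A classifier with predict_proba fitted on such rows yields per-step hazards, which the second function turns into a curve.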
- fit(df: pandas.core.frame.DataFrame, duration_col: str, event_col: Optional[str] = None, weights_col: Optional[str] = None)[source]
Fits a parametric curve with covariates.
- Parameters
df (pd.DataFrame) – DataFrame, must contain a ‘duration_col’, and optional ‘event_col’ / ‘weights_col’. All other columns are treated as baseline covariates.
duration_col (str) – Name of column with subjects’ lifetimes (time-to-event)
event_col (Optional[str]) – Name of column with event type (outcome=1, censor=0). If unspecified, assumes that all events are ‘outcome’ (no censoring).
weights_col (Optional[str]) – Name of column with optional subject weights.
- Returns
Self
- predict_survival_function(X: Optional[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]] = None, times: Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series]] = None) pandas.core.frame.DataFrame [source]
Predicts survival function (table) for individuals, given their covariates.
- Parameters
X (pd.DataFrame / pd.Series) – Subjects’ covariates.
times – An iterable of increasing time points at which to predict the survival function. If unspecified, predicts at all time points observed in the data.
- Returns
Each column contains a survival curve for an individual, indexed by time-steps
- Return type
pd.DataFrame
- class causallib.survival.StandardizedSurvival(survival_model: Any, stratify: bool = True, **kwargs)[source]
Bases:
causallib.survival.base_survival.SurvivalBase
Standardization survival estimator. Computes parametric curve by fitting a time-varying hazards model that includes baseline covariates.
- Parameters
survival_model – Two alternatives:
- Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.
- lifelines RegressionFitter - use lifelines fitter to compute survival curves from baseline covariates, events and durations
stratify (bool) – if True, fit a separate model per treatment group
- estimate_individual_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) pandas.core.frame.DataFrame [source]
Returns individual survival curves for each subject row in X/a/t
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – Followup durations, size (num_subjects,).
y – NOT USED (for API compatibility only).
timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).
timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).
- Returns
with time-step index, subject IDs (X.index) as columns and point survival as entries
- Return type
pd.DataFrame
- estimate_population_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) pandas.core.frame.DataFrame [source]
Returns population averaged survival curves.
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – Followup durations, size (num_subjects,).
y – NOT USED (for API compatibility only).
timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).
timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).
- Returns
with time-step index, treatment values as columns and survival as entries
- Return type
pd.DataFrame
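Conceptually, standardization predicts a survival curve for every subject under each treatment value and then averages those curves over the population. The sketch below illustrates that averaging step with a hypothetical hazard function standing in for a fitted model; toy_hazard, the helper names and the data are assumptions, not causallib code.

```python
def individual_curve(hazard, x, a, timeline):
    """Survival curve for one subject with covariate x under treatment a."""
    survival, curve = 1.0, []
    for k in timeline:
        survival *= 1.0 - hazard(x, a, k)
        curve.append(survival)
    return curve


def standardized_curves(hazard, X, treatment_values, timeline):
    """Average the individual curves over all subjects, per treatment value."""
    curves = {}
    for a in treatment_values:
        per_subject = [individual_curve(hazard, x, a, timeline) for x in X]
        # zip(*...) groups the survival values of all subjects by time-step
        curves[a] = [sum(col) / len(X) for col in zip(*per_subject)]
    return curves


def toy_hazard(x, a, k):
    # Hypothetical fitted model: constant hazard equal to the covariate,
    # halved under treatment
    return x * (0.5 if a == 1 else 1.0)


X = [0.2, 0.4]
curves = standardized_curves(toy_hazard, X, treatment_values=(0, 1), timeline=[1, 2])
```

Every subject is evaluated under both treatment values, regardless of the treatment actually received, which is what makes the averaged curves counterfactual.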
- fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, w: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)[source]
Fits parametric models and calculates internal survival functions.
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – Followup duration, size (num_subjects,).
y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
w (pd.Series) – Optional subject weights.
fit_kwargs (dict) – Optional kwargs for fit call of survival model
- Returns
self
- class causallib.survival.UnivariateCurveFitter(learner: Optional[sklearn.base.BaseEstimator] = None)[source]
Bases:
object
Default implementation of a univariate survival curve fitter. Construct a curve fitter, either non-parametric (Kaplan-Meier) or parametric. API follows ‘lifelines’ convention for univariate models, see here for example: https://lifelines.readthedocs.io/en/latest/fitters/univariate/KaplanMeierFitter.html#lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter.fit
- Parameters
learner – optional scikit-learn estimator (needs to implement predict_proba). If provided, will compute parametric curve by fitting a time-varying hazards model. If None, will compute non-parametric Kaplan-Meier estimator.
- fit(durations, event_observed=None, weights=None)[source]
Fits a univariate survival curve (Kaplan-Meier or parametric, if a learner was provided in constructor)
- Parameters
durations (Iterable) – Duration subject was observed
event_observed (Optional[Iterable]) – Boolean or 0/1 iterable, where True means ‘outcome event’ and False means ‘right censoring’. If unspecified, assumes that all events are ‘outcome’ (no censoring).
weights (Optional[Iterable]) – Optional subject weights
- Returns
Self
- predict(times=None, interpolate=False)[source]
Compute survival curve for time points given in ‘times’ param.
- Parameters
times – sequence of time points for prediction
interpolate – if True, linearly interpolate non-observed times. Otherwise, repeat last observed time point.
- Returns
with times index and survival values
- Return type
pd.Series
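The difference between the two interpolate settings can be illustrated with a small standard-library sketch. It assumes that survival equals 1.0 before the first observed time and that the last observed value is held beyond the final time point; both are assumptions of this illustration, and the function name predict_at is hypothetical rather than causallib's implementation.

```python
from bisect import bisect_right


def predict_at(observed_times, observed_survival, times, interpolate=False):
    """Evaluate a survival curve at arbitrary query times."""
    out = []
    for q in times:
        i = bisect_right(observed_times, q)  # number of observed times <= q
        if i == 0:
            out.append(1.0)  # assumption: no events before the first observed time
        elif i == len(observed_times) or observed_times[i - 1] == q or not interpolate:
            # Step function: repeat the value at the last observed time point
            out.append(observed_survival[i - 1])
        else:
            # Linear interpolation between the two neighboring observed points
            t0, t1 = observed_times[i - 1], observed_times[i]
            s0, s1 = observed_survival[i - 1], observed_survival[i]
            out.append(s0 + (s1 - s0) * (q - t0) / (t1 - t0))
    return out


observed_times = [1, 2, 4]
observed_survival = [0.9, 0.8, 0.4]
step = predict_at(observed_times, observed_survival, [3, 5])
linear = predict_at(observed_times, observed_survival, [3, 5], interpolate=True)
```

At the unobserved time 3, the step version repeats the value from time 2, while the interpolated version returns the midpoint between the values at times 2 and 4.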
- class causallib.survival.WeightedStandardizedSurvival(weight_model: causallib.estimation.base_weight.WeightEstimator, survival_model: Any, stratify: bool = True, outcome_covariates=None, weight_covariates=None)[source]
Bases:
causallib.survival.standardized_survival.StandardizedSurvival
Combines WeightedSurvival and StandardizedSurvival:
- Adjusts for treatment assignment by creating weighted pseudo-population (e.g., inverse propensity weighting).
- Computes parametric curve by fitting a time-varying hazards model that includes baseline covariates.
- Parameters
weight_model – causallib compatible weight model (e.g., IPW)
survival_model – Two alternatives:
- Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.
- lifelines RegressionFitter - use lifelines fitter to compute survival curves from baseline covariates, events and durations
stratify (bool) – if True, fit a separate model per treatment group
outcome_covariates (array) – Covariates to use for outcome model. If None - all covariates passed will be used. Either list of column names or boolean mask.
weight_covariates (array) – Covariates to use for weight model. If None - all covariates passed will be used. Either list of column names or boolean mask.
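The three accepted forms of a covariate selection (None, a list of column names, or a boolean mask) can be resolved to a list of column names as in the sketch below. The helper select_covariates and the column names are hypothetical and only illustrate the selection semantics described above, not how causallib implements it.

```python
def select_covariates(columns, covariates=None):
    """Resolve a covariate selection to a list of column names."""
    if covariates is None:
        # None means: use every covariate that was passed
        return list(columns)
    if all(isinstance(c, bool) for c in covariates):
        # Boolean mask aligned with the columns
        if len(covariates) != len(columns):
            raise ValueError("boolean mask must have one entry per column")
        return [col for col, keep in zip(columns, covariates) if keep]
    # Otherwise, treat the selection as a list of column names
    missing = [c for c in covariates if c not in columns]
    if missing:
        raise KeyError(f"unknown covariates: {missing}")
    return list(covariates)


columns = ["age", "sex", "smoker"]
outcome_cols = select_covariates(columns, ["age", "smoker"])
weight_cols = select_covariates(columns, [True, True, False])
all_cols = select_covariates(columns)
```

Using different subsets for the two models lets the weight model adjust for confounders of treatment assignment while the outcome model uses the predictors of survival.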
- estimate_individual_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) pandas.core.frame.DataFrame [source]
Returns individual survival curves for each subject row in X/a/t
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – Followup durations, size (num_subjects,).
y – NOT USED (for API compatibility only).
timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).
timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).
- Returns
with time-step index, subject IDs (X.index) as columns and point survival as entries
- Return type
pd.DataFrame
- fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, w: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)[source]
Fits parametric models and calculates internal survival functions.
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – Followup duration, size (num_subjects,).
y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
w (pd.Series) – NOT USED (for compatibility only) optional subject weights.
fit_kwargs (dict) – Optional kwargs for fit call of survival model
- Returns
self
- class causallib.survival.WeightedSurvival(weight_model: Optional[causallib.estimation.base_weight.WeightEstimator] = None, survival_model: Optional[Any] = None)[source]
Bases:
causallib.survival.base_survival.SurvivalBase
Weighted survival estimator.
- Parameters
weight_model – causallib compatible weight model (e.g., IPW)
survival_model – Three alternatives:
- None - compute non-parametric Kaplan-Meier survival curve
- Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model
- lifelines UnivariateFitter - use lifelines fitter to compute survival curves from events and durations
- estimate_population_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) pandas.core.frame.DataFrame [source]
Returns population averaged survival curves.
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series|int) – Followup durations, size (num_subjects,).
y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event.
timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event.
- Returns
with time-step index, treatment values as columns and survival as entries
- Return type
pd.DataFrame
- fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: Optional[pandas.core.series.Series] = None, y: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)[source]
Fits internal weight module (e.g. IPW module, adversarial weighting, etc).
- Parameters
X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
t (pd.Series) – NOT USED (for compatibility only)
y (pd.Series) – NOT USED (for compatibility only)
fit_kwargs (dict) – Optional kwargs for fit call of survival model (NOT USED, since fit call of survival model occurs in ‘estimate_population_outcome’ rather than here)
- Returns
self