Module causallib.survival

This module estimates counterfactual outcomes from right-censored data (a setting also known as survival analysis or time-to-event modeling). In addition to the standard inputs X (baseline covariates), a (treatment assignment), and y (outcome indicator), a new variable t is introduced, measuring the time from the beginning of the observation period to the occurrence of an event. An event may be right-censoring (where y=0) or the outcome of interest, often called “death” (where y=1, which is also considered as censoring).
Each method in this module uses an underlying machine learning model of choice, and can also integrate with the lifelines survival analysis Python package (https://github.com/CamDavidsonPilon/lifelines).
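The (t, y) encoding described above can be sketched in plain Python. This is a minimal illustration rather than causallib code: the helper name encode_right_censored and the toy values are assumptions made for the example.

```python
def encode_right_censored(event_time, followup_end):
    """Return (t, y) for one subject.

    event_time: time of the outcome, or None if it never occurred.
    followup_end: time at which observation of the subject stopped.
    """
    if event_time is not None and event_time <= followup_end:
        return event_time, 1  # outcome observed within follow-up
    return followup_end, 0    # right-censored: outcome not observed


# Hypothetical subjects: (outcome time, end of follow-up)
subjects = [(5, 10), (None, 8), (12, 10)]
t, y = zip(*(encode_right_censored(e, c) for e, c in subjects))
print(t, y)
```

The third subject has an outcome after follow-up ended, so it is recorded as censored at the end of follow-up.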

Additional methods will be added incrementally.

Available Methods

The methods that are currently available are:

  1. Weighting: causallib.survival.WeightedSurvival - uses causallib's WeightEstimator (e.g., IPW) to generate a weighted pseudo-population for survival analysis.

  2. Standardization (parametric g-formula): causallib.survival.StandardizedSurvival - fits a parametric hazards model that includes baseline covariates.

  3. Weighted Standardization: causallib.survival.WeightedStandardizedSurvival - combines the two above-mentioned methods.

Example: Weighted survival analysis with Inverse Probability Weighting

from sklearn.linear_model import LogisticRegression
from causallib.survival import WeightedSurvival
from causallib.estimation import IPW
from causallib.datasets import load_nhefs_survival

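ipw = IPW(learner=LogisticRegression())  # propensity model used for the weights
weighted_survival_estimator = WeightedSurvival(weight_model=ipw)
X, a, t, y = load_nhefs_survival()

# fit() only fits the weight model; the survival curves are computed in
# estimate_population_outcome(), which therefore also needs t and y.
weighted_survival_estimator.fit(X, a)
population_averaged_survival_curves = weighted_survival_estimator.estimate_population_outcome(X, a, t, y)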

Example: Standardized survival (parametric g-formula)

from sklearn.linear_model import LogisticRegression
from causallib.survival import StandardizedSurvival

# X, a, t, y are the NHEFS survival data loaded in the previous example.
standardized_survival = StandardizedSurvival(survival_model=LogisticRegression())
standardized_survival.fit(X, a, t, y)
population_averaged_survival_curves = standardized_survival.estimate_population_outcome(X, a, t)
individual_survival_curves = standardized_survival.estimate_individual_outcome(X, a, t)

Submodules

Module contents

Causal Survival Analysis Models

class causallib.survival.MarginalSurvival(survival_model: Optional[Any] = None)

Bases: causallib.survival.weighted_survival.WeightedSurvival

Marginal (un-adjusted) survival estimator. Essentially, it is a degenerate WeightedSurvival instance without a weight model.

Parameters

survival_model – Three alternatives:

  1. None - compute non-parametric Kaplan-Meier survival curve

  2. Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model

  3. lifelines UnivariateFitter - use lifelines fitter to compute survival curves from events and durations
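For the first alternative, the non-parametric Kaplan-Meier estimate can be sketched in plain Python. This is a textbook version written for illustration, not the code causallib or lifelines runs; the function name kaplan_meier and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Return {time: survival probability} at each observed time."""
    events = Counter(t for t, e in zip(durations, event_observed) if e)
    exits = Counter(durations)  # subjects leaving the risk set at each time
    at_risk = len(durations)
    survival, curve = 1.0, {}
    for time in sorted(exits):
        # Multiply by the fraction of at-risk subjects surviving this time.
        survival *= 1.0 - events[time] / at_risk
        curve[time] = survival
        at_risk -= exits[time]  # events and censored subjects both leave
    return curve


curve = kaplan_meier(durations=[1, 2, 2, 3, 4], event_observed=[1, 1, 0, 1, 0])
print(curve)
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later times.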

class causallib.survival.RegressionCurveFitter(learner: sklearn.base.BaseEstimator)

Bases: object

Default implementation of a parametric survival curve fitter with covariates (pooled regression). API follows ‘lifelines’ convention for regression models, see here for example: https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html#lifelines.fitters.coxph_fitter.CoxPHFitter.fit

Parameters

learner – scikit-learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.
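To see why a person-time table can be costly, here is a minimal sketch of the expansion in plain Python. It assumes integer durations counted from 1; the helper name to_person_time and the column names "id", "time" and "event" are hypothetical, and the library's internal layout may differ.

```python
def to_person_time(rows, duration_col, event_col):
    """Expand one row per subject into one row per subject per time step."""
    long_rows = []
    for subject_id, row in enumerate(rows):
        covariates = {k: v for k, v in row.items() if k not in (duration_col, event_col)}
        for step in range(1, row[duration_col] + 1):
            # The event indicator is 1 only at the subject's final step, and
            # only if that subject experienced the outcome (not censoring).
            is_event = int(step == row[duration_col] and row[event_col] == 1)
            long_rows.append({"id": subject_id, "time": step, **covariates, "event": is_event})
    return long_rows


rows = [
    {"age": 50, "t": 3, "y": 1},
    {"age": 61, "t": 2, "y": 0},
]
person_time = to_person_time(rows, duration_col="t", event_col="y")
print(len(person_time))
```

The number of rows equals the sum of all durations, and every row repeats the subject's covariates, which is where the computational and memory cost comes from.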

fit(df: pandas.core.frame.DataFrame, duration_col: str, event_col: Optional[str] = None, weights_col: Optional[str] = None)

Fits a parametric curve with covariates.

Parameters
  • df (pd.DataFrame) – DataFrame, must contain a ‘duration_col’, and optional ‘event_col’ / ‘weights_col’. All other columns are treated as baseline covariates.

  • duration_col (str) – Name of column with subjects’ lifetimes (time-to-event)

  • event_col (Optional[str]) – Name of column with event type (outcome=1, censor=0). If unspecified, assumes that all events are ‘outcome’ (no censoring).

  • weights_col (Optional[str]) – Name of column with optional subject weights.

Returns

Self

predict_survival_function(X: Optional[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]] = None, times: Optional[Union[List[float], numpy.ndarray, pandas.core.series.Series]] = None) → pandas.core.frame.DataFrame

Predicts survival function (table) for individuals, given their covariates.

Parameters
  • X (pd.DataFrame / pd.Series) – Subjects’ covariates.

  • times – An iterable of increasing time points to predict at. If unspecified, predict all observed time points in data.

Returns

Each column contains a survival curve for an individual, indexed by time-steps

Return type

pd.DataFrame
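In a discrete-time hazards model, a survival curve follows from the per-step event probabilities (hazards) by a cumulative product. The sketch below shows that standard relationship in plain Python; the function name survival_from_hazards and the hazard values are assumptions for illustration, and this is not necessarily how the library computes it internally.

```python
from itertools import accumulate
from operator import mul


def survival_from_hazards(hazards):
    """Convert per-step hazards h_1..h_K into S_k = prod_{j<=k} (1 - h_j)."""
    return list(accumulate((1.0 - h for h in hazards), mul))


hazards = [0.1, 0.2, 0.5]  # hypothetical predicted hazards for one subject
survival = survival_from_hazards(hazards)
print(survival)
```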

class causallib.survival.StandardizedSurvival(survival_model: Any, stratify: bool = True, **kwargs)

Bases: causallib.survival.base_survival.SurvivalBase

Standardization survival estimator. Computes a parametric curve by fitting a time-varying hazards model that includes baseline covariates.

Parameters
  • survival_model – Two alternatives:

    1. Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.

    2. lifelines RegressionFitter - use lifelines fitter to compute survival curves from baseline covariates, events and durations

  • stratify (bool) – if True, fit a separate model per treatment group
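Fitting a separate model per treatment group can be pictured with the toy sketch below. The "model" here is deliberately trivial (a constant hazard, events divided by person-time) and stands in for whatever survival model is supplied; the names fit_constant_hazard and fit_stratified are hypothetical.

```python
def fit_constant_hazard(durations, events):
    """Toy stand-in for a survival model: events per unit of person-time."""
    return sum(events) / sum(durations)


def fit_stratified(a, t, y):
    """Fit one toy model per treatment value, using only that group's rows."""
    models = {}
    for value in sorted(set(a)):
        idx = [i for i, ai in enumerate(a) if ai == value]
        models[value] = fit_constant_hazard([t[i] for i in idx], [y[i] for i in idx])
    return models


a = [0, 0, 1, 1]  # treatment assignment
t = [2, 3, 4, 4]  # follow-up durations
y = [1, 0, 1, 0]  # 1 = outcome, 0 = right-censored
models = fit_stratified(a, t, y)
print(models)
```

The alternative to stratifying is a single pooled model that receives the treatment as one more input feature.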

estimate_individual_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) → pandas.core.frame.DataFrame

Returns individual survival curves for each subject row in X/a/t

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – Followup durations, size (num_subjects,).

  • y – NOT USED (for API compatibility only).

  • timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).

  • timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).

Returns

DataFrame with time-step index, subject IDs (X.index) as columns and point survival as entries

Return type

pd.DataFrame

estimate_population_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) → pandas.core.frame.DataFrame

Returns population averaged survival curves.

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – Followup durations, size (num_subjects,).

  • y – NOT USED (for API compatibility only).

  • timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).

  • timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).

Returns

DataFrame with time-step index, treatment values as columns and survival as entries

Return type

pd.DataFrame
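The idea behind population-averaged curves in standardization (the parametric g-formula) is to predict every subject's curve under each treatment value and then average over subjects. Below is a minimal plain-Python sketch under stated assumptions: toy_hazard stands in for a fitted hazards model, and the names survival_curve and standardize are hypothetical rather than causallib functions.

```python
def survival_curve(hazard_fn, x, a, n_steps):
    """Survival curve for covariates x under treatment a."""
    curve, s = [], 1.0
    for k in range(1, n_steps + 1):
        s *= 1.0 - hazard_fn(x, a, k)
        curve.append(s)
    return curve


def standardize(hazard_fn, X, treatment_values, n_steps):
    """Average individual curves over all subjects, per treatment value."""
    result = {}
    for a in treatment_values:
        # Every subject is assigned treatment a, regardless of what they received.
        curves = [survival_curve(hazard_fn, x, a, n_steps) for x in X]
        result[a] = [sum(step) / len(X) for step in zip(*curves)]
    return result


# Hypothetical fitted hazard: higher for x=1, lower under treatment a=1.
def toy_hazard(x, a, k):
    return 0.2 + 0.2 * x - 0.1 * a


population_curves = standardize(toy_hazard, X=[0, 1], treatment_values=[0, 1], n_steps=2)
print(population_curves)
```

Because both treatment arms are averaged over the same covariate distribution, the difference between the two curves is not driven by differences in who was treated.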

fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, w: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)

Fits parametric models and calculates internal survival functions.

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – Followup duration, size (num_subjects,).

  • y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).

  • w (pd.Series) – Optional subject weights.

  • fit_kwargs (dict) – Optional kwargs for fit call of survival model

Returns

self

class causallib.survival.UnivariateCurveFitter(learner: Optional[sklearn.base.BaseEstimator] = None)

Bases: object

Default implementation of a univariate survival curve fitter. Construct a curve fitter, either non-parametric (Kaplan-Meier) or parametric. API follows ‘lifelines’ convention for univariate models, see here for example: https://lifelines.readthedocs.io/en/latest/fitters/univariate/KaplanMeierFitter.html#lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter.fit

Parameters

learner – optional scikit-learn estimator (needs to implement predict_proba). If provided, will compute parametric curve by fitting a time-varying hazards model. If None, will compute non-parametric Kaplan-Meier estimator.

fit(durations, event_observed=None, weights=None)

Fits a univariate survival curve (Kaplan-Meier or parametric, if a learner was provided in constructor)

Parameters
  • durations (Iterable) – Duration subject was observed

  • event_observed (Optional[Iterable]) – Boolean or 0/1 iterable, where True means ‘outcome event’ and False means ‘right censoring’. If unspecified, assumes that all events are ‘outcome’ (no censoring).

  • weights (Optional[Iterable]) – Optional subject weights

Returns

Self

predict(times=None, interpolate=False)

Compute survival curve for time points given in ‘times’ param.

Parameters
  • times – sequence of time points for prediction

  • interpolate – if True, linearly interpolate non-observed times. Otherwise, repeat last observed time point.

Returns

Series with times index and survival values

Return type

pd.Series
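The two behaviours of the interpolate flag can be illustrated with a small plain-Python sketch. It is an illustration of the idea only: the function name predict_at is hypothetical, and returning 1.0 before the first observed time is an assumption of this sketch, not documented library behaviour.

```python
from bisect import bisect_right


def predict_at(curve_times, curve_values, times, interpolate=False):
    """Evaluate a survival curve at arbitrary `times`.

    curve_times must be sorted; times before the first one get survival 1.0.
    """
    out = []
    for t in times:
        i = bisect_right(curve_times, t)  # number of observed times <= t
        if i == 0:
            out.append(1.0)
        elif not interpolate or i == len(curve_times) or curve_times[i - 1] == t:
            out.append(curve_values[i - 1])  # carry last observed value forward
        else:
            # Linear interpolation between the two neighbouring observed times.
            t0, t1 = curve_times[i - 1], curve_times[i]
            v0, v1 = curve_values[i - 1], curve_values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


step = predict_at([1, 3], [0.8, 0.4], times=[2], interpolate=False)
linear = predict_at([1, 3], [0.8, 0.4], times=[2], interpolate=True)
print(step, linear)
```

With interpolate=False the curve is a step function, which matches how a Kaplan-Meier estimate is usually drawn.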

class causallib.survival.WeightedStandardizedSurvival(weight_model: causallib.estimation.base_weight.WeightEstimator, survival_model: Any, stratify: bool = True, outcome_covariates=None, weight_covariates=None)

Bases: causallib.survival.standardized_survival.StandardizedSurvival

Combines WeightedSurvival and StandardizedSurvival:
  1. Adjusts for treatment assignment by creating weighted pseudo-population (e.g., inverse propensity weighting).

  2. Computes parametric curve by fitting a time-varying hazards model that includes baseline covariates.

Parameters
  • weight_model – causallib compatible weight model (e.g., IPW)

  • survival_model –

    Two alternatives:
    1. Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model that includes baseline covariates. Note that the model is fitted on a person-time table with all covariates, and might be computationally and memory expensive.

    2. lifelines RegressionFitter - use lifelines fitter to compute survival curves from baseline covariates, events and durations

  • stratify (bool) – if True, fit a separate model per treatment group

  • outcome_covariates (array) – Covariates to use for outcome model. If None - all covariates passed will be used. Either list of column names or boolean mask.

  • weight_covariates (array) – Covariates to use for weight model. If None - all covariates passed will be used. Either list of column names or boolean mask.
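The two accepted selector forms (a list of column names or a boolean mask) can be illustrated as follows. This section does not show how causallib resolves the selectors internally, so select_columns is a hypothetical helper written only to make the semantics concrete.

```python
def select_columns(columns, selector=None):
    """Pick covariates by a list of names or a boolean mask; None keeps all."""
    if selector is None:
        return list(columns)
    if all(isinstance(s, bool) for s in selector):
        # Boolean mask: one flag per column, in column order.
        return [c for c, keep in zip(columns, selector) if keep]
    wanted = set(selector)
    return [c for c in columns if c in wanted]  # list of column names


columns = ["age", "sex", "smoker"]
outcome_cols = select_columns(columns, ["age", "smoker"])    # by name
weight_cols = select_columns(columns, [True, True, False])   # by mask
print(outcome_cols, weight_cols)
```

Separate selectors let the weight model and the outcome model adjust for different covariate sets.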

estimate_individual_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: Optional[Any] = None, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) → pandas.core.frame.DataFrame

Returns individual survival curves for each subject row in X/a/t

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – Followup durations, size (num_subjects,).

  • y – NOT USED (for API compatibility only).

  • timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event (t.min()).

  • timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event (t.max()).

Returns

DataFrame with time-step index, subject IDs (X.index) as columns and point survival as entries

Return type

pd.DataFrame

fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, w: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)

Fits parametric models and calculates internal survival functions.

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – Followup duration, size (num_subjects,).

  • y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).

  • w (pd.Series) – Optional subject weights. NOT USED (for compatibility only).

  • fit_kwargs (dict) – Optional kwargs for fit call of survival model

Returns

self

class causallib.survival.WeightedSurvival(weight_model: Optional[causallib.estimation.base_weight.WeightEstimator] = None, survival_model: Optional[Any] = None)

Bases: causallib.survival.base_survival.SurvivalBase

Weighted survival estimator.

Parameters
  • weight_model – causallib compatible weight model (e.g., IPW)

  • survival_model – Three alternatives:

    1. None - compute non-parametric Kaplan-Meier survival curve

    2. Scikit-Learn estimator (needs to implement predict_proba) - compute parametric curve by fitting a time-varying hazards model

    3. lifelines UnivariateFitter - use lifelines fitter to compute survival curves from events and durations
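The weights produced by an inverse-probability weight model can be sketched in plain Python. In this illustration the propensity is estimated by empirical treatment frequencies within strata of a single binary covariate, which is a simplification swapped in for a learner such as logistic regression; the function name ipw_weights and the data are assumptions.

```python
from collections import Counter


def ipw_weights(x, a):
    """Weight each subject by 1 / P(A = a_i | X = x_i), with the propensity
    estimated as the empirical treatment frequency within each stratum of x."""
    stratum_size = Counter(x)
    cell_size = Counter(zip(x, a))
    return [stratum_size[xi] / cell_size[(xi, ai)] for xi, ai in zip(x, a)]


x = [0, 0, 0, 0, 1, 1, 1, 1]  # binary covariate
a = [1, 0, 0, 0, 1, 1, 1, 0]  # treatment depends on x
w = ipw_weights(x, a)

# In the weighted pseudo-population, each treatment arm represents everyone.
treated_total = sum(wi for wi, ai in zip(w, a) if ai == 1)
control_total = sum(wi for wi, ai in zip(w, a) if ai == 0)
print(treated_total, control_total)
```

Weights of this kind are then used as subject weights when the survival curve of each treatment group is fitted.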

estimate_population_outcome(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: pandas.core.series.Series, y: pandas.core.series.Series, timeline_start: Optional[int] = None, timeline_end: Optional[int] = None) → pandas.core.frame.DataFrame

Returns population averaged survival curves.

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series|int) – Followup durations, size (num_subjects,).

  • y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).

  • timeline_start (int) – Common start time-step. If provided, will generate survival curves starting from ‘timeline_start’ for all patients. If None, will predict from first observed event.

  • timeline_end (int) – Common end time-step. If provided, will generate survival curves up to ‘timeline_end’ for all patients. If None, will predict up to last observed event.

Returns

DataFrame with time-step index, treatment values as columns and survival as entries

Return type

pd.DataFrame

fit(X: pandas.core.frame.DataFrame, a: pandas.core.series.Series, t: Optional[pandas.core.series.Series] = None, y: Optional[pandas.core.series.Series] = None, fit_kwargs: Optional[dict] = None)

Fits internal weight module (e.g. IPW module, adversarial weighting, etc).

Parameters
  • X (pd.DataFrame) – Baseline covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • t (pd.Series) – NOT USED (for compatibility only)

  • y (pd.Series) – NOT USED (for compatibility only)

  • fit_kwargs (dict) – Optional kwargs for fit call of survival model (NOT USED, since fit call of survival model occurs in ‘estimate_population_outcome’ rather than here)

Returns

self