causallib.survival.survival_utils module

causallib.survival.survival_utils.add_random_suffix(name, suffix_length=4)

Adds a random suffix to a string, taken from the hex representation of a random UUID.

Parameters

name – input string
suffix_length – length of desired added suffix.

Returns

string with suffix
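
The behaviour described above can be sketched with the standard-library uuid module. This is only an illustration, not the library's code: the function name add_random_suffix_sketch and the underscore separator between the name and the suffix are assumptions.

```python
from uuid import uuid4


def add_random_suffix_sketch(name, suffix_length=4):
    """Hypothetical stand-in: append part of a random UUID's hex string to `name`."""
    # uuid4().hex is a 32-character hexadecimal string; keep its first characters.
    suffix = uuid4().hex[:suffix_length]
    # The "_" separator is an assumption made for readability.
    return name + "_" + suffix


suffixed = add_random_suffix_sketch("t")
```

Each call draws a new UUID, so the suffix differs between calls.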

causallib.survival.survival_utils.canonize_dtypes_and_names(a=None, t=None, y=None, w=None, X=None)

Housekeeping method that assigns names to unnamed series and canonizes their data types.

Parameters

a (pd.Series|None) – Treatment assignment of size (num_subjects,).
t (pd.Series|None) – Followup duration, size (num_subjects,).
y (pd.Series|None) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
w (pd.Series|None) – Optional subject weights
X (pd.DataFrame|None) – Baseline covariate matrix of size (num_subjects, num_features).

Returns

a, y, t, w, X

causallib.survival.survival_utils.compute_survival_from_single_hazard_curve(hazard: List, logspace: bool = False) → List

Computes survival curve from an array of point hazards. Note that trailing NaNs are supported.

Parameters

hazard (list) – list/array of point hazards
logspace (bool) – whether to compute in logspace, for numerical stability

Returns

survival at each time-step

Return type

list
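
The relation between point hazards and survival can be sketched as follows. This is a minimal illustration, assuming the standard discrete-time relation in which survival at step k is the product of (1 - hazard) over all steps up to and including k. The function name survival_from_hazard_sketch is hypothetical and the code is not the library's implementation.

```python
import math
from itertools import accumulate
from typing import List


def survival_from_hazard_sketch(hazard: List[float], logspace: bool = False) -> List[float]:
    """Hypothetical sketch: cumulative product of (1 - hazard) at each time step."""
    if logspace:
        # Sum log(1 - h) and exponentiate, which avoids multiplying many small numbers.
        # Note: a hazard of exactly 1.0 would raise ValueError in math.log1p.
        log_terms = [math.log1p(-h) for h in hazard]
        return [math.exp(total) for total in accumulate(log_terms)]
    # Plain cumulative product; a NaN hazard propagates to all later steps.
    return list(accumulate((1.0 - h for h in hazard), lambda acc, x: acc * x))


survival = survival_from_hazard_sketch([0.1, 0.2, 0.5])
survival_log = survival_from_hazard_sketch([0.1, 0.2, 0.5], logspace=True)
survival_nan = survival_from_hazard_sketch([0.1, float("nan")])
```

In this sketch a trailing NaN hazard simply yields a trailing NaN survival value, leaving the earlier steps unaffected.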

causallib.survival.survival_utils.get_person_time_df(t: pandas.core.series.Series, y: pandas.core.series.Series, a: Optional[pandas.core.series.Series] = None, w: Optional[pandas.core.series.Series] = None, X: Optional[pandas.core.frame.DataFrame] = None, return_individual_series: bool = False) → pandas.core.frame.DataFrame

Converts standard input format into an expanded person-time format. Input series need to be indexed by subject IDs and have non-null names (including index name).

Parameters

t (pd.Series) – Followup duration, size (num_subjects,).
y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
a (pd.Series) – Treatment assignment of size (num_subjects,).
w (pd.Series) – Optional subject weights
X (pd.DataFrame) – Optional baseline covariate matrix of size (num_subjects, num_features).
return_individual_series (bool) – If True, returns a tuple of Series/DataFrames instead of a single DataFrame

Returns

Expanded person-time format with columns from X and expanded ‘a’, ‘y’, ‘t’ columns

Return type

pd.DataFrame

Examples

This example standard input:

id  age  height  a  y  t
1   22   170     0  1  2
2   40   180     1  0  1
3   30   165     1  0  2

Will be expanded to:

id  age  height  a  y  t
1   22   170     0  0  0
1   22   170     0  0  1
1   22   170     0  1  2
2   40   180     1  0  0
2   40   180     1  0  1
3   30   165     1  0  0
3   30   165     1  0  1
3   30   165     1  0  2
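
The expansion in the example can be reproduced with plain pandas. The sketch below is an illustration of the person-time layout only, not the library's implementation: the helper name expand_to_person_time and its t_col and y_col arguments are hypothetical. It assumes each subject gets one row per time step from 0 up to its follow-up duration, with the outcome set to 1 only on the last row of subjects who experienced the event.

```python
import pandas as pd

# Standard input format from the example above, indexed by subject ID.
standard = pd.DataFrame(
    {"age": [22, 40, 30], "height": [170, 180, 165],
     "a": [0, 1, 1], "y": [1, 0, 0], "t": [2, 1, 2]},
    index=pd.Index([1, 2, 3], name="id"),
)


def expand_to_person_time(df, t_col="t", y_col="y"):
    """Hypothetical helper: one row per subject per time step 0..t."""
    # Repeat each subject's row (t + 1) times.
    repeats = (df[t_col] + 1).to_numpy()
    expanded = df.loc[df.index.repeat(repeats)].copy()
    # Running counter within each subject gives the time step of every row.
    step = expanded.groupby(level=0).cumcount().to_numpy()
    is_last = step == expanded[t_col].to_numpy()
    # The event indicator is kept only on the subject's final row.
    expanded[y_col] = (is_last & (expanded[y_col].to_numpy() == 1)).astype(int)
    expanded[t_col] = step
    return expanded


person_time = expand_to_person_time(standard)
```

Subject 2 is followed for one time step, so it contributes two rows, while subjects 1 and 3 contribute three rows each.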

causallib.survival.survival_utils.get_regression_predict_data(X: pandas.core.frame.DataFrame, times: pandas.core.series.Series)

Generates prediction data for a regression fitter: repeats patient covariates per time point in ‘times’.

Example

For the covariates X:

id  age  height
1   22   170
2   40   180

and the time points in ‘times’:

0
1
2

the generated prediction data is:

id  age  height  t
1   22   170     0
1   22   170     1
1   22   170     2
2   40   180     0
2   40   180     1
2   40   180     2

Parameters

X (pd.DataFrame) – Covariates DataFrame
times (pd.Series) – A Series of time points to predict

Returns

pred_data_X (pd.DataFrame) – DataFrame with repeated covariates per time point. Index is subject ID with repeats; columns are X plus a time column, which is a repeat of ‘times’ per subject.
t_name (str) – Name of the time column in pred_data_X. Default is ‘t’, but since a column is concatenated to a covariates frame, a random suffix might need to be added to it.
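
The repetition of covariates per time point can be sketched with pandas and NumPy. This is a minimal illustration rather than the library's code: the helper name repeat_covariates_per_time and its t_name argument are hypothetical, and the sketch ignores the case where X already has a column named like the time column.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    {"age": [22, 40], "height": [170, 180]},
    index=pd.Index([1, 2], name="id"),
)
times = pd.Series([0, 1, 2])


def repeat_covariates_per_time(X, times, t_name="t"):
    """Hypothetical helper: one row per (subject, time point) pair."""
    # Repeat every subject's covariate row once per time point.
    repeated = X.loc[X.index.repeat(len(times))].copy()
    # Tile the time points so that each subject receives the full sequence.
    repeated[t_name] = np.tile(times.to_numpy(), len(X))
    return repeated


pred_data = repeat_covariates_per_time(X, times)
```

The result has len(X) * len(times) rows, with the subject ID repeated in the index.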

causallib.survival.survival_utils.safe_join(df: Optional[pandas.core.frame.DataFrame] = None, list_of_series: Optional[List[pandas.core.series.Series]] = None, return_series_names=False)

Safely joins (concatenates on axis 1) a collection of Series (or one DataFrame and multiple Series), while renaming Series that have a duplicate name (a name that already exists in the DataFrame or in another Series). Note that DataFrame columns are never changed (only Series names are).

Parameters

df (pd.DataFrame) – optional DataFrame. If provided, will join Series to DataFrame
list_of_series (List[pd.Series]) – list of Series for safe-join
return_series_names (bool) – if True, returns a list of (potentially renamed) Series names

Returns

single concatenated DataFrame
list of (potentially renamed) Series names, returned only if return_series_names is True
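
The renaming idea can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the name safe_join_sketch is hypothetical, and the choice of a four-character UUID-based suffix joined with an underscore is an assumption about how a duplicate name might be made unique.

```python
from uuid import uuid4

import pandas as pd


def safe_join_sketch(df=None, list_of_series=None, return_series_names=False):
    """Hypothetical sketch: concatenate on axis 1, renaming only clashing Series."""
    # Names already taken; DataFrame columns are never renamed.
    taken = set(df.columns) if df is not None else set()
    renamed = []
    for series in list_of_series or []:
        name = series.name
        # Draw random suffixes until the name no longer clashes.
        while name in taken:
            name = f"{series.name}_{uuid4().hex[:4]}"
        taken.add(name)
        renamed.append(series.rename(name))
    parts = ([df] if df is not None else []) + renamed
    joined = pd.concat(parts, axis=1)
    names = [series.name for series in renamed]
    return (joined, names) if return_series_names else joined


base = pd.DataFrame({"t": [5, 6]}, index=[1, 2])
s_t = pd.Series([0, 1], index=[1, 2], name="t")  # clashes with the column "t"
s_y = pd.Series([1, 0], index=[1, 2], name="y")  # no clash, name is kept
joined, names = safe_join_sketch(base, [s_t, s_y], return_series_names=True)
```

Here the Series named "t" is renamed because the DataFrame already has a column "t", while "y" keeps its name.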