causallib.survival.survival_utils module
- causallib.survival.survival_utils.add_random_suffix(name, suffix_length=4)[source]
Adds a random suffix to a string, taken from the hex representation of a random UUID.
- Parameters
name – input string
suffix_length – length of desired added suffix.
- Returns
string with suffix
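As an illustration only, suffixing of this kind can be sketched with the standard-library `uuid` module. The helper name `random_suffix_sketch` and the `_` separator are assumptions made for this example; they are not taken from the causallib source.

```python
from uuid import uuid4


def random_suffix_sketch(name, suffix_length=4):
    """Hypothetical stand-in for illustration, not causallib's implementation."""
    # uuid4().hex is a 32-character lowercase hex string; keep its first characters.
    # The "_" separator is an assumption of this sketch.
    return name + "_" + uuid4().hex[:suffix_length]


renamed = random_suffix_sketch("t")
print(renamed)  # the suffix differs on every call
```

A short random suffix like this makes a name collision unlikely, which is useful when a generated column such as a time column is appended to a user-supplied covariate frame.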
- causallib.survival.survival_utils.canonize_dtypes_and_names(a=None, t=None, y=None, w=None, X=None)[source]
Housekeeping method that assigns names to unnamed series and canonizes their data types.
- Parameters
a (pd.Series|None) – Treatment assignment of size (num_subjects,).
t (pd.Series|None) – Followup duration, size (num_subjects,).
y (pd.Series|None) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
w (pd.Series|None) – Optional subject weights
X (pd.DataFrame|None) – Baseline covariate matrix of size (num_subjects, num_features).
- Returns
a, y, t, w, X
- causallib.survival.survival_utils.compute_survival_from_single_hazard_curve(hazard: List, logspace: bool = False) List [source]
Computes survival curve from an array of point hazards. Note that trailing NaNs are supported.
- Parameters
hazard (list) – list/array of point hazards
logspace (bool) – whether to compute in logspace, for numerical stability
- Returns
survival at each time-step
- Return type
List
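The computation can be sketched in plain Python under the usual discrete-time assumption that survival at step k is the product of (1 - hazard) over all steps up to and including k. The function name `survival_from_hazard_sketch` is hypothetical, and the library's own implementation may differ in details such as how the first time step is handled.

```python
import math


def survival_from_hazard_sketch(hazard, logspace=False):
    """Illustrative sketch: S_k = prod_{j <= k} (1 - h_j). Not causallib's code."""
    survival = []
    running = 0.0 if logspace else 1.0
    for h in hazard:
        if logspace:
            # Sum log(1 - h) instead of multiplying, then exponentiate.
            # log(1 - h) is undefined at h == 1, so that case is mapped to -inf.
            running += -math.inf if h >= 1.0 else math.log(1.0 - h)
            survival.append(math.exp(running))
        else:
            running *= 1.0 - h
            survival.append(running)
    return survival


# A trailing NaN hazard simply propagates to a trailing NaN survival value.
curve = survival_from_hazard_sketch([0.1, 0.2, 0.5, float("nan")])
print(curve)  # roughly [0.9, 0.72, 0.36, nan]
```

Summing logarithms avoids multiplying many numbers close to 1 or close to 0, which is where the direct product tends to lose precision on long follow-up periods.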
- causallib.survival.survival_utils.get_person_time_df(t: pandas.core.series.Series, y: pandas.core.series.Series, a: Optional[pandas.core.series.Series] = None, w: Optional[pandas.core.series.Series] = None, X: Optional[pandas.core.frame.DataFrame] = None, return_individual_series: bool = False) pandas.core.frame.DataFrame [source]
Converts standard input format into an expanded person-time format. Input series need to be indexed by subject IDs and have non-null names (including index name).
- Parameters
t (pd.Series) – Followup duration, size (num_subjects,).
y (pd.Series) – Observed outcome (1) or right censoring event (0), size (num_subjects,).
a (pd.Series) – Treatment assignment of size (num_subjects,).
w (pd.Series) – Optional subject weights
X (pd.DataFrame) – Optional baseline covariate matrix of size (num_subjects, num_features).
return_individual_series (bool) – If True, returns a tuple of Series/DataFrames instead of a single DataFrame
- Returns
Expanded person-time format with columns from X and expanded ‘a’, ‘y’, ‘t’ columns
- Return type
pd.DataFrame
Examples
This example standard input:
    age  height  a  y  t
id
1    22     170  0  1  2
2    40     180  1  0  1
3    30     165  1  0  2
Will be expanded to:
    age  height  a  y  t
id
1    22     170  0  0  0
1    22     170  0  0  1
1    22     170  0  1  2
2    40     180  1  0  0
2    40     180  1  0  1
3    30     165  1  0  0
3    30     165  1  0  1
3    30     165  1  0  2
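The expansion rule implied by the example can be sketched in plain Python: each subject gets one row per time step from 0 up to the follow-up duration, and the observed outcome is attached only to the last row. The function name `expand_person_time_sketch` and the list-of-dicts representation are assumptions for illustration (integer durations are assumed as well); the library itself works on pandas objects.

```python
def expand_person_time_sketch(rows):
    """Illustrative sketch of person-time expansion. Not causallib's code.

    rows: list of dicts with keys "id", "a", "y", "t" plus any covariates.
    """
    expanded = []
    for row in rows:
        for step in range(row["t"] + 1):
            new_row = dict(row)  # covariates and treatment are repeated as-is
            new_row["t"] = step
            # Only the final interval carries the observed outcome;
            # earlier intervals are event-free by construction.
            new_row["y"] = row["y"] if step == row["t"] else 0
            expanded.append(new_row)
    return expanded


subjects = [
    {"id": 1, "age": 22, "height": 170, "a": 0, "y": 1, "t": 2},
    {"id": 2, "age": 40, "height": 180, "a": 1, "y": 0, "t": 1},
    {"id": 3, "age": 30, "height": 165, "a": 1, "y": 0, "t": 2},
]
person_time = expand_person_time_sketch(subjects)
for r in person_time:
    print(r["id"], r["age"], r["height"], r["a"], r["y"], r["t"])
```

With the three subjects from the example above, this produces eight rows: three for subject 1, two for subject 2, and three for subject 3.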
- causallib.survival.survival_utils.get_regression_predict_data(X: pandas.core.frame.DataFrame, times: pandas.core.series.Series)[source]
Generates prediction data for a regression fitter: repeats patient covariates per time point in ‘times’.
Example
Input X:
    age  height
id
1    22     170
2    40     180
Input times:
0  1  2
Output:
    age  height  t
id
1    22     170  0
1    22     170  1
1    22     170  2
2    40     180  0
2    40     180  1
2    40     180  2
- Parameters
X (pd.DataFrame) – Covariates DataFrame
times (pd.Series) – A Series of time points to predict
- Returns
pred_data_X (pd.DataFrame) – DataFrame with repeated covariates per time point. Index is subject ID with repeats; columns are those of X plus a time column, which is a repeat of ‘times’ per subject.
t_name (str) – Name of the time column in pred_data_X. Default is ‘t’, but since a column is concatenated to a covariates frame, a random suffix may need to be added to it.
- Return type
tuple of (pd.DataFrame, str)
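The repetition described here is a cross product of subjects and time points. A minimal plain-Python sketch follows; the function name `repeat_covariates_sketch` and the dict-based representation are assumptions for illustration, and the sketch does not attempt the random-suffix handling of the time column name.

```python
from itertools import product


def repeat_covariates_sketch(covariates, times, t_name="t"):
    """Illustrative sketch: one row per (subject, time point). Not causallib's code.

    covariates: dict mapping subject ID to a dict of covariate values.
    times: iterable of time points to predict.
    """
    rows = []
    # product() varies the last iterable fastest, so each subject's rows
    # stay together and run through all time points in order.
    for (subject_id, values), time_point in product(covariates.items(), times):
        rows.append({"id": subject_id, **values, t_name: time_point})
    return rows


X_sketch = {1: {"age": 22, "height": 170}, 2: {"age": 40, "height": 180}}
pred_rows = repeat_covariates_sketch(X_sketch, [0, 1, 2])
for r in pred_rows:
    print(r["id"], r["age"], r["height"], r["t"])
```

Two subjects and three time points give six rows, matching the shape of the example output.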
- causallib.survival.survival_utils.safe_join(df: Optional[pandas.core.frame.DataFrame] = None, list_of_series: Optional[List[pandas.core.series.Series]] = None, return_series_names=False)[source]
Safely joins (concatenates on axis 1) a collection of Series (or one DataFrame and multiple Series), while renaming Series that have a duplicate name (a name that already exists in DataFrame or another Series).
Note that DataFrame columns are never changed (only Series names are).
- Parameters
df (pd.DataFrame) – optional DataFrame. If provided, will join Series to DataFrame
list_of_series (List[pd.Series]) – list of Series for safe-join
return_series_names (bool) – if True, returns a list of (potentially renamed) Series names
- Returns
single concatenated DataFrame
list of (potentially renamed) Series names (only if return_series_names is True)
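The renaming rule can be sketched independently of pandas: DataFrame column names are treated as fixed, and each Series name that collides with a column or with an earlier Series gets a new name. The function name `dedupe_series_names_sketch` is hypothetical, and the use of a random UUID-based suffix for the new name is an assumption of this sketch rather than a documented detail of `safe_join`.

```python
from uuid import uuid4


def dedupe_series_names_sketch(column_names, series_names, suffix_length=4):
    """Illustrative sketch of collision-free renaming. Not causallib's code."""
    taken = set(column_names)  # DataFrame columns are never renamed
    result = []
    for name in series_names:
        new_name = name
        # Keep drawing suffixes until the name is unused (assumed strategy).
        while new_name in taken:
            new_name = name + "_" + uuid4().hex[:suffix_length]
        taken.add(new_name)
        result.append(new_name)
    return result


names = dedupe_series_names_sketch(["age", "t"], ["a", "t", "a"])
print(names)  # "a" is kept once; the colliding "t" and second "a" get suffixes
```

Returning the final names is what makes an option like return_series_names useful: the caller can still locate each Series in the joined frame after it has been renamed.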