causallib.estimation.ipw.IPW

class IPW(learner, clip_min=None, clip_max=None, use_stabilized=False, verbose=False)

Causal model implementing inverse probability (propensity score) weighting: w_i = 1 / Pr[A=a_i|X_i].

Parameters:
  • learner – Initialized sklearn model.

  • clip_min (None|float) – Optional value between 0 and 0.5 that lower-bounds the propensity estimate. Probabilities below clip_min are clipped to this value.

  • clip_max (None|float) – Optional value between 0.5 and 1 that upper-bounds the propensity estimate. Probabilities above clip_max are clipped to this value.

  • use_stabilized (bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

  • verbose (bool) – Whether to print summary statistics regarding the number of samples clipped due to clip_min and clip_max.
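
The weighting formula and the clipping parameters above can be sketched in plain Python. This is only an illustration, not causallib's implementation: the function name ipw_weights and the toy propensities are hypothetical, and the sketch assumes a binary treatment where propensity[i] is an already estimated Pr[A=1|X_i].

```python
def ipw_weights(propensity, a, clip_min=None, clip_max=None):
    """Return w_i = 1 / Pr[A=a_i | X_i] for a binary treatment.

    propensity[i] is an assumed, already estimated Pr[A=1 | X_i].
    """
    weights = []
    for e, a_i in zip(propensity, a):
        # Probability of the treatment the individual actually received.
        p = e if a_i == 1 else 1.0 - e
        # Clip extreme probabilities so that no single weight explodes.
        if clip_min is not None:
            p = max(p, clip_min)
        if clip_max is not None:
            p = min(p, clip_max)
        weights.append(1.0 / p)
    return weights


propensity = [0.8, 0.5, 0.01, 0.25]  # toy Pr[A=1 | X_i]
a = [1, 0, 1, 0]                     # observed treatment assignment

raw = ipw_weights(propensity, a)
clipped = ipw_weights(propensity, a, clip_min=0.05, clip_max=0.95)
```

In this toy data the third individual was treated despite a propensity of 0.01, so the raw weight is 100; with clip_min=0.05 it is capped at 20.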

fit(X, a, y=None)

Trains a model to predict treatment assignment given the covariates: Pr[A|X].

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • y – IGNORED.

Returns:

A causal weight model with an inner learner fitted.

Return type:

WeightEstimator

compute_weights(X, a, treatment_values=None, clip_min=None, clip_max=None, use_stabilized=None)

Computes each individual’s weight given that individual’s treatment assignment: w_i = 1 / Pr[A=a_i|X_i] for each individual i.

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • treatment_values (Any | None) – Desired treatment value(s) for which to compute weights. If not specified, each individual’s weight is computed for their actual treatment assignment.

  • clip_min (None|float) – Optional value between 0 and 0.5 that lower-bounds the propensity estimate. Probabilities below clip_min are clipped to this value.

  • clip_max (None|float) – Optional value between 0.5 and 1 that upper-bounds the propensity estimate. Probabilities above clip_max are clipped to this value.

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, the prevalence is calculated from the data at hand rather than from the training data. See also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns:

If treatment_values is not supplied (None) or is a scalar, a vector of n_samples with a weight for each sample is returned. If treatment_values is a list/array, a DataFrame is returned.

Return type:

pandas.Series | pandas.DataFrame
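
The use_stabilized option can be illustrated with a short sketch. It assumes that stabilization multiplies each inverse-propensity weight by the marginal prevalence of the received treatment, w_i = Pr[A=a_i] / Pr[A=a_i|X_i], with the prevalence taken from the data at hand. The helper stabilized_weights and its inputs are hypothetical, and causallib's internals may differ.

```python
from collections import Counter


def stabilized_weights(prob_received, a):
    """Return Pr[A=a_i] / Pr[A=a_i | X_i] for each individual.

    prob_received[i] is an assumed, already estimated Pr[A=a_i | X_i].
    """
    n = len(a)
    # Marginal prevalence Pr[A=t] of each treatment value in the data at hand.
    prevalence = {t: count / n for t, count in Counter(a).items()}
    return [prevalence[a_i] / p for a_i, p in zip(a, prob_received)]


a = [1, 1, 0, 0, 0]
prob_received = [0.5, 0.25, 0.75, 0.5, 0.6]

unstabilized = [1.0 / p for p in prob_received]
stabilized = stabilized_weights(prob_received, a)
```

Because every weight is multiplied by a prevalence below 1, the stabilized weights in this example are smaller in scale than the raw ones.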

compute_weight_matrix(X, a, clip_min=None, clip_max=None, use_stabilized=None)

Computes individual weights across all possible treatment values: w_ij = 1 / Pr[A=a_j | X_i] for every individual i and treatment value j.

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • clip_min (None|float) – Optional value between 0 and 0.5 that lower-bounds the propensity estimate. Probabilities below clip_min are clipped to this value.

  • clip_max (None|float) – Optional value between 0.5 and 1 that upper-bounds the propensity estimate. Probabilities above clip_max are clipped to this value.

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, the prevalence is calculated from the data at hand rather than from the training data. See also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns:

A matrix of size (num_subjects, num_treatments) with a weight for every individual and every treatment.

Return type:

pandas.DataFrame
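
The matrix form w_ij = 1 / Pr[A=a_j | X_i] can be sketched in plain Python. The propensity rows below are made-up values and the helper weight_matrix is hypothetical; the sketch only illustrates the element-wise inversion and how the per-individual weights relate to the matrix, not causallib's code.

```python
def weight_matrix(propensity_matrix):
    """Invert every entry of a propensity matrix.

    propensity_matrix[i][t] is an assumed Pr[A=t | X_i]; each row sums to 1.
    """
    return [
        {t: 1.0 / p for t, p in row.items()}
        for row in propensity_matrix
    ]


propensity_matrix = [
    {0: 0.2, 1: 0.8},
    {0: 0.5, 1: 0.5},
    {0: 0.9, 1: 0.1},
]
a = [1, 0, 1]  # observed treatment assignment

W = weight_matrix(propensity_matrix)
# Picking, in each row, the column of the received treatment gives the
# per-individual weight vector w_i = 1 / Pr[A=a_i | X_i].
w = [row[a_i] for row, a_i in zip(W, a)]
```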

compute_propensity(X, a, treatment_values=None, clip_min=None, clip_max=None)
Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • treatment_values (Any | None) – Desired treatment value(s) for which to compute the propensity. If not specified, the maximal treatment value is used, since the usual setting is treatment (A=1) versus control (A=0).

  • clip_min (None|float) – Optional value between 0 and 0.5 that lower-bounds the propensity estimate. Probabilities below clip_min are clipped to this value.

  • clip_max (None|float) – Optional value between 0.5 and 1 that upper-bounds the propensity estimate. Probabilities above clip_max are clipped to this value.

Returns:

A matrix or vector with num_subjects rows and one column per value provided in treatment_values. Each entry is the probability that the individual receives the specified treatment value. If treatment_values is a list/vector, a pandas.DataFrame is returned; if it is a scalar, a pandas.Series is returned (just like slicing a DataFrame’s columns).

Return type:

pandas.DataFrame | pandas.Series
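
The way treatment_values shapes the result resembles column selection on a propensity matrix. The sketch below uses a hypothetical helper select_propensity and toy probabilities, with plain dicts and lists standing in for a DataFrame and a Series. It assumes the default is the largest treatment value, as described above.

```python
def select_propensity(propensity_matrix, treatment_values=None):
    """Select propensity columns by treatment value.

    propensity_matrix maps each treatment value t to a column of
    assumed, already estimated probabilities Pr[A=t | X_i].
    """
    if treatment_values is None:
        # Default to the maximal treatment value (A=1 in a 0/1 setting).
        treatment_values = max(propensity_matrix)
    if isinstance(treatment_values, (list, tuple)):
        # Several values: return one column per requested value.
        return {t: propensity_matrix[t] for t in treatment_values}
    # A single value: return just that column as a flat vector.
    return propensity_matrix[treatment_values]


propensity_matrix = {0: [0.2, 0.5, 0.9], 1: [0.8, 0.5, 0.1]}

default = select_propensity(propensity_matrix)
control = select_propensity(propensity_matrix, treatment_values=0)
both = select_propensity(propensity_matrix, treatment_values=[0, 1])
```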

compute_propensity_matrix(X, a=None, clip_min=None, clip_max=None)
Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • clip_min (None|float) – Optional value between 0 and 0.5 that lower-bounds the propensity estimate. Probabilities below clip_min are clipped to this value.

  • clip_max (None|float) – Optional value between 0.5 and 1 that upper-bounds the propensity estimate. Probabilities above clip_max are clipped to this value.

Returns:

A matrix of size (num_subjects, num_treatments) with the probability for every individual and every treatment.

Return type:

pandas.DataFrame

estimate_population_outcome(X, a, y, w=None, treatment_values=None)

Calculates weighted population outcome for each subgroup stratified by treatment assignment.

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • y (pandas.Series) – Observed outcome of size (num_subjects,).

  • w (pandas.Series | None) – Individual (sample) weights. Used to obtain an unbiased average outcome. If not provided, they are calculated from the data.

  • treatment_values (Any) – Desired treatment value(s) to stratify upon. Must be a subset of the values found in a. If not supplied, the outcome is calculated for all available treatment values.

Returns:

A Series whose index holds the treatment values and whose values are numbers: the aggregated outcome for the stratum of individuals whose assigned treatment is that index key.

Return type:

pandas.Series[Any, float]
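
The weighted, treatment-stratified outcome can be sketched as a weighted mean of y within each treatment group. This assumes a normalized weighted average (sum of w_i * y_i divided by sum of w_i); the helper weighted_outcome_by_treatment and the data are hypothetical and are not taken from causallib's implementation.

```python
def weighted_outcome_by_treatment(a, y, w, treatment_values=None):
    """Weighted mean of y within each treatment group."""
    if treatment_values is None:
        # Default to every treatment value observed in a.
        treatment_values = sorted(set(a))
    outcomes = {}
    for t in treatment_values:
        # Keep only individuals whose assigned treatment is t.
        pairs = [(w_i, y_i) for a_i, y_i, w_i in zip(a, y, w) if a_i == t]
        total_weight = sum(w_i for w_i, _ in pairs)
        outcomes[t] = sum(w_i * y_i for w_i, y_i in pairs) / total_weight
    return outcomes


a = [1, 1, 0, 0]              # treatment assignment
y = [10.0, 20.0, 4.0, 8.0]    # observed outcome
w = [1.0, 3.0, 2.0, 2.0]      # toy sample weights

outcomes = weighted_outcome_by_treatment(a, y, w)
# Difference between the weighted outcomes of the two groups.
effect = outcomes[1] - outcomes[0]
```

With these toy numbers the treated group averages 17.5 and the control group 6.0, a difference of 11.5.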

set_fit_request(*, a='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

a (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for a parameter in fit.

Returns:

self – The updated object.

Return type:

object