causallib.estimation.ipw module

Copyright 2019 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on Apr 25, 2018

class causallib.estimation.ipw.IPW(learner, clip_min=None, clip_max=None, use_stabilized=False, verbose=False)

Bases: causallib.estimation.base_weight.PropensityEstimator, causallib.estimation.base_estimator.PopulationOutcomeEstimator

Causal model implementing inverse probability (propensity score) weighting: w_i = 1 / Pr[A=a_i|X_i]

Parameters
  • learner – Initialized sklearn model.

  • clip_min (None|float) – Optional lower bound on the estimated propensity, between 0 and 0.5. Probabilities below clip_min are set to clip_min.

  • clip_max (None|float) – Optional upper bound on the estimated propensity, between 0.5 and 1. Probabilities above clip_max are set to clip_max.

  • use_stabilized (bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

  • verbose (bool) – Whether to print summary statistics regarding the number of samples clipped due to clip_min and clip_max.
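
The effect of clip_min and clip_max on the weights can be illustrated with a minimal plain-Python sketch. It is not causallib's implementation; the helper clip_propensity and the sample probabilities are hypothetical and serve only to show the arithmetic.

```python
def clip_propensity(p, clip_min=None, clip_max=None):
    """Clip one probability into [clip_min, clip_max]; None disables a bound."""
    if clip_min is not None and p < clip_min:
        p = clip_min
    if clip_max is not None and p > clip_max:
        p = clip_max
    return p


# Hypothetical values of Pr[A=a_i | X_i] for the treatment each subject received.
propensities = [0.02, 0.40, 0.97, 0.60]

clipped = [clip_propensity(p, clip_min=0.05, clip_max=0.95) for p in propensities]

# w_i = 1 / Pr[A=a_i | X_i], computed on the clipped probabilities.
weights = [1.0 / p for p in clipped]

# The kind of summary that verbose=True is described as reporting.
n_clipped = sum(p != c for p, c in zip(propensities, clipped))
```

Without clipping, the first subject would receive a weight of 50; with clip_min=0.05 the weight is capped at 20, which limits the influence of extreme propensities.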

compute_propensity(X, a, treatment_values=None, clip_min=None, clip_max=None)
Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • treatment_values (Any | None) – Treatment value or values for which to extract propensities (i.e. the treatment values whose probabilities should be calculated). If not specified, the maximal treatment value is used, since the usual setting is treatment (A=1) versus control (A=0).

  • clip_min (None|float) – Optional lower bound on the estimated propensity, between 0 and 0.5. Probabilities below clip_min are set to clip_min.

  • clip_max (None|float) – Optional upper bound on the estimated propensity, between 0.5 and 1. Probabilities above clip_max are set to clip_max.

Returns

A matrix/vector with num_subjects rows and one column per value provided in treatment_values. Each entry is the probability that the individual receives the specified treatment value. If treatment_values is a list/vector, then a pd.DataFrame is returned. If treatment_values is a scalar, then a pd.Series is returned (just like slicing a DataFrame’s columns).

Return type

pd.DataFrame | pd.Series
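
The selection rule described above (default to the maximal treatment value; a scalar gives one column, a list gives several) can be sketched in plain Python. The function select_propensity and the dict-of-lists layout are hypothetical stand-ins for the DataFrame that the method is documented to return.

```python
def select_propensity(matrix, treatment_values=None):
    """matrix maps each treatment value to a list of per-subject probabilities."""
    if treatment_values is None:
        # Default described in the docs: pick the maximal treatment value.
        treatment_values = max(matrix)
    if isinstance(treatment_values, (list, tuple)):
        # Several values requested: return several columns (DataFrame-like).
        return {t: matrix[t] for t in treatment_values}
    # A scalar requested: return a single column (Series-like).
    return matrix[treatment_values]


# Hypothetical propensity matrix for three subjects and treatments 0 and 1.
propensity_matrix = {0: [0.7, 0.2, 0.5], 1: [0.3, 0.8, 0.5]}

default_column = select_propensity(propensity_matrix)       # column for A=1
both_columns = select_propensity(propensity_matrix, [0, 1])  # both columns
```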

compute_propensity_matrix(X, a=None, clip_min=None, clip_max=None)
Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • clip_min (None|float) – Optional lower bound on the estimated propensity, between 0 and 0.5. Probabilities below clip_min are set to clip_min.

  • clip_max (None|float) – Optional upper bound on the estimated propensity, between 0.5 and 1. Probabilities above clip_max are set to clip_max.

Returns

A matrix of size (num_subjects, num_treatments) with the probability for every individual and every treatment.

Return type

pd.DataFrame

compute_weight_matrix(X, a, clip_min=None, clip_max=None, use_stabilized=None)

Computes individual weights across all possible treatment values: w_ij = 1 / Pr[A=a_j|X_i] for every individual i and treatment j.

Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • clip_min (None|float) – Optional lower bound on the estimated propensity, between 0 and 0.5. Probabilities below clip_min are set to clip_min.

  • clip_max (None|float) – Optional upper bound on the estimated propensity, between 0.5 and 1. Probabilities above clip_max are set to clip_max.

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, then the prevalence is calculated from the data at hand rather than from the training data. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns

A matrix of size (num_subjects, num_treatments) with the weight for every individual and every treatment.

Return type

pd.DataFrame
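
A minimal plain-Python sketch of the weight matrix follows. It assumes that stabilization means multiplying each column by the marginal prevalence Pr[A=a_j] estimated from a; the function weight_matrix and the data are hypothetical, not causallib's code.

```python
from collections import Counter


def weight_matrix(propensity_rows, treatments, a=None, use_stabilized=False):
    """propensity_rows[i][j] holds Pr[A=treatments[j] | X_i]."""
    # w_ij = 1 / Pr[A=a_j | X_i]
    weights = [[1.0 / p for p in row] for row in propensity_rows]
    if use_stabilized:
        # Assumption: stabilized weights are Pr[A=a_j] / Pr[A=a_j | X_i],
        # with the prevalence taken from the treatment assignment a.
        counts = Counter(a)
        prevalence = [counts[t] / len(a) for t in treatments]
        weights = [[w * pr for w, pr in zip(row, prevalence)] for row in weights]
    return weights


treatments = [0, 1]
propensity_rows = [[0.8, 0.2], [0.5, 0.5], [0.25, 0.75], [0.6, 0.4]]
a = [0, 0, 1, 0]  # three controls, one treated

plain = weight_matrix(propensity_rows, treatments)
stabilized = weight_matrix(propensity_rows, treatments, a=a, use_stabilized=True)
```

Here the prevalence is 0.75 for A=0 and 0.25 for A=1, so the first subject's weights change from roughly (1.25, 5.0) to (0.9375, 1.25). Stabilization shrinks the weights of the rarer treatment the most.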

compute_weights(X, a, treatment_values=None, clip_min=None, clip_max=None, use_stabilized=None)

Computes individual weight given the individual’s treatment assignment. w_i = 1 / Pr[A=a_i|X_i] for each individual i.

Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • treatment_values (Any | None) – Treatment value or values for which to extract weights (i.e. the treatment values whose weights should be calculated). If not specified, each individual's weight is chosen according to that individual’s actual treatment assignment.

  • clip_min (None|float) – Optional lower bound on the estimated propensity, between 0 and 0.5. Probabilities below clip_min are set to clip_min.

  • clip_max (None|float) – Optional upper bound on the estimated propensity, between 0.5 and 1. Probabilities above clip_max are set to clip_max.

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, then the prevalence is calculated from the data at hand rather than from the training data. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns

If treatment_values is not supplied (None) or is a scalar, then a vector of length n_samples with a weight for each sample is returned. If treatment_values is a list/array, then a DataFrame is returned.

Return type

pd.Series | pd.DataFrame
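
When treatment_values is not supplied, each individual's weight is the entry of the weight matrix that matches the treatment actually received. A minimal plain-Python sketch, with the hypothetical helper individual_weights and made-up weights:

```python
def individual_weights(weight_rows, treatments, a):
    """Pick, for each subject, the weight of the treatment actually received."""
    column = {t: j for j, t in enumerate(treatments)}
    return [row[column[a_i]] for row, a_i in zip(weight_rows, a)]


treatments = [0, 1]
# Hypothetical weight matrix: one row per subject, one column per treatment.
weight_rows = [[1.25, 5.0], [2.0, 2.0], [4.0, 1.3], [1.6, 2.5]]
a = [0, 1, 1, 0]

w = individual_weights(weight_rows, treatments, a)  # one weight per subject
```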

estimate_population_outcome(X, a, y, w=None, treatment_values=None)

Calculates the weighted population outcome for each subgroup, stratified by treatment assignment.

Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • y (pd.Series) – Observed outcome of size (num_subjects,).

  • w (pd.Series | None) – Individual (sample) weights, used to achieve an unbiased average outcome. If not provided, they are calculated from the data.

  • treatment_values (Any) – Desired treatment value/s to stratify upon. Must be a subset of values found in a. If not supplied, calculates for all available treatment values.

Returns

Series whose index holds the treatment values and whose values are numbers: the aggregated outcome for the stratum of individuals assigned that treatment.

Return type

pd.Series[Any, float]
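
The stratified aggregation can be sketched in plain Python. This sketch assumes the aggregate is a weighted mean of y within each treatment group; the function weighted_population_outcome and the data are hypothetical and do not reproduce causallib's internals.

```python
def weighted_population_outcome(a, y, w, treatment_values=None):
    """Weighted mean of y within each treatment group (assumed aggregate)."""
    if treatment_values is None:
        treatment_values = sorted(set(a))  # all treatment values found in a
    outcomes = {}
    for t in treatment_values:
        pairs = [(y_i, w_i) for a_i, y_i, w_i in zip(a, y, w) if a_i == t]
        total_weight = sum(w_i for _, w_i in pairs)
        outcomes[t] = sum(y_i * w_i for y_i, w_i in pairs) / total_weight
    return outcomes


a = [0, 0, 1, 1]          # treatment assignment
y = [1.0, 3.0, 2.0, 6.0]  # observed outcomes
w = [1.0, 3.0, 2.0, 2.0]  # hypothetical inverse-probability weights

outcomes = weighted_population_outcome(a, y, w)
effect = outcomes[1] - outcomes[0]  # difference between the two strata
```

With these numbers the weighted outcome is 2.5 for A=0 and 4.0 for A=1, so the difference between the strata is 1.5.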

fit(X, a, y=None)

Trains a model to predict treatment assignment given the covariates: Pr[A|X].

Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • y – IGNORED.

Returns

A causal weight model with an inner learner fitted.

Return type

WeightEstimator