causallib.estimation.overlap_weights module

(C) Copyright 2021 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on Jun 09, 2021

class causallib.estimation.overlap_weights.OverlapWeights(learner, use_stabilized=False)[source]

Bases: causallib.estimation.ipw.IPW

Implementation of overlap (propensity score) weighting:

https://www.tandfonline.com/doi/full/10.1080/01621459.2016.1260466

A method to balance observed covariates between treatment groups in observational studies. It puts less importance on observations with extreme propensity scores and more emphasis on observations with central propensity scores (i.e., where the propensity scores of the treatment groups overlap).

Each unit’s weight is proportional to the probability of that unit being assigned to the opposite group: w_i = 1 - Pr[A=a_i | X_i]

This method assumes only two treatment groups exist.
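
As a rough illustration of the formula above, the following sketch computes w_i = 1 - Pr[A=a_i | X_i] for a binary treatment from precomputed propensity scores. It does not use causallib; the function name `overlap_weights` and the example scores are hypothetical and chosen only for illustration.

```python
def overlap_weights(propensity, a):
    """Return w_i = 1 - Pr[A=a_i | X_i] for a binary treatment.

    propensity: list of Pr[A=1 | X_i] (hypothetical, precomputed scores).
    a: list of observed treatment assignments (0 or 1).
    """
    weights = []
    for p, a_i in zip(propensity, a):
        # Probability of the treatment the unit actually received.
        prob_observed = p if a_i == 1 else 1.0 - p
        # The weight is the probability of the opposite group.
        weights.append(1.0 - prob_observed)
    return weights


# Illustrative values only: one extreme treated unit, one central, one extreme control.
propensity = [0.95, 0.5, 0.1]
a = [1, 1, 0]
weights = overlap_weights(propensity, a)
```

The units with extreme scores (0.95 for a treated unit, 0.1 for a control unit) receive small weights, while the unit with a score of 0.5 receives the largest weight.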

Parameters
compute_weight_matrix(X, a, clip_min=None, clip_max=None, use_stabilized=None)[source]

Computes individual weights across all possible treatment values: w_ij = 1 - Pr[A=a_j | X_i] for every individual i and treatment j.

Parameters
  • X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • clip_min (None|float) – Lower bound for propensity scores. Best left as None.

  • clip_max (None|float) – Upper bound for propensity scores. Best left as None.

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, prevalence is calculated from the data at hand rather than from the training data. See also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns

A matrix of size (num_subjects, num_treatments) with a weight for every individual and every treatment.

Return type

pd.DataFrame
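
To make the shape of the returned matrix concrete, here is a minimal sketch of w_ij = 1 - Pr[A=a_j | X_i] using plain Python lists rather than the library's pd.DataFrame. The helper `overlap_weight_matrix` and the probability values are hypothetical; in practice the probabilities would come from the fitted learner.

```python
def overlap_weight_matrix(prob_matrix):
    """Return w_ij = 1 - Pr[A=a_j | X_i] for every subject i and treatment j.

    prob_matrix: one row per subject, one column per treatment value
    (hypothetical, precomputed treatment probabilities).
    """
    return [[1.0 - p for p in row] for row in prob_matrix]


# Three subjects, two treatments (columns: treatment 0, treatment 1).
prob_matrix = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
weight_matrix = overlap_weight_matrix(prob_matrix)

# Picking the column of the observed treatment gives each subject's own weight.
a = [1, 0, 0]
observed_weights = [row[a_i] for row, a_i in zip(weight_matrix, a)]
```

With only two treatments, each row of the weight matrix is the corresponding probability row with its two entries swapped, which matches the description of weighting by the probability of the opposite group.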

stabilize_weights(a, weight_matrix, use_stabilized=False)[source]

Adjust sample weights according to class prevalence: Pr[A=a_i] * w_i

Parameters
  • a (pd.Series) – Treatment assignment of size (num_subjects,).

  • weight_matrix (pd.DataFrame) – Weight matrix of size (num_subjects, num_treatments).

  • use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, prevalence is calculated from the data at hand rather than from the training data. See also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title

Returns

A matrix of size (num_subjects, num_treatments) with the weight (stabilized if use_stabilized is True) for every individual and every treatment.

Return type

pd.DataFrame
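
The adjustment Pr[A=a_i] * w_i can be sketched as follows, using plain Python instead of pandas. This is an illustration under stated assumptions, not the library's implementation: it assumes prevalence is the empirical frequency of each treatment value in `a`, that the matrix columns are ordered by sorted treatment value, and that each column j is scaled by the prevalence of treatment j. The helper name `stabilize` is hypothetical.

```python
from collections import Counter


def stabilize(a, weight_matrix):
    """Scale column j of weight_matrix by the empirical prevalence Pr[A=a_j].

    Assumes columns are ordered by sorted treatment value (an assumption
    made for this sketch only).
    """
    n = len(a)
    prevalence = {t: count / n for t, count in Counter(a).items()}
    treatments = sorted(prevalence)
    return [
        [prevalence[t] * row[j] for j, t in enumerate(treatments)]
        for row in weight_matrix
    ]


# Four subjects: three received treatment 0, one received treatment 1.
a = [0, 0, 0, 1]
weight_matrix = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9], [0.4, 0.6]]
stabilized = stabilize(a, weight_matrix)
```

Here the prevalence is 0.75 for treatment 0 and 0.25 for treatment 1, so the first column is multiplied by 0.75 and the second by 0.25.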