causallib.estimation.overlap_weights module
Copyright 2021 IBM Corp.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Created on Jun 09, 2021
- class causallib.estimation.overlap_weights.OverlapWeights(learner, use_stabilized=False)
Bases: causallib.estimation.ipw.IPW
Implementation of overlap (propensity score) weighting:
https://www.tandfonline.com/doi/full/10.1080/01621459.2016.1260466
A method to balance observed covariates between treatment groups in observational studies. It down-weights observations with extreme propensity scores and up-weights observations with central (i.e. overlapping) propensity scores.
Each unit’s weight is proportional to the probability of that unit being assigned to the opposite group: w_i = 1 - Pr[A=a_i | X_i]
This method assumes only two treatment groups exist.
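The weight formula above can be illustrated with a minimal sketch in plain Python. It does not use causallib: the function name `overlap_weights` and the input `propensity` (taken to hold Pr[A=1 | X_i] for each unit) are assumptions made for this illustration only, and treatments are assumed to be coded 0/1.

```python
def overlap_weights(propensity, a):
    """Return w_i = 1 - Pr[A=a_i | X_i] for a binary treatment.

    propensity: iterable of Pr[A=1 | X_i] (hypothetical input name).
    a: iterable of observed treatments, each 0 or 1.
    """
    weights = []
    for e, a_i in zip(propensity, a):
        if a_i not in (0, 1):
            raise ValueError("only two treatment groups (0/1) are supported")
        # Probability of the treatment the unit actually received.
        p_own = e if a_i == 1 else 1.0 - e
        # The weight is the probability of the opposite group.
        weights.append(1.0 - p_own)
    return weights


propensity = [0.9, 0.5, 0.1, 0.5]
a = [1, 1, 0, 0]
weights = overlap_weights(propensity, a)
```

Units 0 and 2 received the treatment their covariates strongly predicted, so they get small weights; units 1 and 3 sit in the region of overlap and get the largest weight a unit can receive (0.5).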
- Parameters
learner – Initialized sklearn model.
use_stabilized (bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title
- compute_weight_matrix(X, a, clip_min=None, clip_max=None, use_stabilized=None)
Computes individual weights across all possible treatment values: w_ij = 1 - Pr[A=a_j | X_i] for every individual i and treatment j.
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
clip_min (None|float) – Lower bound for propensity scores. Best left as None.
clip_max (None|float) – Upper bound for propensity scores. Best left as None.
use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, prevalence is calculated from the data at hand rather than from the training data. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title
- Returns
A matrix of size (num_subjects, num_treatments) with a weight for every individual and every treatment.
- Return type
pd.DataFrame
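As a rough illustration of the matrix form w_ij = 1 - Pr[A=a_j | X_i], the sketch below builds the two-column weight matrix for a binary treatment from plain Python lists. It is not the library's implementation: `overlap_weight_matrix` and `propensity` (Pr[A=1 | X_i]) are hypothetical names, and applying the clipping bounds to Pr[A=1 | X_i] before deriving Pr[A=0 | X_i] is an assumption about how clip_min and clip_max act.

```python
def overlap_weight_matrix(propensity, clip_min=None, clip_max=None):
    """Return rows [w_i0, w_i1] with w_ij = 1 - Pr[A=j | X_i].

    propensity: iterable of Pr[A=1 | X_i] (hypothetical input name).
    clip_min, clip_max: optional bounds applied to the propensity
    scores before the weights are computed (assumed behaviour).
    """
    rows = []
    for e in propensity:
        if clip_min is not None:
            e = max(e, clip_min)
        if clip_max is not None:
            e = min(e, clip_max)
        # Column 0: 1 - Pr[A=0 | X_i] = e
        # Column 1: 1 - Pr[A=1 | X_i] = 1 - e
        rows.append([e, 1.0 - e])
    return rows


propensity = [0.9, 0.5, 0.02]
matrix = overlap_weight_matrix(propensity, clip_min=0.05)

# Pick, for each unit, the column of the treatment it actually received.
a = [1, 0, 0]
own_weights = [row[a_i] for row, a_i in zip(matrix, a)]
```

Because overlap weights are bounded between 0 and 1 by construction, they cannot blow up the way inverse-probability weights can, which is consistent with the advice to leave both clipping bounds as None.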
- stabilize_weights(a, weight_matrix, use_stabilized=False)
Adjust sample weights according to class prevalence: Pr[A=a_i] * w_i
- Parameters
a (pd.Series) – Treatment assignment of size (num_subjects,).
weight_matrix (pd.DataFrame) – Weight matrix of size (num_subjects, num_treatments).
use_stabilized (None|bool) – Whether to re-weigh the learned weights with the prevalence of the treatment. This overrides the use_stabilized parameter provided at initialization. If True is provided but the model was initialized with use_stabilized=False, prevalence is calculated from the data at hand rather than from the training data. See Also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351790/#S6title
- Returns
A matrix of size (num_subjects, num_treatments) with the stabilized (if use_stabilized is True) weight for every individual and every treatment.
- Return type
pd.DataFrame
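The stabilization step Pr[A=a_i] * w_i can be sketched as a column-wise rescaling of the weight matrix by treatment prevalence. This is a plain-Python approximation under stated assumptions, not the library's code: the function name `stabilize` is hypothetical, prevalence is estimated from the `a` passed in, and the matrix columns are assumed to follow the sorted order of the treatment values.

```python
from collections import Counter


def stabilize(a, weight_matrix, use_stabilized=False):
    """Multiply column j of weight_matrix by the prevalence Pr[A=a_j].

    a: list of observed treatment values.
    weight_matrix: list of rows, one weight per treatment value,
    with columns assumed to be in sorted treatment order.
    """
    if not use_stabilized:
        # Return an unchanged copy of the matrix.
        return [list(row) for row in weight_matrix]
    counts = Counter(a)
    n = len(a)
    # Prevalence of each treatment value, in sorted (column) order.
    prevalence = [counts[t] / n for t in sorted(counts)]
    return [[p * w for p, w in zip(prevalence, row)] for row in weight_matrix]


a = [1, 1, 1, 0]
weight_matrix = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.2, 0.8]]
stabilized = stabilize(a, weight_matrix, use_stabilized=True)
unchanged = stabilize(a, weight_matrix, use_stabilized=False)
```

Here three of four units are treated, so the treatment-1 column is scaled by 0.75 and the treatment-0 column by 0.25.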