causallib.estimation.rlearner module
Copyright 2021 IBM Corp.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Created on April 4, 2021
- class causallib.estimation.rlearner.RLearner(effect_model, outcome_model, treatment_model, outcome_covariates=None, treatment_covariates=None, effect_covariates=None, n_splits=5, refit=True, caliper=1e-06, non_parametric=False)[source]
Bases:
causallib.estimation.base_estimator.IndividualOutcomeEstimator
Given the measured outcome Y, the treatment assignment A, and the covariates X, calculate an R-learner estimate of the treatment effect. Let e(X) be the estimated propensity score and m(X) the estimated outcome (E[Y|X]); the R-learner then minimizes the following:
||Y - m(X) - (A - e(X)) \tau(X)||_2^2 + \lambda(\tau)
where \tau(X) is the conditional average treatment effect and \lambda is a regularization term on \tau.
If the effect_model is linear, this amounts to minimizing the squared loss with target variable (Y - m(X)) and features (A - e(X))X. Otherwise it corresponds to a weighted regression problem in which the weights are (A - e(X))**2, so it can be used with any scikit-learn regressor that accepts sample weights.
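The linear and weighted forms described above are the same objective written two ways, since (Y - m(X) - (A - e(X)) tau)^2 = (A - e(X))^2 ((Y - m(X))/(A - e(X)) - tau)^2. The minimal sketch below checks this numerically for the simplest case of a constant effect. The data and the nuisance estimates `m_hat` and `e_hat` are made-up illustrative values, not output of causallib, and the pseudo-outcome (Y - m(X))/(A - e(X)) used as the weighted-regression target is the standard rewriting of the loss rather than something this page states explicitly.

```python
# Toy data. In practice m_hat and e_hat would come from fitted nuisance models.
y = [3.0, 1.0, 4.0, 2.0, 5.0, 1.5]          # observed outcomes
a = [1, 0, 1, 0, 1, 0]                       # treatment assignments
m_hat = [2.5, 1.5, 3.0, 2.5, 4.0, 2.0]       # assumed estimates of E[Y|X]
e_hat = [0.6, 0.4, 0.7, 0.3, 0.5, 0.5]       # assumed estimates of P(A=1|X)

# Residualize the outcome and the treatment.
y_res = [yi - mi for yi, mi in zip(y, m_hat)]
a_res = [ai - ei for ai, ei in zip(a, e_hat)]

# Linear form: least squares of y_res on the single feature a_res (no intercept).
tau_linear = sum(r * t for r, t in zip(y_res, a_res)) / sum(t * t for t in a_res)

# Weighted form: weighted mean of the pseudo-outcome y_res / a_res,
# with weights a_res ** 2.
pseudo_outcome = [r / t for r, t in zip(y_res, a_res)]
weights = [t ** 2 for t in a_res]
tau_weighted = sum(w * p for w, p in zip(weights, pseudo_outcome)) / sum(weights)

print(tau_linear, tau_weighted)
```

Both estimates agree up to floating-point error. The division by `a_res` in the weighted form is the reason a treatment residual close to zero needs special handling.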
References:
Nie, X., & Wager, S. (2017). Quasi-oracle estimation of heterogeneous treatment effects. https://arxiv.org/abs/1712.04912
Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. https://academic.oup.com/ectj/article/21/1/C1/5056401
- Parameters
effect_model – An sklearn model that estimates the conditional average treatment effect \tau(X).
outcome_model – An sklearn model that estimates the outcome given the covariates, Y|X (without the treatment). Note: a regressor is recommended, even for a binary outcome.
treatment_model – An sklearn model that estimates the treatment assignment or the probability of being treated, i.e. A|X or P(A=1|X).
outcome_covariates (array) – Covariates to use for the outcome model. If None, all covariates passed will be used. Either a list of column names or a boolean mask.
treatment_covariates (array) – Covariates to use for the treatment model. If None, all covariates passed will be used. Either a list of column names or a boolean mask.
effect_covariates (array) – Covariates to use for the effect model. If None, all covariates passed will be used. Either a list of column names or a boolean mask.
n_splits (int) – Number of folds (sample splits) used in the cross-fitting procedure.
refit (bool) – If True, the nuisance models are fitted over the whole training set; otherwise the nuisance models are fitted per fold.
non_parametric (bool) – If True, the effect_model is estimated as a weighted regression task; otherwise the effect_model is considered linear.
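To make the role of n_splits concrete, here is a minimal sketch of cross-fitting in plain Python. Each subject's nuisance prediction comes from a model trained only on the other folds. The helper names `k_fold_indices` and `cross_fit_mean` are hypothetical, and a toy "model" that predicts the training-fold mean stands in for the outcome_model; this illustrates the idea only and is not causallib's implementation.

```python
def k_fold_indices(n, n_splits):
    """Split range(n) into n_splits contiguous folds of near-equal size."""
    base, extra = divmod(n, n_splits)
    folds, start = [], 0
    for k in range(n_splits):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_fit_mean(values, n_splits):
    """Out-of-fold predictions from a toy model that predicts the training mean."""
    n = len(values)
    preds = [None] * n
    for test_idx in k_fold_indices(n, n_splits):
        held_out = set(test_idx)
        # "Fit" on every subject outside the current fold.
        train = [values[i] for i in range(n) if i not in held_out]
        fold_mean = sum(train) / len(train)
        # Predict only for the held-out subjects.
        for i in test_idx:
            preds[i] = fold_mean
    return preds


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m_hat = cross_fit_mean(y, n_splits=3)
print(m_hat)
```

No subject's own outcome enters its prediction, which is the property cross-fitting is meant to provide for the residuals Y - m(X) and A - e(X).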
- estimate_individual_effect(X)[source]
Predict the individual treatment effect.
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
- Returns
A vector of size (num_subjects,) containing the estimated treatment effect; each row is an individual.
- Return type
pd.Series
- estimate_individual_outcome(X, a, treatment_values=None, predict_proba=False)[source]
Estimating corrected individual counterfactual outcomes.
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba – IGNORED. Not used, present for API consistency by convention.
- Returns
A DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_subjects,) containing the estimated outcome for each individual under the treatment value given by the column key.
- Return type
pd.DataFrame
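One way to read these counterfactual outcomes is through the residualized model behind the R-learner loss, Y = m(X) + (A - e(X)) tau(X) + noise, which gives the prediction m(X) + (a - e(X)) tau(X) under treatment value a. The sketch below applies that identity to made-up values. The function name `counterfactual_outcomes` and its inputs are hypothetical, and this page does not confirm that causallib computes the result this way.

```python
def counterfactual_outcomes(m_hat, e_hat, tau_hat, treatment_values=(0, 1)):
    """Predict outcomes under each treatment value via m + (a - e) * tau."""
    return {
        t: [m + (t - e) * tau for m, e, tau in zip(m_hat, e_hat, tau_hat)]
        for t in treatment_values
    }


# Illustrative per-subject estimates (assumed, not fitted).
m_hat = [2.0, 3.0]      # E[Y|X]
e_hat = [0.5, 0.25]     # P(A=1|X)
tau_hat = [1.0, 2.0]    # conditional treatment effect

outcomes = counterfactual_outcomes(m_hat, e_hat, tau_hat)
print(outcomes)
```

Under this identity, the difference between the treated and untreated columns for each subject equals that subject's estimated effect.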
- fit(X, a, y, caliper=None)[source]
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series) – Observed outcome of size (num_subjects,).
caliper (None | float) – Minimal value of the treatment-probability residual, used to avoid division by zero when fitting the effect model. If None, no clipping is done. The caliper is irrelevant if the effect_model is linear.
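As an illustration of what such clipping might look like, the sketch below keeps the sign of each treatment residual A - e(X) and raises its magnitude to at least the caliper. The helper `clip_residual` is hypothetical, and both the sign-preserving rule and the treatment of an exact zero as positive are assumptions; the library's actual clipping rule is not specified on this page.

```python
def clip_residual(residual, caliper):
    """Push |residual| up to at least `caliper`, keeping its sign.

    Assumption: a residual of exactly zero is treated as positive.
    With caliper=None the residual is returned unchanged.
    """
    if caliper is None or abs(residual) >= caliper:
        return residual
    return caliper if residual >= 0 else -caliper


a_res = [0.3, 1e-9, -1e-9, -0.4, 0.0]
clipped = [clip_residual(r, caliper=1e-6) for r in a_res]
print(clipped)
```

After clipping, dividing the outcome residual by the treatment residual is always defined, at the cost of a small distortion for subjects whose propensity estimate almost exactly matches their treatment.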