causallib.estimation.rlearner module

Copyright 2021 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on April 4, 2021

class causallib.estimation.rlearner.RLearner(effect_model, outcome_model, treatment_model, outcome_covariates=None, treatment_covariates=None, effect_covariates=None, n_splits=5, refit=True, caliper=1e-06, non_parametric=False)[source]

Bases: causallib.estimation.base_estimator.IndividualOutcomeEstimator

Given the measured outcome Y, the treatment assignment A, and the covariates X, calculate an R-learner estimator of the treatment effect. Let e(X) be the estimated propensity score and m(X) the estimated outcome (E[Y|X]); the R-learner then minimizes the following:

||Y - m(X) - (A - e(X)) \tau(X)||^2_2 + \lambda(\tau)

where \tau(X) is the conditional average treatment effect and \lambda(\tau) is a regularization term on \tau.

If the effect_model is linear, this is equivalent to minimizing the squared loss with the target variable (Y - m(X)) and the features (A - e(X)) X. Otherwise, it corresponds to a weighted regression problem with the pseudo-outcome (Y - m(X)) / (A - e(X)) as the target and the weights (A - e(X))**2. The weighted form can be used with any scikit-learn regressor that accepts sample weights.
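
The two formulations can be compared on simulated data. The following is a minimal NumPy sketch, not the library's implementation: it assumes the true m(X) and e(X) are known, uses ordinary least squares in place of scikit-learn models, and all variable names are hypothetical. For a linear effect model, the weighted regression on the pseudo-outcome gives the same coefficients as the regression of (Y - m(X)) on (A - e(X)) X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3

# Simulated data with known nuisance functions (illustrative assumption).
X = rng.normal(size=(n, d))
e = 1.0 / (1.0 + np.exp(-X[:, 0]))        # true propensity e(X) = P(A=1|X)
A = rng.binomial(1, e).astype(float)      # binary treatment assignment
tau_coef = np.array([1.0, -2.0, 0.5])     # true linear effect: tau(X) = X @ tau_coef
tau = X @ tau_coef
mu0 = X[:, 1] ** 2                        # baseline (untreated) outcome
Y = mu0 + A * tau + rng.normal(scale=0.1, size=n)
m = mu0 + e * tau                         # true E[Y|X]

y_res = Y - m                             # outcome residual Y - m(X)
a_res = A - e                             # treatment residual A - e(X)

# Linear formulation: least squares of y_res on the features a_res * X.
beta_linear, *_ = np.linalg.lstsq(a_res[:, None] * X, y_res, rcond=None)

# Weighted formulation: pseudo-outcome y_res / a_res with weights a_res ** 2.
# Weighted least squares is solved by scaling rows with sqrt(weights).
pseudo_outcome = y_res / a_res
weights = a_res ** 2
sw = np.sqrt(weights)
beta_weighted, *_ = np.linalg.lstsq(sw[:, None] * X, sw * pseudo_outcome, rcond=None)
```

Both coefficient vectors estimate `tau_coef`. The weighted form matters when the effect model is not linear, since it only requires a regressor that accepts sample weights.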

References: Nie, X., & Wager, S. (2017). Quasi-oracle estimation of heterogeneous treatment effects. https://arxiv.org/abs/1712.04912

Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. https://academic.oup.com/ectj/article/21/1/C1/5056401

Parameters

effect_model – An sklearn model that estimates the conditional average treatment effect \tau(X).
outcome_model – An sklearn model that estimates the regression of Y on X, i.e. E[Y|X] (without the treatment). Note: it is recommended to use a regressor, even for a binary outcome.
treatment_model – An sklearn model that estimates the treatment model, i.e. A|X, or the probability of being treated, P(A=1|X).
outcome_covariates (array) – Covariates to use for the outcome model. If None - all covariates passed will be used. Either list of column names or boolean mask.
treatment_covariates (array) – Covariates to use for treatment model. If None - all covariates passed will be used. Either list of column names or boolean mask.
effect_covariates (array) – Covariates to use for the effect model. If None - all covariates passed will be used. Either list of column names or boolean mask.
n_splits (int) – Number of sample splits (folds) in the cross-fitting procedure.
refit (bool) – If True, the nuisance models are fitted over the whole training set; otherwise, the nuisance models are fitted per fold.
non_parametric (bool) – If True, the effect_model is estimated as a weighted regression task; otherwise, the effect_model is considered linear.
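
To illustrate what n_splits controls, here is a rough sketch of cross-fitting in plain NumPy. It is an assumption-laden stand-in rather than the class's actual code: `cross_fit_predictions` is a hypothetical helper, and ordinary least squares replaces the outcome_model and treatment_model. Each sample's nuisance prediction comes from a model fitted on the other folds only.

```python
import numpy as np

def cross_fit_predictions(X, target, n_splits=5, seed=0):
    """Out-of-fold least-squares predictions of `target` given X (hypothetical helper)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_splits)
    Xb = np.column_stack([np.ones(n), X])       # add an intercept column
    preds = np.full(n, np.nan)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit on the training folds, predict only on the held-out fold.
        coef, *_ = np.linalg.lstsq(Xb[train], target[train], rcond=None)
        preds[held_out] = Xb[held_out] @ coef
    return preds

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
a = rng.binomial(1, 0.5, size=500).astype(float)
y = X[:, 0] + a * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=500)

m_hat = cross_fit_predictions(X, y, n_splits=5)  # stand-in for outcome_model
e_hat = cross_fit_predictions(X, a, n_splits=5)  # stand-in for treatment_model
y_residual = y - m_hat                           # Y - m(X)
a_residual = a - e_hat                           # A - e(X)
```

The residuals `y_residual` and `a_residual` are the quantities the effect model is then fitted on.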

estimate_individual_effect(X)[source]

Predict the individual treatment effect.

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).

Returns

A vector of size (num_subjects,) containing the estimated treatment effect, one row per individual.

Return type

pd.Series

estimate_individual_outcome(X, a, treatment_values=None, predict_proba=False)[source]

Estimating corrected individual counterfactual outcomes.

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba – IGNORED. Not used, present for API consistency by convention.

Returns

A DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_samples,) containing the estimated outcome for each individual under the treatment value given by the column name.

Return type

pd.DataFrame
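
For a binary treatment, the decomposition behind the R-learner loss implies E[Y|X, A=a] = m(X) + (a - e(X)) \tau(X). The sketch below applies that identity to hypothetical, already-estimated arrays. Whether the method computes its result exactly this way is an assumption, and `counterfactual_outcomes` is an illustrative helper, not part of the library.

```python
import numpy as np

def counterfactual_outcomes(m_hat, e_hat, tau_hat, treatment_values=(0, 1)):
    """Map each treatment value a to m(X) + (a - e(X)) * tau(X) (hypothetical helper)."""
    return {a: m_hat + (a - e_hat) * tau_hat for a in treatment_values}

# Hypothetical nuisance and effect estimates for three individuals.
m_hat = np.array([1.0, 2.0, 3.0])      # estimated E[Y|X]
e_hat = np.array([0.5, 0.25, 0.8])     # estimated P(A=1|X)
tau_hat = np.array([2.0, -1.0, 0.5])   # estimated individual effect

outcomes = counterfactual_outcomes(m_hat, e_hat, tau_hat)
# By construction, the difference between the two columns is tau_hat.
effect = outcomes[1] - outcomes[0]
```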

fit(X, a, y, caliper=None)[source]

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series) – Observed outcome of size (num_subjects,).
caliper (None | float) – Minimal value of the treatment-probability residual. Used to avoid division by zero when fitting the effect model. If None, no clipping is done. The caliper is irrelevant if the effect_model is linear.
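
The role of the caliper can be sketched as follows. This is one plausible clipping rule, assumed for illustration and not taken from the library's source: residuals whose magnitude falls below the caliper are pushed out to it, keeping their sign, so that the pseudo-outcome division stays finite. `clip_residual` is a hypothetical helper.

```python
import numpy as np

def clip_residual(a_residual, caliper=1e-6):
    """Keep |a_residual| >= caliper while preserving sign (zero is treated as positive)."""
    a_residual = np.asarray(a_residual, dtype=float)
    sign = np.where(a_residual >= 0, 1.0, -1.0)
    return sign * np.maximum(np.abs(a_residual), caliper)

a_residual = np.array([0.4, -0.3, 0.0, -1e-9])   # A - e(X), including near-zero values
y_residual = np.array([1.0, 0.6, 0.2, -0.1])     # Y - m(X)

clipped = clip_residual(a_residual, caliper=1e-6)
pseudo_outcome = y_residual / clipped            # finite thanks to the clipping
weights = clipped ** 2                           # near-zero residuals get tiny weights
```

Because the weights are the squared residuals, the clipped samples contribute very little to the weighted regression even though their pseudo-outcomes are large.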

class causallib.estimation.rlearner.VotingEstimator(estimators)[source]

Bases: object

predict(X)[source]: Aggregate results of different estimators