causallib.estimation.rlearner.RLearner
- class RLearner(effect_model, outcome_model, treatment_model, outcome_covariates=None, treatment_covariates=None, effect_covariates=None, n_splits=5, refit=True, caliper=1e-06, non_parametric=False)
Given the measured outcome Y, the treatment assignment A, and the covariates X, calculate an R-learner estimator of the treatment effect. Let e(X) be the estimated propensity score and m(X) the estimated conditional mean outcome E[Y|X]. The R-learner then minimizes the following:
||Y - m(X) - (A - e(X)) \tau(X)||^2_2 + \lambda(\tau)
where \tau(X) is the conditional average treatment effect and \lambda(\tau) is a regularization term on \tau.
If the effect_model is linear, this amounts to minimizing squared loss with the target variable (Y - m(X)) and the features (A - e(X))X. Otherwise, it corresponds to a weighted regression problem with target (Y - m(X)) / (A - e(X)) and weights (A - e(X))**2. The weighted form can be used with any scikit-learn regressor that accepts sample weights.
References:
Nie, X., & Wager, S. (2017). Quasi-oracle estimation of heterogeneous treatment effects. https://arxiv.org/abs/1712.04912
Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. https://academic.oup.com/ectj/article/21/1/C1/5056401
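The two regression formulations described above can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation: it assumes synthetic data, a linear effect \tau(X) = X @ beta, and oracle (true) values for e(X) and m(X) in place of fitted nuisance models. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (hypothetical): an intercept column plus two covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = 1.0 / (1.0 + np.exp(-X[:, 1]))       # true propensity score e(X)
A = rng.binomial(1, e)                   # binary treatment assignment
true_beta = np.array([1.0, 2.0, 0.0])    # tau(X) = X @ true_beta
tau = X @ true_beta
baseline = X[:, 1] + X[:, 2]
Y = baseline + A * tau + rng.normal(scale=0.1, size=n)
m = baseline + e * tau                   # true conditional mean E[Y|X]

y_res = Y - m                            # outcome residual Y - m(X)
a_res = A - e                            # treatment residual A - e(X)

# (1) Linear effect model: ordinary least squares of (Y - m(X))
#     on the features (A - e(X)) X.
beta_linear, *_ = np.linalg.lstsq(a_res[:, None] * X, y_res, rcond=None)

# (2) Weighted regression: target (Y - m(X)) / (A - e(X)),
#     weights (A - e(X))**2, solved via the weighted normal equations.
w = a_res ** 2
z = y_res / a_res
beta_weighted = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

print(beta_linear)
print(beta_weighted)
```

For a linear effect model the two formulations give the same coefficients, because the weight (A - e(X))**2 cancels the division in the pseudo-outcome. The weighted form matters when the effect model is not linear.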
- Parameters:
effect_model – An sklearn model that estimates the conditional average treatment effect \tau(X).
outcome_model – An sklearn model that estimates the regression of Y on X (without the treatment). Note: using a regressor is recommended, even for a binary outcome.
treatment_model – An sklearn model that estimates the treatment assignment or the probability of being treated, i.e. A|X or P(A=1|X).
outcome_covariates (numpy.ndarray) – Covariates to use for the outcome model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
treatment_covariates (numpy.ndarray) – Covariates to use for the treatment model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
effect_covariates (numpy.ndarray) – Covariates to use for the effect model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
n_splits (int) – Number of sample splits in the cross-fitting procedure.
refit (bool) – If True, nuisance models are fitted over the whole training set; otherwise, nuisance models are fitted per fold.
non_parametric (bool) – If True, the effect_model is fitted as a weighted regression task; otherwise, the effect_model is assumed to be linear.
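To illustrate what n_splits controls, here is a generic cross-fitting sketch in NumPy: each sample's nuisance prediction comes from a model trained on the other folds. The helper `cross_fit_predictions` and the least-squares outcome model are hypothetical stand-ins. How the library splits the data internally is not shown in this reference, so treat this as an assumption-laden illustration.

```python
import numpy as np

def cross_fit_predictions(X, y, n_splits=5, seed=0):
    """Out-of-fold predictions from a least-squares model (illustrative helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    preds = np.full(len(y), np.nan)
    for test_idx in folds:
        # Train on every sample that is not in the held-out fold.
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Predict only for the held-out fold.
        preds[test_idx] = X[test_idx] @ coef
    return preds

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
y = X @ np.array([0.5, 1.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=200)

m_hat = cross_fit_predictions(X, y, n_splits=5)   # plays the role of m(X)
y_residual = y - m_hat                            # outcome residual Y - m(X)
```

Predicting out of fold keeps a flexible nuisance model from absorbing part of the treatment effect on the samples it was trained on.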
- __init__(effect_model, outcome_model, treatment_model, outcome_covariates=None, treatment_covariates=None, effect_covariates=None, n_splits=5, refit=True, caliper=1e-06, non_parametric=False)
- Parameters:
effect_model – An sklearn model that estimates the conditional average treatment effect \tau(X).
outcome_model – An sklearn model that estimates the regression of Y on X (without the treatment). Note: using a regressor is recommended, even for a binary outcome.
treatment_model – An sklearn model that estimates the treatment assignment or the probability of being treated, i.e. A|X or P(A=1|X).
outcome_covariates (numpy.ndarray) – Covariates to use for the outcome model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
treatment_covariates (numpy.ndarray) – Covariates to use for the treatment model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
effect_covariates (numpy.ndarray) – Covariates to use for the effect model. If None, all covariates passed are used. Either a list of column names or a boolean mask.
n_splits (int) – Number of sample splits in the cross-fitting procedure.
refit (bool) – If True, nuisance models are fitted over the whole training set; otherwise, nuisance models are fitted per fold.
non_parametric (bool) – If True, the effect_model is fitted as a weighted regression task; otherwise, the effect_model is assumed to be linear.
- estimate_individual_effect(X)
Predict the individual treatment effect.
- Parameters:
X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).
- Returns:
A series of size (num_subjects,) containing the estimated treatment effect; each row is an individual.
- Return type:
- estimate_individual_outcome(X, a, treatment_values=None, predict_proba=False)
Estimating corrected individual counterfactual outcomes.
- Parameters:
X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pandas.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba – IGNORED. Not used, present for API consistency by convention.
- Returns:
A DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_samples,) containing the estimated outcome for each individual under the treatment value of that column.
- Return type:
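One way to read the counterfactual outcomes above is through the decomposition implied by the R-learner objective, Y = m(X) + (A - e(X)) \tau(X) + noise, which gives E[Y | X, A=t] = m(X) + (t - e(X)) \tau(X). The sketch below applies that formula to made-up nuisance and effect estimates. Whether the library computes its outcomes exactly this way is an assumption, and `counterfactual_outcomes` is a hypothetical helper.

```python
import numpy as np

def counterfactual_outcomes(m_hat, e_hat, tau_hat, treatment_values=(0, 1)):
    """Outcome under each treatment value t: m(X) + (t - e(X)) * tau(X)."""
    return {t: m_hat + (t - e_hat) * tau_hat for t in treatment_values}

# Made-up estimates for three individuals.
m_hat = np.array([1.0, 2.0, 3.0])     # estimated E[Y|X]
e_hat = np.array([0.5, 0.25, 0.8])    # estimated P(A=1|X)
tau_hat = np.array([2.0, 4.0, -1.0])  # estimated treatment effect

outcomes = counterfactual_outcomes(m_hat, e_hat, tau_hat)
print(outcomes[0])  # outcome under no treatment
print(outcomes[1])  # outcome under treatment
```

Under this reading, the difference between the treated and untreated columns equals the estimated individual effect.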
- fit(X, a, y, caliper=None)
- Parameters:
X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pandas.Series) – Treatment assignment of size (num_subjects,).
y (pandas.Series) – Observed outcome of size (num_subjects,).
caliper (None | float) – Minimal value of the treatment-probability residual. Used to avoid division by zero when fitting the effect model. If None, no clipping is done. The caliper is irrelevant if the effect_model is linear.
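The role of the caliper can be sketched as follows: in the weighted-regression form, the pseudo-outcome divides by the treatment residual A - e(X), so residuals with very small magnitude are pushed away from zero. The sign-preserving rule and the helper name `clip_residual` are assumptions made for illustration. The exact clipping used by the library is not specified here.

```python
import numpy as np

def clip_residual(a_residual, caliper=1e-6):
    """Keep |A - e(X)| at or above `caliper`, preserving sign (illustrative)."""
    a_residual = np.asarray(a_residual, dtype=float)
    if caliper is None:
        return a_residual  # no clipping requested
    # Treat an exact zero as positive so it is moved to +caliper.
    sign = np.where(a_residual >= 0, 1.0, -1.0)
    return np.where(np.abs(a_residual) < caliper, sign * caliper, a_residual)

a_residual = np.array([0.4, -0.3, 0.0, 1e-9, -1e-9])
clipped = clip_residual(a_residual, caliper=1e-6)

# The pseudo-outcome (Y - m(X)) / (A - e(X)) is now finite everywhere.
y_residual = np.array([0.2, 0.1, -0.05, 0.0, 0.3])
pseudo_outcome = y_residual / clipped
```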
- set_fit_request(*, a='$UNCHANGED$', caliper='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- Returns:
self – The updated object.
- Return type: