causallib.estimation.rlearner.RLearner

class RLearner(effect_model, outcome_model, treatment_model, outcome_covariates=None, treatment_covariates=None, effect_covariates=None, n_splits=5, refit=True, caliper=1e-06, non_parametric=False)

Given the measured outcome Y, the treatment assignment A, and the covariates X, calculate an R-learner estimator of the treatment effect. Let e(X) be the estimated propensity score and m(X) the estimated outcome E[Y|X], both obtained from fitted estimators. The R-learner then minimizes the following:

$$\lVert Y - m(X) - (A - e(X))\,\tau(X) \rVert_2^2 + \lambda\,\Lambda(\tau)$$

where τ(X) is the conditional average treatment effect (CATE), Λ(τ) is a regularization term on τ, and λ is the regularization coefficient.
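The objective can be evaluated directly once nuisance estimates are available. The NumPy sketch below computes the empirical R-loss for given values of m(X), e(X) and τ(X). The helper name `r_loss`, the toy data-generating process, and passing the regularization term as a precomputed scalar `tau_penalty` are illustrative assumptions, not part of causallib's API; the true m(X) and e(X) are plugged in here, whereas the R-learner uses fitted (cross-fitted) estimates.

```python
import numpy as np


def r_loss(y, a, m_hat, e_hat, tau_hat, lam=0.0, tau_penalty=0.0):
    """Empirical R-loss ||Y - m(X) - (A - e(X)) tau(X)||^2 + lam * penalty.

    Hypothetical helper: `tau_penalty` is a precomputed value of the
    regularization term, whose form depends on the effect model.
    """
    y, a, m_hat, e_hat, tau_hat = (np.asarray(v, dtype=float)
                                   for v in (y, a, m_hat, e_hat, tau_hat))
    residual = (y - m_hat) - (a - e_hat) * tau_hat
    return float(np.sum(residual ** 2) + lam * tau_penalty)


# Toy data generated without noise from Y = m(X) + (A - e(X)) * tau(X).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
e = 1 / (1 + np.exp(-x))      # propensity score e(X)
a = rng.binomial(1, e)        # treatment assignment A
m = 2 * x                     # outcome regression m(X) = E[Y|X]
tau = 1 + 0.5 * x             # true CATE tau(X)
y = m + (a - e) * tau

print(r_loss(y, a, m, e, tau))      # ~0 at the true tau (floating-point error only)
print(r_loss(y, a, m, e, tau + 1))  # strictly positive for a wrong tau
```

Because the data are noiseless, the loss vanishes (up to rounding) at the true τ and grows for any other candidate, which is what the minimization exploits.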

If the effect_model is linear, this amounts to minimizing the squared loss with target variable (Y - m(X)) and features (A - e(X))X. Otherwise, since (Y - m(X) - (A - e(X))τ(X))^2 = (A - e(X))^2 ((Y - m(X))/(A - e(X)) - τ(X))^2, it corresponds to a weighted regression of the pseudo-outcome (Y - m(X))/(A - e(X)) on X, with weights (A - e(X))^2. This can be used with any scikit-learn regressor that accepts sample weights.
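To make the two cases concrete, the NumPy sketch below simulates data from Y = m(X) + (A - e(X))τ(X) + noise with a linear τ(X) = Xβ (an assumption for illustration), recovers β by least squares on the features (A - e(X))X, and then checks numerically that the R-loss equals the weighted squared loss of the pseudo-outcome (Y - m(X))/(A - e(X)) with weights (A - e(X))^2. True nuisance values stand in for fitted ones, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))            # propensity score e(X)
a = rng.binomial(1, e)                    # treatment assignment A
m = X @ np.array([1.0, -1.0, 0.5])        # outcome regression m(X)
beta = np.array([2.0, 0.0, -1.0])         # true linear CATE: tau(X) = X @ beta
y = m + (a - e) * (X @ beta) + rng.normal(scale=0.1, size=n)

a_res = a - e                             # treatment residual A - e(X)
y_res = y - m                             # outcome residual Y - m(X)

# Linear effect model: least squares of (Y - m(X)) on the features (A - e(X)) * X.
features = a_res[:, None] * X
beta_hat, *_ = np.linalg.lstsq(features, y_res, rcond=None)
print(np.round(beta_hat, 2))

# Non-parametric effect model: the same loss, rewritten as a weighted regression
# of the pseudo-outcome (Y - m(X)) / (A - e(X)) with weights (A - e(X))**2.
pseudo_outcome = y_res / a_res
weights = a_res ** 2
tau_candidate = X @ beta_hat              # any candidate tau(X) gives equal losses
r_loss_value = np.sum((y_res - a_res * tau_candidate) ** 2)
weighted_loss = np.sum(weights * (pseudo_outcome - tau_candidate) ** 2)
print(np.isclose(r_loss_value, weighted_loss))  # True
```

With a scikit-learn regressor, the weighted route corresponds to a call such as `regressor.fit(X, pseudo_outcome, sample_weight=weights)`; the exact call causallib makes internally is not shown in this documentation.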

References:
  • Nie, X., & Wager, S. (2017). Quasi-oracle estimation of heterogeneous treatment effects. https://arxiv.org/abs/1712.04912

  • Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. https://academic.oup.com/ectj/article/21/1/C1/5056401

Parameters:
  • effect_model – An sklearn model that estimates the conditional average treatment effect τ(X).

  • outcome_model – An sklearn model that estimates the outcome regression E[Y|X] (without the treatment). Note: using a regressor is recommended, even for a binary outcome.

  • treatment_model – An sklearn model of the treatment assignment A|X, i.e. the probability of being treated, P(A=1|X).

  • outcome_covariates (numpy.ndarray) – Covariates to use for the outcome model, given either as a list of column names or as a boolean mask. If None, all covariates passed are used.

  • treatment_covariates (numpy.ndarray) – Covariates to use for the treatment model, given either as a list of column names or as a boolean mask. If None, all covariates passed are used.

  • effect_covariates (numpy.ndarray) – Covariates to use for the effect model, given either as a list of column names or as a boolean mask. If None, all covariates passed are used.

  • n_splits (int) – Number of folds (sample splits) in the cross-fitting procedure.

  • refit (bool) – If True, the nuisance models are fitted on the whole training set; otherwise, they are fitted per fold.

  • non_parametric (bool) – If True, the effect_model is estimated as a weighted regression task; otherwise, the effect_model is treated as linear.
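The n_splits and refit parameters refer to cross-fitting of the nuisance models. The sketch below shows one common way to cross-fit with scikit-learn (`KFold`, `clone`): each sample's residuals Y - m(X) and A - e(X) come from models that never saw that sample. The helper `cross_fit_nuisances`, its return values, and the reading of refit=False as "keep the per-fold models" are assumptions for illustration, not causallib's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold


def cross_fit_nuisances(X, a, y, outcome_model, treatment_model,
                        n_splits=5, refit=True, seed=0):
    """Hypothetical sketch of cross-fitting for the R-learner nuisance models.

    Returns out-of-fold residuals Y - m(X) and A - e(X), plus the nuisance
    models kept for later prediction.
    """
    y_res = np.empty(len(y))
    a_res = np.empty(len(a))
    fold_models = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        # Fit both nuisance models on the other folds only...
        m = clone(outcome_model).fit(X[train_idx], y[train_idx])
        e = clone(treatment_model).fit(X[train_idx], a[train_idx])
        # ...and compute residuals on the held-out fold.
        y_res[test_idx] = y[test_idx] - m.predict(X[test_idx])
        a_res[test_idx] = a[test_idx] - e.predict_proba(X[test_idx])[:, 1]
        fold_models.append((m, e))
    if refit:
        # One pair of nuisance models fitted on the whole training set.
        nuisance = (clone(outcome_model).fit(X, y), clone(treatment_model).fit(X, a))
    else:
        # Keep the per-fold models instead.
        nuisance = fold_models
    return y_res, a_res, nuisance


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X[:, 1] + a * (1 + X[:, 0]) + rng.normal(size=300)

y_res, a_res, nuisance = cross_fit_nuisances(
    X, a, y, LinearRegression(), LogisticRegression(), n_splits=5, refit=True
)
```

Cloning inside the loop keeps the caller's estimators unfitted and gives each fold an independent model, which is the usual scikit-learn idiom for this pattern.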


estimate_individual_effect(X)

Predict the individual treatment effect.

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

Returns:

A Series of size (num_subjects,) containing the estimated treatment effect of each individual (one row per individual).

Return type:

pandas.Series
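A minimal sketch of the returned shape, assuming a non-parametric effect model fitted by weighted regression as described for the class. The helper `individual_effect`, the toy data, and the use of true (rather than cross-fitted) nuisance values are illustrative assumptions, not causallib internals; the point is that τ(X) comes back as a Series aligned with the rows of X.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data from Y = m(X) + (A - e(X)) * tau(X) + noise; true nuisances are used.
rng = np.random.default_rng(2)
n = 400
X = pd.DataFrame(rng.normal(size=(n, 2)), columns=["x1", "x2"],
                 index=[f"subject_{i}" for i in range(n)])
e = 1 / (1 + np.exp(-X["x1"].to_numpy()))      # propensity score e(X)
a = rng.binomial(1, e)                         # treatment assignment A
m = X["x2"].to_numpy()                         # outcome regression m(X)
tau_true = 1.0 + 2.0 * X["x1"].to_numpy()      # true CATE tau(X)
y = m + (a - e) * tau_true + rng.normal(scale=0.1, size=n)

# Weighted-regression (non-parametric) route for the effect model.
a_res = a - e
effect_model = LinearRegression().fit(X, (y - m) / a_res, sample_weight=a_res ** 2)


def individual_effect(effect_model, X):
    """Hypothetical helper: tau(X) as a Series with one row per individual."""
    return pd.Series(effect_model.predict(X), index=X.index, name="effect")


ite = individual_effect(effect_model, X)
print(ite.head())
```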

estimate_individual_outcome(X, a, treatment_values=None, predict_proba=False)

Estimate corrected individual counterfactual outcomes.

Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.

  • predict_proba – IGNORED. Not used, present for API consistency by convention.

Returns:

A DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_subjects,) containing the estimated outcome of each individual under the corresponding treatment value.

Return type:

pandas.DataFrame
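Under the model behind the R-loss, Y = m(X) + (A - e(X))τ(X) + noise, the expected outcome under treatment value t is m(X) + (t - e(X))τ(X). The sketch below builds a DataFrame in the documented shape from that identity. Whether causallib's "corrected" outcomes use exactly this formula is an assumption, and `counterfactual_outcomes` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def counterfactual_outcomes(m_hat, e_hat, tau_hat, treatment_values=(0, 1), index=None):
    """Hypothetical helper: E[Y | X, A=t] = m(X) + (t - e(X)) * tau(X) for each t.

    Returns a DataFrame with one column per treatment value and one row per individual.
    """
    m_hat, e_hat, tau_hat = (np.asarray(v, dtype=float) for v in (m_hat, e_hat, tau_hat))
    return pd.DataFrame(
        {t: m_hat + (t - e_hat) * tau_hat for t in treatment_values}, index=index
    )


outcomes = counterfactual_outcomes(
    m_hat=[1.0, 2.0, 3.0], e_hat=[0.2, 0.5, 0.8], tau_hat=[1.0, -1.0, 2.0]
)
print(outcomes)
```

The column difference `outcomes[1] - outcomes[0]` recovers τ(X) row by row, and its mean over individuals is an estimate of the average treatment effect.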

fit(X, a, y, caliper=None)
Parameters:
  • X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).

  • a (pandas.Series) – Treatment assignment of size (num_subjects,).

  • y (pandas.Series) – Observed outcome of size (num_subjects,).

  • caliper (None | float) – Minimal value of the treatment-probability residual (A - e(X)), used to avoid division by zero when fitting the effect model. If None, no clipping is done. The caliper is irrelevant if the effect model is linear.
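A minimal sketch of why a caliper matters in the weighted-regression case: the pseudo-outcome divides by A - e(X), which can be arbitrarily close to zero. The sketch assumes the caliper bounds the magnitude of the residual from below while keeping its sign; the exact clipping rule causallib applies is not documented here, and `clip_treatment_residual` is a hypothetical helper.

```python
import numpy as np


def clip_treatment_residual(a_res, caliper):
    """Hypothetical sketch: keep |A - e(X)| >= caliper, preserving the sign."""
    a_res = np.asarray(a_res, dtype=float)
    if caliper is None:
        return a_res                               # None: no clipping
    sign = np.where(a_res >= 0, 1.0, -1.0)         # an exact 0 is treated as positive
    return sign * np.maximum(np.abs(a_res), caliper)


a_res = np.array([0.5, -1e-9, 0.0, -0.3])          # A - e(X), two values near zero
clipped = clip_treatment_residual(a_res, caliper=1e-6)
pseudo_outcome = np.ones_like(clipped) / clipped   # (Y - m(X)) / (A - e(X)) with Y - m(X) = 1
weights = clipped ** 2
print(clipped)
print(np.isfinite(pseudo_outcome).all())           # True
```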

set_fit_request(*, a='$UNCHANGED$', caliper='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • a (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the a parameter in fit.

  • caliper (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for caliper parameter in fit.

Returns:

self – The updated object.

Return type:

object