causallib.estimation.tmle.TMLE
- class TMLE(outcome_model, weight_model, outcome_covariates=None, weight_covariates=None, reduced=False, importance_sampling=False, glm_fit_kwargs=None)
Targeted Maximum Likelihood Estimation. Takes an outcome model that was optimized to predict E[Y|X,A] and “retargets” (“updates”) it to estimate E[Y^A|X], using a “clever covariate” constructed from the inverse propensity weights.
- Steps:
1. Fit an outcome model Y=Q(X,A).
2. Fit a weight model g(X,A) for the treatment assignment A given the covariates X.
3. Construct a clever covariate using g(X,A).
4. Fit a logistic regression model Q* to predict Y, using g(X,A) as features and Q(X,A) as offset.
5. Predict the counterfactual outcome Q*(X,a) for treatment value a by plugging in Q(X,a) as offset and g(X,a) as covariate.
Implements four flavours of TMLE, controlled by the reduced and importance_sampling parameters. importance_sampling=True moves the clever covariate from being a feature to being a sample weight in the targeted regression. reduced=True uses a clever-covariate vector of 1s and -1s and is therefore only suitable for binary treatment. Otherwise, the clever covariate is the entire IPW matrix and can be used for multiple treatments.
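The steps above can be sketched without any library, assuming a binary treatment, a binary outcome, and a vector-style clever covariate built from signed inverse propensities. This is only an illustration of the targeting idea, not causallib's implementation: the helper `fit_epsilon`, the toy data, and the use of Newton's method in place of a GLM solver are all assumptions made for the example.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_epsilon(y, q_init, h, max_iter=50, tol=1e-10):
    """Fit eps in logit(Q*) = logit(Q) + eps * h by Newton's method.

    This is a one-parameter logistic regression without intercept,
    with logit(Q) as a fixed offset and h as the only feature.
    (Hypothetical helper, written for this sketch.)
    """
    offsets = [logit(q) for q in q_init]
    eps = 0.0
    for _ in range(max_iter):
        preds = [expit(o + eps * hi) for o, hi in zip(offsets, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, preds))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, preds))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


# Toy data, made up for illustration: binary treatment and binary outcome.
a = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 0, 1, 1, 0, 1, 0, 0]
propensity = [0.6, 0.5, 0.7, 0.4, 0.3, 0.5, 0.6, 0.4]  # assumed P(A=1|X)
q_obs = [0.7, 0.6, 0.8, 0.5, 0.3, 0.4, 0.5, 0.2]       # assumed initial Q(X, A)
q_treated = [0.7, 0.6, 0.8, 0.5, 0.5, 0.6, 0.7, 0.4]   # assumed initial Q(X, 1)

# Step 3: clever covariate evaluated at the observed treatment.
h_obs = [1.0 / p if ai == 1 else -1.0 / (1.0 - p) for ai, p in zip(a, propensity)]

# Step 4: targeted regression with Q as offset and the clever covariate as feature.
eps = fit_epsilon(y, q_obs, h_obs)
q_star_obs = [expit(logit(q) + eps * h) for q, h in zip(q_obs, h_obs)]

# Step 5: counterfactual prediction under a=1 plugs in Q(X, 1) and 1 / P(A=1|X).
q_star_treated = [expit(logit(q) + eps / p) for q, p in zip(q_treated, propensity)]

print(f"epsilon = {eps:.4f}")
```

After the update, the clever-covariate-weighted residuals `sum(h * (y - q_star))` are approximately zero, which is the property the targeting step is designed to achieve.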
References
TMLE: Van Der Laan MJ, Rubin D. Targeted maximum likelihood learning. 2006. https://doi.org/10.2202/1557-4679.1043
TMLE with a vector version of clever covariate: Schuler MS, Rose S. Targeted maximum likelihood estimation for causal inference in observational studies. 2017. https://doi.org/10.1093/aje/kww165
TMLE with a matrix version of clever covariate: Gruber S, van der Laan M. tmle: An R package for targeted maximum likelihood estimation. 2012. https://doi.org/10.18637/jss.v051.i13
TMLE with weighted regression and matrix of clever covariate: Gruber S, van der Laan M, Kennedy C. tmle: Targeted Maximum Likelihood Estimation. CRAN documentation. https://cran.r-project.org/web/packages/tmle/index.html
TMLE for continuous outcomes: Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. 2010. https://doi.org/10.2202/1557-4679.1260
- Parameters:
outcome_model (IndividualOutcomeEstimator) – An initial prediction of the outcome.
weight_model (PropensityEstimator) – An IPW model predicting the treatment.
outcome_covariates (numpy.ndarray) – Covariates to use for the outcome model. If None, all covariates passed will be used. Either a list of column names or a boolean mask.
weight_covariates (numpy.ndarray) – Covariates to use for the weight model. If None, all covariates passed will be used. Either a list of column names or a boolean mask.
reduced (bool) – If True, uses a vector version of the clever covariate (rather than a matrix of all treatment values) and enforces a binary treatment assignment.
importance_sampling (bool) – If True, moves the clever covariate from being a feature to being a weight in the regression.
glm_fit_kwargs (dict) – Additional kwargs for statsmodels’ GLM.fit(). Can be used, for example, to refine the optimizers. See: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.fit.html
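To make the reduced and importance_sampling options more concrete, here is a small dependency-free sketch of how the clever covariate could look under each setting. The function names and the dict-based propensity format are hypothetical and chosen for this example only; the library's internal construction may differ.

```python
def clever_covariate_matrix(a, propensities, treatment_values):
    """Matrix version: one column per treatment value.

    Entry (i, k) is 1 / P(A=k | x_i) if subject i received treatment k, else 0.
    """
    return [
        [1.0 / probs[k] if ai == k else 0.0 for k in treatment_values]
        for ai, probs in zip(a, propensities)
    ]


def clever_covariate_vector(a, propensities):
    """Vector version (binary treatment only): signed inverse propensity."""
    return [
        1.0 / probs[1] if ai == 1 else -1.0 / probs[0]
        for ai, probs in zip(a, propensities)
    ]


def split_for_importance_sampling(h_vector):
    """Move the magnitude into a sample weight, leaving a +1/-1 feature."""
    features = [1.0 if h > 0 else -1.0 for h in h_vector]
    weights = [abs(h) for h in h_vector]
    return features, weights


# Toy inputs, made up for illustration.
a = [1, 0, 1]
propensities = [
    {0: 0.5, 1: 0.5},
    {0: 0.8, 1: 0.2},
    {0: 0.75, 1: 0.25},
]

h_matrix = clever_covariate_matrix(a, propensities, treatment_values=[0, 1])
h_vector = clever_covariate_vector(a, propensities)
features, weights = split_for_importance_sampling(h_vector)

print(h_matrix)
print(h_vector)
print(features, weights)
```

In this sketch the matrix form extends naturally to more than two treatment values, while the vector form relies on a sign convention that only makes sense for a binary treatment.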
- __init__(outcome_model, weight_model, outcome_covariates=None, weight_covariates=None, reduced=False, importance_sampling=False, glm_fit_kwargs=None)
- fit(X, a, y, refit_weight_model=True, **kwargs)
Trains a causal model from observed data.
- Parameters:
X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pandas.Series) – Treatment assignment of size (num_subjects,).
y (pandas.Series) – Observed outcome of size (num_subjects,).
sample_weight – To be passed to the underlying scikit-learn’s fit method.
- Returns:
A causal weight model with an inner learner fitted.
- Return type:
IndividualOutcomeEstimator
- estimate_individual_outcome(X, a, treatment_values=None, predict_proba=None)
Estimates individual outcome under different treatment values (interventions).
- Parameters:
X (pandas.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pandas.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value(s) to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba (bool | None) – If the outcome task is classification and the learner supports the operation: if True, prediction will use the learner’s predict_proba or decision_function, which returns a continuous matrix of size (n_samples, n_classes). If False, predict will be used and the return value will be based on a vector of class classifications. If None, the parameter is ignored and behaviour is as specified when initializing the IndividualOutcomeEstimator.
- Returns:
DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_samples,) that contains the estimated outcome for each individual under the treatment value in the corresponding key.
- Return type:
pandas.DataFrame
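As a rough illustration of how a table of individual outcomes with one column per treatment value might be summarized, the sketch below uses a plain dict of lists as a stand-in for the returned DataFrame. The numbers are made up, and the variable names are assumptions for this example rather than part of the library.

```python
from statistics import mean

# Stand-in for the returned table: keys are treatment values,
# each list holds one estimated outcome per individual (made-up numbers).
individual_outcomes = {
    0: [0.20, 0.35, 0.10, 0.50],
    1: [0.40, 0.45, 0.30, 0.55],
}

# Average each column to get a population-level outcome per treatment value.
population_outcomes = {t: mean(col) for t, col in individual_outcomes.items()}

# Per-individual contrast between the two treatment values.
individual_effects = [
    y1 - y0 for y0, y1 in zip(individual_outcomes[0], individual_outcomes[1])
]

# Difference of the column means as a simple average-effect summary.
average_effect = population_outcomes[1] - population_outcomes[0]
print(f"average effect = {average_effect:.4f}")
```

With a real DataFrame the same summaries would typically be column means and column differences.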
- set_fit_request(*, a='$UNCHANGED$', refit_weight_model='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- Returns:
self – The updated object.
- Return type: