Direct Outcome Prediction Model#

Also known as standardization

%matplotlib inline
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

from causallib.datasets import load_nhefs
from causallib.estimation import Standardization, StratifiedStandardization
from causallib.evaluation import evaluate

Data:#

The effect of smoking cessation on weight loss.
Data example is taken from Hernan and Robins Causal Inference Book

data = load_nhefs()
data.X.join(data.a).join(data.y).head()

	age	race	sex	smokeintensity	smokeyrs	wt71	active_1	education_2	exercise_1	exercise_2	age^2	wt71^2	smokeintensity^2	smokeyrs^2	wt82_71
0	42	1	0	30	29	79.04	0	0	0	1	1764	6247.3216	900	841	-10.093960
1	36	0	0	20	24	58.63	0	1	0	0	1296	3437.4769	400	576	2.604970
2	56	1	1	20	26	56.81	0	1	0	1	3136	3227.3761	400	676	9.414486
3	68	1	0	3	53	59.42	1	0	0	1	4624	3530.7364	9	2809	4.990117
4	40	0	0	20	19	87.09	1	1	1	0	1600	7584.6681	400	361	4.989251

“Standard” Standardization (S-Learner)#

A single model (the S in S-Learner) is trained with the treatment assignment as an additional feature.
During inference, the model assigns a single treatment value for all samples, thus predicting the potential outcome of all samples.

std = Standardization(LinearRegression())
std.fit(data.X, data.a, data.y)

Standardization(encode_treatment=False, predict_proba=False,
                learner=LinearRegression())

Outcome Prediction#

The model can be used to predict individual outcomes:
The potential outcome under each intervention (columns correspond to treatment values)

ind_outcomes = std.estimate_individual_outcome(data.X, data.a)
ind_outcomes.head()

qsmk	0	1
0	4.271756	7.734378
1	6.337239	9.799861
2	1.989807	5.452429
3	-4.261624	-0.799002
4	2.290883	5.753505

The model can be used to predict population outcomes,
By aggregating the individual outcome prediction (e.g., mean or median).
Providing agg_func which is defaulted to 'mean'

median_pop_outcomes = std.estimate_population_outcome(data.X, data.a, agg_func="median")
median_pop_outcomes.rename("median", inplace=True)

mean_pop_outcomes = std.estimate_population_outcome(data.X, data.a, agg_func="mean")
mean_pop_outcomes.rename("mean", inplace=True)

pop_outcomes = mean_pop_outcomes.to_frame().join(median_pop_outcomes)
pop_outcomes

	mean	median
qsmk
0	1.747216	2.391416
1	5.209838	5.854038

Effect Estimation#

Similarly, Effect estimation can be done on either individual or population level, depending on the outcomes provided.

Population level effect using population outcomes:

std.estimate_effect(mean_pop_outcomes[1], mean_pop_outcomes[0])

diff    3.462622
dtype: float64

Population level effect using individual outcome, but asking for aggregation (default behaviour):

std.estimate_effect(ind_outcomes[1], ind_outcomes[0], agg="population")

diff    3.462622
dtype: float64

Individual level effect using inidividual outcomes:
Since we’re using a binary treatment with linear regression on a standard model,
the difference is same for all individuals, and is equal to the coefficient of the treatment varaible.
This is usually refered to as homogenuous effect.

print(std.learner.coef_[0])
std.estimate_effect(ind_outcomes[1], ind_outcomes[0], agg="individual").head()

3.462621829225887

effect_type	diff
0	3.462622
1	3.462622
2	3.462622
3	3.462622
4	3.462622

Multiple types of effect are also supported:

std.estimate_effect(ind_outcomes[1], ind_outcomes[0], 
                    agg="individual", effect_types=["diff", "ratio"]).head()

effect_type	diff	ratio
0	3.462622	1.810585
1	3.462622	1.546393
2	3.462622	2.740179
3	3.462622	0.187488
4	3.462622	2.511479

Treament one-hot encoded#

For multi-treatment cases, where treatments are coded as 0, 1, 2, … but have no ordinal interpretation,
It is possible to make the model encode the treatment assignment vector as one hot matrix.

std = Standardization(LinearRegression(), encode_treatment=True)
std.fit(data.X, data.a, data.y)
pop_outcomes = std.estimate_population_outcome(data.X, data.a, agg_func="mean")
std.estimate_effect(mean_pop_outcomes[1], mean_pop_outcomes[0])

diff    3.462622
dtype: float64

Stratified Standarziation (T-Learner)#

While standardization above can be viewed as a “complete pooled” estimator,
as it includes both treatment groups together,
Stratified Standardization can viewed as “complete unpooled” one,
as it completly stratifies the dataset by treatment values and learns a different model for each treatment group.

std = StratifiedStandardization(LinearRegression())
std.fit(data.X, data.a, data.y)

StratifiedStandardization(learner=LinearRegression())

Checking the core learner we can see that it actually has two models (the T in T-Learner), indexed by the treatment value:

std.learner

{0: LinearRegression(), 1: LinearRegression()}

We can apply same analysis as above.

pop_outcomes = std.estimate_population_outcome(data.X, data.a, agg_func="mean")
std.estimate_effect(mean_pop_outcomes[1], mean_pop_outcomes[0])

diff    3.462622
dtype: float64

While the average population-level effect is the same, the individual effect is now heterogenuous.
However, this flexibility in modelling does not come for free, since each model now uses less data and therefore has less statistical power.

ind_outcomes = std.estimate_individual_outcome(data.X, data.a)
std.estimate_effect(ind_outcomes[1], ind_outcomes[0], agg="individual").head()

effect_type	diff
0	3.389345
1	4.774150
2	7.653848
3	0.754123
4	6.769839

We can see that internally, when asking for some potential outcome,
the model simply applies the model trained on the group of that treatment:

potential_outcome = std.estimate_individual_outcome(data.X, data.a)[1]
direct_prediction =  std.learner[1].predict(data.X)
(potential_outcome == direct_prediction).all()

True

Providing complex scheme of learners#

When supplying a single learner to the standardization above, the model simply duplicates it for each treatment value.
However, it is possible to specify a different model for each treatment value explicitly.
For example, in cases where the treated are more complex than the untreated
(because, say, background of those choosed to be treated),
it is possible to specify them with a more expressive model:

learner = {0: LinearRegression(),
           1: GradientBoostingRegressor()}
std = StratifiedStandardization(learner)
std.fit(data.X, data.a, data.y)
std.learner

{0: LinearRegression(), 1: GradientBoostingRegressor()}

ind_outcomes = std.estimate_individual_outcome(data.X, data.a)
ind_outcomes.head()

qsmk	0	1
0	4.174723	1.611459
1	5.715355	10.336035
2	1.069715	8.876902
3	-3.004029	-1.714927
4	1.303085	13.128340

std.estimate_effect(ind_outcomes[1], ind_outcomes[0])

diff    3.473088
dtype: float64

Evaluation#

Simple evaluation#

results = evaluate(std, data.X, data.a, data.y)
results.plot_all()

{'train': {'residuals': <AxesSubplot:title={'center':'Residual Plot'}, xlabel='Predicted values', ylabel='Prediction residuals'>,
  'continuous_accuracy': <AxesSubplot:title={'center':'Continuous Accuracy Plot'}, xlabel='Predicted values', ylabel='True values'>,
  'common_support': <AxesSubplot:title={'center':'Predicted Common Support'}, xlabel='Predicted $Y^0$', ylabel='Predicted $Y^1$'>}}

../_images/a1c63e40931da0740d5edc0ae784d044e9d244af6e04a888c5be2c7e51e670cd.png

Results show the results for each treatment group separetly and also combined:

results.evaluated_metrics

	expvar	mae	mse	mdae	r2
model_strata
actual	0.305271	4.748815	43.110264	3.725702	0.305271
0	0.134184	4.984365	48.001726	3.892460	0.134184
1	0.620207	4.069050	28.994209	3.366938	0.620207

Thorough evaluation#

evaluation_results = evaluate(Standardization(LinearRegression()), data.X, data.a, data.y, cv="auto")
evaluation_results.plot_all()

{'train': {'residuals': <AxesSubplot:title={'center':'Residual Plot'}, xlabel='Predicted values', ylabel='Prediction residuals'>,
  'continuous_accuracy': <AxesSubplot:title={'center':'Continuous Accuracy Plot'}, xlabel='Predicted values', ylabel='True values'>,
  'common_support': <AxesSubplot:title={'center':'Predicted Common Support'}, xlabel='Predicted $Y^0$', ylabel='Predicted $Y^1$'>},
 'valid': {'residuals': <AxesSubplot:title={'center':'Residual Plot'}, xlabel='Predicted values', ylabel='Prediction residuals'>,
  'continuous_accuracy': <AxesSubplot:title={'center':'Continuous Accuracy Plot'}, xlabel='Predicted values', ylabel='True values'>,
  'common_support': <AxesSubplot:title={'center':'Predicted Common Support'}, xlabel='Predicted $Y^0$', ylabel='Predicted $Y^1$'>}}

../_images/89a08e0cdfa885ad79c9c6de6eb4424148ff11d7de050e3e695070922b9c725f.png

../_images/bfaeb38bfc05eca422997c762b6b3e2869e5dd5e6b21dfc56649c5093c36e21f.png

results.evaluated_metrics

	expvar	mae	mse	mdae	r2
model_strata
actual	0.305271	4.748815	43.110264	3.725702	0.305271
0	0.134184	4.984365	48.001726	3.892460	0.134184
1	0.620207	4.069050	28.994209	3.366938	0.620207

results.models

StratifiedStandardization(learner=LinearRegression())