causallib.estimation.base_estimator module
Copyright 2019 IBM Corp.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Created on Apr 16, 2018
A module defining the hierarchy of causal model interfaces. Causal models have two main tasks: predicting counterfactual outcomes and estimating effects based on those predicted outcomes. Both tasks can be carried out at two resolutions: the individual level (i.e. outcome and effect for each individual in the dataset) and the population level (i.e. some aggregation over the sample). This module defines the following interfaces:
* EffectEstimator - can estimate both individual and population level effect.
* PopulationOutcomeEstimator - estimates aggregated outcomes on different sub-groups in the dataset.
* IndividualOutcomeEstimator - estimates individual level outcomes.
- class causallib.estimation.base_estimator.EffectEstimator[source]
Bases: sklearn.base.BaseEstimator
Class-based interface for estimating either individual-level or sample-level effect.
- CALCULATE_EFFECT = {'diff': <function EffectEstimator.<lambda>>, 'or': <function EffectEstimator.<lambda>>, 'ratio': <function EffectEstimator.<lambda>>}
- estimate_effect(outcome_1, outcome_2, effect_types='diff')[source]
Estimates an effect given two potential outcomes.
- Parameters
- Returns
- A Series for a population effect (scalar inputs), indexed by effect type, with the corresponding computed effects as values. A DataFrame for an individual effect (vector inputs), where columns are effect types and rows are the effects for each individual. In both cases the value type is the same as the type of outcome_1 and outcome_2.
- Return type
pd.Series | pd.DataFrame
Examples
>>> from causallib.estimation.base_estimator import EffectEstimator
>>> effect_estimator = EffectEstimator()
>>> effect_estimator.estimate_effect(0.3, 0.6, effect_types=["diff", "ratio", "or"])
{"diff": -0.3,    # 0.3 - 0.6
 "ratio": 0.5,    # 0.3 / 0.6
 "or": 0.2857}    # Odds-Ratio(0.3, 0.6)
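The arithmetic behind the three effect types can be reproduced without the library. The following is a minimal plain-Python sketch, not causallib's implementation: the function names `diff`, `ratio`, `odds_ratio`, the lookup `calculate_effect`, and the dict return value are illustrative assumptions (the library's own lambdas are not shown on this page, and it returns pandas objects). The odds ratio assumes both outcomes are probabilities strictly between 0 and 1.

```python
def diff(y1, y2):
    """Difference between two outcomes."""
    return y1 - y2


def ratio(y1, y2):
    """Ratio between two outcomes."""
    return y1 / y2


def odds_ratio(y1, y2):
    """Odds ratio; assumes y1 and y2 are probabilities in (0, 1)."""
    return (y1 / (1 - y1)) / (y2 / (1 - y2))


# Hypothetical stand-in for a lookup keyed by effect type.
calculate_effect = {"diff": diff, "ratio": ratio, "or": odds_ratio}


def estimate_effect(outcome_1, outcome_2, effect_types="diff"):
    """Compute the requested effects; a single string is treated as one type."""
    if isinstance(effect_types, str):
        effect_types = [effect_types]
    return {name: calculate_effect[name](outcome_1, outcome_2)
            for name in effect_types}


effects = estimate_effect(0.3, 0.6, effect_types=["diff", "ratio", "or"])
print({name: round(value, 4) for name, value in effects.items()})
```

Calling the sketch with the default `effect_types="diff"` yields only the difference; the other two effects have to be requested explicitly.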
- class causallib.estimation.base_estimator.IndividualOutcomeEstimator(learner, predict_proba=False, *args, **kwargs)[source]
Bases: causallib.estimation.base_estimator.PopulationOutcomeEstimator, causallib.estimation.base_estimator.EffectEstimator
Interface for estimating individual-level outcome for different treatment values.
- Parameters
learner – Initialized sklearn model.
predict_proba (bool) – Relevant when the outcome task is classification and the learner supports the operation. If True, prediction uses the learner’s predict_proba or decision_function, which returns a continuous matrix of size (n_samples, n_classes). If False, predict is used and the return value is based on a vector of predicted class labels.
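The effect of the predict_proba flag can be illustrated with a small dispatch helper. This is a sketch under stated assumptions rather than the library's code: `ToyClassifier` and `predict_outcome` are hypothetical names, and both the order in which `predict_proba` and `decision_function` are tried and the fallback to `predict` when neither exists are guesses.

```python
class ToyClassifier:
    """Hypothetical stand-in for a fitted scikit-learn-style binary classifier."""

    def predict_proba(self, X):
        # One row per sample: [P(class 0), P(class 1)].
        return [[1 - x, x] for x in X]

    def predict(self, X):
        # Hard class labels, thresholded at 0.5.
        return [int(x >= 0.5) for x in X]


def predict_outcome(learner, X, predict_proba=False):
    """Return continuous scores if requested and supported, else class labels."""
    if predict_proba:
        # Assumption: prefer predict_proba, then decision_function.
        if hasattr(learner, "predict_proba"):
            return learner.predict_proba(X)
        if hasattr(learner, "decision_function"):
            return learner.decision_function(X)
    # Assumption: fall back to hard predictions otherwise.
    return learner.predict(X)


X = [0.25, 0.75]
hard = predict_outcome(ToyClassifier(), X)
soft = predict_outcome(ToyClassifier(), X, predict_proba=True)
print(hard)
print(soft)
```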
- estimate_effect(outcome1, outcome2, agg='population', effect_types='diff')[source]
Estimates an effect given two potential outcomes.
- Parameters
outcome1 (pd.Series) – A potential outcome.
outcome2 (pd.Series) – A potential outcome.
agg (str) – Either “population” or “individual” - whether to calculate individual effect or population effect.
effect_types (list[str] | str) – Any iterable of strings from the set of EffectEstimator.CALCULATE_EFFECT keys
- Returns
- A Series for a population effect (scalar inputs), indexed by effect type, with the corresponding computed effects as values. A DataFrame for an individual effect (vector inputs), where columns are effect types and rows are the effects for each individual. In both cases the value type is the same as the type of outcome1 and outcome2.
- Return type
pd.Series | pd.DataFrame
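The difference between the two agg settings can be sketched in plain Python. The sketch assumes that a population effect is obtained by first aggregating each outcome vector (here with the mean) and then computing a single effect, while an individual effect is computed element-wise; this page does not spell out that order, so treat it as an assumption. The name `effect_diff` and the sample data are hypothetical, and lists stand in for pandas Series.

```python
from statistics import mean


def effect_diff(outcome1, outcome2, agg="population"):
    """Difference effect at the population or individual resolution."""
    if agg == "population":
        # Assumption: aggregate each outcome vector first, then take one difference.
        return mean(outcome1) - mean(outcome2)
    if agg == "individual":
        # One difference per individual.
        return [y1 - y2 for y1, y2 in zip(outcome1, outcome2)]
    raise ValueError("agg must be 'population' or 'individual'")


# Hypothetical potential outcomes for three individuals.
outcome_treated = [1.0, 2.0, 3.0]
outcome_control = [0.5, 1.0, 1.5]

population_effect = effect_diff(outcome_treated, outcome_control)
individual_effect = effect_diff(outcome_treated, outcome_control, agg="individual")
print(population_effect)
print(individual_effect)
```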
- abstract estimate_individual_outcome(X, a, treatment_values=None, predict_proba=None)[source]
Estimates individual outcomes under different treatment values (interventions).
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba (bool | None) – Relevant when the outcome task is classification and the learner supports the operation. If True, prediction uses the learner’s predict_proba or decision_function, which returns a continuous matrix of size (n_samples, n_classes). If False, predict is used and the return value is based on a vector of predicted class labels. If None, the parameter is ignored and the behaviour is as specified when initializing the IndividualOutcomeEstimator.
- Returns
- A DataFrame whose columns are treatment values and whose rows are individuals: each column is a vector of size (num_samples,) containing the estimated outcome for each individual under the treatment value of that column.
- Return type
pd.DataFrame
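Because this method is abstract, the page does not say how a subclass computes the outcomes. The sketch below shows one plausible way to produce the described layout (one column of predictions per treatment value): set every individual's treatment to each value in turn and predict. Everything here is an assumption for illustration, including the free function `estimate_individual_outcome`, the `predict_fn` argument, and the `toy_model`; dicts and lists stand in for pandas objects.

```python
def estimate_individual_outcome(predict_fn, X, a, treatment_values=None):
    """Predict each individual's outcome under every requested treatment value."""
    if treatment_values is None:
        # Default: all treatment values observed in the assignment vector.
        treatment_values = sorted(set(a))
    elif not isinstance(treatment_values, (list, tuple, set)):
        # Allow a single treatment value.
        treatment_values = [treatment_values]
    # One "column" per treatment value, one entry per individual.
    return {t: [predict_fn(x, t) for x in X] for t in treatment_values}


def toy_model(x, t):
    """Hypothetical fitted outcome model: covariate baseline plus a fixed treatment effect."""
    return x["age"] / 10 + 2 * t


X = [{"age": 30}, {"age": 50}]   # covariates, one dict per individual
a = [0, 1]                       # observed treatment assignment
outcomes = estimate_individual_outcome(toy_model, X, a)
print(outcomes)
```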
- estimate_population_outcome(X, a, y=None, treatment_values=None, agg_func='mean')[source]
Implements aggregation of individual outcome into population (sample) outcome.
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series | None) – Observed outcome of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to stratify upon before aggregating individual into population outcome. If not supplied, calculates for all available treatment values.
agg_func (str) – Type of aggregation function (e.g. “mean” or “median”).
- Returns
- A Series whose index holds the treatment values and whose values are numbers: the aggregated outcome for the stratum of people whose assigned treatment is the key.
- Return type
pd.Series
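The aggregation step can be sketched as reducing each treatment value's vector of individual outcomes to a single number. This is an illustration only: `AGG_FUNCS` and `aggregate_population_outcome` are hypothetical names, only the "mean" and "median" options mentioned above are covered, and a dict of lists stands in for the DataFrame of individual outcomes.

```python
from statistics import mean, median

# Hypothetical lookup from agg_func name to an aggregation function.
AGG_FUNCS = {"mean": mean, "median": median}


def aggregate_population_outcome(individual_outcomes, agg_func="mean"):
    """Aggregate individual outcomes into one value per treatment value."""
    aggregate = AGG_FUNCS[agg_func]
    return {t: aggregate(values) for t, values in individual_outcomes.items()}


# Hypothetical individual outcomes: treatment value -> outcome per individual.
individual_outcomes = {0: [3.0, 5.0, 10.0], 1: [5.0, 7.0, 12.0]}

population_mean = aggregate_population_outcome(individual_outcomes)
population_median = aggregate_population_outcome(individual_outcomes, agg_func="median")
print(population_mean)
print(population_median)
```

With skewed outcomes such as these, the mean and the median give noticeably different population outcomes, which is the practical reason to expose agg_func.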
- abstract fit(X, a, y, sample_weight=None)[source]
Trains a causal model from observed data.
- Parameters
X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series) – Observed outcome of size (num_subjects,).
sample_weight – To be passed to the fit method of the underlying scikit-learn learner.
- Returns
A causal weight model with an inner learner fitted.
- Return type
- class causallib.estimation.base_estimator.PopulationOutcomeEstimator[source]
Bases: causallib.estimation.base_estimator.EffectEstimator
Interface for estimating aggregated outcome over different subgroups in the dataset.