causallib.estimation.base_estimator module

Copyright 2019 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Created on Apr 16, 2018

A module defining the hierarchy of causal model interfaces. Causal models have two main tasks: predicting counterfactual outcomes and estimating effects from these predicted outcomes. There are also two resolutions to work at: the individual level (i.e. outcome and effect for each individual in the dataset) and the population level (i.e. some aggregation over the sample). This module defines the following:

* EffectEstimator - can estimate both individual and population level effect.
* PopulationOutcomeEstimator - estimates aggregated outcomes on different sub-groups in the dataset.
* IndividualOutcomeEstimator - estimates individual level outcomes.

class causallib.estimation.base_estimator.EffectEstimator

Bases: sklearn.base.BaseEstimator

Class-based interface for estimating either individual-level or sample-level effect.

CALCULATE_EFFECT = {'diff': <function EffectEstimator.<lambda>>, 'or': <function EffectEstimator.<lambda>>, 'ratio': <function EffectEstimator.<lambda>>}
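The lambdas behind the three keys are not reproduced on this page. As a minimal sketch, assuming the keys follow the conventional definitions of difference, ratio, and odds ratio, the mapping might look like the following. The name `calculate_effect_sketch` is hypothetical and is not part of causallib.

```python
# Hypothetical stand-in for EffectEstimator.CALCULATE_EFFECT.
# Assumption: each key uses the conventional definition of its effect measure.
calculate_effect_sketch = {
    "diff": lambda y1, y2: y1 - y2,                          # risk difference
    "ratio": lambda y1, y2: y1 / y2,                         # risk ratio
    "or": lambda y1, y2: (y1 / (1 - y1)) / (y2 / (1 - y2)),  # odds ratio
}

# Apply every effect function to the same pair of potential outcomes.
effects = {name: func(0.3, 0.6) for name, func in calculate_effect_sketch.items()}
```

Under these assumed definitions, the odds ratio is only meaningful for outcomes strictly between 0 and 1, such as probabilities.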

estimate_effect(outcome_1, outcome_2, effect_types='diff')

Estimates an effect given two potential outcomes.

Parameters

outcome_1 (pd.Series | float) – A potential outcome.
outcome_2 (pd.Series | float) – A potential outcome.
effect_types (list[str] | str) – Any iterable of strings from the set of EffectEstimator.CALCULATE_EFFECT keys

Returns

A Series if population effect (input is scalar), where the index holds the effect types and the values are the corresponding computed effects. A DataFrame if individual effect (input is a vector), where columns are effect types and rows are the effects for each individual. In both cases, the value type is the same as the type of outcome_1 and outcome_2.

Return type

pd.Series | pd.DataFrame
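The two return shapes can be illustrated with a small stand-alone sketch. It does not call causallib: `effect_sketch` is a hypothetical helper that only mimics the documented behaviour (scalars in, Series out; vectors in, DataFrame out), and it supports just two effect types for brevity.

```python
import pandas as pd


def effect_sketch(outcome_1, outcome_2, effect_types=("diff", "ratio")):
    """Hypothetical helper mimicking the documented return shapes."""
    funcs = {
        "diff": lambda y1, y2: y1 - y2,
        "ratio": lambda y1, y2: y1 / y2,
    }
    results = {name: funcs[name](outcome_1, outcome_2) for name in effect_types}
    if isinstance(outcome_1, pd.Series):
        # Vector input: one row per individual, one column per effect type.
        return pd.DataFrame(results)
    # Scalar input: one entry per effect type.
    return pd.Series(results)


population = effect_sketch(0.3, 0.6)
individual = effect_sketch(pd.Series([0.2, 0.4]), pd.Series([0.1, 0.8]))
```

Here `population` is a Series indexed by effect type, while `individual` is a DataFrame with one row for each of the two individuals.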

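Examples

>>> from causallib.estimation.base_estimator import EffectEstimator
>>> effect_estimator = EffectEstimator()
>>> # Request all three effect types; the default effect_types='diff' computes only the difference.
>>> effect_estimator.estimate_effect(0.3, 0.6, effect_types=["diff", "ratio", "or"])
>>> # Expected values, returned as a pd.Series indexed by effect type:
>>> # {"diff": -0.3,    # 0.3 - 0.6
>>> #  "ratio": 0.5,    # 0.3 / 0.6
>>> #  "or": 0.2857}    # Odds-Ratio(0.3, 0.6)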

class causallib.estimation.base_estimator.IndividualOutcomeEstimator(learner, predict_proba=False, *args, **kwargs)

Bases: causallib.estimation.base_estimator.PopulationOutcomeEstimator, causallib.estimation.base_estimator.EffectEstimator

Interface for estimating individual-level outcome for different treatment values.

Parameters

learner – Initialized sklearn model.
predict_proba (bool) – Applies when the outcome task is classification and the learner supports the operation. If True, prediction uses the learner’s predict_proba or decision_function, which returns a continuous matrix of size (n_samples, n_classes). If False, predict is used and the return value is based on a vector of class labels.
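The switch described by predict_proba can be sketched as a small dispatch function. This is an illustration only: `StubClassifier` and `predict_sketch` are hypothetical names, and the order in which predict_proba and decision_function are tried is an assumption, not something this page specifies.

```python
class StubClassifier:
    """Hypothetical learner exposing scikit-learn style prediction methods."""

    def predict(self, X):
        # Hard class labels: threshold each score at 0.5.
        return [int(x > 0.5) for x in X]

    def predict_proba(self, X):
        # Continuous output: one row per sample, one column per class.
        return [[1 - x, x] for x in X]


def predict_sketch(learner, X, predict_proba=False):
    """Choose the prediction method according to the predict_proba flag."""
    if not predict_proba:
        return learner.predict(X)
    # Assumption: prefer predict_proba and fall back to decision_function.
    if hasattr(learner, "predict_proba"):
        return learner.predict_proba(X)
    if hasattr(learner, "decision_function"):
        return learner.decision_function(X)
    raise AttributeError("learner supports neither predict_proba nor decision_function")


labels = predict_sketch(StubClassifier(), [0.25, 0.75])
scores = predict_sketch(StubClassifier(), [0.25, 0.75], predict_proba=True)
```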

estimate_effect(outcome1, outcome2, agg='population', effect_types='diff')

Estimates an effect given two potential outcomes.

Parameters

outcome1 (pd.Series) – A potential outcome.
outcome2 (pd.Series) – A potential outcome.
agg (str) – Either “population” or “individual” - whether to calculate individual effect or population effect.
effect_types (list[str] | str) – Any iterable of strings from the set of EffectEstimator.CALCULATE_EFFECT keys

Returns

A Series if population effect (agg is “population”), where the index holds the effect types and the values are the corresponding computed effects. A DataFrame if individual effect (agg is “individual”), where columns are effect types and rows are the effects for each individual. In both cases, the value type is the same as the type of outcome1 and outcome2.

Return type

pd.Series | pd.DataFrame
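The role of agg can be sketched without causallib. The helper below is hypothetical and handles only the difference effect. It assumes that the population effect is obtained by averaging each outcome vector before comparing them, which this page does not state explicitly.

```python
import pandas as pd


def effect_with_agg_sketch(outcome1, outcome2, agg="population"):
    """Hypothetical illustration of population versus individual effects."""
    if agg == "population":
        # Assumption: collapse each outcome to its mean, then take the difference.
        return pd.Series({"diff": outcome1.mean() - outcome2.mean()})
    if agg == "individual":
        # Element-wise difference, one row per individual.
        return pd.DataFrame({"diff": outcome1 - outcome2})
    raise ValueError(f"Unsupported agg value: {agg!r}")


y1 = pd.Series([1.0, 2.0, 3.0])
y0 = pd.Series([0.5, 1.0, 1.5])
population_effect = effect_with_agg_sketch(y1, y0, agg="population")
individual_effect = effect_with_agg_sketch(y1, y0, agg="individual")
```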

abstract estimate_individual_outcome(X, a, treatment_values=None, predict_proba=None)

Estimates individual outcome under different treatment values (interventions).

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to use when estimating the counterfactual outcome. If not supplied, calculates for all available treatment values.
predict_proba (bool | None) – Applies when the outcome task is classification and the learner supports the operation. If True, prediction uses the learner’s predict_proba or decision_function, which returns a continuous matrix of size (n_samples, n_classes). If False, predict is used and the return value is based on a vector of class labels. If None, the parameter is ignored and the behaviour is as specified when initializing the IndividualOutcomeEstimator.

Returns

A DataFrame whose columns are treatment values and whose rows are individuals. Each column is a vector of size (num_samples,) containing the estimated outcome for each individual under the treatment value given by the column name.

Return type

pd.DataFrame
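Because this method is abstract, its behaviour depends on the subclass. One common approach, shown here purely as an assumption, is to predict the outcome once per treatment value while setting every individual's treatment to that value. The names `individual_outcome_sketch` and `toy_predict` are hypothetical.

```python
import pandas as pd


def individual_outcome_sketch(predict_fn, X, treatment_values):
    """Hypothetical sketch: one predicted-outcome column per treatment value."""
    outcomes = {}
    for value in treatment_values:
        # Assign the same treatment value to every individual.
        a_counterfactual = pd.Series(value, index=X.index)
        outcomes[value] = predict_fn(X, a_counterfactual)
    return pd.DataFrame(outcomes)


def toy_predict(X, a):
    # Toy outcome model standing in for a fitted learner.
    return X["age"] / 10 + 2.0 * a


X = pd.DataFrame({"age": [30, 50]})
outcomes = individual_outcome_sketch(toy_predict, X, treatment_values=[0, 1])
```

The resulting DataFrame has one row per individual and one column per treatment value, matching the shape described above.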

estimate_population_outcome(X, a, y=None, treatment_values=None, agg_func='mean')

Implements aggregation of individual outcome into population (sample) outcome.

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series | None) – Observed outcome of size (num_subjects,).
treatment_values (Any) – Desired treatment value/s to stratify upon before aggregating individual into population outcome. If not supplied, calculates for all available treatment values.
agg_func (str) – Type of aggregation function (e.g. “mean” or “median”).

Returns

A Series whose index holds the treatment values and whose values are numbers: the aggregated outcome for the stratum of individuals whose assigned treatment is the index key.

Return type

pd.Series
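As a rough sketch of the aggregation step, assuming agg_func is applied column-wise to a DataFrame of individual outcomes (the data below is made up for illustration), pandas can reduce each treatment column to a single number:

```python
import pandas as pd

# Hypothetical individual outcomes: rows are individuals, columns are treatment values.
individual_outcomes = pd.DataFrame({0: [3.0, 5.0, 10.0], 1: [5.0, 7.0, 12.0]})

# Aggregating column-wise yields a Series indexed by treatment value.
population_mean = individual_outcomes.agg("mean")
population_median = individual_outcomes.agg("median")
```

The choice between "mean" and "median" matters when outcomes are skewed, as in the first column here, where the mean is 6.0 and the median is 5.0.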

evaluate_fit(X, y, a=None)

abstract fit(X, a, y, sample_weight=None)

Trains a causal model from observed data.

Parameters

X (pd.DataFrame) – Covariate matrix of size (num_subjects, num_features).
a (pd.Series) – Treatment assignment of size (num_subjects,).
y (pd.Series) – Observed outcome of size (num_subjects,).
sample_weight – To be passed to the underlying scikit-learn learner’s fit method.

Returns

The causal model itself, with its inner learner fitted.

Return type

IndividualOutcomeEstimator

class causallib.estimation.base_estimator.PopulationOutcomeEstimator

Bases: causallib.estimation.base_estimator.EffectEstimator

Interface for estimating aggregated outcome over different subgroups in the dataset.

abstract estimate_population_outcome(X, a, y, treatment_values=None)
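Since this method is abstract, concrete estimators decide how the aggregation is adjusted for confounding. To illustrate only the expected return shape, the following naive sketch averages the observed outcome within each treatment group. It ignores X entirely, so it is not a causal estimate, and `naive_population_outcome` is a hypothetical name.

```python
import pandas as pd


def naive_population_outcome(a, y, treatment_values=None):
    """Hypothetical, unadjusted sketch: mean observed outcome per treatment group."""
    means = y.groupby(a).mean()
    if treatment_values is not None:
        # Keep only the requested treatment values.
        means = means.loc[list(treatment_values)]
    return means


a = pd.Series([0, 0, 1, 1])
y = pd.Series([1.0, 3.0, 4.0, 8.0])
result = naive_population_outcome(a, y)
```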