Module preprocessing

This module provides several useful filters and transformers to augment the ones provided by scikit-learn.

Specifically, the various filters remove features for the following criteria:

  • Features that are almost constant (not by variance but by actual value).

  • Features that are highly correlated with other features.

  • Features that have a low variance (can deal with NaN values).

  • Features that are mostly NaN.

  • Features that are highly associated with the outcome (not just correlation)

Various transformers are provided:

  • A standard scaler that deals with Nan values.

  • A min/max scaler.

A transformer that accepts numpy arrays and turns them into pandas will be added soon.

These filters and transformers can be used as part of a scikit-learn pipeline.

Example:

This example combines a scikit-learn filter with a causallib scaler. The pipeline scales the data, then removes covariates with low variance, and then applies IPW with logistic regression.

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from causallib.estimation import IPW
from causallib.datasets import load_nhefs
from causallib.preprocessing.transformers import MinMaxScaler

pipeline = make_pipeline(MinMaxScaler(), VarianceThreshold(0.1), LogisticRegression())
data = load_nhefs()
ipw = IPW(pipeline)
ipw.fit(data.X, data.a)
ipw.estimate_population_outcome(data.X, data.a, data.y)

Submodules

Module contents