causallib.preprocessing.confounder_selection module

class causallib.preprocessing.confounder_selection.DoubleLASSO(treatment_lasso=None, outcome_lasso=None, mask_fn=None, threshold=1e-06, importance_getter='auto', covariates=None)[source]

Bases: causallib.preprocessing.confounder_selection._BaseConfounderSelection

A method for selecting confounders using sparse regression on both the treatment and the outcome, selecting the union of covariates chosen by either model.

Implements Belloni, Chernozhukov, and Hansen, “Inference on Treatment Effects after Selection among High-Dimensional Controls”: https://academic.oup.com/restud/article/81/2/608/1523757

Parameters
  • treatment_lasso – Lasso learner fitting the treatment on the confounders. For example, using scikit-learn, a continuous treatment may use Lasso() and a discrete treatment may use LogisticRegression(penalty='l1') (with an l1-capable solver, e.g. solver='liblinear'). If None, a cross-validated lasso model is assigned automatically.

  • outcome_lasso – Lasso learner fitting the outcome on the confounders. For example, using scikit-learn, a continuous outcome may use Lasso() and a discrete outcome may use LogisticRegression(penalty='l1') (with an l1-capable solver, e.g. solver='liblinear'). If None, a cross-validated lasso model is assigned automatically.

  • mask_fn – Function taking the two fitted lasso learners and returning a boolean mask with one entry per column of X, where True marks a column to keep. If None, the default implementation selects columns with a non-zero coefficient in either learner.

  • threshold – For the default mask_fn, the absolute value below which a lasso coefficient is treated as zero.

  • importance_getter (str | callable) – How to obtain feature importances: either a callable taking an estimator, the string 'coef_' or 'feature_importances_', or 'auto' to detect one of those two attributes automatically.

  • covariates (list | np.ndarray) – Subset of columns to perform selection on. Columns in X but not in covariates are always included after transform, regardless of the selection. Can be a list of column names, a boolean array the length of X’s columns, or anything compatible with pandas’ loc column indexing. If None, all columns participate in the selection. This is similar to using sklearn’s ColumnTransformer or make_column_selector.

fit(X, *args, **kwargs)[source]
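The default selection rule, keeping any covariate with a non-zero coefficient in either lasso, can be sketched with plain scikit-learn on synthetic data (a minimal sketch, not causallib's internals; the alpha values, threshold, and variable names here are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 10 covariates; only columns 0-2 matter.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
a = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)           # treatment driven by cols 0, 1
y = 2.0 * a + X[:, 1] + X[:, 2] + rng.normal(size=n)       # outcome driven by a and cols 1, 2

# Fit one lasso for the treatment and one for the outcome.
treatment_lasso = Lasso(alpha=0.1).fit(X, a)
outcome_lasso = Lasso(alpha=0.1).fit(X, y)

# Default-style mask: keep columns non-zero in EITHER model (union).
threshold = 1e-6
mask = (np.abs(treatment_lasso.coef_) > threshold) | (np.abs(outcome_lasso.coef_) > threshold)
X_selected = X[:, mask]
```

Taking the union (rather than the intersection) is the point of the double-lasso: a covariate that predicts only the treatment, or only the outcome, can still bias the effect estimate if dropped.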
class causallib.preprocessing.confounder_selection.RecursiveConfounderElimination(estimator, n_features_to_select: int = 1, step: int = 1, importance_getter='auto', covariates=None)[source]

Bases: causallib.preprocessing.confounder_selection._BaseConfounderSelection

Recursively eliminate the least important confounders until the desired number remains.

Parameters
  • estimator – Estimator to fit for every step of recursive elimination.

  • n_features_to_select (int) – The number of confounders to keep.

  • step (int) – The number of confounders to eliminate in one iteration.

  • importance_getter (str | callable) – How to obtain feature importances: either a callable taking an estimator, the string 'coef_' or 'feature_importances_', or 'auto' to detect one of those two attributes automatically.

  • covariates (list | np.ndarray) – Subset of columns to perform selection on. Columns in X but not in covariates are always included after transform, regardless of the selection. Can be a list of column names, a boolean array the length of X’s columns, or anything compatible with pandas’ loc column indexing. If None, all columns participate in the selection. This is similar to using sklearn’s ColumnTransformer or make_column_selector.

fit(X, *args, **kwargs)[source]
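RecursiveConfounderElimination follows the same idea as scikit-learn's recursive feature elimination: fit the estimator, drop the `step` least important features, and repeat until `n_features_to_select` remain. The loop can be sketched directly with sklearn's RFE on synthetic data (a hedged sketch, not this class's implementation; the estimator choice and feature counts are illustrative):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 8 covariates; only columns 0 and 3 carry signal.
rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.1, size=n)

# Eliminate one feature per iteration until 2 remain,
# ranking features by |coef_| of the refit linear model each round.
selector = RFE(LinearRegression(), n_features_to_select=2, step=1).fit(X, y)

X_reduced = selector.transform(X)  # keeps only the surviving columns
```

Refitting the estimator after each elimination round (rather than ranking once) lets importances adjust as correlated features drop out, at the cost of more fits.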