causallib.preprocessing.filters module

(C) Copyright 2019 IBM Corp.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class causallib.preprocessing.filters.BaseFeatureSelector

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

abstract fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Passthrough for Pipeline compatibility.

Returns

BaseFeatureSelector

property selected_features

transform(X)
Parameters

X (pd.DataFrame) –

Return type

pd.DataFrame
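The base-class contract is small: `fit` decides which columns survive and stores them, and `transform` returns only those columns. The sketch below mirrors that contract with a hypothetical stand-in base class (`SketchSelector`) and a hypothetical subclass (`KeepNumericSelector`). Neither is part of causallib, and storing the selection as a list of column labels that `transform` indexes with is an assumption.

```python
import abc

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SketchSelector(TransformerMixin, BaseEstimator, metaclass=abc.ABCMeta):
    """Hypothetical stand-in mirroring the BaseFeatureSelector contract."""

    @property
    def selected_features(self):
        # Assumption: the selection is stored as a list of column labels.
        return self._selected_features

    @abc.abstractmethod
    def fit(self, X, y=None):
        """Subclasses must choose the columns to keep and return self."""

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Keep only the columns chosen during fit.
        return X[self.selected_features]


class KeepNumericSelector(SketchSelector):
    """Hypothetical example subclass: keeps numeric columns only."""

    def fit(self, X, y=None):
        self._selected_features = list(X.select_dtypes("number").columns)
        return self


# Usage: fit_transform is inherited from sklearn's TransformerMixin.
X = pd.DataFrame({"age": [30, 40], "name": ["a", "b"]})
print(KeepNumericSelector().fit_transform(X).columns.tolist())
```

Because the stand-in derives from sklearn's `TransformerMixin`, it gets `fit_transform` for free, which is what lets a selector of this shape sit inside a `Pipeline`.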

class causallib.preprocessing.filters.ConstantFilter(threshold=0.95)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features that are almost constant

Parameters

threshold (float) –

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Passthrough for Pipeline compatibility.

Returns

BaseFeatureSelector
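The documentation does not define `threshold` precisely. One plausible reading, assumed here and not taken from causallib's source, is that a column counts as almost constant when its most frequent value covers at least `threshold` of the rows. A minimal pandas sketch of that reading (the helper names are hypothetical):

```python
import pandas as pd


def near_constant_columns(X: pd.DataFrame, threshold: float = 0.95) -> list:
    """Hypothetical helper: columns whose most frequent value covers at
    least `threshold` of the rows (assumed meaning of the threshold)."""
    # Share of rows taken by each column's most frequent value (NaN counts as a value).
    top_share = X.apply(lambda col: col.value_counts(normalize=True, dropna=False).max())
    return list(top_share[top_share >= threshold].index)


def drop_near_constant(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: return X without its near-constant columns."""
    return X.drop(columns=near_constant_columns(X, threshold))
```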

class causallib.preprocessing.filters.CorrelationFilter(threshold=0.9)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features that are strongly correlated with other features

Parameters

threshold (float) –

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Passthrough for Pipeline compatibility.

Returns

BaseFeatureSelector
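A common way to implement this kind of filter, assumed here rather than confirmed for causallib, is to inspect each pair of columns once and drop the later column of any pair whose absolute Pearson correlation exceeds `threshold`. A sketch with a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def correlated_columns_to_drop(X: pd.DataFrame, threshold: float = 0.9) -> list:
    """Hypothetical helper: for every pair with |Pearson r| > threshold,
    drop the later column of the pair (a greedy heuristic, assumed here)."""
    corr = X.corr().abs()
    # Mask everything except the strict upper triangle so each pair is seen once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is too correlated with any earlier column.
    return [col for col in upper.columns if (upper[col] > threshold).any()]
```

Which column of a correlated pair is dropped is a design choice; this sketch keeps whichever comes first in column order.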

class causallib.preprocessing.filters.HrlVarFilter(threshold=0.0)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features with a small variance, while allowing for missing values

Parameters

threshold (float) –

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Passthrough for Pipeline compatibility.

Returns

BaseFeatureSelector
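What distinguishes this filter, as described, is that variance is still computed when some entries are missing. A minimal sketch, assuming variance is taken over each column's non-missing values and that columns with variance at or below `threshold` are removed (both assumptions; the helper name is hypothetical):

```python
import pandas as pd


def low_variance_columns(X: pd.DataFrame, threshold: float = 0.0) -> list:
    """Hypothetical helper: columns whose variance over non-missing values
    is at most `threshold` (assumed semantics)."""
    # skipna=True ignores NaNs column by column; ddof=0 gives population variance.
    variances = X.var(ddof=0, skipna=True)
    return list(variances[variances <= threshold].index)
```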

class causallib.preprocessing.filters.SparseFilter(threshold=0.2)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features with many missing values

Parameters

threshold (float) –

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Passthrough for Pipeline compatibility.

Returns

BaseFeatureSelector
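A minimal sketch of this filter, assuming `threshold` is the largest tolerated fraction of missing values per column (an assumption; the helper name is hypothetical):

```python
import pandas as pd


def sparse_columns(X: pd.DataFrame, threshold: float = 0.2) -> list:
    """Hypothetical helper: columns whose fraction of missing values
    exceeds `threshold` (assumed semantics)."""
    # Mean of a boolean mask is the per-column share of NaN entries.
    missing_fraction = X.isna().mean()
    return list(missing_fraction[missing_fraction > threshold].index)
```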

class causallib.preprocessing.filters.StatisticalFilter(threshold=0.2, isLinear=True)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features according to univariate association

Parameters
  • threshold (float) –

  • isLinear (bool) –

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Target values against which each feature's univariate association is measured.

Returns

BaseFeatureSelector

class causallib.preprocessing.filters.UnivariateAssociationFilter(is_linear=True, threshold=0.2)

Bases: causallib.preprocessing.filters.BaseFeatureSelector

Removes features according to univariate association

Parameters
  • is_linear (bool) –

  • threshold (float) –

compute_pvals(X, y)

fit(X, y=None)
Parameters
  • X (pd.DataFrame) – The data used for filtering, of shape [n_samples, n_features].

  • y – Target values against which each feature's univariate association is measured.

Returns

BaseFeatureSelector
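Both `StatisticalFilter` and `UnivariateAssociationFilter` are described as removing features according to univariate association, and the presence of `compute_pvals(X, y)` suggests the decision rests on one p-value per feature. The sketch below assumes a linear association test (sklearn's `f_regression`, swapped in here and not confirmed as causallib's choice) and treats `threshold` as a p-value cutoff below which a feature is kept. The helper names are hypothetical, and the `is_linear=False` case is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression


def univariate_pvalues(X: pd.DataFrame, y) -> pd.Series:
    """Hypothetical helper: one p-value per column from a univariate linear
    F-test of y on that column alone."""
    _, pvals = f_regression(X.to_numpy(dtype=float), np.asarray(y, dtype=float))
    return pd.Series(pvals, index=X.columns)


def select_associated(X: pd.DataFrame, y, threshold: float = 0.2) -> list:
    """Hypothetical helper: keep columns whose p-value is below `threshold`
    (assumption: threshold acts as a p-value cutoff)."""
    pvals = univariate_pvalues(X, y)
    return list(pvals[pvals < threshold].index)
```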

causallib.preprocessing.filters.track_selected_features(pipeline_stages, num_features)
Parameters
  • pipeline_stages (list[tuple[str, TransformerMixin]]) – List of pipeline steps; each step is a (name, transformer) tuple.

  • num_features (int) –

Return type

np.ndarray
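Judging by its signature, `track_selected_features` maps the features that survive a sequence of pipeline steps back to positions among the original `num_features` columns and returns them as an array. A minimal sketch of that bookkeeping, using sklearn's `VarianceThreshold` and its `get_support()` mask as stand-ins for the causallib filters (an assumption; `track_kept_indices` is a hypothetical name, not the causallib function):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold


def track_kept_indices(pipeline_stages, num_features):
    """Hypothetical sketch: original column positions that survive a chain
    of fitted selector steps."""
    kept = np.arange(num_features)  # start with every original column index
    for _name, step in pipeline_stages:
        # Assumption: only steps exposing sklearn's get_support() drop columns;
        # any other step is treated as leaving columns and their order unchanged.
        if hasattr(step, "get_support"):
            kept = kept[step.get_support()]
    return kept


# Usage: the second selector is fitted on the first selector's output.
X = np.array([[0, 1, 5, 2], [0, 2, 5, 3], [0, 3, 5, 9]], dtype=float)
s1 = VarianceThreshold().fit(X)  # drops the two zero-variance columns
s2 = VarianceThreshold(threshold=3.0).fit(s1.transform(X))
print(track_kept_indices([("drop_constant", s1), ("drop_low_var", s2)], X.shape[1]))
```

Masks are applied in pipeline order, so each step's mask indexes only the columns that reached it.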