causallib.preprocessing.transformers.MatchingTransformer
- class MatchingTransformer(propensity_transform=None, caliper=None, with_replacement=True, n_neighbors=1, matching_mode='both', metric='mahalanobis', knn_backend='sklearn')
Transform data by removing poorly matched samples.
- Parameters:
propensity_transform (causallib.preprocessing.transformers.PropensityTransformer) – an object for data preprocessing which adds the propensity score as a feature (default: None)
caliper (float) – maximal distance for a match to be accepted. If not defined, all matches will be accepted. If defined, some samples may not be matched and their outcomes will not be estimated. (default: None)
with_replacement (bool) – whether samples can be used multiple times for matching. If set to False, the matching process will optimize the linear sum of distances between pairs of treatment and control samples, and only min(N_treatment, N_control) samples will be estimated. Matching without replacement does not make use of the fit data and is therefore not implemented for out-of-sample data. (default: True)
n_neighbors (int) – number of nearest neighbors to include in a match. Must be 1 if with_replacement is False. If larger than 1, the estimate is calculated using the regress_agg_function or classify_agg_function across the n_neighbors. Note that when caliper is set, some samples will have fewer than n_neighbors matches. (default: 1)
matching_mode (str) – direction of matching: treatment_to_control, control_to_treatment or both, indicating which set should be matched to which. All sets are cross-matched in match. When with_replacement is False all matching modes coincide; with replacement they differ. (default: “both”)
metric (str) – distance metric string for calculating the distance between samples. Note: if an externally built knn_backend object with a different metric is supplied, metric needs to be changed to reflect that, because Matching will set its inverse covariance matrix if “mahalanobis” is set. (default: “mahalanobis”, also supported: “euclidean”)
knn_backend (str or callable) – backend to use for nearest neighbor search. Options are “sklearn” or a callable which returns an object implementing fit, kneighbors and set_params like the sklearn NearestNeighbors object. (default: “sklearn”)
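To make the interaction between caliper and n_neighbors concrete, here is a minimal pure-Python sketch of nearest-neighbor matching with replacement. It is not causallib's implementation: the helper name match_with_caliper, the Euclidean metric, and the inclusive comparison (distance <= caliper) are assumptions made for illustration.

```python
import math


def match_with_caliper(sources, targets, n_neighbors=1, caliper=None):
    """Hypothetical helper: match each source to its nearest targets.

    Targets may be reused by several sources (matching with replacement).
    Candidate matches farther away than `caliper` are discarded.
    """
    matches = {}
    for i, s in enumerate(sources):
        # Rank all targets by Euclidean distance to this source, nearest first.
        ranked = sorted((math.dist(s, t), j) for j, t in enumerate(targets))
        nearest = ranked[:n_neighbors]
        if caliper is not None:
            # Assumption: a match at exactly the caliper distance is accepted.
            nearest = [(d, j) for d, j in nearest if d <= caliper]
        matches[i] = [j for _, j in nearest]
    return matches


treated = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
control = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.9)]

no_caliper = match_with_caliper(treated, control, n_neighbors=2)
with_caliper = match_with_caliper(treated, control, n_neighbors=2, caliper=0.5)
```

Without a caliper every treated sample receives two matches. With caliper=0.5 the first sample keeps only one neighbor and the third keeps none, which is the "fewer than n_neighbors matches" situation described for n_neighbors.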
- __init__(propensity_transform=None, caliper=None, with_replacement=True, n_neighbors=1, matching_mode='both', metric='mahalanobis', knn_backend='sklearn')
Transform data by removing poorly matched samples.
- Parameters:
propensity_transform (causallib.preprocessing.transformers.PropensityTransformer) – an object for data preprocessing which adds the propensity score as a feature (default: None)
caliper (float) – maximal distance for a match to be accepted. If not defined, all matches will be accepted. If defined, some samples may not be matched and their outcomes will not be estimated. (default: None)
with_replacement (bool) – whether samples can be used multiple times for matching. If set to False, the matching process will optimize the linear sum of distances between pairs of treatment and control samples, and only min(N_treatment, N_control) samples will be estimated. Matching without replacement does not make use of the fit data and is therefore not implemented for out-of-sample data. (default: True)
n_neighbors (int) – number of nearest neighbors to include in a match. Must be 1 if with_replacement is False. If larger than 1, the estimate is calculated using the regress_agg_function or classify_agg_function across the n_neighbors. Note that when caliper is set, some samples will have fewer than n_neighbors matches. (default: 1)
matching_mode (str) – direction of matching: treatment_to_control, control_to_treatment or both, indicating which set should be matched to which. All sets are cross-matched in match. When with_replacement is False all matching modes coincide; with replacement they differ. (default: “both”)
metric (str) – distance metric string for calculating the distance between samples. Note: if an externally built knn_backend object with a different metric is supplied, metric needs to be changed to reflect that, because Matching will set its inverse covariance matrix if “mahalanobis” is set. (default: “mahalanobis”, also supported: “euclidean”)
knn_backend (str or callable) – backend to use for nearest neighbor search. Options are “sklearn” or a callable which returns an object implementing fit, kneighbors and set_params like the sklearn NearestNeighbors object. (default: “sklearn”)
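The with_replacement=False case is described as optimizing the linear sum of distances between treatment and control pairs. The sketch below illustrates that objective by brute force over all one-to-one assignments, so it is only suitable for tiny inputs. The function name match_without_replacement and the Euclidean metric are assumptions. A real implementation would use an efficient assignment solver (scipy.optimize.linear_sum_assignment is a common choice), and which solver causallib uses is not stated here.

```python
import math
from itertools import permutations


def match_without_replacement(treated, control):
    """Hypothetical helper: one-to-one pairs minimizing the summed distance.

    Returns (pairs, total), where pairs holds (treated_index, control_index).
    """
    # Assign every sample of the smaller group to a distinct sample of the larger.
    swap = len(treated) > len(control)
    small, large = (control, treated) if swap else (treated, control)
    best_total, best_perm = math.inf, ()
    for perm in permutations(range(len(large)), len(small)):
        total = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    # Report pairs in (treated, control) order regardless of which side was smaller.
    pairs = [(j, i) if swap else (i, j) for i, j in enumerate(best_perm)]
    return pairs, best_total


treated = [(0.0,), (1.0,)]
control = [(0.9,), (0.2,), (5.0,)]
pairs, total = match_without_replacement(treated, control)
```

With two treated and three control samples, only min(N_treatment, N_control) = 2 pairs are formed and the distant control sample at 5.0 is left unmatched.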
- fit(X, a, y)
Fit data to transform.
This function loads the data for matching and must be called before transform. For convenience, consider using fit_transform.
- Parameters:
X (pandas.DataFrame) – DataFrame of shape (n,m) containing m covariates for n samples.
a (pandas.Series) – Series of shape (n,) containing discrete treatment values for the n samples.
y (pandas.Series) – Series of shape (n,) containing outcomes for the n samples.
- Returns:
self (MatchingTransformer) – Fitted object
- transform(X, a, y)
Transform data by restricting it to samples which are matched.
Following a matching process, not all of the samples will find matches. Transforming the data by only allowing samples in treatment that have close matches in control, or in control that have close matches in treatment can make other causal methods more effective. This function will call match on the underlying Matching object.
The attribute matching_mode changes the behavior of this function. If set to control_to_treatment each control will attempt to find a match among the treated, hence the transformed data will have a maximum size of N_c + min(N_c,N_t). If set to treatment_to_control, each treatment will attempt to find a match among the control and the transformed data will have a maximum size of N_t + min(N_c,N_t). If set to both, both matching operations will be executed and if a sample succeeds in either direction it will be included, hence the maximum size of the transformed data will be len(X).
If with_replacement is False, matching_mode does not change the behavior. There will be up to min(N_c,N_t) samples in the returned DataFrame, regardless.
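The size bounds stated above can be written down as a small function. This is a direct transcription of the bounds in the text, not code taken from causallib, and the function name max_transformed_size is hypothetical.

```python
def max_transformed_size(n_c, n_t, matching_mode="both", with_replacement=True):
    """Upper bound on the number of rows returned by transform, per the text above.

    n_c and n_t are the numbers of control and treated samples.
    """
    if not with_replacement:
        # matching_mode is irrelevant without replacement.
        return min(n_c, n_t)
    if matching_mode == "control_to_treatment":
        return n_c + min(n_c, n_t)
    if matching_mode == "treatment_to_control":
        return n_t + min(n_c, n_t)
    if matching_mode == "both":
        # Every sample may succeed in at least one direction: len(X).
        return n_c + n_t
    raise NotImplementedError(f"unsupported matching_mode: {matching_mode!r}")


# Example: 100 control samples and 30 treated samples.
sizes = {
    mode: max_transformed_size(100, 30, mode)
    for mode in ("control_to_treatment", "treatment_to_control", "both")
}
size_no_replacement = max_transformed_size(100, 30, with_replacement=False)
```

These are upper bounds only. With a caliper set, the actual number of returned samples can be smaller.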
- Parameters:
X (pandas.DataFrame) – DataFrame of shape (n,m) containing m covariates for n samples.
a (pandas.Series) – Series of shape (n,) containing discrete treatment values for the n samples.
y (pandas.Series) – Series of shape (n,) containing outcomes for the n samples.
- Raises:
NotImplementedError – Raised if the attribute matching_mode is set to a value other than the supported ones.
- Returns:
Xm (pandas.DataFrame) – Covariates of samples that were matched
am (pandas.Series) – Treatment values of samples that were matched
ym (pandas.Series) – Outcome values of samples that were matched
- find_indices_of_matched_samples(X, a)
Find indices of samples which matched successfully.
Given a DataFrame of samples X and treatment assignments a, return a list of indices of samples which matched successfully.
- Parameters:
X (pandas.DataFrame) – Covariates of samples
a (pandas.Series) – Treatment assignments
- Returns:
indices of matched samples to be passed to X.loc
- Return type:
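As a rough illustration of how the matched indices might be assembled for each matching_mode, the sketch below works on plain dictionaries of match results. The helper included_labels and the toy labels are hypothetical. It is also an assumption, inferred from the size bounds given for transform, that a successful match keeps both the source sample and the samples it matched to.

```python
def included_labels(matches):
    """Hypothetical helper: sources with at least one match, plus their matches."""
    kept = set()
    for source, targets in matches.items():
        if targets:
            kept.add(source)
            kept.update(targets)
    return kept


# Toy match results, with replacement and a caliper.
# c2 is too far from every treated sample to be matched in either direction.
treatment_to_control = {"t1": ["c1"], "t2": ["c1"], "t3": ["c3"]}
control_to_treatment = {"c1": ["t1"], "c2": [], "c3": ["t3"]}

keep_t2c = included_labels(treatment_to_control)
keep_c2t = included_labels(control_to_treatment)
# matching_mode="both": a sample is kept if it succeeds in either direction.
keep_both = sorted(keep_t2c | keep_c2t)
```

A list such as keep_both is the kind of label collection that could then be passed to X.loc to select the matched rows.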
- fit_transform(X, a, y)
Match data and return matched subset.
This is a convenience method, calling fit and transform at once. For details, see documentation of each function.
- Parameters:
X (pandas.DataFrame) – DataFrame of shape (n,m) containing m covariates for n samples.
a (pandas.Series) – Series of shape (n,) containing discrete treatment values for the n samples.
y (pandas.Series) – Series of shape (n,) containing outcomes for the n samples.
- Returns:
Xm (pandas.DataFrame) – Covariates of samples that were matched
am (pandas.Series) – Treatment values of samples that were matched
ym (pandas.Series) – Outcome values of samples that were matched
- set_params(**kwargs)
Set parameters of matching engine. Supported parameters are:
- Keyword Arguments:
propensity_transform (causallib.preprocessing.transformers.PropensityTransformer) – an object for data preprocessing which adds the propensity score as a feature (default: None)
caliper (float) – maximal distance for a match to be accepted (default: None)
with_replacement (bool) – whether samples can be used multiple times for matching (default: True)
n_neighbors (int) – number of nearest neighbors to include in a match. Must be 1 if with_replacement is False (default: 1)
matching_mode (str) – direction of matching: treatment_to_control, control_to_treatment or both, indicating which set should be matched to which. All sets are cross-matched in match. Without replacement there is no difference in outcome, but with replacement there is a difference and it impacts the results of transform.
metric (str) – distance metric string for calculating the distance between samples (default: “mahalanobis”, also supported: “euclidean”)
knn_backend (str or callable) – backend to use for nearest neighbor search. Options are “sklearn” or a callable which returns an object implementing fit, kneighbors and set_params like the sklearn NearestNeighbors object. (default: “sklearn”)
- Returns:
self (MatchingTransformer) – object with new parameters set
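The two supported metric values can rank the same pair of samples quite differently. The following sketch compares Euclidean and Mahalanobis distance for two-dimensional points in pure Python. The helper mahalanobis_2d is hypothetical and is restricted to 2x2 covariance matrices so that the inverse can be written out by hand; it is meant to illustrate the metric itself, not how causallib computes it.

```python
import math


def mahalanobis_2d(u, v, cov):
    """Hypothetical helper: Mahalanobis distance between 2-D points u and v."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = u[0] - v[0], u[1] - v[1]
    # Quadratic form (u - v)^T * inv * (u - v).
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(quad)


u, v = (0.0, 0.0), (2.0, 0.0)
# The first covariate has variance 4, the second has variance 1.
cov = ((4.0, 0.0), (0.0, 1.0))

euclid = math.dist(u, v)
mahal = mahalanobis_2d(u, v, cov)
# With an identity covariance the two metrics coincide.
mahal_identity = mahalanobis_2d(u, v, ((1.0, 0.0), (0.0, 1.0)))
```

Here the Mahalanobis distance is half the Euclidean one because the points differ only along the high-variance covariate. This scaling is why the inverse covariance matrix has to agree with the metric actually used by the nearest-neighbor backend.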