causallib.datasets.data_loader module

causallib.datasets.data_loader.load_acic16(instance=1, raw=False)

Loads a single dataset from the 2016 Atlantic Causal Inference Conference data challenge.

The dataset is based on real covariates, but the treatment assignment and potential outcomes are synthetically simulated. It therefore also contains sufficient ground truth to evaluate the effect estimation of causal models. The competition introduced 7700 simulated files (100 instances for each of the 77 data-generating processes, or DGPs). We provide a smaller sample of one instance from each of 10 DGPs. For the full dataset, see the link below to the competition site.

If used for academic purposes, please consider citing the competition organizers: Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, and Dan Cervone. “Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition.” Statistical Science 34, no. 1 (2019): 43-68.

Parameters

instance (int) – number between 1 and 10 (inclusive) selecting which dataset to load.
raw (bool) – Whether to apply contrast (“dummify”) on non-numeric columns. If True, returns a (pd.DataFrame, pd.DataFrame) tuple (one for covariates and the second with treatment assignment, noisy potential outcomes and true potential outcomes).

Returns

dictionary-like object. Attributes are: X (covariates), a (treatment assignment), y (outcome), po (ground truth potential outcomes: po[0] is the potential outcome for controls and po[1] is the potential outcome for treated), descriptors (feature description).

Return type

Bunch
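
Examples

Because the returned object carries ground-truth potential outcomes, it can be used to compare an estimate against the true effect. The following is a minimal sketch of that idea, not part of causallib: the helpers `true_average_effect` and `naive_difference` are hypothetical names, and the tiny `data` object is a hand-made stand-in that only mimics the attribute names listed under Returns (`a`, `y`, `po`), so the snippet runs without the library installed. It assumes `po[0]` and `po[1]` can be indexed as that description suggests.

```python
from statistics import mean
from types import SimpleNamespace


def true_average_effect(po_control, po_treated):
    """Mean of the individual effects po[1] - po[0], using ground-truth potential outcomes."""
    return mean(t - c for c, t in zip(po_control, po_treated))


def naive_difference(a, y):
    """Unadjusted difference in mean observed outcome: treated (a == 1) minus controls (a == 0)."""
    treated = [yi for ai, yi in zip(a, y) if ai == 1]
    controls = [yi for ai, yi in zip(a, y) if ai == 0]
    return mean(treated) - mean(controls)


# Hypothetical stand-in data (assumption), shaped like the documented attributes.
# With causallib installed, one would instead load a real instance:
#     from causallib.datasets.data_loader import load_acic16
#     data = load_acic16(instance=1)
data = SimpleNamespace(
    a=[1, 0, 1, 0],                                   # treatment assignment
    y=[5.0, 1.0, 7.0, 2.0],                           # observed outcome
    po={0: [3.0, 1.0, 4.0, 2.0],                      # potential outcome under control
        1: [5.0, 2.0, 7.0, 4.0]},                     # potential outcome under treatment
)

ate = true_average_effect(data.po[0], data.po[1])
naive = naive_difference(data.a, data.y)
print("true average effect:", ate)
print("naive difference in means:", naive)
```

In the stand-in data the naive difference in means differs from the true average effect, which is the kind of gap the ground truth makes measurable when evaluating a causal model.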