Perfect Match: A Simple Method for Learning Representations For Counterfactual Inference With Neural Networks (d909b/perfect_match, ICLR 2019)

In medicine, for example, treatment effects are typically estimated via rigorous prospective studies, such as randomised controlled trials (RCTs), and their results are used to regulate the approval of treatments. However, current methods for training neural networks for counterfactual inference on observational data are either overly complex, limited to settings with only two available treatments, or both. In this paper, we propose a method to learn representations suited for counterfactual inference, and show its efficacy in both simulated and real-world tasks. We perform extensive experiments on semi-synthetic, real-world data in settings with two and more treatments.

The News benchmark was introduced by Johansson et al. (2016) and consists of 5000 randomly sampled news articles from the NY Times corpus (https://archive.ics.uci.edu/ml/datasets/bag+of+words). Since the original TARNET of Shalit et al. (2017) was limited to the binary treatment setting, we extended the TARNET architecture to the multiple treatment setting (Figure 1). We repeated experiments on IHDP and News 1000 and 50 times, respectively. We report the PEHE (Eq. 1) and ATE (Appendix B) for the binary IHDP and News-2 datasets, and the ^mPEHE for the datasets with more than two treatments. Baselines include the Balancing Neural Network (BNN) of Johansson et al. (2016) and the methods of Alaa and van der Schaar (2017; 2018).
Learning representations for counterfactual inference from observational data is of high practical relevance for many domains, such as healthcare, public policy and economics. The root problem is that we do not have direct access to the true error in estimating counterfactual outcomes, only the error in estimating the observed factual outcomes. PM is easy to implement, and is based on the idea of augmenting samples within a minibatch with their propensity-matched nearest neighbours.

To compute the PEHE, we measure the mean squared error between the true difference in effect y1(n) − y0(n), drawn from the noiseless underlying outcome distributions μ1 and μ0, and the predicted difference in effect ŷ1(n) − ŷ0(n), indexed by n over N samples: ε̂_PEHE = (1/N) Σ_{n=1}^{N} ((y1(n) − y0(n)) − (ŷ1(n) − ŷ0(n)))². When the underlying noiseless distributions μj are not known, the true difference in effect y1(n) − y0(n) can be estimated using the noisy ground-truth outcomes yi (Appendix A).

We used four different variants of the News dataset with k = 2, 4, 8, and 16 viewing devices, and with the second dataset parameter set to 10, 10, 10, and 7, respectively.

Create a folder to hold the experimental results.

- To run BART, you need to have the corresponding R-packages installed.
- To run Causal Forests, you need to have the corresponding R-package installed.
- To reproduce the paper's figures, you need to have the corresponding R-package installed.
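As a concrete illustration, the PEHE described above amounts to a few lines of NumPy. The function below is a minimal sketch; the names `mu1`, `mu0`, `yhat1`, and `yhat0` are illustrative and not taken from the released code.

```python
import numpy as np

def pehe(mu1, mu0, yhat1, yhat0):
    """Precision in Estimation of Heterogeneous Effect (PEHE): mean squared
    error between the true effect mu1 - mu0 and the predicted effect
    yhat1 - yhat0, averaged over the N samples."""
    true_effect = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    pred_effect = np.asarray(yhat1, dtype=float) - np.asarray(yhat0, dtype=float)
    return float(np.mean((true_effect - pred_effect) ** 2))
```

When the noiseless outcome surfaces are unavailable, the same function can be applied to noisy ground-truth outcomes as an approximation, as noted above.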
For each sample, we drew ideal potential outcomes from that Gaussian outcome distribution: ỹj ~ N(μj, σj) + ε, with ε ~ N(0, 0.15). You can add new benchmarks by implementing the benchmark interface. How well does PM cope with an increasing treatment assignment bias in the observed data? We trained a Support Vector Machine (SVM) with probability estimation (Pedregosa et al., 2011). In addition, using PM with the TARNET architecture outperformed the MLP (+ MLP) in almost all cases, with the exception of the low-dimensional IHDP. This work was partially funded by the Swiss National Science Foundation (SNSF). We did so by using k head networks, one for each treatment, over a set of shared base layers, each with L layers.

This work contains the following contributions: we introduce Perfect Match (PM), a simple methodology based on minibatch matching for learning neural representations for counterfactual inference in settings with any number of treatments. Formally, this approach is, when converged, equivalent to a nearest neighbour estimator for which we are guaranteed to have access to a perfect match. The distribution of samples may therefore differ significantly between the treated group and the overall population. We therefore conclude that matching on the propensity score or a low-dimensional representation of X, and using the TARNET architecture, are sensible default configurations, particularly when X is high-dimensional. Counterfactual inference is a powerful tool, capable of solving challenging problems in high-profile sectors. Tree-based methods train many weak learners to build expressive ensemble models.
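An SVM with probability estimation can be set up in scikit-learn as follows. This is a generic sketch of such a propensity model on toy data, not the paper's exact configuration; the variable names and the simulated treatment assignment are our own.

```python
import numpy as np
from sklearn.svm import SVC

# Toy observational data: X are covariates, t are observed treatment indicators
# whose assignment depends on the first covariate (a simple, assumed bias).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
t = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# probability=True enables Platt-scaled probability estimates, which can then
# serve as propensity scores p(t = 1 | X).
propensity_model = SVC(probability=True, random_state=0).fit(X, t)
propensities = propensity_model.predict_proba(X)[:, 1]
```

The resulting `propensities` array holds one estimated treatment probability per sample and could feed into any propensity-based matching scheme.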
On the binary News-2, PM outperformed all other methods in terms of PEHE and ATE. On IHDP, the PM variants reached the best performance in terms of PEHE, and the second best ATE after CFRNET. The results shown here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.

Counterfactual inference enables one to answer "What if?" questions, such as "What would be the outcome if we gave this patient treatment t1?" or "Would this patient have lower blood sugar had she received a different medication?". Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. To address the treatment assignment bias inherent in observational data, we propose to perform SGD in a space that approximates that of a randomised experiment, using the concept of balancing scores. More complex regression models include Treatment-Agnostic Representation Networks (TARNET) (Shalit et al., 2017).

Perfect Match is a simple method for learning representations for counterfactual inference with neural networks. If you reference or use our methodology, code or results in your work, please consider citing this work. This project was designed for use with Python 2.7.
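The balancing-score idea behind PM can be sketched as below: every sample in a minibatch is augmented with its propensity-matched nearest neighbour from each other treatment group, so the batch approximates a randomised experiment. This is an illustrative reimplementation of the concept, not the released code; `augment_minibatch` and its arguments are hypothetical names, and matching here uses a scalar propensity score for simplicity.

```python
import numpy as np

def augment_minibatch(t, propensities, batch_idx):
    """For each sample index in batch_idx, append the index of its nearest
    neighbour (by absolute propensity-score distance) from every other
    treatment group, so the augmented batch covers all treatments."""
    t = np.asarray(t)
    propensities = np.asarray(propensities, dtype=float)
    augmented = list(batch_idx)
    for i in batch_idx:
        for treatment in np.unique(t):
            if treatment == t[i]:
                continue
            candidates = np.where(t == treatment)[0]
            # nearest neighbour of sample i within this treatment group
            j = candidates[np.argmin(np.abs(propensities[candidates]
                                            - propensities[i]))]
            augmented.append(int(j))
    return augmented
```

Training then proceeds with ordinary SGD over the augmented batches; no extra loss terms or hyperparameters are introduced.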
We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. Secondly, the assignment of cases to treatments is typically biased, such that cases for which a given treatment is more effective are more likely to have received that treatment. Methods that combine a model of the outcomes and a model of the treatment propensity in a manner that is robust to misspecification of either are referred to as doubly robust (Funk et al., 2011). Representation-based approaches instead minimise a distributional distance, such as the discrepancy distance (Mansour et al., 2009), between treatment groups; Counterfactual Regression Networks (CFRNET) (Shalit et al., 2017) are one example. Similarly, in economics, a potential application would, for example, be to determine how effective certain job programs would be, based on the results of past job training programs (LaLonde, 1986). Finally, although TARNETs trained with PM have similar asymptotic properties to kNN, we found that TARNETs trained with PM significantly outperformed kNN in all cases.

This repository contains the source code used to evaluate PM and most of the existing state-of-the-art methods at the time of publication of our manuscript. Run the following scripts to obtain mse.txt, pehe.txt and nn_pehe.txt.
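The kNN baseline mentioned above can be sketched as a covariate-space nearest-neighbour estimator of a counterfactual outcome; the function and argument names below are ours, for illustration, and this is not the evaluated implementation.

```python
import numpy as np

def knn_counterfactual(X, t, y, x_query, treatment, k=5):
    """Predict the outcome of x_query under `treatment` as the mean observed
    outcome of its k nearest covariate-space neighbours that actually
    received `treatment`."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t)
    y = np.asarray(y, dtype=float)
    group = np.where(t == treatment)[0]          # samples with that treatment
    dists = np.linalg.norm(X[group] - np.asarray(x_query, dtype=float), axis=1)
    nearest = group[np.argsort(dists)[:k]]       # k closest in Euclidean distance
    return float(y[nearest].mean())
```

As N grows, such an estimator converges to the true outcome surface under standard smoothness assumptions, which is the sense in which PM-trained TARNETs share its asymptotic behaviour while performing far better at finite sample sizes.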
In this setting, the learner makes a choice without knowing what would be the feedback for other possible choices. Finally, we show that learning representations that encourage similarity (also called balance) between the treatment and control populations leads to better counterfactual inference; this is in contrast to many methods which attempt to create balance by re-weighting samples (e.g., Bang & Robins, 2005; Dudík et al., 2011; Austin, 2011; Swaminathan & Joachims, 2015). We consider a setting in which we are given N i.i.d. observations.

Run the command line configurations from the previous step in a compute environment of your choice.

- Correlation of MSE and NN-PEHE with PEHE (Figure 3)
- https://cran.r-project.org/web/packages/latex2exp/vignettes/using-latex2exp.html
- The available command line parameters for runnable scripts are documented.
- You can add new baseline methods to the evaluation by subclassing the appropriate base class.
- You can register new methods for use from the command line by adding a new entry to the corresponding registry.
PM is compatible with any architecture, does not add computational complexity or hyperparameters, and extends to any number of treatments. We evaluated PM, ablations, baselines, and all relevant state-of-the-art methods, including kNN (Ho et al.). We perform experiments that demonstrate that PM is robust to a high level of treatment assignment bias and outperforms a number of more complex state-of-the-art methods in inferring counterfactual outcomes across several benchmark datasets.

The chosen architecture plays a key role in the performance of neural networks when attempting to learn representations for counterfactual inference (Shalit et al., 2017). Upon convergence at the training data, neural networks trained using virtually randomised minibatches in the limit N → ∞ remove any treatment assignment bias present in the data. The IHDP dataset is biased because the treatment groups had a biased subset of the treated population removed (Shalit et al., 2017). How do the learning dynamics of minibatch matching compare to dataset-level matching?

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPUs used for this research.
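The multi-treatment extension described earlier (k head networks over shared base layers) can be sketched structurally as follows. The class below is a forward-pass-only illustration with random weights; it is not a trainable or faithful reimplementation, and its name and parameters are our own.

```python
import numpy as np

class MultiHeadTARNET:
    """Structural sketch of extending TARNET to k treatments: shared base
    layers produce a common representation, which feeds k treatment-specific
    head networks; head j predicts the outcome under treatment j."""

    def __init__(self, dim_in, dim_hidden, n_treatments, n_base_layers=2, seed=0):
        rng = np.random.RandomState(seed)
        dims = [dim_in] + [dim_hidden] * n_base_layers
        # shared base layers (weight matrices only, biases omitted for brevity)
        self.base = [rng.normal(scale=0.1, size=(a, b))
                     for a, b in zip(dims[:-1], dims[1:])]
        # one linear output head per treatment
        self.heads = [rng.normal(scale=0.1, size=(dim_hidden, 1))
                      for _ in range(n_treatments)]

    def forward(self, x):
        h = np.asarray(x, dtype=float)
        for w in self.base:              # shared representation
            h = np.maximum(h @ w, 0.0)   # ReLU
        # stack the per-treatment predictions into shape (batch, k)
        return np.concatenate([h @ w for w in self.heads], axis=1)
```

Keeping the heads separate preserves the influence of the treatment on the prediction even when X is high-dimensional, which is the motivation the text gives for preferring TARNET-style heads over appending the treatment index to the input.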
In particular, the source code is designed to be easily extensible with (1) new methods and (2) new benchmark datasets. The script will print all the command line configurations (1750 in total) you need to run to obtain the experimental results to reproduce the News results.

Shalit et al. (2017) claimed that the naïve approach of appending the treatment index tj may perform poorly if X is high-dimensional, because the influence of tj on the hidden layers may be lost during training. We performed experiments on several real-world and semi-synthetic datasets that showed that PM outperforms a number of more complex state-of-the-art methods in inferring counterfactual outcomes. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". Examples of representation-balancing methods are Balancing Neural Networks (Johansson et al., 2016). In medicine, for example, we would be interested in using data of people that have been treated in the past to predict which medications would lead to better outcomes for new patients (Shalit et al., 2017).
To elucidate to what degree this is the case when using the matching-based methods we compared, we evaluated the respective training dynamics of PM, PSM_PM and PSM_MI (Figure 3). To run the IHDP benchmark, you need to download the raw IHDP data folds as used by Johansson et al.