An Equivalence Approach to Balance and Placebo Tests


Disclaimer: The code links provided for this paper are external links. Science Nest has no responsibility for the accuracy, legality, or content of these links. By downloading the code(s), you agree to comply with the terms of use set out by the author(s) of the code(s).

Authors Erin E. Hartman, F. Daniel Hidalgo
Journal/Conference Name American Journal of Political Science
Paper Category
Paper Abstract Recent emphasis on credible causal designs has led to the expectation that scholars justify their research designs by testing the plausibility of their causal identification assumptions, often through balance and placebo tests. Yet current practice is to use statistical tests with an inappropriate null hypothesis of no difference, which can result in equating nonsignificant differences with significant homogeneity. Instead, we argue that researchers should begin with the initial hypothesis that the data are inconsistent with a valid research design, and provide sufficient statistical evidence in favor of a valid design. When tests are correctly specified so that difference is the null and equivalence is the alternative, the problems afflicting traditional tests are alleviated. We argue that equivalence tests are better able to incorporate substantive considerations about what constitutes good balance on covariates and placebo outcomes than traditional tests. We demonstrate these advantages with applications to natural experiments. Replication Materials: The data, code, and any additional materials required to replicate all analyses in this article are available on the American Journal of Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/RYNSDG.
Date of publication 2018
Code Programming Language R
Comment
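
The abstract describes reversing the usual null hypothesis in balance and placebo tests: difference becomes the null and equivalence the alternative, so a rejection provides positive evidence that any imbalance lies within a substantively chosen equivalence range. Below is a minimal base-R sketch of this idea, using a two one-sided t-test (TOST) for a difference in means. This is not the authors' replication code or package interface; the function name, the simulated data, and the equivalence range delta are illustrative assumptions only.

# Sketch of an equivalence (TOST) balance test in base R.
# Null: |mean(treated) - mean(control)| >= delta (a meaningful imbalance).
# Alternative: the difference lies inside the equivalence range (-delta, delta).
equivalence_t_test <- function(x_treat, x_control, delta) {
  n1 <- length(x_treat)
  n0 <- length(x_control)
  diff_mean <- mean(x_treat) - mean(x_control)
  se <- sqrt(var(x_treat) / n1 + var(x_control) / n0)
  df <- n1 + n0 - 2  # simple choice; a Welch-type df could be used instead
  # One-sided test against the lower bound: H0 is diff <= -delta
  p_lower <- pt((diff_mean + delta) / se, df, lower.tail = FALSE)
  # One-sided test against the upper bound: H0 is diff >= delta
  p_upper <- pt((diff_mean - delta) / se, df, lower.tail = TRUE)
  # Equivalence is supported only if both one-sided nulls are rejected,
  # so the overall TOST p-value is the larger of the two.
  list(estimate = diff_mean, p_value = max(p_lower, p_upper))
}

# Illustrative usage with simulated covariate data and delta = 0.2
set.seed(1)
treated <- rnorm(200, mean = 0.05)
control <- rnorm(200, mean = 0)
equivalence_t_test(treated, control, delta = 0.2)

A small p-value here supports equivalence (good balance), whereas in a traditional balance test a large p-value is often, and misleadingly, read as evidence of balance; the substantive work lies in choosing delta, the largest imbalance one is willing to treat as negligible.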
