Real-World Examples in Methods Papers

In most methods papers the properties of the methods being presented are described in theoretical terms and then (usually) applied to some real-world data. The reasons for including real-world data examples are not always clear to me. With simulated data we know what the correct answer is since we set up the whole thing. With real-world data we don't know what the correct answers are, and so this doesn't really give us much information about the performance of the method in question except insomuch as the estimate recovered is somehow more plausible/intuitive than previous estimates of the same quantity.

I think the more important reason for applying the method to real-world data is to set an example for researchers who aren't methodologists and might use the method in their substantive research. The example thus provides information about the contexts in which the method can be of use. What is the estimand? What is needed for identification?

In the Acharya, Blackwell, and Sen paper "Detecting Direct Effects and Assessing Alternative Mechanisms" there is a gross disconnect between the example presented in their talk (one of two in their paper) and the assumptions necessary to identify their quantity of interest: the controlled direct effect (CDE). I think this is a problem as it sets a bad example to applied researchers. I won't be shocked to see this method being used in circumstances it shouldn't be.

The CDE gives you an estimate of the counterfactual in which a causal mediator of interest is fixed. How much would the treatment effect change as a result? Would it be completely blocked? This lets you think about the size of the indirect effect, that is, how important this mechanism is in causing the treatment effect. Like the mediation approach described in the Imai, Keele, Tingley and Yamamoto paper "Unpacking the Black Box of Causality: Learning About Causal Mechanisms from Experimental and Observational Studies"1 you need ignorable treatment assignment and also sequential ignorability for identification (though Acharya, Blackwell, and Sen's version of the latter is weaker). Sequential ignorability (they call it sequential unconfoundedness) requires two things. First, treatment assignment must be ignorable conditional on pre-treatment confounders. Second, the mediator must be ignorable conditional on the treatment, pre-treatment confounders, and intermediate confounders, the last of which are things caused by the treatment that affect both the mediator and the outcome. This is selection on observables for both the mediator and the treatment.
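For readers who prefer notation, here is a rough sketch of the estimand and the two conditions just described. The symbols are my own shorthand in standard potential-outcomes style and may not match the paper exactly: A is the treatment, M the mediator, X the pre-treatment confounders, Z the intermediate confounders, and Y(a, m) the potential outcome with treatment set to a and the mediator fixed at m.

```latex
\begin{align*}
\text{ACDE}(a, a', m) &= E\left[ Y_i(a, m) - Y_i(a', m) \right] \\
\{ Y_i(a, m),\, M_i(a) \} &\perp\!\!\!\perp A_i \mid X_i = x \\
Y_i(a, m) &\perp\!\!\!\perp M_i \mid A_i = a,\, X_i = x,\, Z_i = z
\end{align*}
```

The first line is the average CDE, the second is ignorable treatment assignment, and the third is ignorability of the mediator given everything that comes before it, including the intermediate confounders.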

These are both really strong assumptions in many settings. Ignorable treatment assignment is unlikely to be plausible outside of an experimental context unless we have a plausibly as-if random process that is not under the researcher's control. Most observational data are not generated by an as-if random process. One of the examples presented used data from Fearon and Laitin's 2003 paper to estimate the average CDE of ethnic fractionalization, that is, the part of its effect that doesn't go through a measure of political instability.2

The idea that "assignment" of ethnic fractionalization is ignorable conditional on the "pre-treatment" confounders measured is preposterous. Similarly, it is silly to suggest that the "assignment" of the measure of political instability they use is ignorable conditional on the pre-treatment confounders, the intermediate confounders included in the model, and the measure of ethnic fractionalization.

The authors are obviously aware of this and do not attempt to defend the assumption. The justification for using such a silly example is unclear. The "effect" of ethnic fractionalization changes relative to the coefficient in the Fearon and Laitin paper, but who cares? Does anyone actually believe that this estimate can be considered causal? It sets a bad example for applied researchers. Again, I won't be surprised when I see this method applied in similar situations.

Sensitivity analysis would go some way toward making this better (as they point out in the paper), but I still think it would be better to have an example where the assumptions are plausible. In their slavery example it appears that they have done a great deal of work to do precisely that.
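To make the stakes concrete, here is a minimal simulation sketch of what happens when the mediator is confounded by something unobserved. It uses a simple two-stage "demediation" estimator in the spirit of sequential g-estimation, assuming linear models with no interactions. This is my own toy setup, not the authors' code, and every coefficient and variable name is hypothetical.

```python
import numpy as np


def ols(y, *columns):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical data-generating process: the true ACDE is the direct path
# (1.0) plus the path through the intermediate confounder (0.5 * 0.4).
TRUE_ACDE = 1.0 + 0.5 * 0.4


def simulate_acde(confounding, n=200_000, seed=0):
    """Estimate the ACDE by two-stage demediation on simulated data.

    `confounding` scales an unobserved variable u that affects both the
    mediator and the outcome; 0.0 means sequential unconfoundedness holds.
    """
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, 0.5, n).astype(float)  # randomized treatment
    u = rng.normal(size=n)                     # unobserved mediator-outcome confounder
    z = 0.5 * a + rng.normal(size=n)           # intermediate confounder
    m = 0.7 * a + 0.6 * z + confounding * u + rng.normal(size=n)
    y = 1.0 * a + 0.8 * m + 0.4 * z + confounding * u + rng.normal(size=n)

    # Stage 1: estimate the mediator's effect, adjusting for a and z.
    b_m = ols(y, a, m, z)[2]
    # Stage 2: strip the mediator's estimated contribution from the outcome,
    # then regress the demediated outcome on the treatment alone.
    y_demediated = y - b_m * m
    return ols(y_demediated, a)[1]


if __name__ == "__main__":
    print("true ACDE:", TRUE_ACDE)
    print("assumption holds:", simulate_acde(confounding=0.0))
    print("assumption fails:", simulate_acde(confounding=1.0))
```

When `confounding` is zero the estimate should sit close to the true value. When it is not, the first-stage coefficient on the mediator is biased and the final estimate lands well away from the truth, even though the treatment itself is randomized. That is the situation I worry about with observational examples like the one above, except there we have no way of knowing how far off we are.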


  1. There is an observational example in the Imai, Keele, Tingley and Yamamoto paper as well, but it is drawn from a study which argues for identification of a mediator, so this makes a bit more sense.  

  2. Fearon and Laitin measure political instability with "a dummy variable indicating whether the country had a three-or-greater change on the Polity IV regime index in any of the three years prior to the country-year in question."