## Statistical Education in Political Science

Thanks to Jeremy Johnson, Luke Keele, Christopher Zorn, and Amber Worthington for comments.

I discussed much of this with Fridolin Linder, so he deserves a good share of the credit and/or blame.

Since political science is primarily a quantitative empirical discipline, statistical education is of paramount importance. The way it is currently done could (and needs to) be greatly improved. Unlike primarily experimental disciplines, we cannot follow a simple data analysis recipe developed by methodologists. Our data are primarily observational, and if we are going to use data models to infer things about the processes that gave rise to these data, we have to think long and hard about what those models look like. This requires an understanding of probability theory, mathematical statistics, and research design! Teaching specific models (e.g., discrete choice, duration, etc.) is (relatively) unimportant. These specific models are variations on a core set of probability models, so if we teach the core theoretical tools, students will understand the specific models more easily and more deeply.

"Cookbook statistics" dominate political science. That is, statistical "recipes" are typically memorized rather than understood. Yet an understanding of how and why those recipes were put together in the first place is fundamentally important. This lack of statistical theory manifests as a focus on software: linear models with R, or time series with Stata, for example. That is not to say that implementation is unimportant, but the basic mathematical and statistical theory should be taught first. Advances in computing power have made it easy to fit models that one understands poorly or, in some cases, not at all. When nearly all of the math underlying these models is glossed over in graduate education, this is the only path available.

I think the reason for this is that many political science graduate students come to Ph.D. programs with little or no mathematical background. This is an enormous problem (which I've written about before), but it doesn't mean we should just avoid the math when teaching graduate methods. One of my advisers (Luke Keele) teaches the first political methodology class at Penn State, which covers probability and mathematical statistics. The undergraduate version of that course (in the statistics department) spans two full semesters and requires multivariable calculus. He made the material accessible to a group of students with mixed mathematical backgrounds in two ways: by skipping some material typically assigned in undergraduate probability and mathematical statistics (e.g., moment generating functions, sufficient statistics, characteristic functions), and by assigning problems that involved relatively simple calculus (no integration by parts, mostly simple substitutions). This approach preserved the core of the curriculum: set theory, probability, conditional probability, random variables, expectation and higher moments, joint and conditional distributions, conditional expectation, limit theorems, estimation, and a fair amount of hypothesis testing. I think his class did a good job covering probability theory and mathematical statistics without overwhelming students who lacked a strong mathematical background.

One of the consequences of not covering the underlying math is the use of statistics as window dressing. In quantitative empirical research, the statistics are presumably the primary evidence for the claim(s) the authors are making. In my experience, however, the intuitive plausibility of the claims is much more influential in determining whether they are viewed as supported. In any particular paper we can often identify a whole host of problems with the analysis: causal identification is weak or nonexistent, measurement is lacking, model fit is unevaluated or poor, etc. An important cause of these data analysis problems, and of our frequent lack of honesty about them, is inadequate statistical education. Progress towards better statistical education is progress towards better, more reliable research. Note that by "statistical education" I include research design: the manner in which the data were collected largely determines the inferences you can make from them.

This doesn't mean everyone should be a methodologist. A Ph.D. program lasts a (hopefully) finite amount of time, and each math or statistics course taken is a substantive course not taken. The actual number of statistics/math courses taken probably doesn't need to change (we have 4 required methods courses at Penn State). The content of those courses is what needs to change. In many cases new Ph.D. students might be better off taking probability and mathematical statistics in the statistics department (probably at the undergraduate level). In some sense, it is odd to have political scientists, however well trained mathematically, teaching probability and mathematical statistics. There isn't really any discipline-specific content at that level, and the statistical/mathematical training of all but a few political scientists pales in comparison to that of statisticians.

If you aren't a graduate student and you weren't fortunate enough to be in a program that recognized these issues, you can (and should!) go back and learn this material. More collaboration with methodologists is also an option. Researchers in many other fields frequently collaborate with statisticians, yet this seems rare in political science. The data political scientists collect raise a variety of interesting modeling issues, and hence offer much of interest to statisticians. I suspect that one reason collaboration with methodologists and/or statisticians is not as common as it should be is that tenure committees value (or are perceived as valuing) solo work much more than collaborative work. This is unfortunate, because a good division of labor can markedly increase the quality of the work.

Better statistical education is doable. It is being done in a number of programs already, and more are moving in this direction. It will probably be an evolutionary (that is, generational) change, but that doesn't mean we shouldn't help it along! If you aren't in a program that provides good statistical education within the department, you can go to the statistics department. If you are already a professor, you can teach yourself or take the relevant classes (one of the benefits of working at a university!).