Peer-Review is a Weak Signal of Paper Quality

Peer review seems a bit random, more random than I would like, and other people seem to share that opinion. Some "classic" papers were rejected more often than you might think reasonable, including papers related to work later awarded the Nobel Prize.

I think there are obvious statistical reasons for this if you think of peer review as an estimator of the latent quality of a paper.

In political science, the number of reviewers is at most three. Moreover, the number of features extracted from a paper that are informative about its quality is small relative to the information the paper contains, or that went into producing it. It is frustrating when a reviewer misses information that was in the paper and should have informed their recommendation to the editor, or misinterprets what is there. Papers also often rely on code and data that reviewers never examine. And reviewers vary in how diligently they extract the information that is in the text (i.e., some do more work than others). All of this means the variance of the peer-review estimator is high: there are too few discriminating features and too few reviewers.
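To see how thin the signal is, here is a minimal Monte Carlo sketch (all numbers hypothetical) that treats each reviewer's score as the paper's latent quality plus independent noise. The standard error of the panel's average score shrinks only as one over the square root of the panel size, so three reviewers leave most of the noise intact:

```python
import random

def review_estimate(true_quality, n_reviewers, noise_sd, rng):
    """Average of noisy reviewer scores of a paper's latent quality."""
    scores = [rng.gauss(true_quality, noise_sd) for _ in range(n_reviewers)]
    return sum(scores) / n_reviewers

def estimator_sd(n_reviewers, noise_sd=1.0, trials=20000, seed=0):
    """Empirical standard deviation of the panel-average estimate."""
    rng = random.Random(seed)
    estimates = [review_estimate(0.0, n_reviewers, noise_sd, rng)
                 for _ in range(trials)]
    mean = sum(estimates) / trials
    return (sum((e - mean) ** 2 for e in estimates) / trials) ** 0.5

# With three reviewers the estimate's sd is noise_sd / sqrt(3),
# i.e. roughly 58% of a single reviewer's noise remains.
```

With reviewer noise as large as the quality differences between typical submissions (a plausible guess, not a measured quantity), a three-person panel's verdict is dominated by chance.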

Then there is the issue of bias. Reviewers may be biased for or against your methodology, your theory, etc. They might think they know who you are and be biased for or against you on those grounds. This bias probably enters most often when combining the features into a recommendation (i.e., when weighting the importance of novelty, rigor, etc.), or when extracting them in the first place (e.g., taking an unreasonably positive or negative view of a particular dimension of quality).

An additional, possibly predominant, source of bias arises when the reviewer pool shares misconceptions that lead it to view favorably work that is, in fact, not of high quality on that dimension. For example, in some areas of psychology speculative underpowered studies are common, and they pass peer review because the peers are engaged in the same practice.
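The same toy model can fold in bias. If every reviewer in the pool shares an offset, say a field-wide misconception, a genuinely strong paper's chance of acceptance shifts no matter how many reviewers are recruited, because averaging removes noise but not a shared bias. A minimal sketch with hypothetical threshold and effect sizes:

```python
import random

def accept_prob(true_quality, shared_bias, threshold=1.0, n_reviewers=3,
                noise_sd=1.0, trials=20000, seed=1):
    """Probability a paper passes review when each reviewer's score is
    true quality + shared bias + independent noise, and the paper is
    accepted if the panel's mean score clears the threshold."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        mean_score = sum(rng.gauss(true_quality + shared_bias, noise_sd)
                         for _ in range(n_reviewers)) / n_reviewers
        accepted += mean_score >= threshold
    return accepted / trials

# A strong paper (quality 1.5 on this made-up scale) facing an
# unbiased panel versus one biased against its approach:
unbiased = accept_prob(1.5, 0.0)
biased = accept_prob(1.5, -0.5)
```

Under these assumed numbers the biased pool rejects the strong paper far more often, and adding reviewers drawn from the same pool does not help.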

All of this is made harder by the fact that the latent quality we are trying to estimate is multidimensional. Papers are supposed to be evaluated on their theoretical and methodological novelty, methodological rigor, and so on. PLoS ONE at least makes this a bit easier, since reviewers there are asked to judge methodological rigor alone.
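A toy illustration of why multidimensionality adds randomness: with hypothetical scores on two quality dimensions, two reviewers who weight those dimensions differently can rank the same pair of papers in opposite orders, even with no noise at all:

```python
# Two hypothetical papers scored as (novelty, rigor); numbers are illustrative.
papers = {"A": (0.9, 0.4), "B": (0.5, 0.8)}

def overall(scores, w_novelty, w_rigor):
    """Collapse a multidimensional quality vector into one recommendation."""
    novelty, rigor = scores
    return w_novelty * novelty + w_rigor * rigor

# A novelty-weighting reviewer ranks paper A first...
rank_novelty = sorted(papers, key=lambda p: overall(papers[p], 0.8, 0.2),
                      reverse=True)
# ...while a rigor-only emphasis (as at PLoS ONE) puts paper B first.
rank_rigor = sorted(papers, key=lambda p: overall(papers[p], 0.2, 0.8),
                    reverse=True)
```

The disagreement here is not error on either reviewer's part; it comes entirely from the weighting step, which is exactly where the journal's instructions (or the reviewer's taste) enter.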

Three reviewers, and the limited information they extract from a paper, are simply not enough to reliably estimate its quality. So it isn't surprising that peer review seems random; it would be odd if it were otherwise. I don't know of a good way to fix this within the current system, which is yet another reason we should be doing post-publication review. Passing peer review is only a weak, noisy signal of quality.