Ronald Fisher, who first introduced the p-value, intended it as an informal measure of evidence against the null hypothesis rather than as a mechanical decision rule. The criticism of the p-value has nevertheless grown, centering on how the measure is misinterpreted and misused in practice.
Problems of the P-Value
Base Rate Fallacy
See: base rate fallacy
P-values are commonly used in hypothesis testing to decide whether to reject or retain the null hypothesis. However, this practice invites the misinterpretation that “the p-value is the chance that the null hypothesis is true”. The statement is false: the p-value is computed under the assumption that the null hypothesis is true and measures how unusual the observed data would be under that assumption. The misinterpretation flips the direction of the conditional. A low p-value tells you, “If the null hypothesis is true, these results are unlikely”. It does not tell you, “Given these results, the null hypothesis is unlikely.” 3
In other words, this misinterpretation conflates the conditional probabilities P(data | H₀) and P(H₀ | data). The two can differ greatly, because the latter also depends on the base rate of true null hypotheses among all hypotheses being tested.
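To make the gap between the two probabilities concrete, here is a minimal simulation sketch. The setup is an assumption chosen for illustration: 90% of tested hypotheses are truly null, and each experiment is a two-sample t-test with modest power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed setup: 10,000 two-sample experiments, of which 90% have a
# truly null effect (the base rate) and 10% a small real effect.
n_experiments, base_rate, n, effect = 10_000, 0.9, 30, 0.5
is_null = rng.random(n_experiments) < base_rate

p_values = np.empty(n_experiments)
for i in range(n_experiments):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0 if is_null[i] else effect, 1.0, n)
    p_values[i] = stats.ttest_ind(a, b).pvalue

significant = p_values < 0.05
# Among "significant" results, the fraction where H0 was actually true.
# This is P(H0 | p < 0.05) -- and it is generally nowhere near 0.05.
print(f"P(H0 true | p < 0.05) = {is_null[significant].mean():.2f}")
```

Under these assumed numbers, roughly half of the “significant” results come from true nulls, even though every individual test used the 0.05 threshold correctly.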
Multiple Comparisons Problem
See: multiple comparisons problem and p-hacking
When multiple hypotheses are tested together, the probability of obtaining at least one “significant” result purely by chance rises quickly: with m independent tests at level α, it is 1 − (1 − α)^m, which already exceeds 64% for 20 tests at α = 0.05. While it is rare for researchers to intentionally manipulate data to produce statistically significant results, they may still unconsciously select hypotheses based on whether they achieve statistical significance. 4
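A short sketch of the arithmetic, using the conventional α = 0.05 and including the standard Bonferroni correction as one common remedy:

```python
# Probability of at least one false positive when running m independent
# tests at significance level alpha: 1 - (1 - alpha)**m.
alpha = 0.05
for m in (1, 5, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    # Bonferroni correction: test each hypothesis at alpha / m instead,
    # which keeps the family-wise error rate at roughly alpha.
    fwer_bonf = 1 - (1 - alpha / m) ** m
    print(f"m={m:3d}  uncorrected FWER={fwer:.2f}  Bonferroni FWER={fwer_bonf:.3f}")
```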
Lack of Information on Effect Size
See: effect size
With a sufficiently large sample size, even minuscule effects can yield statistically significant results, provided the test has adequate statistical power. This phenomenon has led some researchers to advocate shifting the focus from p-values to effect sizes. 5
A commonly cited example is a study of aspirin and heart attacks with a sample size of more than 22,000 subjects. It achieved a high statistical significance of p < 0.00001, yet the effect size was minuscule (r² ≈ 0.001): aspirin use explained only about 0.1% of the variance in outcomes.
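The phenomenon is easy to reproduce with toy data. The sketch below assumes a tiny true difference in means (Cohen’s d = 0.02, an illustrative value) and shows the same effect switching from “non-significant” to “highly significant” purely as the sample grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed toy data: a tiny true difference in means (Cohen's d = 0.02).
effect, sd = 0.02, 1.0
for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.0, sd, n)
    b = rng.normal(effect, sd, n)
    t, p = stats.ttest_ind(a, b)
    # Cohen's d: mean difference scaled by the pooled standard deviation.
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    print(f"n={n:>9,}  p={p:.4f}  Cohen's d={d:.3f}")
```

The effect size stays the same tiny number throughout; only the p-value changes with n, which is exactly why it cannot stand in for practical importance.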
Overemphasis on Dichotomous Decision-Making
related: binary thinking
Reliance on p-values often encourages a binary mindset of either “rejecting” or “retaining” the null hypothesis, based on arbitrary thresholds like p < 0.05.
Instead, some researchers argue that the p-value should be interpreted as a continuous measure of evidence rather than dichotomized at a threshold. They also propose reconceptualizing confidence intervals as “compatibility intervals”: ranges of effect sizes most compatible with the data. 6
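As an illustration of reading an interval as a range of compatible effects rather than a verdict, here is a minimal sketch with made-up data and a simple pooled-variance interval (Welch’s formula would differ slightly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 50)  # made-up control group
b = rng.normal(0.3, 1.0, 50)  # made-up treatment group

# A 95% interval around the observed mean difference, read as the range
# of effect sizes most compatible with the data rather than as a binary
# significant / non-significant verdict.
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
dof = len(a) + len(b) - 2  # simple pooled approximation
lo, hi = diff + np.array([-1, 1]) * stats.t.ppf(0.975, dof) * se
print(f"observed difference = {diff:.2f}")
print(f"95% compatibility interval = ({lo:.2f}, {hi:.2f})")
```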
Further, some people believe that statistical significance is widespread because it caters to our human desire for certainty. However, they think that we should instead embrace uncertainty and avoid oversimplifying the world’s complexity. 1
Publication Bias
The requirement of statistical significance as a criterion for publication can cause publication bias: statistically non-significant results are less likely to be published. 7
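Publication bias also inflates the effects that do get reported, a phenomenon sometimes called the “winner’s curse”. A small simulation sketch, with assumed values (a true effect of d = 0.2 studied by underpowered experiments of 25 subjects per arm), shows that a literature filtered on p < 0.05 overstates the effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Assumed setup: many small studies of the same true effect; only
# studies reaching p < 0.05 get "published".
true_effect, n, n_studies = 0.2, 25, 5_000
published = []
for _ in range(n_studies):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        published.append(b.mean() - a.mean())

print(f"true effect: {true_effect}")
print(f"mean published effect: {np.mean(published):.2f}  "
      f"({len(published)} of {n_studies} studies published)")
```

Because only the studies that happened to observe an unusually large difference cross the threshold, the average published effect here is several times larger than the assumed true effect.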
Teaching of the P-Value
Despite these serious concerns, hypothesis testing and the p-value are still commonly taught in introductory statistics courses, often without mention of any of the issues above. One professor attributed this to circularity: “we teach it because it’s what we do in industry, and we do it because it’s what we were taught.” 8