Suppose that a statistical test has a 5% false positive rate; then it has a 95% probability of not producing a false positive. However, if we perform the test many times, the chance of at least one false positive grows quickly: assuming the tests are independent, after 20 tests it is 1 - 0.95^20 ≈ 64%.
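A minimal sketch of that calculation (plain Python, no dependencies; the test counts are illustrative choices):

```python
# Probability of at least one false positive across n independent tests,
# each with false positive rate alpha (the family-wise error rate).
def family_wise_error_rate(alpha: float, n: int) -> float:
    return 1 - (1 - alpha) ** n

if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        fwer = family_wise_error_rate(0.05, n)
        print(f"{n:>3} tests at alpha=0.05 -> P(at least one false positive) = {fwer:.2%}")
```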
This issue is known as the multiple comparison problem. It has long troubled the scientific community due to the often unavoidable need to make multiple comparisons, an emphasis on achieving statistical significance, and a limited understanding of probability and statistics.
The practice of conducting multiple statistical comparisons without a pre-specified hypothesis, with the aim of finding statistically significant results, is known as p-hacking. This approach is now widely considered unethical.
Perhaps the most egregious example of the multiple comparisons problem is that a scan of a dead salmon can be shown to have "brain activity". Despite using a small p-value threshold (the study used p < 0.001, uncorrected), testing thousands of voxels still produced a cluster of apparently active ones in the dead fish's brain; the effect disappears once a proper multiple comparisons correction is applied.
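A toy simulation (not the fMRI analysis from the paper) illustrates why: draw a null test statistic for many independent "voxels" of pure noise and count how many cross an uncorrected p < 0.001 threshold anyway.

```python
import random
from statistics import NormalDist

random.seed(0)
n_voxels = 50_000        # rough order of magnitude for a whole-brain scan
threshold = 0.001
normal = NormalDist()

false_positives = 0
for _ in range(n_voxels):
    z = random.gauss(0, 1)                # null hypothesis is true: pure noise
    p = 2 * (1 - normal.cdf(abs(z)))      # two-sided p-value
    if p < threshold:
        false_positives += 1

print(f"{false_positives} of {n_voxels} null voxels pass p < {threshold}")
# Expect about n_voxels * threshold = 50 spurious "activations".
```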
There are techniques to mitigate the multiple comparison problem. For example, the Bonferroni correction says that if you perform m comparisons and want an overall significance level of α, the criterion for each individual test should be tightened to α/m (e.g., 0.05 / 20 = 0.0025).
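A minimal sketch of applying the correction to a list of p-values (pure Python; the p-values are made up for illustration):

```python
# Bonferroni correction: with m comparisons and a desired family-wise
# significance level alpha, reject only when p < alpha / m.
def bonferroni_reject(p_values, alpha=0.05):
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical p-values from 5 comparisons; the per-test cutoff is 0.05 / 5 = 0.01.
p_values = [0.003, 0.020, 0.045, 0.300, 0.011]
print(bonferroni_reject(p_values))   # [True, False, False, False, False]
```

In practice, libraries such as statsmodels offer this and less conservative alternatives (e.g., Holm) via `statsmodels.stats.multitest.multipletests`.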
Footnotes
- C. Bennett, A. Baird, M. Miller, G. Wolford. Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction. Journal of Serendipitous and Unexpected Results, 1:1–5, 2010.
- A. Reinhart. The p value and the base rate fallacy. Statistics Done Wrong, 2015.