There’s a strong case that chasing p-values has led science astray.
For too long, many scientists’ careers have been built around the pursuit of a single statistic: p<.05.
In many scientific disciplines, that’s the threshold beyond which study results can be declared “statistically significant,” which is often interpreted to mean that it’s unlikely the results were a fluke, a result of random chance.
But that’s not what it actually means in practice. “Statistical significance” is too often misunderstood — and misused. That’s why a trio of scientists writing in Nature this week are calling “for the entire concept of statistical significance to be abandoned.”
Their biggest argument: “Statistically significant” or “not statistically significant” is too easily misinterpreted to mean either “the study worked” or “the study did not work.” A “true” effect can sometimes yield a p-value of greater than .05. And we know from recent years that science is rife with false-positive studies that achieved p-values of less than .05 (read my explainer on the replication crisis in social science for more)...
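To make that point concrete, here is a minimal simulation sketch (not from the article): it fabricates many small weight-loss-style experiments with a modest real effect, and many with no effect at all, and counts how often each kind crosses the p < .05 line. The group size, the 1 kg effect, and the 3 kg spread in weight change are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 5_000, 50

def share_significant(true_effect_kg):
    """Fraction of simulated experiments that come out 'significant' (p < .05)."""
    hits = 0
    for _ in range(n_experiments):
        control = rng.normal(0.0, 3.0, n_per_group)             # weight change in kg, no effect
        treated = rng.normal(true_effect_kg, 3.0, n_per_group)  # weight change in kg, with effect
        _, p = stats.ttest_ind(treated, control)
        hits += p < 0.05
    return hits / n_experiments

print("real 1 kg effect, share with p < .05:", share_significant(-1.0))  # well short of 100%
print("no effect at all, share with p < .05:", share_significant(0.0))   # roughly 5% false positives
```

Under these made-up numbers, a genuine effect misses the .05 cutoff most of the time, while a nonexistent effect clears it about 5 percent of the time — which is why “significant” and “not significant” can’t be read as “worked” and “didn’t work.”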
Even the simplest definitions of p-values tend to get complicated, so bear with me as I break it down.
When researchers calculate a p-value, they’re putting to the test what’s known as the null hypothesis. First thing to know: This is not a test of the question the experimenter most desperately wants to answer.
Let’s say the experimenter really wants to know if eating one bar of chocolate a day leads to weight loss. To test that, they assign 50 participants to eat one bar of chocolate a day. Another 50 are commanded to abstain from the delicious stuff. Both groups are weighed before the experiment and then after, and their average weight change is compared.
The null hypothesis is the devil’s advocate argument. It states there is no difference in the weight loss of the chocolate eaters versus the chocolate abstainers.
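As a rough sketch of what that calculation looks like in practice, the snippet below runs a standard two-sample t-test on made-up weight-change numbers for the two groups. Only the design — 50 chocolate eaters versus 50 abstainers, comparing average weight change — comes from the example above; the specific numbers and the choice of test are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
chocolate = rng.normal(-0.5, 3.0, 50)   # hypothetical weight changes (kg) for chocolate eaters
abstainers = rng.normal(0.0, 3.0, 50)   # hypothetical weight changes (kg) for abstainers

# The test starts from the null hypothesis: no difference between the groups.
# The p-value is the probability of seeing a gap between the group averages
# at least this large if that null hypothesis were true.
t_stat, p_value = stats.ttest_ind(chocolate, abstainers)
print(f"difference in average weight change: {chocolate.mean() - abstainers.mean():.2f} kg")
print(f"p-value: {p_value:.3f}")
```

A small p-value is evidence against the null hypothesis, not direct proof that chocolate causes weight loss; a large one does not prove the chocolate did nothing.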
Source: Vox.com