Calling time on ‘statistical significance’ in science research
Scientists should stop using the term ‘statistically significant’ in their research, urges this editorial in a special issue of The American Statistician published today.
The issue, Statistical Inference in the 21st Century: A World Beyond p < 0.05, contains 43 papers by statisticians from around the world. The special issue is expected to prompt a major rethinking of statistical inference by initiating a process that ultimately moves statistical science – and science itself – into a new age.
In the issue’s editorial, Dr. Ronald Wasserstein, Executive Director of the ASA, Dr. Allen Schirm, retired from Mathematica Policy Research, and Professor Nicole Lazar of the University of Georgia said: “Based on our review of the articles in this special issue and the broader literature, we conclude that it is time to stop using the term ‘statistically significant’ entirely.
“No p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance imply that an association or effect is improbable, absent, false, or unimportant.
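The point that a small p-value does not certify an important effect – and a non-significant one does not rule out a real effect – can be shown with a short simulation. This sketch is not from the editorial; the z-test helper, sample sizes, and effect sizes are hypothetical choices for illustration:

```python
import math
import random

def z_test_p_value(sample, mu0=0.0):
    """Two-sided one-sample z-test p-value (normal approximation)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = abs(mean - mu0) / math.sqrt(var / n)
    # p = 2 * (1 - Phi(|z|)), with the standard normal CDF computed via erf
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

random.seed(1)
# A negligible true effect (mean 0.02 on a scale with sd 1) in a huge sample:
big_n = [random.gauss(0.02, 1.0) for _ in range(100_000)]
# A substantial true effect (mean 0.5) in a small sample:
small_n = [random.gauss(0.5, 1.0) for _ in range(10)]

print(z_test_p_value(big_n))    # far below 0.05, yet the effect is trivial
print(z_test_p_value(small_n))  # can exceed 0.05, yet the effect is real
```

With enough data, even a scientifically meaningless difference yields a tiny p-value, while a genuinely meaningful effect in a small study may not cross any threshold – which is why the p-value alone cannot speak to truth or importance.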
“For the integrity of scientific publishing and research dissemination, therefore, whether a p-value passes any arbitrary threshold should not be considered at all when deciding which results to present or highlight.”
Articles in the special issue suggest alternatives and complements to p-values, and highlight the need for widespread reform of editorial, educational and institutional practices [quotes below].
While no single alternative can replace the outsized role that statistical significance has come to play in science, solid principles for the use of statistics do exist, say the editorial’s authors.
“The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research – and in fact it may never do so,” they acknowledge. “A one-size-fits-all approach to statistical inference is an inappropriate expectation. Instead, we recommend scientists conducting statistical analysis of their results should adopt what we call the ATOM model: Accept uncertainty, be Thoughtful, be Open, be Modest.”
This ASA special issue builds on the highly influential ASA Statement on P-Values and Statistical Significance, which has had over 293,000 downloads and 1,700 citations – an average of more than 10 citations per week since its release in 2016.
Need for change
“Considerable social change is needed in academic institutions, in journals, and among funding and regulatory agencies. We suggest partnering with science reform movements and reformers within disciplines, journals, funding agencies and regulators to promote and reward ‘reproducible’ science and diminish the impact of statistical significance on publication, funding and promotion.” – Goodman
“Evaluation of manuscripts for publication should be ‘results-blind’. That is, manuscripts should be assessed for suitability for publication based on the substantive importance of the research without regard to their reported results.” – Locascio
“Everything should be published in some form if whatever we measured made sense before we obtained the data because it was connected in a potentially useful way to some research question. Journal editors should be proud of their exhaustive methods sections and base their decisions about the suitability of a study for publication on the quality of its materials and methods rather than on results and conclusions; the quality of the presentation of the latter should only be judged after it is determined that the study is valuable based on its materials and methods.” – Amrhein et al.
“Reproduction of research should be encouraged by giving byline status to researchers who reproduce studies. We would like to see digital versions of papers dynamically updated to display ‘Reproduced by…’ below the original research authors’ names or ‘Not yet reproduced’ until it is reproduced.” – Hubbard and Carriquiry
“An important role for statistics in research is the summary and accumulation of information. If replications do not find the same results, this is not necessarily a crisis, but is part of a natural process by which science evolves. The goal of scientific methodology should be to direct this evolution toward ever more accurate descriptions of the world and how it works, not toward ever more publication of inferences, conclusions, or decisions.” – Amrhein et al.
Alternatives and complements to p-values
“A number of factors should no longer be subordinate to ‘p < 0.05’.”
“Words like ‘significance’ in conjunction with p-values and ‘confidence’ with interval estimates mislead users into overconfident claims. We propose researchers think of p-values as measuring the compatibility between hypotheses and data, and interpret interval estimates as ‘compatibility intervals’ rather than ‘confidence intervals’.” – Amrhein et al.
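One way to read the ‘compatibility’ framing: a 95% interval is simply the set of hypothesized effect sizes whose p-value exceeds 0.05 – the values most compatible with the data under the model. A minimal sketch, using hypothetical summary statistics (observed mean 1.0, standard error 0.4) that are not from any study in the issue:

```python
import math

def compatibility_p(observed_mean, se, mu0):
    """Two-sided z-test p-value: how compatible hypothesis mu0 is with the data."""
    z = abs(observed_mean - mu0) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

observed_mean, se = 1.0, 0.4  # hypothetical summary statistics

# The 95% 'compatibility interval' is the set of hypothesized values whose
# p-value exceeds 0.05; here it is found by scanning a grid of hypotheses:
compatible = [mu0 for mu0 in (i / 100 for i in range(-100, 301))
              if compatibility_p(observed_mean, se, mu0) > 0.05]
print(min(compatible), max(compatible))  # approximately mean +/- 1.96 * se
```

Nothing in the computation changes – only the interpretation: values inside the interval are more compatible with the data than values outside it, with no claim of ‘confidence’ that the true value lies within.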
“Continuous p-values should only be used in conjunction with the ‘false positive risk (FPR)’, which answers the question: If you observe a ‘significant’ p-value after doing a single unbiased experiment, what is the probability that your result is a false positive?” – Colquhoun
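In its simplest ‘p ≤ α’ form (Colquhoun’s preferred ‘p-equals’ calculation is more refined), the false positive risk follows from Bayes’ rule given the significance threshold, the test’s power, and the prior probability that a real effect exists. A sketch of that simplified version, with an illustrative prior chosen for this example:

```python
def false_positive_risk(alpha, power, prior):
    """P(no real effect | result declared 'significant'),
    in the simplified 'p <= alpha' formulation."""
    false_pos = (1 - prior) * alpha  # significant results arising from null effects
    true_pos = prior * power         # significant results arising from real effects
    return false_pos / (false_pos + true_pos)

# With only a 10% prior chance of a real effect, alpha = 0.05, power = 0.8:
print(round(false_positive_risk(0.05, 0.8, 0.1), 2))  # 0.36
```

Under these assumptions, more than a third of ‘significant’ results would be false positives: the FPR depends heavily on the prior plausibility of the effect, which the p-value alone ignores.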
About Taylor & Francis Group
Taylor & Francis Group partners with researchers, scholarly societies, universities and libraries worldwide to bring knowledge to life. As one of the world’s leading publishers of scholarly journals, books, ebooks and reference works, our content spans all areas of the Humanities, Social Sciences, Behavioural Sciences, Science and Technology, and Medicine.
From our network of offices in Oxford, New York, Philadelphia, Boca Raton, Boston, Melbourne, Singapore, Beijing, Tokyo, Stockholm, New Delhi and Cape Town, Taylor & Francis staff provide local expertise and support to our editors, societies and authors and tailored, efficient customer service to our library colleagues.