Ch 16 and More to Think About

Author

Qi Yi

Published

May 7, 2026

Exploratory Statistics or Confirmatory Statistics?

  • Exploratory statistics is hypothesis-generating: finding novel relationships in data.

  • Confirmatory statistics is hypothesis testing: assessing the validity of pre-existing hypotheses with novel data.

  • It is problematic to mix confirmatory and exploratory statistics. It is better to state explicitly which kind of analysis a study carries out.

The Replication Crisis ‼️

(Recommended reading: Sönning & Werner, 2021, The Replication Crisis, Scientific Revolutions, and Linguistics)

  • The replication crisis is defined as the inability to reproduce the results (or similar results) of earlier studies.

  • What causes the replication crisis?

    • questionable research practices, for example inflating the likelihood of finding statistically significant results by running all possible statistical comparisons. Some of the “significant results” might be significant just BY CHANCE.

    • data manipulation, e.g. selectively including or excluding outliers

    • structural factors and institutionalized incentives, for instance a strong link between success in scholarly careers and publication records

    • overreliance on significance testing and incorrect interpretation of p-values

    • poor research design and quality that lead to underpowered studies with a fragile basis for statistical inference

    • cognitive biases that interfere with a neutral and reasoned interpretation of empirical data

    • unintentional, data-contingent analysis decisions that alter the error-rates of statistical procedures

    • weak theory, which fails to inform, guide, and constrain data-based work

    (Related works can be found in Sönning & Werner, 2021; Wiederman & Nicolai, 2018)
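To make the "significant just BY CHANCE" point concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of comparisons. It assumes the comparisons are independent and that every null hypothesis is true; the function name `familywise_error_rate` is made up for this illustration and does not come from the readings.

```python
# Chance of at least one false positive across m comparisons,
# assuming independent tests where every null hypothesis is true.

def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Return 1 - (1 - alpha)^m, the probability of one or more false positives."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 5, 10, 20):
    rate = familywise_error_rate(m)
    print(f"{m:>2} comparisons at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

With 10 comparisons the chance is already about 40%, and with 20 it is roughly 64%, which is why running every possible comparison and reporting only the significant ones is a questionable practice.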

BUT it’s always interesting to see the other side of the question.

(Recommended reading, Maxwell et al., 2015, Is Psychology Suffering from a Replication Crisis? - What Does “Failure to Replicate” Really Mean? <– Worth reading if you are going to do a replication study)

  • Some people believe that researchers misperceive the replication crisis. Maxwell et al. (2015) argue that the results of a replication study should be interpreted cautiously, because replication is more complicated than people imagine. They identify the following complications:

    • deciding how much statistical power is needed

    • a biased effect size from the original study is used to decide the sample size of the replication study, which can cause the replication to fail

    • if the results of the replication study are nonsignificant, to what extent do they support the null hypothesis? In other words, a replication that fails to confirm the original study has not necessarily contradicted it either, even if it had adequate statistical power.

    • On that note, statistically, failing to replicate the original result does not mean accepting the null hypothesis. Put differently, a nonsignificant result should not be interpreted directly as evidence that the null hypothesis is true.

  • The paper also discusses how to use the result of a replication study to decide whether to reject the results of the original study and claim that the null hypothesis is true, and what if the effect under the null hypothesis is ZERO? (BAM!) The authors give solutions from both a frequentist perspective and a Bayesian perspective. (I’ll stop here because I can’t really understand these methods ATM 🤪)
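The sample-size point above can be sketched with a small power calculation. This is only an illustration under stated assumptions: a two-group comparison, a normal approximation to the two-sided test, and invented effect sizes (a published d of 0.5 and a smaller true d of 0.3). The function names are hypothetical and this is not the procedure Maxwell et al. (2015) propose.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the approximation

def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group sample size for a two-sided, two-group comparison."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

def approx_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n participants per group (ignores the far tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * sqrt(n / 2) - z_alpha)

published_d = 0.5  # assumed effect size reported by the original study
true_d = 0.3       # assumed smaller true effect (both values are invented)

n = n_per_group(published_d)
print(f"planned n per group: {n}")
print(f"power if the published effect were true: {approx_power(published_d, n):.2f}")
print(f"power under the smaller true effect:     {approx_power(true_d, n):.2f}")
```

Under these assumptions the replication is planned for 80% power but actually has well under 50%, so a nonsignificant result says little about whether the original effect exists.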

How to increase the validity of your research?

A few strategies have been put forward by Sönning and Werner (2021).

  • preregistration of data collection and analysis plans

  • reporting guidelines

  • open science practices

  • meta-analytic thinking (Cumming, 2012), i.e. a cumulative stance towards knowledge construction and the information value of individual studies

  • using alternative inferential frameworks, such as estimation (Cumming, 2012) and Bayesian inference (Kruschke, 2010), to counter concerns about null hypothesis significance testing (NHST)

From Binary Thinking to Quantitative Thinking

Cumming, G. (2010). Statistics education in the social and behavioural sciences: From dichotomous thinking to estimation thinking and meta-analytic thinking. A short nice article to read.

Building on the last two points above, we take a look at the problems with NHST. Some researchers have argued (quotations cited from Cumming, 2010):

  • “[NHST] is a corrupt form of the scientific method” (Carver, 1978, p. 378)

  • “I find it difficult to imagine a less insightful means of transiting from data to conclusions” (Loftus, 1991, p. 103)

  • “reliance on merely refuting the null hypothesis… is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology” (Meehl, 1978, p. 817)

A less emotional argument from Meehl:

“The Fisherian tradition [NHST], … [which] has inhibited our search for stronger tests, so we have thrown in the sponge and abandoned hope of concocting substantive theories that will generate stronger consequences than merely ‘the Xs differ from the Ys’.” We should, instead, develop quantitative theories that allow us to “generate numerical point predictions (the ideal case found in the exact sciences)” (1978, p. 824).

Key takeaways from Cumming (2010):

  • New language to promote new thinking. Instead of asking “Is there an effect?”, researchers should be encouraged to ask “How large is the effect?”. (Question from QY: Do you think linguistic theory can be quantified? For example, how many words can proficient L2 speakers learn after watching a video with subtitles?)

  • Seeing a wide confidence interval should encourage researchers to aim for greater precision in future experiments. (QY: It seems that in linguistics we don’t really discuss how wide the CI is when a result is significant?)
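As a rough illustration of estimation thinking, the sketch below reports an interval around an estimate instead of a yes/no verdict, and shows how the interval narrows as the sample grows. The numbers are invented (a mean gain of 5 words with a standard deviation of 4), the helper name `ci_for_mean` is hypothetical, and the interval uses a normal approximation rather than the t distribution, so treat it as a sketch only.

```python
from math import sqrt
from statistics import NormalDist

def ci_for_mean(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Invented example: mean vocabulary gain of 5 words, SD of 4 words.
for n in (20, 80, 320):
    low, high = ci_for_mean(mean=5.0, sd=4.0, n=n)
    print(f"n = {n:>3}: 95% CI = [{low:.2f}, {high:.2f}], width = {high - low:.2f}")
```

Because the width shrinks with the square root of n, quadrupling the sample size halves the interval. All three samples would count as "significant", but they differ a lot in how precisely they pin down the size of the effect.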

Kuhn’s scientific evolutionary scheme

Sönning & Werner’s paper also discusses Kuhn’s account of scientific revolutions at length. Thomas Kuhn was a philosopher of science who published “The Structure of Scientific Revolutions” in 1962, which earned him more citations than Einstein on Google Scholar.

Put very simply, Kuhn proposed that scientific progress follows a cyclical pattern.

“Normal science” means research firmly based on past scientific achievements that the scientific community acknowledges as the foundation for further research. Such achievements share two important characteristics:

  • “the achievement was sufficiently unprecedented to attract an enduring group of adherents away from competing modes of scientific activity”

  • “it was sufficiently open-ended to leave all sorts of problems for the redefined group of practitioners to resolve” (Kuhn, 1962, p. 10)

As researchers routinely work on small problems in normal science, they find anomalies. These point to shortcomings of the current theory, but at first they can be mitigated by theoretical refinements. As the gap between the established theory and the observations from reality (the anomalies) grows larger and larger, a crisis occurs.

Researchers start to question the “shared set of norms” as the anomalies accumulate. The old paradigm is replaced by a new paradigm, and a new period of normal science follows. (Please refer to Kuhn’s 174 original work for a detailed/obscure explanation)

Sönning & Werner argue that research methodology in linguistics is now at a stage where researchers are “confidently relying on a set of techniques that constitute the current methodological norm”, which is passed on from generation to generation.

The current methodological norm relies heavily on NHST. The authors think that heavily data-based linguistic research does not put enough emphasis on linguistic theory or on how research questions are derived, partly because the techniques used in linguistics are mostly borrowed from other fields and ignore the underlying linguistic phenomena.

Finally, the authors call for greater effort to establish a set of “unifying and language-specific principles for empirical works”.

(Again, I strongly recommend reading the original paper; my summary is too short…)

References

Cumming, G. (2010). Statistics education in the social and behavioural sciences: From dichotomous thinking to estimation thinking and meta-analytic thinking. Proceedings of the 8th International Conference on Teaching Statistics. https://www.stat.auckland.ac.nz/~iase/publications/icots8/ICOTS8_C111_CUMMING.pdf

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis (1st ed.). Routledge. https://doi.org/10.4324/9780203807002

Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14, 293–300. https://doi.org/10.1016/j.tics.2010.05.001

Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? American Psychologist, 70(6), 487–498. https://doi.org/10.1037/a0039400

Sönning, L., & Werner, V. (2021). The replication crisis, scientific revolutions, and linguistics. Linguistics, 59(5), 1179–1206. https://doi.org/10.1515/ling-2019-0045

Wiederman, M. W., & Nicolai, K. M. (2018). Introduction. In W. J. Koen & C. M. Bowers (Eds.), The psychology and sociology of wrongful convictions (pp. 355–375). Academic Press. https://doi.org/10.1016/B978-0-12-802655-7.00011-3