In today’s session, we will cover
Standard errors & confidence intervals
Null hypotheses
Using t to measure the incompatibility with the null hypothesis
Using the t-distribution to compute p-values
Review
Last week, we introduced what inferential statistics is and how to use Cohen’s d and Pearson’s r to measure effect size.
\[ Cohen's \,d = \frac{\bar{x_1}-\bar{x_2}}{s} \]
\(\bar{x_1}\) and \(\bar{x_2}\) refer to the means of the two groups, and s refers to the standard deviation of both groups together. Cohen (1988) suggested that |d| values of 0.2, 0.5, and 0.8 correspond to “small”, “medium”, and “large” effect sizes, respectively.
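To make the formula concrete, here is a minimal Python sketch (function name and example data are illustrative). Since the notes describe s as the standard deviation of “both groups together”, this sketch uses the pooled standard deviation, one common convention:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(x1, x2):
    """Cohen's d using a pooled standard deviation (one common convention)."""
    n1, n2 = len(x1), len(x2)
    # Pooled SD: weighted combination of the two sample variances
    s_pooled = sqrt(((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2)
                    / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / s_pooled

# Example: two small groups whose means differ by 2
d = cohens_d([5, 6, 7, 8], [3, 4, 5, 6])
```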
\[ Pearson's \ r \, = \frac{s^2_{x,y}}{s_x s_y} \]
\(s^2_{x,y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}\) The numerator is the covariance of x and y, which measures how much the two variables vary together. The denominator is the product of the standard deviations of x and y.
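The same definition can be written out in Python; the numerator and denominator mirror the formula above (names and data are illustrative):

```python
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    # Sample covariance: how much x and y vary together
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    # Normalise by the product of the two standard deviations
    return cov / (stdev(x) * stdev(y))

# Example: y is an exact linear function of x, so r should be 1
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```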
Standard Error
The goal of statistical analysis is to estimate a characteristic of the population, which may be called a parameter or estimand. To achieve this goal, we take a sample of data from the population.
For example, suppose we want to know the average height of adult women in New Zealand. The population parameter is the average height of adult women in NZ (μ). We want to use the average height of a random group of adult women to estimate the actual average height of women in NZ. Therefore, we randomly choose 10,000 women and measure their height. We obtain the average height (\(\bar{x}\)), which can be called an estimate. (There are around 2.7 million adult women in New Zealand.)
Will the average height of these 10,000 women be equal to the average height of the whole population in New Zealand? NO!
There is an estimation error: all estimates can be considered to have some error. However, we almost never know the actual estimation error, because we don’t know the population mean. Therefore, we need to settle for something less specific, namely the standard error. Roughly speaking, “the standard error is not the actual estimation error in our study, but instead tells us for studies using the methods that we are using and with a similar population as we are studying, what would the magnitude of the estimation error typically be” source
\[ SE= \frac{s}{\sqrt{N}} \]
s stands for the standard deviation of the sample data, and N is the sample size. When N is bigger, the SE tends to be smaller. Since we normally can’t control the variation in the data, we try to increase the sample size to reduce the SE. A large SE means the estimate is not precise: you are more uncertain about your estimate.
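The formula is a one-liner in Python (the function name and example data are illustrative):

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    # SE = s / sqrt(N): a larger sample gives a smaller SE,
    # i.e. a more precise estimate of the population mean
    return stdev(sample) / sqrt(len(sample))

se = standard_error([2.0, 4.0, 6.0, 8.0])
```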
Confidence Intervals
A confidence interval is a form of interval estimate, meaning that instead of presenting our findings as a single number, which we know is not exactly correct, we present our findings as an interval. For example, we may report an interval of [5.3, 7.1] for the unemployment percentage in a particular population. This interval is constructed so that it has a high chance (probability) of containing the true value of interest, e.g. the true unemployment rate in the population. (cite from here)
Standard errors can be used to calculate 95% confidence intervals (CIs).
\[ CI = [\bar{x}-1.96*SE, \ \bar{x}+1.96*SE] \] \(\bar{x}\) is the sample estimate, in this case the mean. The value 1.96 (the z-score) varies depending on the desired coverage probability; a 95% CI corresponds to a z-score of 1.96. A z-score indicates how many SDs a data point is away from the mean of the sample.
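This can be sketched in Python as a direct translation of the formula (the example numbers, a mean of 165.0 with SE 0.5, are made up):

```python
def confidence_interval(x_bar, se, z=1.96):
    # 95% CI by default: estimate plus/minus z * SE
    return (x_bar - z * se, x_bar + z * se)

# Example: a sample mean of 165.0 with a standard error of 0.5
lo, hi = confidence_interval(165.0, 0.5)
```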
Some helpful videos to understand what z-score is:
https://numiqo.com/tutorial/z-score
https://numiqo.com/tutorial/z-distribution
More… according to the frequentist statistical philosophy, “objective probability cannot plausibly be assigned to singular events; a coin toss is either head or tail”. Indeed, we will never know whether the population parameter falls in a particular CI. But if we repeated the experiment infinitely many times, 95% of the resulting CIs would contain the population parameter (Winter, 2020, p. 165). (QY is confused here: if an experiment is conducted infinitely many times, there will be infinitely many CIs; which CI would be likely to contain the population parameter? Also, what does “singular events” mean?)
Null Hypotheses
In inferential statistical hypothesis testing, the hypothesis that we formally test is called the null hypothesis. It is an assumption about the population. The alternative hypothesis is a proposed explanation for what happens if the null hypothesis is wrong (source)
Conventionally, a null hypothesis is denoted as \(H_0\) and an alternative hypothesis can be denoted as \(H_1\) or \(H_A\).
\(H_0\) can be “there is NO difference between the groups” (\(H_0\): \(\mu_1=\mu_2\)); \(H_1\) can be “there is a difference between the groups” (\(H_1\): \(\mu_1\neq\mu_2\)). Example: \(H_0\) can be “There is no difference in the word frequency effect between L1 and L2 English speakers”; \(H_1\) can be “There is a difference in the word frequency effect between L1 and L2 English speakers”.
Null hypothesis significance testing (NHST) measures the incompatibility of the sample data with the null hypothesis.
The null hypothesis is “never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” (Fisher, 1935/1971, p. 16).
t-Test, t-Distribution and P-value
A test statistic is a key element of a hypothesis test; it measures the evidence against the null hypothesis. Simply put, a test statistic reduces the entire sample dataset to a single number.
A t-test can be used to test whether there is a significant difference between two group means. There are three types of t-test: 1) the one-sample t-test, 2) the independent-samples t-test, 3) the paired-samples t-test
one sample t-test: compare the mean of a sample with a known reference mean (e.g. if we know that the average height of women is 165 cm, we can measure women’s heights at VIC and compare the sample mean to 165 cm)
independent samples t-test: compare the means of two independent samples (e.g. the word frequency effect on L1 and L2 speakers)
paired samples t-test: compare the means of two dependent groups (e.g. the effectiveness of TBLT in a classroom measured before and after the intervention)
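As an illustration, the first type (the one-sample t-test, matching the height example above) can be computed by hand in Python; the height values below are made up:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    # t = (sample mean - reference mean) / SE, with SE = s / sqrt(N)
    se = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu0) / se

# Hypothetical heights (cm) compared against a reference mean of 165 cm
t = one_sample_t([162.0, 168.5, 171.0, 159.5, 166.0, 170.5, 164.0, 167.5],
                 165.0)
```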
Winter gives the following formula for calculating t: \[t=\frac{\bar{x_1}-\bar{x_2}}{SE}=\frac{\bar{x_1}-\bar{x_2}}{\frac{s}{\sqrt{N}}}\]
Namely, the bigger the sample size N, the smaller the standard error, and the larger t will be. Then the question is: what does a large or small t mean? To understand this, we need to know that the statistic t follows a t-distribution, which is very similar to a normal distribution.
We can use the t-distribution to calculate the p-value: use a t-value table to find the critical t-value, then compare the critical t-value with the calculated t-value. The critical t-value depends on the significance level and the degrees of freedom. For more details, please check here (for more detail on the t-distribution, see here)
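As a rough sketch of this last step: for large degrees of freedom the t-distribution is close to the standard normal distribution, so a two-sided p-value can be approximated with Python’s standard library (for small samples this approximation understates the p-value, and the exact t-distribution should be used instead):

```python
from statistics import NormalDist

def two_sided_p_approx(t_value):
    # Normal approximation to the t-distribution: reasonable when the
    # degrees of freedom are large (say, N > 30); anti-conservative otherwise
    return 2 * (1 - NormalDist().cdf(abs(t_value)))

p = two_sided_p_approx(2.5)  # below 0.05, so we would reject H0 at alpha = .05
```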
Or just use R: its built-in `t.test()` function computes the t statistic, degrees of freedom, p-value, and confidence interval in one call.
Reference
Fisher, Ronald Aylmer. The Design of Experiments. [Ninth edition]. New York: Hafner, 1971. Print.