The T-test allows us to investigate differences in means, either between groups or compared to a pre-specified amount.
Compared to the Z-test (which uses the normal distribution), the t-test compares its test statistic against Student’s t-distribution. The Z-test also requires a known population standard deviation, so the t-test is more appropriate when the population standard deviation is unknown and/or the sample size is small.
One-Sample T-test
The one-sample T-test compares the population mean $\mu$ against a specific value $\mu_0$. In other words, the null hypothesis is $H_0: \mu = \mu_0$.
The test statistic is very similar to that of a Z-test for a mean, except that it uses the sample standard deviation rather than the population standard deviation:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where
- $\bar{x}$ is the observed sample mean
- $\mu_0$ is the expected population mean
- $s$ is the sample standard deviation
- $n$ is the sample size
The degrees of freedom of the t-distribution for the one-sample test depend on the sample size ($n$): $df = n - 1$.
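The one-sample statistic can be computed by hand with the standard library. The following is a minimal sketch, not a reference implementation: the function name `one_sample_t` and the sample values are made up for illustration.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Made-up measurements, tested against a hypothesized mean of 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, df = one_sample_t(sample, mu0=4)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic is then compared against a t-distribution with `df` degrees of freedom. If SciPy is installed, `scipy.stats.ttest_1samp` performs the same test and also reports a p-value.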
Paired T-test
A paired t-test is used when two samples are dependent (e.g., measurements from the same individuals taken at two different times). It then tests the mean of the differences between these paired measurements.
The paired T-test relaxes the assumption that the two samples are independent of each other, but it still requires that:
- the population of differences follows a normal distribution
- each pair is independent of the other pairs
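The test statistic is essentially the same as the one-sample version, except that it is calculated on the differences $d$ between paired observations. For the usual null hypothesis that the mean difference is zero:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where $\bar{d}$ is the mean of the differences, $n$ is the number of pairs, and $s_d$ is the standard deviation of $d$.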
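Because the paired test reduces to a one-sample test on the differences, it can be sketched in a few lines. This is an illustration under stated assumptions: the null hypothesis is a mean difference of zero, the degrees of freedom are the number of pairs minus one (as in the one-sample case), and the name `paired_t` and the before/after values are invented for the example.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired t-test of H0: mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences, not on the two samples separately.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1


# Made-up measurements on the same five individuals at two times.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 14]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_rel` runs the same test and returns a p-value as well.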
Independent Two-Sample T-test
An independent two-sample t-test differs from a paired t-test in that it assumes independence between the two groups being compared. It also assumes that both populations have equal spread, unless a modification (like Welch’s t-test) is used.
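The test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$
is the pooled standard deviation, $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1$ and $s_2$ are the sample standard deviations, and $n_1$ and $n_2$ are the sample sizes.
The degrees of freedom of the two-sample T-test are $df = n_1 + n_2 - 2$.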
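The pooled (equal-variance) calculation can be sketched as follows. The function name `pooled_two_sample_t` and the two groups are hypothetical, chosen so that the arithmetic is easy to follow.

```python
import math
import statistics


def pooled_two_sample_t(x1, x2):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n1, n2 = len(x1), len(x2)
    var1 = statistics.variance(x1)  # sample variances (n - 1 denominator)
    var2 = statistics.variance(x2)
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    mean_diff = statistics.mean(x1) - statistics.mean(x2)
    t = mean_diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df


# Made-up groups with the same sample variance (2.5 each).
group_a = [4, 5, 6, 7, 8]
group_b = [1, 2, 3, 4, 5]
t_stat, df = pooled_two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

With SciPy installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=True)` performs the same pooled test and adds a p-value.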
Checking Assumptions
Here are ways to check the assumptions behind the different flavors of T-test:
- independence: There is no direct statistical test to verify independence. We need to reason about the experiment design and data collection.
- equal spread (variance)
  - We can visualize with a (comparative) boxplot or histogram
  - We can use a statistical test such as Levene’s test to compare the variances of the groups
  - If the assumption is not met, we can use Welch’s t-test
- normality
  - We can visualize with a boxplot (to check for outliers) and a histogram, though Q-Q plots are usually the most informative visualization
  - We can use a statistical test such as the Shapiro-Wilk test for normality
  - The normality assumption becomes less critical with larger sample sizes due to the central limit theorem
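As a rough illustration of the equal-variance fallback mentioned above, the sketch below compares two sample variances informally and then computes Welch's statistic, which does not pool the variances. The degrees of freedom use the Welch-Satterthwaite approximation, which the text above does not spell out. The name `welch_t` and the data are assumptions made for the example, and the variance ratio is only an informal check, not a substitute for Levene's test.

```python
import math
import statistics


def welch_t(x1, x2):
    """Return (t, df) for Welch's t-test (unequal variances allowed)."""
    n1, n2 = len(x1), len(x2)
    # Squared standard error of each group mean.
    v1 = statistics.variance(x1) / n1
    v2 = statistics.variance(x2) / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up groups whose spreads clearly differ.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

# Informal check: ratio of the larger sample variance to the smaller one.
var_a = statistics.variance(group_a)
var_b = statistics.variance(group_b)
variance_ratio = max(var_a, var_b) / min(var_a, var_b)

t_stat, df = welch_t(group_a, group_b)
print(f"variance ratio = {variance_ratio:.1f}, t = {t_stat:.3f}, df = {df:.2f}")
```

For the formal checks, SciPy provides `scipy.stats.levene` and `scipy.stats.shapiro`, and `scipy.stats.ttest_ind` with `equal_var=False` runs Welch's test with a p-value.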