By the central limit theorem, the sampling distribution of averages or proportions from a large number of independent trials approximately follows the normal distribution. The expectation of a sample proportion or average is the corresponding population value.

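The standard deviation of the sample mean is the standard error, $\mathrm{SE}(\bar{X}) = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size. The variance of the sample mean is therefore $\sigma^2 / n$.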
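The relationship between the spread of sample means and $\sigma / \sqrt{n}$ can be checked with a small simulation. The sketch below is only illustrative: the Uniform(0, 1) population, the sample size of 25, the 4000 replicates, and the function name `simulated_standard_error` are all assumptions chosen for the example, not part of the original notes.

```python
import math
import random
import statistics


def simulated_standard_error(n, replicates, rng):
    """Standard deviation of `replicates` sample means, each from n Uniform(0, 1) draws."""
    means = [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(replicates)
    ]
    return statistics.stdev(means)


rng = random.Random(0)        # fixed seed so the run is repeatable
sigma = math.sqrt(1 / 12)     # population SD of Uniform(0, 1)
n = 25                        # sample size (illustrative assumption)

empirical_se = simulated_standard_error(n, replicates=4000, rng=rng)
theoretical_se = sigma / math.sqrt(n)

print(f"simulated SD of sample means: {empirical_se:.4f}")
print(f"sigma / sqrt(n):              {theoretical_se:.4f}")
```

The two printed values should be close. Quadrupling `n` should roughly halve both.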

The speed at which random variables converge to a normal distribution depends on their original distribution:

  • More skewed distributions require a larger sample size.
  • A common rule of thumb is that a sample size of at least 30 is needed for approximation by a normal distribution, assuming the distribution is relatively symmetric and free of significant outliers.
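One way to quantify the effect of skewness is that the skewness of the mean of n independent, identically distributed draws equals the population skewness divided by $\sqrt{n}$. The sketch below uses that fact to estimate how large n must be before the sample mean looks roughly symmetric. The cut-off of 0.3 is an arbitrary assumption for illustration, as are the three example populations and the helper names.

```python
import math


def skewness_of_mean(population_skewness, n):
    """Skewness of the mean of n i.i.d. draws: population skewness / sqrt(n)."""
    return population_skewness / math.sqrt(n)


def sample_size_for(population_skewness, threshold):
    """Smallest n whose sample-mean skewness is at most `threshold` in absolute value."""
    return max(1, math.ceil((abs(population_skewness) / threshold) ** 2))


def bernoulli_skewness(p):
    """Skewness of a 0/1 variable with success probability p."""
    return (1 - 2 * p) / math.sqrt(p * (1 - p))


THRESHOLD = 0.3  # arbitrary cut-off for "roughly symmetric" (assumption)

populations = {
    "fair coin (p = 0.5)": bernoulli_skewness(0.5),    # symmetric, skewness 0
    "exponential": 2.0,                                # skewness of any exponential
    "rare event (p = 0.05)": bernoulli_skewness(0.05), # strongly right-skewed
}

required_n = {
    name: sample_size_for(skew, THRESHOLD) for name, skew in populations.items()
}

for name, skew in populations.items():
    at_30 = skewness_of_mean(skew, 30)
    print(f"{name:22s} skewness at n=30: {at_30:.2f}   n needed: {required_n[name]}")
```

Under this particular cut-off, the symmetric coin needs no extra observations, while the rare-event proportion needs far more than 30. Skewness is only one aspect of non-normality, so this is a rough guide rather than a guarantee.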

Example: Suppose we toss a fair coin 10000 times and record each result as H = 1, T = 0. How can we approximate the distribution of the sum of the results with a normal distribution?

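The population mean is $\mu = 0.5$, and the population standard deviation is $\sigma = \sqrt{0.5 \times 0.5} = 0.5$. The expected sample sum is $n\mu = 10000 \times 0.5 = 5000$, and the standard deviation of the sample sum is $\sqrt{n}\,\sigma = 100 \times 0.5 = 50$. Thus, the distribution we have is approximately $N(5000, 50^2)$.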
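The arithmetic for the coin example can be reproduced in a few lines. This is a minimal sketch using `statistics.NormalDist` from the Python standard library. The variable names and the final probability query (a sum between 4900 and 5100) are illustrative additions, not part of the original example.

```python
import math
from statistics import NormalDist

n = 10_000   # number of tosses
p = 0.5      # probability of heads for a fair coin

pop_mean = p                         # mean of a single toss (H = 1, T = 0)
pop_sd = math.sqrt(p * (1 - p))      # SD of a single toss: 0.5

sum_mean = n * pop_mean              # expected number of heads: 5000
sum_sd = math.sqrt(n) * pop_sd       # SD of the number of heads: 50

approx = NormalDist(mu=sum_mean, sigma=sum_sd)

# Illustrative query: probability that the sum lies within 2 SDs of its mean.
prob = approx.cdf(5100) - approx.cdf(4900)

print(f"sum ~ N({sum_mean:.0f}, {sum_sd:.0f}^2)")
print(f"P(4900 <= sum <= 5100) is about {prob:.4f}")
```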

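To approximate discrete random variables via a normal distribution, we need to apply a continuity correction: for an integer-valued variable $X$, $P(X \le k)$ is approximated by the normal probability below $k + 0.5$, and $P(X \ge k)$ by the normal probability above $k - 0.5$.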
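The effect of the continuity correction is easier to see with a smaller number of tosses, where the exact binomial probability is cheap to compute. The sketch below assumes a hypothetical case of 100 fair tosses and asks for the probability of at most 55 heads. Those numbers are chosen only for illustration.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5   # hypothetical smaller example: 100 fair tosses
k = 55            # we want P(X <= 55)

# Exact binomial probability, summed term by term.
exact = sum(
    math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1)
)

approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

uncorrected = approx.cdf(k)        # normal CDF evaluated at k
corrected = approx.cdf(k + 0.5)    # continuity correction: evaluate at k + 0.5

print(f"exact binomial:        {exact:.4f}")
print(f"normal, no correction: {uncorrected:.4f}")
print(f"normal, corrected:     {corrected:.4f}")
```

In this case the corrected value should sit much closer to the exact probability than the uncorrected one. The gap shrinks as the number of tosses grows, since a half-unit shift becomes small relative to the standard deviation.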

Sample Size Vs Replicates

It is important to distinguish between the effect of sample size and the effect of the number of replicates.

Consider a skewed distribution. Increasing the number of replicates (going down column 1) makes the histogram approach the original distribution, NOT the normal curve. Increasing the sample size (going across the rows), however, makes the distribution of the sample mean approach the normal curve.
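The distinction can be sketched numerically. The simulation below assumes an exponential population (skewness 2) as the skewed distribution and uses sample skewness as a rough measure of how far the sample means are from a symmetric, normal-like shape. The specific replicate counts, sample sizes, and function names are illustrative choices.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean of cubed standardized values."""
    mean = statistics.fmean(values)
    sd = math.sqrt(statistics.fmean([(v - mean) ** 2 for v in values]))
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(sample_size, replicates, rng):
    """One sample mean per replicate, each from `sample_size` exponential draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replicates)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for replicates in (100, 10_000):
    for sample_size in (1, 5, 50):
        means = sample_means(sample_size, replicates, rng)
        results[(replicates, sample_size)] = skewness(means)

for (replicates, sample_size), skew in results.items():
    print(f"replicates={replicates:6d}  sample size={sample_size:3d}  skewness={skew:5.2f}")
```

With a sample size of 1, adding replicates only gives a sharper picture of the skewed population, so the skewness should stay near 2. Increasing the sample size should pull the skewness of the sample means toward 0, whatever the number of replicates.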

[Figure: sample size vs replicates]