Section 6.1: Error and Confidence Intervals

Fundamentals of Social Statistics by Adam J. McKee

In social scientific research utilizing samples, population parameters are often estimated using sample statistics. For example, the mean of a sample is often used to approximate the mean of the population. When we do this, we go into the process knowing that our sample mean will almost certainly differ somewhat from the population mean. (This idea applies to any statistic that we can compute from sample data; we are focusing on a single statistic, the mean, to keep things from getting confusing.)

The process is useful because, most of the time, the sample mean will be very close to the population mean. There are times, however, when we will be well off the mark. The rules of mathematics and probability allow the researcher to estimate how big the discrepancy (difference) between the sample mean and the population mean is likely to be. One such estimate is known as the standard error of the mean.

Standard Error of the Mean

To understand how standard error works, it is important to understand what a sampling distribution is. Suppose that we have a standardized test with a population mean of 100.00 and a population standard deviation of 15.00. In a real research situation, we would not know these population values and would need to estimate them using sample data. As we previously discussed, there will always be some discrepancy between any given population parameter and its corresponding sample statistic. Say, for example, that we drew a sample of scores on our standardized test, whose population mean is 100.00. It is very unlikely that the sample mean would be exactly 100.00. Most samples, however, would produce a mean very near the population mean, with extreme deviations being very rare.

If we obtain all possible samples of a particular sample size (n) from a given population, and then compute a statistic (mean, standard deviation, proportion, etc.) for each sample, the probability distribution of that statistic is called a sampling distribution. This idea is extremely useful when we examine it in light of the central limit theorem. The central limit theorem states that the sampling distribution of the mean will be approximately normal if the sample size is large enough, whatever the shape of the population distribution (provided the population has a finite variance). (As a rough rule of thumb, many researchers say that a sample size of 30 is large enough.) Remember that, under random sampling, the sample means differ from one another only because of chance, that is, random sampling error. These rules do not apply to systematically biased samples.
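The central limit theorem can be checked informally with a short simulation. The sketch below is only an illustration under stated assumptions: the population is a right-skewed exponential distribution with mean 1 and standard deviation 1 (chosen here for convenience, not taken from the text), and the function name is hypothetical. If the sample means are roughly normal, about 95% of them should land within 1.96 standard deviations of the population mean, where the standard deviation of the means is the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

def sample_means_from_skewed_population(n, num_samples, seed=1):
    """Return the means of num_samples random samples of size n.

    The population is exponential with rate 1.0 (mean 1, SD 1), which is
    clearly not normal. This choice is an assumption made for illustration.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 30  # the rule-of-thumb sample size mentioned in the text
means = sample_means_from_skewed_population(n, num_samples=5000)

pop_mean, pop_sd = 1.0, 1.0
se = pop_sd / math.sqrt(n)  # standard deviation of the sample means

# Share of sample means within 1.96 standard errors of the population mean.
share_within = sum(abs(m - pop_mean) <= 1.96 * se for m in means) / len(means)
print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```

Even though individual scores in this population are strongly skewed, the share printed should come out close to 0.95, which is what a normal sampling distribution predicts.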

In our hypothetical example, the mean of the sampling distribution would be 100.00, and the sample means would cluster around that value in an approximately normal distribution. The standard deviation of a sampling distribution of means is known as the standard error of the mean. Of course, we rarely know the population standard deviation on which the standard error depends, so we use the standard deviation of the sample to estimate it and then divide by the square root of the sample size.
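A small simulation can make the sampling distribution concrete. The following sketch assumes a normally distributed population with the mean (100) and standard deviation (15) from the running example, and it approximates "all possible samples" with 5,000 random samples of size 30. The function name and the number of samples are illustrative choices, not part of the original text.

```python
import random
import statistics

# Population values taken from the running example in the text.
POP_MEAN = 100.0
POP_SD = 15.0

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

sample_means = simulate_sample_means(n=30, num_samples=5000)

# Center of the simulated sampling distribution; should be near 100.
center = statistics.fmean(sample_means)
# Spread of the simulated sampling distribution: the empirical standard error.
empirical_se = statistics.stdev(sample_means)
# Theoretical standard error: population SD divided by the square root of n.
theoretical_se = POP_SD / 30 ** 0.5

print(f"mean of sample means: {center:.2f}")
print(f"SD of sample means:   {empirical_se:.2f}")
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
```

The standard deviation of the simulated means should land close to 15 divided by the square root of 30 (about 2.74), which is the standard error of the mean for samples of this size.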

The standard error of the mean (SEM) is an estimate of the amount of error in estimating the mean of a population based on sample data. Recall that there will always be chance error (i.e., sampling error) that crops up in samples drawn from populations by random selection. The standard error of the mean tells us how large that error is likely to be.

When we report the value of an estimate for a population, it is often called a point estimate.  It is so called because it represents a single point estimated to be the mean of the population based on the sample data.  Some critics argue that reporting point estimates can be misleading.  After all, we know that it is unlikely that the population mean is exactly equal to the one we estimated using the sample data.  It would be better, they argue, to provide a range within which we can be fairly sure the population mean really falls.

The standard error of the mean allows us to construct a confidence interval. A confidence interval is simply a range within which we are fairly sure (to a degree specified by the researcher) the population mean falls. When we report such an interval, we are said to be reporting an interval estimate rather than a single point estimate. When a researcher reports a 95% confidence interval (CI) for a mean, we can have 95% confidence that the true mean lies within that interval; more precisely, if we drew many samples and built an interval from each in the same way, about 95% of those intervals would contain the true mean. The true mean is the mean of the population, which we could determine only with a complete set of population data (and if we had that, we would not be using samples in the first place). Therefore, we have to estimate how good an estimate the sample mean is of the population mean.
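What "95% confidence" means in practice can be illustrated by simulation. The sketch below assumes a normal population with the mean (100) and standard deviation (15) of the running example, builds an interval of the sample mean plus or minus 1.96 standard errors (the 95% formula given later in this section) from each of 2,000 random samples of size 100, and counts how often the interval contains the true mean. The function name, sample size, and number of trials are illustrative assumptions.

```python
import math
import random
import statistics

# Population values from the running example; known here only because
# this is a simulation.
POP_MEAN, POP_SD = 100.0, 15.0

def interval_covers_true_mean(rng, n):
    """Draw one sample of size n and report whether its 95% CI contains POP_MEAN."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    m = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return m - 1.96 * sem <= POP_MEAN <= m + 1.96 * sem

rng = random.Random(42)
trials = 2000
hits = sum(interval_covers_true_mean(rng, n=100) for _ in range(trials))
coverage = hits / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share printed should be close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples.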

Computing the Standard Error of the Mean

The standard error of the mean (SEM) can be computed using the following formula:

SEM = s / √n

Where:

  • SEM stands for Standard Error of the Mean.
  • “s” represents the sample standard deviation of the scores.
  • “√n” denotes the square root of the sample size, “n.”
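As a minimal sketch, the formula can be applied to raw scores as follows. The scores are invented for illustration, and the function name is hypothetical. The sketch assumes that "s" is the sample standard deviation computed with n - 1 in the denominator, which is what Python's statistics.stdev returns.

```python
import math
import statistics

def standard_error_of_mean(scores):
    """Return s / sqrt(n) for a list of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to compute a sample SD")
    s = statistics.stdev(scores)  # sample SD (n - 1 in the denominator)
    return s / math.sqrt(n)

# Hypothetical test scores, invented for illustration only.
scores = [85, 90, 95, 100, 100, 105, 110, 115]
s = statistics.stdev(scores)
sem = standard_error_of_mean(scores)
print(f"n = {len(scores)}, s = {s:.2f}, SEM = {sem:.2f}")
```

For these eight scores the sample standard deviation is 10, so the SEM is 10 divided by the square root of 8, or about 3.54.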

The Standard Error of the Mean (SEM) is a statistical measure used to estimate the variability or dispersion of sample means. It quantifies how much the sample mean is expected to differ from the true population mean. The formula for SEM involves two components:

  1. Standard Deviation (s): This represents the spread or variation in the data points within your sample. A larger standard deviation indicates greater variability among the scores.
  2. Sample Size (n): The standard deviation is divided by the square root (√) of the sample size (n), so the SEM shrinks as the sample grows. A larger sample size results in a smaller SEM, indicating more precise estimates of the population mean. Because the square root is used, the sample size must be quadrupled to cut the SEM in half.
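The effect of sample size can be seen by holding the standard deviation fixed and varying n. This sketch assumes a sample standard deviation of 15 (borrowed from the running example) and three arbitrary sample sizes chosen so that each is four times the previous one.

```python
import math

s = 15.0  # assumed sample standard deviation, borrowed from the running example

# SEM = s / sqrt(n) for three illustrative sample sizes.
sems = {n: s / math.sqrt(n) for n in (25, 100, 400)}

for n, sem in sems.items():
    print(f"n = {n:>3}: SEM = {sem:.2f}")
```

Each fourfold increase in n (25 to 100 to 400) halves the SEM (3.00 to 1.50 to 0.75).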

To calculate SEM, you divide the standard deviation (s) by the square root of the sample size (√n). This computation provides a standard error value that helps researchers establish confidence intervals for estimated means, assess the precision of their sample data, and make inferences about the population mean.

Once you’ve determined the SEM, you can construct confidence intervals for the estimated population mean. Recall from the earlier discussion that areas under the normal distribution curve can be understood in terms of standard deviation units. The principles of the 68% Rule, the 95% Rule, and the 99% Rule apply to sampling distributions in the same way they do for regular frequency distributions, with the SEM playing the role of the standard deviation.

Computing Confidence Intervals (CIs)

Here are the formulas for computing the lower limit and upper limit of Confidence Intervals (CI) at different confidence levels, along with explanations:

For a 68% Confidence Interval:
  Lower Limit (LL68) = M – (1.00 * SEM)
  Upper Limit (UL68) = M + (1.00 * SEM)

For a 95% Confidence Interval:
  Lower Limit (LL95) = M – (1.96 * SEM)
  Upper Limit (UL95) = M + (1.96 * SEM)

For a 99% Confidence Interval:
  Lower Limit (LL99) = M – (2.58 * SEM)
  Upper Limit (UL99) = M + (2.58 * SEM)

Where:

  • LL68, LL95, and LL99 represent the lower limits of the confidence intervals at 68%, 95%, and 99% confidence levels, respectively.
  • UL68, UL95, and UL99 represent the upper limits of the confidence intervals at 68%, 95%, and 99% confidence levels, respectively.
  • M stands for the sample mean.
  • SEM denotes the Standard Error of the Mean, calculated using the formula SEM = s / √n (where “s” is the sample standard deviation, and “√n” is the square root of the sample size).
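The three pairs of formulas can be wrapped in a single function, sketched below. The function name and the sample summary (M = 102, s = 15, n = 225) are hypothetical values chosen so that the SEM works out to exactly 1. The multipliers are the ones given in the formulas above.

```python
import math

# Multipliers from the formulas above, keyed by confidence level in percent.
Z_MULTIPLIERS = {68: 1.00, 95: 1.96, 99: 2.58}

def confidence_interval(mean, sd, n, level=95):
    """Return (lower limit, upper limit) as M -/+ multiplier * SEM."""
    if level not in Z_MULTIPLIERS:
        raise ValueError("level must be 68, 95, or 99")
    sem = sd / math.sqrt(n)
    margin = Z_MULTIPLIERS[level] * sem
    return mean - margin, mean + margin

# Hypothetical sample summary, invented for illustration only.
M, s, n = 102.0, 15.0, 225  # SEM = 15 / sqrt(225) = 1.0

intervals = {level: confidence_interval(M, s, n, level) for level in (68, 95, 99)}
for level, (ll, ul) in intervals.items():
    print(f"{level}% CI: {ll:.2f} to {ul:.2f}")
```

With an SEM of 1, the limits are simply the sample mean minus and plus the multiplier: 101.00 to 103.00 at 68%, 100.04 to 103.96 at 95%, and 99.42 to 104.58 at 99%. Higher confidence requires a wider interval.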

Confidence Intervals (CI) provide a range within which the true population mean is likely to fall with a certain level of confidence. The formulas above are used to compute the lower and upper limits of these intervals at different confidence levels (68%, 95%, and 99%).

  • The lower limit (LL) represents the lower boundary of the interval.
  • The upper limit (UL) represents the upper boundary of the interval.
  • “M” denotes the sample mean, which serves as the best estimate of the population mean.
  • “SEM” is the Standard Error of the Mean, reflecting the precision of the sample mean estimate and calculated based on the sample’s standard deviation (“s”) and sample size (“n”).

Researchers use these formulas to establish confidence intervals for their sample means, allowing them to make probabilistic inferences about the population mean. Confidence intervals provide a degree of certainty regarding the location of the population mean while acknowledging the inherent uncertainty associated with sample research.

Key Terms

Standard Error of the Mean, Point Estimate, Confidence Interval, Interval Estimate, True Mean



Last Modified:  09/25/2023
