Probability in Hypothesis Tests

Fundamentals of Social Statistics by Adam J. McKee

In the good old days before computers, we evaluated the test statistic by comparing it to a critical value that we looked up in a table in the back of a statistics book.  Now computer programs tell us the probability (p) associated with the test statistic we computed.  If that probability is below our alpha level (e.g., p < .05 or p < .01), we reject the null hypothesis.  This means that we are really evaluating the probability associated with the test statistic, not the test statistic itself.  In fact, knowing that t = 3.54 tells the researcher little by itself; it is the probability associated with a value of 3.54 that drives the statistical decision.
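To make the decision rule concrete, here is a minimal sketch of turning a test statistic into a probability and comparing it to alpha.  The text does not give the degrees of freedom for t = 3.54, so this sketch assumes a large sample, where the t distribution is close to the standard normal; with a small sample the exact p would be somewhat larger.  The function name and the alpha of .05 are illustrative choices, not part of the original example.

```python
from statistics import NormalDist


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed probability of a value at least this extreme
    under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05           # assumed significance level
test_statistic = 3.54  # the value used in the text, treated as a large-sample statistic

p = two_tailed_p_from_z(test_statistic)
reject_null = p < alpha  # the decision compares p to alpha, not 3.54 to alpha

print(f"p = {p:.4f}; reject the null hypothesis: {reject_null}")
```

Under this assumption the probability is far below .05, so the null hypothesis would be rejected.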

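In hypothesis testing, p is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true for the population.  The alpha level is the probability of rejecting the null hypothesis when it is true, an error sometimes known as a “false positive.”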

In general, the purpose of a test statistic is to determine whether the result of the research study is different from what you would expect from chance alone.  That is, it helps us answer the question “did what we observe in the sample reflect a real relationship, or was it due to sampling error?”  (Note that there are many other sources of potential error, but these are research design issues and are beyond the scope of this text.)

When we specify how two or more variables are related in the “real world,” we have stated a research hypothesis.

Example: College-educated people will earn a higher income than people without a college education.

A research hypothesis is a statement of how two or more variables are related in the population.

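We can specify this in terms of the means of the groups: We would expect the mean income of a sample of college-educated people to be larger than the mean income of a sample of people who were not college educated.

We can specify this hypothesis in terms of population means (μ), where μ1 is the mean income of the college-educated population and μ2 is the mean income of the population without a college education, as follows: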

μ1 > μ2

The null hypothesis specifies the opposite of the research hypothesis: it specifies that no relationship exists in the population.  For example, a psychologist evaluating a new anger management therapy would state “the new therapy has no effect on anger levels” as his null hypothesis.  If he can reject the null hypothesis (determine that the observed result would be very unlikely if it were true), then he can accept the alternative hypothesis that the treatment worked.

A null hypothesis is a statement that there is no relationship between the variables of interest.

The logic of hypothesis testing is that when we draw a sample from a population, we know that the sample will almost never exactly reflect the value of the population; chance (probability) dictates a difference, however slight.  For example, if we flip a coin ten times, we expect on average that it will come up heads five times and come up tails five times.  Still, if we flip the coin ten times, we would not be surprised if it came up heads six times and tails four times.  Such a result is not at all improbable.

If we flipped the same coin 1000 times and it came up heads 1000 times, we would suspect that something was amiss, such as a trick coin that was heads on both sides.  The reason we suspect something besides chance is causing the observed result is that a result of 1000 heads is very, very improbable for a fair coin.  The same logic applies to hypothesis testing.  If the observed result is very unlikely to occur by chance alone, we dismiss chance as a factor and suspect something else is at work.  In an experiment, that thing at work is most likely a treatment effect.
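The coin examples can be checked with the binomial formula.  The sketch below assumes a fair coin (probability of heads = .5) and independent flips; the function name is just an illustrative choice.

```python
from math import comb


def prob_heads(k: int, n: int, p_heads: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n independent flips."""
    return comb(n, k) * p_heads**k * (1 - p_heads) ** (n - k)


# Ten flips: six heads and four tails is an ordinary outcome.
p_exactly_6 = prob_heads(6, 10)                              # 210/1024, about .21
p_at_least_6 = sum(prob_heads(k, 10) for k in range(6, 11))  # 386/1024, about .38

# One thousand flips, all heads: possible, but vanishingly unlikely.
p_all_1000 = prob_heads(1000, 1000)                          # 0.5 ** 1000

print(f"P(exactly 6 heads in 10)   = {p_exactly_6:.3f}")
print(f"P(6 or more heads in 10)   = {p_at_least_6:.3f}")
print(f"P(1000 heads in 1000)      = {p_all_1000:.2e}")
```

Six or more heads in ten flips happens more than a third of the time with a fair coin, so it gives no reason to doubt the coin.  A run of 1000 heads has a probability so close to zero that chance stops being a believable explanation.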

When we observe a difference in means or a correlation larger than zero in a sample, it can reflect two basic things:

      1. The difference we observe in the sample really exists in the population; or
      2. The difference we observe is due only to chance, and there really is no difference in the population.

Conventional research standards dictate that the observed difference be much larger than chance alone would typically cause; this increases our confidence in saying that an observed difference really exists in the population.  The focus in hypothesis testing, then, is determining whether a sample statistic is the result of a real relationship between the variables of interest, or whether it just looks that way due to a fluke sample.  If a difference as large as the one we observed would occur by chance alone only rarely, such as less than 5% or less than 1% of the time, we reject chance as the cause of the observed difference.  That is, we reject the null hypothesis.  In a properly conducted experiment, if chance is not at work, then the only other alternative is that the observed difference is a real difference and would be found in the population if a census could be conducted to observe it.
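One way to see what "larger than chance alone would cause" means is a permutation test, sketched below.  This is not the procedure the text describes (which relies on test statistics such as t); it is a simple stand-in that makes the role of chance visible.  The income figures are invented for illustration (in thousands of dollars), and the function name, number of shuffles, and seed are all assumptions.

```python
import random


def permutation_p_value(group_1, group_2, n_shuffles=10_000, seed=0):
    """One-sided permutation test for mean(group_1) > mean(group_2).

    Shuffles the group labels many times to see how often chance alone
    produces a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    n1, n2 = len(group_1), len(group_2)
    observed = sum(group_1) / n1 - sum(group_2) / n2
    pooled = list(group_1) + list(group_2)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # if the null is true, labels are arbitrary
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if diff >= observed:
            at_least_as_large += 1

    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return observed, (at_least_as_large + 1) / (n_shuffles + 1)


# Hypothetical sample incomes, in thousands of dollars.
college = [52, 61, 58, 65, 70, 55, 63, 68]
no_college = [41, 48, 39, 50, 45, 43, 52, 47]

observed_diff, p_value = permutation_p_value(college, no_college)
print(f"observed difference in means = {observed_diff:.3f}")
print(f"estimated p = {p_value:.4f}; reject null at .05: {p_value < 0.05}")
```

With these made-up numbers, random relabeling almost never produces a gap as large as the observed one, so chance is rejected as the explanation.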

Researchers will almost never claim to have proven a research hypothesis.  This is because of the probabilistic nature of hypothesis testing.  There is always a chance that they are wrong in rejecting the null hypothesis.  Most of the time, the researcher will say something to the effect that the research hypothesis was supported.
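The chance of being wrong can be illustrated with a small simulation.  The sketch below assumes a population in which the null hypothesis is true (mean of 0, standard deviation of 1, normally distributed), runs many imaginary studies, and counts how often a two-tailed z test rejects the null anyway.  The sample size, number of studies, and seed are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_studies=5000, n=30, alpha=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    standard_error = 1 / n**0.5                       # population sd is 1 by assumption

    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # null is true: mean really is 0
        z = (sum(sample) / n) / standard_error
        if abs(z) > critical_z:
            rejections += 1                           # a false positive
    return rejections / n_studies


rate = false_positive_rate()
print(f"proportion of true nulls rejected = {rate:.3f}")
```

The proportion should land near the chosen alpha level, which is why a single significant result supports a research hypothesis rather than proving it.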


Last Modified:  06/03/2021
