Section 6.4: Significance and Correlations
As with the mean difference tests (e.g., t and F), a researcher can examine the probability associated with a particular correlation and determine if that correlation is statistically significant. The null hypothesis, however, is different with correlation coefficients. The null hypothesis for a correlation is essentially that there is no real correlation in the population. If true, this means that any observed correlation was due to chance (sampling error). The decision to reject or fail to reject the null is based on a probability, just as with other hypothesis tests. By convention, most researchers reject the null hypothesis when p < .05.
Recall that with any inferential test, we are trying to use sample data to make an inference about the true state of things in the population. If we had data for the entire population, we could find the population correlation coefficient. The symbol for the population correlation coefficient is ρ, the Greek letter “rho.” Researchers seldom have access to population data, so we must make use of sample data instead. In this usual case, we use the sample correlation coefficient (r) as an estimate of the unknown population correlation coefficient.
Ultimately, our hypothesis is about ρ (the population correlation coefficient) and not the sample correlation coefficient (r). Just as with other statistical significance tests, we are more confident in our conclusion when the sample size is larger. Therefore, the larger our sample, the smaller the sample correlation needs to be for us to conclude that ρ differs significantly from zero. If the null hypothesis is rejected, the researcher can conclude something like “There is sufficient evidence to conclude that there is a significant linear relationship between X and Y because the correlation coefficient is significantly different from zero.” This suggests that the regression line from the sample data provides a more informative model of Y given X than the mean of Y alone. If the researcher fails to reject the null hypothesis, then there is insufficient evidence to conclude that the observed relationship really exists in the population. In that case, it is inappropriate to use the sample regression line to make statements about the population.
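To make the role of sample size concrete, the short sketch below converts critical t-values into the smallest sample correlation that would be significant at each sample size. It assumes a two-tailed test at the .05 level, with critical t-values copied from a standard t table, and uses the usual conversion for a test of ρ = 0 with n − 2 degrees of freedom. The function name and the three sample sizes are illustrative choices, not part of the original text.

```python
import math

# Two-tailed critical t-values at alpha = .05, copied from a standard t table.
# Keys are degrees of freedom (df = n - 2).
T_CRITICAL_05 = {8: 2.306, 48: 2.011, 98: 1.984}


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that reaches significance for the given critical t and df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


for df, t_crit in T_CRITICAL_05.items():
    n = df + 2
    print(f"n = {n:3d}: |r| must be at least {critical_r(t_crit, df):.3f}")
```

Under these assumptions, the required correlation falls from roughly .63 with 10 cases to roughly .28 with 50 cases and roughly .20 with 100 cases, which is why a modest correlation can be significant in a large sample but not in a small one.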
For any sample correlation, there are two competing hypotheses about the population correlation:
Null Hypothesis: ρ = 0
Alternate Hypothesis: ρ ≠ 0
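One common way to test these hypotheses converts the sample correlation into a t statistic with n − 2 degrees of freedom. The formula below is the standard textbook form. The worked numbers (r = .50, n = 27) are hypothetical and were chosen only to keep the arithmetic simple.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2

% Hypothetical example: r = .50, n = 27
t = \frac{.50\sqrt{25}}{\sqrt{1-.25}} = \frac{2.50}{.866} \approx 2.89, \qquad df = 25
```

With df = 25, the two-tailed critical value of t at the .05 level is 2.060, so a t of about 2.89 would lead the researcher to reject the null hypothesis that ρ = 0.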
Several different methods can be used to determine the probability level associated with a particular correlation at a given sample size. The “old school” way to do it is via a table of critical values. Excel’s correlation functions (such as CORREL and PEARSON) return only r and do not provide a probability. The easiest way to test the hypothesis in Excel is to use the regression procedure and evaluate the t-test probability for the slope, which will be identical to the p-value for Pearson’s r. This works because both methods are special cases of the General Linear Model (GLM) and, with a single predictor, they test the same hypothesis.
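The equivalence between the correlation test and the regression slope test can be checked numerically. The following is a minimal sketch in plain Python, not a reproduction of Excel's procedure: the data are hypothetical, the function names are invented for this illustration, and the p-value is approximated by numerically integrating the t distribution (Simpson's rule) rather than by calling a statistics library.

```python
import math

# Hypothetical paired scores, invented only for this illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 1, 4, 2, 6, 5, 8, 6, 9, 7]


def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy), the sums of squared and cross deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def t_from_correlation(x, y):
    """t statistic for H0: rho = 0, computed from Pearson's r."""
    n = len(x)
    sxx, syy, sxy = sums_of_squares(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def t_from_regression_slope(x, y):
    """t statistic for H0: slope = 0 in the simple regression of y on x."""
    n = len(x)
    sxx, syy, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    sse = syy - slope * sxy  # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope


def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value from the t distribution, via Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def pdf(v):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(v * v / df))

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area_0_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_0_to_t)


df = len(x) - 2
t_r = t_from_correlation(x, y)
t_b = t_from_regression_slope(x, y)
print(f"t from r:     {t_r:.4f}, p = {two_tailed_p(t_r, df):.4f}")
print(f"t from slope: {t_b:.4f}, p = {two_tailed_p(t_b, df):.4f}")
```

The two printed lines should agree, because with one predictor the slope's t statistic reduces algebraically to the same expression in r and n. In practice, a statistics package would supply the p-value directly, and the hand-rolled integration here only keeps the sketch free of outside dependencies.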
Sample correlation coefficient, population correlation coefficient, General Linear Model (GLM)
Last Modified: 02/12/2019