When variables are correlated, we can use knowledge of one to predict the value of the other. That is, we can study the relationship between variables that are known, and use our knowledge of that relationship to predict future values when one of the variables is not known. For example, a university admissions office might use your SAT score to predict your success in college based on what they observed about the relationship of SAT scores to college success in past students. The statistical technique that allows us to derive an equation to make such predictions is known as regression.
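In linear regression, a straight line is mathematically fitted to the dots on a scatterplot (also called a scattergram). The line is placed so that it runs through the middle of the field of dots, much like the seam on a football. Technically, it is the line that minimizes the sum of the squared vertical distances between the dots and the line.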
Once the regression line has been fitted, its equation allows us to predict the value of an unknown score (Y) using information from a known variable (X). A straight line can be represented mathematically by the following formula: $\hat{Y} = a + bX$
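Here, “Y hat” stands for the value we want to predict, a is the intercept (the value of Y where the regression line crosses the vertical axis, that is, where X equals zero), and b is the slope of the line (how much the predicted Y changes for each one-unit increase in X, with the sign giving the direction of the line).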
Thus, to predict a score for Y given X, we first need to obtain values for a and b. With a single predictor, these can be worked out by hand, but in the case of multiple regression (more than one predictor), this involves some very complex math and is best accomplished using a computer.
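For the single-predictor case, the values of a and b can be computed directly. The following is a minimal sketch, assuming the standard least-squares formulas: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a is the mean of Y minus b times the mean of X. The function name fit_line and the scores are hypothetical, made up for illustration only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of X from its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical scores (not taken from any real class)
first_test = [60, 70, 80, 90]
final_exam = [65, 70, 85, 90]

a, b = fit_line(first_test, final_exam)

# Predict a final exam score for a student who scored 75 on the first test
predicted = a + b * 75
print(f"a = {a:.2f}, b = {b:.2f}, predicted = {predicted:.2f}")
```

With these made-up scores, the slope comes out to about 0.9 and the intercept to about 10, so each additional point on the first test is associated with roughly 0.9 additional points on the final exam.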
Let us say, for example, a statistics professor is curious as to whether scores on the first test are a valid predictor of scores on a final exam. He enters the scores into a spreadsheet and generates the scatterplot illustrated below. We can determine several things by examining this scatterplot. First, notice the line running through the dots. This is the regression line (also known as a trend line). The dots fall “pretty close” to the line, but they do not fall along it exactly. From this, we can tell that the correlation is a strong one, but it is not perfect.
Also, note that R-square is reported. This tells us that 83.88% of the variance in Y is explained by knowing X.
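To see where a figure like this comes from, R-square can be computed as 1 minus the ratio of the unexplained (residual) variation to the total variation in Y. The sketch below assumes a simple one-predictor least-squares fit. The function name r_squared and the scores are hypothetical, and they are not the data behind the 83.88% figure above.

```python
from statistics import mean

def r_squared(xs, ys):
    """Proportion of variance in Y explained by the least-squares line on X."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Variation left over after using the line to predict Y
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of Y around its own mean
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical scores (not the professor's data)
first_test = [60, 70, 80, 90]
final_exam = [65, 70, 85, 90]

r2 = r_squared(first_test, final_exam)
print(f"R-square = {r2:.4f}")
```

An R-square near 1 means the dots lie close to the line. A value near 0 means knowing X tells us little about Y.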
Last Modified: 07/02/2018