Advanced Statistical Analysis:
Adam J. McKee, Ph.D.
This content is released as a draft version for comment by the scholarly community. Please do not distribute as is.
Section 2.2: Multiple Regression
Multiple regression is a statistical method for studying the relationship between a single dependent variable (DV) and two or more independent variables (IVs). It is, hands down, the winner of the most-popular-statistic contest across the social science disciplines. It is a logical extension of simple linear regression and, like simple regression, has a regression equation that generates a regression line. The logic of adding those extra variables is simple: if you have more information, you are likely to make better predictions. Of course, this assumes that the additional information is relevant (that is, related to the DV).
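To make the "more information" argument concrete, here is a minimal pure-Python sketch. It fits the regression equation by ordinary least squares (the usual criterion: choose the intercept and slopes that minimize the sum of squared prediction errors, or SSE) and compares a fit that uses one predictor with a fit that uses two. The data values are made up for illustration, and the helper names `fit_ols` and `sse` are hypothetical. Solving the normal equations by Gaussian elimination is an illustrative choice; many statistical packages use more numerically stable approaches, such as a QR decomposition.

```python
# Minimal sketch: ordinary least squares via the normal equations,
# solved with Gaussian elimination (pure Python, no external libraries).

def fit_ols(columns, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared errors.

    columns: list of predictor columns, each a list of floats as long as y.
    """
    n = len(y)
    # Design-matrix rows: a leading 1 for the intercept, then each predictor value.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal-equation system: (X'X | X'y).
    aug = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
           + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = aug[r][p] - sum(aug[r][c] * coef[c] for c in range(r + 1, p))
        coef[r] = s / aug[r][r]
    return coef


def sse(columns, y, coef):
    """Sum of squared prediction errors for a fitted equation."""
    total = 0.0
    for i, yi in enumerate(y):
        pred = coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns))
        total += (yi - pred) ** 2
    return total


# Made-up illustrative values (not real data).
x1 = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
x2 = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]
y = [9.1, 9.8, 14.2, 13.9, 20.1, 21.0]

one = fit_ols([x1], y)        # [a, b1]
two = fit_ols([x1, x2], y)    # [a, b1, b2]
print("SSE with X1 only:  ", round(sse([x1], y, one), 3))
print("SSE with X1 and X2:", round(sse([x1, x2], y, two), 3))
```

Because the two-predictor fit could always reproduce the one-predictor fit by setting b2 to zero, its SSE on the fitting data can never be larger. That guarantee covers only the data used to build the equation; whether the extra predictor improves predictions for new cases depends on whether it is genuinely related to the DV.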
Let’s consider the regression equation when two predictor variables are used:
$$Y = a + b_1X_1 + b_2X_2$$
Here, the subscripts identify the predictor: $X_1$ and $b_1$ belong to the first predictor, and $X_2$ and $b_2$ belong to the second. Note that the equation is additive: it starts from the intercept and adds the contribution of each predictor (its value multiplied by its coefficient), combining the information from both variables into a single line. The line is pushed up or down according to the effect of each variable. Because it is a single line, there is only one intercept (a), regardless of the number of predictor variables.
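The additive structure can be sketched in a few lines of Python. The function below (a hypothetical helper named `predict`) takes one intercept and one slope per predictor, so adding predictors lengthens the list of slopes but never adds a second intercept. The coefficient values are arbitrary numbers chosen only for illustration.

```python
def predict(a, slopes, xs):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... for a single case.

    a: the single intercept; slopes: [b1, b2, ...]; xs: [X1, X2, ...].
    """
    if len(slopes) != len(xs):
        raise ValueError("need exactly one slope per predictor value")
    # Each predictor contributes its own term; the terms are simply added.
    return a + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients chosen only for illustration.
a, b1, b2 = 1.0, 0.5, 2.0
print(predict(a, [b1, b2], [4.0, 3.0]))  # 1.0 + 0.5*4.0 + 2.0*3.0 = 9.0
print(predict(a, [b1, b2], [4.0, 0.0]))  # the X2 term drops out: 3.0
```

Changing only X2 shifts the prediction by b2 times the change in X2, which is the sense in which each variable pushes the line up or down.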
Note that the various b symbols in the equation stand for slope coefficients, but are more commonly referred to as just “slopes” or “coefficients.”
Uses of Multiple Regression
While the uses of multiple regression differ in many technical ways, they are usually summarized as two major purposes:
- Prediction
- Causal Analysis
In a prediction study, the researcher’s goal is to develop a formula for making predictions about the DV based on observed values of the IVs. This means that the researcher must have complete data (values of both the DV and the IVs) to derive the equation, but can then use it going forward to predict an unknown Y when the X values are known. For example, a college admissions officer may want to predict college success (as measured by GPA at graduation) using standardized test information. The admissions officer would first have to gather complete data from prior years and develop the equation. Once the equation is developed, the officer could predict the graduation GPA of an incoming freshman.
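The admissions example can be sketched as a two-step workflow: fit the equation on prior-year graduates, then apply it to an incoming student. This minimal Python sketch assumes two predictors (hypothetical math and verbal section scores from a standardized test) and uses the closed-form least-squares solution for two predictors, computed from centered sums of squares and cross-products. All scores and GPAs below are made up for illustration, and `fit_two_predictors` is a hypothetical helper name.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for Y = a + b1*X1 + b2*X2 (closed form)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear; slopes are not unique")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2  # the single intercept
    return a, b1, b2


# Step 1: complete (made-up) records from prior years: both IVs and the DV are known.
math_score = [520, 610, 580, 700, 450, 650, 540, 720]
verbal_score = [560, 590, 640, 680, 500, 600, 610, 700]
grad_gpa = [2.8, 3.1, 3.2, 3.7, 2.5, 3.3, 3.0, 3.8]
a, b1, b2 = fit_two_predictors(math_score, verbal_score, grad_gpa)

# Step 2: an incoming freshman whose test scores are known but whose GPA is not.
new_math, new_verbal = 600, 620
predicted = a + b1 * new_math + b2 * new_verbal
print(f"a = {a:.3f}, b1 = {b1:.5f}, b2 = {b2:.5f}")
print(f"Predicted graduation GPA: {predicted:.2f}")
```

The closed form is specific to two predictors; with more predictors the same least-squares criterion is usually solved with matrix methods.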
In a causal analysis, the researcher is trying to make causal statements in the same way an experimentalist may use an ANOVA to say that a treatment had a specific, measurable effect. This really boils down to two questions:
- Does the effect really exist?
- If so, what is the magnitude of the effect?
These two questions can be complicated by the nature of the research questions and by the limitations of the research design. Always keep in mind that a causal statement is not justified simply because a particular statistical method was used. The veracity of causal statements depends on the research design, not on the statistical procedure used to analyze the data.
File Created: 08/24/2018 Last Modified: 08/27/2019
This work is licensed under an Open Educational Resource-Quality Master Source (OER-QMS) License.