# Path Analysis

**Path analysis** is an extension of multiple linear regression designed to make causal statements about the relationships among a set of variables. A major advantage of path analysis is that it uses **path diagrams** to depict the relationships the researcher proposes. In many instances, this makes the researcher’s hypothesis much easier to understand. A path diagram is constructed by first writing down the variable names and then drawing arrows from the hypothesized causes to the hypothesized effects. The simplest path diagram is a single arrow from one independent variable to one dependent variable: **X –> Y**. Of course, few social scientific theories predict such simple relationships among so few variables. These models can become very complex, and they are ultimately limited by the mathematics behind the method.

The regression methodology comes into play when we add **path coefficients** to the path diagram. These take the arrows proposed by the model into account and test them using what is essentially multiple regression analysis. *A path coefficient is a standardized regression coefficient (beta weight)*. Because it is expressed in standard-deviation units, it behaves much like a simple correlation coefficient and can be interpreted in much the same way. Recall that correlation coefficients can be positive or negative. The same is true of path coefficients, and the sign is interpreted the same way. When the researcher hypothesizes that two variables are correlated but is not willing to predict which one causes the other, a double-headed arrow can be used: **X <–> Y**.
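To make the link between path coefficients and regression concrete, here is a minimal Python sketch that computes beta weights by z-scoring every variable and then running ordinary least squares. It assumes NumPy is available, and the data are simulated and purely hypothetical. With a single predictor, the resulting coefficient equals the Pearson correlation, which is one reason path coefficients read so much like correlations.

```python
import numpy as np


def standardize(a: np.ndarray) -> np.ndarray:
    """Return z-scores (mean 0, population SD 1), column by column."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def path_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Standardized regression coefficients (beta weights) of y on the columns of X."""
    Xz = standardize(X)
    yz = standardize(y)
    # Every z-scored variable has mean 0, so the model needs no intercept.
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef


# Hypothetical simulated data: two causes of one outcome.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

betas = path_coefficients(np.column_stack([x1, x2]), y)
print(betas.round(3))  # one positive and one negative path coefficient

# With a single predictor, the path coefficient equals Pearson's r.
single = path_coefficients(x1.reshape(-1, 1), y)[0]
print(round(single, 3), round(np.corrcoef(x1, y)[0, 1], 3))
```

Standardizing first is the design choice that turns ordinary regression weights into path coefficients: every variable ends up on the same standard-deviation scale, so coefficients for different arrows can be compared directly.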

When examining a path diagram, you will sometimes encounter arrows that have no coefficient next to them. Researchers often omit the coefficients for paths that fall below a minimum magnitude they have chosen, or for paths whose relationships are not statistically significant.

All of this talk of causes and effects in path analysis can be confusing. As we stated in an earlier section, correlations do a poor job of supporting cause-and-effect statements, especially when compared with true experiments. Those caveats still apply to path analysis: correlational studies are still correlational studies, no matter how sophisticated the statistics used to analyze the data. That is not to say that causal-comparative (correlational) studies have no value, but their results should be interpreted with extreme caution when evaluating cause-and-effect statements. Keep in mind that the researcher’s decisions about which way the arrows point are based on theory (hopefully!), and that the reverse relationship may, in fact, be the correct specification. Also, some variable not included in the analysis may cause both variables, making them only appear to be causally related.
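The omitted-variable problem can be illustrated with a small simulation. In this hypothetical sketch (NumPy assumed, all numbers invented), an unmeasured variable Z causes both X and Y, while X has no effect on Y at all. Leaving Z out of the model still produces a sizable X –> Y path coefficient. Adding Z shrinks that coefficient to roughly zero.

```python
import numpy as np


def standardized_betas(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Beta weights of y on the columns of X (all variables z-scored)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 100_000
z = rng.normal(size=n)            # hypothetical common cause, "not in the model"
x = 0.7 * z + rng.normal(size=n)  # X depends only on Z
y = 0.7 * z + rng.normal(size=n)  # Y depends only on Z; X has no effect on Y

beta_x_alone = standardized_betas(x.reshape(-1, 1), y)[0]          # Z omitted
beta_x_with_z = standardized_betas(np.column_stack([x, z]), y)[0]  # Z included
print(f"X -> Y without Z: {beta_x_alone:.3f}, with Z: {beta_x_with_z:.3f}")
```

In real data the confounder is usually unknown or unmeasured, so there is no second model to compare against. That is why the theory behind the arrows matters so much.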

A major advantage of path analysis over other techniques is its ability to model *direct* and *indirect effects* clearly. A **direct effect** is specified when one variable is hypothesized to cause another with no intervening variable. **Indirect effects** are effects that are mediated by an intermediary variable. Let’s say, for example, that a researcher is interested in job satisfaction among police officers. She notices that older officers tend to have higher job satisfaction than younger officers. She could specify this relationship as follows: **Age –> Job Satisfaction**. She further theorizes that veteran officers tend to have higher salaries than “rookies” do, and that this income disparity could explain the differences in job satisfaction. She could model this relationship as follows: **Age –> Income –> Job Satisfaction**.
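The following sketch estimates direct, indirect, and total effects for the police example using simulated data. The data-generating numbers, the variable names, and the choice to keep a direct Age –> Job Satisfaction arrow alongside the path through Income are all assumptions made for illustration. The indirect effect is computed as the product of the two path coefficients along Age –> Income –> Job Satisfaction, a standard way of quantifying mediation. NumPy is assumed.

```python
import numpy as np


def zscore(a: np.ndarray) -> np.ndarray:
    return (a - a.mean()) / a.std()


def betas(y: np.ndarray, *predictors: np.ndarray) -> np.ndarray:
    """Standardized coefficients from regressing y on the given predictors."""
    X = np.column_stack([zscore(p) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)
    return coef


# Hypothetical data-generating process (all numbers invented for illustration).
rng = np.random.default_rng(1)
n = 5_000
age = rng.normal(40, 10, size=n)
income = 1_000 * age + rng.normal(0, 15_000, size=n)
satisfaction = 0.00005 * income + 0.02 * age + rng.normal(0, 1, size=n)

a = betas(income, age)[0]                       # Age -> Income
c_direct, b = betas(satisfaction, age, income)  # Age -> Job Sat. (direct), Income -> Job Sat.
indirect = a * b                                # Age -> Income -> Job Satisfaction
total = betas(satisfaction, age)[0]             # Age -> Job Satisfaction, Income ignored

print(f"direct={c_direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With ordinary least squares and a single mediator, the total effect (the coefficient from regressing Job Satisfaction on Age alone) equals the direct effect plus the indirect effect exactly. The three printed numbers therefore always add up.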

The path coefficients are derived from a set of **structural equations**. Whereas multiple regression analysis typically results in a single equation relating a set of predictor variables to an outcome variable **Y**, path analysis usually involves more than one structural equation, each describing a specific part of a complex set of hypothesized relationships. Together, the structural equations provide a coefficient for each arrow in the path diagram.
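For the police example, assume the model keeps a direct Age –> Job Satisfaction arrow in addition to the path through Income. Its structural equations might then be written as follows, with all variables standardized, $p$ denoting path coefficients, and $e$ denoting error terms. This notation is ours for illustration, not a fixed convention.

$$
\begin{aligned}
\text{Income} &= p_{IA}\,\text{Age} + e_I \\
\text{Job Satisfaction} &= p_{JA}\,\text{Age} + p_{JI}\,\text{Income} + e_J
\end{aligned}
$$

Each arrow contributes exactly one coefficient. Age gets no equation of its own because no arrow points to it. The direct effect of Age on Job Satisfaction is $p_{JA}$, and the indirect effect through Income is the product $p_{IA}\,p_{JI}$.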

When creating or interpreting path diagrams, it is useful to distinguish between *exogenous variables* and *endogenous variables*. An **exogenous variable** is “caused” by other variables not considered in the researcher’s model. We can quickly identify such variables in a path diagram because no arrows point to them; all of the arrows point away from exogenous variables. For an **endogenous variable**, it is hypothesized that its variance is at least partially explained by one or more other variables in the researcher’s model. We can identify such variables because arrows from other variables point to them. Note that in path analysis the paths leading to an endogenous variable must be one-directional. That is, you cannot specify causal loops (for example, X –> Y together with Y –> X) using this analytical method.
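Because exogenous and endogenous variables are defined entirely by the direction of the arrows, they can be identified mechanically. Below is a hypothetical helper, `classify`, written with the Python standard library only (Python 3.9+ for `graphlib`). It takes a list of single-headed (cause, effect) arrows, sorts the variables into the two groups, and raises `graphlib.CycleError` if the arrows form a causal loop. Double-headed correlation arrows are deliberately left out of this sketch.

```python
from graphlib import CycleError, TopologicalSorter


def classify(arrows):
    """Split variables into exogenous and endogenous given (cause, effect) arrows.

    Raises CycleError if the arrows form a causal loop, which path analysis
    (as described here) does not allow.
    """
    variables = {v for arrow in arrows for v in arrow}
    causes_of = {v: set() for v in variables}
    for cause, effect in arrows:
        causes_of[effect].add(cause)
    # Iterating static_order() raises CycleError if the graph contains a loop.
    list(TopologicalSorter(causes_of).static_order())
    exogenous = sorted(v for v in variables if not causes_of[v])
    endogenous = sorted(v for v in variables if causes_of[v])
    return exogenous, endogenous


model = [("Age", "Income"), ("Income", "Job Satisfaction"), ("Age", "Job Satisfaction")]
print(classify(model))  # → (['Age'], ['Income', 'Job Satisfaction'])
```

Passing a loop such as `[("X", "Y"), ("Y", "X")]` raises `CycleError`, mirroring the rule that causal loops cannot be specified with this method.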

When constructing structural equations, keep in mind that the predictions they make are seldom perfect. Given the complex nature of human behavior, it is no wonder that a small system of variables cannot perfectly predict an outcome variable; other, unconsidered factors are in play. In the language of regression analysis, these “unknowns” are referred to as “error.” The term does not connote a mistake on the part of the researcher and is not pejorative in this context. It may be helpful to informally define *error* as “stuff not in the model.” Many path diagrams explicitly model the **error term**, typically as an additional arrow pointing to each endogenous variable.
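In a standardized model, the variance of an endogenous variable splits cleanly into the part explained by its causes (R²) and the part left to the error term (1 - R²). The sketch below shows this for a single hypothetical Age –> Income path, assuming NumPy and simulated data. Some authors label the arrow from the error term with the square root of 1 - R². Treat that as one common convention rather than a universal rule.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
age = rng.normal(size=n)
income = 0.6 * age + rng.normal(scale=0.8, size=n)  # hypothetical data

# z-score with the population SD so the coefficient is a path coefficient
age_z = (age - age.mean()) / age.std()
income_z = (income - income.mean()) / income.std()

X = age_z.reshape(-1, 1)
coef, *_ = np.linalg.lstsq(X, income_z, rcond=None)
predicted = X @ coef
error = income_z - predicted          # "stuff not in the model"

r_squared = predicted.var()           # share of variance explained (total variance is 1)
error_variance = error.var()          # share left to the error term
error_path = np.sqrt(error_variance)  # one common way to label the error arrow

print(f"path={coef[0]:.3f} R2={r_squared:.3f} error variance={error_variance:.3f}")
```

Because the residuals from least squares are uncorrelated with the predictor, the explained and error variances add up to exactly 1 in the standardized model.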

Last Modified: 10/10/2018