When testing a causal model, the researcher’s objective is to ascertain whether the hypothesized model is consistent with the pattern of relationships among the variables under consideration. Mathematically, this can be determined by estimating the “**goodness of fit**.” In this process, estimated parameters and constraints are used to calculate a *predicted* matrix of association. This is analogous to regression, where we subtract a predicted value from an observed value to obtain a residual. The principle is simple: the smaller the residuals, the better the fit; the larger the residuals, the poorer the fit. We can perform a statistical significance test of the fit using the chi-square distribution.
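The idea can be sketched numerically. The following is a minimal illustration, not any package’s implementation: it compares a hypothetical observed covariance matrix `S` with a hypothetical model-implied matrix `Sigma` using the maximum-likelihood discrepancy function, and converts the discrepancy into a model chi-square. The matrices, sample size, and degrees of freedom are all made-up values for illustration.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed covariance matrix S and model-implied matrix Sigma
S = np.array([[1.00, 0.40],
              [0.40, 1.00]])
Sigma = np.array([[1.00, 0.35],
                  [0.35, 1.00]])

p = S.shape[0]
# Maximum-likelihood discrepancy function: zero when Sigma reproduces S exactly
F = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
     + np.trace(S @ np.linalg.inv(Sigma)) - p)

N = 300          # sample size (hypothetical)
df = 1           # model degrees of freedom (hypothetical)
T = (N - 1) * F  # model chi-square statistic
p_value = chi2.sf(T, df)
print(f"chi-square = {T:.3f}, p = {p_value:.3f}")
```

A small residual between `S` and `Sigma` yields a small chi-square and a non-significant *p*-value, i.e., no evidence of misfit.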

In SEM, chi-square is really a “badness of fit” measure. That is, the null hypothesis is that the model does not differ significantly from the data. In short, *we do not want to reject the null hypothesis*. Chi-square is almost always reported in SEM studies but is rarely considered alone as a measure of fit. Chi-square is strongly influenced by sample size: with samples large enough to be adequate for SEM, the chi-square test will be significant most of the time. Chi-square is also very sensitive to departures from multivariate normality. This is why researchers using SEM have developed several **fit indices** designed to correct the shortcomings of the chi-square statistic.
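The sample-size sensitivity is easy to demonstrate. In this sketch (the discrepancy value and degrees of freedom are hypothetical), the model’s misfit to the data is held constant while only *N* grows; the same model goes from “fits” to “rejected” purely because of sample size.

```python
from scipy.stats import chi2

F, df = 0.02, 10  # fixed discrepancy and degrees of freedom (hypothetical)
for N in (100, 500, 2000):
    T = (N - 1) * F          # chi-square scales with sample size
    p = chi2.sf(T, df)
    print(f"N = {N:>4}: chi-square = {T:6.2f}, p = {p:.4f}")
```

At N = 100 the test is non-significant; at N = 2000 the identical discrepancy is highly significant.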

The **Goodness of Fit Index (GFI)** is roughly analogous to R² in multiple regression analysis. The **Adjusted Goodness of Fit Index (AGFI)** is calculated and interpreted similarly to the GFI, but applies a penalty for model complexity. This is to preserve the importance of **parsimony**. Both indices range from zero to one, with values closer to one indicating better fit. A common rule of thumb is that values of .92 to .95 suggest good fit. Both indices are sensitive to sample size and should be interpreted with caution.
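The complexity penalty can be sketched with the commonly cited adjustment formula, AGFI = 1 − [p(p+1)/(2·df)]·(1 − GFI), where *p* is the number of observed variables and *df* the model degrees of freedom. The example numbers below are hypothetical.

```python
def agfi(gfi, p, df):
    """Adjust GFI downward for model complexity (fewer df = bigger penalty)."""
    k = p * (p + 1) / 2          # number of distinct variances and covariances
    return 1 - (k / df) * (1 - gfi)

# Hypothetical example: 6 observed variables, GFI = .95
print(agfi(0.95, p=6, df=8))    # complex model (few df): larger penalty
print(agfi(0.95, p=6, df=15))   # more parsimonious model: smaller penalty
```

Holding GFI constant, the model that retains more degrees of freedom (fewer estimated parameters) receives the higher AGFI, which is exactly the parsimony reward described above.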

The **Parsimony Goodness of Fit Index (PGFI)** seeks to balance parsimony and goodness of fit in one index. This was the first of many parsimony-based fit indices to be developed. As a rule of thumb, a PGFI > .5 suggests good fit.
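One widely cited formulation weights the GFI by the ratio of the model’s degrees of freedom to the total number of distinct data points, PGFI = [df / (p(p+1)/2)] · GFI. A minimal sketch with hypothetical numbers:

```python
def pgfi(gfi, p, df):
    """Weight GFI by the proportion of degrees of freedom the model retains."""
    k = p * (p + 1) / 2          # distinct variances and covariances
    return (df / k) * gfi

# Hypothetical example: 6 observed variables, GFI = .95, df = 8
print(pgfi(0.95, p=6, df=8))
```

Note that the PGFI is typically much lower than the GFI itself, which is why its rule-of-thumb threshold (.5) is far below those of the GFI and AGFI.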

Often, the researcher can compare two different models and determine which fits better, such as when there are two related but different theories to explain a phenomenon. There are actually several **comparative fit indices** in this family. These indices compare the hypothesized model to a baseline model, rather than to the absence of a model as the GFI and AGFI do. The **Normed Fit Index (NFI)** tends to underestimate fit in small samples. The **Comparative Fit Index (CFI)** is a modification of the NFI designed to correct this underestimation. Both the NFI and CFI range from zero to one. A rule of thumb is that values of NFI/CFI > .95 suggest a good fit.
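Both indices can be computed from the chi-square values of the hypothesized model and the baseline (null) model. The sketch below uses the standard formulas with hypothetical chi-square values and degrees of freedom.

```python
def nfi(chi2_model, chi2_null):
    """Normed Fit Index: proportional reduction in chi-square vs. the baseline."""
    return (chi2_null - chi2_model) / chi2_null

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index: like NFI, but based on noncentrality (chi2 - df)."""
    d_model = max(chi2_model - df_model, 0)
    d_null = max(chi2_null - df_null, 0)
    return 1 - d_model / d_null

# Hypothetical values: model chi2 = 12 (df = 8), baseline chi2 = 400 (df = 15)
print(f"NFI = {nfi(12, 400):.3f}")
print(f"CFI = {cfi(12, 8, 400, 15):.3f}")
```

Because the CFI subtracts each model’s degrees of freedom before comparing, it is less biased downward in small samples than the NFI.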

In AMOS, fit statistics are presented in three rows:

- First row: the hypothesized model
- Second row: the saturated model
- Third row: the independent model

The **independent model** is a model in which all correlations among variables are zero (they are independent in the sense of being orthogonal). This is the most restricted model. In a **saturated model**, the number of estimated parameters is equal to the number of data points. That is, the model is just identified. This is the least restricted model.
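The identification status follows from a simple count: the data points are the distinct variances and covariances, p(p+1)/2, and the degrees of freedom are data points minus estimated parameters. A small sketch (the parameter counts are hypothetical):

```python
def model_df(p, q):
    """Degrees of freedom: distinct data points minus free parameters q."""
    data_points = p * (p + 1) // 2   # variances plus covariances
    return data_points - q

# With 4 observed variables there are 10 distinct data points
print(model_df(4, 10))   # saturated (just-identified): df = 0
print(model_df(4, 8))    # over-identified: df = 2, fit can be tested
```

A saturated model always reproduces the data perfectly (df = 0), which is why it anchors one end of the fit comparison in AMOS output, with the independent model anchoring the other.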

Plausible alternative models (based on alternative theoretical considerations) should be tested whenever possible. Do not interpret a finding that a model fits the data as meaning that it is the only model that can do so. Alternative models represent competing hypotheses that must be ruled out. Researchers sometimes respecify their models, for two primary reasons: to increase parsimony and to improve model fit. Note that model revision is a *post hoc* process, and is thus *exploratory*. The results of such modified models offer only weak support for theory. The theoretical implications of such analyses should be tested using confirmatory techniques on *new* data.

Last Modified: 02/14/2019