Correlation and Causation

Fundamentals of Social Statistics by Adam J. McKee

Correlation is a statistical measure that expresses the extent to which two variables are linearly related. It’s a way of summarizing how closely the variables follow a straight line when plotted against each other. However, correlation itself doesn’t imply that changes in one variable cause changes in another.  Let’s review some of the basic features of correlations:

  • Correlation Coefficient: This is a numerical value between -1 and +1 that indicates the strength and direction of the linear relationship between two variables (a short computational sketch follows this list).
  • Positive vs Negative Correlation: Positive correlation means that as one variable increases, the other tends to increase too. Negative correlation is the opposite.
  • No Correlation: Sometimes, no discernible pattern exists between two variables, indicating no correlation.
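
To make the correlation coefficient concrete, here is a minimal sketch in Python (the hours_studied and exam_score data are hypothetical, not drawn from the text) that computes Pearson's r with NumPy:

```python
import numpy as np

# Hypothetical data: hours studied and exam scores for six students.
hours_studied = np.array([2, 4, 5, 7, 8, 10])
exam_score = np.array([65, 70, 74, 80, 83, 90])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables.
r = np.corrcoef(hours_studied, exam_score)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.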

Misconceptions About Correlation

A common misunderstanding is that correlation indicates a cause-and-effect relationship between two variables. However, correlation alone cannot prove that one variable causes another to change.

  • Correlation Does Not Imply Causation: Just because two variables move together does not mean one is causing the other to move.
  • Coincidental Correlation: Sometimes, correlations exist purely by chance or due to a third factor affecting both variables.
  • Spurious Correlations: These are correlations that are mathematically valid but nonsensical in the real world, with no plausible causal connection between the variables (a simulated illustration follows this list).
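
The sketch below uses simulated, made-up series (cheese_consumption and phd_awards are arbitrary labels, not real data) to show how a coincidental correlation can arise: two quantities generated independently, but both trending upward over time, end up strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)

# Two made-up quantities generated independently; each simply drifts
# upward over time with a little random noise.
cheese_consumption = 30 + 0.5 * (years - 2000) + rng.normal(0, 0.3, years.size)
phd_awards = 800 + 20 * (years - 2000) + rng.normal(0, 10, years.size)

# The shared upward trend produces a very strong correlation even
# though neither series has anything to do with the other.
r = np.corrcoef(cheese_consumption, phd_awards)[0, 1]
print(f"r = {r:.2f}")  # typically well above 0.9
```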

Establishing Causality

To claim that one variable actually causes another, rigorous testing and evidence are required beyond merely demonstrating a correlation.

Criteria for Causality

Several criteria must be met to establish a causal relationship between variables, and these go well beyond finding a correlation.

Temporal Precedence

In the realm of establishing causation, temporal precedence is a critical factor. This principle dictates that the cause must occur before the effect. It is the chronological ordering of events that provides a preliminary indication of a causal relationship. For instance, a student’s amount of study time must precede their performance on an exam to argue that studying impacts performance. If the temporal order is unclear or if the effect precedes the cause, then any assertion of causation is undermined. Researchers often use longitudinal studies, tracking the same variables over time, to establish temporal precedence and strengthen their argument for a causal link.

Non-Spuriousness

The criterion of non-spuriousness demands that a relationship between two variables should not be due to a third variable that is not being measured. This is what statisticians refer to as a confounding variable. For example, if there is a correlation between ice cream sales and shark attacks, it is insufficient to claim one causes the other without considering temperature as a confounding variable, since both ice cream sales and shark attacks may increase during warmer periods. Establishing non-spuriousness involves using statistical controls to account for potential confounding variables, ensuring that the observed relationship is genuine and not an artifact of an unexamined influence.
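
As a rough illustration of this idea, the following simulation (hypothetical numbers, not from the text) lets temperature drive both ice cream sales and shark attacks, then shows that the association between the two largely disappears once temperature is statistically removed from each variable:

```python
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.uniform(10, 35, 500)                         # confounding variable
ice_cream = 5 * temperature + rng.normal(0, 20, 500)           # driven by temperature
shark_attacks = 0.3 * temperature + rng.normal(0, 2, 500)      # also driven by temperature

# Raw correlation is substantial, purely because of the shared cause.
print(np.corrcoef(ice_cream, shark_attacks)[0, 1])

def residuals(y, x):
    # Remove the linear influence of x from y (a simple statistical control).
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlating the residuals "controls" for temperature; the spurious
# association largely disappears.
print(np.corrcoef(residuals(ice_cream, temperature),
                  residuals(shark_attacks, temperature))[0, 1])
```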

Consistency

Consistency in causal relationships implies that the observed effect should consistently follow from the cause across various studies and contexts. If the relationship between two variables is truly causal, then the same relationship should be observable in different environments, cultures, and populations, and over time. This repeatability is what gives weight to causal claims. For instance, the link between cigarette smoking and lung cancer has been consistently observed across numerous studies and populations worldwide. Such consistency supports the argument for a causal connection, as it reduces the likelihood that the relationship is due to random chance or specific to a particular study’s circumstances.

Methods to Test Causality

Researchers use various methods to test whether a causal relationship exists, not just a correlation.

Experimental Designs

Controlled experiments are often considered the gold standard for establishing causation. In these experimental designs, researchers manipulate one variable (the independent variable) to determine its effect on another variable (the dependent variable), while controlling for other potential influences. This manipulation, combined with random assignment, allows researchers to draw cause-and-effect conclusions. For instance, in a drug efficacy trial, the drug is the independent variable, and the health outcome is the dependent variable. Participants are randomly assigned to either a treatment or a control group, so that systematic differences observed between the groups can be attributed to the drug rather than to other factors. The strength of experimental designs lies in their ability to isolate variables and demonstrate direct causation. However, they may not always be ethical or practical, especially in fields like the social sciences, where manipulating variables directly can be challenging or impossible.
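
The toy simulation below (entirely hypothetical; the sample size, effect size, and variable names are assumptions) sketches the logic of random assignment: because participants are assigned to treatment or control at random, the difference in mean outcomes between the groups estimates the causal effect of the treatment.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Random assignment: half the participants receive the treatment,
# with group membership unrelated to their pre-existing characteristics.
treated = rng.permutation(np.repeat([True, False], n // 2))

baseline_health = rng.normal(50, 10, n)    # varies from person to person
true_effect = 5                            # assumed effect of the drug
outcome = baseline_health + true_effect * treated + rng.normal(0, 5, n)

estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"Estimated treatment effect: {estimate:.1f}")  # approximately 5
```

Because assignment is random, pre-existing differences among participants are spread roughly evenly across the two groups, which is what licenses the causal interpretation of the difference in means.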

Longitudinal Studies

Longitudinal studies are observational in nature and involve collecting data from the same subjects repeatedly over an extended period. These studies are particularly adept at establishing temporal precedence, as they can clearly show the order of events. By observing how variables change over time, researchers can infer potential causal relationships. For example, a longitudinal study could track a group of individuals’ smoking habits and their lung function over several decades to examine the impact of smoking on lung health. While longitudinal studies are powerful in showing correlations and temporal sequences, they do not by themselves establish causation, because other variables may also change over time and influence the outcome.

Statistical Controls

Statistical controls are a suite of advanced techniques used to parse out the effects of different variables. Techniques such as regression analysis can adjust for the influence of confounding variables, helping to isolate the relationship between the primary independent and dependent variables. For instance, if researchers want to understand the relationship between education level and income, they can use statistical controls to account for confounding factors like work experience, age, or geographic location. While statistical controls are instrumental in clarifying relationships and enhancing the validity of causal claims, they rely on the accurate measurement and identification of all potential confounders, which is not always possible. Furthermore, these techniques are correlational, and their ability to infer causation is often limited compared to experimental designs.
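
The following sketch (with simulated, hypothetical data) shows one simple form of statistical control: a multiple regression of income on education that also includes work experience in the model, so that the education coefficient is adjusted for that confounder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Toy data: experience is related to both education and income,
# creating confounding between education and income.
experience = rng.uniform(0, 30, n)
education = 12 + 0.1 * experience + rng.normal(0, 2, n)
income = 20 + 2 * education + 0.8 * experience + rng.normal(0, 5, n)

# Design matrix with an intercept, education, and the control variable.
X = np.column_stack([np.ones(n), education, experience])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)

print(f"Adjusted education coefficient: {coef[1]:.2f}")  # near the true value of 2
```

In practice, researchers would typically use a dedicated regression package and include many more controls; the point here is only that adding the confounder to the model adjusts the estimate for the variable of interest.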

Conclusion

While correlation is a valuable statistical tool, it is only one piece of the puzzle in understanding relationships between variables. Causality requires more evidence and careful consideration to ensure that we are not misled by coincidental or misleading correlations.

