Section 1.2: Making Sense of Numbers


Fundamentals of Social Research

Adam J. McKee, Ph.D.


DRAFT - Do Not Distribute

This content is released as a draft version for comment by the scholarly community.  Please do not distribute.


Recall from our discussion of the scientific method that a major element is the search for relationships between causes and effects.  Often, the effect the researcher is interested in is some sort of social problem that would ideally be eliminated. In other words, social scientific researchers would like to eliminate unhealthy, harmful, or dangerous behaviors.  Elimination can be viewed as the ultimate form of control—one of the basic goals of science that we identified earlier. Before a behavior can be eliminated, it must be explained. That is, we cannot eliminate a behavior if we do not know what causes it.

For example, if we identify Driving Under the Influence (DUI) as an immediate cause of many highway fatalities, we can attempt to control highway fatalities by designing interventions that eliminate the cause of those fatalities: Driving Under the Influence.  Note that under our overly simple model, if we could eliminate all drunk driving, it would eliminate all highway fatalities. Our very basic model specifies that drunk driving causes highway fatalities.

We failed to factor in other causes of highway fatalities, such as texting while driving (which some data suggest can be just as dangerous).  Thus, interventions based on a theoretical model of reality are imperfect because models are by nature imperfect representations of reality.  Also, note that many models are based on what can be practically manipulated. Some highway fatalities are caused by high winds and rain, but we can do little about those as a matter of public policy.

The picture painted of social science researchers so far has been one of prudence and pure practicality.  We have examined social scientific research from the perspective of designing an intervention to make the world a better place.  Researchers are human, however, and humans are inherently curious creatures (some of us more so than others). Many research projects are undertaken because the researcher simply cannot stand not understanding why a social phenomenon takes place.  A political scientist, for example, may be interested in how voters decide on a particular candidate for public office. With such a research question, merely explaining the phenomenon is the goal of the researcher.

Control, in this case, would be unethical.  No matter whether the researcher is interested in the explanation, prediction, or control of a particular phenomenon, there is nearly always a desire to know what causes the phenomenon.  This, in essence, is the why question that drives all science.  The simplest way to determine if one theoretical variable causes another (e.g., poverty causes crime) is to look for a systematic relationship between the two variables.  If wealthy individuals are found to commit just as many crimes as poor ones, then the hypothesized relationship between poverty and crime does not seem to exist.  Note that the idea of causation is hotly debated in science; it truly is a complex concept worthy of further consideration. We will do so later, but for now, we will oversimplify things so as not to muddy the waters.   

The simplest way to look for relationships between variables is to make observations of the two variables as they exist in the real world and look to see if the variables change together in a predictable pattern.  For example, we may theorize that the higher a person’s IQ, the higher their college grade point average will be. To test this hypothesis, we can examine IQ scores and GPAs for a sample of college students and see if GPAs go up as IQ scores go up.  Note that if such a relationship exists, the statement works in the opposite direction just as well: As IQ scores go down, GPAs will go down.

Such a systematic relationship is often referred to as a correlation.  In everyday language, the term correlation is used to mean that two or more things are related or interconnected in some way.  Social scientists (as you probably guessed) are uncomfortable with that lack of precision. When a social scientist talks about a correlation, they are talking about patterns in measurements of the variables they are interested in.  The correlational method involves observing two variables to see whether there is a relationship between them.  This is important to the social scientific endeavor because correlation is necessary (but not sufficient) to infer causation.  The opposite is also true: We can infer that no causal relationship exists if the proposed effect has no systematic change when the proposed cause changes.

Researchers will often use statistical methods to determine if such a relationship exists between two variables.  Depending on the type of information being related, researchers will choose an appropriate statistical method that results in a correlation coefficient.  A correlation coefficient is a number ranging from -1.0 to +1.0 that specifies the degree of relatedness between two variables.


In interpreting correlation coefficients, we examine how close the coefficient's absolute value is to 1.0.  A correlation that is very close to zero means that there is very little or no relationship between the two variables.  This would be the likely outcome of computing the correlation between college GPAs and the price of tea in China. Looking at it another way, knowing the value of one variable does nothing to help us figure out the value of the other.  The closer the correlation coefficient is to -1.0 or +1.0, the more strongly related the variables are. A correlation of exactly -1.0 or +1.0 is known as a perfect correlation.  With a perfect correlation, we can predict the exact value of one variable by knowing the other.  In the physical sciences, very strong correlations are common. In the social sciences, perfect correlations are extremely rare.  This is because most social variables are affected by many other variables, not a single one.

When a correlation coefficient is negative, it means that there is a relationship between the two variables, but an increase in one results in a decrease in the other.  For example, we would expect a negative correlation between hot chocolate sales and outside temperature. As the outside temperature goes up, the sales of hot chocolate will go down.  This is a negative correlation, also known as an inverse correlation.  A positive correlation means that the two variables go up or down together.  For example, we would expect a positive correlation between ice cream sales and outside temperature.  As the weather gets hotter, ice cream sales will go up. Positive correlations are also referred to as direct correlations.
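The direct and inverse correlations described above can be sketched in a few lines of code. The temperature and sales figures below are invented purely for illustration; the function implements the standard Pearson formula (covariance divided by the product of the standard deviations), which we discuss further in the next section.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented outside temperatures (degrees F) for five days
temps = [40, 55, 70, 85, 95]
# Invented sales figures for those same days
ice_cream = [12, 30, 55, 80, 96]   # rises with temperature
hot_choc  = [90, 70, 44, 20, 10]   # falls with temperature

print(round(pearson_r(temps, ice_cream), 2))  # close to +1: direct correlation
print(round(pearson_r(temps, hot_choc), 2))   # close to -1: inverse correlation
```

Notice that both coefficients are close to 1.0 in absolute value; only the sign differs, telling us whether the variables move together or in opposite directions.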

Path Diagrams

We can graphically depict relationships between variables.  This is very useful in helping us understand complex relationships between variables.  There are also advanced statistical techniques that utilize path diagrams, so a basic understanding of how they work will help you to understand the literature.  

Ice Cream Sales ←→ Outside Temperatures

The above diagram uses a double-headed arrow to indicate a relationship.  Traditionally, the double-headed arrow says that two variables are related but we are not specifying which one is thought to cause the other.

If we hypothesize that one is the cause of the other, then we use a single-headed arrow to point from the cause to the effect.

Outside Temperatures → Ice Cream Sales

It is common to see a positive (+) or a negative (-) sign over the arrows in a path diagram.  This tells us whether the correlation is positive or negative.

Another Way of Looking at Relationships

Correlations have the advantage of being easy to interpret.  A researcher named Karl Pearson developed the most common correlation coefficient, so we call it Pearson’s r.  All of what was said about correlation coefficients above holds true for Pearson’s r.  A problem with Pearson’s r is that it does not work well with binary data—that is, variables with only two possible values: either you have the characteristic, or you do not.  Pregnancy is a good example; you are either pregnant or you are not pregnant. The concept of being “a little pregnant” makes no sense.  Gender is another common example. Most people identify with a particular gender, usually called male or female.  An important binary variable in the sciences is what is referred to as the grouping variable.

The groups being referred to are the experimental group and the control group.  In a scientific experiment, the experimental group gets a treatment.  Scientists would say that a treatment is a manipulation of some variable by the researcher.  The control group gets a different treatment (often no treatment at all).  The control group is often associated with the “placebo” group in a clinical trial.  When there are groups and manipulation of variables like this, we refer to the research as experimental research.  (There are other necessary requirements for a study to be a “true” experiment; we will delve into those in a later section.)

For example, if a researcher is testing a new drug, the experimental group will receive the actual drug, and the control group will receive a “sugar pill.”  Since getting the pill or not getting the pill is binary (only two possible choices), Pearson’s r will not work.  Another way to get at the idea of the new drug being related to the medical outcome (let us say lower blood pressure) is to look at the average blood pressure for both groups and compare them.  If the new drug really does lower blood pressure, then we expect the experimental group (the group that got the drug) to have a lower average blood pressure than the average for the group that got the placebo (sugar pill).  On the other hand, if the drug did not work, then we would expect to find no differences between the averages of the two groups.
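The logic of comparing group averages can be sketched as follows. The blood pressure readings below are invented for illustration; a real study would also ask whether the observed difference is larger than chance alone would produce, which is the job of the statistical tests discussed later.

```python
# Invented systolic blood pressure readings for two groups
experimental = [118, 122, 115, 120, 119]  # received the new drug
control      = [132, 128, 135, 130, 129]  # received the placebo

def mean(values):
    """The arithmetic mean: the sum divided by the count."""
    return sum(values) / len(values)

# A negative difference means the drug group's average is lower
mean_diff = mean(experimental) - mean(control)
print(mean(experimental), mean(control), mean_diff)
```

If the two means were nearly identical, we would conclude that the treatment appears unrelated to the outcome, just as a near-zero correlation coefficient would suggest.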

Note that researchers tend to use the more precise statistical term mean when talking about an average.  Therefore, when a researcher examines the average scores of two different groups, they would say they are examining mean differences.  This can be confusing to students: there seems to be no apparent relationship between correlations and mean differences.  That is true of the specific statistical procedures, but not of the underlying logic.  In both correlational research and experimental research, the researcher is interested in the relationship between one variable (often thought of as the cause) and a second variable (often thought of as the effect).

Research studies that utilize mean differences are often referred to as experimental designs.  The language researchers use to discuss experimental designs usually revolves around “significant differences.”  Do not let this language throw you off; it is correct (in the general, non-statistical sense) to say that the experimental treatment is correlated with the outcome.  There are many, many experimental designs. If you keep in mind their basic purpose as outlined above, they will continue to make sense (although keeping the details straight will take some effort!).

Modification History

File Created:  07/24/2018

Last Modified:  07/24/2018



This work is licensed under an Open Educational Resource-Quality Master Source (OER-QMS) License.


