Section 3.4: Sampling and Generalizability


Fundamentals of Social Research

Adam J. McKee, Ph.D.


DRAFT - Do Not Distribute

This content is released as a draft version for comment by the scholarly community.  Please do not distribute.


Social science research usually begins with a question about a specific group of people.  In the language of research, the entire group of people that a researcher is interested in is called a population.

A population is the group of all the people of interest in a particular research study.

Many times, populations are large.  A political scientist, for example, may be interested in every registered female voter in the United States.  Sometimes, however, populations can be very small, as when a study concerns female Supreme Court justices.

A population does not always have to consist of individual people.  It can consist of groups of people, such as a population of police departments.  A population can also consist of animals or of non-living things, such as the products of a factory.  A population can be anything an investigator wants to study.

Research questions are concerned with the whole population, but it is seldom possible to collect data from an entire population.  For this reason, researchers usually select a small group from among the population and limit their study to those individuals in the smaller group.  This set of individuals selected from the population is called a sample.  A sample is intended to be representative of the population from which it was drawn.  That is, the sample should be just like the population in every way that is important to the study.

A sample is a group of individuals selected from a population that serves to represent the population in a research study.

If your physician thinks you may be anemic, she may draw a sample of your blood to test your iron level.  You would be upset if the doctor wanted to take out all of your blood to be tested!  A sample is just fine so long as the blood taken out of your body is just like the blood remaining in your body.  If this is the case, then we can say that the sample of blood is representative of all your blood: the population of blood.  With blood, representativeness can be taken for granted.  When it comes to selecting a group of people to represent a larger group, we have to take great care to make sure that the sample of people is a representative sample.

Probability Sampling Methods

Probability sampling techniques are the most desirable because they rely on chance to determine the selection of participants in a study.  If a sample is chosen by random rules (i.e., chance, not the researcher’s judgment, decides who is selected), it is highly likely that the sample will be representative of the population.

Simple Random Samples

The most common type of probability sample is the simple random sample.  In this procedure, each member of the population has an equal and independent chance of being selected from the population as part of the sample.  Equal and independent are the critical terms.

  • The chances are equal because there is no bias in the process that would cause one person to be chosen over another.
  • The chances are independent because the choice of one person does not alter the chance of any other person being selected.

The beauty of this method is that (the vast majority of the time) it results in a sample with characteristics very close to those of the population.  Because no member of the population is favored over any other, the selection process is free of bias.

Bias is present in a sample when some members of the population have a greater chance of being selected than other members.

To draw a simple random sample from a population, you need to take four basic steps.  First, you must define the population from which you want to select the sample.  Second, you need to list all the members of your population.  Third, you assign a number to each listed member of the population.  Finally, you use a random procedure to select the sample you want.  Traditionally, subjects were selected using a table of random numbers.  (You can find such tables, along with instructions for their use, in the back of most statistics texts.)

These days, you can use a statistical package for the computer (like SPSS) to randomly select a sample for you.  If you do not have statistical software, you can find random number generators on the internet. You can flip coins, toss dice, draw names from a hat—whatever—so long as the method produces the same chance of being selected for every member in the population.

Simple random sampling produces an unbiased sample.
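The four steps can be sketched in Python, used here in place of SPSS or a printed table.  This is a minimal illustration, assuming the population list already exists in memory; the function name `simple_random_sample` and the list of 1,000 voter IDs are hypothetical.  Python’s standard `random.Random.sample` draws without replacement, so no member can be picked twice and every member has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members of a listed population at random, without replacement.

    Hypothetical helper for illustration; `seed` only makes the draw repeatable.
    """
    rng = random.Random(seed)
    # Step 3: number each listed member 1..N.
    numbers = range(1, len(population) + 1)
    # Step 4: let a random number generator pick n distinct numbers,
    # the software equivalent of reading a table of random numbers.
    picked = rng.sample(numbers, n)
    return [population[k - 1] for k in picked]

# Steps 1-2: a hypothetical list of 1,000 registered voters, identified by ID.
voters = [f"voter_{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(voters, 100, seed=42)
print(len(sample), len(set(sample)))  # 100 100 (no one is picked twice)
```

Passing a seed makes the draw reproducible, which can help when documenting how a sample was selected; omitting it gives a fresh draw each time.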

Systematic Sampling

Systematic sampling is easier to do than simple random sampling.  In systematic sampling, every kth name on the list is chosen.  (Researchers use k as shorthand for the sampling interval, the number of names between one selection and the next.)  To find k, all you have to do is divide the number of people in the population by the sample size you want to obtain; for a population of 1,000 and a desired sample of 100, k is 10.  To add an element of randomness to the process, the starting point should be chosen at random from among the first k names.  The tradeoff is that the selections are not independent: once the starting point is chosen, the rest of the sample is fixed, so most combinations of individuals can never be selected together.  If the list happens to be arranged in a pattern that repeats every k names, the sample can be badly biased.
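A minimal Python sketch of the procedure follows.  It assumes the population is an ordered list and, for simplicity, that the population size is a whole multiple of the sample size (if it is not, the slice is trimmed to n names, so the last few names on the list can never be chosen).  The name `systematic_sample` and the list of 1,000 students are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every kth member of an ordered list, starting at a random point."""
    k = len(population) // n          # sampling interval: population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start among the first k positions (0-based)
    return population[start::k][:n]   # every kth member from the start, trimmed to n

# Hypothetical numbered list of 1,000 students; we want 100, so k = 10.
students = list(range(1, 1001))
picked = systematic_sample(students, 100, seed=7)
print(picked[1] - picked[0])  # 10: consecutive picks are always k apart
```

Because only `start` is random, the whole sample is determined by a single draw, which is why the selections are not independent the way they are in a simple random sample.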

Stratified Sampling

The previously discussed methods of sampling are great if specific characteristics of the population are of no concern to the researcher.  If, however, the researcher is considering a specific characteristic of the population (such as race, age, or gender) that is not equally distributed in the population to begin with, a different sampling technique is in order.  Stratified sampling forces the sample to fit the profile of the population: it ensures that each stratum (layer) in the population is represented in the sample in the same proportion as in the population.

Let’s say that a political scientist is conducting a study of voting behavior in a particular region where 60% of the voting population is Republican and 40% is Democratic.  It wouldn’t make sense to draw a sample that is half Democratic and half Republican; we know from the start that such a sample would not be representative of the population.  If we want a sample that accurately reflects the population, we need a sample that is 60% Republican and 40% Democratic.  Stratified sampling allows us to achieve this.

To draw a stratified sample, you must list each stratum separately.  In the above example, we would need to list Democrats and Republicans separately.  Let’s say we want a sample of 100 subjects.  To get such a sample, we would randomly select 60 participants from the Republican list and 40 from the Democratic list.  Thus, our sample is stratified just like our population.
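The same procedure can be sketched in Python: draw a simple random sample inside each stratum, sized in proportion to that stratum’s share of the population.  The voter lists (6,000 Republicans and 4,000 Democrats) and the function name are hypothetical; rounding each stratum’s share is an assumption that can leave the total slightly off from n when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample: a random sample from each stratum,
    sized to match that stratum's share of the population.

    strata: dict mapping stratum name -> list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding is an assumption; uneven shares can leave the total slightly off n.
        size = round(n * len(members) / total)
        sample[name] = rng.sample(members, size)  # simple random sample within the stratum
    return sample

# Hypothetical region: 6,000 Republican and 4,000 Democratic voters (60% / 40%).
voters = {
    "Republican": [f"R{i}" for i in range(6_000)],
    "Democratic": [f"D{i}" for i in range(4_000)],
}
sample = stratified_sample(voters, 100, seed=1)
print({party: len(people) for party, people in sample.items()})
# {'Republican': 60, 'Democratic': 40}
```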

Cluster Sampling

The final type of probability sampling that we will discuss is cluster sampling.  In a cluster sample, groups are chosen rather than particular individuals.  Let’s say we are doing a national study on police officers’ perceptions of domestic violence.  To use the other probability sampling techniques we’ve already discussed, we’d need to obtain a list of every police officer in the United States.  As far as I know, no such list exists now or is likely to exist in the future (if you know of such a list, please send it to me!).  It is entirely possible, however, to obtain a list of every police department in the United States (hard, but you can do it).  With cluster sampling, we select a random sample of police departments and then study the officers who work in the selected departments.  The major weakness of this method is that the members of a group may have something in common that contributes to a bias.
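Below is a minimal one-stage cluster sampling sketch in Python.  It assumes every officer in a selected department is included; an alternative (not shown) is to sample officers within each chosen department.  The function name and the department and officer names are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: randomly pick whole groups and keep all their members.

    clusters: dict mapping a group id (here, a police department) -> list of members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of groups, not people
    return {group: clusters[group] for group in chosen}

# Hypothetical list of 500 departments of varying size (made-up names).
departments = {
    f"dept_{d:03d}": [f"dept_{d:03d}_officer_{o}" for o in range(5 + d % 20)]
    for d in range(500)
}
selected = cluster_sample(departments, 25, seed=11)
officers = [officer for members in selected.values() for officer in members]
print(len(selected), "departments,", len(officers), "officers surveyed")
```

Note that the random draw happens at the department level only: every officer in a chosen department is in, and every officer in an unchosen one is out, which is why shared traits within a department can carry over into the sample.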

Nonprobability Sampling Methods  

In this general category of sampling techniques, the common thread is that the probability of selecting a particular individual from the population is not known.  This violates the basic requirement of probability sampling that each individual’s chance of being selected be known (in a simple random sample, equal and independent).

Convenience Sampling

Convenience sampling is just what the name implies.  You use the sample because the individuals are convenient: easy to rope in.  This type of sample is sadly common among academic researchers.  College students are often used as research subjects because professors conducting research can lure them in with promises of extra credit.  If your population of interest is college students, then that may be okay.  In general, however, this is a terrible sampling method and should be used with extreme caution.

Samples of convenience are biased.

Quota Sampling

Some research projects require a stratified sample, but for some reason you cannot obtain a probability sample.  Let’s say that you are conducting a study on drug use, education, and recidivism.  It would be difficult indeed to get a list of the population of cocaine-using convicts who hold master’s degrees.  A situation like this suggests quota sampling.  First, you identify how many people with those characteristics you need in the sample.  Then, you search everywhere for people who meet the criteria.  Once you reach the number that you wanted, you simply stop.  You have your sample.
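Quota sampling can be sketched as a loop that takes people in whatever order they turn up and stops once every quota is full.  Nothing here is random, which is the point: who ends up in the sample depends on who the researcher happens to encounter.  The recruit records, the category labels, the quotas of three, and the function names are all hypothetical.

```python
def quota_sample(candidates, quotas, category_of):
    """Fill each quota in the order people are encountered, then stop recruiting.

    candidates: people met during recruitment (not a random draw).
    quotas: dict mapping category -> number of people needed.
    category_of: function returning a person's category, or None if they don't qualify.
    """
    filled = {category: [] for category in quotas}
    for person in candidates:
        category = category_of(person)
        if category in filled and len(filled[category]) < quotas[category]:
            filled[category].append(person)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break  # every quota is met, so we simply stop
    return filled

# Hypothetical recruits, in the order the researcher happens to find them.
recruits = [{"id": i, "cocaine": i % 3 == 0, "masters": i % 2 == 0} for i in range(1, 101)]

def category_of(person):
    if not person["masters"]:
        return None
    return "user_with_masters" if person["cocaine"] else "nonuser_with_masters"

sample = quota_sample(recruits, {"user_with_masters": 3, "nonuser_with_masters": 3}, category_of)
print({c: [p["id"] for p in people] for c, people in sample.items()})
# {'user_with_masters': [6, 12, 18], 'nonuser_with_masters': [2, 4, 8]}
```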

Sampling Error

Whenever a sample is drawn, only the part of the population that is included in the sample is (by definition!) measured.  The idea is to use the sample to represent the entire population.  Because the members of the population who were not measured may differ from those who were, the sample results will usually differ somewhat from the true population values.  This difference is called sampling error.

Random sampling procedures always result in some degree of sampling error.

The more people we include (the larger our sample is), the more accurately the sample will reflect the population, and the less sampling error we will have.  If a census is performed (a 100 percent sample is a census), there will be no sampling error.
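A small simulation can make the relationship between sample size and sampling error concrete.  It assumes a made-up population of 5,000 ages, draws many simple random samples of each size, and reports the average gap between the sample mean and the population mean.  The exact numbers depend on the random seed, but the gap should shrink as n grows, and the last row, a census of all 5,000 people, shows no sampling error.  The helper name is hypothetical; `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics

rng = random.Random(2018)
# Made-up population: ages of 5,000 adults, for illustration only.
population = [rng.randint(18, 90) for _ in range(5_000)]
true_mean = statistics.fmean(population)

def average_sampling_error(n, trials=200):
    """Average absolute gap between a random sample's mean and the population mean."""
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)

for n in (10, 100, 1_000, 5_000):  # the last size is a census
    print(n, round(average_sampling_error(n), 3))
```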

Selecting a large sample size does not correct for errors due to bias.
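A simulation can illustrate this point.  It assumes, hypothetically, a survey that can reach only people under 60, so older members of a made-up population have no chance of being selected, which matches this section’s definition of bias.  However large the random sample drawn from that restricted list, its mean stays well below the true population mean.  The population, the age cutoff, and the helper name are all invented for illustration.

```python
import random
import statistics

rng = random.Random(7)
# Made-up population: ages of 5,000 adults.
population = [rng.randint(18, 90) for _ in range(5_000)]
true_mean = statistics.fmean(population)

# Hypothetical biased sampling list: only people under 60 can be reached.
frame = [age for age in population if age < 60]

def biased_estimate(n):
    """Mean of a random sample drawn from the biased list, not the full population."""
    return statistics.fmean(rng.sample(frame, n))

for n in (50, 500, 2_000):
    # The gap stays large no matter how big n gets.
    print(n, round(biased_estimate(n) - true_mean, 1))
```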

When newspapers print things like “the margin of error is plus or minus three percent,” many readers take it to mean that the results are guaranteed to be accurate to within the stated percentage.  That reading is misleading.  That is not to say anything bad about the media; they merely want to warn people about sampling error.  However, most readers are not trained in statistical methods and may not realize that all survey results (indeed, all data coming from samples) are estimates.  Estimates may be wrong.

Let us take a public opinion poll with a 4% margin of error as an example.  (We use percentages in this example because they are easy and intuitive to interpret.  Keep in mind that these basic ideas apply to any statistic computed from a sample.)  The margin of error defines what statisticians call a confidence interval: the poll’s result plus or minus the margin.  Most polls report a 95% confidence level.  That means that if we took 100 random samples from the same population and built an interval of plus or minus 4 points around each sample’s result, about 95 of those intervals would contain the true population percentage, and about 5 would miss it.  Why only 95% of the time?  Because every sample contains some sampling error, an unlucky sample can land far from the truth, and a confidence interval never offers a guarantee; 95% is simply the conventional level.  The math behind it is much like the math behind the standard deviation: the margin of error is built from the standard error, which measures how much a statistic varies from one sample to the next.
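For a percentage from a simple random sample, the usual 95% margin of error is approximately 1.96 × sqrt(p(1 − p) / n), where p is the sample proportion and n is the sample size.  The Python sketch below computes it and then checks the “95 out of 100” interpretation by simulating many polls.  It assumes a very large population (each respondent is simulated as an independent yes/no answer) and a made-up true value of 52%; the function names are hypothetical, and this simple formula ignores the weighting adjustments many real polls apply.

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (z = 1.96)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def coverage(true_p, n, polls=1_000, seed=0):
    """Fraction of simulated polls whose interval p_hat +/- margin contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each respondent independently answers "yes" with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        moe = margin_of_error(p_hat, n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / polls

print(round(margin_of_error(0.5, 600), 3))  # 0.04: about plus or minus 4 points
print(coverage(true_p=0.52, n=600))         # should land close to 0.95
```

A poll of roughly 600 people therefore yields a margin near 4 points when the result is around 50%, and the simulated coverage shows that some intervals still miss the true value.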

Modification History

File Created:  07/25/2018

Last Modified:  07/25/2018



This work is licensed under an Open Educational Resource-Quality Master Source (OER-QMS) License.


