Nonrandom Sampling

Nonrandom sampling, also called “nonprobabilistic” or “nonprobability sampling,” is any sampling method in which whether a member of the population is selected for inclusion in the sample is determined by a nonchance or nonrandom process. Such nonrandom processes can include the investigator choosing whom to include in the sample, advertising a study to find participants, or other methods of seeking participants in such a way that the selection of whoever is approached or recruited cannot be described by some kind of random mechanism.

Nonrandom sampling methods are often perceived as inferior to random sampling methods, and it has been said that their frequent use in communication research renders the field a “prescience” rather than a mature science (Potter et al. 1993). Such a condemnation is too harsh, as nonrandom sampling can be an entirely effective and highly practical way of recruiting participants for a research study, and random sampling is not a requirement for the application of the principles of scientific investigation (Sparks 1995). But the researcher needs to be well aware of the restrictions on the kinds of inferences that can be made when sampling nonrandomly. Namely, population inference is nearly impossible when a study is based on a nonrandom sample, as the sample is very likely to be unrepresentative of the population the investigator is interested in studying. Furthermore, when sampling nonrandomly, some statistical concepts such as sampling error and confidence intervals have little to no meaning or application to the interpretation of research results. Nevertheless, nonrandom sampling methods have their place in the field.

Convenience Sampling

A common method – perhaps even the dominant method – of sampling in some camps of communication scholarship is “convenience sampling,” also known as “haphazard sampling” or “opportunistic sampling.” Convenience sampling is, just as its name implies, the inclusion of members of a population in a sample because those members are conveniently available to the researcher and thus easy to find or recruit. The most common instantiation of this sampling method is the use of college students enrolled in an investigator’s class, or perhaps majors in his or her department taking courses taught by a colleague.

Other forms include approaching people who just happen to be passing by the researcher when he or she is in data-collection mode, or the use of “subject pools” – pools of students who are required to participate in studies as a course requirement or for extra credit. Or a content analyst, interested in describing the kinds of appeals ordinary citizens use when attempting to persuade, might restrict his or her search for editorials containing such appeals by looking only in his or her local newspaper, the newspapers of nearby communities, or those available in the local public library (i.e., newspapers that are easy to find and collect).

One danger of the use of convenience sampling is the possibility that the investigator might, knowingly or unknowingly, end up restricting the sampling in such a way that precludes or reduces the likelihood of observing certain measurements on the variables being measured. For instance, a researcher who was studying sex differences in shyness might tend to approach people to fill out a survey on shyness who appear to be outgoing, friendly, and willing to cooperate, thus reducing the likelihood of including shy people in the sample. Alternatively, a content analysis of newspaper content might end up including only conservative newspapers in a convenience sample if the analyst lives in a region of the country that tends to be highly conservative in its political leanings.

Quota Sampling

If a researcher decides he or she wants to make sure that certain groups are included in a sample in sufficient numbers for research purposes, then quota sampling might be used. Quota sampling is essentially nonrandom stratified sampling. With random stratified sampling, the population is broken into subpopulations that are homogeneous on the stratification variable prior to randomly sampling each stratum. But with quota sampling, each stratum would not be randomly sampled, and there may not be any attempt to develop or define the population prior to sampling. Instead, the investigator would recruit participants with the only goal of making sure that the desired number of participants is recruited from each stratum, regardless of how obtained. Once the desired sample size (the “quota”) in a given stratum is achieved, sampling from that stratum would terminate. From that point forward, all sampling would focus only on those strata for which the quota has not yet been met.

Consider a quota sample of members of an academic association, with the goal of making sure the sample contains a certain number of members at various academic ranks. Rather than randomly sampling lecturers, assistant professors, associate professors, and full professors, a quota sample could be obtained by contacting 100 members from each of these groups (the strata) without making any kind of attempt to do so randomly. For instance, the membership could be contacted by email, and the first 100 members of each group to reply would be included in the sample. Alternatively, someone could be stationed at the registration desk at a conference with instructions to approach conference attendees with the goal of getting 100 conference participants from each of these four strata to respond to a survey. If 100 assistant professors are successfully recruited before the other groups, then no more assistant professors would be included in the sample, and sampling would focus on filling the quota for the remaining groups.
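The quota-filling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `quota_sample`, the (id, stratum) input format, and the toy data are hypothetical, and the quota is set to 2 per rank rather than 100 to keep the example short.

```python
from collections import defaultdict

def quota_sample(arrivals, quotas):
    """Fill each stratum's quota in arrival order, with no randomization.

    arrivals: iterable of (unit_id, stratum) pairs, in whatever order units
              happen to become available (hypothetical input format).
    quotas:   dict mapping stratum -> desired number of units.
    """
    sample = defaultdict(list)
    remaining = dict(quotas)  # how many units each stratum still needs
    for unit_id, stratum in arrivals:
        # Accept the unit only if its stratum's quota is not yet met.
        if remaining.get(stratum, 0) > 0:
            sample[stratum].append(unit_id)
            remaining[stratum] -= 1
        # Stop sampling entirely once every quota has been filled.
        if all(n == 0 for n in remaining.values()):
            break
    return dict(sample)

# Toy data: email replies in the order they arrive (assumed, not real data).
replies = [
    (1, "assistant"), (2, "assistant"), (3, "full"),
    (4, "assistant"), (5, "lecturer"), (6, "associate"),
    (7, "full"), (8, "lecturer"), (9, "associate"), (10, "full"),
]
quotas = {"lecturer": 2, "assistant": 2, "associate": 2, "full": 2}
result = quota_sample(replies, quotas)
```

In this toy run the third assistant professor to reply (unit 4) is turned away because that quota is already full, and unit 10 is never considered because all four quotas are met after unit 9. Who ends up in the sample depends entirely on the order of arrival, which is the source of the potential bias.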

Quota sampling could also be used to sample entities other than people. For instance, a researcher might want to collect a sample of advertisements published in men’s magazines with the goal of comparing ad content to those published in women’s magazines. The goal might be to get 200 ads from each category of magazines. The investigator might go to the local bookstore and purchase a number of magazines of both types, scan through them, and include the first 200 ads found in each magazine type.

Quota sampling is a good way to ensure that members from various sub-groups of a population end up in the sample, but the costs in terms of nonrepresentativeness resulting from biases in the recruiting process can be severe. Another danger of quota sampling exists when the sample is built over a long period of time. For instance, if one were to quota sample members of various political parties over a year-long period to assess attitudes toward various government policies, then attitudes and group membership may be spuriously correlated if the quota for sampling one of the groups is achieved very early in the sampling process and attitudes are shifting over time. For some interesting comparisons and discussions of the results of quota sampling compared to other methods, see Curtice & Sparrow (1997), Lynn & Jowell (1996), and Worcester (1996).
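The time-related danger described above can be made concrete with a small arithmetic sketch. Everything here is assumed for illustration: two hypothetical parties, an attitude that drifts upward by 0.1 points per day for everyone regardless of party, and a quota of 10 per party that fills quickly for one party and slowly for the other.

```python
def attitude_on(day):
    """Hypothetical attitude score: drifts with time, identical for both parties."""
    return 50 + 0.1 * day

# Party A's quota of 10 fills in the first 10 days (one member per day).
days_a = range(0, 10)
# Party B's members are harder to find: one member every 10 days.
days_b = range(0, 100, 10)

mean_a = sum(attitude_on(d) for d in days_a) / len(days_a)
mean_b = sum(attitude_on(d) for d in days_b) / len(days_b)

# The gap reflects only *when* each group was sampled, not party differences.
gap = mean_b - mean_a
```

Under these assumptions the two groups differ by about 4 points on average even though party has no effect on attitude at all. The difference is produced purely by the timing of recruitment.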

Volunteer Sampling

Researchers sometimes make a public call in search of participants for a research study. An example would be advertising in a newspaper, or sending an email to a listserv requesting people to participate. Those who respond to the ad or email would then be included in the sample. There are companies in the business of providing volunteer samples to researchers, such as access to email addresses of people who have agreed to participate in online surveys in exchange for some kind of incentive or reward. This method of sampling, known as “volunteer sampling,” can be used to extend the reach of a researcher who might otherwise be inclined to use a convenience sample of some kind. For instance, rather than recruiting students from a single college class, the investigator could send an email to a database of students enrolled at the investigator’s university, or advertise the study in the student newspaper.

Like other nonrandom sampling methods, volunteer sampling typically produces ambiguity about the representativeness of the sample to some broader population of interest, as there is the real possibility that those who volunteer to participate in a given study differ from those who do not volunteer in ways that might influence their behavior in the study, responses to questions on surveys, and so forth. Indeed, there is evidence that people who volunteer to participate in research studies differ from nonvolunteers in personality, interests, and other characteristics that can reduce the generalizability of the findings beyond the group of people who volunteered (Rosenthal & Rosnow 1975).

Although volunteer sampling is frequently discussed (as it is here) as a distinct form of sampling, in fact, virtually all methods of sampling of human beings can be construed to some extent as a form of volunteer sampling because we cannot force people to participate in research studies. That a person ends up providing data to a researcher is typically the result of him or her volunteering to participate, regardless of the mechanism that resulted in him or her being solicited in the first place.

Network Sampling

A researcher might have an interest in sampling a population whose members are small in number, difficult to locate, or perhaps unreachable by any of the methods described thus far. For example, there may not be a readily available list of intravenous drug users, or people suffering from depression, from which a random sample can be drawn, and the investigator may not have convenient access to sufficient numbers from such populations. But if it is possible to find even one member of such a population, network sampling can be used to construct a decent, albeit nonrandom sample of sufficient size for empirical study.

With network sampling, once one or two members from the population are located and solicited for participation in a study, they can be used to help recruit others from this population through their social network. Network sampling is based on the principle that individuals with a certain background, certain characteristics, habits, or life circumstances are likely to know others who share such features. For instance, suppose an investigator wants to sample pregnant women for a study of prenatal advice-giving by friends. Pregnant women are likely to know or come into frequent contact with other pregnant women as a result of attending prenatal classes, visits to the obstetrician, and so forth. If a couple of pregnant women can be recruited, the investigator can request that they spread the word about the study throughout their social network, communicating the study and the need for additional participants. Some kind of reward system could give past participants the motivation to recruit future participants.
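The referral process described above can be sketched as a breadth-first walk through a social network. This is a simplified illustration, not a description of any particular published procedure: the function `snowball_sample`, the `contacts` dictionary, and the names in it are hypothetical, and the sketch assumes that everyone who is referred agrees to participate.

```python
from collections import deque

def snowball_sample(contacts, seeds, target_size):
    """Grow a sample by following referrals outward from the seed participants.

    contacts:    dict mapping a person to the population members he or she
                 knows (hypothetical representation of the social network).
    seeds:       the first members of the population the researcher locates.
    target_size: recruiting stops once this many people are in the sample.
    """
    recruited = []
    seen = set(seeds)      # everyone already approached or waiting in line
    queue = deque(seeds)   # people waiting to be recruited, in referral order
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        # Each new participant refers the people he or she knows.
        for friend in contacts.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return recruited

# Toy network (assumed): Gil and Hal are not connected to the seed.
contacts = {
    "Ana": ["Bea", "Cy"],
    "Bea": ["Ana", "Dee"],
    "Cy": ["Dee", "Eve"],
    "Dee": ["Fay"],
    "Gil": ["Hal"],
}
sample = snowball_sample(contacts, seeds=["Ana"], target_size=5)
```

Starting from the single seed, the sample grows to Ana, Bea, Cy, Dee, and Eve. Gil and Hal can never be reached because they have no tie to the seed's network, which illustrates why the resulting sample, although potentially large, remains nonrandom.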

Network sampling is also called “snowball sampling” because the sample gets built and grows in size over time as new participants are recruited throughout the social network, in much the same way that a snowball rolling down a snow-covered hill would grow in size as it descends. For more information on this and related methods, see Salganik & Heckathorn (2004).

Scientific Value Of Nonrandom Samples

One of the more common criticisms a communication scholar is likely to face from other scientists is poor sampling. Such criticisms typically take the form of skepticism about the generalizability of the findings reported as a result of a failure to engage in a rigorous sampling of the population of interest. For instance, a researcher who uses a convenience sample of students in his or her class as research participants may be criticized on the grounds that there is little basis for making claims about the processes and effects of human communication by studying adolescents and young adults – a population with various cognitive and personality characteristics that make them distinct from others not in college or lower in education. Even if the investigator was specifically interested in college students as a population, the criticism could still be lodged that students at the investigator’s university may be different from students at other universities. An argument could be made that students in an investigator’s class are probably not even representative of students at that investigator’s university. And if they were, who cares about such a trivial population anyway?

Ultimately, concerns researchers have about nonrandom samples boil down in one way or another to worries about the possibility of a hidden moderator of a research finding.

That is, a relationship a researcher reports between variables measured (or manipulated) may, unbeknownst to the researcher, vary systematically as a function of a third variable. If the people in a sample are overwhelmingly low or high on that third variable – a variable that may not even be measured – the investigator may be reporting an effect that is contingent and would not generalize to a broader population of people who differ from the sample on that third variable. Sears (1986) and Abelman (1996) make the point that the possibility of an important, hidden moderator must loom large in our thinking about the generalizability of research findings from nonrandom samples (student samples in particular). To the extent that a researcher’s findings are contingent on a hidden moderator variable that the sample is restricted on, criticisms of research based on nonrandom samples are justified.
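A small arithmetic sketch may help show how a hidden moderator undermines generalization. The numbers are invented for illustration: suppose a treatment shifts the outcome by 2.0 units among people high on the unmeasured third variable and by 0.0 among people low on it, and suppose 90% of the sample but only 30% of the broader population are high on that variable.

```python
def average_effect(share_high, effect_high, effect_low):
    """Average effect in a group, given the share of members high on the moderator."""
    return share_high * effect_high + (1 - share_high) * effect_low

# Hypothetical effect sizes at each level of the unmeasured moderator.
effect_high, effect_low = 2.0, 0.0

# The nonrandom sample is restricted: almost everyone is high on the moderator.
sample_effect = average_effect(0.9, effect_high, effect_low)

# The broader population is mostly low on the moderator.
population_effect = average_effect(0.3, effect_high, effect_low)
```

Under these assumptions the effect observed in the sample (about 1.8) is three times the average effect in the population (about 0.6), and a researcher who never measured the moderator would have no way of knowing it.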

But criticizing the use of nonrandom samples is sometimes a knee-jerk reaction to the mismatch between the way that data collection should be done, as presented in introductory statistics and research methods books, and the way it is actually done in practice. Communication researchers frequently do not do anything like what is described in most statistics textbooks – taking random samples of well-defined populations. If the researcher’s goal is population inference, as if he or she were a pollster trying to gauge the sentiment of the public on some topic or issue, nonrandom samples leave a lot to be desired. Similarly, if a content analyst wants to make a claim about the content of a population such as newspaper editorials or news stories about crime published by major newspapers in a given window of time, some attempt to randomly sample content is paramount to sound generalization from sample to population.

But communication researchers are typically motivated by understanding processes rather than populations. We seek to understand why relationships exist between variables – the mechanisms that produce associations – and to test theoretical propositions that make predictions about whether relationships between variables exist when an attempt to quantify those associations is made. We conduct studies to see if predicted relationships are found or whether a pattern of associations observed between variables is consistent with a proposed mechanism or theoretical orientation. Process inference, rather than population inference, is usually the ultimate goal of the communication researcher (see Frick 1998, Hayes 2005, or Mook 1983, for a detailed discussion of the distinction between process and population inference).

For instance, a researcher might create two interventions in the form of high-school lesson plans designed to increase condom use among high-school boys by systematically manipulating the framing of the message to focus on gains or losses that can result from an unwanted pregnancy. Theory might lead the researcher to expect that one frame might be more effective at changing the attitudes or behavior of the high-school boys who participated in the study. If, indeed, one version of the intervention turns out to be more effective, this gives the researcher some insight into the processes that produced the behavior or attitude change among these boys. The researcher may have no particular way of inferring what fraction of high-school boys are likely to be influenced, or whether one intervention is likely to be more effective on high-school boys as a whole. The goal is simply to see if one intervention works better when it is delivered to a group of high-school boys and, if so, what it is about the processes at work that yields greater effectiveness.

To be clear, research based on nonrandom samples should be heavily scrutinized, just as any research should be. Nonrandom sampling does make it more likely that the sample will be restricted on potential moderators of relationships found, and nonrandom sampling frequently makes it highly ambiguous just what population is being sampled. But we should applaud anyone who makes a good-faith attempt to collect data rather than admonish them for nonrandom sampling, while still holding their feet to the fire to make sure that unwarranted leaps beyond the data are not being made. When process inference rather than population inference is the goal, the shortcomings of nonrandom sampling can be rectified in part through means other than random sampling, such as replication. If a research finding is an artifact of sampling method, other researchers (or the original researcher himself or herself) will fail to replicate the finding in future studies. Rather than insisting that communication researchers rely on random samples, why not instead require them to replicate their findings using a different sample from a different population than the original study before publishing those findings? In many ways, the ability to replicate a finding time and again, each time on a different sample from a different population, is a more powerful form of generalization than the population inferences random sampling affords.

References:

Abelman, R. (1996). Can we generalize from generation X? Not! Journal of Broadcasting and Electronic Media, 40, 441–446.
Curtice, J., & Sparrow, N. (1997). How accurate are traditional quota opinion polls? Journal of the Market Research Society, 39, 433–449.
Frick, R. W. (1998). Interpreting statistical testing: Process and propensity, not population and random sampling. Behavior Research Methods, Instruments, and Computers, 30, 527–535.
Hayes, A. F. (2005). Statistical methods for communication science. Mahwah, NJ: Lawrence Erlbaum.
Lynn, P., & Jowell, R. (1996). How might opinion polls be improved? The case for probability sampling. Journal of the Royal Statistical Society (Series A: Statistics in Society), 159, 21–28.
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379–387.
Potter, W. J., Cooper, R., & Dupagne, M. (1993). The three paradigms of mass media research in mainstream communication journals. Communication Theory, 3, 317–335.
Rosenthal, R., & Rosnow, R. L. (1975). The volunteer subject. New York: John Wiley.
Salganik, M. J., & Heckathorn, D. D. (2004). Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34, 193–239.
Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51, 515–530.
Sparks, G. (1995). Comments concerning the claim that mass media research is “prescientific”: A response to Potter, Cooper, and Dupagne. Communication Theory, 5, 280–281.
Worcester, R. (1996). Political polling: 95% expertise, 5% luck. Journal of the Royal Statistical Society (Series A: Statistics in Society), 159, 5–20.