Scales and Indices

Empirical communication and media research attempts to translate reality into data by means of measurement. To ensure reliability and validity, this process is guided by rules. In the case of quantitative research, the rules tend to be standardized in order to keep results comparable across different instances of measurement. These standardized measurements can make use of the simple gauges employed in physics or engineering only occasionally, e.g., when the length of a newspaper article is determined with a ruler or the time people spend watching a TV program with a clock. Many concepts related to communication and media phenomena are complex in nature, for they refer to people’s knowledge, intentions, emotions, or behavior. Thus they require more complex instruments for measurement, including different indicators (“latent variables”; Black 1999, 211; see also Bryman 2004, 66–69) to cover the multifaceted aspects of these concepts and to provide an indirect measurement for concepts that cannot be measured directly (Sullivan 2001, 159).

The procedure of “scaling” is used to quantify phenomena that cannot be counted directly (Jacoby 2004, 999). Empirical research draws upon several types of scales that describe general principles of how to collect data by multiple-indicator measures. Four important types of scales will be explained below. They share the characteristic that proper analysis and interpretation require that the set of indicators be condensed to a single value representing the specific relationship between a case and a concept. This can be achieved by summarizing the multiple-indicator measurement into a composite score for each case.

The term “scale” is also used for a set of indicators related to a certain concept representing its composite measure, e.g., the uses and gratifications scale tapping individuals’ motives for media consumption (see Rubin et al. 1994, 173–177). Ideally, such a scale is used repeatedly in a similar form, which also allows its validity to be assessed across these different applications. The term “index,” while used interchangeably with “scale” by many authors, can be defined as a particular type of scale, consisting of different indicators that are accumulated to form a single summary score in order to represent a theoretical construct (Carmines & Woods 2004, 485). A scale, however, “takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response” (Babbie 2002, 148).

Furthermore, the different levels of measurement are sometimes addressed as “scales” as well: objects are classified by nominal (or categorical), ordinal, interval, or ratio scales depending on the number and type of categories included (Bryman & Cramer 1995, 65–66; see also Ruane 2005, 54–57). To confuse the issue further, the term “scale” is also used more generally for the options given in a closed-ended survey question, or for a dimension of a content analysis. “Five-point scale” in this context means that the instrument offers five options to answer the question (and not a scale of five indicators). As this represents a more technical use of the term that is largely self-explanatory, it is excluded from further elaboration here.

Types Of Scales As Multiple-Indicator Measures

Before we turn to applications in media and communication research, it is essential to see how and for what purposes multiple-indicator measures are designed. Of the possible reasons for employing scaling procedures, data reduction is the most obvious. In addition, validity and reliability are improved, the effects of measurement error are minimized by multiple measures, finer distinctions can be drawn between cases, the dimensionality within a concept can be assessed, and data presentation can often rely on a graphical depiction of these dimensions (see Jacoby 2004 for details; also Bryman & Cramer 1995; Sullivan 2001). On the other hand, critics fault scaling techniques for producing false quantification because “simply putting a lot of similar questions together and treating the responses to each question equally . . . does not automatically lead a social scientist to an underlying variable” (Gorard 2003, 108).

As a consequence, sophisticated ways of constructing a meaningful scale have emerged in the history of empirical research. They are not specific to the field of media studies but are common knowledge in social science, and their logic and characteristics need to be pointed out. From the variety of scales, four distinct types can be distinguished, which represent the majority of applications in the field (for an overview see, e.g., Sullivan 2001, 170–179; Babbie 2002, 164–169).

The most popular approach to multiple-item measures is the so-called Likert scale, named after researcher Rensis Likert who developed it in 1932. The scale consists of a series of statements related to the construct under study, and respondents are required to express their agreement by selecting between response alternatives often ranging between 1 and 5 (“strongly agree” to “strongly disagree,” or vice versa). Still there is a certain ambiguity as to what the middle choice means to the respondents (Black 1999, 228). The summated ratings should discriminate between individuals, so item selection needs to consider a broad array of possible construct dimensions. In the pre-test phase, items are dropped that either correlate highly with each other (assuming that they measure the same dimension and are interchangeable) or correlate not at all (assuming that they probably do not measure the same construct).
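
The summated rating and the pre-test screening described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Likert’s own procedure: the function names, the pairwise comparison of items, and the cut-off values of 0.9 and 0.1 are hypothetical choices made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long rating lists (each must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def summated_score(ratings):
    """Composite Likert score of one respondent: the sum of the item ratings."""
    return sum(ratings)


def screen_items(pretest, too_high=0.9, too_low=0.1):
    """Flag item pairs that correlate very highly or hardly at all.

    pretest maps an item name to the ratings given by the pre-test sample.
    Both thresholds are illustrative assumptions, not established conventions.
    """
    flags = {}
    for (name_a, a), (name_b, b) in combinations(pretest.items(), 2):
        r = pearson(a, b)
        if r >= too_high:
            flags[(name_a, name_b)] = "possibly interchangeable"
        elif abs(r) <= too_low:
            flags[(name_a, name_b)] = "possibly unrelated"
    return flags
```

Whether a flagged item is actually dropped remains a judgment for the researcher; the sketch only points to candidate pairs.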

A different approach to measuring using multiple items is represented by the Thurstone scale. Again, a set of statements is developed and rated by a group of “experts” on an 11-point scale ranging from favorable to unfavorable. Items for the main study are selected based on the mean values of these ratings in order to represent the whole range of agreement and disagreement. Respondents in the main study then select only those items they agree with; their composite score results from the mean of the pre-determined scale values of the selected items. Thurstone scales are more difficult for a researcher to construct but easier for respondents to handle.
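
The two stages of this procedure can be illustrated with a short Python sketch. It follows the description above (mean expert ratings as scale values, mean of the endorsed items as the respondent’s score); the function names and the handling of a respondent who endorses nothing are assumptions made for the example.

```python
from statistics import mean


def scale_values(expert_ratings):
    """Mean expert rating (on the 11-point scale) for each statement."""
    return {statement: mean(r) for statement, r in expert_ratings.items()}


def thurstone_score(values, endorsed):
    """Mean scale value of the statements a respondent agrees with.

    Returns None if no statement was endorsed (an assumption of this sketch).
    """
    if not endorsed:
        return None
    return mean(values[statement] for statement in endorsed)


# Hypothetical expert ratings for three statements
values = scale_values({"A": [2, 3, 1], "B": [6, 6, 6], "C": [10, 11, 9]})
score = thurstone_score(values, ["B", "C"])
```

In this hypothetical case the statements receive the scale values 2, 6, and 10, and a respondent who agrees with B and C obtains a score of 8.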

A Guttman scale tries to “achieve unidimensionality by developing the items in such a way that, in a perfect Guttman scale, there is only one pattern of responses that will yield any given score on the scale” (Sullivan 2001, 176). When formulating the items, researchers try to construct a progressive characteristic relating to the intensity of the construct being measured. As a consequence, more people will agree with “easy” items while only a few will agree with the items considered “difficult.” In a perfect Guttman scale, all items can be ordered according to their intensity, and a person who agrees with a certain item is expected to agree with all “easier” items, while a person who rejects an item is expected to reject all more “difficult” ones. The score of each individual is marked by the turning point, i.e., the last item he or she agrees with.
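
The logic of a perfect response pattern can be made concrete in a few lines of Python. This is a sketch that assumes the items are already ordered from “easiest” to most “difficult” and that answers are coded as agree (True) or disagree (False); the function names are hypothetical.

```python
def guttman_score(responses):
    """Number of items agreed with; in a perfect pattern this is the turning point."""
    return sum(responses)


def is_perfect_pattern(responses):
    """True if every agreement precedes every disagreement."""
    score = sum(responses)
    return all(responses[:score]) and not any(responses[score:])


consistent = [True, True, False, False]    # agrees with the two easiest items
inconsistent = [True, False, True, False]  # skips an easier item
```

Both respondents have a score of 2, but only the first pattern is the one a perfect Guttman scale would allow for that score.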

A popular scaling format for judging objects according to a set of characteristics is the semantic differential. The stimulus is rated by respondents based on a series of polar opposite adjectives (e.g., “powerful” vs “weak”). Between the two poles, a linear scale with seven points is drawn up on which respondents indicate their own assessment of the object. Results are often given in the form of a “semantic profile” drawn from the mean values assigned to the object under study for each of the adjective pairs. If summarized scores for each respondent are required, relevant dimensions to describe the object are determined by a factor analysis identifying response patterns across the sample.
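
A semantic profile of this kind is simply a set of mean ratings, one per adjective pair. The following Python sketch shows the calculation; the adjective pairs, the ratings, and the function name are hypothetical, and the factor analysis mentioned above is not covered.

```python
from statistics import mean


def semantic_profile(ratings):
    """Mean rating per adjective pair across all respondents.

    ratings is a list with one dict per respondent, mapping each adjective
    pair to a value on the seven-point scale.
    """
    pairs = ratings[0].keys()
    return {pair: mean(r[pair] for r in ratings) for pair in pairs}


# Two hypothetical respondents rating the same stimulus
profile = semantic_profile([
    {"weak-powerful": 6, "slow-fast": 3},
    {"weak-powerful": 4, "slow-fast": 5},
])
```

Here the profile contains a mean of 5 for “weak-powerful” and a mean of 4 for “slow-fast.”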

Index As A Composite Score Of Data

The term “index” is used in everyday life for any kind of composite measure. Frequently, indices are used by governments to provide official statistics. A prominent example is the consumer price index, which is supposed to cover a significant cross-section of products. These goods serve as indicators for price trends, and in a long-term view the index values are compared to assess inflation or deflation in the field of individual purchase (e.g., Carmines & Woods 2004, 486).

Accordingly, the calculation of an index value sometimes means nothing more than equating a certain figure with “100” at a certain point in time, and expressing changes in subsequent periods relative to 100 for purposes of comparison. For instance, where the daily TV viewing time in one community is 220 minutes, we could equate this figure with an index value of 100; the next measurement after some time may yield a mean value of 231 minutes which would be expressed by an index value of 105. The advantage of comparing data starting with a standardized value such as 100 is a simplified interpretation (in our case an increase of 5 percent relative to the earlier measurement).
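
The arithmetic of the viewing-time example can be reproduced with a one-line function. The function name and the default base of 100 are assumptions of this Python sketch; the figures are those given above.

```python
def index_value(current, base, base_index=100):
    """Express a measurement relative to a base period that is set to base_index."""
    return current * base_index / base


baseline = index_value(220, 220)  # base period: 220 minutes
later = index_value(231, 220)     # later measurement: 231 minutes
```

The base period yields 100 and the later measurement 105, i.e., an increase of 5 percent.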

In empirical research, the meaning of the term “index” goes far beyond statistical transformations. Moreover, the term is sometimes used interchangeably with “scale” to refer to an instrument that consists of different indicators for measuring a more comprehensive concept. The index value, then, represents a single figure that is expected to express the score of a single case under study with respect to this concept. To distinguish an index from scales in general, Sullivan (2001, 160) pointed out that scales are usually a multiple-item measuring device in which a built-in intensity structure or even an inherent order among the items exists.

Constructing A Scale And Calculating A Summary Score

The five main steps in the construction of a composite measure like a scale or an index are (1) item selection, (2) the examination of empirical relationships among these items, (3) their condensation in a summary score, (4) the treatment of missing data, and (5) the validation of the entire measurement (see Babbie 2002, 149–163). It is important to note that for a proper calculation of index as well as scale values, measurement needs to fulfill some basic requirements that must already have been taken into account during the process of scale construction (Dunn-Rankin et al. 2004).

The range of possible answers should be similar for all items related to the concept. If, for instance, all answers are recorded on a five-point scale, it is easier to summarize these items than in a measurement where indicators include dichotomous or nominal variables as well. In the latter case it would be necessary to standardize the number of categories for each variable to the same range because otherwise variables with a higher range acquire a higher weighting within the index value. But weighting a variable higher than another one should always derive from theoretical considerations and not from question wording (on item weighting, see Sullivan 2001, 166–167).
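
One common way to bring items with different answer ranges to the same range is a linear rescaling to the interval from 0 to 1. The Python sketch below illustrates this; the choice of rescaling method, the function name, and the example values are assumptions, not a procedure prescribed by the sources cited above.

```python
def rescale(value, low, high):
    """Map a rating from the range [low, high] linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


# Hypothetical respondent: rating 4 on a five-point item, "yes" (1) on a 0/1 item
raw_sum = 4 + 1                                     # five-point item dominates
balanced_sum = rescale(4, 1, 5) + rescale(1, 0, 1)  # both items carry equal weight
```

In the raw sum the five-point item can contribute up to 5 points and the dichotomous item only 1; after rescaling each item contributes at most 1, so any further weighting has to be introduced deliberately.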

The two dominant ways to calculate an index value are addition and multiplication. The difference is obvious: within an additive index a low score on one item can be offset by a high score on another, because it is irrelevant to the final score which of the single items have contributed to the overall score. By multiplying item scores, the distribution of the final score becomes more volatile: first, a larger number of high item scores increases the overall score disproportionately; and second, if zero is included in the range of item values, a nonzero score on each of these items becomes a prerequisite for an overall score different from zero. Multiplication thus allows for defining necessary conditions among a set of items within an index.
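
The contrast between the two procedures can be seen in a small Python sketch. The item scores are hypothetical and the function names are assumptions made for the example.

```python
from math import prod


def additive_index(scores):
    """Sum of item scores: items can compensate for one another."""
    return sum(scores)


def multiplicative_index(scores):
    """Product of item scores: a zero on any item forces the index to zero."""
    return prod(scores)


case_a = [3, 0, 2]  # scores zero on the second item
case_b = [2, 1, 2]  # scores at least 1 on every item
```

Both cases reach the same additive index of 5, whereas the multiplicative index is 0 for the first case and 4 for the second: the zero acts as a necessary condition that the other items cannot offset.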

No matter which procedure is applied, a multiple-indicator measure produces, strictly speaking, ordinal variables that are usually treated as interval/ratio variables in statistics (Bryman & Cramer 1995, 66; Sullivan 2001, 162). Still, the allocation of high and low values needs to be meaningful for the concept underlying the summary score. This is the case when the scale consists of, for example, a set of different media that can be available within a household, and the additive index (in this case, the number of media available) is supposed to express media supply. In other cases it may be easy to calculate a score by summing up any ratings but difficult to interpret the meaning of the composite score later on.

Composing a summary score is a convenient way to transform sophisticated measurements of a complex concept into a single value. This becomes particularly relevant if the concept is to be included in additional calculations involving a whole set of variables (e.g., multivariate procedures), in which case it is often essential to draw upon single interval/ratio variables. However, the concept is thus reduced to one dimension, which might be appropriate only for selected research designs.

Scales For Relevant Concepts In Communication Research

The most important decision in the process of scaling refers to determining the operational definition of a concept or, more precisely, the indicators that stand for the concept (Bryman & Cramer 1995, 62–63). In contrast to other academic disciplines such as psychology, there is no strong tradition of scale development and testing in media and communication research. If the same concepts are used over and over again for research, it is obviously advantageous to have a set of standardized measures at one’s disposal that can be assumed to produce valid data (Sullivan 2001, 163). As a consequence, lists and handbooks of scales are widespread and easily available in other disciplines.

Up to now, this has not been the case in the field of media and communication studies. The only sourcebook of general communication research measures dates from 1994 and was edited by Rubin et al. They collected 62 profiles of measures they claimed were commonly used for research in the field, including interpersonal, instructional, mass, and organizational communication. All descriptions include the exact wording of the scale, data on its validity and reliability, and references to further applications. As most of the original research in this volume was conducted in the 1980s, most of the scales are no longer adequate for current research instruments. A more recent sourcebook collected instruments specifically for the measurement of nonverbal communication (Manusov 2004).

Consequently, media and communication researchers are often on their own in finding or developing operational definitions of the concepts relevant to their topic (Bryman 2004, 66). Item sources for multiple-item measures are previous research, judgments of experts, opinions of the individuals who are the focus of the research, and of course the researcher’s own knowledge and imagination (Sullivan 2001, 165–166).

References:

Babbie, E. (2002). The basics of social research, 2nd edn. Belmont, CA: Wadsworth/Thomson Learning.
Black, T. R. (1999). Doing quantitative research in the social sciences. London: Sage.
Bryman, A. (2004). Social research methods, 2nd edn. Oxford: Oxford University Press.
Bryman, A., & Cramer, D. (1995). Quantitative data analysis for social scientists. London and New York: Routledge.
Carmines, E. G., & Woods, J. (2004). Index. In M. S. Lewis-Beck, A. Bryman, & T. F. Liao (eds.), The Sage encyclopedia of social science research methods, vol. 2. Thousand Oaks, CA: Sage, pp. 485–486.
Dunn-Rankin, P., Knezek, G. A., Wallace, S., & Zhang, S. (2004). Scaling methods, 2nd edn. Mahwah, NJ: Lawrence Erlbaum.
Gorard, S. (2003). Quantitative methods in social science. New York and London: Continuum.
Jacoby, W. G. (2004). Scaling. In M. S. Lewis-Beck, A. Bryman, & T. F. Liao (eds.), The Sage encyclopedia of social science research methods, vol. 3. Thousand Oaks, CA: Sage, pp. 999–1002.
Manusov, V. L. (ed.) (2004). The sourcebook of nonverbal measures. Mahwah, NJ: Lawrence Erlbaum.
Ruane, J. M. (2005). Essentials of research methods: A guide to social science research. Malden, MA: Blackwell.
Rubin, R. B., Palmgreen, P., & Sypher, H. E. (eds.) (1994). Communication research measures: A sourcebook. New York and London: Guilford.
Sullivan, T. J. (2001). Methods of social research. Fort Worth, TX: Harcourt College.