Rating Methods

A rating, as the term is most often used in media industries, is an estimate of the size and demographic composition of a radio, television, or Internet audience. Such metrics are of enormous importance to advertiser-supported media because they set the value of the time used to run commercial messages. The larger and more desirable the audience, the more the media can charge advertisers. Ratings are typically measures of exposure to media based on surveys of various target populations conducted by “third-party” firms independent of the sales transaction. The practice of ratings research emerged in the United States in the 1930s, and has since been refined and adopted worldwide. However, new technologies that allow people to consume a wide range of media anywhere, at any time, coupled with the desire of advertisers to reach ever-more narrowly drawn markets, have strained current systems of audience measurement. Ratings companies have responded by developing new methods to keep pace with the demands of their client industries.

Audience ratings are most often based on some form of probability sampling, and as such are subject to the same kinds of non-response and sampling error that occur in any survey research. In dozens of countries, ratings companies provide commercial and government clients with estimates of national audiences. In a subset of those countries, including the US and China, firms provide more localized measurement of cities and regions. Two factors are pressing sample sizes to their limit. The increasing abundance of new media delivery systems, including broadband delivery systems and video-on-demand, has fragmented audiences, reducing the size of any one outlet’s audience. Concurrently, advertisers are more apt to be concerned with tightly defined target audiences. Under such circumstances, even large national samples are quickly whittled down to a very small number of respondents in the audience of interest, producing unacceptably high levels of sampling error. To address this problem, ratings firms are devising strategies to increase sample sizes, or basing their estimates on technologies that afford census-like numbers of respondents.
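The effect of shrinking subsamples on sampling error can be illustrated with a short calculation. The sketch below assumes simple random sampling and uses the textbook standard error of a proportion; commercial ratings samples are usually more complex, so the figures are indicative only. The function names and the example numbers are illustrative and do not come from any ratings firm.

```python
import math


def rating_standard_error(rating: float, n: int) -> float:
    """Standard error of an estimated audience proportion.

    Assumes simple random sampling: SE = sqrt(p * (1 - p) / n).
    `rating` is a proportion between 0 and 1; `n` is the number of respondents.
    """
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(rating * (1.0 - rating) / n)


def relative_error(rating: float, n: int) -> float:
    """Standard error expressed as a fraction of the rating itself."""
    if rating == 0.0:
        raise ValueError("relative error is undefined for a zero rating")
    return rating_standard_error(rating, n) / rating


if __name__ == "__main__":
    # Hypothetical case: a small outlet with a 1 percent rating, measured in the
    # full national sample and then within a narrow demographic subsample.
    for n in (10_000, 500):
        print(n, round(relative_error(0.01, n), 3))
```

For a fixed rating, the relative error grows with the inverse square root of the number of respondents, which is why narrowly defined target audiences push even large panels to their limit.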

Measures Of Exposure

The methods used to measure media exposure are, in large part, what set ratings apart from other forms of survey research. As US radio grew into an advertiser-supported medium in the late 1920s, it became essential to quantify its audience. The first such effort, launched in 1930, used telephone recall techniques to ascertain listening in the previous 24 hours. A few years later, “telephone coincidental” techniques, which asked respondents what they were listening to at the time of the call, were introduced in an effort to reduce response errors attributable to faulty memories. Telephones are still used in ratings research today, though usually in support of other methods that provide more copious data on exposure.

Diaries are inexpensive paper booklets that require a respondent to make a written log of their radio listening or television viewing, usually for one week. Diary formats vary by medium, but all offer some sort of grid that divides each day into quarter hours or broader “day-parts.” In television measurement, a diary is assigned to each set in the sample household. In radio, each individual in the sample carries a diary. At the end of the survey week, the diaries are mailed to a processing center where they are coded, checked for logical errors and omissions, and ultimately turned into ratings reports.
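The step from coded diaries to audience estimates can be sketched as a simple tally over the quarter-hour grid. This is a minimal illustration, not a description of any ratings firm's processing: the record layout (respondent, station, start slot, end slot), the station call letters, and the use of an unweighted proportion of usable diaries are all assumptions made for the example.

```python
from collections import defaultdict


def quarter_hour_index(hour: int, minute: int) -> int:
    """Map a clock time to one of the 96 quarter-hour slots in a day (0-95)."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("time out of range")
    return hour * 4 + minute // 15


def quarter_hour_ratings(entries, usable_diaries: int):
    """Tally diary entries into per-station, per-slot audience proportions.

    `entries` is an iterable of (respondent_id, station, start_slot, end_slot)
    tuples, with `end_slot` exclusive. The result maps (station, slot) to the
    share of usable diaries reporting that station in that quarter hour.
    """
    if usable_diaries <= 0:
        raise ValueError("usable_diaries must be positive")
    listeners = defaultdict(set)
    for respondent_id, station, start_slot, end_slot in entries:
        for slot in range(start_slot, end_slot):
            # A set ensures each respondent counts once per station and slot.
            listeners[(station, slot)].add(respondent_id)
    return {key: len(ids) / usable_diaries for key, ids in listeners.items()}


if __name__ == "__main__":
    # Hypothetical entries from two of four usable diaries.
    entries = [
        ("r1", "WAAA", quarter_hour_index(8, 0), quarter_hour_index(8, 30)),
        ("r2", "WAAA", quarter_hour_index(8, 15), quarter_hour_index(8, 45)),
    ]
    for key, value in sorted(quarter_hour_ratings(entries, 4).items()):
        print(key, value)
```

A production system would also weight respondents to match the population and apply the logical checks the text mentions; both are omitted here.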

If they are properly filled out, diaries provide a wealth of information, including audience demographics, at relatively low cost. Nonetheless, although they remain in widespread use at this writing, a number of weaknesses make them increasingly problematic for audience measurement. Diaries necessarily require some literacy. They suffer from relatively low response rates that, even with financial incentives, sometimes dip below 30 percent. Data collection is slow and prone to a variety of processing and response errors. Most importantly, the new media environment, with remote controls, hundreds of channels, and various recording and delivery devices, simply overwhelms the ability of even conscientious diary-keepers to produce an accurate, contemporaneous record of their media use.

Meters, devices attached to receivers and producing a continuous paper record of tuning behavior, were introduced in radio measurement in the early 1940s by Arthur C. Nielsen. These were “household” meters that could record when sets were on, and the station to which they were tuned, but were incapable of identifying who within the household was listening. In the 1950s, household meters were adapted to television measurement. Such meters eventually made an electronic record of set use that could be retrieved over telephone lines to produce “overnight” ratings. They were the principal method of national television measurement in the US until the late 1980s, and continue to be used in some local markets.

Household meters have a number of advantages. They are fast, relatively unobtrusive, require no literacy, and produce vast amounts of accurate set-tuning data over long periods of time. Compared with diaries, however, they are expensive. Their cost is justified only in larger markets (e.g., nations and major cities). Moreover, household meters produce no demographic information, so they must typically be used in conjunction with diaries.

A new generation of meters called people-meters, introduced in the late 1980s, provided a means to quickly gather demographic information. They work much like household meters, but feature a set-top box and/or hand-held devices that allow respondents to press a button signaling their presence in front of the set. People-meters are currently the preferred way to measure television audiences around the world. They do require some effort on the part of respondents, however, and so are more obtrusive and prone to respondent fatigue and error than is ideal. Newer generations of more passive, portable devices are being introduced.

All the aforementioned techniques gather data from samples. Even large national panels, which might exceed 10,000 households, can be insufficient to estimate the audience for a very small network, or to assess the behaviors of a narrowly defined market segment. Media industries are now studying the possibility of harnessing the data created by digital set-top boxes. These are analogous to household meters operating in millions of homes, and might provide a way to study highly fragmented digital media consumption. There are, however, at least three problems with this approach. First, it presents obvious concerns about privacy. Second, not all homes subscribe to digital cable or satellite service, and even those that do will not necessarily have all their sets attached to the service. Hence, set-top box data alone cannot establish the total audience for all channels. Third, like all household meters, the technology provides no “people data,” though mathematical models can approximate demographic composition.

Internet audience measurement has also presented some relatively new opportunities to measure user behaviors. One approach, sometimes labeled “user-centric,” mirrors conventional ratings research. Here a probability sample of users is recruited to provide information. However, since Internet access is gained via a computer, it is a relatively simple and inexpensive proposition to install a piece of software on the user’s machine that records and reports the URLs the user visits. This, in effect, turns each computer into a metering device, and allows research firms to create much larger panels than would be economically viable with conventional metering. Alternatively, a “server-centric” approach takes advantage of the fact that all Internet traffic is managed by computers called servers. They can record the total number of times information is requested and fed to users. This goes beyond sampling and represents a census of use. Unfortunately, server “hits” can be difficult to decipher. While there are techniques to differentiate returning from new visitors, or to identify their place of origin, this approach cannot provide reliable demographic information about Internet audiences. It can, however, be combined with user-centric data to provide highly detailed estimates of exposure, including the use of media streamed over the Internet.
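The difference between raw server “hits” and the number of distinct visitors behind them can be shown with a small tally. The sketch below assumes a simplified, hypothetical log in which each request has already been reduced to a (visitor identifier, requested URL) pair; real server logs are messier, and how a visitor identifier is derived (cookie, address, or login) is itself an assumption that affects the counts.

```python
from collections import Counter


def summarize_requests(records):
    """Summarize a server-side request log.

    `records` is an iterable of (visitor_id, url) pairs, a hypothetical
    simplification of a real log. Returns total hits, the number of distinct
    visitor identifiers, and hits per URL.
    """
    records = list(records)
    hits_per_url = Counter(url for _visitor_id, url in records)
    visitors = {visitor_id for visitor_id, _url in records}
    return {
        "total_hits": len(records),
        "unique_visitors": len(visitors),
        "hits_per_url": dict(hits_per_url),
    }


if __name__ == "__main__":
    # Hypothetical log: five requests generated by only two visitors.
    log = [
        ("v1", "/home"),
        ("v1", "/news"),
        ("v2", "/home"),
        ("v1", "/home"),
        ("v2", "/video"),
    ]
    print(summarize_requests(log))
```

The tally is a census of requests, but nothing in it says who the visitors are, which is why the text notes that server-side counts are paired with panel data when demographic estimates are needed.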

Measures Of Engagement

Newer media that allow people to see what they want, when they want it, have also eroded the value of simple exposure as the metric upon which media time is bought and sold. DVRs and other on-demand technologies make it relatively simple for audiences to avoid commercials. Increasingly, advertisers are demanding some measure of the extent to which audiences are involved or engaged with the media they consume. The general theory is that engaged audience members will be less inclined to look away, more receptive to advertising, and better able to recall brand messages.

“Qualitative” ratings are nothing new. In some countries with strong traditions of public service broadcasting, finding out how much people like or learn from programming is an ongoing practice. Historically, in the more commercially oriented US, qualitative ratings have foundered. Current definitions of engagement include various affects, attentiveness, recall, intentions, and behaviors. For these factors to constitute an ongoing system of ratings, the industry must reach some consensus on the definition of engagement, valid measures of the construct, and whether the value of such supplemental ratings ultimately justifies the cost.

Ratings Quality

Ratings are subject to various sources of error, including sampling, response, nonresponse, and processing error. The first three are familiar to survey researchers. The last speaks to the fact that ratings are a complex product manufactured from various inputs, including different measures of behavior as well as program and advertising information. All such forms of error have relatively objective meanings and can generally be ameliorated with the application of sufficient resources.

However, there are more subjective criteria that affect the quality of ratings data. Take, for example, something as fundamental as the definition of exposure to television. Should the audience for a program include those who watched the show in real time as well as those who recorded it? If the latter, does their inclusion depend upon how quickly they replayed the program? Is a delay of a few minutes, or hours, or days acceptable? There are no objectively right answers, but the resolution can have profound consequences for different ratings consumers. As a consequence, ratings are inevitably the product of an ongoing process of negotiation among different industry and government players, often with competing interests. That very tension is probably the best guarantee that ratings maintain a reasonable degree of quality.
