Quantitative Content Analysis

Quantitative content analysis is an empirical method used in the social sciences primarily for analyzing recorded human communication in a quantitative, systematic, and intersubjective way. This material can include, for instance, newspaper articles, films, advertisements, interview transcripts, or observational protocols. Thus, quantitative content analysis can be applied to verbal material and also to visual material such as the evening news or television entertainment. Surveys, observations, and quantitative content analysis are the three main methods of data collection in empirical communication research, with quantitative content analysis the most prominent in the field. In other disciplines, such as psychology or sociology, quantitative content analysis is not used as widely.

Quantitative And Qualitative Content Analysis

Ole R. Holsti (1969) defines quantitative content analysis as “any technique for making inferences by objectively and systematically identifying specified characteristics of messages.” Bernard Berelson (1952) speaks of “a research technique for the objective, systematic and quantitative description of the manifest content of communication.” There has been much debate on this classical definition of quantitative content analysis: what does the word “manifest” mean, and is it possible to analyze latent structures of human communication beyond the surface of the manifest text, i.e., the “black marks on white”? In a more practical sense, the word “manifest” should be interpreted in terms of “making it manifest.” For example, if one were to look for irony in political commentaries, the construct of “irony” is not manifest in the sense of being directly identifiable from “black marks on white.” Whether or not there is irony in a commentary has to be interpreted. Thus, before a commentary is coded, it has to be precisely determined which words, phrases, key words, or arguments should serve as indicators for the category “irony.” In other words, this “latent” aspect of communication is made manifest by its definition.

With newspaper articles, the logic of quantitative content analysis can be described as a systematic form of “newspaper reading.” Here, the reader, i.e., the person charged with coding, assigns numeric codes (taken from a category system) to certain elements of the articles (e.g., issues or politicians mentioned in articles) by following a fixed plan (written down in a codebook). Of course, one does not analyze only one but many articles, reports, or commentaries from different newspapers. In this respect, quantitative content analysis differs from qualitative content analysis. While qualitative content analysis is limited to roughly 50 to 100 sample elements (e.g., newspaper reports, interview transcripts, or observational protocols), quantitative content analysis can deal with a huge number of sample elements. Qualitative content analysis works rather inductively by summarizing and classifying elements or parts of the text material and by assigning labels or categories to them. Quantitative content analysis, however, works deductively and measures quantitatively by assigning numeric codes to parts of the material, a process that is called coding.

Fields Of Application

Originally, quantitative content analysis was linked to propaganda research, for instance, the study of propaganda material in World War II. Nowadays, quantitative content analysis is applied to many different forms of human communication like print and television coverage, public relations, entertainment, advertisements, photographs, pictures, or films. For instance, subjects of analysis can include national images in media coverage about foreign affairs, the role models presented in television advertisements, the kind and quality of arguments in company press releases, or the features of actors in modern Asian film. In theoretical terms, quantitative content analysis is applied in many different fields of research in communication science, for example, in analyzing the kind of issues covered by the media in the context of agenda-setting. Similarly, quantitative content analysis is a key method in the cultivation approach, where it is called message system analysis. In both these approaches, content analysis is combined with a survey in a field design. Further applications of quantitative content analysis are found in election studies, where it is used to examine election campaigns themselves or media coverage about the candidates, parties, and campaigns, and in studies of media coverage of or public discourses on social problems and issues like racism, social movements, or collective violence.

Purposes Of Quantitative Content Analysis

Generally speaking, there are three major purposes and thus three basic types of quantitative content analysis. They follow the popular “Lasswell formula,” which asks who says what, to whom, why, to what extent, and with what effect. The first purpose is to make inferences about the antecedents of communication, for example, by examining the coverage of a liberal and a conservative newspaper (who). If there is a political bias in the commentaries or in the news reports (what), the bias would be explained (why), e.g., by differences between the newspapers in their editorial lines or political orientation.

The second purpose is merely to describe communication (what, to what extent). Here, different techniques can be applied (which are described below). The third purpose is to draw inferences about the effects of communication (to whom, with what effect). Agenda-setting and cultivation studies are good examples of this basic type of quantitative content analysis. Yet, strictly speaking, such an inference is not possible when nothing but the message is examined. A survey should additionally be carried out before, for instance, a statement is made about mass media coverage influencing people’s thoughts and perceptions of the world, as in the cultivation approach. The reason for this can be explained by constructivism. From the constructivist point of view, quantitative content analysis is a reactive method, like surveys for example, since the message to be coded is not fixed in an objective sense. Several people reading the same message may interpret it differently due to their individual schemata, beliefs, and attitudes. Thus, when making inferences about message effects, one should carry out not only a quantitative content analysis but also a reception study or an effects study.

Types Of Quantitative Content Analysis

There are different types of quantitative content analysis, i.e., different techniques of describing human communication. The first focuses on frequencies, where one merely counts, e.g., the appearance of issues, persons, or arguments in newspaper coverage. Many studies using the agenda-setting approach are such simple frequency content analyses, and most rely only on media archives. Thus, they do not examine the content or text, e.g. of a newspaper article, but focus only on the headline. This is a simplified version of quantitative content analysis.

The second technique focuses on tendencies. The above-mentioned example of political bias in the commentaries of a liberal and a conservative newspaper is a good example. In another example, the analysis would not only measure the number of articles on nuclear energy, but also their viewpoint on the issue, e.g., by noting the advantages and disadvantages mentioned. If the advantages reported exceed the disadvantages, the article would be observed to be “in favor” of nuclear energy, while if the disadvantages reported exceed the advantages, it would be considered to show “disapproval” or a “negative tendency.”
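
The tendency rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the label for ties ("ambivalent") are hypothetical choices, since the text does not say how an article with equal numbers of advantages and disadvantages should be coded.

```python
def article_tendency(advantages: int, disadvantages: int) -> str:
    """Classify one article from the counts of coded advantages and disadvantages."""
    if advantages < 0 or disadvantages < 0:
        raise ValueError("counts must be non-negative")
    if advantages > disadvantages:
        return "in favor"
    if disadvantages > advantages:
        return "disapproval"
    # Assumption: a tie is treated as ambivalent; the source does not specify this case.
    return "ambivalent"


# Hypothetical counts for three articles on nuclear energy.
tendencies = [article_tendency(4, 1), article_tendency(0, 3), article_tendency(2, 2)]
print(tendencies)
```

In practice the counts themselves come from human coders applying the codebook; the function only formalizes the final comparison.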

The third technique does not only focus on tendencies, but also on intensities. Here, one would not just code on an ordinal scale (using, e.g., “positive,” “ambivalent,” and “negative”), but use interval measurement (e.g., “strongly positive,” “positive,” “ambivalent,” “negative,” and “strongly negative”).

Finally, there are also several techniques that meet the popular objection that quantitative content analysis “disassembles” communication. Critics of quantitative content analysis (e.g., Siegfried Kracauer) claim that quantitative content analysis cannot examine relations between elements of communication, i.e., the semantic and syntactical structures of communication. They argue, for instance, that when arguments and persons in newspaper reports are counted separately, the fact that arguments are raised by persons is neglected; or that when statements are separately coded, the fact that a statement can refer to another statement, e.g., by supporting it, is not taken into account. The latter objection may apply with reference to most agenda-setting studies, but it is not a justified objection against quantitative content analysis in general.

A good example of a technique that takes account of the semantic and syntactical structures of communication is the Semantische Struktur- und Inhaltsanalyse (Semantic Structure and Content Analysis) developed by Werner Früh. Without going into detail, the technique considers various elements of communication as well as the relations between them; for instance, it analyzes persons and roles mentioned in newspaper articles, but it also examines time aspects like anteriority and so-called “modifications” like persons’ features or local specifications. In addition, it looks for so-called “k-relations,” such as causal, intentional, or conditional relations as well as for “R-relations” mentioned in, e.g., news reports. Thus, newspaper articles are deconstructed into micropropositions, but the semantic structure and content analysis reconstructs all relations between these micropropositions.

Most quantitative content analyses examine text or verbal material, i.e., transcribed or recorded human communication. Studies analyzing visual material like films, television advertisements, or televised debates between presidential candidates are comparatively rare. There are three major reasons for this. First, copies of newspaper articles are easier to access than copies of the evening news, for instance; especially in retrospective studies (which most quantitative content analyses are), visual material is often no longer available. Second, visual material is more complex than verbal or text material. For example, television news provides information not only via the audio channel but also via the visual channel. Since verbal and visual information can deliver different messages, one has to code both streams of information. This is more expensive than just coding print news. Finally, coding visual material like evening news on television requires more detailed coding instructions and more complex category definitions than a codebook for analyzing newspaper coverage.

A relatively recent development in quantitative content analysis is automatic, that is, computer-assisted, content analysis, where a computer program counts keywords and searches for related words in the same paragraph, for example. Before the coding process begins, all the relevant keywords or phrases have to be listed in a so-called coding dictionary, the equivalent of the codebook of a conventional content analysis. While there has been some progress in this technique, it will be a while before the human coder becomes redundant.
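
A dictionary-based count of this kind might look roughly as follows. This is a minimal sketch, not the procedure of any particular software package: the category names, keywords, and the sample paragraph are invented for illustration, and real coding dictionaries are far larger and usually handle word stems and phrases.

```python
import re
from collections import Counter

# Hypothetical coding dictionary: category -> keywords that indicate it.
CODING_DICTIONARY = {
    "nuclear_energy": {"nuclear", "reactor", "uranium"},
    "election": {"election", "campaign", "candidate"},
}


def code_paragraph(paragraph: str, dictionary=CODING_DICTIONARY) -> Counter:
    """Count, per category, how many keyword occurrences appear in one paragraph."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter()
    for category, keywords in dictionary.items():
        counts[category] = sum(1 for word in words if word in keywords)
    return counts


sample = "The candidate opened the campaign by visiting a nuclear reactor."
result = code_paragraph(sample)

# Categories with at least one hit co-occur in the same paragraph.
co_occurring = sorted(category for category, n in result.items() if n > 0)
print(result, co_occurring)
```

Exact keyword matching of this sort cannot recognize irony, negation, or synonyms missing from the dictionary, which is one reason the human coder remains necessary.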

Current challenges for quantitative content analysis stem from the World Wide Web, where the content of a private weblog, arguments in online chat, or the pictures in an online gallery can be subject to analysis. Compared to an analysis of newspaper coverage, for example, a content analysis of online communication poses particular problems. Here, the population from which a sample is taken for analysis is not fixed but changes from day to day, or even more quickly. It is therefore important to store all relevant communication for a specific study. But even if this were possible, one seldom has a view of the complete population since the World Wide Web, or the Internet, as a whole is not easy to grasp. Thus most studies that analyze online communication work with samples that are more or less clearly defined.

Standards

Like any other method in the social sciences, quantitative content analysis has to meet certain standards of quantitative empirical research. The first criterion, intersubjectivity, calls for transparent research. This means that all the details of a quantitative content analysis have to be described and explained so that it is clear exactly what has been done. The second criterion, systematics, requires that the coding rules and sampling criteria are invariantly applied to all material. The third criterion, reliability, calls for the codebook to be dependable. Different coders do not always agree on coding. For example, one coder may identify an argument in a newspaper article as argument 13 from the argument list in the codebook, while another coder will choose argument 15, with the result that the numeric codes assigned to the argument in the newspaper article do not match. Yet, in other cases the two coders may agree. Using all codings of both coders, one can divide twice the number of matching pairs (e.g., 17 identical codings) by the sum of the number of all codings of the first coder and the number of all codings of the second coder (e.g., 20 codings each). This ratio, known as Holsti's formula, is a simple reliability coefficient. In this example we would find R = (2 × 17) / (20 + 20) = 0.85. The values of this coefficient range from 0 (no matching at all) to 1 (perfect matching). Another popular reliability coefficient, better known from index calculation, is Cronbach’s alpha (α).
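
The worked example for Holsti's formula can be reproduced with a short Python sketch. It assumes, for simplicity, that both coders have coded the same units in the same order, so that codings can be compared pairwise; the function name and the sample codes are illustrative only.

```python
def holsti_reliability(codes_a, codes_b):
    """Holsti's coefficient: 2 * matching pairs / (codings of A + codings of B)."""
    # Assumption: both lists refer to the same units in the same order.
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must have coded the same units")
    if not codes_a:
        raise ValueError("no codings to compare")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 2 * matches / (len(codes_a) + len(codes_b))


# 20 codings per coder; the coders disagree on the last three (13 versus 15).
coder_a = [13] * 20
coder_b = [13] * 17 + [15] * 3

reliability = holsti_reliability(coder_a, coder_b)
print(round(reliability, 2))  # 2 * 17 / (20 + 20)
```

Note that this coefficient is a plain agreement ratio and does not correct for agreement that would occur by chance.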

The fourth standard for a quantitative content analysis is validity. An instrument of empirical research, i.e., the codebook in the case of content analysis, can claim to be valid when it measures what it intends to measure. For instance, if the codebook contains a category “stereotype,” then the coders should not measure political bias or irony when applying the coding rules for this category, but code stereotypes. According to Klaus Krippendorff, there are different forms of validity. The type of validity in the example can be called face validity. Predictive validity and concurrent validity both refer to an external criterion measure for validating data obtained by a quantitative content analysis. Such a measure may be another quantitative content analysis or may be statistical data from governmental sources. With concurrent validity the validity test is administered at the same time as the criterion is collected. With predictive validity, scores are predicted on some criterion measure.

Content Analysis As Research Practice

The research process using quantitative content analysis comprises six steps. It usually begins with theoretical considerations, a literature review, and the deduction of empirical hypotheses. In the second step, the sample material that is to be coded, i.e., examined with the codebook, is defined. In the third step, the coding units (e.g., articles or arguments) are described. In the fourth step, the codebook with the category system is developed and pre-tested. The actual measurement, i.e., the process of coding, represents the fifth step of a content analysis. The final step is data analysis and data interpretation.

Most quantitative content analyses require multilevel sampling. For instance, an analysis of campaign coverage would first involve the choice of a limited number of national newspapers which represent diverse political standpoints (e.g., from liberal to conservative). On the next level, the time span to be analyzed (e.g., every day in the critical phase of the election campaign) is set. On the next level, the articles to be coded are determined (e.g., all articles on the front page). Usually the sample is called the unit of analysis. The coding unit, however, is the most important unit in quantitative content analysis. It defines the level of measurement. For example, if the features of an article (e.g., the main issue of the article) are examined, the single article is the coding unit, but if the attributes of an argument (e.g., the issues mentioned in an argument) are examined, then the single argument is the coding unit. In the first case, 100 articles (with 5 arguments per article) will lead to 100 codes; in the second case, the same number of articles will produce 500 codes. The level of coding depends on the sample size as well as on the research question. If the study focuses on argumentation structures, the article will not be chosen as the coding unit. If political coverage in 10 newspapers in the last 50 years is the focus of the study, the argument or statement will not be chosen as the coding unit, since otherwise a vast number of cases would have to be coded.
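
The effect of the coding unit on the number of cases can be illustrated with a small Python sketch. The data structure and names are hypothetical; the sample simply mirrors the example of 100 articles with 5 arguments each.

```python
# Hypothetical sample: 100 articles, each containing 5 arguments.
articles = [
    {"id": i, "arguments": [f"argument {i}-{j}" for j in range(5)]}
    for i in range(100)
]


def coding_units(sample, level):
    """List the units that each receive one code at the chosen coding level."""
    if level == "article":
        return [article["id"] for article in sample]
    if level == "argument":
        return [
            (article["id"], position)
            for article in sample
            for position, _ in enumerate(article["arguments"])
        ]
    raise ValueError("level must be 'article' or 'argument'")


article_units = coding_units(articles, "article")
argument_units = coding_units(articles, "argument")
print(len(article_units), len(argument_units))
```

The same sample thus yields five times as many cases at the argument level, which is why the choice of coding unit has to be weighed against the size of the sample.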

References:

Berelson, B. (1952). Content analysis in communication research. Glencoe, IL: Free Press.
Früh, W. (1998). Inhaltsanalyse: Theorie und Praxis, 4th edn. [Content analysis: Theory and practice]. Constance: UVK.
Holsti, O. R. (1969). Content analysis for the social sciences and humanities. Reading, MA: Addison-Wesley.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage.
Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.
Popping, R. (2000). Computer-assisted text analysis. Thousand Oaks, CA: Sage.
Riffe, D., Lacy, S., & Fico, F. G. (2005). Analyzing media messages: Using quantitative content analysis in research. Mahwah, NJ: LEA.
Weber, R. P. (1990). Basic content analysis, 2nd edn. Newbury Park, London, and New Delhi: Sage.