The term “coding” has different meanings in empirical research. Generally speaking, coding becomes relevant whenever the data at hand are unstructured; coding then provides a structure for a systematic analysis of these data. In quantitative research using standardized instruments, coding is the process of tagging data about a given unit of analysis in order to assign the unit to a category. Usually, these categories correspond to numbers that allow the information to be processed by statistical software. Coding gains particular importance in quantitative content analysis, where it represents the main task of researchers. Likewise, the term is used in qualitative research to describe how data gathered with nonstandardized methods are broken down into components relevant to the research question under study (Bryman 2004, 537). Within statistical data analysis, the term “recoding” is used in a technical sense for the procedure of regrouping the categories of a variable; this meaning of the term is not discussed further here.
Coding in Quantitative Research
Data Tagging
For the purposes of computerized data analysis, researchers need empirical information to be tagged with numeric codes. This is true whenever processing is based on structured data collection commonly labeled as “quantitative.” The coding of items can be carried out either by the data source (usually the interviewee) or by the researcher (Sullivan 2001; Babbie 2002).
In the first case, coding by the data source, the researcher prepares a precoded instrument, e.g., a questionnaire with closed-ended questions. This means that the interviewee is exposed to a fixed selection of categories from which he or she can choose as appropriate with regard to the stimulus (for the most part, a question or an object to be rated). Accordingly, this is a two-step process that first requires the researcher to generate relevant and meaningful categories. These categories should be distinct, exhaustive, and adequate with respect to the related construct. In the second step, the interviewee gives the required assessment based on the given alternatives.
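The two-step logic of a precoded instrument can be illustrated with a minimal Python sketch. The item, its five answer categories, the numeric codes, and the missing-value code 9 are assumptions made for illustration only; they are not taken from any particular questionnaire.

```python
# Hypothetical codebook for one closed-ended item.
# Step 1 (researcher): define distinct, exhaustive categories and their codes.
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = 9  # assumed code for no answer or an answer outside the fixed categories


def encode_response(answer, codebook=AGREEMENT_CODES, missing=MISSING):
    """Return the numeric code for a chosen category, or the missing code."""
    if answer is None:
        return missing
    return codebook.get(answer.strip().lower(), missing)


# Step 2 (interviewee): choose one of the given alternatives.
responses = ["Agree", "strongly disagree", None, "Neither "]
codes = [encode_response(r) for r in responses]
print(codes)  # [4, 1, 9, 3]
```

Because the categories are fixed in advance, data entry reduces to a lookup, which is why descriptive results are available so quickly with this kind of instrument.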
The advantages of this procedure are obvious: once data entry is complete, descriptive results of the research are quickly available. But the procedure also raises at least two problems in terms of validity. First, the results of the data analysis depend heavily on the thoroughness of the researcher’s category construction. Second, the difficult task of transforming empirical reality into numeric codes is handed over to the interviewee. Thus, data quality reflects the way the interviewee understands the different answering options and the mental effort he or she puts into the responses. Variability in this respect is a substantial source of error, which is why careful category construction focused on minimal ambiguity is important.
The second case mentioned above, coding by the researcher, refers primarily to the use of so-called open-ended questions in a questionnaire. Here the steps are performed in reverse order. First, the interviewees are given a question without precoded categories and are encouraged to express their thoughts in their own individual style. In a self-administered questionnaire, respondents are required to write down all relevant aspects. During an interview, their answer is either recorded verbatim or in note form, or the interviewer translates the answer immediately into precoded categories that are usually not available to the interviewee. When data collection includes nonstandardized entries, the process of coding is postponed until the data editing phase, when categories are defined in a coding frame and codes are assigned to the answers.
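How a coding frame is applied during data editing can be sketched in a few lines of Python. The question, the categories, the keyword lists, and the residual codes 8 and 9 are invented for illustration. In practice a trained coder judges each answer; the keyword matching below is only a crude stand-in for that judgment.

```python
# Hypothetical coding frame for the open-ended question
# "Why do you read a daily newspaper?" (categories and keywords are invented).
CODING_FRAME = {
    1: ("information", ["news", "inform", "politic"]),
    2: ("entertainment", ["fun", "relax", "entertain"]),
    3: ("habit", ["habit", "routine", "always"]),
}
OTHER = 8      # assumed residual code that keeps the frame exhaustive
NO_ANSWER = 9  # assumed code for empty answers


def code_answer(text):
    """Return the sorted codes whose keywords occur in a verbatim answer."""
    if not text or not text.strip():
        return [NO_ANSWER]
    lowered = text.lower()
    codes = [
        code
        for code, (_label, keywords) in CODING_FRAME.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    # An answer may mention several aspects, so a list of codes is returned.
    return sorted(codes) or [OTHER]


answers = [
    "I want to stay informed about politics",
    "It is a habit, and it helps me relax",
    "",
    "My wife buys it",
]
for answer in answers:
    print(repr(answer), "->", code_answer(answer))
```

Answers that end up in the residual category are a signal that the coding frame may need a further category before the final codes are assigned.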
In this case, the advantages and disadvantages are reversed. After the data have been collected, further coding efforts are necessary, which delays the results. Furthermore, people’s answers to open-ended questions may vary substantially across a sample, a phenomenon that becomes increasingly important when the reference point for answers is not clear. As no categories are administered, the researcher does not obtain comparable results from all units of analysis. This might impose severe restrictions on data analysis. Still, open-ended questions are an important tool since they produce “unbiased” data in the sense that interviewees can reproduce their thoughts without needing to perform any kind of mental transfer beforehand. Furthermore, if too little is known about the subject matter to establish precoded categories in the run-up to the survey, open-ended questions provide an opportunity to collect a broad range of relevant aspects. In this respect, the procedure resembles coding in qualitative research (see below).
While coding by the researcher has so far been applied to the example of survey research, it is relevant to other methods of empirical data collection as well. For instance, in the case of structured or systematic observation research, the schedule for the recording of observations such as interpersonal communication or group communication may include precoded categories as well as space for individual notes to be coded afterwards (for a classic example of coding communication behavior in small groups, see Bales 1950).
Content Analysis
One major methodological approach in communication research is the standardized content analysis of messages in general and of media coverage in particular. For this purpose, particular features of these messages need to be identified and assigned according to a set of relevant categories previously defined by the researcher. Thus, coding is the main procedure when applying this method, which explains why the term is prominent in almost every stage of a content analysis research process (Krippendorff 2004, 125–149). It denotes the primary task of the research assistants, who are called coders, and, moreover, it serves as a general label for the fieldwork in content analysis as a whole.
Procedures in standardized content analysis are quite similar to data tagging in survey research, as mentioned above. Researchers will determine the concepts to be measured and operationalize these concepts by defining categories accordingly. These categories are summarized in a coding manual, together with further instructions and examples. Coders are provided with a coding scheme that is prepared to hold the codes attributed to each category by the coder. Data entry is based on these coding schemes or, if computer-aided coding applies, the coder enters each code into a preformatted data file (Neuendorf 2002).
With regard to the mental processes involved in coding, the term refers to the capability of each coder to identify the relevant characteristics of each unit of analysis, and to assign these coding units correctly to the categories and their respective codes. Thus, the difficult task of translating the content of messages to numerical data is left to the coder. Therefore, his or her performance is considered the main source of error in standardized content analysis. Different techniques are applied to ensure correspondence of coding acts, beginning with intense coder training and ending with checks of coding agreement between and within coders (Riffe et al. 1998, 104–134).
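A check of coding agreement between two coders can be sketched as follows. The text does not name a specific coefficient; simple percent agreement and Cohen’s kappa are used here as two common choices, and the example codes are invented. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each coder’s category frequencies: kappa = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Expected agreement if both coders assigned codes independently,
    # each following his or her own marginal distribution.
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_o - p_e) / (1 - p_e)


# Invented codes assigned by two coders to the same ten units of analysis.
coder_a = [1, 1, 2, 2, 3, 1, 2, 3, 1, 2]
coder_b = [1, 1, 2, 3, 3, 1, 2, 3, 2, 2]

print(percent_agreement(coder_a, coder_b))       # 0.8
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.697
```

The same functions can be applied to two coding passes by one coder over the same material, which corresponds to a check of agreement within coders.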
Besides this standardized type of content analysis, there is a longstanding tradition of so-called qualitative content analysis. This method basically refers to techniques of document analysis. Messages are extracted from these documents by using a more holistic approach since texts (or pictures, films, or voice recordings) are considered complex structures of meaning. Condensation of issues and arguments often follows a more implicit process. Quotes from the materials are presented in the text report, and are important to illustrate the reconstructions of the researcher.
Coding in Qualitative Research
Although qualitative research depends on standardization to a substantially lesser degree, coding still remains an issue. It applies, for example, to grounded theory as a general strategy of qualitative data analysis where coding is considered to be a key process. “It entails reviewing transcripts and/or field notes and giving labels (names) to component parts that seem to be of potential theoretical significance and/or that appear to be particularly salient within the social worlds of those being studied” (Bryman 2004, 402).
The meaning of “coding” in qualitative research (which some authors also refer to as “indexing”) differs from the meaning prevalent in quantitative research, where coding is a technique for transferring information to numeric data, classification, and data handling. In quantitative research, important decisions about what a single code actually means are made in advance, during the procedure of operationalization. In contrast to this rather instrumental sense, coding in qualitative research is a first step toward creating a theory. It represents an ongoing state of potential revision and fluidity, as the resulting codes are not treated as fixed “data.” Instead, qualitative research treats codes merely as potential indicators of concepts, and these indicators remain permanently open to revision: as research proceeds, they are compared with previous codes and modified to achieve the best fit with the concepts.
Following Strauss and Corbin (1998), three types of coding practice can be distinguished, which are often not exclusive but used sequentially in the elaboration of categories. First, open coding implies the least regulated step, when data are examined, compared, and broken down into codes (or “concepts”), which are later grouped and turned into categories. Second, axial coding usually follows open coding and aims at connecting the categories obtained by relinking the codes to contexts such as causes, consequences, or patterns of interaction. Finally, selective coding identifies the core category, which is the one central aspect that serves as the main concept for interpretation. All other categories are organized around the core category and thus integrated into a larger frame that is called the “storyline.”
These steps are not undisputed in research practice. Researchers who fear terminating the exploratory step too early in the process suggest refraining from axial coding and performing only an initial and a focused coding. There are several different approaches to conceptualizing the coding process in qualitative research, but most of them share the general distinction between one phase that stays closer to the original data, and a second phase where codes are transferred to a more abstract level in order to construct meaning about the phenomena under study.
Research practice has developed indications and guidelines for developing codes. Considerations that may lead to relevant codes include the topic of an item of data, its underlying type of event, the persons involved, statements and intentions of these persons, causal attributions, and descriptions of opinions, emotions, and behaviors. Researchers are advised to code promptly after data collection, to keep in mind that each item of data can be coded in more than one way, and, above all, to move constantly back and forth between the different steps of coding. Only from this permanent rereading and reviewing may the relevant connections between concepts and categories emerge. Finally, coding should not be equated with analysis in general: although coding represents an important step within the analysis (particularly because it reduces the amount of data available), interpretation is still the step in which meaning is assigned to codes, the significance of the coded material is assessed, and the findings are reflected on in the light of theoretical considerations and earlier research.
References:
- Babbie, E. (2002). The basics of social research, 2nd edn. Belmont, CA: Wadsworth/Thomson Learning.
- Bales, R. F. (1950). Interaction process analysis: A method for the study of small group interaction. Cambridge, MA: Addison-Wesley.
- Bryman, A. (2004). Social research methods, 2nd edn. Oxford: Oxford University Press.
- Krippendorff, K. (2004). Content analysis: An introduction to its methodology, 2nd edn. Thousand Oaks, CA: Sage.
- Neuendorf, K. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.
- Riffe, D., Lacy, S., & Fico, F. G. (1998). Analyzing media messages: Using quantitative content analysis in research. Mahwah, NJ: Lawrence Erlbaum.
- Strauss, A., & Corbin, J. M. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory, 2nd edn. Thousand Oaks, CA: Sage.
- Sullivan, T. J. (2001). Methods of social research. Fort Worth, TX: Harcourt.