Discriminant Analysis

The main aim of discriminant analysis (also called discriminant function analysis) is to predict the group membership of an object or a person from as small a set of characteristics (predictors) as possible. It is also used to classify elements according to their characteristic properties. So if you know the answers a subject gives to a crucial set of questions and/or the scores that subject achieves on an important set of characteristics, discriminant analysis enables you to predict whether the subject is or will likely be a voter or nonvoter, buyer or nonbuyer, college dropout or graduate, researcher or lecturer, master or servant, winner or loser, television viewer or nonviewer, opinion leader or follower, etc.

For example, a product manager is interested in the critical characteristics of the target group that distinguish the buyers of the product from the nonbuyers. The question is: what makes a person a buyer or a nonbuyer? What are the relevant factors that are responsible for paying money for the product or not? And which of the factors is mainly responsible for the difference; which are more important than others? Do buyers differ from nonbuyers predominantly in their socio-demographic background, their psychological profile, their attitudes, or their personal interests? If the answers a subject might give to the crucial questions are known, the product manager is able to predict the subject’s group membership. In addition, the product manager knows the general probability of classifying a “new” element into the “right” group.

Application In Communication Research

A well-known example of the application of discriminant analysis in communication research is the attempt to differentiate television viewers from nonviewers by means of numerous variables (e.g., Tankard & Harris 1980). A list of more than 200 variables was reduced first to 43 and then to 11 by using discriminant analysis. These 11 variables classified 74.6 percent of the cases in a sample from a national US survey correctly as television viewers or nonviewers. Nonviewing was associated with less satisfaction from family life, greater happiness in general, fewer young children in the household, greater participation in groups and organizations, more time spent in active military service, lower family income, stronger view of self as religious, less frequent attendance at religious services, more frequent socializing with friends outside the neighborhood, and so on.

Another example, taken from the field of political communication, showed that supporters of various political parties in South Africa could be differentiated more easily in terms of their support for internal harmony and equality than in terms of their support for national strength and order (Heaven et al. 1994). Research on opinion leadership showed that, in addition to personal involvement and product familiarity, public individuation was the most important variable in distinguishing opinion leaders from non-leaders. Risk preference, open-mindedness, and mass media exposure, though correlated with opinion leadership, were not found to be important predictors of it (Chan & Misra 1990).

Origin

The statistician Sir Ronald A. Fisher originated the concept of discriminant analysis. It is sometimes called Fisher’s linear discriminant, although Fisher’s original article (1936) described a slightly different discriminant, which does not make some of the assumptions of discriminant analysis, such as normally distributed classes or equal class covariances (see below). Technically speaking, discriminant analysis asks which (linear) combination of properties best discriminates between groups, and which of these independent variables has the largest predictive power, i.e., how much each independent variable contributes to separating the groups. The resulting linear combination of features is often used for dimensionality reduction before later classification.
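The idea of a linear combination that separates two groups can be illustrated with a minimal sketch. It assumes two predictors, equal covariance in both groups, and a cutoff halfway between the projected group means (equal priors and misclassification costs). The function names, the predictors "income" and "interest", and all data values are hypothetical and chosen only for illustration; this is the common textbook form of the two-group linear discriminant, not a reproduction of Fisher's original procedure.

```python
# Two-group linear discriminant with two predictors, standard library only.
# Data, predictor names, and function names are made up for illustration.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    # Sum of outer products of deviations from the group mean (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_discriminant(group_a, group_b):
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_matrix(group_a, mean_a), scatter_matrix(group_b, mean_b)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix (assumes equal group covariances).
    pooled = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # Discriminant weights: inverse pooled covariance times mean difference.
    weights = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    score_a = weights[0] * mean_a[0] + weights[1] * mean_a[1]
    score_b = weights[0] * mean_b[0] + weights[1] * mean_b[1]
    cutoff = (score_a + score_b) / 2  # midpoint of the projected group means
    return weights, cutoff

def classify(x, weights, cutoff):
    score = weights[0] * x[0] + weights[1] * x[1]
    return "buyer" if score > cutoff else "nonbuyer"

# Hypothetical (income, interest) scores for known buyers and nonbuyers.
buyers = [(5.0, 7.0), (6.0, 8.0), (7.0, 6.0), (6.0, 9.0)]
nonbuyers = [(2.0, 3.0), (3.0, 4.0), (4.0, 2.0), (3.0, 5.0)]

weights, cutoff = fit_discriminant(buyers, nonbuyers)
print(classify((6.0, 7.0), weights, cutoff))
print(classify((2.0, 4.0), weights, cutoff))
```

The relative size of the weights is only comparable across predictors when the predictors are measured on similar scales, which is why standardized coefficients are usually inspected when judging predictive power.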

Discriminant analysis is therefore closely related to principal component analysis (PCA) and factor analysis, in that all three look for linear combinations of variables that best explain the data. While discriminant analysis explicitly tries to model the difference between the classes of data, PCA does not take into account any difference in class, and factor analysis builds the dimensions based on differences rather than similarities. Nevertheless, discriminant analysis can be employed as a useful complement to principal component analysis and also to cluster analysis in order to judge the results of these analyses.

Statistical Basis

The statistical technique underlying discriminant analysis is closely related to regression analysis and analysis of variance, which also attempt to express one dependent variable as a linear combination of other measures. However, in the latter two procedures the dependent variable is continuous, while in discriminant analysis it is categorical (i.e., group membership). In contrast to procedures like multivariate analysis of variance (MANOVA), discriminant analysis explains and predicts group membership from two or more metric independent variables. These predictors can be all kinds of measures: answers to questions in a survey, scores in psychological tests, observation data, etc. The choice of predictors is usually made on the basis of theory about which variables provide information about group membership.

MANOVA asks whether group membership is associated with significant mean differences on a combination of dependent variables, so MANOVA and discriminant analysis use similar mathematical algorithms. Discriminant analysis, however, goes a step further and assigns cases to groups using a classification procedure, which is a significant extension of MANOVA. Moreover, the adequacy of this classification can be evaluated. Adequacy reveals how many cases or subjects are classified correctly, that is, for how many cases actual and predicted membership are identical (as in the above-mentioned example, where 74.6 percent of the cases were classified correctly). In addition, discriminant analysis allows the researcher to interpret the pattern of differences among the independent variables that differentiates the groups. Discriminant analysis may thus have a descriptive and a predictive objective. On the one hand, if the researcher is simply interested in a decision rule for classifying cases into groups, the number and meaning of the dimensions are of minor interest. On the other hand, interpreting the results of discriminant analysis in terms of the combination of predictors is usually of high value for research, as shown in the characterization of the “nature” of television nonviewers in the example outlined above.
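Classification adequacy in this sense is simply the share of cases whose predicted group matches their actual group. The following sketch shows one way to compute it, together with a cross-table of actual versus predicted membership. The function names and the ten labeled cases are hypothetical and serve only as an illustration; they are not data from the studies cited above.

```python
from collections import Counter

def classification_table(actual, predicted):
    # Counts of (actual group, predicted group) pairs.
    return Counter(zip(actual, predicted))

def hit_rate(actual, predicted):
    # Proportion of cases whose predicted group equals the actual group.
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical example: ten cases with known and predicted membership.
actual = ["viewer"] * 6 + ["nonviewer"] * 4
predicted = ["viewer"] * 5 + ["nonviewer"] + ["nonviewer"] * 3 + ["viewer"]

table = classification_table(actual, predicted)
rate = hit_rate(actual, predicted)

for (a, p), count in sorted(table.items()):
    print(f"actual={a:10s} predicted={p:10s} n={count}")
print(f"correctly classified: {rate:.1%}")
```

A hit rate computed on the same cases that were used to estimate the discriminant function tends to be optimistic, so it is usually checked on cases that were held back from the estimation.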

Discriminant analysis can involve two groups or more than two. With only two groups, the groups are separated by a single linear combination of the predictors, that is, by one linear discriminant function. With more than two groups, the maximum number of discriminant functions is the number of groups minus 1 (the degrees of freedom) or the number of predictors, whichever is smaller. In general, classification procedures such as discriminant analysis make fewer statistical assumptions (regarding sample size and distribution) than inferential procedures. However, the size of the smallest group should at least exceed the number of predictor variables, and classification works best when requirements on the data are met, such as absence of outliers and homogeneity of variance/covariance matrices. If group sizes are unequal or distributional assumptions are untenable, logistic regression can answer some of the same questions that discriminant analysis answers.
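The two rules of thumb in this paragraph can be written down as a small sketch. Besides the number of groups minus 1, the number of discriminant functions is also limited by the number of predictors, so the smaller of the two values applies. The function names and the group sizes in the usage lines are hypothetical.

```python
def max_discriminant_functions(n_groups, n_predictors):
    # Upper limit: groups minus 1, but never more than the number of predictors.
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)

def smallest_group_is_large_enough(group_sizes, n_predictors):
    # Rule of thumb from the text: the smallest group should contain
    # more cases than there are predictor variables.
    return min(group_sizes) > n_predictors

# Two groups and 11 predictors give a single discriminant function.
print(max_discriminant_functions(2, 11))
# Four groups but only two predictors: limited by the predictors.
print(max_discriminant_functions(4, 2))
# Hypothetical group sizes checked against 11 predictors.
print(smallest_group_is_large_enough([120, 15], 11))
print(smallest_group_is_large_enough([120, 10], 11))
```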

References:

Chan, K. K., & Misra, S. (1990). Characteristics of the opinion leader: A new dimension. Journal of Advertising, 19(3), 53–60.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Heaven, P., Stones, C., Nel, E., Huysamen, G., & Louw, J. (1994). Human values and voting intention in South Africa. British Journal of Social Psychology, 33(2), 223–231.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston: Pearson.
Tankard, J. W., Jr., & Harris, M. C. (1980). A discriminant analysis of television viewers and nonviewers. Journal of Broadcasting, 24, 399–409.