Factor analysis is a data analysis procedure that aims to extract a small number of factors from a large number of items, that is, from measured variables. These few factors should describe and explain the core characteristics of a phenomenon without losing too much information. In this sense, factors capture the essence of a large number of measured items.
Origin And Applications
Charles Spearman pioneered the use of factor analysis and is credited with the invention of this statistical technique. He found that children’s scores on a wide variety of seemingly unrelated items were correlated, and he postulated that a general mental ability underlies and shapes cognitive performance. Factor analysis has therefore often been associated with intelligence research and the development of objective tests for its measurement. Raymond Cattell expanded on Spearman’s idea by using a multifactor theory to explain intelligence. He addressed alternate factors in intellectual development, like motivation and psychology. Cattell’s 16 personality factors (16 PF) theory of personality remains well known. Beyond intelligence research, the technique has been used to find factors in a broad range of domains, including personality psychology, marketing, and sociology.
An important and famous application of factor analysis in communication research is the derivation of general motives for media use from the numerous single reasons people give for using media. Factor analysis revealed that television is mainly used because of general motives such as information seeking, entertainment, (para)social interaction, and orientation. Known as the television viewing motives scale, this measure, developed by Bradley Greenberg and adapted by Alan Rubin, is the most widely used measure of viewing motivation. Other popular communication and/or psychological scales derived from factor analysis are the need for cognition scale, big-five personality inventories, the parasocial interaction scale, the personal involvement inventory, and the source credibility scale (see Rubin et al. 2004 for detailed references). Because they reduce the complexity of extensive phenomena, factor analysis procedures are frequently used in communication research. Of 550 articles published in Human Communication Research, Communication Research, and Communication Monographs between 1990 and 2000, 119 (21.6 percent) used factor analysis procedures (Park et al. 2002).
Forms Of Factor Analysis
Three forms of the technique are commonly distinguished: exploratory factor analysis (EFA), principal component analysis (PCA), and confirmatory factor analysis (CFA). The general goal of EFA (outlined above) is to find the latent structure of observed variables by uncovering factors that influence the measured variables. PCA, by contrast, tries to reduce the measured variables to a smaller set of components that retains as much information from the measured variables as possible with as few components as possible. EFA focuses on the shared variance among the variables by separating common variance from unique variance, whereas PCA does not distinguish between common and unique variance and analyzes the total variance of the variables. PCA is therefore less suitable for revealing latent constructs (i.e., factors), but more appropriate for reducing measured variables to a smaller set of composites (i.e., components). In doing so, PCA preserves as much of the total variance as possible (Park et al. 2002). For example, if a researcher wants to reduce measures such as indicators for education, income, occupation, and qualification to a composite component such as socio-economic status, PCA would be more appropriate than EFA. In short, whereas in EFA the measured variables are a function of the factors, in PCA the components are a function of the measured variables and retain as much information as possible from the original variables (Fabrigar et al. 1999).
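The difference between analyzing total variance and analyzing shared variance can be sketched numerically. The following Python example is an illustration only and is not taken from the sources cited here: it assumes NumPy, uses simulated data with hypothetical loadings, and the function names `simulate_items`, `pca_loadings`, and `paf_loadings` are invented for this sketch. A one-step principal axis factoring stands in for EFA extraction; real EFA software typically iterates the communality estimates or uses maximum likelihood.

```python
import numpy as np

def simulate_items(n=2000, seed=0):
    """Simulate six standardized items driven by two hypothetical latent factors."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, 2))
    # Hypothetical loadings: items 1-3 load on factor 1, items 4-6 on factor 2.
    loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                         [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
    unique_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
    noise = rng.standard_normal((n, 6)) * unique_sd
    return factors @ loadings.T + noise

def top_k_loadings(matrix, k):
    """Loadings from the k largest eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # indices of the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def pca_loadings(R, k):
    """PCA: analyze the full correlation matrix (ones on the diagonal)."""
    return top_k_loadings(R, k)

def paf_loadings(R, k):
    """One-step principal axis factoring: replace the diagonal with squared
    multiple correlations so that only shared variance is analyzed."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    return top_k_loadings(reduced, k)

X = simulate_items()
R = np.corrcoef(X, rowvar=False)
pca = pca_loadings(R, 2)
paf = paf_loadings(R, 2)

# Share of the total variance (number of items) reproduced by each solution.
pca_share = (pca ** 2).sum() / R.shape[0]
paf_share = (paf ** 2).sum() / R.shape[0]
print(f"PCA share of total variance: {pca_share:.2f}")
print(f"PAF share of total variance: {paf_share:.2f}")
```

In this sketch the two components account for a larger share of the total variance than the two factors do, because the factor solution deliberately leaves the unique variance of each item unexplained.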
EFA is usually performed in the early stages of research, providing a technique for consolidating variables and for generating hypotheses about the underlying constructs. While in this exploratory stage we may want to determine a structure among many variables, factor analysis can also be used in a confirmatory fashion (CFA). In that case, we are either testing a hypothesis or seeking statistical evidence that one specific theoretical construct is responsible for the correlation of several theoretically relevant variables. CFA should therefore be used when constructs are measured with multiple items, when the items have a linear relationship to the scale average or total, and when an a priori idea of which items measure which constructs is present (Levine 2005). CFA thus aims mainly at construct validation and requires a more thorough and sophisticated approach than EFA, which has become relatively simple to run thanks to sophisticated and user-friendly data analysis software.
Statistics Behind Factor Analysis
Technically, factor analysis uses the correlation coefficients between all measured items as a description of the similarity of items. For example, the researcher starts out with a very large number of items reflecting a first guess about the items that may eventually prove useful, such as different reasons for watching television. The researcher is now interested in the dimensions that lie “behind” these individual reasons. The statistical procedure tries to maximize correlations within one factor and minimize correlations between the factors. Hence, the factor is the underlying theoretical construct that is able to explain the correlation between the items. The “correlation” between one item and the factor is called the “factor loading” and represents the importance of this variable for the factor. The sum of the squared factor loadings of one variable across all factors is called its “communality.” It indicates to what degree the variance of this variable can be explained by all factors together. An item with a low communality is therefore poorly represented in the model.
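As a small worked illustration of the communality just described, the following Python sketch (assuming NumPy) sums the squared loadings of each item across factors. The item names and loading values are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any of the scales cited here.

```python
import numpy as np

# Hypothetical items and loadings on two factors (rows = items, columns = factors).
items = ["fun", "liking", "news", "habit"]
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.10, 0.90],
    [0.30, 0.20],
])

# Communality: sum of squared loadings of one item across all factors.
communalities = (loadings ** 2).sum(axis=1)

for item, h2 in zip(items, communalities):
    print(f"{item:8s} communality = {h2:.2f}")
```

With these made-up values the communalities are 0.65, 0.53, 0.82, and 0.13. The last item shares little variance with either factor, so it would be a candidate for closer inspection or removal.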
Mathematically, the technique produces several linear combinations of measured variables, each linear combination being a factor. As an example, a high correlation between the items “fun,” “well-being,” “tension,” and “liking” could be subsumed under the factor “entertainment.” While a factor reduces complexity and is able to illustrate phenomena on a higher level of abstraction, the interpretation and naming of a factor are derived from the particular combination of measured variables that correlate highly with it. Thus, the final choice among alternative interpretations and mathematically equivalent solutions depends on the researcher’s assessment of their interpretability and scientific utility.
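Why several solutions can be mathematically equivalent may be easier to see in a short sketch. The Python example below (assuming NumPy; the loading values and the 30-degree angle are hypothetical) rotates a loading matrix with an orthogonal rotation and checks that the correlations reproduced by the factors, as well as the communalities, stay the same even though the individual loadings change.

```python
import numpy as np

# Hypothetical unrotated loadings (rows = items, columns = factors).
loadings = np.array([
    [0.7, 0.3],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
])

# An orthogonal rotation of the two factor axes by an arbitrary angle.
theta = np.deg2rad(30.0)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
rotated = loadings @ rotation

# Correlations implied by the common factors before and after rotation.
reproduced_before = loadings @ loadings.T
reproduced_after = rotated @ rotated.T

print("Same reproduced correlations:",
      np.allclose(reproduced_before, reproduced_after))
print("Same communalities:",
      np.allclose((loadings ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)))
print("Same loadings:", np.allclose(loadings, rotated))
```

Because the data cannot distinguish between such rotated solutions, the choice among them rests on interpretability rather than on fit.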
The validity of the factors is tested in research where predictions are made regarding differences in the behavior of persons who score high or low on a factor. The researcher tries to show that scores on the factors (latent variables) co-vary with the scores on other variables. Nevertheless, in PCA and EFA there is no external criterion against which to test the solution, as there is with group membership in discriminant analysis or logistic regression, for instance (Tabachnick & Fidell 2007).
Factors, and the factor scores of every subject, can be used for further analyses. The advantage is that we no longer have to deal with long lists of variables. The problem, however, is that part of the variance explained by all items is lost when factors are used as variables. Nevertheless, when scores on factors are estimated for each subject, they are in the majority of cases more reliable than scores on individual observed variables.
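One common way to estimate factor scores is the regression method, in which the standardized item scores are weighted by the inverse of the correlation matrix times the loadings. The Python sketch below (assuming NumPy) is a simulation under stated assumptions, not a procedure taken from the sources cited here: the data are simulated from a single hypothetical factor, and the loadings are treated as known, whereas in practice they would be estimated first.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulate one latent factor and four items with hypothetical loadings.
true_factor = rng.standard_normal(n)
loadings = np.array([[0.8], [0.7], [0.6], [0.7]])   # treated as known here
unique_sd = np.sqrt(1.0 - loadings[:, 0] ** 2)
X = np.outer(true_factor, loadings[:, 0]) + rng.standard_normal((n, 4)) * unique_sd

# Regression-method factor scores: Z @ R^{-1} @ L.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(X, rowvar=False)
weights = np.linalg.solve(R, loadings)
scores = Z @ weights

# Compare how closely the scores and the single items track the latent factor.
score_corr = np.corrcoef(scores[:, 0], true_factor)[0, 1]
item_corrs = [np.corrcoef(X[:, j], true_factor)[0, 1] for j in range(4)]

print(f"factor score vs. latent factor: {score_corr:.2f}")
print("single items vs. latent factor:", np.round(item_corrs, 2))
```

In this simulation the estimated factor score tracks the latent factor more closely than any single item does, which is one way of illustrating why factor scores tend to be more dependable than individual observed variables.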
References:
- Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299.
- Levine, T. R. (2005). Confirmatory factor analysis and scale validation in communication research. Communication Research Reports, 22(4), 335–338.
- Park, H. S., Dailey, R., & Lemus, D. (2002). The use of exploratory factor analysis and principal components analysis in communication research. Human Communication Research, 28(4), 562–577.
- Rubin, R. B., Palmgreen, P., & Sypher, H. E. (eds.) (2004). Communication research measures: A sourcebook. New York: Guilford/Lea.
- Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston, MA: Pearson.