The essence of scientific research is explaining and predicting relationships among variables. Two or more variables co-vary and are related if their values systematically correspond to each other. In other words, as one value increases or decreases, the other value consistently or systematically increases or decreases. For example, researchers might observe that the amount of Internet use increases from younger to older adolescence, leading them to expect a relationship between Internet use and age of adolescents.
As scientists seek to explain phenomena, they employ various empirical measures to express relationships among two or more variables. Correlation is a measure of such relationships. The Pearson product–moment correlation coefficient assesses the magnitude and direction of the linear relationship between two continuous variables, describing how proportional the values of the variables are to each other (StatSoft 2006). A multiple correlation coefficient does this for three or more variables, such as age, education level, and amount of Internet use. Similar tests, such as gamma and phi, exist for relationships among nonlinear, categorical, or rank-order variables.
From a correlation coefficient we might conclude there is a positive and significant relationship between amount of Internet use and age of adolescents. A correlation coefficient ranges in magnitude from 0.0 (no relationship) to 1.0 (a perfect relationship between the variables’ values). The coefficient can be positive (the variables increase or decrease in unison) or negative (as one variable increases, the other decreases).
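To make the computation concrete, here is a minimal sketch in Python (NumPy is an illustrative choice, not part of the sources above), with invented adolescent age and Internet-use scores; np.corrcoef returns the Pearson coefficient just described.

```python
import numpy as np

# Hypothetical data: adolescent age (years) and weekly Internet use (hours).
age = np.array([12, 13, 14, 15, 16, 17, 18])
internet_hours = np.array([4, 6, 5, 8, 9, 11, 12])

# Pearson product-moment correlation between the two variables.
r = np.corrcoef(age, internet_hours)[0, 1]
print(f"r = {r:.2f}")  # positive: Internet use rises with age in this toy sample
```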
Regression And Prediction
Regression is typically used for research designs having one or more continuous independent or predictor variables. Based on correlation, regression moves beyond examining whether a relationship exists between variables to assessing the nature of the relationship (Kerlinger & Pedhazur 1973). Regression analyzes the variability of the criterion or dependent variable based on the information from one or more predictor or independent variables (Pedhazur & Schmelkin 1991), seeking to explain which independent variables best predict the dependent variable. For example, we might try to predict income level from people’s age, experience, and amount of education. Or we might try to predict level of fear from the amount of time people spend watching television and how realistic they feel television content is.
Prediction is the essence of science. Regression analysis seeks to uncover how much one or more independent variables predict the dependent variable. It seeks to explain the dependent variable’s sources of variance, and to answer, “What values in the dependent variable can we expect given certain values of the independent variable(s)?” (Vogt 1993, 192). Good regression models can predict one’s income or one’s level of fear from the predictor variables.
Simple And Multiple Regression
The regression equation involves one or more independent variables. Regression analysis estimates the coefficients of that equation which best predict the value of the dependent variable. The regression equation indicates the nature and closeness of the relationship between the variables, specifically how well we can predict values of the dependent variable by knowing the values of the independent variable(s) (Vogt 1993). The equation is represented by the regression line, which depicts the relationship between the variables. The sum of squares refers to the sum of squared deviations of scores from the mean of a distribution; it is fundamental to regression analysis (StatSoft 2006). The regression line or least-squares line is the line on the graph or scatterplot that yields the lowest sum of squared distances from all data points. We fit our data to the best-fitting straight line based on this least-squares criterion (Blalock 1979).
Simple regression analysis contains one continuous predictor variable. The equation for simple linear regression refers to the regression of Y scores on X scores, or how the dependent variable scores depend on the independent variable scores. The simple regression equation for a design with one predictor variable, X, and one dependent variable, Y, is

Y = a + bX

where X is the independent variable score, Y is the predicted dependent variable score, a is the intercept constant (i.e., where the regression line intercepts the Y axis), and b is the regression coefficient (i.e., the change in Y for a one-unit change in X). The simple linear regression equation seeks to uncover how much an independent variable explains or predicts the dependent variable.
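As an illustration only, the following Python sketch (with invented education and income scores) estimates a and b by the least-squares criterion discussed above; np.polyfit with degree 1 fits the straight line that minimizes the sum of squared residuals.

```python
import numpy as np

# Hypothetical data: X = years of education, Y = annual income (in thousands).
X = np.array([10, 12, 12, 14, 16, 16, 18, 20], dtype=float)
Y = np.array([28, 34, 31, 40, 45, 48, 52, 60], dtype=float)

# Least-squares estimates of the simple regression Y = a + bX.
b, a = np.polyfit(X, Y, deg=1)  # polyfit returns the slope first, then the intercept
print(f"Y_hat = {a:.2f} + {b:.2f}X")

# Predict the dependent variable for a new independent variable score.
print(f"Predicted income at X = 15: {a + b * 15:.1f}")
```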
Multiple regression analysis extends the simple regression design to two or more continuous independent variables. The regression equation for a multiple regression design with three predictor variables, X1, X2, and X3, and one dependent variable, Y, is

Y = a + b1X1 + b2X2 + b3X3

where X1, X2, and X3 are the scores on the three independent variables, Y is the predicted dependent variable score, a is the intercept constant, and b1, b2, and b3 are the unstandardized regression coefficients (used with raw scores). The multiple linear regression equation seeks to uncover how two or more independent variables explain or predict the dependent variable. If the regression coefficients were standardized, they would be represented by β (beta), whereby all variables are standardized to a mean of 0.0 and a standard deviation of 1.0.
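A minimal Python sketch of this equation, assuming invented data for three hypothetical predictors: the unstandardized coefficients come from an ordinary least-squares solve, and the betas follow from rescaling each b by the ratio of the predictor's standard deviation to the criterion's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical predictors, e.g., X1 = age, X2 = education, X3 = TV hours.
X = rng.normal(size=(n, 3))
y = 2.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(size=n)

# Unstandardized coefficients: solve Y = a + b1*X1 + b2*X2 + b3*X3 by least squares.
design = np.column_stack([np.ones(n), X])  # prepend an intercept column
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b = coef[0], coef[1:]
print("intercept a:", round(a, 2), "b1..b3:", np.round(b, 2))

# Standardized coefficients (betas): rescale each b by sd(X_j) / sd(Y).
betas = b * X.std(axis=0, ddof=1) / y.std(ddof=1)
print("beta1..beta3:", np.round(betas, 2))
```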
Based on the size of each regression coefficient, researchers can compare the contribution of each independent variable to predicting the dependent variable. Multiple R indicates the strength of the relationship, R² depicts the proportion of variance explained by the predictor or set of predictors, and F tests the significance of the relationship. If the predictor variables are intercorrelated, such multicollinearity makes it difficult to assess individual predictors’ contributions to the regression equation.
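Continuing the illustrative, invented-data setup, this sketch computes multiple R, R², and the overall F test from a least-squares fit, and prints the predictors' intercorrelation matrix as a rough multicollinearity check; it uses F = (R²/k) / ((1 − R²)/(n − k − 1)) with k predictors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 80, 3

# Hypothetical data for three predictors and one criterion.
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.6, 0.4, 0.2]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef

# R-squared: proportion of criterion variance the predictors explain.
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum(resid ** 2)
r2 = 1 - ss_resid / ss_total
multiple_R = np.sqrt(r2)

# F test of the overall regression, df = (k, n - k - 1).
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)
print(f"R = {multiple_R:.2f}, R^2 = {r2:.2f}, F({k},{n-k-1}) = {F:.1f}, p = {p:.4f}")

# Intercorrelation of predictors: high off-diagonal values warn of multicollinearity.
print(np.round(np.corrcoef(X, rowvar=False), 2))
```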
Multiple regression, then, estimates the separate and collective contributions of two or more independent variables to explaining the dependent variable (Kerlinger & Pedhazur 1973). Multiple regression analysis assesses the relationship between a dependent variable and a set of independent variables, seeking to learn how the continuous independent variables, such as age, level of education, academic performance, and amount of television viewing, explain or predict the dependent variable, such as the amount of Internet use. Or communication researchers might want to learn how, collectively, knowledge, skill, and motivation enhance communication competence, and whether knowledge, skill, or motivation is more instrumental to enhancing communication competence. Once the researchers measure the three predictor variables – knowledge, skill, and motivation – they can assess how the variables, collectively, explain a communicator’s level of competence, and which one, if any, best explains a communicator’s competence. Or, in a typical transaction, a salesperson might want to learn which attribute – price, gas mileage, or reliability – predicts a consumer’s decision to buy an automobile. Once the salesperson gathers the information across many transactions, he or she can learn which attribute or attributes best predict car purchases.
Additional Considerations
Statistical programs allow researchers to enter the predictors into the regression equation using forward, backward, stepwise, or hierarchical techniques. Depending on the objective, a researcher might choose to enter all predictors simultaneously. Forward entry sequentially adds the predictors having the highest correlations with the criterion variable. Backward entry enters all predictors and then removes them one at a time, beginning with the least significant. Using stepwise regression, the computer selects predictors that add incrementally and significantly to the equation, based on the set tolerance criterion. If the researcher’s goal is to test a communication model, he or she would enter the predictor variables in blocks, hierarchically, according to the sequential steps in the model.
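The hierarchical strategy can be sketched as follows, again with invented data: enter block 1 (say, demographics), record R², then add block 2 (say, viewing motives) and inspect the increment in R². The block names and helper function are illustrative, not a standard statistical-package API.

```python
import numpy as np

def r_squared(X_block, y):
    """R^2 from a least-squares fit of y on the given predictor columns."""
    n = len(y)
    design = np.column_stack([np.ones(n), X_block])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

rng = np.random.default_rng(2)
n = 120
demographics = rng.normal(size=(n, 2))  # block 1: e.g., age, education
motives = rng.normal(size=(n, 2))       # block 2: e.g., viewing motives
y = demographics @ [0.3, 0.2] + motives @ [0.5, 0.4] + rng.normal(size=n)

# Hierarchical entry: block 1 first, then blocks 1 + 2; report the R^2 increment.
r2_block1 = r_squared(demographics, y)
r2_block12 = r_squared(np.column_stack([demographics, motives]), y)
print(f"Block 1 R^2 = {r2_block1:.2f}, after Block 2 = {r2_block12:.2f}, "
      f"Delta R^2 = {r2_block12 - r2_block1:.2f}")
```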
We also can expand the relationships examined by regression analysis to include two or more criterion variables. For example, we might examine how knowledge, skill, and motivation predict communication competence and satisfaction. Or we might analyze how amount and type of television viewing predict distrust and fear. Monge (1980) explains the application of multivariate multiple regression to communication research. In addition, techniques such as binary and logistic regression extend regression analysis to discrete, categorical, and other nonlinear variables (Norusis 1999).
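For a binary criterion, a logistic regression sketch might look like the following; scikit-learn and the invented buy/not-buy data are illustrative choices, not part of the sources cited above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 200

# Hypothetical predictors and a binary criterion (e.g., bought a car or not).
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
bought = rng.random(n) < p

# Logistic regression models the log-odds of the binary outcome.
model = LogisticRegression().fit(X, bought)
print("coefficients:", np.round(model.coef_[0], 2))
print("P(buy) for a new case:", model.predict_proba([[1.0, 0.0]])[0, 1].round(2))
```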
Brief Examples
A few brief examples help illustrate the application of regression analysis in communication research. Sypher and Zorn (1986), for example, used stepwise multiple regression in their organizational study and found that, of four communication-related abilities, cognitive differentiation accounted for the most variance in predicting job level and upward mobility. Those with more developed cognitive abilities tended to be promoted to higher levels in organizations than did those with lesser cognitive abilities.
Ohr and Schrott (2001) used regression analysis to examine determinants of political information seeking in a local German election: social expectations to be politically informed; a personal duty to stay politically informed; a desire to express political orientations by voting; and the entertainment aspect of politics. They found that campaign information seeking can be explained reasonably well by these determinants, especially social expectations to be politically informed.
In the media context, Rubin et al. (1985) used hierarchical multiple regression and found news affinity, perceived news realism, and news-viewing motives predicted parasocial interaction with favorite television news personalities. Those who sought information when viewing the news, and felt news content was realistic and important, developed a greater sense of parasocial interaction with newscasters than their counterparts.
Loges (1994) also used hierarchical multiple regression, and found support for the hypothesis that media dependency relations with newspapers, magazines, radio, and television are more intense the more threatening one perceives the social and natural environment to be. Controlling for demographics, Loges found that threat significantly added to the explained variance in dependency.
Using hierarchical regression, Slater (2003) found that gender, sensation seeking, aggression, and frequency of Internet use contributed to explaining the use of violent media content and violent website content. Alienation from school and family partially mediated the effects of sensation seeking and aggression on using violent Internet content.
Path analysis uses a series of regression analyses to test a path model, seeking to explain complex directional relationships among independent and dependent variables. Rubin and McHugh (1987), for example, examined an explanatory model of perceived importance of parasocial relationships, moving from television exposure through interpersonal attraction and parasocial interaction to perceived relationship importance. They found that social attraction and parasocial interaction significantly predicted perceived relationship importance.
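The logic of estimating a path model through successive regressions can be sketched as follows, with invented data loosely patterned on an exposure → attraction → parasocial interaction → importance chain; this illustrates the technique only and is not a re-analysis of Rubin and McHugh's data.

```python
import numpy as np

def ols_betas(X, y):
    """Standardized regression coefficients (path coefficients) from a least-squares fit."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    design = np.column_stack([np.ones(len(ys)), Xs])
    coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return coef[1:]

rng = np.random.default_rng(4)
n = 150
exposure = rng.normal(size=n)                          # hypothetical TV exposure
attraction = 0.4 * exposure + rng.normal(size=n)       # hypothesized mediator
parasocial = 0.3 * exposure + 0.5 * attraction + rng.normal(size=n)
importance = 0.6 * parasocial + 0.2 * attraction + rng.normal(size=n)

# Each endogenous variable is regressed on its hypothesized antecedents;
# the standardized betas serve as the path coefficients.
print("exposure -> attraction:", np.round(ols_betas(exposure[:, None], attraction), 2))
print("exposure, attraction -> parasocial:",
      np.round(ols_betas(np.column_stack([exposure, attraction]), parasocial), 2))
print("attraction, parasocial -> importance:",
      np.round(ols_betas(np.column_stack([attraction, parasocial]), importance), 2))
```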
References:
- Blalock, H. M., Jr. (1979). Social statistics, 2nd edn. New York: McGraw-Hill.
- Kerlinger, F. N., & Pedhazur, E. J. (1973). Multiple regression in behavioral research. New York: Holt, Rinehart and Winston.
- Loges, W. E. (1994). Canaries in the coal mine: Perceptions of threat and media dependency system relations. Communication Research, 21, 5–23.
- Monge, P. R. (1980). Multivariate multiple regression. In P. R. Monge & J. N. Capella (eds.), Multivariate techniques in human communication research. New York: Academic Press, pp. 13–56.
- Norusis, M. J. (1999). SPSS regression models 10.0. Chicago: SPSS.
- Ohr, D., & Schrott, P. R. (2001). Campaigns and information seeking: Evidence from a German state election. European Journal of Communication, 16, 419–449.
- Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
- Rubin, A. M., Perse, E. M., & Powell, R. A. (1985). Loneliness, parasocial interaction, and local television news viewing. Human Communication Research, 12, 155–180.
- Rubin, R. B., & McHugh, M. P. (1987). Development of parasocial interaction relationships. Journal of Broadcasting and Electronic Media, 31, 279–292.
- Slater, M. D. (2003). Alienation, aggression, and sensation seeking as predictors of adolescent use of violent film, computer, and website content. Journal of Communication, 53(1), 105–121.
- StatSoft, Inc. (2006). Electronic statistics textbook. Tulsa, OK: StatSoft. Also at www.statsoft.com/textbook/stathome.html, accessed August 30, 2007.
- Sypher, B. D., & Zorn, T. E., Jr. (1986). Communication-related abilities and upward mobility: A longitudinal investigation. Human Communication Research, 12, 420–431.
- Vogt, W. P. (1993). Dictionary of statistics and methodology. Newbury Park, CA: Sage.