Meta-Analysis

Meta-analysis is a set of methods and statistical analyses for summarizing the findings of an existing empirical literature. As the name implies, it is a study of studies. It provides a way to conduct a quantitative literature review by cumulating effects across studies. The purpose of a meta-analysis is to ascertain whether the findings from a collection of studies investigating some specific issue lead to a consistent result and, if so, to estimate the magnitude of that finding. If not, it serves to reconcile findings that appear to offer mixed support for a hypothesis. Meta-analysis is also useful in identifying the reasons why findings are inconsistent from study to study and in identifying theoretically important moderators. Meta-analysis will likely play an increasingly important role in making sense of social science research.

The Uses Of Meta-Analysis

The value of meta-analysis is particularly apparent when contrasted with the typical narrative review of sustained research on a topic. Due to the nature of social scientific research, the results of different studies investigating the same question will inevitably vary from study to study. Some of this variability is attributable to sampling error. That is, because studies seldom, if ever, include the entire population, estimates drawn from samples will likely depart from population values by some amount, with greater departures possible with smaller samples. Results can also vary across studies because of methodological artifacts. For example, differences in measurement and design validity, induction strength, and measurement reliability affect results. Finally, results can vary from study to study for substantive reasons. That is, for theoretically meaningful reasons, a finding might be stronger in some populations or contexts than in others. Individual and contextual differences can thus produce differences in results, because no two studies are done with exactly the same participants, in the same location and context, at the same time. Put differently, known or unknown moderators can exist and cause variation in results from study to study. Because of variability attributable to sampling error, methodological artifacts, and unidentified moderators, it can be difficult to make sense of the findings in a literature through a narrative review. Most often, the existing studies on a communication topic yield mixed results, some of which appear to lead to contradictory conclusions, and this makes a coherent account of the findings difficult.

The picture is further clouded by the reliance on null hypothesis significance testing. Analyses show that typical effect sizes are meager in many social science literatures (Richard et al. 2003) and that the statistical power to detect typical effects is low (Cohen 1962). Because of sub-optimal statistical power, true population effects may not always be detected. Given the variance in sample sizes used from study to study, even when consistent effects are observed across a set of studies, some studies would achieve significant results while others would yield nonsignificant findings simply due to the sensitivity of the dichotomous decision rule (significant vs. nonsignificant) to sample size. This makes literatures with consistent results appear inconsistent, and creates the false impression of mixed support. In practice, however, findings actually do vary because of sampling error, methodological artifacts, and substantive moderators, and this variability is further complicated by sub-optimal statistical power. This makes doing head counts of significant and nonsignificant results in the literature a problematic way of assessing the consistency of support a hypothesis has received across studies (Meehl 1978).
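The sensitivity of the significance decision to sample size can be illustrated with a minimal sketch. It assumes a set of hypothetical studies that all observe the same correlation (r = .20) and uses an approximate two-tailed test based on Fisher's z transformation; the function name and the sample sizes are illustrative assumptions, not values from any literature.

```python
import math

def approx_z_test(r, n, z_crit=1.96):
    """Approximate two-tailed test of H0: rho = 0 via Fisher's z transformation."""
    # Fisher z of r divided by its approximate standard error, 1 / sqrt(n - 3)
    z = math.atanh(r) * math.sqrt(n - 3)
    return z, abs(z) > z_crit

# Hypothetical studies that all observe the same effect (r = .20)
# but differ in sample size.
for n in (30, 60, 120, 240):
    z, significant = approx_z_test(0.20, n)
    print(f"n = {n:3d}  z = {z:.2f}  significant at .05: {significant}")
```

With these assumed values, the two smaller studies fall short of the .05 criterion and the two larger studies exceed it, although the observed effect is identical in all four. A head count would score this as two studies for and two against.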

Meta-analysis provides a way to overcome a number of these problems. Because results are cumulated across studies, low statistical power is less of an issue. Meta-analysis focuses attention on effect sizes, and relies less on significance testing. The degree to which sampling error explains study-to-study variability is estimated, and corrections for many methodological artifacts are possible. Substantive moderators can also be tentatively identified.

Practice Of Meta-Analysis

Procedure

So, how is a meta-analysis done? Meta-analysis involves several steps. First, the relevant and usable studies investigating a topic are collected. Then, the findings of each study need to be converted to some common metric so that the results can be cumulated. Relevant study features are also coded. Next, an average effect across studies is calculated, and study-to-study variability is examined. Analyses are also done to see if and how coded study features affect results.

The first step is to gather relevant and usable existing studies, which serve as the data for the meta-analysis. Authors of meta-analyses typically will develop criteria for study inclusion. It is important that all the studies included test the same issue or hypothesis. It does not make sense to average across apples and oranges, so to speak. This does not necessarily mean, however, that all studies must have used the same experimental inductions or measures. So long as there is consistency at the conceptual level, operational differences need not be considered cause for exclusion. Operational differences can be coded to determine later if those differences caused heterogeneity of study results. Studies must also report sufficient information so that an effect size can be calculated. Once the criteria for inclusion are determined, a search method is specified. Typical searches will include those studies already known to the authors, the use of relevant search engines and databases, and examination of reference pages of found studies. The goal is to collect all existing research meeting the inclusion criteria.

Once previous studies have been collected and determined to meet the inclusion criteria, the findings from each study need to be converted to a common metric, usually some unit of “effect size.” The most common metrics used in meta-analysis are d and r, where d is the standardized mean difference, and r is the correlation coefficient. So, for each test of a hypothesis in the literature, an effect size (i.e., d or r) is obtained. If the previous studies report effect sizes, this is straightforward. Many studies do not do so, but a variety of conversion formulas exist (Rosenthal 1991; Rosenthal et al. 2000; Levine & Hullett 2002; Hullett & Levine 2003). For example, if either sufficient descriptive statistics (e.g., means, standard deviations, and cell sizes) or significance tests with degrees of freedom are reported, effect sizes can often be calculated. The effect sizes collected can be corrected for various methodological artifacts, such as measurement error, restriction in range, artificial dichotomization of continuous measures, and deviations from perfect construct validity (Hunter & Schmidt 1990). It should be noted, though, that some scholars argue against those corrections (Rosenthal 1991).
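A few of the widely used conversions can be sketched as follows. This is an illustrative sketch rather than a complete toolkit: the function names are hypothetical, the d-to-r conversion assumes equal group sizes, and the attenuation correction covers only unreliability of measurement, not the other artifacts mentioned above.

```python
import math

def r_from_t(t, df):
    """Correlation implied by an independent-groups t statistic and its df."""
    return math.copysign(math.sqrt(t ** 2 / (t ** 2 + df)), t)

def d_from_means(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def r_from_d(d):
    """Convert d to r; this simple form assumes equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)

def correct_for_attenuation(r, rel_x, rel_y):
    """Correct r for unreliability in both measures (reliabilities in 0-1)."""
    return r / math.sqrt(rel_x * rel_y)

# Hypothetical reported statistics from three imaginary studies
print(round(r_from_t(2.0, 96), 3))                          # t(96) = 2.00
print(round(d_from_means(5.5, 5.0, 1.0, 1.0, 50, 50), 3))   # means, SDs, cell sizes
print(round(correct_for_attenuation(0.20, 0.80, 0.80), 3))  # r with two reliabilities
```

Which conversion applies depends on what a study reports and on its design. F values from multi-factor designs, for example, call for more care than the simple cases shown here (Hullett & Levine 2003).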

Once a set of effects has been collected reflecting the findings in the literature, the findings are cumulated and tested for homogeneity of effects. Findings are cumulated simply by averaging, although the average is usually weighted by study sample size. This produces an across-study average effect, which can be considered an estimate of the population effect. The across-study average can be tested to see whether it is likely different from zero, using confidence intervals calculated around the average. Because across-study average effects are based on much larger and more diverse samples than any one study, they provide a better and more stable picture of the findings than is obtained from individual studies.
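The cumulation step can be sketched under stated assumptions. The example below computes a sample-size-weighted mean correlation and, as one of several possible approaches, a fixed-effect 95% confidence interval based on Fisher's z transformation. The correlations and sample sizes are hypothetical, and the function names are illustrative.

```python
import math

def weighted_mean_r(rs, ns):
    """Average correlation across studies, weighted by sample size."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)

def fisher_z_ci(rs, ns, z_crit=1.96):
    """Fixed-effect confidence interval for the average correlation.

    Each r is transformed to Fisher's z and weighted by n - 3; the
    interval is built in the z metric and transformed back to r.
    """
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return math.tanh(z_bar - z_crit * se), math.tanh(z_bar + z_crit * se)

# Hypothetical effects (r) and sample sizes (n) from four imaginary studies
rs = [0.10, 0.25, 0.30, 0.18]
ns = [40, 120, 80, 200]

mean_r = weighted_mean_r(rs, ns)
lower, upper = fisher_z_ci(rs, ns)
print(f"mean r = {mean_r:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero suggests the average effect is unlikely to be zero. The fixed-effect interval assumes the studies estimate a single population effect, which is the assumption the homogeneity analysis is meant to check.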

In addition to examining the across-study average effect, meta-analysis considers the dispersion of effects; that is, how much the studies vary from one another. Given how much findings vary from study to study and the sample sizes of those studies, the extent to which the different findings might be attributable to sampling error is estimated. If the dispersion of effects is found to be similar to what would be predicted by sampling error alone, this means that all the findings from all the studies appear to be drawn from the same population, and the set of studies is considered homogeneous. When the results are homogeneous, then the average study effect can be meaningfully interpreted as summarizing the findings of the existing research. Heterogeneous findings mean that the results vary substantially more from study to study than can be attributed to sampling error alone. Heterogeneity can be caused by methodological artifacts in some studies but not others that differentially impact results, theoretically important moderators, outliers, reporting errors, and/or the inclusion of some studies that are not assessing the same issue as other studies. The homogeneity/heterogeneity of effects is often tested with chi-square or Q statistics. If heterogeneity is found, the meta-analyst will attempt to resolve it with a moderator search.
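Two common ways of examining dispersion can be sketched as follows, using hypothetical data. The first is a Q statistic computed on Fisher-z-transformed correlations, which is compared with a chi-square distribution with k - 1 degrees of freedom. The second follows the bare-bones logic associated with Hunter and Schmidt (1990), comparing the variance expected from sampling error with the observed variance. The function names are illustrative, and the second function assumes the observed variance is not zero.

```python
import math

def q_statistic(rs, ns):
    """Homogeneity Q on Fisher-z-transformed correlations; returns (Q, df)."""
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return q, len(rs) - 1

def sampling_error_share(rs, ns):
    """Ratio of expected sampling-error variance to observed variance of r."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_observed = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / len(ns)
    var_error = (1 - r_bar ** 2) ** 2 / (mean_n - 1)
    return var_error / var_observed  # values near or above 1 suggest homogeneity

CHI2_CRIT_DF3 = 7.815  # .05 critical value of chi-square with 3 df

# Hypothetical set in which effects differ only modestly
similar = ([0.10, 0.25, 0.30, 0.18], [40, 120, 80, 200])
# Hypothetical set in which effects fall into two clusters
mixed = ([0.05, 0.45, 0.10, 0.50], [150, 150, 150, 150])

for label, (rs, ns) in (("similar", similar), ("mixed", mixed)):
    q, df = q_statistic(rs, ns)
    share = sampling_error_share(rs, ns)
    print(f"{label}: Q({df}) = {q:.2f}, heterogeneous: {q > CHI2_CRIT_DF3}, "
          f"sampling-error share = {share:.2f}")
```

With these assumed values, the first set would be treated as homogeneous and the second as heterogeneous, which would prompt a search for moderators.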

Search For Moderators

When studies are collected, they may be coded for study features. For example, some studies might use students while others might use working adults; some might use self-report measures while others might use open-ended coding. Any identifiable subject, context, or method feature could be coded. Although these features can be coded using a continuous metric (e.g., year of publication), one frequently sees moderators assessed categorically. In the latter case, analyses of the effects of moderators are accomplished simply by sorting effects into groups depending on some coded study feature, then calculating average study effects separately for each group. If the feature makes a difference, then the average effect size will be different under one condition than under another. This is partial evidence of a moderator. The other key indicator of a useful moderator is that the average variance of the effects within each sub-group is substantially less than the variance of all the effect sizes when treated as a whole. Given these two conditions, the sub-groups are tested again for homogeneity. Ideally, all heterogeneity can be resolved by finding the cause of study-to-study variance, meaning that no further moderators are operating. Once a moderator has been identified with meta-analysis, it is good practice to design an experiment to test the moderator with original research.
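A categorical moderator analysis of this kind can be sketched as follows. The studies, the coded feature ("sample"), and the function names are hypothetical. Each sub-group receives its own sample-size-weighted mean effect and its own Q statistic on Fisher-z-transformed correlations.

```python
import math
from collections import defaultdict

def weighted_mean_r(rs, ns):
    """Average correlation weighted by sample size."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)

def q_statistic(rs, ns):
    """Homogeneity Q on Fisher-z-transformed correlations."""
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))

def subgroup_summary(studies, feature):
    """Sort studies by a coded feature and summarize each sub-group."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[feature]].append(study)
    summary = {}
    for level, members in groups.items():
        rs = [s["r"] for s in members]
        ns = [s["n"] for s in members]
        summary[level] = {
            "k": len(members),                   # number of studies
            "mean_r": weighted_mean_r(rs, ns),   # sub-group average effect
            "q": q_statistic(rs, ns),            # within-group dispersion
        }
    return summary

# Hypothetical coded studies; "sample" is the candidate moderator
studies = [
    {"r": 0.05, "n": 150, "sample": "students"},
    {"r": 0.10, "n": 150, "sample": "students"},
    {"r": 0.45, "n": 150, "sample": "adults"},
    {"r": 0.50, "n": 150, "sample": "adults"},
]

summary = subgroup_summary(studies, "sample")
for level, stats in summary.items():
    print(f"{level}: k = {stats['k']}, mean r = {stats['mean_r']:.3f}, "
          f"Q = {stats['q']:.2f}")
```

In this constructed example the sub-group means differ markedly and the within-group Q values are small, which is the pattern described above as evidence of a moderator. Real data are rarely this tidy, and sub-groups with very few studies give unstable estimates.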

Challenges

A number of challenges face meta-analysis. One major challenge is that the results of a meta-analysis are no better than the quality of the studies used. For example, if some common bias were present in all studies of a given topic, then that bias would be reflected in the meta-analysis results and would be undetectable. A second challenge is a publication bias favoring supportive (often, statistically significant) results. If supportive studies are published and nonsignificant findings are not, then the results of meta-analyses will show an upward bias. A third problem arises from having only small numbers of studies within a research domain. Just as any single study derives more credibility from having larger samples, meta-analyses provide more meaningful results when they incorporate larger numbers of studies representing larger numbers of participants. A meta-analysis of, say, five or six studies each with only 100 participants probably cannot be considered as providing a definitive statement about the nature of effects in a population. Finally, there is the question of what to do with heterogeneous effects. If heterogeneity cannot be resolved with moderator analysis, then it is questionable whether average results can be meaningfully interpreted.

References:

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
Hullett, C. R., & Levine, T. R. (2003). The overestimation of effect sizes from F values in meta-analysis: The cause and a solution. Communication Monographs, 70, 52–67.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Levine, T. R., & Hullett, C. R. (2002). Eta-squared, partial eta-squared, and misreporting of effect size in communication research. Human Communication Research, 28, 612–625.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7, 331–363.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge: Cambridge University Press.