Time-series analysis is a statistical procedure for describing the characteristics of a single time series (e.g., a trend) or predicting its future development (forecasting). It can also be used to analyze the impact of an event on a single time series (intervention analysis) and to analyze the correlations between two or more different time series (cross-correlations, transfer-function analysis). The first two applications are usually called the univariate perspective of time-series analysis; the latter two are known as the multivariate perspective of time-series analysis.
Types Of Time-Series Analysis
The univariate perspective can be compared to a descriptive statistical analysis of one variable and its values for all elements of a sample. Yet in time-series analysis the values of a certain variable are ordered consecutively, i.e., in time. For example, one can ask 365 people (n = 365) about the time they spend watching television and get one value for each person, i.e., 365 different values altogether. Or one can ask just one person about the time he or she spends watching television on 365 consecutive days (t = 365). Again, one gets 365 values, but these form a sequence and refer to one person (n = 1). The multivariate perspective of time-series analysis can be compared to calculating correlations (with two variables) or to regression analysis (with more than two variables) as known from descriptive statistics. Yet in time-series analysis all variables are time series. So a correlation refers to a first sequence of values of variable Xt being temporally related to a second sequence of values of variable Zt.
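As a minimal illustration (a Python sketch with simulated, purely hypothetical numbers), the difference between the two designs is simply whether the 365 values form an unordered sample or an ordered sequence:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Cross-sectional: 365 people asked once; the order of values is arbitrary.
viewing_sample = rng.normal(loc=120, scale=30, size=365)  # minutes of TV per person

# Time series: one person (n = 1) asked on 365 consecutive days (t = 365);
# here the order of the values is the whole point.
days = pd.date_range("2023-01-01", periods=365, freq="D")
viewing_series = pd.Series(rng.normal(loc=120, scale=30, size=365), index=days)

print(viewing_sample.mean())    # an order-free summary suffices for the sample
print(viewing_series.head(4))   # the series keeps its time order
```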
Besides this classification according to statistical procedures, time-series analysis has to be categorized according to designs and methods. With respect to design, time-series analysis can be described as a special form of longitudinal design. As in the television-viewing example, asking 365 persons at one point in time would be a cross-sectional study; asking one person on 365 consecutive days is a longitudinal measurement. With respect to methods, a time series can be obtained by applying a survey, a content analysis, or an observation. This can be illustrated by some examples, which also help to explain the term "time series" and related concepts. Data in a time series are ordered sequentially. For instance, one can conduct a content analysis and count the number of newspaper articles about foreign affairs on each day. If there is one article on the first day (t1: z1 = 1), three on the second (t2: z2 = 3), four on the third (t3: z3 = 4), and two on the fourth (t4: z4 = 2), one obtains the time series "daily number of articles on foreign affairs," which has the value sequence {1, 3, 4, 2, . . . }. This time series Zt = {z1, z2, z3, . . . zn} measures or represents daily values of the variable Z. If the time span between the observations is longer (say, a month), one obtains a monthly time series. The span between points of time is called the time lag. The time span between t1 and t1, for example, is lag 0, while that between t4 and t5 is lag +1 and that between t18 and t17 is lag −1. The length of a time series is represented by the number of data points, i.e., time points. If one asks person Z on 100 consecutive days about his or her media use, for instance, the daily time series "media use of person Z" has length T = 100, so the time series Zt has 100 data points.
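The article-count example can be sketched as follows; the shift operation produces the lagged forms Zt−1 and Zt−2 that recur throughout this entry (pandas is assumed here as a convenience, not as part of the original example):

```python
import pandas as pd

# The example series Zt = "daily number of articles on foreign affairs".
Z = pd.Series([1, 3, 4, 2], index=pd.RangeIndex(1, 5, name="t"))

# shift(k) aligns each value z_t with z_{t-k}, producing the lagged series.
Z_lag1 = Z.shift(1)   # z_{t-1}: NaN, 1, 3, 4
Z_lag2 = Z.shift(2)   # z_{t-2}: NaN, NaN, 1, 3

print(pd.DataFrame({"z_t": Z, "z_t-1": Z_lag1, "z_t-2": Z_lag2}))
print("length T =", len(Z))   # the number of data points
```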
ARIMA Models In Univariate Time-Series Analysis
In the social sciences, both describing and forecasting a single time series usually rely on so-called ARIMA models, introduced by George E. Box and Gwilym M. Jenkins. Thus, this type of time-series analysis is also called the Box–Jenkins method. The term ARIMA stands for three different stochastic processes: AR is the abbreviation for autoregressive processes, I for integrated processes, and MA for moving-average processes. Other processes, such as the Wiener process or Markov chains, are not included in the Box–Jenkins method, since most time series in the social sciences can be described sufficiently well by autoregressive, integrated, and moving-average processes or models.
The idea of describing a time series with an ARIMA model can be illustrated by the example of a single person's emotional state. Person Z may be in a good mood on some days and in a bad mood on others. Thus, his or her emotional state develops in the form of a daily time series Zt. It is quite likely that the moods on consecutive days are not totally independent of each other. For instance, person Z's mood on the current day (t0) may depend on his or her mood yesterday (t−1). In this case, one would say that the time series Zt is autocorrelated, with a first-order autoregression (AR = 1). If not only Zt−1 but also the mood from the day before yesterday (Zt−2) has an impact on the present-day mood, then there is also a second-order autoregressive element (AR = 2) in the time series Zt. A model for this time series would be denoted ARIMA (2,0,0). Autoregression can be compared to conventional regression analysis. The only difference is that in time-series analysis the predictor is not another variable but the time series itself in a so-called lagged form. For instance, one predictor for Zt may be Zt−1, another may be Zt−2, and so on. In terms of the above-mentioned example, the current mood of person Z (Zt) can be explained by his or her moods on previous days (Zt−n).
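A hedged sketch of this idea with simulated data: the autoregressive coefficients 0.5 and 0.3 below are illustrative assumptions, and statsmodels is assumed as the estimation tool. The sketch generates a "mood" series with AR = 2 structure and recovers it with an ARIMA (2,0,0) model:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=42)
T = 365
mood = np.zeros(T)
for t in range(2, T):
    # Today's mood depends on yesterday (AR 1) and the day before (AR 2),
    # plus a random shock. The weights 0.5 and 0.3 are assumed values.
    mood[t] = 0.5 * mood[t - 1] + 0.3 * mood[t - 2] + rng.normal()

model = ARIMA(mood, order=(2, 0, 0)).fit()  # an ARIMA (2,0,0) model
print(model.params)  # estimated AR coefficients should be near 0.5 and 0.3
```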
If there were one integrated element in a time series, the notation would be ARIMA (0,1,0). An integrated process is one that becomes stationary only after differencing; its prototype is the random walk, which can be regarded as the limiting case of a first-order autoregressive process whose coefficient equals 1. Next to autoregressive and integrated elements, an empirical time series can have moving-average characteristics, which can be described by MA models. With the example of a person's mood development, moving averages represent random influences on a time series. For instance, a current mood may be due to some event that cannot be predicted. The sequence of such random events is called white noise and is denoted by at; cumulatively summing, i.e., integrating, white noise yields a random walk. Not just current random events, but also random events from yesterday (written as at−1) can have an impact on a time series Zt. This is called a first-order moving average (MA = 1). With a delay of more than one lag (e.g., lag −2), one obtains a second-order moving average (MA = 2). The models for these two random impacts are abbreviated ARIMA (0,0,1) and ARIMA (0,0,2). The crucial difference between autoregressive and moving-average elements in an empirical time series is the quality of the past impact: autoregression stands for the impact of past values Zt−1, Zt−2, . . . Zt−n, while moving average represents the impact of past random events at−1, at−2, . . . at−n.
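Both remaining building blocks can be sketched with a few lines of simulated data (the moving-average weight 0.6 is an arbitrary assumption): integrating white noise yields a random walk, i.e., an ARIMA (0,1,0) process, and differencing recovers the white noise; a first-order moving average combines the current shock at with the previous shock at−1:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
T = 365
a = rng.normal(size=T)        # white noise: uncorrelated random shocks a_t

# Integrated process, ARIMA (0,1,0): a random walk is the cumulative sum
# of white noise; taking first differences recovers the white noise.
walk = np.cumsum(a)           # z_t = z_{t-1} + a_t
diffed = np.diff(walk)        # z_t - z_{t-1} = a_t

# First-order moving average, ARIMA (0,0,1): the current value combines the
# current shock and the previous one (the weight 0.6 is an assumed value).
ma1 = np.empty(T)
ma1[0] = a[0]
for t in range(1, T):
    ma1[t] = a[t] + 0.6 * a[t - 1]

print(diffed[:4], ma1[:4])
```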
Time series in the social sciences can be described by these three models of stochastic processes. This is comparable to regression analysis: a regression model or regression line is a model of the correlation of two variables X and Y. Similarly, a stochastic process, e.g., an autoregressive process, can be conceived as a model for the inherent momentum of an empirical time series Zt. Usually a time series Zt has elements or characteristics of not just one, but two or three, of these stochastic processes. For instance, a time series Zt that can be described by an ARIMA (1,0,1) model has characteristics of a first-order autoregressive as well as of a first-order moving-average process.
The general form of an ARIMA model is written as ARIMA (p, d, q), where p, d, and q represent the order, i.e., the number, of autoregressive, integrated, and moving-average elements of the empirical time series. After having described an empirical time series by an ARIMA model, one obtains the residuals at. As in regression analysis, the residuals should be kept to a minimum. The equation at = Zt − ARIMA (p, d, q) expresses the fact that the empirical observations (the time series) and the expected values (the ARIMA model for the time series) should match "perfectly" in the sense that the residuals form a white noise process. In other words, the residuals at should be a sequence of totally uncorrelated random events.
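A sketch of this residual check, under the same assumptions as above (simulated data, statsmodels as the tool): after fitting the model, the Ljung–Box test can be used to test whether the residuals at are indeed uncorrelated, i.e., white noise:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(seed=3)
T = 365
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * z[t - 1] + rng.normal()   # simulated AR(1) series

fit = ARIMA(z, order=(1, 0, 0)).fit()
residuals = fit.resid                      # a_t = z_t minus the model's expectation

# High p-values indicate no remaining autocorrelation, i.e., white noise.
print(acorr_ljungbox(residuals, lags=[10], return_df=True))
```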
Describing a time series by an ARIMA model is quite similar to the second form of univariate time-series analysis, namely forecasting. Here, the empirical values of a time series are used for predicting future, as yet unknown, values of the time series. One application may be the prediction of unemployment. Both ARIMA modeling and forecasting require a time-series length of at least 30 data points; the more, the better. In both cases one should also look out for seasonal components. For instance, unemployment time series often have spring or winter peaks, which can be described by seasonal ARIMA models.
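A forecasting sketch under stated assumptions: the monthly series below is simulated with a yearly cycle, and the seasonal orders (0,1,1,12) are chosen purely for illustration, not as a recommendation for real unemployment data:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=11)
months = np.arange(120)                          # 10 years of monthly data
seasonal = 5 * np.sin(2 * np.pi * months / 12)   # assumed yearly cycle
y = 100 + seasonal + rng.normal(scale=1.0, size=120)

# A seasonal ARIMA: non-seasonal order (1,0,0), seasonal order (0,1,1)
# with period s = 12 (monthly data with a yearly season).
fit = ARIMA(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12)).fit()
print(fit.forecast(steps=12))                    # predict the next 12 months
```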
Multivariate Methods
Among the multivariate methods in time-series analysis are intervention analysis, cross-correlation, and transfer-function analysis. In intervention analysis, one examines the impact of an event on a single time series Zt. For instance, September 11 certainly had an impact on the number of articles about terrorism, which can be modeled as a time series. Following the attacks, coverage of terrorism increased tremendously and remained at a consistently higher level than before. In intervention analysis the event itself is modeled like a time series (It), but in the form of a dummy code. For instance, 0 ("no event") is assigned to all data points before the event and 1 ("event") is assigned to all other data points. Then a correlation is calculated for the event series It (e.g., September 11) and the time series Zt (e.g., number of articles on terrorism).
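This logic can be sketched as follows (simulated data; the level shift of +8 articles and all other numbers are assumptions): the dummy series It enters the ARIMA model as an exogenous regressor, and its coefficient estimates the intervention effect:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=9)
T, event = 200, 100
I = (np.arange(T) >= event).astype(float)     # I_t: 0 before the event, 1 after

# Simulated article counts with an assumed permanent level shift of +8.
z = 10 + 8 * I + rng.normal(scale=2, size=T)

fit = ARIMA(z, exog=I, order=(1, 0, 0)).fit()
print(fit.params)   # the coefficient on I estimates the intervention effect
```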
In cross-correlation, one examines not the impact of an event, but the influence of, e.g., an assumed independent time series Xt on an assumed dependent time series Zt. In the first step, one usually estimates the ARIMA model for each time series separately. In the second step, one calculates so-called cross-correlations for the residuals obtained in the first step. Cross-correlations can be compared to conventional correlations. Yet one calculates not only "synchronous" correlations between the residuals of the two time series (lag 0), but also "diachronous" correlations (lag ±1, ±2, . . . ). This requires three steps. First, one correlates the residuals of Zt with the lagged residuals of Xt, i.e., with Xt−1, Xt−2, and so on. Second, one correlates the residuals of Xt with the lagged residuals of Zt, i.e., with Zt−1, Zt−2, and so on. Finally, one calculates the synchronous correlations. With this procedure one can detect the temporal, and thus causal, ordering of Xt and Zt. A significant correlation between Xt−2 and Zt, for instance, indicates that Zt is certainly not the predictor of Xt, but that Xt is likely to be a predictor of Zt. Furthermore, one can see that this impact has a delay of two lags (e.g., of two days).
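A sketch of the cross-correlation step (the simulated series below stand in for the residuals obtained from the first, ARIMA-modeling step; the two-lag delay is built into the data by assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=5)
T = 300
x = pd.Series(rng.normal(size=T))                        # residuals of X_t
z = pd.Series(0.8 * x.shift(2) + rng.normal(scale=0.5, size=T))  # X drives Z with a 2-lag delay

for lag in range(0, 4):
    # corr(z_t, x_{t-lag}); lag 0 is the "synchronous" correlation,
    # and the peak at lag 2 reveals the assumed two-lag delay.
    print(lag, round(z.corr(x.shift(lag)), 2))
```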
In transfer-function analysis, not just one time series (Xt), but several time series (e.g., Xt, Yt) serve as predictors for an assumed dependent time series (Zt). Here, an overall model represents the causal logic and time relations between all variables (e.g., Xt, Yt, and Zt). Thus, a transfer-function analysis resembles a multiple regression analysis in many ways.
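A minimal sketch of this idea (simulated data; a real transfer-function analysis would also specify a lag structure for each input series): the two predictors Xt and Yt enter jointly as exogenous regressors for Zt, much like predictors in a multiple regression:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=13)
T = 300
X = rng.normal(size=T)
Y = rng.normal(size=T)
# Z depends on both predictors; the weights 2.0 and 1.0 are assumed values.
Z = 2.0 * X + 1.0 * Y + rng.normal(scale=0.5, size=T)

exog = np.column_stack([X, Y])                 # several predictor series at once
fit = ARIMA(Z, exog=exog, order=(1, 0, 0)).fit()
print(fit.params)   # the coefficients on X and Y recover the assumed weights
```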
References:
- Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis: Forecasting and control, 3rd edn. Englewood Cliffs, NJ: Prentice Hall.
- Franses, P. H. (1998). Time series models for business and economic forecasting. Cambridge: Cambridge University Press.
- Gonzenbach, W. J. (1996). The media, the president, and public opinion: A longitudinal analysis of the drug issue, 1984–1991. Mahwah, NJ: Lawrence Erlbaum.
- McCleary, R., & Hay, R. A. (1980). Applied time series analysis for the social sciences. Beverly Hills, CA: Sage.
- Ostrom, C. W., Jr. (1983). Time-series analysis: Regression techniques, 10th edn. Beverly Hills, CA: Sage.
- Wei, W. W. S. (1990). Time series analysis: Univariate and multivariate methods. Redwood City, CA: Addison-Wesley.