Time-series analysis is a statistical procedure for describing the characteristics of a single time series (e.g., a trend) or predicting its future development (forecasting). It can also be used to analyze the impact of an event on a single time series (intervention analysis) and to analyze the correlations between two or more different time series (cross-correlations, transfer-function analysis). The first two applications are usually called the univariate perspective of time-series analysis; the latter two are known as the multivariate perspective of time-series analysis.
Types Of Time-Series Analysis
The univariate perspective can be compared to a descriptive statistical analysis of one variable and its values for all elements of a sample. Yet in time-series analysis the values of a certain variable are ordered consecutively, i.e., in time. For example, one can ask 365 people (n = 365) about the time they spend watching television and get one value for each person, i.e., 365 different values altogether. Or one can ask just one person about the time he or she spends watching television on 365 consecutive days (t = 365). Again, one gets 365 values, but these form a sequence and refer to one person (n = 1). The multivariate perspective of time-series analysis can be compared to calculating correlations (with two variables) or to regression analysis (with more than two variables) as known from descriptive statistics. Yet in time-series analysis all variables are time series. So a correlation refers to a first sequence of values of variable Xt being temporally related to a second sequence of values of variable Zt.
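As a minimal illustration (a Python sketch with simulated, purely hypothetical numbers), the difference between the two designs is simply whether the 365 values form an unordered sample or an ordered sequence:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Cross-sectional: 365 people asked once; the order of values is arbitrary.
viewing_sample = rng.normal(loc=120, scale=30, size=365)  # minutes of TV per person

# Time series: one person (n = 1) asked on 365 consecutive days (t = 365);
# here the order of the values is the whole point.
days = pd.date_range("2023-01-01", periods=365, freq="D")
viewing_series = pd.Series(rng.normal(loc=120, scale=30, size=365), index=days)

print(viewing_sample.mean())    # an order-free summary suffices for the sample
print(viewing_series.head(4))   # the series keeps its time order
```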
Besides this classification according to statistical procedures, time-series analysis has to be categorized according to designs and methods. With respect to design, time-series analysis can be described as a special form of longitudinal design. As in the television-viewing example, asking 365 persons at one point in time would be a cross-sectional study; asking one person on 365 consecutive days is a longitudinal measurement. With respect to methods, a time series can be obtained by applying a survey, a content analysis, or an observation. This can be illustrated by some examples, which also help to explain the term "time series" and related concepts. Data in a time series are ordered sequentially. For instance, one can conduct a content analysis and count the number of newspaper articles about foreign affairs on each day. If there is one article on the first day (t1: z1 = 1), three on the second (t2: z2 = 3), four on the third (t3: z3 = 4), and two on the fourth (t4: z4 = 2), one obtains the time series "daily number of articles on foreign affairs," which has the value sequence {1, 3, 4, 2, . . . }. This time series Zt = {z1, z2, z3, . . . zn} measures or represents daily values of the variable Z. If the time span between the observations is longer (say, a month), one obtains a monthly time series. The span between points of time is called the time lag. The time span between t1 and t1, for example, is lag 0, while that between t4 and t5 is lag +1 and that between t18 and t17 is lag −1. The length of a time series is represented by the number of data points, i.e., time points. If one asks person Z on 100 consecutive days about his or her media use, for instance, the daily time series "media use of person Z" has length T = 100, so the time series Zt has 100 data points.
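The article-count example can be sketched as follows; the shift operation produces the lagged forms Zt−1 and Zt−2 that recur throughout this entry (pandas is assumed here as a convenience, not as part of the original example):

```python
import pandas as pd

# The example series Zt = "daily number of articles on foreign affairs".
Z = pd.Series([1, 3, 4, 2], index=pd.RangeIndex(1, 5, name="t"))

# shift(k) aligns each value z_t with z_{t-k}, producing the lagged series.
Z_lag1 = Z.shift(1)   # z_{t-1}: NaN, 1, 3, 4
Z_lag2 = Z.shift(2)   # z_{t-2}: NaN, NaN, 1, 3

print(pd.DataFrame({"z_t": Z, "z_t-1": Z_lag1, "z_t-2": Z_lag2}))
print("length T =", len(Z))   # the number of data points
```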
ARIMA Models In Univariate Time-Series Analysis
In the social sciences, both describing and forecasting a single time series usually rely on so-called ARIMA models, introduced by George E. Box and Gwilym M. Jenkins. Thus, this type of time-series analysis is also called the Box–Jenkins method. The term ARIMA stands for three different stochastic processes: AR is the abbreviation for autoregressive processes, I for integrated processes, and MA for moving-average processes. Other processes, such as the Wiener process or Markov chains, are not included in the Box–Jenkins method, since most time series in the social sciences can be described sufficiently well by autoregressive, integrated, and moving-average processes or models.
The idea of describing a time series with an ARIMA model can be illustrated by the example of a single person's emotional state. Person Z may be in a good mood on some days and in a bad mood on others. Thus, his or her emotional state develops in the form of a daily time series Zt. It is quite likely that the moods on consecutive days are not totally independent of each other. For instance, person Z's mood on the current day (t0) may depend on his or her mood yesterday (t−1). In this case, one would say that the time series Zt is autocorrelated, with a first-order autoregression (AR = 1). If not only Zt−1 but also the mood from the day before yesterday (Zt−2) has an impact on the present-day mood, then there is also a second-order autoregressive element (AR = 2) in the time series Zt. A model for this time series would be denoted ARIMA (2,0,0). Autoregression can be compared to conventional regression analysis. The only difference is that in time-series analysis the predictor is not another variable but the time series itself in a so-called lagged form. For instance, one predictor for Zt may be Zt−1, another may be Zt−2, and so on. In terms of the above-mentioned example, the current mood of person Z (Zt) can be explained by his or her moods on previous days (Zt−n).
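A hedged sketch of this idea with simulated data: the autoregressive coefficients 0.5 and 0.3 below are illustrative assumptions, and statsmodels is assumed as the estimation tool. The sketch generates a "mood" series with AR = 2 structure and recovers it with an ARIMA (2,0,0) model:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=42)
T = 365
mood = np.zeros(T)
for t in range(2, T):
    # Today's mood depends on yesterday (AR 1) and the day before (AR 2),
    # plus a random shock. The weights 0.5 and 0.3 are assumed values.
    mood[t] = 0.5 * mood[t - 1] + 0.3 * mood[t - 2] + rng.normal()

model = ARIMA(mood, order=(2, 0, 0)).fit()  # an ARIMA (2,0,0) model
print(model.params)  # estimated AR coefficients should be near 0.5 and 0.3
```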
If there were one integrated element in a time series, the notation would be ARIMA (0,1,0). An integrated process is one that becomes stationary only after differencing; its prototype is the random walk, which can be regarded as the limiting case of a first-order autoregressive process whose coefficient equals 1. Next to autoregressive and integrated elements, an empirical time series can have moving-average characteristics, which can be described by MA models. With the example of a person's mood development, moving averages represent random influences on a time series. For instance, a current mood may be due to some event that cannot be predicted. The sequence of such random events is called white noise and is denoted by at; cumulatively summing, i.e., integrating, white noise yields a random walk. Not just current random events, but also random events from yesterday (written as at−1) can have an impact on a time series Zt. This is called a first-order moving average (MA = 1). With a delay of more than one lag (e.g., lag −2), one obtains a second-order moving average (MA = 2). The models for these two random impacts are abbreviated ARIMA (0,0,1) and ARIMA (0,0,2). The crucial difference between autoregressive and moving-average elements in an empirical time series is the quality of the past impact: autoregression stands for the impact of past values Zt−1, Zt−2, . . . Zt−n, while moving average represents the impact of past random events at−1, at−2, . . . at−n.
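Both remaining building blocks can be sketched with a few lines of simulated data (the moving-average weight 0.6 is an arbitrary assumption): integrating white noise yields a random walk, i.e., an ARIMA (0,1,0) process, and differencing recovers the white noise; a first-order moving average combines the current shock at with the previous shock at−1:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
T = 365
a = rng.normal(size=T)        # white noise: uncorrelated random shocks a_t

# Integrated process, ARIMA (0,1,0): a random walk is the cumulative sum
# of white noise; taking first differences recovers the white noise.
walk = np.cumsum(a)           # z_t = z_{t-1} + a_t
diffed = np.diff(walk)        # z_t - z_{t-1} = a_t

# First-order moving average, ARIMA (0,0,1): the current value combines the
# current shock and the previous one (the weight 0.6 is an assumed value).
ma1 = np.empty(T)
ma1[0] = a[0]
for t in range(1, T):
    ma1[t] = a[t] + 0.6 * a[t - 1]

print(diffed[:4], ma1[:4])
```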
Time series in the social sciences can be described by these three models of stochastic processes. This is comparable to regression analysis: a regression model or regression line is a model of the correlation of two variables X and Y. Similarly, a stochastic process, e.g., an autoregressive process, can be conceived as a model for the inherent momentum of an empirical time series Zt. Usually a time series Zt has elements or characteristics of not just one, but two or three, of these stochastic processes. For instance, a time series Zt that can be described by an ARIMA (1,0,1) model has characteristics of a first-order autoregressive as well as of a first-order moving-average process.
The general form of an ARIMA model is written as ARIMA (p, d, q), where p, d, and q represent the order, i.e., the number, of autoregressive, integrated, and moving-average elements of the empirical time series. After having described an empirical time series by an ARIMA model, one obtains the residuals at. As in regression analysis, the residuals should be kept to a minimum. The equation at = Zt − ARIMA (p, d, q) expresses the fact that the empirical observations (the time series) and the expected values (the ARIMA model for the time series) should match "perfectly" in the sense that the residuals form a white noise process. In other words, the residuals at should be a sequence of totally uncorrelated random events.
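A sketch of this residual check, under the same assumptions as above (simulated data, statsmodels as the tool): after fitting the model, the Ljung–Box test can be used to test whether the residuals at are indeed uncorrelated, i.e., white noise:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(seed=3)
T = 365
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * z[t - 1] + rng.normal()   # simulated AR(1) series

fit = ARIMA(z, order=(1, 0, 0)).fit()
residuals = fit.resid                      # a_t = z_t minus the model's expectation

# High p-values indicate no remaining autocorrelation, i.e., white noise.
print(acorr_ljungbox(residuals, lags=[10], return_df=True))
```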
Describing a time series by an ARIMA model is quite similar to the second form of univariate time-series analysis, namely forecasting. Here, the empirical values of a time series are used for predicting future, as yet unknown, values of the time series. One application may be the prediction of unemployment. Both ARIMA modeling and forecasting require a time-series length of at least 30 data points; the more, the better. In both cases one should also look out for seasonal components. For instance, unemployment time series often have spring or winter peaks, which can be described by seasonal ARIMA models.
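A forecasting sketch under stated assumptions: the monthly series below is simulated with a yearly cycle, and the seasonal orders (0,1,1,12) are chosen purely for illustration, not as a recommendation for real unemployment data:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=11)
months = np.arange(120)                          # 10 years of monthly data
seasonal = 5 * np.sin(2 * np.pi * months / 12)   # assumed yearly cycle
y = 100 + seasonal + rng.normal(scale=1.0, size=120)

# A seasonal ARIMA: non-seasonal order (1,0,0), seasonal order (0,1,1)
# with period s = 12 (monthly data with a yearly season).
fit = ARIMA(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12)).fit()
print(fit.forecast(steps=12))                    # predict the next 12 months
```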
Multivariate Methods
Among the multivariate methods in time-series analysis are intervention analysis, cross-correlation, and transfer-function analysis. In intervention analysis, one examines the impact of an event on a single time series Zt. For instance, September 11 certainly had an impact on the number of articles about terrorism, which can be modeled as a time series. Following the attacks, coverage of terrorism increased tremendously and remained at a consistently higher level than before. In intervention analysis the event itself is modeled like a time series (It), but in the form of a dummy code. For instance, 0 ("no event") is assigned to all data points before the event and 1 ("event") is assigned to all other data points. Then a correlation is calculated for the event series It (e.g., September 11) and the time series Zt (e.g., number of articles on terrorism).
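This logic can be sketched as follows (simulated data; the level shift of +8 articles and all other numbers are assumptions): the dummy series It enters the ARIMA model as an exogenous regressor, and its coefficient estimates the intervention effect:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=9)
T, event = 200, 100
I = (np.arange(T) >= event).astype(float)     # I_t: 0 before the event, 1 after

# Simulated article counts with an assumed permanent level shift of +8.
z = 10 + 8 * I + rng.normal(scale=2, size=T)

fit = ARIMA(z, exog=I, order=(1, 0, 0)).fit()
print(fit.params)   # the coefficient on I estimates the intervention effect
```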
In cross-correlation, one examines not the impact of an event, but the influence of, e.g., an assumed independent time series Xt on an assumed dependent time series Zt. In the first step, one usually estimates the ARIMA model for each time series separately. In the second step, one calculates so-called cross-correlations for the residuals obtained in the first step. Cross-correlations can be compared to conventional correlations. Yet one calculates not only "synchronous" correlations between the residuals of the two time series (lag 0), but also "diachronous" correlations (lag ±1, ±2, . . . ). This requires three steps. First, one correlates the residuals of Zt with the lagged residuals of Xt, i.e., with Xt−1, Xt−2, and so on. Second, one correlates the residuals of Xt with the lagged residuals of Zt, i.e., with Zt−1, Zt−2, and so on. Finally, one calculates the synchronous correlations. With this procedure one can detect the temporal, and thus causal, ordering of Xt and Zt. A significant correlation between Xt−2 and Zt, for instance, indicates that Zt is certainly not the predictor of Xt, but that Xt is likely to be a predictor of Zt. Furthermore, one can see that this impact has a delay of two lags (e.g., of two days).
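A sketch of the cross-correlation step (the simulated series below stand in for the residuals obtained from the first, ARIMA-modeling step; the two-lag delay is built into the data by assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=5)
T = 300
x = pd.Series(rng.normal(size=T))                        # residuals of X_t
z = pd.Series(0.8 * x.shift(2) + rng.normal(scale=0.5, size=T))  # X drives Z with a 2-lag delay

for lag in range(0, 4):
    # corr(z_t, x_{t-lag}); lag 0 is the "synchronous" correlation,
    # and the peak at lag 2 reveals the assumed two-lag delay.
    print(lag, round(z.corr(x.shift(lag)), 2))
```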
In transfer-function analysis, not just one time series (Xt), but several time series (e.g., Xt, Yt) serve as predictors for an assumed dependent time series (Zt). Here, an overall model represents the causal logic and time relations between all variables (e.g., Xt, Yt, and Zt). Thus, a transfer-function analysis resembles a multiple regression analysis in many ways.
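A minimal sketch of this idea (simulated data; a real transfer-function analysis would also specify a lag structure for each input series): the two predictors Xt and Yt enter jointly as exogenous regressors for Zt, much like predictors in a multiple regression:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(seed=13)
T = 300
X = rng.normal(size=T)
Y = rng.normal(size=T)
# Z depends on both predictors; the weights 2.0 and 1.0 are assumed values.
Z = 2.0 * X + 1.0 * Y + rng.normal(scale=0.5, size=T)

exog = np.column_stack([X, Y])                 # several predictor series at once
fit = ARIMA(Z, exog=exog, order=(1, 0, 0)).fit()
print(fit.params)   # the coefficients on X and Y recover the assumed weights
```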
References:
- Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis: Forecasting and control, 3rd edn. Englewood Cliffs, NJ: Prentice Hall.
- Franses, P. H. (1998). Time series models for business and economic forecasting. Cambridge: Cambridge University Press.
- Gonzenbach, W. J. (1996). The media, the president, and public opinion: A longitudinal analysis of the drug issue, 1984–1991. Mahwah, NJ: Lawrence Erlbaum.
- McCleary, R., & Hay, R. A. (1980). Applied time series analysis for the social sciences. Beverly Hills, CA: Sage.
- Ostrom, C. W., Jr. (1983). Time-series analysis: Regression techniques, 10th edn. Beverly Hills, CA: Sage.
- Wei, W. W. S. (1990). Time series analysis: Univariate and multivariate methods. Redwood City, CA: Addison-Wesley.