Detecting dynamical changes in nonlinear time series using locally linear state‐space models

Interest is growing in methods for predicting and detecting regime shifts—changes in the structure of dynamical processes that cause shifts among alternative stable states. Here, we use locally linear, autoregressive state-space models to statistically identify nonlinear processes that govern the dynamics of time series. We develop both time-varying and threshold models. In time-varying autoregressive models with p time lags, AR(p), and vector autoregressive models for n-dimensional systems of order p = 1, VAR(1), we assume that coefficients vary with time. We can infer an approaching regime shift if the coefficients indicate critical slowing down of the local dynamics of the system. In self-excited threshold models, we assume that the time series is governed by two autoregressive processes; the state variable switches between them when the time series crosses a threshold value. We use the existence of a statistically significant threshold as an indicator of alternative stable states. All models are fit to data using a state-space form that incorporates measurement error, and maximum likelihood estimation allows for statistically testing alternative hypotheses about the processes governing dynamics. Our model-based approach for forecasting regime shifts and identifying alternative stable states overcomes limitations of other common metric-based approaches and is a useful addition to the toolbox of methods for analyzing nonlinear time series.


INTRODUCTION
Ecological dynamics may experience abrupt transitions between alternative stable states, often termed regime shifts (Scheffer et al. 2001, Carpenter 2003. Although regime shifts are difficult to predict, recent work has proposed a series of statistical properties that change in predictable ways before a system shifts to an alternate state, and these properties can be used as generic early warning signals (Scheffer et al. 2009, Clarke andSignorino 2010). Theoretically, most of these signals are manifestations of critical slowing down; the return rate of a system towards one of the stable states slows down as a bifurcation is approached at which the system shifts to the alternative state (Wissel 1984, Strogatz 1994. Generic early warning signals v www.esajournals.org 1 June 2012 v Volume 3(6) v Article 58 include a rise in variance (Carpenter and Brock 2011), a decline in recovery rate after a perturbation (van Nes and Scheffer 2007), an increase in autocorrelation (Held and Kleinen 2004), and a peak in skewness (Guttal and Jayaprakash 2008).
An advantage of using these types of generic metrics as early warning signals is that they are relatively simple to measure. A disadvantage, however, is that they are descriptive; they only describe the increasingly slow dynamics around a stable state that is gradually forced towards a regime shift (Scheffer et al. 2009). This leads to at least four handicaps. First, statistical inference is problematic; it is difficult, for example, to statistically test hypotheses about changes in dynamics, such as whether the dynamics are in fact slowing down. Second, it is difficult to interpret changes in the metrics in cases where there are multiple simultaneous changes (Carpenter and Brock 2011). For example, as a regime shift is approached, there could be changes not only in critical slowing down, but also in the location (mean value) of a stable state, and these two patterns need to be separated to obtain a true measure of the return rate to the stable state. Third, simple metrics provide no way to account for measurement error, and variation due to measurement error can muffle early warning signals (Carpenter and Brock 2010). Finally, identifying regime shifts using metrics often requires long records that are not commonly available for ecological systems. Thus, it is no surprise that so far the best empirical examples of successful early warning metrics come from long paleoclimate records (Dakos et al. 2008) or from experimental work where the signals are compared to a control treatment (Drake andGriffen 2010, Carpenter et al. 2011).
Some of these handicaps have been addressed in part by alternative approaches. For example, null models have been used as a way to estimate significance levels for some metrics, such as autocorrelation measured in paleoclimate time series (Dakos et al. 2008). Conditional heteroscedasticity has been proposed as an alternative metric to the variance and can be associated with probability values . Generalized modeling approaches that incorporate multiple sources of data are suggested as ways to directly estimate return rates in cases where data are scarce (Lade and Gross 2012). Nonparametric regression has been used to fit Drift-Diffusion-Jump models to time series where the underlying dynamics are affected by multiple processes (Carpenter and Brock 2011). Still, these approaches require a predefined model, do not explicitly account for measurement error, or need long records.
To try to overcome some of these handicaps, we developed locally linear state-space models that can detect changes through time in the stochastic dynamics of time series. Although this approach can be used to detect many types of changes in dynamics, we specifically use it to detect either critical slowing down of dynamics approaching a regime shift or ''flickering'' of a state variable between alternative states. Conceptually, the approach is a stochastic analogue to performing linear approximations of nonlinear, deterministic systems to infer the local dynamics around stationary points (May 1974). Even for nonlinear deterministic systems, linear stability analysis is informative about qualitative dynamics and bifurcations. In a similar way, the locally linear stochastic approach that we take can be used to infer qualitative changes in nonlinear stochastic systems. Linear stochastic models can give good approximations to the variance and lagged autocorrelations of nonlinear time series and thereby are useful when insufficient information is available to build more-realistic nonlinear models of a system.
We present two specific formulations of locally linear autoregressive state-space models: (1) time-varying autoregressive state-space models of lag order p, TVARSS( p), and (2) self-excited threshold autoregressive state-space models, SE-TARSS( p). Time-varying AR( p) models approximate the local dynamics of a time series by estimating autoregressive coefficients that are allowed to vary through time. This gives information about the ''local stability'' of the system. In addition to univariate TVARSS( p) models, we also develop time-varying vector autoregressive models of order p ¼ 1, TVVARSS(1), that contain more than one state variable. TVVARSS(1) models can be fit to multiple time series simultaneously, assuming time series interact such that the values of one time series directly affect the values of another. SETARSS( p) models are autoregressive models in which the coefficients switch between different parameter sets, with the switch occurring when the state variable v www.esajournals.org 2 June 2012 v Volume 3(6) v Article 58 crosses a threshold value; they are the state-space extension of self-excited threshold autoregressive models, SETAR( p) (Tong 1990). SETARSS( p) models could be used, for example, to explore the existence of alternative stable states, in which the dynamics within each state are approximated by a separate AR( p) model and the time series flickers between the domains of attraction to the different alternative states.

Autoregressive models
To explain the analyses of the time-varying and threshold AR models, it is useful to begin with a brief exposition of the properties of standard AR models. A standard AR( p) model (Box et al. 1994) is where x(t) is the possibly transformed state variable (e.g., species density), b 0 sets the mean (location) of the time series, b i (i ! 1) are the autoregression coefficients, and e(t) is the environmental variability associated with changes in the state variable; e(t) is assumed to be a Gaussian random variable with mean zero and variance r 2 e . Although linear, the AR( p) model is flexible enough to describe many types of complex dynamics. Even for small values of p, an AR( p) can well-approximate the variance and autocorrelations of even highly nonlinear time series (e.g., Ives and Jansen 1998).
Whereas the coefficient b 0 determines the mean of the time series, the coefficients b i (i ! 1) determine the dynamics around the mean. A useful quantity in describing these dynamics is the inverse of the characteristic root (Box et al. 1994), which we denote k. This quantity is similar to the magnitude of the dominant eigenvalue of the Jacobian matrix computed at a stationary point of a deterministic discrete-time model (Strogatz 1994). Values of k near zero imply that the state variable is drawn rapidly towards the mean b 0 , and as values approach one, the central tendency of the time series diminishes (Ives et al. 2003). When k exceeds one, the process is nonstationary; the variance of the process increases continuously through time. This is analogous to the magnitude of the dominant eigenvalue of the Jacobian matrix exceeding one, in which case the stationary point becomes unstable.
The standard AR( p) assumes that the underlying process generating a time series does not itself change. Specifically, the location of the mean b 0 , and the inverse of the characteristic root k (which is determined by the coefficients b i , i ! 1) are time invariant. Nonetheless, the properties of time-invariant AR(p) models have counterparts in time-varying and threshold autoregression models, and can be used to give insights into the dynamics of time-varying and nonlinear processes.

TVARSS(p) models
The general form of the suite of TVARSS( p) models we consider is The top equation is a standard AR( p) model but with coefficients expressed as functions of time, and the bottom equation allows the autoregression coefficients b i (t) (i ! 0) to vary as random walks, with the rates of the random walks dictated by the variances r 2 i of / i (t). Although in principle the values of the autoregressive coefficients are unbounded (since they follow random walks), the values are constrained by fitting to the data. In addition to the process Eq. 2, we assume that there is a measurement equation in which x*(t) is the observed value of the state variable, and a(t) is a Gaussian random variable with mean zero and variance r 2 a depicting measurement error.
The parameters in this model that must be estimated are the initial (t ¼ 0) values of the coefficients b i , the variances in the state variables r 2 e and r 2 i , and the measurement error variance r 2 a . This time-varying model is conditionally Gaussian; conditional on the Gaussian distribution of state variables at time t À 1, the distributions of x(t) and b i (t) are Gaussian at time t. Therefore, the likelihood of parameter v www.esajournals.org 3 June 2012 v Volume 3(6) v Article 58 estimates can be computed exactly using a Kalman filter (Harvey 1989). This also permits likelihood ratio tests for statistical inference about the parameter estimates and likelihoodbased model selection such as Akaike's Information Criterion, AIC (Akaike 1973). In practice, for realistically short time series (,200 points) and higher-order models ( p . 1), the likelihoods are not unimodal ; therefore, caution is required when numerically computing the maximum likelihood parameter estimates.
In likelihood ratio tests, because the possible values of the variances r 2 are constrained to be positive, the asymptotic values of the loglikelihood ratios computed for the full model and the model reduced by q parameters are distributed by a mixture of v 2 distributions (Self andLiang 1987, Stram andLee 1994). For the case in which the variances r 2 i of / i (t) are assumed to be independent, the distribution of the likelihood ratio is asymptotically given by X q k¼0 q k 2 Àq v 2 k (case 9 in Self and Liang 1987). For example, for testing a single parameter (q ¼ 1) as in the case of a TVARSS(1), the test distribution is a 50:50 mix of v 2 0 and v 2 1 distributions; thus, the P-values given by the constrained likelihood ratio test are one-half the values that would be calculated from a standard likelihood ratio test using v 2 1 . The Kalman filter propagates estimates of both the mean and the variance of the state variables (x(t) and b i (t)) (Harvey 1989). This makes it possible to calculate the time-dependent confidence intervals of k. For the case of TVARSS(1) models, k(t) ¼ b 1 (t), and therefore the distribution of the estimates of k(t) is Gaussian. For TVARSS( p . 1) models, k(t) depends in a nonlinear way on b i (t) (i ! 1), and therefore the distribution of its estimate is not Gaussian. Nonetheless, we can approximate the distribution of the estimate numerically to give confidence intervals; here, we report the 67% confidence intervals that coincide with standard errors (which give the 67% confidence intervals for Gaussian distributions).
Numerous modifications and constraints can be added to the model. For example, as formulated above, the variance parameters r 2 i are unbounded above, so that large estimates that allow rapid changes to the coefficients b i are possible. Instead, an upper limit on r 2 i could be imposed to force the coefficients to vary smoothly through time. Also, the model assumes that there is measurement error (r 2 a . 0) and estimates this simultaneously with the other parameters. If the magnitude of measurement error is known from independent information, then this estimate could be used, thereby alleviating the need to estimate r 2 a . This will be particularly advantageous if the measurement error changes through time or according to the values in the time series.
Two related decisions need to be made: selecting the time step of the time series if it is very long, and whether to detrend the data. The state-space structure of the model allows data points to be unequally spaced by iterating the process equations (e.g., Eq. 2) without updating the Kalman filter with the measurement equations (e.g., Eq. 3) (Harvey 1989). Therefore, for short time series with unequally spaced samples, it is best to formulate a model with unequally spaced sample points rather than select a longer iteration interval to create equally spaced points. For very long time series, it might be advantageous to subsample the data, especially if the measurement error is known to be high, because in this case variation between sequential points will largely be caused by measurement error. As for detrending, this will often be unnecessary, because the mean location of the time series, b 0 (t), is allowed to vary through time. Thus, detrending is performed as part of the estimation of the TVARSS model, with the degree of detrending determined by the maximum likelihood estimate of r 2 0 . Nonetheless, if there are strong trends in the data, then simple linear or quadratic detrending can be used (e.g., fitting a line x(t) ¼ c 0 þ c 1 t and analyzing the residuals). We recommend against using a smoothing algorithm, because it is unclear how this might affect the estimates of the parameters of the TVARSS model. Decisions like these about how to formulate the TVARSS, and how to transform and initially process the data, will generally depend on the data set (Clarke and Signorino 2010).

TVVARSS(1) models
The time-varying vector autoregressive statespace model, TVVARSS(1), is closely related to v www.esajournals.org 4 June 2012 v Volume 3(6) v Article 58 TVARSS( p) models, although multiple time series can be analyzed simultaneously. If X(t) denotes the n-dimensional vector of state variables x i (t) (i ¼ 1, . . . , n), then , and E(t) is the n-dimensional Gaussian random variable with mean zero and covariance matrix R e . The coefficients b i,j (t) are assumed to follow random walks with rates given by the variances The state variables X(t) are assumed to be measured with error We only consider TVVARSS(1) processes, rather than TVVARSS( p . 1). Although there are no technical restrictions limiting the approach to p ¼ 1, for most applications p ¼ 1 is likely to be sufficient. Numerous constraints can further be imposed on the model, such as assuming that environmental variability and/or measurement error are independent between state variables (i.e., R e and/or R a are diagonal). As with the TVARSS( p) model, the Kalman filter can be used to compute the likelihood, allowing maximum likelihood estimation and statistical inference.

SETARSS(p) models
The self-exciting threshold autoregressive state-space model, SETARSS( p), assumes that there are two possible AR( p) models governing dynamics, with the possibility that the state variable switches between them when it crosses a threshold. Specifically, where the coefficients b i and b i 0 (i ¼ 0, . . . , p) denote separate sets of coefficients; this is the SETAR model (Tong 1990) that assumes no measurement error. Here, we assume that there is measurement error that does not differ between the two AR( p) systems and is therefore given by Eq. 3. As with the time-varying models, the SETARSS( p) is conditionally Gaussian, and therefore the Kalman filter can be used to compute its exact likelihood. In addition to the two sets of autoregression parameters b i and b i 0 , parameters to be estimated are the threshold c, the variance of the process error r 2 e , and the variance of the measurement error r 2 a .

Data simulations
To illustrate the methods, we used two stochastic simulation models that produce different types of dynamics. The first, the singlespecies harvesting model of May (1977), produces two alternative stable points over a range of values of the parameter that governs the harvesting rate, a. We use this model to produce two data sets. In the first, we increased the value of a so that one of the two stable points collapses at a fold bifurcation; this provides a test for whether the TVARSS( p) models can detect the impending collapse before it occurs. In the second, we set a at a value giving alternative stable points and added sufficiently high environmental variability that the population jumps between the domains of attraction to the two stable points; this provides a test of the SETARSS( p) models to identify flickering between alternative stable states. The second model we used is the Nicholson-Bailey predator-prey model (Nicholson and Bailey 1935) with prey density dependence that undergoes a Neimark-Sacker bifurcation (discrete-time version of a Hopf bifurcation). Reducing the strength of prey density dependence leads to a deterministic stable limit cycle. The simulated data set consisting of only the density of prey provides a test for TVARSS( p) models to identify a regime shift to greater population cyclicity, and the data set consisting of both prey and predator densities provides a similar test for the TVVARSS(1) model.
May harvesting  of May (1977) describes the discrete-time changes in density of a population N(t) as where a gives the harvesting effort, the carrying capacity K sets the strength of density dependence, r is the intrinsic rate of increase, and H and Q determine how the harvesting rate depends on N(t). We assume that there is environmental variability given by the Gaussian random variable e N (t) that has mean zero and variance r 2 N . In addition, we assume that there is lognormal measurement error, so that the observed density N*(t) is where a N (t) is a Gaussian random variable with mean zero and variance r 2 a . Over a range of values of a, the May harvesting model has alternative stable points. If the population is initially in the domain of attraction to the upper stable point, increasing a will eventually lead to the annihilation of this point and a rapid drop in density to the lower stable point through a fold bifurcation (Fig. 1A). To test  May (1977) shows alternative stable states (green lines) over a range of values of the harvesting rate, a, which in the simulation is increased through time from a ¼ 1 (t ¼ 0) to a ¼ 3 (t ¼ 400). The stochastic time series initially stays within the domain of attraction to the upper stable point, but as time (and a) increases, the upper stable point collapses, and the time series drops to the lower, persisting stable state through a fold bifurcation. Other parameters are r ¼ 0.75, K ¼ 10, P ¼ 3, H ¼ 1, r N ¼ 0.4, and r a ¼ 0.2. (B) The harvesting model of May (1977) shows flickering between the domains of attraction to two stable points. In contrast to (A), in (B) no parameter changes through time, and the switches between domains of attraction are cause by the stochasticity of the system. Green lines give the alternative stable states. Parameter values are r ¼ 0.75, K ¼ 10, a ¼ 2, P ¼ 3, H ¼ 1, r N ¼ 0.2, and r a ¼ 0.1. (C) The Nicholson-Bailey predator-prey model shows cyclic dynamics, and as the carrying capacity of the prey, K, decreases through time from 12 (t ¼ 0) to 30 (t ¼ 400), the cycles become more pronounced. The dominant eigenvalue of the Jacobian matrix around the stable point (red line) increases through time, reaching 1 at about t ¼ 300 corresponding to a Neimark-Sacker bifurcation. With increasing K, the prey density decreases (black line) while that of the predator increases (blue line), with the deterministic stationary point given by the green lines. Other parameter values are r ¼ 0.75, k ¼ 0.1, r N ¼ 0.1, r P ¼ 0.1, and r a ¼ 0.05. v www.esajournals.org 6 June 2012 v Volume 3(6) v Article 58 the ability of TVARSS( p) models to detect this regime shift, we truncated the data when it reached the midpoint between the upper and lower stable points immediately before the collapse of the upper stable point.
When there is a fixed value of a giving two stable points, and when the environmental variance r 2 N is large, the population jumps between the domains of the two stable points (Fig. 1B). The underlying existence of the stable points is suggested because densities linger within each of the domains of attraction for successive time steps, generating flickering between the two stable points. We used this simulated data set to test whether the SE-TARSS( p) model could detect the existence of these alternative stable points.
Nicholson-Bailey predator-prey model.-The Nicholson-Bailey model (1935) is a simple model of predator-prey interactions in which predators search for prey randomly, and those prey that are attacked develop into predators in the following generation. Because there is a one-to-one match between prey killed and predators produced, this model is suited for parasitoid-host systems in which parasitized hosts give rise to the next generation of parasitoids. The population dynamics of prey N(t) and predators P(t) are given by where prey have intrinsic rate of increase r and experience density dependence with carrying capacity K. The proportion of prey that escape predation is exp[ÀkP(t)], the zero term in a Poisson distribution with mean kP(t), and so the proportion killed is 1 À exp[ÀkP(t)]. Both prey and predator experience environmental variability incorporated as independent Gaussian random variables e N (t) and e P (t) that have means zero and variances r 2 N and r 2 P . Furthermore, both prey and predator densities are assumed to be measured with error, so that their observed densities N*(t) and P*(t) are given by N Ã ðtÞ ¼ NðtÞexp½a N ðtÞ P Ã ðtÞ ¼ PðtÞexp½a P ðtÞ: As K increases, prey density dependence weakens, and the population dynamics show greater cyclicity. Increasing K eventually leads to a Neimark-Sacker bifurcation in which the stable point of the deterministic skeleton of Eqs. 9 (i.e., the equations stripped of environmental stochasticity) converts into a stable limit cycle. In the fully stochastic model, however, there is no marked transition in dynamics at this bifurcation ( Fig. 1C), as the environmental variability makes cycles persist before the bifurcation is reached; nonetheless, increasing K leads to greater intrinsically driven cycles. For analyses, we truncated the data at the bifurcation point. We tested TVARSS( p) models using only the prey density and the TVVARSS(1) models with both prey and predator densities.

TVARSS(p) models
We fit TVARSS( p) models with p ¼ 1, 2, and 3 to the data simulated from the May harvesting model (Fig. 1A); the data were log-transformed, normalized, and truncated immediately before the collapse of the stable point. The TVARSS(1) model (Fig. 2) fit the data best according to the AIC (DAIC ¼ 0.04 and 0.47 for p ¼ 2 and 3, respectively). We then fit TVARSS( p) models in which only the estimate of the mean of the AR process, b 0 (t), was allowed to vary through time (r 2 i ¼ 0 for i ! 1). The log-likelihood ratio between the two models with p ¼ 1 was 3.59, implying that the model allowing the coefficient b 1 to vary was statistically significantly better (0.5v 2 0 þ 0.5v 2 1 ¼ 7.18, P , 0.004). A similar test for the difference between the models with and without time variation in b i (i ! 1) was significant for both p ¼ 2 (0.25v 2 0 þ 0.5v 2 1 þ 0.25v 2 2 ¼ 6.56, P , 0.02) and 3 (0.125v 2 0 þ 0.375v 2 1 þ 0.375v 2 2 þ 0.125v 2 3 ¼ 8.40, P , 0.02). We also estimated the inverse of the characteristic root, k, of the TVARSS(1) model and its 67% confidence interval (standard errors) at each point in the time series from the estimates of b 1 (t). As the time v www.esajournals.org 7 June 2012 v Volume 3(6) v Article 58 series approached the collapse of the upper stationary point, the inverse of the characteristic root approached and exceeded 1 (Fig. 2). Values of k approaching 1 imply critical slowing down, since the tendency of x(t) to return towards the mean is lowered. Values of k . 1 imply loss of stationarity, with x(t) being pushed away from the mean value given by b 0 (t).
In this analysis, the confidence interval of the estimate of k includes 1 as the regime shift is closely approached at the end of the time series. Therefore, if we were to ask whether the prediction of an impending regime shift is statistically significant, we would conclude no. This is not surprising, however, because the regime shift has not yet occurred, so the statistical power to identify a loss of stationarity at the final point in the time series is low. At the same time, however, there is strong statistical support for time-varying autoregression coefficients, and the estimate of k from these coefficients increases towards nonstationarity. In a real data set, we would use this as evidence that an imminent regime shift is possible, although without detailed knowledge of the nonlinearities of the system, we could not ascribe a degree of certainty to this prediction. Because the confidence bounds on k are fairly wide, any conclusion about whether the time series has actually become nonstationary must be made cautiously; while the best (i.e., maximum likelihood) estimates from the time series imply nonstationarity, considerable uncertainty in the nonstationarity still exists. Unfortunately, the system will not stay in a nonstationary state very long and, if there is a regime shift, it will occur quickly. Therefore, from a management perspective, if there is a way to mitigate a regime shift, a manager will not have the leisure to wait for statistical significance.
These results are given only for a single, example time series. A full exploration of the method requires detailed analyses of simulated data. We are strong advocates of simulation studies, yet simulation studies are most informative when they are tailored for specific data sets that a researcher has in hand; the researcher can  (1) is simply jb 1 (t)j, which is plotted through time with its 67% confidence interval (dashed lines). Parameter estimates are: initial b 0 ¼ 0.737, initial b 1 ¼ 0.241, r e ¼ 0.558, r a ¼ 0.328, r 1 ¼ 0.039, and r 2 ¼ 0.016, and the log likelihood is À288.85.
v www.esajournals.org 8 June 2012 v Volume 3(6) v Article 58 then produce simulations that mimic the real data to explore the power and possible bias of a given statistical method. To illustrate the type of analyses we think should routinely be performed, we simulated the May harvesting model to generate 500 simulated data sets under the same parameter values as Fig. 2. For each simulated data set, we estimated parameters of a TVARSS(1) under the null hypothesis H 0 that coefficient b 1 did not change (r 2 1 ¼ 0) and under the hypothesis H 1 that b 1 (t) varied through time (Fig. 3). In only 212 of the 500 simulated data sets was H 0 rejected at the a ¼ 0.05 level. Thus, the statistical power of the test for this particular model is not high. In no case was the estimate of b 1 under H 0 greater than 1 (Fig. 3A); for this model, jb 1 (t)j equals the inverse of the characteristic root k, so no data set was identified as nonstationary when analyzed under the null hypothesis. In contrast, when fit under H 1 , 36% of the estimates of k were greater than 1 at the last time point in the data set. For those 212/500 data sets for which H 0 was rejected, 68% had estimates of k . 1 at the last time point, suggesting that the time series was at a regime shift. Even for data sets that did not have the final values of k . 1, for the data sets for which H 0 was rejected, 98% had a value of k at the last time point greater than the value at the first time point, indicating that the time series was becoming less stationary and suggesting an approaching regime shift. Our intent in presenting these simulations is to illustrate the value of simulations when assessing the results of these or any methods of analyzing time series; however, because the power and performance of our methods will depend on the length of data sets and the strength of signal within them, we would not expect the results from our simulations to be representative of other applications.
We also performed simulations to test whether the TVARSS( p) models might give false positives (type I errors), rejecting the null hypothesis (no change through time) when it is in fact true. We simulated 500 data sets under the null hypothesis by fixing a ¼ 1 in Eq. 7, and then fit each data set to a TVARSS(1) model and performed a likelihood ratio test against the null hypothesis that b 1 is time-invariant. The null hypothesis was rejected (the time-varying model was significantly better) in 10/500 data sets, giving a incorrectly low rejection rate of 0.02 at the a-level of 0.05. Thus, in this scenario the TVARSS(1) model did not reject as many data sets as it should have, suggesting low power. In many situations, it is statistically preferable to err on the side of false negatives than false positives, so that we do not infer patterns that in fact are not real. However, in the case of regime shifts, erring on the side of false positives is probably more often preferable; if the consequences of a regime shift are sufficiently great, missing a warning should be avoided. Therefore, we suggest that researchers do not take as dogma an a-level of 0.05 but instead select a (higher) a-level consistent with the risks of a regime shift, and then perform simulations using a model tailored to their specific system to assess possible type I and II errors.
In addition to simulations of the May harvesting model, we investigated whether TVARSS( p) models could identify changes in the dynamics leading up to the Neimark-Sacker bifurcation in the predator-prey model (Fig. 1C) using only the time series for the prey. The TVARSS(2) model which allowed all coefficients b i to vary through time was selected over the models with p ¼ 1 (DAIC ¼ 212) and p ¼ 3 (DAIC ¼ 0.97). The TVARSS(2) allowing the coefficients b i (t) (i ! 1) to vary provided a statistically better fit than the TVARSS(2) model that treated these coefficients as fixed (0.25v 2 0 þ 0.5v 2 1 þ 0.25v 2 2 ¼ 12.98, P , 0.001), implying that the dynamics of the time series changed through time. Performing the same comparison for the pair of TVARSS(3) models similarly rejected the hypothesis that the coefficients b i (i ! 1) were constant (0.125v 2 0 þ 0.375v 2 1 þ 0.375v 2 2 þ 0.125v 2 3 ¼ 13.97, P , 0.001); we did not perform this comparison for the TVARSS(1) models, because their fits to the data were poor, presumably because they could not capture the cyclicity in the data. The timevarying estimate of the inverse of the characteristic root, k, exceeded one as the Neimark-Sacker bifurcation was approached, suggesting loss of stationarity of the AR(2) process (Fig. 4). Although the 67% confidence intervals of the estimates of k were fairly narrow, never did the lower bound exceed 1, and therefore never was it significantly greater than 1 even at the a ¼ 0.33 level.
In addition to the TVARSS( p) models, we also applied the TVVARSS(1) model using both prey and predator time series (Fig. 5). For the fitting, we assumed that both R e and R a were diagonal. Fig. 4. Illustration of a TVARSS(2) model fit to simulated data for prey from the Nicholson-Bailey predator-prey model (Eqs. 9 and 10). The carrying capacity of the host, K, increases through time to cause a Neimark-Sacker bifurcation at about t ¼ 300 (Fig. 1C). (A) Differences between the fitted trajectory (blue line) and the simulated data (black dots) are attributed in the model to measurement error. The green line gives the time-varying estimate of b 0 (t) from TVARSS(2). (B) The inverse of the characteristic root for the AR(2) is computed from b 1 (t) and b 2 (t), which is plotted through time with its 67% confidence interval (dashed lines). Parameter estimates are: initial b 0 ¼ 0.096, initial b 1 ¼ 0.961, initial b 2 ¼À0.429, r e ¼ 0.302, r a ¼ 0.253, r 0 ¼ 0.0094, r 1 ¼ 0.027, and r 2 ¼ 0.026, and the log likelihood is À206.92. The model in which the autoregression coefficients b i,j (t) (i, j ! 1) were allowed to vary fit the two time series much better than the model keeping these coefficients fixed ( P 4 k¼0 ð 4 k Þ2 À4 v 2 k ¼ 57.7, P , 0.0001). The statistical strength of this result was much greater than the case in which only the single time series for the prey is analyzed (Fig. 4); this is not surprising, because the predator data set added more information. As found for the analysis of only the prey time series, the inverse of the characteristic root k increased through time, approaching 1 close to the Neimark-Sacker bifurcation, although the lower bound of the 67% confidence interval never exceeded 1. The estimate of k was very similar between the TVARSS(2) and TVVARSS(1) models (compare Figs. 4B and 5B).

Potential advantages of locally linear state-space models
The approaches that we have presented using locally linear state-space models to anticipate regime shifts and to identify alternative stable states have potential advantages over existing metric-based methods. The existence and magnitude of these advantages will vary according to which methods are compared and what types of data are analyzed. We have not tested whether these advantages are realized in specific situa- Fig. 5. Illustration of a TVVARSS(1) model fit to the two time series simulated from the Nicholson-Bailey predator-prey model (Eqs. 9 and 10). The carrying capacity of the host, K, increases through time to cause a Neimark-Sacker bifurcation (Fig. 1C). (A) Differences between the fitted trajectory (blue line) and the simulated data (black dots) are attributed in the model to measurement error. The green lines give the time-varying estimates of b 1,0 (t) and b 2,0 (t). (B) The inverse of the characteristic root for the TVVARSS(1) is computed from b 1,1 (t), b 1,2 (t), b 2,1 (t) and b 2,2 (t) (Box et al. 1994), which is plotted through time with its 67% confidence interval (dashed lines). Parameter estimates are: initial b 1,0 ¼ 0.0132, initial b 2,0 ¼ À0.198, initial b 1,1 ¼ 0.467, initial b 1,2 ¼ 0.472, initial b 2,1 ¼À0.0827, initial b 2,2 ¼ 0.880, r e,1 ¼ 0.388, r e,2 ¼ 0.314, r a ¼ 0.0862, r 1,0 ¼ 0.0080, r 2,0 ¼ 0.0080, r 1,1 ¼ 0.0202, r 1,2 ¼ 0.0160, r 2,1 ¼ 0.0000, and r 2,2 ¼ 0.0153, and the log likelihood is À255.52.
v www.esajournals.org 11 June 2012 v Volume 3(6) v Article 58 tions, but instead list them as speculations. Many of these potential advantages are shared with other model-based approaches (Clarke and Signorino 2010, Livina et al. 2010, Carpenter and Brock 2011. We strongly recommend that multiple methods be applied to any data set, and that simulations be performed to compare the potential advantages of the methods in the context of the data set. A first potential advantage is that the approaches we have presented are model-based, with the analyses producing a structurally simple linear autoregressive model with statistically fitted parameters. Autoregressive models can avoid challenges and subjective decisions that are inherent in metric-based approaches, such as deciding whether and how to filter or detrend data, and choosing the best rolling window for calculating the metric (Dakos et al. 2008), as both filtering and rolling window size may have a strong influence on whether a metric predicts a regime shift (Lenton 2011, Lenton et al. 2012. For time-varying AR( p) and VAR(1) models (Eqs. 2, 4), detrending is performed de facto by allowing b 0 to change through time (although simple detrending can also be used; see Methods). The rate at which b 0 changes, and hence the effective magnitude of detrending, is given by the maximum likelihood estimates of the parameter r 2 0 and therefore is determined as part of the overall fitting procedure. Thus, the approach gives statistical descriptions of the different properties of the time series (changes in the location b 0 , changes in the dynamics around b 0 given by b i , i ! 1). How these properties are used to infer regime shifts is then left up to the researcher. For example, in some models (e.g., Fig. 1A) the deterministic equilibrium of the process changes more rapidly as a regime shift is approached, and a researcher could decide to use this information to infer a regime shift. This carries the risk, however, that changes in the location are caused by some environmental trend that has no effect on the dynamics and therefore should not be used to infer a regime shift. Our approach has been to use only information about the dynamics around the (possibly temporally varying) mean of the process to infer a regime shift. Second, our model-based approaches uses maximum likelihood estimation, thereby admitting a full range of statistical inference, including hypothesis testing, confidence intervals on coefficients, and model selection. Statistical inference makes it possible to directly compare between alternative hypotheses, such as whether or not Fig. 6. Illustration of a SETARSS(1) model fit to a time series simulated from the harvesting model of May (1977) in which all parameters are fixed and stochasticity causes the population x(t) to switch between domains of attraction to two stable points (Fig. 1B). (A) Differences between the fitted trajectory (blue line) and the simulated data (black dots) are attributed in the model to measurement error. The green lines give the estimates of b 0 (t) and b 0 '(t), and the yellow line gives the threshold c which separates the two AR(1) processes. Parameter estimates are: the coefficients that govern dynamics in a an AR( p) vary through time. For metric-based approaches, the only way to test whether trends in a metric are statistically significant is to perform simulations, comparing the change in the metric computed from the data with the distribution of changes in the metric computed from data simulated under the null hypothesis that the dynamics of the system do not change (i.e., non-bifurcating systems) ).
The difficulty with this approach is that it is often not clear how to formulate an appropriate null model to be used to simulate data, and the chances of rejecting the null hypothesis will depend on decisions made about how the null model is constructed. For example, a natural null model might be a time-invariant AR(1), yet this would be inappropriate for data with cyclic dynamics (e.g., Fig. 1C) for which an AR( p) model would be more appropriate. Furthermore, the presence of measurement error would make an ARMA( p,p) a better null model than an AR( p) (Staudenmayer andBuonaccorsi 2005, Ives et al. 2010). In our model-based approach, determining the appropriate null model is more straightforward and statistically coherent, because we can compare the same model with and without time-varying coefficients, or with and without a threshold between alternative stable states. Third, as a state-space model, measurement error can be included as either user-defined input or, as we have illustrated here, a parameter estimated in the model. Measurement error introduces short-term (non-autocorrelated) variation to time-series data, yet critical slowing down by definition involves autocorrelated variation. In effect, in our models measurement error is smoothed out of the time series during the fitting with the Kalman filter (Harvey 1989). Factoring out measurement error should better reveal the long-term patterns of variation, leading to fewer false negatives in the identification of impending regime shifts. While this in general is likely to be true, measurement error could have more complicated effects, especially if the magnitude of measurement error varied with the values in the time series or showed systematic changes through time; these complications would best be handled by, if possible, obtaining independent estimates of the measurement error and incorporating these directly into the model (i.e., not estimating measurement error in the time-series analysis). Metric-based approaches do not account for measurement error, which is likely to lead to false negatives in the identification of early warning signals (Carpenter and Brock 2010).
Fourth, our approach should have good statistical power, because it incorporates the full information about the time series. This being said, the power of any statistical method will depend on the length and the underlying signal of a time series. As we illustrated (Fig. 3), power analyses can and should be conducted to assess the results of any analyses, and to compare the power of multiple approaches in the context of the data set in hand.

DISCUSSION
In this paper, we illustrated how locally linear state-space models can be used to measure changes in dynamics prior to regime shifts and to identify alternative stable states in time series. We showed that a rise in the dominant eigenvalue of a fitted time-varying autoregressive model, TVARSS( p), can be used as an early warning signal in time series approaching a regime shift through a fold or a Neimark-Sacker bifurcation (Figs. 2,4,5). We also used a threshold autoregressive model, SETARSS( p), to identify alternative states in a time series that flickers between alternative stable points (Fig. 6). These model-based approaches have several potential advantages over metric-based approaches that currently dominate analyses of time series for regime shifts and alternative stable states.
Although the approaches we presented are model-based, they are nonetheless simple, general models of time series. If sufficient information exists about a system, a potentially better strategy is to fit system-specific, nonlinear, biologically explicit models in order to investigate the possibility of alternative states (e.g., Ives et al. 2008, Schooler et al. 2011. This typically requires time series that are rich in dynamical patterns taken from systems that are biologically well-understood. In the absence of such requirements, or when performing a comparative analysis of the dynamics of many time series (e.g., Turchin and Taylor 1992, Murdoch et al. 2002, simpler, non-systemv www.esajournals.org 13 June 2012 v Volume 3(6) v Article 58 specific models are appropriate.
Although locally linear state-space models have advantages over existing metric-based approaches, they are not immune to problems that infect all approaches for detecting early warning signals (Scheffer et al. 2009, Hastings and Wysham 2010, Carpenter and Brock 2011. State-space models cannot distinguish between strong nonlinear changes in dynamics that lead to regime shifts from similarly strong nonlinear changes that do not. Furthermore, they will be difficult to apply to very short or very noisy time series. As there is no universal solution to the challenges of analyzing complex time series, multiple approaches should still be applied when trying to detect regime shifts.
Although we have presented locally linear state-space models as tools to identify regime shifts and alternative stable states, this is only a subset of the problems to which they can be applied. The time-varying models should be capable of identifying any type of change in the dynamical characteristics of time series, and the threshold models could be formulated to identify regions of state space that differ dynamically. Thus, they may prove useful additions to the general arsenal of tools currently available for time-series analyses.