Bounding reproductive rates in state-space models for animal population dynamics

. Time-series models applied in the study of animal population dynamics commonly assume linearity on the log-scale, leading to log-normally distributed rates of increase. While this is often computationally convenient, in particular when performing statistical inference in the presence of observation error, it may lead to unrealistic predictions for animals with a limited reproduction. We introduce a model that includes an explicit bound on the reproductive rate of an individual, and apply this to a population time series of ungulates in Kruger National Park, South Africa. Due to observational error, the year-to-year increases in animal counts occasionally exceeded the maximal reproductive rate of the animals. In such cases, the traditional unbounded model showed a tendency of over ﬁ tting data, leading to unrealistic predictions of the underlying population increase. An observed increase above the maximal reproductive rate also provides empirical con ﬁ rmation that observation error exists. The model with an explicit bound was able to utilize this in order to separate observational error from population process noise, which the traditional unbounded model was unable to do. We conclude that enforcing a strict upper bound on the reproductive rate of an animal population model may lead to more realistic statistical inference than commonly applied log-linear models when an explicit bound on reproductive rate is known. We further conclude that introducing a bound on reproduction can greatly assist in separating observational error and population process noise for slow life histories, or more generally, when the rate of sampling is high compared to reproductive rates.


INTRODUCTION
Analyses of population time series are one of the most fundamental tools in the study of population dynamics and in population viability analyses. Time-series analyses are used in many ecological studies with aims varying from estimating population densities (Moore and Barlow 2011), density dependence (Dennis et al. 2006), effects of environmental variables to extinction risks and population viability (Holmes et al. 2007).
A common approach to modeling population time-series data is to start with a discrete time deterministic population model, such as an exponential growth model, a Ricker model, or a Gompertz model (Br€ annstr€ om and Sumpter 2006). To account for variation, the deterministic model is converted to a stochastic model by incorporating noise often called process error (Dennis et al. 1991), representing environmental or demographic variation (Engen et al. 1998). Environmental process error is, with a few exceptions, incorporated by adding a Gaussian error term to the log of the annual growth rate yielding a lognormal distribution for the process error at the original scale (Dennis et al. 1991, Clark andBjørnstad 2004). Demographic process error is typically incorporated either by adding a Gaussian error term to the log annual growth rate, scaled by the square root of population size (Saether et al. 2006), or using a Poisson or binomial distribution for the number of births and survival events. Environmental variation is also modeled by introducing environmental covariates such as rain, climate, or other factors thought to affect population growth rates (Stenseth et al. 2002, Knape andde Valpine 2011). This is often done in the same way as for process error by adding the covariates to the log annual growth rate, although the effects of environmental covariates on other parameters of population models have sometimes been considered (Stenseth et al. 2002(Stenseth et al. , M ansson et al. 2007.
A consequence of assuming a log-normal distribution for the process error and adding covariates at the scale of log-transformed annual growth rates is that some of the annual growth rates suggested by a fitted population model can be unrealistically large, even if the estimated value of the average growth rate is realistic. The log-normal distribution is right-skewed and heavy-tailed, implying that occasional large deviations from the mean growth rate are likely to occur in the annual realized growth rates, with potential implications for population projections, viability analyses, and inferences about parameters of population models. This property of the log-normal distribution becomes more pronounced with increased variance in the process error.
Estimating the magnitude of process error from time-series data is non-trivial since observational errors also are present. Accounting for observational errors is vital for many types of inference from time-series data such as estimating density dependence and population viability (Freckleton et al. 2006, Holmes et al. 2007). Statespace models incorporate both sources of errors and are often used with the aim of separating between them and to try to filter out observation error from the inference about population dynamics (Millar andMeyer 2000, de Valpine andHastings 2002). In practice, it is however often difficult to achieve a good separation (Dennis et al. 2006, Knape 2008. For some organisms, such as many large mammals, there can be strong biological information about to what extent a population can grow during a limited period of time, but this information is rarely used in analyses of data. Some previous studies have explored incorporating prior knowledge in the intrinsic growth rate parameter of population models, that is, the expected mean growth rate on the log-scale when the population is small (Delean et al. 2013, Lebreton andGimenez 2013), but those models may still predict unrealistic annual growth rates. Here, we suggest a simple modification of the Gompertz state-space model that makes use of biological information to place a bound on annual realized growth rates rather than on the intrinsic growth rate. The effect of bounding growth rates on forecasts and parameter estimates, and specifically on the separation between process and observation error, is demonstrated in applications to data on ungulate populations in the Kruger National Park (Redfern and Viljoen 2002).

POPULATION MODELS
A state-space model has two components: a model for the hidden state, in our context an index of the unobserved population abundance; and a model for the observation process. Denoting abundance at time t by N t , a popular model in discrete time is the stochastic Gompertz model. Here (1) t = 1, . . . , T, for a process z t of environmental variables and Gaussian noise with e t zero mean and variance r 2 . On the log-scale, X t ¼ logN t then follows an autoregressive model with covariates (ARMAX) which is computationally convenient. Since we can rarely monitor N t directly, it is commonly assumed that the sampling method results in an index Y t , often the logarithm of animal counts, related to logabundance as Y t ¼ X t þ e t through a Gaussian sampling error e t with mean zero and variance s 2 . Some organisms, such as large mammals, often have a sharp upper limit in their ❖ www.esajournals.org reproductive rates. For example, if fertile females in a closed population produce at most one juvenile per season, k t ¼ N tþ1 =N t is clearly bounded from above by k max ¼ 1 þ p, where p is the population proportion of fertile females. Such a constraint is difficult to incorporate in a model like (1) due to the nature of the exponential function. Failure to do so may lead to unrealistic predictions of N t .
In order to investigate the effect on statistical inference of incorporating a known upper limit, we construct an alternative model where the exponential function is replaced by a function that is bounded from above by k max . A convenient choice here is the logistic function, implying that the logit rather than the log links the rate k t to its linear form. This approach of transformation from a bounded interval to the real line has a long history in the statistical literature (Dyke and Patterson 1952, Naylor 1963, Atchison and Shen 1980, in particular for data that arise as proportions.
The logistic function, logistic , is the sigmoid function decreasing to zero as x approaches À1 and increasing to L as x approaches 1. Here, the parameter b > 0 controls the steepness of the curve and a the location of its midpoint. Its use in this paper should not be confused with its rather different and traditional appearance in population ecology through Verhulst's logistic model of population growth. In Verhulst's model, it forms a bound on a population's size through carrying capacity L ¼ K. Our aim is instead to implement a bound L ¼ k max on the population's reproductive rate. In agreement with (1), we do so by simply replacing the exponential with the logistic, where e t is again Gaussian noise with zero mean and variance r 2 . Since our main aim in this paper is to compare the models defined in (1) and (2), we use a particular parametrization of the logistic function in (2). Rather than the standard choice of a ¼ 0 and Þ . This parametrization ensures that the exponential and logistic functions have the same value and first derivative at x ¼ 0, making parameters b 0 ; b 1 ; b 2 ; r 2 À Á comparable across models. It also ensures that the carrying capacities of the two models are the same. When refering to the logistic function in this paper, we hence refer to the version defined by It is clear from Fig. 1 that within a range of a 15% rise or decline, which should cover all but the most extreme fluctuations for most populations, the rate functions are very similar when k max ¼ 1:5. As k max increases, the logistic function parameterized as in (3) will approach the exponential function so that the Gompertz model is recovered in the limit of model (2) when the remaining parameters are held fixed.
In the following, we will refer to model (1) as the exponential rate model and model (2), with the logistic function parameterized as in (3), as the logistic rate model.

APPLICATION
Our approach is illustrated using data on ungulates in Kruger National Park, previously analyzed using exponential rate models by Ogutu andOwen-Smith (2003) andM ansson et al. (2007). Data consist of animal counts in four regions of the park (labeled South, Central, North, and FarNorth) from 1977 to 1997, and details of the study can be found in Redfern and Viljoen (2002) and Ogutu and Owen-Smith (2003).
We have analyzed seven of the species covered by the census: waterbuck (Kobus ellipsiprymnus), giraffe (Giraffa camelopardalis), Burchell's zebra (Equus burchelli), greater kudu (Tragelaphus strepsiceros), blue wildebeest (Connochaetes taurinus), impala (Aepyceros melampus), and sable antelope (Hippotragus niger). In this paper, we report results for the giraffe and the waterbuck as examples of species with high and low detection probabilities, respectively. Redfern and Viljoen (2002) report a detection probability of~50% for the waterbuck. Results for the other species are found in Appendix S1. Both Ogutu andOwen-Smith (2003) andM ansson et al. (2007) find that dry-season rainfall is the main driving force of environmental process variation; hence, we include this as our environmental variable z t in models (1) and (2).
Regional census totals are shown in Fig. 2 together with dry-season rainfall. The waterbuck in particular show a decreasing trend in response to the lower precipitation from 1988 onward.

Statistical model
An advantage of modeling the reproductive rate using an exponential function is that X t ¼ log N t ð Þ follows a Gaussian autoregressive process which allows the use of standard tools, for example, the Kalman filter, for computing the likelihood. We choose instead to follow a Bayesian approach, fitting the models using Markov chain Monte Carlo (MCMC) methods implemented in JAGS (Plummer 2003) run through the package rjags (Plummer 2016) within the R computing environment (R Development Core Team 2016). The Bayesian approach has the advantage of, given a suitable choice of prior parameter distributions, providing a unified way of presenting probabilistic measures of uncertainty through posterior distributions. This serves our purpose well, since our main aim is to illustrate how the use of an explicit upper bound on rates of increase can influence uncertainty about other parts of the process.
We model populations at the regional level, where different species develop independently of each other conditionally on the environmental variable. For a particular species, we further assume independence across regions conditionally on the values of species-specific parameters b 0 ; b 1 ; b 2 ; r; s ð Þ . In summary, if we denote the logcount of a given species in region i at time t by y it , we assume that where e it $ N 0; s 2 À Á independently over indices and for the Gompertz or exponential rate model for the proposed logistic rate model; with z it denoting dry season rainfall in region i year t, e it $ Nð0; r 2 Þ independently over indices, and finally the logistic function parametrized as in (3) given a known upper bound k max on the reproductive rate of the species. All parameters are given vague prior distributions (see Appendix S1). As a further simplification, we start each population process at its first observed value, N it i0 ¼ y it i0 . Since we are mainly interested in comparing the exponential and logistic rate models, we have kept the remaining model details as simple as possible while still including a continuous covariate represented by dry-season rainfall. M ansson et al. (2007) consider the same model structure as our exponential rate model, but with a wider range of covariates and higher order time lags of density dependence. Also Ogutu and Owen-Smith (2003) try a wider range of covariates, but they only model park totals and their models do not include a measurement error.

Bounds on reproductive rates
A true upper bound on reproduction depends, among other factors, on proportion of reproductive females, litter sizes, gestation periods, and juvenile survival. With the exception of the giraffe, which has a gestation period of 15 months, reproduction for the species under study is ❖ www.esajournals.org mainly seasonal with only rare occasions of twins. We base our maximal reproductive rates on summary population structure statistics in Owen-Smith and Mason (2005). They report average number of juveniles (p j ), yearlings (p y ), and adult males (p m ) per adult female across the full census. This is used to estimate proportion of adult females as p ¼ 1= 1 þ p j þ p y þ p m À Á , and set k max ¼ 1 þ p. For the giraffe, we assume the number of subadults to equal three times the number of two-year-olds, since data are only available on the latter. Further, the maximal rate is corrected for the giraffe's long gestation period as k max ¼ 1þ p Â 12=15. Since for all species under study, estimated upper bounds range between 1.4 and 1.5, we finally decide to use a common value of k max ¼ 1:5.

RESULTS
Posterior densities for all species and parameters are available in Appendix S1: Tables S1-S7 and Figs. S1-S14, and we here focus on our key findings. In general, with the exception of r and s, posterior parameter distributions were very similar across models.

Separating process noise and observation error
A key challenge in state-space modeling lies in separating process noise and observation error. As noted by, for example, Dennis et al. (2006), the likelihood function often has local maxima along r ¼ 0 and s ¼ 0, and profile likelihoods for r; s ð Þ tend to be ridge-shaped due to the difficulties in separating the two contributions to variation. For the giraffe, this poses no problem and both approaches give similar posterior densities for r; s ð Þ, clearly bounded away from zero (see Appendix S1: Fig. S2). For the waterbuck, on the other hand, the log-normal model shows a ridgeshaped density and fails to confirm that observation error exists (Fig. 3). This is also observed for the kudu (Appendix S1: Fig. S4), blue wildebeest (Appendix S1: Fig. S5), and to a lesser extent, Burchell's zebra (Appendix S1: Fig. S3).

Adapting to rapid changes in data
In 1984, 301 waterbuck were counted in the FarNorth region. The next year, almost twice as many, 599, were counted. This is far above the maximal reproduction, and given that also the other regions increase over the period, much of the increase could be attributed to sampling error rather than migration. As noted by Redfern and Viljoen (2002), detection probabilities can differ greatly between years due to variations in vegetation cover. Fig. 4a shows the posterior distribution for the total population index under the two models; here, the logistic rate model predicts a smoother curve, while the exponential rate model seems to overfit data resulting in an unrealistic growth rate in 1984-1985. For the giraffe, the results from the two models are difficult to distinguish (Fig. 4b). In Fig. 5, the corresponding posterior densities for the rates of increase of the waterbuck population from 1984 to 1985 and 1985 to 1986 are plotted; here, the former shows a substantial posterior probability mass on values above 1.5 for the exponential rate model.

Response to rainfall
The log-linear form of the exponential model makes it sensitive to extreme values in covariates. Taking rainfall as an example, this will often be Year Giraffe abundance b Fig. 4. Posterior median Central region population index for the waterbuck (a) and giraffe (b) together with 50 and 90 % credible intervals for exponential rate model (left frame) and logistic rate model (right frame). Actual counts superimposed as black dots. beneficial for population growth up to a point of saturation, after which additional rain has little effect. Fig. 6 shows the posterior distribution of the rate function k as a function of dry-season rainfall. In the data used for fitting the model, annual precipitation does not exceed 200 mm. In 1973 however, 279 mm was recorded in the South region. This illustrates the dangers of extrapolation in the exponential rate model, leading to an 80% posterior predictive probability of exceeding the maximal rate 1.5.

Prediction
Predicting future population sizes is one of the main applications of time-series models in ecology. Failure to take upper bounds on reproduction into account may lead to overly optimistic predictions. We illustrate this issue for the waterbuck population using historical rainfall data for a low-precipitation and a high-precipitation scenario. In Fig. 7(a), we have assumed rainfall following the last year of the census to be identical to what was observed in 1981-1987, which corresponds to high precipitation. Not surprisingly, the exponential rate model gives much wider prediction intervals and puts larger posterior probability masses on high population increases. For the low-precipitation scenario corresponding to historical data from 1988 to 1994, the difference is smaller in Fig. 7(b). Note that predictions may still lead to unrealistically high population densities due to, for example, ignoring a regulation mechanism affecting the population at high levels. In such cases, our bound on the rate of increase can be combined with a bound on population density as in Staples and Taper (2006).

DISCUSSION
Standard stochastic population models used to analyze time-series data on abundance do not put any restriction on realized growth rates, allowing estimated annual growth rates to become unrealistically large. We have modified a Gompertz state-space model so that annual growth rates are forced to be smaller than a maximum growth rate determined from biological knowledge (Popescu et al. 2016). By excluding biologically unrealistic growth rates, the model can aid in the separation of process variance and observation variance, one of the key challenges in the application of state-space models to timeseries data (Dennis et al. 2006, Knape 2008, and reduce variance in projections of future population sizes, as used in, for example, population   6. Posterior median and interquartile range of the reproductive rate of waterbuck, starting at an average level and as a function of dry-season rainfall for exponential and logistic rate models. Superimposed at the bottom is a histogram of observed dry-season rainfall, all years and regions. viability analyses. Intuitively, we gain information about the magnitude of observation errors from the positive changes in the index that are larger than would be possible with growth rate equal to k max , which helps in improving predictions and in estimation of other parameters. The model will be most useful for organisms where k max is not far larger than 1. That includes mainly relatively slow life histories and larger organisms (Savage et al. 2004), but the model could also be relevant for faster life histories if the rate of sampling is high. It is not uncommon for point estimates of the process error standard deviation at the log-scale to be around 0.2 or larger, and confidence or credible intervals extending well beyond that. Under a log-normal distribution, a process error standard deviation of 0.2 at the log-scale yields a realized per capita growth 20% larger than the deterministic growth rate in around 18% of the time steps. At a process standard deviation of 0.3, the realized per capita growth rate would be 30% larger than the deterministic growth rate in on average 21% of the time steps. Thus, for organisms with small k max , there Year Waterbuck abundance index Fig. 7. Posterior predictive waterbuck median abundance together with 50 and 90% credible intervals for the two models. In (a), corresponding to the high-precipitation scenario observed in 1981-1987; and in (b), corresponding to the low-precipitation scenario observed in 1988-1994. can be a high risk of growth rates estimated under an exponential rate model to become unrealistic, particularly if they occur at population sizes below the carrying capacity with a consequential risk of overestimating the resilience of the population. For organisms where the maximum annual growth is far larger than 1 (say k max > 2), such as in many smaller organisms, there will typically be little difference between our model and an exponential rate model. Our model can therefore be seen as a generalization of the Gompertz statespace model that is useful for slowly growing organisms or at high rates of sampling.
For simplicity, we have used here a hard limit for k max . However, maximal growth rates can be difficult to estimate and it may be warranted to include uncertainty in k max (Meyer et al. 1986). Since our model is cast in a Bayesian framework, it is conceptually straightforward to extend the approach by incorporating uncertainty through an informative prior for the parameter k max . In the absence of prior information about k max , it may seem tempting to use our model to try to estimate k max from the data. We do not believe that such an approach would be viable as estimating the parameters of the Gompertz statespace model is already challenging without this extra parameter, and there is little information about k max in the survey data.
An alternative approach to bounding reproductive rates would be to incorporate environmental variation in the carrying capacity of a non-linear model such as a Ricker population model (e.g., Ruokolainen et al. 2009). That is, using a model of the form N t ¼ N tÀ1 exp r 1 À N t = Kexp e t ð Þ À Á À Á À Á where e t is normally distributed. For this model, growth rates will not exceed exp r ð Þ and it has been suggested in the context of modeling environmental variation in the carrying capacity but appears to be rarely used in practice. We are not aware of any attempts to use it in a state-space model context or to separate process error from observation error, but if environmental covariates are added to the log-scale of the carrying capacity and biological knowledge is used to inform the value of r, we expect it could also serve to improve parameter estimates and predictions in a state-space model framework. One advantage of our approach however is that it works also in the absence of density dependence, and it could easily be used in combination with simple density-independent PVA models for small populations (Dennis et al. 1991).
To conclude, we have presented a simple method for more realistic modeling of annual growth rates in population state-space models. The method can improve population projections and parameter estimation for populations with slow life histories. The only downside is that the model is computationally more complicated to fit than an exponential rate Gompertz model. However, with today's computational techniques, such as Markov chain Monte Carlo via JAGS used here, model fitting should not be a major obstacle to implementation of the method.