Predicting residence time using a continuous-time discrete-space model of leatherback turtle satellite telemetry data

,


INTRODUCTION
With the advancements in telemetry technologies, animal movement data have been collected with increasing duration, resolution, and accuracy (Hooten et al. 2017). These telemetry data provide opportunities for resource selection studies (Johnson et al. 2008b), which examine the complex interactions between animal populations and environmental processes. The high dimensionality of modern animal movement data, however, calls for development of efficient computational methods and the ability to incorporate both static (e.g., bathymetry) and dynamic (e.g., sea surface temperature) environmental drivers that can influence animal movements. For aquatic and exploited species, insights from such studies can in turn enable dynamic ocean management, spatiotemporally varying management based on the incorporation of near real-time data (Howell et al. 2008, Maxwell et al. 2015, Hazen et al. 2016. Telemetry data provide valuable animal movement information, but are characterized by many challenging statistical properties, such as non-normal measurement errors (Jonsen et al. 2005), temporal auto-correlation (Fleming et al. 2015), and unbalanced and presence-only sampling design (Aarts et al. 2008). Modern statistical models include state-space model approaches to model measurement errors and dynamics of the movement process (Jonsen et al. 2005), continuous-time models of unbalanced sampling (Johnson et al. 2008a), and agent-based or velocity-based models (Hooten et al. 2010, Hanks et al. 2011). These approaches, however, tend to use customized Markov chain Monte Carlo (MCMC) algorithms for statistical inference, which are computationally demanding, and generally focus on statistical inference of individual-level movement (Hooten et al. 2016). With the increasing availability of telemetry observations from multiple individuals, it is natural to generalize this individual-level inference to the population level, although there are the added challenges of individual-level or sampling variability (Aarts et al. 2008).
Telemetry data are increasingly being used in species distribution models (Hazen et al. 2016(Hazen et al. , 2018. Although terrestrial studies tend to use environmental variables based on static geographic features at a fixed resolution, this may not adequately represent the environmental conditions experienced by marine species within a dynamic seascape (Hidalgo et al. 2016). Thus, it is important to consider movements of highly migratory marine species in relation to the contemporaneous environment they are experiencing, particularly for highly dynamic features (Mannocci et al. 2017).
In order to account for the various sources of uncertainty and complexity inherent within telemetry data, Hooten et al. (2016) and Hanks et al. (2015) proposed a Bayesian hierarchical approach to statistical inference of populationlevel movement. This hierarchical approach incorporates uncertainties from multiple sources, such as inaccuracy of location observations, process variability from stochastic movement processes, and sampling variability among individuals. In addition, this approach can be formulated in the classical generalized linear mixed model framework. Although previously implemented with static environmental variables, we generalized the modeling approach to incorporate dynamic environmental drivers, essential to understanding and predicting the spatiotemporal distribution of highly migratory species. We utilized and modified this statistically robust approach to study dynamic resource selection using a large telemetry dataset of Eastern Pacific leatherback turtles (Dermochelys coriacea). The "ctmcmove" R package (Hanks 2018) was generalized to achieve the input of dynamic environmental drivers (see Data S1). Thus, this dynamic framework can be applied to other populations and species tracked through a dynamic seascape to predict motility and residence time based on changing environmental conditions.

Leatherback turtle data
Adult females were tagged with Argos satellite transmitters at nesting beaches in Mexico (n = 1) and Costa Rica (2003-2008n = 42;Shillinger et al. 2008, Bailey et al. 2012b. Two juvenile leatherbacks caught in the Peruvian driftnet fishery were released with Argos tags (2014). Additional data from Mexiquillo, Cahuit an, and Agua Blanca, Mexico (n = 24;1993-2003; Playa Grande, Costa Rica (n = 8;1992-1995; and Peru (n = 2; 2014-2015) were not included within the final analyses due to substantial gaps in the availability of associated remotely sensed environmental datasets.

State-space model
We applied a Bayesian switching state-space model (SSSM; Jonsen et al. 2005Jonsen et al. , 2007 to raw satellite telemetry positions and obtained mean daily location and behavioral mode estimates (classified as transiting or foraging) for each ❖ www.esajournals.org individual track. We distinguished post-nesting behavior from inter-nesting behavior for the leatherbacks tagged on the nesting beaches by removing the initial part of the track that was indicative of inter-nesting behavior as in Bailey et al. (2008Bailey et al. ( , 2012a. Only post-nesting portions of the tracks of at least 30-d duration were included in the analysis (Bailey et al. 2012a). The SSSM package "bsam" (Jonsen et al. 2005, Jonsen 2016, R Core Team 2017) was chosen over other R packages (e.g., "crawl"; Johnson et al. 2008a, Johnson andLondon 2018) because the SSSM provided the behavioral mode estimation necessary for this track delineation and had been used in previous studies with these tracks (Bailey et al. 2008, 2012a. By removing inter-nesting females, we assumed sex did not influence movement behavior in this population (James et al. 2005, Benson et al. 2011. Two MCMCs were run with 30,000 samples, a burn-in of 20,000, and thinning of 10. Switching state-space model-derived position estimates on land were corrected to the nearest plausible location at sea. Tracks with gaps too large for the SSSM to accurately interpolate across (≥20 d) were split into track sections (n = 12 tracks split). All analyses were conducted in the R statistical environment (R Core Team 2017).

Environmental covariates
Environmental covariates included in the model were sea surface temperature (SST), bathymetry, sea surface height (SSH), frontal probability index (FPI), and Ekman upwelling ( Fig. 1; see Appendix S1: Table S1). Monthly values were obtained from the NOAA/NMFS Southwest Fisheries Science Center, Environmental Research Division's ERDDAP server (https://c oastwatch.pfeg.noaa.gov/erddap; Simons 2016), E.U. Copernicus Marine and Environment Monitoring Service (CMEMS; marine.copernicus.eu), and Plymouth Marine Laboratory (following method in Miller et al. 2015) within a latitudinal range of À42°to 30°N and longitude from À140°to À70°E. We chose to build the model over a 0.5°grid cell to provide a cell size that leatherbacks could move across within one day given transiting speed estimates ), but as large as possible to reduce computational intensity given the large prediction area (see Appendix S2).

Continuous-time discrete-space model
Continuous-time discrete-space (CTDS; Hanks et al. 2015) is a habitat model that accommodates unbalanced telemetry data from individual Argos tracks and gridded estimates of environmental covariates to provide movement estimates through a given area. Specifically, let b y i ðuÞ; u ¼ n 1. . .; Tg denote the estimated daily track locations from the SSSM, where i denotes an individual animal. Let b i denote the CTDS parameter for this individual. We denote the individual CTDS model pðŷ i jb i Þ as following: discretize the continuous track according to the granularity of the environmental covariates, and let Cg denotes the sequence of starting cells traversed by the animal, and t i ¼ t ic ; c ¼ f 1; . . .; Cg and s i ¼ s ic ; c ¼ 1; . . .; C f gthe time of entry and residence time for each cell g i;cÀ1 before transition into g i;c .
Let k $ l denote two neighboring grid cells, and x i;kl;t the corresponding vector of environmental drivers. The transition rate between cells can then be modeled with the environmental data via a log link We considered only motility-based covariates and assumed that impacts of gradient-based covariates on turtle movements would be minor at the population level.
The transition to neighboring cells follows a multinomial process with probability proportional to (1). Let |N k | denote the number of adjacent cells to cell k, the total transition rate from a cell k equals and the corresponding residence time at cell k follows an exponential distribution where exponential (k) denotes the distribution with mean 1/k. Assuming conditional independence between transitions and residence time-given the ❖ www.esajournals.org environmental drivers-within and across transitions, the likelihood of the CTDS path S i according to models 1-4 can be written as where we write k i;g cÀ1 ;g c k i;cÀ1;c for brevity of notation. Hanks et al. (2015) introduced the latent variable z c; The advantage of the parameterization is that model 6 is an independent Poisson likelihood with data z i;c;c 0 ; , for which the iterative generalized least squares algorithm can be used for estimation in the frequentist framework.

Population-level inference
The CTDS models individual-level association between movement and environmental cues. To account for sampling variabilities among individuals and then generalize the individual response pattern to the population level, we applied a hierarchical model that uses random effects for individual-level parameters (Hooten et al. 2016). Let n denote the number of individuals and pŷ i jb i À Á denote the CTDS likelihood function (6) of individual i ¼ 1; . . .; n as the first stage of the hierarchical model. At the second stage, we specify a conventional normal prior where l b ; R b denote the population-level parameter and the covariance matrix of individual-level sampling variabilities around l b . Letting m ¼ dimðxÞ denote the dimensionality of the environmental covariates, we assign a vague hyper-prior to complete the hierarchical Bayesian model specification.
where l 0 is m Â 1 vector of zero, R 0 is 100 9 I, where I is a m Â m identity matrix, and Q ¼ m À1 I.

Markov chain Monte Carlo algorithm
The hierarchical model (6-8) follows the generalized linear mixed model family (see Appendix S2). We used the Hamilton MCMC with No-U-Turn algorithm (Hoffman and Gelman 2014) due to its superior convergence performance. The "brms" package (B€ urkner 2017) served as an R interface to Stan and model selection. Markov chain Monte Carlo convergence diagnostics were conducted for l b ; R b ; and selected b i parameters based on the R "coda" package (Brooks andGelman 1998, Plummer et al. 2006).

Model selection
The full model available for selection was symbolically where motility is the total rate of transitions out of a grid cell per day. Each environmental variable is incorporated as a dynamic driver based on the time of entry into the initial cell for each movement c. We incorporated quadratic sea surface temperature and linear bathymetry, Ekman upwelling, frontal probability index, and sea surface height. The crw term is a constructed auto-covariate term (Illian et al. 2012), which measures the directional change between the adjacent moves, and is important in the model to incorporate temporal auto-correlation. The remaining model covariates were selected first using a "top-down" strategy based on posterior leave-one-out (LOO) predictive statistics (Vehtari et al. 2017). We sequentially removed each variable from the hierarchical model (6-8) and compared the LOO statistics between the original model and each simplified model. The variable resulting in the least significant difference in the LOO statistic is removed. This drop-one process stops when no variable can be removed from the model. The automatic model selection was supplemented by manual interpretations.

Model prediction
We used residence time as a metric to map the predictive resource selection (Hanks et al. 2015). Let x p j ; j ¼ 1; . . .; J denote the value of selected covariates at grid cells g j , which covers the entire ❖ www.esajournals.org leatherback habitat in the eastern Pacific. The mean residence time at each cell can be defined as Posterior inference of r j l b À Á can be forward simulated based on posterior samples of l b , while accounting for the joint distribution of all model parameters. The posterior mean and interquartile-range can be mapped as the estimated residence time and associated uncertainty.

RESULTS
Based on the results of the SSSM, there were 58 tracks from 45 individual turtles used in the modeling process, totaling 10,066 daily positions spanning February 2003 through December 2014 ( Table 1). The best model from the posterior LOO model selection was the full model bathymetry, SST, SST 2 , FPI, and SSH (DLooIC = À4.57, SE(DLooIC) = 3.53, P = 0.20; see Appendix S1: Table S2). Population-level Eastern Pacific leatherback motility estimates (l b ) indicate effects for each model covariate given no zero overlap for each 95% confidence interval (Table 2). Leatherbacks are more likely to move (decrease residence time) with high and low values of SST (l b ¼ 0:54; SST 2 : l b ¼ 0:72) and high FPI (l b ¼ 0:76), while reducing movement in areas of high SSH (l b ¼ À1:330). Environmental predictors were checked for collinearity. Residence time in days was predicted for June 2016 (Fig. 2a) and December 2016 (Fig. 2b).

DISCUSSION
This habitat-based movement modeling approach can serve as a foundation for studies utilizing tagging data to overcome statistical challenges for population-level movements (see Appendix S1: Fig. S1). Organismal movement is often highly complex, and this framework accommodates intricate environmental selection by individuals scaled to populations. This process can be applied to tagging studies to account for the spatiotemporally auto-correlated, unbalanced, and presence-only telemetry observations in a dynamic environment. It is broadly applicable, automatable, parallelizable, and interpretable, easing computing demands of vast datasets. This considers and accounts for oftenoverlooked data source errors, reducing model uncertainty. Overall, this hierarchical modeling approach represents a higher-level estimation of persistent, predictable behavior across any population of study.
Compared to other movement studies, the CTDS model framework does not require the  Notes: crw denotes the direction of the most recent movement at each time point. Tracks (n = 58) used were longer than 30 d, include all environmental predictors, and do not exhibit collinearity between predictors. Bathy is bathymetry, SST is sea surface temperature, SST 2 is quadratic sea surface temperature, FPI is the frontal probability index, and SSH is sea surface height. creation of false absences or pseudo-tracks commonly used in generalized additive mixed model analyses of telemetry data (Willis-Norton et al. 2015, Hazen et al. 2016). This framework provides a means strengthening dynamic management models by incorporating multiple data sources, including irregularly spaced data, which strengthens the ability to model available data and maximize sample sizes. The approach reported here is flexible, particularly in that it can be adapted around the same framework indeterminate of the model used. Wilson et al. (2018) utilized a CTDS approach to model species distribution based on static seascape variables. Their approach accounted for unobserved movements into preferred habitats, highlighting these model benefits for marine telemetry data. We incorporated dynamic environmental covariates in CTDS to better reflect changing environmental conditions encountered by migrating marine species through time (e.g., seasonal seascape changes). This enabled short-term forecasting of preferential spatial use at a monthly timescale. The monthly prediction could be more relevant for dynamic ocean management than the long-term utilization density, which was more appropriate for species interacting with static environmental variables. Overall, CTDS provides population-level inference through MCMC compared to many movement studies based on individual-level models.
Using the CTDS modeling framework, we predicted the residence time of leatherback turtles during two months in 2016 (Fig. 2). Residence time estimates should predict leatherbacks transiting quickly from warm, coastal waters in December near the nesting beaches, moving toward productive fronts in the South Pacific Gyre (Fig. 2b). Turtles begin their post-nesting migration southwards through the eastern tropical Pacific in February to May ). In the South Pacific Ocean, there is a seasonal pattern with turtles moving south to the South Pacific Subtropical Convergence (Saba et al. 2008) in the austral summer (December to April) when temperatures are higher at these temperate latitudes (~30-40°S) (Shillinger et al. 2011). Leatherbacks generally return north to warmer, tropical waters (~0-20°S) in the winter (May-November; Fig. 2a). As expected, there is a large area in the South Pacific Gyre with low residence time in this warm, less productive water mass during both June and December 2016. In both months, leatherbacks were likely to move more slowly through the productive waters west of Peru and Chile. The prediction maps can serve as a valuable tool to assist with dynamic ocean management (Howell et al. 2015) to prevent and ultimately reduce bycatch of leatherback turtles in fisheries through their incorporation into an end-user interface, South Pacific TurtleWatch (SPTW).
Previous studies have conveyed the complex relationship between satellite-derived surface environmental metrics and leatherback distribution , Bailey et al. 2012a. Higher latitudes in the South Pacific are more productive, but temperature is ultimately expected to be a proxy for predicting prey abundance (gelatinous zooplankton), the driver of leatherback movement (Heaslip et al. 2012, Jones et al. 2012. Leatherbacks avoid cooler water farther south where they forage around 19°C and generally avoid warmer water (>31°C) when breeding (Shillinger et al. 2011). This expectation of higher residence time in the north during austral winter and higher probability in the south during summer, creating a north-south seasonal cycle throughout the South Pacific, is generally captured in our model predictions.
Although this modeling approach has many benefits, there are several limitations and complexities to consider. The model must be evaluated for its ability to predict the biologically realistic behavioral responses. If regression coefficients are small at the individual level, patterns across population-level inferences will not be observed. The uncertainty in the estimated movement paths of individuals can be decreased in this framework by applying multiple path imputations (Hooten et al. 2010, Hanks et al. 2015, Wilson et al. 2018, but with larger telemetry datasets, it can become computationally challenging. Within our model, vague priors were used to drive inference. However, priors can be specified to increase predictive power when greater species information is available, a benefit of using this Bayesian approach. A manual backward model selection was conducted, but a Bayesian model averaging approach could make it easier to evaluate overall predictive power of a given set of predictor variables. We used a generalized linear model, having to assume the association between the leatherback movement and environmental variables was parametric.
However, a semi-parametric generalized additive mixed model could be incorporated to model more complex associations with the environment.
The amount of telemetry data becoming available is ever-expanding, as are the complex models relating animal behavior to environmental cues, but high computational power is often required. Therefore, it is essential to use a predictive model capable of incorporating robust model estimates of movement over large tracking datasets and vast amounts of environmental information. Here, we utilized a novel approach by incorporating dynamic drivers of animal movement in a broad framework other studies can utilize, and we addressed the data analysis needs of more advanced observational techniques without super-computing computational requirements. The ability to predict motility and residence times of marine species based on environmental conditions can play a valuable role in assisting with their management and conservation in a dynamic ocean.

ACKNOWLEDGMENTS
Financial support for South Pacific TurtleWatch came from the Cinco Hermanos Fund, Marisla Foundation, and The Resources Legacy Fund. Leatherback tagging data were provided by many sources, and many collaborators contributed to those individual projects. Among others, these included Kristin Reed, Scott Eckert, Scott Benson, Barbara Block, Stanford University, personnel from the Block Lab and the Tagging of Pacific Pelagics (TOPP), and staff from Las Baulas National Marine Park (Ministerio de Ambiente y Energ ıa de Costa Rica) and The Leatherback Trust. We thank Peter Dutton and Laura Sarti for contributing leatherback tracking data. South Pacific Turtle-Watch would not have been possible without support from Upwell; Block Lab of Stanford University; DEFRA Darwin Initiative, UK; Marine Turtle Research Group; University of Exeter; National Fish and Wildlife Foundation; and NOAA's Southwest Fisheries Science Center, Pacific Islands Fisheries Science Center, and Pacific Islands Regional Office. We thank NASA, NOAA, and the E.U. Copernicus Marine Services Information for providing satellite data, especially staff from the Environmental Research Division including Steven Bograd, Elliott Hazen, and Lynn DeWitt for hosting, updating, and providing assistance on the ERDDAP interface. Merged microwave and infrared SST data were provided by Remote