
Volume 32, Issue 7 e2642
ARTICLE
Open Access

Near-term phytoplankton forecasts reveal the effects of model time step and forecast horizon on predictability

Whitney M. Woelmer (corresponding author), Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA. Email: [email protected]

R. Quinn Thomas, Department of Forest Resources and Environmental Conservation, Virginia Tech, Blacksburg, Virginia, USA

Mary E. Lofton, Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA

Ryan P. McClure, Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA

Heather L. Wander, Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA

Cayelan C. Carey, Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA

First published: 26 April 2022
Handling Editor: David A. Lytle

Funding information: National Science Foundation, Grant/Award Numbers: CNS-1737424, DBI-1933016, DBI-1933102, DEB-1753639, DEB-1926050, Graduate Research Fellowship

Abstract

As climate and land use change increase the variability of many ecosystems, forecasts of ecological variables are needed to inform management and the use of ecosystem services. In particular, forecasts of phytoplankton would be especially useful for drinking water management, as phytoplankton populations are exhibiting greater fluctuations due to human activities. While phytoplankton forecasts are increasing in number, many questions remain regarding the optimal model time step (the temporal frequency of the forecast model output) and forecast horizon (the length of time into the future a prediction is made) for maximizing forecast performance, as well as which factors contribute to forecast uncertainty and how well forecasts scale among sites. To answer these questions, we developed near-term, iterative forecasts of phytoplankton 1–14 days into the future, using forecast models with three different time steps (daily, weekly, fortnightly) and a full uncertainty partitioning analysis, at two drinking water reservoirs. We found that forecast accuracy varies with model time step and forecast horizon and that forecast models can outperform null estimates under most conditions. Weekly and fortnightly forecasts consistently outperformed daily forecasts at 7-day and 14-day horizons, a difference that increased up to the 14-day forecast horizon. Importantly, our work suggests that forecast accuracy can be increased by matching the forecast model time step to the forecast horizon for which predictions are needed. We found that model process uncertainty was the primary source of uncertainty in our phytoplankton forecasts over the forecast period, but parameter uncertainty increased during phytoplankton blooms and when scaling the forecast model to a new site. Overall, our scalability analysis shows promising results that simple models can be transferred to produce forecasts at additional sites. Altogether, our study advances our understanding of how forecast model time step and forecast horizon influence the forecastability of phytoplankton dynamics in aquatic systems and adds to the growing body of work regarding the predictability of ecological systems broadly.

INTRODUCTION

Globally, ecosystems are experiencing unprecedented climate and land use change (Solomon et al., 2007; Vitousek, 1994), resulting in increased variability in their functioning (e.g., Komatsu et al., 2019; Rillig et al., 2019; Smith et al., 2009). As a result of this greater variability, management of the ecosystem services upon which society depends is increasingly challenging (Hines et al., 2019; Manning, Loos, et al., 2019). Thus, new predictive approaches to anticipate future ecosystem responses to global change are needed to assist managers and the public as they respond to changing ecosystem states (Clark et al., 2001; Dietze et al., 2018).

Freshwater ecosystems, which provide integral services to society (e.g., drinking water, fisheries, irrigation, hydropower), are particularly threatened by human activities (Jimenez Cisneros et al., 2014; Millennium Ecosystem Assessment, 2005). Freshwater lakes and reservoirs are experiencing rapid changes in water quality, including increased phytoplankton biomass in many waterbodies, due to climate and land use change (Carey et al., 2012; Ho et al., 2019; Paerl & Paul, 2012; Sinha et al., 2012). Increased phytoplankton biomass poses a serious risk to drinking water supplies (Carpenter et al., 1998). Because of their toxins, odors, and scums, high phytoplankton concentrations increase the need for drinking water treatment, which is estimated to cost >$2 billion annually in the United States (Dodds et al., 2009).

Anticipating future lake and reservoir phytoplankton concentrations is critical for drinking water management. There is a great need to predict both phytoplankton blooms, which are large, ephemeral aggregations of phytoplankton biomass that can have substantial negative effects on drinking water quality (Cheung et al., 2013; Ewerts et al., 2013; Qin et al., 2010; Tarczyńska et al., 2001), and "baseline," or non-bloom, phytoplankton concentrations in drinking water lakes and reservoirs. Forecasts of both bloom and non-bloom phytoplankton conditions provide complementary information for drinking water managers. For example, knowing preemptively when a high-impact yet rare bloom will occur would allow managers to order additional treatment supplies in advance or issue swimming closure notices before the bloom occurs (Carey, Woelmer, Lofton, et al., 2021). Similarly, daily water treatment operations could be improved by forecasts of non-bloom conditions, which would provide important information on "typical" water quality conditions that occur over the majority of the year, allowing managers to choose the depths from which to draw water. Here, we define predictions of blooms as predictions of maximum or peak phytoplankton concentrations in a year, whereas predictions of non-bloom conditions are represented by mean or median phytoplankton concentrations. Despite the great need for scalable phytoplankton forecasts across many waterbodies, it remains unclear how well phytoplankton can be predicted under bloom and non-bloom conditions.

Providing near-term, iterative forecasts, or future estimates of ecological variable(s) with quantified uncertainty, may be especially useful for management as they provide information on a timescale that is relevant for decision-making (Carey, Woelmer, Lofton, et al., 2021). Additionally, a full analysis of the relative contribution of different sources of uncertainty (Table 1) in forecasts is crucial to both properly inform management decisions (Berthet et al., 2016; Dietze, 2017; Morss et al., 2008) and iteratively improve forecast performance by constraining large sources of uncertainty (Dietze, 2017; Luo et al., 2011). Finally, building a forecasting framework that readily scales from one location to another is critical to furthering the field of ecological forecasting.

TABLE 1. Definitions of uncertainty sources that can contribute to total forecast uncertainty (derived from Dietze, 2017).
Driver data: Uncertainty in the forecasted estimates of model covariates (e.g., meteorology)
Initial condition: Uncertainty in the observed conditions when a forecast is created
Parameter: Uncertainty in model parameter values
Process: Uncertainty due to the inability of a model to reproduce observed conditions

Current phytoplankton forecasts are often made at multiple forecast horizons to provide predictions at several time points in the future. Predictive accuracy is generally expected to decline as the forecast horizon, or the length of time into the future at which a forecast is made, increases (Dietze, 2017; Petchey et al., 2015). However, forecasts that extend to longer horizons (~1–2 weeks) may be more useful decision support tools than 1–2 day forecasts because they give managers longer lead times for implementing interventions (e.g., Carey, Woelmer, Lofton, et al., 2021; Thomas et al., 2020). Forecasts are often made at longer horizons by propagating forecasts forward over multiple time steps. Importantly, the time step of a forecast model, or the temporal frequency of the forecast model output, may also influence prediction accuracy, because models with different time steps represent processes that occur over different time periods. However, the time step is not generally taken into account when choosing a forecast model. As a result, finding a balance between the horizon at which forecast accuracy deteriorates and the horizon that provides the most useful decision support for managers is a critical step in advancing ecological forecast applications. For predictions made further into the future, it remains unknown whether models developed at that time step perform better than models that simply propagate daily forecasts out over multiple time steps, a focus of this study.
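
To make this distinction concrete, the sketch below contrasts the two strategies on simulated AR(1) data: a "direct" model fit at the 14-day step versus a daily model iterated 14 times. This is illustrative only and uses hypothetical simulated data, not this study's models or observations.

# Hedged sketch contrasting "direct" vs. "iterated" multi-step forecasting on
# simulated AR(1) data; illustrative only, not this study's models or data.
set.seed(1)
n <- 500
x <- as.numeric(arima.sim(list(ar = 0.8), n = n))   # hypothetical daily series

# Direct strategy: regress the value 14 days ahead on today's value
direct_fit <- lm(ahead14 ~ now,
                 data = data.frame(now = x[1:(n - 14)], ahead14 = x[15:n]))

# Iterated strategy: fit a 1-day-ahead model, then apply it 14 times
daily_fit <- lm(ahead1 ~ now,
                data = data.frame(now = x[1:(n - 1)], ahead1 = x[2:n]))
iterate14 <- function(x0) {
  for (i in 1:14) x0 <- unname(predict(daily_fit, newdata = data.frame(now = x0)))
  x0
}

# Compare 14-day-ahead predictions from the two strategies
start <- 1:(n - 14)
pred_direct <- predict(direct_fit, newdata = data.frame(now = x[start]))
pred_iter   <- vapply(x[start], iterate14, numeric(1))
obs14       <- x[start + 14]
c(rmse_direct   = sqrt(mean((pred_direct - obs14)^2)),
  rmse_iterated = sqrt(mean((pred_iter - obs14)^2)))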

While the number of phytoplankton forecasts has increased in recent years, there is still a substantial range in the time step at which phytoplankton are forecasted (reviewed by Rousso et al., 2020). A majority of phytoplankton forecasts are made using a daily or sub-daily time step, which is then used to forecast at horizons ranging from 1 to 14 days into the future (e.g., Chen et al., 2015; Kehoe et al., 2019; Loos et al., 2020; Page et al., 2018; Xiao et al., 2017). Daily or sub-daily forecasts have variable but reasonable success forecasting non-bloom dynamics and the timing of bloom events, but are often unable to capture the magnitude of phytoplankton blooms, resulting in under-prediction (Chen et al., 2015; Huang et al., 2013; Massoud et al., 2018; Rajaee & Boroumand, 2015; Recknagel et al., 2016) or over-prediction (Loos et al., 2020; Mao et al., 2009; Page et al., 2018) of bloom concentrations. Moreover, these forecasts may be less accurate at forecasting multiple days or a week into the future, a lead time that drinking water managers need to change treatment operations (e.g., Thomas et al., 2020). In contrast, phytoplankton forecasts that use much longer time steps (i.e., monthly or seasonal) show relatively high performance at predicting both bloom and non-bloom conditions (Manning, Wang, et al., 2019; Park et al., 2019; Rajaee & Boroumand, 2015), but lose the ability to capture short-term variability due to the coarseness of their model time step.

Forecasts on a weekly or fortnightly time step are comparatively rare (but see Recknagel et al., 2016), and we are aware of none that compare the relative success of prediction at multiple time steps within the same ecosystem. However, given that most lake and reservoir monitoring programs typically collect manual phytoplankton samples weekly or fortnightly (Arhonditsis et al., 2004; Gerling et al., 2016; McGowan et al., 2017; Read et al., 2015; Wang et al., 2011), developing forecasting workflows on a weekly or fortnightly time step may be readily achievable for many waterbodies. In the meantime, the small but increasing number of lakes and reservoirs with high-frequency phytoplankton sensors (e.g., Marcé et al., 2016) provides an ideal test bed for assessing the performance of phytoplankton forecasting models that operate on different time steps (sensu Hamilton et al., 2015).

To examine the effect of model time step on phytoplankton forecast skill, we developed ~600 days of near-term, iterative phytoplankton forecasts with quantified uncertainty for a drinking water reservoir. We made forecasts during bloom and non-bloom conditions using three time series models built on different time steps: daily, weekly, and fortnightly. To assess the scalability of our forecasting framework, we adapted the framework and applied it to an additional drinking water reservoir to produce 1 year of forecasts and compared the skill of the forecasts at both sites. We addressed four questions in this study: (1) How well can we forecast phytoplankton a) over the entire forecast period, as well as b) under bloom and non-bloom conditions? (2) What is the effect of forecast model time step on forecast performance? (3) How do the major sources of phytoplankton forecast uncertainty vary with forecast horizon and over time? And (4) how do forecast skill and uncertainty contributions change when scaled to an additional study site?

METHODS

Study site

Forecasts were produced for near-surface chlorophyll a (henceforth “chl a”, a common metric of phytoplankton biomass) at Falling Creek Reservoir (FCR). FCR is a small (0.119 km2), shallow (maximum depth = 9.3 m), eutrophic reservoir located in Vinton, Virginia, USA (37.30° N, 79.84° W, Figure 1). FCR is dimictic, with thermal stratification occurring annually from approximately May to October (Carey et al., 2019). FCR has a history of phytoplankton blooms and is primarily fed by one major upstream tributary that has been monitored with a weir since 2013 (Gerling et al., 2014, 2016). The reservoir is owned and operated as a drinking water source by the Western Virginia Water Authority (WVWA), who partnered with our research team throughout forecast development (described by Carey, Woelmer, Lofton, et al., 2021). This partnership informed several aspects of our forecasting framework, including the location of the chl a sensor used for updating forecasts, the time steps of interest for management decision-making, and forecast delivery mode. While we note that this study produced hindcasts (following the definition of Jolliffe & Stephenson, 2003), we refer to them as forecasts for consistency.

FIGURE 1. Map of Falling Creek Reservoir, Vinton, VA, USA (37.30° N, 79.84° W), showing locations of the weir, meteorological station, and chl a sensor. Forecasts were generated for the location where chl a measurements were collected.

Forecasting overview

Forecasts of near-surface (~1.0 m depth) chl a at FCR were produced over the course of ~600 days (from 2 January 2019 to 15 August 2020) using daily, weekly, and fortnightly autoregressive (AR) linear models developed using observational sensor data from FCR. For each forecast, model driver and validation data were collected via automated sensors at the study site and wirelessly uploaded to an online data repository using secure sensor gateways (Figure 2: Step 1). At each model time step, new data up to the day being forecasted were appended to the historical training data set and used to re-parameterize the AR models (Figure 2: Steps 2–3). Forecasted model driver data included shortwave radiation and forecasted discharge to the reservoir from the major inflow. Shortwave forecasts were downloaded from the National Oceanic and Atmospheric Administration Global Ensemble Forecast System (NOAA GEFS) repository. Forecasts of future discharge were modeled using observed sensor discharge measurements on the inflow at FCR and NOAA forecasted precipitation as inputs (see Thomas et al., 2020 for a detailed description) (Figure 2: Step 4). Uncertainty was propagated for four different uncertainty types (process, initial condition, parameter, and driver data; see Table 1 for definitions) (Figure 2: Step 5).

FIGURE 2. Summarized workflow for development of forecasts: data acquisition and quality assurance/quality control (QAQC) (Step 1), assimilation of new data (Step 2), refitting of the model with new observations (Step 3), acquisition and processing of model driver data (Step 4), quantification of uncertainty sources (Step 5), and production of forecasts (Step 6). This process occurs once a day for each forecast model (daily, weekly, fortnightly). NOAA GEFS refers to the National Oceanic and Atmospheric Administration Global Ensemble Forecast System.

We generated probabilistic daily forecasts that had a 1-day, 2-day, 3-day, … up to 14-day time horizon, weekly forecasts that had a 1-week and 2-week (i.e., 7-day and 14-day) time horizon, and fortnightly forecasts that had a 2-week time horizon (i.e., 14-day; Figure 2: Step 6). Thus, to summarize, there were 14 forecast horizons for the daily model, two horizons for the weekly model, and one horizon for the fortnightly model. To develop and run our forecast models, we used a combination of linear parametric and Bayesian statistical methods. We used parametric linear model selection on historical data to select model covariates and initial parameter values (see Model development and training). To produce forecasts, we applied our model in a Bayesian framework (see Bayesian forecasting framework).

All driver data are published in the Environmental Data Initiative (EDI) repository (Carey, Breef-Pilz, Bookout, et al., 2021; Carey, Hounshell, Lofton, et al., 2021; Carey, Lewis, McClure, et al., 2021; Carey, Woelmer, Lewis, et al., 2021), and all forecast output and code are available on Zenodo (https://doi.org/10.5281/zenodo.5963867).

Reservoir monitoring data set overview

Our research team has monitored physical, chemical, and biological conditions at FCR since 2013. This long-term data set was used to develop and train the AR models prior to the beginning of the forecast period and includes manual and automated sensor measurements of chl a at the deepest site of the reservoir, meteorological variables from a meteorological station on the dam of the reservoir, and sensor measurements of discharge at the major inflow stream to the reservoir (Figure 1; see Appendix S1: Section S1 for more details on each of these data sets). Because sampling primarily occurred from May to October in 2013–2016, we began running our models during the second week of May so that observations in the first week of May served as the AR lag, not the October observations.

Model development and training

We used standard linear model selection techniques (Appendix S1: Section S2) to select covariates for the AR models (following Box & Jenkins, 1970; Quinn & Keough, 2002) for forecasting surface chl a (1.0 m) using the monitoring data set described in Appendix S1: Section S1. We chose an autoregressive linear model to capture the high temporal autocorrelation common in phytoplankton, as was demonstrated in our data set (Appendix S1: Figure S4). While this model is relatively simple in its representation of phytoplankton dynamics, we wanted to test the ability of a data-driven model that was easy to implement for examining the effect of different model time steps and horizons at multiple sites, following numerous previous phytoplankton predictive studies (Rousso et al., 2020).

We used a weekly data set of surface chl a from 2013 to 2016 to select covariates for our AR model. We chose the weekly data set, as opposed to the daily data set, because it had the longest historical coverage, dating back to 2013, whereas the daily data set only began in summer 2018. When developing our linear model, we aimed to meet the assumption that variables are normally distributed and found that square-root transformation of our chlorophyll data yielded a better fit to normality than log-transformation. Meteorological and discharge data were aggregated to daily summaries (mean, median, maximum, and minimum) on the same day as the chl a observations. Any weeks with missing data from May to October over the 4 years (n = 18 out of 107 weeks total) were linearly interpolated using the na.approx() function in the zoo package (Zeileis & Grothendieck, 2005) to allow for a consistent time interval in the training data. To determine the influence of these linearly interpolated training data points on model performance, we conducted an identical model analysis using only observed data without interpolated points, which showed no substantial change in model accuracy (Appendix S1: Figure S1). Variables that did not follow a normal distribution were transformed to meet the assumptions of a linear model (Appendix S1: Table S2). Our best-performing weekly model (which also serves as the process model for our Bayesian analysis) was
$$ C_{t+n} = \beta_1 + \beta_2 C_t + \beta_3 S_{t+n} + \beta_4 D_{t+n} + \varepsilon $$ (1)
where $C_t$ is the AR term, or mean daily chl a at the current time, t. The response, $C_{t+n}$, is the mean daily chl a concentration at the forecasted time horizon, n days in the future, where n is the time step at each time horizon of the model (e.g., n = 1–14 days for daily forecasts, 7 and 14 days for weekly forecasts, and 14 days for fortnightly forecasts). $S_{t+n}$ is the mean daily shortwave radiation at the next time horizon, and $D_{t+n}$ is the mean daily discharge at the next time horizon. $\varepsilon$ is normally distributed random noise with mean 0 and standard deviation $\sigma$. Parameter ($\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$) and process error standard deviation ($\sigma$) values from the linear model fitting were used as initial parameter values in a Markov Chain Monte Carlo simulation when forecasting (see Bayesian forecasting framework for more details).
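
For concreteness, a minimal sketch of fitting Equation (1) with lm() is shown below. The data frame, its column names, and the simulated values are hypothetical stand-ins for the weekly 2013–2016 training data described above, not the actual reservoir data.

# Hedged sketch of fitting Equation (1) as an autoregressive linear model.
# All data below are simulated placeholders with hypothetical column names.
set.seed(42)
weeks <- 107
weekly <- data.frame(
  chla      = pmax(rnorm(weeks, 5, 3), 0.1),   # weekly chl a (ug/L), hypothetical
  shortwave = rnorm(weeks, 200, 50),           # mean daily shortwave, hypothetical
  discharge = rlnorm(weeks, -3, 0.5)           # mean daily discharge, hypothetical
)
weekly$chla_sqrt <- sqrt(weekly$chla)          # square-root transform (see text)
weekly$chla_lag  <- c(NA, head(weekly$chla_sqrt, -1))  # AR term, one step back

# C_{t+n} = beta1 + beta2*C_t + beta3*S_{t+n} + beta4*D_{t+n} + epsilon
ar_fit <- lm(chla_sqrt ~ chla_lag + shortwave + discharge, data = weekly)
coef(ar_fit)    # initial beta values for the Bayesian fit
sigma(ar_fit)   # initial process error SD (sigma)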

Given that the model drivers likely have varying effects on phytoplankton at different time scales, we performed the model selection analysis separately for the daily and fortnightly data sets to determine if the same covariates would be significant for the different time scales. We found that, for both the fortnightly and daily data sets, the same model (covariates of mean shortwave radiation and mean discharge) was within two AICc units (Akaike information criterion corrected for sample size) of the top model (Appendix S1: Tables S6–S8), indicating a similar model performance (Burnham & Anderson, 2002). Therefore, we used the same weekly model covariates (an AR term, shortwave radiation, and discharge) for all model time steps (daily, weekly, fortnightly) in our forecast analysis.
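
A minimal sketch of this kind of AICc comparison follows, reusing the hypothetical `weekly` data frame from the previous sketch. AICc is computed manually from its small-sample correction formula, and the candidate set shown is illustrative rather than the full set in Appendix S1: Tables S6–S8.

# Hedged sketch of an AICc comparison among candidate AR models.
aicc <- function(fit) {
  k <- length(coef(fit)) + 1                  # parameters, incl. residual SD
  n <- nobs(fit)
  AIC(fit) + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction to AIC
}
candidates <- list(
  ar_only = lm(chla_sqrt ~ chla_lag, data = weekly),
  ar_sw   = lm(chla_sqrt ~ chla_lag + shortwave, data = weekly),
  ar_sw_q = lm(chla_sqrt ~ chla_lag + shortwave + discharge, data = weekly)
)
scores <- sapply(candidates, aicc)
round(scores - min(scores), 2)  # models within ~2 units are considered comparable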

To examine the influence of drivers averaged over daily to weekly scales, we developed models using drivers that were cumulatively averaged over a week and compared them to models using drivers that were point estimates on the day of the forecast. We found little difference in model prediction accuracy (Appendix S1: Figure S2). Thus, the results presented here use driver data that are forecasted or observed on the day being predicted.
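
The sketch below shows how the two driver representations might be constructed: the driver value on the forecasted day versus its trailing 7-day mean. The zoo package is already used in this study for interpolation; the daily shortwave series here is a hypothetical placeholder.

# Hedged sketch of the two driver representations compared in the text.
library(zoo)
set.seed(7)
sw_daily <- rnorm(60, mean = 200, sd = 50)            # hypothetical daily shortwave
sw_week_mean <- rollapply(sw_daily, width = 7, FUN = mean,
                          align = "right", fill = NA) # trailing 7-day mean
# Candidate covariates for a forecast issued for each day:
head(data.frame(day_of = sw_daily, week_mean = sw_week_mean), 10)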

Although the data sets used to develop AR forecasts spanned different durations and temporal resolutions (Appendix S1: Table S2), we used a similar number of observations to train all models with differing time steps to enable comparability of model forecast performance when logistically possible. Due to data availability, the training data set for the daily chl a models ranged from 15 August 2018 to 15 December 2018 (n = 110 observations), while the weekly and fortnightly data set ranged from May to October each year in 2013–2016 (n = 107 weekly observations and n = 53 fortnightly observations).

Bayesian forecasting framework

Using the model covariates selected from the weekly training data set (mean shortwave radiation and mean discharge) and initial parameter values, we used a Bayesian framework to fit distributions for model parameters on each forecast day for the daily, weekly, and fortnightly models following the workflow outlined in Figure 2, Step 2a. This fitting occurred iteratively on every day in which a forecast was produced. Forecasts were produced and evaluated from 2 January 2019 to 15 August 2020 (hereafter, the full forecast period) for all forecast models.

Our state-space Bayesian model followed the same form as Equation (1), except with the process model and data model separated to allow partitioning of model process and observation uncertainty. The process model was:
$$ C_{t+n}^{\mathrm{latent}} \sim \mathrm{normal}\left(\beta_1 + \beta_2 C_t^{\mathrm{latent}} + \beta_3 S_{t+n} + \beta_4 D_{t+n},\; \sigma_{\mathrm{add}}\right) $$ (2)
with the following data model:
$$ y_{t+n} \sim \mathrm{normal}\left(C_{t+n}^{\mathrm{latent}},\; \sigma_{\mathrm{obs}}\right) $$ (3)
where $\sigma_{\mathrm{add}}$ and $\sigma_{\mathrm{obs}}$ are the standard deviation parameters describing normally distributed additive process uncertainty and normally distributed observation uncertainty, respectively; $y$ are the observations, and $C_t^{\mathrm{latent}}$ are the latent states that represent the modeled distribution before observation noise is added. All other parameters are the same as described above in Equation (1). We used uninformative priors for the $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ parameters by assuming normal distributions with large standard deviations:
$$ \beta_i \sim \mathrm{normal}\left(0, 1000\right). $$ (4)
Similarly, $\sigma_{\mathrm{add}}$ had an uninformative prior with a uniform distribution:
$$ \sigma_{\mathrm{add}} \sim \mathrm{uniform}\left(0.001, 100\right). $$ (5)
The parameter $\sigma_{\mathrm{obs}}$ was a constant that was estimated as the standard deviation from a linear regression between our two sensor-derived chl a data sets, the CTD chl a and EXO chl a (Appendix S1: Section S3).

To estimate posterior distributions, we ran four Markov Chain Monte Carlo (MCMC) chains, with an adaptation period of 1000 iterations, a burn-in period of 1000 iterations, and a sample size of 10,000 iterations. Initial starting values for each parameter were taken from the linear model fit (see Model development and training). The latent state was initialized using the first observation in the training data set, and the MCMC was run using all data up to the day being forecasted. The posterior output from the MCMC distribution for all parameters and latent states was saved as input to the forecast model to quantify uncertainty (Figure 2, Step 5). MCMC chains were assessed for convergence using the potential scale reduction factor of the Gelman-Rubin statistic ($\hat{R}$). All $\hat{R}$ values for all parameters in all forecast models were 1 or 1.01, indicating that the models had converged on parameter estimates both within and among MCMC chains.

Observed model driver and validation data were automatically uploaded to a GitHub repository in real time (Figure 2, Step 1) (https://github.com/FLARE-forecast/FCRE-data) and added (or assimilated) into the training data set for fitting model parameter distributions in the Bayesian framework (Figure 2, Step 2). New data were assimilated according to the time step of the model (i.e., daily, weekly, or fortnightly). All models (daily, weekly, and fortnightly) were re-fit with MCMC on each forecast day using the entire data set (including training data) up to the day being forecast (Figure 2, Step 3), allowing the parameter values to evolve over time. MCMC analyses were carried out using the rjags package (Plummer, 2019) within the R statistical environment. For each forecast model and every forecast run, we estimated the posterior distributions of parameters using MCMC, then sampled the posterior distributions of parameters, process error, and latent states and combined them with forecasted driver variables to produce forecasts of chl a at each forecast horizon.
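
As a concrete illustration of this workflow, the sketch below fits a simplified version of the state-space model (Equations 2–5) with rjags, using the MCMC settings described above (four chains, 1000 adaptation iterations, 1000 burn-in iterations, 10,000 samples). The input series are simulated placeholders, not the reservoir data, and all variable names are hypothetical.

# Hedged sketch of the state-space fit (Equations 2-5) in rjags; simulated
# placeholder data stand in for the chl a, shortwave, and discharge series.
library(rjags)  # also loads coda for convergence diagnostics

model_string <- "
model {
  for (i in 1:4) { beta[i] ~ dnorm(0, 1.0E-6) }  # normal(0, 1000): Equation (4)
  sigma_add ~ dunif(0.001, 100)                  # Equation (5)
  tau_add <- 1 / pow(sigma_add, 2)               # JAGS uses precision, not SD
  tau_obs <- 1 / pow(sigma_obs, 2)               # sigma_obs fixed (Appendix S1: S3)

  C[1] ~ dnorm(y[1], tau_obs)                    # initialize latent state
  for (t in 2:N) {
    mu[t] <- beta[1] + beta[2] * C[t-1] + beta[3] * S[t] + beta[4] * D[t]
    C[t] ~ dnorm(mu[t], tau_add)                 # process model: Equation (2)
    y[t] ~ dnorm(C[t], tau_obs)                  # data model: Equation (3)
  }
}"

set.seed(1)
N <- 107                                         # hypothetical weekly series length
S <- rnorm(N, 200, 50)                           # shortwave driver (placeholder)
D <- rlnorm(N, -3, 0.5)                          # discharge driver (placeholder)
y <- as.numeric(arima.sim(list(ar = 0.6), N)) + 2.5  # sqrt-scale chl a (placeholder)

jm <- jags.model(textConnection(model_string),
                 data = list(y = y, S = S, D = D, N = N, sigma_obs = 0.21),
                 n.chains = 4, n.adapt = 1000)
update(jm, n.iter = 1000)                        # burn-in
post <- coda.samples(jm, c("beta", "sigma_add"), n.iter = 10000)
gelman.diag(post)                                # Rhat near 1 indicates convergence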

Forecast evaluation

To address Q1, we evaluated our forecasts' performance post hoc for three time periods: (1) the entire ~600-day period during which forecasts were produced at FCR, (2) non-bloom conditions, and (3) bloom conditions. Bloom conditions were defined as any time when observed chl a concentrations were above a bloom threshold, set at four times the SD of historical chl a at FCR from the CTD chl a data set (17.1 μg/L; Carey et al., 2019). We chose four SDs as our threshold, rather than three SDs (e.g., Healy, 1979), because of the high-frequency nature of our sensor data, which may increase the likelihood of outliers due to sensor fouling and phytoplankton quenching (Hamilton et al., 2010; McBride & Rose, 2018; Rousso et al., 2021). Forecasts were evaluated as occurring under bloom conditions any time the observed chl a was above this threshold on the day the forecast was initiated, or on any day when chl a exceeded the threshold within the 14-day forecast horizon. We included this second set of days in order to evaluate anticipatory predictions of bloom conditions before they occurred. Forecast performance was calculated for all three of these periods using root mean squared error (RMSE).
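
A minimal sketch of this evaluation logic follows; the observation and forecast vectors are hypothetical placeholders rather than the study data.

# Hedged sketch of the bloom/non-bloom classification and RMSE evaluation.
set.seed(2)
obs  <- pmax(rnorm(590, mean = 7, sd = 6), 0)  # observed chl a (ug/L), placeholder
pred <- obs + rnorm(590, mean = 0, sd = 4)     # forecast means, placeholder

bloom_threshold <- 17.1                        # four SDs of historical chl a
exceeds <- obs > bloom_threshold
# Bloom days: threshold exceeded on the day the forecast is initiated or
# within the 14-day horizon (to credit anticipatory predictions)
in_bloom <- vapply(seq_along(obs), function(i)
  any(exceeds[i:min(i + 14, length(obs))]), logical(1))

rmse <- function(o, p) sqrt(mean((p - o)^2, na.rm = TRUE))
c(full      = rmse(obs, pred),
  non_bloom = rmse(obs[!in_bloom], pred[!in_bloom]),
  bloom     = rmse(obs[in_bloom], pred[in_bloom]))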

We also quantified the performance of our forecasts relative to a null persistence forecast (following Harris et al., 2018). A null persistence model assumes that the chl a concentration at the next time step is the same as at the current time step, while accounting for process error and observation uncertainty. The null forecast was fit for each forecast day in rjags using a random walk MCMC simulation following the same methods as described in Bayesian forecasting framework, in which process error was estimated and observation uncertainty was calculated by sampling from a normal distribution around the observed chl a concentration (SD = 0.21; Appendix S1: Section S3). To create an ensemble null forecast, we randomly sampled 441 times from the distribution of predicted chl a for each forecast day. For RMSE statistics, the mean of all ensemble members was used.
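
The sketch below approximates the null persistence forecast as a random walk with process and observation noise drawn directly in R, rather than the full rjags fit used in the study; the process error SD shown is a hypothetical value, while the observation SD follows Appendix S1: Section S3.

# Hedged sketch of a persistence null forecast as a random walk.
persistence_null <- function(last_obs, horizon, n_ens = 441,
                             sigma_add = 1.0,     # process SD (hypothetical)
                             sigma_obs = 0.21) {  # observation SD (see text)
  ens <- rnorm(n_ens, mean = last_obs, sd = sigma_obs)   # uncertain current state
  for (h in seq_len(horizon)) {
    ens <- ens + rnorm(n_ens, mean = 0, sd = sigma_add)  # random walk step
  }
  ens
}
set.seed(4)
null_ens <- persistence_null(last_obs = 7.3, horizon = 14)
mean(null_ens)   # ensemble mean used when computing null RMSE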

Last, to address Q2, we performed the same forecast evaluation as above and compared the relative skill of our forecast models across matching forecast horizons (e.g., we compared daily forecasts to weekly forecasts at the 7-day horizon, and daily, weekly, and fortnightly forecasts at the 14-day horizon).

Uncertainty quantification and partitioning (Step 5 in Figure 2)

To address Q3, we quantified total uncertainty in all forecasts and partitioned it among the individual sources (process, initial condition, parameter, and driver data; Table 1) using a separate post hoc analysis (Figure 2, Step 5). All sources of uncertainty were included in each forecast (Figure 2, Step 6). We used 441 ensemble members, the product of the 21 NOAA GEFS ensemble members and 21 discharge forecast ensemble members. This ensemble size was both computationally tractable and large enough to allow for a reasonable spread of uncertainty. Further, 441 was below the effective sample size of all parameters within the MCMC chains, and therefore an appropriate number of samples to draw from the posterior distributions (Appendix S1: Table S9).
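
In code, crossing the two driver ensembles might look like the following sketch:

# Hedged sketch: the 441 ensemble members arise from crossing 21 NOAA GEFS
# members with 21 discharge forecast members.
driver_members <- expand.grid(noaa = 1:21, discharge = 1:21)
nrow(driver_members)      # 441
head(driver_members, 3)   # each row indexes one forecast ensemble member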

To quantify initial condition and parameter uncertainty, we sampled a randomly selected index from the joint posterior distributions of the latent state and parameter distributions, respectively, 441 times. Similarly, process error was estimated using a normal distribution with a mean of zero and an SD sampled from the MCMC posterior distribution of $\sigma_{\mathrm{add}}$ (Equation 5) for each of the 441 ensemble members. Meteorological driver data uncertainty was estimated by running the forecasts with the 21 unique NOAA GEFS meteorological forecasts, looping through the 21 ensembles for all 441 forecast ensemble members. Because the NOAA GEFS forecasts were statistically downscaled from regional forecasts to our study location (see Thomas et al., 2020 for methods), there is inherent uncertainty in downscaling, although we did not quantify its contribution in this study. Discharge driver data uncertainty was estimated by randomly sampling from a normal distribution around the discharge forecast for each ensemble member, with the discharge forecast estimate as the mean and the standard deviation of the residuals of the linear model between observed and forecasted discharge as the standard deviation (following Thomas et al., 2020). To partition the relative contributions of uncertainty sources through time, we performed a post hoc uncertainty analysis for each forecast model (daily, weekly, and fortnightly). We quantified each individual source of uncertainty by propagating only that source, using the methods described above, while removing the contribution of the other sources. Initial condition uncertainty was removed by initializing the forecast with the mean of the posterior distribution of the latent state of chl a; parameter uncertainty was removed by using the mean values of the posterior parameter distributions; process uncertainty was removed by not sampling from the normal distribution describing model error; meteorological uncertainty was removed by using only one member of the weather forecast ensemble; and discharge uncertainty was removed by not sampling from the normal distribution describing error in the discharge model. The proportion of variance in the forecast output attributable to each uncertainty source, relative to the total variance summed across all sources, was then calculated over the forecast period and for each forecast horizon.
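
A schematic sketch of this one-at-a-time partitioning is below. Here, forecast_fn is a hypothetical stand-in for a function that returns a 441-member ensemble with only the named uncertainty source propagated (all other sources held at their means); the dummy version included makes the sketch runnable.

# Schematic sketch of one-at-a-time uncertainty partitioning.
partition_uncertainty <- function(forecast_fn, sources) {
  v <- vapply(sources, function(src) var(forecast_fn(src)), numeric(1))
  v / sum(v)   # proportion of total variance attributed to each source
}

# Dummy forecast function, for illustration only
forecast_fn <- function(active_source) {
  sd_by_source <- c(initial_condition = 0.5, parameter = 1.0,
                    process = 2.0, weather = 0.3, discharge = 0.2)
  rnorm(441, mean = 10, sd = sd_by_source[[active_source]])
}
set.seed(5)
sources <- c("initial_condition", "parameter", "process", "weather", "discharge")
round(partition_uncertainty(forecast_fn, sources), 2)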

Forecast scalability to new locations

To address Q4 and test the scalability of our forecasting framework to other locations, we produced forecasts at an additional drinking water reservoir, Beaverdam Reservoir (BVR). BVR is also located in Vinton, Virginia (37.31° N, 79.82° W) and is owned and operated by the WVWA (Appendix S1: Figure S3). BVR is slightly larger (0.39 km2) and deeper (13.0 m maximum depth) than FCR and is fed by numerous small inflow streams. Discharge in two of the largest contributing inflows has been sampled occasionally, allowing us to produce modeled estimates of inflow discharge to the reservoir. Like FCR, BVR is dimictic, with thermal stratification occurring annually from approximately May through October (Carey et al., 2019). BVR has a history of deep-water phytoplankton blooms (Hamre et al., 2018) but has overall lower phytoplankton concentrations at the surface than FCR, allowing for a novel comparison of surface phytoplankton forecasts between systems with different baseline concentrations (FCR historical median chl a, 5.4 μg/L; BVR historical median chl a, 3.4 μg/L). Because the aim of this analysis was to assess the scalability of our forecasting framework to a new site, we used only the weekly model of our three (daily, weekly, fortnightly) forecast model time steps at BVR, as it was the intermediate option of the three and had training data dating back to 2013.

Moreover, BVR is an ideal test case for our scalability analysis due to its availability of data streams for applying our forecasting model. In August 2020, an EXO sonde collecting chl a fluorescence data was installed at the surface (~1.5 m) of BVR (Carey, Breef-Pilz, & Bookout, 2021). Because of its proximity to FCR, we used the same observational meteorological data (daily mean shortwave radiation) for model training, and NOAA GEFS forecasts of shortwave radiation to drive the forecasts.

Producing estimates of inflow to BVR was slightly more challenging than at FCR because the reservoir's inflows have not been routinely monitored, nor is there a high-frequency sensor measuring discharge into the reservoir. However, we used a publicly available watershed model, the Thornthwaite-Mather Water Balance (TWMB) model (see Appendix S1: Section S4 for a full description), coupled with observed soil characteristic data (Soil Survey Staff, 2021) and observed precipitation and air temperature from our meteorological station, to model daily inflow rates to BVR over the training period (2013–2016). When producing forecasts of discharge at BVR, we used NOAA GEFS forecasts of daily precipitation and air temperature to drive the TWMB model.
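
A heavily simplified daily soil-moisture bucket in the spirit of the Thornthwaite-Mather water balance is sketched below; the full model (Appendix S1: Section S4) includes additional components, and all coefficients and inputs here are hypothetical.

# Hedged, minimal daily water-balance bucket (Thornthwaite-Mather style).
twmb_step <- function(storage, precip, pet, capacity = 150) {
  supply <- precip - pet                    # water left after evaporative demand
  if (supply >= 0) {
    storage <- storage + supply
    runoff  <- max(storage - capacity, 0)   # excess over soil capacity runs off
    storage <- min(storage, capacity)
  } else {
    # soil dries exponentially with accumulated deficit (Thornthwaite-Mather)
    storage <- storage * exp(supply / capacity)
    runoff  <- 0
  }
  list(storage = storage, runoff = runoff)
}

set.seed(3)
precip <- rgamma(30, shape = 0.5, scale = 8)  # hypothetical daily precip (mm)
pet    <- runif(30, 1, 5)                     # hypothetical daily PET (mm)
state  <- 100                                 # initial soil moisture (mm)
runoff <- numeric(30)
for (d in 1:30) {
  out <- twmb_step(state, precip[d], pet[d])
  state <- out$storage
  runoff[d] <- out$runoff
}
runoff   # daily runoff, which would be routed to the reservoir as inflow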

In summary, to produce weekly forecasts of chl a at BVR, we followed the same framework as at FCR (see Forecasting overview), resulting in 1-week and 2-week forecasts every day from 1 August 2020 to 31 August 2021. On each day a forecast was made, all available observational data up to the day being forecasted were used to fit the model in a Bayesian framework. We then used NOAA GEFS forecasts of shortwave radiation, as well as NOAA GEFS forecasts of precipitation to drive the TWMB inflow model, to drive our chl a forecasts. Additionally, we produced 1-week and 2-week forecasts at FCR over this same period and conducted a full uncertainty analysis for both FCR and BVR to compare the relative performance of the forecast model at both sites.

RESULTS

Observed chl a patterns during study period

Chl a exhibited substantial variability over the forecast period, which included two blooms (July 2019 and March–April 2020) and rapid declines due to two copper sulfate dosing events in February and March 2019 (Appendix S1: Figure S4). Observed chl a over the forecast period had a median concentration of 7.3 μg/L (SD = 7.2 μg/L) and a maximum of 55.7 μg/L, which occurred during the July 2019 bloom. The magnitude of this large bloom greatly exceeded any previous surface bloom recorded by chl a sensors in the reservoir (Appendix S1: Figure S4). For comparison, the training data set from May to October of 2013–2016 had a slightly lower median concentration of 4.9 μg/L (SD = 4.3 μg/L), and the highest recorded bloom during this period was 24.2 μg/L in October 2014 (Appendix S1: Figure S4). Further, the two blooms that occurred during the forecast period exhibited different patterns: the 2019 bloom lasted from 17 July to 5 August 2019, whereas the 2020 bloom occurred intermittently from 16 March to 23 April 2020 and was much more variable in concentration (Appendix S1: Figure S4).

The reservoir was actively managed as a drinking water supply during the forecast period and was treated with copper sulfate twice in 2019 to reduce phytoplankton concentrations. Managers added 200 lbs (90.7 kg) of copper sulfate on 28 February 2019 and 100 lbs (45.4 kg) on 20 March 2019, effectively decreasing chl a concentrations to ~0 μg/L on both days. These time periods were excluded from subsequent analyses because they are not an ecological phenomenon instantiated in the model and thus could not have been predicted.

Q1a: Forecasts over the full ~600-day forecast period

Aggregated over the entire forecast period, daily, weekly, and fortnightly forecasts predicted non-bloom chl a dynamics with consistent accuracy (Figure 3), although bloom events were predicted with variable accuracy. Daily forecasts never outperformed a daily null model, but weekly and fortnightly forecasts performed as well as or better than their respective null models (Figure 4a).

FIGURE 3. Forecasted (black line) and observed (points) chl a concentrations over ~600 days from daily (blue, panels [a]–[c]), weekly (green, panels [d]–[e]), and fortnightly (purple, panel [f]) forecasts; the columns are grouped by forecast horizon, at 1 day (left column), 7 days (middle column), and 14 days (right column). The black lines show the mean of 441 forecast ensembles, and the gray shaded area gives the 95% confidence intervals of the forecast ensembles. Note the differences in axes among panels.
FIGURE 4. RMSE (root mean squared error) across 1–14 day forecast horizons for daily, weekly, and fortnightly forecasts and their respective null models, aggregated over (a) the full ~600-day forecast period, (b) non-bloom conditions only, and (c) bloom conditions only. Note the different y-axis scale in panel (c).

Daily forecasts at a 1-day horizon over the entire forecast period recreated observed dynamics with an overall RMSE of 3.9 μg/L (Figure 3, Appendix S1: Table S10), less than the observed historical standard deviation in chl a in this system (4.3 μg/L), a range of error that should still allow managers to make decisions with confidence. However, daily forecasts at the 1-day horizon did not perform better than the null model, which also had a lower RMSE (2.8 μg/L) than the historical standard deviation (Figure 4a, Appendix S1: Table S10). Weekly forecasts predicted chl a with an RMSE of 6.1 μg/L 7 days ahead and an RMSE of 6.8 μg/L at 14 days ahead (Figure 3, Appendix S1: Table S10), outperforming the null model at both forecast horizons (Figure 4a). Similarly, fortnightly forecasts also outperformed the null model at the 14-day horizon, with an RMSE of 7.0 μg/L, compared to 7.9 μg/L for the null model (Figure 4a, Appendix S1: Table S10).

Based on AR model selection, our forecasts were driven by covariates of shortwave radiation and discharge into the reservoir. Both discharge and shortwave radiation showed a negative relationship with chl a in all forecast models, although the coefficients varied somewhat in magnitude over time. In contrast, the AR chl a coefficient, intercept, and error term all remained positive throughout the forecast period, although their magnitudes varied (Appendix S1: Figures S6–S8). The parameter values for the environmental predictors (solar radiation, discharge) were much higher in the weekly forecasts than in the daily forecasts, and slightly higher in the fortnightly forecasts than in the weekly forecasts, indicating that solar radiation and discharge were stronger predictors of chl a at longer time horizons. The AR chl a coefficient showed the opposite trend (higher in daily forecasts, lower in weekly and fortnightly forecasts). This pattern indicates that daily forecasts were much more sensitive to the AR chl a term than to the environmental predictors, whereas weekly and fortnightly forecasts placed more weight on the environmental predictors.

Q1b: Forecasts during non-bloom versus bloom conditions

Forecast performance was consistently and substantially higher during non-bloom than bloom conditions (Figure 4b vs. 4c). During non-bloom conditions, daily forecasts still did not outperform the null model but performed similarly to the null through the 9-day forecast horizon (Figure 4b). In contrast, weekly and fortnightly forecasts were quite accurate under non-bloom conditions and outperformed the null persistence models at all forecast horizons (Figure 4b), with an overall RMSE of 3.5 μg/L at the 7-day horizon and 3.1 μg/L at the 14-day horizon for weekly forecasts (Figure 4b), and 2.8 μg/L at the 14-day horizon for fortnightly forecasts (Appendix S1: Table S10).

In contrast, forecasts of blooms did not consistently outperform null forecasts. Bloom conditions were rare and occurred over 19 days during July and August 2019 and intermittently over a period of 25 days in March and April 2020, for a total of 44 out of 590 days, or 7.5% of the forecast period. During these bloom periods, daily forecasts always performed worse than the null model (Figure 4c, Appendix S1: Table S10), and weekly and fortnightly forecasts performed the same or worse than the null model. Further, forecast accuracy appreciably decreased during blooms, with RMSE values ranging from 12 μg/L at the 1-day forecast horizon to 23.8 μg/L for fortnightly forecasts at the 14-day horizon (Appendix S1: Table S10).

Forecast performance was markedly different between the July 2019 and March–April 2020 blooms (Figure 3). For the July 2019 bloom, daily forecasts were able to predict the magnitude of the bloom at all forecast horizons but only accurately predicted its timing at the 1-day horizon (Figure 3a–c). At 7-day and 14-day horizons, peak chl a predictions greatly exceeded the observed magnitude of the bloom and occurred only after the observed event, with very large uncertainty. In contrast, during the smaller, less persistent bloom in March–April 2020, daily forecasts were able to predict both bloom magnitude and timing.

In contrast to daily forecasts, weekly forecasts did not successfully predict the magnitude of observed chl a concentration for the July 2019 bloom, with peak values being much lower than observed chl a concentrations (Figure 3d–e). Like the daily forecasts, weekly forecasts much more accurately predicted the timing and magnitude of the March–April 2020 bloom, with observed concentrations falling within the confidence intervals of the forecast ensemble for almost all 7-day forecasts and 14-day forecasts.

Fortnightly forecasts underpredicted the magnitude and did not predict the timing of the July 2019 bloom (Figure 3f). However, these forecasts performed much better during the March–April 2020 bloom, with only a few days of observed concentrations falling outside the confidence intervals of the forecast.

Q2: Effect of model time step on forecast performance

There was a consistent trend in the effect of model time step on forecast performance, as determined by comparing the forecasts generated by multiple forecast models (daily, weekly, fortnightly) at the 7-day and 14-day time horizons. Weekly and fortnightly forecasts consistently outperformed daily forecasts, a trend that increased with forecast horizon except during bloom conditions (Figure 4). Over the full forecast period, the weekly and fortnightly forecasts had a >5 μg/L improvement in RMSE over the daily model at the 14-day forecast horizon (Figure 4a, Appendix S1: Table S10). However, the effect of model time step was not as pronounced between the weekly and fortnightly models, with weekly forecasts sometimes only slightly outperforming fortnightly forecasts (e.g., Figure 4a vs. 4c).

Q3: Uncertainty partitioning analysis

Process error was the dominant source of uncertainty at all forecast horizons for all forecast models, except for the 1-day forecast horizon, which had initial condition uncertainty as the dominant source (Figure 5, Appendix S1: Figure S9). The relative contributions of the different uncertainty sources varied over time for all forecast models (Figure 5). Process error remained the dominant source of uncertainty for most of the forecast period at all horizons greater than 1 day. However, parameter uncertainty increased dramatically during the March 2019 copper sulfate dosing and the July 2019 and March/April 2020 blooms. During these events, parameter uncertainty became the dominant source of uncertainty for daily forecasts at 7-day and 14-day horizons (Figure 5a,c) and increased sharply in importance for weekly and fortnightly forecasts (Figure 5b,d). Meteorological driver data uncertainty was a small contributor to the total uncertainty for all forecast models but increased in contribution in the late winter and early spring of 2019 (Figure 5b–d). Discharge uncertainty made up a very small yet consistent proportion of overall uncertainty throughout the year for all forecast models and was a larger contribution of uncertainty at the 14-day horizon for all forecasts (Figure 5b,d).

FIGURE 5. Relative proportion of forecast variance (uncertainty) over the forecast period (left y-axis) for (a) daily forecasts at 1 day, (b) daily forecasts at 7 days, (c) daily forecasts at 14 days, (d) weekly forecasts at 7 days, (e) weekly forecasts at 14 days, and (f) fortnightly forecasts at 14 days. The black line represents the total variance over the forecasting time period (right y-axis).

Interestingly, the July 2019 bloom resulted in a greater increase in parameter uncertainty and total forecast uncertainty than the March–April 2020 bloom (Figure 5). The increase in both parameter and total uncertainty in July 2019 was much larger for daily forecasts than for weekly or fortnightly forecasts. In particular, total variance increased more than seven-fold for the daily model at the 14-day forecast horizon during the July 2019 bloom. Total variance also increased during the copper sulfate dosing events (late February and March 2019), although this increase was much smaller for weekly forecasts than for daily forecasts (Figure 5a,c).

Q4: Forecast scalability to new locations

Our forecast model framework was successfully scaled from FCR to BVR, producing weekly forecasts of surface chl a at both sites from August 2020 to August 2021. Forecasts at BVR matched observed chl a well at both the 1-week and 2-week horizons (RMSE = 2.44 and 3.07 μg/L, respectively), with a small decline in performance at the 2-week forecast horizon (Figure 6a,b). All forecasts at BVR showed very high uncertainty around the forecast mean throughout the forecast period. Forecasts of chl a at FCR over the same period (RMSE = 5.36 and 5.16 μg/L) were less accurate but had lower overall uncertainty (Figure 6c,d).

FIGURE 6. Forecasted (black line) and observed (points) chl a concentrations over 1 year at Beaverdam Reservoir (panels [a], [b]) and Falling Creek Reservoir (panels [c], [d]) from weekly forecasts at the 7-day (left column; panels [a], [c]) and 14-day (right column; panels [b], [d]) horizons. The black lines show the mean of 441 forecast ensembles, and the gray shaded area gives the 95% confidence intervals of the forecast ensembles. Note the differences in axes among panels.

While process uncertainty was still by far the largest contributor to overall uncertainty in forecasts for both FCR and BVR (Figure 7), parameter uncertainty had a higher contribution at BVR than at FCR. Additionally, while the overall contribution was much lower, both BVR and FCR showed a slight increase in weather driver uncertainty over the late winter to mid-summer period (~February to August). Discharge driver uncertainty remained a very small contribution to overall uncertainty at FCR and was negligible at BVR. Total variance was similar between the two sites at both the 1-week and 2-week forecast horizons.

FIGURE 7. Relative proportion of forecast variance (uncertainty) for weekly forecasts from August 2020 to August 2021. The top row shows the uncertainty analysis for Beaverdam Reservoir at 7 days (panel [a]) and 14 days (panel [b]), while the bottom row shows weekly forecasts for Falling Creek Reservoir over the same period at 7 days (panel [c]) and 14 days (panel [d]). The colors represent the relative contribution of each uncertainty source (left y-axis), while the black line represents the total variance over the forecast period (right y-axis).

DISCUSSION

While the number of ecological forecasts is increasing (Lewis et al., 2021; Luo et al., 2011; Rousso et al., 2020), many questions remain regarding the appropriate time scale at which to develop ecological models for forecasting and management applications, the time horizon and conditions under which ecological variables are predictable, the major sources of uncertainty in forecasts, and their scalability across waterbodies (Clark et al., 2001; Dietze et al., 2018; Petchey et al., 2015). Our work indicates that forecast accuracy varies with model time step and forecast horizon, and that weekly and fortnightly chl a forecast models can outperform null estimates under most conditions. Importantly, our work also shows that sources of uncertainty, as well as forecast accuracy, do not remain constant through time, and that examining changes in uncertainty can elucidate mechanisms behind changes in forecast performance (e.g., during bloom conditions when parameter uncertainty increases and forecast skill decreases). Further, we show that our forecasting framework can produce accurate forecasts when scaled to another site. Altogether, our study advances understanding of forecasting phytoplankton dynamics in aquatic systems and adds to the growing body of work regarding the forecastability of rare, short-lived events (such as blooms) in ecological systems broadly, which we explore below.

Importance of the forecast model time step

Forecast performance differed with model time step and horizon (Figure 4), demonstrating that both are important considerations when choosing a forecast model. For the entire ~600-day forecast period (i.e., primarily non-bloom conditions), both weekly and fortnightly forecasts outperformed daily forecasts at all horizons, indicating that forecast accuracy depends on the model's time step. Specifically, our study shows that tuning a forecast model to a longer time step may produce better forecasts than propagating a daily model out to multiple horizons. In practical terms, when the aim is to forecast conditions 1 week ahead rather than 1 day ahead, performance may be improved by building the model on a weekly time step. Interestingly, fortnightly forecasts did not outperform weekly forecasts, which may be due to a saturating effect of model time step on forecast accuracy when predicting phytoplankton ~1 week or more ahead. Although phytoplankton are known to respond quickly to changes on hourly to daily time scales (Reynolds, 2006), our study demonstrates the benefit of exploring the relevance of the weekly time scale, a time period that may better capture slower-moving processes, such as nutrient delivery from inflow streams (Liu et al., 2019). Further, weekly to fortnightly lead times may actually be more useful for management decisions than daily forecasts, providing more time to implement a water quality intervention, such as ordering chemicals or filters for water treatment or choosing the depth of extraction within the reservoir (Carey, Woelmer, Lofton, et al., 2021).

Forecast skill compared to a null model

All the forecasts performed the same or worse than a null persistence model during bloom conditions, and only weekly and fortnightly forecasts outperformed the null during non-bloom conditions (Figure 4a). Forecasts likely underperformed compared to a null model during bloom conditions due to an inability of our forecast models to accurately replicate the processes occurring during blooms. While weekly and fortnightly forecast models were more skilled at predicting changes in phytoplankton biomass during non-bloom conditions than a null model, the information added by our forecasted covariates was not sufficient to increase forecast skill over the null model during bloom conditions.

It is not unusual for phytoplankton forecasts to perform similarly to or worse than null models, especially at the daily scale. While many forecasts of phytoplankton variables do not currently include a null model comparison (Lewis et al., 2021), those that do report variable performance, with many showing that forecast skill exceeds a null only at monthly to yearly forecast horizons. For example, Park et al. (2019) found that their forecasts of marine chl a concentrations outperformed a null model at horizons ranging from 1 month to 1 year. In contrast, Page et al. (2018) and Kehoe et al. (2019) both found that their daily phytoplankton forecasts performed similarly to or worse than a null persistence model, in some cases with no improvement as forecast horizon increased. These two phytoplankton studies support our finding that daily forecasts did not always perform better than a null model. However, the variability in null performance across different types of ecological forecasts (e.g., Harris et al., 2018; Lovenduski et al., 2019; Massoud et al., 2018; Yeager et al., 2018) underscores the need for more studies to integrate null models into forecast evaluation metrics (following Harris et al., 2018; Dietze, 2017; Lewis et al., 2021).

Challenges in forecasting phytoplankton blooms

There are multiple potential reasons why forecast accuracy was much worse during bloom conditions than non-bloom conditions (Figure 4). First, the decrease in forecast performance may be due to a decrease in the autocorrelation of phytoplankton biomass during blooms, which involve rapid and steep increases in biomass, as opposed to non-bloom conditions, when biomass is relatively stable from day to day. Second, our 2013–2016 training data included very few blooms, and even though blooms did occur during our forecast period, the majority of the time series was dominated by non-bloom conditions (92.5%); the training data set was therefore inherently biased toward predicting non-bloom, rather than bloom, conditions. Third, the relative importance of processes governing phytoplankton populations may change between bloom and non-bloom conditions (Gray et al., 2019; Ho et al., 2019; Ho & Michalak, 2019). Altogether, our work follows many other studies that have also found phytoplankton blooms notoriously difficult to predict (e.g., Chen et al., 2015; Huang et al., 2013; Loos et al., 2020; Massoud et al., 2018; McGowan et al., 2017; Page et al., 2018; Recknagel et al., 2016; Rigosi et al., 2011, 2015) and underscores the need to expand our understanding of phytoplankton bloom dynamics.

Forecast accuracy differed between the July 2019 and March/April 2020 bloom events, indicating that phytoplankton dynamics during some blooms may be more predictable than others. It is possible that the differences in predictive ability between blooms arose because different phytoplankton taxa underlay the two blooms, but unfortunately we do not have the data to test this hypothesis (Appendix S1: Section S5). However, all forecast models better predicted the March/April 2020 bloom than the July 2019 bloom, likely because the March/April 2020 bloom was much lower in peak concentration (maximum = 24.2 μg/L, vs. 55.7 μg/L in July 2019) and lasted a longer period of time, allowing the forecasts to adjust to elevated chl a concentrations (Appendix S1: Figures S6–S8 show dramatic changes in parameter values in response to bloom events).

Capturing both the timing of onset and the magnitude of phytoplankton blooms is a critical benchmark for phytoplankton forecasts. While daily forecasts at a 1-day horizon recreated both the timing and magnitude of the two blooms, daily forecasts at horizons greater than 1 day overpredicted the magnitude and missed the timing of the July 2019 bloom (Figure 3a–c). The autoregressive component of our forecast models likely contributed to the mismatch in the timing of bloom prediction at longer time horizons (e.g., Figure 3c). As the model was updated with the newest observed chl a concentration, the daily model was able to predict higher concentrations, but only after concentrations were already elevated. As a result, the model did not predict the bloom ahead of time, but only responded to already elevated initial conditions. In contrast, weekly and fortnightly forecasts did not accurately predict the magnitude or timing of the July 2019 bloom (Figure 3e,f). Accurately predicting both the timing and magnitude of blooms has long evaded phytoplankton research (Chen et al., 2015; Huang et al., 2013; Loos et al., 2020; Massoud et al., 2018; McGowan et al., 2017; Page et al., 2018; Recknagel et al., 2016; Rigosi et al., 2011); many of these studies successfully predicted non-bloom conditions but lost accuracy during blooms, highlighting the importance of further understanding the factors contributing to the onset of phytoplankton blooms.
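
The timing lag described above falls directly out of the autoregressive structure, as the hypothetical sketch below shows: an AR(1) forecast conditioned on the newest observation cannot raise its prediction until the observations themselves rise. The coefficients and the step-change "bloom" are illustrative assumptions, not our fitted model.

```python
# Minimal sketch (hypothetical, not the authors' model) of why an
# autoregressive forecast lags a bloom: elevated predictions appear
# only after elevated observations have been assimilated, not before.
import numpy as np

b0, b1 = 0.5, 0.9                                # illustrative AR(1) terms
obs = np.r_[np.full(10, 5.0), np.full(5, 30.0)]  # abrupt "bloom" on day 10

horizon = 4
for t in range(len(obs) - horizon):
    pred = obs[t]                  # assimilate the newest observation
    for _ in range(horizon):       # iterate the AR(1) model to the horizon
        pred = b0 + b1 * pred
    print(f"day {t:2d}: forecast for day {t + horizon:2d} = {pred:5.1f}, "
          f"observed = {obs[t + horizon]:.1f}")
```

Running this, forecasts issued before day 10 remain near 5 μg/L even for target days inside the bloom; the model only "catches up" once bloom-level observations enter the initial conditions, mirroring the behavior in Figure 3c.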

Forecast model simplicity

Forecasting phytoplankton using a simple empirical modeling approach has both limitations and benefits. Process-based approaches (e.g., Loos et al., 2020; Mao et al., 2009; Page et al., 2018; Xiao et al., 2017) are more likely to simulate numerous interacting processes, relying less on historical patterns than empirical approaches (sensu Poff, 2018). In our case especially, the lack of large blooms in our training data set (92.5% non-bloom), as well as the chemical treatment and unnatural eradication of blooms at our study site, inherently limited the ability of the empirical model to predict bloom dynamics. However, even state-of-the-art process-based models often fail to reproduce phytoplankton dynamics (e.g., Loos et al., 2020; Page et al., 2018), and numerous studies document that simpler models consistently outperform more complicated models, especially when used in predictive applications (Chevalier & Knape, 2020; Rousso et al., 2020; Ward et al., 2014; Wood et al., 2020). Further, using covariates that are readily predictable is crucial for forecasting applications, making simpler models with fewer covariates easier to convert from explanatory to predictive use. Similarly, process-based forecasts such as those of Loos et al. (2020) and Page et al. (2018) require repeated updating of model states such as nutrient concentrations and phytoplankton functional groups; in the absence of frequent observations, these studies must rely on model simulations to estimate these states.

Finally, while our forecast model does not provide a direct explanation of the mechanisms driving phytoplankton dynamics, our model selection procedure identifies covariates that are both readily forecastable and improve forecasts of phytoplankton over other simple models, such as a null persistence model. Ultimately, a more balanced approach that leverages both empirical and process-based methods may help to expand our predictive ability in phytoplankton ecology (e.g., Briscoe et al., 2019; Buckley et al., 2018; Geary et al., 2020; Read et al., 2019).

Forecast uncertainty

Uncertainty partitioning analyses can increase understanding of the mechanisms that drive overall forecast uncertainty, ultimately leading to improvements in forecast skill (Dietze, 2017; Harris et al., 2018). At all forecast horizons >1 day, forecast uncertainty was dominated by process uncertainty, indicating that a better understanding of the mechanisms driving phytoplankton dynamics is critical to improving future forecasts. However, under bloom conditions (e.g., July 2019), parameter uncertainty increased substantially, likely an indication that parameter distributions did not include values that could accurately recreate bloom dynamics (Appendix S1: Figures S6–S8). Importantly, the increase in parameter uncertainty may be inherently linked to process uncertainty, given that the parameter values in our linear model could not reflect the abrupt changes in phytoplankton biomass, which may be better addressed by employing a different model structure entirely (e.g., exponential growth; see Rousso et al., 2020). Forecasts at BVR also exhibited high parameter uncertainty. To the best of our knowledge, it is not common for parameter uncertainty to be a dominant source of uncertainty in phytoplankton forecasts, although many phytoplankton forecasts do not quantify this uncertainty source at all. Two other studies found that parameter uncertainty dominated forecasts of red pine growth (Gertner et al., 1996) and snow goose populations (Gauthier et al., 2016), although both used process-based models.
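
For readers unfamiliar with uncertainty partitioning, the schematic sketch below illustrates the general one-at-a-time approach (cf. Dietze, 2017): ensembles are run with a single uncertainty source activated, and the ensemble variance at the forecast horizon is attributed to that source. The AR(1)-with-driver model, the parameter and driver distributions, and all numeric values are assumptions for illustration, not our forecast model.

```python
# Schematic sketch of one-at-a-time uncertainty partitioning (cf. Dietze,
# 2017). Model structure and all values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_ens, horizon = 500, 14
x0, b0, b1, beta = 8.0, 0.5, 0.85, 0.2   # initial chl a, AR terms, driver effect

def forecast_variance(ic=False, par=False, drv=False, proc=False):
    """Run an ensemble with one uncertainty source switched on; return
    the ensemble variance at the forecast horizon."""
    x = np.full(n_ens, x0) + (rng.normal(0, 0.5, n_ens) if ic else 0.0)
    b1_e = b1 + (rng.normal(0, 0.05, n_ens) if par else 0.0)
    for _ in range(horizon):
        driver = 1.0 + (rng.normal(0, 0.5, n_ens) if drv else 0.0)
        noise = rng.normal(0, 1.0, n_ens) if proc else 0.0
        x = b0 + b1_e * x + beta * driver + noise
    return x.var()

parts = {"initial conditions": forecast_variance(ic=True),
         "parameter": forecast_variance(par=True),
         "driver": forecast_variance(drv=True),
         "process": forecast_variance(proc=True)}
total = sum(parts.values())
for source, var in parts.items():
    print(f"{source:>18}: {100 * var / total:5.1f}% of summed variance")
```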

Despite many forecasting studies citing driver uncertainty as the dominant source of uncertainty (Dietze, 2017; Jiang et al., 2018; Mbogga et al., 2010; Thomas et al., 2020), we found that driver uncertainty (meteorological and discharge) contributed only a small portion of forecast uncertainty. While the contribution of meteorological driver uncertainty varied throughout the year (Figure 5), it remained a small fraction of total uncertainty, even though we used the same weather forecast product (NOAA GEFS) as other studies.

Overall, our uncertainty analysis demonstrates that the dominant sources of uncertainty in phytoplankton forecasts may differ between bloom and non-bloom conditions, but that until process uncertainty is adequately constrained, other sources will contribute only a minimal proportion of overall uncertainty. As a result, improving the process representation of bloom and non-bloom dynamics must be a priority before reductions in driver or initial condition uncertainty can be expected to substantially improve forecasts.

Forecast scalability

Our scalability analysis showed that forecast performance remained high when applying the same forecast model and uncertainty structure to a new waterbody. Over a full year, forecasts at BVR predicted chl a concentrations 1 week ahead to within 2.44 μg/L (RMSE), a promising result suggesting that autoregressive models can be readily applied in new systems. Despite the high performance of the forecast mean relative to observed chl a, overall forecast uncertainty bounds were wider than in forecasts produced for FCR over the same time period. This was due to increased process and parameter uncertainty, likely an indication that the covariates in the autoregressive model selected for FCR may not be appropriate for BVR, which has distinct phytoplankton and hydrological dynamics (e.g., Hamre et al., 2017, 2018). We did not observe any major bloom events in BVR over the forecasted time period, limiting our ability to examine how our forecast model performs under bloom conditions specifically. Additionally, BVR has numerous inflow streams, as opposed to a single primary inflow at FCR, which may lessen the importance of discharge as a driver of BVR's phytoplankton dynamics. Indeed, our uncertainty comparison between BVR and FCR showed that parameter uncertainty was higher in BVR than in FCR, and that process uncertainty was dominant in both reservoirs, supporting our hypothesis that different predictors may be important in BVR than in FCR.

Our scalability analysis indicates that our forecasting framework can be successfully transferred to other waterbodies. When scaling this framework to other sites, we recommend developing unique autoregressive models (with full model selection) for individual study sites to determine which covariates are most informative for the dynamics of each ecosystem. In the absence of long-term, high-frequency data streams for drivers such as inflow discharge, we demonstrate an application that uses freely available model structures (e.g., the TWMB model) to estimate the necessary inputs from existing data. Other options include nationally available meteorological data products such as NLDAS or USGS stream gauge data. Overall, our scalability analysis is promising for expanding forecasting frameworks beyond a single system.
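
As a sketch of what site-specific model selection might look like in practice, the code below fits candidate autoregressive models with different covariate sets and ranks them by AIC. The statsmodels-based implementation, column names, and candidate sets are hypothetical; our actual selection procedure is described in the Methods.

```python
# Minimal sketch of covariate selection for a new site: fit candidate
# AR models and compare by AIC. Column names ('chla', 'airtemp',
# 'discharge') and candidate sets are hypothetical assumptions.
import pandas as pd
import statsmodels.api as sm

def rank_candidates(df, candidates):
    """df needs a 'chla' column plus covariate columns; returns
    (covariate set, AIC) pairs sorted best (lowest AIC) first."""
    y = df["chla"].iloc[1:].reset_index(drop=True)
    lagged = df.shift(1).iloc[1:].reset_index(drop=True)  # day t-1 predictors
    results = {}
    for covars in candidates:
        X = sm.add_constant(lagged[["chla"] + list(covars)])
        results[covars] = sm.OLS(y, X).fit().aic
    return sorted(results.items(), key=lambda kv: kv[1])

# Hypothetical usage with a site's daily data frame `site_df`:
# candidates = [(), ("airtemp",), ("discharge",), ("airtemp", "discharge")]
# print(rank_candidates(site_df, candidates))
```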

CONCLUSIONS

Our forecasting system successfully predicted phytoplankton biomass at multiple time scales over the course of ~600 days, requiring only two driver data streams and as little as 5 months of training data. Our forecasting system highlights the feasibility of producing ecological forecasts that leverage historical monitoring data sets and forecastable model covariates at multiple sites, as well as the value of performing a formal uncertainty analysis on forecasts. Additionally, we emphasize the importance of considering the time step of a forecast model when producing forecasts at longer time horizons, which is critical to improving the use of forecasts as decision support tools for managers and the public. Further, by applying our forecasting system to an additional study site, we show that simple forecast models can be adapted for new locations using limited input data. Ultimately, this study provides insight into the predictability of freshwater phytoplankton dynamics, as well as helpful considerations for developing ecological forecasts in a diverse set of ecosystems.

AUTHOR CONTRIBUTIONS

Cayelan C. Carey and Whitney M. Woelmer conceptualized the study, with substantial help from R. Quinn Thomas. The forecasting framework was developed by R. Quinn Thomas and adapted for use in this study by Whitney M. Woelmer. All models and forecasts were run and analyzed by Whitney M. Woelmer. Mary E. Lofton and Ryan P. McClure led the extensive data collection in this study. Heather L. Wander built the code base for producing inflow discharge estimates at Beaverdam Reservoir. The manuscript was written by Whitney M. Woelmer, with substantial editing and reviewing from Cayelan C. Carey; all authors provided feedback and approved the final version.

ACKNOWLEDGMENTS

We thank the Western Virginia Water Authority (WVWA) for their long-term access to field sites and continuing support and engagement in this research. The Smart Reservoir and FLARE team, in particular, R. J. Figueiredo, V. Daneshmand, B. J. Bookout, and A. Breef-Pilz, were critical in developing the forecasting workflows that enabled this project. We thank many field and lab assistants, especially A. Gerling, K. Hamre, A. Hounshell, D. Howard, J. Doubek, K. Krueger, A. Lewis, N. Hammond, J. Maze, Z. Munger, N. Ward, and J. Wynne, as well as B. Niederlehner for her tireless efforts in sample analysis. We thank the Virginia Tech Stream Team, Carey Lab, Global Lake Ecological Observatory Network, and Ecological Forecasting Initiative (EFI), especially the EFI Student Association for their support, feedback, and thoughtful discussions. This work was supported by the National Science Foundation (CNS-1737424, DEB-1753639, DBI-1933016, DBI-1933102, DEB-1926050, a Graduate Research Fellowship to Whitney M. Woelmer), and the WVWA.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT

Forecast data, metadata, and code (Woelmer & McClure, 2022) are available in Zenodo at https://doi.org/10.5281/zenodo.5963867. All data needed to run the analyses are published in the Environmental Data Initiative repository as follows: meteorological data (Carey, Breef-Pilz, Bookout, et al., 2021), https://doi.org/10.6073/pasta/890e4c11f4348b3ceda802732ffa48b4; discharge data (Carey, Hounshell, Lofton, et al., 2021), https://doi.org/10.6073/pasta/8d22a432aac5560b0f45aa1b21ae4746; and observed chl a data (Carey, Lewis, McClure, et al., 2021), https://doi.org/10.6073/pasta/5448f9d415fd09e0090a46b9d4020ccc, and (Carey, Woelmer, Lewis, et al., 2021), https://doi.org/10.6073/pasta/88896f4a7208c9b7bddcf498258edf78.