Inexpensive spot sampling provides unexpectedly effective indicators of watershed nitrogen status

Stream water quality data are essential for understanding watershed processes and managing water pollution, but the effort and expense of stream monitoring limit how many watersheds can be studied. For 59 small watersheds in the Chesapeake Bay drainage, we compared water quality measurements from inexpensive spot sampling to data from costly automated monitoring that used 1-3 yr of continuous flow measurement and weekly, temporally composited water sampling. Mean nitrogen (N) levels ranged from 0.01 to 16 mg N/L among streams. There were important temporal variations in N concentrations at each site, but the differences among sites were much greater. Spot samples were very effective at accurately and precisely placing average stream N levels within the N gradient among streams draining N-enriched watersheds. Among watersheds, nitrate (NO3) and total N concentrations from spot samples were very strongly correlated with means from weekly composite sampling (R² > 97%). We confirmed this result with independent data for 85 larger watersheds in the Chesapeake Bay Non-tidal Network. NO3 concentration from a single March spot sample was highly correlated (R² > 92%) with flow-weighted average total N concentration synthesized from five years of monitoring. Spot sampling effectively quantifies average N status across N-enriched watersheds because most N moves as dissolved NO3 in subsurface flow and that flux is much less variable than the episodic surface transport of particulate materials. For questions answered by quantifying average N levels, spot sampling can assess more watersheds at much lower cost than automated sampling, so it should be more widely used to support cost-effective N research and management. For materials that are mainly bound to particulates, such as phosphorus, spot sampling is much less effective.


INTRODUCTION
Measurements of nitrogen levels in streams and rivers provide critical information for advancing basic ecosystem science as well as quantifying and managing anthropogenic pollution of aquatic systems. Nitrogen is often the nutrient that limits plant production in natural ecosystems (Schlesinger 2009), so information on nitrogen loss in stream water is essential for quantifying watershed nitrogen balances (Weller 1996, Boyer et al. 2002) and for better understanding terrestrial plant production and nitrogen cycling (Brookshire et al. 2011). Low nitrogen levels also limit productivity in managed systems, motivating the application of nitrogen fertilizer, especially to croplands (Jordan et al. 1997a, b, Harmel et al. 2006a, Stewart and Lal 2017). The resulting release of nitrogen in land runoff can pollute aquatic systems, causing eutrophication and associated ecological and economic disruption (Nixon 1995, Doney 2010, Sobota et al. 2015, Boesch 2019). Global fertilizer applications have increased roughly fivefold over the past 50 yr (Foley et al. 2011) and will likely continue to increase due to population growth and increasing meat consumption (Galloway and Cowling 2002, Abbott et al. 2018). Managing the impacts on aquatic systems demands data on nitrogen levels in streams to identify nitrogen source areas, quantify aquatic nitrogen loading, and assess the value of management efforts to reduce it.
The most accurate methods for measuring stream nitrogen transport employ automated monitoring stations that combine continuous streamflow measurement with frequent samples of nitrogen concentration (Swistock et al. 1997). However, one must balance the high cost of temporally intensive sampling against the acceptable level of uncertainty in water quality characterization. Scientists and engineers have examined the effect of sampling strategy on uncertainty in concentration or load measurements, and many have concluded that composite sampling is an effective way to balance sampling effort against uncertainty (Harmel and King 2005, Moatar and Meybeck 2005, Schleppi et al. 2006a, Harmel et al. 2006b, c, Birgand et al. 2010). Volume-integrated composite sampling collects frequent water samples in volumes proportional to the flow rates at the times of collection but combines those samples over time to yield fewer samples requiring chemical analysis. Such sampling schemes yield essentially unbiased material flux estimates without requiring the chemical analysis of many samples (Schleppi et al. 2006a, b). They also ensure adequate sampling of particulate materials transported during stormflow (e.g., Jordan et al. 1986, 1997a).
A synoptic survey, in which a single spot sample (also called a grab sample) of water is collected from each study site, is a much simpler and cheaper sampling strategy. The low labor and cost enable larger sample sizes to expand spatial coverage or to include watersheds encompassing greater ranges of land use, geology, or other factors relevant to nitrogen export. Synoptic surveys have been criticized because they cannot characterize temporal dynamics (such as seasonality, storm events, or trends) and can yield unrepresentative estimates for the concentrations of materials that vary strongly with stream discharge rate (Kirchner and Neal 2013). Nevertheless, many studies have concluded that synoptic surveys are effective for applications where temporal dynamics are less important, such as for understanding differences among watersheds in average loads or ranking watersheds by important drivers, such as land use, fertilizer application, human population, and sewage output (Messer et al. 1988, Kaufmann et al. 1991, Grayson et al. 1997, Wolock et al. 1997). Our own experience suggests that for materials whose concentrations do not increase greatly during storm events, spot sampling can yield water quality data of sufficient accuracy and precision for many important purposes (Weller et al. 2010). Those include quantifying average differences among watersheds in the levels of materials in stream discharge as well as placing watersheds along the gradient of a driving variable, such as geology, fertilizer application, the proportion of cropland, or the prevalence of nitrogen sinks in a watershed (Correll et al. 1995, Jordan and Weller 1996, Liu et al. 2000, Weller et al. 2011, Weller and Baker 2014).
In this paper, we more formally test the power of spot sampling as a cost-effective way to characterize nitrogen status among nitrogen-enriched watersheds. We compare estimates of stream nitrogen levels based on seasonal spot sampling of stream nitrogen concentration (Liu et al. 2000) to measurements of average annual nitrate and total nitrogen levels for the same watersheds derived from automated monitoring stations performing volume-integrated composite sampling (Jordan et al. 1997a, b, 2000). We focus on nitrate concentration as a potential indicator of total nitrogen level for several reasons. Nitrate concentration is relatively easy to sample and measure, human intervention in the nitrogen cycle often raises nitrate levels in streams and rivers (Caraco and Cole 1999, Seitzinger et al. 2002), and nitrate is often the dominant form of nitrogen in surface waters (Creed and Band 1998, Boyer et al. 2006) even in forested areas (Campbell et al. 2004, Eshleman et al. 2013). We demonstrate that inexpensive spot sampling provides a very strong indicator of the nitrogen levels measured by the more labor-intensive and costly automated sampling methods. We conclude that spot sampling of many watersheds can often be a more useful and cost-effective way to explore spatial patterns and broad nitrogen-enrichment gradients than more expensive sampling of fewer watersheds. We recommend that spot sampling be more widely used in such efforts.

Overview and study area
We used information from two data sets assembled for watersheds in the 166,000 km² Chesapeake Bay drainage, which extends over four major physiographic provinces (Coastal Plain, Piedmont, Blue Ridge, and Appalachian; Langland et al. 1995) within the mid-Atlantic region of the United States. We first analyzed data from our own Smithsonian Environmental Research Center study of watersheds within the Chesapeake Bay drainage (here called the SERC data) to quantify and model the relationships between spot nitrogen measurements and measurements from automated monitoring. Then, we confirmed the results and conclusions from the SERC data with an independent data set assembled by the Chesapeake Bay Program Non-tidal Network (CBNTN; Chanat et al. 2012, Moyer et al. 2017).

SERC watershed data
Study sites.-For 59 study watersheds distributed across all four major physiographic provinces of the Bay drainage (see map, Fig. 1), we collected seasonal spot samples from the effluent stream and analyzed them for nitrate and total nitrogen. For each site, we also established an automated monitoring station that measured stream depth continuously and controlled samplers that collected volume-integrated weekly water samples, which were also analyzed for nitrate and total nitrogen (Jordan et al. 1997a, b). The 59 sites are the subset of watersheds in which both spot and integrated sampling were done, taken from a larger group of 517 study watersheds (Jordan et al. 1997a, b, 2000, 2003, Liu et al. 2000, Weller and Baker 2014).
The watersheds are distributed in 14 clusters across the Chesapeake drainage basin (Fig. 1 and Appendix S1: Table S1). The locations of the clusters represent prevalent geological types in each major physiographic province of the Chesapeake Bay drainage basin (Langland et al. 1995) as described by Liu et al. (2000). Within each cluster, we sampled streams draining watersheds with strongly contrasting land covers to maximize our ability to observe and quantify the effects of land cover on nitrogen discharges and to detect differences in those effects among geological settings. We delineated the boundary and area of the watershed draining to each sampling point by applying automated watershed delineation to digital elevation and stream maps within a geographic information system (GIS, as described in Baker et al. 2006). To quantify [...] (Homer et al. 2004). To identify the physiographic province of each study watershed, we intersected the GIS layers of study watershed boundaries and physiographic province boundaries (Langland et al. 1995) as previously described (Weller and Baker 2014).
Fig. 1. Boundaries for 59 SERC (blue outlines) and 85 CBNTN (red outlines) watersheds within the Chesapeake Bay drainage (outer boundary). Shaded areas within that boundary are four major physiographic provinces (Langland et al. 1995).
Stream sampling.-From each of the 59 watersheds, we collected 6-22 seasonal spot samples under baseflow conditions over a period of 1-3 yr. The sampling periods varied among the watershed clusters, but all were within 1992-2000 (Appendix S1: Table S1). Spot samples were filtered in the field so that subsequent laboratory analyses quantified only the dissolved fractions of nitrogen species. Correll et al. (1995) and Liu et al. (2000) provide more details on the spot sampling methods. An automated monitoring station measured stream depth continuously for 1.3-2.9 yr at the outlet of each watershed.
The period of automated monitoring in each watershed overlapped with the period of spot sampling (above), and all automated sampling was within 1990-2000. At the Rhode River cluster (arrow in Fig. 1 and stations 101-111 in Appendix S1: Table S1), seven stations used V-notch weirs, so water depth was converted to flow using published equations (Correll 1977, 1981). At all the other watersheds, the automated station monitored stream depth, and we calculated water flow from rating curves of flow vs. depth. The rating curves were calibrated using measurements of depth, cross-sectional area, and flow rate under a range of streamflow conditions (Jordan et al. 1997a, b). The automated stream stations implemented volume-integrated composite sampling by activating pumps to collect water every time a set volume of flow occurred. Thus, the station pumped water more frequently at higher flow rates, so that the composite samples properly represented materials in the water under all flow conditions as well as the contributions from overland stormflow and groundwater emerging in the stream. We retrieved the composite samples weekly for laboratory analysis. The number of weekly samples ranged from 51 to 144 according to the period sampled at each station. Fig. 2 illustrates the results of automated and spot sampling of nitrate concentration relative to discharge monitoring for one station. Previous papers provide more sampling details (Correll 1977, 1981, Jordan et al. 1997a).
Stream nitrogen levels.-We measured nitrate concentrations in spot samples with a Dionex Ion Chromatograph Model 1400i. In the automated samples, we measured the sum of nitrate and nitrite concentrations by reducing nitrate to nitrite with cadmium amalgam and analyzing nitrite by reaction with sulfanilamide (APHA 1989). Nitrite concentrations were always very low relative to nitrate, so we refer to their sum as nitrate throughout the paper.
Total Kjeldahl nitrogen was determined using the Kjeldahl digestion (Martin 1972, APHA 1989) and analysis of the resulting ammonium by distillation and nesslerization (APHA 1989). Total nitrogen is the sum of total Kjeldahl nitrogen and nitrate. In the composite water samples, ammonium and organic nitrogen can be bound to particles as well as dissolved, so the nitrogen analyses for composite samples yielded the total of the particulate and dissolved fractions. Because we filtered the spot samples in the field, the nitrogen analyses assessed only the dissolved fractions. Nitrate is not significantly bound to particles, so the filtered spot samples capture all the nitrate. Dissolved total nitrogen in the spot samples was measured for only 48 of the 59 study watersheds. Previous papers provide more details of the chemical analyses (Jordan et al. 1997a, b, Liu et al. 2000).
Data analysis.-We sought to quantify how well simple spot measurements of nitrogen concentration can predict the average nitrogen concentrations from high-quality, flow-weighted composite sampling. We summarized two dependent variables from the composite samples at each site: average total nitrogen concentration (TN) and nitrate concentration (NO3). These flow-weighted averages were calculated by weighting each weekly composite concentration measurement by the volume of discharge during that week. As potential predictors (independent variables), we calculated the simple averages of the seasonal spot measurements of dissolved total nitrogen (sDTN) and nitrate (sNO3) concentrations at each site. Single spot samples are often used to characterize stream water chemistry in stream assessments (Stranko et al. 2017), so we also considered as possible predictors the dissolved total nitrogen and nitrate concentrations (fsDTN and fsNO3) in the first spring spot sample collected between 1 March and 31 May, the time of year when stream assessment surveys typically collect water samples (e.g., the Maryland Biological Stream Survey; Ashton et al. 2014, Stranko et al. 2017). In all, we tested two spot measurements (sNO3 and fsNO3) as estimators of composite-sampled NO3 and four spot measurements (sDTN, fsDTN, sNO3, and fsNO3) as estimators of composite-sampled TN.
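The contrast between the flow-weighted average of the composite samples and the simple average of the spot samples can be illustrated with a minimal Python sketch. This is not the code used in the study; the function names and the numbers are hypothetical and serve only to show the arithmetic.

```python
def flow_weighted_mean(concentrations, volumes):
    """Average concentration with each value weighted by the discharge
    volume of the period (here, the week) that it represents."""
    if len(concentrations) != len(volumes):
        raise ValueError("need one discharge volume per concentration")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total discharge volume must be positive")
    weighted_sum = sum(c * v for c, v in zip(concentrations, volumes))
    return weighted_sum / total_volume


def simple_mean(values):
    """Unweighted average, as used for the seasonal spot samples."""
    return sum(values) / len(values)


# Hypothetical weekly composite data (not from the study):
weekly_conc = [2.0, 1.0, 4.0]          # mg N/L
weekly_volume = [100.0, 300.0, 100.0]  # m^3 of discharge in each week

print(flow_weighted_mean(weekly_conc, weekly_volume))  # weeks with more flow count more
print(simple_mean(weekly_conc))                        # every sample counts equally
```

In this made-up case the high-flow week has the lowest concentration, so the flow-weighted mean falls below the simple mean.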
We evaluated three approaches for predicting the high-quality, flow-weighted measurements of nitrogen concentration from the spot concentration measurements. The first approach simply used the average of the spot measurements (or the first spring spot measurement) as the estimate of true average nitrogen concentration. Many studies have interpreted spot measurements this way, including our previous work (Correll et al. 1995, Liu et al. 2000, Weller et al. 2011, Weller and Baker 2014). The second approach exploited contemporaneous spot and flow-weighted composite samples from the same watersheds to calibrate a linear regression model (R lm function; Venables and Ripley 2002, R Core Team 2017) that predicts flow-weighted average concentration from a spot measurement. This approach quantifies the strength of association between spot and flow-weighted concentrations and identifies possible biases in the simpler first approach. Linear regression also yields a prediction equation for estimating multiyear average nitrogen concentrations from watersheds where only spot measurements are available. Finally, the regression model quantifies the uncertainty in its estimates by providing confidence limits and prediction intervals.
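The first two approaches can be sketched in a few lines of Python. The analyses in this paper used the R lm function; the ordinary least-squares helper below (fit_line) and the example values are hypothetical stand-ins chosen only to illustrate the calculation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical calibration data (mg N/L), one pair per watershed:
spot = [0.0, 2.0, 4.0, 10.0]           # spot concentration
flow_weighted = [0.1, 2.0, 3.9, 9.6]   # flow-weighted average concentration

# Approach 1 (direct substitution): the spot value is the estimate.
direct_estimates = list(spot)

# Approach 2 (simple regression): calibrate, then predict from a new spot value.
intercept, slope = fit_line(spot, flow_weighted)
new_spot = 6.0
predicted = intercept + slope * new_spot
print(intercept, slope, predicted)
```

A fitted slope below one, as in this made-up example, is the kind of result that would indicate that direct substitution overestimates the flow-weighted average at high concentrations.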
The third approach applied bootstrap resampling (Efron 1982, Efron and Gong 1983) to enhance the statistical rigor of the regression approach. The variance of concentration measurements is typically greater at higher concentrations, and the residuals of our regression models are larger at higher concentrations (see Results). Such patterns violate the assumption of equal variance among residuals (homoscedasticity) underlying linear regression. Bootstrapping (detailed below) can accommodate heterogeneity in the variances of data and residuals, and it can also quantify the effects of including or excluding influential data points in an analysis. Because bootstrapping accounts for heteroscedasticity and sampling uncertainty, we expected the confidence intervals for parameters and predictions of the bootstrapped regression model to be larger than corresponding intervals for the simple regression, but those larger intervals better represent the true uncertainty of the estimates.
Logarithmic transformation is a simpler and more common solution for analyzing heteroscedastic variables for which the variance increases with the mean (Snedecor and Cochran 1989, Draper and Smith 1998), and log-log regression relating one such variable to another is widely applied in water quality analyses (Helsel and Hirsch 2002). However, statisticians have documented general problems with log transformation, including failure to eliminate heteroscedasticity and difficulty applying parameter estimates or hypothesis tests back to the untransformed variables (Feng et al. 2013, Choi 2016, Greenacre 2016, Rendevski et al. 2016, Curran-Everett 2018, Ekwaru and Veugelers 2018). We chose the bootstrapping approach instead of log-log regression because it handled our heteroscedastic variables without causing these problems and because bootstrapping gave other benefits, such as quantifying sampling uncertainty. Appendix S1 provides a more thorough review of the possible problems with log transformation. For our data, Appendix S1 also shows that log-log transformation failed to homogenize the variance and produced models that performed poorly for predicting the high nitrogen levels that are of greatest concern in addressing management questions.
We implemented the bootstrap approach in two steps. The first step quantified sampling uncertainty using a pairs bootstrap (Wu 1986, Flachaire 2005), in which we created 2000 bootstrap samples with 59 observations by resampling observations with replacement. For each sample, we fit the linear regression model and then applied the model to predict flow-weighted average concentrations from spot concentrations ranging from 0 to 18 mg N/L in steps of 1 mg N/L. For each of those 19 values of the independent variable, the median prediction across the bootstrap samples provided the bootstrap prediction of flow-weighted average concentration, and the 2.5th and 97.5th percentile values provided the 95% confidence limits for the median predictions. The 19 median values formed a perfect straight line, and we used the slope and intercept of that line as the coefficients of the linear bootstrap prediction model. We implemented the second bootstrap step to provide prediction intervals for the estimates of flow-weighted average concentration at an individual site. For each of the 2000 pairs bootstrap samples, we used the fitted linear model to predict flow-weighted average concentrations for all 59 study watersheds in the full data set and then calculated the model residual (observed - predicted) for each watershed. We then implemented a wild bootstrap, a method developed for heteroscedastic data (Wu 1986, Mammen 1993, Flachaire 2005, Davidson and MacKinnon 2006), by generating 50 bootstrap samples in which we added to the predicted value for each watershed a resampled residual, calculated by multiplying the residual for that watershed by an independent normally distributed variate with mean 0 and standard deviation 1 (Roodman et al. 2019).
The resampling of model residuals accounts for the variability in flow-weighted average concentration that is not explained by the prediction model, so that the 2.5th and 97.5th percentiles across the 100,000 bootstrap samples (2000 pairs × 50 wild) estimate the 95% prediction interval for an individual watershed. We applied loess smoothing (R ggplot2 package; Wickham 2016) across the 59 upper limits and 59 lower limits to provide a smoothed visualization of the prediction interval.
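The two bootstrap steps can be summarized in a short Python sketch. It follows the description above but is not the R code used for the analyses: the helper names (fit_line, percentile, bootstrap_regression), the interpolating percentile rule, the random seeds, and the synthetic heteroscedastic data are all assumptions made for illustration, and the loess smoothing of the prediction limits is omitted.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def percentile(values, q):
    """Percentile q (0-100), interpolating linearly between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_regression(x, y, grid, n_pairs=2000, n_wild=50, seed=1):
    rng = random.Random(seed)
    n = len(x)
    grid_preds = [[] for _ in grid]      # predictions at each grid value
    site_sims = [[] for _ in range(n)]   # simulated values for each site
    for _ in range(n_pairs):
        # Step 1, pairs bootstrap: resample (x, y) observations with replacement.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            xs = [x[i] for i in idx]
            if max(xs) > min(xs):        # skip degenerate resamples
                break
        b0, b1 = fit_line(xs, [y[i] for i in idx])
        for k, g in enumerate(grid):
            grid_preds[k].append(b0 + b1 * g)
        # Step 2, wild bootstrap: rescale each site's own residual by N(0, 1).
        for i in range(n):
            pred = b0 + b1 * x[i]
            resid = y[i] - pred          # observed - predicted
            for _ in range(n_wild):
                site_sims[i].append(pred + resid * rng.gauss(0.0, 1.0))
    # (median, lower 95% confidence limit, upper limit) at each grid value
    conf = [(percentile(p, 50), percentile(p, 2.5), percentile(p, 97.5))
            for p in grid_preds]
    # 95% prediction interval for each site
    pred_int = [(percentile(s, 2.5), percentile(s, 97.5)) for s in site_sims]
    return conf, pred_int


# Synthetic data: 59 sites, noise that grows with concentration.
data_rng = random.Random(0)
spot = [data_rng.uniform(0.0, 16.0) for _ in range(59)]
flow_weighted = [0.95 * s + data_rng.gauss(0.0, 0.05 + 0.05 * s) for s in spot]
grid = list(range(0, 19))                # 0 to 18 mg N/L in steps of 1

# Fewer replicates than the 2000 x 50 described above, to keep the demo fast.
conf, pred_int = bootstrap_regression(spot, flow_weighted, grid,
                                      n_pairs=200, n_wild=20)
print(conf[10], pred_int[0])
```

Because each site's residual is rescaled rather than pooled, sites with large residuals (typically those with high concentrations) receive wider simulated spreads, which is how the wild bootstrap accommodates heteroscedasticity.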
For each set of dependent and independent variables, we quantitatively evaluated the performance of the direct, simple linear, and bootstrapped approaches by comparing the predictions of each approach to the observed data using the gof (goodness of fit) function of the R hydroGOF package (Zambrano-Bigiarini 2020). We report five metrics of skill. Mean error (bias ɛ) and percent bias account for accuracy. Root-mean-squared error (RMSE) accounts for both accuracy and precision. As a measure of precision alone, we calculated unbiased root-mean-squared error (ubRMSE) from bias and RMSE by rearranging the equation RMSE² = ɛ² + ubRMSE² (Jolliff et al. 2009). We also report the percentage of variance in flow-weighted average concentration explained (R²) by each approach. We used the R statistical package (R Core Team 2017) for all of the analyses.
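The five skill metrics can be written out explicitly in a brief Python sketch. It does not reproduce the hydroGOF gof function; the function name skill_metrics is hypothetical, the sign convention (predicted minus observed) is an assumption, and R² is taken here to be the squared Pearson correlation, which is also an assumption about the definition.

```python
import math


def skill_metrics(observed, predicted):
    """Bias, percent bias, RMSE, ubRMSE, and R^2 for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                       # mean error
    pbias = 100.0 * sum(errors) / sum(observed)  # percent bias
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Rearranged from RMSE^2 = bias^2 + ubRMSE^2; max() guards against
    # a tiny negative difference caused by floating-point rounding.
    ubrmse = math.sqrt(max(0.0, mse - bias * bias))
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    r2 = (sxy * sxy) / (sxx * syy)               # squared Pearson correlation
    return {"bias": bias, "pbias": pbias, "rmse": rmse,
            "ubrmse": ubrmse, "r2": r2}


# Hypothetical example: every prediction is 0.5 mg N/L too high.
metrics = skill_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

In this constructed case all of the error is bias, so ubRMSE is zero and R² is one even though RMSE is not zero, which shows why accuracy and precision are reported separately.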

CBNTN verification data
Study sites.-The independent verification data for our analysis came from data on streamflow and water chemistry assembled for the Chesapeake Bay Program Non-tidal Network (CBNTN). The data have been curated and analyzed by the U.S. Geological Survey (e.g., Moyer et al. 2017). We analyzed five years of data (water years 2012-2016, from October 2011 through September 2016) from a subset of 85 watershed sampling sites (Fig. 1; Appendix S1: Table S2) for which stream discharge and loads of total nitrogen and nitrate have been summarized (Chanat et al. 2012, Moyer et al. 2017) and for which digitized watershed outlines are available (Ryberg et al. 2017). We summarized the 2013 National Land Cover Data Set (NLCD; Yang et al. 2018) to characterize human activities in the watersheds by using a GIS to intersect the digital watershed boundaries (Ryberg et al. 2017) with the NLCD data and then tabulating land cover proportions.
Stream nitrogen levels.-The CBNTN does not employ composite sampling like the SERC study (above). Instead, the CBNTN monitors streamflow continuously and measures material concentrations in discrete water samples collected throughout the year and under different flow conditions. These sparse long-term monitoring data are combined with daily discharge to characterize episodic, seasonal, and long-term dynamics of nutrients and sediments. During water years 2012-2016, the median number of total nitrogen and nitrate concentration measurements per site was 98 (range: 55-193; Appendix S1: Table S2). The USGS applies advanced statistical models to the flow and concentration measurements to estimate material loads, flow-weighted concentrations, and other summary quantities (Moyer et al. 2017). The current model (called weighted regressions on time, discharge, and seasonality, WRTDS; Hirsch et al. 2010, Chanat et al. 2012) provides unbiased estimates of nitrogen and nitrate loads (Zhang et al. 2019). For the 85 study watersheds in water years 2012-2016, we extracted the 60 monthly estimates of discharge and the average concentrations of total nitrogen and nitrate from a recent WRTDS summary of the CBNTN (Moyer et al. 2017). For each watershed, we calculated the five-year (2012-2016) average nitrate and total nitrogen concentrations as the weighted average of the 60 monthly concentrations weighted by the product of monthly discharge and month length in days.
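The five-year weighted average described above can be written compactly. The symbols are introduced here only for illustration and do not appear in the source data: C_m is the WRTDS concentration estimate for month m, Q_m is the corresponding monthly discharge (assumed here to be a mean rate), and d_m is the length of the month in days, so that the product Q_m d_m is proportional to the volume of water discharged in that month.

```latex
\bar{C} \;=\; \frac{\sum_{m=1}^{60} C_m \, Q_m \, d_m}{\sum_{m=1}^{60} Q_m \, d_m}
```

Under these assumptions, the expression plays the same role as the volume weighting of the weekly SERC composite samples, with months in place of weeks.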
Data analysis.-Like the SERC composite sample data, the integrated estimates of average concentration from the WRTDS analysis were treated as the dependent variable (flow-weighted average concentration) to be estimated from simpler spot sampling. The independent predictor variables we evaluated were single, discrete measurements of total nitrogen and nitrate concentration. For each site, we selected from the CBNTN concentration database the first uncensored TN and NO3 measurements taken in March 2012 (see Moyer et al. [2012] for information on censoring). The month of March begins the period when streams are commonly visited for stream assessment (e.g., Ashton et al. 2014, Stranko et al. 2017). We call these potential predictors fsTN and fsNO3 in the rest of the paper. We applied the same data analyses used for the SERC data: first summarizing the watershed characteristics and concentration data and then exploring relationships between the spot concentrations and the flow-weighted average concentrations from the WRTDS analysis of the CBNTN data. We compared WRTDS flow-weighted average nitrate (NO3) concentration to the first spring spot nitrate concentration (fsNO3), and we related average WRTDS total nitrogen (TN) to the first spring spot measures of total nitrogen and nitrate (fsTN and fsNO3). We applied the same three prediction approaches (direct substitution, simple linear regression, and bootstrapped linear regression) and evaluated them with the same metrics of model skill (as described above, but with 85 CBNTN watersheds instead of 59 SERC watersheds). Like the SERC analyses, the CBNTN analyses test how well simple spot samples can predict average nitrogen concentration as measured by much more thorough and expensive sampling and modeling (composite sampling for SERC, advanced WRTDS synthesis for the CBNTN). We evaluated whether patterns and performance for the CBNTN data supported findings from the SERC data.
A three-dimensional plot of three aggregated land cover categories illustrates the dominant patterns of human land cover disturbance across the data set (Fig. 3b). The three aggregates are cropland plus grassland (agricultural land), forest plus wetland (natural land), and developed land. Rural watersheds lie along the diagonal line in the plane of forest plus wetland vs. cropland plus grassland where the two aggregate categories together cover almost all of the land. Developed watersheds fall off that line and above that plane, reflecting the past replacement of natural and agricultural land with developed land. The data set includes watersheds from all four major physiographic provinces comprising the Chesapeake Bay drainage (Coastal Plain, 25 watersheds; Piedmont, 19; Appalachian Mountain, 8; and Appalachian Plateau, 7).
Stream nitrogen levels.-Flow-weighted average composite-sampled nitrogen concentrations ranged from very low (0.01 mg NO3-N/L and 0.12 mg TN/L) to quite high (16.2 mg NO3-N/L and 17.5 mg TN/L, Fig. 4 and Appendix S1: Table S4), reflecting the range from very low to high levels of human activity (and associated nitrogen enrichment) revealed by the land cover data (Fig. 3b). The distributions of NO3 and TN concentrations were positively (right) skewed, with more low values and fewer high values (Fig. 4). The central values and ranges of the flow-weighted average concentrations and spot-sampled concentrations were similar (Fig. 5; Appendix S1: Table S5), and variability in flow-weighted average nitrogen concentration was heteroscedastic, with variability increasing with the mean of either flow-weighted average NO3 or TN as well as spot NO3 or TN (Fig. 5; Appendix S1: Table S5).
Estimating average concentration from spot measurements.-Spot concentration measurements were very strong predictors of flow-weighted average nitrate concentration regardless of prediction method, but the method did affect bias and confidence limits for the predictions (Table 1). [...] concentration (Fig. 5). For predicting flow-weighted average NO3 from average spot sNO3, these direct estimates explained 98.3% of the variability among watersheds in flow-weighted average NO3 (R² in Table 1), but tended to overestimate (positive percent bias of 10.1%, Table 1) because nitrate in baseflow is often higher than in stormflow or overall (see Discussion). Implementing a simple linear regression did not change the amount of variability in flow-weighted average NO3 explained, but it did eliminate the overestimation bias by fitting a regression slope less than one (0.970, percent bias 0%, Table 1, Fig. 6a). Unlike direct substitution (Fig. 5), the linear model also provided confidence and prediction intervals, which were quite narrow (Fig. 6a), reflecting the high R² and low residual variation of the regression (Table 1). Unlike the simple regression, the bootstrap model accounted for heteroscedasticity in the concentration measurements (Fig. 6b) as well as for sampling uncertainty in the predictions, especially uncertainty arising from including or excluding watersheds. The bootstrap method achieved a slightly higher proportion of variance explained (R² = 98.7%, Table 1) but had a small negative bias (percent bias = −0.4%, Table 1) and a slightly shallower regression slope (0.952). More importantly, the 95% confidence and prediction intervals of the bootstrap model (Fig. 6b) were wider than those of the simple regression (Fig. 6a), especially at higher nitrate levels. This reflects the ability of the bootstrap method to account for sampling uncertainty and heteroscedasticity.
Fig. 3 (caption, partial). The aggregated categories shown on the three axes together cover more than 95% of the land in every watershed. The blue square is SERC station 522, which had the highest TN and NO3 concentrations across both data sets.
Not surprisingly, the watershed with the highest observed nitrogen concentrations (uppermost point in Fig. 5 and Fig. 6a-f; watershed 522 in Fig. 3b and Appendix S1: Table S2) had a strong influence on the regression results. The distributions of parameters and predictions of the bootstrapped NO3 vs. sNO3 model (Fig. 7) reveal that influence. The distribution of regression slope estimates is bimodal (Fig. 7a) [...] Table 1). Bimodality in the slope estimates yields bimodal predictions of flow-weighted average nitrate at high levels of spot nitrate (Fig. 7b), but not at low levels of spot nitrate (Fig. 7c). The ability of the bootstrap model to account for the sampling uncertainty arising from including or excluding influential observations such as watershed 522 demonstrates one advantage of the bootstrap approach. The high uncertainty at high nitrogen levels also indicates a need to sample more high-nitrogen watersheds to reduce the sensitivity of the results to influential observations like watershed 522.
Despite the advantages of bootstrapped estimates over the simple regressions (Table 1), the associations between flow-weighted average concentration and spot concentrations are so strong that even the simple regressions yield very good predictions. The simple regression predictions might be adequate for applications that require only predictions of mean concentration; however, for applications that also need uncertainty estimates, the ability of the bootstrap regression to account for uncertainties from heteroscedasticity and sampling error becomes more important.
The single first spring spot nitrate sample (fsNO3) was almost as good a predictor of flow-weighted average NO3 concentration as the average based on 6-22 spot nitrate samples per station (sNO3). The percent of variability in flow-weighted average concentration explained by the bootstrapped model for fsNO3 (R² = 97.2%) was slightly lower than that explained by the bootstrapped model based on average sNO3 (R² = 98.7%), and the 95% confidence and prediction limits for the first spot model (Fig. 6c) were wider than those of the average spot model (Fig. 6b).
Fig. 6 (caption, partial). [...] (f) first spot NO3. (g-i) Bootstrapped regressions of average flow-weighted concentration from WRTDS synthesis vs. spot measurements for CBNTN data: (g) WRTDS average NO3 vs. first spot fsNO3; (h) WRTDS TN vs. first spot fsTN; (i) WRTDS TN vs. fsNO3. Note differences in axis scaling between SERC (a-f) and CBNTN (g-i) data. All panels show the 1:1 line (long-short dashed), the regression line (solid), the 95% confidence interval (dark gray shading), and the 95% prediction interval (light gray shading). For bootstrapped models, the outer dashed lines are loess-smoothed representations of bootstrap prediction intervals.
Spot concentration measurements were also very effective predictors of the flow-weighted average total nitrogen concentration from composite sampling. We explored four possible predictors of composite TN: average spot dissolved nitrogen (sDTN), first spot dissolved nitrogen (fsDTN), sNO 3 , and fsNO 3 . For all four predictors, we saw the same bias in the direct method and the same improvements with the simple regression and bootstrap methods as reported above for fsNO 3 (Fig. 6, Table 1). For the bootstrapped models, average spot dissolved nitrogen concentration (sDTN) was a slightly better predictor (R 2 = 98.2%; Fig. 6d, Table 1) than average spot nitrate (sNO 3 , R 2 = 98.2%; Fig. 6e, Table 1), and first spot concentrations were slightly weaker predictors than their corresponding average spot concentrations (fsDTN, R 2 = 98.0%; fsNO 3 , R 2 = 96.7%; Table 1, Fig. 6f). Importantly, even the single first spring spot nitrate sample provided a very strong indication of total nitrogen concentration (R 2 = 96.7%; Fig. 6f).
Stream nitrogen levels.-Among the CBNTN watersheds, the five-year, flow-weighted averages of monthly nitrogen concentrations estimated by WRTDS ranged from very low (0.03 mg NO 3 -N/L and 0.296 mg TN/L) to high (7.20 mg NO 3 -N/L and 7.89 mg TN/L; see Appendix S1: Table S7). As with the SERC data, the central values and ranges of the flow-weighted average concentrations and spot-sampled concentrations were similar (Fig. 4, Fig. 6g-i), and variability in flow-weighted average nitrate or spot nitrate concentration was heteroscedastic (Fig. 6g-i; Appendix S1: Table S7). The distributions of flow-weighted average nitrate and total nitrogen concentrations among the CBNTN watersheds are roughly like the distributions for the SERC watersheds (Fig. 6g-i), with many low values and few high values, but the SERC data set includes four watersheds with nitrate and total nitrogen values above the maxima in the CBNTN data (Fig. 4). Those four SERC watersheds all had high levels of agricultural land (Appendix S1: Tables S3 and S4) and lie close to the apex representing high percentages of cleared land in the graph of land cover proportions (Fig. 3b).
Estimating average concentration from spot measurements.-For the CBNTN watersheds, the first spot concentration measurements were very strong predictors of the five-year, flow-weighted nitrogen concentrations from WRTDS synthesis (Table 1, Fig. 6g-i). As with the SERC data, the direct method produced biased estimates of flow-weighted average concentration, but the regression method removed the bias (Table 1). The bootstrap method again explained the most variation in flow-weighted average concentration while also accounting for sampling error and heteroscedasticity. Compared to the SERC results, the proportions of variance explained were slightly lower and the regression slopes were shallower (Table 1). For example, to predict flow-weighted average nitrate concentration from first spot nitrate, the bootstrapped SERC model had R 2 = 97.2% and slope = 0.993, while the CBNTN model had R 2 = 94.5% and slope = 0.843. Importantly, a single spring spot sample of nitrate concentration was again a remarkably effective predictor of the five-year average total nitrogen level estimated by advanced statistical synthesis (WRTDS) of daily flow data and 55-193 (median 98) individual TN measurements per station (Fig. 6i, Table 1).
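The contrast between the direct and regression methods can be illustrated with a small synthetic example. The sketch assumes that the direct method means taking the spot value itself as the estimate of the flow-weighted average, and it borrows only the CBNTN slope of 0.843 from the text; the intercept, noise model, and site values are invented for illustration and are not the study data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic spot nitrate for 85 hypothetical sites (mg N/L); not the CBNTN data.
spot = rng.uniform(0.05, 7.0, 85)

# Assumed relationship: slope 0.843 (from the text), an invented intercept of
# 0.1, and noise whose standard deviation grows with concentration.
flow_weighted = 0.1 + 0.843 * spot + rng.normal(0.0, 0.02 + 0.05 * spot)

# Direct estimate (assumption): use the spot value unchanged.
direct_error = spot - flow_weighted

# Regression estimate: calibrate spot values against flow-weighted averages.
slope, intercept = np.polyfit(spot, flow_weighted, 1)
regression_error = (intercept + slope * spot) - flow_weighted

print(f"direct mean error:     {direct_error.mean():+.3f} mg N/L")
print(f"regression mean error: {regression_error.mean():+.3f} mg N/L")
```

Because the assumed slope is below 1, the uncalibrated spot values overestimate the flow-weighted average on average, while the least-squares calibration has a mean error of zero by construction.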

Central findings
Our main conclusions are that simple spot sampling provides a surprisingly effective way to estimate average nitrogen levels in streams (Table 1, Figs. 5, 6) and that, for some purposes, more costly and laborious sampling programs may not be needed (see Applications section below). We demonstrated the effectiveness of spot sampling with two independent sets of study watersheds: relatively small watersheds from the SERC study and much larger watersheds from the CBNTN sampling network. The two data sets gave slightly different slopes relating flow-weighted average concentrations to spot measurements (Table 1), likely because of differences in methods of sampling, laboratory analysis, data synthesis (see Methods), and the ranges of nitrogen concentrations actually sampled (Fig. 4). However, the differences in the relationships between the two data sets are small in a combined plot of the two data sets (Fig. 8). In both data sets, just one spring spot sample was a strong predictor of flow-weighted average nitrogen levels. Importantly, each data set shows that one relationship between flow-weighted average concentration and spot measurements works well for all the study watersheds (Fig. 6), despite strong differences among physiographic provinces in how land use affects stream nitrogen levels (Jordan et al. 1997c, 2003, Liu et al. 2000, 2011, Weller and Baker 2014). Spot surveys have long been conducted to complement automated watershed sampling (Messer et al. 1988, Kaufmann et al. 1991, Grayson et al. 1997, Wolock et al. 1997), and several studies have reported strong correlations of spot measurements with higher-quality measurements (Schleppi et al. 2006a, b, Rozemeijer et al. 2010, Abbott et al. 2018). More recently, McCarty and Haggard (2016) suggested that spot sampling alone may be sufficient for many nutrient management purposes. Schleppi et al. (2006b) recommended using parallel measurements of spot and flow-weighted samples to calibrate the first against the second.
We extended that idea by calibrating spot measurements against multiyear, flow-weighted measurements to estimate nitrogen levels for many watersheds spanning a gradient from pristine to strongly agricultural or developed. Our analyses more formally tested the ability of spot samples to estimate multiyear, flow-weighted average concentrations. Our results rigorously demonstrate and quantify the very high efficiency of spot sampling for estimating multiyear, flow-weighted average nitrogen concentrations for nitrogen-enriched watersheds (Table 1, Fig. 6).
Our results for watersheds in the Chesapeake Bay drainage should be relevant in other regions with significant rainfall and nitrogen enrichment from human population or agricultural activities. Our findings are less relevant for areas such as the western United States, where human population, nitrogen fertilization, and rainfall are all low and nitrate is not the dominant component of stream nitrogen (Scott et al. 2007).

Why does this work so well?
There are several reasons why spot sampling is such a surprisingly effective predictor of flow-weighted average nitrate and total nitrogen concentrations. One key factor is the way nitrate is transported through watersheds and streams. Sediment and nutrients that are primarily bound to particles (such as phosphorus) are mobilized during storms and transported to streams by surface flow; therefore, their stream concentrations during storms can be orders of magnitude greater than in baseflow (Correll et al. 1999c). In contrast, nitrate is not strongly bound to soils or to suspended sediments, so it moves freely in dissolved form. In many watersheds, nitrate is transported toward streams primarily in subsurface flow and groundwater, and is often somewhat diluted during storm events so that stream nitrate concentrations are lower during storms than during baseflow (Jordan et al. 1997c, Correll et al. 1999c, Rozemeijer et al. 2010, McCarty and Haggard 2016). Because nitrate concentrations are not wildly amplified during storms, baseflow nitrate concentration is much more representative of stormflow concentration and of overall average nitrate concentration than are the baseflow concentrations of materials that are mainly transported on particles. Secondly, nitrate is the dominant chemical form of total nitrogen in major rivers (Caraco and Cole 1999, Seitzinger et al. 2002) and in streams draining smaller watersheds (Fig. 9; Creed and Band 1998, Boyer et al. 2006), even many forested ones (Campbell et al. 2004, Eshleman et al. 2013). Stream nitrate levels increase much more strongly with increasing human impacts from agriculture and land development than do other forms of nitrogen (Jordan et al. 1997a, b, Liu et al. 2000, Golden et al. 2009). Among the watersheds we examined, nitrate becomes the majority of TN when TN reaches 0.6 mg N/L for the CBNTN watersheds and 2 mg N/L for the SERC watersheds (Fig. 9). Above those levels, nitrate increasingly dominates TN as TN levels rise further. The strong dominance of total nitrogen levels by nitrate, especially in streams draining human-impacted watersheds, means that dissolved nitrate in baseflow spot samples is strongly associated with multiyear average total nitrogen concentrations as well as multiyear average nitrate concentrations.
Fig. 9 (caption fragment). Right, flow-weighted average WRTDS estimates for 85 CBNTN watersheds. The black line is a smoothed curve (R loess function, R Core Team 2017) through the NO 3 data (black points). The (red) shaded area below that line is the smoothed fraction of NO 3 at any level of TN. The (blue) shaded area above that line is the fraction of other nitrogen components (essentially ammonium plus organic nitrogen). Above the dotted line, more than half of the TN is NO 3 .
Finally, among watersheds ranging from low to high levels of nitrogen enrichment, both nitrate and total nitrogen show more spatial variation among watersheds than temporal variation within watersheds. We quantified the fraction of total variability among watersheds and weeks that is due to differences among watersheds for the weekly SERC and monthly CBNTN concentration data. We used a linear model with site number as a categorical random variable (R lmer function; Bates et al. 2015). Heteroscedasticity in the concentration measurements was not a concern for this model because we used it only to estimate the among-watershed fraction of total variability, not to estimate P values for hypothesis tests. Differences among watersheds explained 93% (NO 3 ) and 87% (TN) of the total variation among all watersheds and weeks in the SERC data and 97% (both NO 3 and TN) of the total variation among all watersheds and months in the CBNTN data. Thus, 87% or more of the total variation can be attributed to differences among watersheds, leaving 13% or less attributable to temporal variation and error. Spot sampling does not effectively account for temporal variation (Kirchner and Neal 2013), but that did not much limit the ability of spot samples to predict flow-weighted average nitrate and total nitrogen concentrations because temporal variation in those concentrations was much smaller than the differences in concentration among watersheds.
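As a rough illustration of the variance partition described above, the sketch below estimates the among-site share of total variance with a balanced one-way random-effects ANOVA (method of moments). This is a simplified stand-in for the mixed model fit with lmer, not the analysis code used in this study; the function name `among_site_fraction` and the synthetic concentrations are assumptions made for illustration.

```python
import numpy as np

def among_site_fraction(conc):
    """Fraction of total variance attributable to differences among sites.

    conc: 2-D array with rows = sites and columns = sampling times
    (balanced design: every site has the same number of samples).
    """
    conc = np.asarray(conc, dtype=float)
    n_sites, n_times = conc.shape
    site_means = conc.mean(axis=1)
    # Mean square between sites and mean square within sites.
    msb = n_times * site_means.var(ddof=1)
    msw = ((conc - site_means[:, None]) ** 2).sum() / (n_sites * (n_times - 1))
    # Method-of-moments variance component for sites, truncated at zero.
    var_site = max((msb - msw) / n_times, 0.0)
    return var_site / (var_site + msw)

# Synthetic data (not the study data): 30 sites whose long-term means span
# 0-10 mg N/L, each sampled 52 times with temporal noise of SD 1 mg N/L.
rng = np.random.default_rng(3)
site_levels = np.linspace(0.0, 10.0, 30)
weekly = site_levels[:, None] + rng.normal(0.0, 1.0, size=(30, 52))

fraction = among_site_fraction(weekly)
print(f"among-site fraction of total variance: {fraction:.2f}")

# With no real differences among sites, the fraction should be near zero.
noise_only = rng.normal(0.0, 1.0, size=(30, 52))
print(f"noise-only fraction: {among_site_fraction(noise_only):.2f}")
```

When the spread of site means is large relative to the within-site scatter, as in the first synthetic case, most of the total variance is assigned to sites, which is the situation in which a single sample per site can rank sites reliably.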
We emphasize that the dominance of spatial variation among watersheds relative to temporal variation at a watershed does not mean that the temporal variation is unimportant. To the contrary, we observed substantial temporal variation at each site in both data sets (Figs. 2, 5; Appendix S1: Tables S4 and S7), and nitrogen levels are known to vary among years, seasons, and storm events (Correll et al. 1999a, b, c, Kirchner and Neal 2013, Abbott et al. 2018). Measuring that variability and understanding its causes are critical to addressing many questions in nitrogen cycling and nitrogen management, but not so critical to the task of placing the temporally averaged nitrogen levels for watersheds across a broad gradient of nitrogen enrichment.

Other water quality constituents
This paper is focused on testing the ability of spot samples to match the average nitrate and total nitrogen concentrations sampled by more costly and labor-intensive methods, but the SERC and CBNTN programs also measured other water quality constituents. These were dissolved silicate (Si), total ammonium (NH 4 ), total Kjeldahl nitrogen (TKN), total phosphorus (TP), total orthophosphate (PO 4 ), and total organic carbon (TOC) in the SERC study (Jordan et al. 1997a, b) and total phosphorus (TP), dissolved orthophosphate (PO 4 ), and total suspended sediment (TSS) in the CBNTN program (Chanat et al. 2012, Moyer et al. 2017). We again used linear regression to quantify the ability of spot samples to predict the higher-quality concentration estimates for these additional constituents (see Methods). We also again used a linear model (R lmer function, Bates et al. 2015) with site number as a random categorical variable to assess the amount of the total variation among high-quality measurements attributable to differences among watersheds rather than to temporal variability within watersheds (Table 2).
For dissolved silicate (SERC watersheds), the average spot sample concentration was a very strong predictor (R 2 > 92%) of the average concentration from flow-weighted composite samples (Table 2). Like nitrate, Si is diluted rather than amplified during storm events, as are other mostly dissolved constituents (such as Ca, Mg, K, Na, SO 4 , Cl, NO 3 , and conductivity; Schleppi et al. 2006b). As with nitrate and total nitrogen, most of the total variability in dissolved silicate among weeks and watersheds is explained by differences among watersheds (84.9%), with much less variability potentially due to temporal variation within watersheds (Table 2). In contrast, the other additional constituents from both data sets are materials that are transported mostly on particles (Jordan et al. 1997a, b). Compared to nitrate, total nitrogen, and dissolved silicate, spot samples are much less effective at predicting flow-weighted average concentration for the materials transported on particles (R 2 > 92% for dissolved materials, R 2 < 52% for particulates; Table 2). Furthermore, the proportion of the total variability in flow-weighted average concentration due to differences among watersheds is much lower for materials transported mostly on particles than for dissolved materials, so that the importance of temporal variability within watersheds is greater for particulate-transported materials. For materials transported mostly on particles, temporal variability appears to be more dominant in the SERC data (>85% of total variability) than in the CBNTN data (>43% of total variability; Table 2) due to differences in watershed size (smaller watersheds are more temporally variable; Abbott et al. 2018) and in data frequency: the weekly SERC data inherently capture more temporal variation than the monthly CBNTN estimates.
The SERC and CBNTN data sets both support the conclusion that spot measurements are very good predictors of flow-weighted average concentration for materials transported in dissolved form, but much less effective for estimating flow-weighted average concentrations of materials that bind to particles. This is consistent with other reports of much higher correlations for nitrogen than phosphorus when comparing spot samples to composite samples (Schleppi et al. 2006a, b) or baseflow spot samples to storm samples (McCarty and Haggard 2016). Table 2 also supports ranking nitrogen > phosphorus > sediment in order of predictability as reported for a variety of modeling approaches (Brakebill et al. 2010, Preston et al. 2011, Boomer et al. 2013).
Our analyses relate to the idea of spatial stability presented by Abbott et al. (2018). They developed concepts and methods to quantify patterns of spatial and temporal variability in water quality within stream networks, and they discussed the ecological and hydrological implications of those patterns. They proposed the correlation between instantaneous and longer-term concentrations (as in our Tables 1, 2) as a direct measure of spatial stability of water chemistry patterns, and they suggested that temporal synchrony among watersheds promotes spatial stability. Our analysis of the proportion of total variability among watersheds and sampling times due to spatial differences among watersheds (Table 2) provides another measure of spatial stability, and the results suggest that the domination of total variability by differences among stations also promotes spatial stability. Abbott et al. (2018) argue that spatial stability determines the sampling frequency needed to identify and evaluate critical source areas and that synoptic sampling can be useful for those purposes when water quality patterns are spatially stable. In our data, the very high spatial stability of nitrate and total nitrogen levels across a broad nitrogen-enrichment gradient (Table 2) suggests that just one spot sample may be adequate for such evaluations of those materials.
Table 2 notes: Slope and R 2 from linear regressions of flow-weighted average concentration in weekly composite samples vs. average spot sample concentration (SERC), or of flow-weighted average concentration from WRTDS synthesis vs. the first spring spot sample (CBNTN). R 2 s is the squared Spearman's rank-order correlation (R cor function, R Core Team 2017). The Station column is the percentage of total variation among weeks and watersheds (SERC) or among months and watersheds (CBNTN) attributable to differences among watersheds. The Residual column is the remainder due to temporal variation within watersheds and to error. For each data set, constituents listed above the dashed line are transported in dissolved form, while constituents listed below the dashed line are primarily transported on particles.
† We placed CBNTN PO 4 below the dashed line even though the CBNTN measures dissolved PO 4 on filtered samples. PO 4 is transported in streams and rivers mostly on particles (Follmi 1996, Jordan et al. 1997a), and dissolved PO 4 exchanges with that particulate PO 4 (Froelich 1988). Therefore, the factors that drive high temporal variability in particulate PO 4 concentration can also affect dissolved PO 4 measurements.

Application to science and management
Synoptic spot sampling is already widely used in reconnaissance efforts to measure baseline levels, identify water quality problems, target critical source areas, or measure compliance (NRCS 2003), often as a complement to more frequent automated sampling at a few selected locations (Messer et al. 1988, Kaufmann et al. 1991, Grayson et al. 1997, Wolock et al. 1997). Synoptic sampling provides data for more locations, helps assess the relative importance of sources throughout a watershed, and is often interpreted to identify landscape parameters and ecosystem processes correlated with water chemistry (Liu et al. 2000).
The relatively low costs for labor and laboratory analysis are a prime advantage of synoptic sampling over frequent automated monitoring. Harmel et al. (2006c) note that success of monitoring projects depends on careful attention to the tradeoff between the resources available for data collection and adequate characterization of water quality. Automated samplers typically yield better data but are especially expensive compared with manual sampling. The cost of automated monitoring is a significant obstacle to assessing large numbers of watersheds and restricts data available for analysis.
We demonstrate statistically that spot sampling is even more effective than previously reported, especially for placing average nitrogen levels in watershed discharges within a broad enrichment gradient (Fig. 6). For this purpose, the SERC data revealed that a single spot sample was almost as effective as the far greater and more costly effort of monitoring flow continuously and collecting and analyzing 52 weekly composite samples per year for 1-3 yr (Fig. 6a-f). Similarly, the CBNTN analysis showed that a single spot sample was almost as effective for assessing average nitrogen concentration as monitoring flow continuously, collecting and analyzing an average of 98 water samples per site, and integrating the flow and concentration data with an advanced statistical model (Fig. 6g-i). Of course, the more detailed CBNTN protocols remain necessary to meet the CBNTN goal of characterizing nutrient and sediment dynamics at multiple temporal scales, including events, seasons, years, and multiyear trends.
Spot sampling may be adequate to meet some purposes for which more expensive sampling methods are now recommended. Current recommendations suggest automated or composite sampling for measuring fate and transport, program effectiveness, and research (NRCS 2003) as well as for predicting longer-term water quality, especially for smaller systems with high temporal variability (Kirchner and Neal 2013). Cassidy and Jordan (2011) state that only near-continuous monitoring is adequate for comparative monitoring and evaluation. However, many research and management issues lead to questions about how average nitrogen levels compare among watersheds or before and after management interventions. Our results suggest that, for nitrogen, spot sampling can be adequate for answering those questions (Fig. 6), even given high temporal variability in nitrogen levels in individual watersheds (Figs. 2, 5). When the focus is on differences in average nitrogen levels among watersheds driven by different amounts of nitrogen enrichment, frequent sampling may not be needed. Our results also support stream assessment protocols that collect one spring nitrate sample to assess watershed and stream nitrogen status (Ashton et al. 2014, Stranko et al. 2017). Given the effectiveness of spot sampling (Table 1, Fig. 6), we support its more widespread application in nitrogen assessment and management. McCarty and Haggard (2016) made a similar recommendation. They argued for a revolution in allocating water quality monitoring resources by using spot sampling of baseflow to assess nitrogen and phosphorus pollution and to target management actions, thus freeing resources to examine water quality at finer spatial scales and to provide more complete information on spatial variability in water quality across watersheds. Our analysis strongly supports their recommendation for nitrogen management and assessment, but less so for phosphorus.
Other authors have also emphasized the need for better spatial coverage in water sampling. Abbott et al. (2018) highlighted the need to understand sources and sinks in headwater catchments where the vast majority of water and solutes enter aquatic ecosystems (Baker et al. 2007, Bishop et al. 2008, McDonnell and Beven 2014). Those headwater systems are where water quality problems originate, yet they are too numerous (thousands or more in large river systems) to monitor frequently, presenting a headwater conundrum, which can be resolved with synoptic sampling (Abbott et al. 2018).
Spot sampling of stream nitrate could be especially useful in citizen science efforts to assess water quality. Such efforts engage citizen volunteers to expand the capabilities of research or assessment teams and to educate citizens about science and management issues. Nitrate monitoring with baseflow sampling could be a part of a citizen monitoring program, requiring only minimal training in sample collecting, sample storage, and using smartphone global positioning to locate and document sampling sites.
Enthusiasm for the success of spot sampling in predicting flow-weighted average nitrogen levels (Fig. 6) should be tempered when considering phosphorus or other materials transported mainly on particles. McCarty and Haggard (2016) suggested using baseflow sampling for assessing other materials, such as phosphorus. We did find statistically significant correlations between spot measurements of phosphorus and flow-weighted average levels in composite measurements, but those relationships have much lower explanatory power (R 2 < 42%) than the relationships for nitrate and total nitrogen (R 2 > 82%; Table 2). For nitrogen levels, spatial differences among watersheds explain more of the observed variability than does temporal variation within watersheds, but the opposite is true for phosphorus and other particulates (Table 2). Nor does good information on nitrogen levels help much with assessing phosphorus levels. The correlation between flow-weighted average total phosphorus and total nitrogen is weak and not significant in both data sets (R 2 = 8%, P = 0.07, for SERC composite samples and R 2 = 0.1%, P = 0.7, for CBNTN estimates from WRTDS synthesis). Successful assessment of phosphorus levels and other particulates continues to demand monitoring methods that capture episodic, high concentrations occurring during storm events.

Is bootstrapping really necessary?
We fit linear relationships using a two-step bootstrapping procedure. Many practitioners may not have the time or interest to implement bootstrapping, and they will seek easier ways to calibrate relationships predicting multiyear average nitrogen levels from spot sample measurements. In our analyses of nine linear relationships (six for SERC data and three for CBNTN), the slopes, intercepts, and R 2 values from simple linear regression closely match those from bootstrapping (Table 1). The two approaches give very similar predictions of multiyear average nitrogen levels, but bootstrapping gives wider confidence limits (compare Fig. 6a to Fig. 6b) because bootstrapping accounts for heteroscedasticity and sampling uncertainty, while a simple linear model does not. These results suggest that a simple linear model might be adequate for applications that need to predict average nitrogen levels but do not need estimates of confidence limits. In contrast, simple linear models fit to log-log-transformed variables did not perform well for our data. The transformation did not eliminate heteroscedasticity, and the models underpredicted for watersheds with high nitrogen levels, which are the most important watersheds for many research and management questions (see Methods: SERC data analysis and Appendix S1). Analyses of our data suggest that simple linear regression using untransformed data would provide the most accurate shortcut for avoiding the bootstrapping method. However, when needs include confidence limits or significance tests, not just predictions, a procedure like bootstrapping should be included to account for heteroscedasticity and sampling uncertainty.
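For readers weighing the two options, the following sketch compares a classical confidence interval for the regression slope with a percentile interval from a simple case-resampling bootstrap. It is a minimal illustration under stated assumptions and does not reproduce the two-step procedure used in this study: the data are synthetic with heteroscedastic noise, the function names `ols_slope_ci` and `bootstrap_slope_ci` are hypothetical, and the critical value of 2.0 is an approximation to the t quantile for about 60 observations.

```python
import numpy as np

def ols_slope_ci(x, y, crit=2.0):
    """OLS slope with an approximate 95% CI that assumes constant error variance."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)                 # residual variance
    se = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))    # standard error of slope
    return slope, (slope - crit * se, slope + crit * se)

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the slope from resampling sites with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic calibration data (not the study data): noise SD grows with level.
rng = np.random.default_rng(7)
spot = rng.uniform(0.1, 15.0, 60)
flow_weighted = 0.95 * spot + rng.normal(0.0, 0.05 + 0.05 * spot)

slope, ols_ci = ols_slope_ci(spot, flow_weighted)
boot_ci = bootstrap_slope_ci(spot, flow_weighted)
print(f"OLS slope: {slope:.3f}")
print(f"classical 95% CI: ({ols_ci[0]:.3f}, {ols_ci[1]:.3f})")
print(f"bootstrap 95% CI: ({boot_ci[0]:.3f}, {boot_ci[1]:.3f})")
```

Both approaches return the same point estimate; they differ only in how the interval is constructed, which is why the choice matters mainly when uncertainty estimates are needed.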

CONCLUSION
The key findings of this study include:
1. Spot sample measurements estimate average nitrate and total nitrogen concentration in streams draining nitrogen-enriched watersheds almost as effectively as multiyear data from flow-weighted composite sampling or from WRTDS synthesis of continuous flow measurements and frequent water samples.
2. Estimates from spot samples are unbiased when implemented using calibrated relationships between spot measurements and flow-weighted composite or WRTDS measurements.
3. A simple linear regression works very well for fitting the calibrated relationships, but bootstrapping can make the analysis more rigorous by accounting for sampling error and heteroscedasticity.
4. Even a single spring spot sample can efficiently place watersheds within a broad gradient of anthropogenic watershed nitrogen loading.
5. Spot sampling of nitrogen works well because nitrogen is transported to streams primarily as nitrate dissolved in subsurface flow rather than attached to particles in surface flow during storms.
6. For nitrogen levels in the data sets we examined, more of the total variability across places and times was due to spatial differences among study watersheds than to temporal variation within watersheds.
7. Spot measurement of stream nitrate is a low-cost, low-labor way to quantify average nitrogen status.
8. Spot sampling can be a powerful tool for identifying nitrogen source areas and monitoring the results of nitrogen management actions.
9. Spot sampling should be more widely applied to make nitrogen assessment and management programs more expansive and cost-effective.
10. Spot sampling is much less effective for materials that are mainly transported on particles, such as phosphorus, so spot samples of such materials should be interpreted cautiously.

ACKNOWLEDGMENTS
The SERC stream data were collected with support from NSF (BSR-89-05219, DEB-92-06811, and DEB-93-17968), NOAA (NA66RG0129), the Governor's Research Council of Maryland, the Government of Charles County Maryland, and the Smithsonian Institution Environmental Sciences Program. David L. Correll initiated the SERC watershed study. We thank U.S.G.S. scientists Douglas L. Moyer and Jeffrey G. Chanat for help accessing and interpreting the data from the Chesapeake Bay Program Non-tidal Network (CBNTN).