Ragweed and sagebrush pollen can distinguish between vegetation types at broad spatial scales

Patterns of vegetation distribution at regional to subcontinental scales can inform understanding of climate. Delineating ecoregion boundaries over geologic time is complicated by the difficulty of distinguishing between prairie types at broad spatial scales using the pollen record. Pollen ratios are sometimes employed to distinguish between vegetation types, although their applicability is often limited to a geographic range. The Neotoma Paleoecology Database offers an unparalleled opportunity to synthesize a large number of pollen datasets. Ambrosia (ragweed) is a genus of mesic-adapted species sensitive to summer moisture. Artemisia (sagebrush, wormwood, mugwort) is a genus of dry-mesic-adapted species resilient to drought. The log pollen ratio between these two common taxa was calculated across the North American midcontinent from surface pollen samples housed in the Neotoma Paleoecology Database. The relative proportion of Ambrosia has roughly doubled since European settlement, likely due to widespread disturbance, while Artemisia proportions are nearly unchanged. Correcting surface samples for the disturbance signal in modern Ambrosia proportions will allow Ambrosia, a strong indicator of summer moisture, to be more accurately represented. In surface samples where both Ambrosia and Artemisia are reported as nonzero proportions of the pollen sum, mean annual precipitation explains approximately 78% of the variation in the log Ambrosia-to-Artemisia ratio. Application of this model to Little Ice Age pollen samples produces precipitation reconstructions which generally agree with reconstructions from independent non-pollen proxies. In addition, we find that modern ecoregions within the North American midcontinent can be successfully distinguished from one another using the log Ambrosia-to-Artemisia ratio. These relationships can improve reconstructions of past climate and improve delineation of past ecoregion boundaries.


INTRODUCTION
Vegetation distributions at regional to subcontinental scales are a result of the interaction between climate, fire, dispersal, disturbance, and competition over millennia (e.g., Williams 2008). These complex processes create distinct regions of habitat that change dramatically over geologic time in response to both external and internal climate forcings. Understanding past vegetation distributions has been a major focus of paleoecological work since Gleason (1922) outlined the five floristic types present at the margins of the late-stage Wisconsin ice sheet. Subsequent research refined and extended Gleason's groundbreaking ideas from localized to landscape scales (e.g., Baker et al. 1992), from landscape to regional scales (e.g., Davis and Shaw 2001), and from regional to subcontinental scales (e.g., Williams et al. 2004). Delineating past ecoregion boundaries has long presented a challenge due to the need to access, standardize, and synthesize large amounts of data through space and time.
Reconstructions of past ecoregion distributions have improved dramatically with the advent of paleoecological databases such as the North American Pollen Database (NAPD), FAUNal MAP (FAUNMAP), and the Neotoma Paleoecology Database, the latter of which now serves as the primary repository for pollen data in the Western Hemisphere and has partially subsumed the two former. These massive efforts provide access to thousands of contributions from individual researchers and make it possible to stitch records together over landscape and subcontinental scales and from glacial to modern time periods. It is now possible to refine estimates of past ecoregion extent, and therefore climate, at ever-higher spatiotemporal resolutions.
Grasslands have dominated much of the North American midcontinent since the early Miocene (Axelrod 1985, Edwards et al. 2010). Current vegetation assemblages, however, are post-glacial in origin (Jackson and Williams 2004). Prairies quickly expanded their range as the climate warmed following the last glacial maximum and reclaimed much of the North American midcontinent by the mid-Holocene (e.g., Wright 1992, Baker et al. 2000). The modern tallgrass prairie region extends from the Blackland Prairie region of east-central Texas, USA, to southwestern Saskatchewan, Canada, and east into the prairie peninsula of Illinois (Risser et al. 1981; Fig. 1). The western boundary of the contemporaneous tallgrass prairie region extends from east-central Texas through eastern North Dakota before veering westward into far west Saskatchewan. Through all but the most northerly portion, the western boundary is flanked by regions of mixed grass and shortgrass prairie extending from southeastern Saskatchewan to New Mexico and into southern Texas. An international, multiagency effort to precisely delineate the present-day extent of all major vegetation types across the continent resulted in the publication of the Ecoregions of North America (CEC 1997).
One existing limitation in the effort to improve ecoregion delineations over geologic time is the difficulty of distinguishing between prairie types at broad spatial scales using the pollen record.
Multiple approaches have been proposed with mixed levels of success. In the southwestern United States, an eight-taxon pollen signature derived from the modern pollen rain was shown to successfully differentiate between extant vegetation types (Hoyt 2000), but this study covered a small geographical area with extreme differences in climate unlike those experienced by much of the Great Plains region. The ratio of Artemisia to Chenopodiaceae pollen has been applied to differentiate between vegetation types in arid to semiarid regions worldwide but was shown to be applicable only within a narrow precipitation range of 450-500 mm annually and is not comparable across geographic regions (Zhao et al. 2012). The ratio of Artemisia to Cyperaceae pollen was shown to differentiate reliably between high-alpine meadow and steppe but is likely only applicable in these regions (Herzschuh 2007).
One promising technique for delineating ecoregions is the ratio of Ambrosia-to-Artemisia pollen, which has been shown to differentiate between tallgrass, mixed grass, and shortgrass prairie over a small portion of the climate space occupied by the modern Great Plains of North America (Morris 2013). Both Ambrosia and Artemisia are wind-pollinated dicotyledons in the Asteraceae (sunflower family), subfamily Asteroideae. Ambrosia (Supertribe Helianthodae, Tribe Heliantheae, Subtribe Ambrosiinae; e.g., Baldwin 2009), commonly known as ragweed, is a genus of mesic-adapted, herbaceous plants notable for high disturbance tolerance. Artemisia (Supertribe Asterodae, Tribe Anthemideae; e.g., Oberprieler et al. 2009), variously called sagebrush, wormwood, and mugwort, is a genus of dry-adapted woody or herbaceous plants. Ambrosia, in general, is sensitive to summer moisture, while Artemisia is tolerant of drought, except for some winter sensitivity (Grimm 2001). Although the ranges of Ambrosia and Artemisia overlap over most of the North American continent, their autecology differs enough to employ them as complementary indicators. It is this feature of their habitat tolerances that makes them particularly useful as a tool to aid in delineating ecoregions.
We here extend the Ambrosia-to-Artemisia ratio technique of Morris (2013) to the full geographical extent of the modern Great Plains, as defined by CEC (1997), and demonstrate its ability to distinguish between adjacent ecoregions, as well as prairie types, over broad geographical areas.
A noteworthy complication present in the modern pollen record is the extreme disturbance created by the arrival of large numbers of European colonists in the 19th century. This disturbance appears in the pollen record as a sharp increase in the proportion of Ambrosia pollen that has been dubbed the "settlement horizon." Pollen assemblages before and after this horizon cannot typically be treated as analogs, even if they are closely separated in time and space (Kujawa et al. 2016). For this reason, many studies necessarily exclude the highly disturbed Midwestern United States from their modern analog training sets. We present a tentative correction for this disturbance signal so that records from this region may be incorporated into training sets.
Fig. 1. ... CEC (1997). Thick black outline is the boundary of the Great Plains region. Thick blue outline is the 2 decimal degree buffer used to select pollen datasets from the Neotoma Paleoecology Database.
In this study, we evaluate the effectiveness of using the Ambrosia-to-Artemisia ratio as a technique to robustly delineate ecoregions in modern and presettlement pollen records. We hypothesize that the ratio of Ambrosia-to-Artemisia pollen can be used to distinguish between ecoregions of the North American midcontinent. Additionally, we outline a statistical approach to better align presettlement assemblages with their modern counterparts in order to assess how pollen assemblages change through space and time. We hypothesize that, following appropriate correction for post-settlement disturbance, the ratio of Ambrosia-to-Artemisia pollen can be used to reconstruct past precipitation.

Data acquisition and handling
Pollen data were obtained from the Neotoma Paleoecology Database via package neotoma ver. 1.7.2 (Goring et al. 2015). All analyses were performed in R ver. 3.5.1 (R Core Team 2018). A full list of datasets included in this study is given in Appendix S1 (Appendix S1: Tables S1:S2 surface pollen records and S3:S4 presettlement pollen records).
Pollen records for the North American midcontinent were downloaded from Neotoma as type pollen (i.e., fossil pollen), or type surface sample (i.e., modern pollen). Taxa from raw assemblages were standardized using the Whitmore Full list (Whitmore et al. 2005). Non-pollen palynomorphs and spikes (i.e., microspheres or Lycopodium tablets) were removed from the dataset before calculations were made. Iva pollen is sometimes indistinguishable from that of Ambrosia, and therefore, we cannot exclude the possibility that Iva contributes to reported Ambrosia counts in some regions.
Raw age models for fossil pollen datasets were calibrated to IntCal09 after Reimer et al. (2009). Fossil pollen datasets were then subset to just those that were 250-500 yr in age, corresponding to Little Ice Age climate in the North American midcontinent before the well-documented and sharp increase in the proportion of Ambrosia pollen resulting from the arrival of European colonists and implementation of widespread land clearing, known as the "settlement horizon." Hereafter, these datasets are referred to as "presettlement." Pollen records were clipped to within two decimal degrees (approximately 222 km) of the boundary of the Great Plains region (Fig. 1) as defined by CEC (1997). A two decimal degree buffer was chosen because it is approximately double the 50% source area of ragweed pollen entering the center of a basin 750 m in diameter as predicted by Sugita (1993). Basin size is unknown for most of the datasets obtained from Neotoma, and therefore, source distance can only be roughly estimated. The two decimal degree buffer around the boundary of the Great Plains region accounts for the source area of pollen entering all but the largest of basins and reflects the uncertainty inherent in delineating ecoregion boundaries. We excluded datasets in both surface and presettlement sets that were located within regions which were likely receiving most or all of their pollen rain from non-grassland taxa. These comprise the modern-day Western Cordillera, Cold Deserts, and Upper Gila Mountains.
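As a rough check on the buffer width, the conversion from decimal degrees to kilometers can be sketched as follows. The sketch assumes a spherical Earth with a mean radius of 6,371 km (an assumption, not a value from this study) and measures distance along a meridian; the east-west distance spanned by two degrees of longitude shrinks with the cosine of latitude. Function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def degrees_to_km_north_south(degrees):
    """Arc length along a meridian for a given number of decimal degrees."""
    return math.radians(degrees) * EARTH_RADIUS_KM

def degrees_to_km_east_west(degrees, latitude_deg):
    """Arc length along a parallel; shrinks with the cosine of latitude."""
    return degrees_to_km_north_south(degrees) * math.cos(math.radians(latitude_deg))

buffer_km = degrees_to_km_north_south(2.0)
print(round(buffer_km))  # close to the ~222 km quoted in the text
```

Because the buffer is defined in degrees rather than kilometers, its east-west width is narrower at the northern end of the Great Plains than at the southern end.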
A total of 549 records (462 unique sites) of surface sample type (hereafter, "surface") were returned by Neotoma. Of these, 26 were located in the Western Cordillera region and 5 were located in the Cold Deserts, and were removed. Of those that remained (518), 11 records (2.1%) reported neither Ambrosia nor Artemisia and were removed. A total of 507 records were included in analyses. Eighty-three records (16.4%) included Ambrosia but no Artemisia. A total of 101 records (19.9%) did not include Ambrosia but did include Artemisia. Both taxa were reported as nonzero in 323 records (63.7%). Records in which at least one of the two taxa was reported as nonzero were retained.
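The record bookkeeping for the surface set can be retraced with a few lines of arithmetic. The counts below are those reported in the paragraph above; the variable names are illustrative.

```python
# Counts reported in the text for the surface pollen set.
returned = 549
removed_western_cordillera = 26
removed_cold_deserts = 5
neither_taxon = 11

remaining = returned - removed_western_cordillera - removed_cold_deserts
included = remaining - neither_taxon

scenarios = {
    "Ambrosia only": 83,
    "Artemisia only": 101,
    "both taxa": 323,
}

# The three scenarios should account for every record kept in the analysis.
assert sum(scenarios.values()) == included

for name, count in scenarios.items():
    print(f"{name}: {count} records ({100 * count / included:.1f}%)")
```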
A total of 843 records (211 unique sites), each spanning a portion of the past 250-500 calendar years before present, were available from Neotoma at the time of analysis. We removed 62 records which were located within the 2 degree buffer region but at high elevations and likely to reflect pollen from non-grassland taxa. These were sites in the modern-day Western Cordillera (58 datasets), Cold Deserts (3), and Upper Gila Mountains (1). Of those that remained, 68 (8.7%) contained neither Ambrosia nor Artemisia and were removed. The remaining 713 records (169 unique sites) were included in analyses. Of these, 13 records (1.8%) reported Ambrosia but no Artemisia, and 64 (8.98%) reported Artemisia but no Ambrosia. Both Ambrosia and Artemisia were reported in 636 datasets (89.2%).
Locations in which either Ambrosia or Artemisia, but not both, were reported represent ecologically distinct areas and are important to include in analyses. Therefore, in both surface and presettlement records in which either taxon was reported as zero, and the other nonzero, the proportion was recoded to a dummy value. This allows variation in the relative proportion of the nonzero taxon to be expressed in the resulting ratio. Zero occurrences of either Ambrosia or Artemisia were recoded to 1 × 10⁻⁵. This coding produces a ratio two orders of magnitude different than the smallest ratio in either dataset in which both Ambrosia and Artemisia were nonzero. The ratio of Ambrosia-to-Artemisia is calculated from the relative proportion of each in their respective records. In surface records, the Ambrosia proportion was divided by the median increase relative to the presettlement proportion (2.201) prior to the calculation of the ratio. The log ratio was used in all analyses.
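A minimal sketch of the ratio calculation described above is given below. It makes two assumptions that the text does not state explicitly: that the logarithm is the natural log, and that the disturbance correction is applied to the Ambrosia value after zeros have been recoded. The function and constant names are illustrative.

```python
import math

ZERO_DUMMY = 1e-5           # dummy proportion for a zero count, from the text
SURFACE_CORRECTION = 2.201  # median post-settlement increase in Ambrosia, from the text

def log_ratio(ambrosia, artemisia, surface=False):
    """Log Ambrosia-to-Artemisia ratio for one record.

    Proportions are fractions of the pollen sum. Returns None when neither
    taxon is reported, since those records were removed from the analysis.
    """
    if ambrosia == 0 and artemisia == 0:
        return None
    # Recode a zero in either taxon to the dummy value.
    amb = ambrosia if ambrosia > 0 else ZERO_DUMMY
    art = artemisia if artemisia > 0 else ZERO_DUMMY
    # Assumption: the disturbance correction is applied after recoding.
    if surface:
        amb = amb / SURFACE_CORRECTION
    return math.log(amb / art)  # natural log assumed

print(log_ratio(0.059, 0.02, surface=True))  # both taxa present, surface record
print(log_ratio(0.0, 0.031))                 # zero Ambrosia, presettlement record
```

With this coding, records lacking Artemisia yield large positive values and records lacking Ambrosia yield large negative values, which keeps the three scenarios separated on the log scale.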

Ecoregion and climate space information
Site location information for each record was used to match records to their respective EPA level II ecoregions via spatial join in R package sp (Pebesma and Bivand 2005). Records were then joined to a raster containing climate information from the Bioclim 2 dataset (Fick and Hijmans 2017) at a 2.5-arcminute (roughly 5-km) resolution. For a table of the number of datasets per ecoregion included in analyses, see Appendix S2: Table S1.
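The idea behind matching a site to an ecoregion polygon can be illustrated with a ray-casting point-in-polygon test. This is a sketch of the general technique, not the implementation used by package sp; it treats longitude and latitude as planar coordinates, and the rectangular "ecoregions" below are hypothetical shapes, not real boundaries.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside a polygon given as (lon, lat) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_ecoregion(lon, lat, ecoregions):
    """Return the name of the first ecoregion polygon containing the site, else None."""
    for name, polygon in ecoregions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None

# Hypothetical rectangular "ecoregions" for illustration only; not real boundaries.
toy_ecoregions = {
    "region_west": [(-105.0, 35.0), (-100.0, 35.0), (-100.0, 45.0), (-105.0, 45.0)],
    "region_east": [(-100.0, 35.0), (-95.0, 35.0), (-95.0, 45.0), (-100.0, 45.0)],
}
print(assign_ecoregion(-97.5, 40.0, toy_ecoregions))
```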

Statistical analyses
Discriminant function analysis was used to test whether ecoregion could successfully be predicted from the Ambrosia-to-Artemisia ratio and precipitation in the surface pollen dataset. Data were z-scored (centered and scaled) using the package caret (Kuhn et al. 2019). Discriminant function analysis was performed in package mda (Hastie et al. 2017) on a split training set (80%) and a test set (20%) bootstrapped in package boot (Davison and Hinkley 1997, Canty and Ripley 2019) using 1000 random splits. Ecoregions with fewer than 10 datasets were excluded (Softwood Shield and Texas-Louisiana Coastal Plain).
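The resampling scheme can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the discriminant function analysis, and plain random 80/20 partitions stand in for the bootstrap routine; neither is the method used in the study. The data are synthetic and the names are illustrative.

```python
import random
import statistics

def z_score(values):
    """Center and scale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def nearest_centroid_accuracy(train, test):
    """Fit class centroids on the training rows and score the test rows.

    Each row is (features, label). This is a simple stand-in classifier,
    not the discriminant function analysis used in the study.
    """
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(features):
        return min(centroids, key=lambda lab: sum(
            (f - c) ** 2 for f, c in zip(features, centroids[lab])))

    hits = sum(predict(features) == label for features, label in test)
    return hits / len(test)

def repeated_split_accuracy(rows, n_splits=1000, train_fraction=0.8, seed=0):
    """Minimum, mean, and maximum accuracy over repeated random partitions."""
    rng = random.Random(seed)
    cut = int(len(rows) * train_fraction)
    scores = []
    for _ in range(n_splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        scores.append(nearest_centroid_accuracy(shuffled[:cut], shuffled[cut:]))
    return min(scores), statistics.fmean(scores), max(scores)

# Synthetic, well-separated toy data (not values from the study).
rng = random.Random(1)
ratios = [rng.gauss(-3, 0.5) for _ in range(30)] + [rng.gauss(3, 0.5) for _ in range(30)]
precip = [rng.gauss(400, 30) for _ in range(30)] + [rng.gauss(900, 30) for _ in range(30)]
labels = ["dry_region"] * 30 + ["wet_region"] * 30
rows = list(zip(zip(z_score(ratios), z_score(precip)), labels))
print(repeated_split_accuracy(rows, n_splits=100))
```

Reporting the minimum, mean, and maximum over many partitions mirrors the way a range and mean success rate are summarized for the discriminant analysis.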

RESULTS
The proportion of Ambrosia pollen in the surface pollen dataset (median proportion 0.059) is approximately double (2.201) that of the presettlement set (median proportion 0.027). Only small changes were found between the Artemisia proportion in surface (median proportion 0.02) vs. presettlement samples (median proportion 0.031; Fig. 2). After correcting the surface dataset for the doubling in median proportion between presettlement and surface Ambrosia, where both taxa were present, the log Ambrosia-to-Artemisia ratio ranged from −5.61 to 4.22 with a mean of 0.13 (Table 1; Fig. 3). In cases where zero Artemisia was reported, the log Ambrosia-to-Artemisia ratio ranged from 4.65 to 10.16 with a mean of 8.35. In cases where zero Ambrosia was reported, the log Ambrosia-to-Artemisia ratio ranged from −11.83 to −7.17 with a mean of −9.36.
Co-occurrences of Ambrosia and Artemisia were reported across a wide range of precipitation in the surface set, from 269 to 1,182 mm annually, with a mean of 598 mm. Where Ambrosia occurred but Artemisia did not, precipitation ranged from 492 to 1,631 mm annually, with a mean of 1,006 mm. Where Artemisia occurred but Ambrosia did not, precipitation ranged from 295 to 579 mm annually, with a mean of 458 mm.
The relationship between mean annual precipitation (MAP) and the log Ambrosia-to-Artemisia ratio differs among the three scenarios. Where either Ambrosia or Artemisia, but not both, occurs, the relationship between MAP and the log ratio is mostly linear (Fig. 4). MAP for the zero Ambrosia cases explains 31.0% of the variation in the Ambrosia-to-Artemisia ratio. The model is weak for the zero-Artemisia cases, with MAP explaining 21.7% of the variation in the log Ambrosia-to-Artemisia ratio. Where both types of pollen are present, the relationship between MAP and the Ambrosia-to-Artemisia ratio is modeled using a bootstrapped B-spline fit with three knots (Appendix S3: Fig. S1), which explains 77.7% of the variation in the Ambrosia-to-Artemisia ratio and produces normally distributed residuals (Appendix S3: Fig. S2). The relationship between MAP and the Ambrosia-to-Artemisia ratio is positive between approximately 200 and 800 mm MAP. Between approximately 800 mm and the maximum of approximately 1200 mm MAP, the slope of the relationship between the log Ambrosia-to-Artemisia ratio and precipitation is positive but decreases slightly.
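The "variation explained" figures quoted for the linear cases correspond to the coefficient of determination of a least squares fit. A minimal sketch of that calculation follows, assuming the log ratio is the response and MAP the predictor; the data points are hypothetical and chosen only to make the example run.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical (MAP in mm, log ratio) pairs for illustration; not study data.
map_mm = [300.0, 350.0, 400.0, 450.0, 500.0, 550.0]
log_ratios = [-11.0, -9.5, -10.2, -8.8, -9.0, -7.9]
intercept, slope, r_squared = linear_fit(map_mm, log_ratios)
print(f"slope = {slope:.4f} per mm, R^2 = {r_squared:.2f}")
```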

Pre-European settlement
The surface set models were applied to a presettlement dataset in order to validate the relationships between the Ambrosia-to-Artemisia ratio and precipitation estimates. Where both taxa were present (706 records), the log Ambrosia-to-Artemisia ratio ranged from −5.37 to 4.04, with a mean of −0.28. Where Artemisia was present but Ambrosia was not (75 records), the log Ambrosia-to-Artemisia ratio ranged from −10.69 to −5.34, with a mean of −7.59.
Fig. 3. Log Ambrosia-to-Artemisia ratio between scenarios in the surface set. For records in which either taxon was reported as zero, and the other nonzero, the proportion was recoded to a dummy value of 1 × 10⁻⁵. This allows variation in the relative proportion of the nonzero taxon to be expressed in the resulting ratio. A total of 507 records are included in analyses. A total of 323 records report both Ambrosia and Artemisia, 101 records report Artemisia but no Ambrosia, and 83 records report Ambrosia but no Artemisia.
Fig. 4. The relationship between precipitation and the log Ambrosia-to-Artemisia ratio for the surface pollen dataset is linear for the zero Artemisia and zero Ambrosia, and exponential where both are present.

Model performance: pre-European settlement precipitation estimates
Application of the B-spline model to presettlement Ambrosia-to-Artemisia log ratios produced a mean MAP of 538.5 mm (range 349.4-927.3 mm) where both Ambrosia and Artemisia are present in the record (Table 2). This is, on average, 135.8 mm/yr less than modern observed precipitation over the midcontinent (range −209.0 to 677.9 mm/yr less relative to modern values; Fig. 5). Where Artemisia is present but Ambrosia is not, the linear model produced a mean annual precipitation of 525.0 mm and resulted in a reconstructed presettlement precipitation at these locations, on average, 26.5 mm/yr greater than at present (Fig. 6). There are too few cases of zero Artemisia in the presettlement set (13 records from seven unique locations) to adequately test the model. In the surface pollen set, the model fit to zero-Artemisia cases is not only weak, but also produces a low slope, leading to a high risk of error. It is unlikely to produce realistic precipitation estimates and we therefore exclude it from further consideration.
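Why a low slope inflates reconstruction error can be seen by inverting a linear calibration. The sketch below shows one way a fitted line of the form log ratio = intercept + slope × MAP could be solved for MAP; whether the study inverted its models in exactly this way is an assumption, and the coefficients are hypothetical rather than fitted values.

```python
def reconstruct_map(log_ratio, intercept, slope):
    """Invert log_ratio = intercept + slope * MAP to estimate MAP (mm)."""
    return (log_ratio - intercept) / slope

def map_error(log_ratio_error, slope):
    """Error in reconstructed MAP caused by a given error in the log ratio."""
    return abs(log_ratio_error / slope)

# Hypothetical coefficients for illustration only; not fitted values from the study.
steep = {"intercept": -15.0, "slope": 0.012}
shallow = {"intercept": 7.0, "slope": 0.0012}

for name, model in (("steep", steep), ("shallow", shallow)):
    err = map_error(0.5, model["slope"])
    print(f"{name} slope: a 0.5 error in the log ratio shifts MAP by about {err:.0f} mm")
```

Because the error in MAP scales with the reciprocal of the slope, a tenfold shallower slope turns the same uncertainty in the ratio into a tenfold larger uncertainty in precipitation.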

Ecoregions
The Ambrosia-to-Artemisia ratio varies across the midcontinent, being generally low in the northern regions and increasing toward the south and east (Fig. 7). The ratio is closely tied to precipitation (Fig. 8a, b) and varies less with temperature (Fig. 8c). See Supporting Information Appendix S4: Fig. S1 for an interactive three-dimensional visualization. Discriminant analysis (DA) suggests that an additive model including the log Ambrosia-to-Artemisia ratio and MAP can distinguish between ecoregions with between 48.5% and 77.3% success, producing a mean success rate of 61.3% on 1000 random partitions of the dataset. ANOVA followed by adjustment for multiple comparisons using Tukey's honest significant difference (HSD) indicates that there are statistically significant differences in the Ambrosia-to-Artemisia ratio between the three subregions of the Great Plains: West-Central Semiarid Prairies vs. Temperate Prairies (difference = −2.13, P = 0.002), Temperate Prairies vs. South-Central Semiarid Prairies (difference = −3.09, P < 0.0001), and West-Central Semiarid Prairies vs. South-Central Semiarid Prairies (difference = −5.22, P < 1 × 10⁻⁷; Fig. 9; see Appendix S5: Table S1 for a full pairwise comparison between ecoregions). Inclusion of the interaction between latitude and longitude (to control for spatial autocorrelation) produced no change in the ANOVA coefficients and was therefore excluded from the final model.
Note: The zero-Artemisia model has a low slope, producing a very high likelihood of erroneous precipitation values and is therefore not recommended for use.

DISCUSSION
Vegetation responses to climate are complex, yet predictable patterns often emerge. The Ambrosia-to-Artemisia ratio is strongly driven by precipitation in a predictable pattern across the North American midcontinent. Ambrosia species in this region tend to be moderately drought sensitive, especially in summer, whereas Artemisia demonstrates high drought tolerance, with some sensitivity in winter (Grimm et al. 2011). These differing climate tolerances make it possible to use their relationship to each other to understand hydroclimate at multiple spatiotemporal scales. Where both taxa are present, MAP explains nearly 78% of the variation in the Ambrosia-to-Artemisia ratio, and the addition of temperature as a variable does not markedly improve the model. Where no Artemisia is present and Ambrosia is reported as nonzero, the model is extremely weak and its use is inadvisable. This may be due to differences in moisture sensitivity; Artemisia does not occur in the surface pollen set where moisture is above 1200 mm annually, whereas Ambrosia occurs over the full range of precipitation in the dataset. It may also be a result of spatial bias in surface pollen samples housed in the Neotoma Paleoecology Database, which has been thoroughly analyzed by Inman et al. (2018). In the zero-Artemisia cases, the linear relationship is strongly affected by groups of samples taken within a very narrow range of precipitation and the removal of those groups effectively changes the apparent relationship with precipitation. For this reason, the Ambrosia-to-Artemisia ratio should not be applied where Artemisia is absent. Records where Ambrosia is absent and Artemisia is present invariably occur in areas that receive less than ~600 mm of precipitation annually; given that the median relative proportion of Artemisia is only slightly different in pre-European settlement vs. modern records, this appears to be an especially strong aridity signal.
Our findings agree with Commerford et al. (2018), who report that inclusion of Ambrosia in transfer functions drove precipitation estimates, and that precipitation appears to be a much stronger factor in Ambrosia pollen abundance than is temperature. We suggest that the strong signal of disturbance in the post-European settlement pollen record can be adequately controlled for using a correction factor. Our simple two-taxon ratio also avoids the difficulties inherent in accounting for post-settlement disturbance to a large number of taxa with a wide variety of apparent disturbance signals. Morris (2013) successfully employed the Ambrosia-to-Artemisia ratio to distinguish between tallgrass, mixed grass, and shortgrass prairie over a small region within the Great Plains. Our models suggest the tool is widely applicable over the entirety of the Great Plains region and that modern ecoregions carry distinct signals driven largely by precipitation. This can improve our ability to define past vegetational boundaries and refine precipitation estimates at a range of spatial scales. We find tentative support for our hypothesis that the ratio of Ambrosia-to-Artemisia pollen can be used to distinguish between ecoregions of the North American midcontinent.
Fig. 6. Difference in modern vs. reconstructed precipitation at presettlement sites. Blue indicates reconstructed LIA precipitation is higher than at present, yellow indicates modern and LIA precipitation that is roughly similar, and red indicates drier conditions than at present.

Pre-European settlement
We reconstruct unchanged to slightly increased LIA precipitation in the central Great Plains relative to the present, with areas of moderately to substantially reduced precipitation at the eastern and western margins of the Great Plains. A paucity of presettlement pollen records in the central Great Plains, because of the scarcity of natural lakes, hampers precise estimation. However, tentative conclusions can be drawn from the relatively few records that are available, and we here consider independent precipitation proxies (salinity, lake level, stable isotope, tree ring, speleothem, etc.) to evaluate the ability of the Ambrosia-to-Artemisia ratio to accurately reconstruct precipitation.
Reconstructions of reduced precipitation in the north-central Great Plains immediately preceding European settlement broadly agree with records based on lake levels and salinity in eastern North Dakota (Fritz et al. 1994, Valero-Garcés et al. 1997). We reconstruct unchanged to slightly wetter conditions over the same period in south-central Canada, which is corroborated by reduced lake salinity near Medicine Hat, Alberta (Vance et al. 1992). High apparent spatial heterogeneity of moisture near the northern and northwestern limits of the tallgrass prairie region has been reported from diatom-inferred salinity records and may indicate instability of the jet stream during the LIA (Laird et al. 2003). Strong drought signals in the Midwestern United States, especially Minnesota, Wisconsin, and Illinois, must be interpreted with care. Precipitation in the upper Midwest, USA, during the LIA was increased, and fire activity reduced, relative to the Medieval Climate Anomaly (MCA) (Shuman and Marsicek 2016). However, precipitation since the LIA has increased markedly. We find tentative support for our hypothesis that, following appropriate correction for post-settlement disturbance, the ratio of Ambrosia-to-Artemisia pollen can be used to reconstruct past precipitation.

CONCLUSIONS
We provide past precipitation estimates to demonstrate that our calibration of modern pollen to modern precipitation produces models which can be applied across time and space. Precipitation reconstructions broadly agree with the majority of independent proxy records available throughout the North American midcontinent during the LIA. Despite pollen being a primarily regional signal, high spatial heterogeneity of precipitation appears to be reflected in the records satisfactorily. The ratio is strongly driven by precipitation, while temperature appears to be a less critical factor. It remains to be seen whether this holds true further back in time when climate was much different than at present. It is also unknown whether the Ambrosia-to-Artemisia ratio is applicable outside of the North American midcontinent. Most pollen ratio methods are applicable only to the regions for which they are developed, although the Ambrosia-to-Artemisia ratio is apparently useful over a much larger area than most. However, Ambrosia is an introduced species outside of North America and the use of the ratio on other continents is not advisable.
Our reconstructions of past precipitation generally agree with independent non-pollen proxies. Limitations occur where spatiotemporal coverage of pollen datasets is poor. However, for well-studied areas, the application of a correction factor to the modern Ambrosia proportion may enable researchers to include it in training sets and precipitation models. It remains to be seen whether alternative plant-derived markers (macrofossils, phytoliths, etc.) may be employed in a similar manner.