Reconstructing continental-scale variation in soil δ 15 N: a machine learning approach in South America

. Soil nitrogen isotope composition ( δ 15 N) is an essential tool for investigating ecosystem nitrogen balances, plant – microbe interactions, ecological niches, animal migration, food origins, and forensics. The advancement of these applications is limited by a lack of robust geospatial models that are capable of capturing variation in soil δ 15 N (i.e., isotopic landscapes or isoscapes). Due to the complexity of the nitrogen cycle and general scarcity of isotopic information, previous approaches have reconstructed regional to global soil δ 15 N patterns via highly uncertain linear regression models. Here, we develop a new machine learning approach to ascertain a ﬁ ner-scale understanding of geographic differences in soil δ 15 N, using the South American continent as a test case. We use a robust training set spanning 278 geographic locations across the continent, spanning all major biomes. We tested three different machine learning methods: cub-ist, random forest (RF), and stochastic gradient boosting (GBM). 10-fold cross-validation revealed that the RF method outperformed both the cubist and GBM approaches. Variable importance analysis of the RF framework pointed to biome type as the most crucial auxiliary variable, followed by soil organic carbon content, in determining the model performance. We thereby created a biogeographic boundary map, which predicted an expected multiscale spatial pattern of soil δ 15 N with a high degree of con ﬁ dence, performing substantially better than all previous approaches for the continent of South America. Therefore, the RF machine learning framework showed to be a great opportunity to explore a broad array of ecological, bio-geochemical, and forensic issues through the lens of soil δ 15 N.


INTRODUCTION
Coherent patterns of soil nitrogen (N) isotope composition ( 15 N/ 14 N) have been used to constrain an array of N cycling processes, including N fixation, mineralization, nitrification, denitrification, and leaching, leading to improved understanding of global biogeochemical fluxes and forecasts (e.g., Robinson 2001, Vitousek et al. 2013, Houlton et al. 2015, Zhu and Riley 2015, Denk et al. 2017. Moreover, δ 15 N isotopic landscapes, or isoscapes, can provide critical spatial baseline information for examining animal migration (Hobson et al. 2012, Garcia-Perez andHobson 2014), forensic drug trafficking, and tracing food origins (Chesson et al. 2018). However, uncertainty over spatial patterns of soil δ 15 N is a significant obstacle to further advancing various N isotopic modeling approaches, with widespread implications for diverse fields of inquiry (Amundson et al. 2003, Bai et al. 2012).
The first spatially explicit soil δ 15 N map was created by using simple regressions between δ 15 N and climate, based on a select number of the data extrapolated to continents and the globe (Amundson et al. 2003). More recently, studies have included clay and organic carbon content in such global soil δ 15 N analysis (Craine et al. 2015). While these past approaches have proven insightful, they are yet to provide a robust, spatial representation of N isotope composition across diverse ecosystems and landscapes, where soil δ 15 N can vary from −4‰ in temperate ecosystems to 20‰ in tropical ecosystems (Martinelli et al. 1999). Furthermore, previous global models tend to mask critical regional variations of soil δ 15 N within biomes (Freitas et al. 2010). Within savanna ecosystems, for example, the spatial pattern of soil δ 15 N at regional scale does not adhere to predictions of global models (Cheng et al. 2009, Freitas et al. 2010, Chen et al. 2018. Therefore, methods that integrate discrete and categorical variables can be essential tools for advancing our ability to reconstruct soil δ 15 N variation, both within and among ecosystems. Here, we explore the use of machine learning approaches in the assessment of spatial patterns of soil δ 15 N in South America. Ensemble learning algorithms are a class of machine learning techniques that combine several models to build a robust final prediction. Unlike other approaches, ensemble learning algorithms have the distinct advantage of integrating digital mapping of soil attributes as both discrete and continuous variables (Hengl et al. 2017, Gomes et al. 2019). Among the most widely used ensemble learning algorithms are the cubist (Adhikari et al. 2014, Gray et al. 2015, Viscarra Rossel et al. 2016, stochastic gradient boosting (Yang et al. 2016), and random forest (Yang et al. 2016, Bataille et al. 2018, Hengl et al. 2018, Hounkpatin et al. 2018, Gomes et al. 2019 varieties. These methods are also emerging as a powerful alternative for mapping isotopic variations in the environment (Bataille et al. 2018). Despite the full range of applications in mapping soil attributes, machine learning approaches are yet to be systematically applied to the problem of soil δ 15 N from regional to continental scales.
Our objective was to generate a useful soil δ 15 N isoscape for South America by applying a variety of machine learning approaches. Although we test three machine learning techniques, we hypostatize that random forest algorithm will have the best performance in predicting soil δ 15 N. This hypothesis is based on recent papers where random forest algorithm produced more accurate results in mapping soil attributes when compared to neural networks, linear models, and other ensemble learning algorithms (Bataille et al. 2018, Hengl et al. 2018, Gomes et al. 2019).

Study area and soil δ 15 N dataset
We used a total of 278 georeferenced locations to create the training dataset and validation of our machine learning model, of which 135 are from published soil δ 15 N, and 143 data points are new data from fieldwork. All georeferenced locations are in unmanaged vegetation. We used the Web Plot Digitizer to extract values from plots when data were not available in tables (Rohatgi 2018). The data locations in our study are distributed across Brazil, Argentina, Chile, Peru, Venezuela, Ecuador, and Colombia. We converted the geographic coordinate system to decimal latitude and longitude WGS-84. We considered only soil sampled above 20 cm of depth. For locations with multiple depth values between 0 and 20 cm, we standardized δ 15 N by calculating weighted means of N percent. We did not select deeper soil because there is a strong correlation between surface and deep soil δ 15 N (Amundson et al. 2003). The geographic distribution of data encompassed seven of the nine biomes in South America, according to the World Wild Foundation biomes boundaries (Olson et al. 2001; Fig. 1).
We aggregated geographical locations ≤1 km distance (~0.01 degree) and calculated mean soil δ 15 N among aggregated points by using the aggregate function in R package sp, thus reducing over-representation of specific sites and any spatial biases. v www.esajournals.org 3 August 2020 v Volume 11(8) v Article e03223

Soil sampling and isotopic analysis
We collected some sets of samples for this study in fragments of the original vegetation. We sampled at 0-20 cm depth interval by using a Dutch soil sampler. Soil samples were air-dried and sieved using a 2-mm mesh to a level of airdried fine earth (ADFE). A subsample of 30-35 mg of soil was placed into tin capsules for N and isotopic analysis using an elemental analyzer (Carlo Erba, model 1110, Milan, Italy). Combustion products were purified in a gas chromatography column and introduced directly into a mass spectrometer for isotopic analysis (ThermoQuest-Finningan Delta Plus, Finnigan-MAT, California, USA). The natural abundance of 15 N/ 14 N is expressed as deviations per mil (‰), an internationally recognized standard, through the equation δ = (R sample /R standard − 1) × 1000, where R is the 15 N/ 14 N ratio of the sample and standard. The standard for N is the atmospheric air with an N isotopic composition of 0‰. Internal, certified laboratory standards are routinely interspersed with target samples during analysis runs. Long-term standard deviations of internal standards (sugarcane leaves and tropical soil) used at the Ecology Isotope Laboratory (CENA/University of São Paulo) are 0.2‰ for δ 15 N. Isotopic analyses were performed at the Isotope Ecology laboratory at the Center for Nuclear Energy in Agriculture (CENA), University of São Paulo, Piracicaba, Brazil.

Auxiliary variables
We used predictive variables from global geospatial datasets in our model (Table 1). Climatic variables are from the WorldClim database (Hijmans et al. 2005). We used the mean annual temperature (MAT) and mean annual precipitation (MAP), the aridity index (ai), and potential evapotranspiration (pet) in our model.
At the global scale, soil δ 15 N has been shown to correlate with both climatic variables and pedological variables, such as soil organic carbon content and clay content (Craine et al. 2015). For soil variables, we used the Soilgrid database for South America with a spatial resolution of 250 m (Hengl et al. 2017). In addition to soil organic carbon and clay content, we used bulk density, cation exchange capacity, pH (H 2 O and KCl extractable), silt, and sand content in our analysis (Table 1). For elevation, we used a digital elevation model (DEM) with a 1 km spatial resolution from the Shuttle Radar Topography Mission (SRTM) (Jarvis et al. 2008).
We used the net primary production (NPP) from MODIS/Terra data (MOD17A3; Zhao et al. 2005). We specifically aggregated annual NPP grids over the period between 2000 and 2015 to estimate annual averages for NPP. All NPP grids are from the Numerical Terradynamic Simulation Group at the University of Montana. The biome and ecoregion maps used in our analysis were obtained from the World Wild Foundation (Olson et al. 2001). We transformed the biome and ecoregion original shapefile format to raster using ESRI ArcGIS 10.3 (ESRI 2016).

Modeling
We based the spatial analysis of soil δ 15 N on the caret package on R version 3.5.0 (Kuhn 2008). We adapted the script from Bataille et al. (2018) to our dataset (see Data S1). First, we projected all auxiliary data into the pattern projection WGS-84. Then, we resampled all auxiliary data at 1-km grid cells, by using the bilinear interpolation method for continuous variables and nearest-neighbor interpolation for categorical variables. After standardizing the spatial resolution of auxiliary data, we extracted the values for each geographical location to build the regression matrix, which we used to train the spatial prediction models in the caret package.
To select a group of representative auxiliary variables, we applied the recursive feature elimination (RFE) method, also present in the caret package (Kuhn 2008). RFE is an algorithm for retroactive selection of predictors. The RFE technique first builds a model with all predictors and calculates the importance of each one to the final model; then, it removes the least predictor and rebuilds the model (Kuhn and Johnson 2019). One of the advantages of this method is that RFE reduces the effect of the correlation on the importance measure (Gregorutti et al. 2016). In this study, we ran the RFE with thirteen subsets of variables: 4, 5, 6. . . 16. We selected the best subset of covariates based on the lower root mean squared error (RMSE) after a 10-fold cross-validation.
We stacked the chosen covariates to assess the accuracy and uncertainty of model outcomes. We compared three regression methods available in v www.esajournals.org the caret package (Kuhn 2008): cubist, random forest (RF), and stochastic gradient boosting (GBM; Fig. 2).
RF and GBM methods are similar to classification and regression trees (CART), which analyze variables that best appropriate values into two distinct groupings. The appropriation of values in distinct groups is repeated for subsets until a regression tree is constructed. The RF model is formed by a large number of random trees during the model-training process. All random trees are merged into a single prediction to reduce the noise and increase the accuracy of results. The random trees are averaged, resulting in low prediction variance. This approach is known as ensemble learning (bagging; Kuhn 2008). Model error is estimated by using the out-of-bag data (OOB). The main parameter of the RF method is the number of features tested in each node (m.try argument), which is used for model optimization (Liaw andWierner 2002, Hengl et al. 2018). In this study, we used m.try = 4.
In addition to RF, GBM is also an ensemble learning approach. The main difference is that trees in the GBM method are built following a sequence as opposed to the random pattern approach of RF. After each regression tree is created, a new subsample is selected at random from the training dataset, which is used in the following interactions (Friedman 2002). The errors generated by the previous trees are corrected in the subsequent tree created in the GBM. Standard parameters used for tuning GBMs are the number of trees or the amount of boosting interactions (n.trees argument), the depth of trees (interaction.depth argument), and the learning rate (shrinkage argument; Greenwell et al. 2019). We applied the follow parameters: n.trees = 500; interaction.depth = 30; shrinkage = 0.1.
Cubist is a rule-based model where, for each node of the model tree, linear regression is generated, which is smoothed by the linear regression of the previous node. Initially, the model tree is defined by a set of rules, which in turn can be combined or eliminated for model simplification. The committee is the main tuning parameter of the cubist method. It works as a boosting scheme with the creation of interactive trees in sequence, in addition to establishing the number of trees (Kuhn and Quinlan 2018). Subsequent model trees are created by adjusting the error of the previous model trees. The final prediction is established by the outcome linear regression model, with influence from previous models along the same path through the model tree.
We predicted soil δ 15 N by applying each of the three tested models using chosen covariates to select the most accurate one. We separated the essential covariates influenced the target variable by using partial dependence plots, where the yaxes indicate the mean marginal effect of each value of a given covariable to the final prediction (Friedman 2001). This approach has the advantage of graphically displaying the correlation between a given feature and the target variable.
Spatial prediction of soil δ 15 N isoscape We applied the whole dataset (n = 278 geographical locations) on the best performing model to create the final soil δ 15 N isoscape map for South America. We performed the spatial predictions by using the prediction function in a raster package on R and applying the chosen model to the stacked covariate raster. There is no function to predict the spatial uncertainty from the random forest model. An alternative way is to make several model predictions to calculate the coefficient of variation (CV % = (standard deviation/mean) × 100; Gomes et al. 2019). For this research, we applied 20 spatial predictions to RF models.

Model performances and comparisons
The highest performance on the RFE selection was the model with the sixteen covariates (Fig. 3). Cross-validation values of RMSE, R 2 , and MAE from the RF, cubist, and GBM algorithms showed that RF was best able to capture geospatial variation in soil δ 15 N in South America (Fig. 4). Although the models were not statistical different (Kruskal-Wallis test, P value > 0.05), the average R 2 was slightly higher in RF (R 2 = 0.64) vs. cubist (R 2 = 0.61) and GBM (R 2 = 0.60; Fig. 4A) models. For RMSE, the average was lower in RF algorithm (1.85‰) than in cubist (1.92‰) and GBM (1.97‰; Fig. 4B). The same pattern was observed for MAE, where RF had the lowest average value (1.37‰), followed by cubist (1.40‰) and GBM (1.52‰; Fig. 4C).
Soil δ 15 N predicted by the RF algorithm explained 94% of the observations in the training data, with an RMSE of 0.63‰ (P value < 0.01; Fig. 5A). After 10-fold cross-validation, the predicted δ 15 N values explained 64% of the values observed in the training data, with an RMSE of 1.85‰ (P value < 0.01; Fig. 5B). However, the most robust performance evaluation approach involves a comparison between model-predicted δ 15 N and the holdout testing data that we did not use in the model (n = 50). In this case, the predicted values explained 62% of the variation in the holdout testing data, with an RMSE of 1.72‰ (P value < 0.01; Fig. 5C).

Importance of covariates
Biome type was the most important predictor variable for geospatial soil δ 15 N in South America. Covariate analysis showed that biome contributed 17.5% to the decrease in mean accuracy (Fig. 6). Soil organic carbon was the second most important variable, with a 16.4% contribution, followed by climatic variables related to precipitation (pre, ai), bulk density, and NPP (Fig. 6). Figure 7 shows the influence of the six most crucial covariates for predicting soil δ 15 N. Each biome had a distinct signature range in soil δ 15 N. Deserts and xeric shrublands showed the highest values of soil δ 15 N between the observed data (11.9 AE 3.6‰). Tropical moist forests had the second highest soil δ 15 N value (7.4 AE 2.0‰). Conversely, temperate biomes showed the lowest values of soil δ 15 N: Temperate mixed forests showed the lowest mean value of soil δ 15 N between all represented biomes (1.1 AE 3.0‰), followed by temperate grasslands, savannas, and shrublands (3.4 AE 2.6‰). Tropical dry forests, tropical savannas, and shrublands, and flooded grasslands were intermediate, with soil δ 15 N v www.esajournals.org ranging from 6.8 AE 1.9‰, 5.5 AE 1.6‰, and 5.0 AE 1.4‰, respectively (Fig. 7A).
Soil organic carbon content showed a negative correlation with soil δ 15 N. We found a sharp drop in soil δ 15 N between 0 and 25 g/kg of organic carbon. The negative trend was much less between 25 and 100 g/kg of organic carbon and again had a sharp drop between 100 and 150 g/kg of organic carbon (Fig. 7B). Low MAP was related to higher soil δ 15 N, following a  negative correlation trend up to 1100 mm of precipitation. Between 1100 and 1900 mm MAP, the predicted marginal values were much smaller than the values for other MAP ranges; and it was also possible to observe an increase in the partial value of predicted δ 15 N as precipitation increases, followed by a sharp rise at 2000 mm. After this value, the inverse correlation returned, with higher values of δ 15 N (Fig. 7C). The same was seen for the aridity index since it is a covariate correlated with MAP (Fig. 7D).
Bulk density exhibited a general pattern of positive correlation with δ 15 N, with a sharp increase in δ 15 N values between 900 and 1000 g/m 3 . Partial dependence continued to rise as bulk density values range from 1000 to 1400 g/m 3 , but less strongly (Fig. 7E). The influence of NPP on predicted soil δ 15 N was more complex, with a negative correlation between 200 and 600 gC.m -2 .yr -1 of NPP, a positive correlation between 600 and 700 gC.m -2 .yr -1 , and a negative between 700 and 900 gC.m -2 .yr -1 . The correlation was again positive between 1000 and 1200 gC.m -2 .yr -1 before stabilizing (Fig. 7F).

Uncertainty of the model
Regression residuals between observed and predicted soil δ 15 N did not exhibit spatial autocorrelation (Moran's index = −0.02; P value = 0.77). The extreme values of residuals, both positive and negative, were concentrated in the areas with maximum and minimum observed soil δ 15 N, specifically in the northeastern region of Brazil and in Patagonia in Argentina (Fig. 8A). The coefficient of variation map calculated from the predictions showed larger inconsistencies in the southern area of the continent (Fig. 8B). CV higher than 40% was observed for the mountainous area along the Andes. On the other hand, the lowest CV was concentrated in the northern region corresponding to tropical forests, in the semi-arid tropical region, and in the tropical savannas (Fig. 8B). The spatial pattern of the CV followed the spatial distribution of available data, with higher values in places with little or no sampling.

Prediction of soil δ 15 N for South America
The highest soil δ 15 N was observed for the northeastern region of Brazil, where modeled values of soil δ 15 N (0-20 cm) reached approximately 15‰ (Fig. 9). High soil δ 15 N values were also mapped in the northern region of Brazil (between 8‰ and 11‰, approximately), in the Amazon Forest region, and the southern areas of Bolivia and northwestern Argentina. The lowest values appeared in the extreme southwest and south of South America, with negative isotopic ratios reaching approximately −2‰. Negative or near zero soil δ 15 N also appeared in the northwest region of South America, following the mountainous Andes between Peru and Colombia. Intermediate values, between 3‰ and 7‰, were mainly in the central region of Brazil, where the tropical savannas occur, and to the south in a continuous range that follows from the central region of Paraguay to the south of Argentina, where the temperate savannas predominate (Fig. 9).

Source of uncertainty and model performance
Our machine learning was capable of reconstructing patterns of soil δ 15 N for the South American continent, providing an improved Fig. 6. The relative importance of each covariate for soil δ 15 N prediction in South America using the random forest algorithm. %IncMSE refers to a decrease in mean accuracy. For covariates abbreviation, check Table 1.
v www.esajournals.org prediction of observed soil δ 15 N variation (down to 20 cm) compared to previous regression-based approaches (Amundson et al. 2003). This result is consistent with the development of machine learning algorithms in digital soil mapping and biogeochemical predictions (Hengl et al. 2017, Bataille et al. 2018, Gomes et al. 2019). Among machine learning approaches, the RF model performed best, followed by the cubist and GBM algorithms. Upon 10-fold cross-validation, the RF model produced an R 2 = 0.64 vs. observed soil δ 15 N data. This significant correlation was roughly similar to our validation with the subset of data not used in the RF model (i.e., R 2 = 0.62 vs. the testing subset), indicating that the model was not significantly affected by overfitting.
RF yielded the most efficient algorithm in our study is consistent with previous RF applications to the mapping of different soil variables (Nawar andMouazen 2017, Gomes et al. 2019). Similarly, in an attempt to map strontium isoscape to central Europe, Bataille et al. (2018), after comparing the efficiency of several methods besides the GBM and cubist algorithms, among them ordinary kriging, neural network, and the generalized linear model, found that RF had also the best performance.
The highest CV values concentrated in areas with limited data in our study followed the same pattern in the literature with the assumption that spatial gaps in the dataset are the primary sources of error in the modeling of any soil variable (Malone et al. 2018). In particular, areas between the central and southern regions of South America are the most underrepresented among soil δ 15 N observations. Filling these spatial gaps can further improve the accuracy of models using RF (Hengl et al. 2018).

Soil δ 15 N spatial distribution in South America
Our study identified biome type as the essential factor for understanding spatial patterns of soil δ 15 N at regional to continental scales. Areas of xeric shrublands in northeastern Brazil had the highest modeled soil δ 15 N, which is consistent with isotopic fractionation during ammonia volatilization, that leaves high residual soil 15 N/ 14 N ratios in areas with water deficits and v www.esajournals.org high pH soils (Högberg et al. 1997, Menezes et al. 2012. The same mechanisms are likely to explain the high soil δ 15 N (between 10‰ and 12‰) predicted for the Atacama Desert region (Ehleringer et al. 1992) and in the semi-arid areas of northwestern Argentina and Paraguay.
Conversely, the lowest predicted soil δ 15 N is apparent in temperate climates and along the mountainous Andes region in the northwest of South America. Temperate ecosystems have a slower rate of soil organic matter decomposition and N mineralization, which reduces isotopic Fig. 9. Isoscape of soil δ 15 N (0-20 cm) for South America. The unclassified correspond to areas without vegetation, mainly ice covered, water, and deserts. fractionation in the surface soil, resulting in soil δ 15 N convergence to litter inputs from plants (Martinelli et al. 1999). Further, N gaseous losses are relatively low in the mountainous Andes region compared to tropical forests, leading to less isotopic fractionation during gaseous N losses. Peri et al. (2012) found similar δ 15 N of surface soil and leaves along a precipitation gradient in the Patagonia region of Argentina, indicating that there is small isotopic fractionation during organic matter decomposition in the surface soil.
In tropical rainforest areas, the RF algorithm was capable of capturing finer (regional) scale soil δ 15 N variation than regression models based on climate alone. Such improved capacity was seen along a tropical gradient, mainly in the Amazon forest, where the predicted soil δ 15 N decreased from east to west. This gradient of δ 15 N has been widely observed in previous empirical studies (Nardoto et al. 2008(Nardoto et al. , 2014. The systematic decline in soil δ 15 N within the Amazon is also apparent in partial dependence plots with MAP, where a decrease in the modeled δ 15 N occurs between MAP 2000 and 2500 mm. Due to the high soil N emissions in the eastern Amazonia (Davidson et al. 2004), isotopically fractionating denitrification losses are probably the main driver of the increase of soil δ 15 N in this region (Houlton et al. 2006, Pérez et al. 2006. The extreme positive δ 15 N values in the semiarid areas probably accentuated these correlations, where there is a low quantity of soil organic carbon, and negative values in temperate forests, with the highest amount of organic carbon. However, soil organic carbon was unable to explain the spatial distribution of the intermediate values of soil δ 15 N in South America (from 3‰ to 7‰), which appear mainly in the tropical and temperate savanna areas of the model. Those three large patterns described above are partially (though not wholly) captured in the global trends found by Craine et al. (2015), in which latitudinal gradients of soil δ 15 N were shown to be indirectly related to the MAP and MAT, which in turn influenced clay content and soil organic carbon. Craine et al. (2015) suggested that in the hot and/or dry regions, there is a higher amount of mineral organic matter, enriched in 15 N, than less enriched non-mineral organic matter. It might explain the high influence of the organic carbon model on the prediction of soil δ 15 N, with the lowest values of organic carbon related to the highest predicted values of soil δ 15 N.
Savannas from South America have been substantially underrepresented in previous δ 15 N soil models (Amundson et al. 2003, Craine et al. 2015. Our study adds an array of new data, thereby making it possible to examine correlations with soil δ 15 N in this biome. The partial dependence plot shows that within the savanna's precipitation range (between 1000 and 2000 mm), predicted soil δ 15 N decreases considerably, followed by increasing soil δ 15 N > 2000 mm. Low predicted soil δ 15 N in savannas compared to semi-arid and forest regions are consistent with observations in the central region of Brazil (Bustamante et al. 2004). Savannas of South America have low N availability compared to tropical rainforests, but with less severe water deficit than tropical semi-arid regions. Particularly, the savannas of Central Brazil present low decomposition rates due to high C:N ratios (~60:1), low net N mineralization, although with high NH þ 4 production in the soil (Nardoto andBustamante 2003, Bustamante et al. 2006). This low N availability in the plant-soil system seemed to have been captured by low soil δ 15 N that the model presented here.
Spatial variation of soil δ 15 N is scale-dependent, given the difficulties encountered by global models in explaining important patterns at some regional scales (Cheng et al. 2009, Freitas et al. 2010, Chen et al. 2018. Even so, the Amundson et al. (2003) model has proven useful in several global scale studies of soil and plant δ 15 N. The difficulty in representing spatial regions comes from the fact that regional geographic or ecosystem limits have not been included in the analysis. Studies show that even in global biogeochemical models, the boundaries of biomes or climatic regions are important auxiliary variables (Terzer et al. 2013). When comparing our model with Amundson's et al. (2003), the importance of regional boundaries for future modeling of δ 15 N is evident (Fig. 10). Our model was able to delimit a large amplitude of soil δ 15 N variation, showing spatial patterns of non-linear correlation with MAP (Fig. 10A). On the other hand, Amundson's global model captured a smaller amplitude of variation of δ 15 N compared to our model ( Fig 10B).
v www.esajournals.org Applicability of the machine learning soil δ 15 N isoscape model Spatial models of soil δ 15 N have been identified as an essential benchmarking tool for improving N-carbon forecasts in global climate change (Houlton et al. 2015). Soil δ 15 N isoscapes at large scales aggregate information on N balance and loss to the atmosphere from the soilplant system (Brenner et al. 2001, Houlton and). Our model, therefore, offers a more robust constraint on models and for understanding local to global biogeochemical fluxes. Bai et al. (2012) demonstrated the strength of spatial models of soil δ 15 N, adapting the equation proposed by Amundson et al. (2003) to estimate the gaseous N losses of terrestrial ecosystems. The authors identified high NO emissions in Africa and high N 2 emissions in the southeastern United States, where there are few N emission measurements (Bai et al. 2012). The addition of the regional factor and the increase in the accuracy of the base models, as presented here, will increase the power of these predictions.
Soil δ 15 N isoscapes also hold demonstrated utility for conservation studies, providing a pathway for tracking wild animal movement (Hobson et al. 2012, Garcia-Perez andHobson 2014). These studies generally have a multiisotopic approach, aggregating oxygen and hydrogen stable isotopes from water together with stable isotopes over the animal's trophic niche, such as carbon and nitrogen. By relating these isoscapes to the animal tissue isotopic ratios, it is possible to create probability maps of origin (Garcia-Perez and Hobson 2014). The isoscape presented here has the potential to reduce the probability area of assignment studies since it accessed regional aspects of the variation of δ 15 N. This same principle applies to the use of the model in forensic investigations (Chesson et al. 2018), marking our approach of global cross-disciplinary importance.