From sample to pixel: multi-scale remote sensing data for upscaling aboveground carbon data in heterogeneous landscapes

Abstract. In times of rapid global change, ecosystem monitoring is of utmost importance. Combined field and remote sensing data enable large-scale ecosystem assessments while maintaining local relevance and accuracy. In heterogeneous landscapes, however, the integration of field-collected data with remote sensing image pixels is not a trivial matter. Indeed, much of the uncertainty in models that use remote sensing to map larger areas lies in the field data integration. In this study, we propose to use fine spatial resolution (5 × 5 m²) remote sensing data as auxiliary data for upscaling field-sampled aboveground carbon data to target (meso-scale, i.e., 30 × 30 m²) image pixels. In this process, we assess the effects of field data disaggregation and extrapolation, with and without the auxiliary data. We test this on three study sites in heterogeneous landscapes of the Brazilian savanna. We compare two methods that use auxiliary data (the surface method, which uses a weighting layer, and the regression method, which applies a regression model) with one method that uses no auxiliary data (the cartographic method). To evaluate our results, we compared observed vs. estimated aboveground carbon values (for known samples) at the pixel level. Additionally, we fitted a random forest regression model with the assigned carbon estimates and the target satellite imagery and assessed the influence of the fraction of extrapolated vs. sampled carbon values on model performance. We observed that, in heterogeneous landscapes, the use of fine spatial resolution remote sensing data improves the upscaling of field-based aboveground carbon data to coarser image pixels. We also show that a surface method is more suitable for spatial disaggregation, while a regression approach is preferable for extrapolating non-sampled pixel fractions. 
In our study, larger datasets, which included a higher proportion of estimated values, generally delivered better models of aboveground carbon than smaller datasets that are assumed to reflect reality more reliably. Our approach links field and remote sensing data, which in turn enables the detailed mapping of aboveground carbon in heterogeneous landscapes over large areas through the optimized integration of field data and multi-scale remote sensing data.


INTRODUCTION
In times of rapid global change, with implications for ecosystem functioning and the services provided (Foley et al. 2005, Cardinale et al. 2012), the monitoring of ecosystems is of utmost importance. Indeed, only through monitoring is it possible to assess the degree and patterns of change in order to develop adequate mitigation and adaptation strategies (Turner et al. 2007). International programs toward mitigating the effects of climate change and halting biodiversity loss, such as the United Nations Reducing Emissions from Deforestation and Forest Degradation (REDD+) program or the Aichi Biodiversity Targets set by the Convention on Biological Diversity, require monitoring of carbon stocks and biodiversity at a global scale (Running et al. 1999, Schmeller et al. 2015), including the definition of Essential Climate Variables (GCOS 2018) and Essential Biodiversity Variables (Pereira et al. 2013).
Field-based monitoring schemes form the basis of our knowledge on, for example, the carbon stored in ecosystems. Implementation costs make field-based assessments best suited for local studies, while broad-scale monitoring is usually unfeasible (see Sólymos et al. 2015). The use of remote sensing data, on the other hand, allows cost-effective ecosystem monitoring for large areas (Hansen et al. 2013, Petrou et al. 2015), but potentially with limited applicability at local scales (Burivalova et al. 2015). Continuous large-scale ecosystem monitoring requires permanent monitoring plots distributed over large areas, such as the Long Term Ecological Research Network (Magnusson et al. 2005, Magurran et al. 2010) or National Forest Inventories (Blackard et al. 2008). Data from local field monitoring programs, provided that the location of the plots is precisely recorded, can then be integrated with remote sensing data for broad-scale assessments of, for example, carbon stocks or biodiversity (McRoberts and Tomppo 2007, Bustamante et al. 2016). Combined remote sensing and field survey data can thus address our needs for large-scale ecosystem assessments, while keeping local relevance and accuracy (Zheng et al. 2007, Boisvenue et al. 2016). The registration of field data (e.g., forest inventory data) to remote sensing pixels in managed and homogeneous environments is usually done by, for example, assigning tree density measures at the plot level (often through the interpretation of aerial imagery), which can then be related to the image pixels (Wulder et al. 2008, Tuominen et al. 2010). In natural, heterogeneous landscapes, however, it is unfeasible to accurately assign single values to heterogeneous plots (Thessler et al. 2005), and the combination of field and remote sensing data becomes a difficult task (He et al. 1998). Assigning field data to a target image pixel may require the disaggregation and interpolation between field-based samples (He et al. 1998, Zheng et al. 2007), an area of active research, most particularly in demographic and climatological studies (Langford 2006, Chen et al. 2015). Also, when the sample units are smaller than the image pixels, the proportions of pixels not fully covered by the field data need to be extrapolated. Sample-to-pixel data integration allows subsequent analysis at the pixel level and hence over the full area covered by the image. Such analyses include, for example, modeling the aboveground carbon of a particular region by fitting the pixel-allocated sample data to the respective remote sensing data in a regression approach (Zandler et al. 2015). The integration of field and (meso-scale) remote sensing data can be done with the aid of high-resolution remote sensing data, coming either from high-resolution satellite sensors or from (manned and unmanned) airborne sensors (Marvin et al. 2016).
❖ www.esajournals.org ❖ August 2018 ❖ Volume 9(8) ❖ Article e02298

We illustrate this data integration with a case study where we upscale aboveground carbon data derived from field-based woody vegetation data, collected in small sample plots of up to 10 × 10 m², to the pixel grid of spaceborne hyperspectral data (with a 30 × 30 m² spatial resolution). These data, by systematically describing the Earth's surface in a very detailed manner, have great potential for ecosystem monitoring and aboveground carbon mapping (Leitão et al. 2015). Indeed, pioneering studies have made use of spectral indices derived from hyperspectral Hyperion data (Pearlman et al. 2001) for modeling aboveground biomass of both woody and non-woody vegetation (Psomas et al. 2011, Zandler et al. 2015), or estimating forest structure and diversity parameters (Kalacska et al. 2007). While spaceborne hyperspectral programs are underway (Guanter et al. 2015, Lee et al. 2015), their planned meso-scale pixel (30 × 30 m²) can pose problems for sample-to-pixel allocation, particularly in heterogeneous environments. We focus on three study sites in the Brazilian savanna (Cerrado), a highly heterogeneous system, which consists of a mosaic of different vegetation physiognomies (Schwieder et al. 2016). Our hypothesis is that the use of auxiliary data from a high spatial resolution sensor, such as the RapidEye sensor, with a pixel size of 5 × 5 m² (Tyc et al. 2005), improves the upscaling of field-sampled aboveground carbon data to the image pixels, particularly in such heterogeneous landscapes, ultimately enabling its use for carbon mapping across larger regions. Indeed, a recent study by Gonzalez-Roglich and Swenson (2016) also used high spatial resolution satellite data for estimating tree cover in a savannah in Argentina, which was later related to carbon values. 
High spatial resolution satellite data, although not available to an extent that allows for large area mapping, could be used for upscaling field samples.
We tested several methods of incorporating auxiliary information for integrating these data and assessed the effects of data disaggregation and extrapolation on the respective data integration, with and without auxiliary data. Furthermore, we investigated the importance of data quality for the resulting model performance by using random forest (RF) regression models to fit different sets of pixel-based aboveground carbon data to hyperspectral data. We thus propose the use of high-resolution remote sensing data as auxiliary data for upscaling field-sampled data to target, meso-scale image pixels in heterogeneous landscapes.

METHODS

Study sites and field data
Our study is located in three sites in the Brazilian savanna (Cerrado; Fig. 1): Parque Estadual de Terra Ronca (PETR), Parque Nacional da Chapada dos Veadeiros (PNCV), and Parque Estadual da Serra Azul (PESA). These sites can be considered characteristic of the typical savannas of central Brazil, while representing their variability well, ranging from a low altitude sandy savannah (PETR) to upland savannahs on rock (PNCV) or deep soil substrates (PESA). The Cerrado covers ~20% of Brazil's land surface, and it holds the richest biodiversity of all of the world's savannahs (Françoso et al. 2016). This system is, however, mostly unprotected and highly threatened, thus constituting a global biodiversity hotspot (Myers et al. 2000, Klink and Machado 2005) which requires monitoring. For all sites, vegetation inventory data were collected following a common scheme within the Program for Biodiversity Research (PPBio; Pezzini et al. 2012), using the RAPELD principle, adapted for sampling the Cerrado (Magnusson et al. 2005, Teixeira 2017). This approach constitutes a standardized integrated sampling of vegetation biomass and multi-taxa biodiversity data, thus allowing research on the linkages between carbon and biodiversity (Teixeira 2017). The data were collected on a system of trails and plots, following a systematic scheme, as follows. It consists of two 5 km long parallel trails with a distance of 1 km between them, placed so that they fit fully within natural vegetation areas. Along each trail, five transects were located 1 km apart from each other (at marks 500-4500 m, counting from the beginning of the trail). Each sampling transect consists of a 250 m center line that follows the elevation contour, with a varying width, according to the taxon sampled (Pezzini et al. 2012, Teixeira 2015). Each transect was further segmented into sections of ~10 m length, for a total of 25 per transect. 
Woody plants with a diameter at breast height of 10 cm or more were sampled in two 10 m wide adjacent polygons (sample plots) on each side of the central line (although with a non-sampled sensitive area along the transect, on the right side), for each section (Fig. 2). In the cases where the transect crossed an obstacle (such as a road or river stream), or when adjacent sections had too much overlap (depending on topography), some sections were excluded and compensated at the end of the transect, thus guaranteeing the sampling of 25 sections per plot (Teixeira 2015). In this study, we considered only data collected in transects covering savannah vegetation, although with varying density (Table 1). Also, data with missing spatial reference were excluded. In total, we considered eight transects in PESA, six in PNCV, and eight in PETR. The floristic inventory data were converted into aboveground carbon, following general allometric equations for the region (Rezende et al. 2006).

Remote sensing data
Fig. 1. This study is located in three sites in the Brazilian Cerrado: Parque Estadual da Serra Azul (PESA), Parque Nacional da Chapada dos Veadeiros (PNCV), and Parque Estadual de Terra Ronca (PETR). Above, the Cerrado is depicted in dark gray (right), and the study sites are marked with a triangle. Below, the three study sites are shown with the respective parallel trails that define the sampling scheme, overlaid on the near-infrared spectral band of the respective RapidEye image.

To improve the upscaling of field-based aboveground carbon values to the target pixel level, we used auxiliary data derived from high spatial resolution (5 × 5 m²) RapidEye data (Fig. 2; Tyc et al. 2005). The RapidEye data are delivered with a ground sampling distance (spatial resolution) of 5 m. A particularly interesting feature of these data is the so-called red edge band, which covers the spectral region between 690 and 730 nm and is relevant for characterizing vegetation condition and structure (Gitelson et al. 1996, Gomes and Maillard 2015). In this study, the ortho-rectified product was used, which was already corrected for radiometric, geometric, and sensor-specific effects, and delivered as top-of-atmosphere reflectance. For all three study sites, we used nearly cloud-free data, which were acquired close to the dates of the field surveys (Table 1). These data were corrected for atmospheric effects by applying a dark object subtraction (Chavez 1996), for subsequent analysis. Finally, we derived the Red Edge Normalized Difference Vegetation Index (RENDVI; Peng and Gitelson 2012), following the formula:

$$\mathrm{RENDVI} = \frac{\mathrm{NIR} - \mathrm{RE}}{\mathrm{NIR} + \mathrm{RE}} \quad (1)$$

where NIR is the reflectance in the near-infrared spectral band (band 5: 760-850 nm) and RE is the reflectance in the red edge band (band 4: 690-730 nm). This spectral index, by retrieving information relevant for describing vegetation productivity (Peng and Gitelson 2012), is thus suitable for use as auxiliary data for the spatial allocation of vegetation and aboveground carbon.
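As an illustration of the index computation, the following minimal Python sketch assumes the standard normalized-difference form, (NIR − RE) / (NIR + RE), applied per pixel. The function name `rendvi` and the reflectance values are hypothetical and are not taken from the study, which was processed in R.

```python
def rendvi(nir, red_edge):
    """Red edge NDVI for one pixel, assuming the form (NIR - RE) / (NIR + RE)."""
    total = nir + red_edge
    if total == 0:
        # The index is undefined when both reflectances are zero.
        return float("nan")
    return (nir - red_edge) / total


# Hypothetical (NIR, red edge) reflectance pairs for three 5 m pixels.
pixels = [(0.40, 0.20), (0.30, 0.30), (0.25, 0.15)]
index_values = [rendvi(nir, re) for nir, re in pixels]
```

Higher values indicate a stronger contrast between near-infrared and red edge reflectance, which is why such an index can serve as a weighting layer for vegetation.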
We used spaceborne hyperspectral data from the Hyperion sensor on board the Earth Observing-1 (EO-1) platform as target remote sensing data (Fig. 1), to which the aboveground carbon data should be registered for subsequent modeling (Table 1). The EO-1 satellite was launched as a scientific demonstrator in 2001, and while it was originally planned for a lifetime of one year, it recorded multi-spectral and hyperspectral data until March 2017 (Pearlman et al. 2001). The Hyperion data were radiometrically corrected, including correction for pixel shifts, striping, keystone, and smile, as well as atmospheric effects. The visible and near-infrared (400-1000 nm) and the shortwave infrared (SWIR, 1000-2500 nm) detectors, which separately record electromagnetic radiation in their respective wavelength ranges, were co-registered (Datt et al. 2003, Rogass et al. 2014a). Data were spatially subsetted to the respective study regions (Fig. 2) and co-registered using precision terrain-corrected (L1T) Landsat OLI scenes for spatial consistency across all study areas. Erroneous or noisy spectral bands were interactively screened and excluded. Data were spectrally smoothed with a Savitzky-Golay filter (Savitzky and Golay 1964, Miglani et al. 2011). This resulted in a total of 83 spectral bands per Hyperion image (from the original 242), covering the visible, near-infrared, and SWIR portions of the electromagnetic spectrum.

Tests on the disaggregation and extrapolation of aboveground carbon data
To test our hypothesis that using auxiliary high spatial resolution satellite data improves the spatial allocation of the field data to the target image pixel, and to investigate which method for data integration should be used, we performed two tests (for each study site): one on spatial data disaggregation and one on spatial extrapolation. For these tests, we selected all the sample plots that lay fully within each of the target pixels, for which we have known (sampled) aboveground carbon values (Fig. 3). The first test addresses the effects of using auxiliary data on the spatial disaggregation of field data from one sample plot falling in two or more target pixels, i.e., whose carbon values had to be disaggregated to the respective pixels. In this case, the sample plots were merged (aggregated) in random pairs and the respective aboveground carbon values summed. These merged plots were subsequently disaggregated following the methods described below. The second test, on data extrapolation, relates to situations where the target pixels were not fully sampled, and the respective carbon values need to be extrapolated to represent the full image pixel. In this test, one random plot was excluded per target pixel to be subsequently estimated. Each test was iterated 1000 times to ensure the robustness of the results.

Fig. 3. Representation of the tests on the carbon data spatial disaggregation and extrapolation, for the Hyperion pixels shown in Fig. 2. The sample plots selected for these tests are those which are fully within a target (Hyperion) pixel. All tests used three different methods for data disaggregation and extrapolation: cartographic (which does not make use of high-resolution auxiliary data from RapidEye); surface (uses RapidEye auxiliary data as a weighting layer); and regression (builds a regression model on the auxiliary data). In the test on spatial disaggregation, two plots within each pixel are randomly merged, to be then estimated using the three methods described. In the test on spatial extrapolation, within each pixel one random plot is deleted, to then be estimated based on the information from the remaining plots within the same pixel.
In each test, we applied three methods of data integration: cartographic, surface, and regression. The cartographic method refers to an area-weighting approach, which does not require any auxiliary information. It thus assumes homogeneity in the spatial distribution of the target values and served as a control for testing our hypothesis. The surface method works in a similar way, but uses auxiliary data (in this case derived from the RapidEye imagery) as a weighting layer, whose values are multiplied by the area of the respective sample plots.
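The difference between the cartographic and surface methods can be sketched in a few lines of Python (the study itself was processed in R). The function name `disaggregate`, the example areas, and the weights are hypothetical. The weights stand in for an auxiliary value per plot part, such as its mean RENDVI, and are assumed here to be positive.

```python
def disaggregate(total_carbon, areas, weights=None):
    """Split a summed carbon value across plot parts.

    Without weights, shares are proportional to area (cartographic method).
    With weights, shares are proportional to area * weight (surface method).
    """
    if weights is None:
        weights = [1.0] * len(areas)  # homogeneity assumption
    shares = [area * weight for area, weight in zip(areas, weights)]
    denominator = sum(shares)
    if denominator <= 0:
        raise ValueError("area-weight products must sum to a positive value")
    return [total_carbon * share / denominator for share in shares]


# Hypothetical merged plot: 90 units of carbon over two parts of 60 and 30 m^2.
cartographic = disaggregate(90.0, [60.0, 30.0])
surface = disaggregate(90.0, [60.0, 30.0], weights=[0.2, 0.5])
```

In this made-up example the cartographic split follows area alone, while the surface split shifts carbon toward the smaller part because its auxiliary value is higher. Both variants preserve the total.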
The regression method, here defined as a linear model, uses the auxiliary data and the plot area combined as a predictor variable in the model, in the following manner. For the data disaggregation, it was assumed that the carbon values ($C_t$) are a function of the respective area ($A_t$) multiplied by the auxiliary layer ($W_t$), while the model intercept was kept at zero:

$$C_t = \beta \, (A_t \times W_t) \quad (2)$$

where $\beta$ denotes the fitted regression coefficient. The resulting model was used to disaggregate the merged plots back to the original ones. For the data extrapolation, the total carbon value for the target pixel ($C_t$) was assumed to be the sum of the known (sampled) carbon value ($C_s$) and a function of the unknown (non-sampled) area ($A_u$) multiplied by the auxiliary layer ($W_u$), while the model intercept was kept at zero and the coefficient for the first predictor variable ($C_s$) was fixed at one:

$$C_t = C_s + \beta \, (A_u \times W_u) \quad (3)$$

This way we constrained the regression so that the response value is solely a function of the known (sampled) carbon value added to the area and auxiliary layer values. The resulting regression model was used to estimate the carbon value for all pixels.
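The constrained extrapolation model can be sketched as a least-squares fit through the origin. Fixing the coefficient of the sampled carbon at one is equivalent to subtracting it from the total and regressing the remainder on area times auxiliary value. The Python code below is only a sketch under that reading (the study used R). The function names, the slope name `beta`, and all numbers are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares slope of y = beta * x with the intercept fixed at zero."""
    sum_xx = sum(value * value for value in x)
    if sum_xx == 0:
        raise ValueError("predictor values are all zero")
    return sum(a * b for a, b in zip(x, y)) / sum_xx


def fit_extrapolation(c_total, c_sampled, area_unsampled, weight_unsampled):
    """Fit C_t = C_s + beta * (A_u * W_u); C_s enters as a fixed offset."""
    x = [a * w for a, w in zip(area_unsampled, weight_unsampled)]
    y = [total - sampled for total, sampled in zip(c_total, c_sampled)]
    return fit_through_origin(x, y)


def extrapolate(c_sampled, area_unsampled, weight_unsampled, beta):
    """Estimate the full-pixel carbon from the sampled part and the fitted slope."""
    return c_sampled + beta * area_unsampled * weight_unsampled


# Hypothetical training pixels with known totals.
beta = fit_extrapolation(
    c_total=[25.0, 35.0, 90.0],
    c_sampled=[10.0, 20.0, 30.0],
    area_unsampled=[10.0, 20.0, 40.0],
    weight_unsampled=[0.5, 0.25, 0.5],
)
estimate = extrapolate(12.0, 10.0, 0.5, beta)
```

The same `fit_through_origin` helper covers the disaggregation case, where carbon is regressed directly on area times auxiliary value.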
We validated the spatial disaggregation and extrapolation tests at the target (Hyperion) pixel level, which means that the sum of the resulting carbon estimates per pixel was compared to the actually sampled values, by the averaged root-mean-square error (RMSE), relative RMSE (RMSE_rel, which equals the RMSE divided by the mean input carbon value), and the coefficient of determination (R²) between predicted and observed validation samples, over the 1000 iterations.
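The three performance measures can be written out as follows. This Python sketch assumes that the relative RMSE is normalized by the mean observed value and that R² is computed as one minus the ratio of residual to total sum of squares. The squared Pearson correlation is another common definition, and the text does not say which one was used. All names and values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observed value."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-pixel carbon sums: sampled values and estimates.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(observed, predicted),
          relative_rmse(observed, predicted),
          r_squared(observed, predicted))
```

In an iterated test, these scores would be collected per iteration and then averaged.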

Sensitivity analysis on data quality
The number of pixels available for use in a regression model of aboveground carbon depends on the assigned threshold of minimum pixel coverage by field samples. We performed a sensitivity analysis on the resulting regression models to evaluate the trade-off between having a high number of pixels (which may cover a larger portion of the variability within the study region) and a low share of estimated samples. We defined equidistant 10% thresholds ranging from 0% (all pixels at least partially sampled in the field) to 90% (pixels which were at least 90% sampled in the field), leading to a decreasing number of input samples with an assumed increasing reliability (Fig. 4), based on the fraction of each pixel actually sampled in the field. For each threshold, we iterated the data splitting into 70% training and 30% validation data 1000 times and fitted the aboveground carbon values to the hyperspectral Hyperion data using an RF regression model (Breiman 2001). RF is a machine-learning approach based on the Classification and Regression Tree (CART) algorithm (Breiman et al. 1984). The algorithm trains a decision tree with a randomly drawn subset of the given input data and internally evaluates its performance with the leftover data. An ensemble of many decision trees (a forest) is trained, acknowledging that any single tree can be erroneous. Results are then averaged into the final model. Model performance was evaluated based on average RMSE, RMSE_rel, and R² between predicted and observed validation samples, from the withheld 30% validation data, over the 1000 iterations. Standard deviations around the mean are also reported and graphed. All processing was performed in R (R Development Core Team, 2016) using the randomForest package (Liaw and Wiener 2002).
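The thresholding and repeated 70/30 splitting loop can be outlined as below. This is a dependency-free Python sketch rather than the authors' R workflow. In particular, `mean_model` is a hypothetical placeholder learner that predicts the training mean and stands in for the random forest, and the pixel data are made up.

```python
import random
import statistics


def mean_model(train):
    """Placeholder learner (NOT a random forest): predicts the training mean."""
    mean_value = statistics.mean(carbon for _, carbon in train)
    return lambda features: mean_value


def rmse(observed, predicted):
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


def sensitivity(pixels, thresholds, n_iter=1000, train_share=0.7,
                fit=mean_model, seed=0):
    """pixels: list of (sampled_fraction, features, carbon) tuples.

    Returns {threshold: (n_pixels, mean RMSE, standard deviation of RMSE)}.
    """
    rng = random.Random(seed)
    results = {}
    for threshold in thresholds:
        # Keep only pixels whose field-sampled fraction meets the threshold.
        subset = [(f, c) for fraction, f, c in pixels if fraction >= threshold]
        n_train = int(round(train_share * len(subset)))
        if n_train < 1 or n_train >= len(subset):
            continue  # too few pixels left to split into training/validation
        scores = []
        for _ in range(n_iter):
            shuffled = subset[:]
            rng.shuffle(shuffled)
            train, valid = shuffled[:n_train], shuffled[n_train:]
            model = fit(train)
            observed = [carbon for _, carbon in valid]
            predicted = [model(features) for features, _ in valid]
            scores.append(rmse(observed, predicted))
        results[threshold] = (len(subset), statistics.mean(scores),
                              statistics.pstdev(scores))
    return results


# Hypothetical pixels: sampled fraction, one dummy feature, carbon value.
demo_pixels = [((i % 10) / 10 + 0.05, [float(i)], float(i)) for i in range(20)]
demo = sensitivity(demo_pixels, [0.0, 0.5, 0.9], n_iter=50)
```

Passing a different `fit` function, for example a wrapper around a random forest implementation, would change the learner without touching the thresholding and splitting logic.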

RESULTS

Tests on the disaggregation and extrapolation of aboveground carbon data
Our results were consistent across all study sites and performance measures (Table 2). The tests on disaggregation of aboveground carbon data showed consistently better performances when using auxiliary data in a surface interpolation method (RMSE_rel values ranging between 0.479 and 0.655 and R² values ranging between 0.750 and 0.826). The regression method performed worst for data disaggregation.
When extrapolating the aboveground carbon data, the use of auxiliary information was always beneficial. Its use in a regression approach achieved the best results (RMSE_rel ranging between 0.370 and 0.509 and R² between 0.843 and 0.914), while the surface method performed less well. The cartographic method performed worst for the extrapolation of aboveground carbon data.
The coefficients obtained in the regression models (for the carbon data extrapolation) were specific to each study site, though consistent across all tests (between 2.728 and 2.805 for PESA; 3.915 and 4.082 for PNCV; and 2.693 and 2.711 for PETR), and with little variation between iterations.

Sensitivity analysis on data quality
The carbon model performances differed between the three study sites, although it was possible to identify general trends related to the data thresholds used. The best model performances were obtained in the PESA study site, with averaged relative RMSE values ranging from 0.34 to 0.68. In PNCV, the RMSE_rel values ranged from 0.42 to 0.72 and in PETR from 0.52 to 1.09 (Fig. 5).
Generally, the decreasing number of training pixels led to less robust results with a higher standard deviation in the performance measures along with an overall performance loss (increase in RMSE and RMSE_rel). In terms of R², the trend of performance loss with decreasing sample size was not always observed, although smaller samples always resulted in higher variation between models (higher standard deviation). Also, as long as a sufficient sample size is available (e.g., below the 0.6-0.7 thresholds), and a sufficient portion of the pixel is sampled in the field (e.g., above the 0.1-0.2 thresholds), it is possible to achieve reasonably good aboveground carbon models using this approach.

DISCUSSION
We verified our hypothesis that, in heterogeneous landscapes, the use of auxiliary information from high spatial resolution remote sensing data improves the upscaling of field-sampled aboveground carbon data to meso-scale remote sensing image pixels. This, in turn, enables their use for carbon mapping, potentially over large areas. In this study, we observed that the use of auxiliary data did improve the data integration, although the choice of the method used can be influential. We found that the best approach for upscaling field-collected aboveground carbon data makes use of auxiliary data in a (local) surface method for data disaggregation and in a regression for data extrapolation. Indeed, the lack of auxiliary information (with a cartographic method) never delivered the best results in upscaling the field-collected data. Its assumption of homogeneity in the distribution of aboveground carbon does not hold in such heterogeneous systems. Also, when disaggregating the field samples into different target pixels, the use of auxiliary data in a regression approach gave the poorest results. This agrees with what has been previously found by Fisher and Langford (1995). As regression models are fitted globally, estimates from locally fitted methods should adapt better to heterogeneous environments. When extrapolating the field samples into the full (partially non-sampled) pixels, however, the regression method was the best performing approach.
Our conclusions on the use of high spatial resolution data as auxiliary data agree with findings in a similar system, the Argentinean savannahs (Gonzalez-Roglich and Swenson 2016), which suggests the generality of our approach for savannahs and other heterogeneous systems. In the referred study, the authors used fine resolution satellite imagery for scaling up field data, to assess tree cover at the meso (Landsat) scale, which was later related to carbon. While extremely relevant for heterogeneous environments, this approach is not necessarily required in homogeneous environments. Indeed, in the latter conditions the co-registration of field samples and image pixels is commonly done through the estimation of plot-level tree density values in a dasymetric approach (McRoberts and Tomppo 2007, Tuominen et al. 2010).
Multi-scale remote sensing imagery has been widely used in vegetation monitoring, for example, for generating maps of forest biomass or productivity over large areas (Tomppo et al. 2002, Lefsky et al. 2005, Muukkonen and Heiskanen 2007). The choice of the higher-resolution data to be used as auxiliary information in this approach is critical for successfully upscaling field data, as it needs to relate to the field-measured variable (in our case, the vegetation's aboveground carbon). Here, we used a spectral index based on the red edge spectral bands of RapidEye imagery, known to relate to vegetation structure and therefore biomass in the Cerrado (Gomes and Maillard 2015). Further research is still required to identify the best possible data to be used as a weighting layer. This, however, falls outside the scope of this study and would raise issues related to data availability constraints. Ultimately, the data used as auxiliary layer will determine the estimated regression coefficient used in this approach. Also, the approach presented here could potentially be used to integrate field data with multi-scale systems, such as that of Sentinel-2, which collects large amounts of data across the globe at a high frequency (Drusch et al. 2012). In this case, where the data are collected at different spatial resolutions (10, 20, and 60 m, depending on the spectral bands), the 10-m data could, for example, be used for upscaling field data to the 20-m pixels. Likewise, data collected by unmanned aerial vehicles (or drones), a technology of rapidly growing popularity, could be used for bridging the gap between ground-based and satellite data (He et al. 2015, Marvin et al. 2016).
Our analyses also showed that, while a more restrictive (high) threshold on the share of sampled (vs. estimated) data should ensure better data reliability, it also results in fewer training pixels, which in turn generate poorer models. Indeed, smaller sample sizes usually mean that a smaller proportion of the system's variability is captured, thus resulting in less generalizable estimations (Wisz et al. 2008). Further, we can conclude that by using high-resolution data for upscaling field data, as long as a sufficient sample size is available and a sufficient portion of the pixel is sampled in the field, it is possible to achieve good aboveground carbon models.

Fig. 5. Averaged carbon model results in terms of root-mean-square error (RMSE), relative RMSE, and R² after 1000 iterations for all three study sites. The gray shaded area along the curves shows ± one standard deviation around the mean performance measures.
We also observed that spaceborne hyperspectral imagery is suitable for monitoring and mapping ecosystem properties (Abdel-Rahman et al. 2013, Leitão et al. 2015). Indeed, in this study we used end-of-life Hyperion data (the sensor was shut down in March 2017), which came with many different issues, such as data striping, pixel shift, and a low signal-to-noise ratio (Scheffler and Karrasch 2014). After thorough data correction and screening, the remaining 82 spectral bands (out of the original 242) had enough detailed information to characterize and predict the upscaled aboveground carbon data. It is expected that the advent of forthcoming hyperspectral missions, such as EnMAP (Guanter et al. 2015) or HyspIRI (Lee et al. 2015), will enable many more applications related to ecosystem monitoring and mapping of natural resources (Schwieder et al. 2014, Leitão et al. 2015, Pellissier et al. 2015, Steinberg et al. 2016). Upscaling field samples to target pixels enables the use of remote sensing imagery for carbon mapping and ecosystem monitoring in heterogeneous environments (Schwieder et al. 2018). Through this approach, it is possible to, for example, do wall-to-wall mapping of carbon over large areas with time series of widely available multi-spectral imagery (Wulder et al. 2015), or characterize particular areas in high detail with spaceborne hyperspectral imagery (Guanter et al. 2015). Ultimately, this will have deep implications for global carbon mitigation programs such as REDD+, by allowing the detailed calculation of aboveground carbon in a spatially explicit manner.
To our knowledge, this is the first study that compares different methods of multi-scale data integration in heterogeneous environments, thereby providing clear guidelines on how to proceed in such cases. Our study thus facilitates the integration of existing field-collected datasets with remote sensing imagery, contributing to the generation of workflows for the large-scale assessment of natural systems.

CONCLUSION
Using high spatial resolution remote sensing imagery as auxiliary data is beneficial for the spatial allocation of field-sampled data to a larger target pixel. This is particularly relevant in heterogeneous environments, where it is not possible to define homogeneous plots of known vegetation density. The method for integrating the auxiliary data in the analysis is, however, not trivial and can have a great influence on its overall performance. While local, surface approaches are preferable for the spatial disaggregation of field samples to the target pixel grid, the extrapolation of the data into the full pixel extent is better done with a global, regression model. This approach enables the spatial allocation of field data to larger image pixels, thus allowing the use of remote sensing imagery for ecosystem monitoring and carbon mapping over large heterogeneous areas.

ACKNOWLEDGMENTS
This study is part of the research activities of the EnMAP Scientific Advisory Group (EnSAG) and was funded by the German Aerospace Centre (DLR)-Project Management Agency, granted by the Ministry of Economics and Technology (BMWi grant 50EE1309). Underlying RapidEye data have been contributed on behalf of the German Aerospace Center through funding of the Federal Ministry of Economy and Energy (RESA ID 00186). The field data were collected within the project CNPq 457497/2012-2 and Edital Sisbiota-CNPq no 47/2010 (CNPq 563134/2010-0, Projeto Diversidade biológica do Cerrado: estrutura e padrões). J.R.R. Pinto benefitted from a Productivity fellowship (CNPq 307701/2014-0).