Inter-laboratory variation in the chemical analysis of acidic forest soil reference samples from eastern North America

Long-term forest soil monitoring and research often requires a comparison of laboratory data generated at different times and in different laboratories. Quantifying the uncertainty associated with these analyses is necessary to assess temporal changes in soil properties. Forest soil chemical properties, and methods to measure these properties, often differ from agronomic and horticultural soils. Soil proficiency programs do not generally include forest soil samples that are highly acidic, high in extractable Al, low in extractable Ca and often high in carbon. To determine the uncertainty associated with specific analytical methods for forest soils, we collected and distributed samples from two soil horizons (Oa and Bs) to 15 laboratories in the eastern United States and Canada. Soil properties measured included total organic carbon and nitrogen, pH and exchangeable cations. Overall, results were consistent despite some differences in methodology. We calculated the median absolute deviation (MAD) for each measurement and considered the acceptable range to be the median ± 2.5 × MAD. Variability among laboratories was usually as low as the typical variability within a laboratory. A few areas of concern include a lack of consistency in the measurement and expression of results on a dry weight basis, relatively high variability in the C/N ratio in the Bs horizon, challenges associated with determining exchangeable cations at concentrations near the lower reporting range of some laboratories and the operationally defined nature of aluminum extractability. Recommendations include a continuation of reference forest soil exchange programs to quantify the uncertainty associated with these analyses in conjunction with ongoing efforts to review and standardize laboratory methods.


INTRODUCTION
Long-term soil monitoring and resampling studies to assess temporal changes of soil properties have become a commonly used tool in the assessment of environmental change (Lawrence et al. 2013). The effectiveness of these types of studies depends on accurate and reproducible laboratory data (Desaules 2012). Agronomic and horticultural soil test methods have been developed to serve the agricultural industry and are often regionally standardized. These methods are often not applicable for soil monitoring when done in forest soils that can be acidic, high in exchangeable Al, low in exchangeable base cations and high in organic carbon (e.g., Johnson 2002, Ross et al. 2009). Forest soils are arguably more variable than agricultural soils, ranging more broadly in slope, rock content and chemical conditions. Perhaps for this reason, as well as the relative youth of forest soil science as a discipline, a variety of analytical methods have been developed, with little standardization among studies.
Organizations such as the North American Proficiency Testing Program (NAPT; http://www.naptprogram.org/), administered by the Soil Science Society of America, provide quality assurance testing for soil fertility testing laboratories. Samples used in the NAPT are usually typical agronomic soils, relatively lower in C and higher in pH than corresponding forest soils of eastern North America. The methods used for extractable cations in agronomic soil fertility testing are often not the same as those used in forest soil research laboratories, making it difficult to assess the precision of methodologies applied to forest soils. Thus, there is a need for both reference soil samples characteristic of forest soils and for inter-laboratory analyses of those samples so that the accuracy and uncertainty associated with the chemical analysis of forest soils can be assessed.
Carbon in forest soils is a topic of considerable current interest because of global climate change. Soils worldwide store about twice as much C as the atmosphere and play an important, but poorly understood, role at the interface of terrestrial and atmospheric C pools (Richter and Houghton 2011). Increased sequestration of C in forest soils is considered one strategy to contribute to the amelioration of ongoing increases in anthropogenic C emissions, though there is some uncertainty regarding whether forest soils in eastern North America will be sources or sinks of C in a changing climate (e.g., Dib et al. 2014). The importance of assembling data on soil carbon from varied sources is reflected in the recent formation of the International Soil Carbon Network (ISCN 2014) and the United Kingdom Soil Observatory (Lawley et al. 2014), both established to facilitate data sharing. Usually concomitant with C analysis, total soil N analysis is performed to study the effects of N deposition and fertilization on C dynamics and ecosystem function. Recent work has also focused on how N controls C accumulation (e.g., Lovett et al. 2013). The basic analytical techniques for determining total C and N are quite old and have been automated over the past few decades with the development of elemental analyzers. Therefore, close agreement in results among laboratories should theoretically be achievable. It is critical to be able to provide assurance regarding the accuracy and variability of soil C and N analyses to instill confidence in measurements of soil C change over space and time.
For studies examining the long-term consequences of soil disturbance due to forest management, assurance of accurate laboratory analyses is also essential. This is especially true if the original soil samples are not available for repeated analysis under the same procedures and conditions as the new samples. The variability associated with any measurement is needed to determine the magnitude of change over time that would be statistically significant. Resampling soils over decadal time scales to assess change in soil chemical properties is a primary objective of international studies such as the Long-Term Soil Productivity (LTSP) experiment (Powers et al. 2005), and the U.S. Forest Service Forest Inventory and Analysis Program (O'Neill et al. 2005). This is also true for other coordinated long-term soil monitoring studies, e.g., the Vermont Monitoring Coop's 200-yr study (Sleeper et al. 2009) and the Calhoun Long-Term Soil Experiment (Richter et al. 1999), for which repeating the analysis of all prior samples would be impractical, and issues of long-term storage effects on some measurements could complicate results (Lawrence et al. 2012).
Past research related to the effects of acidic deposition on forest ecosystems has highlighted the need for accurate determination of exchangeable cations and standardized methods for measuring pH (e.g., Blume et al. 1990, Johnson et al. 1994, Likens et al. 1996). More recent work on Ca depletion in acidic soils of eastern North America (Watmough and Dillon 2004, Bailey et al. 2005, Warby et al. 2009, Hazlett et al. 2011, Lawrence et al. 2012) has underscored the need for methods that can reproducibly measure relatively low exchangeable Ca concentrations.
Calcium and other base-forming cations are required nutrients for plants. Although soil levels of these nutrients needed for proper growth of agricultural crops are commonly considered, recommendations for forest species are less well established. However, there is increasing interest in the use of soil testing in forest management (Horsley et al. 2002). For example, Bailey et al. (2004) proposed threshold values of Ca and Mg for identifying stands at risk of sugar maple decline disease. Ouimet et al. (2013) proposed soil guidelines for Ca, K, and P based on foliar deficiency symptoms. The soil guidelines for Ca proposed for sugar maple by these two studies differ by an order of magnitude. It is unclear what role differences in analytical techniques play in this difference. Availability of well-characterized forest soil reference samples would promote consistency between such initiatives and perhaps promote soil testing as a more commonly used management tool.
Another element of interest in acidic soils is Al, both because of its ability to effectively compete with base cations for cation exchange sites and its toxicity to forest vegetation and aquatic biota. For example, soil Ca:Al ratios have been used to define thresholds of stress and toxicity to tree species (Cronan and Grigal 1995). A wide variety of methods for measuring 'available' soil Al are used and, more so than for alkali and alkaline earth cations, the results are method dependent (Huntington et al. 1990). In acidic high-C soils, much of the extractable Al is organically complexed and not necessarily freely exchangeable (Ross et al. 2008). Because of this, extraction efficiency may vary with technique, time and solution:soil ratio, even when unbuffered salts are used for extraction (Skyllberg 1995). Exchangeable Al is operationally defined by the procedure and any comparison of results needs to carefully consider the methodology used.
Comparing results across laboratories for a common set of samples would therefore help us better understand how results from various methods relate to each other and provide a basis for quantifying analytical uncertainty. This information is essential for the purpose of linking data from individual studies to address questions of greater spatial and temporal scope, as the need for this type of data integration grows (Richter and Billings 2008). Under the auspices of the Northeastern Soil Monitoring Cooperative (NESMC, www.uvm.edu/~nesmc), a forest reference soil sample exchange was initiated in 2007. We collected samples of a forest floor organic horizon and a spodic horizon from a moderately well-drained northern hardwood stand in Vermont and distributed these to interested research and research-service laboratories in eastern Canada and the US. The two reference samples were representative of many of the properties unique to forest soils of the region. The organic (Oa) horizon was highly acidic and high in C. The spodic (Bs) horizon was also acidic and contained relatively low exchangeable Ca. The participating laboratories analyzed these samples using their routine methods and the results have been compiled to assess variability and evaluate the impacts of methodological differences. The objective of this paper is to present results of the sample exchange to evaluate compatibility of results achieved by laboratories that have collected data relevant to forest soil chemistry and temporal change in eastern North America. There can be a number of other sources of variability in studies of soil change, including such things as experimental design and sampling methods (Lawrence et al. 2013), but the sole focus of this paper is variability in laboratory chemical analysis.

METHODS
The two reference soil samples were collected at an elevation of 695 m from Underhill State Park on the western slopes of Mt. Mansfield in north-central Vermont (latitude 44°32′7″ N, longitude 72°50′8″ W). Sampling was done downslope from the Soil Climate Analysis Network site in Underhill, VT (http://www.wcc.nrcs.usda.gov/scan/). The soil was mapped as a Peru sandy loam; coarse-loamy, mixed, frigid Aquic Haplorthod (Soil Survey Staff 2010); Humo-Ferric Podzol (Soil Classification Working Group 1998). Two relatively thin horizons (varying from ~2 to 8 cm) were sampled, the Oa (H) and Bs (Bf), both identified in the field by color and relative position in the soil profile. Approximately 120 L of each horizon were sampled from an extensive area, air-dried and sent to Michael Amacher at the United States Department of Agriculture Forest Service Rocky Mountain Research Station Forestry Sciences Laboratory in Logan, UT. There, the samples were processed similarly to soils used in the NAPT Program. The two samples were further air-dried at ambient temperature in large stainless steel trays on glasshouse benches with periodic gentle remixing. Each was then passed through a large polyethylene 2-mm sieve, homogenized, and stored in polyethylene 20-L buckets. Subsamples from these buckets were taken after mixing by vigorous stirring and distributed to each participating laboratory.
Each laboratory used their normal routine methods for measuring exchangeable cations, pH and total C and N. These methods are briefly outlined below and presented in more detail in the Results and Discussion section. Laboratories were requested to periodically analyze the reference soils with their regular sample load and, if possible, provide at least 6 replicate determinations.
Statistical analysis was done using the univariate procedure in SAS 9.2. Normality was assessed by the Shapiro-Wilk 'W' statistic and by visual inspection of normal probability plots, stem-leaf and box plots of the data. Data of this type often contain outliers and are skewed (Feinberg et al. 1995). Because of this distribution, the median is usually used to represent the consensus value and variability is described by the median absolute deviation (MAD, van Montfort 1996), which is less sensitive to outliers than the standard deviation. The MAD is simply the median of the absolute deviation of each value from the sample median. The sample median ± 2.5 × MAD was used to describe the 'acceptable' range for a particular chemical test.
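The MAD and the 'acceptable' range defined above can be sketched in a few lines of Python. This is only an illustration of the definitions in the text, not the SAS procedure used in the study; the function names and the laboratory values are hypothetical.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation: median of |x - median(x)|."""
    center = median(values)
    return median([abs(x - center) for x in values])


def acceptable_range(values, k=2.5):
    """Return (low, high) = median -/+ k * MAD, as defined in the text."""
    center = median(values)
    spread = k * mad(values)
    return center - spread, center + spread


def flag_outside(results, k=2.5):
    """results maps a laboratory id to its reported value.

    Returns the sorted ids of laboratories outside the acceptable range.
    """
    low, high = acceptable_range(list(results.values()), k)
    return sorted(lab for lab, v in results.items() if not (low <= v <= high))


# Hypothetical laboratory medians for total C (g/kg); NOT data from this study.
example = {1: 280.0, 2: 286.0, 3: 290.0, 4: 284.0, 5: 310.0, 6: 288.0, 7: 285.0}
print(mad(list(example.values())))               # 2.0
print(acceptable_range(list(example.values())))  # (281.0, 291.0)
print(flag_outside(example))                     # [1, 5]
```

In this made-up example the single high value (310) barely moves the median or the MAD, which is the robustness property that motivates using the MAD rather than the standard deviation.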

Total carbon and nitrogen by elemental analyzer
Although a number of wet chemical methods are used for the measurement of soil organic C and N, most laboratories currently use an elemental analyzer with dry combustion to determine total soil C and N. If there is a measurable quantity of inorganic C in the soil (generally not found in acidic forest soils), then this must be removed before analysis or quantified and subtracted if organic C is the desired result. All elemental analyzers combust the sample at high temperature in the presence of catalysts and added O2. Nitrogen is usually measured at the same time and the combusted sample is reduced to N2, CO2, and H2O before separation by either traps or a combination of traps and a chromatography column, usually followed by detection with a thermal conductivity detector. A number of different manufacturers make these instruments, which vary in the specifics of combustion and reduction temperatures, the mixture of catalysts and the separation method for the N2 and CO2 gases (for more details on instruments and parameters used in this study, see Table 1 and Appendix: Table A1). None of the participating laboratories pretreated the samples to remove inorganic C, which was assumed to be negligible. The degree of grinding of the soil sample prior to analysis may also be important, especially in those elemental analyzers that utilize a small sample size for analysis. The participating laboratories used different methods for this step in the reference sample preparation, including a ball mill to pulverize the sample to a finer particle size, hand grinding or simply analyzing as received (Table 1). Methods cited for C and N analysis were usually the published methods of the Soil Science Society of America (Bremner 1996, Nelson and Sommers 1996) or the Canadian Society of Soil Science (Rutherford et al. 2008, Skjemstad and Baldock 2008). All data can be found in the appendices, Tables A2 and A3.
Because not all data were normally distributed, Spearman's rank procedure was employed (using SAS 9.2) to examine correlations between instrumental parameters and both within-laboratory variability and error relative to the median.
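For readers unfamiliar with the rank procedure, the following is a minimal pure-Python sketch of Spearman's rank correlation (the Pearson correlation of the ranks, with tied values given their average rank). It is an illustration only, not the SAS routine used in the study; it does not compute P values, and the function names and example data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired observations (e.g., an instrument parameter vs. a
# within-laboratory standard deviation); NOT data from this study.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rho(x, y))  # approximately 0.8
```

Because only the ranks enter the calculation, a single extreme laboratory cannot dominate the coefficient the way it could with a Pearson correlation on the raw values.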

Soil cations
Extractants designed to measure concentrations and pools of exchangeable cations in acid soils usually utilize unbuffered salts such as NH4Cl or BaCl2 and often employ leaching of the sample to promote full exchange. During early research on the effects of acidic deposition, a number of methods manuals were published by government organizations coordinating acidification research (Robarge and Fernandez 1986, Blume et al. 1990, Kalra and Maynard 1991). In these manuals, NH4Cl was usually the recommended extractant, somewhat influenced by the emphasis on forest soils in these research and monitoring programs, and mechanical vacuum extraction (MVE) was one of the recommended procedures in an attempt to standardize methods. The MVE procedure involves drawing the salt extractant solution slowly through the sample over a period up to 14 hours. The method utilizes a specialized piece of equipment (mechanical vacuum extractor) and because of this and the long extraction time, it is not used for routine agronomic fertility testing; overall utilization of this extraction equipment has declined with time. 'Batch' extraction methods with NH4Cl are a common alternative to MVE. The participating laboratories that used this procedure had a wide range in solution:soil ratio and shaking time (Table 2).
Batch extractions used to assess nutrient cation status may entail a relatively low solution:soil ratio (e.g., 5:1) and a relatively short shaking time (as low as 5 min). While these procedures are common in agronomic fertility testing laboratories, they may not be completely efficient in removing most of the exchangeable Ca, Mg, K and Na because the low solution:soil ratio can result in an exchange equilibrium that leaves a small quantity of the exchangeable cations retained by the soil. One common extractant, NH4-acetate (NH4OAc), contains an organic acid and is often used at a buffered pH of 4.8 (Modified Morgans extractant) or pH 7.0 (originally proposed in Science by Schollenberger (1927)). More recently developed soil test extractants such as Mehlich-3 (Mehlich 1984) were designed to be 'universal' and contain F to extract P and EDTA to extract micronutrients, along with NH4OAc and a small amount of HNO3. Taken together, the group of participating laboratories used all of the extractants described above (Table 2). All data can be found in the Appendix: Tables A4-A6.

pH
Measuring soil pH with a potentiometric electrode is routine in all laboratories and two common methods are used: pH in deionized water (pHw) and pH in a salt solution, usually 0.01 M CaCl2 (pHs). All participating laboratories used one or both of these methods on the reference soils, but with a range of solution:soil ratios and contact time between the soil and the solution prior to measurement. All data can be found in Appendix: Tables A7 and A8.

Other chemical tests
Fewer than half of the participating laboratories performed either loss-on-ignition (LOI), exchangeable acidity or a digestion for total elemental analysis. The results and method citations for these three tests are included in Appendix: Tables A9, A10, and A11, respectively.

Moisture determination
A wide variety of methods were used to adjust the reported values to a dry-weight basis. These sometimes varied within a laboratory between reporting exchangeable cations and C and N. All laboratories preferred drying at 105°C for mineral soil horizons but the temperature for drying Oa horizons ranged between 60° and 105°C. Most laboratories performed a moisture determination on a separate subsample and adjusted their reported values to provide an oven-dry expression of the chemical results. However, some laboratories dried their samples before analysis, either by oven drying or by storing the sample in a desiccator. A few laboratories simply reported air-dried values for the cation extractions.
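The adjustment to an oven-dry basis from a separate moisture subsample can be sketched as below. This is a generic gravimetric correction offered as an assumption about what such an adjustment typically looks like, not the specific procedure of any participating laboratory; the function names and numbers are hypothetical.

```python
def moisture_factor(air_dry_mass_g, oven_dry_mass_g):
    """Ratio of air-dry to oven-dry mass for the moisture subsample.

    Equal to 1 + w, where w is gravimetric water content on an
    oven-dry basis.
    """
    if oven_dry_mass_g <= 0:
        raise ValueError("oven-dry mass must be positive")
    if air_dry_mass_g < oven_dry_mass_g:
        raise ValueError("air-dry mass should not be less than oven-dry mass")
    return air_dry_mass_g / oven_dry_mass_g


def to_oven_dry_basis(value_air_dry_basis, air_dry_mass_g, oven_dry_mass_g):
    """Convert a concentration expressed per kg of air-dry soil to a
    concentration per kg of oven-dry soil."""
    return value_air_dry_basis * moisture_factor(air_dry_mass_g, oven_dry_mass_g)


# Hypothetical example: a 10.0 g air-dry subsample weighs 9.2 g after oven
# drying, and total C was measured as 260 g/kg on an air-dry basis.
print(round(to_oven_dry_basis(260.0, 10.0, 9.2), 1))  # 282.6
```

With these hypothetical masses, reporting on an air-dry instead of an oven-dry basis would shift the result by nearly 9%, which is comparable in size to the acceptable ranges discussed in this paper and illustrates why a consistent basis matters.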

Participating laboratories
The laboratories and research groups that participated in this sample exchange were the Agricultural and Environmental Testing Laboratory at the University of Vermont; the Soil Biogeochemistry

Carbon and nitrogen
Fifteen laboratories determined C and N in the two reference samples (Fig. 1, Tables 3 and 4). For both horizons and both elements, the data approximated a normal distribution and the median was close to the mean. Overall, the variability among laboratories was relatively low, resulting in small MAD-values and standard deviations. For C in the Oa horizon, all laboratories were within 2.5 × MAD (2.5 × 9.5 g/kg) of the median (286.5 g/kg), or a range of ±8.2% (Fig. 1). For Bs horizon C, two laboratories reported median values below and two laboratories above the range of the median ± 2.5 × MAD (Fig. 1), which in this instance was ±2.8 g/kg (or ±7.9%) of the median (35.4 g/kg). Similar results were found for N, with one laboratory lower than the calculated range (±1.5 g/kg or 9.1% of the median) for the Oa horizon and three laboratories out of range (±0.15 g/kg or 8.8% of the median) for the Bs horizon (Fig. 1). It should be noted that there is no absolute or certified value for any of the analytical results. As with other sample exchange programs, the actual value for a particular result is assumed to be the median of all data submitted (Feinberg et al. 1995). Since the MAD and the standard deviation are a function of the variability in the submitted results, a relatively narrow range will result if the data are in close agreement. Commonly, laboratories run quality control (QC) samples (of certified concentration) with their unknowns and accept their data if the QC results are within ±5-10% of the certified value. Use of the MAD statistic generally created a similar acceptable range.
We also examined within-laboratory variability by calculating the standard deviation of any result with an n of 3 or more. Although the standard deviation of such a small sample size carries considerable uncertainty, it is still useful for comparison. For the Oa horizon, the strongest correlation between the within-laboratory variability and other parameters was actually with sample number (Spearman's r = 0.63, P = 0.012). This could be because laboratories that repeatedly ran the sample did so over an extended time period with different calibrations, introducing an additional source of variability. Four of the five laboratories with the highest standard deviations also had the four highest ratios of particle size to sample weight (Table 1 and Appendix: Table A3). A high ratio indicates the combination of a relatively coarse particle size with a relatively small amount of sample analyzed. Different instruments are configured to utilize different amounts of sample, with the concept that a larger sample size should reduce variability and the need for a finer grind. The other laboratory with a high within-laboratory deviation (#11) had a low size:weight ratio but ran the sample as received (<2000 μm). The laboratory (#5) that reported a C concentration in the Oa furthest from the median also used an unground 2-mm sample (and had the fourth highest standard deviation along with the third highest ratio of particle size to weight). It may be that the 2-mm sieved Oa horizon was simply not homogeneous enough to provide good repeatability, even in some laboratories that used a large sample weight. Similarly for the Bs horizon, three out of the four laboratories with the highest variance had the highest ratios of particle size to sample weight (Table 1 and Appendix: Table A2). However, of the three laboratories that had the highest MAD for C in the Bs horizon, only one had a high size:weight ratio. The source of this error is not apparent.
The strongest relationship between error in the Bs horizon and other parameters was with the error in the Oa horizon (Spearman's r = 0.65, P = 0.009), i.e., laboratories tended to have a similar magnitude of error for both types of soil horizons.
The C:N ratio of forest soils has been used to predict the potential for net N mineralization and nitrification (e.g., Aber et al. 2003, Ross et al. 2009) and is often presented as an indicator of litter quality inputs for forest soils (Gholz et al. 2000). The C:N ratios for the two reference soils (Fig. 1) were within the range reported for similar horizons in the northeastern US (Ross et al. 2011). Variability among laboratories for the C:N ratio in the Oa horizon was similar to that found with both C and N (Table 3). However, the Bs horizon showed considerably more variability in the C:N ratio than either C or N, with a range of 17.2-22.5 (Table 4). Thus, the concentrations of C and N did not vary in parallel. For example, laboratory 3 reported the highest value for N in the Bs horizon but a C value below the median, resulting in the lowest C:N ratio (Fig. 1). This variability resulted in the data from two laboratories being well below the lower threshold of the median − 2.5 × MAD (Fig. 1). These results suggest that accuracy in the C and N analyses is more difficult to obtain in the Bs horizon, either because of the lower concentrations or interferences between the more mineral-rich matrix and analytical conditions such as combustion.
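The point that C and N errors which do not vary in parallel are amplified in the ratio can be illustrated with a short sketch. The helper names are hypothetical, a mass-based ratio is assumed, and the N concentration and relative errors below are made-up values chosen for illustration, not results from this study.

```python
def cn_ratio(c_g_per_kg, n_g_per_kg):
    """Mass-based C:N ratio from total C and total N concentrations."""
    if n_g_per_kg <= 0:
        raise ValueError("N concentration must be positive")
    return c_g_per_kg / n_g_per_kg


def ratio_relative_error(c_rel_err, n_rel_err):
    """Relative error in C:N given relative errors in C and in N.

    Errors are fractions, e.g. 0.05 for +5%.
    """
    return (1 + c_rel_err) / (1 + n_rel_err) - 1


# C of 35.4 g/kg is the Bs median given in the text; N of 1.7 g/kg is a
# hypothetical value used only for illustration.
print(round(cn_ratio(35.4, 1.7), 1))                # 20.8

# Errors in the same direction cancel in the ratio ...
print(abs(ratio_relative_error(0.05, 0.05)) < 1e-12)  # True
# ... but errors in opposite directions compound.
print(round(ratio_relative_error(-0.03, 0.08), 3))  # -0.102
```

In the second case a laboratory that is within a few percent of the consensus for C and within 8% for N ends up roughly 10% low in C:N, which is the pattern described above for the laboratory with the lowest ratio.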

Cations (Ca, Mg, K and Na)
Thirteen laboratories determined cations in the Bs horizon and employed five different extractants (Table 2). Eight of the laboratories used 1 M NH4Cl, one used 0.1 M NH4Cl, three laboratories used NH4OAc buffered either at pH 4.8 or pH 7.0, and one laboratory used Mehlich-3. The procedures also varied, with eight laboratories using a batch extraction and five using MVE. The extractants used for the Oa horizon were identical to those used for the Bs; however, two fewer laboratories participated (one each using the 1 M NH4Cl MVE and batch methods). Seven of these 11 laboratories used a higher solution:soil ratio for the Oa horizon to promote greater efficiency of extraction. Because of this variety, we grouped the results by extractant and method (Fig. 2). The largest differences among laboratories were found with Ca and Na. In the Oa horizon, the Ca results from the pH 7 NH4OAc were outliers (lower than the median − 4 × MAD). These were removed and the subsequent statistical analysis showed that the Mehlich-3 was now an outlier. With these three values removed, the Ca data were normally distributed and the median was close to the mean (Table 3). In the Bs horizon, the Ca data were normally distributed but had a broad range that resulted in a relatively high MAD in which the acceptable range would be ±55% of the median (Table 4). Because of this, we used the mean and the 95% confidence interval of the mean for data analysis (Fig. 2). Overall, the variability within these cation data was much greater than found with C and N, with 2.5 × MAD ranging from 10% of the median for Ca in the Oa horizon (after outliers were removed) to over 50% of the median for Na in both horizons. With the exception of one outlier for K in the Bs horizon, both Mg and K results were relatively consistent across laboratories and extractants. Greater variability, both within and among laboratories, was found with Na.
This was expected because of the low concentrations present in these soils and the tendency for Na (and K) to partially ionize during ICP analysis, affecting quantification by optical spectroscopy. In addition, Na is probably the most prone to field-sampling and laboratory contamination of all of the elements analyzed in this study, partly due to the ready source of contamination from human contact.
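The stepwise outlier screening described above for Ca in the Oa horizon (remove values beyond 4 × MAD from the median, then re-evaluate) can be sketched as follows. This is an assumed reading of the procedure, written as a two-sided screen for generality; the function names and data are hypothetical and are not the study's results.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    center = median(values)
    return median([abs(x - center) for x in values])


def screen_outliers(values, k=4.0, max_rounds=10):
    """Repeatedly drop values farther than k * MAD from the median.

    The median and MAD are recomputed after each round, so a value that
    was masked by more extreme outliers can be flagged later.
    Returns (kept, removed), both in their original order of appearance.
    Note: if MAD is 0, every value not equal to the median is flagged.
    """
    kept = list(values)
    removed = []
    for _ in range(max_rounds):
        center = median(kept)
        spread = k * mad(kept)
        flagged = [x for x in kept if abs(x - center) > spread]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [x for x in kept if abs(x - center) <= spread]
    return kept, removed


# Hypothetical exchangeable Ca results (cmolc/kg); NOT data from this study.
data = [10.0, 10.5, 11.0, 11.5, 12.0, 9.5, 10.8, 7.5, 2.0, 2.2]
kept, removed = screen_outliers(data)
print(removed)  # [2.0, 2.2, 7.5]
print(kept)     # [10.0, 10.5, 11.0, 11.5, 12.0, 9.5, 10.8]
```

In this made-up data set the value 7.5 passes the first round and is flagged only after the two lowest values are removed, mirroring the sequence reported for the pH 7 NH4OAc and Mehlich-3 results.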
Because of the limited number of laboratories performing each of the procedures, it is difficult to draw firm conclusions as to the effects of extractant and extraction method. Nonetheless, we used our results to evaluate possible effects of extractant and extraction method. For Ca in the Oa horizon, the mean and median of the four laboratories using 1 M NH4Cl in batch extractions were nearly identical to the laboratories using MVE with different extractants. Past work has found small effects of extractant and method on extractable Ca in similar soils (e.g., Kraske et al. 1989). Lawrence et al. (1997) compared batch extractions of four Oa horizons with 1 M NH4Cl, 1 M KCl, 0.1 M BaCl2 and pH 7.0 1 M NH4OAc. The three unbuffered salts gave comparable results but, similar to our data, the pH 7.0 extractant averaged 20% lower (although the difference in Lawrence et al. (1997) was not statistically significant). On the other hand, Bailey et al. (2005) used 1 M NH4Cl to reanalyze forest soil samples that had been extracted with pH 7.0 1 M NH4OAc 30 years previously (reported in Ciolkosz et al. 1970) and found ~30% lower Ca with NH4Cl, although the two were well correlated (r = 0.97). The displacing cation in all of the extractants used in our study was NH4 and there is no apparent reason that a pH 7.0-buffered solution should remove less Ca. Huntington et al. (1990) compared MVE and batch extractions with 1 M NH4Cl in both mineral and organic horizons of forest soils from the northeastern United States. They found 7% higher Ca with MVE but this was only evident at high Ca concentrations (>10 cmolc/kg).
In the Bs horizon, there was more variability within each extraction method (batch vs. MVE) and, although the batch results trended lower (0.128 vs. 0.156 cmolc/kg for batch and MVE, respectively), the difference was not significant. Ross and Bailey (D. S. Ross and S. W. Bailey, unpublished data) found nearly identical exchangeable Ca in 103 Vermont forest B horizon (Bhs, Bs and Bw) samples when comparing 1 M NH4Cl MVE and batch extraction. These samples all had Ca < 2.5 cmolc/kg, which is typical for the region, and batch-Ca was 98% that of MVE-Ca (R² = 0.98). There was somewhat more scatter in Mg (R² = 0.92, slope of batch vs. MVE = 0.88) and considerable scatter with K (R² = 0.76), although the batch vs. MVE slope of 0.98 suggested little or no bias. The Ca concentration in the Bs reference soil was quite low, about twice as high as the reporting limit of some laboratories (Table 2). Accurate and reproducible measurements at these low concentrations are challenging because of greater inherent variability near the quantification limits of the instrumentation. Low-concentration cation measurements are also limited by the purity of the extractants, as it is difficult to obtain NH4Cl salts that do not have some measurable impurity of Ca, which then requires blank subtraction, an additional source of variability in the data. Higher solution:soil ratios ensure complete cation exchange but also raise the practical reporting limit, both because of the greater dilution and a higher impact of Ca impurities. These issues with low Ca concentration are somewhat unique to acid forest soils and not found in agronomic soils that are commonly much higher in pH and exchangeable cation concentrations.
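The effect of solution:soil ratio and blank subtraction on the reported value and on the practical reporting limit can be made concrete with a unit-conversion sketch. The conversion from extract concentration (mg/L) to cmolc/kg is standard stoichiometry; the function name, the instrument limit and the extraction ratios below are hypothetical and do not describe any participating laboratory.

```python
# Molar mass (g/mol) and ionic charge for the cations discussed in the text.
CATIONS = {
    "Ca": (40.078, 2),
    "Mg": (24.305, 2),
    "K": (39.098, 1),
    "Na": (22.990, 1),
    "Al": (26.982, 3),
}


def extract_to_cmolc_per_kg(conc_mg_per_l, blank_mg_per_l, solution_ml, soil_g, element):
    """Convert a blank-corrected extract concentration to cmolc/kg soil."""
    molar_mass, charge = CATIONS[element]
    net_mg_per_l = conc_mg_per_l - blank_mg_per_l
    # mg/L * mL / g is numerically equal to mg per kg of soil.
    mg_per_kg = net_mg_per_l * solution_ml / soil_g
    # mg/kg -> mmol/kg -> mmol of charge/kg -> cmol of charge/kg
    return mg_per_kg / molar_mass * charge / 10.0


# Hypothetical instrument reporting limit of 0.1 mg/L Ca in the extract,
# expressed on a soil basis for two solution:soil ratios.
limit_10_to_1 = extract_to_cmolc_per_kg(0.1, 0.0, 10.0, 1.0, "Ca")
limit_50_to_1 = extract_to_cmolc_per_kg(0.1, 0.0, 50.0, 1.0, "Ca")
print(round(limit_10_to_1, 4))  # 0.005
print(round(limit_50_to_1, 4))  # 0.025
```

The soil-basis limit scales directly with the solution:soil ratio, so under these assumed numbers a five-fold increase in the ratio moves the reporting limit five-fold closer to the concentrations found in the Bs reference soil; a Ca impurity in the extractant is magnified in the same way before blank subtraction.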

Aluminum
Eleven laboratories provided 13 different results for extractable Al (Fig. 3). Unbuffered salt extractions of the Oa horizon produced similar results regardless of the extraction method (median 11.06 vs. 11.48 cmolc/kg for MVE and batch, respectively), with the exception of the pH 4.8 NH4OAc and Mehlich-3 extractions (Fig. 3). However, MVE with the Bs horizon gave a median of 7.15 cmolc/kg whereas the batch median was 4.62 cmolc/kg. Aluminum results for neutral NH4OAc were not reported but the pH 4.8 NH4OAc, using MVE, removed about twice the amount for both horizons when compared to batch extraction. The one Mehlich-3 extraction interestingly gave a very low result for the Oa and a very high result for the Bs (Fig. 3). These results are not surprising in that it is well established that exchangeable Al is operationally defined by the procedure used (Huntington et al. 1990, Ross et al. 2008). The type and strength of the extractant, the solution:soil ratio and the time of contact all appear to affect results. Huntington et al. (1990) found 26% higher exchangeable Al with MVE vs. batch extraction (1 M NH4Cl), averaged across both mineral and organic forest soil horizons. Varying the time of extraction during MVE from 0.5 to 20 hours had a strong effect on the amount of Al removed, with most of the increase occurring over the first 4 hours. Ross and Bailey (D. S. Ross and S. W. Bailey, unpublished data) found an average of 76% higher Al in MVE, compared to batch extractions, of the 103 B horizons discussed above. In that study, results from the two methods (both with 1 M NH4Cl) were well related (R² = 0.85) but the MVE data had a much broader range (0.5-15.6 cmolc/kg vs. 0.3-8.7 cmolc/kg for batch). The reason for the lack of a method (MVE vs. batch) effect on the Oa reference horizon is not clear.
Skyllberg (1995) showed a small increase in Al extracted as the 1 M KCl solution:soil ratio increased from 15 to 150 in Bs horizons but found no effect as it changed from 91 to 375 in O horizons. He suggested that exchangeable Al was a definable subpool of the total organically bound Al in O horizons but that B horizon chemistry was more complicated. The longer contact time and greater solution:soil ratio with MVE vs. batch probably resulted in the higher extractable Al found with MVE in the Bs horizon (Fig. 3), possibly due to relatively slow dissolution of Al-bearing spodic materials during the extraction. It is well established that pH 4.8 NH4OAc removes a portion of the organically bound Al that is not extracted by an unbuffered salt, due to complexation of Al by the acetate molecule (Bartlett 1982). Both the acetate and low pH enhance Al extraction and this was reflected in the results from the one laboratory that used this extractant (Fig. 3). The use of 1 M KCl is a well-established method for the determination of exchangeable Al, and a titration can be performed to measure both exchangeable acidity and Al (Thomas 1982). Because K+ and NH4+ are similar in size and identical in charge, 1 M solutions of either should extract approximately the same amount of Al. This has been found in past studies and appears to be the case in the present study across the different laboratories.

Soil pH
For the Oa horizon (Table 3), variability was considerably higher in the pHw determination compared to the pHs (all but one laboratory used 0.01 M CaCl2, with the one using 1.0 M NH4Cl). Most participating laboratories reported pH for the Bs horizon (Fig. 4). The variability was relatively low and 2.5 × MAD was only 5-6% of the median (Table 3). Working with agronomic soils, Miller and Kissel (2010) found a similar result and attributed it to analytical error due to the effect of low ionic strength on the liquid junction potential of the pH-electrode system. This effect should be stronger in forest soils that are generally lower in ionic strength and more sensitive to the salt effect (Richter et al. 1988). Part of the variability in our data could be caused by differences among laboratories in the solution:soil ratio (Fig. 4). There was a wider range used with the Oa horizon, between 1:1 and 10:1 solution:soil, and the analytical effect of this difference is more evident with a water pH because it creates a wider range in solution ionic strength (Peech 1965). A lower solution:soil ratio will result in higher solution ionic strength (less dilution) and lower pH. This trend can be seen with the pHw in the Oa horizon (Fig. 4). Because of these factors, the use of 0.01 M CaCl2 was recommended for both agronomic and forest soils (Richter et al. 1988, Miller and Kissel 2010). The actual pH of forest soil solutions is likely to be somewhat higher than this salt pH. Previous work showed that the pH of soil solutions, extracted from all horizons of upland forest soils from Vermont, was higher than a soil pH measured in 0.0025 M CaCl2, but usually lower than the pHw. Similar results were also shown by David and Lawrence (1996) for northeastern U.S. soils when soil solutions were compared with soil pH measured in 0.01 M CaCl2.
The laboratory that used 1.0 M NH4Cl for the salt pH measurement reported results within 0.1 pH units of the median for both horizons. This pH was measured in the cation extraction solution and was also used to calculate exchangeable H+. The ionic strength of this solution (1.0 M NH4Cl) was considerably higher than that of 0.01 M CaCl2 but the solution:soil ratio used (10:1) was quite a bit higher than that used by most other laboratories (Fig. 4).

Reporting results on a dry-weight basis
As described in the methods section, there was little consistency among laboratories in the approach for reporting results on a dry-weight basis, especially for the Oa horizon. Mineral soils generally retain a low amount of water after air drying, as low as 0.2% in sandy soils, whereas forest floor horizons may retain over 8% (Kalra and Maynard 1991). If expressing results on an oven-dry basis, all participating laboratories would use the standard 105°C for mineral soils but the temperature used for organic soils varied among 60°, 65°, 70°, 80° and 105°C. O'Kelly (2004, 2005) showed that this range in temperature gave variable results for organic soils. At 60°C, hygroscopic water was retained, even with extended drying time. However, at temperatures above 80°C, organic soils lost weight through oxidation (O'Kelly 2005). Thus, there appears to be a balance needed and, using a limited number of samples, O'Kelly (2005) showed that drying organic soils for 24 hours at 80°C provided the most accurate results. For mineral soils, there is relatively low error associated with not adjusting results to an oven-dried basis because of the relatively low air-dried water content. For organic soils, the error could be considerable for the reasons discussed above.
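The size of the dry-weight adjustment can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, water content is assumed to be expressed as a fraction of oven-dry mass, and the 0.2% and 8% values are the air-dry water contents quoted above, applied to an arbitrary example concentration of 10.0 units per kg.

```python
# Sketch: re-express a result reported per kg of air-dry soil on an
# oven-dry basis. Water content w is assumed to be (mass of water) divided
# by (oven-dry mass); the example concentration is arbitrary.


def water_content(air_dry_g, oven_dry_g):
    """Gravimetric water content as a fraction of oven-dry mass."""
    return (air_dry_g - oven_dry_g) / oven_dry_g


def to_oven_dry_basis(value_air_dry_basis, w):
    """Convert a per-kg-air-dry concentration to a per-kg-oven-dry one."""
    return value_air_dry_basis * (1.0 + w)


if __name__ == "__main__":
    value = 10.0  # arbitrary concentration on an air-dry basis
    for label, w in (("sandy mineral soil", 0.002), ("forest floor", 0.08)):
        corrected = to_oven_dry_basis(value, w)
        print(f"{label}: w = {w:.1%}, "
              f"air-dry {value:.2f} -> oven-dry {corrected:.2f}")
```

With these inputs the adjustment is negligible for the mineral soil and roughly 8% for the forest floor sample, which is comparable in size to the among-laboratory variability discussed elsewhere in this study.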

CONCLUSIONS
The results of this inter-laboratory comparison suggest the following:
1. Expressing chemical results on a dry-weight basis, especially for organic horizons, should be standardized across laboratories. The standard drying temperature of 105°C used for mineral soils is not suitable for organic samples (Kalra and Maynard 1991, O'Kelly 2004, 2005). High-C soils can retain an appreciable and variable amount of water under ambient conditions. We recommend standardizing organic soil drying, and suggest 80°C for 24 hours based on the findings of O'Kelly (2005). More research should be done with a range of organic forest soil horizons to validate this approach.
2. The results for C and N were relatively consistent within each horizon, even with differences in dry-weight calculations. However, there was relatively high variability in the C/N ratio of the somewhat low-C Bs horizon, with a fairly broad range of 17.2-22.5. We recommend that all laboratories perform sample grinding, either by hand or with a ball mill, to reduce the variability associated with the small sample weights used for C and N analysis.
3. Good precision in the determination of base-forming cations in acidic B horizons may be challenging because of the low concentrations. The participating laboratories reported a broad range in their lower reporting limit for exchangeable Ca2+ (Table 2). While 1 M NH4Cl is widely used for forest soil cation extractions, the solution:soil ratio and analytical instrumentation conditions may need to be standardized to achieve consistency for specific research and management objectives.
4. As previously shown in the literature, the quantity of extractable Al is method-dependent and comparisons among laboratories should be made with caution. This also applies to other metrics of soil condition, such as Ca:Al ratios, calculated from these types of extractions.
5. As previously shown in the literature, the variability in soil pH measured in water is greater than that in soil pH measured in dilute salt solutions.
In most cases, the results from different laboratories were remarkably consistent despite different methods. When there were standardized methods, such as the analysis of total C and N, variability among laboratories was low and 2.5 × MAD ranged from ±8% to 9% of the median. For those laboratories that ran repeated samples over time, within-laboratory variability was often higher than the among-laboratory variability. In addition to the issue of sample particle size vs. weight, the most significant methodological difference among laboratories for C and N analysis was probably sample drying technique and temperature, especially for the organic horizon, for which a wide range of drying temperatures was used (60° to 105°C). A number of different methods and variations in methods were used for the extraction of cations. The use of 1 M NH4Cl was widespread and will likely continue to be used for this purpose in forest soil research. This extractant is not commonly used in fertility testing of agronomic soils, and proficiency programs such as the NAPT do not include it.
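The median ± 2.5 × MAD criterion used throughout this study can be computed as in the sketch below. It assumes the unscaled median absolute deviation (no normal-consistency constant), which is an assumption on our part, and the function names and laboratory values are hypothetical illustrations and not data from this study.

```python
# Sketch: acceptable range as median +/- k * MAD, with k = 2.5.
# MAD here is the unscaled median absolute deviation (an assumption);
# the input values in the demo are invented for illustration only.
from statistics import median


def mad(values):
    """Median of the absolute deviations from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def acceptable_range(values, k=2.5):
    """Return (low, high) bounds of median +/- k * MAD."""
    centre = median(values)
    spread = k * mad(values)
    return centre - spread, centre + spread


def spread_percent_of_median(values, k=2.5):
    """Express k * MAD as a percentage of the median."""
    return 100.0 * k * mad(values) / median(values)


def outside_range(values, k=2.5):
    """List the values falling outside the acceptable range."""
    low, high = acceptable_range(values, k)
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    lab_results = [41.0, 42.5, 43.0, 43.5, 44.0, 44.5, 52.0]  # hypothetical
    low, high = acceptable_range(lab_results)
    print(f"acceptable range: {low:.2f} to {high:.2f}")
    print(f"2.5 x MAD as % of median: "
          f"{spread_percent_of_median(lab_results):.1f}%")
    print("flagged results:", outside_range(lab_results))
```

Because both the centre and the spread are medians, a single discordant laboratory has little influence on the range against which it is judged, which is the main reason to prefer this criterion over one based on the mean and standard deviation.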
We encourage continued inter-laboratory comparisons of forest soil methods, using forest soil reference samples that are representative of the soil types being analyzed by participating laboratories. The two reference samples used in this study were typical of upland acidic forest soils of eastern North America. The high C concentration of the Oa horizon and the low Ca concentration of the Bs horizon both present some analytical challenges for laboratories not used to these types of samples. Reference soils, such as these, are needed for forest soil research and monitoring to help quantify the uncertainty of analytical results and help lead to greater reproducibility among laboratories. Researchers can review and revise laboratory methodologies when they deem their results to be beyond an acceptable range established for the reference samples. Continued use of relevant reference standards can provide assurance of a laboratory's reliability or, in the worst case, document a pattern of unreliability. The results of this study indicate that despite the limited standardization in forest soil analyses to date, analytical variability among methods and laboratories does not preclude combining data from different sources to expand the scope of questions that can be addressed. The use of relevant reference samples is strongly recommended for long-term monitoring and resampling studies to help validate comparison of results obtained from different methods and laboratories in different time periods. Coupled with experimental design, these types of results can be used to determine the threshold of change that is detectable given the variability inherent in laboratory analysis. Beyond laboratory variability, this threshold will depend on sample numbers and being able to quantify the uncertainty associated with sampling.

ACKNOWLEDGMENTS
The following individuals also contributed data and/or performed analyses for this study: Andy

Notes: LOI = loss-on-ignition; the methods used were from Robarge and Fernandez (1986), Blume et al. (1990), Kalra and Maynard (1991) and Nelson and Sommers (1996). An ellipsis (...) indicates that either the LOI was not determined or the standard deviation was not calculated.

Thomas (1982) and Robarge and Fernandez (1986). An ellipsis (...) indicates that the standard deviation was not calculated.