Estimating ecological production from biomass

Productivity-diversity relationships (P/D) are a vital theme in ecology, but productivity is typically not measured directly in that research. Instead, biomass (B) is the most common proxy for productivity, often as a 1:1 substitute. Unfortunately, this practice may cause error and uncertainty in P/D research, due to the fundamental difference between B and P and variable P/B ratios among and within systems. As a result, P/D research often measures a B/D relationship but interprets it as P/D. Fortunately, plausible, statistically legitimate and predictive P/B relationships can be found with careful analyses based on model selection of alternative allometric scaling equations and tests of model assumptions. Analyses are presented here for P/B relationships of 19 data sets, ranging from plant and animal populations and assemblages to ecosystems and biomes, representing over 2,300 analyzed P/B data. Models included standardized major axis (SMA) and ordinary least squares (OLS) regressions. Simple linea...


INTRODUCTION
Biological diversity is valued for many reasons, including the expectation that more diversity increases and/or stabilizes productivity of an ecosystem. That expectation has been actively studied and debated (Huston 1997, Waide et al. 1999, Fridley 2001, Mittelbach et al. 2001, Whittaker and Heegaard 2003) [...] contribute to the observed variance in P/D (Newbould 1967, Vollenweider et al. 1974, Coupland 1979, Downing and Rigler 1984, Magurran 2004). Even different terms have been applied to this subject, including species richness-productivity relationship (SRPR; Whittaker and Heegaard 2003) and biodiversity and ecosystem functioning (BEF; Cardinale et al. 2011). Here the simple term P/D is used because it denotes the basic goal to derive a predictable bivariate relationship and is consistent with a long history of highly relevant research on productivity/biomass relationships already known as P/B.
How might clarity emerge in this important subject? We might first address a common methodological problem affecting P/D research (Fig. 1). Among the 256 P/D relationship studies described by Mittelbach et al. (2001), only 1 in 7 (14%) actually measured P. Biomass (B) was the single most common proxy for P (34.4% of studies; Mittelbach et al. 2001), including notable studies of P/D relationships then and since (e.g., Hector et al. 1999, Loreau et al. 2001, Cardinale et al. 2007, Adler et al. 2011, Hooper et al. 2012). The bad news is that using B as a simple proxy for P may cause error and uncertainty in P/D research, due to the fundamental difference between B and P (Fig. 1). The good news is that the bad news can be addressed with careful analysis.
Below is a brief summary of bad news regarding P/B relationships in two regards: fundamentals and history. That bad news is followed by good news, in the form of a summary of a way forward. The remainder of this work is a statistical exploration of that good news. To be clear, this paper reports statistical analyses of P/B relationships. It uses model selection (Burnham and Anderson 2002) to identify regression models that most plausibly represent P/B relationships among diverse data sets, ranging from microbes to salamanders and species to biomes. The work here is based on empirical data collected by many others and analytical approaches that have been widely applied in other contexts (e.g., Jensen et al. 2005, Kerkhoff and Enquist 2006, Warton et al. 2006). This is not an effort to develop a theoretical explanation of P/B, nor a study of mechanistic processes causing production to vary, nor a comparison of different methods to estimate either production or biomass in particular study systems.

Bad news: fundamentals
Biomass is fundamentally different from production (e.g., Odum 1971, O'Neill et al. 1986, Jenkins and Buikema 1998). Biomass (B) measures standing stock, which is a static measure of the quantity of living tissue in a place and time (e.g., g·m⁻²), much like counting a warehouse's current inventory. It is a snapshot measure of ecological structure, not a rate, and not necessarily related to rate functions of energy or material flow through a system (O'Neill et al. 1986).
In contrast, net production is a functional measure of the rate that new biomass is generated in a place over time (e.g., g·m⁻²·y⁻¹), much like calculating annual volume shipped from a warehouse. Net production is the difference between gross production and respiration; net production (hereafter P) is the focus here. Biomass need not equal P, just as current inventory in a warehouse need not represent annual volume shipped.
Fig. 1. Many studies seek to evaluate the P-D relationship (arrow 1) but do not directly measure P. Instead, B is often used as a surrogate for P (arrow 2). In that case, a strong 1:1 P/B relationship (Box i) is assumed unless another relationship is explicit. If the P/B fit is not linear, strongly predictive, or does not meet regression assumptions (Box ii), then P-D inferences actually represent the unintended B-D relationship (arrow 3). This study seeks predictive P-B relationships (i) in order to more accurately represent P-D analyses (arrow 1) and avoid arrow 3.
Why is this a problem for P/D research? At one extreme, it is not, if P and B happen to be tightly related in a 1:1 relationship (Fig. 1, box i). In this case, a study of the P/D relationship (Fig. 1, arrow 2) can reliably use B to indicate P, as is the common assumption in much P/D research, where the two are often discussed interchangeably. At the other extreme, B may badly represent P because the P/B pattern is scattered, or the slope is not 1, or both (Fig. 1, box ii). In that case, little will actually be learned about a real P/D relationship by using B as a proxy for P. Instead, a B/D relationship is mistakenly interpreted and discussed as a P/D relationship (Fig. 1, arrow 3). The truth likely lies between those two extremes, and may be uncovered with allometric approaches that are common but not widely applied to P/D research (e.g., Calder 1984, Brown et al. 2004, Marquet et al. 2005, Kerkhoff and Enquist 2006). Allometric scaling typically uses a power law relationship for P/B rather than the 1:1 relationship often assumed in P/D research (e.g., Niklas and Enquist 2001, Ernest et al. 2003, Kerkhoff and Enquist 2006). The use of allometric scaling for P/B analyses is important here because:
1. Allometry is built to relate disparate metrics (e.g., P and B); it should apply, though it has been rarely used.
2. A linear model (e.g., $P = a + bB$) differs greatly from a power law (e.g., $P = aB^{b}$, or $\log P = \log a + b \log B$) because the slope term b is a multiplier of either B or log(B), and because a also affects the shape of the power law outcome when represented in linear space (Lomolino 1989). Thus, finding $b = 1$ in a power law model does not mean it is safe to assume a 1:1 P/B relationship in linear P/B data.
3. A linear model is likely to fail regression assumptions if data resemble Fig. 1, panel ii. In that case, a 1:1 assumption makes a poor indicator and a worse estimator. In contrast, log transforms often assist with regression assumptions, making power law or other models (e.g., P/logB) statistically legitimate.
4. A linear P/B model assumes unsaturated increase in P with B, whereas a power law model can vary from linear to strongly asymptotic, depending on a and b coefficients.
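Point 2 can be checked numerically. The following is a minimal sketch, not part of the analyses reported here: the function name `power_law` and all coefficient values are hypothetical and chosen only for illustration. It shows that a log-log slope of b = 1 gives a constant P/B ratio equal to a (which is 1 only if a = 1), and that b < 1 gives a P/B ratio that declines as B increases.

```python
def power_law(biomass, a, b):
    """Return P = a * B**b for a single biomass value."""
    return a * biomass ** b

# Hypothetical biomass values spanning three orders of magnitude.
biomass_values = [1.0, 10.0, 100.0, 1000.0]

# Case 1: b = 1 on the log-log scale, with a hypothetical a = 0.3.
# log10(P) = log10(0.3) + 1 * log10(B), so P/B = 0.3 at every B, not 1.
ratios_b1 = [power_law(B, a=0.3, b=1.0) / B for B in biomass_values]

# Case 2: hypothetical b = 0.75. P still increases with B, but P/B declines.
ratios_b075 = [power_law(B, a=2.0, b=0.75) / B for B in biomass_values]

for B, r1, r2 in zip(biomass_values, ratios_b1, ratios_b075):
    print(f"B={B:7.1f}  P/B (a=0.3, b=1)={r1:.3f}  P/B (a=2, b=0.75)={r2:.3f}")
```

In both cases B alone would misstate P unless a and b are estimated and applied.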
Assuming plausible, statistically legitimate, and predictive models can be obtained with allometric scaling, it is then appropriate to compare different study systems (e.g., microbes, fishes, forests) and hierarchical levels of organization (i.e., populations, assemblages, ecosystems, and biomes). Without those models, generality in P/B relationships has been elusive (Mittelbach et al. 2001).

Bad news: history
Decades of research on P, B and P/B ratios have not fully informed subsequent P/D research (Mittelbach et al. 2001, Whittaker and Heegaard 2003, Gillman and Wright 2006). Beyond those cited reviews, evidence for this problem has three parts: two general problems from statistical analyses and one stemming primarily from sampling vegetation.
Regression modeling.-If allometric scaling (see above) had percolated into P/D research, more P/D analyses might have transcended the 1:1 equivalence assumption. Glimpses of this possibility exist in rare logP/logB analyses (Webb et al. 1983, Duarte 1989, Downing and Plante 1993). But power law relationships should not be assumed by default, especially if regression assumptions, homoscedasticity in particular, are not supported (Quinn and Keough 2002). Unfortunately, regression assumptions have rarely been evaluated in P/B models (Downing and Plante [1993] is a singular example of reported careful practice), and it may be expected that assumptions are routinely violated with a 1:1 assumption, given that data often span multiple orders of magnitude.
Model selection.-Obtaining a valid P/B relationship is more complex than simply finding a regression model with a greater R² (Burnham and Anderson 2002). Instead, careful decisions are required to match regression methods to goals and data, regression assumptions of homoscedasticity and normal error variance need to be evaluated, and then model selection based on information theory should be applied (Burnham and Anderson 2002, Warton et al. 2006, Smith 2009, Xiao et al. 2011). The vast majority of P/B research simply reported values, but not all P and B were independently sampled (i.e., a P/B ratio was often assumed to estimate either P or B). Thus, thorough model selection has not been widely applied to P/B data.
Sampling.-Much terrestrial P/D research has been conducted with vegetation (Mittelbach et al. 2001), including grasslands (e.g., Briggs and Knapp 1995, Hector et al. 1999, Butler et al. 2003, Knapp et al. 2007, Adler et al. 2011). It is common practice in such P/D studies to use peak, aboveground B as a proxy for P, which assumes annual production is adequately captured in the single sample event, and is often asserted to indicate P without citations or tests of the assumption. As an aside, belowground B and P of vegetation are not often measured in P/D research; the relative importance of above- vs. belowground P is a debate of its own, and analyses here do not address that separate question. Unfortunately, peak B inadequately estimates P in grasslands (Coupland 1979), with P/B ratios varying widely and most in excess of the 1:1 value assumed by this practice (Fig. 2). Also, peak B can be quantified based on either live tissue or total tissue (i.e., the sum of live and dead standing plant tissue), and may not be restricted to peak samples (Singh et al. 1975). To count only live tissue is to ignore plant tissue that senesced between sampling events, a common occurrence in herbaceous plants. The choice between peak live vs. peak total biomass in grasslands may be important. For example, the use of peak live B as a proxy for P was the basis for a paper entitled "productivity is a poor predictor of plant species richness" (Adler et al. 2011), whereas peak total B as a proxy for P generated various significant fits with species richness among study sites (Hector et al. 1999).
Peak B may very well indicate P for some systems, though confidence in this potential relationship requires that P be estimated separately from peak B (Singh et al. 1975) and that P/B relationships be developed. A major goal of this work is to help P/D research be based on estimated P rather than indicated P.

Good news
Time for some good news, in two sets. The combination of the two sets of good news means that past P/B research results can help estimate reliable, general P/B relationships, which may then be applied to better estimate P/D relationships (Fig. 1).
First, decades of research offer a basis to quantitatively evaluate the P/B relationship because P and B were (or can be) estimated independently for many organisms, assemblages, ecosystems, and biomes. Surprisingly, few such efforts have statistically related P/B among various systems, with recent work focused on a different purpose (Kerkhoff and Enquist 2006). Nor have alternative data transformations and tests of regression assumptions been conducted. The Bad News described above would not exist if such an analysis existed.
Secondly, allometric scaling approaches can be readily applied to the P/B relationship (Marquet et al. 2005, Warton et al. 2006, Smith 2009, Xiao et al. 2011). Allometric scaling typically uses standardized major axis regression (SMA; e.g., Niklas and Enquist 2001, Ernest et al. 2003, Kerkhoff and Enquist 2006, Warton et al. 2006, Smith 2009), which was formerly known as model II or reduced major axis regression (see Warton et al. [2006] for a comprehensive review). In contrast, ordinary least squares (OLS) regression has typically been applied in P/B research. With model selection, alternative OLS and SMA models can be compared to identify the most defensible models that represent fundamental and predictive relationships (Burnham and Anderson 2002, Stephens et al. 2005). The analyses must be conducted carefully to avoid errors of method and interpretation, but comparison methods are far more resolved and available than when most P/B data were collected.
Fig. 2. Coupland (1979.4) listed 40 grassland study sites in which maximum live-shoot biomass (g·m⁻²) and net annual shoot production (g·m⁻²·y⁻¹) were reported. The use of peak B to measure ANPP assumes a 1:1 relationship between the two variables. In fact, the ratio of ANPP/peak B varied almost four-fold among those globally-distributed study sites, with most values >1. Assuming 1:1 would often underestimate P. The largest value was from a study that also accounted for mortality losses (Coupland 1979: Table 33.5). The variability among systems and nonequivalence of ANPP and peak B are inconsistent with the common use of peak B as a measure of ANPP.
Goals and expectations.-Given wide variance in both P and B (typically 3-4 orders of magnitude range with triangular scatter) and prior work (e.g., Webb et al. 1983, Duarte 1989), I expected OLS and SMA regressions based on logP/logB transformed data to be most plausible most often; a goal was to evaluate regressions based on logP/logB transforms relative to other models. A second goal was to compare OLS and SMA regression model results as a way to examine the potential effects on P/B research, where SMA P/B scaling relationships should be more appropriate for much of P/D research conducted to date (explained further below). I expected models of greater hierarchical levels of data (i.e., a population of one species < an assemblage of multiple species < ecosystems < biomes) to be weaker (i.e., more often poor fits to assumptions, lower R²) because I expected greater inherent measurement error and accrued variance in unmeasured covariates (e.g., elevation, precipitation, etc.) at greater organizational levels.

Data collection and selection
Paired and independently estimated P and B data were obtained from journal publications and books. I made multiple decisions regarding consistency and reliability of data in subsequent analyses. In all data sets used here, P and B were verified to be independently estimated; data sets were excluded if P was calculated with an assumed or estimated P/B or "estimative ratio" (Whittaker and Marks 1975). Where a range was reported, I recorded the midpoint for use in regressions. With two exceptions, I selected studies for which data were expressed in units of dry mass, which were typically most common (i.e., all analyzed data were expressed as g·m⁻²·y⁻¹ for P and g·m⁻² for B). The first exception was zooplankton assemblages, for which a substantial number of data sets were compiled by Morgan et al. (1980) in energetic units of J·m⁻² and J·m⁻²·y⁻¹. Not enough other zooplankton studies expressed in units of dry mass could be found to replace this set; I chose to analyze it rather than ignore it given the importance of zooplankton in freshwater ecosystems. I did not assume a conversion factor (J to g). The second exception was soil microbes (Persson et al. 1980), with P measured as mg C·m⁻²·y⁻¹ and B measured as mg C·m⁻². Again, conversion factors were avoided.
As a result of the important conditions above, entire data sets or values within compilations were excluded. I only computed P/B regressions for data sets with sufficient data (~20 points; Jolicoeur 1990), except for global biomes, for which <20 types exist. Not all possible studies may be included in any one group, but analyses of sufficient data points should represent a general relationship. I could find too few data for some groups (e.g., acari, echinoderms, bivalve molluscs, worms) based on the above conditions, despite their ecological importance. Analyses were also organized by populations, assemblages, ecosystems, and biomes, and by obvious divisions within one of those categories (e.g., fishes, terrestrial vegetation). Population-level data were defined as those reporting species-specific (or sometimes more coarse taxa, e.g., genera) P and B data in a site. In contrast, assemblage-level data refer to aggregates of such taxa in a site (e.g., soil microbes). Ecosystem- and biome-level analyses aggregate further, where ecosystems include multiple possible assemblages and biomes include multiple ecosystems. Ecosystem- and biome-level data represent net primary production, which comprises the majority of true productivity at those levels.
Data were analyzed in reported units and as logarithmic transforms. I used log₁₀ because: (1) ln and log₁₀ are directly and linearly interconvertible, (2) log₁₀ units vary in 10-fold steps that more readily convert among systems and scales, and (3) log₁₀ is commonly used in the scaling literature (e.g., Enquist et al. 1998, Niklas and Enquist 2001, Warton et al. 2006, Hechinger et al. 2011).
www.esajournals.org, April 2015, Volume 6(4), Article 49

Alternative regression models
OLS and SMA regressions of bivariate data (x and y) differ in purpose. OLS regression is designed to predict a specific dependent y from an independent x variable and test for a difference from zero for slope and intercept (Warton et al. 2006). OLS regression assigns all scatter to y, which it then minimizes to solve for a line through the scatter (Smith 2009). As a result, OLS regressions are asymmetric; the equation changes when y is used to predict x. OLS regression is recommended where x is thought to cause, determine, or predict values of y, and where one seeks to know if different x values have different y values (Smith 2009). Parameters of most interest for OLS regression are predicted y values, a null hypothesis test of slope, and the strength (R²) of the model (Warton et al. 2006).
By definition, production creates biomass and so P should be the independent variable (x) in OLS regressions (e.g., Keeling and Phillips 2007). However, B has historically been used as x in P/B research because it is easier to estimate and thus used as a predictor for P. Thus, historical use of OLS regression contradicts the logical assignment of variables to independent and dependent axes. This matters because OLS regressions are not symmetric, and so a B/P equation (e.g., Keeling and Phillips 2007) is not the same as a P/B equation (Warton et al. 2006, Smith 2009). In addition, B is always an estimate with error, and B of natural systems is a random factor. Though not the major criterion for OLS vs. SMA regression decisions (Warton et al. 2006), no variation is assumed for x in OLS regressions, which contradicts the nature of B estimates. In sum, OLS regression is not the de facto method for P/B research.
In contrast, the goal of SMA regression is to identify a true relationship or biological "law" between x and y and is recommended where designation of x or y as the independent variable is not a matter of causation (Warton et al. 2006, Smith 2009). SMA regression assumes variance in both x and y, and seeks to reduce that combined variance with a best line through the scatter (Warton et al. 2006, Smith 2009). As a result, SMA regression equations are symmetric. The parameter of greatest interest is the slope of the line that best fits the bivariate y/x pattern, and both slope and intercept can be tested for fit to hypothesized values (Warton et al. 2006). For the same data set, OLS and SMA regressions yield an identical strength (R²) of the y/x relationship, though SMA regression typically yields a greater slope than OLS.
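The asymmetry of OLS and the symmetry of SMA can be illustrated with the standard slope formulas: the OLS slope is the covariance divided by the variance of x, and the SMA slope is the ratio of standard deviations carrying the sign of the correlation. The sketch below is illustrative only; the function names and the log-scale data values are hypothetical, and it is not the smatr implementation used for the analyses.

```python
import math

def summary_stats(x, y):
    """Return sums of squares and cross-products about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """OLS slope of y on x: all scatter is assigned to y."""
    sxx, _, sxy = summary_stats(x, y)
    return sxy / sxx

def sma_slope(x, y):
    """SMA slope of y on x: ratio of standard deviations, signed by the correlation."""
    sxx, syy, sxy = summary_stats(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)

def r_squared(x, y):
    """Squared correlation, shared by OLS and SMA fits to the same data."""
    sxx, syy, sxy = summary_stats(x, y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical log10(B) and log10(P) values.
log_b = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
log_p = [0.2, 0.9, 1.1, 1.9, 2.0, 2.8]

print("OLS slope, P on B:", ols_slope(log_b, log_p))
print("OLS slope, B on P:", ols_slope(log_p, log_b))   # not the reciprocal
print("SMA slope, P on B:", sma_slope(log_b, log_p))
print("SMA slope, B on P:", sma_slope(log_p, log_b))   # exactly the reciprocal
print("R^2:", r_squared(log_b, log_p))
```

Because the OLS slope equals the SMA slope multiplied by the correlation coefficient, the SMA slope is never smaller in magnitude than the OLS slope, and the two converge as R² approaches 1.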
Which regression method is most appropriate for P/B research? OLS regression makes most sense for interpolation within a study system because OLS regression is designed to predict a specific y value given an x. For example, a study may establish that B is a good predictor of P by estimating B and P independently and finding a plausible, legitimate and strong P/B model. Thereafter, B may be used as a predictor of P to relate to D in other, simultaneously and identically treated, randomly selected treatment plots, where P could not be estimated without ruining the ability to estimate D due to destructive sampling. The range of P values should also encompass the range in P/D plots to ensure valid interpolation.
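The interpolation use case can be sketched as a calibration fit followed by a range check. This is a minimal illustration under stated assumptions: the function names and data are hypothetical, and the range check is applied to the predictor B (the only quantity measured in the P/D plots), which is an assumption of this sketch.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_within_range(b_new, calibration_b, intercept, slope):
    """Predict P from B only inside the calibrated range; otherwise return None."""
    if not min(calibration_b) <= b_new <= max(calibration_b):
        return None  # extrapolation: the calibration does not support a prediction
    return intercept + slope * b_new

# Hypothetical calibration plots where B and P were estimated independently.
calibration_b = [1.0, 2.0, 3.0, 4.0]
calibration_p = [2.0, 4.0, 6.0, 8.0]
intercept, slope = fit_ols(calibration_b, calibration_p)

inside = predict_within_range(2.5, calibration_b, intercept, slope)
outside = predict_within_range(10.0, calibration_b, intercept, slope)
print(inside, outside)
```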
In contrast, SMA regression finds general scaling relationships that best describe P/B patterns (Warton et al. 2006). Assuming sufficient, representative data values are used, SMA regression should approximate a "true" relationship that enables comparisons of P/B relationships among systems. If sufficiently plausible, legitimate, and strong, a general SMA relationship may be used to better estimate P where only B was collected but a P/D relationship was sought (Fig. 1).
Given the different roles of OLS and SMA models for P/B research, OLS and SMA models were compared separately by model selection here, with the goal to identify a most plausible version of each for a given data set. OLS regression analyses here built on work by Xiao et al. (2011), who compared two relevant models: the OLS linear regression of logP/logB transformed data (here $\log_{10} P = a + b \log_{10} B$) and the OLS power law (here $P = aB^{b}$) regression of untransformed data. Xiao et al. (2011) [...] by penalizing explained variance for the number of model parameters (Burnham and Anderson 2002). More complex models that may provide a better fit by including more variables can then be fairly compared to simpler models.
Unfortunately, model selection cannot compare models with different response variables (Burnham and Anderson 2002). Thus, models of log₁₀P must not be compared using model selection to models of untransformed P (e.g., by the power law; Xiao et al. 2011). As a result, an AIC-based approach could not serve as a sole basis to select models here. Instead, a more complex but better informed analysis was required, in which model selection for subsets of models with the same response variable was accompanied by tests of regression assumptions (homoscedasticity and normal error variance). Model selection here used AIC corrected for sample size (AICc) and model weights (Burnham and Anderson 2002).
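AICc and model weights can be sketched for least-squares fits as follows. This is an illustration, not the bbmle code used here: it assumes the least-squares form of AIC given by Burnham and Anderson (2002), counts the residual variance as an estimated parameter, and uses hypothetical residual sums of squares. The comparison is valid only among models that share a response variable, as stated above.

```python
import math

def aicc_from_rss(rss, n, n_coefficients):
    """AICc for a least-squares fit; k counts coefficients plus the residual variance."""
    k = n_coefficients + 1
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def akaike_weights(aicc_values):
    """Convert AICc values to model weights that sum to 1."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical subset with the same response variable (untransformed P), n = 30.
n = 30
models = {
    "P = a + bB": 120.0,          # hypothetical residual sum of squares
    "P = a + b log10(B)": 95.0,
    "P = aB^b": 90.0,
}
aicc = [aicc_from_rss(rss, n, n_coefficients=2) for rss in models.values()]
weights = akaike_weights(aicc)

for name, w in zip(models, weights):
    # Evidence ratio of the best model relative to this one.
    print(f"{name:22s} weight={w:.3f} evidence ratio={max(weights) / w:.2f}")
```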
I added three additional OLS models to the two OLS models compared by Xiao et al. (2011) and compared the five models in two subsets:
1. OLS linear regression of untransformed data ($P = a + bB$) was compared to models based on a semi-log transform ($P = a + b \log_{10} B$) and the power law ($P = aB^{b}$). This set of models represented the historically common assumption that B is a simple linear proxy for P and two possible curvilinear relationships.
2. A second semi-log transform ($\log_{10} P = a + bB$) was compared to the logP/logB (i.e., $\log_{10} P$ and $\log_{10} B$) transform. This set of models represented a second curvilinear relationship and a common method in allometric scaling (e.g., Kerkhoff and Enquist 2006).
SMA regression requires a bivariate, linear model and cannot evaluate the nonlinear power law. Thus, four SMA models were compared with AICc weights in two further subsets, using the reasoning above and including homoscedasticity and error normality tests:
3. $P = a + bB$ was compared to $P = a + b \log_{10} B$.
4. $\log_{10} P = a + bB$ was compared to $\log_{10} P = a + b \log_{10} B$.
SMA regression enables tests of slope and intercept relative to designated values. Here, slope = 1.0 and intercept = 0 were tested because both hypothesis tests are important for the use of B as a proxy for P. A slope of 1 indicates that the measure of B directly relates to the corresponding measure of P (e.g., log₁₀B ~ log₁₀P), whereas other slopes require explicit conversion. An intercept of zero also supports simple use of B as a P proxy; otherwise, an intercept term must be used when estimating P from B. All models were evaluated for homoscedasticity based on graphical evaluation and Breusch-Pagan tests. Normality of error was also evaluated based on graphical evaluation and Shapiro-Wilk tests. A slope significantly different from zero was a final condition to consider a model as best of a subset. In some cases, I judged that support for homoscedasticity or normal error was inappropriate and chose to think rather than blindly follow a p value (Quinn and Keough 2002). For example, a Breusch-Pagan test may indicate homoscedasticity despite a clear parabolic pattern of residuals, because the test merely evaluates the slope of residuals as a function of the predictor variable; a flat linear slope through a parabola can pass the Breusch-Pagan test. Conversely, both OLS and SMA regression are robust to modest violation of homoscedasticity and normality of error, and the ability of Shapiro-Wilk and Breusch-Pagan tests to detect violations of assumptions depends in part on sample size; data sets with more values can be indicated as heteroscedastic or non-normal for error variance when graphical results indicate minor problems. Thus some violation beyond strict statistical tests was permissible, especially if AICc weight and other indicators suggested a model potentially useful for P/D research.
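The parabola caveat can be reproduced with a small calculation. The sketch below assumes the studentized (Koenker) form of the Breusch-Pagan statistic, n times the R² from regressing squared residuals on the predictor, compared to a chi-square distribution with 1 degree of freedom. The residual values are hypothetical and constructed only to show the two patterns; they are not residuals from any model reported here.

```python
import math

def breusch_pagan(x, residuals):
    """Studentized Breusch-Pagan statistic and p value for one predictor."""
    n = len(x)
    squared = [e * e for e in residuals]
    mean_x = sum(x) / n
    mean_s = sum(squared) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sss = sum((s - mean_s) ** 2 for s in squared)
    sxs = sum((xi - mean_x) * (s - mean_s) for xi, s in zip(x, squared))
    statistic = n * sxs ** 2 / (sxx * sss)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

def alternate_signs(magnitudes):
    """Give hypothetical residual magnitudes alternating signs."""
    return [m if i % 2 == 0 else -m for i, m in enumerate(magnitudes)]

# Parabolic spread: residual magnitude is smallest mid-range and largest at both ends.
x_parabola = list(range(-5, 6))
res_parabola = alternate_signs([0.5 + 0.2 * xi ** 2 for xi in x_parabola])
stat_parabola, p_parabola = breusch_pagan(x_parabola, res_parabola)

# Funnel spread: residual magnitude increases steadily with the predictor.
x_funnel = list(range(1, 12))
res_funnel = alternate_signs([0.1 * xi for xi in x_funnel])
stat_funnel, p_funnel = breusch_pagan(x_funnel, res_funnel)

print(f"parabola: statistic={stat_parabola:.3f}, p={p_parabola:.3f}")
print(f"funnel:   statistic={stat_funnel:.3f}, p={p_funnel:.4f}")
```

The parabolic case passes the test although its variance clearly depends on x, which is why graphical evaluation accompanied the tests.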
I prioritized homoscedasticity as more important than normality of error variance for regression models (Quinn and Keough 2002). Of course, full compliance with regression assumptions was preferable. As a result, OLS and SMA models were reported if the AICc weight indicated a plausible model in the analyzed subset and the model was judged to meet or nearly meet homoscedasticity, even if it failed normality of error. Models with similar AICc weights, based on an evidence ratio of 2 (i.e., $w_i/w_j < 2$), were reported; those with an evidence ratio > 2 were considered less plausible (Burnham and Anderson 2002) and not reported.
Analyses described so far exclude R² and slope significance. I considered R² only when comparing best models among different AIC-based subsets because model selection was constrained to subsets with the same response variable (Burnham and Anderson 2002). For example, AICc weights may indicate that models from two subsets are plausible, but R² values may differ greatly. Based on the expectation that P/D research needs strong P/B models, those with greater R² will be preferred. Of course, a useful model will also have a slope that is different from zero, but this test was trivial in most cases.
Modern nonlinear regression methods do not report R² because it is inadequate by itself to compare models (Spiess and Neumeyer 2010). For OLS power law results only, I computed R² as [1 − (variance of model residuals)/(variance of raw P data)] (Motulsky and Ransnas 1987) so that power law R² values could be compared to the other eight models.
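The power law R² described above can be written as a short function. This is a sketch of the stated formula only; the function name, coefficients, and data are hypothetical, and the coefficients are taken as given rather than fitted.

```python
import statistics

def power_law_r2(biomass, production, a, b):
    """R^2 = 1 - var(residuals) / var(raw P) for the power law P = a * B**b."""
    residuals = [p - a * B ** b for B, p in zip(biomass, production)]
    return 1.0 - statistics.variance(residuals) / statistics.variance(production)

# Hypothetical coefficients and biomass values.
a, b = 2.0, 0.8
biomass = [10.0, 50.0, 100.0, 500.0, 1000.0]

# Production lying exactly on the curve, then the same values with hypothetical scatter.
exact_production = [a * B ** b for B in biomass]
scatter = [1.10, 0.90, 1.05, 0.95, 1.00]
noisy_production = [p * s for p, s in zip(exact_production, scatter)]

r2_exact = power_law_r2(biomass, exact_production, a, b)
r2_noisy = power_law_r2(biomass, noisy_production, a, b)
print(r2_exact, r2_noisy)
```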
In summary, the most plausible, legitimate, and strong models were identified and reported in a two-step process.
Step one compared models within a subset using criteria in the following order: AICc weight (with a critical evidence ratio = 2), homoscedasticity, normality of error, and slope significance (for OLS models only).
Step two compared leading models among subsets for R². A slope of ~1 and an intercept of ~0 inform subsequent thought but were not criteria for selecting models to report.
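The two-step process can be outlined in code. The sketch below is a deliberate simplification: the `Candidate` record, its field names, and all values are hypothetical, assumption checks are reduced to yes/no flags, and the judgment calls described above (graphical evaluation, near-compliance, slope significance) are not represented.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    subset: int            # models in a subset share a response variable
    aicc_weight: float
    homoscedastic: bool    # judged to meet or nearly meet homoscedasticity
    normal_error: bool
    r2: float

def step_one(candidates, critical_ratio=2.0):
    """Within each subset, keep plausible models that are judged homoscedastic."""
    leaders = []
    for subset in sorted({c.subset for c in candidates}):
        members = [c for c in candidates if c.subset == subset]
        best_weight = max(c.aicc_weight for c in members)
        for c in members:
            plausible = c.aicc_weight > 0 and best_weight / c.aicc_weight < critical_ratio
            if plausible and c.homoscedastic:
                leaders.append(c)
    return leaders

def step_two(leaders):
    """Among subset leaders, rank by R^2 (strongest first)."""
    return sorted(leaders, key=lambda c: c.r2, reverse=True)

# Hypothetical OLS results for one data set.
candidates = [
    Candidate("P = a + bB", 1, 0.02, False, False, 0.41),
    Candidate("P = a + b log10(B)", 1, 0.08, True, False, 0.55),
    Candidate("P = aB^b", 1, 0.90, True, False, 0.62),
    Candidate("log10 P = a + bB", 2, 0.01, False, False, 0.30),
    Candidate("log10 P = a + b log10(B)", 2, 0.99, True, True, 0.85),
]

ranked = step_two(step_one(candidates))
for c in ranked:
    print(c.name, c.aicc_weight, c.r2)
```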
The above analyses describe approaches applied to all data sets analyzed here. In addition, alternative approaches to estimate P and B in grasslands and forests were examined in greater detail for subsidiary but important purposes.

Grassland and forest data
Grasslands.-Grasslands have been central to P/D research, and studies have typically used peak aboveground biomass as an indicator of P (see Bad News: History, above). For example, [...] represented common practices in grasslands and treated B as an estimator of P. I used the same data sets as [...], but handled data differently, in three ways:
1. [...] compared average annual values among sites (appropriate for their purpose), whereas I extracted annual P and B values from [...] for the purpose of P/B regressions.
2. I used method 1 of [...] (i.e., peak live biomass; B₁) and method 2 (peak aboveground live + standing dead matter; B₂) as estimators of B. Those B estimators (B₁ and B₂) were then regressed separately against data based on Scurlock et al.'s (2002) recommended methods 5 and 6 to estimate P. Method 5 (P₅) is based on Singh et al.'s (1975) trough-peak analysis 7, and sums positive increments in live biomass and standing dead matter, where only positive increments in standing dead matter that coincided with those in live biomass are used. Without any coinciding increments in standing dead matter, P₅ represents only live biomass. Method 6 (P₆) of [...] is based on Singh et al.'s (1975) trough-peak method 8. It matches P₅ but adds positive increments in fallen litter (i.e., such increments that coincide with matching increments in live biomass are summed). Coinciding positive increments estimate production of live biomass during the sampling interval plus newly produced biomass that senesced (as standing or fallen matter) during the sampling interval. Increments of zero were counted as positive, which obviously did not change a summed value but would enable a category (e.g., litter) to be evaluated relative to other categories.
3. [...] used calendar years to define annual cycles, consistent with common practices and northern temperate grasslands. Here trough-peak cycles were data-driven, using biomass trends as a guide (Singh et al. 1975). Decisions were required because some data sets varied in temporal extent and minimal live biomass did not always occur in December or January. Both P₅ and P₆ (described above) were based on positive increments and a trough-peak approach to detecting increments (Singh et al. 1975). If data extended continuously through multiple years, then a minimal (trough) value (Singh et al. 1975) marked the end of one annual cycle and the start of the next. In that case, consecutive samples could also be used to calculate an increment for the start of the next year. For data sets <1 year, an initial biomass value >0 suggested some earlier and missed production; increments began as the difference between the second and first values, whatever it was. However, if live biomass values did reach zero in the data, then a zero was assumed to start a year. Thus, some troughs may have been underestimated, and resulting P estimates here are likely conservative (let alone matters related to litter estimation and belowground P; Singh et al. 1975).
Some of the 31 data sets [...] did not meet the above conditions, nor did some partial years. In P/D research, the choice between P₅ and P₆ will depend on how samples are collected; P₅ requires that live and standing dead samples be processed separately, whereas samples unsorted for live and dead tissue, including litter, can be analyzed as P₆. Method 4 (i.e., Σ(positive increments in live biomass)) was not used here because it underestimates P by omitting recently dead tissue and was not recommended by [...]. To be clear, four sets of model comparisons (described above) were conducted: P₅/B₁, P₅/B₂, P₆/B₁, P₆/B₂.
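The biomass and production estimators described above can be sketched for a single trough-peak cycle. This is an illustration based on one reading of the descriptions in this section: the function names and the time series are hypothetical, B₂ is taken as the peak of the summed live and standing dead values on a sampling date, and the identification of troughs that delimit cycles is not shown.

```python
def peak_biomass(live, dead):
    """Return (B1, B2): peak live biomass, and peak of live + standing dead."""
    b1 = max(live)
    b2 = max(l + d for l, d in zip(live, dead))
    return b1, b2

def trough_peak_production(live, dead, litter=None):
    """Sum increments over intervals in which live biomass did not decline.

    Without litter this follows the P5 description; with litter, the P6 description.
    Increments of zero count as positive, as stated in the text.
    """
    total = 0.0
    for i in range(1, len(live)):
        d_live = live[i] - live[i - 1]
        if d_live < 0:
            continue  # other categories count only when they coincide with live increments
        total += d_live
        d_dead = dead[i] - dead[i - 1]
        if d_dead >= 0:
            total += d_dead
        if litter is not None:
            d_litter = litter[i] - litter[i - 1]
            if d_litter >= 0:
                total += d_litter
    return total

# Hypothetical samples (g per square meter) across one trough-peak cycle.
live = [20, 60, 150, 120, 140, 40]
dead = [10, 15, 12, 50, 55, 90]
litter = [5, 5, 20, 30, 30, 60]

b1, b2 = peak_biomass(live, dead)
p5 = trough_peak_production(live, dead)
p6 = trough_peak_production(live, dead, litter)
print(f"B1={b1}, B2={b2}, P5={p5}, P6={p6}")
```

With these hypothetical values, the estimators give B₁ = 150, B₂ = 195, P₅ = 160, and P₆ = 175, so each P/B pairing yields a different ratio from the same field samples.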
Forests.-The converse of relationships evaluated here (i.e., the B/P relationship) was recently evaluated among 191 forest sites, where P estimates were standardized for losses due to litterfall, tropical litterfall decomposition, volatile organic compounds, and herbivory (Keeling and Phillips 2007). Litterfall decomposition estimates in the tropics (only) were often ~1/2 of litterfall. I analyzed P/B relationships for the Keeling and Phillips (2007) tropical forest data separately from data for non-tropical forests. As a secondary goal, the effect of that standardization was evaluated relative to the increment method, which more closely corresponds to much P/D research. In sum, P/B relationships were analyzed in tropical and non-tropical forests using both increment and standardized P estimates.
Operational details.-All analyses were conducted using established packages in R v. 3.0.2 (R Core Team 2014). The power law OLS regression was computed using the self-starting iterative option SSarrhenius in the nls command. The linear OLS and SMA regressions were conducted with the smatr package (Warton et al. 2012). OLS regression results produced by smatr were verified as identical to those produced by lm in R. Homoscedasticity was evaluated graphically and with Breusch-Pagan tests for OLS regressions (in the lmtest package; Zeileis and Hothorn 2002), or the equivalent for power law and SMA regression residuals (i.e., squared residuals regressed using lm in R and plotted against predictor values). Normality of residuals was evaluated graphically and with a Shapiro-Wilk test (Razali and Wah 2011). Model selection was conducted using AICctab in the bbmle package (Bolker 2014).
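To make the distinction between the two regression types concrete, the following Python sketch fits OLS and SMA lines to log10-transformed data. It is not the code used for the analyses (which were run in R with smatr and lm); the function names and the biomass and production values are hypothetical, and it relies only on the standard definitions that the OLS slope is the covariance divided by the predictor variance and the SMA slope is the ratio of standard deviations carrying the sign of the correlation.

```python
import math


def _centered_sums(x, y):
    """Means plus centered sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def fit_ols(x, y):
    """Ordinary least squares: returns (intercept, slope)."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_sma(x, y):
    """Standardized major axis: returns (intercept, slope)."""
    mean_x, mean_y, sxx, syy, sxy = _centered_sums(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Squared Pearson correlation, shared by both fits."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical biomass and production values (not data from this study).
biomass = [12.0, 45.0, 150.0, 480.0, 1300.0]
production = [9.0, 30.0, 70.0, 260.0, 500.0]
log_b = [math.log10(b) for b in biomass]
log_p = [math.log10(p) for p in production]

ols_intercept, ols_slope = fit_ols(log_b, log_p)
sma_intercept, sma_slope = fit_sma(log_b, log_p)
print(f"OLS: log10 P = {ols_intercept:.3f} + {ols_slope:.3f} log10 B")
print(f"SMA: log10 P = {sma_intercept:.3f} + {sma_slope:.3f} log10 B")
print(f"R^2 = {r_squared(log_b, log_p):.3f}")
```

Because the OLS slope equals the SMA slope multiplied by the correlation coefficient, the two lines share R² but diverge more as the fit weakens, which is one reason the coefficients of the two model types differ.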

RESULTS
Annual P/B data representing 1,861 separate systems were collected, distributed among 19 data sets (6 for populations, 9 for assemblages, 4 for ecosystems and biomes). Some systems (e.g., grasslands) included multiple versions of P or B used in different regressions; in total, 2,320 P/B data points were analyzed (all data are in the Supplemental Materials). Five OLS models and 4 SMA models were compared in four subsets (171 models, 76 model comparisons). One model was often identified by AICc weights as the most plausible in a subset (73% of comparisons yielded an AICc weight > 0.90; Appendix). However, consideration of regression assumptions modified inferences based on AICc weights alone. The most plausible model of a set either met both (32, or 42%) or one (22, or 29%) regression assumption (in all but four cases it was homoscedasticity), or may have met neither assumption (22, or 29%). As a result of AICc-based model selection and assumption tests, almost half of model comparisons (35/76; 46%) yielded a model that was clearly most plausible and judged to comply with regression assumptions (Appendix). Of those 35 models, 22 (63%) were logP/logB transforms, followed by P/logB and power law (tied with 6 each, or 17%), and one linear model (3%). Please note that R² did not enter into this examination of outcomes.
Overall, P/B regressions based on logP/logB transforms were far more likely than a simple linear B-as-P proxy (or other models) to represent a P/B relationship well and partially or fully meet assumptions (Fig. 3). Among the 19 data sets, OLS logP/logB models were plausible and homoscedastic for 17 (89%), of which 10 (53%) also had normal error variance. Likewise, 16 (84%) SMA logP/logB models were plausible and homoscedastic; 12 (63%) also had normal error variance. In contrast, power law models (comparable in principle to a logP/logB model) were often plausible by AICc weights (17 of 19 comparisons) but met one (5 of 19; 26%) or both (6 of 19; 32%) regression assumptions less often and rarely compared well with logP/logB models for R². Of the failed models (i.e., w_i < 0.05, heteroscedastic and/or non-normal errors), most were logP/B (26 cases), linear (17 cases), or P/logB (8 cases). Below is a summary per data set of model comparisons; please see Table 1 and Appendix for model equations and other details.
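For readers unfamiliar with AICc weights, the Python sketch below shows one common way they are computed for least-squares fits. It is illustrative only: the analyses here used AICctab in the bbmle package, the function names and the residual sums of squares are hypothetical, and the AICc expression shown is the standard Gaussian least-squares form (up to an additive constant). Scores are only comparable among models fit to the same response variable, which is consistent with comparing models within subsets.

```python
import math


def aicc(rss, n, k):
    """Small-sample corrected AIC for a Gaussian least-squares fit.

    rss: residual sum of squares; n: number of data points;
    k: number of estimated parameters (including the error variance).
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert AICc scores for a set of candidate models into weights."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical residual sums of squares for three candidate models that
# share one response variable and n = 30 data points.
candidates = {"model A": (4.1, 3), "model B": (5.0, 3), "model C": (4.0, 4)}
scores = [aicc(rss, 30, k) for rss, k in candidates.values()]
for name, weight in zip(candidates, akaike_weights(scores)):
    print(f"{name}: w_i = {weight:.2f}")
```

A weight near 1 indicates that one model dominates its set, whereas similar weights indicate that the data do not clearly separate the candidates.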

Terrestrial plant populations
This data set represented 125 P/B data values for terrestrial plant species, many of which were woody. The OLS logP/logB equation was clearly most plausible (w_i = 1.00), homoscedastic, and predicted log10 P well (R² = 0.86; Fig. 4A; Table 1). The intercept 95% confidence interval (hereafter CI) did not include zero nor did the slope CI include 1.0. The OLS logP/logB model could be used advisedly (error variance was non-normal and herbaceous plant species were underrepresented) as a reference for comparable P/B analyses within a P/D study. Though yielding different coefficients, the SMA logP/logB model was similar in statistics to the OLS logP/logB model (Table 1). The SMA model should be useful to estimate P from B for P/D research of terrestrial plant species where only B is available. The relatively high R² for the OLS and SMA models (Fig. 4A) suggests that additional covariates (e.g., latitude, elevation) would explain relatively little further variance.

Wetland plant populations
This data set represented 52 P/B data values for wetland plant species collected across freshwater, brackish, and estuarine systems. The OLS logP/logB model was quite plausible (w_i = 0.99) and met both regression assumptions but was not highly predictive (R² = 0.22; Table 1, Fig. 4B), potentially related to high and unaccounted litter production or strong effects of environmental conditions (e.g., hydrology). If available (I could find none), studies that include litter production (and even underground tissue) may be more predictive. Covariates (e.g., tidal flux, latitude, salinity) also may explain substantial variance among study systems.
Two SMA models (logP/logB and P/logB) were most plausible in their subsets and met regression assumptions (Appendix). However, the R² of the logP/logB model (0.22) was nearly twice that of the P/logB model. The intercept CI did not include zero nor did the slope CI include 1.0, but values were not far off (Table 1, Fig. 4B). Again, the relatively low R² (0.22) may be improved with additional covariates (e.g., freshwater, brackish, or saline wetlands) or better accounting for litter production in further analyses.

Insect populations
This data set represented 911 P/B data values for aquatic and terrestrial insect taxa (often species, some as genera, or family). Among OLS models, only the logP/logB model was very plausible and met regression assumptions (Table 1). The slope term was not different from 1.0 and R² was strong (0.77; Fig. 4C). Investigators would be well justified to compare their own OLS regressions within a P/D research study to this equation. The strong fit among so many data points suggests a general relationship among insects, with room for only modest improvement by covariates (e.g., latitude, phylogeny). Similarly, the SMA logP/logB model was very plausible and met both regression assumptions (Table 1). The slope differed from 1.0 despite being close to it, by virtue of the many data points (Fig. 4C, Table 1). In the absence of independent B and P data, P could be reliably estimated from this relationship, and it could be used to retrospectively adjust P/D estimates for studies among insect species.

Plausible models that were homoscedastic and had normal error variance are tallied in gray. Plausible models that were homoscedastic but did not have normal error variance are tallied in white.

Crustacean populations
This data set represented 29 P/B data values for species or genera, with only Ostracoda represented more coarsely. The OLS logP/logB model was clearly most plausible in its subset, was fully compliant with both regression assumptions, and had a high R² (Table 1, Fig. 4D). I considered that model preferable to other models, which did not meet all criteria as well (Appendix). Similar to the case for insects above, investigators would be well justified to compare their own equations to this one. Again, the strong fit indicates little additional benefit from exploring covariates (e.g., latitude).
The SMA logP/logB model was also very plausible and met both regression assumptions (Table 1, Appendix). The model shared the strong R² (0.82) with the OLS model. In the absence of independent B and P data, P could be reliably estimated from this relationship (Fig. 4D), and it could be used to retrospectively adjust P/D estimates for studies among crustacean species. Relatively little more explanatory power is to be expected by exploring covariates (e.g., latitude).

Fish populations
One hundred P/B data values for fish species were included in this data set, including freshwater, marine, estuarine, and anadromous/catadromous species. Only the OLS logP/logB model was very plausible, met both regression assumptions, and had a strong R² (Table 1, Fig. 4E). As above, it would be fully defensible to compare a study-specific model to this equation, given appropriate use of log-transforms. Little additional advantage is apparent from a search for covariates (e.g., latitude).
Among SMA models, only the SMA logP/logB model was very plausible and met both regression assumptions (Table 1, Appendix). The model had the same strong R² as the OLS model (Table 1, Fig. 4E). Given only B data (i.e., if P data were not independently collected), log10 B could be reliably converted to log10 P from this relationship, and it could be used to retrospectively adjust P/D estimates for studies among fish species.

Salamander populations
This data set included 20 P/B data points for two genera (Eurycea and Desmognathus) in one study area (North Carolina, USA; Cross et al. 2006). Though limited, the two genera represented an order of magnitude difference in biomass  Fig. 4F). Similar to patterns presented so far, the SMA logP/logB model (Table 1, Fig. 4F) was most plausible in its subset and met both regression assumptions. Based on this result, log10 P could be reliably estimated from log10 B in the absence of independent P data, and estimated P values could be used to retrospectively adjust P/D estimates for studies of (or including) salamander species. The veracity of either population-level model would be best tested with more species and study sites.

Grassland assemblages
This set of analyses was based on up to 98 data points extracted from . Four P/B methods from  were used, where P methods denoted here correspond to methods recommended by . Production was estimated in two ways: as the sum of positive increments in live and dead tissues (P5), or as the sum of coinciding, positive increments in live and dead tissues (P6). Maximal annual B was used to represent peak B, where peak B was estimated two ways: based on live biomass (B1) or based on (live + standing dead) biomass (B2). Data points for live biomass were more numerous (98) than those for live + standing dead biomass (72).

Fig. 4. Population-level P/B relationships based on OLS and SMA logP/logB regressions. Axis scales necessarily vary among data sets but all are logP/logB. SMA regressions (black lines) and OLS lines (gray) are shown relative to a dashed 1:1 line.

The P5/B1 regressions (Fig. 5A) evaluated P based on live + standing matter as a function of peak live B. Only the moderately predictive (R² = 0.63) SMA logP5/logB1 model was plausible and met both assumptions (Table 1, Appendix). The OLS power law model was most plausible in its subset and fit data well (R² = 0.77) but failed the normality assumption (Table 1). All other models, including a linear P/B model, were relatively implausible, failed assumptions, or both.
Some grassland P/D studies measure B as live + standing dead matter (B2). For cases where live + standing dead P is also measured, the OLS logP5/logB2 model (Fig. 5B) was the only plausible and partially legitimate OLS model (error variance was not normal; Table 1). The SMA logP5/logB2 model was plausible, legitimate, and moderately predictive (R² = 0.58). All other models, including a linear P/B model, were relatively implausible, failed assumptions, or both (Appendix).
The P6/B1 regressions (Fig. 5C) represent P based on live + standing + fallen litter as a function of peak live B only; in principle, this represents the most complete estimate of P with the simplest estimate of B. Among OLS regressions, only the moderately predictive (R² = 0.67) power law model was plausible and met regression assumptions (Table 1, Appendix). The SMA model of logP6/logB1 (R² = 0.54) was plausible and met regression assumptions. Again, other models (including linear P/B) were relatively implausible, failed assumptions, or both (Appendix).
The P6/B2 regressions (Fig. 5D) estimate P based on live + standing + fallen litter from B based on live + standing dead matter. The OLS linear, power law, and logP6/logB2 models were plausible, statistically legitimate, and fairly predictive, as was the SMA logP6/logB2 model (Table 1).
Overall, the use of B as a 1:1 proxy for P was not well supported for grasslands. The OLS linear model was implausible or failed regression assumptions in 3 of 4 cases; for P6/B2 the OLS linear model was plausible and legitimate but the slope was significantly <1. Moreover, B2 data (live + standing dead) are less often used in grassland P/D studies than are B1 data. The meaning of this result for P/D research in grasslands is explored in more detail in Discussion below.

Woody assemblages
This data set comprised 91 P/B data points representing various coniferous and deciduous woody vegetation, including scrub, heath, and forests. None of the referenced citations overlapped with those of Keeling and Phillips (2007; analyzed below). The only OLS model that was plausible and met assumptions was the OLS logP/logB model. Given the strong fit to data (Table 1, Fig. 6A) and assumptions, the logP/logB model could be compared to study-specific models, and it seems little additional variance would be explained by additional covariates.
Similar to the OLS results, only the SMA logP/logB model was plausible and met both regression assumptions (Table 1, Fig. 6A). Based on this result, log10 P could be reliably estimated from log10 B in the absence of independent P data among various woody assemblages, and estimated P values could be used to retrospectively adjust P/D estimates. Little added advantage seems available from a search for additional covariates.

Tropical forest assemblages
Keeling and Phillips (2007) analyzed and listed P and B data from 96 tropical forest study sites, for which they reported traditional increment-based P estimates and standardized estimates to better account for losses of litterfall to decomposition, etc. They used both kinds of data to model B as a function of P (i.e., the inverse of regressions here). I compared the increment and standardized data sets by OLS and SMA regression models for P as a function of B.
Increment-based data.-For the increment-based data, the OLS linear and power law models had similar plausibility (w_i = 0.47 and 0.35, respectively) and met both regression assumptions. Models fit the data scatter rather poorly (both R² = 0.10), but no other model performed better. The OLS linear model (Fig. 6B) had a significant slope (p < 0.01). Of all other models, including SMA, only the OLS logP/B model was also plausible (relative to logP/logB), and was homoscedastic but did not have normal error variance. With the same R² as the linear model, it was less satisfactory. Interestingly, SMA models did not meet model selection criteria.
Standardized data.-Standardization for litter and losses (due to decomposition, volatilization, and herbivory) generally improved model strength but altered model comparison outcomes. With standardizations, the OLS P/logB model had the greatest w_i (0.74) compared to power law and linear models, but was not highly predictive (R² = 0.27). The power law was ~3-fold lower in w_i and did not meet regression assumptions. On the other hand, the OLS logP/logB model was plausible (w_i = 1.0 relative to logP/B) and homoscedastic, but did not have normal error variance. The OLS logP/logB model was moderately predictive (R² = 0.49; Fig. 6B). Again, SMA regressions failed to meet model selection criteria.
In summary, standardization for litter and other losses mattered greatly for P/B regression models of tropical forests by altering the plausibility, strength, and terms of models. The OLS logP/B model of standardized data was potentially useful for estimating P in tropical forests, though it should be used advisedly because fit was less satisfactory than other models evaluated here. Enough variance remained unexplained by even the best models to warrant exploration of covariates to improve models (e.g., those considered by Keeling and Phillips 2007).

Non-tropical forest assemblages
Keeling and Phillips (2007) also included 95 non-tropical (i.e., boreal, temperate) forest study sites in their compiled data set. Losses in P were the same as listed for tropical forests, but excluded those due to litterfall decomposition. As for tropical forests, P estimates used increment-based and standardized data.
Increment-based data.-Three OLS models for increment data were plausible and met assumptions wholly or in part. The OLS P/logB model was almost twice as plausible as the OLS power law model; both met regression assumptions and had similar R² (0.48 vs. 0.47; Appendix). The P/logB model was reported based on its w_i. The OLS logP/logB model was also plausible (w_i = 1.0 relative to logP/B) and homoscedastic, but did not have normal error variance, and strength (R² = 0.39) was less than the P/logB model. Thus, for increment-based data, the P/logB model (Fig. 6C) was preferable as a reference for comparison to OLS regressions in other studies.
SMA regressions of increment-based data paralleled OLS patterns. The SMA P/logB model was plausible (w_i = 1.0 relative to P/B), met both regression assumptions, and had the same R² (0.48) as the OLS model. The SMA logP/logB model was also plausible relative to the logP/B model and homoscedastic, but did not have normal error variance and the R² was a bit lower (0.39).
Standardized data.-Standardization for losses did not greatly improve models for non-tropical forests and choices among models remained similar to those for tropical forests. The OLS P/logB model was again most plausible and met regression assumptions (Appendix). The P/logB model fit data fairly well (R² = 0.40). The OLS logP/logB model was also plausible (w_i = 1.0 relative to logP/B) and homoscedastic, but did not have normal error variance, and its model strength (R² = 0.37) was slightly lower. Thus, for increment-based data, the P/logB model was most plausible and best fit assumptions; investigators should explore multiple OLS models for non-tropical forest P/B relationships, but the OLS P/logB model (Fig. 6D) may be best within P/B studies of non-tropical forests with both B and P data.
Similar to OLS regressions, SMA P/logB and logP/logB models were again plausible relative to their respective competing models (Appendix). This time, both models also met both regression assumptions, though P/logB did so more clearly. The SMA P/logB model (Fig. 6D) fit data slightly better than the SMA logP/logB model. For standardized data, the SMA P/logB model was again slightly better for estimating P when only B data exist.
In summary, standardization increased estimated P values but did not strongly alter P/B regression models of non-tropical forests (compare Fig. 6C and D and regression coefficients in the Appendix). A greater difference exists between OLS and SMA models for each data set. The same model choices rose to the top regardless of data type, and some models based on standardized data had R² values slightly lower than comparable models based on increment data (Appendix). In addition, sufficient variance remains to be explained by covariates to improve models (see Keeling and Phillips 2007).

Soil microbial assemblages
All 24 data values for this set of analyses were obtained from a single study, which included multiple assemblages ranging from fungi, bacteria, and protists, to small metazoans with P and B measured as mg C (Persson et al. 1980). Additional studies would help to use relationships here with more confidence. The only plausible OLS regression to meet regression assumptions was the OLS logP/logB model, which had a slope not different from 1.0 and the highest R² (0.92; Fig. 7A) of any reported here. Similar studies of soil microbes seeking to predict specific P values from B should be able to compare models to this one, though it would be best to validate or modify it with data from other study areas.
In parallel to OLS results, the only plausible SMA regression to meet regression assumptions was the logP/logB model. The slope coefficient was not different from 1.0, though the intercept was different from zero (Fig. 7A). Based on this result, log10 P could be reliably estimated from log10 B in the absence of independent P data, and estimated P values could be used to retrospectively adjust P/D estimates for studies of soil microbe assemblages. Again, the veracity of this model would be best tested with more study sites.

Herbivorous zooplankton assemblages
This data set included 33 data points compiled by Morgan et al. (1980) and analyzed apart from carnivorous zooplankton (below); the two were expressed as energetic units (J·m⁻² and J·m⁻²·y⁻¹) rather than dry mass like other studies analyzed here. The OLS power law model was plausible but failed to meet both regression assumptions. In contrast, the OLS logP/logB model was plausible (w_i = 1.0 relative to logP/B), met both regression assumptions, and had a strong R² (0.83; Fig. 7B). The logP/logB model should serve well as a reference for similar P/B regressions in P/D studies that include herbivorous zooplankton. Also, relatively little can be gained from exploring additional covariates.
In parallel to OLS results, only the SMA logP/logB model was plausible (w_i = 1.0 relative to logP/B), met both regression assumptions, and had the same strong R² (0.83; Fig. 7B). The intercept differed from zero and the slope differed from 1, based on CIs, but the model should be predictive of P when B is available, keeping log-transforms in mind and the fact that units are Joules. Again, covariates are unlikely to improve the model greatly, given the already high R².

Fig. 7. Heterotrophic assemblage P/B relationships. Note axes are log scaled but necessarily vary in extent and units among data sets. (A) Soil microbes P (mg C·m⁻²·y⁻¹) and B (mg C·m⁻²), data from Persson et al. (1980). (B) Herbivorous zooplankton P (J·m⁻²·y⁻¹) and B (J·m⁻²), data from Morgan et al. (1980). (C) Carnivorous zooplankton P (J·m⁻²·y⁻¹) and B (J·m⁻²), data from Morgan et al. (1980); no plausible, legitimate regressions were obtained. OLS (gray) and SMA (black) regressions, and the 1:1 ratio (dashed) are shown.

Carnivorous zooplankton assemblages
This data set included 26 data points compiled by Morgan et al. (1980), also in units of Joules. Interestingly, all OLS regressions were similarly plausible within their subsets and homoscedastic, but none had a significant slope (Fig. 7C). Therefore, none of the OLS models are reported for carnivorous zooplankton. The greatest R² among OLS models was 0.10.
The SMA models served only a little better. Only the SMA logP/logB model was plausible (w_i = 1.0 relative to logP/B) and met both regression assumptions, but its R² was also low (0.10). There is clearly much room for improvement in the modeling of P/B for carnivorous zooplankton. More data sets and additional covariates may help make these models more predictive; as they stand now, they are not recommended.

Terrestrial ecosystems, by type
This data set included phytomass and aboveground net production values for 106 ecosystem types on land (e.g., tropical arid, boreal humid and subhumid, etc.), compiled by Rodin et al. (1975). Among OLS models, none met regression assumptions except the logP/logB model, which was plausible, met both regression assumptions, and was quite strong (Table 1, Fig. 8A). This model should serve as a reference for other comparisons among ecosystems; for example, it may be compared to efforts to better account for litter losses and underground biomass production.
Likewise among SMA models, only the logP/logB model met regression assumptions and was plausible and strong (Table 1). Though room for improvement exists by the use of covariates (e.g., latitude), this model should help predict P among ecosystems for which only B is available. Moreover, it should serve as a point of reference for estimates of P based on entirely different methods (e.g., satellite imagery, evapotranspiration models).

Fig. 8. Ecosystem and biome P/B relationships, where ANPP represents aboveground net primary production and B represents aboveground phytomass. Note axes are log scaled and share units but necessarily vary in extent among data sets. (A) Ecosystems by type (Rodin et al. 1975). (B) Terrestrial biomes (Olson 1975). (C) Global biomes (Whittaker and Likens 1975), where filled circles = aquatic biomes; open circles = terrestrial biomes. OLS (gray) and SMA (black) regressions, and the 1:1 ratio (dashed) are shown.

Olson (1975) listed NPP and B for 36 terrestrial biomes (e.g., northern taiga, warm wetland swamp) with values estimated for pre-agricultural conditions. Only the OLS logP/logB model was plausible and met both regression assumptions (Table 1, Fig. 8B). The R² = 0.66 exceeded that of other models (Appendix). Efforts to improve P estimates based on covariates and litter losses may improve the model.
Among SMA models, only the logP/logB model met a regression assumption (homoscedasticity; error variance was non-normal) and was plausible (Fig. 8B). As above, improvements are possible, but the model describes a fairly well-fit relationship between terrestrial biome B and P.

Global biomes
This list of P and B for 19 global biomes has been reproduced in many ecology texts and included marine ecosystems (Whittaker and Likens 1975). Analyzed values were reported means for P and B, and data were divided into aquatic (N = 7) and terrestrial (N = 12) sets (Fig. 8C). Though analyzed, SMA-based models are less essential than for other systems because extrapolation to other global biomes is moot. The only model (OLS or SMA) for terrestrial biomes that was plausible and met both regression assumptions was the OLS power law (Table 1, Fig. 8C).
Results for global aquatic biomes differed from those for terrestrial biomes (Table 1, Appendix). The most defensible OLS model was the logP/logB model, which was plausible, statistically legitimate, and strong (Fig. 8C, Table 1). Among SMA models, both the P/logB and logP/logB models were plausible, statistically legitimate, and strong.

Conclusion
Overall, logP/logB models were most often the most plausible, statistically defensible, and strong models evaluated (62% of OLS models, 89% of SMA models in Table 1). Even among those potentially symmetric logP/logB models, slopes were often not ~1 (30% of OLS, 41% of SMA); a 1:1 relationship is far less obvious in other models (e.g., P/logB). Models that were most plausible and statistically defensible often obtained fairly strong fits to data, indicating many models recommended here (Table 1) represent P/B patterns well.

DISCUSSION
Much research on P/D relationships has taken a risk when it assumed, without transformations and without plausible and statistically legitimate regression modeling, that B is an adequate 1:1 indicator of P. As a result, much of P/D research has too often evaluated a B/D relationship and interpreted it as a P/D relationship (Fig. 1), a practice that likely contributed to uncertainty in P/D relationships (Mittelbach et al. 2001). Please note that the mere presence of a P/B correlation is not disputed here. The goal here was to find more plausible, statistically legitimate, and predictive relationships where possible, so that B may go beyond a questionable indicator to become a reliable estimator of P. To the extent that this goal was achieved, P/D research is strengthened, and some confusion and debate may be abated.
Plausible, legitimate, and predictive models do exist for the P/B relationship among diverse taxa and levels of ecological organization. Those models are often in the form of a logP/logB relationship, though not always. Depending on the taxon, organizational level, and methods of data collection, B may now be used to more reliably estimate P and better evaluate a potential P/D relationship. Two general regression models were applied here and address two different goals. Ordinary least squares (OLS) models are best for interpolating specific P values within a study where both B and P data have already been collected, and where P is needed in other samples with B and D data. That scenario is a relatively rare event in P/D research, but could be a way forward with careful experimental planning and execution. More commonly, B data alone are collected, but P is the variable of interest. For those many cases, a standardized major axis (SMA) regression model of other similar study systems may offer an appropriate best fit between P and B.
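The use of a logP/logB equation to estimate P when only B is available can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical and the intercept and slope shown are placeholders, not coefficients from Table 1 or the Appendix, which should be substituted for the relevant taxon or system.

```python
import math


def estimate_p_from_b(biomass, intercept, slope):
    """Estimate P on its original scale from B, assuming a fitted
    relationship of the form log10 P = intercept + slope * log10 B."""
    if biomass <= 0:
        raise ValueError("biomass must be positive to take log10")
    return 10 ** (intercept + slope * math.log10(biomass))


# Placeholder coefficients for illustration only (not from this study).
example_intercept = 0.5
example_slope = 0.8

for b in (10.0, 100.0, 1000.0):
    p = estimate_p_from_b(b, example_intercept, example_slope)
    print(f"B = {b:7.1f} -> estimated P = {p:7.1f}")

# With intercept 0 and slope 1 the equation reduces to the 1:1 proxy.
print(estimate_p_from_b(100.0, 0.0, 1.0))
```

Whenever the intercept differs from zero or the slope differs from 1, estimated P departs from B by an amount that changes with B, which is why substituting B directly for P can distort a P/D relationship.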
Each investigator must consider the relative risks of assuming their study system is similar to others in a SMA regression. In the absence of P measurements, does one simply assume B is a 1:1 proxy for P, or does one use a SMA-based estimate of P? Results here do not indicate that the 1:1 proxy will often serve well to indicate P. But before using an SMA-based model, one should consider the plausibility, legitimacy, and strength of the model, as well as how well a study system is represented by other systems used in a model listed here. Also, the decision should not rest on hoping that analyses may yield a greater P response to D treatments or sampled conditions. The use of SMA models presented here may affect P/D research in three ways: (1) effects of D on P are greater than thought, because the assumed 1:1 P/B relationship was an underestimate; (2) no real difference exists relative to a 1:1 P/B assumption; or (3) D affects P less than already thought, because the assumed 1:1 P/B relationship was an overestimate. As an example of how better estimation of P may affect studies, I analyzed two experimental data sets important to P/D research. In both studies, the treatment variable was species richness and the response variable was B, used as a proxy for P.

Example 1: Cedar Creek
Readers who have made it this far are surely familiar with the experiment conducted by Dr. Tilman and colleagues at the Cedar Creek Natural History Area (e.g., Tilman et al. 2001), in which plots were inoculated with a range of species (range = 1-16), and later followed in detail (1996-2001). Data from the E120 experiment are provided at http://www.cedarcreek.umn.edu/research/data.html. The analysis here does not duplicate or replace the work by Dr. Tilman and colleagues. Instead, it merely uses data they kindly make available to evaluate the importance of P/B relationships presented here. The treatment factor analyzed here was the number of planted species (SpNum). The response variable was peak aboveground living biomass (g·m⁻²), used by Tilman et al. (2001) as a proxy for P. I analyzed the repeated measures data using a generalized linear mixed model (glmmadmb package in R [Fournier et al. 2012]), with Gaussian distributions assumed and without zero inflation, where study year and soil nitrogen were random factors.
Aboveground biomass was highly significantly affected (p < 0.001) by SpNum, as reported by Tilman et al. (2001). The coefficient for the effect of SpNum on aboveground biomass was 10.24, meaning that (on average) every new species planted caused 10.24 additional g·m⁻² biomass, separate from interannual variation and soil N effects.
Aboveground live biomass was converted to estimated P using the grasslands P5/B1 SMA regression equation (Fig. 5A, Appendix, described above). The generalized linear mixed model was repeated, but with estimated P now as the response variable. The model was again highly significant (p < 0.001); fundamental conclusions by Tilman et al. (2001) are not disputed here. However, the coefficient for the effect of SpNum on estimated P was now 12.36, meaning the effect of planted species on production was underestimated by 17% when biomass was the proxy response. To put it another way, the response of P to SpNum was 20.7% greater than originally estimated using B.
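The two percentages above describe the same change measured against different baselines. A minimal Python sketch of that arithmetic follows; it uses only the two SpNum coefficients reported in the text, and the function name is illustrative rather than part of the original analysis.

```python
def proxy_bias(coef_b, coef_p):
    """Compare a treatment coefficient fitted to biomass (coef_b)
    with the one fitted to estimated production (coef_p).

    Returns two percentages:
      - the change in response relative to the B-based coefficient
      - the under- (or over-, if negative) estimate relative to the
        P-based coefficient
    """
    diff = coef_p - coef_b
    pct_vs_b = 100.0 * diff / coef_b  # P response vs. the B-based estimate
    pct_vs_p = 100.0 * diff / coef_p  # shortfall of the B proxy vs. the P response
    return pct_vs_b, pct_vs_p


# Cedar Creek SpNum coefficients reported in the text
greater, under = proxy_bias(10.24, 12.36)
print(f"{greater:.1f}% greater; underestimated by {under:.1f}%")
```

The first value reproduces the 20.7% figure and the second rounds to the 17% figure.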

Example 2: BIODEPTH
A related, even larger experiment was conducted across Europe by a large team (Hector et al. 1999, Spehn et al. 2005). Plots at eight sites were seeded with a range of species richness (SR; minimum = 1, maxima ranged 8-32 among sites). The composition and functional groups of seeded plants were also important. Aboveground biomass (live and standing dead) was sampled once (5 sites) or twice (3 sites) each of three years, and plots were mowed after sampling. As before, this B was a proxy for P. My goal here was to compare two models that reasonably represented the more complex and detailed analyses of Spehn et al. (2005): one that evaluated aboveground B as a response and the other that evaluated estimated P as a response. Data kindly reported in the Supplement (Ecological Archives M075-001-S1) to Spehn et al. (2005) were analyzed here.
A generalized linear mixed model was used (as described for the Cedar Creek data), where B was a function of: SR, functional richness (FR; range = 1-3), sites, blocks, and study year (as a random factor). Modeled factors were selected to mimic analyses of Spehn et al. (2005), and consistent with their work, sites significantly and strongly affected results (p ≤ 0.001). Beyond site effects, B was also significantly affected by SR and FR (p < 0.001 in both cases). With this model, the coefficient for the simple B/SR relationship was 6.34, meaning that (on average) every species planted caused 6.34 additional g·m⁻² biomass, apart from other factors. The matching coefficient for FR was 84.6, meaning that adding one functional group (often composed of multiple species) had a large effect on B.
Given reported data, the SMA P5/B2 model using logP/logB (Fig. 5B, Appendix, described above) was most appropriate, plausible and legitimate to estimate P. After substituting estimated P for B in the model above, sites remained significant and important (p ≤ 0.001; same pattern among coefficients), as did both SR and FR (p ≤ 0.001). But the SR coefficient was now 5.04 and the FR coefficient was now 67.78. Again, basic conclusions were not altered, but the effects (coefficients) in the model were now reduced by about 20%, meaning that effects had been overestimated using B as a proxy for P. To put it another way, the response of P to planted SR was 20.5% less than originally estimated using B.

Comparing Cedar Creek to BIODEPTH
Regression-based P estimation yielded opposite effects for the Cedar Creek and BIODEPTH experiments. Both SMA regressions used for these data sets have slopes greater than the 1:1 slope that is assumed when B is used as a proxy for P (Fig. 5A, B). However, the elevations of the SMA lines relative to the 1:1 line also mattered. The model used for Cedar Creek (Fig. 5A) crosses over the 1:1 line, while the model used for BIODEPTH (Fig. 5B) is below the 1:1 line in the data range. Thus, conversions of B to P for BIODEPTH downgraded values to reduce estimated P relative to the 1:1 assumption, while Cedar Creek tended to be upgraded in estimated P. The relative position of a study's data in the appropriate data cloud (Fig. 5) will affect the B-to-P conversion.
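The role of elevation can be illustrated with a short Python sketch. It classifies a log-log line relative to the 1:1 line over an observed data range. The intercepts, slope, and data range below are hypothetical values chosen only for illustration; they are not the fitted coefficients of Fig. 5.

```python
def position_vs_one_to_one(intercept, slope, log_b_min, log_b_max):
    """Classify the line log10(P) = intercept + slope * log10(B) relative to
    the 1:1 line log10(P) = log10(B) over an observed log10(B) range."""
    # Vertical offset from the 1:1 line at each end of the data range;
    # the offset is linear in log10(B), so the end points are sufficient.
    d_min = intercept + (slope - 1.0) * log_b_min
    d_max = intercept + (slope - 1.0) * log_b_max
    if d_min > 0 and d_max > 0:
        return "above"    # estimated P exceeds B across the range
    if d_min < 0 and d_max < 0:
        return "below"    # estimated P is less than B across the range
    return "crosses"      # direction of the conversion depends on B


# Hypothetical lines with the same slope (> 1) but different elevations
print(position_vs_one_to_one(-0.3, 1.2, 1.0, 3.0))  # crosses
print(position_vs_one_to_one(-0.8, 1.2, 1.0, 3.0))  # below
```

Two lines with an identical slope greater than 1 can therefore shift estimated P in opposite directions, depending on elevation and on where the data fall along the B axis.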
Please note the conclusions of the Cedar Creek and BIODEPTH studies (Hector et al. 1999, Spehn et al. 2005) were modified but not contradicted by analyses here, as should be the case for well-conducted studies. The intended role of analyses here is simply to help P/D research turn from indication to estimation of P. By doing so, we may more confidently infer effects of D on P and help resolve general P/D patterns among studies, including but not limited to grasslands experiments.

Grasslands relative to other systems
A quick inspection of Figs. 4-8 indicates that estimation of P from B in other systems will often make a greater difference than it did in the two grassland data sets, in that other SMA regressions are further displaced and/or have slopes much different from the 1:1 line. For example, best fits to the P/B data for woody plant species (Fig. 4A) and assemblages (Fig. 6A) are generally displaced below the 1:1 line. The consequent reduction in P estimates is especially pronounced for the tropical and non-tropical forest data compiled by Keeling and Phillips (2007), where the 1:1 line barely appears within the plots (Fig. 6B-D). This makes sense, given woody plants accrue biomass over multiple years, so that B should exceed annual P.
Wetland plants were too varied and regressions too weak to treat them as strongly predictive (Fig. 4B). A useful research direction would be to compile more independent P and B data and develop better P/B models for wetland vegetation, given the importance of wetlands in landscapes for biological diversity and productivity (Mitsch and Gosselink 2011).
Among heterotrophs (Figs. 4C-F, 7), production is generally estimated by methods based on B increments or metabolic activity (microbes), without risk of assuming a peak B indicates P. There is little basis to expect certain slopes or intercepts among these coarse taxa, nor was this the intent of analyses here. However, model terms varied among taxa and represent macroecological differences among varied phylogenies and life history strategies represented here. Given that the analyses of species-level data (Fig. 4) essentially represent an allometry of populations, it may be useful to combine phylogenetic constraints with allometric models (Rall et al. 2011) to further resolve patterns. For example, the insect data set (911 entries) encompasses 6 orders of magnitude in P, with variation around regression lines ~2 orders of magnitude. Phylogeny (e.g., Misof et al. 2014) likely accounts for some of that 100-fold variation. At even greater phylogenetic breadth, it would be interesting to evaluate phylogenetic bases for variation among taxa in the allometric scaling of population biomass production.
Across the diverse systems and organizational levels examined here for P/B relationships, there is no evidence to suggest B should generally serve as a simple 1:1 proxy for P. Specifically, SMA logP/logB relationships reported here as plausible, legitimate, and reasonably predictive (R² > 0.50) varied substantially. Seven of 11 (64%) had slopes different from a 1:1 line in log-log space, and the intercept term varied by ~1 order of magnitude and also contributed to variation among the relationships (Lomolino 1989). Essentially, a focus only on a P/B ratio draws attention to the slope alone, whereas a full regression equation, including intercept terms, is required for logP/logB models. One should also remember that a slope different from 1 in log-log space can mean something quite different in linear space, depending on coefficients.
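To make the last point concrete, the sketch below back-transforms a log-log model, log10(P) = a + b·log10(B), to linear units, P = 10^a · B^b, and reports the implied P/B ratio at several biomass values. The coefficients (a = -0.5, b = 1.2) and biomass values are hypothetical and serve only to illustrate the algebra.

```python
def production_from_biomass(biomass, intercept, slope):
    """Back-transform log10(P) = intercept + slope * log10(B)
    to linear units: P = 10**intercept * B**slope."""
    return (10.0 ** intercept) * (biomass ** slope)


# Hypothetical coefficients, for illustration only
a, b = -0.5, 1.2
for biomass in (10.0, 100.0, 1000.0):
    p = production_from_biomass(biomass, a, b)
    # The implied P/B ratio equals 10**a * B**(b - 1), so it changes with B
    print(biomass, round(p, 1), round(p / biomass, 2))
```

With these values the implied P/B ratio rises from about 0.5 to about 1.26 across two orders of magnitude in B, so no single ratio summarizes the relationship; both the intercept and the slope are needed.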

Going forward
Research on P/D relationships will benefit from better estimation of P, which represents one important ecosystem property in the BEF context and directly relates to SRPR (e.g., Mittelbach et al. 2001, Whittaker and Heegaard 2003, Hooper et al. 2005, Cardinale et al. 2011). The OLS and SMA modeling approach and model selection applied here should apply to other ecosystem properties as well (Hooper et al. 2005). If so, relationships between ecological structure and function (O'Neill et al. 1986) may become clearer, to the benefit of ecology and a scientific basis for informed policy.
Earlier research seeking predictive P/B ratios found that ratios varied widely and scaled with adult body mass (e.g., Coupland 1979, Banse and Mosher 1980). The few P/B studies that used logP/logB transforms (e.g., Webb et al. 1983, Duarte 1989, Downing and Plante 1993) were on the right track but were rarely followed. Fortunately, many past P/B studies now provide a substantial basis to better estimate P from B. Model selection (Burnham and Anderson 2002) is a recent advance relative to most P/B research; it has not been applied to P/B models until this work but should be a common basis in future research on the effects of biodiversity on ecosystem functioning. Likewise, SMA regression is common in allometric scaling (Warton et al. 2006) and should be expected in future research on allometric relationships in ecosystem functioning. In future P/D (or more general BEF) research, I recommend the following:
1. Optimally, estimate P directly by well-established empirical methods (e.g., Newbould 1967, Vollenweider et al. 1974, Downing and Rigler 1984) and relate those P estimates to sampled D. Results here do not replace that fundamental approach or suggest it may be avoided.
2. If both P and B data in some study plots are available and B and D data are available in matched plots, obtain a P/B relationship with OLS regression and use that model to estimate P in the matched plots. Then relate estimated P to sampled D.
3. If B and D data are available (but not P), use SMA regressions here (or generated with other data) to estimate P from B, and use estimated P to relate to sampled D.
4. If other ecosystem properties are to be estimated from B and then related to sampled D, do not assume a 1:1 indication of a property by B. Instead, use the general approach here to: (a) avoid confounding a structural measure with a functional one; (b) use empirical evidence rather than a simplistic assumption; (c) include standardized major axis regression to best represent the bivariate pattern; (d) use information theoretic model selection to identify the most plausible models among possible choices; and (e) identify models that best comply with regression assumptions.
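Recommendations 2 and 3 can be sketched in a few lines of Python. The sketch assumes the standard closed-form estimators: the OLS slope is cov(x, y)/var(x), and the SMA slope is sign(r)·sd(y)/sd(x). It is not the code used for the analyses reported here, the data are invented for illustration, and it omits the model selection and tests of regression assumptions that the general approach also requires.

```python
import math


def _sums(x, y):
    """Return means and centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_fit(x, y):
    """Ordinary least squares fit of y on x: returns (intercept, slope)."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


def sma_fit(x, y):
    """Standardized major axis fit: slope = sign(r) * sd(y) / sd(x)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return my - slope * mx, slope


def estimate_p(biomass, intercept, slope):
    """Estimate P from B with a fitted log10(P) ~ log10(B) model."""
    return 10.0 ** (intercept + slope * math.log10(biomass))


# Invented log10-transformed data, for illustration only
log_b = [1.0, 1.5, 2.0, 2.5, 3.0]
log_p = [0.6, 1.3, 1.7, 2.5, 2.9]

ols_a, ols_b = ols_fit(log_b, log_p)
sma_a, sma_b = sma_fit(log_b, log_p)
print("OLS:", ols_a, ols_b)
print("SMA:", sma_a, sma_b)
print("Estimated P at B = 200:", estimate_p(200.0, sma_a, sma_b))
```

Because the SMA slope equals the OLS slope divided by |r|, it is never shallower than the OLS slope; the two converge as the correlation strengthens.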

ACKNOWLEDGMENTS
This work would not be possible without the many who collected production and biomass data through many decades in many systems; their careful work remains relevant and a model for future research. I thank anonymous reviewers for comments that helped improve the manuscript, Dr. Nelson Ying for his ongoing support via the James & Annie Ying Eminent Scholarship in Biology, and the University of Central Florida for sabbatical support.
Table A1. Detailed results of P:B regression analyses by level and data set. OLS and SMA generate identical R² values. Only the OLS power law (P = aB^b) is nonlinear. In columns, Y = yes, N = no. Entries generated advisedly (see footnotes) are flagged with brackets (e.g., [Y]). A slope not different from 1 indicates that B can act as a simple proxy for P, given transformations. OLS power law and SMA slopes were evaluated as significantly different from zero with 95% confidence intervals (p = 0.05). Model selection was conducted for subsets as follows: the first subset contained the first three models of each group; the following three subsets each contained the next two models in the set, e.g., for terrestrial plants, the first subset contained the OLS power law, OLS logB, and OLS linear models; the second subset contained OLS log-log and OLS logP models; the third subset contained SMA logB and SMA linear models; the fourth subset contained SMA log-log and SMA logP models. Regression models are listed only for the most plausible models of a subset (evidence ratio [w_i/w_j] ≤ 2) if models also have approximately homoscedastic error and a slope significantly different from zero. *p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001.
… (Motulsky and Ransnas 1987). Slope significance here relates to coefficient b in P = aB^b, though a also contributes to curve shape (Lomolino 1989).

‡ Despite apparent homoscedasticity according to a Breusch-Pagan test or the equivalent regression of squared residuals as a function of B (or log10 B, as appropriate), visual examination of a residuals plot indicated strong data structure undetected by the test (e.g., curvilinearity of residuals that indicates nonlinearity but has a flat linear regression slope through the points). This was most likely with log-transformation of one axis.
§ Despite a significant Breusch-Pagan test (p ≤ 0.05) or equivalent regression of squared residuals, visual examination of a residuals plot indicated little data structure. Results were interpreted as indicating no more than modest violation of homoscedasticity.
¶ Despite a significant Shapiro-Wilk statistic (p ≤ 0.05), a histogram of residuals was visually similar to a normal distribution, and relatively few points deviated from a QQ normal line or did so only to a limited extent. Results were interpreted as indicating no more than modest violation of normal error distribution.
# Despite a nonsignificant Shapiro-Wilk statistic (p ≥ 0.05), a histogram of residuals was visually different from a normal distribution, and points deviated substantially from a QQ normal line. Results were interpreted as indicating violation of normal error distribution that was not detected (often due to relatively low N).