Beyond bivariate correlations : three-block partial least squares illustrated with vegetation , soil , and topography

Ecologists, particularly those engaged in biogeomorphic studies, often seek to connect data from three or more domains. Using three-block partial least squares regression, we present a procedure to quantify and define bi-variance and tri-variance of data blocks related to plant communities, their soil parameters, and topography. Bi-variance indicates the total amount of covariation between these three domains taken in pairs, whereas tri-variance refers to the common variance shared by all domains. We characterized relationships among three domains (plant communities, soil properties, topography) for a salt marsh, four coastal dunes, and two temperate forests spanning several regions in the world. We defined the specific biand tri-variances for the ecological systems we included in this study and addressed larger questions about how these variances scale with each other looking at generalities across systems. We show that a system tends to exhibit high bi-variance and tri-variance (tight coupling among domains) when subjected to the effects of frequent and widespread (i.e., broadly acting) hydrogeomorphic disturbance. When major disturbance events are uncommon, bi-variance and tri-variance decrease, because the formation of vegetation, soil, and topographic patterns is primarily localized, and the couplings of these properties diverge over space, contingent upon site-specific disturbance history and/or fine-grained environmental heterogeneity. We also demonstrate that the bi-variance and tri-variance of a whole system are not consistently either greater or smaller than those of the associated sub-zones. This point implies that the overall correlation structure among vegetation, soil, and topography is conserved across spatial scales. This paper addresses a critical aspect of ecology: the conceptual and analytical integration of data across multiple domains. By example, we show that bi-variances and tri-variances provide useful insight into how the strength of couplings among vegetation, soil, and topography data blocks varies across scales and disturbance regimes. Though we describe the simplest case of multi-variance beyond the usual two-block linear statistical model, this approach can be extended to any number of data domains, making integration tractable and more supportive of holistic inferences.


INTRODUCTION
The spatial distribution and temporal dynamics of vegetation are often correlated with a set of known environmental conditions, including those associated with soil and topography (e.g., Cowles 1899, Boerner 2006, Kim and Yu 2009. In floodplains, for example, the role of topography is often manifested through its effects on flood regime, resulting in communities that are distributed along gradients of flooding, geomorphology, and soils (e.g., Sharitz and Mitsch 1993). Conditions related to soils and topography are then used as independent factors to predict the behavior of individual plant species, populations, and communities measured as changing abundance, biomass, or composition. Over decades, this approach facilitated considerable progress in understanding the mode of vegetation-soil and vegetation-topography couplings in upland areas and vegetation-flooding couplings in floodplain settings. The present study aims to stretch the discipline's perspective beyond such bivariate relationships by estimating the variance simultaneously shared by vegetation, soil, and topography in various ecological systems.
Quantifying relationships between variables has a long history in scientific research (Galton 1888, Pearson 1920, Wright 1921, Kendall 1938, Fisher 1954. Conventional multivariate regression approaches, while appearing complex, are essentially based on quantifying the covariation between a response (i.e., dependent) variable of interest and each of several predictor (i.e., independent) variables. The amount of this covariation is hereafter termed bi-variance. The concept of bi-variance is widespread and deeprooted in modern science; to wit, researchers often contort their analyses into bi-variance problems even when the intuitive structure or fundamental nature of the data involves three or more fundamental domains.
This research focuses on the simplest case beyond bi-variance in which the relationships among three data blocks are considered; thus, the amount of the common variance shared by all of them will be called tri-variance (i.e., d in Fig. 1). Uni-variance will indicate the variance in a data block that is unrelated to any other data block under consideration (i.e., the ''error'' in standard parametric statistics; a, e, and g in Fig. 1).
Conceivably, covariance relationships among blocks may be structured in many ways. At one extreme, any two of the three factors are related to each other to some degree, but there is no variance shared by all of them simultaneously ( Fig. 2A). Put another way, the interaction between Factors A and B occurs exclusively, regardless of the other interactions between A and C or between B and C. Therefore, knowing how A and B covary does not provide any insight into the A-C or B-C correlation (and vice versa). In the second case, covariance between any two factors and among all factors exists (Fig.  2B). Here, from a certain type of interaction, e.g., between Factors A and B, one can infer to some extent the pattern of the other interactions between A and C or between B and C. The reliance of such inference should improve as trivariance relative to bi-variance increases, and the reliance should become maximized when the magnitude of bi-variance and tri-variance is the same, another extreme case (Fig. 2C). Overall, these three situations can be commonly summarized as where ''other factors'' refer to chance and latent (unmeasured causal) influences. As these equa- v www.esajournals.org tions imply, any one of Factors A, B, and C is apparently related to the other two; however, the equations deliver no more than such bivariate information, as if the causal/correlative scenario were structured only as in Fig. 2A. Whether and how much the covariation among the three exists cannot be determined until one actually estimates the tri-variance.
As bi-variance indicates how closely two data blocks are related to each other, tri-variance is a proxy for the level of covariance among all three blocks. For example, many coastal dunes are located where the effects of overwash disturbance are not manifested (i.e., tide-dominated coasts; Stallins and Parker 2003). Under the absence of widespread control over the dunefield, the formation of vegetation, soil, and topographic patterns is primarily localized; therefore, their relationships may diverge over space. As a consequence, locations having similar topographic conditions may exhibit greatly different plant species composition or successional pathways, contingent upon site-specific variations in disturbance history or soil microsites. In such a case, the amount of the variance shared by any two factors and, especially, by all three is likely low over larger areas. In contrast, other dunes are highly subjected to periodic or episodic overwash events that influence the mortality and establishment of plant species, soil development, and surface morphodynamics (i.e., wave-dominated coasts; Miller et al. 2010, Young et al. 2011). Under such widespread hydrogeomorphic control across the dunefield, vegetation, soil, and topography likely behave in unity over space and time, namely, their relationships may be more consistent and less noisy than the case above. Accordingly, the associated tri-variance as well as the bi-variance is expected to be high over larger spatial areas.
The multi-block data methods we describe below allow inquiry into the specific nature and scaling pattern of bi-variance and tri-variance across systems with different biogeomorphic conditions. In this paper, we develop three-block partial least squares regression (3-B PLS) for field data on vegetation, soil, and topography acquired from widely divergent ecological systems (coastal dunes, salt marshes, regenerating floodplain forests, and upland temperate forests) in various regions of the world. Each of these systems encompasses a gradient of topo-edaphic conditions, and is considered to be spatially heterogeneous in terms of ecological and physical processes. Therefore, we will adopt a twostep approach in which we first characterize the vegetation-soil-topography relationships throughout a whole system, and then decompose it into sub-units. In the case of coastal marshes, for example, we will examine patterns for the entire marsh first, and then subdivide the system into creek bars, bank levees, and interior marsh platforms to perform separate investigations on the same relationships within each of these subzones. Specifically, we intend to evaluate the v www.esajournals.org following three hypotheses: Hypothesis 1 (H 1 ): Systems under widespread (i.e., broadly acting) environmental controls have higher bi-variance and tri-variance than those characterized by localized interactive biotic-abiotic processes. This hypothesis is directly related to the example of two contrasting coastal dune types mentioned above. We will evaluate this idea in the context of vegetation-soil-topography couplings.
Hypothesis 2 (H 2 ): Separate analyses of sub-zones produce consistently smaller or greater bi-variance and tri-variance than analysis of a whole system. Scale-dependence is a fundamental notion in ecology (Levin 1992, Stohlgren et al. 1998, Turner 2010. It is widely known that distinctive ecological patterns and processes are dominant at different spatial scales. Thus, modifying scale, be it the size of a unit plot or the size of a whole study area, would involve a change in the correlation structure among biotic and abiotic components . In sum, the results of ecological analyses and the associated inter-pretations are strongly dependent on the spatial scales in which a study is focused. For instance, small-scale studies often demonstrate a negative relationship between native and exotic plant species diversity (Tilman 1997, Stachowicz et al. 1999, Brown and Peet 2003, whereas at large scales the opposite relationship has been discovered (Lonsdale 1999, Stohlgren et al. 2003, Davies et al. 2005). In the same context, the strength of vegetation-soil-topography relationships should vary depending on the scales of data collection. However, there is no consensus as to whether such strength will increase or decrease after dividing a heterogeneous whole system into a number of homogeneous sub-zones. Alternatively, there might be no consistently increasing or decreasing strength after the division; that is, the relationships may become weaker in some subzones and stronger in the others. In this case, the bi-variance and tri-variance of the whole system represent more or less the average strength of the biotic-abiotic couplings observed in the subzones (see Fig. 3). Fig. 3. Comparison of analyses for a whole system and for its sub-zones. The factors of interest are vegetation, soil, and topography. The whole system encompasses a range of environmental gradients (e.g., landforms, disturbance regimes, etc.), so it is considered spatially heterogeneous with regard to ecological and physical processes. The sub-zones are relatively homogeneous, however. Each dot represents the location where vegetation-soil-topography relationships are examined.
v www.esajournals.org Hypothesis 3 (H 3 ): Bi-variance is a good indicator of tri-variance. This hypothesis is somewhat related to why there has been no explicit estimation and use of tri-variance in related literature. Is this because bi-variance would be a sufficient measure to provide enough detailed insight into vegetation-soil-topography relationships in any systems (i.e., do we ecologists not even have to bother with the concept of trivariance)? We expect that, in general, there will be a positive correlation between the values of bivariance and tri-variance: the greater the amount of the total variance shared by any two of vegetation, soil, and topography in pairs, the greater the amount of the variance shared by all of them simultaneously. However, the significance (or variability) of the correlation is a different issue; one should not ignore a wide range of possible patterns in vegetation-soiltopography relationships as briefly described in Fig. 2. It is plausible that two distinct ecological systems, possessing the same magnitude of bivariance, may have different tri-variances. Also, one can imagine even a case of negative relation: the bi-variance of system A is smaller than that of system B, but A has a higher tri-variance than B.

3-B PLS
Vegetation, soil, and topography are multivariate in nature: most vegetation data sets are composed of multiple l plant species, the abundances or biomass of which are measured in a certain number of plots. In these same plots, one can collect and analyze a set of soil data consisting of m physical and chemical properties, as well as topographic data of n landform attributes. For each data domain, there are relationships among variables both within and across data blocks. In this paper, use of community, soil, and topographic data results in three data blocks. Taken together, these blocks produce l 3 m 3 n possible bivariate cases of vegetationsoil-topography relationships. An attempt to understand each of these relationships will be considered inefficient if several variables within each block are significantly correlated with one another-the problem of multicollinearity (Farrar andGlauber 1967, Graham 2003).
In such a situation, the most widely employed solution has been the use of ordination, a family of multivariate data reduction techniques that extract one or a few axes from the original variables. The data points (e.g., plots, species) are then ordered to aid interpretations of ecological and environmental patterns (Pielou 1984, ter Braak and Prentice 1988, Palmer 1993, Austin 2013. For example, canonical correspondence analysis (CCA) can directly and immediately relate plant species composition to measured environmental variables, along a few major axes (ter Braak 1986). Although CCA has been widely used in ecology (Lepš andŠmilauer 2003), it cannot actually help to achieve the goal of the present research: understanding the tri-variance among vegetation, soil, and topography. This is because, in CCA (and in similar methods), all soil and topography variables are lumped together as environmental variables (i.e., predictors) and the community data are treated as dependent variables; but of course, many taxa also change the environment around them, making covariance a very fluid concept that can propagate in multiple directions among data domains.
As a possible approach (Fig. 4), one may perform a principal component analysis (PCA) based on the correlation matrix produced from a single block. Among the resulting eigenvectors composed of axis scores for each plot, those axes with eigenvalues greater than 1 (assuming standardized data) are considered significant and selected as the principal components of that block. The same procedure is repeated for the remaining other blocks. Then, the components extracted from and representing each block are correlated to understand the patterns of vegetation-soil-topography relationships (e.g., Kim et al. 2012).
This PCA-based approach has one clear limitation, however: the extraction of components in a block is conducted without taking into account the covariation structure of the other blocks (see Carrascal et al. 2009 for more details). In other words, the extracted components maximize the correlation among variables within a block, but these components do not reflect the maximization of the correlation across blocks. As a consequence, in many cases, the principal components from an environmental block and those from a vegetation block exhibit low and non-significant correlations, hindering ecologically meaningful interpretation (Carrascal et al. v www.esajournals.org

2009).
Various multivariate techniques have been developed to address this issue, dealing with multi-block data sets in a symmetrical perspective (for a more thorough discussion, see Borcard et al. 2011, Legendre andLegendre 2012): coinertia analysis (Dray et al. 2003), multiple factor analysis (Escofier and Pagè s 1994, Bécue-Bertaut and Pagè s 2008, Abdi et al. 2013, and Procrustes analysis (Gower 1975, Gower andDijksterhuis 2004). While these methods have been employed by a number of ecologists in recent decades, the present work intends to introduce PLS as an alternative powerful approach that has nonetheless been underrepresented in ecological litera-ture (Wold et al. 1984, Rosipal and Krämer 2006, Abdi 2007, Esposito Vinzi et al. 2010. It is an extension of multiple regression analysis that, when extracting major components from a certain block, seeks simultaneously to maximize the explained variance of the other blocks. Thus, components from different blocks would have the greatest mutual linear predictive power (Naes and Martens 1985, Geladi and Kowalski 1986, de Jong 1993. PLS can be a useful approach, especially in ecological studies, because it is designed to overcome typical situations, where (1) the number of predictor variables is similar to or greater than the number of observations and/ or (2) environmental variables, measured with an Fig. 4. A possible ordination procedure followed by correlating primary axes of different data blocks. In this particular case, we present an example of principal component analysis, but this analytic flow is also applicable to the other modes of ordination such as correspondence analysis and principal coordinate analysis. In this conceptual example, two principal components (PC) are extracted in each block, but the number can change depending on the magnitude of eigenvalues. The thick, solid, lines imply that the extraction of PCs is conducted exclusively to each block, that is, without taking into account the covariation structure of the other blocks (V: variable; P: plot; EV: eigenvalue).
v www.esajournals.org aim to predict the spatial and temporal dynamics of organisms, are significantly correlated with one another. Despite such advantages of using PLS, there is a surprising paucity of research that employs the method in ecological literature (Carrascal et al. 2009, but see ter Braak andJuggins 1993, Birks 1996). Furthermore, to the best of our knowledge, PLS has not been conducted for three-block biotic and abiotic data, which is the subject of this paper. It should be noted that an exhaustive comparison of 3-B PLS and the several multi-block methods mentioned above is beyond the scope of this paper. There should be relative pros and cons of all these approaches depending on the type of ecological data-e.g., number of samples, degree of skewness, coefficient of variation, and direction of causality among data blocks. We illustrate the potential of 3B-PLS for addressing questions about reciprocal interactions between the environment and biota that go beyond bivariate relationships but do not become obscured by the dynamic instantaneousness of the feedbacks among them.

MATERIALS AND METHODS
Overall, this paper analyzes vegetation, soil, and topographic data collected by five groups of ecologists from seven different systems at widely varied locations in the world. Brief explanations of each study area and field survey are presented below (for more detailed information, see Appendix).
Costa's study area and data collection Costa et al. (1996) collected vegetation, soil, and topography data sets at coastal dunes along the southernmost 217 km long stretch of Brazil. The study region encompassed northern and southern sites: the northern dunes exhibited relatively high foredune ridges (;7 m) and the southern sites showed relatively large seashore deflation plains. At each site, Costa et al. (1996) established five 100 m long and 1 m wide strip transects perpendicular to the coastline, starting at the leading edge of the vegetation. Each transect was divided into 10 zones each with an equal length of 10 m. Along each zone, the presence of plant species was recorded at 2 m intervals using 1 m 2 square quadrats. Then, all floristic data from these zones were pooled to represent the overall frequency of each species. In each zone, the researchers also examined the pH of superficial sand (10-20 cm depth) and the proportion (%) of coarse (.0.50 mm), medium (0.25-0.50 mm), fine (0.062-0.25 mm), and clay/ silt (,0.062 mm) sediments. Two topographic attributes of each zone were included in the present work: surface elevation (m) and distance from the coastline (m), which corresponded to the gradients in sedimentation rates and salinity.

Kim's study area and data collection
Kim's research sites included the Sindu coastal dunefield in South Korea and the Skallingen salt marsh in Denmark. The dunefield ranged for about 4 km along the coastline and extended 0.5-1.5 km inland. This system exhibited welldeveloped topographic features, such as foredune, second dune, and interdune depression (or swale), as well as distinct vegetation zonation in each of these habitats (Kim et al. 2008, Kim andYu 2009). In the summer of 2003, Kim established a rectangular plot (180 3 280 m) that fully encompassed these three geomorphic zones. The plot was divided into 126 grids of 20 3 20 m. In each grid, he surveyed the abundance of 21 main plant species, the mean cover of which exceeded 0.5% across the plot. Also, a total of 11 physical and chemical properties were analyzed for the 126 soil samples collected from the grid centers. Finally, for these centers, eight terrain parameters were extracted from a digital elevation model of the dunefield.
The Skallingen marsh was characterized by a complex tidal creek system that actively migrated through bar deposition and cutbank erosion across the marsh's platform (Kim 2012). In the summer 2006, Kim established 11 line transects along four tidal creeks. These transects were, on average, about 25 m long and perpendicular to the streamline. Each transect fully encompassed three distinct topographic zones, such as newly deposited point bar, bank levee (or transition), and inner marsh field (or marsh interior). Along these transects, a total of 9-10 locations were selected for soil and topographic surveys. Kim identified 10 soil physical and chemical properties and two topographic variables (surface elevation and distance to the channel). At each location, two replicate square v www.esajournals.org quadrats (1 3 1 m) were identified for vegetation survey.

Kupfer's study area and data collection
The Kupfer data set comes from a study of post-logging succession on the floodplain of the Bates Fork tract within Congaree National Park (NP), near Gadsden, South Carolina, USA (Kupfer et al. 2010). Sample transects (N ¼ 63; 30 m in length) were established in forest clearcuts that were located in three different geomorphic settings (a backswamp, a remnant natural levee associated with a former meander of the Congaree River, and the active natural levee adjacent to the current river channel) and had been undergoing succession for 4-5 years. Vegetation composition was measured in three circular subplots (2.82 m radius) spaced evenly along the length of the transects; these transects were oriented parallel to the forest edge and located 5-50 m into the clearcut. Soil cores were taken from the top 15 cm of soil on each subplot, composited by transect, dried, and then analyzed for pH, extractable phosphorus, potassium, calcium, magnesium, zinc, copper and sodium, organic matter, cation exchange capacity (CEC), exchangeable acidity (the amount of total CEC occupied by H þ and Al 3þ ), and total percent base saturation (the percent of exchange sites occupied by base cations). Two variables related to topography, and by extension, flood regime, were determined for each subplot and averaged by transect: (1) elevation, which was extracted from a high resolution, LiDAR-derived digital terrain model of the study area, and (2) the depth of inundation during a 98,000 cfs flood, which was estimated using a flood inundation model (HEC-RAS; Kupfer et al. 2010).

McEwan's study area and data collection
The McEwan data set was generated in a study of a 52-ha old-growth forest in southeastern Kentucky, USA (McEwan and Muller 2011). Sampling took place within four 1-m 2 circular plots (N ¼ 320) nested within 80 larger plots established by Muller (1982). Muller (1982) used ordination techniques to classify these 80 plots into three ecological communities: sugar maple (Acer saccharum; 17 plots), beech (Fagus grandifolia; 31 plots), and chestnut oak (Quercus montana; 32 plots; McEwan and Muller 2006). An inventory of herbaceous plants, with an associated visual estimate of cover for individual species, was conducted on all herb-layer subplots beginning the first week of April, June, and August of 2000. Topographic variables were collected by Muller (1982) who measured elevation and slope aspect on each of the plots. Soil samples (N ¼ 4/subplot) were collected within each herb subplot in June of 2001. Samples were homogenized and analyzed for plant-available nitrogen (NH 4 þ , NO 3 À ), Mehlich III-extractable P, K, Ca, Mg and Mn, cation exchange capacity, and base saturation. Hemispherical digital canopy images were collected by McEwan at each permanent plot center and analyzed using Gap Light Analyzer image processing software (Frazer et al. 1999, McEwan andMuller 2011).

Stallins's study area and data collection
The two U.S. barrier islands in the studies conducted by Stallins were Sapelo Island, Georgia and South Core Banks, North Carolina, USA. Dune vegetation and soils were sampled in the summer of 1999. At each of the five sites along a 5-km stretch of Holocene dunes on Sapelo, three transects were aligned perpendicular to the shoreline. Vegetation was sampled at 1-m intervals with a point frame. One hundred and forty soil samples were systematically drawn from representative habitat types (dune ridges, flats, and swales) among these transects, with the additional requirement that the distance between successive soil samples was no greater than 10 m. A similar sampling design (five sites of three transects each) was employed on South Core Banks. However, island length here was greater (35 km), as was the distance between successive soil samples (20 m) given longer transect lengths. A total of 136 soils samples were collected. Full details are provided in Stallins (2001) and Stallins and Parker (2003).

Statistical procedure for 3-B PLS
Before the 3-B PLS was conducted, all variables were transformed using appropriate mathematical functions so that the data met the normality assumption in statistics. Because these variables were measured in distinct units, they were standardized and unitless (i.e., their mean and standard deviation were 0 and 1, respectively). Not surprisingly, vegetation, soil, and topogra-phy blocks were of different size in each data set; that is, the number of variables in each block was different. For example, Kim's data from the Sindu coastal dunefield consisted of 24 plant species, 11 physical and chemical soil attributes, and eight landform parameters. To scale these varying block sizes (BS) into 1, we applied the following modification for each data value (DV) within each block: This scaling equation (adapted from Ginter et al. 2012) can also be used if desired to scale data blocks to any size, such as the number of independent dimensions, or ''true rank'' of data, by replacing the numeral 1 with the desired block size.
Following Streissguth et al. (1993) and Bookstein et al. (2003), the first stage of 3-B PLS defined three unit vectors, one each for vegetation, soil, and topography blocks (U v , U s , and U t ) and then iteratively optimized them. Bookstein et al. (2003) suggested using arbitrary starting values for the starting vectors. However, we performed PCA for each block to select the first principal components as the starting unit vectors.
In our preliminary analysis, we found that these two different approaches produced almost identical results in the end. Interpreting these unit vectors as linear combinations, we calculated where X v , X s , and X t represented the whole vegetation, soil, and topography blocks, respectively. X was a data matrix with N rows (samples) and k columns (variables; i.e., X ¼ N 3 k). Each s was normalized to sample variance 1, and we then computed the correlations among s v , s s , and s t in pairs This was the end of the first iteration in our approach of 3-B PLS. The second iteration began with developing updated unit vectors, U v , U s , and U t , based on the inter-block correlation structure as follows: In other words, these new unit vectors were generated by summing ''predictions of the individual variables using combinations of the scores s with the correlations r as weights'' (Bookstein et al. 2003:185). We then returned to Eq. 3 to repeat the same iteration until arriving at a convergence of stable correlation coefficients, r vs , r st , and r tv , that did not vary between iterations (e.g., see Fig. 5).
At convergence, the final vectors, U v , U s , and U t , were labeled as the first singular axes representing the blocks of vegetation, soil, and topography, respectively. A clear distinction exists between these singular axes and principal components: principal components are extracted from the internal correlation structure within each block of data, with no regard to covariation across data domains (vegetation, soil, topography). Thus, each principal component maximally represents the overall pattern of only a given data block. In contrast, a singular axis from 3-B PLS is predicted by the weighted combination of the other two blocks as Eq. 5 indicates. Therefore, Fig. 5. An example of converging correlation coefficients among the unit vectors of vegetation, soil, and topography data blocks in pairs, acquired by Kim at the Sindu coastal dunefield, South Korea. Correlation coefficients r vs , r tv , and r st indicate vegetation-soil, topography-vegetation, and soil-topography couplings, respectively. In this particular case, the three coefficients started to become stable (i.e., convergence) at about the 8th iteration.
v www.esajournals.org each singular axis represents the correlations of the elements of a given data block with other blocks.
While the first singular axes express the patterns that have the greatest correlation among vegetation, soil, and topography blocks, one may also be interested in extracting the second (and higher) axes. By construction, the second singular axes-analogous to the second principal component in PCA-are uncorrelated with the first vectors (Bookstein et al. 2003). Hence, the second (and third or more) axes can be easily extracted by applying the same algorithms, presented above, to the residuals after the first vectors have been regressed out of the original data blocks. In this paper, however, our analysis and results are concerned primarily with the first singular axes.

Estimating bi-variance and tri-variance
The first three singular vectors (each from vegetation, soil, and topography data blocks) were standardized (i.e., scaled to unit length) to make them directly comparable to one another. Therefore, in the Venn diagram shown in Fig. 1 (Factor A ¼ vegetation; Factor B ¼ soil; Factor C ¼ topography), the circles of these three factors were identical in size. In the context of regression modeling, the size of each circle was equivalent to the total sum of squares (TSS ¼ N À 1).
We then employed the concept of extra sums of squares (ESS), tentatively treating vegetation as the response variable, with soil and topography as the predictor variables (indeed, it did not matter which factor was treated as the response variable). ESS measured the marginal reduction in the residual sum of squares that occurred due to the addition of a (group of ) predictor variable(s) to the existing regression model (i.e., SSR (X2jX1) in Table 1). In this research, the original regression model was where it was assumed that topography was the preexisting predictor variable and soil was the newly added predictor variable. The a value in the Venn diagram ( Fig. 1) equaled vegetation's variance unexplained by soil and topography, i.e., residual sum of squares (SSE in Table 1). b indicated soil's sum of squares (i.e., SSR (X2jX1)), namely, the pure influence of soil on vegetation, excluding topography's effect.
To estimate c, we changed the role of soil and topography, such that topography became the newly added predictor variable, whereas soil became the preexisting predictor variable In the subsequent regression, the c value was the same as topography's sum of squares, or the pure influence of topography on vegetation, excluding soil's effect. Finally, tri-variance (d) was calculated by subtracting a, b, and c values from TSS. Now, we switched Eq. 6 into where soil was the response variable, and topography and vegetation were the predictor variables. In particular, vegetation was the preexisting predictor and topography was the newly added predictor. In this regression, the e value equaled soil's variance unexplained by vegetation and topography, i.e., the residual sum of squares. f could be estimated in two ways, either as TSS À b À d À e or as topography's sum of squares (i.e., topography's pure effect on soil excluding that of vegetation). g was equivalent to TSS À c À d À f. As a result, bi-variance was the sum of b, c, d, and f. Finally, we expressed these raw bi-variance and tri-variance values in proportion (%) to TSS Final tri-variance ¼ 100 3ðRaw tri-variance=TSSÞ: All of these procedures were conducted using The additional variation explained by X2 when added to a regression model already containing X1 (i.e., the pure effect of X2 on the response variable, excluding that of X1).
à In this present paper, because all of the first singular axes from vegetation, soil, and topography data blocks have been standardized before regression modeling, TSS ¼ N À 1.
v www.esajournals.org R 2.14.2 (R Development Core Team 2012). An example R-code for this procedure is given in the supplementary material (see Supplement).

Comparison of PCA-based and PLS-based approaches
Before evaluating the empirical hypotheses set out in the Introduction, we first examined the structure of inferences from a traditional ordination approach and 3-B PLS. For the traditional part, we followed the steps presented in Fig. 4 so that the first principal components-each derived from vegetation, soil, and topography data blocks using PCA-were directly correlated in pairs. In the same way, the first singular axeseach extracted from vegetation, soil, and topography data blocks using 3-B PLS-were directly correlated.

Evaluation of the hypotheses
Within each system, an examination of the specific structure of bi-variances and tri-variances would be a useful implementation of the 3-B PLS, but since detailed examination of the systems has appeared in our published accounts, albeit not using 3-B PLS, we elect here to illustrate use of the method in order to address higher-level questions that transcend detailed results from specific systems. In particular, system function will be explored only in relation to evaluating the hypotheses.
H 1 : Systems under widespread (i.e., broadly acting) environmental controls have higher bi-variance and tri-variance than those characterized by localized interactive biotic-abiotic processes. This hypothesis was evaluated using Stallins's barrier islands data because his two systems, Sapelo and South Core Banks, were well known in terms of their contrasting hydrogeomorphic disturbance regimes and the consequent ecological processes of vegetation recovery and soil formation (see Stallins and Parker 2003).
With increasing latitude along the southeastern U.S. Atlantic coast, there is a climatological gradient of increasing exposure to extra-tropical cyclones (Mather et al. 1964, Davis et al. 1993 and the consequent gradient of increasing shore disturbance exposure (Williams and Leatherman 1993). In the wave-dominated microtidal barrier islands of South Core Banks in North Carolina, relatively frequent and intensive overwash events mediate biotic-abiotic interactions: disturbance-dependent plant species, which are burial tolerant, recolonize overwash sediments using their horizontally extensive rhizome networks. Due to the stabilized sediment surface, the formation of protective foredune ridges is often inhibited and the landforms are generally flat (Godfrey and Godfrey 1976). This low-relief topography, in turn, reinforces exposure to future overwash events, which will further facilitate the establishment of the disturbance-dependent vegetation (Godfrey et al. 1979, Stallins andParker 2003). This clearly explains a positive feedback between vegetation and geomorphology.
By contrast, the mixed-energy mesotidal barrier islands of Sapelo in Georgia, USA, exhibit a conspicuous ridge-and-swale topography. Such terrain ruggedness is maintained by dune-building taxa that enhance sedimentation rates and by burial-intolerant woody shrubs and grass species that stabilize intervening low swales. These ridge-and-swale plant species reinforce their presence by contributing to the high-relief topography that acts as a barrier to the spread of wave energy into backshore dune habitats. In both island types, these landform-modifying plants are considered ecosystem engineers, in that they regulate the distribution of soil resources and species coexistence (Jones et al. 1997, Corenblit et al. 2007, Zarnetske et al. 2012. H 2 : Separate analyses of sub-zones produce consistently smaller or greater bi-variance and tri-variance than analysis of a whole system. First, we estimated the bi-variance and tri-variance values for all of the seven whole systems introduced above. These whole systems encompassed a range of landforms and disturbance regimes, and each of them was apparently heterogeneous in terms of physical conditions and ecological processes. Therefore, we divided each whole system into sub-zones as follows: Costa's whole dune into northern and southern sites; Kim's whole dune into foredune, swale, and second dune; Kim's whole marsh into point bar, transition, and interior (or platform); Kupfer's whole forest into remnant levee, backswamp, and current levee; McEwan's whole forest into beech, maple, and oak communities; Stallins's whole dune in Sapelo into flat, ridge, and swale; and Stallins's whole dune in South Core Banks into flat, ridge, and swale. Then, the bi-variance and tri-variance v www.esajournals.org values for all of these 20 sub-zones were also estimated. Finally, we calculated the average difference (D ave ) between the bi-variances of the i whole systems and the bi-variances of their j sub-zones where n was 20 (the number of sub-zones) and x whole and x sub-zone indicated the bi-variance values of each whole system and their sub-zones, respectively. Then, we verified if D ave was significantly different from zero where l whole and l sub indicated the population mean of the whole systems and the sub-zones, respectively. Statistically, these mean values are assumed to be same because the null hypothesis of this test is that there is no difference in the mean variances between the whole systems and the sub-zones. SD was the standard deviation of the differences. After conducting Eq. 12, we estimated the corresponding P-value assuming a t distribution (Fisher 1954). We repeated the same procedure to compare the tri-variances of the whole systems and their sub-zones. H 3 : Bi-variance is a good indicator of tri-variance. This hypothesis was tested by conducting a simple linear regression, in which bi-variance and tri-variance were treated as predictor and response variables, respectively.

Comparison of PCA-based and PLS-based approaches
In most cases, 3-B PLS revealed much stronger correlations of vegetation-soil, soil-topography, and topography-vegetation couplings than PCA ( Table 2). The coefficients produced by PCA indicated the correlations of the major principal components for each pair of domains. There were only three exceptions in which the magnitude of the correlations estimated by PLS was slightly lower than that by PCA, all of which came from Stallins's Sapelo data (see the values in boldface in Table 2). These results were directly related to the difference between the two approaches with regard to the bi-variance and tri-variance values they produced (Table 3). With only one exception (the tri-variance of Stallins's 'Ridge' data from Sapelo), 3-B PLS produced much greater bivariance and tri-variance than PCA. In 10 cases, the PCA-based approach indicated an absence of variance simultaneously shared by vegetation, soil, and topography (i.e., tri-variance ¼ 0).

Evaluation of the hypotheses
H 1 : Systems under widespread (i.e., broadly acting) environmental controls have higher bi-variance and tri-variance than those characterized by localized interactive biotic-abiotic processes. This hypothesis was clearly supported: we found that both the bivariance and tri-variance of South Core Banks were greater than those of Sapelo Island ( Fig. 6 and Table 3). Notably, the difference in the trivariance (44.8 vs. 30.0) between the two regions was greater than the difference in the bi-variance (72.9 vs. 65.0). Table 4 demonstrates the detailed contribution of each variable or plant species to the first singular axes of topography, soil, and vegetation. In both regions, as the distance to the sea increased, surface elevation declined, while the content of silt, clay, and organic matter increased. However, each of pH and fine sand content exhibited contrasting spatial gradients between Sapelo Island and South Core Banks. Concomitantly, away from the coastline, there were decreases in the cover of dune grasses (e.g., Panicum amarum) and increases in the cover of swale-dwelling species like Andropogon spp. and Phyla nodiflora in Sapelo Island. In the inland part of South Core Banks, frequent overwash effects seemed to favor the increased abundance of Spartina patens and a dune-stabilizing moss species Ditrichum.
H 2 : Separate analyses of sub-zones produce consistently smaller or greater bi-variance and tri-variance than analysis of a whole system. In Fig. 7, the bivariance and tri-variance plots of the seven whole systems were generally located around the middle of those plots estimated from the 20 sub-zones. The average values of the bi-variance and tri-variance from these sub-zones were slightly smaller than those from the whole systems, but a quantitative comparison revealed that such differences were not statistically significant at the 0.05 significance level (see the P values (0.45 and 0.60) at the bottom of Table 3). In a similar context, we compared the whole systems and the sub-zones in terms of the average magnitude of their correlations for vegetation-soil, soil-topography, and topography-vegetation pairs. In all of these relationships, the differences were not statistically significant at the 0.05 significance level (see the P values at the bottom of Table 2).
Looking at the seven whole systems individually, we found that the bi-variance and trivariance values of each whole system were generally somewhere around the middle of those values from the associated sub-zones, with only three exceptions (see the values in boldface in Table 3): in Costa's data, both bi-variance and trivariance of the whole dune were smaller than those of both the northern and southern sites. In Stallins's data from Sapelo Island, the tri-variance of the whole system was consistently greater than that of the three sub-zones. Overall, these results failed to support the second hypothesis.
H 3 : Bi-variance is a good indicator of tri-variance. There was a general trend in which tri-variance increased with increasing bi-variance, and this positive relationship was statistically significant (R 2 ¼ 0.61, P , 0.01; Fig. 7). Boldface values indicate that the magnitude of the correlation coefficient resulting from 3-B PLSR approach is slightly smaller than that from principal component analysis.
à The values were produced by following Eq. 12 of the main text to test whether, in each column, there was a significant difference between the seven whole systems and the 20 separate sub-zones in terms of their mean magnitude of correlations. None of the six cases proved significant.

Comparison of PCA-based and PLS-based approaches
We found that PCA-based approaches explained much less of the relationships between vegetation, soil, and topography in pairs than did PLS methods (Table 2). Furthermore, in many cases, the PCA-based approach statistically implied that the variance simultaneously shared by vegetation, soil, and topography does not exist (i.e., tri-variance ¼ 0; Table 3). This is an illogical conclusion which, according to the 3-B PLS, is demonstrably false. Those who work in ecolog-ical systems would almost certainly assume there are causal links among these three natural components so that tri-variance 6 ¼ 0. In our analysis, PCA was an example of ordination methodologies and we hypothesize that other ordination techniques (e.g., principal coordinate analysis, correspondence analysis, multidimensional scaling, etc.) would also have yielded much weaker correlations than those found using the PLS method. By nature, these non-PLS methods extract major axes that maximally represent the patterns of a certain data block. However, these axes may not significantly correlate with the axes (or variables) belonging Boldface values indicate that separate analyses of sub-zones produced consistently lower bi-and tri-variance (Costa's data) or higher tri-variance (Stallins's Sapelo data), than the analysis of the associated whole system. In other words, in these three cases, the bi-variance and tri-variance of the whole system did not represent the average strength of the biotic-abiotic couplings in each sub-zone. These are the only cases that support the second hypothesis (H 2 ). All of the other cases do not support H 2 .
à The values were produced by following Eq. 12 of the main text to test whether, in each column, there was a significant difference between the seven whole systems and the 20 separate sub-zones with regard to their bi-or tri-variances. None of these four cases proved significant.
v www.esajournals.org to the other blocks, thereby underestimating the amount of inter-block covariation. Our results demonstrate that this lack of correlation between ordination axes from different blocks may be a limitation when attempting to explain patterns observed in natural systems (see also Carrascal et al. 2009 for a simulation experiment).
Our point is not to suggest a blanket rejection of PCA and other conventional ordination approaches. Such approaches are useful, but, as our results indicate, there is additional or residual information that can be partitioned by PLS. Traditional methods can provide insights into the variance structure among vegetation, soils, and landforms. However, PLS is a powerful quantitative tool because it is constructed to directly estimate the covariance of the native variables of all data blocks simultaneously. We recommend the use of PLS for more integrative, holistic investigations that are explicitly focused on abiotic-biotic feedbacks and their geographic variability.

Evaluation of the hypotheses
H 1 : Systems under widespread (i.e., broadly acting) environmental controls have higher bi-variance and tri-variance than those characterized by localized interactive biotic-abiotic processes. In ecological literature, it is widely agreed that disturbance is an environmental process that regulates plant species abundance and composition and that the effects of disturbance are interrelated with topographic conditions and soil resources (Swanson et al. 1988, Bendix 1998, Shafroth et al. 2002, Hupp and Rinaldi 2007. We interpret the high tri-variance among vegetation, soil, and topography on South Core Banks as a reflection of the profound influence of frequent exposure to extensive overwash across the relatively flat dunes in this wave-dominated coastal region. In this context, Stallins (2001) earlier found that pH and organic matter, even in the unstable, poorly sorted substrate of South Core Banks, were significantly correlated with the cover of burialtolerant stabilizers such as Spartina patens and Ditrichum spp., which recolonized after an overwash to maintain the flat surface. These plants, soil conditions, and uniform topography reproduce one another by facilitating the spread of future overwash effects across space and the subsequent intrinsic recovery processes.
On Sapelo, the low tri-variance is attributable to the rarity of major hydrogeomorphic disturbance in this tide-dominated region, which in turn induces fine-scale habitat heterogeneity (Phillips 1999(Phillips , 2000(Phillips , 2004. Here, efficient dune- Fig. 6. Proportion (%) of variances estimated from the relationships among the first singular axes of vegetation, soil, and topography, identified by 3-B PLS for Stallins's barrier islands data from Sapelo Island (A) and South Core Banks (B). These proportional Venn diagrams were generated using a statistical loss function and a minimization procedure (Wilkinson 2012). The three circles of each region are of the same size because the corresponding three singular axes had been standardized before estimating the proportions.
v www.esajournals.org builders (e.g., Uniola spp.) trap suspended sediments and develop high-relief dune ridges that dampen the direct impact of future overwash. Both rarity and dampening of disturbance effects eventually enhance the localization of vegetation and pedological processes that do not necessarily correspond to the overall dune morphologies. Accordingly, the bi-variance of vegetation-topography and soil-topography pairs is very low on Sapelo. Conversely, such localized processes gradually strengthen over time the coupling between vegetation and soil on Sapelo, and the associated bi-variance is indeed greater than that on South Core Banks (i.e., 60.3% vs. 49.8%; see Fig. 6).
These discussions can further be related to the interval between disturbance events and the time it takes for an original biogeomorphic feedback among dune vegetation, soil, and landform to be restored after a disturbance event (Brunsden and Thornes 1979, Phillips 1995, Corenblit et al. 2011, Eichel et al. 2013). On Sapelo, due to the rarity of overriding disturbance, time is allowed for vegetation and soil to become closely intertwined, and this vegetation-soil coupling likely diverges over space, contingent upon site-specific disturbance history and/or soil microbial composition (Phillips 2013). In such a case of high stochasticity, the importance of dune morphologies to floristic and pedological processes should decline gradually. As a consequence, it is quite possible to observe variable plant species composition or edaphic characteristics at different locations possessing very similar topographic conditions (Gleason 1926).  v www.esajournals.org On South Core Banks, development time is too short to induce such spatial diversification of vegetation and soil factors over varying dune surface topography. Rather, frequent and spatially extensive overwash events lead to dynamic landform changes, which immediately influence spatial patterns of both vegetation and soil. Therefore, relatively tight vegetation-soil-topography couplings are likely maintained before spatial divergence (i.e., decoupling) of these relationships occurs across the dunefield.
H 2 : Separate analyses of sub-zones produce consistently smaller or greater bi-variance and tri-variance than analysis of a whole system. In this hypothesis, we inquired what would happen to vegetationsoil-topography couplings if a whole system was divided into a number of sub-zones (Fig. 3). On Sapelo, the tri-variate couplings become weaker after division (Table 3), presumably due to the high spatial stochasticity driven by the gradual decoupling of local vegetation and soil processes from topography within each sub-zone, as discussed above. By contrast, the stronger couplings we observed after dividing Costa's whole system are related to highly different environmental settings between the northern and southern sites: these two regions are geographically separated by ;120 km, exhibiting distinct coastline orientations. Hence, they undergo differing approach angles and impacts of onshore winds, which, in turn, induce varying gradients in overwash frequency and sand transport (see Appendix). As a result, dissipative and finegrained beaches are developed in the north, whereas reflective and coarse-grained beaches are found in the south (Calliari and Klein 1993, Calliari et al. 1996, Costa et al. 1996. To reveal significant relationships among vegetation, soil, and topography, it would be reasonable to perform separate analyses for the northern and southern regions, rather than combining these greatly different sub-zones into a single whole system. Other than these exceptional cases observed just for the dune systems (Stallins's Sapelo and Costa's data), there is no prevalent pattern of either stronger or weaker vegetation-soil-topography couplings; rather, both occur simultaneously upon division. Thus, the bi-variance and tri-variance values of the whole systems are more or less the average of those values of their sub-zones; that is, the relationships become weaker in some sub-zones and stronger in the others. This suggests that, in many cases, the overall correlation structure among vegetation, soil, and topography is conserved across spatial scales. In other words, during the aggregation of sub-zones or the division of a whole landscape, there is no net loss or gain of the strength of vegetation-soil-topography couplings. Such potential scale-invariance confers some degree of freedom to ecologists who, due to their concern about spatial non-stationarity induced by landscape heterogeneity (Milne 1998), have been reluctant to predict large-scale biotic-abiotic correlative patterns by investigating and averaging the information of smaller units. Concerning this issue of scale-invariance, two caveats should be noted: (1) the sub-zones are not characterized by too high spatial stochasticity of vegetation and soil processes, as in the case of Stallin's Sapelo data; and (2) the whole system is not too spatially heterogeneous, that is, it does not aggregate physiographic types that are too different, as in the case of Costa's northern and southern sites. There is a danger that the local heterogeneity in variance structure cancels out when aggregated to the large scale.
H 3 : Bi-variance is a good indicator of tri-variance. Despite the positive relationship we found between bi-variance and tri-variance values, it is still an open question whether bi-variance can be a good indicator of tri-variance, and further, whether there would be no need to struggle to estimate the tri-variance. In a statistical sense, bivariance significantly explains tri-variance at the confidence level of 99% (i.e., P , 0.01); however, it should also be noted that 39% of the trivariance still remains unexplained. The associated scatterplot (Fig. 7) even illustrates several cases of locally negative relations in which the bivariance of system (or sub-zone) A is smaller than that of system (sub-zone) B, but A has a higher tri-variance than B. Given such uncertainties, we posit that, while the overall coefficient of determination looks generally encouraging (R 2 ¼ 0.61), the need for estimating tri-variance as well as bi-variance still exists. In particular, the large tri-variance residuals could represent important three-way interactions among variables in the three data domains. Such insights are less accessible using traditional approaches.

Reciprocal nature of vegetation-soil-topography relationships
Our 3-B Venn diagram approach (e.g., Fig. 6) closely accords with one of the current mainstream topics of ecology and, more generally, Earth sciences: feedbacks among biological and physical factors. Until recently, one-way relationships between vegetation, soil, and topography have been dominantly considered when studying the spatial pattern or evolution of landscapes. Such a unidirectional viewpoint inspired investigations that focused on the influence of one factor upon another. Accordingly, this cause-andeffect logic necessitated a division of abiotic and biotic components into groups of response and predictor variables, followed by the application of such traditional approaches as multiple regression and ordination. This causal approach has generally held true even in cases where researchers have sought to identify the unique and shared contributions of two sets of predictor variables (e.g, environmental and spatial variables) on a response variable (e.g., vegetation composition) through variance partitioning (e.g., Borcard et al. 1992, Kupfer et al. 1997. Today, Earth scientists are increasingly redirecting their attention from discrete one-way couplings to integrated two-or three-way interactions that address the importance of the reciprocal nature of vegetation-soil-topography linkages (Stallins 2006, Corenblit et al. 2007, Osterkamp and Hupp 2010. Key to this new paradigm is an understanding of the feedback mechanisms among natural phenomena that drive a perpetual, gradual modification or an abrupt collapse of the existing landscape complexity (Malanson 1999, Savage et al. 2000, Naylor et al. 2002, Bever et al. 2010, van der Putten et al. 2013. Our results suggest that statistical approaches commonly used in contemporary ecology are not keeping pace with the conceptual advances in our understanding of multi-block and multidirectional ecological relationships. Within the constraint of conventional multivariate statistics, we face a challenge concerning how to divide our data into groups of response and predictor variables. Our results refute the idea of treating vegetation strictly as a response variable, given that there are clearly mutual influences between vegetation and environmental factors. This paper presents a method that avoids such a monodirec-tional assumption structure, and estimates bivariance and tri-variance by treating all involved components at the same level (i.e., a symmetrical approach). The resulting Venn diagrams fully reflect the idea of mutual interactions among vegetation, soil, and topography in a wide range of ecological systems from various regions of the world. Although the focus of this research is on 3-B data, the general procedure illustrated in Eqs. 3-5 can be extended to deal with 4-B data and beyond (Bookstein et al. 2003). In general, statistical models are static in nature; therefore, we do not claim that our 3B-PLS approach would immediately represent the dynamic nature of feedback mechanisms between environmental conditions and organisms. Rather, our point is that the symmetrical approach of, e.g., 3B-PLS, multiple factor analysis (Escofier and Pagè s 1994), and co-inertia analysis (Dray et al. 2003) would be consistent with the ever-increasing perspective of reciprocal interactions between the environment and biota.

Suggestions for future research
In this article, we have introduced a new conceptual and methodological framework for understanding vegetation-soil-topography relationships. We do not posit that our approaches and perspectives are unquestionable or decisive; rather, more research should be conducted to further evaluate and improve their applicability to a variety of ecological and environmental contexts (e.g., Cope et al. 2014). We conclude this paper by presenting key research topics yet to be explored.
First, ecologists need to investigate vegetationsoil-topography couplings in more spatially explicit ways. Vegetation, soil, and geomorphic patterns and processes are essentially spatial phenomena. Hence, the associated variables and data blocks are likely to possess inherent spatial autocorrelation (SAC), or spatial structure, which is now a standard paradigm in ecology (Legendre 1993, Dray et al. 2006, Miller et al. 2007, Franklin 2009). To take into account the presence and effects of SAC, we should develop a methodology to partial out from each data block the pure variance explained by spatial filters only (Borcard and Legendre 2002, Griffith 2003, de Marco et al. 2008, Kim 2013. The resulting new vegetation, soil, and topography data blocks could then be subjected to the same 3B-PLS procedure that we introduced in this paper. Second, it is desirable to examine biotic-abiotic relationships over time, preferably over long timescales that encompass both periods of discrete disturbance events and biogeomorphic recovery. In the seven systems we investigated and similar to most of the previous studies, our analyses were performed based on one snapshot of patterns of vegetation, soil, and topography. In an attempt to model species distributions (Franklin 1998, Lehmann et al. 2003, Austin and van Niel 2011, we ecologists have often postulated the presence of tight couplings among community structure, edaphic conditions, and terrain settings. In contrast, there have been few attempts to examine the changing correlation structure among these factors, responding to several recurring disturbance events at the same landscape. Recently, Kim and Arthur (2014) demonstrated a gradual weakening of bioticabiotic couplings over the course of their field surveys between 2002 and 2010, each occurring before and after six prescribed fire events in an oak-dominated temperate forest in eastern Kentucky, USA. These findings suggest that, in general, sharp breaks may exist, at which bivariance and tri-variance transition as a result of gradual or abrupt changes in climate conditions or modifications of substrate.
Third, related to the temporal viewpoint above, there still is much uncertainty as to how bi-variance and tri-variance would change if a design for data collection worked across domains of scale. As mentioned, distinctive ecological patterns and processes are known to be dominant at different spatial scales (Ludwig et al. 2000). This scalar discrepancy has been conceptualized as transition between domains of scale (Wiens 1989) or discrete breaks (O'Neill and King 1998). Here, domain indicates ''a portion of the scale spectrum within which process-pattern relationships are consistent regardless of scale'' (Wiens 1989:392). An attempt to extrapolate the information of vegetation-soil-topography relationships acquired at a particular scale, to other scales within the same domain, may be successful; however, an extrapolation across the disjunction between domains will likely fail. As such, a key question in future research will be how to develop scaling theories that effectively link bi-variances and tri-variances estimated at different spatial scales (Wiens 1989, Ludwig et al. 2000.
Last, more disturbance types, other than just hydrogeomorphic disturbance in riparian and coastal systems, should receive attention. The exact trajectories of bi-variance and tri-variance through time and across space may differ among different biogeomorphic contexts, due to sediment mobility, landform stability, the type of plants, and the disturbance regime. From a rugged mountain forested landscape to a highly mobile marine sea floor sandy substrate, there could be some linkages between systems in terms of bi-variance and tri-variance. How variances change through time and in response to the type of disturbance forcings (historically prevalent versus novel) would not be expected to be the same, but each would exhibit patterns of convergence and divergence in variances as feedbacks between the biota and the geomorphic context developed (Phillips 1999(Phillips , 2004. In this way, bi-variances and tri-variances potentially become ways of tracking the multicollinear relationships among vegetation, soil, and topography in various ecological systems. Eventually, bi-variances and tri-variances can be used to predict overall ecosystem responses to changing climate conditions and disturbance regimes.

CONCLUSIONS
This paper has addressed a critical, underdeveloped aspect of ecology: conceptual and analytical integration across multiple data domains. Often ecologists, especially as the era of big data progresses, deal with data from fundamentally, sometimes radically different domains. A given project may have a data block on community composition, soil (or water) chemistry, geography, genetic data (sequence or expression arrays) or anthropogenic impacts. Typically, we attempt integration of data domains by combining principal components analysis (a within-block approach) and two-block multivariate ordinations. But we miss a staggering amount of information by doing so, as illustrated in our paper using several case studies. Partial least squares regression, though an established tool in chemometrics and bioinformatics (e.g., Cope et al. 2014), is important to develop and popularize in ecology.
We developed three-block partial least squares, an intuitive graphical method for summarizing data relations and illustrated high-order questions that can be addressed using vegetation, soil, and topography examples from several ecological systems and their respective sub-zones. We demonstrated that bi-variances and tri-variances provide useful insight into how the strength of couplings among vegetation, soil, and topography data blocks varies across scales and disturbance regimes.
A perspective shift like we are describing is important as ecology grows even more integrative. Until actually measured, trivariate relationships cannot be fully understood with even the most detailed bivariate approaches. Though we describe the simplest case of multi-variance beyond the usual two-block linear statistical model, this approach can be extended seamlessly to any number of data domains, making true integration possible, so that the long-held ideal of ecology being a holistic science can be more fully realized.

LITERATURE CITED
present work: surface elevation (m) and distance from the coastline (m), which corresponded to the gradients in sedimentation rates and salinity.

Kim's study area and data collection
The Sindu coastal dunefield is located in west Korean Peninsula (36850 0 33.81 00 N, 126811 0 37.85 00 E; Fig. A2A). The foredune ridge is adjacent to the beach and is dominated by Elymus mollis, a typical grassy dune species in northeastern Asia, and a shrub species Vitex rotundifolia. The second dune is situated in the landward part of Sindu and is characterized by the presence of woody individuals of Robinia pseudoaccacia and an invasive plant Phytolacca americana. The interdune depression is prone to the concentration of groundwater flows from the surrounding dune ridges; hence, this swale experiences frequent submergence in freshwater during wet summer seasons.
In each of the 126 grids (20 3 20 m; Fig. A2B), Kim systematically placed five square quadrats (Fig. A2C) and recorded the percent cover of each plant species present in each quadrat. The size of quadrats was 1 3 1 m when there were no woody individuals and 2 3 2 m when any of them was present. Cover data from the five quadrats were then averaged to represent the grid as a whole. Species nomenclature followed Lee (1996). Kim also collected subsurface soil samples (;1 kg at 15-20 cm depth) in the middle of each grid. Soil properties analyzed included soil moisture (%), find sand (%), organic matter (%), soil pH, electrical conductivity (mS/cm), nitrate (NO 3 -) (mg/kg), total phosphorus (mg/kg), and exchangeable Na þ , Mg 2þ , Ca 2þ , K þ (mg/kg). See Kim et al. (2008) for more detailed information of analytic procedures.
Using a total station (Sokkia SET5FS), Kim acquired horizontal (x, y) and vertical (z) coordinates of 1383 surface points across the study plot.  Tables 2 and 3 of the main article).
v www.esajournals.org He then used Surfer 7.0 (Golden Software) to perform semivariogram analysis and to create a digital elevation model (DEM). DiGeM 2.0 (Conrad 2002) was utilized to acquire values for terrain descriptors for each grid of the DEM in tandem with the Getgridvalue extension of Arc-View GIS 3.3 (Environmental Systems Research Institute 2000). Finally, eight topographic variables were calculated: elevation (m), slope gradient (radians), aspect (radians), plan curvature (radians/m), profile curvature (radians/m), upslope area (m 2 ), wetness index (dimensionless), and distance from the coastline (m).
The Skallingen salt marsh lies at the northern end of the Wadden Sea (55830 0 29.85 00 N, 8815 0 03.08 00 E; Fig. A3A). The marsh has a micro-tidal condition with the mean range being approximately 150 cm. The range becomes about 170 cm at spring and 130 cm at neap tides. The marsh has formed since the beginning of the 20th century when the accretion of silt and clay started on top of the peninsula's centuries old extensive sand flat (Aagaard et al. 1995).
Along each transect, Kim put 1 3 1 m square quadrats, each of which was subdivided into 25 cells of 0.2 3 0.2 m (Fig. A3B). The number of quadrats and the distance between them increased as he moved from marsh edges toward the interiors: (1) two or three quadrats were separated by 0.5 m in the zone of point bar, (2) two or three quadrats were separated by 1 m in transitional zones, and (3) four or five quadrats were separated by 2-4 m in marsh interiors. Two replicate quadrats were established at every site At the middle between the two replicate quadrats, soil and topographic surveys were conducted (Fig. A3B). For the soil sampling, Kim used a cylindrical core with diameter and depth being 4.5 cm and 10 cm, respectively. A total of 10 soil physical and chemical analyses were then performed on each soil sample including bulk density, soil pH, electrical conductivity (umhos/cm), phosphorus (mg/kg), sulfur (mg/kg), nitrate (NO 3 -; mg/kg), Na þ (mg/kg), K þ (mg/kg), Ca 2þ (mg/kg), and Mg 2þ (mg/kg). For the topographic survey, a differential Global Positioning System (Trimble R4 GPS Receiver and Trimble Recon Controller) was used. The horizontal and vertical precision was 1 cm and 2 cm, respectively. Distance (m) was measured from each point to the streamline. See Kim et al. (2012) for more detailed information of analytic procedures.

Kupfer's study area and data collection
The Kupfer data were collected on the floodplain of the 969-ha Bates Fork tract within Congaree National Park (NP). Originally set aside as a National Monument in 1976 and redesignated as a National Park in 2003, Congaree NP protects the largest intact expanse of old-growth bottomland hardwood floodplain forest in the southeastern USA and has been designated as an International Biosphere Reserve and a Ramsar wetland site. The Bates Fork tract is located at the easternmost extent of the park at the confluence of the Congaree and Wateree Rivers (33846 0 N, 80837 0 W; Fig. A4), which have upstream drainage areas of ;21500 km 2 and 14500 km 2 , respectively. Discharges of these rivers are generally highest from January to April, when evapotranspiration is low, but they are also regulated to some degree by dams on the Saluda River, one of the two major tributaries to the Congaree River, and the main stem of the Wateree River (Conrads et al. 2008). Floodplain geomorphology at the study area reflects hydrologic influences from both rivers and is characterized by levee complexes, ridge and swale systems, abandoned channel meanders, and low gradient backswamps.
The Bates Fork tract was not within the original boundary of Congaree NP, but was acquired and incorporated into the park in 2005. The data for this study come from three v www.esajournals.org forested stands that were clear-cut shortly before the tract's acquisition. All of these stands are located on the active floodplain surface, but they encompass a gradient of conditions associated with differences in elevation, hydroperiod, geomorphic setting, and soil conditions. The Congaree River site (clearcut area ¼ 51.7 ha; elevation, mean ¼ 77.86 m asl; range ¼ 76.67-79.57 m asl) is most affected by flood disturbance as it sits on an old meander scroll complex immediately below the active levee of the current river. The Bates Old River site (clearcut area ¼ 24.0 ha; elevation, mean ¼ 80.26 m asl; range ¼ 78.82-82.27 m asl) is located on the natural levee of a permanently flooded, channel meander of the Congaree River that was abandoned in the 1850s. The levee is still building from overbank sediment deposition during high floods, but these sites flood less often and for shorter durations than those at the Congaree River site. The Sampson Island site (total clearcut area ¼ 24.0 ha; elevation, mean ¼ 79.50 m asl; range ¼ 78.62-80.41 m asl) is located in the interior of the Bates Fork floodplain and is influenced by a sandsplay deposit resulting from the denudation of a Late-Pleistocene age aeolian sand dune.
As part of a long-term study on post-logging succession at the Bates Fork tract, vegetation data were collected from May 2006 through August 2007 using a replicated sample design with sample plots located along each of 32 transects spanning the boundary between logged and unlogged forest (Kupfer et al. 2010). Along each transect, two plot corners were randomly located: (1) 5-10 m into the clearcut (field edge) and (2) 30-50 m into the clearcut (field interior). Beginning at each corner, a 30 3 10 m sample plot was laid out, with the long axis aligned perpendicular to the transect direction. Three circular subplots (2.82 m radius) were then established at even intervals along a 30-m line running lengthwise down the middle of the sample plot. Measurements included diameter at breast height (dbh; 1.37 m) of all individuals .1 cm in dbh (saplings) and counts by species of all individuals .0.25 m in height but ,1 cm in dbh (seedlings). Individuals .10 cm dbh, of which there were few, were measured and recorded within a larger radius of the subplot center (3.99 m). Sapling and seedling values were integrated into a single measure of abundance for the entire plot by: (1) transforming sapling diameters to basal areas and summing the values by species for each plot, and (2) assigning each seedling a nominal basal area of 0.1 cm 2 and adding these values to the sapling basal area. This resulted in data on woody plant regeneration for 64 plots (two plots each at 32 transects).
Soils are predominantly classified as moderately well-drained, silty clay loams on the Bates Old River and Congaree River sites and as poorly drained loams on the Sampson Island site (USDA NRCS 2008). However, variations in microtopography and flood regime create considerable within-stand differences in soil characteristics. Samples from the top 10-15 cm of soil were taken from each of the three subplots and composited to provide data on plot-level soil conditions (Kupfer et al. 2010). All soil samples were collected over two weeks during a period of uniform weather with no substantial precipitation. They were then analyzed for pH, extractable phosphorus, potassium, calcium, magnesium, zinc, copper and sodium, organic matter, cation exchange capacity(CEC), exchangeable acidity (the amount of total CEC occupied by H þ and Al 3þ ), and total percent base saturation (the percent of exchange sites occupied by base cations) at the Clemson Agricultural Laboratory.
Two measures of topography and flood regime were calculated. Elevation (m asl) was extracted for each of the sample subplots from a digital terrain model constructed using high-resolution Lidar data with 3-m horizontal accuracy and 10cm vertical accuracy. However, because elevation is an imperfect indicator of site hydrology, flood regime was examined using an interactive GIS model of floodplain inundation by coupling the U.S. Army Corps of Engineers Hydrologic Engineering Center-River Analysis System (HEC-RAS v.3.1.1: USACE 2002) with a Geographic Information System (ArcGIS 9.2), using a specialized set of ArcGIS extension tools (HEC-GeoRAS v.4.1.9.2: USACE 2005). Details of this modeling process can be found in Kupfer et al. (2010), but the end result is a seamless map of projected flood inundation depths that are specific to a user-defined river discharge. Data used in this study were the extracted inundation depths for each subplot during a 98000 cfs flood, a discharge that represents a flood which inundates all of the Bates Fork floodplain. These values were then averaged to get plot-level inundation values that were consistent with the vegetation and soils data.

McEwan's study area and data collection
The McEwan data set was generated in a study of a Big Everidge Hollow (BEH), a 52-ha watershed within the Lilley Cornett Woods Appalachian Research Station (37805 00 N, 83800 00 W, Roxana Quadrangle; Fig. A4) on the Cumberland Plateau, in southeastern Kentucky (McEwan and Muller 2011). Muller (1982) used a random sampling scheme stratified by elevation to establish 80 permanent circular plots, 0.04 ha in size on the horizontal plane throughout the BEH watershed, which is U-shaped and opens to the east. Woody species inventories of stems !2.5cm diameter at breast height were conducted in , 1989(Muller 1982, McEwan and Muller 2006. McEwan et al. (2000) established four 1-m 2 subplots within each of these permanent plots (N ¼ 320) to assess ground-layer vegetation. Subplot centers were established at 908 intervals, around the overstory center stake at a distance that was equal to the radius of a circle bisecting the area of the overstory plot. The first subplot was located 458 to the right of uphill, and the rest were equally spaced around the plot center. The combined area of the four herb-layer subplots measured 1/100th of the overstory plot area (0.0004 ha/plot or 1 m 2 per subplot). For the purposes of this study, only herbaceous species (i.e., cryptophytes; Raunkiaer 1934) were considered. Botanical nomenclature follows Jones (2005) and voucher specimens were deposited into the University of Kentucky Herbarium. Muller (1982) measured elevation and slope aspect on each of the plots. Elevation has since v www.esajournals.org been verified using GPS technology. For the purposes of analysis, slope aspect data were converted using the Beers transformation (Beers et al. 1966), which rescaled the original 3608 observations to a scale representing dryness where a value of 2 is for northeast facing slopes that are assumed to receive the least sunlight (are more mesic) and a value of 0 to southwest facing slopes (assumed to be the most xeric plots). All other aspects were linearly interpolated between these two (Beers et al. 1966). We also assessed soil chemistry through composite soil samples consisting of 16 pooled samples per overstory plot (80 total samples; collected in June of 2001). Samples were collected in the mineral soil with a 2.5-cm tube sampler to a depth of 2.5 cm, starting just below the A o layer. Samples were homogenized and then a subsample was analyzed for plant available nitrogen (NH 4 þ , NO 3 -) on a Bran-Luebbe autoanalyzer following KCl extraction. Mehlich III-extractable P, K, Ca, Mg, and Mn were analyzed on an inductively coupled plasma spectrometer. Cation exchange capacity and base saturation were determined via ammonium saturation followed by analysis with an ammonium ion-selective electrode. To evaluate the understory light environment, hemispherical digital canopy images were collected at each permanent plot center. Digital images were collected with a Nikon Cool-Pix 950 digital camera equipped with a Nikon FC-E8 1838 fisheye converter, with the lens positioned 1 meter above the ground. We used percent canopy openness in the Gap Light Analyzer image processing software (Frazer et al. 1999), as the measure of understory light conditions because it incorporates influences on light availability from both canopy vegetation and topography.

Stallins's study area and data collection
The Georgia Bight, South Core Banks, North Carolina (34841 N, 76828 W), and Sapelo Island, Georgia (31823 N, 81815 W) differ strongly in v www.esajournals.org their wave and tidal regime (Fig. A5). They also exhibit different island and dunefield morphologies, which necessitated an adapted sampling design. A 2-m point-frame sampler, aligned perpendicular to the centerline of a transect, was used to measure species presence. Pointframe observations were spaced at 1-m intervals along transects on Sapelo. On South Core, point frame observations were made every 2 m due to greater width of the dune zone and an overall longer length for transects. Each point-frame sample along transects consisted of 20 observations of species presence, with 10-cm intervals between observations. Presence was summed for individual species and expressed as percent absolute cover for each point-frame sample. A total of 1082 point-frame samples were made on Sapelo, and 1139 on South Core.
For each point-frame sample along a transect, a TOPCON total station was used to determine the elevation and distance from the seaward base of each foredune. Transects began on the seaward base of the foredune, a position approximating the mean high water mark, and extended inland until the first occurrence of extensive woody shrub cover. Soils were characterized for pH, particle-size distributions, and percent content by weight for organic matter and carbonate. Soil pH was measured with a handheld, electronic pHmeter. Wet and dry sieving was used to determine the percent contribution of particle sizes in four classes: silt and clay (,0.0625 mm), very fine to fine sand (,0.25 mm and .0.0625 mm), very coarse to medium sand (,2 mm and .0.25 mm), and granules plus coarser-grained inorganic material (.2 mm). Soil organic matter, as a proxy for soil nutrient status, was measured using loss on ignition. A HCI digest was employed to determine percent content by weight of calcium carbonate.