Volume 85, Issue 4 p. 557-581

Native American impact on past forest composition inferred from species distribution models, Chautauqua County, New York

Stephen J. Tulowiecki (Corresponding Author; E-mail: sjt7@buffalo.edu) and Chris P. S. Larsen

Department of Geography, University at Buffalo, Wilkeson Quadrangle, Buffalo, New York 14261 USA

First published: 01 November 2015

Corresponding Editor: J. A. Jones.


Little consensus surrounds the extent of Native American impacts upon tree species composition in Eastern North America, prior to European-American settlement (presettlement). Native American land-use practices (e.g., forest clearance and burning) likely altered forest composition, but the spatial extent of these alterations remains vaguely quantified. Previous research has attempted to quantify the spatial extent of clearance practices, but little research has addressed the more subtle alterations to tree species composition resulting from Native American land use. Research has also inadequately distinguished between environmental and anthropogenic controls upon tree species composition, leaving open the possibility that, instead of modifying forest composition, Native American societies instead settled where favored tree species were already present. This study employed species distribution models (SDMs) trained from tree species data within presettlement land survey records (PLSRs), in order to understand Native American impacts upon presettlement tree species composition in Chautauqua County, New York. Using historical and archaeological data, this study developed “Native American variables” (NAVs), which represented human accessibility to features of Iroquoian settlement. This study then modeled the distribution of tree species in relation to both environmental variables and NAVs. Notable results indicate that NAVs significantly improved the predictive performance of SDMs for mast-bearing taxa, such as oak (Quercus spp.), chestnut (Castanea dentata (Marsh.) Borkh.), and hickory (Carya spp.). Under a simulated absence of Iroquoian settlement, the amount of “suitable” area in Chautauqua County decreased by 2 to 23 percentage points for five mast-bearing taxa, depending upon species and modeling procedure. 
Results imply that Iroquoian alterations to tree species composition covered a larger spatial extent, in comparison to previous estimates of the spatial extent of clearance practices in Iroquoian regions. Yet, the majority of forest compositional modifications occurred within 10 to 15 km of village sites. This study offers a novel methodology for quantifying Native American impacts upon past tree species composition, and suggests that Iroquoian land-use practices of the early Historic era shaped forest compositional patterns at local extents in one region of presettlement Eastern North America.


Numerous scholarly debates have focused upon the relationships between Native American societies and their environments, prior to European-American settlement (“presettlement”) in North America (Smith 2011b). Debates have concerned the degree to which the development of Native American cultures influenced natural environments, and vice-versa; as well as the degree to which Native American societies were ecological stewards of their environment (Krech 2000). These debates and others are informative in numerous ecological contexts, including the development of modern land management plans (Swetnam et al. 1999, Lorimer 2001) and ecological restoration goals (Hart and Buchanan 2012, Nowacki et al. 2012).

One of the most vigorous debates has regarded the spatial extent of past Native American modifications to their environments (Vale 2002). Across the North American continent, Native American societies incorporated into their subsistence economies numerous strategies to increase ecosystem productivity, which had the potential to alter their environments. Strategies included the creation of disturbed environments that favored preferred plant or animal resources; the planting, transplanting, or in-place promotion of preferred plants; and the manipulation of environments to boost prey populations (Smith 2011a). The use of different strategies by Native American societies varied widely across North America (Doolittle 2002), which produced complex and diverse impacts upon environments (Nowacki et al. 2012), even among Native American societies within smaller geographic regions (e.g., New England; Cronon 1983). Moreover, the magnitude of environmental impacts fluctuated over time as Native American societies incorporated different subsistence strategies, and as societies utilized or abandoned resource patches (Munoz et al. 2014).

Included within this debate are investigations into whether environmental impacts were evident in the forested landscapes of presettlement Eastern North America, where Native American societies may have altered tree species composition (Black et al. 2006) through multiple land-use practices (Day 1953, Whitney 1996). For instance, in portions of the northeastern United States and adjacent southern Ontario, Iroquoian groups cleared forests for horticulture and dwelling construction, as well as extracted forest resources for buildings and firewood, within a few kilometers of villages (Jones 2010). Iroquoian groups may have introduced and managed preferred tree species near settlements, such as black walnut (Juglans nigra L.; Wykoff 1991). These societies also utilized fire to provide hunting advantages at hunting grounds (Abrams and Nowacki 2008), and to create open, grassy habitats for game such as deer (Odocoileus spp.). Burning in Iroquoian regions benefitted fire-tolerant and mast-bearing tree species such as oak (Quercus spp.) and hickory (Carya spp.; Black et al. 2006), at the expense of fire-sensitive species such as beech (Fagus grandifolia Ehrh.) and maple (Acer spp.; Munoz and Gajewski 2010). After Iroquoian settlements were abandoned, clearings may have been colonized by pioneer tree species, such as white pine (Pinus strobus L.; Finlayson et al. 1998). Fire may have also been utilized to thin forests and ease travel along major trails and travel routes (Dwight 1823).

Though Native American land use altered forested landscapes, debate surrounds the spatial extent of these impacts (Patterson and Sassaman 1988), with estimates of impacts ranging from the local scale (Russell 1983), to the biome scale (Denevan 1992, Abrams and Nowacki 2008), to some intermediate scale (Vale 2002). One early estimate of land under Native American cultivation was 0.5–1.0% of total land area in eastern North America (ca. 1500 CE) at any one time, based upon estimates of population and maize (Zea mays) consumption (Kroeber 1939). In Iroquoian regions, estimates of clearing and burning from Haudenosaunee (Iroquois) land use equaled 1.27% of areas in central New York State, as quantified using original land survey records of the late 18th century CE (Marks and Gardescu 1992). One study synthesized ethnographic and archaeological research to estimate that an upper limit of 3.2% of total land area in southern Ontario was cleared for horticulture by Huron societies from 900 to 1600 CE (Campbell and Campbell 1994). Yet, these estimates generally do not include alterations to forest composition (e.g., by fire), which may have covered larger areas than clearance practices (Whitney 1996).

Numerous methodological approaches have been applied and synthesized, in order to characterize the spatiotemporal dimensions of Native American impacts (Smith 2011b, Munoz et al. 2014). To study impacts upon forest composition, approaches have included the analysis of pollen records (Delcourt et al. 1998), soil charcoal records (Fesenmyer and Christensen 2010), dendrochronological records (Ruffner and Abrams 2002), ethnographies (Heidenreich 1972), and presettlement land survey records (PLSRs; Wang 2005). Yet, only a few studies have used historical tree species data found in PLSRs to statistically assess whether tree species composition is best explained by former Native American land use, environmental factors, or both (Foster et al. 2004, Black et al. 2006). Moreover, though these studies have associated Native American settlement with tree species composition, they have established only relationships between settlement and composition, and have not assessed whether past species distributions are explainable in the absence of Native American influence. These studies also employed proximity-based measures from settlement (i.e., villages) to serve as a proxy for Native American impacts, though measures of human travel costs (e.g., walking speed, energy expenditure) over terrain may more accurately represent the locations of resource catchments surrounding settlements (Surface-Evans 2012).

To supplement the aforementioned methodologies, great potential exists in the use of species distribution models (SDMs) to distinguish the impacts of Native American land use upon past tree species distributions. Built upon ecological niche theory, SDMs are a class of statistical and machine-learning algorithms, which predict the geographic distribution of a species based upon relationships between species records and underlying environmental predictors (Franklin and Miller 2009). SDMs typically output a probability of presence, or a binary presence/absence value, in order to represent the geographic distribution of a species. In comparison to traditional statistical techniques (e.g., logistic regression), many SDM algorithms are uniquely suited to handle the challenges of modeling ecological phenomena, which include higher-order interactions between predictor variables, and nonlinear relationships between response and predictor variables (De'ath and Fabricius 2000).

Though SDMs have been employed to understand past forest composition (e.g., Hanberry et al. 2012), no study has utilized the many advantages of SDMs to quantify Native American impacts. If SDMs more accurately predict the past distributions of tree species when including variables that represent Native American land use, then it would provide evidence of the importance of Native American impacts upon forest composition. SDMs could also compare the importance of variables representing Native American land use, to other environmental controls upon the distributions of tree species. Most importantly, SDMs afford the simulation of different scenarios of species distributions, by projecting distributions into new environmental conditions (Franklin and Miller 2009). By developing and validating SDMs that include variables of Native American land use, species distributions could be projected into alternative scenarios, in order to simulate the distribution of tree species in the absence of Native American impacts.

The purposes of this study are to utilize SDMs to understand the importance of Native American impacts upon the presettlement distribution of tree species, and to understand the spatial characteristics of Native American impacts. Specifically, we examine the impacts of Iroquoian land use upon the presettlement (ca. 1799–1814 CE) distribution of tree species in Chautauqua County, New York, USA. This study's purposes are carried out primarily by modeling tree species distributions as a function of environmental variables, and of variables that represent accessibility to features of Iroquoian settlement. To carry out the study's purposes, three hypotheses are tested. First, we hypothesize that predictive performance will increase for SDMs that include variables that reflect Iroquoian land use. Second, we hypothesize that fire-tolerant and mast-bearing tree species will be positively associated with areas that were accessible to Iroquoian settlement. Third, we hypothesize that Iroquoian impacts increased the spatial extent of fire-tolerant and mast-bearing tree species, and also decreased the spatial extent of fire-sensitive and non-mast-bearing tree species.

Study Area

Modern-day Chautauqua County (Fig. 1) totals approximately 2750 km² in land area, and contains portions of the warmer-drier Erie Lowland and the cooler-moister Allegheny Plateau physiographic sections. During the Late Woodland period (ca. 1000–1650 CE), various Native American groups occupied the study area, such as those associated with Monongahelan and Iroquoian sites of the 13th–15th century CE (Emans 2007), and later the Erie Confederacy (Parker 1920). During the early Historic era (ca. 1650–1800 CE), the Seneca nation of the Haudenosaunee (Iroquois) Confederacy moved into former Erie lands (Parker 1920), establishing villages along Cattaraugus Creek in the northern portion of the county, as well as along the Allegheny River to the east of the study area in modern-day Cattaraugus County. Large villages were also established along the Allegheny River south of the study area, in modern-day northwestern Pennsylvania. Chautauqua County has been described as the hunting territory of the Seneca during the early Historic period (Parker 1920), and the Seneca also established additional villages within the modern county borders during the 18th century CE. Though some Monongahelan cultural groups may have occupied the study area (Emans 2007), the majority of cultural groups were proto-Northern Iroquoian, or Northern Iroquoian (e.g., the Erie and Seneca); thus, this study will use the term “Iroquoian” to refer to Native American groups in the study area.

Fig. 1. A hillshaded relief map of modern-day Chautauqua County, New York, USA.

Previous research has indicated relationships between Iroquoian settlement, and forest composition within Chautauqua County and vicinity. Archaeological investigations in an excavated Late Woodland village site within the county revealed nut fragments from chestnut (Castanea dentata (Marsh.) Borkh.), hickory, oak, black walnut, and white walnut (butternut; Juglans cinerea L.) within refuse pits (Guthe 1958). In northwestern Pennsylvania, higher abundances of oak, hickory, and chestnut bearing-trees were found in the vicinity of Seneca-occupied villages of the mid- to late-18th century (Black et al. 2006). A 426-year white oak (Quercus alba L.) tree-ring chronology suggested frequent, low-intensity fires surrounding these same village sites during the Seneca occupation (Ruffner and Abrams 2002). Other associations between presettlement oak forests and Seneca villages were noted near the Seneca Indian Territories (Seischab 1992), which also coincide with areas where land surveyors described burned areas in the early 19th century (Kenoyer 1937).

Data and Methods

This study contains four main steps. The first step develops and compares two sets of SDMs: both sets include the same 11 environmental predictor variables, but one set also includes three variables that represent accessibility to features of Iroquoian settlement. The second step compares the predictive performance between the two sets of SDMs. The third step assesses the relative importance of predictor variables within the SDMs, as well as species responses to individual predictor variables. The fourth step uses SDMs to simulate the distributions of tree species in the absence of Iroquoian influence.

Developing SDMs

The distributions of 23 tree species were modeled within each set of SDMs. Four SDM algorithms were utilized to model each species in each set of SDMs, at a grid cell resolution of 1 ha (100 × 100 m). All SDMs were created within the R statistical computing software (R Development Core Team 2011), and all geographic information systems (GIS) tasks were performed using ArcGIS 10.1 (Esri 2012). Described in this section are the training data, environmental predictor variables, predictor variables representing accessibility to features of Iroquoian settlement, and SDM algorithms.

Training data

This study utilized tree species records from presettlement land survey records (PLSRs) as training data for SDMs. PLSRs are records of the first land surveys in North America, which divided land into townships and lots for European-American settlement (Wang 2005, Liu et al. 2011). PLSRs contain descriptions of the vegetation that surveyors observed along survey lines (“line-descriptions”), as well as records of the trees that surveyors blazed (“witness-trees” or “bearing-trees”) to designate important survey locations (Wang 2005). Despite some data-quality issues that include biases for or against certain taxa (Kronenfeld 2014), ambiguity in recorded taxa (Mladenoff et al. 2002), and irregularity in survey design (Black and Abrams 2001), PLSRs are considered one of the most important data sources available for reconstructing presettlement forest composition (Whitney 1996, Wang 2005). Researchers have previously used vegetation data within PLSRs to understand Native American impacts (Dorney and Dorney 1989, Black and Abrams 2001), including to test for statistical relationships between past species distributions and Native American settlement (Foster et al. 2004, Black et al. 2006).

The line-description data of the Holland Land Company (HLC) lot surveys (ca. 1799–1814 CE), which were previously transcribed and georeferenced using linear referencing tools (Esri 2012) within a GIS (Tulowiecki 2014), provided the training data for the SDMs (Fig. 2). The lot surveys subdivided townships within the county into typically 1.21 × 1.21 km (0.75 × 0.75 mile) lots. Within the line-description data of the lot surveys, surveyors recorded descriptions of forest composition that they observed along survey lines, and these descriptions generally corresponded to homogeneous landscape features (e.g., a swamp) or soil qualities (e.g., upland first quality). The lot surveys contained approximately 5871 line-descriptions within Chautauqua County, ranging from 20 m to over 3 km in surveyed length (median = 803 m). Surveyors listed a median of three tree species per line-description (Tulowiecki 2014). Though previous research has predominantly utilized bearing-tree data to train SDMs (e.g., Hanberry et al. 2012), Tulowiecki (2014) demonstrated that line-description data of the HLC lot surveys produced SDMs of greater predictive performance within Chautauqua County, in comparison to bearing-tree data.

Fig. 2. The locations of line-description data, and their associated centroids, used for training species distribution models (SDMs). In this figure, line-description data with chestnut (Castanea dentata (Marsh.) Borkh.) listed is depicted as an example. Modified from Tulowiecki (2014).

This study modeled the 23 taxa with the most mentions in the line-description data (Table 1). All 23 taxa were discernible to the species level with the exception of hickory (Carya spp.) and elm (Ulmus spp.). The taxa of this study will typically be referred to as “species,” because most were discernible as species. The presence or absence of each species was designated, based upon whether or not surveyors recorded the species within a line-description. Presence or absence of each species was assigned to the centroid (midpoint) of each digitized line-description (Tulowiecki 2014), because the SDM algorithms in the R statistical computing software required that species records be represented as point locations (Fig. 2). Though some environmental variability may have occurred over the length of a line-description (Puric-Mladenovic 2003), the centroids in this study were believed to be representative of the environmental conditions that corresponded to each line-description, given the manner in which line-descriptions were recorded in HLC surveys (see previous paragraph).
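
The assignment of presence or absence to a line-description centroid can be illustrated with a minimal Python sketch. It is not the authors' GIS workflow: the function names, the record layout, and the choice to place the "centroid" at the point halfway along the digitized line are assumptions made for illustration only.

```python
import math


def polyline_midpoint(vertices):
    """Return the point halfway along a polyline given as [(x, y), ...].

    Assumption: the 'centroid (midpoint)' of a line-description is taken to be
    the point at half of the surveyed length, measured along the line.
    """
    if len(vertices) < 2:
        raise ValueError("a polyline needs at least two vertices")
    seg_lengths = [math.dist(p, q) for p, q in zip(vertices, vertices[1:])]
    half = sum(seg_lengths) / 2.0
    walked = 0.0
    for (x0, y0), (x1, y1), length in zip(vertices, vertices[1:], seg_lengths):
        if length > 0 and walked + length >= half:
            t = (half - walked) / length  # fraction of this segment to travel
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        walked += length
    return vertices[-1]  # degenerate case: all vertices coincide


def presence_record(vertices, listed_species, species):
    """Build one point record: 1 if the surveyor listed the species, else 0."""
    x, y = polyline_midpoint(vertices)
    return {"x": x, "y": y, "species": species,
            "present": int(species in listed_species)}
```

For example, a straight 800-m line-description listing beech and maple would yield a chestnut absence record at its 400-m point.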

Table 1. The number of presence records for each of the modeled species, in both the training data and the evaluation data sets.

Environmental predictor variables

The SDMs of this study considered 11 environmental predictor variables, which were selected from a larger set of predictor variables after an initial screening for multicollinearity. In addition to the final predictor variables that were considered, numerous water-balance variables (e.g., actual evapotranspiration [AET], potential evapotranspiration [PET]) for different combinations of months (e.g., May–June, July–August) were initially examined, but were eliminated because they were highly correlated with other predictor variables. In addition, a surficial geology variable (categorical) was examined to represent the potential effects of geology and parent material upon species distributions (Black et al. 2006), but it was also eliminated because it was strongly related to other soil variables such as percent sand.

Three climatic variables were inputted into models: mean growing-season temperature (May–September, in °C), mean coldest-month temperature (January, in °C), and total growing-season precipitation (May–September, in mm). These data were downloaded from the PRISM Climate Group at a grid cell resolution of approximately 800 × 800 m (30 arcsec), and were resampled to a 100 × 100 m resolution using cubic convolution (Keys 1981, Esri 2012; data available online). Modern climate variables were included given the unavailability of finer-scale historical climate data, in order to represent the relative variability in temperature and precipitation within the study area (Tulowiecki 2014).

Five soil variables were included in models: percent sand (top 100 cm of soil), percent clay (top 100 cm), pH (top 100 cm), ranked soil drainage class, and compound topographic index (CTI). The first four soil variables were obtained as vector-format soil polygons from the SSURGO soil database, which were then converted to a 10 × 10 m resolution raster layer, and then mean-aggregated to a resolution of 100 × 100 m (data available online). Prior to mean aggregation, the ranked soil drainage class layer was reclassified into an ordinal representation of soil drainage (1, excessively well-drained; 7, very poorly drained). CTI (Beven and Kirkby 1979) provided an index of topographically controlled soil moisture, with values ranging from approximately 1 (drier soil conditions) to 30 (moister soil conditions) in the study area. The equation for calculating CTI at each grid cell is

$$\mathrm{CTI} = \ln\left(\frac{a}{\tan \beta}\right)$$

where a is the total upstream contributing area from the cell and β is the slope angle at the cell (in radians). To calculate parameter a, a digital elevation model (DEM; USGS 2013) was inputted into the flow accumulation tool in ArcGIS (Esri 2012). Parameter β was calculated using a DEM and the slope tool, and then converted to radians using the raster calculator tool. All steps that were involved to calculate CTI were performed at a 100 × 100 m grid cell resolution.
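
As a worked illustration of the CTI calculation, the following Python sketch assumes the standard form of the index, ln(a / tan β). The function name, the area units, and the small slope floor used to avoid division by zero on flat cells are assumptions of this sketch and are not described in the study.

```python
import math


def compound_topographic_index(contributing_area, slope_degrees,
                               min_slope_degrees=0.1):
    """Compute CTI = ln(a / tan(beta)) for one grid cell.

    contributing_area: upstream contributing area a (units assumed, e.g. m^2).
    slope_degrees:     slope angle at the cell, in degrees.
    min_slope_degrees: hypothetical floor so that tan(beta) is never zero.
    """
    if contributing_area <= 0:
        raise ValueError("contributing area must be positive")
    beta = math.radians(max(slope_degrees, min_slope_degrees))  # to radians
    return math.log(contributing_area / math.tan(beta))


# A gentle slope with a large contributing area is wetter (higher CTI)
# than a steep slope with a small contributing area.
valley_cell = compound_topographic_index(1_000_000.0, 1.0)
ridge_cell = compound_topographic_index(10_000.0, 20.0)
```

The conversion to radians mirrors the step that the authors performed with the raster calculator tool.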

A measure of moisture stress was calculated as the ratio (from 0.00 to 1.00) of AET to PET for the warmest month (July), using the water balance toolbox for ArcGIS (Dyer 2009). The water balance toolbox utilized a DEM, soils data, solar radiation data, temperature data, and precipitation data in GIS raster formats, in order to model soil moisture demand and availability for each month of the year. AET/PET for July was calculated at a grid cell resolution of 100 × 100 m.

Two final variables were included in models: slope angle (degrees), and total growing-season solar radiation (in watt-hours per square meter, W·h·m⁻²). Slope angle was first calculated at a grid cell resolution of 10 × 10 m, by inputting a DEM into the slope tool (Esri 2012), and then mean-aggregated to a resolution of 100 × 100 m. Total growing-season solar radiation was calculated using the area solar radiation tool (Esri 2012) at a grid cell resolution of 10 × 10 m, and then mean-aggregated to a resolution of 100 × 100 m.
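
The mean aggregation applied to several variables (from 10 × 10 m cells to 100 × 100 m cells) can be sketched as a block average. The Python example below is a simplified illustration, not the ArcGIS tool that the authors used: the function name is hypothetical, the raster is a plain list of rows, and NoData cells are not handled.

```python
def mean_aggregate(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    grid:   list of equal-length rows of numbers (fine-resolution raster).
    factor: number of fine cells per coarse cell along each axis
            (10 would turn 10 m cells into 100 m cells).
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Four 2 x 2 blocks of slope values collapse to one mean value each.
fine_slope = [[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]]
coarse_slope = mean_aggregate(fine_slope, 2)
```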

Predictor variables representing Iroquoian influence

One set of SDMs additionally considered three “Native American variables” (NAVs): accessibility to a Historic Iroquoian village site, accessibility to a Late Woodland Iroquoian village site, and accessibility to an Iroquoian trail. Each NAV represented the minimum caloric cost for a person to travel (mainly by foot) to any point within the study area from a settlement feature, and assumed that areas of Iroquoian land use were more likely to occur in areas that were more accessible. Creating these variables required two steps. First, villages and trails were mapped in a GIS. Second, cost–distance analysis tools (Esri 2012) were used to model minimum accumulated energy expenditure (in kilocalories [kcal]) for human travel, in order to represent accessibility to the aforementioned features of settlement.

To ensure proper calculation of NAVs, the locations of villages and trails were digitized both within and outside of the study area, because there were areas within Chautauqua County that were most accessible to villages or trails outside of the county. The locations of Historic village sites (Fig. 3; Appendix A: Table A1) were obtained and corroborated using first-hand accounts (Procter 1791, Seaver 1860, Kent and Deardorff 1960), surveyor maps (Adlum and Wallis 1791), HLC maps (SUNY Fredonia 2014), Seneca ethnographies (Morgan 1901, Parker 1920), and early county histories (Young 1875, Hazeltine 1887, Whitman and Russell 1887). The locations of 82 Late Woodland village sites (Fig. 3) were provided by University at Buffalo's Marian E. White Anthropology Research Museum and Archaeological Survey, the Pennsylvania Cultural Resources Geographic Information System, and the Rochester Museum and Science Center. Though some sites possess radiocarbon dates and diagnostic artifacts that provide finer dates of occupation, the precise times of occupation for the majority of Late Woodland sites could not be determined. Late Woodland village sites in the study area were generally believed to be palisaded horticultural centers that were occupied for a few decades.

Fig. 3. The locations of Historic villages, known Late Woodland villages, and trails. The three “Native American variables” (NAVs) were derived by calculating the minimum accumulated energy expenditure to each of these three features. Only geography that could influence the study area was mapped; additional sites exist within the map extent that are not shown. Numbers correspond to Historic village sites described in Appendix A: Table A1.

Trails (Fig. 3) were located and digitized using HLC township surveys (Wang 2007) and lot surveys, county histories (Young 1875, Edson 1904), late 18th-century maps (Adlum and Wallis 1791), and other sources (Morgan 1901, Wallace 1965, Houck 2003). Many segments of trails developed into early European-American travel routes, which then developed into modern roads (Wallace 1965). Some segments of trails within Chautauqua County were digitized by tracing GIS layers that depicted modern streets, where sources suggested that this coincidence existed; other trail locations were estimated using textual descriptions.

Once villages and trails were digitized, each NAV was created by modeling minimum accumulated energy expenditure (in kcal) to each NAV's respective features (Fig. 4; Appendix B: Fig. B1), using cost–distance analysis tools in ArcGIS (Esri 2012). Cost–distance analyses have been used throughout archaeology literature (Surface-Evans and White 2012), including to model resource catchments surrounding archaeological sites based upon human energy expenditure (Verhagen and Whitley 2012). Modeling the minimum accumulated energy expenditure for each NAV was performed at a 10 × 10 m grid cell resolution and later mean-aggregated to 100 × 100 m. The modeling was largely based upon the methods of Jobe and White (2009), who modeled energy expenditure as a function of terrain slope angle, stream crossings, and other variables. A complete description of how NAVs were calculated is included in Appendix B.
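
The general idea of a cost–distance surface can be shown with a short Python sketch that accumulates the least travel cost outward from a set of source cells. This is a simplified, isotropic illustration under stated assumptions, not the procedure of Appendix B or of Jobe and White (2009): the per-meter kcal cost grid, the function name, and the eight-neighbor movement rule are hypothetical, and direction-dependent slope costs and stream crossings are omitted.

```python
import heapq
import math


def accumulated_cost(cost_per_m, sources, cell_size=10.0):
    """Least accumulated travel cost from any source cell to every cell.

    cost_per_m: 2-D list of travel cost per meter in each cell (e.g. kcal/m).
    sources:    list of (row, col) cells for villages or trail cells.
    cell_size:  cell width in meters.
    """
    rows, cols = len(cost_per_m), len(cost_per_m[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]
    while heap:
        cost, r, c = heapq.heappop(heap)
        if cost > acc[r][c]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step_m = cell_size * math.hypot(dr, dc)  # diagonal steps are longer
            step_cost = step_m * (cost_per_m[r][c] + cost_per_m[nr][nc]) / 2.0
            new_cost = cost + step_cost
            if new_cost < acc[nr][nc]:
                acc[nr][nc] = new_cost
                heapq.heappush(heap, (new_cost, nr, nc))
    return acc


# Uniform terrain at 0.1 kcal per meter, with one village in the corner cell.
flat = [[0.1] * 3 for _ in range(3)]
kcal_surface = accumulated_cost(flat, [(0, 0)], cell_size=10.0)
```

Passing several source cells at once yields the minimum cost to the nearest feature, which corresponds to the "minimum accumulated energy expenditure" that each NAV represents.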

Fig. 4. Minimum accumulated energy expenditure (in kilocalories [kcal]) for a person weighing 70 kg, traveling from a Historic-era Iroquoian village site to any location in Chautauqua County. This figure represents one of the three Native American variables (NAVs) that were included in species distribution models (SDMs). The kcal values are classified into 200-kcal value ranges; all kcal ranges are right-inclusive.

SDM algorithms

To model each species, four SDM algorithms were implemented in R 3.0.1 (R Development Core Team 2011): boosted regression trees (BRT), generalized additive models (GAM), generalized linear models (GLM), and multivariate adaptive regression splines (MARS). Three of the four SDM algorithms (GAM, GLM, and MARS) were implemented within the biomod2 package (Thuiller et al. 2013), whereas BRT was implemented within the dismo package (Hijmans et al. 2013). Case weights were assigned to the presence and absence records of each species, in order to fix each species' prevalence at 0.5 (Franklin and Miller 2009). Data and code used to develop SDMs with biomod2 and dismo are included in the Supplement.
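
One common way to fix prevalence at 0.5 with case weights is to give the presences and the absences equal total weight. The Python sketch below illustrates that arithmetic. The exact weighting scheme used in the study's R code is not stated here, so the formula and the function name should be read as assumptions.

```python
def prevalence_weights(presence):
    """Return one weight per record so that weighted prevalence equals 0.5.

    presence: list of 1 (species present) or 0 (species absent).
    Assumption: total weight is kept equal to the number of records, with
    half of it shared among presences and half among absences.
    """
    n = len(presence)
    n_pres = sum(presence)
    n_abs = n - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    w_pres = n / (2.0 * n_pres)
    w_abs = n / (2.0 * n_abs)
    return [w_pres if y == 1 else w_abs for y in presence]


# A species listed in 1 of 4 line-descriptions: the single presence carries
# as much total weight as the three absences combined.
records = [1, 0, 0, 0]
weights = prevalence_weights(records)
weighted_prevalence = (sum(w for w, y in zip(weights, records) if y == 1)
                       / sum(weights))
```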

BRT was the main algorithm involved in modeling the distribution of each species, given its high performance compared to other SDM algorithms (Elith et al. 2006), as well as its ability to handle higher-order interactions between predictor variables. The BRT algorithm iteratively builds a series of regression trees (Elith et al. 2008), in which the residuals from a previous tree are incorporated into training the next tree. For this study, up to five-way interactions (tree complexity = 5) were modeled using BRT. The bag fraction was set to 0.5, meaning that 50% of the training data sites were selected at random, in order to train each regression tree. These random subsamples were stratified by prevalence, so that the random subsamples maintained the same proportion of presences to absences as the full, original sample. The optimal number of regression trees was selected using the gbm.step function in the dismo package (Hijmans et al. 2013), which uses a cross-validation technique to find the number of trees that minimizes predictive deviance. A learning rate was selected such that the optimal number of regression trees would fall between 1000 and 2000. Furthermore, models were made more parsimonious by eliminating less important variables, using additional cross-validation techniques proposed in Elith et al. (2008).
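
The stratified subsampling implied by a bag fraction of 0.5 can be sketched as follows. This Python example only illustrates the idea of drawing the same fraction of presences and of absences for each tree. It is not the internal implementation of the gbm.step function, and the function name and seed argument are hypothetical.

```python
import random


def stratified_bag(presence, bag_fraction=0.5, seed=None):
    """Pick record indices for one tree, keeping the presence:absence ratio.

    presence:     list of 1 (present) or 0 (absent), one per training site.
    bag_fraction: share of each stratum drawn at random without replacement.
    """
    rng = random.Random(seed)
    pres_idx = [i for i, y in enumerate(presence) if y == 1]
    abs_idx = [i for i, y in enumerate(presence) if y == 0]
    chosen = []
    for stratum in (pres_idx, abs_idx):
        k = round(len(stratum) * bag_fraction)
        chosen.extend(rng.sample(stratum, k))
    return sorted(chosen)


# 20 presences and 80 absences: each bag holds 10 presences and 40 absences.
training = [1] * 20 + [0] * 80
bag = stratified_bag(training, bag_fraction=0.5, seed=1)
```

Each regression tree in the sequence would be trained on a fresh bag drawn in this way.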

GAM, GLM, and MARS algorithms were used to train SDMs, in order to provide additional assessments of the predictive performance of models. GAMs typically model the relationship between the response variable and predictor variables using data-defined, nonlinear relationships (Hastie and Tibshirani 1986). Function fitting and variable selection for GAMs were performed in R within the biomod2 package (Thuiller et al. 2013), using methods of the mgcv package (Wood 2014); up to two-way interactions were modeled. GLMs (McCullagh and Nelder 1989) were used to model binary outcomes as a function of linear and quadratic relationships with predictor variables, and with no interactions between variables. Variable selection for GLMs was made using a stepwise procedure (i.e., bidirectional selection and removal of variables), based upon the Akaike information criterion (AIC; Akaike 1974). MARS (Friedman 1991) fits piecewise linear functions between response and predictor variables, and performs automatic variable selection via “pruning” cross-validation techniques; up to two-way interactions were modeled using MARS. Fewer variable interactions were modeled using GAM, GLM, and MARS, because additional interactions did not appreciably improve model performance for these algorithms.

Comparing SDM performance

Once SDMs were trained, each model was evaluated upon training data, as well as independent evaluation data sets. The predictive performance of models that included NAVs was then compared with that of models that excluded NAVs.

Assessing the predictive performance of SDMs

The predictive performance of each SDM was quantified using two model evaluation statistics: the area under the receiver operating characteristic curve (AUC) and the true skill statistic (TSS). AUC is a threshold-independent measure of a model's ability to discriminate between the locations of known presences and absences. AUC ranges from 0 to 1 (0.5, discriminating ability equal to chance; 1, perfect discriminating ability), and can be interpreted as the probability that a model will assign a higher prediction value at a random presence location than at a random absence location (Fawcett 2006). TSS is a threshold-dependent measure of a model's ability to predict the locations of presence and absence; TSS ranges from −1 to 1, with 0 equaling a model that is equal to chance and 1 equaling perfect predictive ability (Allouche et al. 2006). TSS is calculated at a threshold in the predicted probability values made by a model, which maximizes the sum of sensitivity (true positive rate) and specificity (true negative rate).
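
As a hedged illustration of the two statistics, the sketch below computes AUC directly from its probabilistic interpretation and TSS at the threshold that maximizes sensitivity plus specificity. The function names and the six toy predictions are hypothetical; the study itself relied on R packages rather than this code.

```python
def auc(scores, labels):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    pres = [s for s, y in zip(scores, labels) if y == 1]
    abse = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0 for p in pres for a in abse)
    return wins / (len(pres) * len(abse))

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predictions >= threshold count as presence."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pres = sum(labels)
    n_abs = len(labels) - n_pres
    return tp / n_pres, tn / n_abs

def max_tss(scores, labels):
    """TSS = sensitivity + specificity - 1, taken at the threshold that maximizes it."""
    best = -1.0
    for t in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, t)
        best = max(best, se + sp - 1.0)
    return best

# Hypothetical predicted probabilities for three presences and three absences.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc_value = auc(scores, labels)
tss_value = max_tss(scores, labels)
print(round(auc_value, 3), round(tss_value, 3))  # 0.889 0.667
```

In the toy data, eight of the nine presence-absence pairs are ranked correctly, which gives an AUC of 8/9.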

In addition to evaluating models upon the training data, they were also evaluated upon two temporally independent data sets created from the HLC township survey (1797–1799 CE), which subdivided Chautauqua County into typically 9.7 × 9.7 km (6 × 6 mile) townships (Wang 2007). The HLC township survey differed from the HLC lot surveys in three ways: in the time of survey, in the design of survey, and in the surveyors involved. The HLC township survey recorded observations of tree species in line-description data, as well as two to four bearing-trees adjacent to survey posts at 0.8-km (0.5-mile) intervals. The township survey data were georeferenced previously (Tulowiecki et al. 2015), and were also utilized previously to develop independent data sets for SDMs (Tulowiecki 2014). For one independent evaluation data set, the presence or absence of a species was designated at the centroid of each line-description, based upon whether or not that species was observed in the line-description. In the other independent evaluation data set, the presence or absence was designated at each survey post, based upon whether or not that species was blazed and recorded as a bearing-tree adjacent to that post. Each evaluation data set (i.e., the line-description or bearing-tree data) from the township survey was only used to evaluate SDMs when species possessed more than 10 presence records in the evaluation data set.
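
The designation of presences and absences, and the rule that a species needs more than 10 presence records before an evaluation data set is used, can be sketched as follows. The site identifiers, the species lists, and both function names are hypothetical and serve only to illustrate the logic.

```python
def presence_absence(observations, species):
    """Map each site id to 1 if the species was recorded there, else 0."""
    return {site: int(species in taxa) for site, taxa in observations.items()}

def usable_for_evaluation(pa, min_presences=10):
    """The text requires more than 10 presence records in the evaluation set."""
    return sum(pa.values()) > min_presences

# Hypothetical line-descriptions: 12 mention beech and sugar maple, 3 mention oak and hickory.
observations = {f"line_{i}": {"beech", "sugar maple"} for i in range(12)}
observations.update({f"line_{i}": {"white oak", "hickory"} for i in range(12, 15)})

beech_ok = usable_for_evaluation(presence_absence(observations, "beech"))
oak_ok = usable_for_evaluation(presence_absence(observations, "white oak"))
print(beech_ok, oak_ok)  # True False
```

The same two functions would apply to the bearing-tree data set, with survey posts as the sites instead of line-description centroids.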

Statistical tests to compare SDMs

Paired t tests were used to examine differences in evaluation statistics between models that included or excluded NAVs. One set of paired t tests involved comparisons of SDMs on a species-by-species basis. For each modeled species, evaluation statistics were compared with a t test, using all SDMs that were developed for that species (i.e., using BRT, GAM, GLM, and MARS). Another set of paired t tests uniquely compared evaluation statistics for models of mast-bearing taxa collectively per SDM algorithm, because much research has suggested that mast-bearing taxa were associated with Iroquoian settlement patterns (see Introduction). Chestnut, hickory, black oak (Quercus velutina Lam.), white oak, black walnut, and white walnut were designated as the mast-bearing taxa within the study area.
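
A minimal sketch of the species-by-species comparison follows, assuming one evaluation statistic per algorithm for models with and without NAVs. The AUC values are invented for illustration and are not results from this study; `paired_t` is a hypothetical helper that returns only the t statistic and degrees of freedom.

```python
import math

def paired_t(with_nav, without_nav):
    """Paired t statistic and degrees of freedom for matched model pairs."""
    diffs = [a - b for a, b in zip(with_nav, without_nav)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Hypothetical AUC values for one species, ordered BRT, GAM, GLM, MARS.
auc_with_nav = [0.80, 0.78, 0.75, 0.79]
auc_without_nav = [0.76, 0.75, 0.74, 0.75]
t, df = paired_t(auc_with_nav, auc_without_nav)
print(round(t, 2), df)  # 4.24 3
```

With 3 degrees of freedom, the two-sided critical value at α = 0.05 is approximately 3.182, so the toy difference in this sketch would count as significant.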

Assessing variable importance and species responses

For BRT models, the relative importance of NAVs was compared to other predictor variables, and species responses to NAVs were assessed. For each predictor variable, a measure of relative importance in BRT models is calculated as the number of times that a variable is used at a split in regression trees, weighted by the squared improvement that each split brings to the model (Elith et al. 2008). Measures are then scaled so that the sum of the weighted-averages for all variables equals 100, and are typically expressed as a percentage. To examine the relationship between a species and each predictor variable, response curves were generated using partial dependence plots (Elith et al. 2008), which portray the univariate relationship between a species and a predictor variable (Fig. 5). These plots are created by plotting the change in the prediction made by a BRT model when one predictor variable's values are changed, and all other variables are held at their mean value. In the plots, the fitted values are centered by subtracting the mean of the fitted values (Hijmans et al. 2013).
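
The scaling of importance measures and the construction of a centered partial dependence curve can be sketched as below. The toy linear `model`, the variable names `temp` and `nav`, and the raw importance values are assumptions for illustration; a fitted BRT model would take the place of `model`.

```python
def scale_importance(raw):
    """Rescale raw importance measures so that they sum to 100."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

def partial_dependence(model, data, variable, grid):
    """Vary one variable over grid, hold the others at their means, center the result."""
    means = {k: sum(row[k] for row in data) / len(data) for k in data[0]}
    preds = []
    for value in grid:
        point = dict(means)
        point[variable] = value
        preds.append(model(point))
    center = sum(preds) / len(preds)
    return [p - center for p in preds]

# Hypothetical stand-in for a fitted model: the response falls as the NAV cost rises.
def model(row):
    return 2.0 * row["temp"] - 0.5 * row["nav"]

data = [{"temp": 10, "nav": 100}, {"temp": 12, "nav": 300}]
importance = scale_importance({"nav": 3.0, "temp": 1.0})
curve = partial_dependence(model, data, "nav", [0, 200, 400])
print(importance)  # {'nav': 75.0, 'temp': 25.0}
print(curve)       # [100.0, 0.0, -100.0]
```

A curve that declines as the accessibility cost increases, as in this toy output, corresponds to a higher predicted presence near a feature of Iroquoian settlement.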

Fig. 5. Idealized representations of the response curves between Native American variables (NAVs), and the probability of presence for a tree species, using partial dependence plots. The plots show either that (a) a species has a higher probability of presence near a feature of Iroquoian settlement, (b) a species has a lower probability of presence near a feature of Iroquoian settlement, or (c) the relationship is nonlinear and/or difficult to interpret.

Simulating species distributions in the absence of Iroquoian settlement

BRT models (with NAVs included) were used to simulate changes in suitable areas for species, under a scenario that best represented the absence of Iroquoian settlement. This simulation involved four steps.

First, NAVs were replaced with constant values that corresponded to the maximum value for each respective NAV. These values for NAVs collectively represented a scenario of maximum inaccessibility from each feature of Iroquoian settlement across the study area. This step is analogous to setting different values of temperature and precipitation in SDMs, in order to predict species distributions under different climate scenarios. Second, BRT models were applied to predict the distributions of species in the study area, under this new scenario of inaccessibility. To ensure that only high-quality models were utilized to predict alternative species distributions, only species were selected that (1) possessed SDMs that showed a significant or nearly significant increase when NAVs were included, in one or more evaluation statistics using either independent evaluation data set (Assessing the predictive performance of SDMs); and (2) produced “good” BRT models with a minimum AUC value of 0.80 (Araújo et al. 2005), as evaluated upon at least one independent evaluation data set.
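
The first step of the simulation amounts to overwriting each NAV with its study-area maximum while leaving the environmental predictors untouched. A minimal sketch, assuming sites stored as dictionaries and a hypothetical NAV named `cost_to_village`:

```python
def simulate_absence(sites, nav_names):
    """Copy each site's predictors, replacing every NAV with its study-area maximum."""
    nav_max = {nav: max(site[nav] for site in sites) for nav in nav_names}
    # Environmental variables are carried over unchanged; only NAVs are overwritten.
    return [{**site, **nav_max} for site in sites]

# Hypothetical predictor values for three sites.
sites = [
    {"elevation": 200, "cost_to_village": 150},
    {"elevation": 450, "cost_to_village": 900},
    {"elevation": 300, "cost_to_village": 400},
]
simulated = simulate_absence(sites, ["cost_to_village"])
print([s["cost_to_village"] for s in simulated])  # [900, 900, 900]
```

The fitted model would then be applied to `simulated` in place of `sites` to obtain the prediction surface under maximum inaccessibility.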

For each selected species, the third step converted the original and the simulated prediction surfaces into binary prediction surfaces, to offer different estimates of changes in “suitable” and “unsuitable” areas under the original and simulated conditions, relative to a threshold. Three thresholds were applied to the prediction surfaces, to classify areas as either suitable or unsuitable for each species (Liu et al. 2005, Franklin and Miller 2009). The thresholds in predicted probabilities were determined using the original prediction surfaces, and this same threshold value was applied to both the original and simulated prediction surfaces to determine suitable areas. One threshold held the sensitivity at 95%, meaning that only 5% of observed presences in the training data (under the original conditions) would be falsely classified as unsuitable in the binary prediction. Another threshold best equalized sensitivity and specificity, essentially creating a binary prediction surface that was the least biased toward accurately predicting either presence or absence in the training data. A final threshold maximized the sum of sensitivity and specificity when models were evaluated upon the training data. The fourth and final step was to quantify and compare the amount of suitable areas within Chautauqua County for each species, using the original and simulated binary prediction surfaces.
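
The three threshold rules and the comparison of suitable area can be sketched as follows. All values are hypothetical, the function names are illustrative, and the 95% sensitivity rule is implemented here as the highest candidate threshold whose sensitivity is still at least 0.95, which is one reasonable reading of the description above.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predictions >= threshold count as suitable."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pres = sum(labels)
    return tp / n_pres, tn / (len(labels) - n_pres)

def select_thresholds(scores, labels):
    """Return the three thresholds, chosen from the original training predictions."""
    stats = [(t, *sens_spec(scores, labels, t)) for t in sorted(set(scores))]
    return {
        "sensitivity_95": max(t for t, se, sp in stats if se >= 0.95),
        "equal_sens_spec": min(stats, key=lambda r: abs(r[1] - r[2]))[0],
        "max_sens_plus_spec": max(stats, key=lambda r: r[1] + r[2])[0],
    }

def percent_suitable(surface, threshold):
    """Percentage of cells at or above the threshold."""
    return 100.0 * sum(1 for p in surface if p >= threshold) / len(surface)

# Hypothetical training predictions: four presences followed by four absences.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
chosen = select_thresholds(scores, labels)

# Hypothetical prediction surfaces; the same threshold is applied to both.
original = [0.9, 0.7, 0.5, 0.2]
simulated = [0.6, 0.3, 0.2, 0.1]
t = chosen["max_sens_plus_spec"]
diff = percent_suitable(simulated, t) - percent_suitable(original, t)
print(chosen)
print(diff)  # -50.0
```

Because the threshold is fixed from the original predictions, the difference is expressed in percentage points of the study area, as in the fourth step.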


The performance of SDMs

When models were evaluated upon the training data, t tests revealed that model performance improved significantly (at α = 0.05) for 14 out of 23 species when NAVs were included, according to the AUC evaluation statistic; performance also improved significantly for 14 out of 23 species according to the TSS evaluation statistic (Fig. 6; Appendix C: Table C1). These significant test results included five out of the six mast-bearing taxa (all except black walnut; Data and methods: Comparing SDM performance: Statistical tests to compare SDMs), and white pine. Significant improvements also occurred for models of abundant mesic species such as sugar maple (Acer saccharum Marsh.), beech, and basswood (Tilia americana L.), when evaluated using either AUC or TSS. No models that included NAVs decreased in performance, when they were evaluated upon the training data. While NAVs improved the predictive performance of SDMs for some species, not all SDM algorithms that considered NAVs selected all three NAVs (Appendix D: Table D1).

Fig. 6. The line-description data used for training models, as well as the model predictions from boosted regression tree (BRT) models without Native American variables (NAVs) and with NAVs, for select species. Line-descriptions with a species mentioned are symbolized with a thick black line. Model predictions range in probability values from 0 (black) to 1 (white). The crosshatches (in the middle and right panels) symbolize Lake Erie and Chautauqua Lake.

Models that included NAVs displayed improved performance in predicting distributions within independent data sets. Using the township bearing-tree data to calculate evaluation statistics, seven out of 17 species significantly improved in AUC values, and five out of 17 species significantly improved in TSS values (Table 2). Only one species (hemlock, Tsuga canadensis (L.) Carr.) exhibited a significant decrease in model performance when NAVs were included, using both AUC and TSS. Both oak species (i.e., black oak and white oak), as well as chestnut, showed significant (α = 0.05) or nearly significant (α = 0.10) improvements using each evaluation statistic when NAVs were included, and also exhibited some of the largest increases in evaluation statistic values when using township bearing-tree data for evaluation. Using township bearing-tree data to evaluate models, significant or nearly significant improvements occurred for models of basswood, beech, and sugar maple, in both AUC and TSS values. Six species (Table 1) did not have the minimum of 10 presence records in the township bearing-tree data, in order to be evaluated independently using this data set.

Table 2. Model evaluations upon the township bearing-tree data, using both the area under the receiver operating characteristic curve (AUC), and the true skill statistic (TSS).

Using the township line-description data as an evaluation data set, five out of 21 species exhibited significantly higher AUC values, and five out of 21 species exhibited significantly higher TSS values, when NAVs were included (Table 3). Only models of yellow birch (Betula alleghaniensis Britton) performed significantly poorer, and for only one statistic (i.e., TSS). Four out of six mast-bearing taxa (chestnut, white oak, black walnut, and white walnut) displayed significant or nearly significant improvements in performance using AUC, and four out of six mast-bearing taxa (chestnut, hickory, black oak, and white oak) displayed significant or nearly significant improvements using TSS. Similar to the previous results, chestnut and white oak exhibited some of the largest increases in evaluation statistic values. Models of sugar maple once again displayed improved predictive performance using both the AUC and TSS statistics, when evaluated using the township line-description data. Only two species (Table 1) did not have the minimum of 10 presence records in the township line-description data, in order to be evaluated upon this data set.

Table 3. Model evaluations upon the township line-description data, using both the area under the receiver operating characteristic curve (AUC), and the true skill statistic (TSS).

When NAVs were included, improvements in the predictive ability of models occurred for the mast-bearing taxa collectively (Table 4). When evaluation statistics were calculated using the township bearing-tree data as an evaluation data set, models of mast-bearing taxa (n = 4) improved nearly significantly for AUC, and significantly for TSS, for three out of four SDM algorithms (BRT, GAM, and MARS). When evaluation statistics were calculated using the township line-description data for evaluation (Fig. 7), models of mast-bearing taxa (n = 6) improved significantly for both AUC and TSS, for three out of four SDM algorithms (BRT, GLM, and MARS).

Fig. 7. A plot showing the predictive performance of species distribution models (SDMs) for mast-bearing species, as evaluated upon the township line-description data. The (a) area under the receiver operating characteristic curve (AUC) and (b) true skill statistic (TSS) evaluation statistics are shown. Each point represents a pairing of two models that share the same SDM algorithm and species, but differ regarding whether they included Native American variables (NAVs). Models are boosted regression trees (BRT), generalized additive models (GAM), generalized linear models (GLM), and multivariate adaptive regression splines (MARS). Mast-bearing species in this study are chestnut, hickory, black oak, white oak, black walnut, and white walnut.

Table 4. The results of paired t tests for differences in evaluation statistics, between models for mast-bearing species.

NAVs in models of tree species distributions

Relative importance of NAVs

NAVs exhibited high importance when included in BRT models. The first-, sixth-, and seventh-most important predictor variables were NAVs, whereas the second-, third-, and fourth-most important predictor variables were climate related, according to the mean relative importance (Data and methods: Assessing variable importance and species responses) of NAVs in BRT models (Table 5a; Appendix D: Table D2). When including all of the BRT models, the mean relative importance was 30.1% for the three NAVs combined, 27.9% for the three climate variables (i.e., temperature and precipitation variables) combined, and 41.9% for the eight remaining variables combined. When including all BRT models of mast-bearing taxa, the mean relative importance was 34.2% for the three NAVs combined, 30.3% for the three climate variables combined, and 35.6% for the eight remaining variables combined (Table 5b). For the BRT model of white oak (one of the top-performing models, as judged by its evaluation upon both independent data sets), the three most important variables were accessibility to a Historic village site (28.2%), mean growing-season temperature (13.9%), and mean January temperature (13.7%). The highest relative importance for any NAV was for accessibility to a Historic village site in models of white oak (28.2%) and chestnut (24.3%); these relative importance values were the highest for any variable in any BRT model that included NAVs.

Table 5. The mean relative importance of predictor variables, for (a) all boosted regression tree (BRT) models (n = 23), and (b) BRT models of mast-bearing species (n = 6).

Responses to NAVs

The importance of variables in BRT models, and the nature of the relationship between each species and each of the NAVs (determined using partial dependence plots; Data and methods: Assessing variable importance and species responses), varied by species (Table 6). Models for five species (chestnut, hickory, black oak, white oak, and black walnut) exhibited higher probabilities of presence in areas more accessible to all three features of Iroquoian settlement (Fig. 8); no species exhibited lower probabilities of presence near all three features of Iroquoian settlement. The five models with the highest relative importance (>13%) for accessibility to a Historic village site (Table 6), which also predicted higher probabilities near Historic village sites, were for white oak, chestnut, black oak, ironwood (Ostrya virginiana (Mill.) K. Koch), and hickory, in decreasing order. The two models with the highest relative importance (>14%) for accessibility to a Historic village site, which also predicted lower probabilities near Historic village sites, were for sugar maple and beech. Generally, less notable responses were detected involving accessibility to a Late Woodland village site (except relationships with hickory, white pine, and black walnut), and even less so with accessibility to trails (Table 6).

Fig. 8. Response curves showing the relationship between each predictor variable, and the five tree species meeting the criteria presented in Data and methods: Simulating species distributions in the absence of Iroquoian settlement, as outputted by dismo functions in R. The y-axes contain the predicted values, prior to their transformation into probabilities via a logistic function, f(x) = 1/(1 + e^{-x}). The black lines indicate the modeled response, and the gray dashed lines represent a smoothed response to represent a more generalized relationship.

Table 6. The relationships between Native American variables (NAVs) and individual species for boosted regression tree (BRT) models.

Changes in species distributions under different scenarios of Iroquoian land use

Five taxa met the required criteria (Data and methods: Simulating species distributions in the absence of Iroquoian settlement) to simulate the changes in suitable area under a scenario that represented an absence of Iroquoian settlement, all of which were mast bearing: chestnut, hickory, black oak, white oak, and black walnut. The changes in suitable area from the original and simulated conditions, using three different threshold selection procedures, are summarized in Table 7. The five mast-bearing taxa generally found suitable area near Iroquoian settlement under the original conditions, but under a scenario of maximum inaccessibility to features of Iroquoian settlement, the amount of suitable area (relative to a threshold in predicted probability values) decreased for the five taxa (Fig. 9). For example, using a threshold that maximized the sum of model sensitivity and specificity, the amount of suitable area for chestnut decreased from 476.0 km² to 48.4 km², equaling a decrease from 17.4% to 1.8% of the total land area in Chautauqua County.

Fig. 9. An example of the binary predictions (i.e., suitable or unsuitable) made by boosted regression tree (BRT) models under the original conditions (left), and under a scenario in which Iroquoian land use is absent (right). Shown are the five taxa that met the criteria (Data and methods: Simulating species distributions in the absence of Iroquoian settlement) for this analysis. White indicates suitable areas for a species, whereas black indicates unsuitable areas for a species. In this example, thresholds were selected that maximized the sum of sensitivity and specificity in the original model predictions, using the training data. The crosshatches symbolize Lake Erie and Chautauqua Lake.

Table 7. Changes (Diff) in suitable area (measured in percentage of study area) between the original prediction surfaces (NAVs) and simulated (Sim; i.e., the scenario of maximum Iroquoian inaccessibility) prediction surfaces, using three different threshold selection procedures.


The results of this study support the three hypotheses of this study (Introduction), strongly suggesting that Iroquoian groups shaped forest composition and patterning at a local spatial extent (i.e., within a 2750-km² study area). Using a synthesis of SDMs with archaeological modeling methods to generate NAVs, the results show that while the impacts upon forest composition were notable, Iroquoian groups settled near both naturally occurring and human-modified forests (i.e., oak–hickory–chestnut communities). In the following discussion, Iroquoian settlement and the past distribution of species addresses the three hypotheses of this study, Methodological advances and major findings discusses this study's methodological advances and major findings, and Representing Native American land use presents other methodological considerations.

Iroquoian settlement and the past distribution of species

General model performance

In contrast to previous research (Foster et al. 2004, Black et al. 2006), this study compared and independently evaluated quantitative models of species distributions that included and excluded NAVs, in order to assess whether NAVs improved models. This study uniquely indicated that, when NAVs were included, models of tree species distributions improved in predictive performance for species generally found in upland settings (Tables 2 and 3). In particular, NAVs enhanced the predictive performance of models for mast-bearing taxa (Fig. 7), as well as for dominant mesic species in the study area. These results are consistent with past research in regions occupied by Iroquoian groups (Engelbrecht 2003, Abrams and Nowacki 2008), which indicated that the most noticeable effects of land use upon forest composition occurred on upland sites.

This study's results also offered a refined understanding of Iroquoian impacts upon various tree species, because unlike previous work, it examined whether Iroquoian impacts extended into forested wetlands. The study found that in contrast to tree species typically found in upland sites, wetland species generally exhibited less response to NAVs in Iroquoian regions. When models were evaluated upon independent data sets, NAVs did not improve model performance for models of many wetland species (e.g., alder, Alnus incana). Models also retained very high predictive performance for these same species, when NAVs were excluded (Tables 2 and 3). Additionally, SDM algorithms selected fewer NAVs in final models for wetland species, such as for SDMs of black ash (Fraxinus nigra Marsh.; Appendix D: Table D1). These results suggest that swamps were less disturbed by Iroquoian influence, in comparison to more upland sites where clearing and burning generally occurred (Engelbrecht 2003, Munoz et al. 2014).

Some modeled relationships between NAVs and species are potentially due to correlative rather than causative circumstances, which underscores the need to interpret model results cautiously when NAVs are considered in SDM development. For instance, SDMs for wetland tree species selected NAVs, yet did not offer improvements in model performance. Iroquoian villages were often situated somewhat near wetlands, given their value for providing medicinal resources, building materials, and hunting opportunities (Engelbrecht 2003). Moreover, many segments of trails in the study area followed the high, dry ground above swamps within major valleys (compare Figs. 1 and 2), which may explain why accessibility to trails was selected for models of wetland and other species (Appendix D: Table D1). Future modeling should analyze Native American impacts over larger spatial extents, in order to avoid spurious relationships and model overfitting that could occur from fitting models to ecological “noise” (Peterson et al. 2011) at small spatial extents, or from using SDMs that fit highly nonlinear responses (Guisan and Zimmermann 2000, Vaughan and Ormerod 2005, Randin et al. 2006). Other modeling frameworks, such as process-based modeling of tree species distributions (e.g., Morin et al. 2007), may be useful for disentangling the effects of Native American land use from environmental controls, and could supplement approaches that use SDMs.

Species responses to Iroquoian settlement

This study also demonstrated species-specific responses to NAVs (Table 6). The SDMs showed both strong positive and strong negative relationships between tree species and NAVs, which suggests a spatial organization of Iroquoian impacts upon tree species composition surrounding Historic village sites. Moreover, accessibility to Late Woodland (ca. 1000–1650 CE) village sites was related to some species distributions, suggesting that in contrast to other studies (Peacock et al. 2008), the impacts of Native Americans persisted for considerable time after abandonment (Munoz et al. 2014).

Models of some tree species predicted higher probabilities of presence in the immediate vicinity of villages (i.e., within an approximately 3–6 km radius; Appendix B: Fig. B1), which is exemplified by black walnut (Fig. 8e). Models of black walnut showed nearly significant improvement when NAVs were included, as evaluated upon township line-description data (Table 3); accessibility to Late Woodland and Historic villages represented the second- and third-most important variables in the BRT model for black walnut, respectively. This tree species was more present in areas requiring less than approximately 150 kcal (from Late Woodland village sites) to 300 kcal (from Historic village sites) to access (for a human weighing 70 kg), meaning that the species grew in areas that could be accessed via a short walk from village sites (Figs. 4 and 8e). Other research found that black walnut was often observed near Iroquoian village sites prior to European-American settlement (Black and Abrams 2001), almost exclusively within 5 km of village sites (Black et al. 2006). Black walnut may have been cultivated near village sites given its high caloric content (Wykoff 1991), or have found ideal growing conditions in the open environments provided by horticultural fields (e.g., within 2 km of village sites; Jones 2010). This valuable species may have required active management, because the species is highly shade intolerant (Niinemets and Valladares 2006).

Additional mast-bearing taxa showed relationships with Iroquoian settlement that extended in more diffuse patterns beyond village sites, generally within a 10–15 km radius around villages (Figs. 4 and 8a–d; Appendix B: Fig. B1). Oak species were more likely to be present near Iroquoian village sites, with white oak generally exhibiting higher probabilities within approximately 300 kcal (from Late Woodland village sites) to 900 kcal (from Historic village sites) of effort from village sites. Similarly, higher probabilities of chestnut occurred within 400 kcal (from Late Woodland village sites) to 800 kcal (from Historic village sites) of effort, and for hickory within 300 kcal (from Late Woodland village sites) to 600 kcal (from Historic village sites) of effort (Fig. 4). These associations are consistent with relationships observed in northwestern Pennsylvania, which showed that patterns of oak, chestnut, and hickory extended to farther distances beyond Iroquoian settlement, in comparison to species such as black walnut (Black et al. 2006). Results are also consistent with the idea that hunting practices took place farther from village sites, including at hunting camps (Engelbrecht 2003). Being both fire adapted (Abrams 2003) and intermediately shade tolerant (Niinemets and Valladares 2006), oak species likely would have benefitted from disturbances initiated by Iroquoian land use, including clearing and burning. Chestnut and hickory are more sensitive to damage from fire than oak, but their strong ability to sprout after fire damage would have allowed these species to compete in areas of occasional burning (U.S. Forest Service 1990). A forest opened by burning would have also increased sunlight to the understory, thereby benefitting oak, chestnut, and hickory, which are only moderately shade tolerant (Niinemets and Valladares 2006).

Areas that were more accessible to Iroquoian village sites were also associated with lower probabilities of presence for mesic tree species, especially beech and sugar maple (Table 6). The probabilities of presence for both species generally decreased, in areas that required less than approximately 900 kcal (from Historic village sites) to access (results not shown). Studies also noted decreases for these same species in the vicinity of Iroquoian settlement, as determined using PLSRs (Black et al. 2006) and fossil pollen records (Munoz and Gajewski 2010). Both of these species are sensitive to burning (U.S. Forest Service 1990), and may have also been outcompeted in opened and burned environments by more fire-tolerant, fast-growing species.

Generally less important to models was accessibility to trails (Table 6; Appendix D: Table D2), though this result may be due to issues surrounding the records of Iroquoian trails. Areas more accessible to trails were associated with higher presences of mast-bearing taxa (e.g., chestnut, hickory, black oak, and white oak), which aligns with previous research (Black et al. 2006). However, this NAV possessed little or no importance for some mesic species (e.g., sugar maple), yielded response curves that were less interpretable for other mesic species (e.g., beech), and presented relationships for other species that were difficult to explain using existing literature (e.g., ironwood; Table 6). Among other factors, an incomplete record of Native American trails could have affected analyses, which previous studies have acknowledged as an issue (Jones 2010).

The spatial extent of Iroquoian impacts

Whereas previous research quantified the spatial extent of disturbances such as clearing and recent burning, this study quantified Iroquoian modifications to forest composition, and showed that these modifications covered a greater total area than the aforementioned disturbances. By comparing original model predictions to those under an alternative scenario (i.e., maximum “inaccessibility” to Iroquoian settlement), results suggest that Iroquoian groups increased the amount of suitable area for five mast-bearing taxa, by amounts ranging from 1.8–2.0 (black walnut) to 18.2–24.0 (black oak) percentage points of the total area in Chautauqua County (Table 7, Fig. 9). These estimates are greater than estimates of clearing and burning practices in Iroquoian regions, which have equaled 1.27% of Cayuga- and Onondaga-occupied areas in the Military Tract of Central New York during the late 18th century CE (Marks and Gardescu 1992), and 3.2% of Huron-occupied areas in southern Ontario around 1600 CE (Campbell and Campbell 1994). However, this study's predicted changes in the amount of suitable area depended upon the method of selecting a threshold in predicted probability values to designate an area as suitable (Liu et al. 2005). Furthermore, future studies should attempt to quantify changes in the relative abundances of tree species resulting from Native American land use, rather than simply changes in the total suitable area for tree species.

These results also show that the amount of suitable area for some mast-bearing taxa would have decreased substantially, under an absence of Iroquoian settlement. Though some areas remained suitable under this scenario for some species (i.e., chestnut, hickory, and black walnut), BRT models predicted a considerable decrease in suitable area for black oak and white oak (Table 7, Fig. 9). While these decreases in oak species were dramatic, the results are bolstered by other research. One study of forest compositional change in western New York State indicated that some of the greatest decreases in white oak, from the presettlement era to the present day, occurred near the southern end of Chautauqua Lake (Wang et al. 2009). This area corresponds with the location of Historic village sites (Fig. 3), as well as where BRT models predicted large decreases in suitable area for white oak under a scenario of no Iroquoian influence (Fig. 9). Studies have also shown that there is presently little white oak recorded in Forest Inventory and Analysis data within Chautauqua County (Wang et al. 2009), and that oak species are experiencing poor regeneration throughout Eastern North America (McEwan et al. 2011). Thus, the study area may be less suitable for oak taxa in the absence of Iroquoian influence, as evidenced by the mesophication (Nowacki and Abrams 2008) of forests in the present day.

Methodological advances and major findings

Methodological advances

This study presented a novel methodology for understanding the influence of Native American land use upon past tree species distributions, using PLSR data, SDMs, and the development of NAVs. In addition to providing quantitative measures of Iroquoian impacts, this methodology afforded the opportunity to identify areas where Iroquoian land-use practices potentially altered forest composition. The possibility exists that Native Americans utilized resources from both naturally occurring and human-modified forests, yet the methodologies of previous studies have not allowed these types of forests to be distinguished. Related to this point, the nature of the relationship between NAVs and forest composition implies that Iroquoian influence upon forested landscapes occurred along a gradient of intensity (Fig. 8), in keeping with previous conceptualizations of Native American impacts upon environments (Vale 2002).

Two caveats surround the methodological advances described above. First, given the variety in subsistence strategies of past Native American societies (Smith 2011b), this study's results are mainly applicable to Iroquoian regions of northeastern United States and southern Ontario. Similar methodologies would need to be adapted, in order to study the impact of alternative subsistence strategies upon environments in other cultural regions. Second, future studies should explore additional modeling approaches for separating environmental from Native American impacts upon tree species distributions. The distribution of a tree species might be better conceptualized as resulting from a non-stationary process (Miller 2012) composed of two separate components (i.e., Native American land use and environmental conditions), rather than Native American impacts acting as an additive process superimposed upon environmental conditions. Native American impacts may also decouple the relationship between tree species distributions and environmental variables across smaller spatial extents, making it difficult to differentiate the processes affecting distributions. SDM-based approaches that include a wider range in environmental conditions (Franklin and Miller 2009), or alternative modeling approaches that account for non-stationarity (e.g., geographically weighted regression, Austin 2007), may better differentiate the environmental and Native American processes that affect tree species distributions.

Another methodological advance of this study pertains to the NAVs used to represent Native American impacts. This study's development of NAVs (Appendix B) offered two improvements over simpler proxies of Native American land use used in analyses, such as distance from village sites (Foster et al. 2004), distance-based indices of Native American land use (Black et al. 2006), and binary representations of the presence/absence of Native American land management (Steen-Adams et al. 2011). First, this study's creation of NAVs accommodated the possibility of asymmetrical catchments around village sites, as defined by landscape accessibility (Fig. 4; Appendix B: Fig. B1). This approach offers an improved representation of catchments in comparison to distance-based proxies of Native American land use, because irregularly shaped resource catchments, as defined by accessibility, have been shown to better match the spatial distribution of available resources surrounding a site (Surface-Evans 2012). Second, the NAVs of this study provided a more direct link between human landscape accessibility and tree species distributions, rather than using distance-based proxies to represent areas of Native American land use. The NAVs were developed in units that were related to human mobility (i.e., kcal) over a three-dimensional terrain, rather than in units of two-dimensional, horizontal distance; these NAVs thereby allowed the two-dimensional area of catchments (Fig. 4; Appendix B: Fig. B1) to vary with landscape conditions. Future studies could explore other measures of “access” to the landscape; for instance, travel time could potentially serve as a variable to represent where Native American land use occurred (Kantner 2012).
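
The idea of an accessibility-defined, asymmetrical catchment can be sketched with a least-cost accumulation over a small elevation grid. This is only an illustration of the general technique (Dijkstra's algorithm on a grid), not the procedure documented in Appendix B. The step-cost function, its parameters, the elevations, and the energy budget are all hypothetical placeholders.

```python
import heapq


def step_cost_kcal(elev_from, elev_to, base=1.0, climb_penalty=0.05):
    """Hypothetical energy cost of one step: flat cost plus a penalty per metre climbed."""
    climb = max(0.0, elev_to - elev_from)
    return base + climb_penalty * climb


def accumulated_cost(elevation, start):
    """Dijkstra over a 4-connected grid: least kcal needed to reach each cell from start."""
    rows, cols = len(elevation), len(elevation[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    cost[start[0]][start[1]] = 0.0
    queue = [(0.0, start)]
    while queue:
        current, (r, c) = heapq.heappop(queue)
        if current > cost[r][c]:
            continue  # stale queue entry, a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                candidate = current + step_cost_kcal(elevation[r][c], elevation[nr][nc])
                if candidate < cost[nr][nc]:
                    cost[nr][nc] = candidate
                    heapq.heappush(queue, (candidate, (nr, nc)))
    return cost


# Hypothetical elevations (m): flat west of the village, rising to the east.
elevation = [
    [100, 100, 100, 140, 180],
    [100, 100, 100, 140, 180],
    [100, 100, 100, 140, 180],
]
village = (1, 2)
cost = accumulated_cost(elevation, village)

# Cells reachable within a hypothetical 3.5 kcal budget form the catchment.
catchment = [(r, c) for r in range(3) for c in range(5) if cost[r][c] <= 3.5]
print(cost[1][0], cost[1][4])
print(catchment)
```

In this toy grid the catchment reaches two cells west of the village across flat ground but only one cell east, where the terrain climbs. A fixed horizontal distance would instead give a symmetric catchment.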

Did the spatial manifestations of Native American impacts depend upon climate conditions?

The results of this study demonstrated how the spatial manifestations of Native American impacts may have varied between different climatic regions. Specifically, this study showed quantitatively that Native American groups utilized and promoted existing mast resources in warmer-drier climatic regions (e.g., the Erie Lowland), and manipulated forest composition in more concentrated areas near village sites in cooler-moister climatic regions (e.g., the Allegheny Plateau), which aligns with previous research (Nowacki et al. 2012). Two results are presented in support of these findings.

First, NAVs represented some of the most important variables in BRT models, but temperature-related variables were also influential to tree species distributions (Table 5; Appendix D: Table D2). In particular, mean January temperature was the second-most important predictor variable in BRT models, and likely reflected the climatic differences between the warmer Erie Lowland (along Lake Erie; Fig. 1) and the cooler Allegheny Plateau. Partial dependence plots suggest that some species favored areas with milder January temperatures (e.g., black walnut), whereas other species were more likely to be present in areas of colder temperatures (e.g., beech). Given the role of temperature in species distributions, Native Americans may have sought to shift forest composition toward favored species (chiefly mast-bearing taxa), but their modifications could have been somewhat constrained by temperature. For example, the combined influence of warmer temperatures and Iroquoian influence may account for the more widespread distribution of black walnut in the Erie Lowland, in contrast to its more sporadic distribution immediately near village sites on the cooler Allegheny Plateau (Fig. 9). A similar phenomenon was reported in southern Ontario, where European travelers noted chestnut growing mainly near Huron villages at the climatic limits of this tree species (Day 1953).

Second, the simulated changes in the spatial extent of suitable area for tree species, under a scenario representing an absence of Iroquoian settlement, provided further insight into relationships between Native American impacts, temperature, and tree species distributions. Specifically, this scenario revealed areas where Iroquoian land use may have more extensively modified forest composition, with key differences manifested between physiographic provinces. Under this scenario, decreases in suitable area for mast-bearing taxa typically occurred upon the Allegheny Plateau, but BRT models still predicted suitable areas on the Erie Lowland for taxa such as chestnut, hickory, and black walnut (Fig. 9). In the absence of Iroquoian settlement, many mast-bearing taxa may have still been able to compete with mesic species on the Erie Lowland, which experiences milder temperatures in winter, and soil moisture deficits in summer. Other research has similarly pointed out that warmer climates and drier soils may have been the chief determinants of oak savanna in southern Ontario, instead of Iroquoian activities (Szeicz and MacDonald 1991). On the Erie Lowland, the scenario suggested that Iroquoian groups utilized and expanded naturally occurring forests of mast-bearing taxa, rather than initiating a major shift between forest community types.

On the other hand, Iroquoian land-use practices upon the Allegheny Plateau in the study area may have been necessary to shift forest composition toward mast-bearing taxa, where mesic species would have otherwise dominated. Specifically, the combined effects of steep terrain that made human travel more laborious (Fig. 4; Appendix B: Fig. B1), and conditions more favorable for mesic species (i.e., colder winters and soil moisture surpluses year-round), may have focused land-use efforts to areas surrounding villages (Nowacki et al. 2012). Other researchers similarly point to Iroquoian land use when explaining oak forests in mesic conditions on the Allegheny Plateau (Marks and Gardescu 1992); a dendrochronological study from the Allegheny Plateau also contended that the Seneca maintained oak forests with frequent, low-intensity fires, in areas that were more suitable for mesic species (Ruffner and Abrams 2002). Mast-bearing taxa, such as chestnut, hickory, and oak, may have been present in low relative abundances prior to Iroquoian settlement, but land-use practices such as burning may have shifted dominance toward these species. Alternatively, Iroquoian groups may have utilized fire on the Allegheny Plateau to maintain oak-hickory-chestnut forests that established during warm-dry climate periods (Fuller et al. 1998), when forests may have otherwise become dominated by mesic species during cool-moist climate periods.

The above interpretations of the dynamics between past tree species distributions (ca. 1799–1814 CE), climate, and Iroquoian impacts are tempered by the limitations associated with using modern climate data for independent variables. The possibility remains that past (rather than modern) climate conditions more adequately explain tree species distributions in areas near Iroquoian villages. In particular, an increase in drought frequency associated with a drier climate during Native American occupation may have increased the abundance of xerophytic species (e.g., oak species) in presettlement forests (Pederson et al. 2014); mesophication of forests since presettlement may thus be partially attributable to decreases in 20th century drought frequency, in addition to the cessation of Native American land-use practices (McEwan et al. 2011). Whereas the spatial patterns of temperature and precipitation gradients are likely comparable between modern and presettlement times, the total extent of drought-prone areas may have differed. Yet, given the high relative importance of temperature variables and NAVs upon tree species distributions (Table 5b), as well as the high predictive performance of models (Tables 2 and 3), more accurate representations of soil moisture stress may have only a modest impact upon SDMs of mast-bearing tree species in the study area.

Representing Native American land use

In addition to addressing issues surrounding correlative SDMs (Discussion: Iroquoian settlement and the past distribution of species: General model performance), adequately characterizing Native American land use is crucial for understanding Native American impacts on forested landscapes. Aside from the limitations that are posed by spatially incomplete archaeological or historical (Jones 2010) records, representing areas where Native American land use occurred several centuries ago inherently presents difficulties.

This study based the creation of NAVs upon the reasoning that land use occurred in areas that were accessible from village sites or trails. While this study's approach may have adequately represented areas where horticulture, village construction, and hunting occurred, additional hunting practices may have occurred far from village sites. Native American groups may have travelled long distances (e.g., 50–100 km) from villages to hunt, including the Seneca (Morgan 1901), because long-distance hunting for larger game could have produced a larger net caloric gain, in comparison to hunting near village sites for smaller game (Grimstead 2012); multiple Native American groups may have also conducted communal hunts far from villages (Waselkov 1978). Burning for hunting purposes may have also taken place far from village sites, due to the exhaustion of prey populations within the village catchment (Engelbrecht 2003, Jones 2010). Despite this knowledge of Native American land use, the above issues are mitigated in this study's approach, in at least three ways. First, hunting may have occurred near established trails (Wallace 1965), which was captured in this study by considering accessibility to trails. Second, this study's approach to representing areas of Iroquoian land use indirectly captured long-distance hunting practices, because some hunting occurred in abandoned or future village sites (Engelbrecht 2003). Third, archaeological evidence from other areas of 18th century Seneca settlement indicates that the Seneca were hunting near villages (Jordan 2008).

Due to somewhat limited knowledge of the Iroquoian occupancy of village sites within the area, this study did not differentiate areas of repeated use: either areas where land use may have occurred repeatedly throughout time, or areas where two or more villages may have conducted land-use practices contemporaneously. Yet, accounting for areas that were used more frequently or intensively (e.g., by repeated burning over time) may have presented a more realistic representation of Native American land use. Representing repeated use of the landscape may be more feasible in areas that contain sites with finer-resolution temporal data, such as the Seneca homeland of the 16th–18th centuries CE, where the dates of occupation and village movement sequences have been determined with greater precision (Wray and Schoff 1953, Jordan 2008). Additional modeling approaches, such as agent-based modeling (e.g., Iwamura et al. 2014), may also be useful for simulating the varying intensity of Native American land use across forested landscapes, and its impact upon tree species composition.


This study produced three direct findings regarding Native American impacts upon forest composition, specifically in relation to Iroquoian groups of the late prehistoric and early Historic eras. First, understanding the geography of Iroquoian settlement improved reconstructions of past forest composition using SDMs. Second, Iroquoian settlement was associated with higher probabilities of fire-tolerant and mast-bearing taxa recorded within PLSRs, as well as lower probabilities of fire-sensitive and non-mast-bearing taxa. Third, the land-use practices of Iroquoian groups shifted forest composition toward mast-bearing taxa in environments that were less optimal for such species.

This study also supports and advances previous research regarding Native American impacts upon past forest composition in many ways. The results align with the general framework of Native American impacts in the northeastern United States and southern Ontario: clearing and other practices near villages impacted a small portion of forested landscapes, but alterations to forest composition occurred over a larger total area (Whitney 1996). This study adds that where settlement occurred, Iroquoian land-use practices altered upland forest composition, but that their impacts did not extend into forested wetlands. The results also uniquely contribute that Iroquoian groups utilized resources from both naturally occurring and human-modified forests, and that forest compositional modifications may have been more important to subsistence strategies where the underlying environmental conditions were less optimal for preferred mast-bearing tree species.

In addition, this study contributes to the understanding of the spatial extent of Native American impacts. This study suggests that Iroquoian impacts were observable at local spatial extents, generally occurring within a 10–15 km radius around settlements. The overall impact of Native American settlement upon forest composition may have been dispersed at continental scales, because the spatial extent of Native American influence may have been limited by human mobility. However, future estimates of Native American impacts may vary with numerous factors, which include Native American subsistence strategies and population densities (Patterson and Sassaman 1988). Large areas uninhabited by Native American groups (e.g., see Milner and Chaplin 2010) may have experienced little or no compositional modifications over long timescales.

This study contributes a unique perspective on Native American impacts upon past forests, chiefly by using PLSRs, SDMs, and records of Native American settlement to simulate past forest composition in the absence of Native American impacts. Future synthetic analyses, additional modeling approaches, and improved representations of Native American land use will help reveal not only more nuanced relationships between Native Americans and forest composition, but also a deeper understanding of the past distribution of tree species within the forests of Eastern North America.


We thank Yi-Chen Wang for lending us her HLC township data to evaluate SDMs in this study. We also thank S. E. Munoz and one anonymous reviewer for their comments on an earlier version of the manuscript. S. J. Tulowiecki wishes to thank Douglas J. Perrelli for providing access to archaeological site files at the University at Buffalo's Marian E. White Anthropology Research Museum and Archaeological Survey. S. J. Tulowiecki received funding support from the University at Buffalo Presidential Fellowship during the development of this article.

Supplemental Material

Ecological Archives

Appendices A–D and the Supplement are available online: