Evaluating correlative and mechanistic niche models for assessing the risk of pest establishment

Insect pests pose a great threat to global food security. Improved methods for assessment of the risk of pest establishment are needed to enhance informed decision-making, to develop cost-effective pest management strategies, and to design quarantine policies for preventing the spread of pests. We evaluated the capabilities of a correlative and a process-based mechanistic niche model, and their combination, to assess the risk of pest establishment. The correlative model MaxEnt and the process-based mechanistic model CLIMEX were used to assess the risk of establishment of western cherry fruit fly, Rhagoletis indifferens Curran (Diptera: Tephritidae) in California. We integrated R. indifferens occurrence records and spatial environmental variables using MaxEnt to assess the potential risk of establishment of this pest. The CLIMEX model was developed using eco-physiological tolerances of R. indifferens. The predictive performance of the MaxEnt model improved by including the host species' distribution and Ecoclimatic Index generated using the CLIMEX model. The best model predicted no risk for R. indifferens establishment in the Central Valley around the areas where sweet cherries are produced in California. Most of the high to very high risk areas for R. indifferens were predicted in northern parts of California and the Sierra Nevada Mountains, where the fly exists on its native host, bitter cherry [Prunus emarginata (Douglas) Eaton]. Precipitation of driest quarter, degree days with average temperatures ≥8.3°C, degree days with average temperatures ≤5°C, and mean diurnal range in temperature were the strongest predictors of R. indifferens distribution in western North America. We showed that the predictive power of correlative niche models can be improved by including outputs from the process-based mechanistic niche models. Overall results suggest that R. indifferens is unlikely to establish in the commercial cherry-growing areas in the Central Valley of California, largely because heat stress is too high and chilling requirement in those areas is not met.


INTRODUCTION
Invasive alien insect pests cause enormous damage to the environment and to human and wildlife health, and pose a great threat to the economy and global food security. Resource managers need improved methods for assessing the risk of pest establishment to enhance decision-making, develop cost-effective pest management strategies, and design quarantine policies for preventing the spread of invasive pests. One method for assessing the risk of pest establishment is the use of niche models that correlate known occurrences of species with environmental variables and predict species' potential distributions. The niche models are based on the classical concept of "niche" in ecology, and model the potential or realized distribution of a species depending on the modeling algorithm used (Jimenez-Valverde et al. 2008, Franklin 2009). Niche models can be broadly classified into two groups: correlative models and process-based or mechanistic models (Dormann et al. 2012). The correlative niche models associate species occurrence data with spatial environmental layers of the study area and produce maps of probability of presence or relative environmental suitability for a species. The process-based or mechanistic niche models use species' functional traits and physiological tolerances for model fitting (Kearney et al. 2010). The correlative models generally do not perform well when extrapolated to novel environments (Webber et al. 2011, Owens et al. 2013). However, the extrapolations can be improved by fitting these models using biologically informed, hypothesis-driven variables, and by minimizing model complexity (Kumar et al. 2014). Correlative models can be fitted using existing species occurrence data from museums/herbaria (Kearney et al. 2010, Elith and Franklin 2013), whereas most process-based mechanistic models need detailed experimental data that may not be available for the target species (Dormann et al. 2012).
Both types of models have been used in quantifying and mapping the potential risk of establishment of insect pests in areas outside their current distributional range (e.g., Lozier and Mills 2011, Ni et al. 2012, de Villiers et al. 2013). This approach has rarely been applied to temperate fruit flies such as R. indifferens. In this study we evaluated the capabilities of a correlative model MaxEnt and a process-based mechanistic model CLIMEX, and their combination, to assess the risk of pest establishment. The western cherry fruit fly, Rhagoletis indifferens Curran (Diptera: Tephritidae), is a major quarantine pest of sweet cherry, Prunus avium (L.) L., in the western U.S. and could potentially invade areas of the U.S. that currently do not have the fly. The fly is native to Washington, Oregon, Montana, Idaho, British Columbia (Canada), and northern California (Bush 1966). The fly moved into drier areas of Washington and Oregon after cherries were introduced into these areas in the 1850s and was first detected attacking cultivated cherries (sweet cherry and sour cherry, P. cerasus L.) in Oregon in the early 1900s (Wilson and Lovett 1913).
While the fly is found in northern California (Mackie 1940, Frick et al. 1954, Blanc and Kiefer 1955) on its native host, bitter cherry, Prunus emarginata (Douglas) Eaton (Curran 1932), it has not been detected farther south in the commercial cherry growing regions of California. However, there are concerns that it may invade those areas in the future. The major production areas for commercial sweet cherries in California are located in the Central Valley from Lodi to Bakersfield (http://calcherry.com/about/history.cfm?CFID=18447373&CFTOKEN=91065340). The fly in California is apparently confined to native bitter cherry in the northern and southern parts of the state at high altitudes (Mackie 1940, Frick et al. 1954, Blanc and Kiefer 1955). If true, one hypothesis is that the lower-altitude areas in California where commercial cherries are grown are too warm for the flies, which require long periods of chilling to break diapause (Frick et al. 1954).
Currently, for cherry growers in the Pacific Northwest of the U.S. to ship sweet cherries to California, they must maintain an active control program for R. indifferens that includes monitoring through trapping, adhering to a strict pesticide application schedule, shipping through California-designated 'Approved Shippers', performing both 'porch sampling' (sampling of 1-5 lbs. of cherries from each load delivered to the warehouse, depending on the size of the load) and sampling of packed fruit from all lots using the approved 'brown sugar' or 'hot water' screening procedure (Brown 2009), and undergoing rigorous border inspection (CDFA, http://pi.cdfa.ca.gov/pqm/manual/htm/305.htm). The penalty for any grower whose lot is found to contain one or more larvae is elimination of the grower from shipping cherries to California for the remainder of the shipping season. Also, packinghouse approval for shipping cherries to California can be revoked if the larval detection threshold is exceeded (i.e., two or more larval detections from a production zone result in elimination of the grower from the program for the whole zone) (CDFA 2012). However, if the flies cannot survive or establish in the cherry-growing areas of California, then the stringent protocol for the shipment of cherries to California from areas where R. indifferens is known to occur can be relaxed. Since R. indifferens has never been reported in commercial sweet cherries in California, information documenting the unlikelihood of this pest being present or establishing in the commercial sweet cherry-growing regions may be beneficial in negotiating trade agreements with countries restricting the importation of this commodity due to concerns over this pest.
The objectives in this study were to: (1) evaluate the capabilities of the correlative model MaxEnt and the process-based mechanistic niche model CLIMEX to assess the risk of R. indifferens establishment in the commercial cherry-growing areas of California; (2) identify environmental factors associated with R. indifferens distribution; (3) test whether including a host species' distribution improves the niche models' accuracy; and (4) test whether models of pest risk establishment can be improved by combining a correlative and a process-based model. To capture the entire environmental niche of R. indifferens, we developed models for western North America (its current distributional range) and extracted the maps for California.

Species occurrence data
Presence records for R. indifferens were collected from published papers, reports, and books (Appendix: Fig. A1). Five occurrence records were obtained from the Global Biodiversity Information Facility (GBIF) website (http://data.gbif.org). Three presence records for Utah were collected from the Utah State University's Cooperative Extension website (http://utahpests.usu.edu/ipm/files/uploads/PPTDocs/04sh-insectswcffcontrol.pdf). Google Maps (http://maps.google.com/) was used to record geographic coordinates of locations (using the "What's here?" feature) where exact latitude and longitude data were not provided. We also digitized R. indifferens occurrence data published in Foote et al. (1993: Map 68) using ArcGIS (ESRI 2006). A total of 154 records were obtained covering the entire range of the distribution of R. indifferens, which included eight western U.S. states and southern British Columbia, Canada (Fig. 1A). Four duplicate occurrence records were removed and the remaining 150 spatially unique (one record per 5 × 5 km cell) records were used in spatial modeling and other analyses. A total of 862 occurrence records for the major native host, bitter cherry, were acquired from GBIF, Intermountain Region Herbarium Network (http://intermountainbiota.org/portal/index.php), Calflora (http://www.calflora.org/) and E-Flora of British Columbia (http://www.geog.ubc.ca/biodiversity/eflora/) (see Appendix: Fig. A1). The number of bitter cherry occurrences was reduced from 862 to 635 after we used a 'spatial filtering' strategy to ensure that data points were at least 12.6 km apart and to achieve spatial independence (Veloz 2009). Moran's I correlograms were generated to examine spatial autocorrelation in model residuals (1 − predicted probability of presence; De Marco et al. 2008).
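The spatial filtering step can be illustrated with a minimal sketch. It assumes a simple greedy thinning rule and great-circle distances; the function names, the greedy rule, and the coordinates are illustrative assumptions and are not taken from Veloz (2009) or from the tools used in this study.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def thin_points(points, min_km=12.6):
    """Greedy thinning (assumed rule): keep a point only if it lies at least
    min_km away from every point that has already been kept."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_km for k in kept):
            kept.append(p)
    return kept

# Hypothetical (lat, lon) records, for illustration only.
records = [(40.00, -121.00), (40.01, -121.01), (40.50, -121.00)]
thinned = thin_points(records)
```

A greedy rule depends on the order of the input records, so a different ordering may retain a different (equally valid) subset.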

Environmental data
Twenty-nine environmental variables were considered as potential predictors of R. indifferens distribution (Appendix: Table A1). These variables were chosen based on the fly's biology and ecological requirements, and similar niche modeling studies on other fruit flies and insects (e.g., Li et al. 2009, De Meyer et al. 2010, Sambaraju et al. 2012). These variables included climatic, topographic, and species-specific phenology variables as well as human factors. Nineteen Bioclim variables were obtained from the WorldClim dataset (http://www.worldclim.org; Hijmans et al. 2005). Apart from these 19 variables, we also calculated three phenology variables for R. indifferens based on its lower development threshold and chilling requirement: (1) number of degree days with average temperatures at or above 8.3°C (van Kirk and AliNiazee 1981), (2) number of degree days with average temperatures at or below 3°C, and (3) at or below 5°C (chilling requirement; Frick et al. 1954, van Kirk and AliNiazee 1981), using monthly temperature data layers in "Raster Calculator" in ArcGIS (ESRI 2006). We also included direct solar radiation, elevation, and potential evapotranspiration (Appendix: Table A1). The probability of presence of the native host of R. indifferens, bitter cherry, was modeled using MaxEnt and included in the R. indifferens model. For bitter cherry modeling we also considered slope, aspect (converted into northness and eastness gradients), and growing degree days with average temperature >0°C (Appendix: Table A1). Bioclimatic variables were obtained at ~5-km spatial resolution to account for potential spatial inaccuracies during digitization of presence records from the published maps (e.g., Foote et al. 1993). Other layers obtained at 1-km resolution were resampled to a ~5-km resolution to match the 19 bioclimatic variables.
www.esajournals.org
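The exact raster expression used for the phenology variables is not given in the text. As a hedged sketch, one common way to approximate degree days from monthly mean temperatures is to multiply the monthly excess over (or deficit below) a threshold by the number of days in the month. The formula, the function names, and the non-leap-year month lengths below are assumptions for illustration only.

```python
# Days per month for a non-leap year (assumption for this sketch).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def degree_days_above(monthly_mean_c, base_c=8.3):
    """Approximate annual degree days accumulated above base_c."""
    return sum(max(0.0, t - base_c) * d
               for t, d in zip(monthly_mean_c, DAYS_IN_MONTH))

def degree_days_below(monthly_mean_c, ceiling_c=5.0):
    """Approximate annual chilling degree days accumulated below ceiling_c."""
    return sum(max(0.0, ceiling_c - t) * d
               for t, d in zip(monthly_mean_c, DAYS_IN_MONTH))

# Hypothetical grid cell with a constant monthly mean of 10.3 degrees C.
warm_cell = [10.3] * 12
dd_growth = degree_days_above(warm_cell)   # about 2 degrees x 365 days
dd_chill = degree_days_below(warm_cell)    # no month is below 5 degrees C
```

Applied cell by cell to twelve monthly rasters, the same arithmetic yields one degree-day layer per threshold (8.3°C, 3°C, and 5°C).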

Modeling methods
Correlative niche modeling: MaxEnt.-A number of correlative niche modeling algorithms are available for modeling the potential distribution of a species (Franklin 2009). We used the most commonly employed correlative niche model, MaxEnt (version 3.3.3k; Phillips et al. 2006). MaxEnt is a presence-only method, and recent studies on distribution modeling of insect pests and other species in different parts of the world have demonstrated its effectiveness (e.g., Kumar et al. 2009, Li et al. 2009, De Meyer et al. 2010). MaxEnt generates an estimate of the probability of presence (or relative environmental suitability) of a species that varies from 0 (lowest) to 1 (highest). We initially ran models with default settings in MaxEnt, which resulted in highly complex models and nonsensical species' response curves (Appendix: Fig. A3). We also ran MaxEnt with different values of the regularization parameter (1.5 and 2.0) and left other settings at default, but that also resulted in quite complex models (Appendix: Table A2). Therefore, we changed MaxEnt's default settings and used only linear, quadratic, and product features to keep the models simple and to avoid overfitting. The 'fade-by-clamping' option was used to prevent extrapolations outside the environmental range of the training data (Owens et al. 2013). The 'jackknife' feature in MaxEnt was employed to evaluate the relative influence of different environmental predictors on R. indifferens distribution. The MaxEnt-generated species 'Response Curves', which show the relationships between predicted probabilities of presence for a species and different environmental predictors, were also examined.
All environmental variables were examined for cross-correlations to address potential problems due to multicollinearity (Dormann et al. 2013). Only one variable from each set of the highly correlated predictors (Pearson correlation coefficient, r ≥ 0.75 or ≤ −0.75) was included in the model (Appendix: Table A3). The decision to include or exclude a highly correlated variable was made based on its biological relevance to R. indifferens, ease of interpretation, and its relative predictive power (based on the training gain in the preliminary MaxEnt model). For example, degree days with average temperature ≥8.3°C and annual potential evapotranspiration were highly correlated (r = 0.85, P < 0.0001), so we dropped the latter and included the former. The final MaxEnt model included only seven variables: degree days with average temperature ≥8.3°C, degree days with average temperature ≤5°C, mean diurnal range in temperature, mean temperature of wettest quarter, mean temperature of driest quarter, precipitation of driest quarter, and precipitation of coldest quarter (Appendix: Tables A1 and A2).
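The correlation screen described above can be sketched as follows. This is an illustrative simplification: the priority order stands in for the judgment calls on biological relevance and training gain, and the variable names and values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_uncorrelated(layers, priority, threshold=0.75):
    """Walk variables in priority order (assumed to encode biological relevance)
    and keep each one only if |r| < threshold against every variable kept so far."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(layers[name], layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five locations, for illustration only.
layers = {
    "dd_above_8_3": [100, 200, 300, 400, 500],
    "pet_annual": [110, 190, 310, 420, 480],
    "precip_driest_q": [30, 10, 50, 20, 40],
}
selected = select_uncorrelated(
    layers, ["dd_above_8_3", "pet_annual", "precip_driest_q"])
```

In this toy example the evapotranspiration layer is dropped because it is strongly correlated with the higher-priority degree-day layer, mirroring the example given in the text.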
Background selection and sampling bias.-The background extent in MaxEnt was defined based on the Biotic-Abiotic-Mobility (BAM) diagram, a framework suggested by Soberon and Peterson (2005), and included regions that have been accessible to R. indifferens since the 1900s (Barve et al. 2011, Saupe et al. 2012; Appendix: Fig. A1). To account for potential sampling bias, 10,000 random background data points (or pseudoabsences) were drawn using a kernel density estimator (KDE) surface (see Appendix: Fig. A1). Background data points drawn in this way had the same sampling bias as the occurrence data, so the two biases cancel out in the modeling process. The KDE surface was generated using all the occurrence data points in ArcGIS using the "kernel density" and "create spatially balanced points" tools in Arc Tools. Environmental variables' values were extracted for presence and background points and models were trained using the "samples with data" (SWD) format in MaxEnt. The trained models were projected (using the "projection" feature) to the entire range of fly occurrences (western North America; Fig. 1B-E). Predicted maps for California were clipped from the full extent predictions (Fig. 2).
Model selection.-We used Akaike's Information Criterion (AIC) and the information-theoretic approach (Burnham and Anderson 2002) to evaluate multiple models and select the "best" models for R. indifferens and bitter cherry. We calculated AICc (AIC corrected for small sample sizes) for different MaxEnt models using ENMTools (Warren et al. 2010). The models were ranked by calculating differences in AICc values as $\Delta\mathrm{AIC}_{c,i} = \mathrm{AIC}_{c,i} - \min(\mathrm{AIC}_{c})$. The best model has ΔAICc = 0 and only the models with ΔAICc ≤ 2 have substantial support (Burnham and Anderson 2002). The values of ΔAICc also helped us test whether inclusion of a particular variable improved the model. For example, we included host plant species data and Ecoclimatic Index generated by CLIMEX in the MaxEnt model and tested whether they improved the model. We evaluated seven different MaxEnt models: MaxEnt_Env (model with only environmental variables); MaxEnt_EnvDef (MaxEnt_Env with default settings); MaxEnt_EnvHost (MaxEnt_Env and host); MaxEnt_EnvHostClimex (MaxEnt_Env, host and CLIMEX outputs); MaxEnt_Climex (model with only CLIMEX outputs); MaxEnt_ClimexHost (MaxEnt_Climex and host); and MaxEnt_ClimexHostTopo (MaxEnt_Climex, host and topographic variables).
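The ranking arithmetic can be sketched with the standard small-sample formula, AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), where k is the number of parameters and n the number of occurrence records. The sketch below is not the ENMTools implementation; the log-likelihoods, parameter counts, and model names are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def delta_aicc(scores):
    """Difference between each model's AICc and the minimum AICc."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}

# Hypothetical (log-likelihood, number of parameters) pairs, illustration only.
candidates = {"model_a": (-1350.0, 9), "model_b": (-1360.0, 7)}
n_records = 120

scores = {name: aicc(ll, k, n_records) for name, (ll, k) in candidates.items()}
deltas = delta_aicc(scores)
# Models within 2 AICc units of the best are treated as having substantial support.
supported = [name for name, d in deltas.items() if d <= 2]
```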
Model evaluation and validation.-We used 80% of occurrence data for training (n = 120 for R. indifferens, n = 508 for bitter cherry) the MaxEnt models and withheld the remaining 20% (n = 30 for R. indifferens, n = 127 for bitter cherry) for independent validation of model performance.
Rhagoletis indifferens models were also tested using independently collected fly presence data (n = 51) for California from Dowell and Penrose (2012). We chose the commonly used metric AUC, or area under the receiver operating characteristic (ROC) curve (Fielding and Bell 1997, Phillips et al. 2006), as one of the measures for evaluating model performance. The AUC is a threshold-independent measure of a model's ability to discriminate presence from absence (or background). It varies from 0 to 1; an AUC value of 0.5 shows that model predictions are not better than random; values <0.5 are worse than random; 0.5-0.7 indicates poor performance; 0.7-0.9, reasonable/moderate performance; and >0.9, high performance. The 10-fold cross-validation procedure in MaxEnt was used on the 80% training data, and test AUC values averaged across the 10 replicates were reported. We also used the Pearson correlation coefficient between observed presence-random background points and predicted probabilities of presence to evaluate MaxEnt models. Validation AUC and sensitivity (fraction of correctly predicted presences) values were calculated using the above independent datasets. Test sensitivity was calculated at a 0% training omission rate (or Lowest Predicted Threshold; LPT), 2% training omission rate, and 5% training omission rate. A 0% omission rate means that 100% of the training presence locations fall inside the suitable areas, and a 5% training omission rate means that 5% of the training localities fall outside the suitable areas. More details about different threshold selection for presence-only models are discussed by Liu et al. (2013).
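The evaluation metrics can be illustrated with a minimal sketch. AUC is computed here in its rank-based form (the probability that a randomly chosen presence scores higher than a randomly chosen background point, with ties counted as one half), and thresholds are derived from training predictions at a chosen omission rate. The function names and all scores are hypothetical, and this is not MaxEnt's own code.

```python
import math

def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def omission_threshold(training_scores, omission_rate=0.0):
    """Threshold that leaves about omission_rate of training presences below it.
    A rate of 0 gives the lowest predicted threshold (LPT)."""
    ranked = sorted(training_scores)
    n_omitted = math.floor(omission_rate * len(ranked) + 1e-9)
    return ranked[n_omitted]

def sensitivity(test_scores, threshold):
    """Fraction of test presences predicted at or above the threshold."""
    return sum(s >= threshold for s in test_scores) / len(test_scores)

# Hypothetical predicted suitabilities, for illustration only.
training = [0.10, 0.30, 0.45, 0.60, 0.80]
test = [0.05, 0.35, 0.70, 0.90]
background = [0.05, 0.10, 0.20, 0.50]

lpt = omission_threshold(training, 0.0)
test_sensitivity = sensitivity(test, lpt)
test_auc = auc(test, background)
```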
Process-based niche modeling: CLIMEX.-We also used a process-based niche model CLIMEX 3.0 (Sutherst et al. 2007) to develop a mechanistic simulation model to estimate the climatic suitability for the establishment of R. indifferens in western North America. CLIMEX does not use species occurrence data but estimates suitable areas based on climatic conditions alone; it assumes that there are no limiting factors other than climate (Sutherst et al. 2007). The "Compare Locations" function in CLIMEX was used to develop the simulation model for R. indifferens that calculated an annual index of climatic suitability by combining growth index, stress indices, and stress interaction indices (Sutherst et al. 2007, Kriticos and Leriche 2010).
The recently published CliMond CM10_1975H_V1 climatic dataset (available at http://www.climond.org), interpolated at 10 arc-minute (~18 km) resolution, was used for CLIMEX modeling. This dataset has long-term monthly climate means centered on 1975 for precipitation, maximum temperature, minimum temperature, and relative humidity at 0900 and 1500 hours. CLIMEX generates an index of climatic suitability for the species called the Ecoclimatic Index (EI) that varies from 0 to 100, where 0 represents locations that are unfavorable for long-term survival of the species, and values close to 100 indicate areas that have optimal conditions for the species' growth year round. Following Kriticos et al. (2003), EI values were classified into four arbitrary categories: unsuitable (EI = 0), marginal (EI = 1-5), moderately favorable (EI = 6-25), and highly favorable (EI > 25). Localities with EI > 0 were interpreted as correctly predicted presences for sensitivity calculations.
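The EI classes and the sensitivity rule for CLIMEX can be written down directly. The sketch below assumes integer EI values, as implied by the class boundaries; the function names and sample values are hypothetical.

```python
def classify_ei(ei):
    """Map an Ecoclimatic Index value (0-100) to the four classes used here."""
    if not 0 <= ei <= 100:
        raise ValueError("EI must lie between 0 and 100")
    if ei == 0:
        return "unsuitable"
    if ei <= 5:
        return "marginal"
    if ei <= 25:
        return "moderately favorable"
    return "highly favorable"

def climex_sensitivity(ei_at_presences):
    """Fraction of known presence localities with EI > 0."""
    return sum(ei > 0 for ei in ei_at_presences) / len(ei_at_presences)

# Hypothetical EI values extracted at four presence localities.
sample_ei = [0, 2, 10, 30]
classes = [classify_ei(ei) for ei in sample_ei]
sens = climex_sensitivity(sample_ei)
```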
Fitting CLIMEX parameters.-Values for CLIMEX model parameters were defined based on published laboratory studies and phenological observations on physiological tolerances of R. indifferens and were iteratively adjusted until the simulated distribution matched the known distribution of R. indifferens in western North America (Table 1).
1. Degree-days per generation (PDD).-The degree-day (DD) accumulation requirement for the completion of one generation of R. indifferens was set to 1800, using a 5°C lower optimum temperature threshold for growth, based on the phenological studies conducted by Jones et al. (1991) and Song et al. (2003). Jones et al. (1991) detected first average fly emergence (based on trap captures) at 573 ± 19.0 DD (mean ± SE) in Utah and 592 ± 42.1 DD in Washington. Jones et al. (1991) and Song et al. (2003) detected last adult spring emergence between 1700 and 1800 degree-days at various locations in Utah and Washington state.
2. Temperature index.-The lower temperature threshold for growth (DV0) was set at 3°C based on van Kirk and AliNiazee (1982), who found that diapause development rate of R. indifferens pupae after exposure to different cold temperatures was optimum at 3°C. Lower optimum temperature for growth (DV1) was set at 5°C based on studies conducted by Frick et al. (1954) and van Kirk and AliNiazee (1981). The upper optimum temperature for growth (DV2) and the upper temperature threshold for growth (DV3) were set at 25°C and 28°C, respectively, based on the postdiapause developmental rate function published in Stark and AliNiazee (1982).
3. Moisture index.-The soil moisture (SM) index in the CLIMEX model is used as a proxy for moisture availability. A hydrological model that uses rainfall and evapotranspiration is used to calculate the weekly soil moisture balance for determining population growth. A value of SM = 0 indicates no soil moisture; SM = 0.5 indicates soil moisture content is 50% of field capacity; SM = 1 indicates that the soil moisture content is 100% of capacity; and SM > 1.0 indicates the possibility of excessive amounts of rainfall and soil moisture (Sutherst et al. 2007). The initial soil moisture parameters for R. indifferens (SM0, SM1, SM2, and SM3; Table 1) were set based on Yee (2013) and the values were iteratively adjusted to fit the known distribution of the fly. Yee (2013) investigated the effects of different levels of soil saturation (0-76%) on R. indifferens emergence and mortality. The fly was tolerant of a wide range of soil moisture conditions, but the results showed emergence of a higher percentage of deformed/unhealthy flies at lower levels of soil saturation.
4. Cold stress.-The temperature threshold for cold stress (TTCS) was set to −6°C based on van Kirk and AliNiazee (1982), who found that diapause development was slow at 0 and −3°C. Cold stress affected R. indifferens distribution in southern parts of British Columbia, Canada. The cold stress accumulation rate (THCS) was set to −0.001 week⁻¹ to fit the fly's distribution in British Columbia.
5. Heat stress.-Heat stress affects the distribution of R. indifferens in the southwestern United States. The temperature threshold for heat stress was set at 28°C because the upper threshold is close to 30°C (Jones et al. 1991). The heat stress accumulation rate (THHS) was set to 0.009 week⁻¹ to match the fly's current known distribution in the southwestern U.S.
6. Dry stress.-The soil moisture threshold for dry stress (SMDS) and dry stress accumulation rate (HDS) constrained the fly's distribution in south-eastern parts of California and southern Arizona and New Mexico. These parameters were iteratively adjusted to match the fly's absence from these areas.
7. Wet stress.-High soil moisture appears to limit the distribution of R. indifferens in the Olympic Peninsula in western Washington, Vancouver Island, and the coastal areas of western Canada. Therefore, wet stress parameters were accordingly adjusted to ensure absence of the fly from these areas.
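To make the role of the four temperature parameters concrete, the sketch below evaluates a simplified trapezoidal temperature response using the DV0-DV3 values given above: zero at or below DV0, rising linearly to 1 at DV1, equal to 1 between DV1 and DV2, and falling linearly to zero at DV3. This is an illustrative reading of the thresholds only; CLIMEX's own calculation operates on weekly climate data and combines this index with moisture and stress terms, which are not reproduced here.

```python
def temperature_index(temp_c, dv0=3.0, dv1=5.0, dv2=25.0, dv3=28.0):
    """Simplified trapezoidal temperature response (assumed shape, 0 to 1)."""
    if temp_c <= dv0 or temp_c >= dv3:
        return 0.0                                # outside the limits for growth
    if temp_c < dv1:
        return (temp_c - dv0) / (dv1 - dv0)       # rising limb, DV0 to DV1
    if temp_c <= dv2:
        return 1.0                                # optimal range, DV1 to DV2
    return (dv3 - temp_c) / (dv3 - dv2)           # falling limb, DV2 to DV3

# Hypothetical weekly mean temperatures (degrees C), for illustration only.
weekly_temps = [2.0, 4.0, 15.0, 26.5, 30.0]
weekly_index = [temperature_index(t) for t in weekly_temps]
```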

Predictive performance of different models
All seven MaxEnt models evaluated for R. indifferens risk of establishment in western North America performed better than random with training AUC values greater than 0.50 (Table 2). Average AUC values based on 10-fold cross validation varied from 0.70 to 0.77. Predictions of all models were also significantly and positively correlated with R. indifferens presence-background data (Table 2). The best model had an AICc value of 2732.5 (lowest) and included nine predictors comprising seven environmental variables, host species, and Ecoclimatic Index from the CLIMEX model and other factors (MaxEnt_EnvHostClimex; Table 2). The MaxEnt_Env model with default settings and the model with CLIMEX variables alone performed worst and had higher AICc values (MaxEnt_EnvDef and MaxEnt_Climex; Table 2).
When tested using independent datasets from western North America and California, the model with only seven environmental variables (MaxEnt_Env) performed the best with highest validation AUC values for both western North America (AUC = 0.800) and California (AUC = 0.931) (Table 3). This model also had higher sensitivity values across all three omission rates (Table 3) for both test datasets. The CLIMEX model predicted 81% of the R. indifferens localities correctly for western North America and 84% for California (Table 3).

Predicted potential risk of R. indifferens establishment
Predictions from the three best MaxEnt models (MaxEnt_Env, MaxEnt_EnvHost, and MaxEnt_EnvHostClimex) and the CLIMEX model matched closely with the current distribution of R. indifferens in western North America (Fig. 1). For example, the MaxEnt and CLIMEX models correctly predicted R. indifferens occurrences in southern parts of British Columbia, Canada, central Washington, western Montana (near Flathead Lake), Colorado, and northern California (Fig. 1). Both CLIMEX and MaxEnt models also predicted potential distribution of R. indifferens in central Montana, Nevada, and Arizona. However, the fly has never been reported from these areas. The CLIMEX model missed R. indifferens localities in the Yakima

Table 1 notes. Sources: Frick et al. (1954), Stark and AliNiazee (1982), van Kirk and AliNiazee (1982), Jones et al. (1991), Song et al. (2003), and Yee (2013). Threshold expressed as a proportion of soil moisture holding capacity (0 = oven dry; 1 = field capacity [saturation]). Values >1.0 indicate the possibility of excessive amounts of rainfall and soil moisture.
All models predicted no risk for R. indifferens establishment in the Central Valley around the areas where sweet cherries are produced and in the Mojave Desert and other parts of southern California (Fig. 2). Most of the high to very high risk areas for R. indifferens were predicted in the northern parts of California (Klamath Mountains in the Pacific Coast Range and the Cascade Range) and Sierra Nevada Mountains (Figs. 1 and 2), where native populations of the fly exist in bitter cherry (Mackie 1940, Dowell and Penrose 2012). Some patches of medium to high risk were predicted in Mount San Jacinto and the San Bernardino Mountains (Figs. 1 and 2), which agrees with Frick et al. (1954). The CLIMEX model missed several R. indifferens occurrences in northern California (Fig. 1D, E). The CLIMEX model indicated that high Heat Stress in the Central Valley of California potentially makes it

Table 3 notes. Sensitivity is the proportion of correctly predicted presences and varies from 0 to 1.0; a value of 1.0 indicates 100% correctly predicted presences. LPT is lowest predicted threshold; 2% training omission means that 2% of training locations (i.e., R. indifferens presences) fell outside the predicted suitable area. For the CLIMEX only model, sensitivity was 0.87 for 20% test data (0.81 for all data) and 0.84 using the Dowell and Penrose (2012) data. Training omission rates do not apply to the CLIMEX model; locations with Ecoclimatic Index (EI) > 0 were interpreted as correctly predicted presences.

Factors influencing R. indifferens distribution
Precipitation of the driest quarter and degree days at average temperature ≥8.3°C were the strongest predictors of R. indifferens presence in the MaxEnt_Env model, with average percent contributions of 38 and 28 percent, respectively (Table 4). Other top predictors of the R. indifferens distribution in this model were degree days at average temperature ≤5°C and mean diurnal range in temperature (Table 4). The average percent contributions of environmental variables changed after considering native host bitter cherry and CLIMEX outputs in MaxEnt models (Table 4). For example, percent contribution of precipitation of the driest quarter and degree days at average temperature ≥8.3°C decreased in MaxEnt_EnvHost and MaxEnt_EnvHostClimex models (Table 4). Degree days at average temperature ≥8.3°C also had the highest 'training gain' (Fig. 3A) and AUC values (Fig. 3B) when used in isolation, which means it had the most useful information for predicting the distribution of R. indifferens. Mean temperature of the driest quarter had the lowest training gain and AUC values (Fig. 3) in the MaxEnt_Env model, but we kept it in the model because it lowered the AICc values and improved the model (Appendix: Table A2).
Rhagoletis indifferens response to precipitation of the driest quarter and degree days at average temperature ≥8.3°C was unimodal (Fig. 4A, B). The probability of R. indifferens presence was highest when precipitation of the driest quarter was 75 mm (Fig. 4A) and degree days at average temperature of ≥8.3°C was 1300 (Fig. 4B). The probability of R. indifferens presence was low when the degree days at average temperature of ≤5°C was close to zero and was highest around 600 (Fig. 4C). The probability of R. indifferens presence increased with the increased probability of bitter cherry presence (Fig. 4D).

DISCUSSION
In this study we evaluated the capabilities of the correlative niche model MaxEnt and the process-based mechanistic niche model CLIMEX to assess the risk of R. indifferens establishment in the commercial cherry-growing areas of California. We also tested (1) whether including a host species distribution improved the niche models' accuracy, and (2) whether models of pest risk establishment can be improved by combining a correlative and a process-based model. Our results showed that both correlative and process-based mechanistic niche models did an excellent job in estimating the current known occurrences of R. indifferens in western North America (Fig. 1). The MaxEnt model was more accurate (i.e., had very low omission errors; Table 3) than the CLIMEX model, which could be because of the differences in spatial resolutions (~5 km versus ~18 km), and the type and time period of climate datasets (WorldClim versus CliMond) used in these models. We also found that including the distribution of a native host plant species improved the models of pest risk establishment. Models were further improved when the CLIMEX-generated Ecoclimatic Index was used as a predictor in the correlative MaxEnt model. Finally, our study showed that R. indifferens is unlikely to establish in the commercial cherry-growing areas in the Central Valley of California, largely because heat stress is too high and the chilling requirement in those areas is not met. The major production areas for commercial sweet cherries in California are located in the Central Valley. The sweet cherry season in this region spans from mid-May to around 20 June. Sweet cherry harvest in Oregon and Washington generally begins in mid to late June, with the latest varieties being harvested into early August (Long et al. 2007). Rhagoletis indifferens is essentially univoltine (with a very small second generation) and has a near obligatory diapause (Frick et al. 1954, van Kirk and AliNiazee 1982).
The significance of this very small second generation (0.3-1.1%) and its potential impact on population composition have not been determined. However, since it takes approximately 3 to 5 weeks for this small segment of the population to emerge (Frick et al. 1954; L. Neven, unpublished data), there is a very low probability that host fruit would be available to support any potential offspring from this second generation (Frick et al. 1954) in California cherry-growing regions. Previous reports indicate that at least 20 weeks of chilling below 5°C is necessary to satisfy the chilling requirement for this species (Frick et al. 1954, van Kirk and AliNiazee 1981) (Table 1), which is not met in the Central Valley of California (Fig. 2F). Insufficient chilling can result in either asynchronous spring/summer emergence or no emergence at all from diapausing individuals. More research, similar to that of Johnson et al. (2011) on olive fruit fly (Bactrocera oleae (Rossi)), is needed to determine the effects of heat stress (or high temperatures) on R. indifferens.
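The chilling criterion described above can be illustrated with a minimal sketch. It assumes that "weeks of chilling" means the cumulative number of weeks whose mean temperature falls below 5°C, which is one possible reading and not necessarily how the CLIMEX diapause parameters in Table 1 are applied. The function names and the two temperature series are hypothetical and invented for illustration only.

```python
from typing import Sequence

CHILL_THRESHOLD_C = 5.0     # chilling threshold cited in the text
REQUIRED_CHILL_WEEKS = 20   # minimum chilling duration cited in the text


def count_chill_weeks(weekly_mean_temps_c: Sequence[float],
                      threshold_c: float = CHILL_THRESHOLD_C) -> int:
    """Count weeks whose mean temperature is below the chilling threshold.

    Assumption: weeks are counted cumulatively, not as one unbroken run.
    """
    return sum(1 for t in weekly_mean_temps_c if t < threshold_c)


def chilling_requirement_met(weekly_mean_temps_c: Sequence[float],
                             threshold_c: float = CHILL_THRESHOLD_C,
                             required_weeks: int = REQUIRED_CHILL_WEEKS) -> bool:
    """Return True if the number of chill weeks reaches the required minimum."""
    return count_chill_weeks(weekly_mean_temps_c, threshold_c) >= required_weeks


# Hypothetical 52-week series (invented values, not data from this study).
cold_site = [2.0] * 22 + [15.0] * 30   # long, cold winter
warm_site = [4.0] * 6 + [12.0] * 46    # short, mild winter

print("cold site chill weeks:", count_chill_weeks(cold_site),
      "requirement met:", chilling_requirement_met(cold_site))
print("warm site chill weeks:", count_chill_weeks(warm_site),
      "requirement met:", chilling_requirement_met(warm_site))
```

A site resembling the hypothetical warm series would fail the check, which corresponds to the asynchronous or failed emergence discussed above.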
Our study identified precipitation of the driest quarter, degree days at average temperature ≥8.3°C, degree days at average temperature ≤5°C, and mean diurnal range in temperature as the strongest predictors of R. indifferens presence in western North America. In the Pacific Northwest, R. indifferens is native to the coastal forest and ponderosa pine ecosystems, where its ancestral host bitter cherry is commonly found (Yee and Goughnour 2008, Yee et al. 2011). Establishment of the fly in the desert sagebrush ecosystem of central Washington and Oregon was possible only because of irrigation and planting of monocultures of sweet cherries in this habitat. Bitter cherry is not found in sagebrush habitat but along its margins (Yee 2008). In contrast to the situation in the Pacific Northwest, irrigation and planting of monocultures in California are apparently insufficient for establishment of R. indifferens because temperatures there are not sufficiently low.
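For readers unfamiliar with degree-day predictors, the sketch below shows one common way to accumulate them from daily mean temperatures (the simple averaging method). This section does not state how the degree-day variables were actually derived, so the formulas, function names, and input values here are assumptions for illustration rather than the procedure used in the study.

```python
from typing import Sequence

UPPER_BASE_C = 8.3  # threshold named in the text for warm degree days
LOWER_BASE_C = 5.0  # threshold named in the text for cold degree days


def degree_days_above(daily_mean_temps_c: Sequence[float],
                      base_c: float = UPPER_BASE_C) -> float:
    """Sum the excess of daily mean temperature over the base, for days at or above it."""
    return sum(t - base_c for t in daily_mean_temps_c if t >= base_c)


def degree_days_below(daily_mean_temps_c: Sequence[float],
                      base_c: float = LOWER_BASE_C) -> float:
    """Sum the deficit of daily mean temperature under the base, for days at or below it."""
    return sum(base_c - t for t in daily_mean_temps_c if t <= base_c)


# Hypothetical daily mean temperatures in °C (invented values).
example_days = [3.0, 5.0, 8.3, 10.3, 14.0]

print("degree days >= 8.3 C:", degree_days_above(example_days))
print("degree days <= 5 C:", degree_days_below(example_days))
```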

Caveats and uncertainties
Results from this study should be interpreted with caution, keeping in mind the uncertainties associated with different niche models. Different niche modeling algorithms have different assumptions and limitations and may produce slightly different predictions (Kumar et al. 2009, Taylor and Kumar 2012). For example, they may be affected by sampling bias (Syfert et al. 2013), sample size (Wisz et al. 2008), multicollinearity (Dormann et al. 2013), and spatial autocorrelation (Veloz 2009), and they do not include biotic interactions automatically. Performance of these models may also depend on species characteristics, spatial resolution and extent of the study area, and choice of predictor variables (Guisan et al. 2007a, b). The models may also be affected by the way background data points are selected (Phillips 2008, Phillips et al. 2009, VanDerWal et al. 2009). We addressed some of these issues by (1) using more than one niche model, (2) reducing the number of variables by assessing cross-correlations, (3) examining spatial autocorrelation and sampling bias before modeling, (4) including the fly's native host plant distribution, (5) including species-specific phenology variables, and (6) drawing background points using a kernel density estimator (KDE) surface.
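Step (2) above, reducing the number of variables by assessing cross-correlations, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the cutoff of |r| ≥ 0.8, the priority ordering of variables, the function names, and the example values are all assumptions made for the example.

```python
from math import sqrt
from typing import List, Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_predictors(predictors: Mapping[str, Sequence[float]],
                      ranked_names: Sequence[str],
                      max_abs_r: float = 0.8) -> List[str]:
    """Keep predictors in priority order, skipping any that is highly
    correlated (|r| >= max_abs_r) with a predictor already kept.

    Assumption: the 0.8 cutoff is illustrative, not taken from the study.
    """
    kept: List[str] = []
    for name in ranked_names:
        if all(abs(pearson(predictors[name], predictors[k])) < max_abs_r
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical values at five sites (invented for illustration).
predictors = {
    "precip_driest_quarter": [10.0, 20.0, 30.0, 40.0, 50.0],
    "annual_precip": [105.0, 198.0, 310.0, 395.0, 502.0],
    "mean_diurnal_range": [12.0, 9.0, 14.0, 8.0, 11.0],
}
ranked = ["precip_driest_quarter", "annual_precip", "mean_diurnal_range"]

kept = screen_predictors(predictors, ranked)
print(kept)
```

In this toy example the annual precipitation series is nearly collinear with the driest-quarter series, so only the higher-priority variable of that pair is retained.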
CLIMEX is a semi-quantitative model (Sutherst et al. 2007) that calculates a Growth Index (GI) based on user-defined parameters. The model parameters, if not known, are inferred by fitting the simulated distribution to the known geographical distribution of the species. A user starts with a template (e.g., temperate or tropical) depending on the species of interest and then estimates the parameters. The majority of the parameters for R. indifferens (Table 1) were based on published laboratory studies and phenological observations, but some were iteratively estimated and may therefore have uncertainties associated with them (Taylor and Kumar 2012). This uncertainty is not specific to the CLIMEX model, as correlative niche models are also subject to parameter uncertainty (Barry and Elith 2006). The CLIMEX model is considered a process-based mechanistic niche model (Kriticos and Leriche 2010); however, it can also be a correlative model if species-specific phenological thresholds are not used in defining model parameters, but instead parameters are estimated using the species' known occurrences (Elith 2014). In this study, CLIMEX was more process-based than MaxEnt, but it was not entirely process-based and had some correlative component.

Conclusions, recommendations and management implications
This study shows the usefulness of correlative and process-based mechanistic niche models in assessing the risk of pest establishment. The approach presented here can be used to assess the risk of pest establishment at regional or global levels; for example, R. indifferens models developed in this study were projected to eight tropical countries that are current or potential fresh sweet cherry markets (Kumar et al. 2014). We showed how the predictive power of correlative niche models can be improved by including outputs from the process-based mechanistic niche models. We also showed that correlative niche models such as MaxEnt, if used carefully, can provide excellent estimates of the risk of pest establishment. While estimating the risk of pest establishment using MaxEnt (or other correlative niche models), we suggest that (1) default settings in MaxEnt not be used (especially with small sample sizes) because they often result in overfitting and generate poorer estimates (Merow et al. 2013; Table 2; Appendix: Table A2); (2) sampling bias in occurrence data always be accounted for (Syfert et al. 2013); (3) background points be drawn from the areas that have been accessible to the species over a given period of time (Barve et al. 2011, Saupe et al. 2012; Appendix: Fig. A1); (4) species' response curves be critically examined (Appendix: Fig. A3); (5) wherever possible, species' interactions be considered by including host species; and (6) more than one niche model be tested.
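Recommendations (2) and (3) concern how background points are chosen. One way to draw background points from a kernel density estimator (KDE) surface is sketched below: candidate cells are sampled with probability proportional to a Gaussian kernel density computed from the occurrence records, so that background points share the spatial bias of the presences. This is an assumed interpretation for illustration; the bandwidth, grid, coordinates, and function names are hypothetical, and sampling is done with replacement for simplicity.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kde_density(cell: Point, occurrences: Sequence[Point],
                bandwidth: float) -> float:
    """Unnormalized Gaussian kernel density at `cell` from the occurrence points."""
    return sum(
        math.exp(-((cell[0] - ox) ** 2 + (cell[1] - oy) ** 2)
                 / (2.0 * bandwidth ** 2))
        for ox, oy in occurrences
    )


def sample_background(cells: Sequence[Point], occurrences: Sequence[Point],
                      n_points: int, bandwidth: float,
                      seed: int = 0) -> List[Point]:
    """Draw background cells with probability proportional to KDE density.

    Assumption: cells are drawn with replacement; a real workflow might
    sample without replacement from a raster restricted to accessible areas.
    """
    rng = random.Random(seed)
    weights = [kde_density(c, occurrences, bandwidth) for c in cells]
    return rng.choices(list(cells), weights=weights, k=n_points)


# Hypothetical occurrence records clustered near the origin (invented values).
occurrences = [(0.0, 0.0), (0.5, 0.5), (-0.5, 0.2)]

# Hypothetical candidate cells: an 11 x 11 grid of cell centers.
grid_cells = [(float(x), float(y)) for x in range(-5, 6) for y in range(-5, 6)]

background = sample_background(grid_cells, occurrences,
                               n_points=200, bandwidth=1.0)
print(len(background), "background points drawn")
```

Because the weights decay with distance from the occurrence records, most sampled cells fall near the cluster of presences and few fall in distant parts of the grid.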
Although the quarantine procedures for sweet cherry shipments from the Pacific Northwest to California were originally established 30 years ago to prevent the accidental introduction of this pest into commercial California cherry-producing areas, our model indicates that these precautions are unnecessary due to the unsuitability of this region for R. indifferens. Given the presence of R. indifferens in northern California, one would expect that if climatic conditions were favorable for R. indifferens to establish in commercial sweet cherries, as is the case for Oregon and Washington, then establishment would already have occurred. Our model indicates that establishment there is not very likely and that the rigid protocol concerning the shipment to California of cherries from areas where R. indifferens is known to occur might be reconsidered. Dowell and Penrose (2012) monitored for the presence of R. indifferens in commercial sweet cherry orchards in the Central Valley. They found no R. indifferens in the commercial sweet cherry growing region and attributed its absence to adherence to a strict 30-year quarantine procedure, along with the asynchrony of fruit availability between sweet cherry and bitter cherry, without considering the effects of climate on the potential distribution of this pest. Since R. indifferens was first reported in commercial cherries in the early 1900s, it is more than probable that infested cherries were shipped to the Central Valley of California from multiple sources prior to the establishment of the quarantine barrier, but the fly was unable to establish due to adverse climatic conditions.
The model generated in this study can also benefit sweet cherry growers exporting to areas that are climatically similar or warmer. It demonstrates that R. indifferens cannot survive, or is highly unlikely to survive, in those areas. Thus, where trade restrictions imposed on U.S.-produced sweet cherries exist due to concerns over R. indifferens, an argument can be made that restrictions should be eased. California exports of cherries to countries with concerns about R. indifferens also should not be affected, even if access to those countries is based on maintenance of a fly-free area through the currently strict exterior quarantines. This research provides supporting documentation that relaxation of the quarantine will not represent a measurable increase in risk.

Notes: Abbreviations are: bio, the 'BIOCLIM' variables from the WorldClim dataset (see Hijmans et al. [2005] for more details); temp, temperature; SD, standard deviation; max, maximum; min, minimum; WCFF, western cherry fruit fly (Rhagoletis indifferens). Data sources (appearing as superscripts in column 1) are: 1, Bioclim variables and elevation from the WorldClim dataset (http://www.worldclim.org/); 2, generated using ArcGIS; 3, potential evapotranspiration: Trabucco and Zomer (2009); 4, generated using CLIMEX model; 5, generated using MaxEnt model.
Notes: The correlations were significant at α = 0.05 unless otherwise stated. Significance levels were adjusted using Bonferroni correction for multiple comparisons. The abbreviation "temp" is temperature.
Fig. A1. Distribution of (A) Rhagoletis indifferens and (B) bitter cherry occurrence data in the study area. Background points for (C) R. indifferens and (D) bitter cherry were selected using a kernel density estimator (KDE) surface. Presence records for R. indifferens were collected from Banham (1971, 1973), Zwick et al. (1977), AliNiazee (1978), Kroening et al. (1989), Messina (1990), Jones et al. (1991), Yee (2005, 2006, 2008), Yee and Alston (2006), Yee and Goughnour (2008), Maxwell et al. (2009), Senger et al. (2009), Yee et al. (2010), and Dowell and Penrose (2012).
Open diamonds represent nonsignificance, and closed diamonds indicate significance (two-tailed test; P ≤ 0.05) for positive spatial autocorrelation adjusted using progressive Bonferroni correction for multiple comparisons.