The Catalan butterfly monitoring scheme has the capacity to detect effects of modifying agricultural practices

Impacts of agricultural management practices on the receiving environment are seldom suitably assessed because environmental monitoring is costly. In this regard, data generated by already existing environmental survey networks (ESNs) may have sufficient capacity to detect effects. Here, we study the capacity of the Catalan butterfly monitoring scheme (CBMS) to detect differences in butterfly abundance due to changes in agricultural practices. As a model, we compared butterfly abundance across two landscape types according to agricultural intensification. A 2 km diameter buffer area was centered on each CBMS transect. The control group comprised transects located in areas where intensive agriculture represented <20% of the area; a treated group was simulated by selecting transects located in areas where intensive agriculture occupied over 40% of the area. The Welch t-test (α = 0.05 and 80% power) was used to compare butterfly abundance per section across landscape types. The capacity of the t-test to detect changes in mean butterfly abundance, of 12 butterfly indicators relevant to farmland, was calculated annually and for 5-, 10-, and 15-yr periods. Detection capacity of the t-test depended mainly on butterfly data sample size and variability; difference in butterfly abundance was less important. The t-test would be capable of detecting acceptably small population changes across years and sites. For instance, considering a 15-yr period, it would be possible to detect a change in abundance below 10% of the multispecies indicators (all butterfly species, open habitat species, mobile species, and grassland indicators) and two single species (Lasiommata megera and Lycaena phlaeas). When comparisons were carried out within each year, the t-test would only be capable of detecting a change below 30% for all butterfly species, mobile species, and L. megera.
However, detection capacity rapidly improved with the addition of further years, and with 5 yr of monitoring, all indicators but Thymelicus acteon had a detection capacity below 30%. We therefore conclude that, from a statistical point of view, the CBMS data “as is” are sensitive enough for monitoring effects of changes in agricultural practices. It could be used, for instance, for the general surveillance of genetically modified


INTRODUCTION
Changes in agricultural management practices can affect the capacity of the receiving environment to deliver ecosystem services that are essential to maintain the productivity of agricultural land (e.g., Tscharntke et al. 2005). Therefore, in order to manage our resources appropriately (Ripple et al. 2017), it is essential to monitor indicators capable of providing reliable information on the state of the environment before and after changes in agricultural practices (Elzinga et al. 2001). However, environmental monitoring is often too costly to implement because a high number of replications in space and time are needed to detect changes in the indicators reliably (Field et al. 2007).
For this reason, there is considerable interest in using data collected by already existing environmental survey networks (ESNs) for environmental impact assessment. The use of ESNs has received much attention (Morecroft et al. 2009, Geijzendorffer and Roche 2013), including, recently, for monitoring the impact of GMOs on natural communities (Lang and Bühler 2012, EFSA 2014). However, their applicability for monitoring is still uncertain (e.g., Smets et al. 2014), particularly regarding the suitability of ESN data for statistical analysis of effect detection capacity.
In order to use ESNs' data for detecting impacts, measurable effects on the receiving environment must be detected (Field et al. 2007). After setting the degree of change (effect size) considered sufficient to trigger a management response, the most fundamental requirement is that the data should be capable of detecting the change if it actually occurs, that is, that it will yield adequate statistical power. For certain impacts of agricultural practices like GMOs on nontarget organisms, capacity to detect population changes of 25-50% has been considered acceptable in field trials (Duan et al. 2006, Lopez et al. 2005, Perry et al. 2003). Unfortunately, most ESNs lack sufficient statistical power to prevent false-negative conclusions when temporal (e.g., before and after implementation of novel crop management practices) and spatial (areas where the measure has been introduced compared to areas where it has not) average differences are compared (Hails et al. 2012).
Here, we carry out a case study using data from the Catalan butterfly monitoring scheme (henceforth CBMS), a large-scale network based on transect counts, to determine its capacity to detect changes in mean abundance of butterfly populations due to changes in agricultural practices. This network is particularly relevant because it uses a standardized monitoring protocol for collection of butterfly data (Van Swaay et al. 2008); butterflies are widely recognized ecological indicators, capable of reflecting environmental impacts of human activity in terrestrial ecosystems (Thomas 2005); and the CBMS is located in a region where high biodiversity and agricultural intensification converge.
The main aim of this work was to determine to what extent the CBMS sampling protocol can be used as is for detecting a potential change, from a statistical point of view. In particular, we calculated the capacity of Welch's t-test (Welch 1947) to detect possible changes in butterfly abundance related to agricultural management.

MATERIALS AND METHODS
Butterfly data were provided by the CBMS (www.cbms.org), a network monitoring butterfly populations in Catalonia (NE Iberian Peninsula) since 1994. Butterfly data were standardized, and a two means unequal variance t-test was used to compare abundance of four multispecies indicators and eight species, across two broad landscape types. The sensitivity of the t-test to detect changes in abundance between samples was calculated for two types of analysis, annual and multiannual (5, 10, and 15 yr of monitoring data).

Butterfly dataset
Butterfly data were provided by the CBMS (CBMS 2016), which currently has over 150 recording transects located throughout Catalonia, Andorra, and the Balearic Islands. The CBMS uses a standardized methodology for data collection (Pollard and Yates 1993), common to most European butterfly monitoring schemes (Schmucki et al. 2016). In short, trained observers count the number of butterflies observed within a 5 × 5 m virtual area along a line transect of approximately 1.5 km, which is divided into sections of variable length according to the surrounding habitat type. Counts take place weekly from March to September, only in good weather conditions (about 30 counts per year). Transect counts yield species-specific relative abundance indices that are assumed to reflect year-to-year population changes over the entire study area (Pollard and Yates 1993).
For this study, we selected a subset of the CBMS transects (Fig. 1), which were separated into two broad classes, agricultural (Ag) and non-agricultural (non-Ag), according to the land cover of the surrounding landscape (Appendix S1), to test for an expected effect of land cover on butterfly communities. We would expect changes in agricultural practices to have a reduced impact on butterfly populations in non-Ag landscapes. Transects located in urban (more than 20% of the area covered by buildings and associated infrastructure) and montane (above 800 m) areas were excluded. Transects were also required to have been operative for at least 10 yr (data were analyzed from 1999 to 2013). This resulted in 11 Ag transects located in agricultural landscapes (i.e., where the area of arable crops and orchards was above 40% of a total area determined by a 2-km circle centered on the transect) and 18 non-Ag transects that were located in areas where arable crops and orchards accounted for <20%; non-Ag landscapes were dominated by grassland, scrub, or forest and were often within protected areas. Land-use cover was mapped and calculated using ARCGIS version 9.3 (ESRI 2008) on the basis of georeferenced aerial photographs (ICGC 2013). For each butterfly indicator, only sections with nonzero values were used for the analysis (e.g., Lang 2004) as the focus of this study was on abundance data, rather than presence-absence of species (Elzinga et al. 2001).
Fig. 1. Location of the 29 Catalan butterfly monitoring scheme (CBMS) transects analyzed, grouped according to the intensity of agriculture in the surrounding landscape into agricultural (Ag) transects and non-agricultural (non-Ag) transects.
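The transect selection rules described above can be sketched as a small classification function. This is only an illustrative sketch in Python, not the procedure actually used (land cover was derived in ARCGIS); the `Transect` fields and the `classify` function are hypothetical names, and treating transects with 20-40% crop cover as unassigned is an assumption based on the two thresholds given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transect:
    # All field names are hypothetical; fractions refer to the 2-km buffer.
    name: str
    crop_fraction: float    # arable crops + orchards, as a share (0-1)
    built_fraction: float   # buildings + associated infrastructure (0-1)
    elevation_m: float
    years_operative: int


def classify(t: Transect) -> Optional[str]:
    """Return 'Ag', 'non-Ag', or None when the transect is not used."""
    # Urban (>20% built-up) and montane (>800 m) transects are excluded.
    if t.built_fraction > 0.20 or t.elevation_m > 800:
        return None
    # Transects must have been operative for at least 10 yr.
    if t.years_operative < 10:
        return None
    if t.crop_fraction > 0.40:
        return "Ag"
    if t.crop_fraction < 0.20:
        return "non-Ag"
    # Assumption: intermediate crop cover (20-40%) belongs to neither group.
    return None
```

For example, `classify(Transect("T1", 0.55, 0.05, 300, 15))` returns `"Ag"`, whereas the same transect at 900 m would be excluded.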
The dataset analyzed comprised 135 butterfly species, many of which were present in only a few transects or years. This resulted in frequent gaps in the data making it impossible to compare populations of specific species across diverse geographical areas. To address this problem, four multispecies indicators were generated by aggregating species according to ecological traits relevant for monitoring agricultural impacts (see Supporting Information in Appendix S2 for multispecies indicator composition). Multispecies indicators "all species" included all butterfly counts (135 species); "open habitat species" comprised 41 butterfly species associated with open habitats (Herrando et al. 2016) after excluding 12 species with a documented strong migratory behavior, whose abundance depends heavily on the conditions at their place of origin (Stefanescu et al. 2011b); "mobile species" included 23 species with a high dispersal ability (Stefanescu et al. 2011a); and finally, "grassland indicators" aggregated 16 species from the European Grassland Indicator, developed by the European Environment Agency (EEA 2013). However, although monitoring multispecies indicators can produce very good results from a statistical point of view, the interpretation of results is more straightforward when monitoring single species. For this reason, single species potentially suitable for monitoring impacts in agricultural land were selected from the 15-yr dataset. Candidate species were required to have a relative detection capacity of Welch's t-test below 30% (see following sections), migrants were excluded, and we selected eight single species that were widespread and common in farmland in the study area (Lee and Albajes 2013); butterfly nomenclature followed Van Swaay et al. (2010).

Data analysis
Before the analysis, butterfly abundance was standardized to density (individuals/km) by dividing the sections' butterfly abundance by section length (in m) and multiplying by 1000. Data were not transformed because for large datasets, the calculated means and their deviations approximate to the normal distribution (Hazewinkel 2002). After standardization, mean butterfly abundance per transect section (counts from March to September) was calculated for each year (1999-2013). Only nonzero transect counts were used for calculations because this study focuses on the detection of change by comparing abundance only across sections where the species is present. The drawback of this method is that local extinction of species could be overlooked; nevertheless, the detection of a population decline can be used as an early warning. Standardized mean annual abundance of each butterfly indicator per section was the basic unit used in both annual and multiannual analyses.
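As an illustration of the standardization step, the following Python sketch converts section counts to densities and averages the nonzero values for one section and year. The function names are hypothetical, and the sketch assumes that section length is given in meters and that the annual mean is taken over the nonzero weekly counts only.

```python
from statistics import mean


def density_per_km(count: float, section_length_m: float) -> float:
    """Convert a section count to individuals per km."""
    return count / section_length_m * 1000.0


def annual_section_mean(weekly_counts, section_length_m):
    """Mean density over the nonzero weekly counts of one section and year.

    Returns None when the indicator was never recorded in the section,
    because such sections are left out of the abundance comparison.
    """
    densities = [density_per_km(c, section_length_m)
                 for c in weekly_counts if c > 0]
    return mean(densities) if densities else None
```

With a 200-m section, a count of 3 butterflies corresponds to 15 individuals/km.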
Contrast statistic.-To simulate a situation in which differences would exist between Ag (potentially disturbed by modified agricultural practices) and non-Ag landscapes (less likely exposed to disturbances by modified agricultural practices), we tested the hypothesis that there were no differences between means of sections in the two landscape types. For each butterfly indicator, the mean abundance per section in each landscape type was compared within each year (annual analysis) and also aggregating means from the 5-, 10-, and the entire 15-yr period (multiannual analysis). We used a two-sided t-test without assuming equal variances (Welch 1947), a robust technique for large datasets (Fagerland 2012), which is commonly used for field testing effects of agricultural practices (e.g., Feber et al. 2007, Aviron et al. 2009, Lang and Bühler 2012). The contrast statistic (t) was calculated by

$$t = \frac{m_{\mathrm{Ag}} - m_{\mathrm{non\text{-}Ag}}}{\sqrt{\dfrac{s_{\mathrm{Ag}}^{2}}{n_{\mathrm{Ag}}} + \dfrac{s_{\mathrm{non\text{-}Ag}}^{2}}{n_{\mathrm{non\text{-}Ag}}}}} \qquad (1)$$

where m_Ag, s_Ag, and n_Ag, and m_non-Ag, s_non-Ag, and n_non-Ag are the mean, standard deviation, and sample size for samples in Ag and non-Ag landscapes, respectively. The denominator is the standard error (SE) of the statistic m_Ag - m_non-Ag. The significance of the test is expressed by the P-value (α = 0.05). Longitudinal series (across years) and horizontal sampling data (between sections within a transect) were checked to confirm the absence of temporal and spatial autocorrelation.
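The contrast statistic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study (which relied on R); the function name `welch_t` is hypothetical, and the Welch-Satterthwaite degrees of freedom are included because they are part of the standard Welch procedure, although the text does not state them explicitly.

```python
import math
from statistics import mean, variance


def welch_t(ag, non_ag):
    """Return (t, se, df) for two samples with unequal variances."""
    m1, m2 = mean(ag), mean(non_ag)
    v1, v2 = variance(ag), variance(non_ag)   # sample variances (n - 1)
    n1, n2 = len(ag), len(non_ag)
    # SE of the difference between means: denominator of the t statistic.
    se = math.sqrt(v1 / n1 + v2 / n2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, se, df
```

The inputs are the per-section mean densities of one indicator in each landscape type; a negative t indicates lower abundance in Ag than in non-Ag sections.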
Detection capacity of the t-test.-Following Albajes et al. (2013), the detection capacity of the t-test was computed by establishing ex ante the probability of false positives (i.e., the probability of the test producing a significant result when there are no differences between the means of the two populations, symbolized by α) and false negatives (i.e., the probability of the test not producing a significant result when population means are not equal, symbolized by β).
The detection capacity (D) of Welch's t-test, expressed in absolute terms, was computed according to Eq. 2, with variables defined as in Eq. 1; α was set at 0.05 and β at 0.2, values considered acceptable in field tests (Perry et al. 2003) but that can be modified if required (Di Stefano 2003, Field et al. 2007). According to this procedure, the detection capacity is the size of the population change (effect size) of a given species or group that could be detected given its abundance, variability, and sample size. This expression may also be used to calculate the relative detection capacity of the test in relation to the mean abundance in the control (non-Ag landscapes), D_N. Further details regarding this procedure can be found in Comas et al. (2013). All calculations and statistical analyses were done using the R software (R Core Team 2016); t-tests were carried out with the R stats package version 3.3.1, and detection capacity calculations were based on the package pwr version 1.2-1.
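To make the idea of detection capacity concrete, the sketch below uses the textbook expression for the minimum detectable difference of a two-sided test, D = (z_{1-α/2} + z_{1-β}) × SE, and divides by the control mean to obtain a relative value. This is an assumption for illustration only: it uses normal quantiles as a large-sample approximation and is not necessarily identical to the expression applied in the study, and the function names are hypothetical.

```python
from statistics import NormalDist


def detection_capacity(se, alpha=0.05, beta=0.2):
    """Smallest absolute difference detectable with power 1 - beta.

    Assumption: normal quantiles stand in for t quantiles, which is
    reasonable only for large samples.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) * se


def relative_detection_capacity(se, control_mean, alpha=0.05, beta=0.2):
    """Detection capacity as a percentage of the control (non-Ag) mean."""
    return 100.0 * detection_capacity(se, alpha, beta) / control_mean
```

Under these assumptions, with α = 0.05 and β = 0.2 the multiplier is about 2.8, so an SE of 1 individual/km and a control mean of 28 individuals/km would give a relative detection capacity of about 10%.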

RESULTS
The mean length of the 29 transects selected was 1692 ± 132 m (mean ± SD); each transect was divided into 5-16 sections (mean length 198 ± 63 m). The raw dataset consisted of 262,044 butterfly section counts, 102,210 from the 11 Ag transects, and 159,834 from the 18 non-Ag transects. This is the result of considering, for each of the 135 species recorded, the number of transect sections with nonzero values in each sampling date (~30 dates per year), and the number of years (15). In order to compare butterfly abundance across the two landscape types, the annual mean was calculated for each section.
In the 15-yr dataset, 52 species of the 135 had a relative detection capacity (D_N) below 30%; their frequency in section counts was above 34.6% (mean sample size of 504 sections), and SE values were generally below 1.01. From these candidates, eight indicator species widespread in farmland were selected for the annual and multiannual analyses.
Mean annual sample sizes of the multispecies indicators, all species (135 species), open habitat species (41 species), grassland indicators (16 species), or mobile species (23 species), were roughly similar in both landscape types (Table 1). When the eight single species were analyzed, sample sizes were considerably lower (see Table 1 for mean annual values, the detailed year-by-year analysis is shown in Appendix S3).
When section counts were compared across the 5-, 10-, or the entire 15-yr period (multiannual analysis), sample sizes of the butterfly indicators increased greatly (Table 2) in comparison with the annual analysis (Table 1).

Factors that affected the outcome of the t-test
Once the significance and the power of the test have been set, the t-test is influenced by the difference between mean abundance of the two samples and the standard error of the test. The SE of Welch's t-statistic is the square root of the sum of the ratios between the variance and the size of each sample (Eq. 1). Consequently, the larger the sample size, the smaller the SE of the test. As expected, as the average size of the two samples increased, the SE of the contrast statistic decreased exponentially (R² = 0.60). However, for similar sample sizes, the range of the test's SE values was quite wide, especially when considering the annual analysis (see Table 1; Appendix S3). In multispecies data, SE ranged between 1.01 and 1.83, whereas average sample sizes varied in a much narrower range, 100-103, and the difference between mean abundance from 1.70 to 4.36 butterflies per km. Likewise, regarding single species, the highest SE values did not strictly correspond to the lowest sample sizes (Table 1).
In the multiannual analysis (

Detection capacity of the t-test
We calculated the detection capacities of Welch's t-test (α = 0.05 and β = 0.2), expressed relative to the average abundance in non-Ag landscapes (D_N), of the four multispecies indicators and the eight single species considered. The t-test was first carried out comparing annual abundance in non-Ag and Ag landscapes year by year (Table 1 and Fig. 2A-C, detailed results in Appendix S3), and then means were compared across an increasing timespan of 5, 10, and 15 yr (Table 2 and Fig. 2D-F).
When comparisons were conducted within each year, differences between the two landscape types were detected only in some cases (Table 1). For the four multispecies indicators, mean annual D_N values were 24.1%, 30.2%, 26.0%, and 30.2% for all species, open habitat species, mobile species, and grassland indicators, respectively; D_N always remained below 31% (Fig. 2A), being lowest in most years for all species, followed by mobile species; the worst D_N values were those of open habitat species and grassland indicators. Regarding the eight single species, average D_N values (Fig. 2B, C) ranged from 26.5% (Lasiommata megera) to 71.6% (Thymelicus acteon); only four species showed D_N values under 50% (Fig. 2B) in most years, whereas the remaining four species showed very poor D_N values, consistently above 50% (Fig. 2C).
When the mean comparison was carried out across an increasing time period of 5, 10, and 15 yr (Table 2), the t-test could detect increasingly small differences in population abundances between the two landscape types (see Fig. 2D-F). The effect of increasing the sample size through adding further years was very pronounced in the first 5 yr and then progressively leveled off toward the end of the 15-yr period.
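The leveling-off pattern is what would be expected if the SE shrinks with the square root of the sample size. The Python sketch below illustrates this under simplifying assumptions that are not taken from the study: the same number of sections every year, constant variance and control mean, and independent observations. The function name and the 30% starting value are illustrative.

```python
import math


def pooled_dn(annual_dn_percent, years):
    """Relative detection capacity after pooling `years` of data.

    Assumption: SE scales as 1 / sqrt(n) and n grows linearly with years.
    """
    return annual_dn_percent / math.sqrt(years)


for k in (1, 5, 10, 15):
    print(k, round(pooled_dn(30.0, k), 1))
# 1 30.0
# 5 13.4
# 10 9.5
# 15 7.7
```

Under these assumptions most of the gain is obtained in the first 5 yr (from 30% to about 13%), while the following 10 yr add comparatively little.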
There were differences in abundance of all the multispecies indicators and single species between landscape types, with few exceptions (Table 2). With 5 yr of monitoring data, D_N values were halved for most butterfly indicators relative to the mean D_N values obtained in the annual analysis (Fig. 2D-F), indicating a better detection capacity. When means were compared using the entire 15-yr dataset, D_N dropped from a mean annual detection capacity of around 30% to below a 10% population change: 6.3% for all species, 8.1% for open habitat species, 6.8% for mobile species, and 7.8% for grassland indicators (Fig. 2D). Regarding single species, the capacity of the t-test to detect changes in abundance between the two landscape types also improved rapidly with the addition of further years; after 5 yr of monitoring, it would be possible to detect a 30% change in abundance of all species except for T. acteon; after 15 yr, D_N values of single species (see Fig. 2E, F) were, on average, 13.7%, but there were considerable differences between species, ranging from 6.9% (L. megera) to 20.1% (T. acteon).
Table 2. Notes: The analysis was repeated for an increasing timescale of 5 yr (1999-2003), 10 yr (1999-2008), and 15 yr (1999-2013). The mean sample size (n) was the mean number of sections in each landscape type, the difference in abundance (Dif.) is butterflies/km section of Ag compared to non-Ag sections, the SE of the t-test was calculated according to Eq. 1, and significant differences (Sig.) in Ag compared to the control (non-Ag) are shown by asterisks: *P < 0.05, **P < 0.01, ***P < 0.001.
Fig. 2. Detection capacity of the t-test (Eq. 2) to detect differences in butterfly abundance across two landscape use types (transects located in intensive agricultural areas compared to transects located in non-agricultural areas). When compared within any single year (A, B, C), the t-test would be capable of detecting population changes below 30% of the test population (non-agricultural areas) only when butterflies are aggregated in multispecies indicators (A); when abundance of single species was compared in the two transect types within single years, D_N was only below 30% for Lasiommata megera (B).

Factors influencing detection capacity
The magnitude of the test's SE greatly influenced both the eventual significance of the t-test and its ability to detect differences between population means (Eq. 2). The smaller the value of SE, the higher is the value of the contrast statistic, and therefore, the t-test is more likely to be significant. Consequently, when the value of SE is small, the test is capable of detecting very small differences between populations (low values of D), and this translates into a high detection capacity.
There was a fairly strong direct linear relationship (R² = 0.75) between the relative detection capacity (D_N) of the t-test and the SE of the contrast statistic. In addition, the strong dependence (R² = 0.80) of detection capacity (D_N) on sample size (n) can be represented by an inverse power relationship between the two variables. For instance, the species with the worst D_N values, Aricia cramera, Carcharodus alceae, and T. acteon (Figs. 2C, F), also had the highest SE values (see Tables 1, 2; Appendix S3). However, there were some exceptions; for instance, considering the annual analysis, Polyommatus icarus had a relatively high variability, but its high sample size (mean of 145 sections) and large difference between mean abundance of samples (2.76 individuals per km) resulted in an acceptable D_N; Pieris napi, conversely, had a lower SE (average of 1.95 butterflies per km), but its low sample size (mean of 68 sections) and smaller difference in mean abundance (3.00 individuals per km) resulted in a poor D_N (average of 58.3%).
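An inverse power relationship of the form D_N = a × n^b (with b < 0) can be fitted by ordinary least squares on the log-log scale. The Python sketch below shows one way to do this; the function name and the data in the usage note are hypothetical and are not the values or the fitted coefficients of this study.

```python
import math


def fit_power_law(n_values, dn_values):
    """Fit D_N = a * n**b by least squares on log-transformed data."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(d) for d in dn_values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Slope of the log-log regression line is the exponent b.
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(y_bar - b * x_bar)
    return a, b
```

For instance, made-up pairs such as n = 25, 100, 400, 1600 with D_N = 60, 30, 15, 7.5 lie exactly on D_N = 300 × n^(-0.5), and the function recovers a ≈ 300 and b ≈ -0.5. An exponent close to -0.5 would be consistent with the SE decreasing as the square root of the sample size.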

DISCUSSION
The results of this study indicate that data generated by the CBMS, used "as is," have a very high sensitivity to detect impacts of modified agricultural practices on butterfly populations, provided that a suitable indicator is chosen.

Detection capacity of the CBMS data
This study shows that a t-test, carried out on data from a well-established butterfly monitoring scheme, would be capable of detecting an acceptably small change in butterfly abundance between two transect types, here exemplified by transects in intensive agricultural landscapes (Ag) compared to areas with a lower agricultural activity (non-Ag). When the butterfly abundance in Ag transects was compared to the non-Ag transects, using data from the entire 15-yr period, the relative detection capacity (D_N) was below 30% for 52 of the 135 species in the dataset. The species that had a good detection capacity were generally those most frequent across the landscape (translated into a large sample size) and with a relatively low variability. Considering the 12 selected butterfly indicators, D_N was below 25% of the population abundance. This is a very good relative detection capacity, considering that population changes of 25-50% are considered acceptable in field trials, for instance, to assess risks of GM crops on nontarget organisms (Duan et al. 2006, Lopez et al. 2005, Perry et al. 2003). Detection capacity was very good because CBMS samples were very large (>300 sections) and the standard errors relatively moderate (<0.95 individuals per km).
(Fig. 2. Continued) For the most widespread species, D_N was below 50% (represented by the red arrow), but as sample size and abundance decreased and variability increased, the t-test would be unable to detect population changes below 50% of the test population (C). As data from an increasing number of years were used (5, 10, or 15 yr) compared to a single year, D_N improved (a smaller population change could be detected). For the multispecies groups, in most cases two years would be sufficient to detect a 30% population change (D). Regarding single species, the number of years necessary for the t-test to detect a 30% population difference in abundance varied; for the seven widespread species, 5 yr would already be enough to detect a 30% population change (E and F); however, in the case of the less abundant and less widespread Thymelicus acteon, at least 6 or 7 yr would be necessary for the t-test to have the capacity to detect a change below 30% (F). Note the different scales of the y-axis in the figures.
In contrast, when comparing mean butterfly abundance across the two transect types within the same year, instead of aggregating years, the values of D_N were generally much poorer (rarely below 30%) because samples were relatively small (<30 sections) and the standard errors were relatively large (>1.9 individuals per km). For instance, four of the eight single species selected for the study (A. cramera, C. alceae, P. napi, and T. acteon) may not be suitable for monitoring because the t-test would only be capable of detecting changes if the annual population decrease was over 50%. Conversely, it would be possible to detect a population change below 30% of the control population of the multispecies indicators all species and mobile species, and of the single species L. megera, even using data from a single year. Detection capacity increased rapidly as further years were added due to a greater sample size. After 5 yr, D_N was below 30% for all indicators tested except T. acteon. Similarly, in a recent study in which butterflies were sampled to determine the effort needed to detect a 30% reduction in abundance due to GM maize cultivation, it was found that recording 9-25 transects during 3 yr would have sufficient statistical power (Lang et al. 2019).
In this study, detection capacity depended heavily on sample size. This was also reported by Lang and Bühler (2012) in two Swiss butterfly monitoring schemes when mean annual abundance of multispecies indicators or single species was analyzed; these authors calculated the sample size necessary for detecting changes in abundance of butterfly populations when pooled or single species' data were used for calculations. The values describing the relationship between sample size and the detection capacity were not very different from those found in the present work, except for single species, in which the detection capacity was more variable than in our analysis. As Lang and Bühler (2012) had only two datasets for some sites, they were unable to test the detection capacity using longer time-series. Worse results were obtained by Aviron et al. (2009) when using a dataset from a monitoring project on ecological compensation areas and biodiversity in Switzerland; this was mainly due to the low number of years (and therefore sample size) used for records. Aviron et al. (2009) consequently concluded that case-specific monitoring would not be appropriate for detecting possible effects of cultivation of GM Bt crops on butterflies because in order to detect an effect around 30%, over 100 pairs of fields would need to be sampled.
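The link between sample size and detectable change discussed above can be inverted to estimate how many sampling units are needed for a given effect. The Python sketch below uses the common large-sample formula n = 2 × ((z_{1-α/2} + z_{1-β}) × s / δ)² per group. It is a generic illustration, not the method of Lang and Bühler (2012) or Aviron et al. (2009); it assumes equal group sizes, a common standard deviation, and normal quantiles, and the function name and the numbers in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def sections_per_group(sd, delta, alpha=0.05, beta=0.2):
    """Approximate sample size per group to detect a difference `delta`.

    Assumptions: two-sided test, equal group sizes, common standard
    deviation `sd`, and normal quantiles (large-sample approximation).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)
    return math.ceil(2 * (z_sum * sd / delta) ** 2)
```

For example, with a standard deviation equal to a control mean of 20 individuals/km, detecting a 30% change (δ = 6) would require about 175 sections per group under these assumptions, and halving the detectable change roughly quadruples that number.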
Regarding the butterfly indicators tested, the multispecies indicators, as expected (e.g., Brereton et al. 2011, Lang and Bühler 2012), performed much better than the single species, but the drawback of using multispecies indicators is that results are difficult to interpret and effects may go unnoticed, that is, decreases of single species can often be masked by increases of other species. This is exemplified here by the fact that there were barely any differences in abundance of mobile species when tested annually despite significant differences for most of the single species. The most interesting species for monitoring agricultural impacts are those that are frequent across the landscape and have the lowest standard deviations, that is, those with less clumped distributions. Among these was L. megera, a common and widespread grassland butterfly (EEA 2013) whose populations are declining across northwestern Europe, possibly due to climate change (Van Dyck et al. 2015). Another interesting indicator species in the study area was Pieris napi, which, despite a poor detection capacity, was the only species more abundant in farmland. This species does not usually feed on crop plants (García-Barros et al. 2013). In the arid Mediterranean climate, its presence in agricultural areas has been explained by the availability of humid environments that help to buffer the effects of extreme temperatures and droughts (Carnicer et al. 2019), which are increasingly aggravated by global climate change. Finally, T. acteon, with its status as a Red List Species (Van Swaay et al. 2010), had a rather poor relative detection capacity. This is often the case for endangered organisms: although the need to protect them from harm is very high, they are not frequent enough to allow for suitable statistical analysis.
A case study: the CBMS for general surveillance of GM crops
There are many practical cases in which it is necessary to obtain reliable information on the effects of changes in agricultural practice on the receiving environment. For instance, in the case of genetically modified (GM) crops in the EU, post-market environmental monitoring (PMEM) is compulsory (Directive 2001/18/EC), and EU regulations require the implementation of general surveillance (GS) plans for long-term monitoring (EFSA 2010). Much effort has been devoted in Europe to generating data for PMEM of GMOs, for butterflies (e.g., Lang et al. 2019) and other taxa, particularly in the UK (Clark et al. 2006) and Spain (De la Poza et al. 2005, Comas et al. 2014). In Spain, GM corn has been grown on thousands of hectares in the last 15 yr (ISAAA 2016), but GS has yet to be fully implemented. In order to reduce costs and increase monitoring practicability, regulatory authorities recommend that companies use ESNs (EFSA 2011) rather than implementing field studies. There are contrasting opinions on the utility of these networks for GS of GM crops; whereas Smets et al. (2014) consider that these networks would only provide information on the baseline variation of indicators, other studies such as Lang and Bühler (2012) and the present study indicate that the data obtained by butterfly monitoring schemes are sensitive enough to detect changes in populations of the indicator organisms.
Whereas field studies for environmental risk assessment (ERA) of GM crops (or plant protection products) are designed specifically to detect effects, ESNs aim to obtain more general information on population dynamics. This results in differences in the factors influencing the capacity of the t-test to detect population change. For instance, when data from over 12 yr of field trials for ERA of GM corn were analyzed, using several arthropod taxa to detect possible changes in nontarget arthropod abundance due to cultivation of Bt corn, taxon abundance was the most influential factor determining the relative detection capacity (D_N) of Welch's t-test. The D_N of a taxon improved as its abundance increased because its relative variability decreased. Since the number of replicates in the experimental trials was fairly constant (3-4 blocks), the SE of the contrast statistic was determined mainly by the variability of the sample, not by sample size as in this study.
This study shows that data from the CBMS could be used to monitor post-market effects of GM crops on butterflies, given its high capacity to detect possible effects. This is a relevant point in GS of GM crops because environmental impacts of GM crops in the field, if any, would appear to be of low magnitude (Naranjo 2009, Pleasants and Oberhauser 2012, Comas et al. 2014), and therefore, a high detection capacity of statistical tests is needed. Nevertheless, a drawback of the CBMS, compared to field trials, is that it mainly samples seminatural habitats, so there may be years when few transects are located in the vicinity of GM cornfields (at the beginning of Bt corn deployment) or of non-GM cornfields (once Bt corn has been successfully established in the area). This would result in a reduction of detection capacity. To address this issue, EFSA (2014) recommended a possible increase in the number of transects, which would also increase costs (Schmeller and Henle 2008). Nevertheless, this problem should be mitigated in future because butterfly monitoring schemes are attempting to increase the number and spatial uniformity of monitoring sites in agricultural landscapes, where butterfly populations may suffer more pressures from agricultural practices (Brereton et al. 2010). Additionally, the increase in the number of sites could also be partially compensated by a lower sampling frequency (Brereton et al. 2010). Notwithstanding the pitfalls of ESN data, it is often not possible to use field trials due to the costs involved, so data from these monitoring schemes represent the only practicable option for environmental monitoring.

Conclusion: Is the CBMS suitable for monitoring?
With probability levels fixed at α = 0.05 and β = 0.2, a Welch's t-test on CBMS data had the capacity to detect changes in abundance between 6% and 20% of the selected butterfly indicators when samples were compared across a 15-yr period. Detection capacity was found to depend mainly on sample size; thus, species that were more frequent across the landscape tended to have higher detection capacities. When mean butterfly abundances between transects were compared within the same year, the sensitivity of the t-test was much lower, but it would still be possible to detect population changes between 24% and 50% in eight out of the 12 butterfly indicators tested. Detection capacity rapidly improved with the addition of further years as this greatly increased sample size, and after 5 yr of monitoring, it would be possible to detect differences in abundance below a 30% threshold. In conclusion, this study shows that, despite some pitfalls, data from existing environmental survey networks do have the potential to be used for environmental monitoring of agri-environmental measures to inform stakeholders and policymakers. In the specific case of GM crops, the CBMS could be used to monitor post-market effects on butterflies, given its high capacity to detect possible effects. However, the specific testing approach would have to be adapted to the nature of the expected agricultural impacts to be monitored and to the particular characteristics and limitations of the monitoring scheme data.