Quantifying the contribution of citizen science to broad-scale ecological databases

Broad-scale ecological studies rely on databases containing information collected by a variety of sources, including government agencies, non-profit organizations, universities, and citizen-science programs. However, the contribution of citizen science to these databases is not well known. We analyzed one such database to quantify the contribution of citizen science to lake water-quality data from seven US states. Citizen-science programs not only provided over half of the data for commonly sampled water-quality measures (water clarity, nutrients, and algal biomass) over a 31-year period (1980-2010), but also contributed the majority of long-term monitoring data (>15 years) for selected measures in lakes. While previous studies have demonstrated the usefulness of citizen science for research, management, policy, and public engagement, our study demonstrates that citizen science can also make valuable contributions to populating broad-scale ecological databases. Strengthening partnerships between citizen-science programs and monitoring agencies can help maintain and expand spatial and temporal data coverage during the “big data” era of ecology.

Ecologists and decision makers alike recognize the value of studying ecological systems at scales ranging from regions and continents to the entire globe (Kelling et al. 2009; Heffernan et al. 2014; Read et al. 2017). These broad-scale studies require databases that include samples collected through time from a large number of ecosystems. However, such databases are not widely available, in part because researchers and government agencies often lack the resources to sample extensively across large geographic regions (Tulloch et al. 2013). Over the past several decades, researchers have expressed an interest in relying more heavily on citizen scientists to collect ecological data that can help fill existing gaps in data across space and time (Hampton et al. 2013; Pearce-Higgins et al. 2018). Typically, citizen-science programs consist of volunteers or non-professionals who undergo training in sampling methods and assist in collecting data about phenomena such as species presence, weather, water quality, and environmental pollution (Figure 1; Dickinson et al. 2012). Water quality has been well sampled by citizen volunteers in the US, where a recent survey identified at least 1675 active water-quality-related citizen-science programs (Stepenuck 2013). In addition, citizen-collected water-quality data are often of high quality and comparable to professionally collected data, particularly given the common practice of training volunteers (eg Loperfido et al. 2010; Elliott and Rosenberg 2019). In recognition of the potential value of citizen-science data for determining water-quality status and trends, many state and federal agencies have embraced citizen science as part of their water protection strategies (eg Latimore and Steen 2014; EPA 2016).
There are numerous examples of citizen-science data being used in ecological research as well as in natural resource management and policy (Crall et al. 2010; Cunha et al. 2017; Zhang et al. 2017). For instance, Lottig et al. (2014) used water clarity data collected by citizen-science volunteers to document temporal and spatial trends in lake water clarity in the upper Midwestern US. In addition, Hoyer et al. (2014) relied on data from Florida LAKEWATCH (http://lakewatch.ifas.ufl.edu), a Florida-based citizen-science water-quality program, to provide the US Environmental Protection Agency (EPA) and the Florida Department of Environmental Protection with a basis for developing numeric lake nutrient criteria, which were approved by the Florida state legislature and resulted in the removal of numerous water bodies from the state's impaired list (Conrad and Hilchey 2011). Finally, Latimore and Steen (2014) described how the Michigan Department of Environmental Quality used water-quality data collected by citizen scientists from the Michigan Clean Water Corps (https://micorps.net) to report on the quality of state waters, to screen sites for further agency assessment, and to develop fisheries management plans and harvest regulations.
With increased interest in compiling data across regions, states, and countries, it is crucial to gain a better understanding of the general availability of citizen-science data, and how these data contribute to broad-scale efforts that support research, management, and policy. However, some key knowledge gaps need to be filled. For example, how does the availability of data collected by citizen-science programs compare with the availability of data collected by non-citizen-science programs across broad spatial and temporal extents? Specifically, the extent to which citizen-science programs collect long-term data (>15 years) relative to other monitoring programs is currently unknown, as is whether citizen-science programs gather data for locations or from particular types of sites not already sampled by other programs. Understanding the contributions of different types of programs that provide data for research and management should help to allocate future resources for monitoring ecosystems.
We addressed these knowledge gaps for water-quality data collected from lakes in a subset of states in the US. We asked the following questions: (1) what proportion of lake water-quality data is collected by citizen-science programs compared to non-citizen-science programs, (2) what proportion of long-term monitoring data comes from citizen-science programs versus non-citizen-science programs, and (3) what are the characteristics of lakes sampled by citizen-science programs and non-citizen-science programs as compared to all lakes in the study area? We expected citizen-science programs to represent a large proportion of the total water-quality data collected, especially for variables that require basic training and are easy to sample relative to variables that are more complicated to sample. We also anticipated that because state, federal, and tribal agencies do not have sustained funding to study many lakes every year, citizen-science programs that work with volunteers for sampling would represent a large proportion of the available long-term data. Finally, we expected that citizen-science programs would tend to sample large, clear lakes near residential areas because volunteers tend to select sites that are easier to access (Dickinson et al. 2010) and more visually appealing (McGoff et al. 2017).
To answer these questions, we used a large lake water-quality and landscape database: LAGOS-NE (LAke Multi-scaled GeOSpatial and Temporal Database; Soranno et al. 2017). This database was created by collating as many publicly available water-quality databases from 17 lake-rich states of the northeastern and Midwestern US as could reasonably be discovered and obtained in a ~2-year period from 2010 to 2011. Data were derived from a range of sources including state, federal, and tribal agencies; non-profit organizations; citizen-science programs; and university researchers (Soranno et al. 2017). LAGOS-NE is ideally suited for studying the contribution of citizen-science data to broad-scale ecological databases because it is, to the best of our knowledge, the only water-quality database in the US that includes and tracks water-quality data from both government agencies and citizen-science programs, and that also contains additional lake characteristic data on all lakes in the study area (see Soranno et al. 2017).

Study extent and data
LAGOS-NE includes 87 water-quality datasets from 17 states in the northeastern and Midwestern US (Soranno et al. 2017). We used LAGOS-NE LIMNO v1.087.1, which contains water-quality data collected from the mid-1960s to 2011 for more than 13,000 lakes (Soranno and Cheruvelil 2017a). We accessed the data using the LAGOSNE R package (Stachelek and Oliver 2017). All code for accessing, analyzing, and plotting the LAGOS-NE data is available in Poisson and McCullough (2019).

Citizen-science programs
We defined citizen-science programs as those in which volunteers participated in sampling; all such programs involved volunteer training (Miller-Rushing et al. 2012). We classified each program as either a citizen-science program or a non-citizen-science program (eg state, federal, and tribal agencies, non-profit organizations, and universities were all placed in the non-citizen-science category).

State selection criteria
We analyzed our data at the scale of US states because state agencies are mandated by the EPA to monitor and manage lake water quality. We selected states from LAGOS-NE that allowed us to differentiate the programs responsible for individual water-quality observations in each dataset. For each of the 17 states in LAGOS-NE, we determined whether the programs responsible for collecting individual observations could be identified as either a citizen-science program or a non-citizen-science program. There were seven states with data collected by citizen-science programs that could be differentiated from other sources, eight states for which there were no citizen-science data in LAGOS-NE, and two states for which there were citizen-science data that were indistinguishable from the state agency dataset (WebTable 1). Specifically, Maine and Wisconsin had citizen-science programs that we were unable to use in our analysis. The Wisconsin Department of Natural Resources works with the Citizen Lake Monitoring Network, whose volunteers collect data on lake water clarity, nutrients, algal biomass, and temperature profiles; these data are integrated into the state agency database, which is the version that was incorporated into LAGOS-NE. The Maine Department of Environmental Protection receives much of its lake water-quality data from the Lake Stewards of Maine (formerly the Maine Volunteer Lake Monitoring Program), which samples water clarity, dissolved oxygen, and nutrients, and the department does not differentiate these data sources.
To examine whether the LAGOS-NE dataset contained a complete representation of statewide citizen-science programs, we searched for additional citizen-science lake water-quality programs in the remaining eight states. As of 2019, there appear to be statewide citizen-science lake water-quality monitoring programs in four of the eight states for which LAGOS-NE does not have citizen-science program data (Illinois, Massachusetts, Ohio, and Vermont; WebTable 1). It is unclear whether data from these programs were available at the time of the LAGOS-NE dataset compilation in 2011, although there is some evidence that more lake water-quality data are available online as of 2018 (P Soranno pers obs). We chose not to attempt to acquire these datasets because it took the creators of LAGOS-NE over 5 years to integrate all of the different water-quality programs in these 17 states, and it was beyond the scope of our current paper to add these additional data sources. However, we have no reason to suspect that the patterns that we observe in the seven states with the necessary data for our analysis would differ from the patterns in these additional states. In all remaining analyses for this paper, we used the seven states that clearly delineated citizen-science-collected lake observations from other sources (Indiana, Michigan, Minnesota, Missouri, New Hampshire, New York, and Rhode Island; WebTable 1). We analyzed all programs within these seven states together as a representative sample of states that have citizen science and other programs that sample water quality.

Water-quality and lake characteristic data
For all analyses, we considered four commonly collected lake water-quality variables: nutrients measured as total phosphorus (P) and total nitrogen (N); algal biomass measured as chlorophyll a; and water clarity measured as Secchi depth, the depth at which a Secchi disk (a circular black-and-white disk attached to a cord), once submerged, is no longer visible to the naked eye from the water's surface (Figure 1). These variables were selected for our analysis because they are good indicators of lake water quality and productivity, and are often collected and used by researchers and natural resource managers to study eutrophication. We analyzed water-quality data collected from 1980 to 2010 during the summer lake stratification period of June 15 to September 15 to coincide with the peak data collection period for both citizen science and other programs. We also analyzed five characteristics of lakes and watersheds (lake size, percent residential development in the lake's 100-m buffer, percent watershed agriculture [row crop and pasture], percent watershed forest [evergreen, deciduous, and mixed], and percent watershed wetlands) that were available from existing digital maps for all lakes within the study area. Watershed land-use and land-cover data came from LAGOS-NE-GEO v1.05, calculated using the 2006 National Land Cover Database and the National Wetlands Inventory (Soranno and Cheruvelil 2017b). Geospatial data layers for study lakes and state outlines were obtained from LAGOS-NE-GIS v1.0 (Soranno and Cheruvelil 2017c).
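The temporal filter described above (1980-2010, June 15 to September 15) can be expressed as a short Python sketch. It assumes a hypothetical record layout (lake identifier, variable name, sample date, value, program type) rather than the actual LAGOS-NE column names, and it treats both June 15 and September 15 as inside the window, which the text does not state explicitly.

```python
from datetime import date

# Hypothetical record layout: (lake_id, variable, sample_date, value, program_type).
# These fields and the example values are illustrative, not LAGOS-NE columns.


def in_study_window(d: date, first_year: int = 1980, last_year: int = 2010) -> bool:
    """True if d falls in first_year..last_year and between June 15 and
    September 15 of that year (both endpoints assumed inclusive)."""
    if not (first_year <= d.year <= last_year):
        return False
    return date(d.year, 6, 15) <= d <= date(d.year, 9, 15)


def filter_summer(records):
    """Keep only records sampled during the summer stratification window."""
    return [r for r in records if in_study_window(r[2])]


if __name__ == "__main__":
    records = [
        ("lake_A", "secchi", date(1995, 7, 1), 3.2, "citizen"),
        ("lake_A", "secchi", date(1995, 5, 20), 4.1, "citizen"),  # before June 15
        ("lake_B", "tp", date(2010, 9, 15), 22.0, "non"),         # boundary day
        ("lake_B", "tp", date(2011, 8, 1), 18.0, "non"),          # after 2010
    ]
    for lake_id, variable, d, value, program in filter_summer(records):
        print(lake_id, variable, d.isoformat(), program)
```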

Analyses of long-term data and lake selection
We counted the number of unique years in which citizen-science programs and non-citizen-science programs sampled lakes for nutrients, algal biomass, and water clarity. We defined long-term data as 15 or more years of data that were not necessarily continuous (ie at least 15 years with at least one data point per year for a given water-quality variable from 1980 to 2010). To examine potential lake selection biases in citizen-science programs, we compared the characteristics of lakes sampled by citizen-science programs with those of lakes sampled by non-citizen-science programs and with those of all lakes in the seven states. We used pairwise t tests to determine whether these characteristics differed between lakes sampled by citizen-science programs and lakes sampled by non-citizen-science programs, and between each of these groups and all lakes in the study area. All analyses were performed using R v3.3.3.
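A minimal sketch of the long-term criterion, assuming each sample has already been reduced to a hypothetical (lake, variable, year, program type) tuple. It counts distinct sampled years per lake, variable, and program type, so years need not be consecutive; whether the original analysis separated program types in exactly this way is an assumption here. The helper `citizen_share` shows one plausible way to express the proportion of long-term series contributed by citizen-science programs.

```python
from collections import defaultdict

LONG_TERM_MIN_YEARS = 15  # "at least 15 years with at least one data point per year"


def years_sampled(records):
    """Map (lake_id, variable, program_type) -> set of distinct years with >= 1 sample."""
    years = defaultdict(set)
    for lake_id, variable, year, program_type in records:
        years[(lake_id, variable, program_type)].add(year)
    return years


def long_term_series(records, first_year=1980, last_year=2010):
    """Return keys whose records span >= 15 distinct years within the study period.
    Gaps between years are allowed (the series need not be continuous)."""
    in_range = (r for r in records if first_year <= r[2] <= last_year)
    return {
        key
        for key, yrs in years_sampled(in_range).items()
        if len(yrs) >= LONG_TERM_MIN_YEARS
    }


def citizen_share(series, variable):
    """Fraction of long-term series for one variable held by citizen-science programs."""
    subset = [key for key in series if key[1] == variable]
    if not subset:
        return float("nan")
    return sum(key[2] == "citizen" for key in subset) / len(subset)


if __name__ == "__main__":
    # Hypothetical lakes: 20 consecutive years, 11 years, and 16 alternate years.
    recs = [("lake_A", "secchi", y, "citizen") for y in range(1981, 2001)]
    recs += [("lake_B", "secchi", y, "non") for y in range(1990, 2001)]
    recs += [("lake_C", "secchi", y, "non") for y in range(1980, 2011, 2)]
    series = long_term_series(recs)
    print(sorted(series))
    print(citizen_share(series, "secchi"))
```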

Proportion of data from citizen-science programs
Of the 434,629 samples collected from 1980 to 2010 for water clarity (Secchi), nutrients (total P and total N), and algal biomass (chlorophyll a), 299,745 were collected by citizen-science programs (69%) (Figure 2). For water clarity, citizen-science programs were responsible for collecting 82% of the data during the 1980-2010 period and ~90% of the data from the mid-1980s to the mid-1990s. Although providing a lower proportion of nutrient data across the full study period (P: 58%, N: 41%), citizen-science programs have collected increasingly greater proportions of nutrient data since the mid-1980s. Since 2000, citizen science contributed >50% of P and N data in all but one year. Citizen-science programs provided the lowest proportion of algal biomass data (32%) as compared with the other two variables, but contributed the majority of data during the mid-1990s; since 2000, non-citizen-science programs have contributed the majority of algal biomass data. In summary, although their contributions have varied through time and across water-quality variables, citizen-science programs have consistently provided most of the water clarity data and are contributing a steadily increasing amount of nutrient data. In contrast, non-citizen-science programs contributed the overall majority of algal biomass data, and this contribution has increased since 2000.
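The proportions above are ratios of sample counts. The sketch below, with hypothetical function names and an assumed input of one (year, program type) pair per sample, computes the kind of per-year citizen-science share summarized in Figure 2, and checks the overall share from the reported totals.

```python
from collections import Counter


def citizen_proportion_by_year(samples):
    """samples: iterable of (year, program_type) pairs, one per sample.
    Returns {year: fraction of that year's samples from citizen-science programs}."""
    total = Counter()
    citizen = Counter()
    for year, program_type in samples:
        total[year] += 1
        if program_type == "citizen":
            citizen[year] += 1
    return {year: citizen[year] / total[year] for year in sorted(total)}


def overall_share(n_citizen, n_total):
    """Share of all samples collected by citizen-science programs."""
    return n_citizen / n_total


if __name__ == "__main__":
    toy = [(1990, "citizen"), (1990, "citizen"), (1990, "non"), (1991, "non")]
    print(citizen_proportion_by_year(toy))
    # Totals reported in the text for 1980-2010
    print(f"{overall_share(299745, 434629):.1%}")  # prints 69.0%
```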

Contribution of citizen-science programs to long-term data collection
Citizen-science programs contributed the majority of long-term (>15 years) monitoring of lake water-quality data for water clarity (94%), P (72%), and algal biomass (56%) (Figure 3). Non-citizen-science programs provided the majority of long-term N data (59%). Although these basic results are important, considering data collection in individual states provides valuable context. For example, if not for their respective citizen-science programs, Rhode Island and New Hampshire would be completely devoid of long-term data for each of the four water-quality variables included here (WebFigures 1-4). Citizen-science programs collected the majority of long-term records for water clarity in many lake-rich states, such as Minnesota, Michigan, and New York (WebFigure 1). In contrast, non-citizen-science programs in Missouri collected most of the long-term data for all four water-quality variables. Generally, non-citizen-science programs collected long-term water-quality data for fewer lakes in relatively small regions (eg northern Lower Peninsula of Michigan, southern New York) (WebFigures 1-4).

Characteristics of lakes and their watersheds selected by water-quality program type
The characteristics of the lakes and watersheds sampled by the two types of programs do not perfectly match the characteristics of the entire population of lakes in the study area. The citizen-science programs and non-citizen-science programs selected lakes that significantly differed (P < 0.05) from all lakes for almost all of the five characteristics (Figure 4, a-e), except for forest cover between citizen-science program lakes and all lakes, which did not differ significantly. As such, neither program type appears to select lakes that are a representative sample of lakes for the characteristics that we analyzed. However, lakes associated with programs (both citizen-science and non-citizen-science) had certain characteristics in common with each other but not with all lakes. For instance, both citizen-science and non-citizen-science programs selected lakes that were generally larger than all lakes and had greater residential development within their 100-m buffers as compared with the 100-m buffers of all lakes, although the amount of residential development was higher for lakes associated with citizen-science programs than for those with non-citizen-science programs (Figure 4, a and c). As compared with all lakes, lakes selected by non-citizen-science programs had greater forest cover in their watersheds and lakes selected by citizen-science programs had greater wetland cover in their watersheds.
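The methods report pairwise t tests without naming the variant. The following sketch assumes Welch's unequal-variance t test, a common choice when group sizes and variances differ, and computes only the t statistic and Welch-Satterthwaite degrees of freedom with the standard library; P values additionally require the t distribution (for example, `scipy.stats.ttest_ind(a, b, equal_var=False)`). The group names and values in the usage block are illustrative, not data from this study.

```python
import math
import statistics
from itertools import combinations


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    Each sample needs at least two values."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def pairwise_welch(groups):
    """groups: dict of name -> values. Returns {(name1, name2): (t, df)} for every pair."""
    return {
        (g1, g2): welch_t(groups[g1], groups[g2])
        for g1, g2 in combinations(groups, 2)
    }


if __name__ == "__main__":
    # Illustrative lake areas (ha) for three hypothetical groups
    groups = {
        "Citizen": [40, 55, 62, 80, 120],
        "Non": [30, 45, 50, 70, 95],
        "All": [5, 8, 12, 20, 35, 60],
    }
    for pair, (t, df) in pairwise_welch(groups).items():
        print(pair, round(t, 2), round(df, 1))
```

Note that the "all lakes" group in the study contains the sampled lakes themselves, so a real analysis may need to account for that overlap and for multiple comparisons.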

Discussion
We found that in seven lake-rich states of the US, citizen-science programs collected large proportions of water-quality data relative to other monitoring programs, particularly for long-term data. The lakes sampled by citizen-science programs in these seven states extended both the spatial and temporal coverage beyond what non-citizen-science programs sample. Neither citizen-science programs nor non-citizen-science programs sampled lakes that were representative of all lakes, and neither program type appeared to be more strongly unrepresentative than the other. On the basis of our results, we argue that including citizen-science datasets in broad-scale integrated water-quality databases increases the spatial and temporal coverage of observations. Below, we discuss the implications of these results for designing lake-monitoring programs, and the importance of continued and expanded partnerships between citizen-science and non-citizen-science programs.
Figure 2. Proportion of samples collected by citizen-science programs ("Citizen") and non-citizen-science programs ("Non") from 1980 to 2010 in the seven LAGOS-NE states in this study for (a) water clarity, (b) total phosphorus (P), (c) total nitrogen (N), and (d) algal biomass.
Given the prevalence of citizen-science data for lake water quality through space and time, professional agencies should consider ways to leverage existing programs. Because financial support for conducting broad-scale ecological monitoring is traditionally limited, creative partnerships can make these limited funds go farther. For example, for variables that are relatively easy to collect and might benefit from repeated sampling within and across years, we recommend developing or investing in citizen-science programs. Such programs still require support in the form of professional staff oversight, data management, training, communication, and supplies. However, citizen scientists offer cost savings for site visits because they often reside near sampling locations, volunteer their time, and pay their own travel costs (Canfield et al. 2002); for professional staff, the time and expense associated with sampling, especially when travel across large states or between countries is required, can be substantial. We found that citizen-science programs collected the vast majority (82%) of lake water clarity data over a 31-year period across the seven states; water clarity is perhaps one of the easiest water-quality measurements to make and requires inexpensive equipment. Many citizen-science programs have therefore been developed around this basic approach to monitoring.
To increase the types of variables sampled (including ones that require more training or additional equipment), it may be cost-effective for government agencies to collaborate with existing citizen-science sampling efforts. This appears to be a strategy used in certain states. In the seven study states, citizen-science programs provided lower proportions of algal biomass and nutrient data as compared to water clarity data. Interestingly, however, the amount of nutrient data contributed by citizen-science programs has increased through time, which suggests that these programs can expand to sample more diverse and complex variables. We therefore recommend that agencies support, encourage, and fund citizen-science programs to collect these and other variables that require similar equipment, logistical support, and training (eg other water-quality variables, invasive species, or emerging contaminants). Investing in citizen-science programs to expand data collection efforts for these types of variables can be more cost-effective than relying on limited government agency resources (Thornhill et al. 2016). State agencies tasked with managing water bodies within state boundaries have an added incentive to support citizen-science water-quality programs because doing so provides more data, which can be used for mandatory reporting to the EPA, developing state-level nutrient criteria, and managing specific water bodies. Yet there will always be variables that require specialized training, equipment, or sampling protocols and are therefore more practically or efficiently collected by agency professionals; these include fish community surveys requiring multiple gear types or chemical analyses that depend on ultraclean techniques and specialized equipment (eg methyl mercury, endocrine disruptors, eDNA, microplastics).
Finally, although many of the partnerships observed in our study were between citizen-science programs and state agency personnel, there are also programs in which university scientists provide logistical, training, and financial support for citizen-science programs (WebTable 1), highlighting another collaborative avenue for expanding data collection through citizen-science programs.
Another important component of ecological monitoring is the temporal range of observations. In our study area, citizen-science programs substantially increased the amount of available long-term water-quality data, particularly for lake water clarity. This result is important because long-term data are unexpectedly scarce across the LAGOS-NE study area (which includes 17 states), regardless of the type of monitoring program (Stanley et al. 2019). Because citizen-science programs are often more cost-effective models for long-term sampling, they should be a central component of future long-term data collection (Bonney et al. 2009). Given the large number of lakes with current long-term records, we recommend that both citizen-science programs and state agencies target lakes for which long-term records already exist, to maintain the value and continuity of these records. In addition, we recommend that programs target lakes with historical but limited contemporary data, as well as recently sampled lakes that have the potential to become long-term monitoring sites in the near future. We also found that contributions from citizen-science programs extend the geographic extent of long-term data beyond what would otherwise be available. Citizen science can therefore help to maintain and expand ecological databases by boosting the spatial coverage of long-term data. Finally, the above discussion has emphasized the value of citizen-science data for broad-scale ecological databases, decision makers, and researchers. However, volunteers participate in citizen-science programs for a wide range of reasons that are not fully captured by considering data availability alone (eg Bonney et al. 2016).
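The three targeting recommendations above can be turned into a simple screening rule for candidate lakes. In this sketch the 15-year threshold follows the long-term definition used in this study, while the cutoff years (2001 for "contemporary", 2006 for "recent") and the minimum of three contemporary years are hypothetical values chosen only for illustration against a study period ending in 2010; the labels are likewise invented.

```python
LONG_TERM_MIN_YEARS = 15     # long-term threshold used in this study
CONTEMPORARY_START = 2001    # hypothetical: years from here on count as contemporary
RECENT_START = 2006          # hypothetical: first sampled this year or later = recently added
MIN_CONTEMPORARY_YEARS = 3   # hypothetical: fewer contemporary years = "limited"


def monitoring_priority(years):
    """Classify a lake by the distinct years in which it was sampled."""
    yrs = set(years)
    if len(yrs) >= LONG_TERM_MIN_YEARS:
        return "maintain"  # existing long-term record: keep it going
    historical = {y for y in yrs if y < CONTEMPORARY_START}
    contemporary = yrs - historical
    if historical and len(contemporary) < MIN_CONTEMPORARY_YEARS:
        return "resume"  # historical data but little recent sampling
    if yrs and min(yrs) >= RECENT_START:
        return "grow"  # recently added site that could become long-term
    return "other"


if __name__ == "__main__":
    lakes = {
        "lake_A": range(1985, 2005),
        "lake_B": [1982, 1983, 1990, 2003],
        "lake_C": [2007, 2008, 2010],
    }
    for name, years in lakes.items():
        print(name, monitoring_priority(years))
```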
Additional benefits include improving volunteers' understanding of science, increasing their skills and knowledge, providing opportunities to participate actively in research and management of the environment, contributing to social well-being by giving them a voice in local decision making, and aligning with their values of conservation or environmentalism (Aceves-Bueno et al. 2015;McKinley et al. 2015;Haywood et al. 2016). To achieve these benefits, we must engage a wider variety of people in citizen science, which may require us to reconsider the phrase "citizen science" to ensure that we are not inadvertently marginalizing those who do not have legal citizenship and to better include all forms of inquiry, stages of research, and forms of knowing (Eitzel et al. 2017;Elliott 2019). Encouraging state (and other) agencies to invest further in citizen-science programs to increase opportunities for participation of all people will foster these important additional benefits to the volunteers and the broader public.

Conclusion
Future ecological studies and policies will need to rely on ecosystem monitoring and data compilation efforts among ecologists, government agencies, and citizen-science programs. Our study of water quality in seven US states demonstrates the major contribution of citizen-science programs to broad-scale and long-term ecological databases. Recognizing the value of citizen-science programs, increasing collaboration between those programs and monitoring agencies, and acknowledging past lake selection preferences can help to improve future ecosystem sampling efforts by building increasingly comprehensive databases with improved spatial and temporal coverage. Our results show the importance of maintaining and investing in citizen-science programs, and the potential gains from strengthening partnerships with other types of programs to collect data for a wider range of ecosystems and variables.
An important next step is to determine the level of use of these citizen-science data in states where they are being collected. In some cases, citizen-science data are not readily available and not integrated into state water-quality databases used by natural resource professionals and decision makers. Data compilation efforts like LAGOS-NE can help increase access to and usage of the high-quality data that are found in citizen-science datasets. By doing this, and by combining those data with additional information that provides valuable ecological context, such as watershed land use and other important lake characteristics, large public databases like LAGOS-NE enable wider use of citizen-science data in research, management, and policy development.
Figure 4. Characteristics of lakes sampled by citizen-science programs ("Citizen"), of lakes sampled by non-citizen-science programs ("Non"), and of all lakes in our seven-state study area, including (a) lake size, (b) percent forest in the watershed, (c) percent residential development in the 100-m lake buffer, (d) percent wetland in the watershed, and (e) percent agriculture in the watershed. Horizontal lines within boxes depict median values, boxes represent the interquartile range (25th-75th percentiles), whiskers (vertical lines) represent 1.5 × interquartile range, and solid circles depict outliers. Because all figure scales have been truncated to emphasize differences among the interquartile values, most outlier data points are not shown. Differences between all possible pairs were significant (P < 0.05) except for the comparison of forest cover between all lakes and lakes sampled by citizen-science programs.