The Accuracy of Citizen Science Data: A Quantitative Review

Citizen science is increasingly being used to collect data for research. However, there is often concern about the accuracy of the data. Here we use 63 peer-reviewed case studies in ecology and environmental science that compare citizen science data against reference data to statistically evaluate the accuracy of citizen-collected data. Citizen science data is not significantly different from professional data in 62% of the comparisons using p-values, shows moderate to strong correlation (r ≥ 0.5) with professional data in 51% of the comparisons using correlations, and has at least 80% agreement with professional data in 55% of the comparisons using percent agreement. Data collected by participants who were involved for longer time periods, by participants who had training, by larger groups, and in research related to volunteers' economic and health situations are more accurate. Citizen science can provide useful data, but accuracy for a given task may be low and researchers should design tasks that increase the accuracy of data collected by citizen scientists.


While there have been qualitative reviews (e.g., Lewandowski and Specht 2015), there are to date no reviews that combine the case studies to quantitatively evaluate the data quality of citizen science. In this paper, we conduct a quantitative review of citizen science data in the areas of ecology and environmental science. We focus on the universe of peer-reviewed studies in which researchers compare citizen science data to reference data, either as part of validation mechanisms in a citizen science project or through experiments designed to test whether volunteers can collect sufficiently accurate data. We code both the authors' qualitative assessments of data accuracy and the quantitative assessments themselves. This enables us to evaluate both whether the authors believe the data to be accurate enough to achieve the goals of the program and the degree of accuracy reflected in the quantitative comparisons. We then use a linear regression model to assess correlates of accuracy. With citizen science playing an increasingly important role in expanding our scientific knowledge and enhancing the management of the environment, we conclude with recommendations for assessing data quality and for designing citizen science tasks that are more likely to produce accurate data.

We searched Google Scholar (http://scholar.google.com/) for papers that cited an initial set of 16 studies. Next, we identified every paper cited in this group of papers that compared citizen science data to reference data and again performed a cited reference search on this new group of papers. We repeated this process iteratively until we encountered no new case studies, giving us confidence that we had identified the universe of papers in ecology and environmental science that compare citizen science data to reference data. This process yielded a preliminary list of 72 articles. We eliminated 9 studies, for example because they presented their statistical results in figures (e.g., Rock and Lauten 1996).
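For readers who want the search logic laid out explicitly, the sketch below shows one way such an iterative ("snowball") citation search could be expressed. The helpers get_citing_papers and compares_to_reference_data are hypothetical placeholders: the screening described above was done manually in Google Scholar, and references were also traced backward, which this simplified sketch omits.

```python
# Minimal sketch of an iterative ("snowball") citation search.
# get_citing_papers() and compares_to_reference_data() are hypothetical stubs;
# the actual review screened papers by hand.

def get_citing_papers(paper_id):
    """Hypothetical lookup: return identifiers of papers citing `paper_id`."""
    return []  # placeholder

def compares_to_reference_data(paper_id):
    """Hypothetical screen: does this paper compare citizen science data to reference data?"""
    return False  # placeholder

def snowball_search(seed_papers):
    """Expand a seed set until a full pass adds no new qualifying case studies."""
    included = set(seed_papers)
    frontier = set(seed_papers)
    while frontier:
        new_hits = {
            citing
            for paper in frontier
            for citing in get_citing_papers(paper)
            if citing not in included and compares_to_reference_data(citing)
        }
        included |= new_hits
        frontier = new_hits  # the next pass starts from the newly found papers
    return included

preliminary_list = snowball_search(["seed_study_%d" % i for i in range(16)])
```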
From each of the remaining papers, we recorded every statistical comparison between citizen science data and reference data as a separate observation. For example, in one paper volunteers' estimates were compared to professionals' estimates using a Student t-test, so we recorded the t-statistic, p-value, and degrees of freedom when provided. In that same paper, citizen scientists' correct identification of species was compared to professionals' using percent agreement and a chi-squared test, so each of those values (percent agreement, chi-squared value, and p-value) was recorded. That paper also included breakdowns of easy and difficult species identification, as well as the presence or absence of species, resulting in five observations comparing the data from volunteers to that of professionals. To assure data quality, the accuracy of the data extracted from each paper was checked by a second coder after inclusion in the database.
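To make the coding concrete, the sketch below reproduces the kinds of statistics being recorded, using made-up numbers rather than values from any reviewed paper: a Student t-test on volunteer versus professional estimates, then percent agreement and a chi-squared test on a hypothetical species-identification confusion table (NumPy and SciPy assumed).

```python
# Illustrative only: invented volunteer vs. professional data, not values
# taken from any of the reviewed studies.
import numpy as np
from scipy import stats

# (1) Student t-test comparing volunteer estimates with professional estimates;
#     we would record the t-statistic, degrees of freedom, and p-value.
volunteer_estimates = np.array([12, 15, 9, 20, 14, 11])
professional_estimates = np.array([13, 14, 10, 22, 15, 12])
t_stat, p_val = stats.ttest_ind(volunteer_estimates, professional_estimates)
dof = len(volunteer_estimates) + len(professional_estimates) - 2
print(f"t = {t_stat:.2f}, df = {dof}, p = {p_val:.3f}")

# (2) Percent agreement and a chi-squared test on species identifications.
#     Rows = professional identification, columns = volunteer identification.
confusion = np.array([[40, 5],
                      [8, 47]])
percent_agreement = 100 * np.trace(confusion) / confusion.sum()
chi2, chi2_p, chi2_dof, _ = stats.chi2_contingency(confusion)
print(f"agreement = {percent_agreement:.1f}%, chi2 = {chi2:.2f}, p = {chi2_p:.3f}")
```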

In addition to coding the statistical comparisons between citizen science data and reference data, we coded the attributes of the task and of the citizen scientists that might affect accuracy. To characterize the task, we coded the discipline as geology, atmospheric science, biology of animals, or botany, and the location of the research as marine, freshwater, terrestrial, or the atmosphere. We also coded whether the author noted any particular difficulty with the task, as difficulty affects accuracy (Kosmala et al. 2016). To understand the attributes of the citizen scientists, we coded the length of their participation into 6 categories ranging from 0-1 month to more than 10 years, whether they participated only once or repeatedly, and the number of citizen scientists participating. We also coded whether the paper mentioned that the citizen scientists received training.

The most common means of comparing citizen science data to data collected by professionals was percent agreement (525 of 1,363 comparisons; Table 1), yet this method does not allow for hypothesis testing. As shown in Figure 3, 55.2% of comparisons had a percent agreement of 80% or greater, and there was at least 50% agreement in 86.1% of the comparisons. Percent agreement of 10% or less was reported less than 2% of the time. We note that percent agreement fails to account for agreement by chance (Lombard et al. 2002), so these figures likely overstate the accuracy of citizen scientists.
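The chance-agreement caveat can be illustrated with a small invented example: when one category dominates, raw percent agreement can be high even for a volunteer who adds no information, whereas a chance-corrected index such as Cohen's kappa exposes this. The data below are hypothetical (scikit-learn assumed); the reviewed papers themselves reported raw percent agreement.

```python
# Invented presence/absence data illustrating why raw percent agreement can
# overstate accuracy: a volunteer who always reports "absent" still agrees
# with the professional 90% of the time, but adds no information.
import numpy as np
from sklearn.metrics import cohen_kappa_score

professional = np.array(["absent"] * 90 + ["present"] * 10)  # species found at 10 of 100 sites
volunteer = np.array(["absent"] * 100)                        # volunteer never reports the species

percent_agreement = 100 * np.mean(professional == volunteer)
kappa = cohen_kappa_score(professional, volunteer)

print(f"percent agreement = {percent_agreement:.0f}%")  # 90%
print(f"Cohen's kappa     = {kappa:.2f}")               # 0.00: no better than chance
```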

To what degree do citizen science data correlate with data collected by professionals?

The correlation between citizen scientist and professional data was reported in 81 pairings. Overall, 72% of correlations were significantly greater than zero, but a quarter of the positive correlations were quite weak. We considered values of r ≥ 0.5 to show moderate to strong correlation between citizen scientist and scientist data. There were 41 observations (50.6%) with r ≥ 0.5, of which 36 (87.8%) were significant (p ≤ 0.05), 2 (4.9%) were not significant, and 3 (7.3%) had no reported p-values. A total of 35 observations (43.2%) showed weak positive correlation between citizen scientist and scientist data (0 ≤ r < 0.5); of these, 12 (34.3%) were significant, 17 (48.6%) were not significant, and 6 (17.1%) had no reported p-values. The remaining 5 observations (6.2%) indicated a negative correlation between citizen scientist and scientist data, and in all of these cases the correlations were not significant (Figure 5).

First, many studies include at least some comparisons that show accurate data, which may allow the authors to conclude that citizen science data is sufficiently accurate for certain tasks. In other words, the authors of the studies frequently saw the usable data within the noise. Second, there is no agreed-upon definition of terms like "reliable": for some scholars, 70% agreement is reliable, yet for others 70% agreement would not be sufficient for the scientific questions they seek to answer. This highlights the crucial role that research design and researcher judgement play in deciding whether data are accurate enough for a given use.

This lack of explicit criteria for accuracy is particularly acute when correlations are used. For example, one paper reported a Spearman's rank correlation of 0.55 with p < 0.001. While this allows for a significance test (an advantage over percent agreement), it is unclear whether 0.55 should be considered a high enough correlation. These definitions of accuracy are specific to the research question for which the data will be used and should be specified before data collection commences or analysis proceeds.
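As a sketch of what pre-specifying such a criterion might look like in practice, the example below uses invented data and an assumed threshold of r ≥ 0.5 at α = 0.05; SciPy's spearmanr supplies the rank correlation and its p-value.

```python
# Invented data illustrating a criterion fixed before analysis: require a
# Spearman rank correlation of at least 0.5, significant at alpha = 0.05.
import numpy as np
from scipy.stats import spearmanr

R_THRESHOLD = 0.5   # minimum correlation deemed adequate for this (hypothetical) use
ALPHA = 0.05        # significance level; both chosen before looking at the data

rng = np.random.default_rng(42)
professional = rng.normal(size=30)
volunteer = professional + rng.normal(scale=1.0, size=30)  # noisy volunteer measurements

rho, p = spearmanr(volunteer, professional)
meets_criterion = (rho >= R_THRESHOLD) and (p <= ALPHA)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}, meets pre-specified criterion: {meets_criterion}")
```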

The case survey method of analysis has well-known shortcomings. First, it relies on published case studies, which may not adequately cover all areas. In this case, many well-known citizen science projects are long-term and use many citizen scientists, yet such projects are under-represented among the published comparisons, so our results should be taken to apply mainly to shorter projects. It is clear that studies comparing citizen science data to reference data should continue, as there is more to learn about the correlates of data quality and how to design citizen science projects that produce quality data. Second, the analysis hinges on the quality of the data in the studies. There are reasons to believe that the studies used here represent relatively good quality data: they were primarily designed explicitly to test the quality of citizen science data, which suggests that the researchers put more thought into how to obtain quality data, and most of the studies (75.3%) provided training, which improves data quality. Nonetheless, this study must rely on published comparisons, and data quality issues are not unique to citizen science. The papers examined here most often compare citizen science data to professional data, a common means of assessing data quality that often assumes the professional data are fully accurate (Kosmala et al. 2016), yet professional data are not themselves error-free.

Authors' qualitative assessments can capture problems and strengths that summary statistics miss, as when a researcher notices citizen scientists struggling to identify uncommon species. But these assessments may be overly optimistic. Although the abstracts of papers comparing citizen science data to professional data indicated that the citizen science data quality was good in 73% of cases, the results of our quantitative assessment cast more doubt on the accuracy of the data. For those studies reporting p-values, we found that citizen science data was not significantly different from professional data in 62% of the cases, and we found a moderate to strong correlation in 51% of the comparisons using correlations.

Isaac Perlman and Trevor Zink participated in early stages of the project, and we thank them for their assistance. We thank Michael Bostock for the d3.js script that we used to produce the Sankey diagrams in this manuscript. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.