Improving the efficacy of web-based educational outreach in ecology

Scientists are increasingly engaging the web to provide formal and informal science education opportunities. Despite the prolific growth of web-based resources, systematic evaluation and assessment of their efficacy remains limited. We used clickstream analytics, a widely available method for tracking website visitors and their behavior, to evaluate >60,000 visits over three years to an educational website focused on ecology. Visits originating from search engine queries were a small proportion of the traffic, suggesting the need to actively promote websites to drive visitation. However, the number of visits referred to the website per social media post varied depending on the social media platform, and the quality of those visits (e.g., time on site and number of pages viewed) was significantly lower than that of visits originating from other referring websites. In contrast, visitors referred to the website through targeted promotion (e.g., inclusion in a website listing classroom teaching resources) had higher quality visits. Once engaged in the site's core content, visitor retention was high; however, visitors rarely used the tutorial resources that serve to explain the site's use. Our results demonstrate that simple changes in website design, content and promotion are likely to increase the number of visitors and their engagement. While there is a growing emphasis on using the web to broaden the impacts of biological research, time and resources remain limited. Clickstream analytics provides an easily accessible, relatively fast and quantitative means by which those engaging in educational outreach can improve upon their efforts.


INTRODUCTION
The web has now become the primary means by which people seek scientific information (National Science Board 2014), and those doing so are more likely to value the roles of science in society (Horrigan 2006, Dudo et al. 2011). In parallel, the scientific community is increasingly turning to the web to create science education opportunities. The web benefits from ease of content creation, rapid revision of that content, and low-cost distribution to a global audience. Not surprisingly, it is an increasingly common means of addressing the growing emphasis on broadening the societal impacts of publicly funded research (Nadkarni and Stasch 2013a). Individuals and organizations are creating and hosting scientific content on formal and informal science education websites, blogs, and social media. The rate of adoption has been rapid: howtosmile.org, an NSF-supported catalog of math and science resources, now lists >3500 websites (Porcello and Hsi 2013).
In spite of their rapid proliferation, evaluation and assessment of science education websites remain limited (Bell et al. 2009, Brossard 2013). Effective science communication is no different from science, in that "data should trump intuition" (Nisbet and Scheufele 2009). There is a critical need for information that improves science education by (1) identifying the best ways to drive visitors to a website, (2) determining how visitors interact with a website once there, and (3) assessing how visitors' scientific knowledge and attitudes change as a result of their visit. The anonymity of the web can present a barrier to collecting such information, particularly with respect to determining changes in knowledge and attitudes. However, there are tools that can help maximize the scientific community's return on its investment of resources in educational outreach.
Clickstream analytics provides one method for evaluating the use of science education websites (Clifton 2012). Clickstream analytics collects spatially and temporally explicit data on each visit to a website by placing a small file on the visitor's computer (commonly referred to as a 'cookie') and returning quantitative and qualitative information to a host server, which then collates the information and makes it available to the website operator. Common information includes the means by which the visitor found the site, the geographic origin of the visit, the duration of the visit, and the specific pages viewed. More sophisticated clickstream analytics systems can provide this information in real time and in considerable detail. Importantly, the visitor's identity and location (i.e., IP address) are removed to maintain anonymity. Clickstream analytics software is freely available and easy to use; simple versions are already built into many website hosting services. Clickstream analytics is a ubiquitous tool for businesses tracking engagement with their clients; however, its use in the academic community has primarily focused on resource clearinghouses such as libraries (Fang 2007, Wang et al. 2011), with limited application to science education websites (Scotchmoor and Thanukos 2007, Schernewski and Bock 2014, Zhang 2014) and science networking websites (Guerrero-Medina et al. 2013).
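To make this mechanism concrete, the sketch below shows one way hits tagged with an anonymous cookie identifier might be grouped into visits and labeled by how the visitor arrived. It is a minimal illustration under stated assumptions, not Google Analytics' internal implementation: the `Hit` record, the 30-minute inactivity timeout and the short list of search engine hostnames are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity (Google Analytics' default timeout).
SESSION_TIMEOUT = timedelta(minutes=30)
# Illustrative, not exhaustive, list of search engine hostname fragments.
SEARCH_ENGINES = ("google.", "bing.", "yahoo.", "duckduckgo.")


@dataclass
class Hit:
    client_id: str       # anonymous identifier read from the analytics cookie
    timestamp: datetime
    page: str
    referrer: str        # empty when the address was typed or bookmarked


def classify_source(referrer: str, own_domain: str) -> str:
    """Label a visit as direct, search or referral from its first hit's referrer."""
    if not referrer or own_domain in referrer:
        return "direct"
    if any(engine in referrer for engine in SEARCH_ENGINES):
        return "search"
    return "referral"


def sessionize(hits: list[Hit], own_domain: str) -> list[dict]:
    """Group hits into visits per client, starting a new visit after an inactivity gap."""
    visits: list[dict] = []
    last_seen: dict[str, tuple[datetime, int]] = {}  # client -> (last hit time, visit index)
    for hit in sorted(hits, key=lambda h: (h.client_id, h.timestamp)):
        previous = last_seen.get(hit.client_id)
        if previous is None or hit.timestamp - previous[0] > SESSION_TIMEOUT:
            visits.append({
                "client_id": hit.client_id,
                "source": classify_source(hit.referrer, own_domain),
                "returning": previous is not None,  # same cookie seen in an earlier visit
                "pages": [hit.page],
            })
            index = len(visits) - 1
        else:
            index = previous[1]
            visits[index]["pages"].append(hit.page)
        last_seen[hit.client_id] = (hit.timestamp, index)
    return visits
```

Only the cookie's random identifier links hits together, which is how visits can be counted as new or returning without knowing who the visitor is.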
We used clickstream analytics to assess Canopy in the Clouds, a free website we built to provide informal science education in ecology. In particular, we studied patterns of visitation and visitor behavior with a focus on comparing and contrasting (1) the means by which visitors found the site and (2) how they engaged with the site following their arrival. Our ultimate objective was to provide a quantitative evaluation capable of informing how we build and promote better science education websites in the future.

METHODS
Website description
Canopy in the Clouds is a free, web-based learning environment for informal science education initiated as a graduate student outreach project (www.canopyintheclouds.com). The website uses media from a tropical montane cloud forest as a tool for teaching ecology, with an emphasis on providing peer-reviewed resources for inquiry-based K-12 education. The core of the website's content is a series of navigable panoramas that allow the user to explore the location with a 360° field of view (Fig. 1). Embedded within these panoramas are a total of 75 clickable links leading to videos, photos and text that provide natural history information about the forest. Additionally, there is a dedicated learning page with resources for students and a teaching page with >25 lesson plans. A separate Spanish language version of the website (www.doselenlasnubes.com) is not considered herein. The website was launched on 21 January 2011.

Analytics analysis
Visitation and visitor behavior on Canopy in the Clouds were analyzed using Google Analytics, a free clickstream analytics program (www.google.com/analytics). Data can be viewed directly through a web browser, downloaded as a CSV file for Excel, or, as in this instance, downloaded through an application programming interface (API) using the "rga" package in the statistical computing program R (R Core Team 2013). We analyzed weekly data (n = 160 weeks, reflecting Google Analytics' aggregation) on all visits originating from the USA (75% of all traffic) between 21 January 2011 and 21 January 2014. Data are available from the Dryad Digital Repository (Goldsmith et al. 2014). We do not distinguish among visits where the program identifies languages other than English as being utilized by the visitor's computer. For consistency with future research, we adopt Google Analytics terminology wherever possible. Analyses described below were performed in R 3.0.2 (R Core Team 2013).

Quantity of visits
To study the quantity of visitors arriving at the website as a function of time, we accounted for temporal auto-correlation by applying a seasonal trend decomposition procedure (Cleveland et al. 1990) to weekly means of visitation in order to estimate the trend, the seasonal effect and any remaining error contributing to the observed data. We then summed the trend and random components determined by the decomposition procedure and carried out a linear regression of this sum against time. To study differences in the sources of those visitors, we then determined whether visits were "direct" (i.e., the visitor directly enters the website address in the browser), "referral" (i.e., the visitor follows a link to the website from another website), or from "search" (i.e., the visitor arrives at the website by searching for specific content), and compared these proportions with reported means for all websites worldwide (Google Analytics Team 2011) using one-sample t-tests.
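The decomposition-then-regression step can be sketched as follows. As a simplification, this sketch uses a classical moving-average decomposition rather than the loess-based procedure of Cleveland et al. (1990); R's `stl()` or `statsmodels.tsa.seasonal.STL` would be closer to the method described. It relies on the identity trend + remainder = observed − seasonal, so only the seasonal component has to be estimated before regressing on time. The 52-week period is an assumption.

```python
from __future__ import annotations

import math


def centered_moving_average(x: list[float], period: int) -> list[float | None]:
    """2 x period moving average for even periods (plain average for odd); None at the edges."""
    half = period // 2
    trend: list[float | None] = [None] * len(x)
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        if period % 2 == 0:
            # Half weight on both end points so the window spans exactly one period.
            trend[i] = (sum(window[1:-1]) + 0.5 * (window[0] + window[-1])) / period
        else:
            trend[i] = sum(window) / period
    return trend


def seasonal_component(x: list[float], period: int) -> list[float]:
    """Mean detrended value at each position in the cycle, centred to sum to zero."""
    trend = centered_moving_average(x, period)
    buckets: list[list[float]] = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(x, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    return [m - offset for m in means]


def deseasonalized_trend_test(x: list[float], period: int = 52) -> tuple[float, float]:
    """OLS of (trend + remainder) = observed - seasonal on week index; returns (slope, t)."""
    seasonal = seasonal_component(x, period)
    y = [value - seasonal[i % period] for i, value in enumerate(x)]
    n = len(y)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * i) ** 2 for i, yi in enumerate(y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, (slope / se if se > 0 else float("nan"))
```

Removing the seasonal signal first matters here because a strongly school-year-driven series would otherwise inflate the residual variance and obscure, or mimic, a long-term trend.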
We promoted visitation to the website using outlets including print media, social media, radio, and email list-servers. For social media, we used linked accounts on Facebook and Twitter, occasionally supplying additional content on Twitter. Common hashtags, which are a form of keyword for micro-blogs, included "#science", "#forest", "#education", "#rainforest", "#tropical" and "#classroom". To determine the quantity of visitors among the resulting referrals, we then classified all referral sources as educational (e.g., a website of classroom teaching resources), non-educational (e.g., a website of a company that donated equipment to the project), news media (e.g., a website of a newspaper) or social media (e.g., a participatory blog or micro-blog). The number of referral sources was analyzed as a function of time using a linear regression applied to seasonally de-trended data as above.
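As an illustration of the classification step, the sketch below assigns each referring URL to one of the four categories by matching its hostname, or any parent domain of it, against a hand-built lookup table, and then expresses referral visits as percentages per category. The table is entirely hypothetical (the `example.*` hosts are placeholders); only the category names come from the text. `t.co` is Twitter's link-shortening domain, through which clicks on tweeted links usually arrive.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup table: each referring domain found in the analytics
# export is assigned by hand to one of the four categories used in the text.
CATEGORY_BY_DOMAIN = {
    "facebook.com": "social media",
    "t.co": "social media",                    # Twitter's link shortener
    "teaching-resources.example.org": "educational",
    "news.example.com": "news media",
    "equipment-donor.example.net": "non-educational",
}


def referral_category(url: str) -> str:
    """Match the referring host, or any parent domain of it, against the table."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in CATEGORY_BY_DOMAIN:
            return CATEGORY_BY_DOMAIN[candidate]
    return "unclassified"  # flag for manual review rather than guessing


def category_shares(referrals: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of referral visits per category from (referrer URL, visits) pairs."""
    totals = Counter()
    for url, visits in referrals:
        totals[referral_category(url)] += visits
    grand_total = sum(totals.values())
    return {category: 100 * n / grand_total for category, n in totals.items()}
```

Returning "unclassified" instead of a default category keeps new or unexpected referrers visible, so they can be reviewed and added to the table.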
Search engine optimization (SEO), whereby keywords were added to webpage metadata to improve the site's search engine visibility, was performed on 1 August 2011. To determine the effect of search engine optimization on the quantity of visitors, we compared the proportion of visits originating from search engines in the 27 weeks prior to optimization with that in the 27 weeks following optimization, using a one-sided binomial proportions test on data thinned by a factor of 3 (i.e., retaining every third week) to reduce temporal auto-correlation.
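A minimal sketch of the thinning and the one-sided comparison, assuming the test is applied to pooled visit counts from the retained weeks. It uses a pooled two-proportion z-test; R's `prop.test(..., alternative = "greater")` instead reports a chi-square statistic with Yates' continuity correction by default, and without that correction the squared z below equals that chi-square. The function names are hypothetical.

```python
import math


def thin(series: list[int], factor: int = 3) -> list[int]:
    """Keep every `factor`-th week to weaken week-to-week autocorrelation."""
    return series[::factor]


def one_sided_proportion_test(x_after: int, n_after: int,
                              x_before: int, n_before: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H1: p_after > p_before; returns (z, p-value)."""
    p_after, p_before = x_after / n_after, x_before / n_before
    pooled = (x_after + x_before) / (n_after + n_before)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_after + 1 / n_before))
    z = (p_after - p_before) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability


def seo_effect(search_before, total_before, search_after, total_after, factor=3):
    """Compare the search share of visits in thinned weekly series before and after SEO."""
    return one_sided_proportion_test(
        sum(thin(search_after, factor)), sum(thin(total_after, factor)),
        sum(thin(search_before, factor)), sum(thin(total_before, factor)),
    )
```

With 27 weeks on each side and a thinning factor of 3, nine weeks per period enter the test.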

Quality of visits
To study differences in visitor behavior, we determined differences in the proportion of visitors retained (i.e., where time on site > 0 seconds) among the different referral sources using a generalized linear model with a logit link. We also determined differences in the average number of pages per visit, average time on site, and average time per page among different referral sources using Kruskal-Wallis rank sum tests.
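The Kruskal-Wallis statistic compares referral sources on ranks rather than raw values, which suits heavily skewed metrics such as time on site. The sketch below computes the tie-corrected H for hypothetical per-visit values grouped by referral source; the p-value comes from comparing H with a chi-square distribution on k − 1 degrees of freedom, for example via `scipy.stats.kruskal` or R's `kruskal.test()`, which reports H as "Kruskal-Wallis chi-squared".

```python
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis(groups: list[list[float]]) -> tuple[float, int]:
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, offset = 0.0, 0
    for group in groups:
        group_ranks = ranks[offset:offset + len(group)]
        rank_term += sum(group_ranks) ** 2 / len(group)
        offset += len(group)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Correct for ties (e.g., many visits sharing 0 s or the same page count).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    return (h / correction if correction > 0 else 0.0), len(groups) - 1
```

With four referral categories, H would be referred to a chi-square distribution with 3 degrees of freedom.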
To study the content viewed by visitors, we used the quantitative behavioral flow charts provided by Google Analytics, as applied to all visitors regardless of source. We then determined differences in the number of observed versus expected visits to particular content pages (e.g., teaching resource and video tutorial) among different referral sources using chi-square tests.
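The observed-versus-expected comparison can be sketched as a chi-square goodness-of-fit test on one page's views across referral sources. The text does not state how expected counts were derived, so this sketch assumes that each source's expected views equal the page's total views multiplied by that source's share of all referral visits; R's `chisq.test(x, p = ...)` performs the same calculation given those shares. The function name and inputs are hypothetical.

```python
def observed_vs_expected(observed: dict[str, int],
                         visit_share: dict[str, float]) -> tuple[float, int, dict[str, float]]:
    """Chi-square goodness of fit for one page's views across referral sources.

    Assumption: expected views for a source = total views of the page x that
    source's share of all referral visits (shares must sum to 1).
    """
    total = sum(observed.values())
    expected = {source: total * visit_share[source] for source in observed}
    chi2 = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
    surplus = {s: observed[s] - expected[s] for s in observed}  # > 0: more views than expected
    return chi2, len(observed) - 1, surplus
```

The sign of each surplus shows which sources drive a significant result, for example more views than expected from educational referrals.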

RESULTS
Quantity of visits
Cumulatively, the website received 61,774 visits, of which 51% were return visits. Visits to the website were highly seasonal, generally corresponding to the school calendar in the United States (Fig. 2). The number of visits demonstrated no significant trend as a function of time (t = −1.01, p = 0.3). Direct visits (55.1 ± 14.7% visits week⁻¹, mean ± 1 SD) accounted for significantly more traffic than the global mean of visitation patterns (μ = 36%; t = 16.41, p < 0.001), while referral visits (19.1 ± 11.4% visits week⁻¹) were significantly lower (μ = 21%; t = −2.08, p = 0.038) and search visits (25.8 ± 15.6% visits week⁻¹) did not significantly differ from the global mean (μ = …).
The majority of visits from referrals originated from educational sources (50.1 ± 23.3% visits week⁻¹, mean ± 1 SD), followed by visits from social media (21.9 ± 15.9% visits week⁻¹), non-educational (19.1 ± 13.8% visits week⁻¹), and news media (16.6 ± 13.4% visits week⁻¹) sources. The number of referring sources decreased significantly with time (r² = 0.43, t = −11.03, p < 0.001). The social media accounts grew from ~100 to ~350 followers over the course of the study; Twitter resulted in 0.14 visits tweet⁻¹, while Facebook resulted in 5.1 visits post⁻¹.
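Because the weekly shares are summarized by a mean and SD over n = 160 weeks, the one-sample t statistics can be checked by hand as t = (x̄ − μ)/(s/√n). The short calculation below plugs in the rounded summary values; it gives about 16.4 for direct visits and about −2.1 for referral visits, close to the reported 16.41 and −2.08, with the small differences consistent with rounding of the published means and SDs.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu: float) -> float:
    """t statistic for H0: the population mean equals mu, from summary statistics."""
    return (mean - mu) / (sd / math.sqrt(n))


N_WEEKS = 160  # weekly observations, as described in the methods
t_direct = one_sample_t(55.1, 14.7, N_WEEKS, 36)    # direct share vs global mean of 36%
t_referral = one_sample_t(19.1, 11.4, N_WEEKS, 21)  # referral share vs global mean of 21%
print(f"direct: t = {t_direct:.2f}; referral: t = {t_referral:.2f}")
```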
Visits to the website from searches originated from >2500 unique search term combinations; however, five terms and their common variants accounted for 45.5% of the visits (Table 1). Searches for "Canopy in the Clouds" alone accounted for >35% of the visits. The proportion of visits originating from searches in the 27 weeks after search engine optimization (0.77) was significantly higher than in the 27 weeks before optimization (0.11; Fig. 3; χ² = 17, p < 0.001).

Quality of visits
Among visits from different referral sources, the proportion of visitors retained from social media was significantly lower than from other sources (Fig. 4A; df = 573, residual deviance = 440, p < 0.02). Among those visitors retained, there were significant differences in the average number of pages visit⁻¹ (Fig. 4B; χ² = 67, p < 0.001), average time on site (Fig. 4C; χ² = 37, p < 0.001), and average time page⁻¹ (Fig. 4D; χ² = 27, p < 0.001). Again, these differences were primarily driven by visitors referred from social media, as well as visitors from non-educational sources; those visitors that did remain on the website viewed fewer pages and remained on the site for less time.
Visitors to the website largely arrived at the homepage, with notable interest in the symbiosis lesson plan, learning, and teaching pages (Fig. 5). Retention of visitors, excluding those who only visited one page, decreased by ~10% with every additional page viewed. Once engaged, visitors moved primarily among the panorama pages by means of the homepage, with the site tutorial page (which serves to explain how to use the site) often serving as the fourth or fifth page visited. The observed numbers of page views by visitors referred from educational sources to the teaching resource (χ² = 30, p < 0.001) and tutorial (χ² = 79, p < 0.001) pages were significantly higher than expected, whereas observed views of these pages from non-educational, news media, and social media sources were lower than expected.

DISCUSSION
Considerable resources are being devoted to enhancing ecoliteracy through science education websites; however, the efficacy of these efforts remains poorly understood.
Our results show that by engaging clickstream analytics, we can improve our understanding of how to allocate resources to drive visitation, how to structure or restructure a site to improve visitor retention, and how to build better content for visitors.

Improving the quantity and quality of website visits
Visitation to Canopy in the Clouds was primarily based on prior knowledge of the website, as evident from the large proportion of direct traffic and traffic originating from searches for the name of the website. We actively sought to drive traffic through press releases to a variety of sources coinciding with the launch of the website, with a particular focus on reaching out to state and national science teachers' associations, resulting in the website's inclusion in a number of teaching resource clearinghouses. Among our target audience of middle and high school science teachers, 39% use the web on a daily basis to look for material to create lesson plans, while 44% use the web on a daily basis to identify content or material that will engage students (Purcell et al. 2013). We augmented these efforts with regular posts to dedicated, linked accounts on Facebook and Twitter. We found a clear difference in return on investment between the two platforms; the low number of website visits originating from Twitter may be attributable to the structure of Twitter itself, where the high volume of posts means that any single post is less likely to come to the attention of potential visitors than on Facebook. While social media may reach new and more diverse audiences, such efforts are inherently driven by creating new content and thus require time and energy. As a community, there is a need to explicitly weigh the benefit of reaching wider audiences through social media against the resources invested.
Visitation to the website was also increased through search engine optimization, although how much to invest in the process, which is time-intensive and relies on text that may be absent from media-rich websites, remains an open question. Ultimately, search traffic accounted for a relatively small proportion of visitation. Our results suggest that simply creating content is not sufficient to ensure its use; resources must be allocated to driving traffic to the site, or else the full potential of the website will not be realized.
Once visitors were immersed in the panoramas, the website's core content, attrition was very low, with a high average number of pages visit⁻¹ and time on site. However, nearly half of the visitors left the site before engaging in the panoramas. This may not be surprising for visits originating from search, where specific content is being sought. Nevertheless, the results suggest that directly engaging visitors in content, rather than using the more traditional homepage that serves as an index of the site, may improve retention. This is further supported by results demonstrating that, irrespective of the starting page, visitors view several other pages prior to the tutorial page. Clickstream analytics programs often support hypothesis testing (referred to as A/B testing), where visitors are randomly and blindly presented with one of two or more variants of a single feature of a website to provide a quantitative comparison of the resultant behavior (Clifton 2012). For instance, given the observations of visitor behavior noted above, A/B testing could be used to evaluate which homepage design (i.e., a homepage focused on introducing the website vs. a homepage rich in core content) results in better visitor retention. A/B testing is among the most promising avenues for future research evaluating and assessing science education websites; however, it must be treated with caution, as specific ethical concerns arise where studies use experimental (i.e., experimental changes to content) rather than observational approaches (Slade and Prinsloo 2013, Harriman and Patel 2014, Kramer et al. 2014). In addition to engaging in the panoramas, visitors were particularly likely to use the lesson plans (8,708 total downloads). This was supported by search term results. Searches for "(tropical montane) cloud forests," ostensibly a primary theme of the site, accounted for <1% of the search terms, whereas searches for lesson plans on "symbiosis," "hypotheses," and "niches" were much more prevalent. Finally, visitors from educational referral sources were more likely to visit the teaching resource and tutorial pages, suggesting that we are reaching classrooms and should continue to focus our efforts there. Notably, certain content that was particularly resource-intensive to construct (e.g., a glossary) was so seldom utilized that we would not build a similar resource in the future. Applied in this context, clickstream analytics can provide valuable insight into how target audiences find and use resources (Duin et al. 2012).
Fig. 5. A flow path diagram demonstrating the content visited on Canopy in the Clouds, beginning with the identity of the webpage that visitors land on and following subsequent pageviews (interactions) on the site.
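The random, blind assignment that A/B testing relies on can be sketched as follows, as a minimal illustration assuming each visitor carries an anonymous client ID. Hashing the ID keeps a returning visitor on the same homepage variant while the split remains effectively random. The variant names, the experiment label and the hashing scheme are hypothetical and do not describe the mechanism of any particular analytics product; the ethical considerations noted above apply regardless of how assignment is implemented.

```python
import hashlib

# Hypothetical homepage variants for the comparison described in the text.
VARIANTS = ("introductory_homepage", "panorama_homepage")


def assign_variant(client_id: str, experiment: str = "homepage-test",
                   variants: tuple[str, ...] = VARIANTS) -> str:
    """Deterministically map an anonymous client ID to one variant.

    Hashing (rather than drawing a fresh random number on every visit) keeps a
    returning visitor on the same variant; the visitor is never told which one.
    """
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def retention_by_variant(visits: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of visits retained (e.g., time on site > 0 s) under each variant."""
    counts = {variant: [0, 0] for variant in VARIANTS}  # [retained, total]
    for client_id, retained in visits:
        tally = counts[assign_variant(client_id)]
        tally[0] += int(retained)
        tally[1] += 1
    return {v: (r / n if n else float("nan")) for v, (r, n) in counts.items()}
```

The resulting retention proportions could then be compared between variants with a test of two proportions.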

Limitations of clickstream analytics
The metrics provided by clickstream analytics, as well as their underlying calculation, remain subject to change as providers react to market demand from clients that pay for clickstream analytics. Some metrics can also be misleading: for example, when a visitor leaves a webpage open, returns minutes later (e.g., after drinking a cup of coffee), and then clicks through to a second page, the visit is recorded as having a long time on site. Absolute values must thus be treated with caution and efforts made to understand how each metric is calculated (Clifton 2012); as such, we have generally refrained from quantitative comparisons with other informal science education websites (Scotchmoor and Thanukos 2007, Schernewski and Bock 2014).
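A minimal sketch of why such artifacts arise, assuming the rule used by classic page-tagging analytics such as Universal Google Analytics: a page's duration is measured as the gap to the next pageview, so the last page of every visit contributes nothing. Under that assumption, a single-page visit records 0 s on site, while an idle pause before the next click is counted in full. The function name is hypothetical.

```python
from datetime import datetime


def visit_timing(pageview_times: list[datetime]) -> tuple[float, float, bool]:
    """Time on site (s), average time per timed page (s) and retention for one visit.

    Assumes durations are measured between successive pageviews, so the final
    page of the visit is never timed.
    """
    times = sorted(pageview_times)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    time_on_site = float(sum(gaps))  # equals last minus first pageview time
    time_per_page = time_on_site / len(gaps) if gaps else 0.0
    return time_on_site, time_per_page, time_on_site > 0
```

Because only pages followed by another pageview are timed, a "retained" visit (time on site > 0 s) under this rule is one with at least two pageviews.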
Perhaps more critically, clickstream analytics cannot evaluate whether visitors' knowledge of, or attitude towards, a certain subject has changed as a result of their visit (Borun et al. 2010). Here, the anonymity of the web continues to present a considerable challenge; however, intensive educational research methods exist for addressing such questions and should be used in concert with clickstream analytics (Bell et al. 2009). There is great potential in the emerging disciplines of learning analytics and educational data mining (Siemens 2013).
Finally, we note that while clickstream analytics provides a scientific approach to evaluation and assessment following website implementation, scientific approaches to website design prior to implementation are equally important for ensuring the best possible educational outcomes (Wong-Parodi and Strauss 2014).

Conclusions
The scientific community must continue to create science education content that addresses the diverse audiences engaging the web for scientific information. It is regrettable that there is currently little incentive for those engaging in such activities to study the outcomes (Nadkarni and Stasch 2013b). However, pursuing evaluation and assessment, as well as iterative improvement based on that information, should occur in tandem with creating content (Varner 2014). Moreover, the peer-reviewed publication of that information represents an additional and tangible professional incentive for outreach that is not often mentioned. By using tools such as clickstream analytics and making their data available, we can ensure that the ecological community uses its limited time and resources to build the best websites possible.

ACKNOWLEDGMENTS
This research was completed following the guidelines of the Oxford University Research Ethics Committee. We thank P. Lopes, W. Anderegg, T. Marthews and G. Middendorf, as well as two anonymous reviewers, for constructive comments on the manuscript. Financial support for the project was provided by grants from the National Geographic Society Young Explorers Program to G. Goldsmith