Environmental proteomics reveals taxonomic and functional changes in an enriched aquatic ecosystem.

Aquatic ecosystem enrichment can lead to distinct and irreversible changes to undesirable states. Understanding changes in active microbial community function and composition following organic-matter loading in enriched ecosystems can help identify biomarkers of such state changes. In a field experiment, we enriched replicate aquatic ecosystems in the pitchers of the northern pitcher plant, Sarracenia purpurea. Shotgun metaproteomics using a custom metagenomic database identified proteins, molecular pathways, and contributing microbial taxa that differentiated control ecosystems from those that were enriched. The number of microbial taxa contributing to protein expression was comparable between treatments; however, taxonomic evenness was higher in controls. Functionally active bacterial composition differed significantly among treatments and was more divergent in control pitchers than enriched pitchers. Aerobic and facultative anaerobic bacteria contributed most to identified proteins in control and enriched ecosystems, respectively. The molecular pathways and contributing taxa in enriched pitcher ecosystems were similar to those found in larger enriched aquatic ecosystems and are consistent with microbial processes occurring at the base of detrital food webs. Detectable differences between protein profiles of enriched and control ecosystems suggest that a time series of environmental proteomics data may identify protein biomarkers of impending state changes to enriched states.

[...] difficult to forecast shifts with sufficient lead-time may be that changes in monitored variables lag behind the microbial processes that underlie state changes. We hypothesize that biomarkers linked closely to microbial function, such as proteins, may serve as better early warning signals of impending state changes than traditional aquatic ecosystem biomarkers.

One of the challenges to studying aquatic ecosystem state changes is the lack of replicable natural ecosystems that can be ethically manipulated. Recently, we have identified the aquatic ecosystem that assembles in the cup-shaped leaves of the northern pitcher plant Sarracenia purpurea as a model system for identifying whole-ecosystem microbial processes associated with detrital enrichment. Each leaf functions as an independent ecosystem that can be experimentally enriched and monitored through time in the field or lab (Srivastava et al. 2004).
Arthropod prey, mostly ants and flies, form the base of a "brown" food web that includes dipteran larvae, protozoa, mites, rotifers, and a [...] loading, microbial activity increases, pitcher fluid becomes turbid, and oxygen levels collapse to hypoxic conditions even during daytime photosynthesis (Sirota et al. 2013). Such consequences are similar to those seen in larger aquatic ecosystems that have switched from a green to a brown food web dominated by detritivores, as an initial increase in primary production leads to internal organic-matter loading and increasing biological oxygen demand as primary producers decompose (Correll 1998).

In the last decade, environmental proteomics has emerged as a powerful tool to [...] Massachusetts. Newly opened pitchers were identified and randomly assigned to an ambient control or detritus-enriched treatment (Appendix S1 [...]

Six of ten replicate microbial pellets from each treatment yielded enough protein for analysis via tandem mass spectrometry. All replicates were analyzed separately using SDS-PAGE and Coomassie staining (Fig. 1, Appendix 1: Fig. S1a, and Appendix 1: Fig. S1b). All six of the enriched pitchers and five of the six control pitchers had visible protein staining levels and were chosen for mass spectrometry. Proteins were subjected to a tryptic digest (Appendix S1) and to LC-MS/MS as previously described ( [...] was then ranked by unique number of peptides and the top 220 proteins from each treatment were selected so that the false discovery rates for the control and enriched treatments were 6.6% and 0%, respectively. These top 220 proteins and their associated peptides are found in Data Supplement S1.

In the list of control peptides, a protein hit from the decoy database was represented by 25 total peptides; therefore, we suspected that this hit was a true positive not represented in our target database.
However, a BLAST search of the full amino acid sequence did not yield an identical match, so we cannot definitively claim it is a true positive; therefore, we removed this peptide from our top 220 list of control peptides. With this peptide removed, the false discovery rate for the control treatment was 4.3%.

All peptide hits were pooled within treatments and mapped back to their source sequences in the custom protein database. Those [...] there was a single common protein pool for both the control and enriched treatments and that the number of observed shared proteins between treatments reflects chance effects resulting from random draws from this single protein pool (Appendix S1). We conducted an additional simulation in R to determine the likelihood of a Type I error in our randomization test (Appendix S1). [...]

To determine the taxonomic composition of the microbes contributing to identified proteins in our treatments, we conducted a BLAST homology search of the metagenomic sequence data for protein hits. All peptides from the top 220 identified proteins in each treatment were mapped back to their contigs of origin to obtain nucleotide sequences. Because contigs were at least 500 base pairs in length, we felt confident that a BLAST [...] Table S1).

To determine whether bacteria contributing to expressed proteins in control and [...] (4.04%) and Lutiella (3.91%). Within the metagenome, 23% of aligned contigs were mapped to the order Burkholderiales while only 7% mapped to Neisseriales (Appendix 1: Fig. S3). Taxonomic evenness of the metagenome, calculated using Hurlbert's PIE, was equal to 0.79.

Representation of the contigs mapping to functional pathways was dominated by amino acid metabolism (20.6%), followed by membrane transport (12.9%), carbohydrate metabolism (11.9%), translation (7.2%), and metabolism of cofactors and vitamins (6.4%).
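
The taxonomic evenness index mentioned above, Hurlbert's PIE (probability of an interspecific encounter), can be illustrated with a minimal sketch. It assumes the standard form PIE = (N / (N - 1)) * (1 - sum of p_i^2), where p_i is the proportion of observations assigned to taxon i and N is the total count. The function name `hurlbert_pie` and the example counts are hypothetical and are not data or code from the study.

```python
from typing import Iterable


def hurlbert_pie(counts: Iterable[int]) -> float:
    """Hurlbert's PIE for per-taxon counts (assumed standard formula)."""
    # Taxa with zero observations do not affect the index.
    positive = [c for c in counts if c > 0]
    n = sum(positive)
    if n < 2:
        raise ValueError("PIE needs at least two observations")
    # Sum of squared relative abundances (Simpson-type dominance term).
    sum_sq = sum((c / n) ** 2 for c in positive)
    # Finite-sample correction N / (N - 1).
    return (n / (n - 1)) * (1.0 - sum_sq)


# Hypothetical counts of aligned contigs per bacterial order
# (illustrative values only, not taken from the paper).
example_counts = {"order_A": 23, "order_B": 7, "order_C": 40, "order_D": 30}
pie = hurlbert_pie(example_counts.values())
print(f"PIE = {pie:.3f}")
```

Values near 1 indicate that observations are spread evenly across many taxa, and values near 0 indicate dominance by a single taxon.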
Within amino acid metabolism, pathways were represented primarily by glycine, serine, and threonine metabolism (17.1%), alanine, aspartate, and glutamate metabolism (13.8%), and valine, leucine, and isoleucine degradation (12.7%). Membrane transport was represented by ABC transporters (78.2%), bacterial secretion system (19.4%), and phosphotransferase system (PTS) (2.4%). Carbohydrate metabolism was dominated by pyruvate metabolism (13.9%), glycolysis/gluconeogenesis (12.6%), and pentose phosphate pathway (11.6%). Overall, the top five level 3 KEGG categories included ABC transporters (10.1%), two-component system (4.8%), aminoacyl-tRNA biosynthesis (3.8%), glycine, serine, and threonine metabolism (3.5%), and ribosome (3.3%) (Appendix 1: Fig. S4).

We identified a total of 986 proteins in the enriched treatment and 616 proteins in the control treatment. Of the 220 most abundant protein identifications for each treatment, 65 were shared between treatments, leaving 155 unique to each treatment (Fig. 2a). The randomization test revealed significantly fewer protein hits shared between the treatments than expected by chance (Fig. 2b). The top three of the 20 most abundant proteins, as measured by the total number of matched peptides (spectral counts), were the same in the control and enriched treatments. However, the relative abundances of the remaining 17 proteins in this top list differed strongly between treatments, with only seven of the 20 proteins unique to each treatment (Fig. 2c).

The majority of identified proteins were associated with bacteria. The most common microbial class contributing to identified proteins in both treatments was Betaproteobacteria, but the contribution was higher in enriched (84.4%) versus control (50.3%) treatments (Table 1 [...] remarkably similar results (Fig. 3a, Appendix 1: Fig.
S6), suggesting that tryptic peptides could be used to correctly identify microbes contributing to identified proteins, though at coarser taxonomic levels than can be achieved by nucleic acid analysis.

We hypothesized that there would be detectable differences in the function of microbial communities in control and enriched pitchers. We measured function in two ways: first, we mapped identified bacterial classes associated with proteins to their oxygen requirements and second, we mapped peptides to functional KEGG pathways. Oxygen requirements differed significantly between taxa contributing to protein [...]

[Figure: "Protein analysis" panels with columns labelled C and E and rows listing KEGG metabolic pathways and pathway categories (for example amino acid, lipid, nucleotide, and energy metabolism). Color key: proportion of total peptide identifications in treatment/replicate.]

[...] hits. Whether protein hits were drawn with or without replacement, the number of shared proteins was less than expected by chance, supporting the alternative hypothesis that the protein pools from the two treatments are distinct from one another (Fig. 2b).

We conducted an additional simulation experiment (programmed in R) to test for [...]

Next, we followed the procedure that we described in our randomization test. Namely, we reshuffled these proteins between the two groups and calculated the number of shared proteins between them. We used 100 replicates per simulated set of proteins and repeated this procedure for 100 trials (preliminary runs showed that the results were just as precise using only 100 replicates instead of the full 1000 employed in the analysis of the real data). If our algorithm is behaving properly, fewer than 5% of such trial simulations should yield a statistically significant result. We conducted two variants of this test. In the first variant, each of the 10,000 proteins was equally abundant. In the second variant, the protein abundances followed an exponential distribution, in which there are a few relatively abundant proteins and a large number of relatively rare proteins. We simulated this distribution by drawing elements from a beta distribution with parameters shape1 = 0.5, shape2 =