Deep Metaproteomic Analysis of Human PDF

992 DOI 10.1002/pmic.
201100503 Proteomics 2012, 12, 9921001
RESEARCH ARTICLE
Deep metaproteomic analysis of human

salivary supernatant
Pratik Jagtap1 , Thomas McGowan2 , Sricharan Bandhakavi3 , Zheng Jin Tu1 , Sean Seymour4 ,
Timothy J. Griffin2 and Joel D. Rudney5
1
Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
2
University of Minnesota Biochemistry Molecular Biology & Biophysics, Minneapolis, MN, USA
3
Bio-Rad Laboratories, Hercules, CA, USA
4
AB SCIEX, Foster City, CA, USA
5
School of Dentistry, University of Minnesota , Minneapolis, MN, USA
The human salivary proteome is extremely complex, including proteins from salivary glands, Received: September 23, 2011
serum, and oral microbes. Much has been learned about the host component, but little is Revised: December 8, 2011
known about the microbial component. Here we report a metaproteomic analysis of salivary Accepted: January 10, 2012
supernatant pooled from six healthy subjects. For deep interrogation of the salivary proteome,
we combined protein dynamic range compression (DRC), multidimensional peptide fraction-
ation, and high-mass accuracy MS/MS with a novel two-step peptide identification method
using a database of human proteins plus those translated from oral microbe genomes. Pep-
tides were identified from 124 microbial species as well as uncultured phylotypes such as TM7.
Streptococcus, Rothia, Actinomyces, Prevotella, Neisseria, Veilonella, Lactobacillus, Selenomonas,
Pseudomonas, Staphylococcus, and Campylobacter were abundant among the 65 genera from 12
phyla represented. Taxonomic diversity in our study was broadly consistent with metagenomic
studies of saliva. Proteins mapped to 20 KEGG pathways, with carbohydrate metabolism, amino
acid metabolism, energy metabolism, translation, membrane transport, and signal transduc-
tion most represented. The communities sampled appear to be actively engaged in glycolysis
and protein synthesis. This first deep metaproteomic catalog from human salivary supernatant
provides a baseline for future studies of shifts in microbial diversity and protein activities
potentially associated with oral disease.
Keywords:
Bacteria / Metaproteomics / Microbiology / Microbiome / Phylogenetic analysis /
Whole saliva
1 Introduction
The proteome of human whole saliva is extremely complex,

encompassing proteins from several types of salivary glands
Correspondence: Dr. Joel Rudney, School of Dentistry, University with distinct secretory profiles, surface, and secreted proteins
of Minnesota, 515 Delaware Street SE, 17-252 Moos Tower, Min-
from oral epithelial cells, as well as serum and neutrophil
neapolis, MN 55455, USA
E-mail: jrudney@umn.edu
proteins that enter the mouth through the gingival crevice
Fax: +1-612-626-2651 [1]. Moreover, the mouth also is home to an extremely diverse
microbial community. Recent metagenomic studies indicate
Abbreviations: BLAST, basic local alignment search tool; DRC, that the microbiota of whole saliva samples may include over
dynamic range compression; FDR, false discovery rate; GAPDH,
glyceraldehyde-3-phosphate dehydrogenase; HOMD, Human
Additional corresponding author: Dr. Tim Griffin,
Oral Microbiome Database; KEGG, Kyoto Encyclopedia of
Genes and Genomes; LTQ, linear trap quadrupole; MEGAN, E-mail: tgriffin@umn.edu
MEtaGenome Analyzer Colour Online: See the article online to view Figs. 2 and 3 in colour.

C 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com
Proteomics 2012, 12, 9921001 993
10 000 distinct taxa, and the vast majority of them have never libraries, and three-dimensional peptide fractionation prior
been cultured [27]. This suggests that oral microbes may to MS/MS analysis on an LTQ-Orbitrap instrument [8]. We
contribute thousands of additional proteins to the salivary also applied a novel two-step peptide identification method
proteome. for more effective metaproteomic analysis. These collective
Much has been learned about the human components of improvements enabled the first deep cataloging of microbial
the salivary proteome [8,9], but so far less is known regarding proteins in human salivary supernatant.
salivary proteins of microbial origin. In recent years, it
has become clear that the members of the oral microbiota 2 Materials and methods
interact extensively and function together as a polymicrobial
community [10]. Thus approaches are needed that would 2.1 Salivary supernatant dataset
allow one to analyze the salivary microbial proteome in
the context of its host. The community-based approach to Salivary supernatant was collected and pooled from six
microbial proteomics has become known as metaproteomics. healthy subjects who refrained from eating or drinking for
Conceptually, metaproteomics is the proteomic counterpart 90 min. Proteins were analyzed using ProteoMinerTM (Bio-
to metagenomics. Metaproteomic analyses have been carried Rad Laboratories, Hercules, CA, USA) for DRC, multidi-
out on bacterial communities from a wide variety of envi- mensional peptide fractionation, and an LTQ-Orbitrap mass
ronments, including human feces [11, 12], marine microbial spectrometer (Thermo Fisher Scientific, Waltham, MA) as
ecosystems [13], acid mine drainage [1416], activated sludge described in Bandhakavi et al. 2009 [8]. Additional 45 RAW
wastewater treatment systems [16, 17], and rhizosphere soil files generated from ProteoMiner TM Library-2-treated saliva
[18, 19]. were also analyzed.
However, several challenges hold back routine metapro-
teomic analyses in saliva, as well as other samples. First is
the challenge of the wide dynamic range of protein abun- 2.2 Two-step method for peptide sequence
dance. In saliva, human proteins are most likely orders of matching and protein identification
magnitude more abundant than microbial proteins, thereby
suppressing detection and identification of the microbial pro- RAW files generated (200 total) from the LTQ-Orbitrap sali-
teins. Therefore, advanced analytical techniques are needed to vary supernatant dataset were processed using the MaxQuant
dig deep into the salivary proteome and confidently identify (v1.0.13.13) Quant module to generate .MSM files
even the low abundance proteins of microbial origin. The sec- [8, 2224]. Individual Peak and iso MSM files corresponding
ond challenge is peptide sequence matching using sequence to each .RAW file were converted to Mascot generic format
database searching. Potential expressed protein sequences (MGF) and searched using ProteinPilot v 4.0 (ProteinPilot
can be translated from microbial genome sequences. How- Software 4.0; Revision: 148085; Paragon Algorithm: 4.0.0.0.
ever, the collective genomes of sequenced microbes exceed 148083; AB SCIEX, Framingham, MA). Paragon searches [34]
the size of the human genome by several orders of magnitude. were conducted using LTQ-Orbitrap subppm instrument set-
Thus, database searches are extremely computing intensive. tings. Other parameters used for the search were as follows:
Most significantly, peptide sequence matching against such Sample Type: Identification; Cys alkylation: None; Digestion:
very large databases suffers from the increased potential for Trypsin; ID Focus: Biological Modifications; Search effort:
false-positive matches, which lowers the number of high con- Thorough.
fidence true matches [20]. In the first step, all 200 RAW files were searched against
Our group previously published a metaproteomic analysis a database consisting of all the translated human oral mi-
of 357 microbial peptides obtained from the pellet fraction of crobial genomic sequences from the Human Oral Micro-
saliva, pooled from four oral cancer patients. Those peptides biome Database (HOMD) [25], along with the human IPI
were assignable to five bacterial phyla representing 26 genera, v3.52 database and contaminant proteins (1 687 426 total pro-
of which Streptococcus, Neisseria, and Haemophilus were tein sequences) [26]. The ProteinPilot searches generated a
the most abundant. The primary functions represented by the .group file that was used to generate a Protein Report from
parent proteins were translation, carbohydrate metabolism, the peptide sequence matches. All nonhuman protein se-
amino acid metabolism, and energy metabolism [21]. Al- quences that were identified at the threshold of at least 66%
though those findings were generally representative of the Conf score (0.47 ProtScore) in the first step were merged with
species composition and activities of known oral microbes, the Human IPI v3.52 database along with contaminant pro-
there also was concern that a substantial number of peptides teins to generate a refined FASTA database for the second
were assigned to exotic taxa that were very unlikely to be ac- step.
tually present in the mouth. This was particularly surprising In the second step, all 200 RAW files were searched
given that the overall number of microbial peptides obtained against a Target Decoy version of the FASTA database men-
was relatively small. tioned above, by appending the reversed protein sequences
Here, we improve on our previous study using dynamic to the forward sequences, resulting in a database containing
range compression (DRC) via ProteoMinerTM hexapeptide 152 724 total protein sequences. Parameters for ProteinPilot

994 P. Jagtap et al. Proteomics 2012, 12, 9921001
3 Results
3.1 Two-step database searching for microbial

peptide and protein identification
At the outset of our studies, we first addressed the challenges

of matching a large number of MS/MS spectra (988 974 total
spectra for our current study) to peptide sequences within the
very large database containing both human and translated
oral microbial proteins (over 1.6 million total proteins). A
large database increases the possibility for false-positive pep-
tide sequence matches [20], thereby necessitating more strin-
gent filtering thresholds and effectively lowering the number
Figure 1. Two-step method for human salivary metaproteome of high confidence matches.
analysis. See text for details. To address this challenge, we explored a novel two-step
database searching method, detailed in Fig. 1 and in Section
2. In the first step, we match MS/MS to peptides in the for-
ward database using low-stringency scoring. A refined, much
were the same as described for the first step with the ex-
smaller database is created containing all identified proteins
ception of searching to generate a false discovery rate (FDR)
from the first step, and a target-decoy version of this database
report. These ProteinPilot searches generated a .group file
used to re-search all MS/MS data, followed by stringent filter-
and a Proteomics System Performance Evaluation Pipeline
ing of peptide matches. We first evaluated the effectiveness
(PSPEP) FDR report [27]. The .group file was used to gener-
of this two-step method using a representative set of 20 RAW
ate a Protein Summary and a Peptide Summary Report. The
files from our salivary dataset. We compared the numbers of
FDR report was used to estimate the number of proteins,
human and microbial proteins identified using the two-step
distinct peptides, and spectra at 5% local FDR. Spectra iden-
method to the traditional one-step direct searching of the
tified at 5% local FDR were used to filter for non-human (mi-
very large target-decoy version of the combined human and
crobial) peptide identifications. A list of non-human distinct
microbial protein sequence database. As shown in Support-
peptide sequences that were identified was generated from
ing Information Fig. S1, the two-step method increased by
the Peptide Summary Report. Supporting Information Table
63% the number of microbial proteins identified while hav-
S1 contains information on all non-human peptides identi-
ing no discernable effect on identification of human proteins.
fied in this study. Supporting Information Table S3 provides
Given this positive result, we analyzed the entire dataset
a list of human proteins identified in an independent search
of 200 RAW files using the two-step method. Table 1 sum-
from the 3D-fractionated fractions.
marizes the results. The database search in the first step re-
sulted in identification of 2176 non-human proteins using a
relatively low threshold for matching. For the second step, a
refined target-decoy database containing all human proteins
2.3 Metaproteomic analysis and the 2176 microbial protein sequences was constructed.
This second search resulted in matching of 1927 distinct pep-
MEGAN4 (MetaGenomeANalyzer 4) software was used for tide sequences.
metaproteomic analysis [28, 29]. MEGAN4 specifically re-
quires BLAST searches of nucleotide or protein sequences
as input. The software then parses the BLAST results, and 3.2 Phylogenetic analysis
generates phylogenetic assignments for input sequences us-
ing a homology-based algorithm. MEGAN4 also uses Ref- A BLAST output file containing results for 1927 peptide se-
seq IDs in BLAST hits (when available), to assign peptides quences was input to MEGAN4. We initially used MEGAN4s
to KEGG pathways. The distinct peptide sequences retained default minimum BLAST bit score criterion of 35.0. How-
after FDR analysis by ProteinPilot were used for BLASTP ever, we obtained almost the same phylogenetic tree when
searches (v BLASTP 2.2.25+) against a database containing all the minimum bit score was reduced to zero, with the main
non-redundant GenBank sequences (13 812 207 sequences). difference being that more peptides were assigned to differ-
We used the online version of the BLASTP server, with default ent taxa. Accordingly, we present the results obtained when
parameters adjusted for short sequences (except in the case all 1927 peptides were included.
of three peptides that were more than 30 amino acids long). The complete phylogenetic tree is available as Support-
The default maximum of 100 hits per peptide was specified. ing Information Fig. S2. Some peptides were so highly con-
The compiled BLAST results were then input to MEGAN4, served that they could not be assigned at the kingdom level
for phylogenetic and KEGG analysis. (539; 28%). The remaining peptides were all assigned to

Proteomics 2012, 12, 9921001 995
Table 1. Summary of two-step database searching method for salivary metaproteomics
Number of Total number of First stepa) Second stepb)

RAW files MS/MS spectra
Total sequences in Non-human Total sequences in refined Non-human, distinct
initial database proteins identified target-decoy database peptides identified
200 988 974 1 687 426 2176 152 724 1927
a) Database used contains human proteins + translated sequences from HOMD. Proteins identified at a low stringency 66% confidence
value in ProteinPilot.
b) Database contains human proteins + microbial sequences identified from first step. Peptides were identified using a stringent 5% local
FDR value in ProteinPilot.
the kingdom Bacteria with the exception of 10 being as- 37 peptides were assigned at the species level, but 89 were
signed to Eukaryota, and 2 to viruses. Sixteen percent of assignable at the genus level. Thus, the genus level may pro-
the Bacteria peptides could not be assigned below the king- vide a better representation of prevalence patterns within this
dom level. The remaining 1156 peptides were distributed dataset.
among twelve phyla (Fig. 2A). The four most abundant
phyla collectively accounted for 81% of bacterial peptides.
Firmicutes were the most prevalent, followed by Proteobac-
teria, Actinobacteria, and Bacteroidetes. Sixty-six percent of 3.3 Ontology analysis
bacterial peptides were assignable at the genus level, rep-
resenting 65 genera (Fig. 2B). The most abundant gen- The KEGG analysis assigned 1774 peptides to 20 path-
era were Streptococcus, Rothia, Actinomyces, Prevotella, and ways (Fig. 3). It is important to note that a number of
Neisseria. key enzymes were represented in multiple KEGG pathways.
Thirty-five percent of bacterial peptides were assignable Carbohydrate Metabolism was the most prevalent pathway.
at the species level, representing 124 species (Support- Proteins involved in glycolysis were among the most com-
ing Information Fig. S2 and Supporting Information mon on the basis of the number of peptides assigned to
Table S2). The most common species were Rothia mucilagi- them (Table 2). Glyceraldehyde-3-phosphate dehydrogenase
nosa, Prevotella melanogenica, Selenomonas sputigena, Pseu- (GAPDH) had the highest number of sequence assignments
domonas aeruginosa, Cornybacterium matrochotii, Streptococ- in the entire dataset, and enolase, aldolase, phosphoglyc-
cus salivarius, Actinomyces odontolyticus, and Abiotrophia defec- erate mutase, triose-phosphate isomerase, pyruvate kinase,
tiva. The remaining 115 species represented a broad range phosphoenolpyruvate carboxykinase, and phosphoglycerate
of oral bacterial diversity. They included species that are kinase also were relatively abundant. Pyruvate Metabolism
presently non-cultivable, most notably two members of can- was represented by pyruvate formate lyase, and the Citrate
didate division TM7. All of those species were listed in the Cycle by fumarate hydratase and succinyl-CoA synthetase.
current version of the HOMD database, with the exception Levansucrase provided evidence for Sucrose Metabolism.
of six. Downstream from the Carbohydrate Metabolism path-
The six bacterial species not in HOMD included represen- way, Energy Metabolism was represented by ATP
tatives of a variety of environments, including the human synthase.
rectum, seawater, soil, plant pathogens, and legume root Within the DNA Replication and Repair pathway, DNA
nodules (Supporting Information Table S2). There was only polymerase was the second most abundant protein in the
a single peptide assigned to each. Five of them showed dataset. Molecular chaperone DnaK was the major compo-
mismatches with the target sequence for their assigned nent of Folding Sorting, and Degradation, while Transcrip-
species in BLAST. Mismatches also applied to the five pep- tion was primarily represented by RNA polymerase. Trans-
tides in the complete phylogenetic tree that were assigned lation was represented by translation elongation factor G, as
to non-human eukaryotic species, and one of two peptides well as ribosomal protein L7/L12 and ribosomal protein S1.
assigned to viruses (Supporting Information Fig. S2). Other translation elongation factors and ribosomal proteins
It is important to note that peptide counts for partic- were present, but with fewer matching peptides.
ular species were not always representative of the preva- The Cell Motility pathway was dominated by flagellin and
lence of their parent genus. A good example is Neisse- methyl-accepting chemotaxis protein. The Membrane Trans-
ria, where only five peptides were assignable at the species port group consisted mostly of diverse phosphotransferases
level, but 66 were assignable at the genus level (Support- associated with carbohydrate transport, although ABC trans-
ing Information Table S2). Likewise, only four peptides were porters also were detected. The Signal Transduction pathway
assignable to Veilonella at the species level, but 38 were was represented by a variety of bacterial two-component sys-
assignable at the genus level. In the case of Streptococcus, tems. (not shown).

Figure 2. Phyla and genus level assignments for human salivary proteome. (A) Pie chart showing peptides assigned to bacterial phyla. The
numbers in parentheses indicate the number of peptides assigned to each phylum. (B) Pie chart showing peptides assigned to bacterial
genera. The numbers in parentheses indicate the number of peptides assigned to each genus. For reasons of legibility, genera with fewer
than four peptide assignments have been excluded from the chart. The excluded genera are shown in Supporting Information Table S2.

Proteomics 2012, 12, 9921001 997
Figure 3. KEGG analysis of human salivary metaproteome. Pie chart showing bacterial peptides assigned to KEGG pathways. The numbers
in parentheses indicate the number of peptides assigned to each pathway. Because many proteins are assigned to multiple KEGG pathways,
the total number of peptide assignments is greater than the total number of peptides identified in our study.
4 Discussion racy can increase confident peptide identifications [31], espe-

cially when processing the data via the Quant module from
Our previous metaproteomic study identified 357 bacterial MaxQuant [23,32].
peptides in a pooled sample of salivary pellets from four oral Our novel two-step database searching method also signif-
squamous cell carcinoma patients [21]. The pellet fraction of icantly increased bacterial protein identifications. We believe
saliva is a mixture consisting mostly of exfoliated epithelial the increased bacterial protein identifications was due to the
cells, bacteria, and high molecular weight salivary proteins, use of the refined database in the second step, which was an
while the salivary supernatant fraction is largely bacteria free. order of magnitude smaller than the target-decoy database
Nevertheless, in this study, we achieved an almost six-fold in- used for the one-step method. A smaller database has less po-
crease in the number of bacterial peptides detected in a pooled tential for false-positive matches [20], thereby requiring lower
sample of salivary supernatants from six healthy subjects. stringency filtering and providing more protein identifica-
Several factors likely increased the depth of informa- tions. Also beneficial to our study was the use of ProteinPilot
tion obtained in the current study. Protease inhibitors were database searching software designed for use with large se-
used here to better preserve proteins prior to DRC [8]. The quence databases and capable of robust FDR estimation [33],
compression of high-abundance host salivary proteins via making it well suited for our two-step method.
ProteomeMinerTM treatment likely enhanced the detection of One problem in metaproteomics is the choice of an ap-
bacterial proteins, which are at lower abundance compared to propriate database for peptide sequence matching. For our
the dominant human proteins from the host. Interestingly, study in saliva, a database that encompasses a wide range of
there was relatively little overlap between the bacterial pep- microbial environments runs the risk of producing a large
tides identified from the ProteomeMinerTM -treated portion of number of false positive identifications of proteins from
the pool and those identified from the untreated portion (data microbes that may share common sequences with oral taxa,
not shown). Thus a parallel analysis of a non-treated sample even though they are quite unlikely to be actually present in
may be warranted when using this approach for metapro- the oral environment. Our previous metaproteomics study
teomic studies. We also used sensitive LTQ-Orbitrap mass of the salivary cell pellet suffered from this problem, as we
spectrometry, with high peptide precursor mass accuracy, identified a number of microbes likely not found in the oral
recommended for metaproteomics studies [30]. Mass accu- environment [21].

Table 2. KEGG analysis of abundant microbial proteins from the salivary metaproteome
KEGG pathway Protein name (EC # or gene name) Number of peptides a)
Carbohydrate metabolism
Glyceraldehyde-3-phosphate dehydrogenase (1.2.1.12) 58
Enolase (4.2.1.11) 27
Fructose-bisphosphate aldolase (4.1.2.13) 18
Fumarate hydratase (4.2.1.2) 14
Pyruvate formate-lyase (2.3.1.54) 14
Phosphoglycerate mutase (5.4.2.1) 13
Triose-phosphate isomerase (5.3.1.1) 11
Pyruvate kinase (2.7.1.40) 11
Succinyl-CoA synthetase (6.2.1.5) 9
Phosphoenolpyruvate carboxykinase (4.1.1.32) 8
Levansucrase (2.4.1.10) 8
Phosphoglycerate kinase (2.7.2.3) 8
Cell motility
Flagellin (FliC) 25
Methyl-accepting chemotaxis protein (MCP) 16
DNA replication and repair
DNA polymerase (2.7.7.7) 36
Energy metabolism
ATP synthase (3.6.3.14) 12
Folding, sorting, and degradation
Molecular chaperone DnaK (DnaK) 12
Transcription
RNA polymerase (2.7.7.6) 13
Translation
Translation elongation factor G (EF-2) 23
Ribosomal protein L7/L12 (L7/L12) 14
Ribosomal protein S1 (S1) 13
Ribosomal protein S2 (S2) 11
a) Number of distinct peptides assigned to a given protein. Proteins listed in this table were represented by at least eight peptides.
One solution to this problem is to perform a metapro- sity in saliva, that diversity is very unevenly distributed with
teomic analysis on a sample for which corresponding metage- respect to prevalence [27]. Because of the large amounts of
nomic data already exist. That allows the peptides to be protein required for DRC and subsequent peptide fraction-
matched against genomic data that is specific to the same in- ation, it was necessary to pool saliva samples for this study.
dividuals. This was done in a recent metaproteomic analysis That pool included six individuals of varied ethnicity. Our
of the human gut microbiota, with considerable success [12]. data can be considered analogous to population summary
We did not have a metagenomic dataset to work from, so we data from a metagenomic study, although the data represent
chose the translated protein sequences from the HOMD ge- a relatively small number of people.
nomic dataset instead. HOMD is a curated database of species From that perspective, our data shows patterns that are
and uncultured phylotypes that have been identified from oral consistent with existing metagenomic data. At the phylum
samples on the basis of 16S rRNA sequencing. It also incorpo- level, our study is consistent with others showing Firmicutes
rates genomic data for approximately 150 members of the oral to be the most prevalent phylum, with Proteobacteria, Acti-
microbiota that have been completely or partially sequenced nobacteria, and Bacteroidetes also being abundant [27]. Our
[25, 34]. That allowed a search strategy focused on species of data showed a relatively higher prevalence of Actinobacteria
verified oral origin. Our strategy appears to have been suc- than other studies [27], but this is more likely to be due to
cessful, since only six peptides were assigned by MEGAN4 to individual variation than to any bias towards Actinobacteria in
species not present in HOMD, and five of those assignments our proteomic approach. The same broad patterns of similar-
were based on imperfect matches in BLAST. ity existed at the genus level, although we observed relatively
Recent 16S rRNA-based metagenomic studies of microbial lower prevalence of Haemophilus than has been seen by oth-
diversity in human saliva have shown that there is consider- ers [27]. Again, we believe that is most likely due to variation
able variation between individuals, and possibly between dif- between individuals.
ferent geographical populations as well. However, there also It has been suggested that the salivary microbiota corre-
is consistent evidence for a core oral microbiome consisting sponds most closely to that of the tongue [5, 35]. Our find-
of a more limited number of abundant taxa at the phylum, ings are consistent with that hypothesis, since we observed a
genus, and species level. Thus, although there is great diver- high relative abundance for species common on the tongue,

Proteomics 2012, 12, 9921001 999
notably R. mucilaginosa, S. salivarius, and P. melaninogenica include proteins that are relatively abundant in our dataset,
[5, 35, 36]. However, metaproteomic species classifications such as enolase, fructose-bisphosphate aldolase, phospho-
have to be interpreted cautiously, since important bacterial glycerate mutase, triose-phosphate isomerase, pyruvate ki-
proteins may include regions, such as enzyme active sites, nase, and phosphoglycerate kinase. Adhesin functions have
that are highly conserved. Thus, some genera that were poorly been demonstrated for many of those proteins [4146]. More-
represented at the species level were in fact more abundant over, streptococcal surfaces also incorporate proteins from
when viewed at the genus level. other pathways that are relatively abundant in our dataset,
The same caveat applies to metaproteomic data at any tax- including ATP synthase, molecular chaperone DnaK, RNA
onomic level. There is presently no method for selectively polymerase, Translation elongation factor G, Ribosomal Pro-
removing conserved peptides. Thus, it is likely that there will tein L7/L12, and Ribosomal Protein S1. Those proteins also
be a consistent bias towards such sequences. Nevertheless, are capable of functioning as adhesins [4146]. Taken to-
the high degree of consistency between our data and previous gether, those reports suggest that many of the most abun-
16s rRNA metagenomic studies suggest that our approach for dant proteins in our dataset may function as extracellular ad-
deep metaproteomic analysis can be used to provide compara- hesins, in addition to their established roles in intracellular
ble information about microbial diversity in saliva, although pathways.
it may not provide the same depth of coverage of taxa that are To conclude, our metaproteomic analysis has provided the
extremely rare. first in-depth catalog of bacterial proteins in human saliva
The diversity of proteins detected by our approach also fa- supernatant. We believe this could serve as a basis for future
cilitated ontological analysis of the activities carried out by studies relevant to oral disease. Common oral diseases such
the oral microbial community at large. The KEGG analysis as dental caries and periodontal disease are associated with
suggested that the saliva supernatant community appeared major shifts in microbial ecology [47, 48]. Moreover, changes
to be actively engaged in growth and metabolism. Evidence in the salivary microbiota also have been documented for
for active DNA replication, transcription, and translation was rare but very serious conditions such as oral cancer [49]. Most
provided by the relatively large number of peptides derived studies have emphasized taxonomic changes in the compo-
from DNA polymerase, RNA polymerase, ribosomal proteins, sition of the oral microbiota in disease, but it is reasonable
and translation elongation factors, while numerous peptides to suggest that such changes are likely to be accompanied
from ATP synthase suggested that activated energy carriers by changes in the expression of microbial proteins. Our data
were being produced. Glycolysis is the most likely mecha- now can be used as a basis for comparison in future metapro-
nism for ATP production, since peptides for key enzymes in teomic studies of oral diseases. Our findings suggest that
that pathway were among the most abundant. Moreover, the the salivary metaproteome is likely to be closely correlated
Membrane Transport pathway included peptides for various with that of the tongue. Since the tongue is a frequent site of
components of sugar phosphotransferase systems, consistent occurrence for oral cancer [50], comparisons of the salivary
with active glycolysis. The overall pattern is consistent with metaproteome in health, dysplasia, and disease may be par-
metabolism of dietary carbohydrates and salivary glycopro- ticularly useful for testing the hypothesis that oral microbes
teins by oral bacteria. are directly or indirectly involved in the pathogenesis of oral
KEGG pathway maps may incorporate data from both cancer. We are actively engaged in comparative studies to
prokaryotes and eukaryotes [37]. As a consequence, some address that question.
microbial enzymes in our dataset were also cross-referenced
to eukaryote-specific KEGG pathways. Obvious misclassifi- This research was funded in part by NIH grant
cations such as that were easy to eliminate. A subtler issue 1R01DE017734. We also thank the Center for Mass Spectrometry
was presented by microbial proteins that have functions ad- and Proteomics for instrumental support and maintenance and
ditional to their defined roles in KEGG pathways. GAPDH the Minnesota Supercomputing Institute for computational pro-
provides an excellent case in point. GAPDH is known to oc- teomics support. The data associated with this manuscript may
cur on streptococcal surfaces [38], and oral streptococci appear be downloaded from ProteomeCommons.org Tranche using the
to release GAPDH when grown in batch culture under cer- passphrase saliva along with the following hash:
tain conditions [39]. The extracellular form of streptococcal T6YUFF/s0PhO2HFJ0KEH/AXoAK/vlpPaYcgBFzpNOoj
GAPDH is enzymatically active [38,39], but GAPDH has also Q0SZVoX9k2Ts2YOsuWi5IkdySa8bF0av4ZvN2cUvh8NO+
been shown to act as a bacterial adhesin for host cell proteins, SXYAAAAAAAA1tw = = .
such as fibronectin [38], and other oral bacterial species, such
as Porphyromonas gingivalis [40]. There is only a single copy The authors have declared no conflict of interest.
of the GAPDH gene in streptococci [38,39], so the same gene
product appears to be responsible for both intracellular and
extracellular activities. 5 References
More recent proteomic studies of oral streptococci grown
in monoculture have established that other members of the [1] Helmerhorst, E. J., Oppenheim, F. G., Saliva: a dynamic pro-
glycolytic pathway are surface proteins on streptococci. Those teome. J. Dent. Res. 2007, 86, 680693.

[2] Keijser, B. J., Zaura, E., Huse, S. M., van der Vossen, J. M. et [18] Wang, H. B., Zhang, Z. X., Li, H., He, H. B. et al., Characteriza-
al., Pyrosequencing analysis of the oral microflora of healthy tion of metaproteomics in crop rhizospheric soil. J. Proteome
adults. J. Dent. Res. 2008, 87, 10161020. Res. 2011, 10, 932940.
[3] Nasidze, I., Li, J., Quinque, D., Tang, K. et al., Global diversity [19] Wu, L., Wang, H., Zhang, Z., Lin, R., Lin, W., Compara-
in the human salivary microbiome. Genome Res. 2009, 19, tive metaproteomic analysis on consecutively Rehmannia
636643. glutinosa-monocultured rhizosphere soil. PLoS One 2011, 6,
[4] Nasidze, I., Quinque, D., Li, J., Li, M., et al., Comparative anal- e20611.
ysis of human saliva microbiome diversity by barcoded py- [20] Cargile, B. J., Bundy, J. L., Stephenson, J. L., Jr., Potential for
rosequencing and cloning approaches. Anal. Biochem. 2009, false positive identifications from large databases through
391, 6468. tandem mass spectrometry. J. Proteome Res. 2004, 3, 1082
[5] Zaura, E., Keijser, B., Huse, S., Crielaard, W., Defining the 1085.
healthy core microbiome of oral microbial communities. [21] Rudney, J. D., Xie, H., Rhodus, N. L., Ondrey, F. G., Grif-
BMC Microbiol. 2009, 9, 250, doi:10.1186/1471-2180-9-259. fin, T. J., A metaproteomic analysis of the human salivary
[6] Bik, E. M., Long, C. D., Armitage, G. C., Loomer, P. et al., microbiota by three-dimensional peptide fractionation and
Bacterial diversity in the oral cavity of 10 healthy individuals. tandem mass spectrometry. Mol. Oral Microbiol. 2010, 25,
ISME J. 2010, 4, 962974. 3849.
[7] Ling, Z., Kong, J., Jia, P., Wei, C. et al., Analysis of oral [22] Cox, J., Mann, M., Computational principles of determining
microbiota in children with dental caries by PCR-DGGE and improving mass precision and accuracy for proteome
and barcoded pyrosequencing. Microbial Ecol. 2010, 60, measurements in an Orbitrap. J. Am. Soc. Mass Spectrom.
677690. 2009, 20, 14771485.
[8] Bandhakavi, S., Stone, M. D., Onsongo, G., Van Riper, [23] Cox, J., Mann, M., MaxQuant enables high peptide identi-
S. K., Griffin, T. J., A dynamic range compression and fication rates, individualized p.p.b.-range mass accuracies
three-dimensional peptide fractionation analysis platform and proteome-wide protein quantification. Nat. Biotechnol.
expands proteome coverage and the diagnostic potential 2008, 26, 13671372.
of whole saliva. J. Proteome Res. 2009, 8, 55905600. [24] Olsen, J. V., de Godoy, L. M., Li, G., Macek, B. et al., Parts per
[9] Denny, P., Hagen, F. K., Hardt, M., Liao, L. et al., The pro- million mass accuracy on an Orbitrap mass spectrometer via
teomes of human parotid and submandibular/sublingual lock mass injection into a C-trap. Mol. Cell. Proteomics 2005,
gland salivas collected as the ductal secretions. J. Proteome 4, 20102021.
Res. 2008, 7, 19942006. [25] Chen, T., Yu, W.-H., Izard, J., Baranova, O. V. et al., The Hu-
[10] Marsh, P. D., Moter, A., Devine, D. A., Dental plaque biofilms: man Oral Microbiome Database: a web accessible resource
communities, conflict and control. Periodontol. 2000 2011, for investigating oral microbe taxonomic and genomic infor-
55, 1635. mation. Database 2010, doi:10.1093/database/baq013.
[11] Klaassens, E. S., de Vos, W. M., Vaughan, E. E., Metapro- [26] Kersey, P. J., Duarte, J., Williams, A., Karavidopoulou, Y. et
teomics approach to study the functionality of the microbiota al., The International Protein Index: an integrated database
in the human infant gastrointestinal tract. Appl. Environ. Mi- for proteomics experiments. Proteomics 2004, 4, 19851988.
crobiol. 2007, 73, 13881392. [27] Tang, W. H., Shilov, I. V., Seymour, S. L., Nonlinear fitting
[12] Verberkmoes, N. C., Russell, A. L., Shah, M., Godzik, method for determining local false discovery rates from de-
A. et al. Shotgun metaproteomics of the human distal gut coy database searches. J. Proteome Res. 2008, 7, 36613667.
microbiota. ISME J. 2009, 3, 179189. [28] Huson, D. H., Auch, A. F., Qi, J., Schuster, S. C., MEGAN
[13] Sowell, S. M., Wilhelm, L. J., Norbeck, A. D., Lipton, M. S. analysis of metagenomic data. Genome Res. 2007, 17, 377
et al., Transport functions dominate the SAR11 metapro- 386.
teome at low-nutrient extremes in the Sargasso Sea. ISME [29] Huson, D. H., Mitra, S., Ruscheweyh, H.-J., Weber, N., Schus-
J. 2009, 3, 93105. ter, S. C., Integrative analysis of environmental sequences
[14] Mueller, R. S., Dill, B. D., Pan, C., Belnap, C. P. et al., Proteome using MEGAN4. Genome Res. 2011, 21, 15521560.
changes in the initial bacterial colonist during ecological suc- [30] Renuse, S., Chaerkady, R., Pandey, A., Proteogenomics. Pro-
cession in an acid mine drainage biofilm community. Envi- teomics 2011, 11, 620630.
ron. Microbiol. 2011, 8, 22792292. [31] Boyne, M. T., Garcia, B. A., Li, M., Zamdborg, L. et al., Tandem
[15] Jiao, Y., DHaeseleer, P., Dill, B. D., Shah, M. et al., Identi- mass spectrometry with ultrahigh mass accuracy clarifies
fication of biofilm matrix-associated proteins from an acid peptide identification by database retrieval. J. Proteome Res.
mine drainage microbial community. Appl. Environ. Micro- 2009, 8, 374379.
biol. 2011, 77, 52305237. [32] Renard, B. Y., Kirchner, M., Monigatti, F., Ivanov, A. R. et al.,
[16] Graham, C., McMullan, G., Graham, R. L., Proteomics in the When less can yield more - computational preprocessing of
microbial sciences. Bioeng. Bugs 2011, 2, 1730. MS/MS spectra for peptide identification. Proteomics 2009,
[17] Abram, F., Enright, A. M., OReilly, J., Botting, C. H. et al., 9, 49784984.
A metaproteomic approach gives functional insights into [33] Shilov, I. V., Seymour, S. L., Patel, A. A., Loboda, A.et al., The
anaerobic digestion. J. Appl. Microbiol. 2011, 110, 1550 Paragon Algorithm, a next generation search engine that
1560. uses sequence temperature values and feature probabilities

Proteomics 2012, 12, 9921001 1001
to identify peptides from tandem mass spectra. Mol. Cell. wall-associated proteins of group A Streptococcus. Infect.
Proteomics 2007, 6, 16381655. Immun. 2005, 73, 31373146.
[34] Dewhirst, F. E., Chen, T., Izard, J., Paster, B. J. et al., The [43] Blau, K., Portnoi, M., Shagan, M., Kaganovich, A.et al.,
human oral microbiome. J. Bacteriol. 2010, 192, 5002 Flamingo cadherin: a putative host receptor for Streptococ-
5017. cus pneumoniae. J. Infect. Dis. 2007, 195, 18281837.
[35] Mager, D. L., Ximenez-Fyvie, L. A., Haffajee, A. D., Socransky, [44] Severin, A., Nickbarg, E., Wooters, J., Quazi, S. A. et al.,
S. S., Distribution of selected bacterial species on intraoral Proteomic analysis and identification of Streptococcus pyo-
surfaces. J. Clin. Periodontol. 2003, 30, 644654. genes surface-associated proteins. J. Bacteriol. 2007, 189,
[36] Kazor, C. E., Mitchell, P. M., Lee, A. M., Stokes, L. N., et al., 15141522.
Diversity of bacterial populations on the tongue dorsa of pa- [45] Kesimer, M., Kilic, N., Mehrotra, R., Thornton, D. J., Sheehan,
tients with halitosis and healthy patients. J. Clin. Microbiol. J. K., Identification of salivary mucin MUC7 binding proteins
2003, 41, 558563. from Streptococcus gordonii. BMC Microbiol. 2009, 9, 163,
[37] Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K. F., et al., doi:10.1186/1471-2180-9-163.
From genomics to chemical genomics: new developments [46] Wilkins, J. C., Homer, K. A., Beighton, D., Altered protein ex-
in KEGG. Nucleic Acids Res., 34, D354D357. pression of Streptococcus oralis cultured at low pH revealed
[38] Pancholi, V., Fischetti, V. A., A major surface protein on by two-dimensional gel electrophoresis. Appl. Environ. Mi-
group A streptococci is a glyceraldehyde-3-phosphate- crobiol. 2001, 67, 33963405.
dehydrogenase with multiple binding activity. J. Exp. Med. [47] Aas, J. A., Griffen, A. L., Dardis, S. R., Lee, A. M. et al., Bacteria
1992, 176, 415426. of dental caries in primary and permanent teeth in children
[39] Nelson, D., Goldstein, J. M., Boatright, K., Harty, D. W. S. and young adults. J. Clin. Microbiol. 2008, 46, 14071417.
et al., pH-regulated secretion of a glyceraldehyde-3- [48] Colombo, A. P. V., Boches, S. K., Cotton, S. L., Goodson, J.
phosphate dehydrogenase from Streptococcus gordonii M.et al., Comparisons of subgingival microbial profiles of re-
FSS2: purification, characterization, and cloning of the gene fractory periodontitis, severe periodontitis, and periodontal
encoding this enzyme. J. Dent. Res. 2001, 80, 371377. health using the human oral microbe identification microar-
[40] Maeda, K., Nagata, H., Yamamoto, Y., Tanaka, M. et al., ray. J. Periodontol. 2009, 80, 14211432.
Glyceraldehyde-3-phosphate dehydrogenase of Streptococ- [49] Mager, D. L., Haffajee, A. D., Devlin, P. M., Norris, C. M.
cus oralis functions as a coadhesin for Porphyromonas gin- et al., The salivary microbiota as a diagnostic indicator of
givalis major fimbriae. Infect. Immun. 2004, 72, 13411348. oral cancer: a descriptive, non-randomized study of cancer-
[41] Wilkins, J. C., Beighton, D., Homer, K. A., Effect of acidic pH free and oral squamous cell carcinoma subjects. J. Transl.
on expression of surface-associated proteins of Streptococ- Med. 2005, 3, 27, doi:10.1186/1479-5876-3-27.
cus oralis. Appl. Environ. Microbiol. 2003, 69, 52905296. [50] Shiboski, C. H., Shiboski, S. C., Silverman, S., Trends in oral
[42] Cole, J. N., Ramirez, R. D., Currie, B. J., Cordwell, S. J.et cancer rates in the United States, 19731996. Community
al., Surface analyses and immune reactivities of major cell Dent. Oral Epidemiol. 2000, 28, 249256.


Deep Metaproteomic Analysis of Human PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Deep Metaproteomic Analysis of Human PDF

Uploaded by

Copyright:

Available Formats

992 DOI 10.1002/pmic.

201100503 Proteomics 2012, 12, 9921001

Deep metaproteomic analysis of human

The proteome of human whole saliva is extremely complex,

3.1 Two-step database searching for microbial

At the outset of our studies, we first addressed the challenges

Table 1. Summary of two-step database searching method for salivary metaproteomics

Number of Total number of First stepa) Second stepb)

200 988 974 1 687 426 2176 152 724 1927

4 Discussion racy can increase confident peptide identifications [31], espe-

KEGG pathway Protein name (EC # or gene name) Number of peptides a)

You might also like