doi: 10.1111/1348-0421.12374
ORIGINAL ARTICLE
ABSTRACT
Multilocus sequence analysis based on hypervariable housekeeping proteins was utilized to
differentiate closely related species in the family Enterobacteriaceae. Of 150 housekeeping proteins,
the top 10 hypervariable proteins were selected and concatenated to obtain distance data. Distances
between concatenated proteins within the family were 0.9–41.2%, whereas the 16S rRNA and
atpD-gyrB-infB-rpoB concatenated sequence (4MLSA) distances were 0.8–6.0% and 0.9–22.1%,
respectively. These data indicate that phylogenetic analysis by concatenation of hypervariable
proteins is a powerful tool for discriminating species in the family Enterobacteriaceae. To confirm
the discriminatory power of the 10 chosen concatenated hypervariable proteins (C10HKP),
phylogenetic trees based on C10HKP, 4MLSA, and the 16S rRNA gene were constructed.
Comparison of average bootstrap values among C10HKP, 4MLSA and 16S rRNA genes indicated
that the C10HKP tree was the most reliable. Location via the C10HKP tree was consistent with
existing assignments for almost all species in the family Enterobacteriaceae. However, the C10HKP
tree suggested that several species (including Enterobacter massiliensis, Escherichia vulneris,
Escherichia hermannii, and Salmonella subterranea) should be reassigned to different clusters than
those defined in previous analyses. Furthermore, E. hermannii and S. subterranea appeared to fall
onto a branch independent from those occupied by the other Enterobacteriaceae. Therefore, we
propose Atlantibacter gen. nov., such that E. hermannii and S. subterranea would be transferred to
genus Atlantibacter as Atlantibacter hermannii comb. nov. and Atlantibacter subterranea comb.
nov., respectively.
Correspondence
Hiroyuki Hata, Department of Microbiology, Gifu University Graduate School of Medicine, Gifu University, Yanagido 1-1, Gifu, 501-1194, Japan.
Tel: +81 58 230 6488; Fax: +81 58 267 0156; email: h-hatabou@feb.email.ne.jp
Received 21 December 2015; revised 25 February 2016; accepted 7 March 2016.
List of Abbreviations: 4MLSA, atpD-gyrB-infB-rpoB concatenated sequences; C10HKP, 10 concatenated hypervariable proteins; CDC, Centers for
Disease Control and Prevention; DSM, Deutsche Sammlung von Mikroorganismen (German Collection of Microorganisms); JNBP, Japan National
Bioresource of Bacterial Pathogen; KCN, potassium cyanide; LIM, lysine indole motility; MLSA, multilocus sequence analysis; NBRC, National Biological
Resource Center; TSI, triple sugar iron.
© 2016 The Societies and John Wiley & Sons Australia, Ltd 303
H. Hata et al.
Members of the family Enterobacteriaceae are classified into 52 genera and approximately 290 species. These bacteria have been classified under several systems, according to different standards, over their taxonomic history. Some pathogenic species of the family were misclassified historically based solely on their pathogenic characteristics or the results of 16S rRNA–based phylogenetic analyses. Because of the low discriminatory power of 16S rRNA gene analysis, several housekeeping genes have instead been used for phylogenetic analyses; these housekeeping loci have included rpoB (1, 2), gyrB (3), dnaJ (4) and recA (5). Moreover, MLSA has become a powerful alternative to 16S rRNA gene or single housekeeping gene phylogenetic analysis (5–8).
Approximately 10 years have passed since the emergence of next-generation sequencing techniques; these methods have made it relatively easy and inexpensive to obtain draft genome sequences for bacterial strains. In addition, many housekeeping protein sequences can now be readily identified using high-performance automated protein sequence annotation programs. These bioinformatics improvements have enabled the selection of a greater variety of housekeeping proteins and therefore might provide more reliable and precise MLSA data.
The family Enterobacteriaceae contains 52 historically well-characterized genera, including Escherichia, Citrobacter and Salmonella. The results of 16S rRNA gene analyses have indicated that these three genera are closely related. However, Escherichia vulneris, Escherichia hermannii, and Salmonella subterranea do not form a clade with the respective genus type species. To determine the proper phylogenetic positions of these species, we determined the draft genome sequences of several strains of species in the family Enterobacteriaceae, and then compared these genomes by MLSA using hypervariable housekeeping proteins.
MATERIALS AND METHODS
Bacterial strains and sequence information
All bacterial strains subjected to draft genome analysis were obtained from the stocks of the JNBP and NBRC collections. Nucleotide and protein sequence data were obtained from the NCBI genome database. The strains and information used in this study are listed in Supplemental Table 1.
Raw sequence data were assembled using Newbler version 2.6 (Roche Diagnostics) and ABySS version 1.3.2 (9) software. After assembly, more than 90% of the sequences were auto-annotated using the Microbial Genome Annotation Pipeline (MiGAP) program suite version 1.0.50 (10).
Selection of housekeeping proteins
One hundred and fifty housekeeping proteins, including ribosomal proteins; proteins involved in tRNA synthesis, DNA replication and modification, and cell wall synthesis; and molecular chaperones were selected from the clusters of orthologous groups database (11) for Escherichia coli strain K-12 (substrain MG1655), Salmonella enterica subsp. enterica serovar Typhimurium strain LT2 and Yersinia pestis strain CO92 (Supplemental Table 2). The numbers of amino acid sequence differences between strains for each protein were determined using MEGA version 6 software (12).
Distance and phylogenetic analysis
After concatenation of the 10 selected protein sequences, the concatenated sequences together with 16S rRNA gene sequences were aligned using MEGA6 software. MLSA using the atpD, gyrB, infB and rpoB genes was performed as previously described (7). Distances among the strains were calculated using MEGA6 software and correlation analysis was performed using Microsoft Excel 2007 software. Phylogenetic trees for the 16S rRNA genes and MLSA using the atpD, gyrB, infB and rpoB genes were prepared using the maximum likelihood method based on the Tamura–Nei model (13). Phylogenetic trees of concatenated protein sequences were prepared using the maximum likelihood method based on the JTT matrix-based model (14).
Biochemical tests
Strains subject to biochemical tests are listed in Supplemental Table 3. TSI agar, LIM medium, Simmons citrate agar and Voges–Proskauer medium were purchased from Kyokuto Pharmaceutical Industrial (Tokyo, Japan). The KCN test was performed according to the Sakazaki–Yamada modified test (15). Other biochemical tests were conducted using the ID 32 E, API 20 E and oxidase tests (bioMerieux, Lyon, France).
Atlantibacter hermannii gen. nov., comb. nov.
†, percentages indicate the average amino acid distance for each protein category.
average amino acid distances between E. coli and S. enterica subsp. enterica, E. coli and Y. pestis, and S. enterica subsp. enterica and Y. pestis were 4.0, 13.3 and 13.4%, respectively. Relatively high average amino acid distances were observed for three of six protein categories: chaperonins, DNA replication proteins and cell wall–synthesis proteins (Table 1). We then selected 10 hypervariable proteins from these categories (Table 2). We excluded some proteins that exhibited higher distances than those ultimately selected, either because there were too many deletion/insertion sites (e.g., FtsY) or because the protein was absent in some species (e.g., HslJ and HscB).
Discriminatory power of concatenated hypervariable proteins
To confirm the power of the selected proteins for discriminating among species within the family Enterobacteriaceae, we calculated distances among E. coli K12 MG1655 and several other members of the Enterobacteriaceae using the 10 concatenated hypervariable proteins (designated as C10HKP), the 16S rRNA sequences, and the atpD-gyrB-infB-rpoB concatenated sequences (designated as 4MLSA) (Fig. 1). Distances within the family using C10HKP were 0.9–41.2%, whereas the distances using 16S rRNA and 4MLSA were 0.8–6.0% and 0.9–22.1%, respectively. In addition, we implemented correlation analysis to determine whether C10HKP shows the same pattern of divergence as 4MLSA. The coefficient of correlation between the two sets of distances was 0.902 (Fig. 2).
Phylogenetic analysis of the family Enterobacteriaceae using C10HKP
We separately used the C10HKP, 4MLSA and 16S rRNA sequences to construct phylogenetic trees for the family Enterobacteriaceae (Fig. 3; Supplemental Figs 1–3). Average bootstrap values were 89.2% for the C10HKP-based tree, 77.0% for the 4MLSA-based tree, and 49.8%
Category | Gene† | Protein | Length (aa)‡ | E. coli vs S. enterica | E. coli vs Y. pestis | S. enterica vs Y. pestis
Replication | holB | DNA polymerase III, delta prime subunit | 334 | 21.0% | 48.2% | 47.6%
Cell wall synthesis | murB | UDP-N-acetylenolpyruvoylglucosamine reductase | 342 | 17.8% | 38.6% | 38.0%
Chaperonin | grpE | Molecular chaperone GrpE (heat shock protein) | 191 | 7.3% | 38.2% | 39.3%
Cell wall synthesis | ftsQ | Cell division septal protein FtsQ | 266 | 6.4% | 30.8% | 29.3%
Chaperonin | lolA | Outer membrane lipoprotein-sorting protein | 202 | 5.9% | 29.7% | 30.7%
Cell wall synthesis | murF | UDP-N-acetylmuramyl pentapeptide synthase | 452 | 12.4% | 29.4% | 31.4%
Cell wall synthesis | ftsX | Cell division protein FtsX | 316 | 5.4% | 28.5% | 26.9%
Replication | holA | DNA polymerase III, delta subunit | 343 | 10.2% | 28.3% | 28.6%
Cell wall synthesis | pssA | Phosphatidylserine synthase | 451 | 6.2% | 25.9% | 26.2%
Cell wall synthesis | mreD | Cell shape-determining protein MreD | 162 | 5.6% | 24.7% | 24.1%
Chaperonin | hslJ§ | Heat shock protein HslJ | 136 | 29.4% | 51.5% | 58.8%
Cell wall synthesis | ftsY§ | Signal recognition particle GTPase | 487 | 15.0% | 32.0% | 32.6%
Chaperonin | hscB§ | DnaJ-domain-containing proteins 1 | 171 | 8.8% | 32.7% | 33.9%
†, symbols and gene names used for orthologous sequences are consistently those used in the E. coli genome annotation; ‡, amino acid length
indicates the average amino acid length for E. coli, S. enterica and Y. pestis; §, hslJ, ftsY and hscB had large distances but were not included among the
concatenated proteins.
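
The selection step can be illustrated with a minimal Python sketch. The values are transcribed from the 13 rows listed above; everything else is an assumption made for illustration. In particular, the text does not state which distance was used to rank the candidates. The sketch sorts on the second of the three distance columns only because the ten selected proteins happen to be printed in descending order of that column. The function name `rank_candidates` and the tuple layout are hypothetical.

```python
# Rows transcribed from the table above: (gene, d1, d2, d3), distances in percent,
# where d1..d3 are the three pairwise distance columns in the order printed.
ROWS = [
    ("holB", 21.0, 48.2, 47.6),
    ("murB", 17.8, 38.6, 38.0),
    ("grpE", 7.3, 38.2, 39.3),
    ("ftsQ", 6.4, 30.8, 29.3),
    ("lolA", 5.9, 29.7, 30.7),
    ("murF", 12.4, 29.4, 31.4),
    ("ftsX", 5.4, 28.5, 26.9),
    ("holA", 10.2, 28.3, 28.6),
    ("pssA", 6.2, 25.9, 26.2),
    ("mreD", 5.6, 24.7, 24.1),
    ("hslJ", 29.4, 51.5, 58.8),
    ("ftsY", 15.0, 32.0, 32.6),
    ("hscB", 8.8, 32.7, 33.9),
]

# Proteins marked with the section sign in the footnote: high distances, but
# excluded (too many indels, or absent in some species).
EXCLUDED = {"hslJ", "ftsY", "hscB"}


def rank_candidates(rows, excluded, key_column=2, top_n=10):
    """Return the top_n gene names, ranked by one distance column (descending).

    key_column is a tuple index: 1, 2 or 3 for d1, d2 or d3.
    ASSUMPTION: ranking on d2 is inferred from the printed row order only.
    """
    kept = [row for row in rows if row[0] not in excluded]
    kept.sort(key=lambda row: row[key_column], reverse=True)
    return [row[0] for row in kept[:top_n]]


selected = rank_candidates(ROWS, EXCLUDED)
print(selected)
```

Under this assumed criterion the output reproduces the printed order of the ten selected proteins, which is consistent with, but does not prove, ranking on that column.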
Fig. 1. 16S rRNA gene, 4MLSA, and C10HKP sequence distances for several species in the family Enterobacteriaceae. The percentage of
nucleotide base differences (16S rRNA gene and 4MLSA) and amino acid sequence differences (C10HKP) relative to Escherichia coli K-12
MG1655 are shown. All positions containing gaps and missing data were eliminated. There were a total of 1154 (16S rRNA gene), 2633
(4MLSA), and 2907 (C10HKP) positions in the final datasets.
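
The distance measure described in this caption (percentage of differing positions after removing every position that contains a gap or missing data) can be sketched in a few lines of Python. This is an illustrative sketch, not the MEGA6 implementation: the function names, the toy sequences and the use of `-` and `?` as gap and missing-data symbols are assumptions.

```python
def concatenate(alignments):
    """Join per-protein alignments end to end.

    Each alignment is a dict mapping strain name -> aligned sequence.
    Only strains present in every alignment are kept.
    """
    strains = sorted(set.intersection(*(set(a) for a in alignments)))
    return {s: "".join(a[s] for a in alignments) for s in strains}


def complete_deletion(alignment, missing="-?"):
    """Drop every column in which any sequence has a gap or missing symbol."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0]))
            if all(seq[i] not in missing for seq in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


# Toy alignments for two hypothetical proteins (not data from this study).
prot1 = {"ref": "MKT-LV", "s1": "MKTALV", "s2": "MRTALI"}
prot2 = {"ref": "GAVL", "s1": "GAIL", "s2": "G-IL"}

clean = complete_deletion(concatenate([prot1, prot2]))
for strain in ("s1", "s2"):
    print(strain, p_distance(clean["ref"], clean[strain]))
```

Because columns are removed across all strains at once, the number of retained positions (8 of 10 in the toy data) depends on the whole strain set, which is why the caption reports a single final dataset size per marker.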
Fig. 3. (a) 4MLSA- and (b) C10HKP-based phylogenetic trees of members of the family Enterobacteriaceae. The evolutionary history was
inferred by using the maximum likelihood method based on the Tamura–Nei model (13) for 4MLSA and JTT matrix-based model (14) for
C10HKP. The tree with the highest log likelihood (56653.1344 for 4MLSA and 104327.7197 for C10HKP) is shown. The percentages of trees
in which the associated taxa clustered together are shown next to the branches. Initial tree(s) for the heuristic search were obtained by applying
the neighbor-joining method to a matrix of pairwise distances estimated using the maximum composite likelihood approach. The tree is drawn to
scale, with branch lengths measured in the number of substitutions per site. The analysis involved 92 nucleotide sequences for 4MLSA and
92 amino acid sequences for C10HKP. All positions containing gaps and missing data were eliminated. There were a total of 2633 positions for
4MLSA and 2907 positions for C10HKP in the final dataset.
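
The branch percentages described in this caption, and an average over them, can be illustrated with a short Python sketch. It assumes each tree is represented simply as a set of clades (each clade a frozenset of taxon names); this representation, the function names and the toy replicates are hypothetical, and the sketch does not reproduce how MEGA6 stores trees or how the averages in the text were computed.

```python
def clade_support(reference_clades, replicate_trees):
    """Percentage of bootstrap replicates that contain each reference clade."""
    n = len(replicate_trees)
    return {
        clade: 100.0 * sum(clade in rep for rep in replicate_trees) / n
        for clade in reference_clades
    }


def mean_support(support):
    """Unweighted mean of the per-clade support values."""
    return sum(support.values()) / len(support)


# Toy example with taxa A..E (not data from this study).
AB = frozenset("AB")
ABC = frozenset("ABC")
AC = frozenset("AC")
DE = frozenset("DE")

reference = [AB, ABC]      # internal clades of the best tree
replicates = [             # clades found in four bootstrap replicate trees
    {AB, ABC},
    {AB, ABC},
    {AB, DE},
    {AC, DE},
]

support = clade_support(reference, replicates)
print(support[AB], support[ABC], mean_support(support))
```

A tree whose clades recur in most replicates yields a high mean, which is the sense in which a higher average bootstrap value is read as a more reliable topology.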
pigmented colonies and growth in KCN. Although we found that the characteristics of E. vulneris are similar to those of E. albertii by TSI agar, LIM medium, Simmons citrate agar and Voges–Proskauer medium, we easily differentiated these two strains by motility.
DISCUSSION
We selected 10 hypervariable proteins from 150 housekeeping proteins commonly possessed by members of the family Enterobacteriaceae. In distance calculations, C10HKP yielded distance values more than twofold those obtained using 4MLSA (Fig. 1). Moreover, distances derived using C10HKP strongly correlated with those derived using 4MLSA (Fig. 2). In addition, the phylogenetic tree obtained using C10HKP had higher bootstrap values than the trees obtained using either 4MLSA or 16S rRNA sequences (Supplemental Figs 1–3). These data indicate that the C10HKP-based tool is more reliable than 16S rRNA and 4MLSA tools for phylogenetic analysis of the family Enterobacteriaceae.
Almost every genus formed a single cluster in the C10HKP-based tree. However, some species were grouped as parts of clusters distinct from those suggested by previous classifications. For example, the type strains of Enterobacter aerogenes and Enterobacter massiliensis were not classified with the Enterobacter cloacae cluster when analyzed using C10HKP. E. aerogenes (16) and Klebsiella mobilis (17) have the same type strain and are therefore homotypic synonyms (Rules 24a and 24b) (18). Our results led us to conclude that E. aerogenes belongs in the genus Klebsiella. In addition, given that strain CHS 78 was assigned to the E. cloacae cluster by C10HKP analysis, it appears that Lelliottia amnigena should be classified in the genus Enterobacter. However, because there is thus far no type strain genome information for L. amnigena, further information is needed before this species can be reclassified with confidence.
The 16S rRNA gene–based phylogenetic tree indicated that species of the genus Citrobacter are closely related to species of both the genus Escherichia and the genus Salmonella. However, the C10HKP-based tree divided the Citrobacter species into two groups. The animal-derived species Citrobacter rodentium and Citrobacter farmeri (19) segregated with members of the genus Escherichia, whereas Citrobacter freundii, isolated from humans (20), segregated with members of the genus Salmonella. Although these results support division of the genus Citrobacter into two distinct genera, we do not propose any taxonomic changes for Citrobacter, both because no clear genus definition is currently available and because we did not (in the present study) analyze all of the Citrobacter species' genomes.
E. fergusonii and E. albertii are independent and established species (21, 22). However, these species were difficult to distinguish from E. coli by 16S rRNA gene–based phylogenetic analysis (Supplemental Fig. 1). In contrast, these two species were clearly differentiated from E. coli in the C10HKP-based tree, which provided 100% bootstrap values (Supplemental Fig. 3). However, species of the genus Shigella could not be differentiated from E. coli, even via the C10HKP-based tree.
The 16S rRNA gene–based phylogenetic tree suggests that E. vulneris has been misclassified in the genus Escherichia (Supplemental Fig. 1). 4MLSA analysis also indicated that E. vulneris does not belong to the genus Escherichia. Consistent with those analyses, data from the C10HKP-based tree strongly suggested that E. vulneris is independent from the genus Escherichia, and is instead related to Kosakonia sacchari and E. massiliensis. Phenotypic characteristics of E. vulneris are clearly differentiated from those of E. coli, E. albertii and E. fergusonii by ornithine decarboxylase, arginine dihydrolase and lysine decarboxylase activities (Supplemental Table 4). These data indicate that E. vulneris belongs to a genus distinct from Escherichia. Further investigation is needed to clarify the correct taxonomic position of E. vulneris.
E. hermannii and S. subterranea also sorted separately from their respective genera according to data from the 16S rRNA gene–based tree. Both the 4MLSA- and C10HKP-based trees indicated that E. hermannii and S. subterranea should be assigned to a shared genus. Analyses using either of these concatenation tools suggested that these two species are closely related to members of the genus Cronobacter. Indeed, the phenotypic characteristics of these two species are quite similar, while clearly different from those of other Escherichia and Salmonella species (Table 3). These data strongly suggest that E. hermannii and S. subterranea represent a new genus. Hence, we propose Atlantibacter gen. nov. for both of these species.
Description of Atlantibacter gen. nov.
Atlantibacter (Atlan.ti.bac'ter. N.L. gen. fem. n. Atlanta, the city of Atlanta, Georgia, USA, where the U.S. Centers for Disease Control and many researchers investigating the family Enterobacteriaceae are located; N.L. masc. n. bacter a rod; N.L. masc. n. Atlantibacter a rod named in memory of Atlanta).
This description is based on those of Brenner et al. (23) and Shelobolina et al. (24).
Gram-negative non-spore-forming rods that are facultatively anaerobic and motile. After 24 hr aerobic incubation at 37°C on trypticase soy agar, colonies are
yellow-pigmented and convex. Catalase-positive and negative for oxidase. Positive for indole reaction and nitrate reduction, growth in KCN, ornithine decarboxylase and beta-galactosidase activities. Negative results in tests for urease, ornithine decarboxylase, arginine dihydrolase, lysine decarboxylase and L-asparagine arylamidase, lipase, alpha-glucosidase, alpha-galactosidase, alpha-maltosidase and beta-glucuronidase. N-acetyl-beta-D-glucosaminidase activities, H2S production and the Voges–Proskauer
[Table 3: cell values not recoverable. Row labels: D-sorbitol fermentation (API 20E and ID 32E); Arginine dihydrolase (API 20E and ID 32E); Yellow pigment; Growth in KCN; Motility (LIM). Column labels: Genus Salmonella; S. enterica†; S. bongori NBP 4536; NBRC 102420T; comb. nov.]
†, this column shows common characteristics of the following strains: S. enterica subsp. enterica serovar Typhimurium JNBP 7710 (=ATCC 13311), S. enterica subsp. arizonae JNBP 4588T, S. enterica subsp. diarizonae JNBP 4573T, S. enterica subsp. houtenae JNBP 4858T and S. enterica subsp. salamae JNBP 4556T; ‡, S. enterica subsp. houtenae JNBP 4858T is negative for D-sorbitol fermentation and positive for growth in KCN.
ACKNOWLEDGEMENT
likelihood method based on the Tamura–Nei model (13). The tree with the highest log likelihood (9295.6582) is shown. The percentages of trees in which the associated taxa clustered together are shown next to the branches. Initial tree(s) for the heuristic search were obtained by applying the neighbor-joining method to a matrix of pairwise distances estimated using the maximum composite likelihood approach. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 92 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 1154 positions in the final dataset.
Fig S2. 4MLSA-based phylogenetic tree of members of the family Enterobacteriaceae. The multilocus sequence analysis method has been reported previously (7). The phylogenetic tree was constructed in the same manner as the tree shown in Supplemental Figure 1. The tree with the highest log likelihood (56653.1344) is shown. There were a total of 2633 positions in the final dataset.
Fig S3. C10HKP-based phylogenetic tree of members of the family Enterobacteriaceae. The evolutionary history was inferred by using the maximum likelihood method based on the JTT matrix-based model (14). The tree with the highest log likelihood (104327.7197) is shown. The percentages of trees in which the associated taxa clustered together are shown next to the branches. Initial tree(s) for the heuristic search were obtained by applying the neighbor-joining method to a matrix of pairwise distances estimated using a JTT model. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 92 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 2907 positions in the final dataset.
Table S1. Genome sequence data used in this study.
Table S2. Amino acid distances of housekeeping proteins.
Table S3. Phenotypic characteristics of Genus Escherichia and Salmonella (Part 1).
Table S4. Phenotypic characteristics of Genus Escherichia and Salmonella (Part 2).