Professional Documents
Culture Documents
The amount of novel microbial genomic information bacteria and archaea were found to be highly correlated
Genome
The entire hereditary that is being generated on a daily basis is so vast that with DNA hybridization — roughly, the 70% cut-off level
information of an organism multidisciplinary approaches that integrate bioinformat- in DNA–DNA reassociation corresponds to 98% 16S
that is encoded by its DNA (or ics, statistics and mathematical methods are required rRNA sequence identity1,2. High-throughput sequenc-
RNA for some viruses).
to assess it effectively. This information is inspiring a ing of rRNA molecules from environmental samples
Bacterial typing
new understanding of microorganisms that appreciates showed that most of these sequences fall into clusters
A procedure for identifying the scale of microbial diversity and acknowledges that the of 99% identity3,4, which suggests a basis for a coherent
types and strains of bacteria. microbial gene pool is considerably larger than expected. working definition of a bacterial species and indicates
Indeed, the availability of only one complete genome that 16S rRNA sequence conservation of approximately
Metagenome
sequence for a given taxon, which was a dream only two 99% marks the borders of naturally occurring pheno-
The global genetic repertoire of
an environmental niche that is decades ago, is now considered inadequate for describ- typic clusters or species. This definition holds for most
constituted by diverse ing the complexity of species and genera and their inter- cases even today, with only a few exceptions that include
organisms such as free-living relationships. Advances in genomics are also beginning the Bacillus genus, for which the 16S rRNA sequences
microorganisms in the wild or
to drive the discovery of novel diagnostics, drug targets from phenotypically distinct species, such as Bacillus
the commensals of a particular
niche in a mammalian host
and vaccines. In this article, we review various aspects cereus, Bacillus thuringiensis and Bacillus anthracis, dif-
(from the Greek ‘meta’, of the impact of genomics on microbiology, including fer at only a few bases5. Another limitation of 16S rRNA
meaning beyond, and in the evolving field of bacterial typing, and the genomic typing is that organisms with polymorphisms in the
genome). technologies that enable the comparative analysis of mul- regions that are used for primer design can be poorly
tiple genomes or the metagenomes of complex microbial detected with ‘universal’ primers or completely lost, as
environments. We also address the implications of the in the Nanoarchaeota phylum6.
*Novartis Vaccines and
genomic era for the future of microbiology. Additionally, 16S rRNA analysis is a poor method
Diagnostics, 53100 Siena,
Italy. for resolving sub-populations within species. For this
‡
The Wellcome Trust Sanger Classification and the challenges of genomics purpose, multilocus enzyme electrophoresis (MLEE)7,
Institute, Wellcome Trust Historically, bacteria have been classified using conven- a method that classifies bacteria on the basis of the
Genome Campus, Hinxton, ient traits such as cell structure, cellular metabolism or isoforms of a combination of ~15 metabolic enzymes8,
Cambridge CB10 1SA, UK.
§
Department of Microbiology
differences in cellular components. Diversity within a became the method of choice for many epidemiological
and Immunology, Stanford particular species could also be addressed using oppor- studies. MLEE was not widely used, however, because
University, Stanford, tune markers, such as capsular and protein serotypes or it is low throughput and intensive laboratory work is
California 94305, USA. the ability to agglutinate or lyse red blood cells. In the required. A natural evolution of MLEE that came with
||
University of Oxford, John
1970s, DNA–DNA hybridization was introduced to dif- the genomic era was multilocus sequence typing (MLST),
Radcliffe Hospital, Oxford
OX3 9DU, UK. ferentiate bacterial species. Isolates that showed >70% a typing method that is based on the partial sequences
¶
These authors contributed DNA–DNA homology under standard hybridization of 7 housekeeping genes of ~450 bp each9. MLST is high
equally to this work. conditions were considered to belong to the same spe- throughput, allows direct comparison of organisms that
Correspondence to R.R. cies. Later, advances in sequencing techniques allowed are being studied in different laboratories in different
e‑mail: rino.rappuoli@
novartis.com
the introduction of other markers, such as 16S ribosomal parts of the world and has led to a rapidly enlarging data-
doi:10.1038/nrmicro1901 RNA (rRNA), a molecule that is ubiquitous in bacterial base (MLST Public Repository; see Further information),
Published online 13 May 2008 and archaeal genomes. 16S rRNA sequence similarities in in which almost 30 species are represented.
16S rRNA MLEE and MLST eMLST and SNPs life history of the bacteria and their interactions with the
environment. Unfortunately, the obvious solution of using
Antigens high-throughput genomic sequencing to sample all the
similarities and differences between bacterial populations
might not be as close as is often stated.
Enzymes Polymerase
a 454 technology
on beads
T PPi
and primer
Beads
Adaptors
T/A/G/C ATP
gDNA
PCR Light
b SOLiD technology
Ligase
Beads
Adaptors
gDNA
PCR
T
G
Adaptors A C
gDNA
T T C
A C A
PCR
a b
ON
A A C T T GGGGGGGGGGGGGT A A T A
P IR
(G) 13
IR Polysaccharide biosynthesis operon
Recombinase
OFF
A A C T TGGGGGGGGGGGGT AA T AG IR
(G) 12 IR P
A A C T TG G G G G G G G G G G T A A T A G G c
(G) 11 S S′ Type-I restriction–
modification system
Invertase S′ S ATG M R
Multiple
A A C T TG G G G G G G G G G T A A T A C GC possible
inversions
(G) 10
Alternate selectivity
enzymes
DNA-recognition
AAC T TGGGGGGGGGT AA T AGGC sites
(G) 9 Alternate selectivity
subunits
DNA-recognition
sites
Figure 4 | Rapid mechanisms that generate diversity within a clonal culture. Specific mechanisms in many
microorganisms introduce diversity into a clonal culture that results in heterogeneous DNA reads Nature
in Reviews Microbiology
shotgun| sequencing.
a | Schematic depiction of the hypervariable repeats in Campylobacter jejuni35, showing regions of the chromosome where
different reads showed a different number of bases in a particular run of the same base (guanosine; G). b | Invertible
promoters of Bacteroides fragilis38,73: rapid inversion of a small section of DNA, which is mediated by a site-specific
recombinase, is used to control the expression of entire operons that encode surface-polysaccharide production, which
can be switched on and off as single entities with a single promoter. c | Random expression of different alternative proteins
from a single locus in Bacteroides fragilis: rapid DNA inversion is also used to exchange parts of the coding sequences of
expressed genes with sequences from silent cassettes, thereby altering the encoded protein sequence. IR, inverted
repeat; M, S and R, methylation, specificity and restriction subunits of the restriction endonuclease system; P, promoter.
next-generation methods has been shown to be, at present, artefact that can be addressed by increasing the signal
the best solution for de novo bacterial genome sequenc- to noise ratio with higher coverage or various technical
ing in terms of sequence quality and cost-effectiveness34. improvements, but is in fact bona fide variation that is
However, increases in read length and throughput will generated by specific bacterial mechanisms (discussed
probably substantially broaden the work that can be below)35–39 (FIG. 4).
accomplished with these technologies; indeed, in 2004, the One solution to this problem could be provided by the
National Institutes of Health set the scientific community recently devised single-molecule sequencing techniques,
the ‘$1,000 human genome’ challenge to be achieved by either by mimicking the natural enzymatic process of DNA
2015, which looks more and more achievable. The great synthesis by DNA polymerase40 or by reading single bases
enthusiasm that was generated by these advances has led as DNA molecules pass through nanopores41–43. Although
at least some in the microbial genomics community to these techniques are at an early stage of development, they
consider the alternative goal of a ‘$1,000 genome in a day’ represent a radical change in sequencing methodology
to be closer for bacteria than for the larger and more com- and avoid the need for analysis of a mixture of multiple
plex genomes of eukaryotic species. However, bacterial templates for the same DNA molecule. Although this will
genomes are extremely variable in terms of the presence provide new opportunities for sequencing well-defined
or absence of genes and often have large regions that need individual organisms, which have genetic information that
to be sequenced de novo, which potentially limits the use- is clearly distinguished from the genetic information of
fulness of these new short-read technologies. Moreover, other individuals, this will not change the fact that, in the
even the highly clonal populations of bacteria that are used bacterial world, interaction with environmental niches or
for sequencing contain considerable sequence variability, specific host niches often occurs at the population level.
such that the genome sequence that is obtained represents Consequently, both the averaging process that is required
a population average rather than the sequence of any of to generate a consensus sequence from a population and
the individual bacteria. This variability is not a sequencing direct sequencing from a single bacterial cell will hide
a Gene b
duplication
Phage infection
Recombination
Mutation
Gene acquisition
Gene loss
Plasmid acquisition
Metagenome
c d
Pan-genome
Genome
Population
bottleneck;
selective sweep
Figure 5 | Molecular evolutionary mechanisms that shape bacterial species diversity: one genome, pan-genome
Nature Reviews | Microbiology
and metagenome. Intra-species (a), inter-species (b) and population dynamic (c) mechanisms manipulate the genomic
diversity of bacterial species. For this reason, one genome sequence is inadequate for describing the complexity of species,
genera and their inter-relationships. Multiple genome sequences are needed to describe the pan-genome, which
represents, with the best approximation, the genetic information of a bacterial species. Metagenomics embraces the
community as the unit of study and, in a specific environmental niche, defines the metagenome of the whole microbial
population (d).
crucial information on the adaptive strategies that are (the pan-genome) and goes beyond sensible estimation
encoded in intrinsically variable bacterial genomes. Direct when we consider the gene pool of microorganisms in the
PCR amplification and detailed study of particular loci environment (the metagenome; FIG. 5).
could therefore remain crucial steps in bacterial genome The size of the genomes of known, free-living bacteria
sequencing, even if complete closure of the genome is ranges from 0.16 Mb for Candidatus Carsonella ruddii to
not strictly required. The new sequencing technologies 10 Mb for Solibacter usitatus. Evolutionary forces seem
have greatly reduced the amount of time and money that to drive the size of the genome. On the one hand, many
is required to determine a complete bacterial genome parasitic or symbiotic microorganisms do not need all of
sequence, but, as a bacterial genome represents the aver- the genetic material that would be necessary to support
age of a population, the process still requires human life if they were independent of their hosts and genes that
interpretation of raw data and the integration of diverse are redundant therefore tend to be eliminated, leading to
sequencing methods. Paradoxically, therefore, although a small genome. On the other hand, bacteria that face con-
the target of a $1,000 genome in a day is much closer than tinuously variable environments need a large gene pool
could have been predicted a decade ago, it is clear that to address different needs; these bacteria have the largest
much more remains to be learned from ‘simpler’ targets, genomes, which, in some cases, can reach twice the size
such as single bacterial genomes. of the smallest eukaryotic genomes44.
In many species, there is extensive genomic plasticity.
From genome to pan-genome and metagenome For example, completion of the genome sequence of
When the first complete bacterial genome sequence was E. coli O157:H7 revealed that this strain possesses >1,300
published, it was commonly thought that a few dozen strain-specific genes compared with E. coli K12; these
more genomes, chosen representatively from the bacte- genes encode proteins that are involved in virulence and
rial and archaeal domains, would be sufficient to describe metabolic capabilities45. Moreover, when the genomes of
the gene pool of the entire microbial world. Today, as the three E. coli strains (K12, O157:H7 and the uropathogenic
number of fully sequenced microbial genomes approaches strain CFT073) were compared, only 39.2% of genes could
700, it is clear that microbial diversity has been vastly be found in all three strains46. Other reports have also
underestimated and we are just ‘scratching the surface’. revealed an extensive amount of genomic diversity among
Variability, which as discussed above is already present strains of a single species15,47,48.
within a single clonal culture that is established from a sin- From these studies, it is evident that it is not possible to
gle cell, increases greatly within a single bacterial species characterize a species from a single genome sequence. But
how many genome sequences are necessary? The answer information from one species to another52. These proc-
might vary from species to species. The study by Tettelin esses, which are regulated and evolutionarily conserved,
and colleagues15 discussed above that examined the diver- are turned on when they are most likely to result in gene
sity of 8 isolates of S. agalactiae revealed that each new transfer, and genes that must function together are often
genome had an average of 30 genes that were not present transferred together as genomic islands (for example,
in any of the previously sequenced genomes, which sug- pathogenicity islands)11.
gested that the number of genes associated with this spe- The concept of the pan-genome is not just a theoretical
cies could, theoretically, be unlimited. Therefore, the best exercise; it also has fundamental practical applications in
approximation to describe a species could be made by vaccine research. Recently, it was shown that the design
using the concept of the pan-genome. The pan-genome of a universal protein-based vaccine against GBS was
can be divided into three elements: a core genome that only possible using dispensable genes53. In addition,
is shared by all strains; a set of dispensable genes the sequencing of multiple genomes was instrumen-
that are shared by some but not all isolates; and a set of tal in discovering the presence of pili in GBS, group A
strain-specific genes that are unique to each isolate. Streptococcus and S. pneumoniae, an essential virulence
For S. agalactiae, the core genome encodes the basic factor that had been missed by conventional technologies
aspects of S. agalactiae biology and the dispensable for a century54.
and strain-specific genes, which are largely composed Although the size of the gene pool for a species can be
of hypothetical, phage-related and transposon-related estimated by mathematical modelling, the size of the gene
genes49, contribute to its genetic diversity. This contrasts pool for the microbial biosphere is beyond any credible
with B. anthracis, as the pan-genome for this species can model. The projects that have been completed to date have
be adequately described by just four genome sequences. focused on species that can be grown in culture, but it has
This difference in the nature of the pan-genome reflects been estimated that >99% of the bacteria in the environment
several factors, including: the different lifestyles of the two cannot be cultured in the laboratory55. In nature, bacteria
organisms (exposure of S. agalactiae to diverse environ- occupy diverse environmental niches as complex com-
ments versus the occupation of a more isolated biological munities in which they interact with each other and with
niche by B. anthracis); the ability of each species to acquire the surrounding environment, and acquire and dis-
and stably incorporate foreign DNA, an advantage in card genes by lateral gene transfer events (FIG. 5). This
niche adaptation from the acquisition of laterally trans- emphasizes the importance of the local environment in
ferred DNA; and the recent evolutionary history of each shaping the genomic evolution of individual community
species. Recently, a refined model for the H. influenzae members. In the past few years, a new approach known
pan-genome was proposed, which predicted that even if as metagenomics (also called environmental genomics or
the overall number of genes that pertain to this species is community genomics) has emerged that involves the use
finite, the number of genes in the pan-genome will still of genome sequencing and other genomic technologies to
be four to five times higher than the number of genes in study microorganisms directly in their natural habitats56.
the H. influenzae core genome50. A similar analysis of Metagenomics embraces the community as the unit of
17 Streptococcus pneumoniae genomes revealed a core study, and includes sequence-based and product- or func-
genome of 1,454 genes and a pan-genome (named the tion-based methods for analysing environmental samples
supra-genome) of approximately 5,000 genes51. Based on directly, without the need for isolation of discrete organ-
the authors’ assumptions, 142 genomes would need to be isms. This process has been pursued in recent years for
sequenced to obtain the complete S. pneumoniae genome. a few microbial communities and habitats: an acid-mine
In conclusion, there is agreement that the size of the drainage site, the Sargasso Sea, agricultural soil, a deep-sea
Core genome pan-genome is much larger than the size of the genome whale skeleton and the human distal gut57–61. These studies
The pool of genes that is of a single isolate of a particular species. However, the have identified hundreds of unknown bacteria and viruses
shared by all the strains of the theoretical interpretation of these data with regard to the and millions of new genes, revealing an unexpected degree
same bacterial species.
ultimate size of the pan-genome differs slightly depending of diversity62 (FIG. 5). The use of metagenomic approaches has
Lateral gene transfer on the assumptions that are used for the mathematical also provided a new perspective on host–microorganism
The mechanism by which an models. Tettelin et al.15 assume that the pan-genome can mutualistic interactions (BOXES 1,2).
individual of one species be large and theoretically unlimited, whereas Hogg and Complex microbial communities are found in a range
transfers genetic material (that Hiller50 make more conservative estimates and predict a of human habitats, including the female reproductive
is, DNA) to an individual of a
different species.
large, but finite, pan-genome. Determining which of these tract, the skin, the oral cavity and the gut. These commu-
hypotheses is correct will require the accumulation of nities have co-evolved with their human host and play an
Metagenomics more data to facilitate the construction of more accurate important part in human health and disease63. For example,
The study of the genomic mathematical models. the human intestinal microbiota is composed of >1,000
repertoire of all the organisms
Whatever its ultimate size, the pan-genome reflects species and the concept of humans as ‘super-organisms’
that live in a particular
environment and their
the selective pressure on several species to generate new is highlighted by estimates that the human microbiome
activities as a collective. The adaptive combinations by recombining and constantly contains roughly 100 times more genes than the human
genomic analysis is applied to restructuring gene variants (alleles) in the population genome64–66. The microbiome can be viewed as a human
entire communities of and by lateral gene transfer between species. Several natural ‘accessory’ genome that complements the functions
microorganisms, which
bypasses the need to isolate
processes — transport by viruses (transduction), bacte- which are provided by the human germline and provides
and culture individual microbial rial ‘mating’ (conjugation) and the direct uptake of DNA the host with flexibility, diversification and adaptability
species. from the environment (transformation) — carry genetic in the face of a rapidly changing environment.
To date, more than 100 metagenomic projects are genera. However, it is not often appreciated that even a
ongoing, the goals of which vary from characterizing a standard shotgun sequence of a single organism can be
particular species to understanding the dynamics of an considered to be a metagenomic sample of a population.
entire microbial community. The most obvious advantage In most cases, shotgun-sequence libraries are made from
of sequencing DNA from natural samples is the capacity large amounts of DNA that has been isolated from clonal
to access a broad range of genome sequences. From a cultures. However, to generate a contiguous sequence,
broader perspective, metagenomics has the ability to cap- and ensure accuracy, these libraries are over-sampled,
ture the genomic diversity within a natural population, usually by eightfold to tenfold, with multiple individual
thereby offering the promise of assessing biodiversity in sequence reads contributing to each base in the final
a new way that is independent of the bacterial species sequence. The mechanics of library construction mean
concept67. One possible outcome of these projects is the that, effectively, each individual read has probably come
characterization of bacterial populations as a continuum from a different cell, and therefore the shotgun sequence
of genomic possibilities68. is an eightfold-to-tenfold-deep sample of the population
at each base position. As the sample is usually clonal,
Metagenomics of single organisms this often has no real consequence. However, if there
The genomic analysis of mixed populations is increas- is variation within the population, then the redundant
ingly being applied to studies of environmental samples sampling will contain that variation, which can be
that contain large numbers of different species and detected in the shotgun sequences.
This becomes of more than esoteric interest when could, or could not, be translated. Thus, the expected
dealing with bacteria that are pathogens or commensals clonal population was in fact a mixture of genomes with
of larger organisms. These bacteria have a problem as, variant sequences that expressed different subsets of sur-
barring rare mutations, growth by binary fission gener- face proteins. The metagenomic, population-sampling
ates a clonal population of identical cells, but evading or effect of the genome-sequence libraries allowed the
avoiding an immune system or colonizing a highly vari- immediate identification of these variant sequences and
able environment requires diversity. Bacteria can escape the proteins they affected, which provided an overview
this constraint in two ways: by promoting the exchange of of the variable gene set of the organism directly from
DNA with other related organisms or rapidly generating the assembled sequence. A similar effect was observed
diversity within an otherwise clonal growth. The popula- in the related organism Helicobacter pylori69, but not in
tion sampling effect of shotgun sequencing allows us to Neisseria meningitidis70,71, which is another organism
identify and understand these mechanisms directly from that is known to use this mechanism of phase variation.
raw genome-sequence data. This suggests that variation occurs at different rates in
Diversity-generating mechanisms usually involve the the two groups and indicates that the single-organism
random, but heritable, on or off switching of surface- metagenomics approach only works for rapid-variation
exposed structures that leads to mixed populations of mechanisms.
cells. This phenomenon is known as phase variation A second common mechanism for randomly varying
and has been extensively studied for many years10. The the expression of surface structures is DNA inversion.
simplest mechanism of this switching is the random Here, a small section of DNA is inverted by a cleavage-
change in length (during DNA replication) of short and-ligation reaction that is mediated by a site-specific
repetitive tracts of bases. The genome sequencing of recombinase. At its simplest, the effect of this inversion
Campylobacter jejuni, a bird commensal and human can be to alter the direction of transcription from a pro-
pathogen, identified regions of the chromosome where moter within the inverted segment, thereby switching
different reads showed a different number of bases in a on or off the expression of genes that are downstream of
particular run of the same base35 (FIG. 4a). Investigations the promoter. Again, this mechanism has been known
ruled out experimental errors as the source of this vari- and studied for some time, most notably in the control
ation and showed that the effect was due to the shotgun of phase variation of the flagellar antigen of Salmonella
library sampling of individual cells, which, in fact, had enterica serovar Typhimurium72, but population sam-
different sequences at these specific loci. The context of pling through single-organism metagenomic sequenc-
these variants was investigated, and most were found to ing allows its presence and extent to be immediately
be within protein-coding sequences and had the effect identified within whole genomes directly from the
of switching the reading-frame of the gene such that it sequence data.
One good example of this is Bacteroides fragilis, a biological assumptions. We look forward to developments
human commensal and opportunistic pathogen38,73. in the new discipline of systems microbiology, a research
Genome sequencing revealed that this mechanism was area that aims to accurately describe the dynamics of
active, rapid and used to control the expression not just of the microbial world with the assistance of mathematical
surface proteins but of entire operons that encode surface- models.
polysaccharide production, which can be switched on or In the meantime, as our knowledge increases, we real-
off as single entities with a single promoter (FIG. 4b). The ize that we are just scratching the surface of the microbial
genome sequencing of B. fragilis also directly showed that world. We still have much to learn about non-cultivable
this mechanism is used not only to alter the expression of microorganisms, human and animal metagenomes and
genes, but also to exchange parts of the coding sequences the metagenomes of soil, oceans and extreme environ-
of expressed genes with sequences from silent cassettes, ments. The deposition into GenBank (see Further infor-
thereby altering the encoded protein sequence (FIG. 4c). mation) of 1.8 million genes from a single environmental
This allows the random expression of different alterna- investigation77 — twice the size of the entire pre-existing
tive proteins from a single locus, a process that has also gene database — is an example of our underestimation
been described (and elucidated from genomic shotgun of the diversity of the microbial gene repertoire. In this
sequences) in Mycoplasma pulmonis74. study, the analysis of the extraordinary amount of frag-
A fundamental level of diversity between bacterial mented sequence data required the development of ad
genomes is at the level of the single base. It is clear that hoc strategies for sequence reconstruction and clustering
bacteria can generate diversity at this level by point muta- into protein families to reduce the dataset into a trac-
tion and DNA recombination. Mechanisms have also table form78. Unsupervised methods of protein-family
been uncovered that use, for example, reverse transcrip- generation from complete genome data have also been
tion to generate point mutations in targeted regions; this proposed, to enable high-throughput protein classifica-
was initially described in a bacteriophage of Bordetella tion to be achieved without human intervention79. The
bronchiseptica75. Metagenomic analysis of shotgun data question for contemporary microbiologists is how can
from Tropheryma whipplei, a human pathogen, has also we productively apply the avalanche of new information
provided evidence of single-base diversity generation without getting lost in the details?
on a genomic level39. In this case, the rate of diversity Some applications of the genomic era can already be
generation by the organism in its natural habitat was less seen. One of the first of these, reverse vaccinology80,81, led to
clear-cut, as this fastidious organism had to be grown in the identification of hundreds of genes that encode anti-
human cell culture for 17 months to provide enough DNA gens which mediate protection by validated mechanisms,
for sequencing. However, it was evident that consistent and made it possible to develop vaccines against bacteria
variation between shotgun sequences could only be seen for which only a limited number of antigens was previ-
in specific locations in the genome that corresponded to ously known. However, in general, we have little informa-
the coding sequences of surface-expressed proteins. It was tion on the biological role of the novel antigens that are
proposed that the mechanism for generating variation in identified by reverse vaccinology, and further laboratory
T. whipplei is the transfer of base-pair variants from non- work is therefore needed to understand the biology of
coding repeats elsewhere in the genome, possibly by a these new antigens and make more informed decisions
gene-conversion-like mechanism. on the use of vaccines that are based on these targets.
It is clear therefore that it is possible to perform Similarly, new antigens usually segregate in the bacte-
metagenomics on single organisms, through careful rial population in a manner that is independent of con-
analysis of the variation within shotgun sequences and ventional markers, such as those used to define serotypes,
an understanding that this variation represents diversity and genetic markers, such as those used in MLST, and so
within an otherwise clonal population. In turn, this allows we must identify new ways to type bacteria. New typing
us to identify and examine the biological consequences systems need to incorporate whole genome sequences,
of rapid diversity-generating mechanisms in numerous including non-core genes, instead of just a few core
bacterial pathogens and commensals. genetic loci, as has been the case so far. Whether new
typing systems will be based on SNPs or a combination of
Implications for microbiology and conclusions revised MLST and SNPs, or will use novel genetic mark-
Over the past decade, genomic technologies have revolu- ers that will be selected from the analysis of complete
tionized microbiology and will probably continue to do genomes, it is still too early to know. However, it is likely
so during the next decade. The information that is being that they will follow the example of MLST in being open,
added to sequence databases is increasing exponentially internet-based collaborative models that allow researchers
and every day we are in a better position to describe from hundreds of different laboratories and countries to
microorganisms by their genome, bacterial species by contribute their data to centralized and publicly accessible
their pan-genome and even complex microbial environ- databases82.
ments by their metagenome. The pan-genome concept, Assuming this open-access paradigm also prevails in
Reverse vaccinology which was predicted by a mathematical model before it the forthcoming genome-wide bacterial population stud-
A genomic approach to vaccine
was demonstrated biologically15,76, is an example of how ies, this will produce an invaluable database of genotypic
development that searches the
entire genetic repertoire of a
mathematics is becoming increasingly important in and phenotypic characteristics that relate to each single
pathogen for protective microbiology, not only to manage new information, but isolate, thereby allowing us to perform association stud-
antigens. also to drive new discoveries by challenging conventional ies that are aimed at identifying genotypic traits, such as
specific polymorphisms, virulence factors and patho- non-cultivable microorganisms and the analysis of
genicity islands, as determinants of pathogenicity, the metagenomes of the complex environments they
carriage and enhanced or reduced transmission. inhabit will disclose information on unknown micro-
Constructing new typing systems will require the organisms, genomes and microbial communities that
collection of epidemiological data that allow us to recon- will almost certainly change the way we view micro-
struct the global population biology of each species. New biology86. Metagenomic studies performed in specific
mathematical models that are based on a comparison of niches of the human host could produce a complete
multiple whole-genome sequences have recently been pro- picture of the functional repertoire of bacterial com-
posed83 to identify homologous recombination events that munities as shaped by host–pathogen interactions and
disrupt a clonal pattern of inheritance. Correct inference would complement information on the population
of ancestry, particularly from the perspective of using the structure 64,87. Together, these data should facilitate
whole genome as a marker, is fundamental to our recon- our understanding of the molecular bases of specific
struction of a coherent and potentially complete picture interplays with the host88 and define the boundaries
of bacterial population structures. This kind of approach between pathogenic and commensal sub-populations
was used to analyse the relationship between different of the same species.
S. enterica serovars: a recent convergent evolution between In conclusion, although genomic information is
S. typhi and S. paratyphi A was suggested that was con- challenging our existing knowledge of bacterial spe-
fined to one quarter of the genome owing to a highly cies, typing systems and population biology, further
non-random pattern of homologous recombination clarification of these concepts is a challenge that is at
that, possibly, was connected with the adaptation of both the intersection of diverse disciplines. In the long term,
lineages to the interaction with the human host84. mathematics might provide an accurate description
Finally, metagenomics will probably become of the microbial world, but, in the meantime, we need
the driving force in microbiology in the future by to go back to the laboratory to try to understand the
shedding light on the ‘dark matter’ of the microbial biological relevance of the information that has been
world85. The single-cell genetic analysis of rare and generated by genomics.
1. Dighe, A. S. et al. Comparison of 16S rRNA gene 13. Urwin, R. et al. Distribution of surface protein variants 23. Roumagnac, P. et al. Evolutionary history of
sequences of genus Methanobrevibacter. BMC among hyperinvasive meningococci: implications for Salmonella typhi. Science 314, 1301–1304 (2006).
Microbiol. 4, 20 (2004). vaccine design. Infect. Immun. 72, 5955–5962 24. Sanger, F., Nicklen, S. & Coulson, A. R. DNA
2. Keswani, J. & Whitman, W. B. Relationship of 16S (2004). sequencing with chain-terminating inhibitors. Proc.
rRNA sequence similarity to DNA hybridization in 14. Caugant, D. A. et al. Genetic diversity in relation to Natl Acad. Sci. USA 74, 5463–5467 (1977).
prokaryotes. Int. J. Syst. Evol. Microbiol. 51, serotype in Escherichia coli. Infect. Immun. 49, 25. Smith, L. M. et al. Fluorescence detection in
667–678 (2001). 407–413 (1985). automated DNA sequence analysis. Nature 321,
3. Acinas, S. G. et al. Fine-scale phylogenetic architecture 15. Tettelin, H. et al. Genome analysis of multiple 674–679 (1986).
of a complex bacterial community. Nature 430, pathogenic isolates of Streptococcus agalactiae: 26. Prober, J. M. et al. A system for rapid DNA
551–554 (2004). implications for the microbial “pan-genome”. Proc. sequencing with fluorescent chain-terminating
4. Konstantinidis, K. T., Ramette, A. & Tiedje, J. M. The Natl Acad. Sci. USA 102, 13950–13955 (2005). dideoxynucleotides. Science 238, 336–341 (1987).
bacterial species definition in the genomic era. Philos. First quantification of the diversity of a single 27. Madabhushi, R. S. Separation of 4‑color DNA
Trans. R. Soc. Lond. B 361, 1929–1940 (2006). bacterial species on the basis of genomic sequencing extension products in noncovalently
5. Sacchi, C. T. et al. Sequencing of 16S rRNA gene: a sequences of multiple strains. The authors coated capillaries using low viscosity polymer
rapid tool for identification of Bacillus anthracis. introduced and defined the pan-genome concept. solutions. Electrophoresis 19, 224–230 (1998).
Emerg. Infect. Dis. 8, 1117–1123 (2002). 16. Moorhead, S. M., Dykes, G. A. & Cursons, R. T. An 28. Hall, N. Advanced sequencing technologies and their
6. Huber, H. et al. A new phylum of Archaea represented SNP-based PCR assay to differentiate between Listeria wider impact in microbiology. J. Exp. Biol. 210,
by a nanosized hyperthermophilic symbiont. Nature monocytogenes lineages derived from phylogenetic 1518–1525 (2007).
417, 63–67 (2002). analysis of the sigB gene. J. Microbiol. Methods 55, This comprehensive review describes and discusses
7. Selander, R. K. et al. Methods of multilocus enzyme 425–432 (2003). the recent advances in DNA-sequencing
electrophoresis for bacterial population genetics and 17. Robertson, G. A. et al. Identification and technologies.
systematics. Appl. Environ. Microbiol. 51, 873–884 interrogation of highly informative single nucleotide 29. Schuster, S. C. Next-generation sequencing transforms
(1986). polymorphism sets defined by bacterial multilocus today’s biology. Nature Methods 5, 16–18 (2008).
8. Caugant, D. A., Kristiansen, B. E., Froholm, L. O., sequence typing databases. J. Med. Microbiol. 53, 30. Margulies, M. et al. Genome sequencing in
Bovre, K. & Selander, R. K. Clonal diversity of 35–45 (2004). microfabricated high-density picolitre reactors. Nature
Neisseria meningitidis from a population of 18. Weissman, S. J., Moseley, S. L., Dykhuizen, D. E. & 437, 376–380 (2005).
asymptomatic carriers. Infect. Immun. 56, Sokurenko, E. V. Enterobacterial adhesins and the 31. Shendure, J. et al. Accurate multiplex polony
2060–2068 (1988). case for studying SNPs in bacteria. Trends Microbiol. sequencing of an evolved bacterial genome. Science
9. Maiden, M. C. et al. Multilocus sequence typing: a 11, 115–117 (2003). 309, 1728–1732 (2005).
portable approach to the identification of clones within 19. Read, T. D. et al. Comparative genome sequencing for 32. Bennett, S. Solexa Ltd. Pharmacogenomics 5,
populations of pathogenic microorganisms. Proc. Natl discovery of novel polymorphisms in Bacillus 433–438 (2004).
Acad. Sci. USA 95, 3140–3145 (1998). anthracis. Science 296, 2028–2033 (2002). 33. Bennett, S. T., Barnes, C., Cox, A., Davies, L. &
The first description of the MLST approach and its 20. Alland, D. et al. Modeling bacterial evolution with Brown, C. Toward the 1,000 dollars human genome.
validation for N. meningitidis. comparative‑genome‑based marker systems: Pharmacogenomics 6, 373–382 (2005).
10. Moxon, R., Bayliss, C. & Hood, D. Bacterial application to Mycobacterium tuberculosis evolution 34. Goldberg, S. M. et al. A Sanger/pyrosequencing hybrid
contingency loci: the role of simple sequence DNA and pathogenesis. J. Bacteriol. 185, 3392–3399 approach for the generation of high-quality draft
repeats in bacterial adaptation. Annu. Rev. Genet. 40, (2003). assemblies of marine microbial genomes. Proc. Natl
307–333 (2006). 21. Gutacker, M. M. et al. Genome-wide analysis of Acad. Sci. USA 103, 11240–11245 (2006).
11. Dobrindt, U., Hochhut, B., Hentschel, U. & Hacker, J. synonymous single nucleotide polymorphisms in 35. Parkhill, J. et al. The genome sequence of the food-
Genomic islands in pathogenic and environmental Mycobacterium tuberculosis complex organisms: borne pathogen Campylobacter jejuni reveals
microorganisms. Nature Rev. Microbiol. 2, 414–424 resolution of genetic relationships among closely hypervariable sequences. Nature 403, 665–668
(2004). related microbial strains. Genetics 162, 1533–1543 (2000).
Comprehensive review that presented results on (2002). 36. Gundogdu, O. et al. Re-annotation and re-analysis of
lateral gene transfer for pathogenicity islands in 22. Filliol, I. et al. Global phylogeny of Mycobacterium the Campylobacter jejuni NCTC11168 genome
pathogenic bacteria. Also showed that lateral gene tuberculosis based on single nucleotide polymorphism sequence. BMC Genomics 8, 162 (2007).
transfer is a universal mechanism of evolution. (SNP) analysis: insights into tuberculosis evolution, 37. Baker, S. & Dougan, G. The genome of Salmonella
12. Falkow, S. in Escherichia coli and Salmonella phylogenetic accuracy of other DNA fingerprinting enterica serovar Typhi. Clin. Infect. Dis. 45 (Suppl. 1),
typhimurium: Cellular and Molecular Biology (eds systems, and recommendations for a minimal 29–33 (2007).
Neidhardt, F. C. et al.) 2723–2729 (ASM, standard SNP set. J. Bacteriol. 188, 759–772 38. Cerdeno-Tarraga, A. M. et al. Extensive DNA
Washington DC, 1996). (2006). inversions in the B. fragilis genome control variable
gene expression. Science 307, 1463–1465 Described a metagenomic approach that helped 83. Didelot, X. & Falush, D. Inference of bacterial
(2005). define the gene content and encoded functional microevolution using multilocus sequence data.
39. Bentley, S. D. et al. Sequencing and analysis of the attributes of the gut microbiome in healthy humans Genetics 175, 1251–1266 (2007).
genome of the Whipple’s disease bacterium through whole-genome shotgun sequencing. 84. Didelot, X., Achtman, M., Parkhill, J., Thomson,
Tropheryma whipplei. Lancet 361, 637–644 (2003). 61. Turnbaugh, P. J. et al. An obesity-associated gut N. R. & Falush, D. A bimodal pattern of relatedness
40. Braslavsky, I., Hebert, B., Kartalov, E. & Quake, S. R. microbiome with increased capacity for energy between the Salmonella Paratyphi A and Typhi
Sequence information can be obtained from single harvest. Nature 444, 1027–1031 (2006). genomes: convergence or divergence by
DNA molecules. Proc. Natl Acad. Sci. USA 100, Showed that the microbiome of obese mice has an homologous recombination? Genome Res. 17,
3960–3964 (2003). increased capacity to harvest energy from the diet 61–68 (2007).
41. Kasianowicz, J. J., Brandin, E., Branton, D. & Deamer, through metagenomic and biochemical analysis. Also 85. Marcy, Y. et al. Dissecting biological “dark matter”
D. W. Characterization of individual polynucleotide identified the gut microbiome as being a contributing with single-cell genetic analysis of rare and
molecules using a membrane channel. Proc. Natl factor to the pathophysiology of obesity. uncultivated TM7 microbes from the human mouth.
Acad. Sci. USA 93, 13770–13773 (1996). 62. Tringe, S. G. & Rubin, E. M. Metagenomics: DNA Proc. Natl Acad. Sci. USA 104, 11889–11894
42. Storm, A. J. et al. Fast DNA translocation through a sequencing of environmental samples. Nature Rev. (2007).
solid-state nanopore. Nano Lett. 5, 1193–1197 Genet. 6, 805–814 (2005). 86. Dethlefsen, L. & Relman, D. A. The importance of
(2005). This review presented the methodological advances individuals and scale: moving towards single cell
43. Storm, A. J., Chen, J. H., Zandbergen, H. W. & Dekker, that have allowed natural populations to be microbiology. Environ. Microbiol. 9, 8–10 (2007).
C. Translocation of double-strand DNA through a sequenced and gave examples of metagenomic 87. Dethlefsen, L., Eckburg, P. B., Bik, E. M. & Relman,
silicon oxide nanopore. Phys. Rev. E Stat. Nonlin. Soft studies that have used these techniques. D. A. Assembly of the human intestinal microbiota.
Matter Phys. 71, 051903 (2005). 63. Dethlefsen, L., McFall-Ngai, M. & Relman, D. A. An Trends Ecol. Evol. 21, 517–523 (2006).
44. Fraser-Liggett, C. M. Insights on biology and evolution ecological and evolutionary perspective on human– 88. Relman, D. A. Genome-wide responses of a
from microbial genome sequencing. Genome Res. 15, microbe mutualism and disease. Nature 449, pathogenic bacterium to its host. J. Clin. Invest. 110,
1603–1610 (2005). 811–818 (2007). 1071–1073 (2002).
45. Perna, N. T. et al. Genome sequence of Discussed how humans have co-evolved with their 89. Relman, D. A. & Falkow, S. The meaning and impact
enterohaemorrhagic Escherichia coli O157:H7. microbial partners and examined how evolutionary of the human genome sequence for microbiology.
Nature 409, 529–533 (2001). and ecological principles are relevant to human– Trends Microbiol. 9, 206–208 (2001).
46. Welch, R. A. et al. Extensive mosaic structure revealed microorganism relationships. 90. Turnbaugh, P. J. et al. The Human Microbiome
by the complete genome sequence of uropathogenic 64. Versalovic, J. & Relman, D. How bacterial communities Project. Nature 449, 804–810 (2007).
Escherichia coli. Proc. Natl Acad. Sci. USA 99, expand functional repertoires. PLoS Biol. 4, e430 Together with Reference 89, discusses the Human
17020–17024 (2002). (2006). Microbiome Project, with the focus on its
47. Brochet, M. et al. Genomic diversity and evolution 65. Ley, R. E., Peterson, D. A. & Gordon, J. I. Ecological potential outcomes and conceptual and
within the species Streptococcus agalactiae. Microbes and evolutionary forces shaping microbial diversity in experimental challenges.
Infect. 8, 1227–1243 (2006). the human intestine. Cell 124, 837–848 (2006). 91. Bentley, S. D. et al. Meningococcal genetic variation
48. Brzuszkiewicz, E. et al. How to become a uropathogen: 66. Eckburg, P. B. et al. Diversity of the human intestinal mechanisms viewed through comparative analysis of
comparative genomic analysis of extraintestinal microbial flora. Science 308, 1635–1638 (2005). serogroup C strain FAM18. PLoS Genet. 3, e23
pathogenic Escherichia coli strains. Proc. Natl Acad. 67. Doolittle, W. F. & Papke, R. T. Genomics and the (2007).
Sci. USA 103, 12879–12884 (2006). bacterial species problem. Genome Biol. 7, 116 92. Sacchi, C. T., Whitney, A. M., Reeves, M. W., Mayer,
49. Tettelin, H., Medini, D., Donati, C. & Masignani, V. (2006). L. W. & Popovic, T. Sequence diversity of Neisseria
Towards a universal group B Streptococcus vaccine 68. Goldenfeld, N. & Woese, C. Biology’s next revolution. meningitidis 16S rRNA genes and use of 16S rRNA
using multistrain genome analysis. Expert Rev. Nature 445, 369 (2007). gene sequencing as a molecular subtyping tool. J.
Vaccines 5, 687–694 (2006). 69. Alm, R. A. et al. Genomic-sequence comparison of Clin. Microbiol. 40, 4520–4527 (2002).
50. Hogg, J. S. et al. Characterization and modeling of the two unrelated isolates of the human gastric 93. Deng, W. et al. Comparative genomics of Salmonella
Haemophilus influenzae core and supragenomes pathogen Helicobacter pylori. Nature 397, 176– enterica serovar Typhi strains Ty2 and CT18. J.
based on the complete genomic sequences of Rd and 180 (1999). Bacteriol. 185, 2330–2337 (2003).
12 clinical nontypeable strains. Genome Biol. 8, R103 70. Parkhill, J. et al. Complete DNA sequence of a
(2007). serogroup A strain of Neisseria meningitidis Z2491. Acknowledgements
51. Hiller, N. L. et al. Comparative genomic analyses of Nature 404, 502–506 (2000). This paper was inspired by a meeting entitled ‘Microbial
seventeen Streptococcus pneumoniae strains: insights 71. Tettelin, H. et al. Complete genome sequence of population genomics: sequence, function and diversity’,
into the pneumococcal supragenome. J. Bacteriol. Neisseria meningitidis serogroup B strain MC58. which was organized by D.M., D.S. and R.R. in Siena, Italy,
189, 8186–8195 (2007). Science 287, 1809–1815 (2000). 17–19 January 2007. The authors thank M. Achtman,
52. Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral 72. Silverman, M., Zieg, J., Hilmen, M. & Simon, M. Phase S. Bentley, C. Buchreiser, D. Caugant, L.L. Cavalli Sforza,
gene transfer and the nature of bacterial innovation. variation in Salmonella: genetic analysis of a A. Covacci, J. Dunning Hotopp, D. Falush, P. Glaser,
Nature 405, 299–304 (2000). recombinational switch. Proc. Natl Acad. Sci. USA 76, J. Hacker, W. Hanage, D. Hood, M. Maiden, J. Telford and
53. Maione, D. et al. Identification of a universal group B 391–395 (1979). H. Tettelin for their contribution to the meeting and G. Corsi
Streptococcus vaccine by multiple genome screen. 73. Kuwahara, T. et al. Genomic analysis of Bacteroides for artwork.
Science 309, 148–150 (2005). fragilis reveals extensive DNA inversions regulating
The use of different GBS strains for genome cell surface adaptation. Proc. Natl Acad. Sci. USA 101, The authors declare competing financial interests: see web
analysis and screening enabled the development of 14919–14924 (2004). version for details.
a universal vaccine. 74. Chambaud, I. et al. The complete genome sequence
54. Lauer, P. et al. Genome analysis reveals pili in group B of the murine respiratory pathogen Mycoplasma
Streptococcus. Science 309, 105 (2005). pulmonis. Nucleic Acids Res. 29, 2145–2153 DATABASES
55. Handelsman, J. Metagenomics: application of (2001). Entrez Genome Project: http://www.ncbi.nlm.nih.gov/
genomics to uncultured microorganisms. Microbiol. 75. Liu, M. et al. Reverse transcriptase-mediated tropism entrez/query.fcgi?db=genomeprj
Mol. Biol. Rev. 68, 669–685 (2004). switching in Bordetella bacteriophage. Science 295, Bacillus anthracis | Bacillus cereus | Bacillus thuringiensis |
56. National Research Council. The New Science of 2091–2094 (2002). Bacteroides fragilis | Bordetella bronchiseptica |
Metagenomics: Revealing the Secrets of Our 76. Medini, D., Donati, C., Tettelin, H., Masignani, V. & Campylobacter jejuni | Candidatus Carsonella ruddii |
Microbial Planet (The National Academies, Rappuoli, R. The microbial pan-genome. Curr. Opin. Escherichia coli CFT073 | Escherichia coli K12 | Escherichia coli
Washington DC, 2007). Genet. Dev. 15, 589–594 (2005). O157:H7 | Haemophilus influenzae | Helicobacter pylori |
This book describes the applications and 77. Rusch, D. B. et al. The Sorcerer II Global Ocean Mycobacterium tuberculosis | Mycoplasma pulmonis |
challenges of metagenomics. Sampling expedition: northwest Atlantic through Neisseria meningitidis | Solibacter usitatus | S. typhi |
57. Tringe, S. G. et al. Comparative metagenomics of eastern tropical Pacific. PLoS Biol. 5, e77 (2007). Streptococcus agalactiae | Streptococcus pneumoniae |
microbial communities. Science 308, 554–557 78. Yooseph, S. et al. The Sorcerer II Global Ocean Tropheryma whipplei
(2005). Sampling expedition: expanding the universe of
58. Tyson, G. W. et al. Community structure and protein families. PLoS Biol. 5, e16 (2007). FURTHER INFORMATION
metabolism through reconstruction of microbial 79. Medini, D., Covacci, A. & Donati, C. Protein homology GenBank database: http://www.ncbi.nlm.nih.gov/Genbank/
genomes from the environment. Nature 428, 37–43 network families reveal step-wise diversification of index.html
(2004). Type III and Type IV secretion systems. PLoS Comput. Genomes OnLine Database (GOLD): http://www.
Reported the first assembled genome to emerge Biol. 2, e173 (2006). genomesonline.org/
from a metagenomic analysis of environmental 80. Rappuoli, R. Reverse vaccinology. Curr. Opin. Nature Collection on Metagenomics: http://www.nature.
samples. Microbiol. 3, 445–450 (2000). com/nature/supplements/collections/metagenomics
59. Venter, J. C. et al. Environmental genome shotgun Definition of the reverse-vaccinology approach and MLST Public Repository: http://www.mlst.net/
sequencing of the Sargasso Sea. Science 304, 66–74 its applications. Nature Reviews Microbiology Focus on Metagenomics:
(2004). 81. Pizza, M. et al. Identification of vaccine candidates http://www.nature.com/nrmicro/focus/metagenomics/
The first work in which the whole-genome shotgun- against serogroup B meningococcus by whole-genome index.html
sequencing technique was applied to an sequencing. Science 287, 1816–1820 (2000). Ribosomal Database Project II: http://rdp.cme.msu.edu/
environmental sample. Greatly improved our The first application of reverse vaccinology: ClonalFrame homepage: http://bacteria.stats.ox.ac.uk/
understanding of the complexity and variability of identified new vaccine candidates against 454 technology: http://www.454.com
the world of uncultured bacteria. N. meningitidis B. SOLiD technology: http://www.appliedbiosystems.com
60. Gill, S. R. et al. Metagenomic analysis of the human 82. Jolley, K. A., Chan, M. S. & Maiden, M. C. mlstdbNet SOLEXA technology: http://www.illumina.com
distal gut microbiome. Science 312, 1355–1359 — distributed multi-locus sequence typing (MLST) All links are active in the online pdf
(2006). databases. BMC Bioinformatics 5, 86 (2004).