
Molecular Ecology (2006) 15, 1713–1731

doi: 10.1111/j.1365-294X.2006.02882.x

Blackwell Publishing Ltd

INVITED REVIEW

Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances
JIANPING XU Department of Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada, and Institute of Tropical Medicine, Hainan Medical College, Haikuo, Hainan, China

Abstract
Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view on the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, the applications of genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have identified that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA–DNA reassociation kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicates the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses, coupled to the application and further development of high throughput technologies, is accelerating the pace of discovery in microbial ecology.
Keywords: Cryptococcus, gene genealogy, microbial diversity, microbial sex, systems microbiology
Received 22 September 2005; revision accepted 14 December 2005

Introduction
Micro-organisms have been integral to the history and function of life on Earth. They have played central roles in Earth's climatic, geological, geochemical, and biological evolution. However, until very recently, the general importance of micro-organisms has been appreciated by only a few specialists. Indeed, micro-organisms are still most often considered from an anthropocentric perspective, with attention focused on the relatively few species that cause human diseases and the potential of micro-organisms to provide useful products and services. The recent advances in genomics are offering fresh perspectives on this previously underappreciated microbial world.

Correspondence: Jianping Xu, Fax: 1-905-522-6066; E-mail: jpxu@mcmaster.ca
© 2006 Blackwell Publishing Ltd

The microbial world contains a highly heterogeneous group of organisms sharing only one common characteristic, their small size. These organisms make up two (out of three) entire Domains of life on Earth, the prokaryotic Bacteria and Archaea (Woese 1987). Within the third Domain, Eukarya, the majority of the phylogenetic diversity is contained within eukaryotic micro-organisms such as protozoa, algae, and fungi. Prokaryotic life emerged about 3.8 billion years ago, about 2 billion years before eukaryotic life arose. Currently, microbial life forms are found in virtually every imaginable ecological niche on Earth, from the tropics to the Arctic and Antarctica, from underground mines and oil fields to the stratosphere and the top of great mountains, from deserts to the Dead Sea, from above-ground hot springs to underwater hydrothermal vents.

Microbial ecology examines the diversity of micro-organisms and how micro-organisms interact with each other and with their environment to generate and to maintain such diversity. Consequently, microbial ecologists have traditionally focused on two areas of study: (i) microbial diversity, including the isolation, identification and quantification of micro-organisms in various habitats; and (ii) microbial activity, that is, what micro-organisms are doing in their habitats and how their activities contribute to the observed microbial diversity and biogeochemical cycling.

Microbial diversity in the environment can be measured by various indices such as phylogenetic diversity, species diversity, genotype diversity, and gene diversity (Box 1). Above the species level, microbial diversity is commonly quantified based on evolutionary distances among observed taxonomic groups from a specific environment (e.g. the phylogenetic diversity based on a common chronometer such as the 16S ribosomal RNA subunit). Below the species level, microbial diversity is typically described using population genetic parameters such as gene diversity and genotype diversity. Gene diversity and genotype diversity refer respectively to the probability that two randomly drawn genes and genotypes in a population will be different. At the species level, microbial diversity is measured as species diversity. There are various measures of species diversity. One commonly used measure refers to the probability that two randomly drawn individuals in an environment will be different species. This measure takes into account both the number of species (species richness) and the frequency of each species (species abundance) in the environment. Conceptually, this measure of species diversity is similar to those used for gene diversity and genotype diversity.

Species is the fundamental unit of biological classification and is critical for describing, understanding and comparing biological diversities at different levels among ecological niches. However, what constitutes a species remains controversial. For sexual organisms with the meiotic life cycle (such as the majority of plants, animals and sexual microbial eukaryotes), although over 20 species concepts exist in the literature (Mayden 1997), the most widely used is the biological species concept. In this concept, a species consists of individuals capable of interbreeding with each other to produce fertile progeny but incapable of doing so with members of other species. However, this definition is not applicable to asexual organisms lacking a regular meiotic life cycle. Such organisms include a large proportion of eukaryotic micro-organisms as well as all prokaryotes. Because most prokaryotes lack diagnostic morphological characteristics, have no meiotic sexual life cycle, but can exchange genetic material with each other in unusual ways, the biological species concept is not applicable to them.
Instead, the currently most widely accepted species concept for prokaryotes is an operational one, rooted in the degree of DNA–DNA re-association. In this definition, two strains belong to the same species when their purified genomic DNA shows at least 70% hybridization. This level of hybridization is equivalent to 94% average nucleotide identity at the whole genome scale (Konstantinidis & Tiedje 2005). It should be noted that this prokaryotic species concept does not translate well to that in plants and animals. For example, using this criterion, all members of primates (e.g. chimpanzees, orangutans, gorillas, gibbons and humans) would belong to the same species (Sibley et al. 1990). For these and other reasons, species concepts for both prokaryotes and eukaryotes are still evolving (e.g. Cohan 2004; Konstantinidis & Tiedje 2005) (Box 2).

The spatial and temporal distributions of microbial diversities are the subjects of microbial population genetics and biogeography. The patterns of distribution are often discussed in the context of environmental factors such as temperature, pH, salinity, pressure, the availabilities of water and nutrients, and the sources of energy and carbon. These ecological factors influence microbial activities and play very important roles in determining the spatial and temporal dynamics of micro-organisms in natural environments. Consequently, microbial ecologists often group micro-organisms into specific metabolic categories. For example, depending on the energy source, micro-organisms are called either phototrophs (obtaining energy from light) or chemotrophs (obtaining energy from chemicals). Among chemotrophs, if the energy sources are inorganic molecules (such as H2S, H2, NH3, and Fe2+), they are called chemolithotrophs. In contrast, if their energy sources are organic compounds, they are called chemoorganotrophs. Similarly, depending on the carbon source, micro-organisms can be either autotrophs (obtaining carbon from inorganic sources such as CO2 and HCO3−) or heterotrophs (obtaining carbon from organic compounds). Some micro-organisms, either in a free-living state or in association with other organisms, can use atmospheric nitrogen as their nitrogen source. Indeed, the diversity of microbial metabolisms extends far beyond the typical animal and plant metabolic capabilities. Even more striking are the extreme environmental conditions in which many micro-organisms are found and thrive. These conditions include extremes of pressure, pH, oxygen and metal concentration, salinity, radiation, desiccation, and temperature (Rothschild & Mancinelli 2001). For example, the nitrate-reducing chemolithoautotroph Pyrolobus fumarii can grow at temperatures of up to 113 °C (Blochl et al. 1997).

Box 1 Measures of microbial diversity in natural environments
Nucleotide diversity
Gene diversity
Genotype diversity
Species diversity
Phylogenetic diversity
Evolutionary diversity
Ecological niche diversity
Functional diversity
Morphological diversity
Structural diversity
Metabolic diversity
Metabolite diversity
Protein diversity

Box 2 The current species concepts for prokaryotes (bacteria and archaea) and eukaryotes (plants, animals and eukaryotic microbes such as fungi, protozoa and algae) are not comparable.
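The gene, genotype and species diversity measures defined earlier in this Introduction (see also Box 1) share one form: the probability that two randomly drawn items differ. The following is a minimal sketch of that calculation, not a method prescribed by the studies cited here. It assumes the two items are drawn without replacement from a finite sample, and the function names and the sample are hypothetical.

```python
from collections import Counter


def diversity(labels):
    """Probability that two items drawn at random (without replacement) differ.

    `labels` holds one label per sampled item: allele names for gene
    diversity, multilocus genotypes for genotype diversity, or species
    names for species diversity.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sampled items")
    counts = Counter(labels)
    # Number of ordered pairs that carry the same label.
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def richness(labels):
    """Number of distinct categories observed (e.g. species richness)."""
    return len(set(labels))


# Hypothetical sample: 10 individuals from three species.
sample = ["sp_A"] * 5 + ["sp_B"] * 3 + ["sp_C"] * 2
print(richness(sample), round(diversity(sample), 3))
```

The same function serves all three levels because only the meaning of the labels changes. The value depends on both the number of categories and how evenly they are represented.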
Micro-organisms in the environment are commonly organized into several hierarchical levels, from simple to complex: individuals, populations, guilds (metabolically related populations), communities (sets of interacting guilds), and ecosystems. A microbial ecosystem consists of both the microbial community and its interacting biotic (macro-organisms such as plants and animals) and abiotic environmental factors (pH, temperature, inorganic and organic nutrients, etc.). While we commonly think of micro-organisms as decomposers of organic wastes and pathogens of plants, animals and humans, micro-organisms can also form mutualistic associations with each other as well as be fierce predators of other micro-organisms. For example, the minute bacterium Bdellovibrio (0.3 µm in diameter) can quickly destroy an Escherichia coli cell many times its own size (1–2 µm) (Nunez et al. 2003).

Until very recently, most of what we know about microbial diversity and microbial activity was derived from cultured microbes and ex situ laboratory experimental investigations. While such studies are essential, recent investigations using high resolution microelectronic, microscopic, and genomic tools have shown that much of what we thought we knew about our natural microbial world was in fact highly biased.

In the following sections, I will first provide a brief introduction to the main genomic methods that have been used to examine natural microbial populations and communities. This is then followed by three topics dealing with the impact of genomics on microbial ecology. The first topic is on the widespread application of DNA-based genomics technologies in microbial population studies in two specific areas: (i) the use of multilocus sequence typing to address a variety of ecological questions; and (ii) the use of representational difference analysis (RDA) to investigate genome size and gene content differences among bacterial strains. The second topic is on how genomic methods have been used to reveal unexpected microbial diversities in natural populations and communities. I pay special attention to how phylogenetic typing and metagenomics are transforming our views of the diversity and activity of micro-organisms in their natural habitats. The third topic summarizes how large-scale genome sequencing projects have provided unprecedented insights on the potential functions and activities of various groups of micro-organisms. I will conclude with a discussion on some of the long-standing unresolved questions and future perspectives.

It should be pointed out that the field of microbial ecological genomics is progressing rapidly, with thousands of publications accumulated in the last several years alone. Therefore, an exhaustive review is not possible. Instead, I have used selected examples to illustrate the impact of genomics on our current understanding of microbial ecology and its potential implications for future research.

Genomics tools
The word genomics has become a trendy term widely used by the scientific community and the general public. Originally, the term was used to describe a specific discipline in genetics that deals with mapping, sequencing and analysing genomes. A genome refers to the complete set of genes and chromosomes in an organism. While many people use genomics in this narrow sense, an increasing number of people have expanded its use to include functional analysis of entire genomes as well. These functional analytical aspects include those on whole genome RNA transcripts (called transcriptomics), proteins (proteomics), and metabolites (metabolomics). In addition, various combinations of -omics terms have recently become highly fashionable. For example, the discipline that uses genomics methods to analyse natural ecological communities has been called metagenomics, ecological genomics, community genomics, and environmental genomics. In this section, the main genomics tools and methods are briefly described with a focus on those dealing with DNA (Box 3).


Box 3 Genomic methods in microbial ecology research
DNA sequencing
Polymerase chain reaction
DNA cloning systems (plasmid, lambda-phage, cosmid, bacterial artificial chromosome or BAC, yeast artificial chromosome or YAC)
DNA re-association
Fluorescent in situ hybridization (FISH)
Micro-array technology
Representational difference analysis (RDA)
2-D gel electrophoresis
Denaturing gradient gel electrophoresis (DGGE)
Gas chromatography
Mass spectrometry
Bioinformatics

DNA sequencing
The most significant technical advance in genomics is the development of efficient, high throughput DNA-sequencing techniques and instruments. While the basic principle for DNA sequencing was established in the mid-1970s, it was not until the mid-1990s that efficient automated DNA sequencers and fluorescent dyes to tag the dideoxyribonucleotides (with one colour for each of the four types of nucleotides) were developed. At present, high throughput DNA sequencing facilities are found in most academic institutions and many molecular biology laboratories. Furthermore, faster and cheaper sequencing methods and equipment are continuously being developed. For example, the recently developed pyrosequencing protocol uses a novel fibre-optic slide of individual wells. This method could sequence 25 million bases in one 4-hour run with an accuracy of 99.96% (Margulies et al. 2005).

Polymerase chain reaction
The second tool is the polymerase chain reaction (PCR), which allows the analysis of minute amounts of DNA from laboratory and environmental sources. In combination with appropriate DNA extraction protocols, PCR allows highly selective amplification of target DNA. Indeed, the PCR technique is permeating almost every aspect of biological research, including many other DNA-based genomics techniques. As will be shown below, in combination with various gel electrophoresis techniques such as denaturing gradient gel electrophoresis (DGGE), amplification and analysis of the nuclear small ribosomal RNA gene from environmental samples have significantly enhanced our understanding and appreciation of natural microbial diversities.

DNA cloning systems
The third highly useful genomics tool for microbial ecological studies is the availability of efficient in vivo cloning systems (including cloning vectors and hosts). These systems allow the separation and amplification of individual DNA sequences from often unknown but heterogeneous gene pools. A large variety of such systems is now available to accommodate different types and sizes of DNA fragments. For example, depending on the size of fragments for cloning, the vectors may be based on plasmids (optimal range of DNA fragments 0.5–2 kb; upper limit 10 kb), bacteriophages (7–10 kb; 20 kb), cosmids or fosmids (35–40 kb; 45 kb), bacterial artificial chromosomes (BAC, 80–120 kb; 200 kb), and yeast artificial chromosomes (YAC, 200–800 kb; 1.5 Mb). Vectors with large insert capacities are ideal for studying genome organizations of unculturable micro-organisms in the environment. For example, the blooming field of metagenomics has benefited significantly from the cosmid, BAC and YAC cloning systems.

Hybridization techniques
Several other traditional DNA analytical techniques have also been widely used in microbial ecological studies. These include DNA re-association kinetic analysis and fluorescent in situ hybridization (FISH). Using fluorescently tagged specific probes, FISH allows the direct observation and estimation of micro-organisms from specific species, genera, families or phyla in a given environmental sample. In contrast, the analyses of DNA re-association kinetics can be used to provide estimates on the diversity of microbial genomes in environmental DNA samples. More recently, the high throughput micro-array technology has been applied to analyse the distributions of genes and species in natural microbial consortia (Zhou 2003). DNA micro-arrays are glass surfaces to which arrays of specific DNA fragments of various lengths have been attached at discrete locations. These fragments serve as probes for hybridization. Under conditions suitable for hybridization, the DNA spots on the chip are exposed to a solution containing a complex sample of fluorescent-labelled DNA. These arrays may contain probes of lengths from 25 to several hundred or even over a thousand base pairs. While most micro-arrays are derived from single genomes, arrays containing specific genes from multiple genomes can also be very useful for studying the distributions and activities of groups of micro-organisms in nature (Zhou 2003; Lehner et al. 2005).
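The insert-size ranges quoted in the DNA cloning systems subsection can be collected into a small lookup that suggests a vector class for a given fragment size. This is only an illustrative sketch: the numbers are those given in the text, while the function names and the rule used for "possible" vectors (from the lower end of the optimal range up to the quoted upper limit) are assumptions made here.

```python
# (vector class, optimal minimum kb, optimal maximum kb, upper limit kb),
# using the figures quoted in the text (1.5 Mb written as 1500 kb).
VECTORS = [
    ("plasmid", 0.5, 2, 10),
    ("bacteriophage", 7, 10, 20),
    ("cosmid/fosmid", 35, 40, 45),
    ("BAC", 80, 120, 200),
    ("YAC", 200, 800, 1500),
]


def optimal_vectors(insert_kb):
    """Vector classes whose quoted optimal range contains the insert size."""
    return [name for name, lo, hi, _ in VECTORS if lo <= insert_kb <= hi]


def possible_vectors(insert_kb):
    """Vector classes that could take the insert under the assumed rule:
    at least the optimal minimum and no more than the quoted upper limit."""
    return [name for name, lo, _, limit in VECTORS if lo <= insert_kb <= limit]


for size in (1, 15, 40, 100, 200):
    print(size, "kb:", optimal_vectors(size), possible_vectors(size))
```

A fragment that falls between optimal ranges (15 kb, for example) has no optimal vector class in this table but still has a possible one, which is why the two functions are kept separate.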

Representational difference analysis
Because large-scale DNA sequencing is still an expensive enterprise, for most species, only one or two strains will be completely sequenced. To study variation among strains in species with sequenced representatives, a technique called representational difference analysis (RDA) has been developed (Lisitsyn et al. 1993). This method combines several molecular techniques such as DNA–DNA reassociation, selective PCR, cloning, and DNA sequencing. This technique is especially powerful for genome size and gene content comparisons among strains in prokaryotic species. This is because strains in many prokaryotes vary widely in their genome sizes and the differences often contribute to their metabolic and ecological differences (e.g. Bergthorsson & Ochman 1995, 1998; Table 1; see also the section 'Unexpected microbial diversity from environmental sources as revealed by genomics tools' below).

Tools for the analyses of the transcriptome, proteome, and metabolome
Aside from advances in techniques for analysing DNA, technical breakthroughs for analysing messenger RNA (mRNA), proteins, metabolites as well as interactions among these cellular constituents have also become common. For example, the high throughput micro-array technology has greatly increased the efficiency of genome-wide gene expression studies, allowing the analysis of potential genome–environment interactions of microbial communities in both laboratory and natural settings. Similarly, 2-D gel electrophoresis, mass spectrometry, and gas chromatography are providing unprecedented access to the constituents of microbial community proteins and small metabolites.

Bioinformatics
Of all the methods mentioned above, none would have been successful in microbial ecological research without bioinformatics tools. Broadly defined, bioinformatics refers to the use of computers to seek patterns in the observed biological data and to propose mechanisms for such patterns. As will be seen below, bioinformatics not only can help us directly address experimental research objectives but also can integrate information from various sources and seek patterns not achievable through experimentation alone.

Genomics tools in ecological genetics studies of cultured microbial populations
This section highlights the impact of DNA-based molecular techniques on our understanding of microbial diversity below the species level. I will provide examples in two specific areas. The first is on how multilocus sequence typing (MLST) has improved our understanding of microbial diversity and population structure, with a special focus on the inferences of the relative roles of clonality and recombination in generating genotype diversity in microbial populations. The second topic is on how the use of RDA can help us reveal the tremendous diversity in genome content among microbial strains.

Table 1 Genome size and ecological niche comparisons among 250 sequenced prokaryotic genomes (habitat classification and data are based on NCBI information as of August 2005)

                                    Genome size (Mb)
Habitat           No. of genomes    Mean (± SD)      Range
Terrestrial       11                4.92 (± 1.13)    3.28–7.25
Multiple          65                4.29 (± 1.87)    1.40–9.12
Aquatic           26                3.14 (± 1.60)    1.31–7.15
Host-associated   122               2.57 (± 1.64)    0.49–9.11
Specialized       23                2.29 (± 0.92)    0.71–5.37
Unknown           3                 3.47 (± 2.37)    0.80–5.31

Multilocus sequence typing
The development of highly affordable, reliable, and efficient DNA sequencing technology has accelerated many areas of scientific research. One prominent example is the multilocus sequence typing (MLST) of microbial populations. As the name suggests, MLST refers to the use of DNA sequences from multiple regions in the genome for discriminating strains in populations. Though the term was coined only in 1998 for typing human bacterial pathogens (Maiden et al. 1998), its use in microbial ecological and evolutionary analyses dates back more than two decades. It has various synonyms such as multiple gene genealogical analysis (MGGA) or comparative genealogical analysis (CGA) (e.g. Xu et al. 2000; Xu 2005). There are several advantages of analysing multiple loci over the analysis of data based on a single locus: (i) it can generate more information, and thus generally more robust conclusions; (ii) it samples multiple regions of the genome and thus results are more representative of the whole genome; and (iii) in many prokaryotes, horizontal gene transfer is very common and if the selected single gene happened to have been horizontally transferred, information derived from this gene will not be representative of other parts of the genome (Xu 2005).

Compared to other types of strain-typing methods (e.g. multilocus enzyme electrophoresis or MLEE, random amplified polymorphic DNA or RAPD, amplified fragment length polymorphisms or AFLP, restriction fragment length polymorphisms or RFLP, PCR-RFLP and PCR fingerprinting) that have been applied to analyse microbial populations, DNA sequence-based typing has many advantages. First, nucleotides in a DNA sequence are unambiguous. Such certainty is essential for many analyses. Second, nucleotides in a given DNA fragment typically share extended evolutionary history. Such sharing cannot be assumed between genetic markers in different parts of genomes, such as those obtained with other methods. Third, DNA sequences can be easily stored in and retrieved from public databases such as GenBank. The existence of such public databases makes data-sharing among investigators possible. Fourth, many analytical tools for DNA sequences are available. Indeed, many methods have been developed to infer a variety of processes governing the changes in populations and species (Xu 2005).

MLST has been used to study the ecological genetics of many microbial populations. It provides fine-scale measures of gene diversity and genotype diversity among microbial populations. These patterns of diversity have been used to infer a variety of ecological and evolutionary processes such as gene flow, cryptic speciation, hybridization, and the relative importance of clonality and recombination among analysed populations (Box 4). In human pathogenic bacteria, where much of the initial MLST work was carried out, MLST allows the identification of medically important strains and clones. There are several recent topical reviews for readers interested in MLST of human bacterial pathogens (e.g. Urwin & Maiden 2003; Feil & Enright 2004). In contrast, other environmentally more relevant groups of micro-organisms are less researched or discussed. Using specific examples, the following two subsections illustrate how MLST has been used to address microbial ecological questions.
The first subsection provides a brief description of how MLST has been used to address evolutionary divergence, dispersion, hybridization, and the origin of a population in a soil basidiomycete fungus, Cryptococcus neoformans. The second subsection highlights recent evidence for recombination in natural populations of viruses, bacteria, protozoa, algae and fungi.

MLST in C. neoformans. C. neoformans (= Filobasidiella neoformans) is a soil fungus that can cause significant infections in humans and other mammals throughout the world. This species has been traditionally classified into five serotypes: A, B, C, D, and AD. To understand the evolutionary relationships among strains, geographic populations, and serotypes and to address ecological genetic questions, a series of gene genealogy-based studies were conducted. The first analysed 34 strains from various locations around the world, including 14 serotype A strains, 7 serotype D strains, 3 serotype B strains, 5 serotype C strains, 3 serotype AD strains and 2 strains whose serotypes could not be determined (Xu et al. 2000). Fragments of four genes were analysed for each strain, three from different chromosomes of the nuclear genome and one from the mitochondrial genome. Phylogenetic analysis of each of the four genes indicated considerable divergence among serotypes A, D, B, and C, suggesting that individual serotypes A, D, B, and C are good phylogenetic species (Fig. 1). However, there was little geographic pattern of genetic variation. No correlation between geographic distance and DNA sequence divergence among strains was observed either within a serotype or in the whole analysed population. The results are consistent with recent dispersals of C. neoformans throughout the world (Xu et al. 2000; Xu 2002).

Strains of serotype AD were quite different from those of serotypes A, B, C, and D. While most strains of serotypes A, B, C, and D examined so far were haploid, strains of serotype AD are diploid or aneuploid. Furthermore, direct sequencing of PCR products from serotype AD strains often failed to yield clear chromatograms and DNA sequences. Such results suggested sequence heterogeneity within individual strains. To investigate their origin and relationships to strains of other serotypes, alleles of two different genes from strains of serotype AD were individually cloned, sequenced and compared to strains of serotypes A, B, C, and D (Xu et al. 2002; Xu & Mitchell 2003). Sequence comparisons revealed that most strains contained two different alleles, with one allele highly similar to the serotype A group and the other to the serotype D group. Further phylogenetic analyses identified that these serotype AD strains were recent hybrids between strains of serotypes A and D, and that there have been multiple hybridization events in C. neoformans (Fig. 2; Xu et al. 2002; Xu & Mitchell 2003).

A recent study applied the same MLST method to identify the origin of a Cryptococcus population responsible for an unusual outbreak in animal and human populations on Vancouver Island, British Columbia, Canada (Kidd et al. 2005). The analyses suggested that the Vancouver Island population contained at least two evolutionarily divergent elements shared by strains from many other geographic areas, consistent with cryptic speciation and recent migration observed earlier for Cryptococcus (Xu et al. 2000; Kidd et al. 2005).
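The comparison of geographic distance with DNA sequence divergence mentioned in this subsection can be illustrated with a small sketch. It is not the procedure of Xu et al. (2000), whose details are not given here. It assumes divergence is measured as the proportion of differing sites (p-distance) and uses a simple Mantel-type permutation test, a commonly used substitute. The sequences, the distances in km and all function names are hypothetical.

```python
import itertools
import math
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pearson(xs, ys):
    """Pearson correlation; fails with ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=0):
    """Correlate two symmetric distance matrices over the same strains.

    Returns the observed correlation and a one-sided permutation p-value
    obtained by shuffling strain labels of the genetic matrix.
    """
    n = len(gen)
    pairs = list(itertools.combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]

    def r_for(order):
        return pearson([gen[order[i]][order[j]] for i, j in pairs], geo_vec)

    observed = r_for(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if r_for(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical aligned fragments and pairwise geographic distances (km).
strains = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "TCGAACGTAT"]
gen = [[p_distance(a, b) for b in strains] for a in strains]
geo = [[0, 100, 200, 300], [100, 0, 100, 200], [200, 100, 0, 100], [300, 200, 100, 0]]
r, p = mantel(gen, geo, permutations=99)
print(round(r, 3), p)
```

In this toy data set divergence rises with distance by construction. A result like the one reported in the text would instead show a correlation near zero and a non-significant p-value.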

Box 4 MLST is a powerful method to address a variety of ecological issues in microbial populations
Clonality
Recombination
Speciation/historical divergence
Gene flow/dispersion/migration
Hybridization
Niche specialization
Host shifts
Adaptive evolution


Fig. 1 One most parsimonious tree for 34 isolates of Cryptococcus neoformans from each of the four gene regions sequenced. CI, consistency index; RI, retention index. Numbers above each branch are bootstrap values > 50% and based on 500 replicates. For URA5 and LAC trees, branches with > 50% of bootstrap values were also strict consensus branches. Strain designation indicates serotype, isolate name, and geographic origin (CA, California; NYC, New York City; NC, North Carolina, all from the USA). With the exception of five strains (see text), all major phylogenetic groups correspond to traditional classifications. Of the two serologically untypable strains, one (M0024) clustered consistently with the serotype D group and the other (M0053) clustered consistently with the serotype A group. Two of the three strains of serotype AD, CN110.97 and CN196.88, clustered consistently with the serotype A group, while the other (KW5) lacked a consistent affinity with any of the serotypes. Scale bar represents one nucleotide substitution. (Xu et al. 2000). Reproduced by permission.

Clonality and recombination in microbial populations. All microbes can reproduce asexually and generate clones and clonal lineages. As expected, in natural populations of all microbial species examined (including viruses, bacteria, protozoa, algae and fungi), signatures of clones and clonal lineages are commonly found. These population genetic signatures include (i) limited or lack of genetic variation among individuals, (ii) over-representation of certain genotypes, and (iii) significant associations among alleles located on the same or different genomic regions (Xu 2004). While clonal reproduction is expected and commonly observed in natural microbial populations, the importance of recombination has been rather obscure (e.g. Lenski 1993; Maynard Smith et al. 1993; Feil & Enright 2004). Unlike plants and animals where sexual reproduction (hence recombination)

can often be observed directly in nature, recombination in natural populations of micro-organisms has to be inferred using gene and genotype frequencies. The key notion of this inference is that in purely clonal populations, alleles from genes in different parts of the genome should give identical evolutionary patterns among individuals in the population and that these alleles should be in significant linkage disequilibria. In contrast, recombination would break up these associations and generate linkage equilibrium. Using MLST, congruent genealogies for genes distributed in diverse genomic locations would be consistent with clonality and incongruent genealogies suggest recombination. Over the past two decades, numerous studies have confirmed that genetic recombination is ubiquitous in natural populations of viruses, bacteria, protozoa, algae, and fungi.
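The inference described above can be illustrated with a minimal sketch, assuming haploid isolates typed at two biallelic loci. The function name and the toy genotype lists are hypothetical and are not taken from any of the cited studies; the sketch computes the standard linkage disequilibrium coefficient D and the squared correlation r2 between the two loci.

```python
def linkage_disequilibrium(genotypes):
    """Return (D, r2) for two biallelic loci in haploid genotypes.

    genotypes: list of 2-tuples such as ("A", "B"), one tuple per isolate.
    """
    n = len(genotypes)
    # Use the alleles of the first isolate as the reference allele at each locus.
    ref1, ref2 = genotypes[0]
    p1 = sum(g[0] == ref1 for g in genotypes) / n        # allele frequency, locus 1
    p2 = sum(g[1] == ref2 for g in genotypes) / n        # allele frequency, locus 2
    p12 = sum(g == (ref1, ref2) for g in genotypes) / n  # two-locus genotype frequency
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical clonal sample: only two multilocus genotypes are present.
clonal = [("A", "B")] * 10 + [("a", "b")] * 10
# Hypothetical freely recombining sample: all four combinations equally common.
recombining = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")] * 5

d_clonal, r2_clonal = linkage_disequilibrium(clonal)
d_recomb, r2_recomb = linkage_disequilibrium(recombining)
print(r2_clonal, r2_recomb)
```

In the clonal toy sample the two loci are completely associated, whereas in the recombining sample the association vanishes. Real MLST data involve more loci and alleles and need a significance test, so this is only a sketch of the logic.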

Fig. 2 One of the 10 most parsimonious trees for the 28 LAC sequences from 14 strains of serotype AD in Cryptococcus neoformans. For comparison, five representative sequences from serotype A (E1, CN-A, MMRL750, J10 and ZG280) and five from serotype D (B10, CN-D, J9, MMRL751 and MMRL757) were included in this figure. These 10 sequences were shown in Fig. 1 and represented the genetic diversity of serotypes A and D strains. Numbers above branches are bootstrap values > 50% and based on 1000 replicates. Designations for strains of serotypes A and D included the isolate name, geographic origin (CA, California; NYC, New York City, both in the USA), and serotype. For the 28 serotype AD sequences, strain designations are followed by 1 or 2 to indicate the two alleles within each strain. Midpoint rooting is used for this phylogeny but the tree topology is identical to that when serotype B or C sequences were used as outgroups. Scale bar represents one nucleotide substitution (Xu et al. 2002). Reproduced by permission.

Box 5 Genomic studies suggest all microbial populations have a clonal component. However, signatures of recombination are pervasive in natural populations of viruses, bacteria, fungi, algae and protozoa. Despite significant efforts, no ancient asexual microbes have been convincingly demonstrated.

2002). Recombination has also been observed in many bacteriophages (Hendrix 2003), plant viruses (Keese & Gibbs 1993), and animal and human viruses. Examples of human viruses exhibiting recombination include, but are not limited to, the dengue virus (Tolou et al. 2001), the human immunodeficiency virus (Yamaguchi et al. 2003), and the hepatitis B virus (Miyakawa & Mizokami 2003).

Recombination in prokaryotes. MLST has revealed abundant evidence for recombination in natural populations of prokaryotes. Well-known examples include the common human pathogens Escherichia coli, Neisseria meningitidis, Streptococcus pneumoniae, Haemophilus influenzae, and Staphylococcus aureus (e.g. Feil & Spratt 2001). Different degrees of recombination were detected in populations of these species, with E. coli and H. influenzae showing relatively low rates of recombination while N. meningitidis, Str. pneumoniae, and Sta. aureus showed high rates (Feil & Spratt 2001). In bacteria that are not human pathogens, such as the nitrogen-fixing bacterium Sinorhizobium meliloti, evidence for recombination is also pervasive (Sun S. and Xu J., unpublished). Recent comparative analyses of whole prokaryotic genomes showed that bacterial recombination often extends beyond the traditional species boundary. This phenomenon is commonly referred to as horizontal gene transfer or lateral gene transfer and includes genetic exchange between species from different genera, families,

Indeed, despite extensive investigations, and while the frequencies of recombination have been difficult to quantify, no ancient asexual microbial populations or species have been found (Box 5). Below are a few recent examples of genetic recombination identified in natural populations of representative groups of micro-organisms using MLST or whole genome sequences.

Recombination in viral populations. One of the best-known examples of viral sexuality is probably that of the influenza A virus, the causal agent of the common human flu. This virus has a genome with eight segments of single-stranded RNA. When different viral strains co-infect the same host, a large number of recombinant influenza A viruses can be produced. These recombinants generate antigenic shifts and have been credited for some of the deadliest flu epidemics in recent human history (Capua & Alexander

and occasionally across kingdoms and/or domains. Indeed, signatures of horizontal gene transfer are ubiquitous among the sequenced prokaryotic genomes (e.g. Koonin 2003).

Recombination in eukaryotic microbes. Similar to observations in natural viral and bacterial populations, molecular investigations have shown that almost all eukaryotic microbial populations carry signatures of recombination in nature. Examples include the algal species Bostrychia moritziana (West & Zuccarello 1999); pathogenic protozoan species such as Trypanosoma cruzi (the causal agent of Chagas disease, Bogliolo et al. 1996) and the malaria parasites Plasmodium falciparum (Conway et al. 1999) and Plasmodium vivax (Putaporntip et al. 2002); and fungal species such as C. neoformans, mentioned above (Xu & Mitchell 2003). Interestingly, many of the fungal species previously thought to reproduce only asexually (the Deuteromycota or Fungi Imperfecti) have been found to contain signatures of recombination in natural populations (Xu 2005). Among examined fungi, the degrees of sexuality differ greatly, from panmictic to largely clonal (James 2005; Pujol et al. 2005; Xu et al. 2005). At present, plant and human pathogens dominate the examined species in the literature. However, limited evidence from other groups of fungi suggests a similar pattern: abundant evidence for clonality and limited but unambiguous evidence for recombination (James 2005; Pujol et al. 2005; Xu et al. 2005).

specific genes in the above-mentioned species were found to play important roles in ecological adaptations such as host specificity, nutrient utilization, stress tolerance, pathogenicity, and antibiotic resistance. Below, I describe a recent example of using RDA to analyse genome size and gene content variation among strains of a nitrogen-fixing soil bacterium Sinorhizobium meliloti (Guo et al. 2005). The sequenced Si.
meliloti strain Rm1021 has a tripartite genome structure with one chromosome (3.65 Mb) and two megaplasmids, pSymA (1.35 Mb) and pSymB (1.68 Mb). Using the RDA method (Fig. 3), a large number of novel DNA sequences not present in the sequenced laboratory model strain Rm1021 of Si. meliloti were identified. In this study, we used strain Rm1021 as the driver and the type strain of Si. meliloti ATCC9930, which has a genome 370 kb larger than that of strain Rm1021, as the tester. Among the 85 novel DNA fragments examined, 55 showed no obvious homologues anywhere in the public databases. Of the remaining 30 sequences, 24 contained homologues to the Rm1021 genome as well as unique segments not found in the Rm1021 genome; 3 contained sequences homologous to those published for another Si. meliloti strain but absent in Rm1021; 2 contained sequences homologous to other symbiotic nitrogen-fixing bacteria, Rhizobium etli and Bradyrhizobium japonicum; and 1 contained a sequence with 87% sequence identity to the 6-aminohexanoate-dimer hydrolase gene on the plasmid of Pseudomonas sp. NK87. Interestingly, this protein was found capable of degrading nylon oligomers (Yomo et al. 1992; Kanagawa et al. 1993). Nylon oligomers were absent from natural environments until humans synthesized and released them very recently. The distribution of 12 of the above 85 novel sequences among a collection of 59 natural Si. meliloti strains was further analysed using PCR. The frequency of these 12 fragments among the strains varied widely, from 1.7% to 72.9% (Guo et al. 2005). Our recent experiments show that micro-arrays fabricated based on the genome sequence of model strains can also be used very effectively to examine the distributions of genes among strains (Fig. 4; Guo & Xu, unpublished; Box 6). The exact ecological roles of some of these sequences are being examined.
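The numbers quoted in this example can be cross-checked with a short worked calculation. This is only a bookkeeping sketch: the category labels are paraphrased, and the strain counts of 1 and 43 out of 59 are back-calculated assumptions that happen to reproduce the reported 1.7% and 72.9%; they are not values stated in the text.

```python
# Fragment categories reported for the 85 novel RDA fragments (labels paraphrased).
categories = {
    "no database homologue": 55,
    "partly homologous to Rm1021": 24,
    "matches another Si. meliloti strain": 3,
    "matches other nitrogen-fixing bacteria": 2,
    "matches Pseudomonas hydrolase gene": 1,
}
total_fragments = sum(categories.values())

# Rm1021 replicon sizes in Mb: chromosome + pSymA + pSymB.
rm1021_mb = 3.65 + 1.35 + 1.68
# The tester strain ATCC9930 is reported to be 370 kb larger.
atcc9930_mb = rm1021_mb + 0.370


def percent_of_strains(n_positive, n_strains=59):
    """Percentage of strains carrying a fragment, rounded to one decimal."""
    return round(100 * n_positive / n_strains, 1)


# ASSUMPTION: 1 and 43 positive strains are inferred, not reported in the text.
lowest = percent_of_strains(1)
highest = percent_of_strains(43)

print(total_fragments, round(rm1021_mb, 2), round(atcc9930_mb, 2), lowest, highest)
```

Under these assumptions the category counts sum to the 85 fragments examined, and the tester genome comes to roughly 7 Mb.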

RDA in analysing prokaryotic gene content differences among strains


Variations in genome size among strains within and between species are common in bacteria. For example, the genomes of natural isolates of the common bacterium Escherichia coli can vary by more than 1 Mb (Parkhill & Thomson 2004). Among the serotypes of another common bacterium, Salmonella enterica (var. enteritidis; var. paratyphi; var. typhi, and var. typhimurium), chromosome sizes can differ by 300 kb (Parkhill & Thomson 2004). Among the sequenced prokaryotic species, genome sizes vary more than 18-fold, from the obligate archaeal parasite Nanoarchaeum equitans, which has a genome of about 490 kb (Waters et al. 2003), to the soil bacterium Streptomyces avermitilis, which has a genome of over 9000 kb (Omura et al. 2001). The genomic differences among sequenced bacterial species will be discussed in a later section. In this section, the focus is on analysing the naturally occurring differences among bacterial strains within species. To date, the focus has been on human pathogenic bacteria, including N. meningitidis (Bart et al. 2000), Neisseria gonorrhoeae (Tinsley & Nassif 1996), Vibrio cholerae (Calia et al. 1998), Bordetella spp. (23), and E. coli (Allen et al. 2001). Using RDA and downstream functional characterization, many strain

Unexpected microbial diversity from environmental sources as revealed by genomics tools


In this section, I will focus on how modern genomics tools are helping us to reveal microbial diversity in natural microbial communities. Until very recently, microbial diversity in the environment was estimated using culture-dependent approaches. However, for two reasons, culture-dependent methods cannot accurately describe naturally occurring microbial communities. First, our current culturing methods target only those we know how

Fig. 3 Overview of the representational difference analysis of genomic differences between strains of Sinorhizobium meliloti (modified from Guo et al. in press). Tester (T): ATCC9930. Driver (D): Rm1021. Filled black boxes: DNA adaptors. Unfilled boxes: tester DNA. Shaded boxes: driver DNA.

Fig. 4 Application of micro-arrays in the analysis of genomic differences between strains of Sinorhizobium meliloti. In this figure, red represents hybridization signal from one strain; green represents hybridization signal from a different strain; and yellow indicates that both strains have the probe sequence. In each of the four subarrays, there are three vertically divided repeats. As can be seen from the arrays, the repeatability of using micro-arrays to screen for gene content differences among strains is high.

Box 6 Representational difference analysis (RDA) and micro-array technology are powerful methods for discovering whole genome differences among natural prokaryotic strains.

to culture. For most unknown micro-organisms, we simply do not know how to grow them. Second, even among culturable micro-organisms, the observed diversity on standard microbiological media may not be representative

of that in nature. This is because, while thousands of media and growth conditions have been developed over the years to culture various micro-organisms, very few researchers have the facilities or manpower to test all of these conditions on natural microbial samples. The application of culture-independent genomics tools in the last two decades is allowing more accurate estimations. Below, I provide a summary to show how four specific methods (phylogenetic analysis of the ribosomal RNA (rRNA) genes, DNA-DNA re-association kinetics, metagenomics, and micro-arrays) have been used to reveal microbial diversity in natural environments.


Phylogenetic analyses of environmental ribosomal RNA


The use of culture-independent methods to estimate microbial diversity in the environment started in the mid-1980s (Pace et al. 1985). The initial scheme involved isolating total DNA directly from the environment, cloning the DNA using vectors such as bacteriophage lambda, screening for clones that hybridized to rRNA probes, and sequencing the positive clones. Many types of rRNA sequences not present among cultured microbes from the same samples were identified. The incorporation of gene-specific PCR before the cloning step in the late 1980s significantly streamlined the procedure and allowed more direct estimation. The very first application of PCR in phylogenetic analysis of mixed microbial communities in ocean waters led to the discovery of ubiquitous and abundant groups of new micro-organisms (Giovannoni et al. 1990). In addition, this study identified significant genetic microheterogeneity among closely related phylogenetic types. Since the beginning of the 1990s, there has been widespread application of PCR-based analyses of 16S rRNA to examine mixed microbial communities in diverse environments. Phylogenetic comparisons of rRNA genes from environmental sources have led to the discovery of many novel microbial taxonomic groups. Indeed, many new major groups of micro-organisms have been found only through culture-independent surveys. The following sections highlight recent progress for the major microbial groups (Box 7).

Bacteria. In 1987, based on rRNA sequence data, Woese identified 12 major divisions (phyla) in the Domain Bacteria. The analysed bacteria represented almost all major cultured groups of Bacteria accumulated during the previous century of microbiological research. In just over a decade, culture-independent surveys showed that there are at least 40 well-resolved major bacterial divisions. That is, there are about 30 major bacterial divisions with no or very few cultured representatives in our collection (Hugenholtz et al.
1998; Konstantinidis & Tiedje 2004). These discoveries are now guiding a coordinated effort by the microbiology community to culture representatives from many of the

unknown major divisions of Bacteria in order to study their genetic, physiological and ecological properties.

Archaea. The culture-independent methods have also revealed major new types of Archaea. At present, there are about 300 cultured and named archaeal species, primarily belonging to phylum Euryarchaeota, with a few examples from phylum Crenarchaeota, one from Nanoarchaeota and none from Korarchaeota. Schleper et al. (2005) compiled over 8000 deposited archaeal rRNA gene sequences from various natural environments. Phylogenetic analyses suggested that Domain Archaea contains at least 50 distinct phylogenetic groups with 33 from the current Euryarchaeota, 13 from Crenarchaeota, 1 each from Korarchaeota, Nanoarchaeota, and the ancient archaeal group (AAG). The divergence among these phylogenetic groups is similar to that among many bacterial phyla. Among these 50 phylogenetic groups, only 13 have cultured representatives. In addition, before the application of culture-independent methods, Archaea were thought to be present only in extreme habitats. Recent investigations have shown that Archaea are also widespread in diverse nonextreme habitats such as gardens and forests, and water and sediments in marine environments and freshwater lakes, as well as in extreme habitats such as hot springs, saline lakes and deep-ocean thermal vents (Black Smokers). For example, in the marine environment at depths of 100-5000 m, the average Archaea density is about 1 × 10^5/mL, accounting for about 20% of all microbial cells in the ocean (Karner et al. 2001). In 2002, a tiny archaeon appropriately called Nanoarchaeum was reported. This archaeon was found to live in an obligate association with another archaeon in the genus Ignicoccus. Phylogenetic analysis indicated that the rRNA sequence of Nanoarchaeum has diverged significantly from all known archaeal rRNA sequences (Huber et al. 2002).
However, it should be pointed out that a recent phylogenetic analysis using ribosomal protein gene sequences from many archaeal species suggested significant uncertainty in the placement of Nanoarchaeum in the tree of life (Brochier et al. 2005).

Eukaryotic microbes from anoxic environments. The most deeply divergent of known eukaryotic lineages are found in anaerobic or micro-aerobic environments. Ecologically and evolutionarily, this group of organisms is also the least known among eukaryotes. Anoxic environments have existed throughout the history of Earth. Therefore, such environments may harbour unknown diversity of eukaryotic microbes. Indeed, Dawson & Pace (2002) identified a very high eukaryotic diversity from both marine and freshwater sediments. Their analysis identified seven major phylogenetic lineages distinctly different from all known eukaryotic kingdoms such as fungi, plants and animals.

Fungi. Approximately 80 000 fungal species have been identified and named, and these species are grouped into

Box 7 Genomic analyses of natural microbial communities are revealing extremely rich and highly variable DNA sequences from forest soils, pastures, and aquatic environments, both pristine and contaminated. Bioinformatic analyses of such sequences suggest the existence of many uncultured taxonomic groups of viruses, bacteria, archaea, fungi and protozoa.


five main phyla: Chytridiomycota, Zygomycota, Glomeromycota, Basidiomycota, and Ascomycota (Moncalvo 2005). Several recent studies of environmental DNA identified major groups of unexpected fungal diversity in a variety of environments. For example, in the analysis of fungal DNA from the roots of the grass Arrhenatherum elatius, Vandenkoornhuyse et al. (2002) found 49 unique phylotypes from a random library of 200 18S rRNA clones. Surprisingly, only 7 of the 49 were found to be closely related to known sequences (> 99% identity). They found five distinct lineages significantly different from all known fungal sequences (in a pool of over 1200 at their time of analysis). In another study by Schadt et al. (2003), culture-independent methods were used to assess the seasonal dynamics of fungal diversity in tundra soil in Colorado. Results revealed three major groups of fungi significantly different from existing classes and phyla. Their results also demonstrated that fungi account for the majority of the biomass under snow in the analysed environment (Schadt et al. 2003). Results from these and other fungal community studies suggest that there are likely over 1.5 million species of fungi in Earth's biosphere, about 20 times the number of currently named fungal species.

Viruses. Viruses are extremely abundant in natural environments. They contribute significantly to both prokaryote and eukaryote population dynamics. Current culture-independent studies have shown that both DNA-based and RNA-based viruses are common in terrestrial as well as freshwater and marine environments (Edwards & Rohwer 2005). For example, in an analysis of picorna-like viruses (a group of positive-sense single-stranded RNA viruses that are major pathogens of plants and animals), Culley et al. (2003) identified high, unexpected diversity in the sea. Indeed, all of the picorna-like sequences from marine samples were different from known picorna-like viruses in the databases.
Of specific note is a virus isolated in this study that is a lytic pathogen of the toxic-bloom-forming alga Heterosigma akashiwo. This result suggests that picorna-like viruses may be important contributors to the regulation of marine phytoplankton population dynamics.

DNA will take longer to re-anneal. The rate of re-association can be compared with that of samples of known complexity, such as the Escherichia coli genome, to derive the total genomic complexity of environmental DNA. Torsvik et al. found that estimates of environmental genome complexity derived from DNA-DNA re-association kinetics were about 100 times higher than those derived from laboratory culture estimates (Torsvik et al. 2002). This result is similar to the comparison between phylogenetic methods based on fluorescent in situ hybridization (FISH) using signature prokaryotic sequences and culture-dependent methods (Torsvik et al. 2002). Their analyses showed that terrestrial environments generally contain higher genome complexity than aquatic sediments. Among the three terrestrial niches compared, while the number of prokaryotic cells per cubic centimetre of soil is similar (about 10 billion), the pristine pasture and forest soils contain over 10 times the genome complexity (equivalent to 3500-8800 E. coli genomes) of the agricultural field soils (equivalent to 140-350 E. coli genomes) (Torsvik et al. 2002). Recently, improved analytical methods showed that more than 1 million distinct genomes might in fact exist in the above-mentioned pristine soil, exceeding previous estimates by two orders of magnitude (Gans et al. 2005). Furthermore, it was estimated that metal pollution could reduce the genomic diversity of pristine environments by more than 99.9%, revealing the highly toxic effect of metal contamination, especially for rare microbial taxa (Gans et al. 2005).
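The logic of expressing community complexity in E. coli genome equivalents can be sketched with ideal second-order re-association kinetics, in which the fraction of DNA remaining single-stranded is 1 / (1 + C0t / C0t_half) and C0t_half scales with sequence complexity. The sketch below is a simplified illustration, not the exact procedure used by Torsvik and colleagues: the function names and the two C0t_half values are hypothetical, and the E. coli genome size is an approximate figure.

```python
ECOLI_GENOME_BP = 4.6e6  # approximate size of the E. coli K-12 genome


def fraction_single_stranded(c0t, c0t_half):
    """Ideal second-order re-association: C/C0 = 1 / (1 + C0t / C0t_half)."""
    return 1.0 / (1.0 + c0t / c0t_half)


def genome_equivalents(c0t_half_sample, c0t_half_reference):
    """Complexity of a sample relative to a reference genome.

    Assumes both were measured under identical conditions, so that
    C0t_half is directly proportional to sequence complexity.
    """
    return c0t_half_sample / c0t_half_reference


# HYPOTHETICAL half-re-association values, chosen only for illustration.
c0t_half_ecoli = 4.0
c0t_half_soil = 20000.0

equivalents = genome_equivalents(c0t_half_soil, c0t_half_ecoli)
complexity_bp = equivalents * ECOLI_GENOME_BP
half_point = fraction_single_stranded(c0t_half_soil, c0t_half_soil)

print(equivalents, complexity_bp, half_point)
```

With these made-up inputs the soil DNA would correspond to 5000 E. coli genome equivalents, which falls within the range reported above for pristine soils. Real re-association curves deviate from the ideal form when genomes differ greatly in abundance, which is part of what the later re-analysis by Gans et al. (2005) addressed.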

Metagenomics
Metagenomics refers to the study of the collective genomes in an environmental community. Such a community may be a soil or a marine water sample that contains substantially more genetic information than is available in the cultured subset. Studies of metagenomes typically involve cloning fragments of DNA isolated directly from microbes in natural environments, followed by sequencing and functional analysis of the cloned fragments. While most of the techniques for metagenomics have existed for quite some time and are used routinely in molecular biology research, their application in analysing unknown environmental DNA samples has opened a floodgate of exciting research findings. The phylogenetic analysis of environmental microbial diversity was an early form of metagenomics. Over the years, several significant trends for metagenomic studies have emerged. First, the cloned DNA fragments have been getting larger and larger in attempts to clone long stretches of DNA from the same genome to allow the study of the structure and function of potentially whole unknown/uncultured genomes in the environment. Such an objective has propelled the development of new DNA isolation methods as well as improved cloning systems. At present,

DNA-DNA re-association kinetics


DNA-DNA re-association kinetics has long been used to determine the overall genomic relationships between organisms. The current operational definition of the bacterial species concept using 70% hybridization is rooted in this kinetics. During DNA-DNA re-association, complementary single-stranded DNA molecules re-anneal to each other to form double strands, and the rate of re-annealing is positively correlated with the degree of similarity. Torsvik et al. (1998) extended this principle to analyse the complexity of environmental DNA samples. The basic idea is that more complex environmental

of oxygenic phototrophy in the ocean has been confirmed by metagenomic studies, and another phototrophy, the anoxygenic phototrophy, that was previously regarded as playing only a minor role in ocean water productivity has also been found to be very common in ocean surface waters (Beja et al. 2002). One of the most extensive microbial metagenomic studies in the ocean was the shotgun sequencing of micro-organisms in the size range 0.1 to 3.0 µm in the Sargasso Sea in the Atlantic Ocean near Bermuda (Venter et al. 2004). Their study generated almost 2 million sequence reads, yielding over 1.6 billion base pairs of raw DNA sequence. Based on sequence relatedness and unique rRNA gene counts, the analysis suggested that these DNA fragments were derived from at least 1800 genomic species including 148 previously unknown bacterial phylogenetic types. Their analysis also identified spatial variation in species richness and relative abundance among the four sampled sites (Venter et al. 2004). Computational analysis of the data identified over 1.2 million potential unique protein-coding genes. This number is astonishing considering that, at that time, only about 140 000 protein data entries were available in the curated SwissProt protein database. Among the 1.2 million potential protein-coding genes, at least 782 new rhodopsin-like photoreceptors were identified, confirming the importance of this type of phototrophy in the open sea. Among the specific groups of micro-organisms identified, one stood out. This organism, most likely a member of the genus Burkholderia, had 21-fold coverage and comprised 38.5% of the sequence data from one of the four samples.
Burkholderia is typically found in terrestrial environmental samples, and the identification of a species in this genus in the sea at such a high frequency led the authors to suggest that terrestrial environments or coastal animals might play an important role in marine microbial community structure. However, based on several lines of evidence, DeLong (2005) recently suggested that the high abundance of Burkholderia-like sequences in one sample might be due to contamination of the original water sample in the Venter et al. (2004) study. Such a revelation suggests that extreme caution should be taken when conducting microbial metagenomic analysis. Nevertheless, the reconstruction of complete genomes based on shotgun sequencing of environmental microbial community DNA indicated the power of this approach in future microbial ecology research.

Metagenomic analysis of soil microbial communities. Though microheterogeneity in aquatic environments has been found, its complexity pales in comparison with that of soil environments. Typical soil comprises mineral particles of different sizes and shapes, and attached organic compounds such as humus. The structural and chemical composition of soil determines its physical-chemical properties such as water-holding capacity, surface-to-volume ratio (hence oxygen

Box 8 Metagenomic studies have identified many novel microbial genes coding for metabolic pathways such as energy acquisition, carbon and nitrogen metabolisms in natural environments that were previously considered to lack such metabolisms.

the bacterial artificial chromosome (BAC) vector system is the most commonly used for metagenomic studies. Second, the study sites have expanded tremendously. At present, metagenomic libraries and DNA sequence information exist for microbial communities from many of the world's ecological niches. Third, the number of sequences generated in individual studies has been increasing. For example, a recent study of a marine environment obtained over 1.6 billion base pairs of DNA sequences, of which about 1.045 billion were nonredundant (Venter et al. 2004) (Box 8). Below I will briefly review and discuss recent metagenomic studies of microbial communities from the ocean, soil, and an acid mine drainage.

Metagenomic analysis of marine microbial communities. Marine microbial communities were among the first to be investigated using culture-independent genomics approaches (Giovannoni et al. 1990). Marine microbial communities are complex and contain heterogeneous micro-organisms including viruses, bacteria, archaea, and eukaryotic micro-organisms. Because of the size differences among these groups of organisms, typical studies use filters to first select the target size category of microbes. Phylogenetic analyses have identified numerous novel DNA sequences and phylogenetic groups in all groups of organisms surveyed. In combination with other genomics tools, these studies have led to other important discoveries. Two specific studies are highlighted below. In a classical metagenomic study of genome fragments from a BAC library of marine picoplankton, Beja et al. (2000) identified a new class of genes of the rhodopsin family, named proteorhodopsin, from an uncultivated gammaproteobacterium, SAR86. At that time, this rhodopsin family was known to exist only in extremely halophilic (salt-loving) archaea and had never before been observed in cultured bacteria.
Unlike the archaeal rhodopsin, which does not express properly in model laboratory strains, the proteorhodopsin gene from SAR86 expressed readily in the laboratory model bacterium E. coli and functioned as a light-driven proton pump. Later studies showed that this new type of light-driven energy generation process is in fact widespread in the ocean and that the absorption spectra of bacterial rhodopsins are optimized for different depths of ocean water (Beja et al. 2001). In addition to this form of light energy harvesting, the widespread importance

availability) within the soil, pH and the availability of various nutrients. In addition, unlike aquatic habitats, soil surfaces may undergo dramatic daily or seasonal cyclic changes in their physical-chemical properties. Such spatial and temporal environmental microheterogeneity poses significant challenges for microbial ecologists. However, recent investigations, especially those based on culture-independent approaches, are revealing the amazing diversity of micro-organisms in the soil. Many studies of soil microbial diversity have been carried out. Based on a variety of culture-independent methods, current estimates indicate that a single gram of soil may contain over 10 billion microbial cells representing several thousand to over a million distinct genomic species (e.g. Torsvik et al. 2002; Gans et al. 2005). This number is remarkable given that the total number of known prokaryotes listed on the website of the National Center for Biotechnology Information is about 17 000 (including uncultured prokaryotes). Comparisons of culture-dependent and culture-independent methods revealed that in most soil environmental samples, only 0.1-1% of microbial species are cultured by standard microbiological methods. Therefore, a tremendous amount of microbial genetic, physiological and metabolic diversity in the soil remains to be discovered and explored. Significant efforts are underway to clone and analyse the soil metagenome diversity. Daniel (2005) summarized the studies of soil metagenomic libraries constructed to date. These libraries include soil samples from a variety of ecological niches, including meadows, crop fields, and forests. Functional analyses of the soil metagenome are typically conducted by one of two approaches. The first is based on nucleotide sequences, using either PCR or target-specific probes to screen the soil metagenome library. This approach has been used successfully to clone genes with highly conserved domains, e.g.
the gluconic acid reductase, an essential enzyme during glucose metabolism (Eschenfeldt et al. 2001). The second approach is based on functional screening for metabolic activity of metagenomic clones. Several novel genes coding for proteases, lipases, amylases, agarases, alcohol oxidoreductases, antibiotics, and antibiotic resistance have been found through this screening (Voget et al. 2003). Some of these products hold great commercial potential and are actively pursued by biotechnology companies.

Metagenomic analysis of a microbial community from an acid mine drainage. Acid mine drainages are seminatural environments rich in extremophiles. These drainages are created as a result of mining and the exposure of predominantly ferrous iron in pyrite (FeS₂) to the oxygen-rich atmosphere. Iron is one of the most abundant elements in Earth's crust and exists naturally in two oxidation states, ferrous (Fe²⁺) and ferric (Fe³⁺). In nature, these two forms cycle as a result of reduction and oxidation by micro-organisms and by abiotic geochemical processes. The reduction of Fe³⁺ to Fe²⁺ occurs in anoxic environments (e.g. bogs and waterlogged soil) by bacteria such as Shewanella putrefaciens, with organic compounds in these environments acting as the electron donor. In contrast, the oxidation occurs in oxygenic environments with O₂ as the electron acceptor. Though the energy released during oxidation is small, several groups of chemolithotrophic organisms (e.g. Acidithiobacillus ferrooxidans and Leptospirillum ferrooxidans) can actively participate in the reaction and thrive in such environments by oxidizing a large amount of ferrous iron. Because pyrite (FeS₂) is one of the most common forms of iron in nature, the oxidation of pyrite will release large amounts of sulphate (SO₄²⁻) and sulfuric acid, allowing the development of acid conditions in the surrounding environment with pH values as low as 0.
Mixing of acidic mine water with natural waters in rivers and lakes causes major environmental problems. The metagenomic analysis of a single biofilm sample from an acid mine drainage at the Richmond Mine at Iron Mountain, California, has provided important insights into the microbial community structure (Tyson et al. 2004). From the 78 Mb of sequence obtained from this sample, the genomes of the dominant species were reconstructed. These included the dominant bacterium Leptospirillum group II (10X coverage) and the dominant archaeon, Ferroplasma acidarmanus (also 10X coverage). Ferroplasma is a group of cell wall-less prokaryotes. These two species were also found to be dominant in this community by other analytical methods. In addition to the above two genomes, partial genomes were also reconstructed, including that of a group III Leptospirillum (3X coverage), and an unknown species in the genus Sulfobacillus (0.5X coverage) that is closely related to the cultured Sulfobacillus thermosulfidooxidans. Bioinformatics analyses of the metagenome sequence data yielded several interesting results. First, the Leptospirillum group III strain was found to contain genes homologous to those for biological nitrogen fixation. This knowledge subsequently led to the design of a selective isolation strategy that allowed the isolation of this organism (Allen & Banfield 2005). Second, genes involved in essential pathways (such as nitrogen and carbon dioxide fixation and iron metabolism) in the above chemolithoautotrophs were revealed. Third, the genomic sequence data identified genetic polymorphisms for many genes and suggested evidence for genetic recombination in the Ferroplasma acidarmanus population of this community. The metagenome sequence information established a solid foundation for fine-scale comparisons of microbial communities.
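To give a feel for what the reported coverage depths mean, the following minimal sketch estimates the fraction of a genome expected to be sampled at least once at a given depth. It assumes the simple Poisson (Lander-Waterman style) model of uniformly random shotgun reads, which is an assumption introduced here and not a method described in the review; the function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random (Poisson model), so the
    chance that a base is missed at depth c is exp(-c).
    """
    return 1.0 - math.exp(-coverage)


# Coverage depths quoted in the text for the acid mine drainage biofilm.
reported_depths = {
    "Leptospirillum group II": 10.0,
    "Ferroplasma acidarmanus": 10.0,
    "Leptospirillum group III": 3.0,
    "Sulfobacillus sp.": 0.5,
}

for organism, depth in reported_depths.items():
    fraction = expected_fraction_covered(depth)
    print(f"{organism}: {depth:g}X -> {fraction:.1%} of bases sampled")
```

Under this idealized model, 10X depth samples essentially every base, 3X samples roughly 95%, and 0.5X only about 39%, which is consistent with the lower-abundance organisms being recovered only as partial genomes. Real communities violate the uniform-sampling assumption, so these figures are indicative only.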
In addition, a recent proteomic analysis of this community identified an abundant novel protein, a cytochrome, as an essential component of iron oxidation and acid mine drainage formation (Ram et al. 2005). These results have the potential to guide the remediation of sites contaminated by acid mine drainages.

Micro-arrays
Micro-array technology is a powerful, high throughput experimental system that allows the simultaneous analysis of thousands to hundreds of thousands of genes. Originally developed for monitoring whole-genome gene expression, micro-arrays have been used for other purposes such as genome-wide mutational screening for single nucleotide polymorphisms and the analysis of the distributions of species and strains in natural microbial communities. Recently, several types of micro-arrays have been developed and evaluated for bacterial detection and microbial community analysis. These arrays include (i) phylogenetic oligonucleotide arrays that contain signature sequences from rRNA of specific groups of organisms; (ii) community genome arrays that contain highly specific signature gene sequences from known cultured microbial species; and (iii) functional gene arrays that contain conserved domains of genes involved in specific metabolic pathways such as the biogeochemical cycling of carbon, nitrogen, sulphate, phosphate and metals (Zhou 2003). The number of genes and the sizes of arrayed DNA fragments in the functional gene arrays can vary according to analytical purposes. Preliminary evaluations suggested that micro-arrays have great potential for the detection, identification and characterization of micro-organisms in natural habitats (Wu et al. 2004). For example, Loy et al. (2002) constructed a micro-array with 132 16S rRNA-targeted oligonucleotide probes (18 nucleotides long) representing all recognized groups of sulphate-reducing prokaryotes and showed that this micro-array could be used to distinguish most of the reference strains. Using this array, they determined the diversity of sulphate-reducing prokaryotes in periodontal tooth pockets and a hypersaline cyanobacterial mat. Results from the micro-array study were similar to those from cloning and sequencing of environmental 16S rRNA. These analyses have been recently extended to other groups of organisms such as the Rhodocyclales in beta-proteobacteria (Loy et al. 2005) and Enterococcus species (Lehner et al. 2005). Despite these successes, significant challenges remain with regard to specificity, sensitivity, and quantification of microbes in natural habitats. This is mainly because microbial communities contain highly heterogeneous groups of organisms with undefined or unknown genomic relationships. The highly skewed distribution of microbial species, the potential for cross-hybridization between closely related species, the genetic variation among strains within species, and the differential efficiencies of isolating DNA from different species can all bias the results and influence the interpretation of the data. Further evaluations are needed to understand the specific experimental conditions appropriate for the analyses of various environmental samples using the different types of micro-arrays.

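The specificity and cross-hybridization issues raised for phylogenetic oligonucleotide micro-arrays can be illustrated with a toy model. The sketch below treats hybridization as nothing more than counting mismatches between a probe's binding site and every window of a target sequence. This is a deliberate simplification introduced here: real probe design depends on hybridization thermodynamics, and the 18-nucleotide probe and both target sequences are invented for illustration, not taken from any published array.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def best_mismatch_count(probe: str, target: str) -> int:
    """Fewest mismatches between the probe's binding site and any target window."""
    site = reverse_complement(probe)  # sequence the probe would pair with
    width = len(site)
    if len(target) < width:
        raise ValueError("target is shorter than the probe")
    return min(
        sum(a != b for a, b in zip(site, target[start:start + width]))
        for start in range(len(target) - width + 1)
    )


def hybridizes(probe: str, target: str, max_mismatches: int = 1) -> bool:
    """Toy rule: call a signal if the best window has few enough mismatches."""
    return best_mismatch_count(probe, target.upper()) <= max_mismatches


# Hypothetical 18-nt probe and two made-up targets (not real 16S rRNA).
probe = "ACGTTGCAAGCTTGACCA"
perfect_target = "GGATCC" + reverse_complement(probe) + "TTAGGC"
related_target = "GGATCC" + "TGGACAAGCTAGCAACGT" + "TTAGGC"  # two substitutions

print(best_mismatch_count(probe, perfect_target))
print(hybridizes(probe, related_target, max_mismatches=0))
print(hybridizes(probe, related_target, max_mismatches=2))
```

The `max_mismatches` threshold stands in for hybridization stringency: a permissive setting detects the closely related target as well, which is the cross-hybridization problem in miniature, while a strict setting risks missing strain-level variants.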
Inferences of microbial diversity and activity from completed microbial genome sequences
Micro-organisms are the first and most abundant species to be completely sequenced. While most of the original objectives for microbial genome sequencing were guided by their practical applications such as understanding disease progression mechanisms of human pathogens and the potential generation of useful products and services from these microbes, the microbial genome sequencing efforts have helped reveal much about their ecological roles in their natural environments as well as the potential genomic diversities within and between species. Currently, the sequenced microbial genomes are highly biased towards pathogens of plants, animals and humans. There are many detailed comparisons and reviews on these microbial genomes (e.g. Fraser et al. 2004). In the following paragraphs, I briefly summarize several important features with regard to the relationship among microbial genome size, gene content and their ecology (Box 9). First, microbial genome sequence comparisons have revealed that prokaryotic genomes are highly variable in

Box 9 The published 250 prokaryotic genomes as of September 2005 suggest several general features of these genomes relevant to microbial ecology:
1. Prokaryotic genomes are highly variable in genome size and gene content among strains both within and between species.
2. Microbial species with narrow ecological niches generally have smaller genomes than those with broader ecological niches.
3. A large fraction (20–40%) of identified open reading frames in sequenced microbial genomes code for proteins with unknown functions. Most of these genes are likely regulated by ecological-niche specific factors.


both genome size and gene content (Table 1). Among the 250 completely sequenced and annotated unique prokaryotic genomes (four strains were sequenced twice, for a total of 254 completed genomes as of August 2005), genome sizes vary by over 18-fold, from the smallest, the archaeon Nanoarchaeum equitans (0.49 Mb, Waters et al. 2003), to the largest, Streptomyces avermitilis (9.12 Mb, Omura et al. 2001). Genome sizes vary not only among species but also among strains within individual species. An example is the common Escherichia coli, for which whole genome sequences of four strains are now available: the model laboratory strain K12, the enterohemorrhagic O157:H7 RIMD and O157:H7 EDL933, and the uropathogenic CFT073 (Parkhill & Thomson 2004). While all three pathogenic strains have genomes essentially colinear with each other and with the nonpathogenic K12, both the genome size and gene content vary considerably among the four strains. For example, the two pathogenic O157:H7 strains have genomes over 5.5 Mb, almost 1 Mb bigger than that of strain K12 (4.6 Mb) and about 300 kb bigger than that of strain CFT073 (5.2 Mb). Furthermore, about 25% of the genes in the pathogenic O157:H7 strains were not found in strain K12. When all four strains are considered, only about 3000 genes were shared among the 4288, 5349, 5361 and 5379 predicted protein-coding genes of strains K12, O157:H7 RIMD, O157:H7 EDL933 and CFT073, respectively. Most of the extra genes have unusual sequence characteristics and were likely obtained through horizontal gene transfer events from external sources and by the action of mobile genetic elements. Some of these genes play important roles in ecological adaptation, including adhesion to specific host cell types. Comparisons between strains in other human pathogenic bacteria (e.g. Streptococcus pneumoniae and Burkholderia cepacia) as well as the nonpathogenic plant symbiont Si.
meliloti revealed similarly high variability in genome size and gene content (Fraser et al. 2004; Guo et al. 2005; Sun S., unpublished). At present, population-level studies of genome size and gene content variation are still largely limited to human pathogens. Second, species with narrow ecological niches (e.g. obligate human pathogens) on average have smaller genomes than those capable of living in diverse ecological conditions (Table 1). For example, the obligate intracellular pathogen Mycoplasma genitalium has a genome size of 580 kb (encoding 484 genes) and the aphid endosymbiont Buchnera aphidicola has a genome size of 650 kb (504 genes). These genomes lack many of the genes essential for metabolic functions in many free-living organisms. The deletion and degeneration of such genes were likely due to their nonessential functions in obligate parasites because the hosts can provide such resources to the cells. Indeed, in several obligate intracellular parasites such as Rickettsia prowazekii and Rickettsia conorii, there is evidence that their genomes are in the process of deteriorating and shrinking (Andersson 2004). Though the 250 sequenced prokaryotic genomes may not be representative of the community genomes in various natural environments, there appears to be a correlation between genome size and habitat. Among the six groups of prokaryotes classified based on habitats, those from terrestrial environments have, on average, the largest genomes, followed by prokaryotes that live in multiple habitats, in aquatic environments, and in specialized environments (Table 1). Some of the largest bacterial genomes are found in those with complex lifestyles such as the social bacterium Myxococcus xanthus (> 10 Mb), the facultative nitrogen-fixing plant symbiont Bradyrhizobium japonicum (> 9 Mb), and the antibiotic-producing, free-living soil bacteria Streptomyces (> 9 Mb).
Third, in almost all microbial genomes sequenced, a significant percentage (20–40%) of the putative open-reading frames show no obvious homology to any known proteins or to any sequences in the database, including those from other micro-organisms and macro-organisms. While one reason for this high percentage of unknown open-reading frames is our limited knowledge about the microbial world (e.g. the limited number of genomes that have been sequenced and limited knowledge about the functional properties of these sequenced genomes even in standard laboratory conditions), the ubiquitous distribution of such unknown sequences suggests their potential importance in natural environments. Indeed, a transcriptome analysis of the radiation-resistant Deinococcus radiodurans revealed that about 48% of the poorly characterized or uncharacterized genes were highly expressed in at least one experimental condition (Liu et al. 2003). Systematic investigations into the potential roles of this group of genes are now underway in the nitrogen-fixing bacterium Si. meliloti using high throughput gene knockouts, systematic screening of hundreds of growth conditions for these mutants, and genome-wide transcriptome and metabolome analyses (Finan T. et al., personal communication).
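The genome size and gene content figures quoted in the first two points can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; treating "about 3000" shared genes as exactly 3000 is a simplifying assumption, and the variable and function names are introduced here for illustration.

```python
# Genome sizes (Mb) quoted in the text for the two extremes.
smallest_mb = 0.49  # Nanoarchaeum equitans
largest_mb = 9.12   # Streptomyces avermitilis
fold_range = largest_mb / smallest_mb

# Predicted protein-coding genes for the four E. coli strains.
ecoli_genes = {
    "K12": 4288,
    "O157:H7 RIMD": 5349,
    "O157:H7 EDL933": 5361,
    "CFT073": 5379,
}
shared_genes = 3000  # "about 3000" in the text, taken as exact (assumption)


def core_fraction(total_genes: int, core: int = shared_genes) -> float:
    """Share of a strain's genes that belong to the four-strain common set."""
    return core / total_genes


print(f"Largest/smallest genome: {fold_range:.1f}-fold")
for strain, total in ecoli_genes.items():
    print(f"{strain}: {core_fraction(total):.0%} of genes shared by all four strains")

# Gene density of the two reduced genomes (genes per kb).
for name, genes, size_kb in [("Mycoplasma genitalium", 484, 580),
                             ("Buchnera aphidicola", 504, 650)]:
    print(f"{name}: {genes / size_kb:.2f} genes per kb")
```

The ratio comes out at roughly 18.6, matching the "over 18-fold" statement, and the shared set amounts to about 70% of the K12 gene complement but only about 56% of each pathogenic strain's, which makes the scale of strain-specific gene content concrete.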

Conclusions and perspectives


With the development and application of genomics tools, microbial ecology is undergoing a renaissance. Genomics tools have allowed us unprecedented access to natural microbial diversity and their potential activities. However, genomics tools have also exposed how little we know about the vast diversity of micro-organisms colonizing and transforming our planet Earth. Indeed, many fundamental questions remain to be addressed. For example, how many microbial species are there on Earth? How many unknown metabolic pathways are there in the microbial world? What is the relationship between microbial diversity and microbial activity in natural environments? Do laboratory analyses of microbial activity reflect those in natural

environments? And, how best to use microbial ecological data gained through genomic analysis in practical applications such as mining, environmental remediation, the control of infectious diseases, the modulation of the global climate, and the production of biotechnology goods and services? To address these questions, an interdisciplinary systems approach is needed. This approach requires the integration of analyses at various levels of ecological organization, from subcellular and cellular levels to those of individuals, populations, communities and ecosystems. The approach also requires the development and complementary analysis of biological variations at the genome, transcriptome, proteome and metabolome levels. Indeed, the American Society for Microbiology has issued a call to create systems microbiology and systems microbial ecology to coordinate such efforts and to set it as a priority area for future development (Buckley 2005). There is no doubt that such coordinated efforts will reveal many exciting new discoveries.
Acknowledgements
I thank Dr Hong Guo for preparing Figs 3 and 4 and Dr Turlough M. Finan for comments on the manuscript. During the preparation of this review, research in my lab was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Ontario Premier's Research Excellence Award, and Genome Canada.

References
Allen EE, Banfield JF (2005) Community genomics in microbial ecology and evolution. Nature Reviews. Microbiology, 3, 489498. Allen NL, Hilton AC, Betts R, Penn CW (2001) Use of representational difference analysis to identify Escherichia coli O157-specific DNA sequences. FEMS Microbiology Letters, 197, 195 201. Andersson SGE (2004) Obligate intracellular pathogens. In: Microbial Genomes (eds Fraser CM, Read TD, Nelson KE), pp. 291308. Humana Press, Totowa, New Jersey. Bart A, Dankert J, van der Ende A (2000) Representational difference analysis of Neisseria meningitidis identifies sequences that are specific for the hyper-virulent lineage III clone. FEMS Microbiology Letters, 188, 111114. Beja O, Aravind L, Koonin EV, Suzuki MT et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science, 289, 19021906. Beja O, Spudich EN, Spudich JL, Leclerc M, DeLong EF (2001) Proteorhodopsin phototrophy in the ocean. Nature, 411, 786 789. Beja O, Suzuki MT, Heidelberg JF et al. (2002) Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature, 415, 630633. Bergthorsson U, Ochman H (1995) Heterogeneity of genome sizes among natural isolates of Escherichia coli. Journal of Bacteriology, 177, 57845789. Bergthorsson U, Ochman H (1998) Distribution of chromosome length variation in natural isolates of Escherichia coli. Molecular Biology and Evolution, 15, 6 16. Blochl E, Rachel R, Burggraf S, Hafenbradl D, Jannasch HW, Stetter KO (1997) Pyrolobus fumarii, gen. and sp. nov., represents a novel group of archaea, extending the upper temperature limit for life to 113 degrees C. Extremophiles, 1, 1421. Bogliolo AR, Lauria-Pires L, Gibson WC (1996) Polymorphism in Trypanosoma cruzi: evidence of genetic recombination. Acta Tropica, 61, 3140. Brochier C, Gribaldo S, Zivanovic Y, Confalonieri F, Forterre P (2005) Nanoarchaea: representatives of a novel archaeal phylum or a fast-evolving euryarchaeal lineage related to Thermococcales? Genome Biology, 6, R42. Buckley MR (2005) Systems Microbiology: Beyond Microbial Genomics. ASM Report. ASM Press, Washington, D.C. Calia KE, Waldor MK, Calderwood SB (1998) Use of representational difference analysis to identify genomic differences between pathogenic strains of Vibrio cholerae. Infection and Immunity, 66, 849852. Capua I, Alexander DJ (2002) Avian influenza and human health. Acta Tropica, 83, 16. Cohan FM (2004) Concepts of bacterial biodiversity for the age of genomics. In: Microbial Genomes (eds Fraser CM, Read TD, Nelson KE), pp. 175194. Humana Press, Totowa, New Jersey. Conway DJ, Roper C, Oduola AM et al. (1999) High recombination rate in natural populations of Plasmodium falciparum. Proceedings of the National Academy of Sciences, USA, 96, 45064511. Culley AI, Lang AS, Suttle CA (2003) High diversity of unknown picorna-like viruses in the sea. Nature, 424, 10541057. Daniel R (2005) The metagenomics of soil. Nature Reviews. Microbiology, 3, 470478. Dawson SC, Pace NR (2002) Novel kingdom-level eukaryotic diversity in anoxic environments. Proceedings of the National Academy of Sciences, USA, 99, 83248329. DeLong EF (2005) Microbial community genomics in the ocean. Nature Reviews. Microbiology, 3, 459469. Edwards RA, Rohwer F (2005) Viral metagenomics. Nature Reviews. Microbiology, 3, 504510. Eschenfeldt WH, Stols L, Rosenbaum H et al. (2001) DNA from uncultured organisms as a source of 2, 5-diketo-D-gluconic acid reductases. Applied and Environmental Microbiology, 67, 4206 4214. Feil EJ, Enright MC (2004) Analyses of clonality and the evolution of bacterial pathogens. Current Opinions in Microbiology, 7, 308 313. Feil EJ, Spratt BG (2001) Recombination and the population structure of bacterial pathogens. Annual Review of Microbiology, 55, 561590. Fraser CM, Read TD, Nelson KE (2004) Microbial Genomes. Humana Press, Totowa, New Jersey. Gans J, Wolinsky M, Dunbar J (2005) Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science, 309, 13871390. Giovannoni SJ, Britschgi TB, Moyer CL, Field KG (1990) Genetic diversity in Sargasso Sea bacterioplankton. Nature, 345, 60 63. Guo H, Sun S, Finan TM, Xu J (2005) Novel DNA sequences from natural strains of the nitrogen-fixing symbiotic bacterium Sinorhizobium meliloti. Applied and Environmental Microbiology, 71, 71307138. Hendrix RW (2003) Bacteriophage genomics. Current Opinions in Microbiology, 6, 506511. Huber H, Hohn MJ, Rachel R, Fuchs T, Wimmer VC, Stetter KO (2002) A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature, 417, 6367.

Hugenholtz P, Goebel BM, Pace NR (1998) Impact of cultureindependent studies on the emerging phylogenetic view of bacterial diversity. Journal of Bacteriology, 180, 47654774. James TY (2005) The population genetics of phycomycetes. In: Evolutionary Genetics of Fungi (eds Xu J), pp. 117148 Horizon Biosciences, Norfolk, UK. Kanagawa K, Oishi M, Negoro S, Urable I, Okada H (1993) Characterization of the 6-aminohexanoate-dimer hydrolase from Pseudomonas sp. Nk87. Journal of General Microbiology, 139, 787795. Karner MB, DeLong EF, Karl DM (2001) Archaeal dominance in the mesopelagic zone of the Pacific Ocean. Nature, 409, 507510. Keese P, Gibbs A (1993) Plant viruses: master explorers of evolutionary space. Current Opinion in Genetics and Development, 3, 873877. Kidd SE, Guo H, Bartlett K, Xu J, Kronstad JW (2005) Comparative gene genealogies indicate that two clonal lineages of Cryptococcus gattii in British Columbia resemble strains from other geographical areas. Eukaryotic Cell, 4, 16291638. Konstantinidis K, Tiedje JM (2004) Microbial diversity and genomics. In: Microbial Functional Genomics (eds Zhou J, Thompson DK, Xu Y, Tiedje JM), pp. 2140. John Wiley & Sons, New Jersey. Konstantinidis KT, Tiedje JM (2005) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences, USA, 102, 25672572. Koonin EV (2003) Horizontal gene transfer: the path to maturity. Molecular Microbiology, 50, 725727. Lehner A, Loy A, Behr T et al. (2005) Oligonucleotide microarray for identification of Enterococcus species. FEMS Microbiology Letters, 246, 133142. Lenski RE (1993) Assessing the genetic structure of microbial populations. Proceedings of the National Academy of Sciences, USA, 90, 43344336. Lisitsyn N, Lisitsyn N, Wigler M (1993) Cloning the differences between two complex genomes. Science, 259, 946951. 
Liu Y, Zhou J, Omelchenko MV, Beliaev AS, Venkateswaran A, Stair J, Wu L, Thompson DK, Xu D, Rogozin IB, Gaidamakova EK, Zhai M, Makarova KS, Koonin EV, Daly MJ (2003) Transcriptome dynamics of Deinococcus radiodurans recovering from ionizing radiation. Proceedings of the National Academy of Sciences, USA, 100, 41914196. Loy A, Lehner A, Lee N et al. (2002) Oligonucleotide microarray for 16S rRNA gene-based detection of all recognized lineages of sulfate-reducing prokaryotes in the environment. Applied Environmental Microbiology, 68, 5064 5081. Loy A, Schulz C, Lucker S et al. (2005) 16S rRNA gene-based oligonucleotide microarray for environmental monitoring of the betaproteobacterial order Rhodocyclales. Applied and Environmental Microbiology, 71, 1373 1386. Maiden MC, Bygraves JA, Feil E et al. (1998) Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences, USA, 95, 31403145. Margulies M, Egholm M, Altman WE et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376380. Mayden RL (1997) A hierarchy of species concepts: the denouement in the saga of the species problem. In: Species: The Unit of Biodiversity (eds Claridge MF, Dawah HA, Wilson MR), pp. 381 424. Chapman & Hall, London. Maynard Smith J, Smith NH, ORourke M, Spratt BG (1993) How clonal are bacteria? Proceedings of the National Academy of Sciences, USA, 90, 43844388. Miyakawa Y, Mizokami M (2003) Classifying hepatitis B virus genotypes. Intervirology, 46, 329338. Moncalvo JM (2005) Molecular systematics: major fungal phylogenetic groups and fungal species concepts. In: Evolutionary Genetics of Fungi (ed. Xu J), pp. 134. Horizon BioScience, Norfolk, UK. Nunez ME, Martin MO, Duong LK, Ly E, Spain EM (2003) Investigations into the life cycle of the bacterial predator Bdellovibrio bacteriovorus 109J at an interface by atomic force microscopy. 
Biophysical Journal, 84, 33793388. Omura S, Ikeda H, Ishikawa J et al. (2001) Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proceedings of the National Academy of Sciences, USA, 98, 1221512220. Pace NR, Stahl DA, Olsen GJ, Lane DJ (1985) Analyzing natural microbial populations by rRNA sequences. ASM News, 51, 412. Parkhill J, Thomson NR (2004) The Genomes of Pathogenic Enterobacteria. In: Microbial Genomes (eds Fraser CM, Read TD, Nelson KE), pp. 269290. Humana Press, Totowa, New Jersey. Pujol C, Dodgson A, Soll DR (2005) Population genetics of ascomycetes pathogenic to humans and animals. In: Evolutionary Genetics of Fungi (eds Xu J), pp. 149188. Horizon Biosciences, Norfolk, UK. Putaporntip C, Jongwutiwes S, Sakihama N et al. (2002) Mosaic organization and heterogeneity in frequency of allelic recombination of the Plasmodium vivax merozoite surface protein-1 locus. Proceedings of the National Academy of Sciences, USA, 99, 16348 16353. Ram RJ, Verberkmoes NC, Thelen MP et al. (2005) Community proteomics of a natural microbial biofilm. Science, 308, 1915 1920. Rothschild LJ, Mancinelli RL (2001) Life in extreme environments. Nature, 409, 10921101. Schadt CW, Martin AP, Lipson DA, Schmidt SK (2003) Seasonal dynamics of previously unknown fungal lineages in tundra soils. Science, 301, 13591361. Schleper C, Jurgens G, Jonuscheit M (2005) Genomic studies of uncultured archaea. Nature Reviews. Microbiology, 3, 479488. Sibley CG, Comstock JA, Ahlquist JE (1990) DNA hybridization evidence of hominoid phylogeny: a reanalysis of the data. Journal of Molecular Evolution, 30, 202236. Tinsley CR, Nassif X (1996) Analysis of the genetic differences between Neisseria meningitidis and Neisseria gonorrhoeae: two closely related bacteria expressing two different pathogenicities. Proceedings of the National Academy of Sciences, USA, 93, 11109 11114. Tolou HJ, Couissinier-Paris P, Durand JP et al.
(2001) Evidence for recombination in natural populations of dengue virus type 1 based on the analysis of complete genome sequences. Journal of General Virology, 82, 12831290. Torsvik V, Daae FL, Sandaa RA, Ovreas L (1998) Novel techniques for analysing microbial diversity in natural and perturbed environments. Journal of Biotechnology, 64, 5362. Torsvik V, Ovreas L, Thingstad TF (2002) Prokaryotic diversity magnitude, dynamics, and controlling factors. Science, 296, 10641066. Tyson GW, Chapman J, Hugenholtz P et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature, 428, 3743. Urwin R, Maiden MCJ (2003) Multi-locus sequence typing: a tool for global epidemiology. Trends in Microbiology, 11, 479 487.

Vandenkoornhuyse P, Baldauf SL, Leyval C, Straczek J, Young JP (2002) Extensive fungal diversity in plant roots. Science, 295, 2051. Venter JC, Remington K, Heidelberg JF et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304, 6674. Voget S, Leggewie C, Uesbeck A, Raasch C, Jaeger KE, Streit WR (2003) Prospecting for novel biocatalysts in a soil metagenome. Applied and Environmental Microbiology, 69, 6235 6242. Waters E, Hohn MJ, Ahel I et al. (2003) The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proceedings of the National Academy of Sciences, USA, 100, 1298412988. West JA, Zuccarello GC (1999) Biogeography of sexual and asexual populations in Bostrychia moritziana (Rhodomelaceae, Rhodophyta). Phycological Research, 47, 115 123. Woese CR (1987) Bacterial evolution. Microbiological Reviews, 51, 221271. Wu L, Thompson DK, Liu X et al. (2004) Development and evaluation of microarray-based whole-genome hybridization for detection of microorganisms within the context of environmental applications. Environmental Science and Technology, 38, 67756782. Xu J (2002) Mitochondrial DNA polymorphisms in the human pathogenic fungus Cryptococcus neoformans. Current Genetics, 41, 4347. Xu J (2004) The prevalence and evolution of sex in microorganisms. Genome, 47, 775780. Xu J (2005) Evolutionary Genetics of Fungi. Horizon Biosciences, Norfolk, UK. Xu J, Cheng M, Tan Q, Pan Y (2005) Molecular population genetics of basidiomycete fungi. In: Evolutionary Genetics of Fungi (eds Xu J), pp. 221252. Horizon Biosciences, Norfolk, UK. Xu J, Mitchell TG (2003) Comparative gene genealogical analyses of strains of serotype AD identify recombination in populations of serotypes A and D in the human pathogenic yeast Cryptococcus neoformans. Microbiology, 149, 21472154. 
Xu J, Vilgalys R, Mitchell TG (2000) Multiple gene genealogies reveal recent dispersion and hybridization in the human pathogenic fungus Cryptococcus neoformans. Molecular Ecology, 9, 14711481. Xu J, Luo G, Vilgalys RJ, Brandt ME, Mitchell TG (2002) Multiple origins of hybrid strains of Cryptococcus neoformans with serotype AD. Microbiology, 148, 203212. Yamaguchi J, Bodelle P, Kaptue L et al. (2003) Near full-length genomes of 15 HIV type 1 group O isolates. AIDS Research and Human Retroviruses, 19, 979988. Yomo T, Urable I, Okada H (1992) No stop codons in the antisense strands of the genes for nylon oligomer degradation. Proceedings of the National Academy of Sciences, USA, 89, 37803784. Zhou J (2003) Microarrays for bacterial detection and microbial community analysis. Current Opinions in Microbiology, 6, 288 294.

J-P Xu's general research interests are in microbial ecology and evolutionary genetics. The current research in his laboratory attempts to understand the origins and maintenance of genetic variation in microorganisms. His research group examines both natural microbial populations from the environment and clinics and experimental populations evolved in the laboratory. Specifically, by using microbiological, molecular and computational tools, their research seeks to determine the rate and effect of spontaneous mutations on microbial life history traits; the rate and route of spread of microbes in natural environments and human populations; the origins of novel strains and species; and the origin and evolution of sex.

