
Plant Nuclear Genome Composition

JS (Pat) Heslop-Harrison, University of Leicester, Leicester, UK
Thomas Schmidt, Technische Universität Dresden, Dresden, Germany
The plant nuclear genome consists of DNA divided among the chromosomes within the cell nucleus. Plant genomes contain coding and regulatory sequences for the genes and repetitive DNA. Genomes are evolutionarily dynamic and analysis provides insights into the evolution of genes, sequence families and genomes, and supports studies of species phylogeny and relationships.

Introductory article
Article Contents
Deoxyribonucleic Acid and Plant Genomes
Nuclear Genomes and their Size
Chemical and Physical Composition of Plant DNA
The Packaging of the Genome: Chromatin and Chromosomes
The Genomic DNA Sequence
Plant Nuclear and Organellar Genomes
Gene Structure
Plant Nuclear Genome Composition and Its Consequences

doi: 10.1002/9780470015902.a0002014

Deoxyribonucleic Acid and Plant Genomes


The plant nuclear genome consists of deoxyribonucleic acid (DNA) and is contained within the nucleus, an organelle encased by a double membrane in each cell (Figure 1). During cell division (mitosis), the genome condenses into a characteristic number of metaphase chromosomes, the nuclear membranes break down, and the chromosomes divide, moving into the two daughter cells before the nuclei reform. A small number of plant genomes were studied in great detail in the 1990s and early in the twenty-first century, and the DNA of their genomes has been sequenced. Rice, the staple food crop in the grass family, and Arabidopsis, a small, fast-growing weed related to the cabbages, have fully sequenced genomes. Some 20 other species, mostly important crops or their relatives, have large amounts of sequence data from their genomes linked to excellent genetics and phenotypic (morphological and behavioural) knowledge. Well-characterized genomes include maize (corn), soybean, Medicago and lucerne/alfalfa (in the legume family, like peas and beans), Brassica (including rape or canola and cabbage), grape, Citrus, sugar beet, sorghum, barley (and its wheat relatives), potato and tomato, and the poplar tree. These data provide a useful basis to bring out general principles about the nature, organization and evolution of a plant nuclear genome. Most information currently relates to angiosperms, although increasing amounts of data about gymnosperms (cycads and conifers) and nonseed plants such as bryophytes (mosses, hornworts and liverworts), green algae and pteridophytes (ferns) suggest that the general pattern of plant nuclear genome composition is universal. Each plant cell also contains multiple mitochondria and plastids (chloroplasts in green tissues), each one of which has its own genome (the cytoplasmic genomes), which interact with the nuclear genome. Cells may also include other genomes such as those of viruses, bacteria or mycoplasmas that cohabit in the cell and may be infective, synergistic or parasitic. See also: Mitochondria: Structure and Role in Respiration; Plant Cell: Overview; The Cell Nucleus

Nuclear Genomes and their Size


The nuclear genome of rice consists of 450 million base pairs (Mbp) of DNA divided among the 12 chromosome pairs, and includes the genes that encode some 38 000 proteins. Along with their controlling sequences, these genes represent less than 10% of the total amount of DNA, and about half of all the DNA consists of repetitive motifs that are present thousands of times in the genome. Another fully sequenced plant genome, Arabidopsis (Arabidopsis thaliana), has a total of 157 Mbp with about 31 000 genes on five chromosome pairs. All higher plants, at the diploid level, require approximately the same number of genes and regulatory DNA sequences for physiological processes like seed germination, growth, flowering and reproduction. However, nuclear genome sizes, measured by the number of base pairs of DNA, vary enormously between plant species, although each species has a characteristic and relatively constant genome size. The amount of nuclear DNA can be given as an absolute weight of the DNA (in pg, picograms) or converted into the number of base pairs represented by that weight. The number of base pairs in an unreplicated haploid genome of higher plants, the 1C genome size, is known to range from some 70 Mbp in the carnivorous plant Genlisea up to more than 130 000 Mbp in the lily species Fritillaria assyriaca, a remarkable difference of nearly 2000 times. Some of the genome size variation is due to polyploidy, where there are multiple copies of complete chromosome sets, and it is thought that 50% or more of angiosperms are polyploid in their origin. Other variation in genome size occurs because of differences in the amount of repetitive DNA in the genome.
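The conversion between DNA weight and base pairs, and the range of genome sizes quoted above, can be illustrated with a short calculation. This is a minimal sketch in Python: the conversion factor of about 978 Mbp per picogram is an assumption brought in from the wider literature (it is not stated in this article), and the function names are chosen only for illustration.

```python
# Assumed conversion factor (not given in the article):
# 1 pg of double-stranded DNA corresponds to roughly 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a DNA weight in picograms to millions of base pairs."""
    return picograms * MBP_PER_PG


def mbp_to_pg(mbp: float) -> float:
    """Convert millions of base pairs to a DNA weight in picograms."""
    return mbp / MBP_PER_PG


def fold_difference(larger_mbp: float, smaller_mbp: float) -> float:
    """How many times larger one genome is than another."""
    return larger_mbp / smaller_mbp


# 1C genome sizes as quoted in the text (Mbp).
genome_sizes_mbp = {
    "Genlisea": 70,
    "Arabidopsis thaliana": 157,
    "rice": 450,
    "Fritillaria assyriaca": 130_000,
}

for species, size in genome_sizes_mbp.items():
    print(f"{species}: {size} Mbp is about {mbp_to_pg(size):.2f} pg")

ratio = fold_difference(genome_sizes_mbp["Fritillaria assyriaca"],
                        genome_sizes_mbp["Genlisea"])
print(f"Largest / smallest: about {ratio:.0f}-fold")
```

With the figures from the text, the ratio of the largest to the smallest genome comes out at roughly 1860-fold, consistent with the "nearly 2000 times" quoted above.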

ENCYCLOPEDIA OF LIFE SCIENCES © 2007, John Wiley & Sons, Ltd. www.els.net


Figure 1 Diagrams of the plant nuclear genome at various scales. The genome is divided among chromosomes within the cell nucleus, and consists of various classes of repetitive and single copy DNA sequence. Panels and labels:
An interphase nucleus containing the nuclear genome. The nucleus has chromatin in a decondensed state. Nucleolus: less dense area with rRNA genes expressed.
A diagram (idiogram) of one set of seven chromosomes from a cell with 2n = 2x = 14. Chromosome arms are drawn to scale showing the arm lengths in the cell, with a gap at the centromere. One chromosome has a constriction at the NOR, the site of the rRNA genes.
One chromosome from the nucleus; after replication it consists of two chromatids. Labels: long arm, centromere, short arm, telomere. The DNA sequence defining both ends of chromosomes of most plant species is TTTAGGG/CCCTAAA. Intercalary, centromeric and subterminal tandem repeats (green) and dispersed repetitive sequences such as microsatellites and retroelements (grey).
Probable structure of interphase chromatin with dense regions and loops of expressed DNA.
DNA double helix coiled around histone proteins (not shown): the nucleosome core fibre. Histone proteins are modified depending on whether they are in expressed loops or more condensed unexpressed chromatin.

Chemical and Physical Composition of Plant DNA


Each double-stranded DNA molecule is made up of four deoxynucleotides containing the bases adenine, thymine, guanine and cytosine (A, T, G and C). The number of A residues equals the number of T residues (and G = C) because of pairing of bases in the double helix, but the proportion (G+C)/(A+T+G+C), or GC content, is a characteristic of the composition of a genome: typical values for plants are 36% in Arabidopsis and 44% in rice. Plant genes usually have a few percent higher GC content in exons, and lower contents in introns, contrasting with overall base composition in bacteria or mycoplasmas where ratios vary between 25% and 75%. In plants, up to 50% of the cytosine residues in the DNA are modified after DNA replication by addition of a methyl group, giving 5-methylcytosine.

The physical properties of the DNA underpin techniques used daily by plant molecular biologists for the characterization, separation, staining and identification of DNA molecules. DNA, a hydrophilic acid molecule, is highly soluble, but can be precipitated in dilute alcohol solutions giving a white or pale precipitate. Spectrophotometry, based on the absorbance of ultraviolet (UV) light at 260 nm by DNA, is used to measure the concentration and purity of DNA in solution. Also important is the site-specific cleavage (hydrolysis) of very large DNA molecules by enzymes, mostly bacterial in origin, called restriction endonucleases, which recognize short characteristic sequence motifs of 4–8 bp and cleave the long plant genomic DNA double strand into defined fragments. Electrophoresis in a gel is used to separate fragments of chromosomal DNA based on their charge (and hence size), and many enzymatic manipulations are based on local nucleotide base composition. Key methods in gaining understanding of plant genome organization involve denaturation of the double-stranded DNA into single-stranded molecules and controlled reforming of double-stranded helices with labelled probe molecules (hybridization), achieved by altering factors such as temperature, salt concentration or acidity. This procedure on isolated DNA separated on membranes is Southern hybridization, while fluorescent in situ hybridization (FISH) is carried out on chromosome preparations using probes detected with fluorescence (Figure 2). Dyes used to stain DNA (Figures 2 and 3) interact with the molecule in various ways: intercalating between the bases, binding in the minor groove of the double helix, or interacting with the charged molecular groups.

Figure 2 Metaphase chromosomes from rye photographed under the light microscope. The DNA of the 14 chromosomes has been stained with a molecule called 4′,6-diamidino-2-phenylindole (DAPI), which fluoresces blue. Weakly staining gaps are seen at the centromere near the middle of each chromosome. Two different tandemly repeated sequences, labelled in red and green, have been hybridized to the chromosomes to show the locations of particular repeats at subterminal (green) and intercalary (red) locations on the chromosomes (scale bar 10 µm).
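Two of the ideas in this section, base composition and site-specific cleavage by restriction endonucleases, lend themselves to a small in silico sketch. The Python below is illustrative only: the function names and the example sequence are hypothetical, the EcoRI site (GAATTC, cut after the first G) is brought in as a well-known example that the article does not mention, and the expected fragment length assumes a random sequence with equal base frequencies, which real plant genomes do not have.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A, C, G and T residues of one strand."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def expected_fragment_length(site_length: int) -> int:
    """Mean spacing of a site in random DNA with equal base frequencies."""
    return 4 ** site_length


example = "AAGAATTCCCGGGAATTCTT"  # hypothetical sequence with two EcoRI sites
print(f"GC content: {gc_fraction(example):.0%}")
print("EcoRI fragments:", digest(example, "GAATTC", 1))
for n in (4, 6, 8):
    print(f"{n} bp site: one cut per {expected_fragment_length(n)} bp on average")
```

The last loop shows why the length of the recognition motif matters: under the stated assumption, a 4 bp site occurs about every 256 bp, whereas an 8 bp site occurs about every 65 536 bp, so enzymes with longer sites give fewer, larger fragments.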

The Packaging of the Genome: Chromatin and Chromosomes


In each nucleus, plant DNA is divided between chromosomes, with the precise number of these being characteristic for a species. In diploid plants (most flowering plants) the chromosomes are in pairs of homologues, each with very similar DNA sequences, one of each chromosome pair originating from the mother and one from the father. The diploid chromosome number is referred to as 2n, while the number in a gamete, the haploid number, would be n. Chromosome number is characteristic of each species, known to vary from 2n = 4 (in the daisy species Haplopappus gracilis and Brachycome lineariloba) to about 2n = 1440 in the adder's tongue fern (Ophioglossum reticulatum). Exceptionally, haploid angiosperms, or aneuploids with loss or gain of chromosomes, are found and these can be very useful for genetic analysis; in bryophytes, the haploid or gametophytic plant is the dominant photosynthetic phase of the life cycle.

Physically, metaphase chromosomes can be seen as distinct objects (Figures 1 and 2) in a dividing cell using a light microscope and suitable staining, and they consist of two parallel arms, about 1–2 µm wide, with a length of 2–15 µm depending on the species. At interphase, when the chromosomes decondense and are active in gene expression and DNA replication (cell cycle), the chromosomes are seen as an amorphous to fibrous, structured mass within the nuclear envelope (Figure 3). At interphase, the chromosomes are decondensed but highly organized and occupy territories within the nucleus, forming loops of chromatin active in gene expression (Figures 1 and 3). Different chromosome domains lie in characteristic positions: in many cells, the telomeres and centromeres are arranged at or near opposite poles of the nuclear envelope. Another example of nonrandom organization of interphase chromosomes is the nucleolus, a weakly staining structure seen by light microscopy (Figure 3) where the decondensed ribosomal ribonucleic acid (rRNA) genes are being transcribed.

Each chromosome includes one or, after replication, two double-stranded linear DNA molecules (contrasting with the circular DNA molecules of bacterial chromosomes). The length of a chromosome is characteristic for each type in a species, and may range from less than 20 Mbp to more than 900 Mbp, and if stretched to their full length, the DNA molecules would be between 7 and 300 mm long.
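The stretched-out lengths quoted above follow directly from the number of base pairs. As a minimal worked example in Python, assuming the standard rise of about 0.34 nm per base pair for B-form DNA (a value not given in this article):

```python
# Assumed rise per base pair of B-form DNA, in nanometres
# (a textbook value, not stated in the article).
NM_PER_BP = 0.34
NM_PER_MM = 1_000_000


def contour_length_mm(size_mbp: float) -> float:
    """Length in millimetres of a fully stretched DNA molecule of size_mbp Mbp."""
    base_pairs = size_mbp * 1_000_000
    return base_pairs * NM_PER_BP / NM_PER_MM


for size_mbp in (20, 900):
    print(f"{size_mbp} Mbp chromosome: about {contour_length_mm(size_mbp):.1f} mm of DNA")
```

Under this assumption a 20 Mbp chromosome corresponds to about 6.8 mm of DNA and a 900 Mbp chromosome to about 306 mm, in line with the range of 7 to 300 mm given in the text.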
This double-stranded DNA is closely associated with nuclear proteins known as histones: the DNA is wrapped around nucleosomes consisting of the octamer core of histones, and about 150 bp of DNA wrap twice around each nucleosome, and then there is a spacer (typically 10–20 bp long) before the next nucleosome (Figure 1). Since 2000, the significance of the histone proteins to gene expression has become increasingly recognized, and the modification of these proteins (which can be tracked by the careful use of antibodies linked to labels) by methylation or other post-translational changes to specific amino-acid residues has been functionally correlated with both specific modification of gene expression and broader, genome-wide modifications in chromosome activity.

Surprisingly, little is confirmed about the packaging of the DNA at higher levels: how are metres of DNA fitted into a nucleus 10–20 µm across without tangling or breaking, remaining accessible for replication and expression enzymes, and allowing condensation to metaphase chromosomes during division? Speculative diagrams often show levels of coiling and supercoiling, but experimental evidence is equivocal because of the difficulty of imaging a complex, hydrated structure where interactions of charge, salts, nuclear proteins and the DNA together give the structure.

Telomeres define the two physical ends of the double-stranded DNA helix in the chromosome. The telomere protects the chromosome from degradation, but uniquely is not replicated from a preexisting DNA template. In most plant species, the sequence is TTTAGGG, and is added to the end of the DNA molecule by the enzyme telomerase, an enzyme which has reverse transcriptase activity and includes an integral RNA template. See also: Telomeres

Figure 3 Nuclei from a hybrid cereal plant at various stages of cell division showing many circular interphase nuclei with decondensed chromatin and internal structures including lighter areas (the nucleoli with rDNA gene expression) and some chromatin fibres. Chromosomes condense from prophase (P), and metaphase chromosomes are seen in the metaphase (M), separating into the chromatids at anaphase (A) before decondensing during telophase (T) of the cell cycle. Arrow shows the nucleolus where rRNA genes are transcribed.

In most plant genomes, each chromosome has a regional centromere that is hundreds of kilobases long, visible on metaphase chromosomes as a constriction. The centromere functions to hold the two DNA molecules, condensed into chromatids, together at metaphase, and is the region where the kinetochore assembles and spindle microtubules attach to move the chromatids apart during division. While centromere functions are conserved, unlike telomeres the DNA sequences spanning centromeres show little clear conservation. However, many species have repetitive DNA elements (either tandem repeats, or sometimes retroelement-related sequences) that are conserved at all the centromeres.

The interphase nucleus, when chromosomes are decondensed, is active in gene expression and replication of its DNA. The replication and transcription enzymes open the DNA structure to allow access for DNA polymerase and the associated DNA binding transcription factors. Far from being unstructured and static, the interphase nucleus is a dynamic environment with the chromatin moving and internal structures such as the nucleolus, where the rRNA genes are transcribed, forming and fusing with other nucleoli. At interphase, the metaphase chromosomes decondense, extending to several times their metaphase length, and there are thinner chromatin loops; some regions, particularly tandemly repeated blocks of DNA visualized as heterochromatin, do not decondense and can be seen as densely staining regions by light microscopy, often described as chromocentres.


Figure 4 Components of the plant genome and their relationships. Diagram labels: plant nuclear genome; genes and single copy regulatory sequences; repetitive DNA; plastid/mitochondrial sequences; virus-origin sequences; transgene sequences; unclassified sequences; tandem repeats; dispersed repeats; rDNA (45S rRNA and 5S rRNA genes); transposable elements (DNA elements and retrotransposons); simple sequence repeats/microsatellites; centromeres, telomeres and other blocks.

The Genomic DNA Sequence


The sequence of the DNA within the genome encodes the genes (exons), and also includes the intron sequences within genes, regulatory sequences, and other DNA sequences present in low-copy number, and repetitive DNA motifs. Repetitive DNA consists of sequence motifs varying in size from dinucleotides (such as the monotonic repetition GAGAGA) to motifs longer than 10 000 bp; the motifs are repeated in copy numbers from many hundreds to hundreds of thousands in the plant genome. These repetitive sequences (some present at only one or a few sites in a genome, while other motifs are dispersed widely throughout the whole genome) make up some 50–75% of the entire DNA in a nucleus. Although sometimes referred to as 'junk' DNA, repetitive DNA is important for genome function and evolution. Some sequences have structural roles in the chromosome (such as centromeres and chromosome ends), the modification of associated histone proteins is related to chromatin packaging or epigenetic phenomena, while some repetitive DNA may be transcribed to small RNAs that are involved in genome regulation. During evolution, repetitive DNA may change very rapidly in sequence and abundance, leading to generation of diversity and divergence of genomes and speciation. The loss and gain of repeats can lead to modulation of gene expression. The investigation of repetitive DNA provides insights into the evolution of plant genomes and sequence families and supports studies of species phylogeny and relationships.

Repetitive DNA can be divided into several sequence classes (Figure 4) that differ in their organization and localization along the chromosomes, although intermediate forms of arrangement do exist. The analysis of the distribution of repeat families within a group of related plant species reveals that some repeats are amplified in a subset of chromosomes and of species. FISH using cloned repetitive DNA sequence motifs as probes is a key method to understand the physical organization of the motifs and their clustering along the chromosomes (Figure 2).

Tandemly repeated DNA


Tandemly repeated DNA units are motifs that are arranged adjacent to each other in monotonous arrays. Satellites and microsatellites are examples of tandemly repeated sequences. Some tandemly repeated DNA sequences may show species- and chromosome-specific amplification, while others are present in many species (including simple microsatellites). Satellite DNA makes up a large proportion of the heterochromatin, the chromatin fraction that remains condensed and densely packed throughout the cell cycle. Satellite DNA is an evolutionarily dynamic component of plant genomes and its abundance and genomic location may vary. When chromosomes are examined in the light microscope, tandemly repeated satellite DNA is often visualized as heterochromatin, being seen as discrete, characteristically stained regions on metaphase chromosomes and interphase chromocentres. Genetically, heterochromatin is characterized as condensed genomic regions with few genes and little activity in terms of transcription. Many satellite arrays occur at a few preferred positions on the chromosomes (pericentromeric, intercalary and subtelomeric). Some tandem repeats are coding and have a complex organization, e.g. the genes for the rRNAs and telomeric repeats.


Typical plant tandem repeat motifs fall into preferred size classes, most frequently with a length of 150–170 nucleotides and multiples of this size (see the section on The Packaging of the Genome: Chromatin and Chromosomes). Tandem repeats show variable divergence both within and between species, always in nucleotide composition and sometimes in length. In some groups of related plants, divergence of satellite repeat sequences can be analysed and used to infer relationships, as well as being suggestive of mechanisms that homogenize sequences across the genome. Unlike mammalian genomes, chromosome-specific repeats are rarely observed in plants, suggesting that a different mechanism of genome homogenization and sequence dispersal is dominant in plants. Repeats of satellite DNA families are constantly subjected to homogenization and fixation within a population and hence are characterized by a relatively high genomic turnover. As a result, repeat families can expand or contract rapidly over different evolutionary timescales, such as during speciation or plant breeding. These genomic changes can be monitored by in situ or Southern hybridization experiments, or more sensitively by polymerase chain reaction (PCR). See also: Polymerase Chain Reaction (PCR)

Microsatellites
Tandemly repeated DNA consisting of very short repeating motifs (1–5 bp) is known as microsatellites or simple sequence repeats (SSRs), and is found in most eukaryotic genomes. In situ hybridization to chromosomes using microsatellite sequences shows that most blocks of microsatellites are widely dispersed over the genome, although there are some clusters or exclusions, particularly associated with heterochromatin or other tandemly repeated satellites. Microsatellite repeats are typically flanked by DNA sequences that are often single or low-copy within the genome. The number of repeats of the microsatellite motif at each genomic site (locus) is extremely variable between accessions in most species: presumably, slipped-strand mispairing during replication results in this high variability. Many microsatellites in plants are actually complex, with two or more short motifs and some degeneracy between the flanking conserved sites. Different taxonomic groups of plants differ in the conservation of the microsatellite flanking regions: while the whole palm family mostly has conserved flanking regions, most seem to be specific to one species in cereals and grasses in the Triticeae tribe. Because of the variability in number of repeats and the conservation of the flanking sites, microsatellites provide highly informative and polymorphic markers. Variation between genotypes in the number of microsatellite repeats at a locus is detected by amplification with oligonucleotide primers homologous to parts of the flanking region by PCR.

From the late 1990s, microsatellites have become the preferred DNA-based genetic marker for genetic mapping and aspects of genome analysis. After testing, primers that amplify multiple sequences from a genome, or show no polymorphisms, are discarded, and the remaining primers can be used for genetic linkage mapping and quantification of genome diversity. In most cases, the genomic context of the detected repeats remains unknown, although the increasing amount of genomic sequence in databases means more primers are now designed from within expressed sequence tags (expressed sequence tag - simple sequence repeats, EST-SSR), next to genes, or from long stretches of DNA with known genes in bacterial artificial chromosomes (BACs). Genetic mapping of microsatellites involves the amplification of the repeat arrays using PCR with primers flanking the arrays, or also the amplification of DNA stretches between the arrays. Again, there is a difference between plant and mammalian microsatellites: in mammals, CA/GT repeats seem to be most abundant, while plants have variable lengths of 2, 3 or 4 bp repeated units. See also: Polymerase Chain Reaction (PCR)

Another type of short DNA repeat is the minisatellite, which is characterized by longer repeating units (10–50 bp). First isolated from mammalian genomes, minisatellites have been detected in many plant species although they are poorly characterized; like microsatellites they are present at multiple genomic sites. Minisatellite repeats share a GC-rich core motif and are highly polymorphic, enabling their application as markers for genome mapping.
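The way microsatellite loci are recognized in sequence data can be sketched with a regular expression. The Python below is a minimal illustration, not a published SSR-mining tool: the function name, the default thresholds and the example sequence are all hypothetical, and only perfect (uninterrupted) repeats are detected, whereas many plant microsatellites are complex or degenerate as noted above.

```python
import re


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 5,
                         min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Find perfect tandem repeats of short motifs.

    Returns (start position, repeat unit, number of repeats) for each array.
    The unit is matched lazily so the shortest repeating unit is reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Two hypothetical alleles of one locus: identical flanking sequence,
# different numbers of the GA repeat unit.
left_flank, right_flank = "CCT", "TTC"
allele_a = left_flank + "GA" * 5 + right_flank
allele_b = left_flank + "GA" * 8 + right_flank

for name, allele in (("allele A", allele_a), ("allele B", allele_b)):
    print(name, len(allele), "bp:", find_microsatellites(allele))
```

Because the flanking sequences are the same in both alleles, primers placed in them would amplify products that differ in length only by the number of repeat units, which is the basis of the marker polymorphism described above.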

Transposable elements: the source of dispersed repetitive DNA


In contrast to tandemly repeated sequences, dispersed repetitive DNA sequences are scattered throughout the genome, interspersed with other sequences and distributed over all or most chromosomes. It is becoming clear that the majority of dispersed DNA sequences originate from mobile DNA sequences, also known as transposable elements. Indeed, in a survey of genes in rice, 30% of all genes identified were transposable element-related. Mobile DNA sequence elements are separated into two major classes (classes I and II) depending on their mode of transposition. Class II elements move as DNA copies and include the Ac/Ds, En/Spm and Mu transposons and the transposon superfamily Mariner; these families are found in most plant and animal genomes. Shorter sequences derived from Mariner transposons are also highly abundant and known as miniature inverted-repeat transposable elements (MITEs). Class I transposable elements, including retrotransposons, were first found in insect and yeast genomes and can be classified along with retroviruses. Retrotransposons proliferate by reverse transcription of their own RNA molecule, which is transcribed by the cellular transcription machinery. Because of the mode of transposition, where the parental element stays integrated within the genome and new, reverse transcribed copies are integrated into the genome, retrotransposons make up the major class of repetitive DNA in many plants and are an important reason for the variability of nuclear genome size.

Two different groups of retrotransposons, the long terminal repeat (LTR) retrotransposons and the non-LTR retrotransposons, have been identified. LTR retrotransposons are flanked by long terminal repeats which range in length from a few hundred up to more than 2200 bp. Their internal domain carries the polyprotein gene which encodes a protease, RNase H, reverse transcriptase and integrase. According to the order of genes within the internal domain, LTR retrotransposons can be further subdivided into Ty1-copia-like and Ty3-gypsy-like retroelements. Both Ty1-copia and Ty3-gypsy retrotransposons vary in length from some 5–6 kb up to 13 kb. Some LTR retrotransposons have structural similarities to eukaryotic retrovirus genomes. In fact, an additional open reading frame (ORF) with similarity to transmembrane proteins resembling a virus capsid-like structure has been found in many plant LTR retrotransposons. These LTR retrotransposons form a third group, designated env-like retrotransposons, suggesting that the boundaries between retrotransposons and retroviruses are blurred, and the detection of infective plant retroviruses might be expected in the future.

Non-LTR retrotransposons are separated into long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) and are considered as ancient genome components and ancestors of LTR retroelements. Although species-specific in nucleotide structure and variable in length, all LINEs are terminated by a poly(A) tail and contain two ORFs encoding a nucleic acid binding protein (ORF1), and an endonuclease and reverse transcriptase domain (ORF2), together conferring the element's ability for autonomous retrotransposition. SINEs do not encode any reverse transcriptase activity and are only a few hundred base pairs long.

Molecular analyses of large insert clones of maize have shown that some 50% of the nuclear genome consists of various transposable DNA sequences, and Vicia species may contain up to 10^6 retrotransposon copies in their genomes. How can plants survive with such a genetic burden of potentially mutagenic transposons which may cause gene disruption or inactivation? Several mechanisms have been proposed to account for the silencing of retroelements, including inter- and intra-element recombination resulting in deletion, methylation leading to dense packaging of chromatin and reduced transcription, or degeneracy and lack of activity. Indeed, there may be advantages from the elements in silencing infective viruses through RNAi or other mechanisms, or the generation of new variation. Many retroelements are only activated in specific tissues, developmental stages or under certain stress conditions such as in vitro culture, and by insertion near or into genes, may modify or repress gene expression patterns. See also: RNA Interference (RNAi) and MicroRNAs
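The contrast between the two modes of transposition, and its consequence for genome size, can be made concrete with a deliberately simplified toy model. In this Python sketch the genome is just a list of named segments; the segment names and lengths are hypothetical, and the model ignores real complications such as excision footprints, silencing or deletion, and the ways in which class II elements can also increase in copy number.

```python
import random

# Hypothetical segment lengths in base pairs, for illustration only.
SEGMENT_LENGTHS = {"gene1": 2000, "gene2": 2000, "TE": 5000}


def transpose(genome: list[str], element: str, mode: str,
              rng: random.Random) -> list[str]:
    """Return a new genome after one transposition event.

    mode="copy": class I behaviour, the parental copy stays in place
                 and a new copy is inserted elsewhere.
    mode="cut":  simplified class II behaviour, the element is excised
                 and reinserted at a new position.
    """
    genome = list(genome)
    if mode == "cut":
        genome.remove(element)  # excise one existing copy
    elif mode != "copy":
        raise ValueError("mode must be 'copy' or 'cut'")
    genome.insert(rng.randrange(len(genome) + 1), element)
    return genome


def genome_size(genome: list[str], lengths: dict[str, int]) -> int:
    """Total length of the genome in base pairs."""
    return sum(lengths[segment] for segment in genome)


rng = random.Random(0)
start = ["gene1", "TE", "gene2"]
for mode in ("copy", "cut"):
    genome = start
    for _ in range(5):
        genome = transpose(genome, "TE", mode, rng)
    print(f"{mode}: {genome.count('TE')} TE copies, "
          f"{genome_size(genome, SEGMENT_LENGTHS)} bp")
```

In this toy model, five copy events take the genome from 9000 bp to 34 000 bp with six copies of the element, whereas five cut events leave both the copy number and the genome size unchanged.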

Plant Nuclear and Organellar Genomes


Plant cells include plastids (chloroplasts, amyloplasts and other types of plastid, typically each with 50–100 copies of the organellar genome) and mitochondria, which are thought to have arisen by integration of free-living prokaryotic organisms into the eukaryotic cell. Subsequently during evolution, many of the genes originally present in the organellar genomes have been translocated into the nuclear genome, so the genes are found in the nucleus, and the copy of the gene in the organelle has become nonfunctional or been deleted from the organellar genome: the genes moved to the nucleus produce mRNA, which makes proteins that are eventually translocated back to the organelle. This process of moving to the nuclear genome is apparently continuing in evolutionary terms since plant species differ in the number of genes transferred; furthermore, the mitochondrial DNA of plants includes many sequences of plastid origin. One of the surprises from both the rice and Arabidopsis genome projects was the discovery of large parts of the mitochondrial genome in the nucleus.

Gene Structure
Plant genes, like those of other eukaryotes, have a promoter, then a 5′ untranslated region at the start of the mRNA which runs from the transcription start site to the coding region. The coding region of the gene, which starts with a methionine codon (AUG) marking the translation initiation point, includes multiple introns and exons, with the gene ending with a stop codon, followed by the 3′ untranslated region. The 5′ untranslated region can regulate translation and mRNA stability or cellular targeting. The genes are very similar between most plants, and indeed between plants, fungi and animals: about 70% of all genes seem to be similar in all three kingdoms. Based on the similarity of DNA and protein sequences, genes with similar functions to a reference gene in one species can be isolated relatively easily from almost any plant species. From analysis of Arabidopsis and rice, most genes are now known to occur in families of related genes. As well as being important for gene regulation, duplication is of evolutionary importance since duplicate copies can accumulate mutations leading to new functions or regulation without loss of the original function, which is carried out by another copy of the gene. Some duplicated gene copies accumulate mutations and give rise to pseudogenes, which are often found in DNA sequences.


Plant Nuclear Genome Composition and Its Consequences


Knowledge of the plant genome has advanced our understanding of how different plants are related to each other, allowing their evolutionary relationships to be determined. It has proved important in measuring the genetic diversity in ecosystems, and in determining the variation present within crops and their wild relatives. The use of genetic markers, DNA tagging methods and mutations has enabled us to gain an understanding of the function of most plant genes. Increasingly, understanding of the universal features of plant genomes enables us to make use of knowledge from model species to understand the diversity of all plants.

Further Reading
Our understanding of plant genomes continues to develop rapidly. Special issues of the major journals Nature and Science devoted to the rice and Arabidopsis genomes give excellent overviews of these genomes and are perhaps the best references to current understanding. Updates are largely published via the web or in very specific research publications.
Science 296 (5 April 2002): 13146+poster (includes various Research Papers, Editorials, News/Focus, Letters, Perspectives).
Nature 408 (14 December 2000): 791–826.
As well as the sequences and functional annotation of proteins in the GenBank/EMBL/DDBJ databases, many species have genomic databases where much current information about plant genomes is described. These include Arabidopsis: The Arabidopsis Information Resource, TAIR, www.arabidopsis.org (see particularly education and outreach) and The Institute for Genomic Research, TIGR, www.tigr.org (many genome projects). The US National Science Foundation (the grant body funding major plant genome projects in the US) has the National Plant Genome Initiative and their website, www.nsf.gov, includes accessible, accurate and current information and reports (see, for example, http://www.nsf.gov/bio/pubs/reports/npgi2006/highlights.htm). Genome sizes of plants are given in the database at http://www.rbgkew.org.uk/cval/
