JS (Pat) Heslop-Harrison, University of Leicester, Leicester, UK
Thomas Schmidt, Technische Universität Dresden, Dresden, Germany
The plant nuclear genome consists of DNA divided among the chromosomes within the cell nucleus. Plant genomes contain coding and regulatory sequences for the genes and repetitive DNA. Genomes are evolutionarily dynamic and analysis provides insights into the evolution of genes, sequence families and genomes, and supports studies of species phylogeny and relationships.
Introductory article
Article Contents
. Deoxyribonucleic Acid and Plant Genomes
. Nuclear Genomes and their Size
. Chemical and Physical Composition of Plant DNA
. The Packaging of the Genome: Chromatin and Chromosomes
. The Genomic DNA Sequence
. Plant Nuclear and Organellar Genomes
. Gene Structure
. Plant Nuclear Genome Composition and Its Consequences
doi: 10.1002/9780470015902.a0002014
synergistic or parasitic. See also: Mitochondria: Structure and Role in Respiration; Plant Cell: Overview; The Cell Nucleus
ENCYCLOPEDIA OF LIFE SCIENCES © 2007, John Wiley & Sons, Ltd. www.els.net
A diagram (idiogram) of one set of seven chromosomes from a cell with 2n = 2x = 14. Chromosome arms are drawn to scale showing the arm lengths in the cell, with a gap at the centromere. One chromosome has a constriction at the NOR, the site of the rRNA genes.
The DNA sequence defining both ends of chromosomes of most plant species is TTTAGGG/CCCTAAA. Intercalary, centromeric and subterminal tandem repeats (green) and dispersed repetitive sequences such as microsatellites and retroelements (grey).
Labels: long arm, centromere, short arm, telomere.
One chromosome from the nucleus; after replication it consists of two chromatids.
An interphase nucleus containing the nuclear genome. The nucleus has chromatin in a decondensed state.
Probable structure of interphase chromatin with dense regions and loops of expressed DNA
DNA double helix coiled around histone proteins (not shown): the nucleosome core fibre. Histone proteins are modified depending on whether they are in expressed loops or more condensed unexpressed chromatin.
Figure 1 Diagrams of the plant nuclear genome at various scales. The genome is divided among chromosomes within the cell nucleus, and consists of various classes of repetitive and single copy DNA sequence.
contrasting with overall base composition in bacteria or mycoplasmas where ratios vary between 25% and 75%. In plants, up to 50% of the cytosine residues in the DNA are modified after DNA replication by addition of a methyl group, giving 5-methylcytosine. The physical properties of the DNA underpin techniques used daily by plant molecular biologists for the characterization, separation, staining and identification of DNA molecules. DNA, a hydrophilic acid molecule, is highly soluble, but can be precipitated in dilute alcohol solutions giving a white or pale precipitate. Spectrophotometry, based on the absorbance of ultraviolet (UV) light
Figure 2 Metaphase chromosomes from rye photographed under the light microscope. The DNA of the 14 chromosomes has been stained with 4′,6-diamidino-2-phenylindole (DAPI), which fluoresces blue. Weakly staining gaps are seen at the centromere near the middle of each chromosome. Two different tandemly repeated sequences, labelled in red and green, have been hybridized to the chromosomes to show the locations of particular repeats at subterminal (green) and intercalary (red) locations on the chromosomes (scale bar 10 µm).
at 260 nm by DNA, is used to measure the concentration and purity of DNA in solution. Also important is the site-specific cleavage (hydrolysis) of very large DNA molecules by enzymes, mostly bacterial in origin, called restriction endonucleases, which recognize short characteristic sequence motifs of 4–8 bp and cleave the long plant genomic DNA double strand into defined fragments. Electrophoresis in a gel is used to separate fragments of chromosomal DNA based on their charge (and hence size), and many enzymatic manipulations are based on local nucleotide base composition. Key methods in gaining understanding of plant genome organization involve denaturation of the double-stranded DNA into single-stranded molecules and controlled reforming of double-stranded helices with labelled probe molecules (hybridization), achieved by altering factors such as temperature, salt concentration or acidity. When carried out on isolated DNA fragments separated and transferred to membranes, this procedure is Southern hybridization, while fluorescent in situ hybridization (FISH) is carried out on chromosome preparations using probes detected with fluorescence (Figure 2). Dyes used to stain DNA (Figures 2 and 3) interact with the molecule in various ways: intercalating between the bases, binding in the minor groove of the double helix, or interacting with the charged molecular groups.
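The cleavage of DNA by a restriction endonuclease, described above, can be illustrated with a minimal Python sketch. It is an illustration only, not a laboratory tool: the function names are hypothetical, the demonstration sequence is invented, and the example enzyme is EcoRI (recognition site GAATTC, cut after the first G), which the article itself does not name. The helper expected_spacing assumes a random sequence with equal base frequencies, which real plant genomes only approximate.

```python
import re


def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    Only the top strand is modelled.
    """
    sequence = sequence.upper()
    site = site.upper()
    fragments = []
    start = 0
    # The lookahead also finds overlapping occurrences of the site.
    for match in re.finditer("(?=" + re.escape(site) + ")", sequence):
        cut = match.start() + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments


def expected_spacing(site_length):
    """Average distance in bp between sites in a random, equal-base sequence."""
    return 4 ** site_length


# Invented demonstration sequence containing two EcoRI sites.
demo = "AAGAATTCTTTTGAATTCGG"
fragments = digest(demo, "GAATTC", 1)
print(fragments)
print([expected_spacing(n) for n in (4, 6, 8)])
```

Under these assumptions a 4 bp site is expected about every 256 bp and an 8 bp site about every 65 536 bp, which is why enzymes with longer recognition motifs give fewer, larger fragments.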
number in a gamete, the haploid number, would be n. Chromosome number is characteristic of each species, known to vary from 2n = 4 (in the daisy species Haplopappus gracilis and Brachycome lineariloba) to about 2n = 1440 in the Adder's tongue fern (Ophioglossum reticulatum). Exceptionally, haploid angiosperms, or aneuploids with loss or gain of chromosomes, are found and these can be very useful for genetic analysis; in bryophytes, the haploid or gametophytic plant is the dominant photosynthetic phase of the life cycle.
Physically, metaphase chromosomes can be seen as distinct objects (Figures 1 and 2) in a dividing cell using a light microscope and suitable staining, and they consist of two parallel arms, about 1–2 µm wide, with a length of 2–15 µm depending on the species. At interphase, when the chromosomes decondense and are active in gene expression and DNA replication (cell cycle), the chromosomes are seen as an amorphous to fibrous, structured mass within the nuclear envelope (Figure 3). At interphase, the chromosomes are decondensed but highly organized and occupy territories within the nucleus, forming loops of chromatin active in gene expression (Figures 1 and 3). Different chromosome domains lie in characteristic positions: in many cells, the telomeres and centromeres are arranged at or near opposite poles of the nuclear envelope. Another example of nonrandom organization of interphase chromosomes is the nucleolus, a weakly staining structure seen by light microscopy (Figure 3) where the decondensed ribosomal ribonucleic acid (rRNA) genes are being transcribed.
Each chromosome includes one or, after replication, two double-stranded linear DNA molecules (contrasting with the circular DNA molecules of bacterial chromosomes). The length of a chromosome is characteristic for each type in a species, and may range from less than 20 Mbp to more than 900 Mbp; if stretched to their full length, the DNA molecules would be between 7 and 300 mm long.
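The stretched lengths quoted above follow from the size of the chromosome in base pairs. A short worked sketch, assuming the standard rise of about 0.34 nm per base pair for B-form DNA (a value not stated in the article) and using hypothetical names:

```python
NM_PER_BP = 0.34  # assumed rise per base pair of B-form DNA, in nanometres


def dna_length_mm(base_pairs):
    """Contour length in millimetres of a linear DNA molecule of the given size."""
    return base_pairs * NM_PER_BP * 1e-6  # 1 nm = 1e-6 mm


smallest = dna_length_mm(20e6)   # a 20 Mbp chromosome
largest = dna_length_mm(900e6)   # a 900 Mbp chromosome
print(f"{smallest:.1f} mm to {largest:.0f} mm")
```

With this assumption the two extremes come out at about 6.8 mm and 306 mm, in line with the range of 7 to 300 mm given in the text.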
This double-stranded DNA is closely associated with nuclear proteins known as histones: the DNA is
Figure 3 Nuclei from a hybrid cereal plant at various stages of cell division showing many circular interphase nuclei with decondensed chromatin and internal structures including lighter areas (the nucleoli with rDNA gene expression) and some chromatin fibres. Chromosomes condense from prophase (P), and metaphase chromosomes are seen in the metaphase (M), separating into the chromatids at anaphase (A) before decondensing during telophase (T) of the cell cycle. Arrow shows the nucleolus where rRNA genes are transcribed.
wrapped around nucleosomes, each consisting of an octamer core of histones; about 150 bp of DNA wraps twice around each nucleosome, followed by a spacer (typically 10–20 bp long) before the next nucleosome (Figure 1). Since 2000, the significance of the histone proteins to gene expression has become increasingly recognized, and the modification of these proteins (which can be tracked by the careful use of antibodies linked to labels) by methylation or other posttranslational changes to specific amino-acid residues has been functionally correlated with both specific modification of gene expression and broader, genome-wide modifications in chromosome activity.
Surprisingly, little is confirmed about the packaging of the DNA at higher levels: how are metres of DNA fitted into a nucleus 10–20 µm across without tangling or breaking, remaining accessible for replication and expression enzymes, and allowing condensation to metaphase chromosomes during division? Speculative diagrams often show levels of coiling and supercoiling, but experimental evidence is equivocal because of the difficulty of imaging a complex, hydrated structure where interactions of charge, salts, nuclear proteins and the DNA together give the structure.
Telomeres define the two physical ends of the double-stranded DNA helix in the chromosome. The telomere protects the chromosome from degradation, but uniquely is not replicated from a preexisting DNA template. In most plant species, the sequence is TTTAGGG, and is added to the end of the DNA molecule by telomerase, an enzyme which has reverse transcriptase activity and includes an integral RNA template. See also: Telomeres
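As an illustration of the telomere repeat described above, the following Python sketch counts uninterrupted copies of TTTAGGG at the 3′ end of a sequence. Reading the other end of the chromosome on the same strand gives the complementary CCCTAAA repeat, so the sketch reverse-complements that end before counting. The function names and demonstration sequences are hypothetical, and real telomeric arrays contain variant repeats that this exact-match approach would not count.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_terminal_repeats(seq, motif="TTTAGGG"):
    """Count uninterrupted copies of motif at the 3' end of seq."""
    seq = seq.upper()
    count = 0
    end = len(seq)
    while end >= len(motif) and seq[end - len(motif):end] == motif:
        count += 1
        end -= len(motif)
    return count


# Invented sequences: one chromosome end read towards the telomere,
# and the opposite end as it appears on the same strand.
right_end = "ACGTACGT" + "TTTAGGG" * 3
left_end = "CCCTAAA" * 2 + "ACGT"

right_copies = count_terminal_repeats(right_end)
left_copies = count_terminal_repeats(reverse_complement(left_end))
print(right_copies, left_copies)
```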
In most plant genomes, each chromosome has a regional centromere that is hundreds of kilobases long and visible on metaphase chromosomes as a constriction. It functions to hold the two DNA molecules, condensed into chromatids, together at metaphase, and is the region where the kinetochore assembles and spindle microtubules attach to move the chromatids apart during division. While centromere functions are conserved, the DNA sequences spanning centromeres, unlike those of telomeres, show little clear conservation. However, many species have repetitive DNA elements (either tandem repeats or, sometimes, retroelement-related sequences) that are conserved at all the centromeres.
The interphase nucleus, when chromosomes are decondensed, is active in gene expression and replication of its DNA. The replication and transcription enzymes open the DNA structure to allow access for DNA polymerase and the associated DNA-binding transcription factors. Far from being unstructured and static, the interphase nucleus is a dynamic environment with the chromatin moving and internal structures such as the nucleolus, where the rRNA genes are transcribed, forming and fusing with other nucleoli. At interphase, the metaphase chromosomes decondense, extending to several times their metaphase length, and there are thinner chromatin loops; some regions, particularly tandemly repeated blocks of DNA visualized as heterochromatin, do not decondense and can be seen by light microscopy as densely staining regions, often described as chromocentres.
[Figure labels: plastid/mitochondrial sequences; unclassified sequences]
Tandem repeats
the distribution of repeat families within a group of related plant species reveals that some repeats are amplified in a subset of chromosomes and of species. FISH using cloned repetitive DNA sequence motifs as probes is a key method to understand the physical organization of the motifs and their clustering along the chromosomes (Figure 4).
Typical plant tandem repeat motifs fall into preferred size classes, most frequently with a length of 150–170 nucleotides and multiples of this size (see the section on The Packaging of the Genome: Chromatin and Chromosomes). Tandem repeats show variable divergence both within and between species, always in nucleotide composition and sometimes in length. In some groups of related plants, divergence of satellite repeat sequences can be analysed and used to infer relationships, as well as being suggestive of mechanisms that homogenize sequences across the genome. Unlike mammalian genomes, chromosome-specific repeats are rarely observed in plants, suggesting that a different mechanism of genome homogenization and sequence dispersal is dominant in plants. Repeats of satellite DNA families are constantly subjected to homogenization and fixation within a population and hence are characterized by a relatively high genomic turnover. As a result, repeat families can expand or contract rapidly over different evolutionary timescales, such as during speciation or plant breeding. These genomic changes can be monitored by in situ or Southern hybridization experiments, or more sensitively by polymerase chain reaction (PCR). See also: Polymerase Chain Reaction (PCR)
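The monomer length of a tandem repeat, such as the 150–170 nucleotide size class mentioned above, can be estimated from sequence by comparing an array with shifted copies of itself. The Python sketch below is one simple way to do this and is not a method described in the article: the function names, the search range of 20–400 bp and the randomly generated 160 bp monomer are all assumptions made for illustration.

```python
import random


def match_fraction(seq, shift):
    """Fraction of positions where seq matches itself displaced by shift."""
    pairs = len(seq) - shift
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + shift])
    return matches / pairs


def estimate_period(seq, min_period=20, max_period=400):
    """Return the shortest shift in the range giving the highest self-match.

    seq must be longer than min_period. Multiples of the true period score
    equally well, so the first (shortest) best shift is reported.
    """
    shifts = range(min_period, min(max_period, len(seq) - 1) + 1)
    return max(shifts, key=lambda shift: match_fraction(seq, shift))


# Invented test array: six identical copies of a random 160 bp monomer.
rng = random.Random(1)
monomer = "".join(rng.choice("ACGT") for _ in range(160))
array = monomer * 6

period = estimate_period(array)
print(period)
```

Real satellite arrays are diverged, so the best self-match would be below 1.0, and dedicated tandem repeat software is a better choice for genomic data.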
Microsatellites
Tandemly repeated DNA sequences consisting of very short repeating motifs (1–5 bp) are known as microsatellites or simple sequence repeats (SSRs), and are found in most eukaryotic genomes. In situ hybridization to chromosomes using microsatellite sequences shows that most blocks of microsatellites are widely dispersed over the genome, although there are some clusters or exclusions, particularly associated with heterochromatin or other tandemly repeated satellites. Microsatellite repeats are typically flanked by DNA sequences that are often single or low-copy within the genome. The number of repeats of the microsatellite motif at each genomic site (locus) is extremely variable between accessions in most species: presumably, slipped-strand mispairing during replication results in this high variability. Many microsatellites in plants are actually complex with two or more short motifs and some degeneracy between the flanking conserved sites. Different taxonomic groups of plants differ in the conservation of the microsatellite flanking regions: while the whole palm family mostly has conserved flanking regions, most seem to be specific to one species in cereals and grasses in the Triticeae tribe. Because of the variability in number of repeats and the conservation of the flanking sites, microsatellites provide highly informative and polymorphic markers. Variation between genotypes in the number of microsatellite repeats at a locus is detected by amplification with oligonucleotide primers homologous to parts of the flanking region by
PCR. From the late 1990s, microsatellites have become the preferred DNA-based genetic marker for genetic mapping and aspects of genome analysis. After testing, primers that amplify multiple sequences from a genome or show no polymorphisms are discarded, and the remaining primers can be used for genetic linkage mapping and quantification of genome diversity. In most cases, the genomic context of the detected repeats remains unknown, although the increasing amount of genomic sequence in databases means more primers are now designed from microsatellites within expressed sequence tags (EST-SSRs), next to genes, or from long stretches of DNA with known genes cloned in bacterial artificial chromosomes (BACs). Genetic mapping of microsatellites involves the amplification of the repeat arrays using PCR with primers flanking the arrays, or the amplification of DNA stretches between the arrays. Again, there is a difference between plant and mammalian microsatellites: in mammals, CA/GT repeats seem to be most abundant, while plants have variable lengths of 2, 3 or 4 bp repeated units. See also: Polymerase Chain Reaction (PCR)
Another type of short DNA repeat is the minisatellite, which is characterized by longer repeating units (10–50 bp). First isolated from mammalian genomes, minisatellites have been detected in many plant species although they are poorly characterized; like microsatellites, they are present at multiple genomic sites. Minisatellite repeats share a GC-rich core motif and are highly polymorphic, enabling their application as markers for genome mapping.
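To make the description of microsatellites above concrete, the following Python sketch locates perfect arrays of 1–5 bp motifs in a sequence with a regular expression, as a first step before designing primers in the flanking regions. It is a minimal sketch under stated assumptions: the function name, the threshold of four copies and the demonstration sequence are hypothetical, and the complex or degenerate microsatellites mentioned in the text would not be found by this exact-match pattern.

```python
import re


def find_microsatellites(seq, max_motif=5, min_repeats=4):
    """Return (motif, copies, start) for each perfect tandem array found.

    The motif group is lazy, so the shortest repeating unit is reported
    (for example CA rather than CACA). Positions are 0-based.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((motif, copies, match.start()))
    return hits


# Invented sequence: a (CA)5 array and a (GAA)4 array in unique flanks.
demo = "GGATCG" + "CA" * 5 + "TTGCTC" + "GAA" * 4 + "GCT"
hits = find_microsatellites(demo)
print(hits)
```

Comparing the number of copies at the same locus between accessions is then the in silico counterpart of the length polymorphism detected by PCR.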
machinery. Because of the mode of transposition, where the parental element stays integrated within the genome and new, reverse-transcribed copies are integrated into the genome, retrotransposons make up the major class of repetitive DNA in many plants and are an important reason for the variability of nuclear genome size. Two different groups of retrotransposons, the long terminal repeat (LTR) retrotransposons and the non-LTR retrotransposons, have been identified.
LTR retrotransposons are flanked by long terminal repeats which range in length from a few hundred up to more than 2200 bp. Their internal domain carries the polyprotein gene which encodes a protease, RNase H, reverse transcriptase and integrase. According to the order of genes within the internal domain, LTR retrotransposons can be further subdivided into Ty1-copia-like and Ty3-gypsy-like retroelements. Both Ty1-copia and Ty3-gypsy retrotransposons vary in length from some 5–6 kb up to 13 kb. Some LTR retrotransposons have structural similarities to eukaryotic retrovirus genomes. In fact, an additional open reading frame (ORF) with similarity to transmembrane proteins resembling a virus capsid-like structure has been found in many plant LTR retrotransposons. These LTR retrotransposons form a third group, designated env-like retrotransposons, suggesting that the boundaries between retrotransposons and retroviruses are blurred, and the detection of infective plant retroviruses might be expected in the future.
Non-LTR retrotransposons are separated into long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) and are considered as ancient genome components and ancestors of LTR retroelements.
Although species-specific in nucleotide structure and variable in length, all LINEs are terminated by a poly(A) tail and contain two ORFs encoding a nucleic acid binding protein (ORF1), and an endonuclease and reverse transcriptase domain (ORF2), together conferring the element's ability for autonomous retrotransposition. SINEs do not encode any reverse transcriptase activity and are only a few hundred base pairs long. Molecular analyses of large insert clones of maize have shown that some 50% of the nuclear genome consists of various transposable DNA sequences, and Vicia species may contain up to 10^6 retrotransposon copies in their genomes. How can plants survive with such a genetic burden of potentially mutagenic transposons which may cause gene disruption or inactivation? Several mechanisms have been proposed to account for the silencing of retroelements, including inter- and intra-element recombination resulting in deletion, methylation leading to dense packaging of chromatin and reduced transcription, or degeneracy and lack of activity. Indeed, there may be advantages from the elements in silencing infective viruses through RNAi or other mechanisms, or the generation of new variation. Many retroelements are only activated in specific tissues, developmental stages or under certain stress conditions such as
in vitro culture, and by insertion near or into genes, may modify or repress gene expression patterns. See also: RNA Interference (RNAi) and MicroRNAs
Gene Structure
Plant genes, like those of other eukaryotes, have a promoter, then a 5′ untranslated region at the start of the mRNA which runs from the transcription start site to the coding region. The coding region of the gene starts with a methionine codon (AUG) marking the translation initiation point and typically includes multiple introns and exons; the gene ends with a stop codon, followed by the 3′ untranslated region. The 5′ untranslated region can regulate translation and mRNA stability or cellular targeting. The genes are very similar between most plants, and indeed, plants, fungi and animals: about 70% of all genes seem to be similar in all three kingdoms. Based on the similarity of DNA and protein sequences, genes with similar functions to a reference gene in one species can be isolated relatively easily from almost any plant species. From analysis of Arabidopsis and rice, most genes are now known to occur in families of related genes. As well as being important for gene regulation, duplication is of evolutionary importance since duplicate copies can accumulate mutations leading to new functions or regulation without loss of the original function, which is carried out by another copy of the gene. Some duplicated gene copies accumulate mutations and give rise to pseudogenes, which are often found in DNA sequences.
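The layout of a mature mRNA described above (5′ untranslated region, coding region from AUG to a stop codon, 3′ untranslated region) can be sketched in Python as follows. This is a simplified illustration: it works on a spliced mRNA in which the introns have already been removed, it takes the first AUG as the start codon, and the function name and demonstration sequence are hypothetical. Real gene prediction in plant genomes has to deal with introns, alternative start sites and regulatory context.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split a spliced mRNA into (5' UTR, coding region, 3' UTR).

    The coding region runs from the first AUG to the first in-frame stop
    codon, inclusive. Returns None if no complete reading frame is found.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the sequence one codon at a time from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


# Invented mRNA: 6-base 5' UTR, 4-codon coding region, 6-base 3' UTR.
demo = "GGCACC" + "AUGGCUUCAUAA" + "GCUUUU"
parts = split_mrna(demo)
print(parts)
```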
Further Reading
Our understanding of plant genomes continues to develop rapidly. Special issues of the major journals Nature and Science devoted to the rice and Arabidopsis genomes give excellent overviews of these genomes and are perhaps the best references to current understanding. Updates are largely published via the web or in very specific research publications.
Science 296 (5 April 2002): 13–146 + poster (includes various Research Papers, Editorials, News/Focus, Letters, Perspectives).
Nature 408 (14 December 2000): 791–826.
As well as the sequences and functional annotation of proteins in the GenBank/EMBL/DDBJ databases, many species have genomic databases where much current information about plant genomes is presented. For Arabidopsis, these include The Arabidopsis Information Resource, TAIR, www.arabidopsis.org (see particularly education and outreach), and The Institute for Genomic Research, TIGR, www.tigr.org (many genome projects). The US National Science Foundation (the grant body funding major plant genome projects in the US) has the National Plant Genome Initiative and their website, www.nsf.gov, includes accessible, accurate and current information and reports (see, for example, http://www.nsf.gov/bio/pubs/reports/npgi2006/highlights.htm). Genome sizes of plants are given in the database at http://www.rbgkew.org.uk/cval/