Functional Genomics in Plants
Jeffrey L Bennetzen, Purdue University, West Lafayette, Indiana, USA

Functional genomics refers to a suite of genetic technologies that will contribute to a comprehensive understanding of plant gene function.
Secondary article
Article Contents: Introduction; Expression Arrays; Reverse Genetics; Genomic Sequencing, Annotation and Comparative Analysis; Summary
Introduction
The overall goal of genomics is to identify all of the genes in an organism, and then to determine the functions of these genes within the organism. Genomic technologies are mostly not new; rather, they apply traditional genetic and molecular approaches at a greatly increased scale. Structural genomics, for instance, involves identifying all of the genes within a single species by sequencing large collections of complementary DNAs (cDNAs) and/or the total genome. Understanding the actions and roles of all of the genes in an organism is a much more daunting task that will occupy biologists for many decades to come. Functional genomics refers to a suite of genetic technologies that will contribute tremendously to a comprehensive understanding of gene function, as will concurrent studies in other areas of biology (e.g. physiology, biochemistry and ecology). Many plant species are receiving some genomic characterization. This array of genomic characterizations, and comparisons with results from other biological kingdoms, will allow a uniquely valuable set of insights into which genetic functions are shared by eukaryotes, which are shared only by plants, and which are unique to individual lineages or species.
Expression Arrays
Investigators from many different laboratories have undertaken analysis of expressed genes by the comprehensive cloning and sequencing of cDNA copies of messenger RNA (mRNA) molecules. From a genomics perspective, these sequenced cDNAs can be considered as expressed sequence tags (ESTs) that provide the raw material for simultaneously analysing the expression of all of the genes in an organism. Because many genes are rarely or barely expressed, while others are expressed at very high levels in some tissues, the random cloning and sequencing of ESTs is not likely to yield all of the 25 000 or more genes within a diploid plant species, even when several hundred thousand cDNAs have been sequenced. Genomic sequencing, followed by various pattern-recognition approaches to gene identification within raw DNA sequence, can be used to find genes that were missed by the EST approach. This approach to gene identification, eschewing the one-gene-at-a-time mindset of traditional genetics, is predicated upon the idea that the expression of genes is best studied after they have all been identified.
Significant EST projects have been undertaken with a large number of different plant species, in both the public and private sectors. Particularly comprehensive projects are now highly advanced in soyabean, tomato, alfalfa, maize, rice, wheat and a model weed, Arabidopsis thaliana.
With respect to the study of gene expression, the techniques of structural genomics aim to identify the 25 000 or more genes that are expected for a diploid plant. Once sequenced, these 25 000 genes can be individually attached to various types of structural supports, commonly a glass slide, by a robotic arraying device (Figure 1). This slide then represents a unigene set, where the fragment representing each gene within an organism can be used to measure the level of expression of that gene in any tissue, at any time in development, and in response to any internal or external signal. These slides are usually called microarrays. In plants, scores of different species are slated for microarray analysis. These studies will proceed first in those species that have active EST projects, and academic labs will provide unigene sets as a service to the scientific community for several organisms, including Arabidopsis and maize.
Microarrays can be hybridized to labelled RNA, and the results quantitated for each fragment on the slide, represented as an individual spot. Various RNA labelling procedures can be utilized, but the representation of mRNAs by reverse transcription with a fluorescently labelled deoxynucleotide is particularly useful. Very sensitive microfluorimeters have been designed to scan hybridized microarrays, allowing detection across three orders of magnitude, with an ability to differentiate twofold differences in expression levels (Richmond and Somerville, 2000).
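As a rough illustration of what differentiating twofold differences means in practice, the sketch below compares spot intensities from two hybridizations and reports the spots whose signal differs at least twofold. The gene names, intensity values and function names are hypothetical, and the sketch assumes the intensities have already been background-corrected and normalized between slides, a step the article does not describe.

```python
import math

# Hypothetical, already normalized spot intensities (arbitrary units)
# for the same three spots in two labelled RNA samples.
control = {"geneA": 500.0, "geneB": 1000.0, "geneC": 250.0}
treated = {"geneA": 520.0, "geneB": 2100.0, "geneC": 100.0}


def log2_ratios(sample, reference):
    """Return log2(sample / reference) for every spot measured in both."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }


def twofold_changes(ratios, threshold=1.0):
    """Keep spots whose expression differs at least twofold (|log2| >= 1)."""
    return {gene: r for gene, r in ratios.items() if abs(r) >= threshold}


ratios = log2_ratios(treated, control)
changed = twofold_changes(ratios)
for gene, r in sorted(changed.items()):
    direction = "up" if r > 0 else "down"
    print(f"{gene}: log2 ratio {r:+.2f} ({direction})")
```

Working on a log2 scale makes a doubling and a halving symmetric (+1 and -1), which is why a single threshold can serve for both directions.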
Key advantages of a microarray system for measuring gene expression are (1) all of the genes can be measured, in unison, in a single experiment, (2) the amount of sample RNA needed to prepare labelled probe is fairly low, so that small tissues or regions of tissues can be analysed, and (3) the data can be quantitated with a relatively high level of accuracy. However, there are also a large number of potential problems with microarrays, including the fact that multiple different genes (members of the same gene family) will often cross-hybridize, thereby leading to a single spot that hybridizes to more than one gene product. In order to avoid this problem, some projects have used chips that contain thousands of short oligonucleotides, and single mismatch controls, which should be unique to individual genes. Beyond the various technical difficulties, however, neither microarray nor DNA chip studies inform the investigator as to de novo synthesis rates, RNA turnover rates, precise tissue of expression, or the quality (e.g. size or degree of intron excision) of the mRNA from any of the genes that are being expressed. Hence, microarrays can best be used to identify a comprehensive set of candidate genes whose expression can be more carefully measured in subsequent studies by nuclear run-off, S1 protection, in situ hybridization or other technologies. Moreover, the most important question in understanding gene expression is knowing the actual tissues and times in which an active protein is present, a phenomenon that does not always mimic mRNA levels. Traditional studies of protein expression patterns and enzyme activity are needed to answer this question, including the use of proteomics for the high-throughput identification of the proteins present in a given tissue sample.
[Figure 1 panel labels: PCR amplification; Printing; Microarray; Label transcripts (mRNA A, mRNA B, mRNA C); Hybridize; Fluorimetric analysis]
Figure 1 Microarray technology for the comprehensive assessment of gene expression. Individual plasmid clones containing different genes or gene fragments at upper left have their inserts amplified by the polymerase chain reaction (PCR) and the fragments are individually dotted onto a glass slide by a gridding robot. Different messenger RNAs (mRNAs) in a total RNA population are labelled by reverse transcription using fluorescently labelled nucleotides and an oligo dT primer to start the labelling reaction (lower left). The labelled cDNA copies of the mRNAs are then hybridized to the microarray slide. A quantitative assessment of mRNA amounts in the original sample is indicated by the relative intensity of the hybridization to the microarray. In the example shown, the fragment homologous to mRNA B has twice the intensity because it was twice as abundant as the other two mRNAs in the sample. The other genes represented on the grid showed no hybridization, indicating that these genes were not expressed in the tissue that was the source of the sample RNA.
ENCYCLOPEDIA OF LIFE SCIENCES / © 2002 Macmillan Publishers Ltd, Nature Publishing Group / www.els.net
The power of microarray studies has been demonstrated by the large number of genes that have been discovered to be associated with a given biological process. Often these processes had been so extensively investigated by various differential cloning technologies that investigators thought they had found most or all of the associated genes, yet microarray studies uncovered numerous additional loci (Richmond and Somerville, 2000).
A final challenge to microarray analysis lies in the interpretation and display of the huge volume of data that can come from these experiments. One approach has been to group together sets of genes that respond in similar time frames or tissue patterns to a particular signal or time in development. Their similarity in response would suggest their involvement in related processes and/or their activation by related signal transduction pathways. Another way to investigate the degree of response-relatedness for a set of genes is to investigate how mutations in a particular gene affect the expression of the other genes in the same organism. One can imagine a nearly infinite array of experiments that investigate these questions. It will be interesting to see how many expression patterns and gene sets come from these studies, and to what degree commonalities are observed between species and between environmental, hormonal or developmental signals.
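The idea of grouping genes that respond in similar time frames can be sketched as follows. This is only a minimal illustration, assuming Pearson correlation as the similarity measure and a simple greedy grouping rule; the article does not name a particular method, and the gene names, time courses and the 0.9 cut-off are hypothetical.

```python
import math

# Hypothetical expression time courses (relative units at four time points).
profiles = {
    "gene1": [1.0, 2.0, 4.0, 8.0],
    "gene2": [0.5, 1.1, 2.0, 4.2],
    "gene3": [8.0, 4.0, 2.0, 1.0],
    "gene4": [6.0, 3.1, 1.4, 0.8],
    "gene5": [2.0, 2.1, 1.9, 2.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def group_by_profile(profiles, min_r=0.9):
    """Greedy grouping: a gene joins the first group whose founding
    member it correlates with at r >= min_r, else it founds a new group."""
    groups = []
    for gene, values in profiles.items():
        for group in groups:
            if pearson(values, profiles[group[0]]) >= min_r:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


for group in group_by_profile(profiles):
    print(group)
```

With these made-up profiles the two rising genes fall into one group, the two falling genes into another, and the nearly flat gene stays on its own. Correlation compares the shape of a response rather than its magnitude, so genes expressed at very different levels can still be grouped together.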
Reverse Genetics
The only way in which a particular gene can be proven to determine a particular phenotype is by finding the alteration in phenotype yielded by a mutation in that gene. Traditionally, geneticists started with a phenotype that they wished to study (for instance, flower development) and then used mutagenesis to find genes that were involved in the process. This approach could be very slow, but had the advantage that it required no additional information about the process. In some cases, no mutation was found, suggesting that the process may be determined by genes that are redundant in function. The question of redundancy is particularly problematic in flowering plants, because many angiosperms are derived from recently polyploid parents. Hence, multiple nonallelic loci will often exist for any biological process, thereby making it difficult or impossible to identify a phenotype associated with inactivation of only one locus.
In the last few years, plant researchers in the public and private sectors have begun to use reverse genetic approaches to identify mutations in candidate genes that appear likely to be involved in a particular process. For instance, these candidate genes might be identified by a microarray analysis as genes that are only expressed during flower development. If so, then it is likely that those genes play a role in flower development, and that mutations in any such gene would affect the phenotype of the developing and/or mature flower. Instead of the random mutagenesis and careful screening of traditional genetics, reverse genetics technology uses specific mutagenesis of a target gene, followed by even more careful screening for any possible resultant phenotype. In plants, the two major approaches to reverse genetics involve tagging with mobile DNA insertions or epigenetic inactivation with homologous sequences.
Transposable elements have long been useful tools in plant genetics, since their discovery by Dr Barbara McClintock in the 1940s. Although transposable elements differ in the degree and nature of their insertion specificities, a few (like Mutator of maize) appear to insert in essentially any gene, at fairly similar frequencies. Hence, a large population of independent Mutator maize lines is likely to contain individuals with insertions in essentially any gene. The challenge is to find the plant that has an insertion in a particular candidate gene. Figure 2 depicts one strategy that is used to find a specific insertional mutation. Oligonucleotide primers are made to an end of the transposable element and to the candidate gene. Genomic DNA from pools of different Mutator lines is screened by polymerase chain reaction (PCR) using these two primers, under conditions where an amplification product is seen only if the two primers are within 1–2 kb of each other, in opposed orientation. The first pools may contain, for instance, aliquots of DNA from 100 different plants, thereby making it likely that a Mutator insertion in that gene will be found in one out of about every 100 pools. Aliquots of DNA from individual plants within the pool can then be screened to see which contains the insertion. The size of the DNA amplification product also indicates the approximate location of the insertion within (or near) the gene, and additional primers can be used to screen for insertions at different sites within a large gene. Once a maize line is identified with an insertion in the candidate gene, then seed from this line can be planted, and the investigator can look to see if any phenotype in this line cosegregates with the insertional mutation.
Although Mutator of maize was the first system used for insertional reverse genetics (Martienssen, 1998), the T-DNA transferred from the bacterium Agrobacterium tumefaciens to its plant host has also been very useful, particularly in Arabidopsis (Krysan et al., 1999), and several other reverse genetic systems of this type are currently under development in several plant species. In dealing with redundant genes, reverse genetics can identify mutations in a single gene with a high enough frequency that an investigator can eventually find independent mutations in each member of a gene family.
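The two-stage screen described above (pooled DNA first, then individual plants from each positive pool) can be sketched as a small simulation. All names and numbers here are hypothetical: the population size, the identities of the plants carrying an insertion, and the `pcr_positive` function, which merely stands in for a real PCR reaction with one transposon-end primer and one gene primer.

```python
def make_pools(plant_ids, pool_size):
    """Split the population into consecutive pools of pool_size plants."""
    return [plant_ids[i:i + pool_size]
            for i in range(0, len(plant_ids), pool_size)]


def pcr_positive(templates, carriers):
    """Stand-in for one PCR: positive if any template DNA in the tube
    comes from a plant carrying an insertion in the candidate gene."""
    return any(plant in carriers for plant in templates)


def two_stage_screen(plant_ids, carriers, pool_size=100):
    """Screen pools first, then individual plants of each positive pool.
    Returns the plants identified and the number of reactions run."""
    reactions = 0
    hits = []
    for pool in make_pools(plant_ids, pool_size):
        reactions += 1
        if pcr_positive(pool, carriers):
            for plant in pool:
                reactions += 1
                if pcr_positive([plant], carriers):
                    hits.append(plant)
    return hits, reactions


# Hypothetical population of 10 000 lines, two of which carry the insertion.
plants = list(range(10_000))
carriers = {1234, 8765}
hits, reactions = two_stage_screen(plants, carriers)
print(hits, reactions)
```

In this made-up case both carriers are found with 300 reactions (100 pools plus 100 individuals from each of the two positive pools) rather than the 10 000 needed to test every plant separately, which is the practical reason for pooling when insertions in any one gene are rare.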
Mutant lines carrying independent insertions in different members of a gene family can then be crossed to generate individuals that are homozygous for insertional inactivations in most or all gene family members, thus indicating the phenotype of such a general inactivation.
[Figure 2 panel labels: Lines 1–6; PCR; Gel analysis (lanes 1–6); Find homozygous progeny in lines 2 and 5; Score phenotype]
Figure 2 Detection of an insertional mutation for use in reverse genetic analysis of gene function. Using short oligonucleotides (small black half arrows) from one end of an insertional DNA and one from inside the targeted gene (gene B) as primers, polymerase chain reaction (PCR) is performed on pools of plant DNA from a population in which new insertions by this mobile DNA occur at a reasonable frequency. Even with a very active mobile DNA, insertions in any particular gene out of the 25 000 or more in a plant will be quite rare, so only a few pools will show an amplification product. The size of the PCR product, determined by gel analysis, also indicates where the insertion has occurred. Once a pool is found with an insertion, then subpools or individual plants from the pool are tested for the insertion by the same PCR procedure. Once an individual mutant plant is found, the investigator can request seed of this line from the appropriate stock centre and can look at progeny to see whether any mutant phenotype cosegregates with the insertional mutation. If it does, then the investigator can use either complementation by transformation or the similar biologies of different insertions in the same gene to prove that the detected phenotype is caused by the mutation in the candidate gene. In the example shown, a primer is employed for gene B, and detects insertions (red boxes) in lines 2 and 5. The insertion in gene C in line 4 is not detected because the distance between the primers is too great to allow PCR amplification.
Another approach to determining the function of a gene can be to test the phenotype of plants that overexpress the gene, or express it in the wrong tissue and/or at the wrong time in development. This can be accomplished by a type of insertional mutation as well, using mobile DNAs that contain a strong promoter or enhancer that activates adjacent genes (Weigel et al., 2000). Alternatively, this type of phenotype test can be conducted by construction of a transgenic plant that contains the targeted gene engineered with a promoter from a gene with a different transcriptional activity.
Expression of a standard gene sequence from the inappropriate, or antisense, strand can be simply accomplished by transforming into a plant a structural gene that contains a new promoter engineered to transcribe in the opposite direction, starting at the normal 3′ terminus of the gene. Antisense expression has been shown to decrease the amount of mRNA that is available for translation from the wild-type gene in the same nucleus, largely by leading to the formation of double-stranded RNAs (dsRNAs) that are rapidly degraded. In both plants and animals, it has also been observed that overexpression of a sense transgene (i.e. one transcribed in the normal direction) can decrease final mRNA levels of all genes in the same nucleus that have extensive sequence homology with the transgene. This so-called sense suppression appears to occur both at the RNA level, inducing apparent dsRNA production and subsequent dsRNA turnover, and at the DNA level, associated with DNA methylation and decreased transcription of the nuclear genes. These epigenetic changes, although not an actual mutational change in DNA sequence, provide a phenotypic copy (phenocopy) of a mutation because they decrease the amount of gene product that is produced.
In practice, the investigator can transform sense or antisense constructs of the targeted gene into a plant, and then determine which progeny have lower final levels of the candidate gene's mRNA. Then, these plants can be scored for a new phenotype to see what effect that mRNA change may have had and, hence, the role of the gene. As in the case for reverse genetics by insertional mutation, the investigator must also determine that the phenotype cosegregates with the lowered mRNA level, to be sure that the phenotype is due to the actual epigenetic change that has been engineered.
Viral vectors have recently been developed that allow efficient epigenetic inactivation without the need for germinal transformation (Ruiz et al., 1998). Infection with an engineered virus that has homology to a normal cellular gene can lead to a loss of translated mRNA for that gene from any tissue that the virus infects. Like germinal suppression by a transgene, this approach can (in theory) lead to loss of function by several homologous genes in the same family within the plant.
A third technique for gene inactivation involves actual gene replacement, using homologous recombination and/or DNA repair to replace a wild-type version of a gene with a mutant version that has been engineered in vitro. Although some promising avenues are being investigated, this is not yet a workable general approach in plants. At this stage, the large amount of total nuclear DNA in plants has made homologous events very rare compared with nonhomologous (e.g. random) events.
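Both the insertional and the epigenetic routes described in this section end with the same check: whether the new phenotype cosegregates with the engineered change (the insertion, or the lowered mRNA level). A minimal sketch of that bookkeeping is given below. The progeny records and function names are hypothetical, and the strict all-or-nothing rule is an assumption made for illustration; real data would call for a proper statistical test and enough progeny to be convincing.

```python
from collections import Counter

# Hypothetical progeny records:
# (carries the engineered change?, shows the mutant phenotype?)
progeny = [
    (True, True), (True, True), (False, False), (False, False),
    (True, True), (False, False), (False, False), (True, True),
]


def cosegregation_table(records):
    """Count progeny in each genotype/phenotype class."""
    return Counter(records)


def cosegregates(records):
    """True only if every plant with the change shows the phenotype,
    no plant without it does, and both classes are represented."""
    table = cosegregation_table(records)
    discordant = table[(True, False)] + table[(False, True)]
    return (discordant == 0
            and table[(True, True)] > 0
            and table[(False, False)] > 0)


print(dict(cosegregation_table(progeny)))
print("cosegregates:", cosegregates(progeny))
```

A single discordant plant (for example, one with the change but a normal phenotype) makes the check fail, which is the kind of result that should prompt a closer look before the phenotype is attributed to the candidate gene.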
Genomic Sequencing, Annotation and Comparative Analysis

Over the last 10 years, the amount of DNA sequence information available to any researcher has increased exponentially. This rate of increase shows no signs of slowing. Various databases contain genomic DNA sequence from scores of organisms, large arrays of EST/cDNA sequence, and predicted protein sequences. In this era, the first step researchers take toward predicting the function of a gene they have identified and cloned is to compare the sequence of the gene to sequences present in these databases. This homology scanning is so routine and easy that most investigators do not stop to think that they are performing a functional genomic test by comparative analysis. When a homologous sequence is found, the researchers have acquired an approximation of a potential role for this gene, provided that a function is known for the homologous gene.
Very few genes in any species are unique to that species, at least by a homology criterion. In comparisons of maize and rice, for instance, two species that have diverged for about 50 million years since their descent from a common ancestor, over 95% of the genes have homologues in each species. This does not mean, however, that the genes perform exactly the same function in each species. At least a few of these genes, although still perhaps very similar in sequence, are responsible for the genetic differences that make each species physiologically and developmentally unique. Small changes, particularly in gene regulation, can have major eventual effects on phenotype. Perhaps the most interesting question in all of biology will be which genes, and which evolved changes therein, are responsible for the significant differences between any two species.
Given this commonality of gene content and common similarity in gene function, discovery of a close homologue in one species can provide very useful information to all other researchers interested in the same gene family. However, it is also possible to greatly misinterpret this information. For instance, a research team could find that their newly discovered rice gene shows its highest homology to a predicted protein kinase gene from Drosophila that has been associated with response to cold. This does not mean, though, that the rice gene encodes a protein kinase (although that is a testable hypothesis) or that it is involved in any response to cold (also testable). In some cases, a plant gene might be annotated as most similar to a kinase gene (for instance) from another plant species, which was itself annotated by its similarity to another gene, and so on. It may be several steps of similarity (and tentative annotation) before any gene with a known function is actually found. In these cases, each additional annotation should be taken as being a bit more tentative. Only direct functional tests can determine the role that a gene performs, and all similarities to other genes only provide predictions of possible function. Of course, the more closely related a homologous gene is in sequence and in organism of origin, the more likely it is to perform a similar function.
Beyond sequence analysis, comparative mapping has provided a new tool to comparative genomics. If two genes with high sequence homology also map to colinear locations in their genomes, then it is much more likely that they are directly descended from the same ancestral gene, and hence have a similar function (Devos and Gale, 2000).
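The caution about chains of similarity-based annotation can be made concrete by counting how many steps separate a gene from one whose function was tested directly. The sketch below is purely illustrative: the gene identifiers and the record layout (each gene pointing to the gene its annotation was copied from, with `None` marking a directly tested gene) are hypothetical and do not correspond to any real database format.

```python
# Hypothetical annotation records. Each gene maps to the gene its
# annotation was inferred from; None marks a gene whose function was
# established by a direct functional test.
annotations = {
    "rice_gene_1": "sorghum_gene_7",
    "sorghum_gene_7": "arabidopsis_gene_3",
    "arabidopsis_gene_3": "drosophila_kinase_9",
    "drosophila_kinase_9": None,
    "maize_gene_2": "maize_gene_5",
    "maize_gene_5": "maize_gene_2",  # circular: never reaches a tested gene
}


def annotation_depth(gene, annotations):
    """Number of similarity steps from gene to a directly tested gene.
    Returns None if the chain is broken or circular."""
    steps = 0
    seen = set()
    current = gene
    while True:
        if current not in annotations or current in seen:
            return None
        seen.add(current)
        source = annotations[current]
        if source is None:
            return steps
        current = source
        steps += 1


for gene in ("rice_gene_1", "drosophila_kinase_9", "maize_gene_2"):
    print(gene, annotation_depth(gene, annotations))
```

A depth of 0 means the function was tested in that gene itself; larger depths mean the annotation has been passed along more times and, following the argument above, should be treated as more tentative. A result of `None` flags an annotation that never traces back to experimental evidence at all.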
Summary
Gene identification, comprehensive gene expression analysis, gene inactivation or activation, and comparative analyses provide a powerful set of tools for identifying the functions of plant genes. All of these tools are universal, and all are growing synergistically in power as information is added to the field. Because so many different plant species are being investigated, functional genomics will provide a uniquely broad understanding of functional evolution in plants. Perhaps the greatest challenge will be in developing ways to present and interpret the mountains of data that will be generated. Although we will not know the precise functions of all the genes in any plant species (the ultimate goal of plant genomics) for a very long time, our level of knowledge will continue to expand at unprecedented rates for the foreseeable future.
References
Devos KM and Gale MD (2000) Genome relationships: the grass model in current research. The Plant Cell 12: 637–646.
Krysan PJ, Young JC and Sussman MR (1999) T-DNA as an insertional mutagen in Arabidopsis. The Plant Cell 11: 2283–2290.
Martienssen RA (1998) Functional genomics: probing plant gene function and expression with transposons. Proceedings of the National Academy of Sciences of the USA 95: 2021–2026.
Richmond T and Somerville S (2000) Chasing the dream: plant EST microarrays. Current Opinion in Plant Biology 3: 108–116.
Ruiz MT, Voinnet O and Baulcombe DC (1998) Initiation and maintenance of virus-induced gene silencing. The Plant Cell 10: 937–946.
Weigel D, Ahn JH, Blazquez MA et al. (2000) Activation tagging in Arabidopsis. Plant Physiology 122: 1003–1014.
