RNA Methodologies: Laboratory Guide for Isolation and Characterization
Ebook | 1,762 pages | 33 hours


About this ebook

RNA Methodologies, Fifth Edition continues its tradition of excellence in providing the most up-to-date ribonucleic acid lab techniques for seasoned scientists and graduate students alike. This edition features new material on the exploding field of microRNA as well as the methods for the profiling of gene expression, both of which have changed considerably in recent years. As a leader in the field, Dr. Farrell provides a wealth of knowledge on the topic of RNA while also giving readers helpful hints from his own personal experience in this subject area. Beginning with the most contemporary, RNA Methodologies, Fifth Edition, presents the essential techniques to use when working with RNA for the experienced practitioner while at the same time providing images and examples to aid the beginner in fully understanding this important branch of molecular biology. The next generation of scientists can look to this work as a guide for ensuring high productivity and highly representative data, as well as best practices in troubleshooting laboratory problems when they arise.

  • Features new material in miRNA, MIQE guidelines, biomarkers, RNA sequencing, digital PCR and more
  • Includes expanded coverage on quantitative PCR techniques, RNAi, bioinformatics, the role of locked nucleic acids, aptamer biology, PCR arrays, and other modern technologies
  • Presents comprehensive, cutting-edge information covering all aspects of working with RNA
  • Builds from basic information on RNA techniques to in-depth protocols to guidance on how to modify and adjust each step of a particular application
  • Presents multiple avenues for addressing the same experimental goals
Language: English
Release date: Aug 11, 2017
ISBN: 9780128046791
Author

Robert E. Farrell Jr.

Dr. Robert Farrell is a bench-current scientist who has 35 years of experience working with RNA in the study of transcriptional and posttranscriptional regulation of gene expression in a variety of model systems. He is also experienced in animal cell culture methods. Prior to joining the faculty at Penn State University, he operated a biotech education and service firm, winning the 1998 Small Business Contractor of the Year award from the U.S. Department of Agriculture. He is the recipient of campus- and college-wide awards for excellence in teaching, and has extensive experience running RNA and specialized biotechnology hands-on laboratory training programs all over the world. He often serves as a consultant within the pharmaceutical and biotech industries. Dr. Farrell received his Ph.D. and M.S. degrees from The Catholic University of America and his B.S. in Biology from Providence College. Dr. Farrell currently serves as the campus academic officer at Penn State York.


    Book preview


    Preface

    RNA Never Ceases to Amaze

    The first edition of RNA Methodologies was published in 1993. At that time, RNA was viewed as an interesting molecule and many molecular biologists were happy if they could do a decent Northern blot. Twenty-five years later, we have at least begun to appreciate that RNA is as diverse in function as it is in form within the society of the cell. With that in mind, a major goal of this book is to tell the RNA story from several perspectives to ensure a holistic understanding of this aspect of molecular biology. The recurrent themes herein are the correct way to isolate, handle, store, and assay RNA, and an appropriate level of background information related to the fundamentals of gene expression is likewise provided.

    Many roles of RNA support the widely acclaimed RNA world hypothesis, in which RNA, and not DNA, protein, or anything else was the primary primordial information molecule. RNA serves as a keeper of genetic information (RNA viral genomes), a transporter of genetic information (mRNA), a guide that leads proteins to specific RNA sequences on other molecules for possible modification (gRNA, siRNA), a powerful posttranscriptional regulator of gene expression (miRNA), the scaffolding of the protein synthesis machinery (rRNA component of ribosomes), a sustainer of translation (tRNA transport of amino acids), and a modifier of itself and other molecules (self-splicing and catalytic RNA). Without a doubt, there are many other functions associated with RNA that have yet to be uncovered. It is safe to say that we are in the midst of a revolution in terms of our understanding of the several faces of RNA.

    Who would have ever imagined?

    In the RNA world, it is all about quality control. It is well known that each RNA molecule is examined, repeatedly and systematically, from the onset of transcription, posttranscriptionally, and throughout its biological lifespan, which ends with its dismantling when it has been damaged or is otherwise no longer needed by the cell. To put this into perspective, consider CQI, that is, continuous quality improvement. Successful companies and other institutions often embrace the philosophy of CQI in order to sustain optimized performance. Some of the CQI strategies that the cell has been using from time immemorial are just starting to be understood, and it is truly mind-boggling. The multiple quality control checkpoints associated with the production of mRNA ensure that only the highest fidelity, error-free template material is available to the ribosomes for protein synthesis.

    In high school in the mid-1970s, this Author learned about the three, and only three, types of RNA known at that time, to wit, mRNA, tRNA, and of course rRNA; at least one long, noncoding RNA (lncRNA) was known back then! In the present day, numerous functional lncRNAs have been described, not to mention their all-important smaller miRNA cousins. In many pathological states, miRNA expression patterns are altered, leading to detrimental changes to cellular morphology and cellular physiology. Retrospectively, it is amazing that miRNAs remained unknown for as long as they did. Perhaps the major reason is the fact that RNA isolation protocols, particularly in the 1980s and early 1990s, did not favor the efficient recovery of small transcripts. This was also true of the first generation of molecular biology kits. With the contemporary tools now at hand, new transcript species are being identified continuously. The number of known miRNAs in human cells is already in the thousands and these small, powerful transcripts are critical modulators of the flow of genetic information.

    Many functions that RNA molecules are able to perform are a direct result of the single-stranded character of polyribonucleotides. Consider, for example, the presence of regulatory structures formed by some RNA molecules such as stems, loops, and hairpins, and compare these structures to regulatory sequences such as AUG, UAG, and AAUAAA, which influence translation and various posttranscriptional facets of RNA biogenesis. These complementary properties are inherent to RNA because of its amazing ability to fold into formidable secondary and tertiary structures, thereby imparting transcript functionalities perhaps as diverse as its very nucleotide sequence.

    Regarding the business at hand, transcriptional profiling is possible only when high-quality RNA is isolated from its biological source, such that it is able to support reverse transcription, hybridization, and downstream applications that include variegated quantitation assays as well as the detection of previously uncharacterized genes, differentially spliced transcripts, and transcripts with multiple start sites. These abilities are of ever increasing importance because of the apparent link between an abnormal abundance of a transcript (coding or noncoding; too high or too low) and a genetic disease. While impressively sensitive methods now exist for measuring transcript abundance, it is just as important to be able to identify polymorphisms within transcripts. For example, alternative splicing imparts an added level of vulnerability to mutations and the disease state. Moreover, serious thought is required outside the box, i.e., the cell, because circulating nucleic acids offer enormous potential as biomarkers. Succinctly, what happens at the RNA level often determines the fate of the cell.

    There have been many wonderful technological advances in the study of RNA since the publication of the previous edition of this book in 2010, and many of those techniques and their applications and limitations are discussed here. My philosophy in the preparation of the fifth edition of RNA Methodologies has been that while technology is great, the fundamentals of working with RNA must be understood because they are the foundation upon which the contemporary methods to which the research community has become accustomed have been built. It may come as something of a surprise to learn that plenty of people continue to use comparatively roughhewn tools such as the time-honored Northern blot, often as a means of confirming data gleaned from more sophisticated techniques. Regardless of the method, good laboratory practices (also a quality control system) associated with RNA methodologies are important to know about, particularly when it becomes necessary to troubleshoot (it always does).

    In light of the many advances in the study of RNA, another goal of this book is to unify many of the facets of RNA characterization in a coherent start-to-finish format. One of the difficulties toward the realization of this goal is that the rapid succession of new techniques, and variants thereof, has resulted in confusing technical nomenclature. To make matters worse, not everyone uses the same terminology to describe the same techniques. Regardless of the intended experimental trajectory, the purification of high-quality RNA, what this Author affectionately refers to as eRNA (excellent RNA!), from the biological source is always the starting point. Whether isolated from cell culture or directly from whole tissue, only the meticulous handling of RNA will support experiments that will be used for its study. All of the background information and the updates included herein are appropriate since the RNA novice lacks the historical perspective and frame of reference that more experienced investigators often enjoy.

    This laboratory guide represents a growing collection of tried, tested, and optimized laboratory protocols for the isolation and characterization of eukaryotic RNA, with lesser emphasis on the characterization of prokaryotic transcripts. Another goal of this book is to help the reader develop greater confidence in the laboratory. Consequently, this text is written for the principal investigator, bench scientist, physician, veterinarian, lab technician, graduate student, undergraduate research assistant, and anyone else capable of performing basic research techniques—there is something in it for everyone. This resource is intended to provide a rationale to assist in the decision-making process for individuals at all levels of experience by presenting realistic alternatives for achieving the same experimental goals, and demonstrating how various techniques contribute to the understanding of gene expression and functionality. Many of the incorporated notations and hints are based upon personal experience and pave the way for the expedient recovery of RNA and the most judicious use of resources. It is unfortunate that commonplace unsound tactics for RNA handling and characterization result in wasted resources due to an obvious failure to understand the what and the why from the onset of the study. The best advice that I can offer: always think two steps ahead in an experiment, and reflect upon how the method of RNA isolation and the ensuing protocols will impact the interpretation of data.

    While it is hoped that this text be studied from cover to cover, one may pick and choose salient protocols without loss of continuity. Collectively, the chapters work together to embellish the RNA story, each presenting clear take-home lessons. The liberal incorporation of flow charts, tables, and representative data likewise facilitates learning and assists in the planning and implementation phases of a project. You are limited only by your own ingenuity.

    * * *

    The Author acknowledges, with sincere thanks and appreciation, the intellectual encouragement of the many colleagues and friends who, in some way, supported the preparation of this manuscript. The support and patience of the Author’s family are also gratefully acknowledged and are very much appreciated.

    Initium sapientiae timor Domini

    Chapter 1

    RNA and the Cellular Biochemistry Revisited

    Abstract

    This chapter focuses on the forms and functions of ribonucleic acid (RNA) and its transcription. RNA is a long, unbranched polymer of ribonucleoside monophosphate moieties joined together by phosphodiester linkages, and both eukaryotic and prokaryotic RNAs are essentially single-stranded molecules. RNA molecules are produced by transcription, the process by which a single-stranded RNA molecule is synthesized from a specific chromosome locus. There are important organizational differences between prokaryotic and eukaryotic organisms with respect to genes and the RNA molecules that result from their transcription. The synthesis of RNA is mediated by the activity of enzymes known as RNA polymerases. The transcripts observed within a cell are traditionally classified as ribosomal RNA (rRNA), transfer RNA (tRNA), heterogeneous nuclear RNA (hnRNA), or messenger RNA (mRNA). Important new classes of RNA have been discovered, including microRNA (miRNA), circular RNA (circRNA), and noncoding RNA (ncRNA). The mRNA subpopulation drives the phenotype of the cell, although it is the least abundant of all transcript types, and its expression is tightly coupled to the expression of other regulatory transcripts.

    Keywords

    Transcriptome; gene expression; miRNA; circRNA; operon; splicing; ncRNA; RNA polymerase; transcriptional regulation; posttranscriptional regulation; polynucleotide; transcription

    Why Study RNA?

    All cell and tissue functions are ultimately governed by gene expression. Consequently, the reasons for electing to study the modulation of RNA levels as at least one parameter of the cellular biochemistry may be as diverse as the intracellular RNA population itself. Generally speaking, the characterization of RNA is almost always related to transcription, i.e., gene expression questions being asked in the context of a particular scientific inquiry, and most often revolves around measuring the dynamic abundance level of one or several transcripts.

    The goals in any experimental design involving RNA generally revolve around one or more fundamental themes, including but not limited to the following:

    1. Measurement of the steady-state abundance of cellular transcripts. Steady-state RNA refers to the net accumulation of transcription products in the cell, or in a subcellular compartment such as the nucleus or the cytoplasm. It is the combined result of RNA synthesis, stability, and degradation. This is the most common reason why RNA is isolated from cells and tissues. Analysis may focus on one transcript, a few transcripts, or all transcripts simultaneously; this latter approach is commonly known as global analysis of gene expression or whole transcriptome profiling. Given the ease with which RNA can be purified from biological sources, the use of various sensitive, contemporary approaches is widespread for generating quantitative and qualitative profiles of RNA populations using any of a variety of laboratory techniques.

    2. Synthesis of complementary DNA (cDNA). Unstable, single-stranded messenger RNA (mRNA) can serve as the template for the in vitro synthesis of very stable single- or double-stranded cDNA molecules. This is the first step for subsequent amplification by the polymerase chain reaction (PCR), often for some quantitative purpose, for transcript mapping purposes, for direct ligation into a vector for sequencing or for expression of the encoded protein, for the physical separation of two or more cDNA species, or for the older strategy of synthesizing an entire cDNA library (occasionally referred to in older literature as a clone bank), which can be propagated for long-term storage and analysis. In any event, the construction of cDNA is the creation of a permanent biochemical record of the cell at the moment of cellular disruption. Historically, the synthesis of highly representative cDNA is one of the most important methodologies in the molecular biology laboratory and, in some hands, remains a significant challenge.

    3. Detection of viruses which harbor an RNA genome. This proceeds via the synthesis of cDNA, as described above, followed by PCR or another cDNA amplification method.

    4. Identification of the transcription start site (TSS). Historically, mapping of RNA molecules, including the 5′ end, the 3′ end, and the size and location of introns, was accomplished via nuclease protection assay, as described in Chapter 18, Quantification of Specific mRNAs by Nuclease Protection. Now, however, transcript mapping is almost always performed by some variant of 5′- or 3′-rapid amplification of cDNA ends (RACE; see Chapter 8: RT-PCR: A Science and an Art Form). As it is well known that a single genetic locus often has the potential to produce multiple RNAs, each with a different TSS and often in a tissue-specific manner, TSS mapping is an invaluable technique.

    5. Measurement of the rate of transcription of gene sequences or the pathways of RNA processing. This may be deduced, at least in part, by the nuclear run-on assay in which radiolabeled ribonucleotide precursors are incorporated into nascent transcripts in direct proportion to the abundance of each species of RNA being transcribed (see Chapter 19: Analysis of Nuclear RNA). When used in conjunction with other methods that examine steady-state RNA levels, the regulation of genes can often be assigned as transcriptional or due to posttranscriptional events.

    6. In vitro translation of purified mRNA. The resulting polypeptide may be further characterized by immunoprecipitation or Western analysis. Cell-free translation represents an older method for the identification of specific transcripts: by providing the raw materials needed to support translation, one is able to demonstrate that a transcript of putative identity is able to support the synthesis of the cognate peptide. For example, this approach could be used to demonstrate that two transcripts from the same genetic locus with alternative TSSs are, in fact, able to direct the synthesis of identical or closely related proteins. In applications such as rational drug design, in vitro translation is helpful because understanding the three-dimensional architecture of a protein, and its wild type or mutated function(s), may suggest novel applications in the area of functional proteomics.
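    Item 1 above describes steady-state abundance as the combined result of RNA synthesis, stability, and degradation. One common way to make that statement concrete, offered here as a simplified sketch and not as a model taken from this text, is a first-order description in which a transcript R is synthesized at a constant rate and degraded in proportion to its own abundance. The symbols k_s (synthesis rate), k_d (degradation rate constant), R_ss (steady-state abundance), and t_1/2 (half-life) are assumptions introduced only for illustration.

```latex
\frac{dR}{dt} = k_s - k_d R
\qquad\Longrightarrow\qquad
R_{ss} = \frac{k_s}{k_d},
\qquad
t_{1/2} = \frac{\ln 2}{k_d}
```

    Under these assumptions, a twofold rise in steady-state abundance could reflect a doubling of the synthesis rate or a doubling of the half-life. This is one reason a steady-state measurement alone cannot assign regulation to transcriptional or posttranscriptional events, and why it is paired with rate measurements such as those in item 5.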

    What is RNA?

    RNA is a long, unbranched polymer of ribonucleoside monophosphate moieties joined together by phosphodiester linkages. Both eukaryotic and prokaryotic RNAs are single-stranded molecules. The unlinked monomer building blocks of both RNA and DNA are known generically as nucleotides. Each nucleotide consists of three key components: a pentose (five-carbon sugar), at least one phosphate group (nucleotides may contain as many as three phosphate groups), and a nitrogenous base (Fig. 1.1). A nitrogenous base joined to a pentose sugar is known as a nucleoside. When a phosphate group is added, the composite, a phosphate ester of the nucleoside, is known as a nucleotide.

    Figure 1.1 The identity of a nucleotide is defined by the base that is attached to the 1′ carbon. In practice, the nucleotides that make up an RNA or a DNA molecule are represented by the standard one-letter abbreviation for the base each contains: adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U).

    The components of RNA and DNA nucleosides and nucleotides are compared in Table 1.1.

    Table 1.1

    Comparative Nucleotide Structure

    The key chemical difference between RNA and DNA is the presence of the five-carbon sugar ribose, in which a hydroxyl group (–OH) is joined to the 2′ carbon, in the case of RNA; the absence of the 2′ OH group in DNA is the underlying basis of the name of the sugar deoxyribose. In addition, the base uracil is found in RNA, substituted in DNA by the closely related pyrimidine thymine (chemically, thymine is 5-methyluracil), though it is possible to find deoxynucleotides containing uracil in certain situations. More precisely, RNA is assembled from ribonucleotide precursors and DNA is assembled from deoxyribonucleotide precursors. Hence, RNA is so-named because of the ribose sugar it contains, just as DNA is named from its constituent 2′-deoxyribose sugar. Essential base, nucleoside, and nucleotide nomenclature is summarized in Table 1.2.

    Table 1.2

    Essential Base, Nucleoside, and Nucleotide Nomenclature

    Nucleosides consist of a base and a sugar only. A nucleoside is promoted to a nucleotide upon addition of at least one phosphate group.

    Nitrogenous bases and the pentose sugar components of nucleosides are both cyclic. By convention, the numbering system for the carbon and nitrogen atoms that make up the bases is 1, 2, 3, and so forth, while the numbering system for the constituent carbon atoms of the sugar (ribose or deoxyribose) is 1′, 2′, 3′, 4′, and 5′. The purpose of this nomenclature is to avoid confusion when referring to the constituent atoms of the sugar versus those found in the base of a particular nucleotide or nucleoside.

    The ribonucleoside triphosphates are collectively referred to as NTP; in various molecular biology protocols, the symbol NTP refers to an equimolar cocktail of ATP, CTP, GTP, and UTP. Similarly, the deoxy-form of a nucleotide is denoted by the placement of a lower case d preceding the nucleoside triphosphate, as in dATP, dCTP, dGTP, and dTTP, and the symbol dNTP (also, dXTP) refers to an equimolar cocktail of the four deoxynucleoside triphosphates in protocols capable of supporting the synthesis of cDNA or PCR products. It is the triphosphate form of a nucleotide that is utilized as a precursor during nucleic acid synthesis. The phosphate nearest to the sugar is known as the α phosphate, followed by the β phosphate, followed by the γ phosphate, which is furthest from the nucleoside moiety (Fig. 1.2). During nucleic acid polymerization, the β and γ phosphates (PPi; inorganic pyrophosphate) are cleaved (released) from the nucleotide, and the resulting single-phosphate nucleotide, a nucleoside monophosphate, is then incorporated into the nascent polynucleotide chain.

    Figure 1.2 Adenosine-5′-triphosphate (ATP). The three constituent phosphate groups are designated α, β, and γ based on the proximity of each group to the nucleoside (base + sugar) component of the molecule. The replacement of the 2′-OH with H would convert this molecule to a deoxynucleotide.
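    The naming conventions described above (nucleoside versus nucleotide, the d prefix for deoxy forms, and the mono-, di-, and triphosphate abbreviations) can be expressed as a small lookup. The following is a minimal sketch; the class name Monomer and its fields are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass

# One-letter base symbols and phosphate-count suffixes used in abbreviations.
BASE_LETTER = {"adenine": "A", "cytosine": "C", "guanine": "G",
               "thymine": "T", "uracil": "U"}
PHOSPHATE_SUFFIX = {1: "MP", 2: "DP", 3: "TP"}


@dataclass(frozen=True)
class Monomer:
    """A base joined to a pentose, carrying zero to three phosphate groups."""
    base: str
    phosphates: int = 0
    deoxy: bool = False  # True when the 2'-OH is replaced by H

    def __post_init__(self):
        if self.base not in BASE_LETTER:
            raise ValueError(f"unknown base: {self.base}")
        if not 0 <= self.phosphates <= 3:
            raise ValueError("phosphates must be 0, 1, 2, or 3")

    @property
    def kind(self) -> str:
        # Base + sugar only is a nucleoside; adding phosphate makes a nucleotide.
        return "nucleoside" if self.phosphates == 0 else "nucleotide"

    @property
    def abbreviation(self) -> str:
        if self.phosphates == 0:
            raise ValueError("abbreviation is defined here only for nucleotides")
        prefix = "d" if self.deoxy else ""
        return prefix + BASE_LETTER[self.base] + PHOSPHATE_SUFFIX[self.phosphates]


if __name__ == "__main__":
    print(Monomer("adenine", 3).abbreviation)
    print(Monomer("thymine", 3, deoxy=True).abbreviation)
    print(Monomer("uracil").kind)
```

    The same object models the incorporation step described above: a triphosphate precursor such as Monomer("adenine", 3) corresponds to the monophosphate Monomer("adenine", 1) once pyrophosphate has been released.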

    Polynucleotide Synthesis

    Any enzyme with an associated polymerase activity is capable of synthesizing nucleic acid molecules from nucleotide precursors. The synthesis of RNA is mediated by the activity of enzymes known as RNA polymerases while DNA is synthesized, not unexpectedly, by DNA polymerases. A nucleic acid molecule is the result of linking nucleotides together by phosphodiester bonds. The formation of these bonds involves the nucleophilic attack by the 3′ hydroxyl group of the last nucleotide added to the nascent polynucleotide on the α phosphate (the phosphate attached to the 5′ carbon) of the incoming nucleotide (Fig. 1.3). For this reason nucleic acid synthesis is said to proceed 5′→3′, and there are no known exceptions to this process.

    Figure 1.3 The dinucleotide that results from the formation of the first phosphodiester linkage has structurally different ends, namely a phosphate group at the 5′ end and a hydroxyl group at the 3′ end. The structural differences at the 5′- and 3′-ends are maintained regardless of the number of nucleotides that are joined together.

    In order for the synthesis of nucleic acids to occur in vivo or in vitro, there are two fundamental requirements that must be fulfilled and maintained to initiate and to support continued nucleic acid polymerization:

    1. There must be a template (a strand or an oligomer) to direct the polymerase-mediated insertion of the correct (complementary) nucleotide into the nascent chain. This occurs predictably, according to the conventions set down in Chargaff’s Rule (Zamenhof et al., 1952), which succinctly states that adenine ordinarily base pairs with thymine or uracil through the formation of two hydrogen bonds (A::T, A::U) and that guanine ordinarily base pairs to cytosine through three hydrogen bonds (G:::C). DNA polymerases capable of adding nucleotides without template information are said to exhibit rare terminal transferase activity, as in the case of the unusual enzyme terminal deoxynucleotidyl transferase. This enzyme has broad applications in the area of cDNA synthesis as well as certain forms of 5′-RACE. These special cases are discussed in detail in Chapter 7, cDNA: A Permanent Biochemical Record of the Cell, and Chapter 8, RT-PCR: A Science and an Art Form.

    2. For initiation and elongation, there must be a free 3′-OH to which the next nucleotide in the chain can be joined via a phosphodiester linkage. Thus, the entire process of transcription requires some type of primer manifesting the requisite 3′-OH. This applies equally to RNA and DNA synthesis, both in vivo and in vitro. Most of the enzymes used in molecular biology that exhibit polymerase activity have nearly identical template and 3′-OH primer requirements.

    This results in a polynucleotide with a consistent pattern of 5′→3′ linkages between adjacent nucleotides; elongation is frequently referred to as the 5′→3′ polymerase activity associated with the enzyme. Upon completion, nucleic acid molecules are assembled in such a way that:

    1. The ends of the molecule are structurally different from one another. The first nucleotide of the molecule has an uninvolved 5′ (tri)phosphate, constituting the so-called 5′ end of the molecule. The last nucleotide that was added exhibits a free 3′ hydroxyl group, and this is known as the 3′ end of the molecule.

    2. The backbone of the molecule consists of an alternating series of sugar and phosphate groups. Known as the phosphodiester backbone, or simply the backbone, of the molecule, it imparts a net negative charge to the molecule by virtue of its constituent phosphate groups.

    3. The base associated with each nucleotide protrudes away from the backbone of the molecule. This stereochemistry makes the bases very accessible for hydrogen bonding (base pairing) to a complementary polynucleotide sequence. This proclivity is at the very heart of molecular hybridization in the laboratory.
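    The two requirements listed earlier (a template and a free 3′-OH supplied by a primer) and the resulting 5′→3′ direction of synthesis can be sketched in a few lines. This is a toy model of DNA primer extension under stated assumptions, not a description of any particular enzyme: the function name extend_primer is hypothetical, the template is written 3′→5′ so that it can be read left to right, and the primer is assumed to be annealed at the 3′ end of the template.

```python
# Template base -> complementary deoxynucleotide added to the nascent strand.
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3_to_5: str, primer_5_to_3: str) -> str:
    """Return the nascent strand (written 5'->3') after copying the template.

    Each new nucleotide is joined to the 3' end of the growing chain, so the
    product lengthens in the 5'->3' direction only.
    """
    if not primer_5_to_3:
        # Requirement 2: without a primer there is no free 3'-OH to extend.
        raise ValueError("no primer: there is no free 3'-OH to extend")
    if len(primer_5_to_3) > len(template_3_to_5):
        raise ValueError("primer is longer than the template")
    for template_base, primer_base in zip(template_3_to_5, primer_5_to_3):
        if DNA_PARTNER[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    nascent = list(primer_5_to_3)
    # Requirement 1: the template dictates which nucleotide is added next.
    for template_base in template_3_to_5[len(primer_5_to_3):]:
        nascent.append(DNA_PARTNER[template_base])
    return "".join(nascent)


if __name__ == "__main__":
    print(extend_primer("TACGGA", "AT"))
```

    The sketch ignores everything an actual reaction depends on (nucleotide concentration, enzyme fidelity, and reaction conditions) and is meant only to show why a template and a primed 3′ end are both needed.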

    The nitrogenous bases found in nucleotides are categorized as either purines (adenine and guanine) or pyrimidines (cytosine, thymine, and uracil), both of which are flat aromatic molecules. The specificity of base pairing (purine with pyrimidine) is maintained by the stereochemical preferences of the bases listed here. In other words, what is commonly known as Watson–Crick base pairing is predicated on the bases involved being in their preferred tautomeric forms.

    Hydrogen bonds, which are highly directional, form between complementary bases when an electropositive hydrogen atom is attracted to an electronegative atom such as oxygen or nitrogen. Because of the manner in which bases protrude from their respective phosphodiester backbones, antiparallel base pairing or hybridization of complementary strands is strongly favored. Thus, the 5′ end of one strand is opposite the 3′ end of the complementary strand to which it is base-paired and often represented as shown in the following graph:

    This is true for all double-stranded molecules: dsDNA, dsRNA, and DNA:RNA hybrids. The ability to promote, or to prevent, base pairing in this manner is a central act in the molecular biology laboratory.
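    Because base-paired strands are antiparallel, the partner of a sequence written 5′→3′ is its reverse complement when that partner is also written 5′→3′. The following minimal sketch illustrates this for RNA; the function names antiparallel_partner and show_duplex are hypothetical and exist only for this example.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT_RNA = str.maketrans("ACGU", "UGCA")


def antiparallel_partner(strand_5_to_3: str) -> str:
    """Return the complementary RNA strand, also written 5'->3'."""
    # Complement each base, then reverse so the result reads 5'->3'.
    return strand_5_to_3.translate(COMPLEMENT_RNA)[::-1]


def show_duplex(strand_5_to_3: str) -> str:
    """Draw the duplex with the partner strand written 3'->5' underneath."""
    partner = antiparallel_partner(strand_5_to_3)
    return "5'-" + strand_5_to_3 + "-3'\n3'-" + partner[::-1] + "-5'"


if __name__ == "__main__":
    print(antiparallel_partner("AUGGC"))
    print(show_duplex("AUGGC"))
```

    The same reverse-complement logic is what is applied when a probe or a primer is designed against a target strand.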

    The obvious structural differences at the 5′ and 3′ ends of a molecule support a convention by which one may unambiguously refer to the position of any feature of a nucleic acid molecule in relation to any other feature:

    Upstream means that a structure or feature is closer to or in the direction of the 5′ end of the molecule, relative to some other point of reference; it can also mean in the opposite direction of gene expression.

    Downstream means that a structure or feature is closer to or in the direction of the 3′ end of the molecule, relative to some other point of reference; it can also mean in the direction of gene expression.

    For the sake of simplicity, upstream and downstream are most often used to mean in the opposite direction of expression and in the direction of expression, respectively. This nomenclature may be especially useful when describing features or regions of a double-stranded nucleic acid molecule, in discussions pertaining to either the structure or the expression of a gene and, in particular, for the purpose of primer design to support PCR (see Chapter 8: RT-PCR: A Science and an Art Form).

    The actual base sequence, i.e., the linear order of ribonucleotides, is known as the primary (1°) structure of an RNA molecule, and this order is dictated by the order of nucleotides on the DNA template strand. There is a tremendous proclivity for a single RNA molecule to exhibit intramolecular base pairing, resulting in what is known as secondary (2°) structure. The variety of possible interactions within the phosphodiester backbone is often described using such colorful nomenclature as RNA hairpins, stems, interior loops, bulge loops, multibranched loops, kissing loops, cruciform structures, and pseudoknots (Fig. 1.4). Higher-order three-dimensional folding, the so-called tertiary (3°) structure which RNA molecules exhibit, is best described as the collection of 2° structural elements arranged in such a way that an RNA molecule is able to perform its biological function. Much has been suggested, for example, about the role of folding by careful study of transfer RNAs, the classical example of intramolecular base pairing par excellence. It is important to note that some of the 2° and 3° structures of tRNA are attributed to the formation of noncanonical base pairs. The canonical base pairs are G·C, A·T, and A·U; examples of noncanonical base pairs include G·U, A·C, A·G, C·U, U·U, G·G, A·Ψ (Ψ=pseudouridine), G·Ψ, A·A·U trimers, and others. An excellent database containing known noncanonical base pairs involving RNA is maintained by Dr. George Fox at http://prion.bchs.uh.edu/bp_type/ (Nagaswamy et al., 2000, 2002). Contemporary studies have demonstrated that mRNA also assumes varying degrees of transient 2° and 3° structures which, in no small measure, influence its function in the cytoplasm. For most laboratory applications, higher-order folding must be disrupted, as described below, before an assay with a quantitative component can be performed using an RNA sample. Failure to do so generally has a severe negative impact on accurate quantitative profiling of the sample.

    Figure 1.4 Examples of secondary structure commonly observed in single-strand RNA molecules. Note how a single molecule is able to exhibit intramolecular base pairing by the antiparallel juxtaposition of complementary regions. Double-stranded regions may be perfectly or imperfectly base-paired. To a large extent, the variety and locations of stems, hairpins, and loops will influence the ensuing tertiary structure.
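
The pairing rules described above can be illustrated with a minimal Python sketch. It classifies an RNA base pair as canonical (G·C, A·U), as one of the simple two-base noncanonical pairs listed in the text, or as unlisted, and it checks whether two regions of one strand could form a stem when juxtaposed antiparallel. The function names are hypothetical, pairs involving pseudouridine and the trimers are omitted, and real structure prediction relies on thermodynamic models rather than a lookup of this kind.

```python
# Illustrative sketch only: names and the chosen subsets are assumptions.
CANONICAL = {frozenset("GC"), frozenset("AU")}
# Two-base noncanonical pairs named in the text (U.U and G.G are one-element sets).
NONCANONICAL = {frozenset("GU"), frozenset("AC"), frozenset("AG"),
                frozenset("CU"), frozenset("U"), frozenset("G")}


def classify_pair(a, b):
    """Classify an RNA base pair as 'canonical', 'noncanonical', or 'unlisted'."""
    pair = frozenset((a.upper(), b.upper()))
    if pair in CANONICAL:
        return "canonical"
    if pair in NONCANONICAL:
        return "noncanonical"
    return "unlisted"


def can_form_stem(five_prime_arm, three_prime_arm, allow_wobble=False):
    """Return True if the two arms pair along their full length when antiparallel.

    Both arms are written 5'->3', so the first base of the 5' arm is matched
    with the last base of the 3' arm. With allow_wobble=True, G.U pairs are
    tolerated in addition to canonical pairs.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        return False
    for a, b in zip(five_prime_arm.upper(), reversed(three_prime_arm.upper())):
        if classify_pair(a, b) == "canonical":
            continue
        if allow_wobble and frozenset((a, b)) == frozenset("GU"):
            continue
        return False
    return True
```

For example, `can_form_stem("GGAC", "GUCC")` evaluates the pairs G·C, G·C, A·U, and C·G in turn, which corresponds to a perfectly base-paired stem of the kind shown in Fig. 1.4.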

    Types of RNA

    Transcription results in the production of RNA molecules, generically referred to as transcripts. In the past, cellular transcripts were broadly classified as ribosomal RNA (rRNA), transfer RNA (tRNA), heterogeneous nuclear RNA (hnRNA), or messenger RNA (mRNA), as well as a collection of small RNAs of previously unknown function. Now, however, one must include the very diverse population of noncoding RNA (ncRNA), all of which are of immense interest in the study of the regulation of gene expression (Table 1.3). Each category of RNA, which in eukaryotic cells is synthesized by a different type of RNA polymerase, performs a different function in the cell. In contrast, all transcripts in bacteria are produced by a single type of RNA polymerase. The various types of RNA are not represented in equal amounts—the abundance of each is directly related to the physiology of the cell.

    Table 1.3

    RNA Types and Functions

    rRNA is the most abundant RNA component in the cell. In prokaryotic cells the major rRNA species are the 23S rRNA, 16S rRNA, and 5S rRNA. The eukaryotic counterparts are identified as the 28S rRNA, 18S rRNA, and 5S rRNA, as well as a fourth ribosomal transcript, the 5.8S rRNA. These molecules form the scaffolding of ribosomes, which become translationally competent when decorated with myriad ribosomal proteins. At present there are 55 known prokaryotic ribosomal proteins and 82 known eukaryotic (mammalian) ribosomal proteins. Not all ribosomes are functional at any given time, and the existence of a pool of transiently inactive ribosomes is itself a regulator of gene expression. The superabundance of rRNA in a purified RNA sample means that it is often used both as an RNA mass loading control (see Chapter 9: Quantitative PCR Techniques) and as a source of internal electrophoresis molecular weight markers (see Chapter 13: Electrophoresis of RNA).

    tRNA is responsible for the transportation of amino acids to the ribosome to support protein synthesis. tRNA molecules are small, ordinarily ranging from 74 to 95 nts. When shuttling an amino acid covalently linked to its 3′ end, a tRNA is said to be charged. Placement of the correct amino acid into the nascent polypeptide depends on recognition of the mRNA codon (a group of three nucleotides) within the coding region of mRNA by a complementary trinucleotide motif carried on one arm of the tRNA known as the anticodon. The tRNA anticodon base pairs to the mRNA codon within the ribosome, thereby supporting protein elongation (for review, see Krebs et al., 2012). While neither as large nor as abundant as rRNA, the smaller tRNA species play a central role in translation.
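
Codon–anticodon recognition as described above amounts to antiparallel complementarity, which can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes strict Watson–Crick pairing at all three positions; it deliberately ignores wobble pairing and the modified bases found in real anticodons.

```python
# Illustrative sketch: strict Watson-Crick pairing only (an assumption).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon_for(codon):
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3')."""
    codon = codon.upper()
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError("a codon must be three RNA nucleotides (A, C, G, U)")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return codon.translate(RNA_COMPLEMENT)[::-1]
```

Under these assumptions, the start codon AUG is read by an anticodon written 5′-CAU-3′.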

    mRNA is the most diverse of all the transcripts. Ironically, even though mRNA is by far the least abundant of all transcript types, it is the mRNA that drives the phenotype of the cell. mRNA alone directs the synthesis of proteins through the use of the cellular translation apparatus. There is wide variation in the number and abundance of RNA species in the cell; the abundance of a specific type of RNA is subject to dramatic change as the demands on the cell change. Some mRNAs are present in hundreds of copies per cell while others are present in only a few copies per cell; this aspect of the RNA profile of the cell can be problematic because very low abundance transcripts are sometimes difficult to detect even with sensitive contemporary techniques.

    Transcription and the Central Dogma

    According to the central dogma (Crick, 1957) of molecular biology, the expression of hereditary information flows from genomic sequences (DNA), through an mRNA intermediate, to ultimate phenotypic manifestation in the form of a functional polypeptide (Fig. 1.5). Whereas this design mirrors what occurs naturally in both prokaryotic and eukaryotic cells, certain violations have been observed in nature: (1) accompanying the discovery of the retroviral enzyme reverse transcriptase (RNA-dependent DNA polymerase) (Baltimore, 1970; Temin and Mizutani, 1970), by which RNA may serve as the template for the synthesis of DNA, and (2) the discovery of RNA editing (Benne et al., 1986; reviewed by Nishikura, 2010), in which a transcribed sequence is subject to alteration.

    Figure 1.5 The central dogma of molecular biology. The process of transcription produces mRNA while the process of translation produces protein. Replication, the process by which DNA is duplicated, occurs during S phase in the eukaryotic cell cycle. cDNA, in contrast, is not found in the cell but is synthesized in vitro and is commonly used to measure transcriptional activity or to assay for the presence of an RNA virus.

    Transcription is that process by which a single-stranded RNA molecule is synthesized at a specific chromosome locus; this is the first of several steps in what is commonly referred to as RNA biogenesis. Transcription occurs in the nucleus (and mitochondria and chloroplasts) of eukaryotic cells, and in the common cellular compartment in prokaryotic cells. All phases of transcription are subject to variation and are potential control points in the regulation of gene expression. A transcriptional unit is best thought of as a DNA sequence that manifests appropriate signals for the initiation and termination of transcription and is capable of supporting the synthesis of a primary RNA transcript. The process of transcription is so-named because the transfer of information from DNA to RNA is in the same language, namely the language of nucleic acids. In contrast, the process known as translation is so-named because nucleic acid instructions in the form of mRNA are used to direct the assembly of a primary polypeptide from amino acid precursors: the nucleic acid instructions are executed in (translated to) the language of proteins. The ribosome is the organelle of polypeptide synthesis in all cells, and each ribosome independently directs the sequential linkage of amino acids as the associated mRNA is interpreted. Upon completion of translation eukaryotic proteins are typically modified, sorted, packaged, and directed to their proper subcellular location as they move through the endomembrane system, of which the endoplasmic reticulum and the Golgi apparatus are key components; prokaryotic and other eukaryotic proteins are often under the influence of various small cytoplasmic RNA (scRNA) species that guide them to their proper destination. As with RNA, and to a lesser extent DNA, proteins exhibit a marked capacity for higher-order folding (Table 1.4). As with RNA, the functionality of a protein molecule is associated with its shape. 
Unlike RNA, however, in which the shape of the molecule is naturally dynamic, the distortion of the tertiary (3°) or quaternary (4°) structure of protein is associated with immediate loss of function.

    Table 1.4

    Higher-Order Folding of Nucleic Acids and Protein

    The primary structure of nucleic acids and proteins is the order of monomers. The secondary structure of a molecule is the first level of folding that occurs as a consequence of its primary structure. The tertiary structure is the three-dimensional arrangement of atoms within the molecule. The quaternary structure of a molecule, when it forms, is higher-order folding that results from interaction of the molecule with one or more identical or nonidentical molecules.

    For DNAzyme review, see Hollenstein, M. (2015). DNA catalysis: the chemical repertoire of DNAzymes. Molecules 20, 20777–20804.

    Promoters, Transcription Factors, and Regulatory Elements

    Transcription is mediated by enzymes known as RNA polymerases. These enzymes, in conjunction with myriad proteins known as transcription factors, recognize very specific and highly conserved promoter, or initiation, sequences within the enormous complexity of genomic DNA. Promoters are spatially associated with the structural portion (body) of a gene (Fig. 1.6) and consist of several recognizable upstream nucleotide sequence motifs. These sequences are known as consensus sequences, a term used to describe the most commonly observed pattern of nucleotides at a particular location. For example, the symbol T80A95T45A60A50T96 indicates that thymine is the first base associated with this consensus motif 80% of the time, and so forth. The exact sequence and precise geometry of these regulatory elements can either promote or prevent the onset of transcription, and do so with varying degrees of efficiency.

    Figure 1.6 Genes, some of which encode mRNA which, in turn, encode proteins, are under the direct influence of a regulatory element known as a promoter.

    Any promoter component that is located 5′, or upstream, from the TSS is indicated with a minus sign in front of the actual nucleotide distance from the TSS. By convention, the first transcribed nucleotide is designated as +1, and any other nucleotides or features located 3′, or downstream, from the TSS are likewise designated with a plus sign placed in front of the actual nucleotide distance. Knowledge of promoter consensus sequence function is due largely to experiments involving standard DNA cloning techniques, site-directed mutagenesis, DNA sequencing, and in silico analysis.
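
The numbering convention above can be made concrete with a short Python sketch that converts an index along the coding strand into a signed position relative to the TSS. The function names are hypothetical. The sketch also assumes the usual convention that there is no position 0, so the base immediately upstream of +1 is −1; the text states only that the first transcribed nucleotide is +1 and that upstream positions carry a minus sign.

```python
def promoter_coordinate(index, tss_index):
    """Convert a 0-based index on the coding strand to a TSS-relative position.

    The first transcribed nucleotide (index == tss_index) is +1. The base just
    upstream is -1; position 0 is skipped (assumed convention).
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def format_coordinate(position):
    """Render a TSS-relative position with an explicit sign, e.g. '+1' or '-35'."""
    return f"{position:+d}"
```

With the TSS at index 100, index 100 maps to +1, index 99 to −1, and index 65 to −35.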

    In prokaryotic systems, the essential elements of the promoter region include the so-called −10 hexamer sequence, formerly known as the Pribnow box (or the Pribnow-Schaller box), consisting of the consensus sequence T80A95T45A60A50T96, and another conserved region located further upstream is known as the −35 sequence (T82T84G78A65C54A45). In some organisms, an AT-rich domain (the UP element) is also observed further upstream. The spacing between the −10 sequence and the −35 sequence is tightly regulated, with 17 base pairs being optimal, and variations in the length of the region between these two elements can reduce the efficiency of the promoter.
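
The consensus notation used above lends itself to a small worked example. The following Python sketch parses a string such as T80A95T45A60A50T96 into base and percentage pairs, recovers the consensus hexamer, scores a candidate hexamer, and measures the spacer between perfect −35 and −10 matches. All function names are hypothetical, and the scoring scheme (conservation-weighted fraction of matching positions) is an assumption chosen for illustration, not a published promoter-strength model.

```python
import re

MINUS_10 = "T80A95T45A60A50T96"   # -10 hexamer, as given in the text
MINUS_35 = "T82T84G78A65C54A45"   # -35 sequence, as given in the text


def parse_consensus(notation):
    """Parse 'T80A95...' into a list of (base, percent) tuples."""
    return [(base, int(pct)) for base, pct in re.findall(r"([ACGT])(\d+)", notation)]


def consensus_sequence(notation):
    """Return the most common base at each position, e.g. 'TATAAT'."""
    return "".join(base for base, _ in parse_consensus(notation))


def match_score(candidate, notation):
    """Conservation-weighted fraction of positions matching the consensus (0 to 1)."""
    consensus = parse_consensus(notation)
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have the same length")
    matched = sum(pct for (base, pct), c in zip(consensus, candidate.upper()) if base == c)
    return matched / sum(pct for _, pct in consensus)


def spacer_length(promoter):
    """Return the number of bases between perfect -35 and -10 matches, or None."""
    promoter = promoter.upper()
    box35, box10 = consensus_sequence(MINUS_35), consensus_sequence(MINUS_10)
    i = promoter.find(box35)
    if i == -1:
        return None
    j = promoter.find(box10, i + len(box35))
    if j == -1:
        return None
    return j - (i + len(box35))
```

A candidate that differs from the −10 consensus only at a weakly conserved position loses less of its score than one that differs at a strongly conserved position, which mirrors the idea that some positions matter more than others. A spacer of 17 returned by `spacer_length` corresponds to the optimal spacing mentioned in the text.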

    In eukaryotic cells, promoters associated with nuclear genes are variable in structure; these variations are due to the presence of multiple nuclear RNA polymerases as well as the requisite transcription factor initiation complex that must form. Transcription factors are small proteins that are continually binding to and altering the shape of the chromatin. The remodeling of chromatin in the promoter locale is characterized by changes in the association between genomic DNA and the histone proteins which decorate it. Best thought of as a type of histone displacement, the objective is to facilitate access to the gene promoter by altering the local architecture of the chromatin. This is an ATP-dependent process. Transient covalent modifications to histone proteins include acetylation, methylation, and phosphorylation. Generally speaking, histone acetylation is associated with the activation of transcription, while methylation commonly correlates with gene silencing. The net result is the activation, or silencing, of various subsets of genes in a temporal or environmentally induced manner.

    Interestingly, promoters recognized by RNA polymerase II, the enzyme responsible for the synthesis of mRNA (discussed below), often display sequence homology with prokaryotic gene promoters (Fig. 1.7). The eukaryotic promoter counterpart is known as the TATA box, formerly known as the Hogness box, and so-named because of the prevalence of the highly conserved TATAA motif. Point mutations involving any of these five bases strongly downregulate the function of that promoter. Another promoter component, the transcription initiation factor TFIIB recognition element (BRE), is directly adjacent to and upstream from the TATA box. The function of this heptanucleotide motif (often, GGGCGCC) is to attract TFIIB, a key element in the assembly of the transcription apparatus associated with RNA polymerase II. While at one time it was thought that all eukaryotic promoters manifest a TATA box, this is now known to be untrue. Instead, these rather prevalent TATA-less promoters are typically characterized by an initiator region (INR) and a downstream promoter element (DPE), which is observed approximately 30 base pairs downstream (+30) from the TSS. The motif ten element (MTE) exclusively maps to +18 through +27 and is located downstream from INR and immediately upstream from the DPE. At least one function associated with the MTE is its ability to act in place of an absent TATA box. In addition to the TATA, DPE, and MTE promoter structural components, another promoter motif is the CAAT box, found in several but not all promoters, and so-named because of the conservation of its sequence. When present in eukaryotic promoters, the TATA box is usually centered at −30 and the CAAT box appears around −75, though the CAAT box has been shown to function quite effectively much further upstream, and even in reverse orientation. These elements appear to control initial binding of the RNA polymerase and promoter efficiency, respectively. 
Another frequently observed promoter element is the sequence (GGGCGG)n, known as the G-box element or simply as the GC box. Present in one or more copies, this GC-rich region is generally observed between −90 and −120 within the promoter region. Interestingly, it appears that there is no one component or organization that is shared by all promoters, though the particular permutation of promoter elements and distances between them is recognizable as a transcription initiation regulator. Succinctly, by comparison with transcription in prokaryotic cells, the elaborate initiation of eukaryotic transcription requires the presence of numerous transcription factors, coactivators, and transcription activator proteins that bind to these cis-acting components which, collectively, make up a promoter. The widely accepted role of early transcription factor binding to gene promoters is to recruit RNA polymerase to that site so as to ultimately initiate transcription. Rather than being thought of as merely an on–off switch associated with a particular gene, a promoter functions more like a thermostat that increases (upregulates) and decreases (downregulates) the expression of a gene in response to the prevailing local conditions acting upon a cell.

    Figure 1.7 Generalized structure of a eukaryotic gene promoter. See text for details.

    Eukaryotic promoters do not always function alone. Transcription in eukaryotic cells can be influenced profoundly by the presence of a regulatory element known as an enhancer, the function of which appears to be the stimulation of transcription. First discovered in the early 1980s, the precise location and orientation of an enhancer relative to the gene promoter varies from one gene to the next. Some genes, including those which encode immunoglobulins, carry enhancers within the structural portion of the gene itself. Removal of enhancer sequences can reduce the transcriptional efficiency at a locus normally under the influence of that enhancer sequence, as can the binding of repressor proteins to functionally disparate DNA sequences known as silencers.

    In vitro transcription of genes that are not naturally associated with an enhancer element can be increased significantly if an enhancer is ligated to the DNA construct, usually in no particular orientation, and often hundreds, if not thousands of base pairs away from the TSS. In vivo, a translocation event that brings a promoter and a gene into proximity can result in inappropriate expression of the gene, often with potentially catastrophic consequences, as in the case of Burkitt’s lymphoma (Taub et al., 1982). The transcriptional influence of upstream and downstream enhancer sequences, and antagonistic silencer sequences, on gene promoters is well documented.

    Many enhancers, but not all, have been shown to be transcriptionally active (Djebali et al., 2012; Andersson et al., 2014), producing enhancer RNAs, or simply eRNAs. At present, the number of known eRNAs in human cells is in the tens of thousands and their transcription points to enhancer functionality in terms of promoting expression of the cognate gene (reviewed by Li et al., 2016). These noncoding transcripts are believed to recruit components of the transcription initiation complex; it is possible that RNA polymerase II may track to a transcription promoter by first identifying the enhancer itself. It is also possible that transcription of the intergenic area between the enhancer and the promoter may have a role in chromatin acetylation and ensuing remodeling in order to facilitate transcription initiation at the promoter (Gribnau et al., 2000). The sequential binding of transcription factors and ancillary components in the immediate vicinity of the gene locus ultimately results in the formation of a loop and concomitant spatial juxtaposition of the components of the template DNA needed to support the initiation of transcription. Succinctly, enhancers perform their function by increasing the concentration of transcription activator proteins in the vicinity of the associated promoter.

    During transcription, both strands of the gene being transcribed have different names and different roles. The strand that actually serves as the template upon which RNA is polymerized is properly referred to as the template strand. The other strand, which does not act in a template capacity, is called the coding strand. The coding strand is also known in some circles as the sense strand, while the template strand may be referred to as the antisense strand. The choice of nomenclature is purely a matter of personal preference. When publishing a gene sequence, the convention is to report the sequence of the coding strand, written 5′ to 3′, from left to right. The implication is that the template strand is base-paired to the coding strand and lying antiparallel to it and therefore does not need to be reported. The DNA template strand is so-named because the precise sequence of nucleotides inserted into the nascent RNA transcript is determined by, and complementary to, the template strand nucleotide sequence. It is important to realize that the coding strand and the template strand may switch roles depending upon the placement of transcriptional promoter sequences (Fig. 1.8). One powerful example of this phenomenon in vitro is the cloning of a double-stranded DNA between two different transcription promoters in opposite orientations; often the bacteriophage polymerase promoters SP6, T3, or T7 are selected because of their high efficiency. Constructions such as these are frequently employed to accommodate in vitro transcription of large amounts of sense and/or antisense RNA for use as nucleic acid probes (see Chapter 16: Nucleic Acid Probe Technology) or for RNAi applications (see Chapter 11: RNA Interference and RNA Editing).

    Figure 1.8 Promoters positioned in the opposite orientation relative to a DNA sequence allow the template and coding strands to switch roles during transcription. This arrangement permits the synthesis of +RNA and –RNA from the same DNA construct.
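
The relationship between the coding strand, the template strand, and the transcript can be sketched in Python as follows. The function names are hypothetical and the sequences are toy examples. The sketch shows that the RNA matches the coding strand (with U in place of T), and that placing a promoter in the opposite orientation, as in Fig. 1.8, makes the former template strand serve as the coding strand.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding):
    """Return the template strand, written 5'->3' (reverse complement of coding)."""
    return coding.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(coding):
    """Return the RNA transcript: same sequence as the coding strand, with U for T."""
    return coding.upper().replace("T", "U")


def transcribe_from_template(template):
    """Build the RNA as the antiparallel complement of a template written 5'->3'."""
    return template_strand(template).replace("T", "U")


coding = "ATGGCC"                               # coding (sense) strand, 5'->3'
template = template_strand(coding)              # "GGCCAT"
sense_rna = transcribe_from_template(template)  # equals transcribe(coding)
antisense_rna = transcribe(template)            # strands swap roles (Fig. 1.8)
```

Here `sense_rna` is AUGGCC and `antisense_rna` is GGCCAU, the two products one would expect from a construct flanked by opposing promoters.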

    Gene and Genome Organization Affect Transcription

    In order to understand the significance of the products of transcription, it is first essential to understand the organization of the genes themselves. The typical prokaryotic genome exhibits little extraneous baggage. Frequently, genes that encode proteins associated with a common metabolic pathway are clustered together, as suggested by the operon model (Jacob and Monod, 1961). The lac operon, the gene products of which facilitate the metabolism of lactose as a carbon source in bacteria, is but one extremely well-characterized example. The RNA molecule that results from the transcription of an operon is usually polycistronic, meaning that more than one polypeptide is encoded in a single RNA transcript.

    The coding information within a polycistronic mRNA for each polypeptide is contiguous: there are no interruptions in the coding sequences by noncoding information. This design favors maximum efficiency of energy resource utilization in unicellular organisms.

    In fact, the kinetics of prokaryotic gene expression are so rapid that bacterial mRNA is usually being transcribed, undergoing translation, and being degraded simultaneously. The rapid turnover of RNA in this manner has, in the past, frustrated valiant attempts to clone or otherwise characterize prokaryotic mRNA. While significant improvements favoring the isolation of high quality RNA from both Gram-negative and Gram-positive bacteria have been made, and many of these innovations are available in kit form, the isolation of intact prokaryotic RNA remains something of a challenge in many laboratories.

    In contrast to prokaryotes, nearly all eukaryotic mRNAs are monocistronic. Although a single-polypeptide species results from the translation of a particular monocistronic eukaryotic mRNA molecule, that same mRNA is subject to repeated translation as long as the transcript remains biologically competent and chemically stable. To maximize translation potential, an mRNA transcript is often engaged by several ribosomes that are all involved in simultaneous, orderly translation of that transcript. Such a cluster of ribosomes attached to a single mRNA molecule is known as a polysome (or polyribosome). Polysomes are observed both in prokaryotic and eukaryotic cells, though eukaryotic polysomes, with 7–8 ribosomes per polysome complex, tend to be smaller than their prokaryotic counterparts. Succinctly, a large number of polypeptide molecules can be manufactured from a single RNA molecule. Polysomes are observed free-floating in the cytoplasm, they can be membrane-bound, and sometimes are attached to the cytoskeleton (Lenk et al., 1977; Davies et al., 1991). The entirety of mRNAs so engaged in a cell at any given moment is known as the polysome fraction, which can be used to assess the translational competence of a cell under a defined set of experimental conditions.

    Close examination of eukaryotic genes reveals that for a vast majority of genes there are considerably more nucleotides within a particular locus than are necessary to direct the synthesis of the corresponding polypeptide, that is, the DNA sequence and the amino acid sequence are not colinear over the span of the locus. This size differential can also be observed at the level of the mature mRNA in the cytoplasm, which is usually quite a bit shorter than the DNA sequence from whence it was transcribed. Upon further scrutiny, this discrepancy can be resolved at the level of the organization of the structural portion of the gene itself, the sequences within which fall into one of two categories:

    1. Exons are regions of DNA that are represented in the corresponding mature mRNA. Exons may or may not have a peptide coding function.

    2. Introns are regions of DNA that are transcribed but generally are not represented in the corresponding mRNA. Introns are usually spliced out of the primary RNA transcript (the immediate product of transcription), accompanied by the joining of adjacent exon sequences. The majority of introns do not direct polypeptide synthesis, though there are several noteworthy exceptions (for review, see Farrell and Bassett, 2007).

    The number and the length of exons and introns associated with a gene are highly variable depending upon locus and this variability even pertains to loci that are highly conserved across evolutionary time. By comparison with introns which can be several thousand base pairs in length, exons tend to be rather short, each encoding fewer than 100 amino acids in most organisms. In some cases the high sequence conservation in one or more exons of a gene has been directly responsible for the isolation of a related gene (an ortholog) from a different organism. In some unusual cases, genes lack introns altogether, of which human β-interferon and thrombomodulin are examples.

    The base sequence of the primary RNA transcript correlates precisely with the DNA from which it is derived, meaning that it contains both exon and intron sequences. These primary transcription products are only a precursor to functional mRNA, and are confined to the eukaryotic nucleus where, appropriately, they are collectively known as heterogeneous nuclear RNA (hnRNA) or simply pre-mRNA. hnRNA and specific nuclear proteins that bind to it form rather abundant heterogeneous nuclear ribonucleoprotein complexes (hnRNPs). Similarly, mRNAs exist in the cytoplasm, following intron removal, as messenger ribonucleoprotein (mRNP) complexes after having traversed a nuclear membrane channel. In order to promote unidirectional movement, the combination of proteins associated with the mRNP is changed immediately upon arrival in the cytoplasm. The ensuing remodeling ensures that the mRNP, and the mRNA that it carries, is unable to travel back into the nucleus.

    Introns vary dramatically in number, length, and base sequence and often exhibit multiple translation termination (stop) codons in all reading frames. This is not entirely unexpected because the noncoding nature of introns favors the accumulation of mutations that might otherwise be lethal if they were to occur within an exon or other critical area. Examination of the splice junctions of introns, however, reveals a strict conservation of two dinucleotide consensus sequences (Breathnach and Chambon, 1981; Mount, 1982) contained entirely within the intron. Proceeding from the 5′ end of the RNA, introns are found to begin with a GU dinucleotide (known as the left or donor site) and end with an AG dinucleotide (known as the right or acceptor site). This so-called GU-AG rule, describing exon–intron splice sites, refers to the RNA sequence; the corresponding DNA coding strand dinucleotide at the 5′ end (beginning) of an intron is GT. While the rule was once believed to hold 100% of the time in higher eukaryotes, some exceptions have been noted (Szafranski et al., 2007) and the consensus phenomenon does not apply to yeast mitochondrial or tRNA genes, nor to chloroplast loci (Krebs et al., 2012).
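
The exon and intron definitions and the GU-AG rule can be illustrated with a minimal Python sketch that removes annotated introns from a toy pre-mRNA and joins the flanking exons. The function names and the coordinate scheme (0-based, half-open intervals) are assumptions made for this example, and the sketch enforces the GU-AG rule strictly even though, as noted above, exceptions exist in nature.

```python
def obeys_gu_ag(intron):
    """Return True if an RNA intron begins with GU and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GU") and intron.endswith("AG")


def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA and join the remaining exons.

    introns is a list of (start, end) pairs, 0-based and half-open.
    Raises ValueError for overlapping or out-of-range introns, or for an
    intron that violates the GU-AG rule (strictness is an assumption).
    """
    pre_mrna = pre_mrna.upper()
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        if start < cursor or end > len(pre_mrna):
            raise ValueError("introns must be non-overlapping and within the transcript")
        if not obeys_gu_ag(pre_mrna[start:end]):
            raise ValueError(f"intron at {start}-{end} does not follow the GU-AG rule")
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)
```

For the toy transcript `"AUGGCC" + "GUAAGUCCCAG" + "UUUUAA"` with a single intron at `(6, 17)`, `splice` returns AUGGCCUUUUAA, the two exons joined end to end.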

    The nucleotides immediately adjacent to both sides of the GU and AG intron boundaries are also conserved to an extent, typically 60%–80%, and a point mutation at a splice site generally results in the inactivation of that site. In some cases, splice-site mutations can result in the production of an aberrant mRNA through the use of an alternative splice site, often located within the intron (Triesman et al., 1982), as in certain β-thalassemic individuals. Knowledge of the high conservation of splice sites is the basis of a method for exon identification known as exon trapping (Duyk et al., 1990; Péterfy et al., 2000) in which a putative exon-containing sequence is cloned into a specialized vector that consists of an intron flanked by two known exons (exon–intron–exon); if an exon is present in the experimental DNA, it will be trapped by ligation into the vector intron and will result in a longer transcript that can be detected electrophoretically or by melting curve analysis. The exon trapping method has fallen out of favor due to the low cost and ready availability of cDNA sequencing and an extensive repertoire of tools for in silico analysis.

    The mechanics of intron removal and exon ligation, which occur cotranscriptionally, i.e., while the RNA polymerase is still active, are mediated in part by a highly conserved family of small nuclear RNAs (snRNA; 100–300 bases). These molecules exist as RNA–protein complexes, known as U1, U2, U4, U5, and U6, and are confined to the nucleus where they are referred to as small nuclear ribonucleoproteins (snRNPs, or snurps). The snRNPs, along with many other proteins that act as splicing factors, form enormous complexes known as spliceosomes, which are known to mediate pre-mRNA splicing.

    Closely associated with the capacity for transcript splicing are Cajal bodies (CBs), small organelle-like, punctate structures in the nucleoplasm that were first observed more than a century ago (Cajal, 1903). Unlike organelles, these structures are nonmembrane bound; they are spherical in appearance, with a typical diameter of 0.5–1.0 μm. CBs are characterized by a high concentration of the protein coilin and are packed with RNA. They are dynamic in that they are visible at certain times and not others, which may be related to differentiation, development, and even progression through the cell cycle. Found in higher eukaryotic plant and animal cells, CBs represent transcriptionally active regions, particularly the histone loci, and are also linked to ribosome biogenesis and telomere upkeep. However, one of the best known functions of CBs is their factory-like role in the assembly of snRNPs associated with mRNA splicing.

    Intron removal has also been demonstrated to have a role in nuclear export of the spliced and matured mRNA. The proteins involved in the splicing mechanism and exon concatenation recruit additional proteins that are specifically required for nuclear egress. Among the proteins in the resulting exon junction complex (EJC) is the ALY/REF export adapter, which binds directly to the RNA, and the TAP-p15 export receptor complex, each of which has a direct role in nuclear pore engagement (reviewed by Grünwald et al., 2011). Once on the cytoplasmic side of a nuclear pore, the mRNA sheds the array of proteins which facilitated its nuclear egress. This ensures unidirectional movement. Improperly spliced or otherwise compromised mRNAs fail to associate with the correct combination of proteins required for nucleocytoplasmic transport, thereby promoting their retention in the nucleus and rapid degradation. Although intron removal and the splicing together of exons in and of itself is not required for transport from the nucleus, since intron-less transcripts move efficiently into the cytoplasm, splicing clearly enhances transport. The export process used for rRNA and other RNAs is less clear.

    Splicing of pre-mRNA molecules also produces some unexpected results. Once believed to be a rare consequence of a spliceosomal machinery error, circular RNAs (circRNAs) are a well-documented consequence of a phenomenon known as backsplicing, a process by which the 3′ end of exon n is joined covalently to the 5′ end of the same exon, i.e., exon n, rather than to the 5′ end of exon n+1. Consisting of one or two exons, the number of known eukaryotic circRNAs is in the thousands. Nearly all circRNAs are exon-encoded sequences; intronic circRNA sequences are almost completely unknown.
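
The difference between conventional splicing and backsplicing can be shown with a short Python sketch that builds the sequence spanning each type of junction. The function names, the toy exon sequences, and the `flank` parameter (the number of bases taken from each side of the junction) are hypothetical and serve only to illustrate the single-exon case described above.

```python
def linear_junction(exon_n, exon_next, flank=5):
    """Sequence across a conventional junction: 3' end of exon n + 5' end of exon n+1."""
    return exon_n[-flank:] + exon_next[:flank]


def backsplice_junction(exon_n, flank=5):
    """Sequence across a backsplice junction: 3' end of exon n + 5' end of the same exon."""
    return exon_n[-flank:] + exon_n[:flank]


exon_n = "AUGGCCAAGUUC"      # toy exon n
exon_next = "GGAUCCUUAGCA"   # toy exon n+1

linear = linear_junction(exon_n, exon_next, flank=4)   # "GUUC" + "GGAU"
circular = backsplice_junction(exon_n, flank=4)        # "GUUC" + "AUGG"
```

In this toy case the backsplice junction sequence does not appear in the linear concatenation of the two exons, so it marks the circular product specifically.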

    The highest incidence of circRNAs is found in the mammalian brain, with the greatest density of these molecules observed in the synaptic region of nervous tissue cells (Rybak-Wolf et al., 2015). circRNAs are believed to play a role in neuronal differentiation, as their expression is upregulated during brain development (You et al., 2015). In a surprise development, it is known that the number of circRNAs from some genetic loci exceeds the linear counterpart by a factor of as much as 10 (Salzman et al., 2012). There is also speculation that circRNAs may absorb miRNAs (described below and in Chapter 10: miRNA), sequestering them as a means of controlling the expression of specific genes associated with a metabolic or developmental pathway. Similarly, the formation of circRNAs may regulate the concentration of certain types of the more than 1000 known RNA binding proteins by transiently attracting them.

    Since these molecules are the result of splicing events intended to bring exons together, it is reasonable to assume that many gene loci are capable of producing an even greater array of processed transcripts, further diversifying the transcriptome. It is clear, however, that certain exons are greatly favored in the formation of circRNA transcripts; this is probably controlled by the presence of repetitive sequences residing in the flanking introns and trans-acting factors associated with splicing that collaborate to control transcript circularization (Kramer et al., 2015). circRNAs are nonpolyadenylated and localized primarily in the cytoplasm; the lack of a poly(A) tail therefore excludes their identification, characterization, and abundance measurement by RNA-seq (poly A). The nonlinear shape of circRNA imparts great stability to these molecules. The extended half-lives of these molecules presumably allow them to perform their intended function(s), whatever they may be, for as long as possible. Even though circRNAs are derived
