
BIOMOLECULES (INTRODUCTION, STRUCTURE AND FUNCTIONS)

Nucleic acids
Smita Rastogi 1 & U. N. Dwivedi 2
1 Lecturer, Department of Biotechnology, Integral University, Lucknow, India
2 Professor, Department of Biochemistry, University of Lucknow, Lucknow-226007, India
6-Jun-2006 (Revised 12-Jun-2007)

CONTENTS
Composition of nucleic acids
Generalized structural units of nucleic acids
  Nucleosides
  Nucleotides or Nucleoside 5′-triphosphates
  Oligonucleotides
Nomenclature of nucleic acids
Structural levels of nucleic acids
Deoxyribonucleic acid (DNA)
Ribonucleic acid (RNA)
  Structure
  Types of RNA
    Messenger RNA (mRNA)
    Ribosomal RNA (rRNA)
    Transfer RNA (tRNA)
    Heterogeneous nuclear RNA (hnRNA)

Key words
Deoxyribose sugar, DNA, mRNA, Purines, Pyrimidines, Ribose sugar, rRNA, tRNA

The nucleic acids are the molecular repositories of genetic information and are referred to as the Molecules of Heredity. Although the name nucleic acid suggests a location in the nuclei of cells, some of them are also present in the cytoplasm. The nucleic acids are the hereditary determinants of living organisms. They are macromolecules present in most living cells, either in the free state or bound to proteins as nucleoproteins. There are two types of nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Both are present in all plants and animals. Viruses also contain nucleic acids; however, unlike a plant or animal, a virus has either RNA or DNA, but not both. DNA is found mainly as a component of the chromatin of the cell nucleus, whereas most of the RNA (90%) is present in the cell cytoplasm and the remainder (10%) in the nucleolus. Extranuclear DNA also exists, e.g. in mitochondria and chloroplasts.

Composition of nucleic acids

Nucleic acids are biopolymers of high molecular weight with mononucleotides as their repeating units. Each mononucleotide consists of the following:
(A) Nitrogenous bases
(B) Phosphoric acid
(C) Pentose sugars

(A) Nitrogenous bases

Two types of major nitrogenous bases, which account for the base composition of DNA or RNA, are found in all nucleic acids. These are:
a) Purine bases
b) Pyrimidine bases

The purine and pyrimidine bases found in nucleic acids are listed in Table 1 and their structures are given in Fig. 1.

Table 1: Purine and pyrimidine bases in DNA and RNA

Name of base | Purine or pyrimidine | Molecular formula | Molecular weight (Da) | Properties | Found in DNA and / or RNA
Adenine (A) | Purine | C5H5N5 | 135.15 | White, crystalline | DNA and RNA
Guanine (G) | Purine | C5H5ON5 | 151.15 | White, crystalline, first isolated from guano (bird manure) | DNA and RNA
Cytosine (C) | Pyrimidine | C4H5ON3 | 111.12 | Colourless, crystalline | DNA and RNA
Thymine (T) | Pyrimidine | C5H6O2N2 | 126.13 | White, crystalline, first isolated from thymus tissue | DNA only
Uracil (U) | Pyrimidine | C4H4O2N2 | 112.10 | White, crystalline | RNA only
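The molecular weights in Table 1 follow from the molecular formulas. The sketch below recomputes them as a cross-check. The atomic weights are rounded standard values that do not come from the text, and the helper name molecular_weight is hypothetical. The results agree with the tabulated values to within about 0.03 Da, the small differences reflecting the atomic weights used.

```python
import re

# Rounded standard atomic weights in g/mol (assumed values, not from the text).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Molecular formulas as written in Table 1.
BASE_FORMULA = {
    "Adenine": "C5H5N5",
    "Guanine": "C5H5ON5",
    "Cytosine": "C4H5ON3",
    "Thymine": "C5H6O2N2",
    "Uracil": "C4H4O2N2",
}


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C5H5ON5' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count (e.g. the 'O' in 'C5H5ON5') means one atom.
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    for name, formula in BASE_FORMULA.items():
        print(f"{name:9s} {formula:9s} {molecular_weight(formula):7.2f} Da")
```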

[Figure: structures of the pyrimidines thymine, cytosine and uracil (in RNA, uracil is present instead of thymine) and of the purines adenine and guanine]

Fig. 1: Structure of purine and pyrimidine bases

Both the purine and pyrimidine bases are planar molecules, owing to their π-electron clouds. Purine and pyrimidine bases are hydrophobic and relatively insoluble in water at the near-neutral pH of the cell. In nucleosides, purines can exist in syn or anti forms, whereas pyrimidines generally exist only in the anti form because of steric interference between the sugar and the carbonyl oxygen at C-2 of the pyrimidine.

Besides the major nitrogenous bases, some minor bases, also called modified nitrogenous bases (purines and pyrimidines), occur in polynucleotide structures. Some naturally occurring modified purines are hypoxanthine, xanthine, uric acid, 6-methyladenine (6-Me), 6-dimethyladenine (6-DiMe), 6-N-isopentenyladenine (6-IPA), 1-methylguanine (1-MeG) and 2-dimethylguanine (2-DiMeG). Some of the modified purines are found in tRNA (described later). Methylation is the most common form of purine modification in microorganisms. The presence of such methylated purines has also been suggested in plant genomes.

Some naturally occurring modified pyrimidines (e.g. 5,6-dihydrouracil, pseudouracil, 4-thiouracil) are common in tRNA (described later). Other examples include 5-methylcytosine (5-MeC) and 5-hydroxymethylcytosine. 5-Methylcytosine is a common component of higher plant and animal DNA. In fact, up to 25% of the cytosine residues of plant genomes are methylated. The DNA of plants is richer in 5-MeC than the DNA of animals. The DNA of the T-even bacteriophages (T2, T4) of E. coli has no cytosine but instead has 5-hydroxymethylcytosine and its glucoside derivatives.

(B) Phosphorus

Phosphorus, present in the backbone of nucleic acids, is a constituent of the phosphodiester bond that links two sugar moieties. The molecular formula of phosphoric acid is H3PO4. It contains three monovalent hydroxyl groups and a divalent oxygen atom, all linked to the pentavalent phosphorus atom.

(C) Sugar

Both DNA and RNA contain a five-carbon aldose sugar, i.e. a pentose sugar. The essential difference between DNA and RNA is the type of sugar they contain. RNA contains the sugar D-ribose (hence ribonucleic acid, RNA), whereas DNA contains its derivative 2′-deoxy-D-ribose, in which the 2′-hydroxyl group of ribose is replaced by hydrogen (hence deoxyribonucleic acid, DNA). In nucleic acids the sugars are always in the closed-ring β-furanose form and are called furanose sugars because of their similarity to the heterocyclic compound furan. The structures of the pentose sugars present in DNA and RNA are shown in Fig. 2.

[Figure: ring structures of D-ribose (OH at C-2′) and 2′-deoxyribose (H at C-2′), with the carbon atoms numbered 1′ to 5′]

Fig. 2: Structure of sugars present in nucleic acids

The structural difference between the sugars of DNA and RNA, though minor, confers very different chemical and physical properties upon the two nucleic acids. RNA is much stiffer due to steric hindrance and is more susceptible to hydrolysis under alkaline conditions, which perhaps explains in part why DNA has emerged as the primary genetic material. The sugar, along with phosphate, performs the structural role in nucleic acids.

Generalized structural units of nucleic acids

Generalized structural units of nucleic acids are indicated in Scheme 1.

Components of nucleic acids:
Deoxyribose (or ribose) + Base → Nucleoside
Nucleoside + Phosphoric acid → Nucleotide
Nucleotides → Polynucleotide or Nucleic acid (DNA / RNA)

Scheme 1: Generalized structural units of nucleic acid

(A) Nucleosides

The nucleosides are compounds in which nitrogenous bases (purines and pyrimidines) are conjugated to the pentose sugars (ribose or deoxyribose) by a β-N-glycosidic linkage. The C1′ carbon atom of the sugar is joined to the N1 atom of a pyrimidine or the N9 atom of a purine. Thus, the purine nucleosides are N-9 glycosides and the pyrimidine nucleosides are N-1 glycosides. These bonds are stable in alkali. The purine nucleosides are readily hydrolyzed by acid, whereas pyrimidine nucleosides are hydrolyzed only after prolonged treatment with concentrated acid. The nucleosides are generally named for the particular purine or pyrimidine present. Nucleosides possessing ribose are called ribonucleosides (ribosides) and those containing deoxyribose are called deoxyribonucleosides (deoxyribosides). The nomenclature of nucleosides differs from that of the bases.

Pseudouridine, which is otherwise identical to uridine, differs in the point of attachment of the ribose to the base. In pseudouridine the base is attached to the sugar through C5 of the base, whereas in uridine the attachment is through N1 (structure described later in the section on tRNA). Two nucleoside analogues, 3′-azido-3′-deoxythymidine (AZT) and 2′,3′-dideoxycytidine (DDC), have found therapeutic use in the treatment of patients with acquired immune deficiency syndrome (AIDS).

(B) Nucleotides or Nucleoside 5′-triphosphates

These are phosphate esters of nucleosides, i.e. nucleosides form nucleotides by joining with phosphoric acid. Esterification can occur at any free hydroxyl group, but is most common at the 5′ and 3′ positions of the sugar. The phosphate residues are joined to the sugar ring by a phosphomonoester bond, and several phosphate groups can be joined in series by phosphoanhydride bonds. Nucleotides occur either in the free form or as subunits in nucleic acids. The phosphate is always esterified to the sugar moiety. The trivial names of purine nucleosides end with the suffix -sine, and those of pyrimidine nucleosides end with the suffix -dine. In addition to their role as structural components of nucleic acids, nucleotides participate in a number of other functions, as described below:

Energy carriers: Nucleotides are energy-rich compounds that drive metabolic processes, especially biosynthetic ones, in all cells. Hydrolysis of nucleoside triphosphates provides the chemical energy to drive a wide variety of cellular reactions. ATP is the most widely used for this purpose; UTP, GTP and CTP are also used. Nucleoside triphosphates also serve as the activated precursors of DNA and RNA synthesis. Hydrolysis of the ester linkage (between ribose and the α-phosphate) yields about 14 kJ/mol under standard conditions, whereas hydrolysis of each anhydride bond (between the α- and β-phosphates or the β- and γ-phosphates) yields about 30 kJ/mol. ATP hydrolysis often plays an important thermodynamic role in biosynthesis.

Enzyme cofactors: Many enzyme cofactors include adenosine in their structure, e.g. NAD, NADP, FAD.

Chemical messengers: Some nucleotides act as regulatory molecules and serve as chemical signals or secondary messengers, key links in cellular systems that respond to hormones and other extracellular stimuli and lead to adaptive changes in the cell's interior. Two hydroxyl groups can be esterified by the same phosphate moiety to generate cyclic AMP (cAMP, adenosine 3′,5′-cyclic phosphate) or cyclic GMP (cGMP, guanosine 3′,5′-cyclic phosphate).
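The energy figures quoted for nucleotide hydrolysis can be combined by simple addition. The sketch below is only an arithmetic illustration. It assumes that the phosphates are removed one at a time from the terminal end and uses the approximate standard values quoted above (about 30 kJ/mol per anhydride bond, about 14 kJ/mol for the ester bond). The function name hydrolysis_energy is hypothetical.

```python
# Approximate energy released per bond hydrolysed (kJ/mol), as quoted in the text.
ESTER_KJ = 14.0      # ester bond between ribose and the alpha-phosphate
ANHYDRIDE_KJ = 30.0  # each phosphoanhydride bond (alpha-beta and beta-gamma)


def hydrolysis_energy(phosphates_before: int, phosphates_after: int) -> float:
    """Approximate energy released (kJ/mol) when terminal phosphates are
    hydrolysed one at a time, e.g. 3 -> 2 for ATP -> ADP + Pi."""
    if not 0 <= phosphates_after <= phosphates_before <= 3:
        raise ValueError("expected 0 <= phosphates_after <= phosphates_before <= 3")
    total = 0.0
    # Position 1 is the alpha-phosphate (ester-linked to the sugar);
    # positions 2 and 3 (beta, gamma) are held by anhydride bonds.
    for position in range(phosphates_after + 1, phosphates_before + 1):
        total += ESTER_KJ if position == 1 else ANHYDRIDE_KJ
    return total


if __name__ == "__main__":
    print("ATP -> ADP + Pi       :", hydrolysis_energy(3, 2), "kJ/mol")
    print("ATP -> AMP + 2 Pi     :", hydrolysis_energy(3, 1), "kJ/mol")
    print("ATP -> adenosine + 3Pi:", hydrolysis_energy(3, 0), "kJ/mol")
```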

(C) Oligonucleotides

Oligonucleotides are polymers containing <100 nucleotides. These nucleotides are linked by phosphodiester bonds, as shown in Fig. 3.

[Figure: two nucleotides joined through a phosphate group that bridges the 3′ carbon of one sugar and the 5′ carbon of the next (phosphodiester bond)]

Fig. 3: Phosphodiester linkage

Oligonucleotides occur naturally and are used as primers during DNA replication and for various other purposes in the cell. Synthetic oligonucleotides can be made by chemical synthesis and are essential for many laboratory techniques, e.g. DNA sequencing, PCR, in situ hybridization, nucleic acid probes, nucleic acid hybridization and gene therapy. Polymers containing >100 ribonucleotides or deoxyribonucleotides are called RNA and DNA (nucleic acids), respectively.

Nomenclature of nucleic acids

Direction: By convention, a single strand of nucleic acid is always written with the 5′ end at the left and the 3′ end at the right, i.e. in the 5′ → 3′ direction.

Sugar: In the chemical nomenclature, the carbon atoms of the sugars are designated by primed numbers, i.e. C-1′, C-2′, C-3′ etc., to avoid confusion with the base numbering system.

Base: The various atoms in the bases lack the prime (′) sign and are designated by cardinal numbers, i.e. 1, 2, 3 etc. By convention, an N or C atom within the ring is referred to by its ring position (e.g. C-2, N-3, N-7), whereas an exocyclic atom (not within the ring structure) is denoted by its element symbol with the ring position to which it is attached as a superscript, e.g. the amino N attached to C-6 in adenine is N⁶. Bases are represented by single letters: adenine as A, guanine as G, cytosine as C, thymine as T and uracil as U.

Nucleosides and nucleotides: While the names of nucleosides and nucleotides are generally derived from the corresponding bases, there is one exception to this rule: the base corresponding to the nucleoside called inosine (and the derived nucleotides) is called hypoxanthine.

Short hand notation: In the short hand notation of nucleotides, the phosphate group is symbolized by P and deoxyribose by a vertical line from C1′ at the top to C5′ at the bottom. The connecting lines between nucleotides, which pass through P, are drawn diagonally from the middle (C3′) of the deoxyribose of one nucleotide to the bottom (C5′) of the next (Fig. 4).

[Figure: short hand notation for pTATGC. Vertical lines represent the sugars of T, A, T, G and C from the 5′ terminus to the 3′ terminus; diagonal lines through P join the 3′ position of each sugar to the 5′ position of the next]

Fig. 4: Short hand notation for the sequence 5′-TATGC-3′

The nucleoside and nucleotide derivatives of deoxyribose are distinguished by the prefix d. Where clarity is especially important, ribonucleosides and ribonucleotides can similarly be identified with the prefix r, e.g. ATP = rATP. A second short hand notation is used to discriminate between 5′ and 3′ phosphates, with a 5′-phosphate placed before the base (e.g. pA is adenosine 5′-monophosphate) and a 3′-phosphate placed after the base (e.g. Ap is adenosine 3′-monophosphate). The deoxy prefix can be omitted from the names of thymidine derivatives because, thymine being a predominantly DNA-specific base, it is usually evident that the sugar is deoxyribose. However, the full nomenclature is preferred for the sake of convention and because thymine is a minor base in RNA (it exists as a modified base at various places, most notably in the TΨC loop of every tRNA; note that thymine is 5-methyluracil). Where the context is obvious, both DNA and RNA sequences are represented as a single series of bases. The ambiguous bases are represented by the single letter representations shown in Table 2.

Table 2: Ambiguous bases represented by the single letter representations

S. No. | Single letter representation | Purine / pyrimidine represented by single letter
1 | R | A / G (any puRine)
2 | Y | C / T / U (any pYrimidine)
3 | K | G / T (Keto)
4 | M | A / C (aMino)
5 | S | G / C (strong, three H-bonds)
6 | W | A / T (weak, two H-bonds)
7 | B | G, T, C (i.e. not A)
8 | D | G, T, A (i.e. not C)
9 | H | A, C, T (i.e. not G)
10 | V | A, C, G (i.e. not T)
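The single letter codes of Table 2 lend themselves to a simple lookup. The sketch below is a minimal illustration for DNA sequences (for RNA, U would replace T). It checks whether a concrete sequence is consistent with an ambiguous one and lists every sequence that an ambiguous pattern can stand for. The names AMBIGUITY, matches and expand are hypothetical and not part of any standard library.

```python
from itertools import product

# Table 2 codes for DNA, plus the four unambiguous bases.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # any puRine
    "Y": "CT",   # any pYrimidine
    "K": "GT",   # Keto
    "M": "AC",   # aMino
    "S": "CG",   # strong, three H-bonds
    "W": "AT",   # weak, two H-bonds
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
}


def matches(pattern: str, sequence: str) -> bool:
    """True if every base of `sequence` is allowed by the code at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(base in AMBIGUITY[code] for code, base in zip(pattern, sequence))


def expand(pattern: str) -> list[str]:
    """All concrete sequences represented by an ambiguous pattern."""
    choices = [sorted(AMBIGUITY[code]) for code in pattern]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    print(expand("RY"))                  # two purines x two pyrimidines
    print(matches("GAATTY", "GAATTC"))   # C is a pyrimidine
    print(matches("B", "A"))             # B means "not A"
```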

Structural levels of nucleic acids

Nucleic acids possess the following structures:

(a) Primary structure

The nature, properties and function of the two nucleic acids (DNA and RNA) depend on the exact order of the purine and pyrimidine bases in the molecule. This sequence of specific bases is termed the primary structure. Thus, the primary structure of a nucleic acid is its covalent structure and nucleotide sequence.

(b) Secondary structure

The term secondary structure relates to regions of regular conformation of the chain, stabilized by regular, repeating interactions (e.g. the double helix of DNA). Thus, any regular, stable structure taken up by some or all of the nucleotides in a nucleic acid can be referred to as secondary structure. Nucleic acid secondary structures are generated by two kinds of noncovalent interactions between bases: base pairing and base stacking. The secondary structure of DNA is characterized by intermolecular base pairing, which generates double stranded or duplex molecules. Watson-Crick base pairs form the basis of secondary structure interactions in nucleic acids and also explain Chargaff's rule. Secondary structures in RNA, which exists primarily in single stranded form, generally reflect intramolecular base interactions. Thus, the secondary structures arise due to the following interactions:

Complementary base pairing: It involves stable and specific configurations of H-bonds between bases in DNA. It is the predominant force causing nucleic acid strands to associate. The molecular basis of Chargaff's rule is complementary base pairing between A and T and between G and C in double stranded DNA. Chargaff's rule was later explained by the double helical structure described by Watson and Crick. G:C pairs, with three H-bonds, are more stable than A:T (or A:U) pairs.

Base stacking: The structures are stabilized by hydrophobic interactions between adjacent bases, brought about by the π electrons in the rings. It is these π-π interactions that are described as base stacking forces.

Alternative forms of base pairing: Watson-Crick base pairs (A:T and G:C) are predominant in the structure and function of nucleic acids. However, there are 28 possible arrangements of at least two H-bonds between bases, which provide the basis for a diverse set of interactions. The most significant of these alternative configurations are the Hoogsteen base pairs, which contribute to tRNA structure and allow the formation of triple helices. A modification of Watson-Crick base pairs is the wobble pair, which allows the base in the 5′-anticodon position of tRNA to pair ambiguously with the mRNA. Wobble base pairs form because the bases are offset from their normal Watson-Crick positions and one of the H-bonds is lost.

Intramolecular base pairing: In RNA and single stranded regions of DNA (nonduplex DNA), secondary structure is determined by intramolecular base pairing. Since cellular DNA is usually present as a duplex, its bases are only rarely available for intramolecular interactions. Conversely, intramolecular secondary structures are abundant in cellular RNAs and underlie their functional specialization. The major classes of intramolecular nucleic acid secondary structures are bulges, bulge loops, bubbles, hairpins, stem loops, panhandles and cruciforms. Lariats are often classified as secondary structures but, because they are formed by covalent bonds joining nucleotides, they are strictly primary structures.

(c) Tertiary structure

The complex folding of large chromosomes within eukaryotic chromatin and bacterial nucleoids is generally considered tertiary structure. Thus, the tertiary structures of nucleic acids reflect interactions that contribute to the overall 3D shape.

(d) Quaternary structure

In many structures, nucleic acids interact in trans (e.g. in the ribosome and the spliceosome) and this may be considered a quaternary level of nucleic acid structure. Nucleic acids also interact with an enormous number of proteins (e.g. genome structural proteins, transcription factors, enzymes, splicing factors). Many of these proteins have a significant effect on DNA or RNA conformation. Interactions with proteins may be general or sequence specific and may involve subtle or overt changes in structure. The restriction enzymes EcoRI and EcoRV, for example, both introduce a pronounced kink in the DNA at their recognition sequence, which may facilitate their endonucleolytic activity. Proteins of the high mobility group (HMG) class appear specifically to bend DNA in order to facilitate interactions between components bound at distant sites.

Deoxyribonucleic acid (DNA)

DNA is the genetic material in all organisms, except a few viruses in which RNA acts as the genetic material, e.g. retroviruses. In prokaryotic cells, DNA occurs in the cytoplasm and is the only component of the chromosome. In eukaryotic cells, DNA is largely confined to the nucleus and is the main component of the chromosomes, where it is combined with simple proteins to form deoxyribonucleoproteins (DNP). A small quantity of DNA also occurs in some cytoplasmic organelles such as mitochondria and chloroplasts. This extranuclear DNA is naked, like prokaryotic DNA. The amount of DNA per nucleus is constant in all body cells of a given species. Just before cell division, however, the amount of DNA is doubled. The gametes have half the amount of DNA, as they contain half the number of chromosomes. Mirsky and Vendrely estimated that there is some 6 × 10⁻⁹ mg of DNA per nucleus in diploid somatic cells of mammals and 3 × 10⁻⁹ mg of DNA per nucleus in haploid gametes (eggs and sperms).

(A) Evidence that DNA is the genetic information carrier

In 1928, Frederick Griffith made a startling discovery. The virulent (disease causing) form of the pneumococcus (Diplococcus pneumoniae), a bacterium that causes pneumonia, is encapsulated by a gelatinous polysaccharide coating that contains the binding sites (known as O-antigens) through which it recognizes the cells it infects. Mutant pneumococci that lack this coating, because of a defect in an enzyme involved in its formation, are not pathogenic. The virulent (pathogenic) and non-virulent (non-pathogenic) pneumococci are known as the S and R forms, respectively, because of the smooth and rough appearances of their colonies in culture. Griffith injected mice with a mixture of live R and heat-killed S pneumococci. This experiment resulted in the death of most of the mice. More surprising yet, the blood of the dead mice contained live S pneumococci. The dead S pneumococci initially injected into the mice had somehow transformed the otherwise innocuous R pneumococci to the virulent S form. Furthermore, the progeny of the transformed pneumococci were also S; the transformation was permanent. Eventually, it was shown that transformation could also be carried out in vitro by mixing R cells with a cell-free extract of S cells. This experiment could not, however, establish that DNA is the transforming principle. The experimental results are depicted in Fig. 5.

Fig. 5: Experimental evidence to establish that DNA is the genetic material

In 1944, Oswald Avery, Colin MacLeod and Maclyn McCarty, after a 10-year investigation, extended Griffith's experiment and reported that the transforming material is DNA. The conclusion was based on the observation that the laboriously purified transforming material had all the physical and chemical properties of DNA, contained no detectable protein, was unaffected by enzymes that catalyze the hydrolysis of proteins and RNA, and was totally inactivated by treatment with an enzyme that catalyzes the hydrolysis of DNA. DNA must therefore be the carrier of genetic information.

In 1952, Alfred Hershey and Martha Chase performed the Blender experiment to demonstrate that DNA is the genetic material in bacteriophage. Bacteriophage T2 was grown on E. coli in a medium containing the radioactive isotopes 32P and 35S. This labeled the phage capsid, which contains no P, with 35S, and its DNA, which contains no S, with 32P. These phages were added to an unlabeled culture of E. coli. After sufficient time had been allowed for the phages to infect the bacterial cells, the culture was agitated in a blender so as to shear the empty phage coats (ghosts) from the bacterial cells. This rough treatment did not injure the bacteria. When the phage ghosts were separated from the bacteria (by centrifugation), the ghosts were found to contain most of the 35S, whereas the bacteria contained most of the 32P. Furthermore, 30% of the 32P appeared in the progeny phages but only 1% of the 35S did so. Hershey and Chase therefore concluded that only the phage DNA was essential for the production of progeny and that the protein coat served only as a protective shell. DNA, therefore, must be the hereditary material. The details of the experiment are outlined in Fig. 6.

Fig. 6: Hershey-Chase experiment to demonstrate that DNA is the genetic material

(B) Size and shape of DNA in prokaryotic and eukaryotic cells

DNA is one of the largest known macromolecules. DNA molecules may be of two types: linear and circular. Linear DNA is found in the nuclei of eukaryotic cells, where it exists in association with proteins. Circular DNA is found in prokaryotic cells and in the mitochondria and chloroplasts (plastids) of eukaryotic cells. It is naked, being without a protein coat. Table 3 lists the dimensions of various viral, bacterial and eukaryotic DNA molecules. A perusal of the table indicates that even the smallest DNA molecules are highly elongated. For instance, the DNA from polyoma virus contains 5100 base pairs and has a contour length of 1.7 µm.

Table 3: Dimensions of certain DNA molecules

Organism | Number of base pairs (in thousands or kb) | Length* (in µm) | Molecular weight
Viruses | | |
Polyoma virus or SV40 | 5.1 | 1.7 | 3.1 × 10⁶
λ phage | 48.6 | 17 | 31 × 10⁶
T2 phage | 166 | 56 | 122 × 10⁶
Vaccinia | 190 | 65 | 157 × 10⁶
Bacteria | | |
Mycoplasma | 760 | 260 | 504 × 10⁶
E. coli | 4000 | 1360 | 2320 × 10⁶
Eukaryotes | | |
Yeast | 13500 | 4600 | -
Drosophila | 165000 | 56000 | -
Human | 2900000 | 990000 | -

* 1 µm of double helix = 2.94 × 10³ base pairs = 1.94 × 10⁶ Da
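The lengths in Table 3 can be recovered from the number of base pairs with the conversion factor in the table footnote. The sketch below is a minimal illustration of that arithmetic. The function name contour_length_um is hypothetical, and the results match the table only to the rounding used there.

```python
# Conversion factor from the footnote of Table 3.
BASE_PAIRS_PER_UM = 2.94e3  # base pairs per micrometre of double helix


def contour_length_um(base_pairs: float) -> float:
    """Contour length of a double helix in micrometres."""
    return base_pairs / BASE_PAIRS_PER_UM


if __name__ == "__main__":
    # Number of base pairs as listed in Table 3 (kb converted to bp).
    genomes_bp = {
        "Polyoma virus": 5.1e3,
        "E. coli": 4.0e6,
        "Human": 2.9e9,
    }
    for name, bp in genomes_bp.items():
        print(f"{name:14s} {contour_length_um(bp):12.1f} um")
```

By this estimate the human value corresponds to roughly one metre of DNA, consistent with the 990000 µm listed in the table.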

(C) DNA structure

(a) Chargaff's equivalence rule

In 1950, E. E. Chargaff formulated important generalizations about DNA structure, based on quantitative chromatographic methods for the separation and quantitative analysis of the four bases in hydrolysates of DNA specimens isolated from different organisms. These generalizations are called Chargaff's equivalence rule. They include:

• The base composition of DNA varies from one species to another.
• DNA specimens isolated from different tissues of the same species have the same base composition.
• The base composition of DNA in a given species does not change with age, nutritional state or changes in environment.
• The total amount of purines (A + G) is always equal to the total amount of pyrimidines (T + C); moreover, the amount of A is equal to that of T and the amount of G is equal to that of C, i.e. A = T, G = C (molar equivalence of bases).
• The base ratio (A + T) / (G + C) may vary from one species to another, but is constant for a given species. This ratio can be used to identify the source of DNA and can sometimes help in classification.
• The deoxyribose sugar and phosphate components occur in equal proportions.
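The equivalences above can be checked on any double stranded sequence. The sketch below is a minimal illustration. It builds the partner strand by complementary base pairing, counts the bases over both strands, and reports the (A + T) / (G + C) ratio. The example sequence and the function names are hypothetical.

```python
from collections import Counter

# Complementary base pairs in double stranded DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count the bases over both strands of the duplex formed by `strand`."""
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)


def at_gc_ratio(counts: Counter) -> float:
    """Base ratio (A + T) / (G + C); varies between species."""
    return (counts["A"] + counts["T"]) / (counts["G"] + counts["C"])


if __name__ == "__main__":
    counts = duplex_base_counts("ATGCGGATTA")  # made-up example strand
    print(dict(counts))
    print("A = T:", counts["A"] == counts["T"])
    print("G = C:", counts["G"] == counts["C"])
    print("(A+T)/(G+C):", at_gc_ratio(counts))
```

Note that the equalities hold for the duplex as a whole; a single strand taken alone (here 3 A, 3 T, 3 G and 1 C) need not obey them.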

(b) Double helical structure of DNA (Watson-Crick model) (B-DNA)

In 1953, J. D. Watson and F. H. C. Crick postulated a precise 3-D model of DNA structure, based on the X-ray data of Franklin and Wilkins and the base equivalence observed by Chargaff. This model accounted for many of the observations on the chemical and physical properties of DNA and also suggested a mechanism for accurate replication of genetic information. The Watson-Crick model of DNA structure proposed the following:

• DNA contains two polynucleotide chains that are coiled in helical fashion around the same axis in a right-handed (clockwise) direction, thus forming a double helix.
• The two chains or strands are antiparallel, i.e. their 3′,5′-internucleotide phosphodiester bridges run in opposite directions (as determined by nearest neighbour analysis). The chains are complementary to each other. The antiparallel orientation is a stereochemical consequence of the way that A pairs with T and G pairs with C.
• All the phosphodiester linkages have the same orientation along the chain, giving each linear nucleic acid strand a specific polarity and distinct 5′ and 3′ ends. By definition, the 5′ end lacks a nucleotide at the 5′ position and the 3′ end lacks a nucleotide at the 3′ position.
• The backbone of the helix consists of sugar and phosphate groups, while the bases are perpendicular to the backbone, projecting inwards to the center. Purine and pyrimidine bases are stacked inside the helix with their planes parallel to each other and perpendicular to the helix axis.
• The backbone is found on the periphery of the helix and is hydrophilic. The hydroxyl groups of the sugars form H-bonds with water. The phosphate groups, with pKa near zero, are negatively charged at neutral pH, and the negative charges are generally neutralized by ionic interaction with the positive charges of proteins, metals and polyamines.
• The bases are hydrophobic and shielded from water. This means that a single stranded structure, in which the bases are exposed to the aqueous environment, is unstable. Hence DNA is a double helix.
The DNA double helix is held together by two forces: H-bonding of complementary base pairs and hydrophobic interactions. A base pair consists of a purine and a pyrimidine. Moreover, a specific purine pairs with a specific pyrimidine owing to a perfect match between hydrogen donor and acceptor sites on the two bases. The bases of one strand are paired in the same plane with the bases of the other strand. Base pairing is due to steric and H-bonding factors. A is bonded to T by two H-bonds and G is bonded to C by three H-bonds. Only A and T, and G and C, have the proper spatial arrangement to form correct H-bonding. This is the concept of specific base pairing. The allowed pairs are A-T and G-C, which are precisely the base pairs showing Chargaff's equivalence in DNA. Thus, the Watson-Crick double helix involves not only the maximum possible number of H-bonded base pairs but also those pairs giving maximum fit and stability. The individual H-bond is weak in nature but, as in the case of proteins, the large number of them in the DNA molecule confers stability on it. However, the stability of DNA is primarily a consequence of van der Waals forces and hydrophobic (base stacking) interactions between the planes of stacked bases. Thus, H-bonding is specific and is responsible for the complementarity of the two strands, while hydrophobic interactions (π-π stacking interactions between adjacent bases) are non-specific and are responsible for the stability of the macromolecule. Nucleic acid strands tend to stick together even in the absence of specific base pairing, although the specific interactions make the association stronger.

DNA was found to possess two periodicities, a major one of 0.34 nm and a second one of 3.4 nm. To account for the 0.34 nm periodicity, Watson and Crick postulated that the bases are stacked at a center-to-center distance of 0.34 nm from each other, i.e. successive base pairs are 3.4 Å apart in the stack and are related by a rotation of 36°. The DNA helix is about 20 Å in diameter. The helical structure repeats after 10 residues on each chain, i.e. at intervals of 34 Å. Thus, there are 10 nucleotide residues in each complete turn of the double helix in this model, which accounts for the secondary repeat distance of 3.4 nm (the value measured for B-DNA in solution is about 10.5 residues per turn). The space available between the two sugar-phosphate chains of DNA, i.e. 20 Å (2 nm), can accommodate one purine and one pyrimidine, but not two purines, which would be too large, and not two pyrimidines, which would not be close enough to form proper H-bonds.

The two helices are wound in such a way as to produce two interchain spacings or grooves, a major or wide groove (width 12 Å, depth 8.5 Å) and a minor or narrow groove (width 6.0 Å, depth 7.5 Å). Thus, the major groove is slightly deeper than the minor one. The two grooves arise because the glycosidic bonds of a base pair are not diametrically opposite each other. The minor groove contains the pyrimidine O-2 and the purine N-3 of the base pair, and the major groove is on the opposite side of the pair. Potential H-bond donor and acceptor atoms line each groove. The major groove displays more distinctive features than the minor groove. In these grooves, specific proteins interact with sequences of DNA. Such double helices cannot be pulled apart and can be separated only by an unwinding process. They are called plectonemic coils, i.e. coils that are interlocked about the same axis.
The helical structure helps to shield the bases from the environment, thereby protecting the genetic information from physical and chemical attack. The structural details of double stranded DNA as suggested by Watson and Crick are shown in Fig. 7. Fig. 7a depicts the helical structure of DNA and Fig. 7b shows normal Watson-Crick base pairing interactions. The double helical structure explains the mechanism by which genetic information can be accurately replicated. The complementarity of the bases and the antiparallel directions of the two chains of the DNA molecule provide the basis for precise replication of DNA. Since the two strands are structurally complementary to each other and thus contain complementary information, the replication of DNA during cell division was postulated to occur by replication of the two strands, so that each parent strand serves as the template specifying the base sequence of the new complementary strand. The end result of such a process is the formation of two daughter double-helical molecules of DNA, each identical to the parent DNA and each containing one strand from the parent.
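The pairing rules and the antiparallel arrangement described above can be sketched in a few lines. The example below is an illustrative sketch only, not a model of the enzymology of replication. It derives the partner strand (written 5′ → 3′, hence reversed), counts the H-bonds of the duplex (two per A:T pair, three per G:C pair), and shows that templating a new strand on each parent strand gives two identical daughter duplexes. The function names and the example sequence are hypothetical.

```python
# Watson-Crick partners and H-bonds contributed by each base pair.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand_5_to_3: str) -> str:
    """Partner strand, also written 5'->3'. It is reversed because the
    two strands of the duplex are antiparallel."""
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand: str) -> int:
    """Total H-bonds holding `strand` to its complementary partner."""
    return sum(H_BONDS[base] for base in strand)


def replicate(top: str, bottom: str):
    """Each parent strand serves as template for a new partner strand;
    returns two daughter duplexes as (top, bottom) pairs, both 5'->3'."""
    daughter_1 = (top, reverse_complement(top))        # keeps parental top strand
    daughter_2 = (reverse_complement(bottom), bottom)  # keeps parental bottom strand
    return daughter_1, daughter_2


if __name__ == "__main__":
    top = "ATGCC"  # made-up example sequence
    bottom = reverse_complement(top)
    print("bottom strand:", bottom)
    print("H-bonds:", hydrogen_bonds(top))
    print("daughters:", replicate(top, bottom))
```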

(c) Local flexibility in DNA structure

The analysis of oligonucleotide crystals, as opposed to fibers, shows that there is great variation in the helical parameters of molecules with diverse base sequences. This occurs because different base sequences influence helical and torsional parameters so as to maximize the stability of stacking and pairing interactions. B-DNA is particularly flexible in this respect, and different local configurations adapt to particular sequences. This indicates that DNA probably does not exist in rigid conformational forms but may change smoothly between different conformations, punctuated by local polymorphisms such as bent DNA and helical transitions (sudden transitions between different helical conformations within a single molecule, e.g. B-Z transitions). DNA bending is an intrinsic property depending on stacking interactions which, according to local sequence, may be isotropic (unbiased) or anisotropic (bending in a specific direction). Intrinsic DNA bends occur in A-T rich runs and in repeats of the sequence GGCC in step with the helical periodicity. DNA bending can also be induced by proteins (nucleic acid binding proteins) and by circularization (DNA topology). Induced bending is necessary for DNA packaging in chromosomes and for replication, recombination and transcription. Proteins may also recognize DNA that is bent in a certain way (e.g. topoisomerase).

Fig 7a: Double helical structure of DNA

(d) DNA topology

If the DNA molecule has free ends (e.g. a linear molecule), the two strands wind around each other in the most energetically favorable manner and the molecule is said to be relaxed. The number of times one strand winds around the other in this relaxed state is the duplex winding number. If extra twists are introduced into such a molecule to make it overwound, then the total number of helical turns, which is the linking number, exceeds the duplex winding number. Conversely, if twists are removed from the molecule to make it underwound, the duplex winding number exceeds the linking number. In either case, the strands can rotate with respect to each other and return the molecule to its relaxed state. In a closed circle, however, there are no free ends and the linking number is a topological property: it can be changed only by breaking the circle open and not by deforming it. If DNA in a closed circle becomes overwound or underwound, the only way to relax the torsional strain thus produced is by supercoiling, where a twist is introduced into the helical axis itself. Supercoiling is another form of nucleic acid tertiary structure, one involving the effect of torsional stress upon shape rather than strand-strand interactions.

Fig 7b: Watson-Crick base pairing

The physiological significance of supercoiling is that unconstrained DNA is often biologically inactive. Negative supercoiling is required for many essential processes, including replication, transcription and recombination. Supercoiled DNA has stored energy, which drives these reactions. In eukaryotes, which possess linear chromosomes, topological constraints are introduced by organizing chromatin into loops with ends fixed by scaffold proteins; nucleosomes introduce negative supercoils into eukaryotic DNA.
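The relationship between the linking number and the duplex winding number can be made concrete with a small calculation. This is a sketch under stated assumptions: the relaxed winding number is taken as the number of base pairs divided by 10.5, and the quantity called superhelical density (the linking difference divided by the relaxed value) is a standard measure that is not defined in the text. All function names are hypothetical.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


def duplex_winding_number(n_bp: int) -> float:
    """Number of turns one strand makes around the other in relaxed DNA."""
    return n_bp / BP_PER_TURN


def linking_difference(linking_number: float, n_bp: int) -> float:
    """Linking number minus the relaxed duplex winding number."""
    return linking_number - duplex_winding_number(n_bp)


def superhelical_density(linking_number: float, n_bp: int) -> float:
    """Linking difference normalised by the relaxed winding number."""
    return linking_difference(linking_number, n_bp) / duplex_winding_number(n_bp)


def winding_state(linking_number: float, n_bp: int) -> str:
    """Classify a closed circle as overwound, underwound or relaxed."""
    diff = linking_difference(linking_number, n_bp)
    if abs(diff) < 1e-9:
        return "relaxed"
    return "overwound" if diff > 0 else "underwound"


# Hypothetical closed circular DNA of 5250 bp with a linking number of 470.
print(winding_state(470, 5250), superhelical_density(470, 5250))
```

An underwound circle of this kind corresponds to the negative supercoiling that the text describes as necessary for replication, transcription and recombination.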

(e) Structural variants (helical conformers) of DNA

The Watson-Crick structure of DNA is referred to as B-DNA (normal form). It is the biologically important form and exists under physiological conditions. DNA is very flexible in nature. Due to thermal fluctuation, bending, stretching and unpairing (melting) of strands can occur. Structural variants of DNA may arise for three reasons: differences in the possible conformations of deoxyribose; rotation about the contiguous bonds that make up the phosphodeoxyribose backbone; and free rotation about the C-1′-N-glycosyl bond (syn or anti). The first investigations of DNA secondary structure demonstrated that alternative helical conformations (conformers) form at different humidities. Different forms have grooves of different size and shape. The change in conformation alters the shape of the major and minor grooves, potentially influencing the nature of protein-DNA interactions and thereby affecting the regulatory properties of DNA. Helical conformations reflect differences in parameters such as gross morphological features, bond angles, base inclination, displacement of the base pairs from the helical axis resulting from dehydration, and helix parameters such as base pairs per helical turn and helical twist. However, the key properties of DNA are not changed in the different forms. The structural variants that have been well characterized in crystal structures are:

(i) A-DNA

Dehydration favours the A-form. It does not occur under physiological conditions. It is observed in dehydrated DNA fibers by X-ray diffraction studies, i.e. when the relative humidity is reduced below 75%. It is favoured in many solutions that are relatively devoid of water. There is no evidence for its existence in cells. The reagents used to promote crystallization of DNA tend to dehydrate it, and thus most short DNA molecules tend to crystallize in the A-form. The A-form is not confined to dehydrated DNA. Double stranded regions of RNA (as in hairpins) and RNA-DNA hybrids adopt a double helical structure very similar to that of A-DNA. The 2′-OH of ribose prevents RNA from forming a classic Watson-Crick B-helix because of steric hindrance. In the A-form, the 2′-oxygen projects outward, away from other atoms. Under physiological conditions, duplex RNA and RNA-DNA hybrids are thought to adopt the A-form structure because they are inherently less flexible than DNA. The A-form of DNA is less soluble than the B-form, which is why DNA that is overdried during plasmid preparation is difficult to dissolve.

(ii) Z-DNA

Alexander Rich and colleagues discovered Z-DNA in 1979 while solving the crystal structure of CGCGCG. Z-DNA is adopted by short oligonucleotides that have sequences of alternating pyrimidines and purines, and early studies of such oligonucleotides revealed its left-handed helical conformation. This structure is characterized by alternating helical parameters and torsion angles with a 2-base pair periodicity, causing the backbone of the helix to zig-zag (hence the name Z-DNA). Zig-zagging is thus a consequence of the fact that the repeating unit is a dinucleotide (not a mononucleotide), especially a sequence in which pyrimidines alternate with purines, e.g., alternating C and G or 5-methylcytosine and G residues.

Although alternating purine-pyrimidine tracts such as oligo-dGdC and oligo-dAdC provide a good substrate for Z-DNA, this sequence specificity is now known to be neither necessary nor sufficient for its formation. Methylation of C-5 of cytosyl residues in alternating CG sequences (e.g. CGCGCG) facilitates the transition of B-DNA to Z-DNA, because the added hydrophobic methyl groups stabilize the Z-DNA structure. Z-DNA is formed when the purine residues flip into the syn conformation while the alternating pyrimidines remain in the anti conformation. The phosphate groups of the backbone are closer to each other than in the A or B forms, hence a high salt concentration is required to minimize electrostatic repulsion between the backbone phosphates. Z-DNA contains one deep helical groove. It occurs under physiological conditions in certain cases only. The biological role of Z-DNA is uncertain; however, its existence graphically shows that DNA is a flexible, dynamic molecule. Z-DNA structures tend to form in torsionally stressed DNA and are stabilized by dehydration, and they may play an important role in the control of gene expression. Fig. 8 depicts the common structural variants of DNA, i.e. A and Z, along with the B form of DNA. The general characteristics of A, B and Z DNA are summarized in Table 4.
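Since alternating purine-pyrimidine tracts are described above as a good substrate for Z-DNA, a simple scan for such tracts can be sketched as follows. The function name and the minimum tract length of 6 are assumptions made for illustration, and, as the text notes, an alternating sequence is neither necessary nor sufficient for Z-DNA formation, so the output marks candidates only.

```python
PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def alternating_tracts(seq: str, min_len: int = 6):
    """Return (start, end) pairs (end exclusive) of maximal runs in which
    purines and pyrimidines strictly alternate and the run is at least
    min_len bases long."""
    seq = seq.upper()
    tracts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence, or where two neighbouring
        # bases belong to the same class (both purines or both pyrimidines).
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= min_len:
                tracts.append((start, i))
            start = i
    return tracts


dna = "AATTCGCGCGCGAATT"
for begin, end in alternating_tracts(dna):
    print(begin, end, dna[begin:end])
```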

(f) Properties of DNA in solution

(i) Acid-base properties

DNA is strongly acidic. The recurring secondary phosphate groups of DNA, which constitute the bridges between adjacent mononucleotides, have a rather low pK and are fully ionized at any pH above 4. These phosphate groups are located on the outer periphery of the double helix, exposed to water. They strongly bind divalent cations such as Mg2+ and Ca2+, as well as the polycationic amines spermine and spermidine, which are associated with the DNA in many viruses and bacteria. The binding of the polyamines in the groove of double helical DNA both stabilizes the DNA molecule and makes it more flexible.

Fig. 8: Common structural variants of DNA (A DNA, B DNA and Z DNA)

Double helical DNA is maximally stable between pH 4.0 and 11.0 (physiological range). Outside these physiological limits, DNA becomes unstable and unwinds. The stability of the H-bonded base pairs of double helical DNA is a function of pH, since the H-bonding properties of the different bases depend on their ionic form, which in turn depends on pH.

(ii) Light absorption

A typical absorption spectrum for DNA at pH 7.0 is represented in Fig. 9. As shown, the DNA molecule absorbs light energy strongly at 260 nm. This characteristic absorption maximum is a property of its individual bases, purines and pyrimidines, and their corresponding nucleotides. A native intact molecule of DNA absorbs less light energy at 260 nm than the free bases, because the bases are packed into a double helix.

(iii) Viscosity

Because of the rigidity of the double helix and the immense length of DNA in relation to its small diameter, even very dilute DNA solutions are highly viscous at pH 7.0 and room temperature (25°C). Viscosity measurements are often used to follow the course of unwinding and denaturation of duplex DNA molecules. Viscosity decreases at extremes of pH and above 80°C, as the two strands separate. There is another consequence of the immense length of DNA molecules. When they diffuse, they sweep with them a relatively enormous volume of solution, more than 10000-fold greater than their own volume. For this reason, DNA shows ideal behavior as a solute only in extremely dilute solutions.

(iv) Sedimentation behaviour

The sedimentation coefficient and molecular weight of DNA can be determined by ultracentrifugal methods. Because of the extremely elongated nature of the DNA molecule and the high viscosity of DNA solutions, sedimentation measurements are carried out in a series of low concentrations of DNA and the sedimentation coefficient is extrapolated to zero DNA concentration. The sedimenting boundary is usually detected by measuring the optical absorbance at a wavelength of 260 nm, at which DNA strongly absorbs.

Table 4: General characteristics of three major forms of DNA

S. No.  Characteristic                        A                   B                   Z
1       Conditions
  a     Relative humidity                     75%                 92%                 -
  b     Ions required / salt concentration    Na+, K+, Cs+ ions   Low ion strength    Very high salt concentration
2       Morphological characteristics
  a     Shape                                 Broadest            Intermediate        Narrowest
  b     Helical state                         Right               Right               Left
  c     Pitch (base pairs per turn)           11                  10.5                12 (= 6 dimers)
  d     Major groove                          Deep, narrow        Wide                Flat
  e     Minor groove                          Broad, shallow      Narrow              Narrow and very deep
  f     Helix diameter (Å)                    ~26                 ~20                 ~18
3       Torsional parameters
  a     Sugar pucker conformation             C3′ endo            C2′ endo            Alternating
  b     Glycosidic bond angle                 Anti                Anti                Alternating anti / syn
4       Helical parameters
  a     Displacement (Å)                      -4.4                0.6                 3.2
  b     Twist (°)                             33                  36                  -49 / -10
  c     Helix rise per base pair (Å)          2.6                 3.4                 3.7
  d     Helix pitch (Å)                       25.30               35.36               45.60
  e     Base tilt normal to helix axis (°)    20                  6                   7
  f     Inclination (°)                       22                  -2                  -7
  g     Rotation per base pair (°)            +32.72              +34.61              -60 (per dimer)

Molecular weights of DNA can also be obtained by comparing their rate of sedimentation in a sucrose density gradient with the rate given by a DNA sample of known size and sedimentation coefficient. Equilibrium sedimentation in CsCl gradients is very widely used to determine the buoyant density of DNA molecules. When a concentrated (8 M) CsCl solution is centrifuged to equilibrium in a high gravitational field, the CsCl becomes distributed in a linear gradient down the tube; at the top of a 1 cm column the density of the solution is about 1.55 g cm⁻³ and at the bottom about 1.8 g cm⁻³ (1.8 g ml⁻¹). When DNA is present during formation of the gradient, it concentrates into a stable band at the position at which its buoyant density is exactly equal to the density of the CsCl solution. The density of DNA can be calculated directly or by comparison with the density of a known standard DNA specimen centrifuged in the same gradient. Single stranded DNA is denser in such a CsCl gradient than double stranded DNA, which in turn is denser than proteins in general. RNA can be distinguished from DNA since it is denser than either single stranded or double stranded DNA. Buoyant density measurements also provide information on the base composition of a DNA specimen, because G-C base pairs, which are joined by three H-bonds, are more compact and dense than A-T pairs, which are joined by only two H-bonds. The buoyant density of DNA in a CsCl gradient is a linear function of the ratio of G-C to A-T pairs. The intact homogeneous DNAs of viruses give very sharp bands, whereas random heterogeneous DNA fragments from cells of higher animals give broad bands with a wide density range.
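The linear dependence of buoyant density on base composition can be sketched numerically. The coefficients below (1.660 g cm⁻³ for DNA with no G-C pairs, rising by 0.098 g cm⁻³ as the G-C fraction goes from 0 to 1) are a widely quoted empirical relation for double stranded DNA in CsCl; they are not given in the text and should be treated as an assumption, as should the function names.

```python
# Assumed empirical coefficients for double stranded DNA in CsCl (g per cm^3).
DENSITY_AT_ZERO_GC = 1.660
DENSITY_SLOPE = 0.098


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(gc: float) -> float:
    """Estimated buoyant density in CsCl for a given G-C fraction."""
    return DENSITY_AT_ZERO_GC + DENSITY_SLOPE * gc


def gc_from_density(density: float) -> float:
    """Invert the linear relation to estimate the G-C fraction from a band position."""
    return (density - DENSITY_AT_ZERO_GC) / DENSITY_SLOPE


print(f"{buoyant_density(0.5):.3f} g/cm3 for 50% G-C")
print(f"{gc_from_density(1.700):.2f} G-C fraction for a band at 1.700 g/cm3")
```

The inverse function reflects how such measurements are used in practice: the position of the band is measured and the base composition is read off the line.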

Fig. 9: The absorption spectrum of a DNA solution at pH 7.0 (absorbance plotted against wavelength, 200-300 nm)
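Absorbance at 260 nm is routinely converted into a nucleic acid concentration. The sketch below uses the conventional laboratory conversion factors (an absorbance of 1.0 in a 1 cm path corresponds to roughly 50 µg/ml of double stranded DNA, 33 µg/ml of single stranded DNA and 40 µg/ml of RNA); these factors are not stated in the text and are approximate, and the function name is hypothetical.

```python
# Conventional, approximate conversion factors: micrograms per ml
# for an absorbance of 1.0 at 260 nm in a 1 cm light path.
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def concentration_ug_per_ml(a260: float, kind: str = "dsDNA",
                            dilution: float = 1.0, path_cm: float = 1.0) -> float:
    """Estimate nucleic acid concentration from an A260 reading.

    dilution is the factor by which the sample was diluted before reading;
    path_cm is the light path of the cuvette in centimetres.
    """
    return a260 * UG_PER_ML_PER_A260[kind] * dilution / path_cm


# Hypothetical reading: A260 = 0.5 for an undiluted dsDNA sample.
print(concentration_ug_per_ml(0.5))
```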

(g) Denaturation

Double helical structure of DNA is maintained due to H-bonding between base pairs and stacking interactions between successive bases. When either or both sets of forces are interrupted, the native, double helical structure undergoes transition into a randomly looped form, denoted as single stranded or denatured DNA. Thus, DNA double helix can be easily separated by denaturation and rejoined (Fig. 10).

Fig. 10: Denaturation (melting) of DNA, showing double helical DNA, partially unwound (denatured) DNA and separated strands of DNA

Causes of denaturation or factors affecting denaturation

The unwinding and rewinding of DNA occur naturally in vivo during DNA replication and transcription at regions rich in A-T. In vitro, the following conditions lead to the denaturation of DNA.

Extreme pH (titration with acid or alkali): Acidic and alkaline pH values at which ionic changes of the substituents on the purine and pyrimidine bases can occur lead to denaturation of DNA. In acidic solutions (pH 2.0 to 3.0), at which amino groups bind protons, the DNA helix is disrupted. Similarly, in alkaline solutions (pH 12), the enolic hydroxyl groups ionize, thus preventing the keto-amino group H-bonding. Acid or alkali thus leads to ionization of the bases of DNA.

Heat (high temperature): Native DNA molecules usually denature within a very small increment of temperature. The thermal denaturation of DNA is often designated as melting. The separation of the two strands of DNA upon denaturation is shown in Fig. 10. DNA samples from different cell types have characteristically different melting temperatures (Tm). Tm increases in linear proportion with the content of G-C base pairs, which have three H-bonds and are thus more stable than A-T pairs. The higher the content of G-C pairs, the more stable the structure and the more thermal energy required to disrupt it.

Implications of denaturation

Denaturation significantly affects various properties of double stranded DNA. These include:

Change in specific optical rotation: Native DNA exhibits a strong positive rotation. Upon denaturation, the optical rotation decreases markedly and becomes more negative.

Change in absorption of ultraviolet light at 260 nm: As mentioned earlier, double stranded DNA possesses an absorption maximum at 260 nm (Fig. 9). Upon denaturation of DNA, an increase in light absorption at 260 nm is observed (a physical change). Compared with free bases, a native intact molecule of DNA absorbs less light energy at 260 nm because its bases are packed into a double helix. Upon denaturation, the bases in the single strands are exposed and consequently a denatured DNA molecule absorbs more light than native DNA. The total light absorption of fully denatured DNA is nearly equal to that of an equivalent number of the corresponding free mononucleotides. This increase in absorption of light (up to 40%) occurs even though the amount of DNA remains the same. This phenomenon is called the hyperchromic effect. Single stranded DNA does not show the hyperchromic effect; thus, this phenomenon can be used to distinguish single stranded DNA. Fig. 11a represents the characteristic melting curve of DNA, demonstrating the hyperchromic effect upon denaturation of double stranded DNA. The temperature at the midpoint of the melting curve is called the melting temperature, denoted Tm. The effect of temperature on absorbance at 260 nm and its relationship with strand separation is evident in Fig. 11b. As the temperature increases, the absorbance also increases until the strands separate, after which the absorbance does not increase further. The percentage increase in light absorption at 260 nm produced by heating a native DNA sample is directly related to its content of A-T base pairs: the higher the proportion of A-T base pairs, the greater the increase in light absorption.
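The definition of Tm as the midpoint of the melting curve, and the size of the hyperchromic effect, can both be read off a set of absorbance measurements. The following is a minimal sketch using linear interpolation; the data points are invented for illustration and the function names are hypothetical.

```python
def hyperchromicity_percent(a_native: float, a_denatured: float) -> float:
    """Percentage increase in A260 on going from native to denatured DNA."""
    return 100.0 * (a_denatured - a_native) / a_native


def melting_temperature(temps, absorbances) -> float:
    """Estimate Tm as the temperature at which A260 is halfway between the
    first (native) and last (fully denatured) readings.

    Readings must be ordered by increasing temperature; the crossing point
    is found by linear interpolation between neighbouring readings.
    """
    a_mid = (absorbances[0] + absorbances[-1]) / 2.0
    points = list(zip(temps, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= a_mid <= a1 and a1 != a0:
            return t0 + (a_mid - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("curve never crosses its midpoint absorbance")


# Invented melting data: temperature in degrees C and relative A260.
temps = [70, 75, 80, 85, 90, 95]
a260 = [1.00, 1.02, 1.10, 1.30, 1.38, 1.40]
print(f"Tm = {melting_temperature(temps, a260):.1f} C")
print(f"hyperchromicity = {hyperchromicity_percent(a260[0], a260[-1]):.0f}%")
```

In real measurements the lower and upper plateaus are usually fitted as baselines before the midpoint is taken; using the first and last readings is a simplification.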

[Fig. 11 panels: (a) absorbance at 260 nm plotted against temperature (°C) for double stranded and single stranded DNA, with Tm marked; (b) relative value of A260 (scale 1.0 to 1.4, labelled dsDNA and ssDNA; A260 of bases = 1.80) plotted against temperature (°C), with strand separation and Tm marked]

Fig. 11: (a) Typical melting curve of DNA; (b) Effect of change of temperature on absorbance with respect to strand separation

Renaturation

When denatured DNA (melted DNA) is incubated at a temperature about 25°C below that at which denaturation occurs, the two separated strands reassociate or reanneal to form a duplex DNA molecule. This is called renaturation. The process of renaturation of denatured DNA upon cooling is shown in Fig. 12. The strands separated upon denaturation (melting) reanneal to form duplex DNA. Even when small stretches of DNA are absent from one strand, the strands reassociate, with bulging of the non-complementary (missing) sequences. Renaturation can be a one- or two-step process.

One-step process: If denaturation has proceeded only to the first stage, with a few base pairs still present (i.e. if about 12 or more residues are still united), the unfolded segments of the two strands will spontaneously rewind or anneal to form an intact duplex on lowering the temperature or changing the pH. They snap back to their native conformation, which is the minimum free energy form.

Two-step process: When the two strands are completely separated upon denaturation, renaturation is much slower and occurs in two steps. The first step, called the nucleation reaction, is a relatively slow step in which the two strands find each other by random collisions and form a short segment of complementary double helix. The second step, called the zippering reaction, is faster: the remaining unpaired bases successively come into register and the two strands zipper together to form the double helix.

(h) Functions of DNA

DNA is the very basis of life and has a five-fold role:
It carries hereditary characters from parents to offspring.
It enables the cell to maintain itself, grow and divide by directing the synthesis of structural proteins.
It controls metabolism in the cell by directing the formation of the necessary enzymatic proteins.
It contributes to the evolution of the organism by undergoing gene mutations (changes in the sequence of base pairs).
It brings about differentiation of cells during development. Only certain genes remain functional in a particular cell. This enables cells having similar genes to assume different structures and functions.

Fig. 12: Renaturation of denatured DNA

Ribonucleic acid (RNA)

RNA is the only molecule known to have a role both in the storage and transmission of information and in catalysis. RNA is synthesized from DNA in a process called transcription. Chemically, RNA is very similar to DNA. The fundamental chemical differences are:

The RNA backbone contains ribose rather than the 2′-deoxyribose (i.e. ribose without the OH group at the 2′-position) present in DNA. This slight difference has a powerful effect on some properties of the nucleic acid, especially on its stability. Thus, RNA is readily destroyed by exposure to high pH. Under these conditions DNA is stable: although the strands will separate, they remain intact and capable of renaturation when the pH is lowered again.

RNA contains uracil instead of thymine. Uracil has the same single-ringed structure as thymine, except that it lacks the methyl group at the C-5 position. The reason for the use of uracil in RNA instead of thymine is probably that uracil is energetically less expensive to produce than thymine. Moreover, since uracil is readily produced in DNA by chemical deamination of cytosine, having thymine as the normal base makes detection and repair of such incipient mutations more efficient. Thus, uracil is appropriate for RNA, where quantity is important but lifespan is not, whereas thymine is appropriate for DNA, where maintaining the sequence with high fidelity is crucial.

RNA is mostly single stranded, i.e. a single polynucleotide chain. However, some viruses have double stranded RNA (dsRNA) as their genetic material. A single strand can fold back on itself, giving RNA potentially much greater structural diversity than DNA.
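The relationship between a DNA template and its RNA transcript (complementary bases, with U in place of T) can be sketched in a few lines. Strands are assumed to be written 5'→3'; the function names are hypothetical and the sketch ignores promoters, termination and processing.

```python
# Base pairing rules used when an RNA strand is copied from a DNA template.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_dna: str) -> str:
    """Return the RNA (5'->3') specified by a DNA template strand (5'->3').

    The transcript is antiparallel to the template, so the template is
    read in reverse while each base is replaced by its RNA partner.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template_dna.upper()))


def coding_strand_to_rna(coding_dna: str) -> str:
    """Return the RNA with the same sequence as the coding strand, U replacing T."""
    return coding_dna.upper().replace("T", "U")


coding = "ATGGCTTAA"
template = "TTAAGCCAT"  # the strand complementary to the coding strand
print(transcribe_template(template))
print(coding_strand_to_rna(coding))
```

Both routes give the same transcript, because the RNA is complementary to the template strand and therefore matches the coding strand apart from the T to U substitution.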

(A) Structure

(a) Primary structure

RNA is a single stranded, long, unbranched macromolecule consisting of nucleotides joined by 3′→5′ phosphodiester bonds. The number of nucleotides ranges from as few as 75 to many thousands. In RNA, U replaces T, but since U has a chemical structure similar to that of T and forms the same H-bonds with A, it hybridizes according to the general base pairing rules. Ubiquitous as these interactions are, there are also alternative base pairing schemes that play important roles in the secondary and tertiary structures (described below).

(b) Secondary, tertiary and quaternary structures

Despite being single stranded, RNA molecules often exhibit a great deal of double helical character. This is because the RNA chain frequently folds back on itself to form base paired segments between short stretches of complementary sequences. In contrast to DNA, where the secondary structure is characterized by intermolecular base pairing, the secondary structure of RNA generally reflects intramolecular base interactions. Such secondary structure formation in RNA by intramolecular normal Watson-Crick base pairing (C:G and A:U) is shown in Fig. 13a. If the two stretches of complementary sequence are near each other, the RNA may adopt one of various stem loop structures in which the intervening RNA is looped out from the end of the double helical segment, as in a hairpin, a bulge or a simple loop (Fig. 13b). The single strands tend to assume a right-handed helical conformation dominated by base stacking interactions, which are stronger between two purines than between a purine and a pyrimidine or between two pyrimidines. The purine-purine interaction is so strong that a pyrimidine separating two purines is often displaced from the stacking pattern so that the purines can interact. Secondary structures are important in the regulation of gene expression. The 3-D structures of many RNAs, like those of proteins, are complex and unique.

Weak interactions, especially base stacking interactions, play a major role in stabilizing RNA structures, just as they do in DNA. Where complementary sequences are present, the predominant double stranded structure is an A-form right-handed double helix. The presence of the 2′-OH in the RNA backbone prevents RNA from adopting a B-form helix. Rather, under physiological conditions, duplex RNA and RNA-DNA hybrids adopt an A-form structure because they are inherently less flexible than DNA.

[Fig. 13 panels: (a) secondary structure formation in RNA; (b) hairpin, bulge and loop]

Fig. 13: Double helical characteristics of RNA (hairpin double helix) (a) Intramolecular base pairing forming secondary structure in RNA (b) Formation of stem and loop / bulge structures in complementary and non-complementary regions, respectively

In the A-form RNA helix, the minor groove is wide and shallow and hence accessible, but it offers little sequence-specific information. Meanwhile, the major groove is so narrow and deep that it is not very accessible to amino acid side chains from interacting proteins. Z-form helices have been made in the laboratory (under very high salt or high temperature conditions). The B-form of RNA has not been observed. Thus, the RNA double helix is quite distinct from the DNA double helix in its detailed atomic structure and is less well suited for sequence-specific interactions with proteins (although some proteins do bind to RNA in a sequence-specific manner). A feature of RNA that adds to its propensity to form double helical structures is an additional, non Watson-Crick base pair. This is the G:U base pair, which has H-bonds between N3 of U and the carbonyl on C6 of G and between the carbonyl on C2 of U and N1 of G. Since G:U base pairs can occur in addition to the four conventional Watson-Crick base pairs, RNA chains have an enhanced capacity for self-complementarity (Fig. 14). Thus, RNA frequently exhibits local regions of base pairing but not the long-range, regular helicity of DNA.
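The effect of allowing G:U pairs on the self-complementarity of RNA can be illustrated with a small check of whether two arms of a strand can form a stem. This is a sketch only: the function names and the example hairpin are hypothetical, and real structure prediction also weighs stacking energies and loop sizes.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # the additional non Watson-Crick pair


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """True if bases x and y can pair under the chosen rules."""
    return (x, y) in WATSON_CRICK or (allow_wobble and (x, y) in WOBBLE)


def forms_stem(arm5: str, arm3: str, allow_wobble: bool = True) -> bool:
    """True if two arms (both written 5'->3') can pair along their full length.

    The arms are antiparallel in a stem, so arm3 is read in reverse.
    """
    if len(arm5) != len(arm3):
        return False
    return all(can_pair(a, b, allow_wobble) for a, b in zip(arm5, reversed(arm3)))


def wobble_count(arm5: str, arm3: str) -> int:
    """Number of G:U pairs when the two arms are aligned antiparallel."""
    return sum((a, b) in WOBBLE for a, b in zip(arm5, reversed(arm3)))


# Hypothetical hairpin: 5'-GGAUC-(UUCG loop)-GGUCC-3'
print(forms_stem("GGAUC", "GGUCC"), forms_stem("GGAUC", "GGUCC", allow_wobble=False))
```

The example stem closes only when the G:U pair is permitted, which is the sense in which wobble pairing enlarges the set of sequences that can fold back on themselves.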

Fig. 14: G:U base pair

Important additional structural contributions are made by H-bonds that are not part of standard Watson-Crick base pairs, e.g., the 2′-OH group of ribose can H-bond with other groups. Some of these properties are evident in the structure of the tRNA(Phe) of yeast or of ribozymes, whose functions, like those of protein enzymes, depend on their 3-D structures. RNA secondary structures play a major role in gene expression and its regulation: base pairing between rRNA and mRNA controls the initiation of protein synthesis, base pairing between tRNA and mRNA facilitates translation, and RNA hairpins and stem loops control transcriptional termination, translational efficiency and mRNA stability. RNA-RNA base pairing also plays a major role in the splicing of introns. Like that of DNA, the helical conformation of RNA is modulated by local sequence character, but the relatively high percentage of modified bases further adds to the variety of structures. RNA can fold into complex structures involving tertiary interactions between strands, loops and duplexes. For example, in tRNA there are base triples, sections of triple helix, stem junctions (where two or more duplex regions are joined) and pseudoknots (where strands interact with stem loops). This is because RNA has enormous rotational freedom in the backbone of its non base-paired regions. Tertiary structure frequently involves unconventional base pairing, such as the base triples and base-backbone interactions seen in tRNA (e.g. U:A:U base triples).

The interaction of RNA with the ribosome, the spliceosome and proteins may be considered as quaternary structure. Proteins can assist the formation of tertiary structures by large RNA molecules, such as those found in the ribosome. Proteins shield the negative charges of the backbone phosphates, whose electrostatic repulsive forces would otherwise destabilize the structure.

(B) Types of RNA

RNAs have a broad range of functions and several classes are found in cells. On the basis of size, function and stability, RNAs are of three types: mRNA, rRNA and tRNA.

Messenger RNA (mRNA): These are intermediaries, carrying genetic information from one or a few genes to a ribosome, where the corresponding proteins can be synthesized.

Ribosomal RNA (rRNA): These are components of ribosomes, the complexes that carry out the synthesis of proteins.

Transfer RNA (tRNA): These are adapter molecules that faithfully translate the information in mRNA into a specific sequence of amino acids.

The properties and functions of the different types of RNA are described below and summarized in Table 5.

(a) Messenger RNA (mRNA)

The mRNAs are intermediaries, carrying genetic information from DNA for protein synthesis. The mRNA codes for polypeptide chain(s). It is synthesized in the nucleus during the process of transcription. The sequence of bases of the mRNA strand so formed is complementary to that of the DNA strand being transcribed. After transcription, mRNA passes into the cytoplasm and then to ribosomes, where it serves as a template for the sequential ordering of amino acids during the biosynthesis of proteins. Some mRNA is also produced in mitochondria and chloroplasts. The mRNA forms only about 5% of total RNA. Although it makes up a very small part of the total RNA of the cell, it occurs in many distinctive forms, which vary greatly in molecular weight and base sequence. It is very unstable. The mRNA is degraded by ribonucleases present in all cells. The cellular concentration of an mRNA generally indicates the level of gene expression.

(i) Prokaryotic mRNA: Prokaryotic mRNA is mainly polycistronic (Fig. 15). A single mRNA molecule codes for two or more polypeptide chains, i.e. contains multiple ORFs. The mRNA contains a ribosome-binding site (RBS) referred to as the Shine-Dalgarno sequence. It is complementary to a sequence located near the 3′ end of one of the RNA components of the ribosome, the 16S rRNA. The RBS base pairs with the 16S rRNA, thereby aligning the ribosome with the beginning of the mRNA. Some mRNAs lack an RBS and have translational coupling, e.g. 5′-AUGA-3′ has an overlapping sequence. The protein coding region(s) of each mRNA is composed of a contiguous, non-overlapping string of codons called an open reading frame (ORF). An ORF is a sequence of triplets that can be translated into amino acids, starting from an initiation codon and ending with a termination codon. Each ORF specifies a single polypeptide and starts and ends at internal sites within the mRNA, i.e. the ends of an ORF are distinct from the ends of the mRNA. Translation starts at the 5′ end of the ORF. The first codon of an ORF is called the start codon. In bacteria, it is usually 5′-AUG-3′. Some ORFs begin with 5′-GUG-3′ or 5′-UUG-3′. The last codon of an ORF, where translation stops, is the stop codon. There are three stop codons: UAG, UGA and UAA.

Table 5: Different types of RNAs and their properties and functions

Type               Relative amount (%)  Sedimentation coefficient  Molecular weight  Number of nucleotides
mRNA (mammal)      5                    Heterogeneous              -                 400-4000
tRNA               15                   4S                         2.5 x 10^4        73-93
rRNA (eukaryote)   80                   28S                        1.5 x 10^6        4700
                                        18S                        7.8 x 10^5        1900
                                        5.8S                       4.5 x 10^4        160
                                        5S                         3.5 x 10^4        120
rRNA (prokaryote)  80                   23S                        1.2 x 10^6        2900
                                        16S                        0.55 x 10^6       1540
                                        5S                         3.6 x 10^4        120

Stability and function:
mRNA: Unstable (in prokaryotes the half-life is a few seconds to 2 min; in eukaryotes the half-life is a few hours to one day). Carries genetic information from DNA for assembly of amino acids on ribosomes for protein synthesis.
tRNA: Quite stable in prokaryotes; somewhat less stable in eukaryotes. Acts as a specific carrier of activated amino acids to specific sites on protein synthesizing templates.
rRNA: Most stable form of RNA. Functions in ribosomal assembly; provides the specific sequence to which mRNA binds.
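The definition of an ORF given above (a run of triplets from a start codon to a stop codon) translates directly into a simple scan. The sketch below searches all three reading frames of an mRNA for the bacterial start codons and the three stop codons named in the text; the function name, the minimum length and the example sequence are hypothetical.

```python
START_CODONS = {"AUG", "GUG", "UUG"}  # bacterial start codons named in the text
STOP_CODONS = {"UAG", "UGA", "UAA"}


def find_orfs(mrna: str, min_codons: int = 2):
    """Return (start, end) index pairs (end exclusive) for ORFs in an mRNA.

    Each of the three reading frames is scanned codon by codon. An ORF
    begins at the first start codon met in a frame and ends just after
    the next in-frame stop codon; the stop codon is counted in the length.
    """
    mrna = mrna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical polycistronic message carrying two short ORFs.
message = "CCAUGGCUUAACCAUGAAAUGA"
for begin, end in find_orfs(message):
    print(begin, end, message[begin:end])
```

A message that yields more than one ORF in this scan corresponds to the polycistronic case; a eukaryotic message would normally yield a single one.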

Fig. 15: Prokaryotic polycistronic message

In prokaryotes, the half-life of mRNA is a few seconds to 2 minutes.

(ii) Eukaryotic mRNA: Eukaryotic mRNA is monocistronic (Fig. 16). A single mRNA codes for a single polypeptide chain, i.e. contains a single ORF. There is no RBS. The start codon is AUG.

Fig. 16: Eukaryotic monocistronic message

In eukaryotes, the half-life of mRNA is a few hours to one day. The primary transcript for a eukaryotic mRNA typically contains sequences encompassing one gene, although the sequences encoding the polypeptide may not be contiguous. Non-coding tracts that break up the coding region of the transcript are called introns and the coding segments are called exons. In a process called splicing, the introns are removed from the primary transcript and the exons are joined to form a continuous sequence that specifies a functional polypeptide. The mRNAs are transcribed as large transcripts (pre-mRNA) from DNA. The pre-mRNA has the same organization as the gene. The primary transcript (also called heterogeneous nuclear RNA; hnRNA) is much larger than mRNA, very unstable and has much greater sequence complexity. The primary transcript undergoes splicing to form mature mRNA, which is 10-100 times smaller than the primary transcript.

Most eukaryotic mRNAs have a 5′ cap, a residue of 7-methylguanosine [modified G base (m7G)] linked to the 5′-terminal residue of the mRNA through an unusual 5′-5′ triphosphate linkage. The cap is added in reverse polarity (5′ to 5′), thus acting as a barrier to 5′ exonuclease attack, but it also promotes splicing, transport and translation. Caps contribute to the stability of mRNAs by protecting their 5′ ends from phosphatases and nucleases. Thus, the cap has the following functions:
It protects mRNA from ribonucleases.
It promotes splicing.
It binds to a specific cap-binding complex of proteins and participates in the binding of mRNA to the ribosome to initiate translation (i.e. it helps in recruitment of the ribosome to the mRNA, or recognition of the mRNA by the translational machinery).
It increases the efficiency of translation.

In eukaryotic mRNAs a poly (A) tail is present at the extreme 3′ end of the mRNA. It is 80-250 A residues long. It is added enzymatically by poly A polymerase. The functions of the poly (A) tail are:
It contributes to efficient translation.
It serves as a binding site for one or more specific proteins.
It apparently plays a role in the processing or transport of mRNA from the nucleus to the ribosome.
It enhances the level of translation of mRNA by promoting efficient recycling of ribosomes.
It probably helps to protect mRNA from enzymic destruction.

Many prokaryotic mRNAs also acquire poly (A) tails but these tails stimulate decay of mRNA rather than protecting it from degradation.
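The processing steps described for eukaryotic mRNA (removal of introns, joining of exons, addition of a 5′ cap and a 3′ poly (A) tail) can be illustrated with a minimal Python sketch. The toy transcript, the exon coordinates and the text label "m7Gppp" used for the cap are hypothetical and chosen only for illustration; real splice sites are recognized by the splicing machinery, not supplied as coordinates.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as (start, end) half-open index pairs, in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_cap_and_tail(mrna, tail_length=200):
    """Return a text representation of a capped, polyadenylated mRNA.

    "m7Gppp" is only a label for the 5'-5' triphosphate-linked cap.
    """
    if not 80 <= tail_length <= 250:
        raise ValueError("tail length outside the 80-250 range quoted in the text")
    return "m7Gppp" + mrna + "A" * tail_length


# Hypothetical toy primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guaugcccacag" + "UAA"
exons = [(0, 6), (18, 24), (36, 39)]  # assumed exon coordinates

mature = splice(pre_mrna, exons)
processed = add_cap_and_tail(mature, tail_length=100)
print(mature)
print(len(pre_mrna), "nt precursor ->", len(mature), "nt coding message")
```

In this toy case the introns account for 24 of the 39 nucleotides of the precursor, which mirrors on a small scale the statement that the mature mRNA is much shorter than the primary transcript.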

(b)

Ribosomal RNA (rRNA)

These are components of ribosomes, hence the name. They constitute 80% of total RNA and represent 40-60% of the total weight of the ribosome. They are the most stable form of RNA. The rRNAs function in ribosome assembly along with proteins. However, the rRNAs are not simply structural components of the ribosome; they are directly responsible for its key functions. The 16S rRNA contains a specific pyrimidine-rich sequence at its 3′ end that is complementary to the purine-rich Shine-Dalgarno (SD) sequence (a subset of AGGAGG) near the 5′ end of mRNA and thus helps in binding to mRNA during translation. The rRNA plays a central role in the function of the small subunit of the ribosome.

The rRNAs are transcribed as large transcripts from DNA. A few of the bases in rRNA are methylated. The rRNA from all sources has a G:C content of more than 50%. The rRNA molecule appears as a single unbranched polynucleotide strand (primary structure). At low ionic strength, the molecule appears as a compact rod with random coiling. At high ionic strength, however, the molecule reveals compact helical regions with complementary base pairing and looped-out regions (secondary structure). The double helical structure can form within a single RNA molecule or between two separate RNA molecules. RNAs can often assume even more complex shapes, as in bacteria.
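The complementarity between the pyrimidine-rich 3′ end of 16S rRNA and the purine-rich SD sequence can be sketched in Python. The anti-SD stretch CCUCCU and the mRNA fragment below are assumptions made for illustration (the mRNA is hypothetical), and the search looks only for perfect Watson-Crick matches, which is a simplification.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def gc_fraction(rna):
    """Fraction of G and C residues in a sequence."""
    return (rna.count("G") + rna.count("C")) / len(rna)


def find_sd_site(mrna, anti_sd, min_len=4):
    """Return (position, motif) for the longest mRNA stretch that can pair
    perfectly with part of the anti-SD sequence, or None if none is found."""
    target = reverse_complement(anti_sd)
    for length in range(len(target), min_len - 1, -1):
        for offset in range(len(target) - length + 1):
            motif = target[offset:offset + length]
            position = mrna.find(motif)
            if position != -1:
                return position, motif
    return None


ANTI_SD = "CCUCCU"  # assumed pyrimidine-rich stretch near the 16S rRNA 3' end
mrna = "GGAUUAAGGAGGUAAAUAAUGGCUAAA"  # hypothetical 5' region of an mRNA

print(reverse_complement(ANTI_SD))
print(find_sd_site(mrna, ANTI_SD))
```

The reverse complement of the assumed anti-SD stretch is AGGAGG, so the search reports where an SD-like motif lies upstream of the AUG start codon in the example message.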

(i)

Prokaryotic rRNA

In E. coli cells, rRNA occurs as linear, single-stranded molecules in three characteristic forms with different sedimentation coefficients: 23S, 16S and 5S. These are transcribed as a single pre-rRNA transcript (Fig. 17). The three forms differ in base ratios and sequences.

Fig. 17: Pre-rRNA transcript in prokaryotes (30S) (~6500 nt)

The rRNAs function in ribosome assembly along with proteins. In prokaryotes, 23S and 5S rRNAs form part of the large (50S) ribosomal subunit, while the 16S rRNA forms part of the small (30S) subunit. The constitution of ribosomal subunits is shown in Fig. 18. The rRNA also plays a central role in the function of both subunits of the ribosome. The anticodon loops of charged tRNAs and the codons of mRNA contact the 16S rRNA, not the ribosomal proteins of the small subunit. The 23S rRNA plays a crucial role in the peptidyl transferase (transpeptidation) reaction during translation.


Fig. 18: rRNA as constituents of prokaryotic ribosomes

(ii) Eukaryotic rRNA
On the basis of sedimentation coefficient, there are four types of rRNA in eukaryotes: 28S, 18S, 5.8S and 5S. The rRNAs are transcribed as large transcripts from DNA. The 28S, 18S and 5.8S rRNAs are transcribed as a single pre-rRNA transcript, while 5S rRNA is synthesized separately (Fig. 19). Pre-rRNA transcription units are arranged in clusters in the genome as long tandem arrays separated by non-transcribed spacer sequences. The arrays of rRNA genes loop together to form the nucleolus and are known as nucleolar organizer regions. Each rRNA gene produces a 45S rRNA transcript called pretranscript, preribosomal RNA or pre-rRNA, which is ~13000 nucleotides long. The 45S pretranscript is processed in the nucleolus to give one copy each of 28S, 18S and 5.8S rRNA, which are about 5000, 2000 and 160 nucleotides long respectively. The genes for 5S rRNA are organized in a tandem gene cluster. This is the only rRNA to be transcribed separately.

Fig. 19: Pre-rRNA transcript in eukaryotes (45S) (~13000 nt)

The rRNAs function in ribosome assembly along with proteins. In eukaryotes, 28S, 5S and 5.8S rRNAs are present in the large (60S) subunit of the ribosome, while 18S rRNA is present in the small (40S) subunit (Fig. 20).
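The lengths quoted for the eukaryotic 45S precursor and its products allow a rough estimate of how much of the precursor survives processing. The short Python sketch below uses only the approximate figures given in the text; the function and variable names are illustrative.

```python
PRE_RRNA_NT = 13000  # approximate length of the 45S pre-rRNA
PRODUCTS_NT = {"28S": 5000, "18S": 2000, "5.8S": 160}  # approximate mature lengths


def retained_fraction(precursor_nt, products_nt):
    """Fraction of the precursor that ends up in mature rRNA."""
    return sum(products_nt.values()) / precursor_nt


retained = retained_fraction(PRE_RRNA_NT, PRODUCTS_NT)
print(f"mature rRNA: {sum(PRODUCTS_NT.values())} nt")
print(f"retained: {retained:.0%}, removed as spacer: {1 - retained:.0%}")
```

With these figures, a little over half of the 45S transcript is retained in the three mature rRNAs and the rest is removed as transcribed spacer during processing.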

(c)

Transfer RNA (tRNA)

Transfer RNA serves as an adapter molecule in translating the language of nucleic acids in mRNA into the language of proteins, by serving as a carrier of specific amino acids to specific sites on the protein-synthesizing template, i.e. the ribosome. The tRNAs, covalently linked to an amino acid at one end, pair with the mRNA in such a way that amino acids are joined to a growing polypeptide in the correct sequence. Each tRNA is specific for an amino acid, i.e., it can bind or accept only that particular amino acid.

Fig. 20: rRNA as constituents of eukaryotic ribosomes

The tRNA contributes 15% of total RNA. The tRNA molecules remain dissolved in solution after centrifuging a broken cell suspension at 100,000 × g for several hours, hence they are also called soluble RNA. The molecular weight of tRNAs ranges from 24,000 to 31,000 (2.4 × 10^4 to 3.1 × 10^4). The sedimentation coefficient of tRNA is ~4S. The base sequence of a tRNA molecule was first determined by Robert Holley in 1965. His study of yeast alanine tRNA (tRNAAla) provided the first complete sequence of any nucleic acid. tRNAs are 73-93 nucleotides long. The conventional numbering of nucleotides begins at the 5′ end and proceeds toward the 3′ end. The 5′ terminus is phosphorylated (pG) whereas the 3′ terminus has a free OH group. There is more than one specific tRNA for each amino acid [5 for Leucine, 5 for Serine, 4 for Glycine, 4 for Lysine]. There are no tRNAs for Hyp and cystine.

A striking feature is the presence of modified or minor bases, introduced by enzymes that recognize target bases in the tRNA structure. About 7-15 bases are modified (the modification can be methylation of A, G, C or U, or the presence of the modified base pseudouridine). Modifications of pyrimidines are less complex than those of purines. In tRNA, there is a vast range of modifications, from simple methylation to wholesale restructuring of the purine ring. tRNA has a high content (up to 25%) of unusual bases other than A, U, G and C. These include post-transcriptionally modified or hypermodified bases. Nearly 80 such bases, found at >60 different tRNA positions, have been characterized. A few of these minor or modified bases in tRNA are listed in Table 6. In addition to modifications of the bases themselves, methylation at the 2′-O position of the ribose ring also occurs.

Purpose of methylation / modification:
* Methylated bases do not form base pairs and become accessible for other interactions (disallow unwanted base pairing with mRNA).
* Methylation provides hydrophobic character to some portions, which is important for their interaction with synthetases and ribosomal proteins.
* Unusual bases provide stability and protect from hydrolytic attack by nucleases.

Codon-anticodon recognition involves wobbling at the first position of the anticodon (third position of the codon), which allows some tRNAs to recognize multiple codons. The wobble base is less specific in its interaction with its corresponding base in the codon than the other two bases. Wobbling also allows easy release of tRNA once an amino acid has been added.

Table 6: Examples of some minor or modified bases in tRNA and their standard abbreviations
5,6-Dihydrouridine (D or hU or UH2 or DHU)
Pseudouridine (Ψ)
Ribothymidine (T)
1-Methylguanosine (m1G)
1-Methyladenosine (m1A)
Inosine (I)
1-Methylinosine (m1I)
N2,N2-Dimethylguanosine (m2,2G)
N6-Isopentenyladenosine (i6A)
N7-Methylguanosine (m7G)
3-Methylcytidine (m3C)
4-Thiouridine (s4U)
2-Thiouridine (s2U)
N4-Acetylcytidine (ac4C)
Lysidine (L)
Queuosine (Q-base) *
Wyosine (Wyo; Y-base) **

* Queuosine or Q-base: Pentenyl ring at the methyl group of 7-methylguanosine.
** Wyosine or Y-base: Additional ring fused with the purine ring itself. The extra ring carries a long carbon chain, to which further groups are added in different cases.
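The effect of wobbling at the first anticodon position can be sketched in Python. The pairing table below follows the standard wobble rules (first anticodon base C reads G; A reads U; U reads A or G; G reads C or U; I reads U, C or A) and ignores the other modified bases of Table 6, so it is an illustrative simplification and not a complete decoding model. The function and table names are hypothetical.

```python
# Bases allowed at codon position 3 for each base at anticodon position 1.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read.

    The strands pair antiparallel: anticodon base 3 pairs with codon base 1,
    anticodon base 2 with codon base 2, and anticodon base 1 (the wobble
    position) with codon base 3.
    """
    first, second, third = anticodon
    codon_1 = WATSON_CRICK[third]
    codon_2 = WATSON_CRICK[second]
    return sorted(codon_1 + codon_2 + codon_3 for codon_3 in WOBBLE[first])


for anticodon in ("IAU", "CAU", "UAU", "GAA"):
    print(anticodon, "->", codons_read(anticodon))
```

Under these rules an anticodon beginning with inosine reads three codons, whereas one beginning with C reads a single codon, which is why a small set of tRNAs can cover many codons.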

Each amino acid is recognized by a particular aminoacyl-tRNA synthetase, which also recognizes all of the tRNAs for that amino acid. tRNAs are derived from longer RNA precursors by enzymatic removal of nucleotides from the 5′ and 3′ ends. Where two or more different tRNAs are contained in a single primary transcript, they are separated by enzymatic cleavage. The endonuclease RNase P, found in all organisms, removes RNA at the 5′ end of tRNA. This enzyme contains both protein and RNA. It is an example of catalytic RNA. The 3′ end of the tRNA is processed by one or more nucleases, including the exonuclease RNase D. In eukaryotes, introns are present in a few tRNA transcripts and must be excised.

As the function of tRNA is to bind specific amino acids, one might think that there are 20 types of tRNA. Since the code is degenerate (i.e. there is more than one codon for an amino acid), there may also be more than one tRNA for a specific amino acid. In fact, their total number far exceeds 20. In a bacterial cell there are more than 70 tRNAs, and in a eukaryotic cell this number is even greater, because there are tRNAs specific to mitochondria and chloroplasts (which usually differ from the corresponding cytoplasmic tRNAs). Eukaryotic cells have multiple copies of many of the tRNA genes. Therefore, there are generally several tRNAs specific for the same amino acid (sometimes up to 4 or 5); they are called isoacceptor tRNAs. These various tRNAs, capable of binding the same amino acid, differ in their nucleotide sequence; they can either have the same anticodon and therefore recognize the same codon, or have different anticodons and thus permit the incorporation of the amino acid in response to multiple codons specifying the same amino acid.

Secondary structure: When drawn in two dimensions, the secondary structure of tRNA resembles a clover leaf (Fig. 21).


Fig. 21: Clover leaf secondary structure of tRNA

In tRNA, ~50% of the bases are paired, forming 4 arms with three loops. Longer tRNAs have a short fifth extra arm of variable length. These arms act as recognition sites. These are:
* Amino acid attachment site (3′-OH, 5′-pG)
* Anticodon arm (-Py-Py-X-Y-Z-Pu-N-)
* DHU arm
* TΨC arm
* Variable extra arm (3-21 bases)

The common features of the secondary structure of tRNA are:

Acceptor or amino acid stem: A 7 bp stem that includes the 5′ terminal nucleotide and that may contain non-Watson-Crick base pairs such as G:U. This assembly is known as the acceptor or amino acid stem because the amino acid residue carried by the tRNA is appended to its 3′ terminal OH group. The 5′ end has a terminal G residue, which is phosphorylated. The 3′ end has the sequence CCA at its terminus with a free 3′-OH group. This forms the amino acid arm for recognition of a particular amino acid. The amino acid attachment site is the 3′-OH group of the ribose of the adenosine residue at the 3′ terminus of the molecule. The amino acid arm can carry a specific amino acid esterified by its carboxyl group to the 2′ or 3′ hydroxyl group of the A residue (at the 3′ end). The amino acid residue is enzymatically transferred to the end of the growing polypeptide chain on the surface of the ribosome during protein synthesis.

Anticodon arm: Just opposite the amino acid arm is a 5 bp stem ending in a loop, which contains an anticodon (a sequence of three bases complementary to the three-base codon sequence in mRNA). The anticodon forms H-bonds with the complementary bases in mRNA attached to a ribosome. The anticodon loop contains 7 bases with the sequence 5′-Py-Py-X-Y-Z-modified Pu-N-3′, where Py is any pyrimidine, Pu is any purine, N is any base and X, Y, Z signify the anticodon complementary to the codon of mRNA. This arm helps in selection and positioning of the correct amino acid for transfer to the growing polypeptide chain. A modification of Watson-Crick base pairs are the Wobble pairs, which allow the base in the 5′ position of the anticodon to pair ambiguously with mRNA, i.e. the Wobble base is less specific in its interaction with its corresponding base in the codon than the other two bases. Wobble base pairs are formed because the bases are offset from their normal Watson-Crick positions and one of the H-bonds is lost. Wobble pairs are thus formed between the first base of the anticodon of tRNA and the last base of the codon of mRNA. In the anticodon loop, at the 3′ end of the anticodon is a purine or pyrimidine derivative. Some tRNAs, particularly those of plants, contain an isopentenyl derivative of purine. These characteristic bases apparently serve as stoppers to demarcate the anticodon. Thiacanthine [6-(3-methyl but-2-enylamino) purine] is one of the minor purines found next to the anticodon. This compound is a cytokinin, a plant hormone. The predictions of Wobble pairing accord very well with the observed abilities of almost all tRNAs. But there are exceptions in which the codons recognized by a tRNA differ from those predicted by the Wobble rules. Such effects probably result from the influence of neighbouring bases and / or the conformation of the anticodon loop in the overall tertiary structure.

D arm or DHU arm: It is a 3 or 4 bp stem ending in a loop and contains 2 to 3 residues of the unusual nucleotide 5,6-dihydrouridine (DHU). The length of the D loop varies from 5 to 7 nucleotides depending on the tRNA.

T arm or TΨC arm or Ribothymidine-Pseudouridine-Cytidine arm: It is a 5 bp stem ending in a loop. It contains ribothymidine (rT), not usually present in RNA, and pseudouridine (Ψ), which has an unusual C-C bond between the base and ribose.

Extra arm (variable arm): This is the site of greatest variability. It is located between the anticodon loop and the TΨC loop.
It has from 3 to 21 nucleotides and may have a stem consisting of up to 7 bp.

The DHU and TΨC arms contribute important interactions for the overall folding of tRNA molecules, and the TΨC arm interacts with the large subunit rRNA. The length (distance from the CCA end to the anticodon site) is constant. X-ray analysis of crystals of tRNA shows that the molecule is asymmetrically folded to yield a compact structure about 9 nm long and 2.5 nm thick. Extra bases in longer molecules are accommodated in the extra arm or the DHU arm. There are 15 invariant positions in the loop regions of all tRNAs, which always have the same base, and 8 semi-invariant positions, which have only a purine or only a pyrimidine base. Each tRNA must have at least two recognition sites: one for the activated amino acid-enzyme complex with which it must react to form the aminoacyl-tRNA, and another for the site on an mRNA molecule that contains the codon for that particular amino acid. The former involves recognition of amino acid residues by bases (either of the activated amino acid or of a site on the enzyme molecule), whereas the latter involves recognition of bases by bases (H-bonding).

Some unusual base pairing patterns in tRNA: G:U base pairs are common in RNA duplex structures. In stable codon-anticodon contact, G:U pairs can contribute only at the last position of the codon. Codon-anticodon pairing involves wobbling at the third position of the codon. The most direct effect of modification is seen in the anticodon, where a change of sequence influences the ability to pair with the codon, thus determining the meaning of the tRNA. Modifications elsewhere in the vicinity of the anticodon also influence its pairing. Where bases in the anticodon are modified, further pairing patterns become possible in addition to those predicted by the regular and Wobble pairing involving A, C, U and G. G:U base pairs enhance the capacity for self-complementarity. Inosine (I) is often present at the first position of the anticodon. Inosine can pair with any one of three bases, U, C and A, but not with G. This ability is especially important for the Ile codons, where AUA codes for Ile, while AUG codes for Met. Because with the usual bases it is not possible to recognize A alone in the third position, any tRNA with U starting its anticodon would have to recognize AUG as well as AUA. So AUA must be read together with AUU and AUC, a problem that is solved by the existence of tRNA with inosine in the anticodon. 2-Thiouracil base pairs only with A. Queuosine is a modified G base; such modified G bases continue to recognize both C and U, but pair with U more readily.

Tertiary structure: The tertiary structure of tRNA was described by Alexander Rich and Aaron Klug in the 1970s and was found to be a twisted L shape (Fig. 22).

Fig. 22: 3-D Tertiary (L-shaped) structure of tRNA

There are two segments of double helix. They are like A-DNA, as expected for an RNA duplex. Each of these helices contains about 10 base pairs, which corresponds to one turn of the helix. These helical segments are perpendicular to each other, giving the molecule its L-shape. Most of the bases in the non-helical regions participate in unusual H-bonding interactions. These tertiary interactions are between bases that are not usually complementary (e.g. GG, AA and AC). Moreover, the ribose-phosphate backbone interacts with some bases and even with another region of the backbone itself. The 2′-OH groups of ribose units act as H-bond donors or acceptors in many of these interactions. In addition, most bases are stacked. These hydrophobic interactions between adjacent aromatic rings play a major role in stabilizing the architecture of the molecule.

The amino acid (or amino acid acceptor) arm and the TΨC arm form a continuous double helix, and the anticodon (AC) arm and the DHU arm form the other, partially continuous double helix. The two helical columns meet to form a twisted L-shaped molecule. The CCA terminus and the adjacent helical region do not interact strongly with the rest of the molecule. This part of the molecule may change conformation during amino acid activation and also during protein synthesis on the ribosome.

The acceptor stem and the stem of the TΨC loop form an extended helix in the final tRNA structure. Similarly, the anticodon stem and the stem of the D-loop form a second extended helix. These two extended helices align at a right angle to each other, with the D-loop and the TΨC loop coming together. In the final stage, the two extended helices adopt their proper helical configuration. These structures reveal that base stacking plays a major role in RNA conformation; for example, 72 out of the 76 bases in tRNA are involved in stacking interactions. As in the DNA double helix, stacking of RNA bases on top of one another is energetically favourable. For this reason, short base-paired helical regions of RNA stack on top of one another to form longer, discontinuous helical regions. These regions of stacked helices then pack against each other via additional tertiary interactions.

Four kinds of interactions stabilize the twisted L-shaped structure:
* By forming base triples: The first stabilizing interactions are H-bonds between bases in different helical regions that are brought near each other in 3-D space by the tertiary structure. These are generally unconventional (non-Watson-Crick) bonds, including Hoogsteen base pairs.
* By base-backbone interactions: The second stabilizing interactions are those between the bases and the sugar-phosphate backbone.
* By base stacking: The third kind of stabilizing interaction is the additional base stacking gained from formation of the two extended regions of base pairing.
* Action of the 2′-OH of ribose: The presence of the 2′-OH in the RNA backbone prevents RNA from adopting the B-form helix.
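The stem lengths quoted for the cloverleaf can be tallied to check the statement that roughly half of the bases in tRNA are paired. The Python sketch below assumes a 76-nucleotide tRNA (the size used in the stacking example), takes the D stem as 4 bp (the text gives 3 or 4) and ignores any stem in the variable arm; these choices are assumptions for illustration.

```python
# Stem lengths in base pairs, taken from the description of the cloverleaf.
STEM_BP = {"acceptor": 7, "D": 4, "anticodon": 5, "TpsiC": 5}
TOTAL_NT = 76    # assumed tRNA length
STACKED_NT = 72  # bases involved in stacking, as quoted in the text


def paired_fraction(stem_bp, total_nt):
    """Fraction of nucleotides that sit in base-paired stems (2 nt per bp)."""
    return 2 * sum(stem_bp.values()) / total_nt


paired = paired_fraction(STEM_BP, TOTAL_NT)
stacked = STACKED_NT / TOTAL_NT
print(f"paired in stems: {paired:.0%}")
print(f"stacked: {stacked:.0%}")
```

Under these assumptions about 55% of the nucleotides lie in the four stems, in line with the ~50% figure, while nearly all bases take part in stacking once the tertiary fold is considered.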

(d)

Heterogeneous nuclear RNA (hnRNA)

It comprises transcripts of nuclear genes made by RNA polymerase. It has a wide size distribution and low stability. In mammalian cells, including those of human beings, a precursor RNA is first synthesized in the nucleoplasm by DNA-dependent RNA polymerase. This precursor is then processed by nuclear nucleases to mRNA, which is then translocated to the cytoplasm, where it becomes associated with the ribosomal system. This precursor RNA constitutes the fourth class of RNA molecules and is designated heterogeneous nuclear RNA (hnRNA). The hnRNA molecules may have molecular weights exceeding 10^7 Da, whereas mRNA molecules are generally smaller than 2 × 10^6 Da. Most mammalian mRNA molecules are 400-4000 nucleotides in length, whereas an hnRNA molecule possesses 5000-50000 nucleotides. Some uncertainty still exists concerning the precursor-product relationship between hnRNA and mRNA, the former being 10-100 times longer than the latter. Thus, the hnRNA molecules appear to be processed to generate the mRNA templates for protein synthesis.

The vast majority of eukaryotic genes are interrupted. Genes vary widely in the number and lengths of introns, but the typical mammalian gene has 7-8 exons spread out over ~16 kb. The exons are relatively short (~100-200 bp) and the introns are relatively long (>1 kb). The discrepancy between the interrupted organization of the gene and the uninterrupted organization of its mRNA requires processing of the primary transcription product. The primary transcript has the same organization as the gene and is sometimes called pre-mRNA. Removal of introns from pre-mRNA leaves a typical messenger of ~2.2 kb. The average size of hnRNA is much larger than that of mRNA; it is very unstable and has a much greater sequence complexity. It was named hnRNA for its broad size distribution. It includes pre-mRNA but could also include other transcripts.
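The figures given for a typical mammalian gene (~16 kb primary transcript, ~2.2 kb messenger) imply how much of the transcript is discarded during processing. The following Python sketch works through that arithmetic using only the approximate values quoted in the text; the function name is illustrative.

```python
GENE_KB = 16.0  # approximate length of a typical mammalian gene / primary transcript
MRNA_KB = 2.2   # approximate length of the spliced messenger


def intron_stats(gene_kb, mrna_kb):
    """Return (kb removed by splicing, fraction of the transcript removed)."""
    removed_kb = gene_kb - mrna_kb
    return removed_kb, removed_kb / gene_kb


removed_kb, removed_fraction = intron_stats(GENE_KB, MRNA_KB)
print(f"removed: {removed_kb:.1f} kb ({removed_fraction:.0%} of the primary transcript)")
print(f"primary transcript is about {GENE_KB / MRNA_KB:.1f} times longer than the mRNA")
```

On these numbers roughly 86% of the typical primary transcript is intron sequence, which is consistent with exons being short and introns long.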


The physical form of hnRNA is a ribonucleoprotein particle (hnRNP) in which the hnRNA is bound by proteins. As characterized in vitro, an hnRNP particle takes the form of beads connected by a fiber. The hnRNP is organized in 40S particles. The most abundant proteins in the particle are core proteins, but other proteins are present at lower stoichiometry, making a total of ~20 proteins. The proteins are typically present at ~10^8 copies per nucleus, compared with ~10^6 molecules of hnRNA. Some of the proteins may have a structural role in packaging the hnRNA; several are known to shuttle between the nucleus and cytoplasm and play roles in exporting the RNA or otherwise controlling its activity.

Suggested Reading
1. Berg J.M., Tymoczko J.L., Stryer L., Biochemistry, International Edition, V Edition, W.H. Freeman & Co., New York.
2. Watson J.D., Baker T.A., Bell S.P., Gann A., Levine M., Losick R., Molecular Biology of the Gene, V Edition, Pearson Education.
3. Lewin B., Genes VIII, International Edition, Pearson Education International.
4. Glick B.R., Pasternak J.J., Molecular Biotechnology: Principles and Applications of Recombinant DNA, III Edition, ASM Press.
5. Turner P.C., McLennan A.G., Bates A.D., White M.R.H., Instant Notes, Molecular Biology, II Edition, Viva Books Pvt. Ltd.
6. Das H.K., Textbook of Biotechnology, Wiley Dreamtech.
7. Voet D., Voet J.G., Biochemistry, John Wiley & Sons.
8. Nelson D.L., Cox M.M., Lehninger Principles of Biochemistry, IV Edition, W.H. Freeman & Co., New York.
9. Twyman R.M., Advanced Molecular Biology, Viva Books Pvt. Ltd.
10. Brown T.A., Genomes 2, Wiley Liss Publ.
