
Keywords: Genetic code - Origin of life - Doublet code - Pattern - Amino acid properties

Abstract: Since the early days of the discovery of the genetic code, nonrandom patterns have been
searched for in the code in the hope of providing information about its origin and early evolution. Here we
present a new classification scheme of the genetic code that is based on a binary representation of the
purines and pyrimidines. This scheme reveals known patterns more clearly than the common one, for
instance, the classification of strong, mixed, and weak codons as well as the ordering of codon families.
Furthermore, new patterns have been found that have not been described before: Nearly all quantitative
amino acid properties, such as Woese's polarity and the specific volume, show a perfect correlation to
Lagerkvist's codon-anticodon binding strength. Our new scheme leads to new ideas about the evolution
of the genetic code. It is hypothesized that it started with a binary doublet code and developed via a
quaternary doublet code into the contemporary triplet code. Furthermore, arguments are presented
against suggestions that a "simpler" code, where only the midbase was informational, was at the origin of
the genetic code.
Popular Text Citations
Osawa, S. 1995. Evolution of the genetic code. Oxford University Press, Oxford, UK.
Jungck, J. R. 1984. The adaptationist programme in molecular evolution: The origins of genetic code.
Pp. 345-364 in Matsuno, K., Dose, K., Harada, K., Rohlfing, D. L., eds., Molecular Evolution and
Protobiology. Plenum Press, New York.
Ratner, V. A., et al. 1996. Molecular Evolution. Biomathematics Volume 24. Springer Verlag, New
York. See chapter 3.2, Noise immunity of the genetic code and texts, pp. 44-50.
Research Articles
Bertman, M. O., Jungck, J. R. 1978. Some unresolved mathematical problems in genetic coding. Notices
of the American Mathematical Society 25: A-174.
Bertman, M. O., Jungck, J. R. 1979. Group graph of the genetic code. Journal of Heredity
70: 379-384.
Cedergren, R., Miramontes, P. 1996. The puzzling origin of the genetic code. Trends in
Biochemical Sciences 21: 199-200.
Crick, F. H. C., Barnett, L., Brenner, S., Watts-Tobin, R. J. 1961. General nature of the genetic code
for proteins. Nature 192: 1227-1232.
Jimenez-Montano, M. A., et al. 1995. On the hypercube structure of the genetic code. In Lim, A., Cantor,
C. R., eds., Bioinformatics and Genome Research. World Scientific, Singapore, p. 445.
Jimenez-Montano, M. A., et al. 1996. The hypercube structure of the genetic code explains
conservative and non-conservative amino acid substitutions in vivo and in vitro. BioSystems
39: 117-125.
Jungck, J. R. 1978. The genetic code as a periodic table. Journal of Molecular Evolution
11: 211-224.
Shepard, J. C. 1981. Method to determine the reading frame of a protein from the
purine/pyrimidine genome sequence and its possible evolutionary justification. Proceedings of the
National Academy of Sciences USA 78: 1596-1600.
Education Research & Pedagogical Materials
Bergland, M. 1996. DNA Electrophoresis. The BioQUEST Library IV (Extended Learning
Resources). University of Maryland, College Park.
Jungck, J. R. 1977. Complementarity and coding. Journal of College Science Teaching 7: 27-28.
Jungck, J. R., Friedman, R. M. 1984. Mathematical tools for molecular genetics data: an
annotated bibliography. Bulletin of Mathematical Biology 46: 699-744.
Gilbert, D. 1996. SeqApp, SeqPup, Dotty Plotter, GelFragSizer, GenBank Search, and
LoopDLoop/Loop Viewer. The BioQUEST Library IV (Support Materials Archive). University of
Maryland, College Park.
Amino acid difference formula to help explain protein
evolution.
Grantham R.
A formula for difference between amino acids combines properties that correlate best with protein residue
substitution frequencies: composition, polarity, and molecular volume. Substitution frequencies agree
much better with overall chemical difference between exchanging residues than with minimum base
changes between their codons. Correlation coefficients show that fixation of mutations between
dissimilar amino acids is generally rare.
Recent evidence for evolution of the genetic code.
Osawa S, Jukes TH, Watanabe K, Muto A.
Department of Biology, Nagoya University, Japan.
The genetic code, formerly thought to be frozen, is now known to be in a state of evolution. This was first
shown in 1979 by Barrell et al. (B. G. Barrell, A. T. Bankier, and J. Drouin, Nature [London] 282:189-194,
1979), who found that the universal codons AUA (isoleucine) and UGA (stop) coded for methionine and
tryptophan, respectively, in human mitochondria. Subsequent studies have shown that UGA codes for
tryptophan in Mycoplasma spp. and in all nonplant mitochondria that have been examined. Universal
stop codons UAA and UAG code for glutamine in ciliated protozoa (except Euplotes octacarinatus) and
in a green alga, Acetabularia. E. octacarinatus uses UAA for stop and UGA for cysteine. Candida
species, which are yeasts, use CUG (leucine) for serine. Other departures from the universal code, all in
nonplant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in
platyhelminths and echinoderms), UAA (stop) for tyrosine (in planaria), and AGR (arginine) for serine (in
several animal orders) and for stop (in vertebrates). We propose that the changes are typically preceded by
loss of a codon from all coding sequences in an organism or organelle, often as a result of directional
mutation pressure, accompanied by loss of the tRNA that translates the codon. The codon reappears
later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with
a different assignment. Changes in release factors also contribute to these revised assignments. We also
discuss the use of UGA (stop) as a selenocysteine codon and the early history of the code.
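The reassignments listed in this abstract can be pictured as small overrides on top of the universal codon table. The following is a minimal sketch, not anything from the paper: the names `STANDARD`, `HUMAN_MITO`, `translate`, and the toy message are illustrative assumptions, and only the two human mitochondrial changes named above (AUA to methionine, UGA to tryptophan) are applied.

```python
from itertools import product

BASES = "UCAG"
# Universal code in the conventional UCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative variant table: the two human mitochondrial reassignments
# reported in the abstract (AUA: Ile -> Met, UGA: stop -> Trp).
HUMAN_MITO = {**STANDARD, "AUA": "M", "UGA": "W"}


def translate(rna, code):
    """Translate an RNA string codon by codon, stopping at the first stop."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = code[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGAUAUGAGCU"  # toy message: AUG AUA UGA GCU
print(translate(message, STANDARD))    # MI   (UGA read as stop)
print(translate(message, HUMAN_MITO))  # MMWA (AUA -> Met, UGA -> Trp)
```

The same message yields a truncated peptide under the universal code and a longer one under the variant table, which is the practical consequence of the codon reassignments the abstract describes.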
The genetic code as a periodic table.
Jungck JR.
The contemporary genetic code is reflective of a significant correlation between the properties of amino
acids and their anticodons in a periodic manner. Almost all properties of amino acids showed a greater
correlation to anticodonic than to codonic dinucleoside monophosphate properties. The polarity and
bulkiness of amino acid side chains can be used to predict the anticodon with considerable confidence.
The results are most consistent with predictions of the "direct interaction" and "ambiguity reduction"
hypotheses for the origin of the genetic code.
Directional mutation pressure, mutator mutations, and
dynamics of molecular evolution.
Sueoka N.
Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder
80309-0347.
Using a general form of the directional mutation theory, this paper analyzes the effect of mutations in
mutator genes on the GC content of DNA, the frequency of substitution mutations, and evolutionary
changes (cumulative mutations) under various degrees of selective constraints. Directional mutation theory
predicts that when the mutational bias between A/T and G/C nucleotide pairs is equilibrated with the
base composition of a neutral set of DNA nucleotides, the mutation frequency per gene will be much
lower than the frequency immediately after the mutator mutation takes place. This prediction explains
the wide variation of the DNA GC content among unicellular organisms and possibly also the wide
intragenomic heterogeneity of third codon positions for the genes of multicellular eukaryotes. The
present analyses lead to several predictions that are not consistent with a number of the frequently held
assumptions in the field of molecular evolution, including belief in a constant rate of evolution,
symmetric branching of phylogenetic trees, the generality of higher mutation frequency for neutral sets of
nucleotides, the notion that mutator mutations are generally deleterious because of their high mutation
rates, and teleological explanations of DNA base composition.
Genetic code correlations: amino acids and their anticodon
nucleotides.
Weber AL, Lacey JC Jr.
The data here show direct correlations between both the hydrophobicity and the hydrophilicity of
the homocodonic amino acids and their anticodon nucleotides. While the differences between
properties of uracil and cytosine derivatives are small, further data show that uracil has an
affinity for charged species. Although these data suggest that molecular relationships between
amino acids and anticodons were responsible for the origin of the code, it is not clear what the
mechanism of the origin might have been.
Group graph of the genetic code.
Bertman MO, Jungck JR.
The genetic code doublets can be divided into two octets of completely degenerate and
ambiguous coding dinucleotides. These two octets have the algebraic property of lying on
continuously connected planes on the group graph (a tesseract) of the Cartesian product of two
Klein 4-groups of nucleotide exchange operators. The K x K group can also be broken into four
cosets, one of which has completely degenerate coding elements, and another that has
completely ambiguous coding elements. The two octets of coding doublets have the further
algebraic property that the product of their internal exchange operators naturally divide into two
exactly equivalent sets. These properties of the genetic code are relevant to unraveling
error-detecting and error-correcting (proof-reading) aspects of the genetic code and may be
helpful in understanding the context-sensitive grammar of genetic language.
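The split into two octets can be checked directly against the universal codon table. The sketch below is an illustration under stated assumptions rather than the authors' group-graph construction: a doublet is called "completely degenerate" when all four third bases give the same meaning, and the single exchange operator used (U<->G together with A<->C) is chosen here for illustration and is not taken from the abstract.

```python
from itertools import product

BASES = "UCAG"
# Universal code in the conventional UCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

doublets = ["".join(d) for d in product(BASES, repeat=2)]


def is_degenerate(doublet):
    """True if the third base never changes the meaning of the codon."""
    return len({CODE[doublet + b] for b in BASES}) == 1


degenerate = {d for d in doublets if is_degenerate(d)}
ambiguous = set(doublets) - degenerate

# One nucleotide exchange operator (illustrative choice): U<->G, A<->C.
EXCHANGE = str.maketrans("UGAC", "GUCA")
mapped = {d.translate(EXCHANGE) for d in degenerate}

print(sorted(degenerate))   # the eight doublets whose third base is free
print(sorted(ambiguous))    # the eight doublets whose third base matters
print(mapped == ambiguous)  # True: this exchange swaps the two octets
```

With this table the sixteen doublets divide eight and eight, and the chosen exchange carries one octet exactly onto the other, which gives a concrete feel for the kind of symmetry the abstract describes.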
The regularity of changes of the Chou-Fasman parameters
within the genetic code.
Siemion IZ.
Institute of Chemistry, Wroclaw University, Poland.
It has been shown that Chou-Fasman conformational parameters of amino acids, which reflect
their ability to adopt a definite conformation within the peptide chain, change very regularly
within the genetic code, arranged in the manner discussed recently by Siemion and Stefanowicz
(1992a, BioSystems 2. -84). Two mutually perpendicular C2 axes of pseudosymmetry
appear in the center of the diagrams (between ACY and ACR threonine codons) presenting the
changes of P alpha and P beta parameters. The left and right parts of diagrams superimpose on
each other quite well when the symmetry operation involving a proper axis is performed. This
phenomenon is due, in our opinion, to the regular arrangement of equivalent codons in the
'one-step mutation' ring formed by 64 triplets of the genetic code.
Biased distribution of adenine and thymine in gene
nucleotide sequences.
Mrazek J, Kypr J.
Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno.
We analyzed occurrences of bases in 20,352 introns, exons of 25.54 protein-coding genes, and
among the three codon positions in the protein-coding sequences. The nucleotide sequences
originated from the whole spectrum of organisms from bacteria to primates. The analysis
revealed the following: (1) In most exons, adenine dominates over thymine. In other words,
adenine and thymine are distributed in an asymmetric way between the exon and the
complementary strand, and the coding sequence is mostly located in the adenine-rich strand. (2)
Thymine dominates over adenine not only in the strand complementary to the exon but also in
introns. (3) A general bias is further revealed in the distribution of adenine and thymine among
the three codon positions in the exons, where adenine dominates over thymine in the second and
mainly the first codon position while the reverse holds in the third codon position. The product
(A1/T1) x (A2/T2) x (T3/A3) is smaller than one in only a few analyzed genes.
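The product quoted at the end of this abstract is straightforward to compute for any in-frame coding sequence. The following is a minimal sketch assuming that A1, T1 and so on denote the counts of adenine and thymine at the first, second, and third codon positions; the function name `at_bias_product` and the toy sequence are hypothetical and not from the paper.

```python
def at_bias_product(cds):
    """Return (A1/T1) * (A2/T2) * (T3/A3) for an in-frame coding sequence.

    A_k and T_k are taken to be the counts of A and T at codon position k
    (an assumption about the abstract's notation).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = [{"A": 0, "T": 0} for _ in range(3)]
    for i in range(usable):
        base = cds[i]
        if base in "AT":
            counts[i % 3][base] += 1
    (a1, t1), (a2, t2), (a3, t3) = [(c["A"], c["T"]) for c in counts]
    if 0 in (t1, t2, a3):
        raise ValueError("ratio undefined: a denominator count is zero")
    return (a1 / t1) * (a2 / t2) * (t3 / a3)


# Toy sequence, codons ATG AAA TTT GCA TAA:
# position 1 has A=2, T=2; position 2 has A=2, T=2; position 3 has A=3, T=1.
print(at_bias_product("ATGAAATTTGCATAA"))
```

A value above one would reflect the bias the abstract reports (adenine favoured at positions one and two, thymine at position three); the toy sequence here is deliberately arbitrary and falls below one.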
The informational context of the third base in amino acid
codons.
Siemion IZ, Siemion PJ.
Institute of Chemistry, Wroclaw University, Poland.
It is shown that in the pairs of amino acids coded by the codons possessing identical bases in the
first and second positions, the amino acids with R in the third position are of higher structural
importance (which is reflected by higher values of (P alpha + P beta) sums of Chou-Fasman
conformational parameters), and of stronger helix forming potentials (reflected by the
differences (P alpha - P beta)), than the amino acids coded with Y. The same structural factors seem
to be of importance for the codon choice in the case of amino acids coded by more than two
codons. The amino acids which prefer alpha-helical over the beta-sheet conformation favour the
codons with R in their third position, and those which favour the beta-sheet conformation
favour the codons with Y in this position.
The origin and evolution of the genetic code.
Beland P, Allen TF.
St Lawrence National Institute of Ecotoxicology, Montreal, Quebec, Canada.
We argue that a primitive genetic code with only 20 separate words explains that there are 20
coded amino acids in modern life. The existence of 64 words on the modern genetic code
requires modern life to read almost exclusively one strand of DNA in one direction. In our
primitive code, both the original and the complementary sequence are read in either direction to
give the same strings of amino acids. The algebra of complements forces synonymy of primitive
codons so as to reduce the 64 independent codons of the modern code to exactly 20 independent separate
words in the primitive condition. The synonymy in the modern code is the result of selection rather than
algebraic forcing. The primitive code has almost no resilience to base mutations, unlike the third base
redundancy of the modern code. Our primitive and the modern code are orthogonal. If palindromic proteins
were coded by hairpin DNA or RNA, then (i) no punctuation would be needed, (ii) the reverse reading
would give the same secondarily folded protein structure, and (iii) the sugar backbone would be read in the
conventional 5' to 3' direction for the original arm and its complement. Modern copying of genetic material
is almost always antiparallel. However, occasional parallel copying, as does occur in modern life, would give
the complementary hairpin that would also read 5' to 3' along its entire length.
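The count of 20 words can be reproduced with a few lines of code under one reading of the abstract: two triplets are treated as the same primitive word if one can be obtained from the other by reversal, complementation, or both. This is a sketch of that interpretation only; the names `readings` and `classes` are illustrative.

```python
from itertools import product

# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def readings(codon):
    """All ways the same site is read: either strand, either direction."""
    comp = codon.translate(COMPLEMENT)
    return {codon, codon[::-1], comp, comp[::-1]}


# Group the 64 triplets into classes that would be forced to be synonymous.
classes = {frozenset(readings("".join(c))) for c in product("ACGU", repeat=3)}

print(len(classes))  # 20
```

The 16 triplets that read the same in both directions (such as AUA) fall into 8 classes of two, and the remaining 48 fall into 12 classes of four, giving 20 classes in total.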
Codon evolution and conservation of the reading phase in
genetic code translation.
Toha JC, Donoso R, Estay M, Diaz-Valdes J.
Departamento de Fisica, Facultad de Ciencias Fisicas y Matematicas, Universidad de Chile, Santiago.
The description of the optimized evolution of a code based on 4 nucleotides involves a sequential transition of
codons, formed firstly by monomers evolving to dimers and then to triplets, in accordance with the
progressive increase of the number of amino acids to be coded. The successive increase in the size of
these codons during evolution implies changes in the phase reading of the genetic message, which could
become chaotic. In order to overcome this constraint, this paper proposes a codon evolution where two
things occur simultaneously: codons change in size and there is an alternation of the molecule which holds
the information. For example, the nucleotides of the original oligonucleotide are read as monomers when
they are translated to an oligopeptide, but further on, this oligopeptide, which is read as amino acid
dimers, is translated to a nucleotide form (oligonucleotide). Finally, amino acids conforming a peptide are
translated from this oligonucleotide, through a reading of triplets. Although plausible, this evolution is a
low-probability process due to the fact that it requires a singular sequence of the oligonucleotide and
oligopeptide involved. An alternative hypothesis of evolution is also discussed. It proposes that with the
exclusion of the establishment of monomer and dimer codons, there is a direct generation of a code of
trinucleotides which arises only when a certain number of amino acids has already been generated. Both
hypotheses are discussed in terms of the development of a code in which an optimized hardware is
maintained throughout its evolution.
Evolution of the genetic triplet code via two types of doublet
codons.
Wu HL, Bagby S, van den Elsen JM.
Department of Biology and Biochemistry, University of Bath, 4 South, Claverton Down, Bath BA2
7AY, UK.
Explaining the apparent non-random codon distribution and the nature and number of amino acids in the
'standard' genetic code remains a challenge, despite the various hypotheses so far
proposed. In this paper we propose a simple new hypothesis for code evolution involving a progression
from singlet to doublet to triplet codons with a reading mechanism that moves three bases each step. We
suggest that triplet codons gradually evolved from two types of ambiguous doublet codons, those in which
the first two bases of each three-base window were read ('prefix' codons) and those in which the last two bases
of each window were read ('suffix' codons). This hypothesis explains multiple features of the genetic code
such as the origin of the pattern of four-fold degenerate and two-fold degenerate triplet codons, the origin
of its error minimising properties, and why there are only 20 amino acids.
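The two doublet reading modes described above differ only in which two bases of each three-base window are read. The sketch below illustrates that window arithmetic; the function name `read_doublets` and the toy message are hypothetical, and nothing here models how doublets were assigned to amino acids.

```python
def read_doublets(rna, mode):
    """Step through `rna` three bases at a time and read two bases per window.

    mode="prefix" reads the first two bases of each window,
    mode="suffix" reads the last two.
    """
    if mode not in ("prefix", "suffix"):
        raise ValueError("mode must be 'prefix' or 'suffix'")
    offset = 0 if mode == "prefix" else 1
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial window
    return [rna[i + offset:i + offset + 2] for i in range(0, usable, 3)]


message = "AUGGCUUAA"  # toy message, windows AUG | GCU | UAA
print(read_doublets(message, "prefix"))  # ['AU', 'GC', 'UA']
print(read_doublets(message, "suffix"))  # ['UG', 'CU', 'AA']
```

Because the reading frame already advances three bases per step in both modes, a later switch to reading all three bases would not shift the frame, which is the property the hypothesis relies on.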
A new classification scheme of the genetic code.
Wilhelm T, Nikolajewa S.
Institute of Molecular Biotechnology, Beutenbergstr. 11, 07745 Jena, Germany.
wilhelm@imb-jena.de
Since the early days of the discovery of the genetic code nonrandom patterns have been searched for in the
code in the hope of providing information about its origin and early evolution. Here we present a new
classification scheme of the genetic code that is based on a binary representation of the purines and
pyrimidines. This scheme reveals known patterns more clearly than the common one, for instance, the
classification of strong, mixed, and weak codons as well as the ordering of codon families. Furthermore,
new patterns have been found that have not been described before: Nearly all quantitative amino acid
properties, such as Woese's polarity and the specific volume, show a perfect correlation to Lagerkvist's
codon-anticodon binding strength. Our new scheme leads to new ideas about the evolution of the genetic
code. It is hypothesized that it started with a binary doublet code and developed via a quaternary doublet
code into the contemporary triplet code. Furthermore, arguments are presented against suggestions that a
"simpler" code, where only the midbase was informational, was at the origin of the genetic code.
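The strong, mixed, and weak classes mentioned above can be tabulated from the first two codon bases. The sketch below is an illustration under stated assumptions, not the authors' scheme: it encodes pyrimidines as 0 and purines as 1 (the paper's exact binary convention is not given in the abstract), calls a doublet strong when both bases are G or C and weak when both are A or U, and compares the classes with whether the doublet forms a four-codon family in the universal code.

```python
from itertools import product

BASES = "UCAG"
# Universal code in the conventional UCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed binary convention: pyrimidine (U, C) = 0, purine (A, G) = 1.
PURINE_BIT = {"U": 0, "C": 0, "A": 1, "G": 1}


def classify(doublet):
    """Strong = two G/C bases, weak = none, mixed = exactly one."""
    n_strong = sum(base in "GC" for base in doublet)
    return ("weak", "mixed", "strong")[n_strong]


def is_family(doublet):
    """True if all four third bases give the same amino acid."""
    return len({CODE[doublet + b] for b in BASES}) == 1


rows = {}
for d in ("".join(p) for p in product(BASES, repeat=2)):
    bits = "".join(str(PURINE_BIT[b]) for b in d)
    rows[d] = (bits, classify(d), is_family(d))
    print(d, bits, classify(d), "family" if is_family(d) else "split")
```

With this table, every strong doublet is a family, every weak doublet is split, and a mixed doublet is a family exactly when its second base is a pyrimidine, which is the kind of regularity a purine/pyrimidine encoding makes easy to see.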
Guilt by association: the arginine case revisited.
Knight RD, Landweber LF.
Department of Ecology and Evolutionary Biology, Princeton University, New Jersey 08544-1003, USA.
If the genetic code arose in an RNA world, present codon assignments may reflect primordial RNA-amino
acid affinities. Whether aptamers selected from random pools to bind free amino acids do so using
the cognate codons at their binding sites has been controversial. Here we defend and extend our previous
analysis of arginine binding sites, and propose a model for the maintenance of codon-amino acid
interactions through the evolution of amino acids from ribozyme cofactors into the building blocks of
proteins.
The origin of the genetic code: amino acids as cofactors in an
RNA world.
Szathmary E.
Department of Plant Taxonomy and Ecology, Eotvos University, Budapest and Collegium Budapest,
Szentharomsag u. 2, H-1014 Budapest, Hungary. szathmary@colbud.hu
The genetic code, understood as the specific assignment of amino acids to nucleotide triplets, might have preceded
the existence of translation. Amino acids became utilized as cofactors by ribozymes in a metabolically complex RNA
world. Specific charging ribozymes linked amino acids to corresponding RNA handles, which could basepair with
different ribozymes, via an anticodon hairpin, and so deliver the cofactor to the ribozyme. Growing of the 'handle' into
a presumptive tRNA was possible while function was retained and modified throughout. A stereochemical
relation between some amino acids and cognate anticodons/codons is likely to have been important in the earliest
assignments. Recent experimental findings, including selection for ribozymes catalyzing peptide-bond formation
and those utilizing an amino acid cofactor, hold promise that scenarios of this major transition can be tested.
On the origin of the translation system and the genetic code
in the RNA world by means of natural selection, exaptation,
and subfunctionalization.
Wolf YI, Koonin EV.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health,
Bethesda, MD 20894, USA. wolf@ncbi.nlm.nih.gov
BACKGROUND: The origin of the translation system is, arguably, the central and the hardest problem in the study of
the origin of life, and one of the hardest in all evolutionary biology. The problem has a clear catch-22 aspect: high
translation fidelity hardly can be achieved without a complex, highly evolved set of RNAs and proteins but an elaborate
protein machinery could not evolve without an accurate translation system. The origin of the genetic code and
whether it evolved on the basis of a stereochemical correspondence between amino acids and their cognate codons (or
anticodons), through selectional optimization of the code vocabulary, as a "frozen accident" or via a combination of all
these routes is another wide open problem despite extensive theoretical and experimental studies. Here we combine the
results of comparative genomics of translation system components, data on interaction of amino acids with their cognate
codons and anticodons, and data on catalytic activities of ribozymes to develop conceptual models for the origins of
the translation system and the genetic code. RESULTS: Our main guide in constructing the models is the
Darwinian Continuity Principle whereby a scenario for the evolution of a complex system must consist of
plausible elementary steps, each conferring a distinct advantage on the evolving ensemble of genetic elements.
Evolution of the translation system is envisaged to occur in a compartmentalized ensemble of replicating, co-selected
RNA segments, i.e., in an RNA World containing ribozymes with versatile activities. Since evolution has no
foresight, the translation system could not evolve in the RNA World as the result of selection for protein synthesis
and must have been a by-product of evolution driven by selection for another function, i.e., the translation system
evolved via the exaptation route. It is proposed that the evolutionary process that eventually led to the emergence of
translation started with the selection for ribozymes binding abiogenic amino acids that stimulated ribozyme-catalyzed
reactions. The proposed scenario for the evolution of translation consists of the following steps:
binding of amino acids to a ribozyme resulting in an enhancement of its catalytic activity; evolution of
the amino-acid-stimulated ribozyme into a peptide ligase (predecessor of the large ribosomal subunit)
yielding, initially, a unique peptide activating the original ribozyme and, possibly, other ribozymes in the
ensemble; evolution of self-charging proto-tRNAs that were selected, initially, for accumulation of
amino acids, and subsequently, for delivery of amino acids to the peptide ligase; joining of the peptide
ligase with a distinct RNA molecule (predecessor of the small ribosomal subunit) carrying a built-in
template for more efficient, complementary binding of charged proto-tRNAs; evolution of the ability of the
peptide ligase to assemble peptides using exogenous RNAs as template for complementary binding of
charged proto-tRNAs, yielding peptides with the potential to activate different ribozymes; evolution of
the translocation function of the protoribosome leading to the production of increasingly longer peptides (the
first proteins), i.e., the origin of translation. The specifics of the recognition of amino acids by
proto-tRNAs and the origin of the genetic code depend on whether or not there is a physical affinity between
amino acids and their cognate codons or anticodons, a problem that remains unresolved. CONCLUSION:
We describe a stepwise model for the origin of the translation system in the ancient RNA world such
that each step confers a distinct advantage onto an ensemble of co-evolving genetic elements. Under this
scenario, the primary cause for the emergence of translation was the ability of amino acids and peptides to
stimulate reactions catalyzed by ribozymes. Thus, the translation system might have evolved as the
result of selection for ribozymes capable of, initially, efficient amino acid binding, and subsequently,
synthesis of increasingly versatile peptides. Several aspects of this scenario are amenable to experimental
testing.
Obcells as proto-organisms: membrane heredity,
lithophosphorylation, and the origins of the genetic code, the
first cells, and photosynthesis.
Cavalier-Smith T.
Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, United Kingdom.
tom.cavalier-smith@zoo.ox.ac.uk
I attempt to sketch a unified picture of the origin of living organisms in their genetic, bioenergetic, and
structural aspects. Only selection at a higher level than for individual selfish genes could power the
cooperative macromolecular coevolution required for evolving the genetic code. The protein synthesis
machinery is too complex to have evolved before membranes. Therefore a symbiosis of membranes,
replicators, and catalysts probably mediated the origin of the code and the transition from a nucleic acid
world of independent molecular replicators to a nucleic acid/protein/lipid world of reproducing organisms.
Membranes initially functioned as supramolecular structures to which different replicators attached
and were selected as a higher-level reproductive unit: the proto-organism. I discuss the roles of
stereochemistry, gene divergence, codon capture, and selection in the code's origin. I argue that proteins were
primarily structural not enzymatic and that the first biological membranes consisted of amphipathic
peptidyl-tRNAs and prebiotic mixed lipids. The peptidyl-tRNAs functioned as genetically-specified
lipid analogues with hydrophobic tails (ancestral signal peptides) and hydrophilic polynucleotide heads.
Protoribosomes arose from two cooperating RNAs: peptidyl
transferase (large subunit) and mRNA-binder (small subunit). Early proteins had a second key role: coupling energy
flow to the phosphorylation of gene and peptide precursors, probably by lithophosphorylation by membrane-anchored
kinases scavenging geothermal polyphosphate stocks. These key evolutionary steps probably occurred on the outer
surface of an 'inside out-cell' or obcell, which evolved an unambiguous hydrophobic code with four prebiotic amino acids
and proline, and initiation by isoleucine anticodon CAU; early proteins and nucleozymes were all membrane-attached.
To improve replication, translation, and lithophosphorylation, hydrophilic substrate-binding and catalytic domains were
later added to signal peptides, yielding a ten-acid doublet code. A primitive proto-ecology of molecular scavenging,
parasitism, and predation evolved among obcells. I propose a new theory for the origin of the first cell: fusion of
two cup-shaped obcells, or hemicells, to make a protocell with double envelope, internal genome and ribosomes,
protocytosol, and periplasm. Only then did water-soluble enzymes, amino acid biosynthesis, and intermediary
metabolism evolve in a concentrated autocatalytic internal cytosolic soup, causing 12 new amino acid assignments,
termination, and rapid freezing of the 22-acid code. Anticodons were recruited sequentially: GNN, CNN, INN, and
*UNN. CO2 fixation, photoreduction, and lipid synthesis probably evolved in the protocell before
photophosphorylation. Signal recognition particles, chaperones, compartmented proteases, and peptidoglycan arose prior
to the last common ancestor of life, a complex autotrophic, anaerobic green bacterium.
Molecular evolution before the origin of species.
Davis BK.
Research Foundation of Southern California, Inc., La Jolla, CA 92037, USA.
davisrfsc@yahoo.com
Amino acids at conserved sites in the residue sequence of 10 ancient proteins, from 844 phylogenetically diverse
sources, were used to specify their time of origin in the interval before species divergence from the last common
ancestor (LCA). The order of amino acid addition to the genetic code, based on biosynthesis path length and other
molecular evidence, provided a reference for evaluating the 'code age' of each residue profile examined. Significantly
earlier estimates were obtained for conserved amino acid residues in these proteins than non-conserved residues.
Evidence from the primary structure of 'fossil' proteins thus corroborated the biosynthetic order of amino acid
addition to the code. Low potential ferredoxin (Fdxn) had the earliest residue profile among the proteins in this study.
A phylogenetic tree for 82 prokaryote Fdxn sequences was rooted midway between bacteria and archaea branches.
LCA Fdxn had a 23-residue antecedent whose residue profile matched mid-expansion phase codon assignments and
included an amide residue. It contained a highly acidic N-terminal region and a non-charged C-terminal region, with
all four cysteine residues. This small protein apparently anchored a [4Fe-4S] cluster, ligated by C-terminal
cysteines, to a positively charged mineral surface, consistent with mediating e(-) transfer in a primordial surface
system before cells appeared. Its negatively charged N-terminal 'attachment site' was highly mutable during evolution of
ancestral Fdxn for Bacteria and Archaea, consistent with a loss of function after cell formation. An initial glutamate to
lysine substitution may link 'attachment site' removal to early post-expansion phase entry of basic amino acids to the
code. As proteins evidently anchored non-charged amide residues initially, surface attachment of cofactors and
other functional groups emerges as a general function of pre-cell proteins. A phylogenetic tree of 107
proteolipid (PL) helix-1 sequences from H(+)-ATPase of bacteria, archaea and eukaryotes had its root
between prokaryote branches. LCA PL h1 residue profile optimally fit a late expansion phase codon array.
Sequence repeats in transmembrane PL helices h1 and h2 indicated formation of the archetypal PL hairpin
structure involved successive tandem duplications, initiated within the gene for an 11-residue (or 4-residue)
hydrophobic peptide. Ancestral PL h1 lacked acidic residues, in a fundamental departure from the
prototype pre-cell protein. By this stage, proteins with a hydrophobic domain had evolved. Its non-polar,
late expansion phase residue profile points to ancestral PL being a component of an early permeable cell
membrane. Other indicators of cell formation about this stage of code evolution include phospholipid
biosynthesis path length, FtsZ residue profile, and late entry of basic amino acids into the genetic code.
Estimates based on conserved residues in prokaryote cell septation protein, FtsZ, and proteins involved
with synthesis, transcription and replication of DNA revealed FtsZ, ribonucleotide reductase, RNA
polymerase core subunits and 5'-->3' flap exonuclease, FEN-1, originated soon after cells putatively
evolved, while reverse transcriptase and topoisomerase I, Topo I, appeared late in the pre-divergence era,
when the genetic code was essentially complete. The transition from RNA genes to a DNA genome seemingly
proceeded via formation of a DNA-RNA heteroduplex. These results suggest formation of DNA
awaited evolution of a catalyst with a hydrophobic domain, capable of sequestering radical bearing
intermediates in its synthesis from ribonucleotide precursors. Late formation of topology altering protein,
Topo I, further suggests consolidation of genes into chromosomes followed synthesis of comparatively
thermostable DNA strands.
Evolutionary connection between the catalytic subunits of
DNA-dependent RNA polymerases and eukaryotic
RNA-dependent RNA polymerases and the origin of RNA
polymerases.
Iyer LM, Koonin EV, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA. lakshmin@ncbi.nlm.nih.gov
BACKGROUND: The eukaryotic RNA-dependent RNA polymerase (RDRP) is involved in the
amplification of regulatory microRNAs during post-transcriptional gene silencing. This enzyme is
highly conserved in most eukaryotes but is missing in archaea and bacteria. No evolutionary
relationship between RDRP and other polymerases has been reported so far, hence the origin of this
eukaryote-specific polymerase remains a mystery. RESULTS: Using extensive sequence profile
searches, we identified bacteriophage homologs of the eukaryotic RDRP. The comparison of the
eukaryotic RDRP and their homologs from bacteriophages led to the delineation of the conserved
portion of these enzymes, which is predicted to harbor the catalytic site. Further, detailed
sequence comparison, aided by examination of the crystal structure of the DNA-dependent RNA
polymerase (DDRP), showed that the RDRP and the beta' subunit of DDRP (and its orthologs in archaea
and eukaryotes) contain a conserved double-psi beta-barrel (DPBB) domain. This DPBB domain contains
the signature motif DbDGD (b is a bulky residue), which is conserved in all RDRPs and DDRPs and
contributes to catalysis via a coordinated divalent cation. Apart from the DPBB domain, no
similarity was detected between RDRP and DDRP, which leaves open two scenarios for the origin of
RDRP: i) RDRP evolved at the onset of the evolution of eukaryotes via a duplication of the DDRP
beta' subunit followed by dramatic divergence that obliterated the sequence similarity outside the
core catalytic domain and ii) the primordial RDRP, which consisted primarily of the DPBB domain,
evolved from a common ancestor with the DDRP at a very early stage of evolution, during the RNA
world era. The latter hypothesis implies that RDRP had been subsequently eliminated from cellular
life forms and might have been reintroduced into the eukaryotic genomes through a bacteriophage.
Sequence and structure analysis of the DDRP led to further insights into the evolution of RNA
polymerases. In addition to the beta' subunit, the beta subunit of DDRP also contains a DPBB domain,
which is, however, distorted by large inserts and does not harbor a counterpart of the DbDGD motif.
The DPBB domains of the two DDRP subunits together form the catalytic cleft, with the domain from
the beta' subunit supplying the metal-coordinating DbDGD motif and the one from the beta subunit
providing two lysine residues involved in catalysis. Given that the two DPBB domains of DDRP
contribute completely different sets of active residues to the catalytic center, it is hypothesized
that the ultimate ancestor of RNA polymerases functioned as a homodimer of a generic, RNA-binding
DPBB domain. This ancestral protein probably did not have catalytic activity and served as a
cofactor for a ribozyme RNA polymerase. Subsequent evolution of DDRP and RDRP involved accretion of
distinct sets of additional domains. In the DDRPs, these included an RNA-binding Zn-ribbon, an
AT-hook-like module and a sandwich-barrel hybrid motif (SBHM) domain. Further, lineage-specific
accretion of SBHM domains and other, DDRP-specific domains is observed in bacterial DDRPs. In
contrast, the orthologs of the beta' subunit in archaea and eukaryotes contain a four-stranded alpha
beta domain that is shared with the alpha-subunit of bacterial DDRP, eukaryotic DDRP subunit RBP11,
translation factor eIF1 and type II topoisomerases. The additional domains of the RDRPs remain to be
characterized. CONCLUSIONS: Eukaryotic RNA-dependent RNA polymerases share the catalytic
double-psi beta-barrel domain, containing a signature metal-coordinating motif, with the universally
conserved beta' subunit of DNA-dependent RNA polymerases. Beyond this core catalytic domain, the two
classes of RNA polymerases do not have common domains, suggesting early divergence from a common
ancestor, with subsequent independent domain accretion. The beta-subunit of DDRP contains another,
highly diverged DPBB domain. The presence of two distinct DPBB domains in two subunits of DDRP is
compatible with the hypothesis that the ultimate ancestor of RNA polymerases was an RNA-binding
DPBB domain that had no catalytic activity but rather functioned as a homodimeric cofactor for a
ribozyme polymerase.
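The signature motif DbDGD described in this abstract can be pictured as a simple pattern scan. The following is a minimal sketch only: the abstract says "b is a bulky residue" without listing which residues qualify, so the set `BULKY` below is an assumption, and `find_dbdgd` and the toy sequence are hypothetical names and data introduced for illustration, not the profile-search method the authors used.

```python
import re

# Assumption: "bulky" is read as one of these large side chains.
# The abstract does not enumerate the allowed residues.
BULKY = "FYWLIMV"
DBDGD = re.compile(f"D[{BULKY}]DGD")


def find_dbdgd(seq):
    """Return (0-based start, matched text) for every DbDGD-like hit."""
    return [(m.start(), m.group()) for m in DBDGD.finditer(seq.upper())]


# Made-up toy sequence, not a real polymerase.
toy = "MKTAYDFDGDQLLNADFDGDAAK"
print(find_dbdgd(toy))
```

A plain regular expression finds only exact matches to the five-residue pattern; the sensitive profile searches mentioned in the abstract tolerate far more divergence than this.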
Evolution of viral DNA-dependent RNA polymerases.
Sonntag KC, Darai G.
Institut für Medizinische Virologie, Universität Heidelberg, FRG.
The DNA-dependent RNA polymerase (DdRP or RNAP) is an essential enzyme of transcription of
replicating systems of prokaryotic and eukaryotic organisms as well as cytoplasmic DNA viruses.
DdRPs are complex multisubunit enzymes consisting of 8-14 subunits, including two large subunits and
several smaller polypeptides (small subunits). An extensive search between the amino acid sequences
of the known largest subunit of DNA-dependent RNA polymerases (RPO1) of different organisms
indicates that all these polypeptides possess a universal heptapeptide NADFDGD in domain D. All RPO1
harbor a second well-conserved hexapeptide RQP(TS)LH upstream (26-31 amino acids) of the universal
motif. The genes encoding the largest subunit of DdRP of insect iridescent virus type 6 (IIV6), fish
lymphocystis disease virus (LCDV), and molluscum contagiosum virus (MCV-1), all members of the group
of cytoplasmic DNA viruses, were identified by PCR technology. With the exception of IIV6, all other
viral RPO1 possess the two C-terminal conserved regions G and H. The lack of C-terminal repetitive
heptapeptide (YSPTSPS), which is a common feature of the largest subunit of eukaryotic RNAPII, is an
additional characteristic of RPO1 proteins of LCDV and of MCV-1. All viral RPO1 proteins were found
to be lacking the amino acid N at a distinct position in domain F. This amino acid is known to be
highly conserved in alpha-amanitin-sensitive eukaryotic RNA polymerases II. Comparison of the amino
acid sequences of the RPO1 polypeptides of IIV6, LCDV, and MCV-1 with the corresponding prokaryotic,
eukaryotic, and viral proteins revealed differences in amino acid similarity and phylogenetic
relationships. IIV6 RPO1 possesses the closest similarity to the homologous subunit of eukaryotic
RNAPII and lower but also significant similarity to that of eukaryotic RNAPI and RNAPIII, archaeal,
eubacterial, and viral polymerases. The similarity between RPO1 of IIV6 and the cellular polymerase
subunits is consistently higher than to the RPO1 of other cytoplasmic DNA viruses, for example,
vaccinia and variola virus, African swine fever virus (ASFV), and MCV-1. The RPO1 of LCDV shows the
highest similarity to the RPO1 of IIV6 and significantly lower similarity to the eukaryotic
polymerases II and III as well as to the archaebacterial subunit. However, it is still considerably
more similar to the cellular polymerase subunits than to the homologous viral proteins. The RPO1 of
IIV6 possesses more similarity to cellular polymerases than the complete RPO1 of LCDV, indicating
that there is a substantial difference in the organization of the RPO1 genes between these members
of two genera of the Iridoviridae family. Analysis of the MCV-1 RPO1 revealed high amino acid
homologies to the corresponding polypeptides of vaccinia and variola virus. The viral RPO1 proteins,
including vaccinia and variola virus, MCV-1, ASFV, IIV6, and LCDV, share the common feature of
showing higher similarity to the largest subunit of eukaryotic RNAPII than to that of RNAPI,
RNAPIII, and RPO1 of archaebacteria, eubacteria, ASFV, IIV6, and LCDV. Evolution of the individual
largest subunit of DdRPs was tentatively investigated by generating phylogenetic trees using
multiple amino acid alignments. These indicate that the RPO1 proteins of IIV6 and LCDV might have
evolved from the largest subunit of eukaryotic RNAPII after divergence from the homologous subunits
of RNAPI and RNAPIII. In contrast, evolutionary development of the RPO1 of vaccinia and variola
virus, MCV-1, and ASFV seems to be quite different, with their common ancestor diverging from
cellular homologues before the separation of the three types of eukaryotic polymerases and having
probably diverged earlier from their common lineage with cellular proteins.
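Two of the sequence features named in this abstract, the universal heptapeptide NADFDGD and the C-terminal YSPTSPS repeat of eukaryotic RNAPII, are fully spelled out and so lend themselves to a small illustrative check. This is a sketch under stated assumptions: the function names and the toy sequences are hypothetical, and exact string matching is a simplification of the alignment-based comparison the authors describe.

```python
import re

UNIVERSAL = "NADFDGD"                     # heptapeptide quoted in the abstract
CTD_RUN = re.compile(r"(?:YSPTSPS)+")     # tandem runs of the RNAPII heptapeptide


def universal_motif_positions(seq):
    """0-based start positions of exact NADFDGD matches."""
    return [m.start() for m in re.finditer(UNIVERSAL, seq)]


def longest_ctd_run(seq):
    """Length, in repeat units, of the longest tandem YSPTSPS run (0 if none)."""
    runs = [len(m.group()) // 7 for m in CTD_RUN.finditer(seq)]
    return max(runs, default=0)


# Made-up toy sequences, not real RPO1 proteins.
cellular_like = "AAANADFDGDGGG" + "YSPTSPS" * 3 + "K"
viral_like = "AAANADFDGDGGGK"
print(universal_motif_positions(cellular_like), longest_ctd_run(cellular_like))
print(universal_motif_positions(viral_like), longest_ctd_run(viral_like))
```

In this toy setup both sequences carry the universal motif, while only the first carries the repeat, mirroring the abstract's observation that the LCDV and MCV-1 proteins lack the C-terminal heptapeptide.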
The rhomboids: a nearly ubiquitous family of
intramembrane serine proteases that probably evolved by
multiple ancient horizontal gene transfers.
Koonin EV, Makarova KS, Rogozin IB, Davidovic L, Letellier MC, Pellegrini L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA.
BACKGROUND: The rhomboid family of polytopic membrane proteins shows a level of evolutionary
conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of
archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the
basis of experimental studies with the developmental regulator rhomboid from Drosophila and the AarA
protein from the bacterium Providencia stuartii, the rhomboids are thought to be intramembrane
serine proteases whose signaling function is conserved in eukaryotes and prokaryotes. RESULTS:
Phylogenetic tree analysis carried out using several independent methods for tree constructions and
the corresponding statistical tests suggests that, despite its broad distribution in all three
superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant
life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea
and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct,
ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and
PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the
rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification
through the addition of extra transmembrane helices and other domains in different orientations
relative to the conserved core that harbors the protease activity. CONCLUSIONS: Although the
near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to
suggest that this protein is part of the heritage of the last universal common ancestor,
phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by
horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the
reconstruction of ancestral life forms. A hypothetical scenario for the origin of intracellular
membrane proteases from membrane transporters is proposed.
A novel family of P-loop NTPases with an unusual phyletic
distribution and transmembrane segments inserted within
the NTPase domain.
Aravind L, Iyer LM, Leipe DD, Koonin EV.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA. aravind@ncbi.nlm.nih.gov
BACKGROUND: Recent sequence-structure studies on P-loop-fold NTPases have substantially advanced the
existing understanding of their evolution and functional diversity. These studies provide a
framework for characterization of novel lineages within this fold and prediction of their functional
properties. RESULTS: Using sequence profile searches and homology-based structure prediction, we
have identified a previously uncharacterized family of P-loop NTPases, which includes the neuronal
membrane protein and receptor tyrosine kinase substrate Kidins220/ARMS, which is conserved in
animals, the F-plasmid PifA protein involved in phage T7 exclusion, and several uncharacterized
bacterial proteins. We refer to these (predicted) NTPases as the KAP family, after Kidins220/ARMS
and PifA. The KAP family NTPases are sporadically distributed across a wide phylogenetic range in
bacteria but among the eukaryotes are represented only in animals. Many of the prokaryotic KAP
NTPases are encoded in plasmids and tend to undergo disruption to form pseudogenes. A unique feature
of all eukaryotic and certain bacterial KAP NTPases is the presence of two or four transmembrane
helices inserted into the P-loop NTPase domain. These transmembrane helices anchor KAP NTPases in
the membrane such that the P-loop domain is located on the intracellular side. We show that the KAP
family belongs to the same major division of the P-loop NTPase fold with the AAA, ABC, RecA-like,
VirD4-like, PilT-like, and AP/NACHT-like NTPase classes. In addition to the KAP family, we
identified another small family of predicted bacterial NTPases, with two transmembrane helices
inserted into the P-loop domain. This family is not specifically related to the KAP NTPases,
suggesting independent acquisition of the transmembrane helices. CONCLUSIONS: We predict that KAP
family NTPases function principally in the NTP-dependent dynamics of protein complexes, especially
those associated with the intracellular surface of cell membranes. Animal KAP NTPases, including
Kidins220/ARMS, are likely to function as NTP-dependent regulators of the assembly of
membrane-associated signaling complexes involved in neurite growth and development. One possible
function of the prokaryotic KAP NTPases might be in the exclusion of selfish replicons, such as
viruses, from the host cells. Phylogenetic analysis and phyletic patterns suggest that the common
ancestor of the animals acquired a KAP NTPase via lateral transfer from bacteria. However, an
earlier transfer into eukaryotes followed by multiple losses in several eukaryotic lineages cannot
be ruled out.
Classification and evolution of P-loop GTPases and related
ATPases.
Leipe DD, Wolf YI, Koonin EV, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA.
Sequences and available structures were compared for all the widely distributed representatives of
the P-loop GTPases and GTPase-related proteins with the aim of constructing an evolutionary
classification for this superclass of proteins and reconstructing the principal events in their
evolution. The GTPase superclass can be divided into two large classes, each of which has a unique
set of sequence and structural signatures (synapomorphies). The first class, designated TRAFAC
(after translation factors) includes enzymes involved in translation (initiation, elongation, and
release factors), signal transduction (in particular, the extended Ras-like family), cell motility,
and intracellular transport. The second class, designated SIMIBI (after signal recognition particle,
MinD, and BioD), consists of signal recognition particle (SRP) GTPases, the assemblage of MinD-like
ATPases, which are involved in protein localization, chromosome partitioning, and membrane
transport, and a group of metabolic enzymes with kinase or related phosphate transferase activity.
These two classes together contain over 20 distinct families that are further subdivided into 57
subfamilies (ancient lineages) on the basis of conserved sequence motifs, shared structural
features, and domain architectures. Ten subfamilies show a universal phyletic distribution
compatible with presence in the last universal common ancestor of the extant life forms (LUCA).
These include four translation factors, two OBG-like GTPases, the YawG/YlqF-like GTPases (these two
subfamilies also consist of predicted translation factors), the two signal-recognition-associated
GTPases, and the MRP subfamily of MinD-like ATPases. The distribution of nucleotide specificity
among the proteins of the GTPase superclass indicates that the common ancestor of the entire
superclass was a GTPase and that a secondary switch to ATPase activity has occurred on several
independent occasions during evolution. The functions of most GTPases that are traceable to LUCA are
associated with translation. However, in contrast to other superclasses of P-loop NTPases
(RecA-F1/F0, AAA, helicases, ABC), GTPases do not participate in NTP-dependent nucleic acid
unwinding and reorganizing activities. Hence, we hypothesize that the ancestral GTPase was an enzyme
with a generic regulatory role in translation, with subsequent diversification resulting in
acquisition of diverse functions in transport, protein trafficking, and signaling. In addition to
the classification of previously known families of GTPases and related ATPases, we introduce several
previously undetected families and describe new functional predictions.
Evolutionary genomics of the HAD superfamily:
understanding the structural adaptations and catalytic
diversity in a superfamily of phosphoesterases and allied
enzymes.
Burroughs AM, Allen KN, Dunaway-Mariano D, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA.
The HAD (haloacid dehalogenase) superfamily includes phosphoesterases, ATPases, phosphonatases,
dehalogenases, and sugar phosphomutases acting on a remarkably diverse set of substrates. The
availability of numerous crystal structures of representatives belonging to diverse branches of the
HAD superfamily provides us with a unique opportunity to reconstruct their evolutionary history and
uncover the principal determinants that led to their diversification of structure and function. To
this end we present a comprehensive analysis of the HAD superfamily that identifies their unique
structural features and provides a detailed classification of the entire superfamily. We show that
at the highest level the HAD superfamily is unified with several other superfamilies, namely the
DHH, receiver (CheY-like), von Willebrand A, TOPRIM, classical histone deacetylases and PIN/FLAP
nuclease domains, all of which contain a specific form of the Rossmannoid fold. These Rossmannoid
folds are distinguished from others by the presence of equivalently placed acidic catalytic
residues, including one at the end of the first core beta-strand of the central sheet. The HAD
domain is distinguished from these related Rossmannoid folds by two key structural signatures, a
"squiggle" (a single helical turn) and a "flap" (a beta hairpin motif) located immediately
downstream of the first beta-strand of their core Rossmannoid fold. The squiggle and the flap motifs
are predicted to provide the necessary mobility to these enzymes for them to alternate between the
"open" and "closed" conformations. In addition, most members of the HAD superfamily contain inserts,
termed caps, occurring at either of two positions in the core Rossmannoid fold. We show that the cap
modules have been independently inserted into these two stereotypic positions on multiple occasions
in evolution and display extensive evolutionary diversification independent of the core catalytic
domain. The first group of caps, the C1 caps, is directly inserted into the flap motif and regulates
access of reactants to the active site. The second group, the C2 caps, forms a roof over the active
site, and access to their internal cavities might be in part regulated by the movement of the flap.
The diversification of the cap module was a major factor in the exploration of a vast substrate
space in the course of the evolution of this superfamily. We show that the HAD superfamily contains
33 major families distributed across the three superkingdoms of life. Analysis of the phyletic
patterns suggests that at least five distinct HAD proteins are traceable to the last universal
common ancestor (LUCA) of all extant organisms. While these prototypes diverged prior to the
emergence of the LUCA, the major diversification in terms of both substrate specificity and reaction
types occurred after the radiation of the three superkingdoms of life, primarily in bacteria. Most
major diversification events appear to correlate with the acquisition of new metabolic capabilities,
especially related to the elaboration of carbohydrate metabolism in the bacteria. The newly
identified relationships and functional predictions provided here are likely to aid the future
exploration of the numerous poorly understood members of this large superfamily of enzymes.
The emergence of catalytic and structural diversity within
the beta-clip fold.
Iyer LM, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, Maryland 20894, USA.
The beta-clip fold includes a diverse group of protein domains that are unified by the presence of
two characteristic waist-like constrictions, which bound a central extended region. Members of this
fold include enzymes like deoxyuridine triphosphatase and the SET methylase, carbohydrate-binding
domains like the fish antifreeze proteins/Sialate synthase C-terminal domains, and functionally
enigmatic accessory subunits of urease and molybdopterin biosynthesis protein MoeA. In this study,
we reconstruct the evolutionary history of this fold using sensitive sequence and structure
comparison methods. Using sequence profile searches, we identified novel versions of the beta-clip
fold in the bacterial flagellar chaperone FlgA and the related pilus protein CpaB, the StrU-like
dehydrogenases, and the UxaA/GarD-like hexuronate dehydratases (SAF superfamily). We present
evidence that these versions of the beta-clip domain, like the related type III anti-freeze proteins
and C-terminal domains of sialic acid synthases, are involved in interactions with carbohydrates. We
propose that the FlgA and CpaB-like proteins mediate the assembly of bacterial flagella and Flp pili
by means of their interactions with the carbohydrate moieties of peptidoglycan. The N-terminal
beta-clip domain of the hexuronate dehydratases appears to have evolved a novel metal-binding site,
while their C-terminal domain is likely to adopt a metal-binding TIM barrel-like fold. Using
structural comparisons, we show that the beta-clip fold can be further classified into two major
groups, one that includes the SAF, SET, dUTPase superfamilies, and the other that includes the phage
lambda head decoration protein, the beta subunit of urease and the C-terminal domain of the
molybdenum cofactor biosynthesis protein MoeA. Structural comparisons also suggest the beta-clip
fold was assembled through the duplication of a three-stranded unit. Though the three-stranded units
are likely to have had a common origin, we present evidence that complete beta-clip domains were
assembled through such duplications, independently on multiple occasions. There is also evidence for
circular permutation of the basic three-stranded unit on different occasions in the evolution of the
beta-clip unit. We also describe how assembly of this fold from a basic three-stranded unit has been
utilized to accommodate a variety of activities in its different versions. Copyright 2004
Wiley-Liss, Inc.
Diversification of catalytic activities and ligand interactions
in the protein fold shared by the sugar isomerases, eIF2B,
DeoR transcription factors, acyl-CoA transferases and
methenyltetrahydrofolate synthetase.
Anantharaman V, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA.
Evolution of diverse catalytic and ligand-binding activities in a given protein fold is a widely
observed phenomenon in the protein-domain universe. However, the details of this evolutionary
process, general principles, if any, and implications for origins of particular catalytic mechanisms
are poorly understood in many common protein folds. Taking advantage of the wealth of currently
available protein structure and sequence data, we explore these issues in the context of a large
assemblage of biochemically diverse protein domains sharing a common origin, namely the sugar
isomerases, translation factor eIF2B, ligand-binding domains of the DeoR-family transcription
factors, acetyl-CoA transferases and methenyltetrahydrofolate synthetase. We show that in at least
three independent instances, including the sugar-binding domains of the DeoR family transcription
factors, this domain has been used as a small molecule sensor coupled to helix-turn-helix
DNA-binding domains. In at least two of these instances the domain functions as a non-catalytic
sensor of ligands. We provide evidence that the ancestral version of this fold was a distinct
version of the Rossmann-like folds, which probably possessed two distinct ligand-binding areas that
were differentially utilized in different descendants. Analyzing the sequences and structures of
proteins in this fold we show that there are two principal factors related to the origin of
catalytic diversity in this fold. Firstly, specific inserts and extensions added to the core domain
on multiple occasions in evolution have affected the access to the active site regions, and thereby
allowed for different substrates and allosteric regulators. The second major factor appears to be
the emergence of considerable diversity of family-specific residues with important biochemical
roles. Interestingly, proteins of this fold, which catalyze similar reactions on similar substrates,
might possess very distinctive sets of active residues required for substrate binding and catalysis.
In particular, different sugar isomerases or acyl transferases in this fold might show distinct
constellations of active site residues. These findings suggest that whereas ligand-binding, and even
generic catalytic ability emerged early in the evolution of the fold, the specific catalytic
mechanisms appear to have independently emerged on multiple occasions in the generic precursors of
this fold.
Novel conserved domains in proteins with predicted roles in
eukaryotic cell-cycle regulation, decapping and RNA
stability.
Anantharaman V, Aravind L.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of
Health, Bethesda, MD 20894, USA. ananthar@mail.nih.gov
BACKGROUND: The emergence of eukaryotes was characterized by the expansion and
diversification of several ancient RNA-binding domains and the apparent de novo innovation of
new RNA-binding domains. The identification of these RNA-binding domains may throw light
on the emergence of eukaryote-specific systems of RNA metabolism. RESULTS: Using
sensitive sequence profile searches, homology-based fold recognition and sequence-structure
superpositions, we identified novel, divergent versions of the Sm domain in the Scd6p family of
proteins. This family of Sm-related domains shares certain features of conventional Sm domains,
which are required for binding RNA, in addition to possessing some unique conserved features.
We also show that these proteins contain a second previously uncharacterized C-terminal
domain, termed the FDF domain (after a conserved sequence motif in this domain). The FDF
domain is also found in the fungal Dcp3p-like and the animal FLJ22128-like proteins, where it
is fused to a C-terminal domain of the YjeF-N domain family. In addition to the FDF domains, the
FLJ22128-like proteins contain yet another divergent version of the Sm domain at their extreme
N-terminus. We show that the YjeF-N domains represent a novel version of the Rossmann fold
that has acquired a set of catalytic residues and structural features that distinguish them from the
conventional dehydrogenases. CONCLUSIONS: Several lines of contextual information suggest
that the Scd6p family and the Dcp3p-like proteins are conserved components of the eukaryotic
RNA metabolism system. We propose that the novel domains reported here, namely the
divergent versions of the Sm domain and the FDF domain, may mediate specific RNA-protein
and protein-protein interactions in cytoplasmic ribonucleoprotein complexes. More specifically,
the protein complexes containing Sm-like domains of the Scd6p family are predicted to regulate
the stability of mRNA encoding proteins involved in cell cycle progression and vesicular
assembly. The Dcp3p and FLJ22128 proteins may localize to the cytoplasmic processing bodies
and possibly catalyze a specific processing step in the decapping pathway. The explosive
diversification of Sm domains appears to have played a role in the emergence of several uniquely
eukaryotic ribonucleoprotein complexes, including those involved in decapping and mRNA
stability.
Evolution of anticodons.
Jukes TH.
University of California, Berkeley Space Sciences Laboratory, Oakland 94608, USA.
Anticodons are trinucleotides in transfer RNA (tRNA) molecules. The latter carry amino acids
for insertion into the polypeptide sequences of proteins during the translation of messenger RNA
(mRNA) molecules. Messenger RNA molecules are transcribed from genes. Evolution of tRNA
molecules has resulted in a set of anticodons for the 20 amino acids that are used in protein
synthesis. This set of anticodons is slightly different in mitochondrial codes from the set that is
used in the nuclear "universal" code. Theories for the evolution of the code include frozen
accident, doublet expansion, repeating triplets and coevolutionary distribution. The number of
codons has always been fixed at 64 by mathematical rules, but because an anticodon may pair
with more than one codon, the number of anticodons is only 54 in the universal code, is smaller
in mitochondrial codes, and was probably even smaller in archetypal primitive codes. Evidence
of anticodon evolution can be seen by comparing mitochondrial codes with the universal code.
Codes used by very primitive organisms that are now extinct might have specified fewer amino
acids than are now used.
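The fixed count of 64 follows from three codon positions with four possible bases each
(4^3 = 64). A minimal Python sketch, not part of the abstract, enumerates the triplets; the names
BASES and all_codons are illustrative only, and the doublet case is shown purely as arithmetic
for a hypothetical two-base code.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases


def all_codons(length=3):
    """Return every possible word of the given length over the four bases."""
    return ["".join(word) for word in product(BASES, repeat=length)]


codons = all_codons()
print(len(codons))         # 64 triplets (4**3)
print(len(all_codons(2)))  # 16 doublets (4**2)
```

The number of anticodons is not fixed in the same way, because one anticodon may pair with more
than one codon.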
Evolution of the amino acid code: inferences from
mitochondrial codes.
Jukes TH.
The amino acid code is usually presented as a table of 64 codons. Actually the code results from
the action of tRNA molecules that carry amino acids to codons in mRNA by means of
codon-anticodon pairing. The tRNA molecules are transcribed from genes that undergo evolution
and the number of anticodons can therefore increase during evolution, but the number of codons
is fixed at 64. Mammalian mitochondrial codes contain only 22 anticodons for 20 amino acids as
compared with 54 anticodons for 20 amino acids in the universal code. It is proposed that an
archetypal code containing 16 anticodons for 15 amino acids evolved into the universal code by
gene duplication, followed by mutations that modified the anticodons and amino acid acceptor
sites. In substantiation of this proposal, it is noted that the mammalian mitochondrial code is
simplified by comparison with the universal code. For example, single anticodons are used for
each of eight amino acids in the mammalian mitochondrial code. This simplification may
represent an evolutionary retrogression towards the proposed archetypal code.
Evolution of anticodons: variations in the genetic code.
Jukes TH, Osawa S, Muto A, Lehman N.
Space Sciences Laboratory, University of California, Berkeley 94720.
Clues to evolution of the genetic code can be found by comparing usage of anticodons in various
organisms and organelles. GC content of DNA varies, as a result of directional mutation pressure
(AT/GC pressure), especially in bacteria. Low GC in Mycoplasma is accompanied by use of
UGA for tryptophan and, in ciliated protozoa, by use of UAA and UAG for glutamine. These are
examples of "stop codon capture," which has been preceded by duplication of tRNA genes
followed by nucleotide substitutions in their sequences, including mutational changes in their
anticodons. Evolutionary changes in the code may have resulted from disappearance of codons
and anticodons resulting from GC pressure and from their reappearance when the direction of the
pressure was reversed. In this manner, codon UGA and anticodon UCA for tryptophan could
have disappeared under GC pressure and reappeared in Mycoplasma under AT pressure. Stop
codon UGA may have been the third of the three stop codons to appear, originating from
mutations in UAA. Changes in the code are adaptive and nondeleterious. We propose that the
number of anticodons has increased and that evolution continued until three existing forms of the
universal code were produced: eukaryotic, eubacterial, and the code for halobacteria and
methanococci. These three codes are distinguished from each other by their anticodon pattern.
The eukaryotic code contains eight INN (ANN) anticodons that have replaced GNN
anticodons as a result of AT pressure. Mitochondrial and chloroplast codes have evolved
from the eubacterial code through genomic economization and AT pressure, leading to losses of
GNN and CNN anticodons.
A unified model of codon reassignment in alternative genetic
codes.
Sengupta S, Higgs PG.
Department of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada.
Many modified genetic codes are found in specific genomes in which one or more codons have
been reassigned to a different amino acid from that in the canonical code. We present a new
framework for codon reassignment that incorporates two previously proposed mechanisms (codon
disappearance and ambiguous intermediate) and introduces two further mechanisms (unassigned
codon and compensatory change). Our theory is based on the observation that reassignment involves
a gain and a loss. The loss could be the deletion or loss of function of a tRNA or release factor.
The gain could be the gain of a new type of tRNA or the gain of function of an existing tRNA due
to mutation or base modification. The four mechanisms are distinguished by whether the codon
disappears from the genome during the reassignment and by the order of the gain and loss events.
We present simulations of the gain-loss model showing that all four mechanisms can occur within
the same framework as the parameters are varied. We investigate the way the frequencies of the
mechanisms are influenced by selection strengths, the number of codons undergoing reassignment,
directional mutation pressure, and selection for reduced genome size.
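The two criteria named in this abstract (whether the codon disappears, and the order of the gain
and loss events) can be sketched as a small decision rule. The Python below is an illustration
only, not the authors' simulation. The abstract does not spell out which ordering corresponds to
which mechanism, so the mapping used here is an assumption: gain before loss is read as the
ambiguous intermediate, loss before gain as the unassigned codon, and simultaneous gain and loss
as the compensatory change. The function name classify and its parameters are hypothetical.

```python
from enum import Enum


class Mechanism(Enum):
    CODON_DISAPPEARANCE = "codon disappearance"
    AMBIGUOUS_INTERMEDIATE = "ambiguous intermediate"
    UNASSIGNED_CODON = "unassigned codon"
    COMPENSATORY_CHANGE = "compensatory change"


def classify(codon_disappears, gain_time, loss_time):
    """Label a reassignment event using the two criteria from the abstract.

    The mapping from event order to mechanism is an assumption made for
    illustration; it is not stated in the abstract itself.
    """
    if codon_disappears:
        return Mechanism.CODON_DISAPPEARANCE
    if gain_time < loss_time:
        # Gain first: the codon is temporarily read in two ways (assumed).
        return Mechanism.AMBIGUOUS_INTERMEDIATE
    if loss_time < gain_time:
        # Loss first: the codon is temporarily read by nothing (assumed).
        return Mechanism.UNASSIGNED_CODON
    # Gain and loss at the same time (assumed).
    return Mechanism.COMPENSATORY_CHANGE


print(classify(False, gain_time=1, loss_time=2).value)
```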
Genetic code variations in mitochondria: tRNA as a major
determinant of genetic code plasticity.
Yokobori S, Suzuki T, Watanabe K.
Department of Molecular Biology, School of Life Science, Tokyo University of Pharmacy and Life
Science, 1432 Horinouchi, Hachioji, Tokyo 192-0392, Japan.
Characteristic features of tRNA such as the anticodon sequence and modified nucleotides in the
anticodon loop are thought to be crucial effectors for promoting or restricting codon reassignment.
Our recent findings on basepairing rules between anticodon and codon in various metazoan
mitochondria suggest that the complete loss of a codon is not necessarily essential for codon
reassignment to take place. We postulate that a possible competition between two tRNAs with
cognate anticodon sequences towards the relevant codon to be varied has a potential role in codon
reassignment. Our proposition can be viewed as an expanded version of the codon capture theory
proposed by Osawa and Jukes.
Different pattern of codon recognition by mammalian
mitochondrial tRNAs.
Barrell BG, Anderson S, Bankier AT, de Bruijn MH, Chen E, Coulson AR, Drouin J, Eperon IC,
Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJ, Staden R, Young IG.
Analysis of an almost complete mammalian mitochondrial DNA sequence has identified 23 possible
tRNA genes and we speculate here that these are sufficient to translate all the codons of the
mitochondrial genetic code. This number is much smaller than the minimum of 31 required by the
wobble hypothesis. For each of the eight genetic code boxes with four codons for one amino acid
we find a single specific tRNA gene with T in the first (wobble) position of the anticodon. We
suggest that these tRNAs with U in the wobble position can recognize all four codons in these
genetic code boxes either by a "two out of three" base interaction or by U.N wobble.
Decoding the genome: a modified view.
Agris PF.
Department of Molecular and Structural Biochemistry, 128 Polk Hall, Campus Box 7622, North
Carolina State University, Raleigh, NC 27695-7622, USA. PaulAgris@ncsu.edu
Transfer RNA's role in decoding the genome is critical to the accuracy and efficiency of protein
synthesis. Though modified nucleosides were identified in RNA 50 years ago, only recently has
their importance to tRNA's ability to decode cognate and wobble codons become apparent. RNA
modifications are ubiquitous. To date, some 100 different posttranslational modifications have
been identified. Modifications of tRNA are the most extensively investigated; however, many
other RNAs have modified nucleosides. The modifications that occur at the first, or wobble
position, of tRNA's anticodon and those 3'-adjacent to the anticodon are of particular interest.
The tRNAs most affected by individual and combinations of modifications respond to codons in
mixed codon boxes where distinction of the third codon base is important for discriminating
between the correct cognate or wobble codons and the incorrect near-cognate codons (e.g. AAA/G
for lysine versus AAU/C for asparagine). In contrast, other modifications expand wobble codon
recognition, such as U*U base pairing, for tRNAs that respond to multiple codons of a 4-fold
degenerate codon box (e.g. GUU/A/C/G for valine). Whether restricting codon recognition,
expanding wobble, enabling translocation, or maintaining the messenger RNA reading frame,
modifications appear to reduce anticodon loop dynamics to that accepted by the ribosome.
Therefore, we suggest that anticodon stem and loop domain nucleoside modifications allow a
limited number of tRNAs to accurately and efficiently decode the 61 amino acid codons by
selectively restricting some anticodon-codon interactions and expanding others.
tRNA's wobble decoding of the genome: 40 years of
modification.
Agris PF, Vendeix FA, Graham WD.
Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, NC
27695-7622, USA. PaulAgris@ncsu.edu
The genetic code is degenerate, in that 20 amino acids are encoded by 61 triplet codes. In 1966,
Francis Crick hypothesized that the cell's limited number of tRNAs decoded the genome by
recognizing more than one codon. The ambiguity of that recognition resided in the third base-pair,
giving rise to the Wobble Hypothesis. Post-transcriptional modifications at tRNA's wobble
position 34, especially modifications of uridine 34, enable wobble to occur. The Modified Wobble
Hypothesis proposed in 1991 that specific modifications of a tRNA wobble nucleoside shape the
anticodon architecture in such a manner that interactions were restricted to the complementary
base plus a single wobble pairing for amino acids with twofold degenerate codons. However,
chemically different modifications at position 34 would expand the ability of a tRNA to read three
or even four of the fourfold degenerate codons. One foundation of Crick's Wobble Hypothesis was
that a near-constant geometry of canonical base-pairing be maintained in forming all three
base-pairs between the tRNA anticodon and mRNA codon on the ribosome. In accepting an
aminoacyl-tRNA, the ribosome requires maintenance of a specific geometry for the
anticodon-codon base-pairing. However, it is the post-transcriptional modifications at tRNA
wobble position 34 and purine 37, 3'-adjacent to the anticodon, that pre-structure the anticodon
domain to ensure the correct codon binding. The modifications create both the architecture and
the stability needed for decoding through restraints on anticodon stereochemistry and
conformational space, and through selective hydrogen bonding. A physicochemical understanding
of modified nucleoside contributions to the tRNA anticodon domain architecture and its decoding
of the genome has advanced RNA world evolutionary theory, the principles of RNA chemistry, and
the application of this knowledge to the introduction of new amino acids to proteins.
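How one tRNA can recognize more than one codon can be illustrated with the classic wobble
pairing table from Crick's 1966 proposal (first anticodon base G reads C or U, U reads A or G,
I reads U, C or A, C reads G, A reads U). The Python sketch below is a simplified illustration
under that assumption only: it ignores the nucleoside modifications that the abstract identifies
as decisive, and the names WATSON_CRICK, WOBBLE and codons_read are hypothetical.

```python
# Standard Watson-Crick partners for anticodon positions 35 and 36.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Unmodified wobble rules for anticodon position 34 (I = inosine).
WOBBLE = {"U": "AG", "C": "G", "A": "U", "G": "CU", "I": "UCA"}


def codons_read(anticodon):
    """List the codons (5'->3') readable by an anticodon written 5'->3'."""
    if len(anticodon) != 3:
        raise ValueError("an anticodon has exactly three bases")
    first, second, third = anticodon.upper()
    # Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    # anticodon base 2 with codon base 2, and the wobble base with codon base 3.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [prefix + base for base in WOBBLE[first]]


print(codons_read("GAA"))  # ['UUC', 'UUU']
print(codons_read("IGC"))  # ['GCU', 'GCC', 'GCA']
```

Under these simplified rules no single anticodon reads all four codons of a fourfold degenerate
box, which is why expanded reading is attributed to modifications at position 34.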
Analysis of action of wobble nucleoside modifications on
codon-anticodon pairing within the ribosome.
Lim VI.
Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region.
Wobble rules for modified residues in the first anticodon position are derived. All known
modifications are considered individually. Stereochemical analysis was made taking into account
the interaction between the ribosomal A and P-site bound codon-anticodon duplexes. The wobble
base-pair was considered as the right one if its formation did not lead to an uncompensated loss
of hydrogen bonds or polar atom-ion bonds. From this requirement it follows that all modifications
of U should restrict its translational specificity to purines (with the exception of xo5U, which
should decode A, G and U). The restriction is carried out in a unified way: modifications inhibit
the large propeller twist resulting from an increase of about 35 degrees in the torsion angle of
the anticodon wobble base, interacting with the third codon base via a hydrogen-bonded water
molecule. Such a twist is required to avoid a loss of the hydrogen bond of the bonded water
molecule. The modifications in S2U, Se2U and Um should weaken their pairing with G, because they
deform one of the two hydrogen bonds of the guanine NH2 group. G should be recognized by Se2U
better than by S2U for the reason that the hydrogen bond Se...HN is weaker than the hydrogen bond
S...HN. Among the modifications of C and G only that in k2C has a pronounced effect on wobble.
The nucleoside k2C should pair only with A. The N-2 atom of k2C should be in the pyramidal state.
The consequences following from the interduplex interaction are formulated. According to one of
them, adenosine in the wobble position of the P-site tRNA should destabilize the A-site duplex.
This can serve as an explanation for the fact that adenosine is very rarely observed in the
anticodon wobble position.
Analysis of action of the wobble adenine on codon reading
within the ribosome.
Lim VI.
Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region.
Computer graphics simulation of the interaction between the codon-anticodon duplexes containing
adenine in the first (wobble) position of the anticodons, and bound to the ribosomal A- and
P-sites, was made. This demonstrated that widespread use of adenine in the wobble position in
anticodons should lead to a low efficiency of ribosomal translation, since the wobble A of the
P-site tRNA weakens the codon-dependent binding of aminoacyl-tRNA at the A-site via interduplex
interaction. Besides the canonical partner U, the wobble A of aminoacyl-tRNA can recognize A, C,
G in the third position of the codon by the formation of the propeller twist in the wobble pairs
AA, AC, AG. The conversion of the wobble A into inosine improves its pairing with the codon bases
(the pairs IA and IC, unlike AA and AC, should not form the propeller twist leading to the
deformation of base-base hydrogen bonds) and should reduce an adverse effect of the P-site wobble
adenine on the formation of the A-site duplex. The consequence of the interaction between the
ribosomal P- and E-site duplexes has been formulated. According to this the E-site wobble A
should enhance the probability of frameshifting. These properties of the wobble A and I could be
a reason why A is very rarely observed in the first anticodon position and why evolutionary
processes have developed the enzyme which modifies the wobble A to I. The results obtained can be
subjected to direct experimental tests.
Calculation of the relative geometry of tRNAs in the
ribosome from directed hydroxyl-radical probing data.
Joseph S, Whirl ML, Kondo D, Noller HF, Altman RB.
Center for Molecular Biology of RNA, Sinsheimer Laboratories, University of California, Santa Cruz
95064, USA.
The many interactions of tRNA with the ribosome are fundamental to protein synthesis. During the
peptidyl transferase reaction, the acceptor ends of the aminoacyl and peptidyl tRNAs must be in
close proximity to allow peptide bond formation, and their respective anticodons must base pair
simultaneously with adjacent trinucleotide codons on the mRNA. The two tRNAs in this state can be
arranged in two nonequivalent general configurations called the R and S orientations, many
versions of which have been proposed for the geometry of tRNAs in the ribosome. Here, we report
the combined use of computational analysis and tethered hydroxyl-radical probing to constrain
their arrangement. We used Fe(II) tethered to the 5' end of anticodon stem-loop analogs (ASLs) of
tRNA and to the 5' end of deacylated tRNA(Phe) to generate hydroxyl radicals that probe proximal
positions in the backbone of adjacent tRNAs in the 70S ribosome. We inferred probe-target
distances from the resulting RNA strand cleavage intensities and used these to calculate the
mutual arrangement of A-site and P-site tRNAs in the ribosome, using three different structure
estimation algorithms. The two tRNAs are constrained to the S configuration with an angle of
about 45 degrees between the respective planes of the molecules. The terminal phosphates of 3'CCA
are separated by 23 Å when using the tRNA crystal conformations, and the anticodon arms of the
two tRNAs are sufficiently close to interact with adjacent codons in mRNA.
EF-G-catalyzed translocation of anticodon stem-loop
analogs of transfer RNA in the ribosome.
Joseph S, Noller HF.
Center for Molecular Biology of RNA, Sinsheimer Laboratories, University of California, Santa
Cruz, CA 95064, USA.
Translocation, catalyzed by elongation factor EF-G, is the precise movement of the tRNA-
mRNA complex within the ribosome following peptide bond formation. Here we examine the
structural requirement for A- and P-site tRNAs in EF-G-catalyzed translocation by
substituting anticodon stem-loop (ASL) analogs for the respective tRNAs. Translocation of
mRNA and tRNA was monitored independently; mRNA movement was assayed by toeprinting,
while tRNA and ASL movement was monitored by hydroxyl radical probing by Fe(II) tethered
to the ASLs and by chemical footprinting. Translocation depends on occupancy of both A and P
sites by tRNA bound in an mRNA-dependent fashion. The requirement for an A-site tRNA can be
satisfied by a 15 nucleotide ASL analog comprising only a 4 base pair (bp) stem and a
7 nucleotide anticodon loop. Translocation of the ASL is both EF-G- and GTP-dependent, and is
inhibited by the translocational inhibitor thiostrepton. These findings show that the D, T and
acceptor stem regions of A-site tRNA are not essential for EF-G-dependent translocation. In
contrast, no translocation occurs if the P-site tRNA is substituted with an ASL, indicating that
other elements of P-site tRNA structure are required for translocation. We also tested the effect
of increasing the A-site ASL stem length from 4 to 33 bp on translocation from A to P site.
Translocation efficiency decreases as the ASL stem extends beyond 22 bp, corresponding
approximately to the maximum dimension of tRNA along the anticodon-D arm axis. This result
suggests that a structural feature of the ribosome between the A and P sites interferes with
movement of tRNA analogs that exceed the normal dimensions of the coaxial tRNA anticodon-D
arm.
Universally conserved interactions between the ribosome
and the anticodon stem-loop of A site tRNA important for
translocation.
Phelps SS, Jerinic O, Joseph S.
Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman
Drive, La Jolla, CA 92093, USA.
The iterative movement of the tRNA-mRNA complex through the ribosome is a hallmark of the
elongation phase of protein synthesis. We used synthetic anticodon stem-loop analogs (ASL) of
tRNA(Phe) to systematically identify ribose 2'-hydroxyl groups that are essential for binding and
translocation from the ribosomal A site. Our results show that 2'-hydroxyl groups at positions 33,
35, and 36 in the A site ASL are important for translocation. Consistent with the view that the
molecular basis of translocation may be similar in all organisms, the 2'-hydroxyl groups at
positions 35 and 36 in the ASL interact with universally conserved bases G530 and A1493,
respectively, in 16S rRNA. Furthermore, these interactions are also essential for the decoding
process, indicating a functional relationship between decoding and translocation.
Non-bridging phosphate oxygen atoms within the tRNA
anticodon stem-loop are essential for ribosomal A site
binding and translocation.
Phelps SS, Joseph S.
Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Dr, La
Jolla, CA 92093-0314, USA.
The conformation of the anticodon stem-loop of tRNAs required for correct decoding by the ribosome
depends on intramolecular and intermolecular interactions that are independent of the tRNA
nucleotide sequence. Non-bridging phosphate oxygen atoms have been shown to be critical for the
structure and function of several RNAs. However, little is known about the role they play in
ribosomal A site binding and translocation of tRNA to the P site. Here, we show that non-bridging
phosphate oxygen atoms within the tRNA anticodon stem-loop at positions 33, 35, and 37 are
important for A site binding. Those at positions 34 and 36 are not necessary for binding, but are
essential for translocation. Our results correlate with structural data, indicating that position
34 interacts with the highly conserved 16S rRNA base G966 and position 36 interacts with the
universally conserved tRNA base U33 during translocation to the P site.
Stacking of Crick Wobble pair and Watson-Crick pair:
stability rules of G-U pairs at ends of helical stems in tRNAs
and the relation to codon-anticodon Wobble interaction.
Mizuno H, Sundaralingam M.
The occurrence of the noncomplementary G-U base pair at the end of a helix is found to be governed
by stacking interactions. As a rule, a G-U pair with G on the 5'-side of a Watson-Crick base pair
exhibits strikingly greater stacking overlap with the Watson-Crick base pair than a G-U pair on
the 3'-side of a Watson-Crick base pair. The former arrangement is expected to be more stable and
indeed is observed 29 times out of 32 in the known transfer RNA molecules. In accordance with this
rule, the major wobble base pairs G-U or I-U in codon-anticodon interactions have G or I on the
5'-side of the anticodon. Similarly, in initiator tRNAs, this rule is obeyed where now the G is
the first letter of the codon (5'-side). In the situation where U is in the wobble position of the
anticodon, it is usually substituted at C(5) and may also have a 2-thio group and it can read one
to four codons depending on its modifications. A G at the wobble position of the anticodon can
recognize the two codons ending with U or C and modification of G (unless it is I) does not change
its reading properties.
Unique tertiary and neighbor interactions determine
conservation patterns of Cis Watson-Crick A/G base-pairs.
Sponer J, Mokdad A, Sponer J, Spackova N, Leszczynski J, Leontis NB.
Institute of Biophysics, Academy of Sciences of the Czech Republic and National Center for
Biomolecular Research, Kralovopolska 135, 612 65 Brno, Czech Republic.
sponer@ncbr.chemi.muni.cz
X-ray, phylogenetic and quantum chemical analysis of molecular interactions and conservation
patterns of cis Watson-Crick (W.C.) A/G base-pairs in 16S rRNA, 23S rRNA and other
molecules was carried out. In these base-pairs, the A and G nucleotides interact with their W.C.
edges with glycosidic bonds oriented cis relative to each other. The base-pair is stabilised by two
hydrogen bonds, the C1'-C1' distance is enlarged and the G(N2) amino group is left unpaired.
Quantum chemical calculations show that, in the absence of other interactions, the unpaired
amino group is substantially non-planar due to its partial sp3 pyramidalization, while the whole
base-pair is internally propeller twisted and very flexible. The unique molecular properties of the
cis W.C. A/G base-pairs make them distinct from other base-pairs. They occur mostly at the ends
of canonical helices, where they serve as interfaces between the helix and other motifs. The cis
W.C. A/G base-pairs play crucial roles in natural RNA structures with salient sequence
conservation patterns. The key contribution to conservation is provided by the unpaired G(N2)
amino group that is involved in a wide range of tertiary and neighbor contacts in the crystal
structures. Many of them are oriented out of the plane of the guanine base and utilize the partial
sp3 pyramidalization of the G(N2). There is a lack of A/G to G/A covariation, which, except
for the G(N2) position, would be entirely isosteric. On the contrary, there is a rather frequent
occurrence of G/A to G/U covariation, as the G/U wobble base-pair has an unpaired amino group
in the same position as the cis W.C. G/A base-pair. The cis W.C. A/G base-pairs are not
conserved when there is no tertiary or neighbor interaction. Obtaining the proper picture of the
interactions and phylogenetic patterns of the cis W.C. A/G base-pairs requires a detailed analysis
of the relation between the molecular structures and the energetics of interactions at a level of
single H-bonds and contacts.
Recognition of nucleic acid bases and base-pairs by
hydrogen bonding to amino acid side-chains.
Cheng AC, Chen WW, Fuhrmann CN, Frankel AD.
Department of Biochemistry and Biophysics, University of California, 513 Parnassus Avenue,
San Francisco, CA 94143-0448, USA.
Sequence-specific protein-nucleic acid recognition is determined, in part, by hydrogen bonding
interactions between amino acid side-chains and nucleotide bases. To examine the repertoire of
possible interactions, we have calculated geometrically plausible arrangements in which amino
acids hydrogen bond to unpaired bases, such as those found in RNA bulges and loops, or to
the 53 possible RNA base-pairs. We find 32 possible interactions that involve two or more
hydrogen bonds to the six unpaired bases (including protonated A and C), 17 of which have been
observed. We find 186 "spanning" interactions to base-pairs in which the amino acid hydrogen
bonds to both bases, in principle allowing particular base-pairs to be selectively targeted, and
nine of these have been observed. Four calculated interactions span the Watson-Crick pairs
and 15 span the G:U wobble pair, including two interesting arrangements with three hydrogen
bonds to the Arg guanidinium group that have not yet been observed. The inherent donor-acceptor
arrangements of the bases support many possible interactions to Asn (or Gln) and Ser (or Thr
or Tyr), few interactions to Asp (or Glu) even though several already have been observed, and
interactions to U (or T) only if the base is in an unpaired context, as also observed in several
cases. This study highlights how complementary arrangements of donors and acceptors can
contribute to base-specific recognition of RNA, predicts interactions not yet observed, and
provides tools to analyze proposed contacts or design novel interactions.
Statistical analysis of atomic contacts at RNA-protein
interfaces.
Treger M, Westhof E.
Laboratoire de Biostatistique et d'Informatique Medicale, Faculte de Medecine, Universite Louis
Pasteur, 4 rue Kirshleger, F-67000 Strasbourg, France.
Forty-five crystals of complexes between proteins and RNA molecules from the Protein Data Bank
have been statistically surveyed for the number of contacts between RNA components (phosphate,
ribose and the four bases) and amino acid side chains. Three groups of complexes were defined: the
tRNA synthetases; the ribosomal complexes; and a third group containing a variety of complexes.
The types of atomic contacts were a priori classified into ionic, neutral H-bond, C-H...O H-bond,
or van der Waals interaction. All the contacts were organized into a relational database which
allows for statistical analysis. The main conclusions are the following: (i) in all three groups
of complexes, the most preferred amino acids (Arg, Asn, Ser, Lys) and the less preferred ones
(Ala, Ile, Leu, Val) are the same; Trp and Cys are rarely observed (respectively 15 and 5 amino
acids in the ensemble of interfaces); (ii) of the total number of amino acids located at the
interfaces 22% are hydrophobic, 40% charged (positive 32%, negative 8%), 30% polar and 8% are
Gly; (iii) in ribosomal complexes, phosphate is preferred over ribose, which is preferred over the
bases, but there is no significant preference in the other two groups; (iv) there is no
significant prevalence of a base type at protein-RNA interfaces, but specifically Arg and Lys
display a preference for phosphate over ribose and bases; Pro and Asn prefer bases over ribose and
phosphate; Met, Phe and Tyr prefer ribose over phosphate and bases. Further, Ile, Pro, Ser prefer
A over the others; Leu prefers C; Asp and Gly prefer G; and Asn prefers U. Considering the contact
types, the following conclusions could be drawn: (i) 23% of the contacts are via potential H-bonds
(including CH...O H-bonds and ionic interactions), 72% belong to van der Waals interactions and
5% are considered as short contacts; (ii) of all potential H-bonds, 54% are standard, 33% are of
the C-H...O type and 13% are ionic; (iii) the Watson-Crick sites of G, O6(G) and principally
N2(G), and the hydroxyl group O2' are more often involved in H-bonds than expected; the protein
main chain is involved in 32% and the side chains in 68% of the H-bonds; considering the neutral
and ionic H-bonds, the following couples are more frequent than expected: base A-Ser, base
G-Asp/Glu, base U-Asn. The RNA CH groups interact preferentially with oxygen atoms (62% on the
main chain and 19% on the side chains); (iv) the bases are involved in 38% of all H-bonds and more
than 26% of the H-bonds have the H donor group on the RNA; (v) the atom O2' is involved in 21% of
all H-bonds, a number greater than expected; (vi) amino acids less frequently in direct contact
with RNA components interact frequently via their main chain atoms through water molecules with
RNA atoms; in contrast, those frequently observed in direct contact, except Ser, use instead their
side chain atoms for water bridging interactions. Copyright 2001 John Wiley & Sons, Ltd.
Protein-RNA interactions: structural analysis and functional
classes.
Ellis JJ, Broom M, Jones S.
Department of Biochemistry, School of Life Sciences, University of Sussex, Falmer, BN1 9RH,
United Kingdom.
A data set of 89 protein-RNA complexes has been extracted from the Protein Data Bank, and the
nucleic acid recognition sites characterized through direct contacts, accessible surface area, and
secondary structure motifs. The differences between RNA recognition sites that bind to RNAs in
functional classes have also been analyzed. Analysis of the complete data set revealed that van der
Waals interactions are more numerous than hydrogen bonds and the contacts made to the nucleic
acid backbone occur more frequently than specific contacts to nucleotide bases. Of the
base-specific contacts that were observed, contacts to guanine and adenine occurred most
frequently. The most favored amino acid-nucleotide pairings observed were lysine-phosphate,
tyrosine-uracil, arginine-phosphate, phenylalanine-adenine and tryptophan-guanine. The amino
acid propensities showed that positively charged and polar residues were favored as expected,
but so also were tryptophan and glycine. The propensities calculated for the functional classes
showed trends similar to those observed for the complete data set. However, the analysis of
hydrogen bond and van der Waals contacts showed that in general proteins complexed with
messenger RNA, transfer RNA and viral RNA have more base-specific contacts and fewer
backbone contacts than expected, while proteins complexed with ribosomal RNA have fewer
base-specific contacts than expected. Hence, whilst the types of amino acids involved in the
interfaces are similar, the distribution of specific contacts is dependent upon the functional class
of the RNA bound.
Protein-nucleic acid recognition: statistical analysis of
atomic interactions and influence of DNA structure.
Lejeune D, Delsaux N, Charloteaux B, Thomas A, Brasseur R.
Centre de Biophysique Moleculaire Numerique, Faculte Universitaire des Sciences
Agronomiques, Gembloux, Belgium.
We analyzed structural features of 11,038 direct atomic contacts (either electrostatic, H-bonds,
hydrophobic, or other van der Waals interactions) extracted from 139 protein-DNA and 49
protein-RNA nonhomologous complexes from the Protein Data Bank (PDB). Globally, H-bonds
are the most frequent interactions (approximately 50%), followed by van der Waals,
hydrophobic, and electrostatic interactions. From the protein viewpoint, hydrophilic amino acids
are over-represented in the interaction databases: Positively charged amino acids mainly contact
nucleic acid phosphate groups but can also interact with base edges. From the nucleotide point of
view, DNA and RNA behave differently: Most protein-DNA interactions involve phosphate
atoms, while protein-RNA interactions involve more frequently base edge and ribose atoms. The
increased participation of DNA phosphate involves H-bonds rather than salt bridges. A statistical
analysis was performed to find the occurrence of amino acid-nucleotide pairs most different from
chance. These pairs were analyzed individually. Finally, we studied the conformation of DNA in
the interaction sites. Despite the prevalence of B-DNA in the database, our results suggest that
A-DNA is favored in the interaction sites.
Amino acid-base interactions: a three-dimensional analysis
of protein-DNA interactions at an atomic level.
Luscombe NM, Laskowski RA, Thornton JM.
Biomolecular Structures and Modelling Unit, Department of Biochemistry and Molecular Biology,
University College, Gower Street, London WC1E 6BT, UK.
To assess whether there are universal rules that govern amino acid-base recognition, we investigate
hydrogen bonds, van der Waals contacts and water-mediated bonds in 129 protein-DNA complex
structures. DNA-backbone interactions are the most numerous, providing stability rather than specificity. For
base interactions, there are significant base-amino acid type correlations, which can be rationalised by
considering the stereochemistry of protein side chains and the base edges exposed in the DNA structure.
Nearly two-thirds of the direct read-out of DNA sequences involves complex networks of hydrogen
bonds, which enhance specificity. Two-thirds of all protein-DNA interactions comprise van der Waals
contacts, compared to about one-sixth each of hydrogen and water-mediated bonds. This highlights the
central importance of these contacts for complex formation, which have previously been relegated to a
secondary role. Although common, water-mediated bonds are usually non-specific, acting as space-fillers at
the protein-DNA interface. In conclusion, the majority of amino acid-base interactions observed follow
general principles that apply across all protein-DNA complexes, although there are individual exceptions.
Therefore, we distinguish between interactions whose specificities are 'universal' and 'context-dependent'.
An interactive Web-based atlas of side chain-base contacts provides access to the collected data,
including analyses and visualisation of the three-dimensional geometry of the interactions.
Deterministic features of side-chain main-chain hydrogen
bonds in globular protein structures.
Eswar N, Ramakrishnan C.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.
A total of 19 835 polar residues from a data set of 250 non-homologous and highly resolved protein
crystal structures were used to identify side-chain main-chain (SC-MC) hydrogen bonds. The ratio of the
number of SC-MC hydrogen bonds to the total number of polar residues is close to 1:2, indicating the
ubiquitous nature of such hydrogen bonds. Close to 56% of the SC-MC hydrogen bonds are local,
involving a side-chain acceptor/donor ('i') and a main-chain donor/acceptor within the window i-5 to i+5.
These short-range hydrogen bonds form well defined conformational motifs characterized by specific
combinations of backbone and side-chain torsion angles. (a) The Ser/Thr residues show the greatest
preference in forming intra-helical hydrogen bonds between the atoms O(gamma)(i) and O(i-4). More
than half the examples of such hydrogen bonds are found at the middle of alpha-helices rather than at
their ends. The most favoured motif of these examples is alpha(R)alpha(R)alpha(R)alpha(R)(g-). (b)
These residues also show great preference to form hydrogen bonds between O(gamma)(i) and
O(i-3), which are closely related to the previous type and, though intra-helical, these hydrogen
bonds are more often found at the C-termini of helices than at the middle. The motif represented
by alpha(R)alpha(R)alpha(R)alpha(R)(g-) is most preferred in these cases. (c) The Ser, Thr and
Glu are the most frequently found residues participating in intra-residue hydrogen bonds
(between the side-chain and main-chain of the same residue), which are characterized by specific
motifs of the form beta(g-) for Ser/Thr residues and alpha(R)(g-g-t) for Glu/Gln. (d) The
side-chain acceptor atoms of Asn/Asp and Ser/Thr residues show high preference to form
hydrogen bonds with acceptors two residues ahead in the chain, which are characterized by the
motifs beta(tt')alpha(R) and beta(t)alpha(R), respectively. These hydrogen bonded segments,
referred to as Asx turns, are known to provide stability to type I and type I' beta-turns. (e)
Ser/Thr residues often form a combination of SC-MC hydrogen bonds, with the side-chain donor
hydrogen bonded to the carbonyl oxygen of its own peptide backbone and the side-chain
acceptor hydrogen bonded to an amide hydrogen three residues ahead in the sequence. Such
motifs are quite often seen at the beginning of alpha-helices, which are characterized by the
beta(g-)alpha(R)alpha(R) motif. A remarkable majority of all these hydrogen bonds are buried
from the protein surface, away from the surrounding solvent. This strongly indicates the
possibility of side-chains playing the role of the backbone, in the protein interiors, to satisfy the
potential hydrogen bonding sites and maintaining the network of hydrogen bonds which is
crucial to the structure of the protein.
Database of non-canonical base pairs found in known RNA
structures.
Nagaswamy U, Voss N, Zhang Z, Fox GE.
Department of Biology, University of Houston, Houston, TX 77204-5934, USA.
Atomic resolution RNA structures are being published at an increasing rate. It is common to
find a modest number of non-canonical base pairs in these structures in addition to the usual
Watson-Crick pairs. This database summarizes the occurrence of these rare base pairs in
accordance with standard nomenclature. The database, http://prion.bchs.uh.edu, contains
information such as sequence context, sugar pucker conformation, anti/syn base conformations,
chemical shift, pK(a) values, melting temperature and free energy. Of the 29 anticipated pairs
with two or more hydrogen bonds, 20 have been encountered to date. In addition, four
unexpected pairs with two hydrogen bonds have been reported, bringing the total to 24. Single
hydrogen bond versions of five of the expected geometries have been encountered among the
single hydrogen bond interactions. In addition, 18 different types of base triplets have been
encountered, each of which involves three to six hydrogen bonds. The vast majority of the rare
base pairs are antiparallel with the bases in the anti configuration relative to the ribose. The most
common are the GU wobble, the Sheared GA pair, the Reverse Hoogsteen pair and the GA imino
pair.
Non-canonical base pairs and higher order structures in
nucleic acids: crystal structure database analysis.
Das J, Mukherjee S, Mitra A, Bhattacharyya D.
Biophysics Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata 700064,
India.
Non-canonical base pairs, mostly present in RNA, often play a prominent role towards
maintaining their structural diversity. Higher order structures like base triples are also
important in defining and stabilizing the tertiary folded structure of RNA. We have
developed a new program, BPFIND, to analyze different types of canonical and non-canonical
base pairs and base triples involving at least two direct hydrogen bonds formed between polar
atoms of the bases or sugar O2' only. We considered 104 possible types of base pairs, out of
which examples of 87 base pair types are found to occur in the available RNA crystal structures.
Analysis indicates that approximately 32.7% of base pairs in the functional RNA structures are
non-canonical, which include different types of GA and GU Wobble base pairs apart from a
wide range of base pair possibilities. We further noticed that more than 10.4% of these base pairs
are involved in triplet formation, most of which play an important role in maintaining long-range
tertiary contacts in the three-dimensional folded structure of RNA. Apart from detection, the
program also gives a quantitative estimate of the conformational deformation of detected base
pairs in comparison to an ideal planar base pair. This helps us to gain insight into the extent of
their structural variations and thus assists in understanding their specific role towards structural
and functional diversity.
Geometric nomenclature and classification of RNA base
pairs.
Leontis NB, Westhof E.
Chemistry Department and Center for Biomolecular Sciences, Bowling Green State University,
Ohio 43403, USA. Leontis@bgnet.bgsu.edu
Non-Watson-Crick base pairs mediate specific interactions responsible for RNA-RNA
self-assembly and RNA-protein recognition. An unambiguous and descriptive nomenclature with
well-defined and nonoverlapping parameters is needed to communicate concisely structural
information about RNA base pairs. The definitions should reflect underlying molecular
structures and interactions and, thus, facilitate automated annotation, classification, and
comparison of new RNA structures. We propose a classification based on the observation that
the planar edge-to-edge, hydrogen-bonding interactions between RNA bases involve one of three
distinct edges: the Watson-Crick edge, the Hoogsteen edge, and the Sugar edge (which includes
the 2'-OH and which has also been referred to as the Shallow-groove edge). Bases can interact in
either of two orientations with respect to the glycosidic bonds, cis or trans relative to the
hydrogen bonds. This gives rise to 12 basic geometric types with at least two H bonds
connecting the bases. For each geometric type, the relative orientations of the strands can be
easily deduced. High-resolution examples of 11 of the 12 geometries are presently available.
Bifurcated pairs, in which a single exocyclic carbonyl or amino group of one base directly
contacts the edge of a second base, and water-inserted pairs, in which single functional groups on
each base interact directly, are intermediate between two of the standard geometries. The
nomenclature facilitates the recognition of isosteric relationships among base pairs within each geometry, and thus
facilitates the recognition of recurrent three-dimensional motifs from comparison of homologous sequences.
Graphical conventions are proposed for displaying non-Watson-Crick interactions on a secondary structure
diagram. The utility of the classification in homology modeling of RNA tertiary motifs is illustrated.
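The count of 12 geometric types follows from combining the three edges with the two glycosidic bond orientations. A minimal sketch of that enumeration is given below; the names `EDGES`, `ORIENTATIONS` and `geometric_families` are illustrative and do not come from the paper, and the ordering of the output is arbitrary.

```python
from itertools import combinations_with_replacement

# The three interacting edges and two orientations named in the abstract.
EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
ORIENTATIONS = ("cis", "trans")


def geometric_families():
    """List every (orientation, edge, edge) combination.

    Edge pairs are unordered, so Hoogsteen/Sugar and Sugar/Hoogsteen
    count as one family: 6 edge pairs x 2 orientations = 12.
    """
    return [
        (orientation, edge_a, edge_b)
        for orientation in ORIENTATIONS
        for edge_a, edge_b in combinations_with_replacement(EDGES, 2)
    ]


families = geometric_families()
for orientation, edge_a, edge_b in families:
    print(f"{orientation} {edge_a}/{edge_b}")
```

Treating the edge pair as unordered is what gives six pairs rather than nine; an ordered listing would double-count the mixed-edge families.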
Tools for the automatic identification and classification of
RNA base pairs.
Yang H, Jossinet F, Leontis N, Chen L, Westbrook J, Berman H, Westhof E.
Department of Chemistry and Chemical Biology, Rutgers University, NJ 08854-8087, USA.
Three programs have been developed to aid in the classification and visualization of RNA structure. BPViewer
provides a web interface for displaying three-dimensional (3D) coordinates of individual base pairs or base pair
collections. A web server, RNAview, automatically identifies and classifies the types of base pairs that are
formed in nucleic acid structures by various combinations of the three edges, Watson-Crick, Hoogsteen and the
Sugar edge. RNAView produces two-dimensional (2D) diagrams of secondary and tertiary structure in either Postscript,
VRML or RNAML formats. The application RNAMLview can be used to rearrange various parts of the RNAView
2D diagram to generate a standard representation (like the cloverleaf structure of tRNAs) or any layout desired by the
user. A 2D diagram can be rapidly reformatted using RNAMLview since all the parts of RNA (like helices and single
strands) are dynamically linked while moving the selected parts. With the base pair annotation and the 2D graphic
display, RNA motifs are rapidly identified and classified. A survey has been carried out for 41 unique structures
selected from the NDB database. The statistics for the occurrence of each edge and of each of the 12 bp families
are given for the combinations of the four bases: A, G, U and C. The program also allows for visualization of the base
pair interactions by using a symbolic convention previously proposed for base pairs. The web servers for BPViewer
and RNAview are available at http://ndbserver.rutgers.edu/services/. The application RNAMLview can also be
downloaded from this site. The 2D diagrams produced by RNAview are available for RNA structures in the Nucleic
Acid Database (NDB) at http://ndbserver.rutgers.edu/atlas/.
The non-Watson-Crick base pairs and their associated
isostericity matrices.
Leontis NB, Stombaugh J, Westhof E.
Chemistry Department and Center for Biomolecular Sciences, Overman Hall, Bowling Green State University,
Bowling Green, OH 43403, USA. leontis@bgnet.bgsu.edu
RNA molecules exhibit complex structures in which a large fraction of the bases engage in non-Watson-Crick
base pairing, forming motifs that mediate long-range RNA-RNA interactions and create binding sites for proteins and
small molecule ligands. The rapidly growing number of three-dimensional RNA structures at atomic resolution
requires that databases contain the annotation of such base pairs. An unambiguous and descriptive nomenclature
was proposed recently in which RNA base pairs were classified by the base edges participating in the
interaction (Watson-Crick, Hoogsteen/CH or sugar edge) and the orientation of the glycosidic bonds
relative to the hydrogen bonds (cis or trans). Twelve basic geometric families were identified and all 12
have been observed in crystal structures. For each base pairing family, we present here the 4 x 4 'isostericity
matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four
standard bases, A, C, G and U. Whenever available, a representative example of each observed base pair
from X-ray crystal structures (3.0 Å resolution or better) is provided or, otherwise, theoretically plausible
models. This format makes apparent the recurrent geometric patterns that are observed and helps identify
isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining
conserved three-dimensional motifs.
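The shape of a 4 x 4 isostericity matrix can be sketched as a small lookup table keyed by base pair. The example below only builds the empty 16-cell layout for one family; the function name and the use of `None` as a placeholder are assumptions for illustration, and no isostericity assignments from the paper are reproduced.

```python
from itertools import product

# The four standard RNA bases, in the order given in the abstract.
BASES = ("A", "C", "G", "U")


def empty_isostericity_matrix():
    """Return a dict with one cell per ordered base combination.

    Each of the 16 cells would hold an isosteric-class label for that
    pair within one geometric family; here every cell is left as None.
    """
    return {(row, col): None for row, col in product(BASES, repeat=2)}


matrix = empty_isostericity_matrix()
print(len(matrix))  # 16 cells, one per pairwise combination
```

One such table per geometric family gives twelve matrices in total, matching the twelve families mentioned in the abstract.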
Unique tertiary and neighbor interactions determine
conservation patterns of Cis Watson-Crick A/G base-pairs.
Sponer J, Mokdad A, Sponer JE, Spackova N, Leszczynski J, Leontis NB.
Institute of Biophysics, Academy of Sciences of the Czech Republic and National Center for
Biomolecular Research, Kralovopolska 135, 612 65 Brno, Czech Republic.
sponer@ncbr.chemi.muni.cz
X-ray, phylogenetic and quantum chemical analysis of molecular interactions and conservation patterns of
cis Watson-Crick (W.C.) A/G base-pairs in 16S rRNA, 23S rRNA and other molecules was carried out.
In these base-pairs, the A and G nucleotides interact with their W.C. edges with glycosidic bonds oriented cis
relative to each other. The base-pair is stabilised by two hydrogen bonds, the C1'-C1' distance is enlarged and
the G(N2) amino group is left unpaired. Quantum chemical calculations show that, in the absence of
other interactions, the unpaired amino group is substantially non-planar due to its partial sp(3)
pyramidalization, while the whole base-pair is internally propeller twisted and very flexible. The unique
molecular properties of the cis W.C. A/G base-pairs make them distinct from other base-pairs. They occur
mostly at the ends of canonical helices, where they serve as interfaces between the helix and other motifs.
The cis W.C. A/G base-pairs play crucial roles in natural RNA structures with salient sequence
conservation patterns. The key contribution to conservation is provided by the unpaired G(N2) amino
group that is involved in a wide range of tertiary and neighbor contacts in the crystal structures. Many of
them are oriented out of the plane of the guanine base and utilize the partial sp(3) pyramidalization of the
G(N2). There is a lack of A/G to G/A covariation, which, except for the G(N2) position, would be
entirely isosteric. On the contrary, there is a rather frequent occurrence of G/A to G/U covariation, as the
G/U wobble base-pair has an unpaired amino group in the same position as the cis W.C. G/A base-pair.
The cis W.C. A/G base-pairs are not conserved when there is no tertiary or neighbor interaction. Obtaining
the proper picture of the interactions and phylogenetic patterns of the cis W.C. A/G base-pairs requires a
detailed analysis of the relation between the molecular structures and the energetics of interactions at a
level of single H-bonds and contacts.
DIAL: a web server for the pairwise alignment of two RNA
three-dimensional structures using nucleotide, dihedral
angle and base-pairing similarities.
Ferre F, Ponty Y, Lorenz WA, Clote P.
Harvard Medical School, Children's Hospital, Hematology/Oncology Department, Boston, MA 02115,
USA.
DIAL (dihedral alignment) is a web server that provides public access to a new dynamic programming
algorithm for pairwise 3D structural alignment of RNA. DIAL achieves quadratic time by performing an
alignment that accounts for (i) pseudo-dihedral and/or dihedral angle similarity, (ii) nucleotide sequence
similarity and (iii) nucleotide base-pairing similarity. DIAL provides access to three alignment
algorithms: global (Needleman-Wunsch), local (Smith-Waterman) and semiglobal (modified to yield
motif search). Suboptimal alignments are optionally returned, and also Boltzmann pair probabilities
Pr(a(i),b(i)) for aligned positions a(i), b(i) from the optimal alignment. If a non-zero suboptimal
alignment score ratio is entered, then the semiglobal alignment algorithm may be used to detect
structurally similar occurrences of a user-specified 3D motif. The query motif may be contiguous in the
linear chain or fragmented in a number of noncontiguous regions. The DIAL web server provides
graphical output which allows the user to view, rotate and enlarge the 3D superposition for the optimal (and
suboptimal) alignment of query to target. Although graphical output is available for all three algorithms, the
semiglobal motif search may be of most interest in attempts to identify RNA motifs. DIAL is available at
http://bioinformatics.bc.edu/clotelab/DIAL.
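To illustrate how a quadratic-time global alignment can combine the three similarity terms listed in the abstract, here is a minimal Needleman-Wunsch sketch. It is not DIAL's implementation: the `Nucleotide` record, the single angle per nucleotide, the weights, the linear gap penalty and the particular similarity formulas are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleotide:
    base: str      # 'A', 'C', 'G' or 'U'
    angle: float   # one (pseudo-)dihedral angle in degrees (assumed feature)
    paired: bool   # whether the nucleotide is base-paired


def angle_similarity(a, b):
    """Map the circular angle difference to [-1, 1] (1 = identical)."""
    diff = abs(a - b) % 360.0
    diff = min(diff, 360.0 - diff)
    return 1.0 - diff / 90.0


def similarity(x, y, w_angle=1.0, w_seq=1.0, w_pair=1.0):
    """Weighted sum of angle, sequence and base-pairing similarity."""
    seq = 1.0 if x.base == y.base else -1.0
    pair = 1.0 if x.paired == y.paired else -1.0
    return (w_angle * angle_similarity(x.angle, y.angle)
            + w_seq * seq + w_pair * pair)


def global_alignment_score(a, b, gap=-2.0):
    """Needleman-Wunsch score; fills an (n+1) x (m+1) table, O(n*m)."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )
    return score[n][m]


helix = [Nucleotide("G", 170.0, True), Nucleotide("C", 175.0, True)]
print(global_alignment_score(helix, helix))
```

The local and semiglobal variants mentioned in the abstract differ mainly in how the table borders are initialised and where the best score is read from; they are not shown here.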
