Final Draft of MSc Project Report

Introduction
Evolution is the change in the inherited traits, or variations, of a
population of organisms through successive generations. The variation
comes from mutations in genetic material, from gene flow, and from the
reshuffling of genes through sexual reproduction (1). Mutations change
the nucleotide sequence of DNA, which can alter the amino acid
sequence of proteins. Comparison of sequences allows determination of
the extent to which information has changed or been preserved during
evolution. When genetic variation affects protein sequences, the rate
of amino acid substitution reflects both Darwinian selection for
functionally advantageous mutations and selectively neutral evolution
operating within the constraints of structure and function (2). During
neutral evolution, mutations accumulate by random drift, and amino
acid substitutions depend on intramolecular and intermolecular
interactions and on the microenvironment (3). During evolution there
are restraints on sequences because of their selective importance for
structure and function, leading to the conservation of local sequences
and structures in families and superfamilies (4). In closely related
proteins the majority of sequence differences are in peripheral,
solvent-exposed residues, whereas more distantly related proteins
additionally have mutations in their buried residues, resulting in
more structural change. This is in accordance with the observation
that residues buried in the protein core are more conserved, in terms
of both sequence and structure, than those that are solvent accessible
(5). These studies have been repeated with larger data sets with
broadly similar conclusions.
Polypeptides fold into unique 3-D structures that make them
functional. Most folding involves locally ordered structure stabilized
by hydrogen bonding within the polypeptide, such that the polar
residues are exposed at the outer surface while the side chains of
non-polar amino acids are hidden inside (6). Stereochemically, the
α-helix is a major secondary structure with 3.6 amino acids per turn
of the polypeptide backbone. Similarly, the laterally hydrogen-bonded
regions of the polypeptide form β-sheets, and glycine residues favor
their formation. The remaining chain is the random coil, a major
portion of which lies at the surface of the protein. α-helical regions
of a protein are considered the most important in terms of function,
while β-sheets often serve as scaffolds to support globular proteins
(7).
A phylogenetic tree based on protein sequences can be used to deduce
the evolutionary relatedness among species that are known to have a
common ancestor. It plays a fundamental role in many biological
problems such as multiple sequence alignment, protein structure and
function prediction, and drug design (8).
There are two general categories of methods for calculating
phylogenetic trees: distance-based and character-based. Distance-based
methods compute a matrix of pairwise distances between the sequences
in an alignment and then construct a tree based entirely on the
distance computations. Neighbor-Joining (9), WEIGHBOR (10), BIONJ
(11), FASTME (12) and a more recent approach that considers
maximum-likelihood estimated triplets of sequences (13) belong to this
category. The disadvantages of distance-based methods include the
inevitable loss of evolutionary information when a sequence alignment
is converted to pairwise distances, and poor performance on large
datasets.
Character-based methods, such as maximum parsimony (MP) (14) and
maximum likelihood (ML) (15), examine each column of the alignment
separately and look for the tree that best accommodates all of this
information. MP chooses the tree that minimizes the number of changes
required to explain the data. ML, under a model of sequence evolution,
finds the tree that gives the highest likelihood of the observed data.
Character-based methods are information rich, because there is a
hypothesis for every column in the alignment. However, the MP problem
is NP-hard, and ML is also computationally hard to solve in practice.
Primary sources of phylogenetic tree construction software include
PHYLIP (available at
http://evolution.genetics.washington.edu/phylip.html), MrBayes (16),
PAUP (17), and TNT (18).
Protein sequences diverge during evolution, but we do not know how
this divergence impacts the secondary structure of polypeptides, or
which secondary structure elements accommodate evolutionary change
without affecting the function of the protein. The region of a
polypeptide that remains conserved throughout evolution, irrespective
of the evolutionary time frame, is also unknown. One would expect that
in closely related species the amino acid sequences will be very
similar, thereby giving rise to similar structural motifs after
folding of the polypeptide.
In this study I test the possibility that discrete amino acid
stretches giving rise to specific secondary structural motifs tend to
resemble each other across closely related species, i.e. I examine the
phylogenetic significance of polypeptide chains covering the three
conformations (alpha helix, beta sheet and random coil). I ask which
of the 3 secondary structures is most conserved and which is most
diverged over the evolutionary time scale, by considering species from
different life forms.
Phylogenetic Tree:
A phylogenetic tree, also called an evolutionary tree, shows the
evolutionary relationships among various species or other entities that
have a common ancestor. In a phylogenetic tree, each node with
descendants represents the most recent common ancestor of those
descendants, and the edge lengths in some trees correspond to time
estimates. Each node is called a taxonomic unit. Internal nodes are
generally called hypothetical taxonomic units (HTUs), as they cannot be
directly observed.
Tree Structure and Terminology:
Root: the common ancestor of all taxa.
Branch: defines the relationship between the taxa in terms of descent
and ancestry.
Node: represents a taxonomic unit. This can be a taxon (e.g. an
existing genus) or an unknown ancestor of 2 or more species.
Topology: the branching pattern.
Branch length: represents the number of changes that have occurred in
that branch.
Distance scale: a scale representing the number of differences between
sequences (e.g. 0.1 = 10% difference between two sequences).
Figure 1. Tree structure and topology.
[http://users.ugent.be/~avierstr/principles/phylogeny.html]
Methods of phylogenetic analysis:
There are two major groups of analyses to examine phylogenetic
relationships between sequences:
Phenetic methods (19): trees are calculated from similarities between
sequences and are based on distance methods. The resulting tree is
called a dendrogram and does not necessarily reflect evolutionary
relationships. Distance methods compress all of the individual
differences between a pair of sequences into a single number.
Cladistic methods (20): trees are calculated by considering the
various possible pathways of evolution and are based on parsimony or
likelihood methods. The resulting tree is called a cladogram. Cladistic
methods use each alignment position as evolutionary information to
build a tree.
Phenetic methods based on distances:
Starting from an alignment of DNA sequences, pairwise distances are
calculated as the sum of all base differences between two sequences,
on the assumption that the most similar sequences are the most closely
related. This creates a distance matrix. All base changes can be
weighted equally, or a matrix of the possible replacements can be
used. Insertions and deletions are given a larger weight than
replacements. An insertion or deletion of multiple bases at one
position is given less weight than multiple independent insertions or
deletions. It is possible to correct for multiple substitutions at a
single site.
From the resulting distance matrix, a phylogenetic tree is calculated
with clustering algorithms. These clustering methods construct a tree
by linking the least distant pair of taxa, followed by successively
more distant taxa.
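One common way to correct for multiple substitutions at a single site
is the Jukes-Cantor (JC69) model. The report does not say which
correction, if any, was applied, so the Python sketch below is only an
illustration; the function name jukes_cantor_distance is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Convert an observed proportion of differing sites (p) into the
    Jukes-Cantor estimate of substitutions per site for DNA.

    Assumption: the JC69 model (equal base frequencies, equal rates),
    chosen here only as a simple example of a correction.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 under JC69")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Example: 3 differences observed over 20 aligned sites (p = 0.15).
d = jukes_cantor_distance(3 / 20)
print(f"observed p = 0.15, corrected distance = {d:.4f}")
```

The corrected distance is always at least as large as the observed
proportion, because some sites are assumed to have changed more than
once.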
Example: steps in creating a phylogenetic tree with the distance method
A. Sequences
Sequence A ACGCGTTGGGCGATGGCAAC
Sequence B ACGCGTTGGGCGACGGTAAT
Sequence C ACGCATTGAATGATGATAAT
Sequence D ACACATTGAGTGATAATAAT
B. Distances between sequences: the number of steps required to
change one sequence into the other.
NAB 3
NAC 7
NAD 8
NBC 6
NBD 7
NCD 3
C. Distance table
     A    B    C    D
A    __   3    7    8
B    __   __   6    7
C    __   __   __   3
D    __   __   __   __
D. The assumed phylogenetic tree for the sequences A-D, showing
branch lengths. The sum of the branch lengths between any two
sequences on the tree has the same value as the distance between
the sequences.
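The pairwise distances in this example can be checked by counting
mismatched positions directly. The Python sketch below does this for
the four example sequences; the helper name count_differences is
hypothetical, and every base change is weighted equally with no gaps
considered.

```python
from itertools import combinations

# The four example sequences, written without internal spaces.
sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def count_differences(s1: str, s2: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


# Distance for every unordered pair of sequences.
distances = {
    (x, y): count_differences(sequences[x], sequences[y])
    for x, y in combinations(sorted(sequences), 2)
}

for (x, y), n in distances.items():
    print(f"N{x}{y} = {n}")
```

Dividing each count by the alignment length (20 here) gives the
proportion of differing sites, which is the quantity shown on a
distance scale.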
Clustering methods
UPGMA clustering (Unweighted Pair Group Method using Arithmetic
averages): this is the simplest method and assumes a constant rate of
evolution.
Neighbor Joining: this method tries to correct the UPGMA method for
its assumption that the rate of evolution is the same in all taxa.
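To make the clustering idea concrete, here is a minimal UPGMA sketch in
Python applied to the four example sequences. It is an illustration
under simple assumptions (mismatch counts as distances, ties broken by
label order), not the implementation used by PHYLIP or MEGA; the
function names are hypothetical.

```python
from itertools import combinations

sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def count_differences(s1, s2):
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))


def upgma(seqs):
    """Return (newick_topology, node_heights) for aligned sequences."""
    sizes = {name: 1 for name in seqs}  # cluster name -> number of leaves
    dist = {
        frozenset((a, b)): float(count_differences(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    heights = {}  # merged cluster name -> height of its node
    while len(sizes) > 1:
        # Join the least distant pair of clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        heights[merged] = dist[frozenset((a, b))] / 2.0
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root + ";", heights


newick, heights = upgma(sequences)
print(newick)
```

Because UPGMA places every leaf at the same distance from the root, it
only recovers the true tree when the rate of change is roughly constant
across lineages, which is the assumption Neighbor Joining relaxes.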
Cladistic methods based on Parsimony:
For each position in the alignment, all possible trees are evaluated and
are given a score based on the number of evolutionary changes
needed to produce the observed sequence changes. The most
parsimonious tree is the one with the fewest evolutionary changes for
all sequences to derive from a common ancestor. This is a more time-
consuming method than the distance methods.
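As an illustration of how a parsimony score is counted, the Python
sketch below applies Fitch's small-parsimony algorithm to the four
example sequences used earlier for the distance method. Trees are
written as nested pairs of leaf names; this representation and the
function names are assumptions made for the example, and a real
analysis would search many more topologies.

```python
sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def fitch_site(tree, states):
    """Return (possible_states, changes) for one alignment column.

    tree is either a leaf name or a pair (left, right);
    states maps each leaf name to its character in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, sequences) for name, t in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The topology with the lowest score is the most parsimonious one for
these data.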
Cladistic methods based on Maximum Likelihood:
This method also uses each position in an alignment, evaluates all
possible trees, and calculates the likelihood of each tree using an
explicit model of evolution (in contrast, parsimony just looks for the
fewest evolutionary changes). The likelihoods for each aligned position
are then multiplied to give the likelihood of each tree. The tree with
the maximum likelihood is the most probable tree. This is the slowest
method of all but seems to give the best result and the most
information about the tree.
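The multiplication over aligned positions can be written compactly.
The symbols here are notation introduced for illustration and do not
come from the report: T is a candidate tree, D_i is column i of an
alignment with n columns, and \theta stands for the parameters of the
chosen model of evolution.

```latex
L(T) = \prod_{i=1}^{n} P(D_i \mid T, \theta),
\qquad
\ln L(T) = \sum_{i=1}^{n} \ln P(D_i \mid T, \theta)
```

In practice the logarithms are summed rather than the probabilities
multiplied, since a product of many small numbers underflows quickly.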
Protein sequences used in this study.
Ubiquitin
Ubiquitin is a small, highly conserved regulatory protein expressed
ubiquitously in eukaryotes. The ubiquitin polypeptide has 76 amino
acids and a molecular mass of 8.5 kDa. Key features include its
C-terminal tail and its 7 Lys residues. It is highly conserved among
eukaryotic species: human and yeast ubiquitin share 96% sequence
identity. Ubiquitination (or ubiquitylation) refers to the
post-translational modification of a protein by the covalent
attachment of one or more ubiquitin monomers via an isopeptide bond.
The most prominent function of ubiquitin is labeling proteins for
proteasomal degradation. Ubiquitination also controls the stability,
function, and intracellular localization of a wide variety of
proteins. The ubiquitination cascade is started by the E1 enzyme.
Mammalian cells contain 30-40 ubiquitin-conjugating enzymes (UBCs)
(21).
Figure: Structure of ubiquitin, with random coil, beta sheet and alpha
helix regions labelled. [http://en.wikipedia.org/wiki/Ubiquitin]
Ubiquitination
Ubiquitination (or ubiquitylation) is the process of tagging a protein
with ubiquitin and consists of a series of steps:
Activation: Ubiquitin is activated in two steps by an E1
ubiquitin-activating enzyme, which requires ATP as an energy source.
In the first step, a ubiquitin-adenylate intermediate is produced. In
the second step, ubiquitin is transferred to the cysteine residue at
the E1 active site, with release of AMP. This results in a thioester
linkage between the C-terminal carboxyl group of ubiquitin and the E1
cysteine sulfhydryl group.
Transfer: Ubiquitin is transferred from E1 to the active-site cysteine
of a ubiquitin-conjugating enzyme E2 via a trans(thio)esterification
reaction.
Finally, the ubiquitylation cascade creates an isopeptide bond between
a lysine of the target protein and the C-terminal glycine of
ubiquitin. In general, this step requires the activity of one of the
hundreds of E3 ubiquitin-protein ligases (often termed simply
ubiquitin ligases). E3 enzymes function as the substrate recognition
modules of the system and are capable of interacting with both E2 and
substrate. In the ubiquitination cascade, E1 can bind dozens of E2s,
which in turn can bind hundreds of E3s in a hierarchical way. Other
ubiquitin-like proteins (ULPs) are also modified via the E1–E2–E3
cascade (22).
The ubiquitination system functions in a wide variety of cellular
processes, such as antigen processing, apoptosis, biogenesis of
organelles, cell cycle and division, DNA transcription and repair,
differentiation and development, and immune response and
inflammation (23).
Cu/Zn Superoxide Dismutase

Superoxide dismutases (SOD, EC 1.15.1.1) are a class of enzymes
that catalyze the dismutation of superoxide into oxygen and hydrogen
peroxide. As such, they are an important antioxidant defense in nearly
all cells exposed to oxygen.
Reaction
The SOD-catalysed dismutation of superoxide may be written with the
following half-reactions:
• $\mathrm{M}^{(n+1)+}\text{-SOD} + \mathrm{O_2^{-}} \rightarrow \mathrm{M}^{n+}\text{-SOD} + \mathrm{O_2}$
• $\mathrm{M}^{n+}\text{-SOD} + \mathrm{O_2^{-}} + 2\,\mathrm{H^{+}} \rightarrow \mathrm{M}^{(n+1)+}\text{-SOD} + \mathrm{H_2O_2}$
where M = Cu (n=1); Mn (n=2); Fe (n=2); Ni (n=2).
In this reaction the oxidation state of the metal cation oscillates
between n and n+1 (24).
There are three major families of superoxide dismutase, depending on
the metal cofactor: the Cu/Zn type, the Fe and Mn types, and the Ni
type. The copper and zinc enzyme (Cu-Zn-SOD) is most common in the
cytosol of eukaryotic cells. Commercially available Cu-Zn-SOD is
purified from bovine erythrocytes. The Cu-Zn enzyme is a homodimer of
molecular weight 32,500, and the two subunits are joined by
hydrophobic and electrostatic interactions. The ligands of copper and
zinc are histidine side chains.
Prokaryotes and protists use iron or manganese enzymes. E. coli and
many other bacteria contain the iron enzyme (Fe-SOD); some bacteria
have Fe-SOD, others Mn-SOD, while some contain both. Fe-SOD can also
be found in the plastids of plants. The active sites of Mn and Fe
superoxide dismutases contain the same type of amino acid side chains
(25).
Mn-SOD: mitochondria and many bacteria contain Mn-SOD. The ligands of
the manganese ion are 3 histidine side chains, an aspartate side chain
and a water molecule or hydroxy ligand, depending on the Mn oxidation
state (II and III, respectively) (26).
Ni-SOD, found in prokaryotes, has a hexameric structure built from
right-handed 4-helix bundles, each containing an N-terminal hook that
chelates a Ni ion. The Ni-hook contains the motif
His-Cys-X-X-Pro-Cys-Gly-X-Tyr; it provides most of the interactions
critical for metal binding and catalysis and is therefore a likely
diagnostic of Ni-SODs (27).
Figure: Crystallographic structure of the human SOD1 enzyme (rainbow
colored, N-terminus = blue, C-terminus = red) complexed with copper
(blue-green sphere) and zinc (grey spheres). Beta sheet and random
coil regions are labelled; there is no alpha helix.
[http://en.wikipedia.org/wiki/Superoxide_dismutase]
Superoxide is one of the main reactive oxygen species in the cell, and
SOD therefore serves a key antioxidant role. Defects in SODs result in
severe pathologies. Mice lacking SOD2 die several days after birth,
amidst massive oxidative stress. Mice lacking SOD1 develop
hepatocellular carcinoma, an acceleration of age-related muscle mass
loss, an earlier incidence of cataracts and a reduced lifespan. Mice
lacking SOD3 do not show any obvious defects and exhibit a normal
lifespan, but are more sensitive to hyperoxic injury. Knockout mice of
any SOD enzyme are more sensitive to the lethal effects of
superoxide-generating drugs, such as paraquat and diquat (28).
Calmodulin
Calmodulin (CaM = CALcium MODULated proteIN) binds calcium and
regulates a number of protein targets, thereby affecting a variety of
cellular functions.
Figure: Structure of calmodulin, with random coil and alpha helix
regions labelled; there is no beta sheet.
[http://en.wikipedia.org/wiki/Calmodulin]
Structure
Calmodulin is a small, acidic protein approximately 148 amino acids
long (16,706 Da) that contains four EF-hand "motifs", each of which
binds a Ca2+ ion. It has two approximately symmetrical domains,
separated by a flexible "hinge" region. Calcium participates in an
intracellular signalling system by acting as a diffusible second
messenger to the initial stimuli (29).
Function
CaM mediates processes such as inflammation, metabolism, apoptosis,
smooth muscle contraction, intracellular movement, short-term and
long-term memory, nerve growth and the immune response. CaM is
expressed in many cell types and can have different subcellular
locations, including the cytoplasm, within organelles, or associated
with the plasma or organelle membranes. Many of the proteins that
CaM binds are unable to bind calcium themselves, and as such use
CaM as a calcium sensor and signal transducer. CaM can also make
use of the calcium stores in the endoplasmic reticulum, and the
sarcoplasmic reticulum. CaM undergoes a conformational change upon
binding to calcium, which enables it to bind to specific proteins for a
specific response. CaM can bind up to four calcium ions, and can
undergo post-translational modifications, such as phosphorylation,
acetylation, methylation and proteolytic cleavage, each of which can
potentially modulate its actions. Calmodulin can also bind to edema
factor toxin from the anthrax bacteria. (30)
Mechanism
Calcium is bound via the use of the EF hand motif, which supplies an
electronegative environment for ion coordination. After calcium
binding, hydrophobic methyl groups from methionine residues become
exposed on the protein via conformational change. This presents
hydrophobic surfaces, which can in turn bind to Basic Amphiphilic
Helices (BAA helices) on the target protein. These helices contain
complementary hydrophobic regions. The flexibility of Calmodulin's
hinged region allows the molecule to "wrap around" its target. This
property allows it to tightly bind to a wide range of different target
proteins.(31)
Actin
Actin is a highly conserved globular (42-kDa) protein found in all
eukaryotic cells, from algae to humans. Actin is the monomeric subunit
of two types of filaments: microfilaments, which are one of the three
major components of the cytoskeleton, and thin filaments, which are
part of the contractile apparatus in muscle cells.
Thus, actin participates in muscle contraction, cell motility, cell division
and cytokinesis, vesicle and organelle movement, cell signaling, and
the establishment and maintenance of cell shape and cell junctions. In
most of these processes actin interacts with cellular membranes. In
vertebrates, three main groups of actin isoforms, alpha, beta, and
gamma have been identified. The alpha actins are found in muscle
tissues and are a major constituent of the contractile apparatus. The
beta and gamma actins co-exist in most cell types as components of
the cytoskeleton, and as mediators of internal cell motility. (32)
Figure: Structure of actin, with random coil, beta sheet and alpha
helix regions labelled. [http://en.wikipedia.org/wiki/Actin]
Actin exhibits four main functions. It forms the most dynamic of the
three subclasses of the cytoskeleton, which gives mechanical support
to cells and hardwires the cytoplasm to the surroundings to support
signal transduction. It allows cell motility (see actoclampin
molecular motors) and the phagocytosis of bacteria by macrophages. In
muscle cells, actin forms the scaffold on which myosin proteins
generate force to support muscle contraction.
In non-muscle cells, actin acts as a track for the transport of cargo
such as vesicles and organelles by non-conventional myosins such as
myosin V and VI, which use ATP hydrolysis to move cargo at a rate much
faster than diffusion. Myosin V walks towards the barbed end of actin
filaments, while myosin VI walks toward the pointed end. Most actin
filaments are arranged with the barbed end toward the cellular
membrane and the pointed end toward the cellular interior. This
arrangement allows myosin V to be an effective motor for export of
cargos and myosin VI to be an effective motor for import (33).
Nucleation and Polymerization
Actin polymerization and depolymerization are necessary in chemotaxis
and cytokinesis. Nucleating factors stimulate actin polymerization;
one such factor is the Arp2/3 complex, whose shape mimics the barbed
end of an actin filament and thereby stimulates the nucleation of
G-actin (monomeric actin). The Arp2/3 complex can also bind to
existing actin filaments at 70° to form new actin branches. Actin
filaments bind ATP, and ATP hydrolysis destabilizes the polymer. The
growth of actin filaments can be regulated by thymosin and profilin:
thymosin binds to G-actin to prevent it from polymerizing, while
profilin binds to G-actin to promote monomer addition to the barbed
(plus) end (34).
Microfilament
Individual subunits of actin are known as globular actin (G-actin).
G-actin subunits assemble into long filamentous polymers called
F-actin. Two parallel F-actin strands must rotate 166° in order to
layer correctly on top of each other. This gives the appearance of a
double helix with a diameter of 7 nm and a loop of the helix repeating
every 37 nm, and gives rise to the microfilaments of the cytoskeleton
(35).
Tubulin (Alpha and Beta)
Molecular structure of a tubulin dimer. The α-tubulin subunit is on top,
indicating a microtubule polarity with the (-) end towards the top of the
page. The two GTP subunits are drawn as space filling models, and the
paclitaxel molecule is attached to the β-tubulin subunit and drawn as a
ball and stick model.
Tubulin is a member of a small family of globular proteins. The most
common members of the tubulin family are α-tubulin and β-tubulin (MW
55 kDa), which make up microtubules. Microtubules are assembled from
dimers of α- and β-tubulin. These subunits are slightly acidic, with
an isoelectric point between 5.2 and 5.8. Tubulin was long thought to
be specific to eukaryotes; recently, however, the prokaryotic cell
division protein FtsZ was shown to be evolutionarily related to
tubulin (36).
Alpha and Beta tubulin
To form microtubules, a dimer of α- and β-tubulin binds GTP and
assembles onto the (+) end of a microtubule, after which the bound GTP
is hydrolysed to GDP through inter-dimer contacts along the
microtubule protofilament. The stability of the dimer in the
microtubule depends on whether its β-tubulin is bound to GTP or GDP.
Dimers bound to GTP tend to assemble into microtubules, while dimers
bound to GDP tend to fall apart; thus, this GTP cycle is essential for
the dynamic instability of the microtubule.
When tubulin polymerizes, it initially forms protofilaments.
Microtubules consist of 13 protofilaments and are 25 nm in diameter,
each µm of microtubule length being composed of 1650 heterodimers.
Microtubules are highly ordered fibers that have an intrinsic polarity.
Tubulin can polymerize from both ends in vitro; however, the rate of
polymerization is not equal. It has therefore become the convention to
call the rapidly polymerizing end the plus-end of a microtubule and the
slowly polymerizing end the minus-end (37).
Class III β-tubulin is a microtubule element expressed exclusively in
neurons, and is a popular neuron-specific marker in nervous tissue.
Katanin is a protein complex that severs microtubules at β-tubulin
subunits, and is necessary for rapid microtubule transport in neurons
and in higher plants (38).
Materials and Methods
To predict the secondary structure of the polypeptides with Jpred, I
divided the 6 polypeptides into two groups. The first group contains
α-actin, α-tubulin and β-tubulin, while the second group contains
ubiquitin, calmodulin and Cu/Zn SOD. In each group, for every protein
I searched the SWISSPROT database for the species that have the
protein and made a separate list of species for each protein. I then
found the species common to the 3 polypeptides in each group. The
complete amino acid sequence of each polypeptide for the corresponding
species was retrieved from the SWISSPROT website in EMBL format. These
polypeptide sequences were aligned using the multiple sequence
alignment method T-Coffee (39). The aligned sequences were converted
to PHYLIP format (40) using Readseq (a biosequence conversion tool)
(41). I estimated the distance matrix on the basis of genetic distance
and constructed a phylogenetic tree with the Neighbour-Joining
algorithm (42), which was then visualized using MEGA (43).
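The step of finding the species shared by the three polypeptides in a
group amounts to a set intersection. The Python sketch below shows the
idea; the function name common_species is hypothetical, and the short
species lists are invented placeholders for illustration, not the
lists used in this study.

```python
def common_species(species_by_protein):
    """Return the sorted species present in every protein's list."""
    species_sets = [set(names) for names in species_by_protein.values()]
    if not species_sets:
        return []
    return sorted(set.intersection(*species_sets))


# Hypothetical, abbreviated species lists (illustration only).
example_group = {
    "Ubiquitin": ["Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"],
    "Calmodulin": ["Homo sapiens", "Mus musculus", "Drosophila melanogaster"],
    "Cu/Zn SOD": ["Homo sapiens", "Mus musculus", "Bos taurus"],
}

shared = common_species(example_group)
print(shared)
```

Only species that appear in all three lists are kept, so every tree in
a group is built on the same set of taxa.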
The amino acid sequences were submitted to Jpred, a secondary
structure prediction program that assigns each residue of a given
polypeptide to a secondary structure state (44). From each predicted
sequence for a given species, the chains for α-helix, β-sheet and
random coil (RC) were dissected in silico, and the regions of each
conformation were concatenated in the same order as they appear within
the polypeptide. Concatenation means linking consecutive series of
characters together; the direction of concatenation is from the amino
terminus to the carboxy terminus. The resultant concatenates of
α-helix, β-sheet and RC were subjected to multiple sequence alignment
by T-Coffee, and NJ trees were built in PHYLIP and visualized in MEGA.
Concatenation of secondary structures
Sequence submitted to Jpred:
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDG
Sequence with the predicted secondary structure:
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDG
-EEEEEE----EEEEEE----HHHHHHHHHHHH--------EEEE--------
E = beta sheet, H = alpha helix, - = random coil
Concatenation of random coil
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDG
-EEEEEE----EEEEEE----HHHHHHHHHHHH--------EEEE--------
M      LTGK      EPSD            EGIPPDQQ    AGKQLEDG
Concatenated random coil: MLTGKEPSDEGIPPDQQAGKQLEDG
Concatenation of alpha helix
                     TIENVKAKIQDK
Concatenated alpha helix: TIENVKAKIQDK
Concatenation of beta sheet
 QIFVKT    TITLEV                        RLIF
Concatenated beta sheet: QIFVKTTITLEVRLIF
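The concatenation shown in this example can be reproduced with a few
lines of Python. This is a sketch of the idea rather than the script
actually used in the study; the function name concatenate_by_state is
hypothetical, and the structure string is rebuilt from run lengths so
that its length matches the 53-residue example sequence.

```python
def concatenate_by_state(sequence, structure):
    """Group residues by predicted state, keeping N- to C-terminal order.

    structure uses 'H' (alpha helix), 'E' (beta sheet), '-' (random coil).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    groups = {"H": [], "E": [], "-": []}
    for residue, state in zip(sequence, structure):
        if state not in groups:
            raise ValueError(f"unexpected state symbol: {state!r}")
        groups[state].append(residue)
    return {state: "".join(residues) for state, residues in groups.items()}


sequence = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDG"
# Predicted states from the example, written as run lengths.
structure = (
    "-" + "E" * 6 + "-" * 4 + "E" * 6 + "-" * 4
    + "H" * 12 + "-" * 8 + "E" * 4 + "-" * 8
)

concatenates = concatenate_by_state(sequence, structure)
for label, state in (("random coil", "-"), ("alpha helix", "H"),
                     ("beta sheet", "E")):
    print(f"Concatenated {label}: {concatenates[state]}")
```

Each of the three resulting strings is then treated as a separate
sub-polypeptide for alignment and tree building.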
The taxonomic IDs of all the species were submitted to the NCBI
Taxonomy Browser, which generates a taxonomic tree that serves as the
benchmark tree. I then used the Taxonomic Fidelity algorithm (Milner &
Modak, unpublished, 2009) to compare the tree topologies for the
entire polypeptides and for the concatenates of α-helix, β-sheet and
RC with the taxonomic tree built in the NCBI Taxonomy Browser for the
given set of species.
Steps involved in construction of complete sequence phylogenetic trees
1. List the species for each polypeptide in the group separately from
the SWISSPROT database.
2. List the species common to each group separately.
3. Download the sequences of the listed species for the polypeptide
from SwissProt/EMBL/GenBank, etc.
4. Multiple sequence alignment by T-Coffee.
5. Estimate evolutionary distances with the PHYLIP PROTDIST program.
6. Uniparametric 2D phylogenetic trees with the PHYLIP
neighbour-joining program.
7. Assessment of taxonomic fidelity.
Steps involved in construction of concatenated sequence trees
1. List the species for each polypeptide in the group separately from
the SWISSPROT database.
2. List the species common to each group separately.
3. Submit the sequences to Jpred (secondary structure prediction
server).
4. Concatenate sequentially the amino acid sequence fragments forming
alpha helix, beta sheet and random coil separately to create 3
sub-polypeptides, and make a database of the sequences for each
conformation.
5. Multiple sequence alignment by T-Coffee.
6. Estimate evolutionary distances by PHYLIP.
7. Uniparametric 2D phylogenetic trees by PHYLIP.
8. Export the trees for the entire sequence and for the concatenates
of random coil, alpha helix and beta sheet sequences of a protein to
Newick format using MEGA.
9. Assessment of taxonomic fidelity.
Fidelity analysis of the trees using the Taxonomic Fidelity algorithm
(Milner & Modak, unpublished, 2009)
1. Prepare a word file containing the names of the trees of a protein.
2. Prepare the cluster definition file from the NCBI trees.
3. As per the algorithm, submit the files (tree file and cluster
definition file) to the program.
4. The result file shows a mark for each cluster (between 0 and 1).
Software/Tools Used
Multiple Sequence Alignment

http://www.ebi.ac.uk/Tools/t-coffee/index.html
T-Coffee (Tree-based Consistency Objective Function For alignment
Evaluation) is multiple sequence alignment software that uses a
progressive approach (45). It generates a library of pairwise
alignments to guide the multiple sequence alignment. It can also
combine multiple sequence alignments obtained previously, and in the
latest versions it can use structural information from PDB files
(3D-Coffee). The Protein Data Bank (PDB) is a repository for the 3-D
structural data of large biological molecules, such as proteins and
nucleic acids. By default T-Coffee produces alignments in the Clustal
aln format (Clustal is a widely used multiple sequence alignment
program), but it can also produce PIR and FASTA formats. PIR is the
format of the Protein Information Resource, a major protein
information database and bioinformatics resource. FASTA format (a.k.a.
Pearson format) is a text-based format for representing either
nucleotide sequences or peptide sequences, in which nucleotides or
amino acids are represented using single-letter codes; it also allows
sequence names and comments to precede the sequences. The most common
input formats are supported (FASTA, PIR).
Secondary structure prediction Tool (Jpred)

www.compbio.dundee.ac.uk/www-jpred/
Jpred is a protein secondary structure prediction server that has been
in operation since approximately 1998 (46). It is a web server that
takes a protein sequence or a multiple alignment of protein sequences
and from these predicts secondary structure using a neural network
called Jnet. The server accepts two input types: a family of aligned
protein sequences or a single protein sequence. If a single protein
sequence is submitted, an automatic process creates a multiple
sequence alignment prior to prediction (47).
Six prediction methods, each using a different heuristic, are then
run: DSC (linear discrimination), NNSSP (nearest neighbours), PHD
(jury-decision neural network), MULPRED (consensus combination of
single-sequence methods), PREDATOR (hydrogen-bonding propensities) and
ZPRED (conservation-number-weighted prediction). The results from each
method are combined into a single file format. Predictions and the
corresponding sequence alignment are rendered in coloured HTML, Java
and PostScript.
The consensus prediction achieved an average Q3 score of 72.9%, where
Q3 is the percentage of residues predicted correctly for the three
conformational states: strand, helix and loop. The prediction assigns
each amino acid residue to alpha helix ('H'), beta sheet ('E') or
random coil ('-').
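As a small illustration of the Q3 measure, the Python sketch below
computes the percentage of residues whose predicted state matches a
reference assignment. The function name q3_score and the two short
state strings are made up for the example and are not Jpred output.

```python
def q3_score(predicted, observed):
    """Percentage of positions where the predicted state ('H', 'E' or
    '-') equals the observed state."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("state strings must be non-empty and equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


# Hypothetical 8-residue example: 6 of 8 states agree.
predicted = "HHHEE---"
observed = "HHHHE--E"
score = q3_score(predicted, observed)
print(f"Q3 = {score:.1f}%")
```

Q3 treats all three states equally, so a predictor can score well on a
protein dominated by one state even if it misses the rarer ones.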
MEGA
www.megasoftware.net/
Molecular Evolutionary Genetics Analysis (MEGA) software focuses on
facilitating the exploration and analysis of DNA and protein sequence
variation from an evolutionary perspective. Currently in its fourth
major release, MEGA 4.0 contains facilities for automatic and manual
sequence alignment, web-based mining of databases, inference of
phylogenetic trees, estimation of evolutionary distances and testing
of evolutionary hypotheses.
It is designed for comparative analysis of homologous gene sequences
either from multigene families or from different species with a special
emphasis on inferring evolutionary relationships and patterns of DNA
and protein evolution. MEGA provides many convenient facilities for
the assembly of sequence data sets from files or web-based
repositories, and it includes tools for visual presentation of the results
obtained in the form of interactive phylogenetic trees and evolutionary
distance matrices.(48)
PHYLIP
http://evolution.genetics.washington.edu/phylip.html
PHYLIP (the PHYLogeny Inference Package) is a package of programs
for inferring phylogenies (evolutionary trees). It is available free over
the Internet. The programs are controlled through a menu, which asks
the users which options they want to set and allows them to start the
computation. Most of the programs look for the data in a file called
"infile"; if they do not find this file, they ask the user to type in
the file name of the data file. Output is written to files with
names like "outfile" and "outtree". Trees written to "outtree" are in
the Newick format. There is no mouse-windows interface for PHYLIP
(49).
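Because "outtree" holds a plain Newick string, the tip labels of a tree can be recovered with very little code. The sketch below is a simplified reader that assumes unquoted labels; the function name and the toy tree are hypothetical.

```python
import re


def newick_leaf_names(newick: str) -> list:
    """Return leaf labels in the order they appear in a Newick string.

    A leaf label starts right after "(" or "," and runs up to the next
    ":", ",", ")" or ";". Labels that follow ")" are internal node labels
    and are ignored. Quoted labels are not handled in this sketch.
    """
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick)


# Hypothetical toy tree in Newick format, with branch lengths.
tree_text = "((HUMAN:0.1,PANTR:0.1):0.2,\n(YEAST:0.3,SCHPO:0.3):0.2);"
print(newick_leaf_names(tree_text))
```

In practice the string would be read from the "outtree" file rather than typed in.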
TreeView
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
TreeView is a simple program for displaying phylogenies. It runs on
both Apple Macintosh and Windows PCs, using almost identical
interfaces. It can read many different tree file formats (including
NEXUS, PHYLIP, Hennig86, NONA, MEGA, and ClustalW/X) (50).
Fidelity Algorithm
The Fidelity Algorithm uses the taxonomic dendrogram as a benchmark
and compares all positions on it with the observed clades or clusters
in phylogenetic trees built from quantifiable parameters, such as MSAs
of nucleic acids and polypeptides (Milner and Modak, 2009).
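The details of the Fidelity Algorithm are not given in this report. Purely to illustrate the general idea of scoring an observed tree against a benchmark tree, the sketch below computes the percentage of benchmark clades that also occur in the observed tree. This is a simplified stand-in, not the Milner and Modak algorithm; the nested-tuple tree representation, the function names and the toy trees are all assumptions.

```python
def clades(tree) -> set:
    """Return the leaf set of every internal node of a nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def percent_shared_clades(benchmark, observed) -> float:
    """Percentage of benchmark clades that are also clades of the observed tree.

    Note: the root clade (all leaves) is counted, so it always matches when
    both trees share the same leaf set.
    """
    bench = clades(benchmark)
    return 100.0 * len(bench & clades(observed)) / len(bench)


# Hypothetical toy trees, not results from this study.
benchmark = ((("HUMAN", "PANTR"), "MOUSE"), ("YEAST", "SCHPO"))
observed = ((("HUMAN", "MOUSE"), "PANTR"), ("YEAST", "SCHPO"))
print(percent_shared_clades(benchmark, observed))
```

Here the observed tree misplaces one species, so one of the four benchmark clades is not recovered.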
Results and Discussion

I built a database in which group 1 has 38 species that contain
Ubiquitin, Calmodulin and Cu/Zn SOD, and group 2 has 44 species that
contain Actin, Alpha tubulin and Beta tubulin. For each protein, with
its list of species according to the group to which it belongs, I
generated three subdirectories, one for each of the concatenated
secondary structure sequences. I conducted multiple sequence
alignment for the directory and its three subdirectories separately
using T-Coffee and constructed phylogenetic trees. The fidelity of each
species cluster/clade was examined against the benchmark taxonomic
tree built for the given set of species used in the study. Table 1 shows
that for Ubiquitin the random coil is the most conserved while the beta
sheet is the least conserved. For Calmodulin, which has no beta sheet
sequence, the random coil is the most conserved secondary structure.
Similar patterns are observed for Actin and Beta tubulin, i.e. the
random coil is the most conserved while the beta sheet is the least.
Unlike the above-mentioned proteins, Alpha tubulin and Cu/Zn SOD show
the beta sheet as the most conserved secondary structure. In none of
the proteins does the alpha helix have the highest score; where all
three structures are present, its score lies between those of the
random coil and the beta sheet. This shows that the extent of
conservation of the alpha helix is intermediate between that of the
random coil and the beta sheet for the proteins under study. It appears
that in most cases the sequences in both α-helix and random coil are
the most conserved while β-sheet sequences are the least conserved.
I conclude that the stretches of amino acids in the alpha helix
and random coil have a significant role to play in defining the
phylogenetic status. It also appears that beta sheets play the least
significant role in deciding the evolutionary status of a species.
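The concatenated secondary structure sequences used as input in this section can be prepared from a per-residue state string of the kind Jpred reports ('H', 'E', '-'). The following is a minimal sketch under that assumption; the function name, the toy sequence and the toy state string are hypothetical.

```python
def concatenate_by_state(sequence: str, states: str) -> dict:
    """Join the residues assigned to each secondary structure state.

    Returns one concatenated string per state: 'H' (alpha helix),
    'E' (beta sheet) and '-' (random coil). Any other state symbol
    raises a KeyError.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and state string must have equal length")
    parts = {"H": [], "E": [], "-": []}
    for residue, state in zip(sequence, states):
        parts[state].append(residue)
    return {state: "".join(residues) for state, residues in parts.items()}


# Hypothetical toy input, not one of the proteins in this study.
result = concatenate_by_state("MKTAYIAKQR", "--HHHHEE--")
print(result)
```

Each of the three resulting strings would then be written to its own file for alignment and tree construction.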
Table 1. Per cent Taxonomic Fidelity* of polypeptide secondary
structures subjected to multiple sequence alignment and construction
of phylogenetic trees.

Polypeptide     Full seq   Alpha helix   Random coil   Beta sheet
Ubiquitin       59.5       49.45         59.9          41.58
Calmodulin      57.87      57.51         58.22         --
Cu/Zn SOD       65.03      --            62.16         63.22
Actin           64.88      63.35         65.22         61.33
Alpha tubulin   63.61      60.08         58.72         65.47
Beta tubulin    66.80      64.67         67.52         62.22

* Taxonomic Fidelity Algorithm. M. Milner & S.P. Modak (2008)
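The ranking of secondary structures discussed above can be checked directly from the tabulated values. The sketch below copies the per-structure fidelity scores from the table (missing entries as None) and ranks the structures for each protein; the variable and function names are illustrative only.

```python
# Per cent taxonomic fidelity copied from the table; None marks a missing entry.
FIDELITY = {
    "Ubiquitin":     {"Alpha helix": 49.45, "Random coil": 59.9,  "Beta sheet": 41.58},
    "Calmodulin":    {"Alpha helix": 57.51, "Random coil": 58.22, "Beta sheet": None},
    "Cu/Zn SOD":     {"Alpha helix": None,  "Random coil": 62.16, "Beta sheet": 63.22},
    "Actin":         {"Alpha helix": 63.35, "Random coil": 65.22, "Beta sheet": 61.33},
    "Alpha tubulin": {"Alpha helix": 60.08, "Random coil": 58.72, "Beta sheet": 65.47},
    "Beta tubulin":  {"Alpha helix": 64.67, "Random coil": 67.52, "Beta sheet": 62.22},
}


def rank_structures(scores: dict) -> list:
    """Structure names ordered from highest to lowest fidelity, skipping missing ones."""
    available = {name: value for name, value in scores.items() if value is not None}
    return sorted(available, key=available.get, reverse=True)


# Count how often each structure has the highest fidelity.
top_counts = {}
for protein, scores in FIDELITY.items():
    top = rank_structures(scores)[0]
    top_counts[top] = top_counts.get(top, 0) + 1
    print(protein, rank_structures(scores))
print(top_counts)
```

The random coil ranks first in four of the six proteins and the beta sheet in the remaining two; the alpha helix never ranks first.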
CONCLUSION
• From the study conducted on six proteins, I conclude that the random
coil is, in most cases, the most conserved sequence, in the sense
that it changes least during evolution.
• Beta sheet sequences are, in most cases, the least conserved, i.e.
they change the most during evolution.
• It also appears that beta sheets play the least significant role in
deciding the evolutionary status of a species.
• The alpha helix is intermediate between the random coil and the beta
sheet in its contribution to evolution.
BIBLIOGRAPHY
1. Futuyma, Douglas J. (2005). Evolution. Sunderland,
Massachusetts: Sinauer Associates.
2. Worth CL, Gong S, Blundell TL. Structural and functional
constraints in the evolution of protein families. Biochemistry
Department, University of Cambridge, UK.
Gong S, Worth CL, Bickerton GR, Lee S, Tanramluk D, Blundell TL.
Structural and functional restraints in the evolution of protein
families and superfamilies. Department of Biochemistry, University
of Cambridge, Cambridge CB2 1GA, UK.
Hubbard, T. J., and T. L. Blundell. 1987. Comparison of
solvent-inaccessible cores of homologous proteins: definitions useful
for protein modelling. Protein Eng 159-171.
3. Pauling L, Corey RB, Branson HR (1951). "The structure of
proteins; two hydrogen-bonded helical configurations of the
polypeptide chain". Proc Natl Acad Sci USA 37 (4): 205–211.
4. Neurath, H (1940). "Intramolecular folding of polypeptide chains
in relation to protein structure". Journal of Physical Chemistry
44: 296–305.
5. Bull, J.J. and H.A. Wichman. (2001) Applied evolution. Ann.
Rev. Ecol. System. 16:183-217.
6. Neighbour Joining Method (Saitou and Nei, 1987).
7. Bruno WJ, Socci ND, Halpern AL . Weighted neighbor joining: a
likelihood-based approach to distance-based phylogeny
reconstruction. Mol Biol Evol. 2000 Jan;17(1):189-97.
8. Olivier Gascuel and Mike Steel. Neighbor-Joining Revealed.
Molecular Biology and Evolution 2006 23(11):1997-2000;
doi:10.1093/molbev/msl072
9. Desper, Richard, Gascuel, Olivier Distance-based Phylogeny
Reconstruction (Optimal Radius), **1999, Atteson; 2005, Elias,
Lagergren (2008)
10. Ranwez, Vincent Gascuel, Olivier Improvement of Distance-
Based Phylogenetic Methods by a Local Maximum Likelihood
Approach Using... (2002)
11. Camin, J. H., and R. R. Sokal. 1965. A method for deducing
branching sequences in phylogeny. Evolution 19: 311-326.
12. Felsenstein, J. 1981a. Evolutionary trees from DNA
sequences: a maximum likelihood approach. J. Molecular
Evolution 17: 368-376.
13. John P. Huelsenbeck and Fredrik Ronquist. MRBAYES: Bayesian
inference of phylogenetic trees. Bioinformatics Vol. 17 no. 8
2001, Pages 754-755.
14. David L. Swofford, Florida State University PAUP*:
Phylogenetic Analysis Using Parsimony (and Other Methods) 4.0
Beta.
15. Gonzalo Giribet TNT: Tree Analysis Using New Technology
Systematic Biology 2005 54(1):176-178.
16. R.R Sokal Phenetic Taxonomy; Theory and Methods,
Annual Review of Ecology and systematics.
17. Harvey P.H., and M.D. Pagel. 1991. The Comparative
Method in Evolutionary Biology. Oxford University Press, Oxford
and New York. 239 pp.
18. http://en.wikipedia.org/wiki/Ubiquitin
19. Case Study: Ubiquitin Eduardo Cruz-Chu and JC Gumbart.
20. Essays in Biochemistry, Volume 41 (2005): The Ubiquitin-
Proteasome System (Portland Press).
21. McCord JM, Fridovich I (1988). "Superoxide dismutase: the first
twenty years (1968-1988)". Free Radic. Biol. Med. 5 (5-6): 363–
9.
22. http://en.wikipedia.org/wiki/Superoxide_dismutase
23. Borgstahl GE, Parge HE, Hickey MJ, Beyer WF Jr, Hallewell RA,
Tainer JA (1992). "The structure of human mitochondrial
manganese superoxide dismutase reveals a novel tetrameric
interface of two 4-helix bundles.". Cell 71 (1): 107–18.
24. Barondeau DP, Kassmann CJ, Bruns CK, Tainer JA, Getzoff ED
(2004). "Nickel superoxide dismutase structure and
mechanism". Biochemistry (25): 8038–47.
25. Tainer JA, Getzoff ED, Richardson JS, Richardson DC (1983).
"Structure and mechanism of copper, zinc superoxide
dismutase.". Nature (5940): 284–7.
26. http://en.wikipedia.org/wiki/Calmodulin.
27. Stevens FC (1983). "Calmodulin: an introduction". Can. J.
Biochem. Cell Biol. 61 (8): 906–10.
28. Chin D, Means AR (2000). "Calmodulin: a prototypical calcium
sensor". Trends Cell Biol. 10 (8): 322–8
29. http://en.wikipedia.org/wiki/Actin
30. Kabsch W, Vandekerckhove J Structure and function of
actin. Annu Rev Biophys Biomol Struct. 1992;21:49-76.
31. Tobacman LS, Korn ED. The kinetics of actin nucleation and
polymerization. J Biol Chem. 1983 Mar 10;258(5):3207-14.
32. Higaki T, Sano T, Hasezawa S. Actin microfilament
dynamics and actin side-binding proteins in plants. Curr Opin
Plant Biol. 2007 Dec;10(6):549-56.
33. Nogales E, Downing KH, Amos LA, Löwe J (June 1998). "Tubulin
and FtsZ form a distinct family of GTPases". Nat Struct Biol 5
(6): 451–8.
34. Heald R, Nogales E (January 2002). "Microtubule dynamics". J
Cell Sci 115 (Pt 1)
35. McNally FJ, Vale RD (November 1993). "Identification of katanin,
an ATPase that severs and disassembles stable microtubules".
Cell 75 (3): 419–29.
36. Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method
for fast and accurate multiple sequence alignment. J Mol Biol.
2000 Sep 8;302(1):205-17
37. http://www.bioinformatics.uthscsa.edu/www/phylip/
38. Readseq by D.G. Gilbert, 2.1.26 (18-Oct-2007) Software at
http://iubio.bio.indiana.edu/soft/molbio/readseq/java/
39. Saitou and Nei, 1987 Neighbour Joining Method
40. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol
Biol Evol. 2007 Aug;24(8):1596-9. Epub 2007 May 7.
41. Cole C, Barber JD & Barton GJ. Nucleic Acids Res. 2008
42. Feng DF, Doolittle RF. Progressive sequence alignment as a
prerequisite to correct phylogenetic trees. J Mol Evol.
1987;25(4):351-60.
43. Christian Cole, Jonathan D. Barber and Geoffrey J. Barton*
The Jpred 3 secondary structure prediction server
44. Cuff JA, Barton GJ. Application of multiple sequence alignment
profiles to improve protein secondary structure prediction.
Proteins. 2000 Aug 15;40(3):502-11.
45. Kumar S, Nei M, Dudley J, Tamura K. MEGA: a biologist-centric
software for evolutionary analysis of DNA and protein
sequences. Brief Bioinform. 2008 Jul;9(4):299-306. Epub 2008
Apr 16.
46. Joseph Felsenstein (August, 2003). Inferring Phylogenies.
47. Page RD. TreeView: an application to display phylogenetic trees
on personal computers. Comput Appl Biosci. 1996
Aug;12(4):357-8.
48. Milner (2008) Ph.D. Thesis, University of Karnataka,
Dharwad, in preparation.
APPENDIX
Phylogenetic trees of the various proteins are listed in this appendix.
These phylogenetic trees were constructed using the NJ method.
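For readers unfamiliar with the NJ (neighbour-joining) method, the sketch below shows its first step only: from a distance matrix it computes the Q criterion for every pair of taxa, selects the pair with the smallest Q, and computes the branch lengths from that pair to their new parent node. It is an illustration of the standard formulas, not the code used to build the trees in this appendix; the function name and the five-taxon toy matrix are assumptions.

```python
def nj_first_join(names: list, dist: list):
    """First neighbour-joining step on a symmetric distance matrix (n >= 3).

    Returns the selected pair of names, its Q value, and the branch
    lengths from each member of the pair to the new internal node.
    """
    n = len(names)
    totals = [sum(row) for row in dist]  # sum of distances from each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    len_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), q, (len_i, len_j)


# Toy distances for five hypothetical taxa, not data from this study.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, dist))
```

The full method repeats this step, replacing the joined pair with the new node and updating the distances, until the tree is complete.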
Figure 1 Ubiquitin Full seq
[Tree figure; tip labels: CULQU, BOTFB, TRYCR, APLCA, CANAL, NEOFI,
PARBR, YEAST, PICGU, SCHPO, NEUCR, CHAGB, KLULA, ARATH, PEA, HELAN,
MAIZE, ORYSJ, SOLLC, SOYBN, WHEAT, CAEEL, DICDI, DANRE, SHEEP, MOUSE,
RABIT, XENLA, BOVIN, CAVPO, CHICK, DROME, ONCMY, SALSA, HUMAN, PANTR,
RAT, AEDAE]
Figure 2 Ubiquitin concat of random coil seq
[Tree figure; tip labels: CULQU, BOTFB, CAEEL, WHEAT, ORYSJ, MAIZE,
HELAN, ARATH, SOYBN, SOLLC, PEA, TRYCR, PICGU, NEUCR, PARBR, KLULA,
NEOFI, DICDI, XENLA, RAT, RABIT, MOUSE, HUMAN, DROME, DANRE, CAVPO,
SHEEP, CHICK, PANTR, SALSA, ONCMY, BOVIN, SCHPO, YEAST, CANAL, CHAGB,
APLCA, AEDAE]
Figure 3 Ubiquitin concat of alpha helix seq
[Tree figure; tip labels: CULQU, BOTFB, TRYCR, MOUSE, CAVPO, ONCMY,
HUMAN, SALSA, RABIT, RAT, SHEEP, BOVIN, DICDI, CHICK, DROME, PANTR,
XENLA, CAEEL, APLCA, ARATH, HELAN, MAIZE, ORYSJ, PEA, WHEAT, CANAL,
PICGU, KLULA, PARBR, YEAST, SCHPO, CHAGB, NEOFI, NEUCR, SOLLC, SOYBN,
DANRE, AEDAE]
Figure 4 Ubiquitin concat of beta sheet seq
[Tree figure; tip labels: CHAGB, SALSA, HUMAN, ORYSJ, SHEEP, CHICK,
PICGU, MOUSE, RABIT, DICDI, NEOFI, MAIZE, PEA, SOYBN, XENLA, APLCA,
PARBR, HELAN, NEUCR, RAT, WHEAT, BOTFB, CULQU, BOVIN, CAEEL, CANAL,
CAVPO, DROME, ARATH, ONCMY, SCHPO, YEAST, DANRE, TRYCR, KLULA, PANTR,
SOLLC, AEDAE]
Figure 5 Calmodulin Full seq
[Tree figure; tip labels: CULQU, APLCA, DROME, CAEEL, ARATH, HELAN,
PEA, ORYSJ, MAIZE, WHEAT, SOYBN, SOLLC, TRYCR, BOTFB, NEOFI, PARBR,
CHAGB, NEUCR, CANAL, PICGU, KLULA, YEAST, SCHPO, DICDI, PANTR, BOVIN,
MOUSE, RAT, SHEEP, CAVPO, SALSA, CHICK, DANRE, HUMAN, ONCMY, RABBIT,
XENLA, AEDAE]
Figure 6 Calmodulin random coil seq
[Tree figure; tip labels: CULQU, DROME, APLCA, CAEEL, ARATH, PEA,
MAIZE, ORYSJ, HELAN, WHEAT, SOLLC, BOTFB, PARBR, NEOFI, CHAGB, NEUCR,
CANAL, PICGU, KLULA, YEAST, SCHPO, TRYCR, DICDI, SOYBN, ONCMY, SALSA,
PANTR, CHICK, DANRE, BOVIN, MOUSE, RABBIT, RAT, SHEEP, CAVPO, HUMAN,
XENLA, AEDAE]
Figure 7 Calmodulin alpha helix seq
[Tree figure; tip labels: CAEEL, CULQU, APLCA, DROME, ARATH, SOYBN,
HELAN, PEA, ORYSJ, MAIZE, WHEAT, SOLLC, TRYCR, BOTFB, CHAGB, NEUCR,
NEOFI, PARBR, CANAL, PICGU, KLULA, YEAST, SCHPO, DICDI, BOVIN, DANRE,
MOUSE, RAT, CAVPO, CHICK, SALSA, HUMAN, ONCMY, SHEEP, PANTR, RABBIT,
XENLA, AEDAE]
Figure 8 Cu/Zn Superoxide Dismutase Full seq
[Tree figure; tip labels: CULQU, DROME, APLCA, ARATH, WHEAT, HELAN,
SOLLC, PEA, SOYBN, ORYSJ, MAIZE, CAEEL, DANRE, ONCMY, SALSA, BOVIN,
SHEEP, HUMAN, PANTR, RABBIT, CAVPO, MOUSE, RAT, PICGU, TRYCR, XENLA,
CHICK, BOTFB, NEOFI, CHAGB, NEUCR, PARBR, CANAL, KLULA, YEAST, SCHPO,
DICDI, AEDAE]
Figure 9 Cu/Zn Superoxide Dismutase random coil seq
[Tree figure; tip labels: CULQU, DROME, CHICK, CAEEL, APLCA, ARATH,
HELAN, ORYSJ, SOYBN, SOLLC, MAIZE, PEA, WHEAT, BOTFB, YEAST, CANAL,
NEOFI, PARBR, CHAGB, NEUCR, KLULA, SCHPO, DICDI, PICGU, TRYCR, BOVIN,
SHEEP, HUMAN, PANTR, RABBIT, CAVPO, MOUSE, RAT, DANRE, ONCMY, SALSA,
XENLA, AEDAE]
Figure 10 Cu/Zn Superoxide Dismutase beta sheet seq
[Tree figure; tip labels: CULQU, DROME, TRYCR, APLCA, ARATH, WHEAT,
DICDI, BOTFB, NEOFI, CHAGB, NEUCR, KLULA, PARBR, CANAL, SCHPO, YEAST,
CAEEL, PICGU, HELAN, PEA, SOYBN, SOLLC, MAIZE, ORYSJ, DANRE, ONCMY,
SALSA, XENLA, CHICK, HUMAN, PANTR, BOVIN, SHEEP, MOUSE, RAT, CAVPO,
RABBIT, AEDAE]
Figure 11 Actin full seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, GIALA, PLAYO, PLAF7,
TOXGO, KARMI, ARATH, DAUCA, HORVU, WHEAT, ORYSJ, PEA, GOSHI, MAIZE,
CHLRE, ASPFU, BOTFB, NEOFI, NEUCR, CHAGB, CANAL, YEAS7, YEAST, SCHPO,
PNECA, DICDI, AEDAE, BOMMO, CAEEL, DANRE, SALSA, XENLA, XENTR, CHICK,
HUMAN, MACFA, MOUSE, PANTR, PIG, RAT, 9TRYP]
Figure 12 Actin random coil seq
[Tree figure; tip labels: TRYCR, LEIMA, GIALA, PLAYO, EUGGR, PLAF7,
KARMI, TOXGO, CHLRE, ARATH, DAUCA, PEA, GOSHI, HORVU, WHEAT, MAIZE,
ORYSJ, ASPFU, BOTFB, NEOFI, NEUCR, SCHPO, CHAGB, CANAL, YEAS7, YEAST,
PNECA, AEDAE, BOMMO, CAEEL, DANRE, DICDI, SALSA, XENLA, XENTR, CHICK,
HUMAN, MACFA, MOUSE, PANTR, PIG, RAT, 9TRYP]
Figure 13 Actin alpha helix seq
[Tree figure; tip labels: TRYCR, LEIMA, GIALA, PLAYO, EUGGR, KARMI,
TOXGO, PLAF7, ASPFU, NEUCR, BOTFB, NEOFI, CHAGB, CANAL, YEAS7, YEAST,
CHLRE, ARATH, HORVU, WHEAT, MAIZE, PEA, ORYSJ, GOSHI, BOMMO, DAUCA,
DICDI, CAEEL, DANRE, AEDAE, XENTR, SALSA, XENLA, PNECA, SCHPO, CHICK,
HUMAN, MACFA, MOUSE, PANTR, PIG, RAT, 9TRYP]
Figure 14 Actin beta sheet
[Tree figure; tip labels: LEIMA, TRYCR, EUGGR, GIALA, CHLRE, ARATH,
MAIZE, DAUCA, PEA, HORVU, WHEAT, ORYSJ, GOSHI, PLAF7, TOXGO, ASPFU,
PLAYO, CANAL, BOTFB, NEUCR, NEOFI, CHAGB, YEAS7, YEAST, SCHPO, PNECA,
BOMMO, KARMI, CAEEL, SALSA, XENLA, AEDAE, DANRE, DICDI, XENTR, CHICK,
HUMAN, MACFA, MOUSE, PANTR, PIG, RAT, 9TRYP]
Figure 15 Alpha tubulin Full seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, ARATH, HORVU, MAIZE,
ORYSJ, DAUCA, GOSHI, PEA, WHEAT, CHLRE, KARMI, PLAF7, PLAYO, TOXGO,
GIALA, ASPFU, NEOFI, BOTFB, CHAGB, SCHPO, PNECA, CANAL, YEAS7, YEAST,
NEUCR, DICDI, CAEEL, SALSA, XENLA, CHICK, MACFA, XENTR, AEDAE, BOMMO,
DANRE, HUMAN, MOUSE, PANTR, PIG, RAT, 9TRYP]
Figure 16 Alpha tubulin random coil seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, ARATH, HORVU, MAIZE,
ORYSJ, GOSHI, PEA, WHEAT, DACAU, PLAF7, PLAYO, CHLRE, KARMI, TOXGO,
GIALA, ASPFU, NEOFI, BOTFB, CHAGB, SCHPO, CANAL, YEAS7, YEAST, PNECA,
NEUCR, DICDI, SALSA, CAEEL, XENLA, AEDAE, BOMMO, CHICK, MACFA, XENTR,
DANRE, PIG, HUMAN, MOUSE, PANTR, RAT, 9TRYP]
Figure 17 Alpha tubulin alpha helix seq
[Tree figure; tip labels: TRYCR, LEIMA, CHLRE, EUGGR, KARMI, TOXGO,
ARATH, HORVU, MAIZE, ORYSJ, DACAU, WHEAT, PEA, GOSHI, DICDI, PLAF7,
PLAYO, GIALA, ASPFU, NEOFI, BOTFB, CHAGB, NEUCR, PNECA, CANAL, YEAST,
YEAS7, SCHPO, CAEEL, XENLA, MACFA, XENTR, DANRE, AEDAE, BOMMO, SALSA,
CHICK, PIG, HUMAN, MOUSE, PANTR, RAT, 9TRYP]
Figure 18 Alpha tubulin beta sheet seq
[Tree figure; tip labels: LEIMA, TRYCR, ARATH, EUGGR, GOSHI, HORVU,
ORYSJ, MAIZE, CHLRE, PEA, WHEAT, DAUCA, DICDI, KARMI, PLAF7, PLAYO,
TOXGO, ASPFU, NEOFI, CHAGB, BOTFB, CANAL, YEAS7, YEAST, GIALA, NEUCR,
SCHPO, CAEEL, PNECA, XENLA, XENTR, AEDAE, BOMMO, DANRE, HUMAN, MOUSE,
PANTR, RAT, PIG, CHICK, MACFA, SALSA, 9TRYP]
Figure 19 Beta tubulin full seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, KARMI, PLAF7, PLAYO,
TOXGO, ARATH, DAUCA, GOSHI, PEA, HORVU, MAIZE, WHEAT, ORYSJ, CHLRE,
ASPFU, NEOFI, BOTFB, CHAGB, NEUCR, CANAL, YEAS7, YEAST, SCHPO, PNECA,
DICDI, GIALA, BOMMO, CAEEL, CHICK, MACFA, AEDAE, PIG, DANRE, SALSA,
XENLA, XENTR, HUMAN, MOUSE, PANTR, RAT, 9TRYP]
Figure 20 Beta tubulin random coil seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, KARMI, PLAF7, PLAYO,
TOXGO, CHLRE, GIALA, ARATH, DAUCA, PEA, GOSHI, MAIZE, WHEAT, HORVU,
ORYSJ, ASPFU, NEOFI, BOTFB, CHAGB, NEUCR, CANAL, YEAS7, YEAST, SCHPO,
PNECA, DICDI, AEDAE, BOMMO, CAEEL, CHICK, MACFA, PIG, DANRE, SALSA,
XENLA, XENTR, MOUSE, HUMAN, PANTR, RAT, 9TRYP]
Figure 21 Beta tubulin Alpha helix seq
[Tree figure; tip labels: TRYCR, LEIMA, EUGGR, ARATH, DAUCA, MAIZE,
WHEAT, GOSHI, HORVU, PEA, ORYSJ, CHLRE, PLAF7, PLAYO, TOXGO, KARMI,
ASPFU, NEOFI, BOTFB, NEUCR, CHAGB, PNECA, CANAL, YEAS7, YEAST, SCHPO,
DICDI, GIALA, AEDAE, BOMMO, CHICK, MACFA, CAEEL, PIG, DANRE, SALSA,
XENLA, XENTR, PANTR, HUMAN, MOUSE, RAT, 9TRYP]
Figure 22 Beta tubulin Beta sheet seq
[Tree figure; tip labels: TRYCR, LEIMA, DICDI, ARATH, GOSHI, WHEAT,
HORVU, PEA, MAIZE, ORYSJ, DAUCA, EUGGR, CHLRE, KARMI, PLAF7, PLAYO,
TOXGO, GIALA, CHICK, MACFA, AEDAE, ASPFU, NEOFI, BOTFB, CHAGB, NEUCR,
CANAL, YEAS7, YEAST, PNECA, BOMMO, CAEEL, SCHPO, PIG, DANRE, XENLA,
XENTR, SALSA, HUMAN, MOUSE, PANTR, RAT, 9TRYP]
OBJECTIVES
• To determine the taxonomic significance of phylogenetic trees.
• To determine the phylogenetic significance of polypeptide chains
covering the three conformations (alpha helix, beta sheet and
random coil).
• To assess the fidelity of the phylogenetic tree with that of NCBI
trees for the same polypeptides with the same set of species.