Multiple Alignment PDF

Multiple Sequence Alignment
Colin Dewey
BMI/CS 576
www.biostat.wisc.edu/bmi576/
colin.dewey@wisc.edu
Fall 2016
Key concepts
The Multiple Sequence Alignment Problem

Scoring Multiple Sequence Alignments
Scoring an alignment of a profile and a sequence
Heuristic Algorithms for Multiple Sequence Alignment
General strategies
Progressive alignment
Star alignment
Tree-based alignment
Iterative alignment
What is multiple sequence alignment?
Given: three or more related biological sequences
Do: identify the subsets of positions across sequences that are truly related
In other words: find a simultaneous alignment of all input sequences such that the implied pairwise alignments identify the truly related positions between each pair of sequences
An example multiple sequence alignment
Why multiple sequence alignment?
Build phylogenetic trees (next module)
Determine evolutionary relationships between sequences
A multiple sequence alignment can represent a family of proteins with similar function
Compare a new sequence to a family of known proteins
For example, the BLOCKS database used to derive the BLOSUM matrices contains several ungapped alignments for known protein families
Discover common signatures or protein domains among a group of proteins
Identify genetic variation among individuals of a population
The tasks in Multiple Sequence Alignment
Scoring an alignment
Algorithms for creating an alignment
Some notation
Let $m$ denote a multiple sequence alignment
$m_i$ is the $i$th column of the alignment $m$
$m_i^j$ is the entry in the $i$th column and $j$th row
$c_i^a$ is the count of residue $a$ in column $i$
Example using notation
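Columns are indexed by i (left to right) and rows by j (top to bottom)

    i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    j=1   G  A  R  F  I  E  L  D  T  H  E  F  A  T  C  A  T
    j=2   G  A  R  F  I  E  L  D  T  H  E  -  -  -  C  A  T
    j=3   G  A  R  F  I  E  L  D  T  H  A  T  -  -  C  A  T
    j=4   G  A  R  R  Y  -  L  I  K  E  D  A  -  -  C  A  T

$m_2$ and $m_{10}$ are the 2nd and 10th columns
$m_3^1 = \mathrm{R}$
$c_3^{\mathrm{A}} = 0$
$c_2^{\mathrm{A}} = 4$
$c_{10}^{\mathrm{H}} = 3$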
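The notation can be checked mechanically. The following is a minimal Python sketch, assuming the four rows of the example as read off the slide; the helper names `column`, `entry`, and `count` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Rows of the example alignment as read off the slide (row j = 1 first).
alignment = [
    "GARFIELDTHEFATCAT",
    "GARFIELDTHE---CAT",
    "GARFIELDTHAT--CAT",
    "GARRY-LIKEDA--CAT",
]

def column(m, i):
    """Return column m_i as a string; i is 1-based, as in the notation."""
    return "".join(row[i - 1] for row in m)

def entry(m, i, j):
    """Return m_i^j, the character in column i and row j (both 1-based)."""
    return m[j - 1][i - 1]

def count(m, i, a):
    """Return c_i^a, the number of occurrences of residue a in column i."""
    return Counter(column(m, i))[a]

print(entry(alignment, 3, 1))     # entry in column 3, row 1
print(count(alignment, 10, "H"))  # number of H characters in column 10
```

The indices are shifted by one inside the helpers because Python strings are 0-based while the slide notation is 1-based.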
Scoring a Multiple Sequence Alignment (MSA)
Key issue: how do we score a multiple sequence alignment?
Usually, we assume that columns of an alignment are independent
$$\mathrm{Score}(m) = G(m) + \sum_i S(m_i)$$
where $G(m)$ is the gap function and $S(m_i)$ is the score of the $i$th column
For now, we will simplify the score by assuming a linear gap penalty
$$\mathrm{Score}(m) = \sum_i S(m_i)$$
Gap penalty (G)
We will use a simple linear gap penalty function
Penalty for a space: $s$
Let $S(a,b)$ denote the score for substituting $a$ by $b$.
The linear gap penalty can be incorporated into the substitution matrix
$S(a,-) = S(-,a) = -s$
$S(-,-) = 0$
Two common ways of scoring a multiple alignment
Entropy based scores
Sum of pairs
Entropy of a distribution
A measure of uncertainty of an outcome
For a discrete distribution $P(X)$, where $X$ takes $k$ values $x_1, \ldots, x_k$, it is defined as
$$H(X) = -\sum_{i=1}^{k} P(x_i) \log P(x_i)$$
Entropy is greatest when we are most uncertain, that is, for a uniform distribution
Entropy is least when we are most certain, e.g. for a deterministic event
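As a quick numerical check of the two extremes, here is a small Python sketch of the entropy of a discrete distribution. The base-2 logarithm and the function name `entropy` are assumptions made for this illustration; the slide does not fix a base.

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities.

    Terms with probability 0 are skipped, following the usual convention
    that 0 * log(0) is taken to be 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]      # most uncertain over 4 outcomes
deterministic = [1.0, 0.0, 0.0, 0.0]    # a certain outcome

print(entropy(uniform))
print(entropy(deterministic))
```

Any distribution over four outcomes falls between these two values.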
Score of a column: Entropy based
Score of the $i$th column of alignment $m$ is
$$S(m_i) = -\sum_a c_i^a \log p_i^a$$
$p_i^a$: probability of character $a$ in column $i$
$c_i^a$: number of occurrences of $a$ in column $i$
This has an entropy-based interpretation
Let $X_i$ be a random variable representing a character in column $i$
Consider each entry of column $i$ to be an observation of $X_i$ across multiple independent experiments
We estimate $P(X_i = a)$ by $p_i^a = c_i^a / n$, where $n$ is the number of entries in the column
Column score is proportional to the entropy of $X_i$: substituting $c_i^a = n\,p_i^a$ gives $S(m_i) = n\,H(X_i)$
Scoring an alignment: Entropy based score
High entropy: more uniform distribution / more variability of characters
Low entropy: less uniform distribution / less variability of characters
$$S(m_i) = -\sum_a c_i^a \log p_i^a$$
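The entropy-based column score can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: probabilities are estimated as count divided by column height, gap characters are counted like any other character, logarithms are base 2, and the name `entropy_column_score` is hypothetical.

```python
import math
from collections import Counter

def entropy_column_score(col):
    """Entropy-based score of one column: -sum_a c_a * log2(c_a / n).

    col is a string holding the characters of one alignment column.
    Gaps are treated as ordinary characters (an assumption).
    """
    n = len(col)
    counts = Counter(col)
    return -sum(c * math.log2(c / n) for c in counts.values())

# A fully conserved column, a mostly conserved one, and a fully variable one.
for col in ("AAAA", "HHHE", "ACGT"):
    print(col, entropy_column_score(col))
```

Under this convention a perfectly conserved column scores 0 and more variable columns score higher, so lower totals correspond to better conserved alignments.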
Scoring of a column: Sum of Pairs
Compute the sum of the pairwise scores
$$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$$
Iterate over all pairs of rows in the column
$s(m_i^k, m_i^l)$: substitution score from a substitution/match matrix such as BLOSUM or PAM
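A sum-of-pairs score is straightforward to compute by iterating over all pairs of rows. The sketch below uses a toy scoring function (match 1, mismatch -1, gap -2, gap against gap 0) instead of BLOSUM or PAM; those values and the function names are assumptions chosen for illustration.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy substitution score with the linear gap penalty folded in."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs_column(col, score=pair_score):
    """Sum the pairwise scores over all pairs of rows k < l in one column."""
    return sum(score(a, b) for a, b in combinations(col, 2))

def sum_of_pairs(alignment, score=pair_score):
    """Score an alignment (list of equal-length rows) as a sum over columns."""
    return sum(sum_of_pairs_column(col, score) for col in zip(*alignment))

print(sum_of_pairs_column("HHHE"))
print(sum_of_pairs(["AC", "A-"]))
```

Passing a different `score` callable, for example a lookup into a real substitution matrix, changes the scoring without touching the summation.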
Algorithms for performing a Multiple Sequence Alignment
Dynamic programming
Not practical
Progressive alignment algorithms
Star alignment
Guide tree approach
Iterative alignment algorithms
Dynamic Programming (DP) for global multiple sequence alignment
Assume columns are independent
Score of alignment is sum of column scores
Generalization of methods for pairwise alignment
consider a k-dimensional matrix for k sequences (instead of a 2-dimensional matrix)
each matrix element represents the alignment score for k prefixes (instead of 2 prefixes)
Notation for DP
Assume we have $k$ sequences $x^1, \ldots, x^k$
$i_1$ denotes the length of the prefix for sequence 1
$i_2$ denotes the length of the prefix for sequence 2
...
$i_k$ denotes the length of the prefix for sequence $k$
$x^k_{i_k}$ denotes the character at position $i_k$ of sequence $x^k$
$F$: $k$-dimensional matrix where $F(i_1, i_2, \ldots, i_k)$ denotes the score of the best alignment of the $i_1, i_2, \ldots, i_k$ prefixes of the sequences
Recall the DP for the pairwise alignment
$$F(i_1, i_2) = \max \begin{cases} F(i_1 - 1, i_2 - 1) + S(x^1_{i_1}, x^2_{i_2}) \\ F(i_1, i_2 - 1) + S(-, x^2_{i_2}) \\ F(i_1 - 1, i_2) + S(x^1_{i_1}, -) \end{cases}$$
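The pairwise recurrence translates directly into a table-filling loop. The following Python sketch computes only the optimal score (no traceback); the match and mismatch values, the space penalty `s`, and the function name are assumptions made for this example.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, s=2):
    """Fill F for two sequences and return the optimal global score.

    F[i][j] holds the best score for aligning the first i characters of x
    with the first j characters of y; a space costs s (linear gap penalty).
    """
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = -s * i                    # prefix of x against gaps only
    for j in range(1, len(y) + 1):
        F[0][j] = -s * j                    # prefix of y against gaps only
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + sub,      # align x_i with y_j
                F[i][j - 1] - s,            # gap in x, consume y_j
                F[i - 1][j] - s,            # consume x_i, gap in y
            )
    return F[len(x)][len(y)]

print(global_alignment_score("ACGT", "AGT"))
```

The three arguments to `max` correspond, in order, to the three cases of the recurrence.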
DP for Multiple sequence alignment
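$$F(i_1, \ldots, i_k) = \max \begin{cases}
F(i_1 - 1, \ldots, i_k - 1) + S(x^1_{i_1}, \ldots, x^k_{i_k}) \\
F(i_1, i_2 - 1, \ldots, i_k - 1) + S(-, x^2_{i_2}, \ldots, x^k_{i_k}) \\
F(i_1 - 1, i_2, \ldots, i_k - 1) + S(x^1_{i_1}, -, \ldots, x^k_{i_k}) \\
\quad\vdots \\
F(i_1, i_2 - 1, \ldots, i_k) + S(-, x^2_{i_2}, \ldots, -) \\
\quad\vdots
\end{cases}$$
$F(i_1, \ldots, i_k)$: max score of alignment for the $k$ prefixes
How many items do we need to maximize over? $2^k - 1$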
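The count of 2^k - 1 comes from choosing, for each of the k sequences, whether the last column holds a residue or a gap, excluding the all-gap column. A small Python sketch makes this concrete and runs the k-dimensional DP on tiny inputs. It assumes sum-of-pairs column scoring with toy values (match 1, mismatch -1, gap -2); the function names are hypothetical and the memoized recursion is only practical for very short sequences.

```python
from functools import lru_cache
from itertools import combinations, product

def predecessor_moves(k):
    """All 0/1 vectors of length k except all zeros (1 = residue, 0 = gap)."""
    return [d for d in product((0, 1), repeat=k) if any(d)]

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def msa_dp_score(seqs):
    """Optimal sum-of-pairs score of a global alignment of all sequences."""
    moves = predecessor_moves(len(seqs))

    @lru_cache(maxsize=None)
    def F(idx):
        if not any(idx):                       # all prefixes empty
            return 0
        best = None
        for move in moves:
            prev = tuple(i - d for i, d in zip(idx, move))
            if min(prev) < 0:                  # move leaves the table
                continue
            col = [s[i - 1] if d else "-" for s, i, d in zip(seqs, idx, move)]
            cand = F(prev) + sum(pair_score(a, b) for a, b in combinations(col, 2))
            if best is None or cand > best:
                best = cand
        return best

    return F(tuple(len(s) for s in seqs))

print(len(predecessor_moves(3)))
print(msa_dp_score(["AC", "AC", "AC"]))
```

For k = 2 the three moves are exactly the diagonal, horizontal, and vertical steps of the pairwise recurrence.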
DP algorithm is too expensive
For $k$ sequences each of length $n$
Space complexity: $O(n^k)$
Time complexity: $O(n^k 2^k)$
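To get a feel for how quickly these bounds grow, the short Python sketch below counts table cells and candidate evaluations exactly for a table with indices 0..n in each dimension. The function names and the choice of n = 100 are assumptions made for illustration, and the per-cell cost of scoring a column is ignored.

```python
def dp_table_cells(n, k):
    """Number of cells in a k-dimensional DP table with indices 0..n."""
    return (n + 1) ** k

def dp_candidates(n, k):
    """Upper bound on candidates examined: 2^k - 1 predecessors per cell."""
    return dp_table_cells(n, k) * (2 ** k - 1)

for k in (2, 3, 5, 10):
    print(k, dp_table_cells(100, k), dp_candidates(100, k))
```

Already at k = 5 and n = 100 the table has more than ten billion cells, which is why exact DP is not practical beyond a handful of sequences.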
Heuristic algorithms for multiple sequence alignment
Progressive alignment
Build the alignment of a larger number of sequences from partial alignments of subsets of sequences
Iterative alignment
Possibly remove some of the aligned sequences and re-align to see if the score improves
Key heuristic: Align the most similar sequences first
Rely on pre-computed pairwise similarity/distance
Pairwise sequence alignments
Algorithms differ in the extent to which the pairwise similarity influences the final alignments
Two strategies
Star alignment
Tree alignments
Simple (quick and dirty) tree
At each step, combine two (possibly singleton) sets of sequences
Star Alignment Approach
Given: $k$ sequences to be aligned $x^1, \ldots, x^k$
pick one sequence $x^c$ as the center
for each $x^i \neq x^c$ determine an optimal alignment between $x^i$ and $x^c$
Aggregate pairwise alignments
Shift entire columns when incorporating gaps
return: multiple alignment resulting from aggregate
Picking the center in star alignments
Two possible approaches:
1. try each sequence as the center, return the best multiple alignment
2. compute all pairwise alignments and select the string $x^c$ that maximizes
$$\sum_{i \neq c} \mathrm{sim}(x^i, x^c)$$
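The second approach amounts to an argmax over candidate centers. The Python sketch below takes the similarity function as a parameter; `match_count` is only a hypothetical stand-in for a pairwise alignment score (it counts identical positions in equal-length strings), and both function names are assumptions for this illustration.

```python
def pick_center(seqs, sim):
    """Return the index c maximizing the sum of sim(x_i, x_c) over i != c."""
    def total(c):
        return sum(sim(seqs[i], seqs[c]) for i in range(len(seqs)) if i != c)
    return max(range(len(seqs)), key=total)

def match_count(x, y):
    """Stand-in similarity: number of positions where x and y agree."""
    return sum(a == b for a, b in zip(x, y))

seqs = ["AAAA", "AAAT", "AATT"]
center = pick_center(seqs, match_count)
print(seqs[center])
```

In practice `sim` would be the optimal pairwise alignment score, so choosing the center this way costs one pairwise alignment per pair of sequences.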
Aligning to an existing partial alignment
Need to treat each partial alignment as a single entity

Partial alignment should not be changed other than gap insertions
Shift entire columns when incorporating gaps
TGTTAAC -TGTTAAC
-TGT AAC -TGT-AAC
-TGT -AC -TGT--AC
ATGT --C ATGT---C
ATGT GGC ATGT-GGC
Star Alignment Example
Given:
    ATGGCCATT
    ATTGCCATT   (center)
    ATCCAATTTT
    ATCTTCTT
    ATTGCCGATT
Pairwise alignments with the center:
    ATTGCCATT    ATC-CAATTTT    ATCTTC-TT    ATTGCCGATT
    ATGGCCATT    ATTGCCATT--    ATTGCCATT    ATTGCC-ATT
Aggregate pairwise alignments
    Step   Present pair     Current multiple alignment
    1.     ATGGCCATT        ATTGCCATT
           ATTGCCATT        ATGGCCATT

    2.     ATC-CAATTTT      ATTGCCATT--
           ATTGCCATT--      ATGGCCATT--
                            ATC-CAATTTT

    3.     ATCTTC-TT        ATTGCCATT--
           ATTGCCATT        ATGGCCATT--
                            ATC-CAATTTT
                            ATCTTC-TT--

    4.     ATTGCCGATT       ATTGCC-ATT--
           ATTGCC-ATT       ATGGCC-ATT--
                            ATC-CA-ATTTT
                            ATCTTC--TT--
                            ATTGCCGATT--
Shift entire columns when incorporating a gap
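The aggregation step can be sketched as a merge driven by the center sequence: walk the center row of the current multiple alignment and the center row of the new pairwise alignment in parallel, and open an all-gap column whenever the pairwise alignment has a gap in the center that the multiple alignment lacks. The Python below is one possible way to do this, not necessarily the procedure used to produce the slide; the function name `add_to_star` and the convention that row 0 is the center are assumptions.

```python
def add_to_star(msa, center_aln, other_aln):
    """Add one pairwise alignment (center_aln, other_aln) to msa.

    msa is a list of equal-length rows whose first row is the center
    (possibly containing gaps). Existing columns are never changed; only
    whole gap columns are inserted.
    """
    center_row = msa[0]
    rows = [[] for _ in msa]
    new_row = []
    i = j = 0
    while i < len(center_row) or j < len(center_aln):
        msa_gap = i < len(center_row) and center_row[i] == "-"
        pair_gap = j < len(center_aln) and center_aln[j] == "-"
        if pair_gap and not msa_gap:
            # New gap in the center: shift by inserting an all-gap column.
            for r in rows:
                r.append("-")
            new_row.append(other_aln[j])
            j += 1
        elif msa_gap and not pair_gap:
            # Existing gap column: the new sequence gets a gap here.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new_row.append("-")
            i += 1
        else:
            # Same center residue (or a gap in both): consume both columns.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new_row.append(other_aln[j])
            i += 1
            j += 1
    return ["".join(r) for r in rows] + ["".join(new_row)]

# Reproduce the four steps of the example, starting from the first pair.
msa = ["ATTGCCATT", "ATGGCCATT"]
for center_aln, other_aln in [
    ("ATTGCCATT--", "ATC-CAATTTT"),
    ("ATTGCCATT", "ATCTTC-TT"),
    ("ATTGCC-ATT", "ATTGCCGATT"),
]:
    msa = add_to_star(msa, center_aln, other_aln)
print("\n".join(msa))
```

The merge relies on both center rows spelling the same ungapped sequence; it never revisits earlier columns, which is the "once a gap, always a gap" behavior of star alignment.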
Comments about Star alignment
Conceptually simple
Dependent only upon pairwise alignments
Does not consider any position-specific information of the partial multiple sequence alignment while aligning a new sequence to it
Tree-based progressive alignments
Align sequences according to a guide tree
leaves represent sequences
internal nodes represent alignments
Determine alignments from the bottom of the tree upward
return the multiple alignment represented at the root of the tree
One common variant: the CLUSTALW algorithm [Thompson et al. 1994]
Tree-based progressive alignment
Depending on the internal node in the tree, we may have to align
a sequence with a sequence
a sequence with a partial alignment
a partial alignment with a partial alignment
In all cases we have the option of inserting gaps or substitutions
For aligning alignments, we will use sum of pairs scoring
To choose between options we will use an idea similar to the pairwise sequence alignment case
Tree alignment example
Starting sequences
x1 TGTTAAC
x2 TGTAAC
x3 TGTAC
x4 ATGTC
x5 ATGTGGC
Create a guide tree
Using pairwise distances (we will cover this in subsequent lectures)
Approach similar to but simpler than phylogenetic trees
x1 x2 x3 x4 x5
Tree Alignment Example
Leaves of the guide tree:
    TGTTAAC   TGTAAC   TGTAC   ATGTC   ATGTGGC
Aligning x2 with x3:
    TGTAAC
    TGT-AC
Aligning x4 with x5:
    ATGT--C
    ATGTGGC
Aligning two alignments
    -TGTAAC
    -TGT-AC
    ATGT--C
    ATGTGGC
Aligning sequence to alignment (x1 with the alignment above)
    -TGTTAAC
    -TGT-AAC
    -TGT--AC
    ATGT---C
    ATGT-GGC
Scoring an alignment of partial alignments
Recall the sum of pairs score for a column $i$
$$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$$
Let 1 to $n$ represent sequences from the first alignment
Let $n+1$ to $N$ represent sequences from the second alignment, where $N$ denotes the total number of sequences
The score at column $i$ can be written as
$$\begin{aligned}
S(m_i) ={} & \sum_{k<l\le n} s(m_i^k, m_i^l) && \text{(within first alignment)} \\
 & + \sum_{n<k<l\le N} s(m_i^k, m_i^l) && \text{(within second alignment)} \\
 & + \sum_{k\le n,\; n<l\le N} s(m_i^k, m_i^l) && \text{(between two alignments)}
\end{aligned}$$
Computing the sum of scores for two alignments
Assume we have two alignments corresponding to intermediate nodes of the guide tree
    Alignment A1    Alignment A2
    AAAC            AGC
    -GAC            ACC
Alignment of two alignments = pairwise alignment of sequences of columns
Filling entry (i, j) of the DP matrix, we maximize over
aligning column i in A1 to column j in A2
aligning column i in A1 to gaps in A2
aligning column j in A2 to gaps in A1
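Treating each alignment as a sequence of columns, the three choices above give a pairwise-style DP. The Python sketch below scores a pair of columns by the between-alignment sum-of-pairs term only; with a gap-against-gap score of 0 the within-alignment terms do not depend on how the two alignments are combined, so they can be left out of the maximization. The toy scores (match 1, mismatch -1, gap -2) and all function names are assumptions for this illustration.

```python
def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def cross_score(col_a, col_b):
    """Between-alignment sum-of-pairs term for one pair of columns."""
    return sum(pair_score(a, b) for a in col_a for b in col_b)

def align_alignments(A, B):
    """Align alignment A with alignment B; returns the combined rows."""
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    gap_a, gap_b = ("-",) * len(A), ("-",) * len(B)
    n, m = len(cols_a), len(cols_b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + cross_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + cross_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + cross_score(cols_a[i - 1], cols_b[j - 1]),
                F[i - 1][j] + cross_score(cols_a[i - 1], gap_b),
                F[i][j - 1] + cross_score(gap_a, cols_b[j - 1]),
            )
    # Trace back from (n, m), collecting combined columns in reverse order.
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + cross_score(cols_a[i - 1], cols_b[j - 1]):
            out.append(cols_a[i - 1] + cols_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + cross_score(cols_a[i - 1], gap_b):
            out.append(cols_a[i - 1] + gap_b); i -= 1
        else:
            out.append(gap_a + cols_b[j - 1]); j -= 1
    out.reverse()
    return ["".join(row) for row in zip(*out)]

print("\n".join(align_alignments(["AAAC", "-GAC"], ["AGC", "ACC"])))
```

Columns inside A and B are carried over unchanged; only whole gap columns are inserted, which is the "treat each partial alignment as a single entity" rule.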
Comments about tree-based progressive alignment
Exploits partial alignment information
But, greedy
The tree might not be correct, that is, it may reflect an incorrect ordering of how sequences should be stacked up in the alignment
Final results prone to errors in alignment
Some positions might be misaligned (that is, have a lower score than if a different ordering is used).
Ordering matters
Consider aligning GG, DGG and DGD
Two alignments of DGD and GG:
      1          2
    D G D      D G D
    - G G      G G -
Both are equally good. But when we include DGG:
      1          2
    D G D      D G D
    - G G      G G -
    D G G      D G G
1 is better than 2, assuming a match score of 2, mismatch score = 1, gap penalty = -2
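The comparison can be verified by computing sum-of-pairs scores for both alignments. The Python sketch below uses the scores as stated on the slide (match 2, mismatch 1, gap -2) and additionally assumes that a gap paired with a gap scores 0; the function names are hypothetical.

```python
from itertools import combinations

def pair_score(a, b, match=2, mismatch=1, gap=-2):
    """Pair score with the values quoted on the slide; gap-gap assumed 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sp_score(rows):
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    return sum(
        pair_score(a, b)
        for col in zip(*rows)
        for a, b in combinations(col, 2)
    )

# Two-sequence alignments: both placements of the gap score the same.
print(sp_score(["DGD", "-GG"]), sp_score(["DGD", "GG-"]))
# After adding DGG the tie is broken in favor of alignment 1.
print(sp_score(["DGD", "-GG", "DGG"]), sp_score(["DGD", "GG-", "DGG"]))
```

A progressive method that committed to alignment 2 before seeing DGG could not recover the better three-sequence alignment, which is the point of the example.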
Iterative refinement methods
The order of selection of sequences can influence the alignment
ClustalW overcomes some of these issues but has many heuristics and parameters
How to avoid committing to a non-optimal pairwise decision?
Revisit alignments
This is the focus of iterative alignments
Basic iterative refinement algorithm
Remove a sequence from the current multiple alignment
Realign the removed sequence back to the multiple alignment
Repeat until removal and realignment of any sequence does not improve the alignment score
Additional notes about the ClustalW algorithm
Tailored to handle very divergent sequences: 25-30% similarity
Dynamically varies the gap penalties in a position- and residue-specific manner
Weights different sequences differently
Closely related sequences need to be down-weighted
Divergent sequences are up-weighted
Dynamically switches between substitution matrices depending upon the average similarity between sequences being aligned
Applying ClustalW to SH3 domain proteins
[Figure: CLUSTAL W alignment of SH3 domain sequences, reproduced from Thompson et al., 1994, Nucleic Acids Research, Vol. 22, No. 22, p. 4679. Slide annotations: alignment blocks correspond to beta strand secondary structures; proteins share <12% sequence identity.]
Figure 4. CLUSTAL W alignment of a set of SH3 domains taken from Musacchio et al. (23). Secondary structure assignments for the solved Spectrin (24) and Fyn (39) domains are according to DSSP (40). The alignment was generated in two steps using default parameters. After full multiple alignment, the aligned sequences were realigned. Segments which were correctly aligned in the second pass are underlined. The single misaligned segment in H_P55 and the misaligned residue in H_NCK/2 are boxed. The sequences are coloured to illustrate significant features. All G (orange) and P (yellow) are coloured. Other residues matching a frequent occurrence of a property in a column are coloured: hydrophobic = blue; hydrophobic tendency = light blue; basic = red; acidic = purple; hydrophilic = green; unconserved = white. The alignment figure was prepared with the GDE sequence editor (S.Smith, Harvard University) and COLORMASK (J.Thompson, EMBL).
Summary
Multiple sequence alignment is the problem of finding corresponding positions among more than two sequences
Scoring function:
Entropy based
Sum of pairs
Algorithms
Progressive
Star
Dependent upon a center
Keep adding each pairwise alignment with the center to the current multiple alignment
Tree
Create an approximate guide tree
Use tree to align the sequences
Iterative
Don't commit to a fixed ordering; revisit the alignment until the score does not change