Studies
DEPT. OF ANIMAL GENETICS & BREEDING
COLLEGE OF VETERINARY SCIENCE AND ANIMAL HUSBANDRY
ANAND AGRICULTURAL UNIVERSITY
ANAND - 388 001.
Maulik Upadhyay
REG. NO.&- O'-1'0(-)010
AGB-*+1
MAJOR ADVISOR
DR. C. G. Joshi
Professor and Head
Dept. of Animal Biotechnology
MINOR ADVISOR
DR. D. N. Rank
Professor and Head
Dept. of Animal Genetics & Breeding
POST-GRADUATE
SEMINAR
ON
Flow of Presentation
• Introduction: definition, need and scope
• Linkage and association studies
• SNP quality control, missing data, imputation
• Single SNP, multiple SNP, haplotype models, Bayesian
• Methods to control multiple correction
• Conclusion
Increasing Trend....
Nature Reviews Genetics carried nearly 30 review articles related to association analysis one way or another (up to 2007).
Lancet published a series of review and introductory articles in 2005 on genetic epidemiology with association as the major component.
Annual Review journals published many reviews that can be linked to association studies.
(Lee, 2007)
[Figure: Published GWA Reports, 2005 to 6/2011. x-axis: Calendar Quarter; y-axis: Total Number of Publications; value shown: 951.]
(http://www.genome.gov/gwastudies/)
Definition
An association between a SNP and a phenotype that is present in the population from which a sample is taken.
(Stephens and Balding, 2009)
Genetic association studies aim to detect association between one or more genetic polymorphisms and a trait, which might be some quantitative characteristic or a discrete attribute or disease.
(Cordell and Clayton, 2005)
Genetic association studies assess correlations between genetic variants and trait differences on a population scale.
(Cardon and Bell, 2001)
Example of Association in Case-Control Study
Controls: 0 1 1 1 1 1 0 1 0 2 1 2 2 0 1 0 0 0 1 1
          2 0 1 1 1 2 0 0 0 1 0 1 1 0 1 1 0 1 0 0
          2 0 1 2 2 0 1 2 1 0 0 1 1 0 1 0 0 1 1 1
          1 2 1 1 2 1 1 1 1 0 1 1 1 0 0 2 2 2 0 2
Cases:    1 1 2 1 0 1 2 1 1 1 1 2 1 2 1 2 1 2 1 1
          2 2 1 2 0 1 0 0 0 1 2 2 1 2 1 2 1 0 2 1
          0 1 1 0 0 2 1 0 0 2 1 1 1 2 1 1 2 0 1 0
          0 1 1 0 0 1 0 2 2 1 1 1 1 2 0 1 2 1 1 2
Goal: To identify the genetic basis of given phenotypes or diseases
(Ref.: Oxford University Website, http://www.stats.ox.ac.uk/~mcvean/gwa4.pdf)
Linkage and Association
Association differs from linkage in that the same allele (or alleles) is associated with the trait in a similar manner across the whole population, while linkage allows different alleles to be associated with the trait in different families.
(Cardon and Bell, 2001)
(Ref.: Oxford University Website, http://www.stats.ox.ac.uk/~mcvean/gwa4.pdf)
Causes of association
• the polymorphism has a causal role (Direct association)
• the polymorphism has no causal role but is associated with a nearby causal variant (Indirect association); or
• the association is due to some underlying stratification or admixture of the population (Confounded association).
(Cordell and Clayton, 2006)
Types of genetic association
• Candidate polymorphism
• Candidate gene
• Fine mapping
• Genome wide association
(Stephens and Balding, 2009)
Designs for genetic association studies
Following are different types of designs of genetic association studies

Design                                   Statistical analysis
1. Cross sectional                       Logistic/linear regression, chi-square test
2. Cohort studies                        Survival analysis method
3. Case control                          Logistic/linear regression, chi-square test
4. Extreme value                         Linear regression and permutation approach
5. Case-Parent triad                     TDT, logistic, log-linear method
6. Case-Parent-Grandparent septets       Log-linear methods
7. General pedigree                      PDT, family based association test, TDT
8. Case only                             Logistic regression, chi-square
9. DNA-pooling                           Variance component estimation
(Cordell and Clayton, 2005)
Test of Association: Single SNP association
• Chi-square test
• Armitage test
• Fisher's exact test
• General linear model
• Logistic regression models
(Balding, 2006)
Test of Association: Multiple SNP association
• MDR
• SNP set association
• Logistic regression
• Haplotype based regression model
SNP Quality Control
The quality control (QC) filtering of single nucleotide polymorphisms (SNPs) is an important step, especially in genome-wide association studies, to minimize potential false findings.
SNP QC commonly uses expert-guided filters based on QC variables to remove SNPs with insufficient genotyping quality, such as:
• Hardy-Weinberg equilibrium
• missing proportion (MSP)
• minor allele frequency (MAF)
Following are some of the criteria for SNP QC:
(i) percentage of SNPs excluded due to low quality
(ii) inflation factor of the test statistics (λ)
(iii) number of false associations found in the filtered dataset
(iv) number of true associations missed in the filtered dataset.
(Pongpanich et al., 2010)
SNP quality control (QC) is commonly safeguarded by 'supervised' (i.e. expert-guided) filters to exclude low-quality SNPs.
The 'supervised' expert filters aim to remove SNPs that fall into the extremes of QC variables including Hardy-Weinberg equilibrium (HWE), missing proportion (MSP) and minor allele frequency (MAF).
The rationale is clear:
• extreme deviation from HWE is typically used to identify gross genotyping error (Teo et al., 2007)
• a high MSP indicates poor genotype probe performance and low genotyping accuracy (Neale and Purcell, 2008; WTCCC, 2007)
• SNPs with low MAF are more prone to error, as fewer samples would be within a genotype cluster and most clustering-based calling algorithms do not perform well with rare alleles (Neale and Purcell, 2008; Teo, 2008)
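The MSP and MAF filters described above can be sketched for a single SNP as follows. This is a minimal illustration only: the function name `snp_qc`, the coding of genotypes as 0/1/2 allele counts with `None` for a missing call, and the thresholds (5% missingness, 1% MAF) are assumptions made for the example and are not taken from the cited papers.

```python
def snp_qc(genotypes, max_missing=0.05, min_maf=0.01):
    """Compute MSP and MAF for one SNP and apply two illustrative filters.

    genotypes: list of allele counts (0, 1 or 2) per individual,
               with None marking a missing genotype call.
    The default thresholds are hypothetical, chosen only for illustration.
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    msp = 1 - len(called) / n                 # missing proportion
    if called:
        freq = sum(called) / (2 * len(called))  # frequency of the counted allele
        maf = min(freq, 1 - freq)               # minor allele frequency
    else:
        maf = 0.0
    keep = msp <= max_missing and maf >= min_maf
    return {"msp": msp, "maf": maf, "keep": keep}


# Example: one of four calls is missing, so the SNP fails the MSP filter.
result = snp_qc([0, 1, 2, None])
```

A HWE filter would be applied in the same way, using a p-value from a HWE test as the third QC variable.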
Missing Genotypic Imputation
For single-SNP analyses, if a few genotypes are missing there is not much of a problem.
For multipoint SNP analyses, missing data can be more problematic because many individuals might have one or more missing genotypes.
One convenient solution is data imputation: replacing missing genotypes with predicted values that are based on the observed genotypes at neighbouring SNPs.
Genotype imputation is the term used to describe the process of predicting or imputing genotypes that are not directly assayed in a sample of individuals.
(Balding, 2006)
There are several distinct scenarios in which genotype imputation is desirable, but the term now most often refers to the situation in which a reference panel of haplotypes at a dense set of SNPs is used to impute into a study sample of individuals that have been genotyped at a subset of the SNPs.
(Marchini and Howie, 2010)
Imputation methods work by combining a reference panel of individuals genotyped at a dense set of polymorphic sites (usually single-nucleotide polymorphisms, or "SNPs") with a study sample collected from a genetically similar population and genotyped at a subset of these sites. (Howie et al., 2009)
Imputation methods either seek a 'best' prediction of a missing genotype, such as a maximum likelihood estimate (single imputation), or randomly select it from a probability distribution (multiple imputations). (Balding, 2006)
The goal is to predict the genotypes at the SNPs that are not directly genotyped in the study sample.
Imputation of Genotypes
Observed genotypes + imputation reference -> (some algorithms) -> posterior probability -> predicted genotypes

SNP    Possible genotypes   Observed genotype   Posterior probability
A/C    AA / AC / CC         AC                  0.0  1.0  0.0
A/T    AA / AT / TT         --                  0.2  0.5  0.3
G/T    GG / GT / TT         GT                  0.0  1.0  0.0
A/G    AA / AG / GG         AA                  1.0  0.0  0.0
A/C    AA / AC / CC         --                  0.1  0.0  0.9
C/G    CC / CG / GG         GG                  0.0  0.0  1.0
(Marchini and Howie, 2010)
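To illustrate how posterior probabilities of this kind turn into a predicted genotype, the sketch below takes three probabilities per SNP and returns the most probable genotype together with the expected allele dosage. The function name and the convention that the three probabilities are ordered by the number of copies (0, 1, 2) of the second allele are assumptions for the example; imputation programs differ in how they report these quantities.

```python
def best_guess_and_dosage(posterior):
    """posterior: three genotype probabilities ordered by the number of
    copies (0, 1, 2) of the second allele, e.g. (P(AA), P(AT), P(TT))."""
    # Most probable genotype, expressed as a copy number of the second allele.
    best = max(range(3), key=lambda k: posterior[k])
    # Expected number of copies of the second allele.
    dosage = posterior[1] + 2 * posterior[2]
    return best, dosage


# The two untyped SNPs from the slide (A/T and the second A/C row).
at_guess, at_dosage = best_guess_and_dosage((0.2, 0.5, 0.3))
ac_guess, ac_dosage = best_guess_and_dosage((0.1, 0.0, 0.9))
```

Keeping the dosage rather than only the best guess carries the imputation uncertainty forward into the association test.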
Genotype Imputation Methods          How it works?
IMPUTE v1                            Extension of HMM
IMPUTE v2                            More flexible than 'v1'; SNPs divided into two sets (Set T and Set U); uses HMM and MCMC
fastPHASE                            Uses the observation that haplotypes tend to cluster into groups of closely related or similar haplotypes; HMM
BIMBAM                               Bayesian approach
MACH                                 HMM; iteratively assigns haplotypes to the genotypes based on the converging model
BEAGLE (CE)                          Graphical model of a set of haplotypes; iteration method
PLINK (CE), SNPMSTAT,                SNP tagging approach
UNPHASED, TUNA (CE)
(Marchini and Howie, 2010)
Uses of Imputation
• Boosting power
• Fine mapping
• Meta analysis
• Imputation of untyped variation
• Imputation of non-SNP variation
• Correction of genotyping variation
(Marchini and Howie, 2010)
Single Locus association analysis
Pearson goodness-of-fit test
Categorical data may be displayed in contingency tables.
The chi-square statistic compares the observed count in each table cell to the count which would be expected under the assumption of no association between the row and column classifications.
The chi-square statistic may be used to test the hypothesis of no association between two or more groups, populations, or criteria.
For a single SNP with alleles A and B tested in a case control study, the data generated consist of six counts of the numbers of genotypes (AA, AB and BB) in cases and controls:

            AA           AB           BB           Total
Cases       a            b            c            n_case (R_1)
Controls    d            e            f            n_cont (R_2)
Total       n_AA (C_1)   n_AB (C_2)   n_BB (C_3)   N
(a) Full genotype table for a general genetic model

Observed value for AA genotypes in cases: $O_1 = a$
Expected value for AA genotypes in cases: $E_1 = R_1 C_1 / N$
Chi-square statistic: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, summed over the six cells of the table
(b) Dominant model: allele B increases risk
            AA      AB+BB
Case        a       b+c
Control     d       e+f

(c) Recessive model: two copies of allele B required for increased risk
            AA+AB   BB
Case        a+b     c
Control     d+e     f

(d) Multiplicative model: r-fold increased risk for AB, r^2 increased risk for BB. Analysed by allele, not by genotype
            A       B
Case        2a+b    b+2c
Control     2d+e    e+2f
(Lewis, 2002)
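The four tables above can all be tested with the same Pearson statistic once the genotype counts are collapsed appropriately. The following is a minimal sketch in plain Python; the function names and the genotype counts in the example are hypothetical, and the p-value helper only covers the 1 and 2 degrees-of-freedom cases that arise here.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under no association
            stat += (obs - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")


def collapse(cases, controls, model):
    """cases = (a, b, c), controls = (d, e, f) as in table (a)."""
    a, b, c = cases
    d, e, f = controls
    if model == "general":
        return [[a, b, c], [d, e, f]]
    if model == "dominant":
        return [[a, b + c], [d, e + f]]
    if model == "recessive":
        return [[a + b, c], [d + e, f]]
    if model == "allelic":          # multiplicative model, analysed by allele
        return [[2 * a + b, b + 2 * c], [2 * d + e, e + 2 * f]]
    raise ValueError(model)


# Hypothetical counts (AA, AB, BB) for cases and controls.
cases, controls = (10, 12, 8), (20, 6, 4)
results = {m: chi_square(collapse(cases, controls, m))
           for m in ("general", "dominant", "recessive", "allelic")}
```

The allelic table doubles the sample size by counting chromosomes, which is only appropriate when genotypes are in HWE.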
Consider a sample of SNP genotypes for N unrelated diploid individuals measured at an autosomal locus.
$n_A$: number of copies of the rare allele
$n_B = 2N - n_A$: number of copies of the common allele
$n_{AA} = (n_A - n_{AB}) / 2$
$n_{BB} = (n_B - n_{AB}) / 2$
Thus, under the assumption of HWE, the probability of observing exactly $n_{AB}$ heterozygotes in a sample of N individuals with $n_A$ minor alleles is
$$P(N_{AB} = n_{AB} \mid N, n_A) = \frac{2^{n_{AB}} \, N!}{n_{AA}! \, n_{AB}! \, n_{BB}!} \times \frac{n_A! \, n_B!}{(2N)!}$$
This equation holds for each possible number of heterozygotes, $n_{AB}$.
(Wigginton et al., 2005)
The expression for $P(N_{AB} = n_{AB} \mid N, n_A)$ leads to natural tests for HWE.
One-sided tests:
Deficit of heterozygotes: $P_{low} = P(N_{AB} \le n_{AB} \mid N, n_A)$ (inbreeding, stratification)
Excess of heterozygotes: $P_{high} = P(N_{AB} \ge n_{AB} \mid N, n_A)$ (genotyping error)
In each case, the statistic can be calculated by simply summing the exact probability over all possible values of $N_{AB}$ that are no higher (for $P_{low}$) or no lower (for $P_{high}$) than the value observed in the actual data.
(Wigginton et al., 2005)
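A direct, unoptimised sketch of these exact probabilities is given below. It evaluates the factorial expression literally, which is fine for small samples; the function names are hypothetical, and production implementations typically use a recurrence between neighbouring heterozygote counts rather than raw factorials.

```python
from math import factorial


def hwe_prob(n_ab, n, n_a):
    """P(N_AB = n_ab | N = n, n_A = n_a) under HWE.
    n_ab must have the same parity as n_a."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    num = 2 ** n_ab * factorial(n) * factorial(n_a) * factorial(n_b)
    den = (factorial(n_aa) * factorial(n_ab) * factorial(n_bb)
           * factorial(2 * n))
    return num / den


def hwe_one_sided(n_ab, n, n_a):
    """Return (P_low, P_high) for the observed heterozygote count n_ab."""
    n_b = 2 * n - n_a
    # Heterozygote counts compatible with the allele counts.
    possible = range(n_a % 2, min(n_a, n_b) + 1, 2)
    p_low = sum(hwe_prob(k, n, n_a) for k in possible if k <= n_ab)
    p_high = sum(hwe_prob(k, n, n_a) for k in possible if k >= n_ab)
    return p_low, p_high


# Hypothetical sample: 100 individuals, 21 copies of the rare allele,
# 13 observed heterozygotes.
p_low, p_high = hwe_one_sided(13, 100, 21)
```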
Control genotypes should be in Hardy-Weinberg equilibrium, provided the population they are selected from is random mating and is large in size.
Suppose the population frequency of allele A is p and that of allele B is q = 1 - p; then the genotypes AA, AB and BB should have frequencies p^2, 2pq and q^2.
Provided the controls are in HWE, the cases may then be tested. If the SNP has a true genetic effect that is not controlled by a multiplicative model, the cases will not be in HWE (although again, the test has little power to detect small departures from HWE). If the cases are in HWE, the data may be analysed by allele counting, as any genetic effect is consistent with a multiplicative model.
A significant result showing that controls are not in Hardy-Weinberg equilibrium (HWE) could arise because of:
• random chance
• genotyping problems
• heterogeneous population
(Lewis, 2005)
The Odds Ratio - a Measure of Association
A useful statistic for measuring the level of association in contingency tables is the odds ratio.
If the odds are equal, their ratio equals one. A sample estimator of the odds ratio, OR, is
OR = (odds of carrying the variant among cases) / (odds of carrying the variant among controls)
Odds Ratio: A measurement of association that is commonly used in case-control studies. It is defined as the odds of exposure to the susceptible genetic variant in cases compared with that in controls. If the odds ratio is significantly greater than one, then the genetic variant is associated with the disease.
(Wang et al., 2005)
Log odds ratio: LOD = ln(OR)
Confidence Interval & Interpretation
The standard error (SE) is needed to find the confidence interval used to judge the null hypothesis of no association:
CI for OR = OR ± 1.96 × OR × SE
CI for LOD = LOD ± 1.96 × SE
where SE is the standard error of the log odds ratio.
The SNP has no influence on disease if the 95% CI for OR includes '1' or the CI for LOD includes '0'.
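As a worked illustration, the sketch below computes the odds ratio and a 95% interval from a 2 x 2 table. Two things here are assumptions rather than content of the slides: the cell labels (a, b for cases with and without the variant; c, d for controls), and the use of the common log-scale (Woolf) interval, in which the interval is built for ln(OR) with SE = sqrt(1/a + 1/b + 1/c + 1/d) and then exponentiated. The counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b: cases with / without the variant;
    c, d: controls with / without the variant.
    Returns (OR, CI for OR, ln OR, CI for ln OR)."""
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    ci_log = (log_or - z * se, log_or + z * se)
    ci_or = (math.exp(ci_log[0]), math.exp(ci_log[1]))
    return or_hat, ci_or, log_or, ci_log


# Hypothetical counts.
or_hat, ci_or, log_or, ci_log = odds_ratio_ci(10, 20, 20, 10)
# No evidence of association if 1 lies inside the interval for OR,
# or equivalently if 0 lies inside the interval for ln(OR).
no_association = ci_or[0] <= 1 <= ci_or[1]
```

Building the interval on the log scale keeps the lower limit positive, which the symmetric form OR ± 1.96 × OR × SE does not guarantee.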
Armitage's Trend test
The disadvantages of population stratification and confounding factors are overcome, to some extent, by applying Armitage's trend test, as suggested by Armitage (1955), Sasieni (1997), and Schaid and Jacobsen (1999).
There are three common choices of scoring system:
1) co-dominant score: x_0 = 0, x_1 = 1, and x_2 = 2;
2) dominant score: x_0 = 0, x_1 = 1, and x_2 = 1;
3) recessive score: x_0 = 0, x_1 = 0, and x_2 = 1.
Here, the names of the scoring systems refer to the minor allele "m".
(Fang et al., 2009)

Genotypes   MM      Mm      mm      Total
Case        n_10    n_11    n_12    N_1
Control     n_00    n_01    n_02    N_0
Total       N_+0    N_+1    N_+2    N
Score       X_0     X_1     X_2
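The trend statistic for a table of this shape can be sketched as follows. The slides do not reproduce the formula, so the standard Cochran-Armitage form is assumed here: T = sum_j x_j (n_1j N_0 - n_0j N_1), with its variance under the null, and T^2 / Var(T) referred to a chi-square distribution with 1 d.f. The function name and the counts are hypothetical.

```python
import math


def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """cases, controls: genotype counts ordered (MM, Mm, mm).
    Returns the 1-d.f. trend chi-square statistic and its p-value."""
    n1, n0 = sum(cases), sum(controls)
    cols = [r + s for r, s in zip(cases, controls)]   # column totals N_+j
    n = n1 + n0
    t = sum(x * (r * n0 - s * n1)
            for x, r, s in zip(scores, cases, controls))
    var = sum(x * x * c * (n - c) for x, c in zip(scores, cols))
    for j in range(len(cols)):
        for k in range(j + 1, len(cols)):
            var -= 2 * scores[j] * scores[k] * cols[j] * cols[k]
    var *= n1 * n0 / n
    stat = t * t / var
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts, analysed under the three scoring systems.
cases, controls = (10, 12, 8), (20, 6, 4)
codominant = armitage_trend(cases, controls, (0, 1, 2))
dominant = armitage_trend(cases, controls, (0, 1, 1))
recessive = armitage_trend(cases, controls, (0, 0, 1))
```

With the dominant or recessive scores the statistic reduces to the Pearson chi-square of the corresponding collapsed 2 x 2 table.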
β_0, β_1 = regression parameters
ε = random error
Software used: SAS 8.02 (PROC GLM)
Generalized Linear Model
Generalized linear models (GLMs) are a large class of statistical models for relating responses to linear combinations of predictor variables, including many commonly encountered types of dependent variables and error structures as special cases.
(Jakman, 2002)
Advantages of using GLMs:
• No need to transform the data into normality
• GLMs unify a wide variety of statistical methods.
A GLM generalizes ordinary regression models in two ways: first, it allows Y to have a distribution other than the normal; second, it allows modeling some function of the mean.
Both generalizations are important for categorical data.
(Agresti, 2007)
GLMs for binary data
• Logit model: logit link
• Probit model: probit link (transform to 'Z' scores from the standard normal distribution)
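The two link functions can be written down in a few lines. This sketch uses only the Python standard library; the probit link is taken as the inverse cumulative distribution function of the standard normal, which is what transforming a probability to a Z score means.

```python
import math
from statistics import NormalDist


def logit(p):
    """Logit link: log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def probit(p):
    """Probit link: Z score whose standard normal CDF equals p."""
    return NormalDist().inv_cdf(p)


# Both links map a risk of 0.5 to zero and stretch (0, 1) onto the real line.
links_at_half = (logit(0.5), probit(0.5))
```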
Logit for single SNP
Each subject in our sample consists of a (y_i; x_i) pair, where y_i is case/control status (1/0) and x_i (0, 1, 2) is the genotype at the typed locus:

Genotype   x_i   Odds         Parameters
aa         0     α            β_0
Aa         1     α(1+a)       β_1
AA         2     α(1+a)^2     β_2
(Ref.: Oxford University Website, http://www.stats.ox.ac.uk/~mcvean/gwa4.pdf)
Now the transformation logit(π) = log(π / (1 - π)) is applied to π_i, the disease risk of the i-th individual.
The value of logit(π_i) is equated to either β_0, β_1 or β_2, according to the genotype of individual i (β_1 for heterozygotes).
The likelihood-ratio test of this general model against the null hypothesis β_0 = β_1 = β_2 has 2 d.f., and for large sample sizes is equivalent to the Pearson 2-df test.
Users can improve the power to detect specific disease risks, at the cost of lower power against some other risk models, by restricting the values of β_0, β_1 and β_2.
Tests for recessive or dominant effects can be obtained by requiring that β_0 = β_1 or β_1 = β_2.
(Balding, 2006)
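Because the general model has one free parameter per genotype, its maximum-likelihood fit is simply the observed case proportion within each genotype, so the likelihood-ratio test can be sketched without any iterative fitting. The function names and counts below are hypothetical; covariates would require a proper logistic regression routine instead of this shortcut.

```python
import math


def _loglik(cases, total):
    """Binomial log-likelihood evaluated at the fitted risk cases / total
    (terms with a zero count contribute nothing)."""
    ll = 0.0
    for k in (cases, total - cases):
        if k > 0:
            ll += k * math.log(k / total)
    return ll


def genotype_lrt(cases, controls):
    """Likelihood-ratio test of beta_0 = beta_1 = beta_2 (2 d.f.).
    cases, controls: counts per genotype, ordered by x_i = 0, 1, 2."""
    full = sum(_loglik(r, r + s) for r, s in zip(cases, controls))
    null = _loglik(sum(cases), sum(cases) + sum(controls))
    stat = max(2 * (full - null), 0.0)
    return stat, math.exp(-stat / 2)      # chi-square upper tail for 2 d.f.


# Hypothetical genotype counts.
stat, p = genotype_lrt((10, 12, 8), (20, 6, 4))
```

A dominant or recessive test follows by merging two genotype groups before fitting, which leaves a 1-d.f. comparison.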
Logistic Regression of Melanoma status on Genotype

Risk Factor                            Odds Ratio   95% CI       P value
Models without covariate
SNP: no. of copies of 'T' allele       0.78         0.67-0.93    0.004
Models with intermediate factor as covariate
SNP: no. of copies of 'T' allele       0.89         0.74-1.07    0.23
Nevus count                            2.60         2.28-2.97    <10^-43
(Zeggini and Morris, 2011)
Test of association: Multiple SNPs
SNP set analysis
Set association evaluates sets of SNP markers at various positions in the genome (in particular, in different susceptibility genes).
This method performs a simultaneous significance test on several sets of loci while keeping the overall type I error under control.
SNP-set-based analysis borrows information from different but correlated SNPs that are grouped on the basis of prior biological knowledge, and hence has the possibility of providing results with improved reproducibility and increased power, especially when individual-SNP effects are moderate, as well as improved interpretability.
(Wu et al., 2010)
To increase the power of the test, it is sometimes feasible to combine relevant sources of information for a given SNP, such as allelic association (AA), Hardy-Weinberg disequilibrium (HWD), and evidence for genotyping errors.
(Heidema et al., 2007)
This mode of analysis proceeds via a two-step procedure:
• SNPs are assigned to sets on the basis of some meaningful biological criteria (genomic features), e.g. genes
• Then, tests for the association between each genomic feature and a disease phenotype are performed across the genome with the use of a logistic kernel-machine-based multilocus test.
SNP-set analysis can prove advantageous over the standard analysis of individual SNPs. By forming SNP sets and testing each SNP set as a unit, we reduce the number of hypotheses being tested and thus relax the stringent conditions for reaching genome-wide significance in GWA.
There are the following ways of grouping SNPs into sets:
• SNP location in or near a gene (gene based set analysis)
• Set formation on the basis of KEGG pathways
• Grouping SNPs into evolutionarily conserved regions
• Grouping SNPs into haplotype blocks
(Wu et al., 2010)
Genome wide SNP set testing
Assume population based case-control status (for a single set):
• let z_i1, z_i2, ..., z_ip be genotype values for the SNPs in the SNP set for the i-th subject (i = 1, ..., n).
• The case-control status for the i-th subject is denoted by y_i (y_i = 1 for cases, and y_i = 0 for controls).
• z_ij = 0, 1, 2 corresponding to homozygotes for the major allele, heterozygotes, and homozygotes for the minor allele, respectively.
• Further assume collection of an additional set of m demographic, environmental and other confounding variables.
For the i-th subject let x_i1, x_i2, ..., x_im denote the values of the covariates that we would like to adjust for.
The goal of SNP-set analysis is then to test the global null of whether any of the p SNPs are related to the outcome while adjusting for the additional covariates.
(Wu et al., 2010)
Logistic Kernel Machine Regression Model
The kernel-machine framework has become very popular for modelling high-dimensional biomedical data because of its ability to allow for complex/nonlinear relationships between the dependent and independent variables (Brown et al., 2000) while adjusting for covariate effects.
Under the logistic kernel machine regression model, the following is the model for the joint effect of the SNPs, considering other covariates:
$$\mathrm{logit}\, P(y_i = 1) = \alpha_0 + \alpha_1 x_{i1} + \dots + \alpha_m x_{im} + h(z_{i1}, \dots, z_{ip})$$
• in which α_0 is the intercept
• α_1, α_2, ..., α_m are regression coefficients corresponding to the environmental and demographic covariates.
• The SNPs, z_i1, ..., z_ip, influence y_i through the general function h(·), which is an arbitrary function whose form is defined only by a positive semidefinite kernel function K(·,·).
(Wu et al., 2010)
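The kernel K(·,·) measures genetic similarity between two subjects over the SNPs in the set. The slides do not say which kernel was used, so the sketch below shows two commonly used choices as assumptions: a linear kernel and an identity-by-state (IBS) kernel, both computed from 0/1/2 genotype codes. The function names and the small genotype matrix are hypothetical.

```python
def linear_kernel(z_i, z_j):
    """Linear kernel: sum over SNPs of the product of genotype codes."""
    return sum(a * b for a, b in zip(z_i, z_j))


def ibs_kernel(z_i, z_j):
    """IBS kernel: average proportion of alleles shared identical by state."""
    p = len(z_i)
    return sum(2 - abs(a - b) for a, b in zip(z_i, z_j)) / (2 * p)


def kernel_matrix(genotypes, kernel):
    """n x n matrix of pairwise similarities for n subjects."""
    return [[kernel(zi, zj) for zj in genotypes] for zi in genotypes]


# Hypothetical SNP set: 3 subjects genotyped at p = 3 SNPs.
Z = [[0, 1, 2],
     [2, 1, 0],
     [0, 1, 2]]
K = kernel_matrix(Z, ibs_kernel)
```

The choice of kernel determines which kinds of joint SNP effects the function h(·) can represent; the linear kernel corresponds to additive effects only.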
Multifactor Dimensionality Reduction (MDR)
MDR is a nonparametric data mining approach that reduces two or more SNPs, for example, to a new single variable, which is then evaluated using a classifier such as Bayes or logistic regression.
In MDR, each multi-locus genotype of a SNP combination is assigned to a high-risk or low-risk group, depending on the ratio of cases to non-cases with this multi-locus genotype.
If this ratio exceeds a certain threshold, the multi-locus genotype is labelled high-risk; otherwise it is labelled low-risk.
By assigning all multi-locus genotypes for a certain combination of SNPs to either high-risk or low-risk, MDR reduces the number of multi-locus genotypes to one risk factor consisting of two levels, high-risk or low-risk.
The aim is to construct a new risk factor that facilitates the detection of nonlinear interactions among SNPs such that the prediction of the outcome variable is improved over the original representation of the data.
(Ritchie et al., 2001)
(Lee et al., 2008)
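The labelling step of MDR can be sketched for a pair of SNPs as below. This shows only the reduction to a high-risk / low-risk factor, not the cross-validation and model selection that a full MDR analysis performs. The function name, the threshold of 1.0 and the toy data are assumptions for the example.

```python
from collections import Counter


def mdr_labels(genotype_pairs, status, threshold=1.0):
    """Label each observed two-locus genotype as 'high' or 'low' risk.

    genotype_pairs: list of (g1, g2) tuples, one per subject
    status: list of 1 (case) / 0 (non-case), same order
    threshold: hypothetical cut-off for the case : non-case ratio
    """
    cases, controls = Counter(), Counter()
    for g, y in zip(genotype_pairs, status):
        (cases if y == 1 else controls)[g] += 1
    labels = {}
    for g in set(cases) | set(controls):
        # A cell with cases but no non-cases is treated as high risk.
        ratio = cases[g] / controls[g] if controls[g] else float("inf")
        labels[g] = "high" if ratio > threshold else "low"
    return labels


# Toy data: six subjects genotyped at two SNPs.
pairs = [(0, 0), (0, 0), (0, 0), (1, 2), (1, 2), (2, 2)]
status = [1, 1, 0, 0, 0, 1]
labels = mdr_labels(pairs, status)
# Each subject's new one-dimensional risk factor.
risk_factor = [labels[g] for g in pairs]
```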
Logistic regression
Logistic regression analyses for L SNPs are a natural extension of the single-SNP analyses that are discussed in previous slides: there is now a coefficient (β_0, β_1 or β_2) for each SNP, leading to a general test with 2L df. By constraining the coefficients, tests with L df can be obtained.
Covariates such as sex, age or environmental exposures are readily included. Similarly, interactions between SNPs can be included.
(Balding, 2006)
Including interactions conveys little benefit, and can reduce power to detect an association, if there is a single underlying causal variant and little or no recombination between SNPs, but it is potentially useful for investigating epistatic effects.
(Wu et al., 2010)
Haplotype based methods
When hundreds of thousands of SNPs are genotyped, most of them turn out to be in high linkage disequilibrium (and form what is called a haplotype if they happen to be adjacent on the chromosome).
A few methods have been proposed in the literature for identifying haplotypes: PHASE (Stephens et al., 2001), SNPHAP, FASTPHASE (Scheet & Stephens, 2006), Haploview, PLINK (Purcell et al., 2007) etc. Most of these are available as software.
The above tools can be used to identify haplotypes in GWA datasets and replace the entire haplotype block with a representative SNP called a 'Tag SNP'.
tSNP can sometimes provide greater analytical power than single-marker analysis for genetic association studies.
This is because haplotypes are inherited together in the majority of cases, and they incorporate linkage disequilibrium information (Akey and Xiong, 2003; Schaid, 2004).
Conversely, haplotype-based statistical analysis has a weakness since haplotypes are often not directly observable.
Hence, haplotypes and their frequencies are inferred by statistical methods such as the Expectation-Maximization (EM) algorithm (Dempster et al., 1977; Excoffier and Slatkin, 1995) or the Bayesian method (Stephens et al., 2001; Lin et al., 2002).
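For two biallelic SNPs the EM idea is easy to show, because only the double heterozygote has an ambiguous phase. The sketch below is a textbook-style illustration under that restriction, not the algorithm of any particular program named above; the function name, the fixed number of iterations and the toy genotypes are assumptions.

```python
def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies by EM.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele '1'
               at SNP 1 and SNP 2 for one individual.
    Haplotypes are keyed as (allele at SNP 1, allele at SNP 2).
    """
    freqs = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    n_hap = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between the two
                # possible phases in proportion to current frequencies.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous: read the two haplotypes directly.
                h1 = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                h2 = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                counts[h1] += 1
                counts[h2] += 1
        # M-step: update frequencies from the expected counts.
        freqs = {h: c / n_hap for h, c in counts.items()}
    return freqs


# Toy sample: 4 + 4 double homozygotes and 2 double heterozygotes.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freqs = em_haplotype_freqs(sample)
```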
Given haplotype assignments, the simplest analysis involves testing for independence of rows and columns in a 2 × k contingency table, where k denotes the number of distinct haplotypes (Sham, 1998).
Alternative approaches can be based on the estimated haplotype proportions among cases and controls, without an explicit haplotype assignment for individuals (Schaid, 2004): the test compares the product of separate multinomial likelihoods for cases and controls with that obtained by combining cases and controls.
A haplotype based regression model is very useful in haplotype based association studies.
(Wang et al., 2005)
Regression Models for Haplotypes
Within the framework of the generalized linear model (GLM), the haplotype effect on traits can be statistically described and tested.
The model can be expressed as $E(Y) = f^{-1}(X\beta)$
Human Factor
SNP QC software
DBSCAN
PCA