Gururaj p
Overview
Biological Background
Terminology
SNP-related general information
SNP detection techniques
SNP Applications
References
Biological Background
How can researchers hope to identify and study all the changes that occur in so many different diseases? How can they explain why some people respond to treatment and not others?
So what exactly are SNPs? How are they involved in so many different aspects of health?
What is a SNP?
A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.
Variations in Genome
Terminology
Polymorphism
Linkage Disequilibrium
Correlation of character states among polymorphic sites
Insufficient passage of time to randomize character states by meiotic recombination
Haplotype
Some Facts
In human beings, 99.9 percent of bases are the same. The remaining 0.1 percent makes a person unique.
Different attributes / characteristics / traits
how a person looks, diseases he or she develops.
SNP facts
SNPs are found in
coding and (mostly) noncoding regions.
The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
A SNP close to a particular gene acts as a marker for that gene.
SNPs in coding regions may alter the structure of the protein encoded by that region.
SNP maps
Sequence the genomes of a large number of people.
Compare the base sequences to discover SNPs.
Generate a single map of the human genome containing all possible SNPs => SNP map
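The comparison step can be sketched in a few lines of Python. This is an illustration only: it assumes the sequences are already aligned and of equal length, and the function name `find_snps`, the toy sequences and the default 1 percent cutoff (taken from the SNP definition given earlier) are choices made for this example, not part of any published pipeline.

```python
from collections import Counter

def find_snps(sequences, min_minor_freq=0.01):
    """Return {position: allele counts} for variable positions.

    A position is reported when its second most common base occurs in
    more than min_minor_freq of the aligned sequences.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(sequences)
    snps = {}
    for pos, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # every sequence has the same base here
        minor_count = counts.most_common(2)[1][1]
        if minor_count / n > min_minor_freq:
            snps[pos] = dict(counts)
    return snps

# Hypothetical aligned reads from five individuals.
toy_sequences = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAT", "ACCTAC"]
snps = find_snps(toy_sequences)
for pos, counts in sorted(snps.items()):
    print(pos, counts)
```

With real data the cutoff only makes sense for a large sample; in a toy set of five sequences any single difference exceeds 1 percent.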
SNP Profiles
The genome of each individual contains a distinct SNP pattern.
People can be grouped based on their SNP profile.
SNP profiles are important for identifying response to drug therapy.
Correlations might emerge between certain SNP profiles and specific responses to treatment.
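As a rough sketch of how such correlations could be tabulated, the snippet below groups individuals by SNP profile and reports the fraction that responded to treatment in each group. The profiles, the response labels and the function name `response_rate_by_profile` are invented for illustration.

```python
from collections import defaultdict

def response_rate_by_profile(individuals):
    """Map each SNP profile to the fraction of its carriers who responded."""
    groups = defaultdict(list)
    for profile, responded in individuals:
        groups[profile].append(responded)
    return {profile: sum(r) / len(r) for profile, r in groups.items()}

# Hypothetical data: (alleles at three SNPs, responded to the drug?)
individuals = [
    ("AGT", True), ("AGT", True), ("AGT", False),
    ("ACT", False), ("ACT", False),
]
rates = response_rate_by_profile(individuals)
for profile, rate in rates.items():
    print(profile, round(rate, 2))
```

A real analysis would also need a statistical test, since differences this size can arise by chance in small groups.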
Hybridization Techniques
Microarrays
Sequencing by hybridization utilizes a set of tiling oligonucleotides.
Requires somewhat complex pooling and processing of PCR amplicons, which are subsequently hybridized to a DNA microarray and visualized.
Theoretically capable of genotyping thousands of polymorphisms simultaneously.
Success rate 97% (somewhat low for this kind of analysis).
High false rates (11-21%).
Design and fabrication of microarrays are expensive; hence users are confined to the set of genotypes established by the manufacturer.
Nucleotide Extension
Cleavage
The Invader™ assay utilizes the exonuclease activity of Cleavase VIII on overlapping oligonucleotide strands.
Two oligonucleotides, an invader probe and either a wild-type or mutant primary probe, overlap each other at a single nucleotide position on the template only if they are complementary to the polymorphism being queried.
Cleavage occurs when the specific overlapping conformation is present, freeing an oligonucleotide referred to as a flap.
This flap can be detected in a multiplex manner by size, mass or sequence.
Commonly the flap participates in a second cleavage assay with another complementary target, causing release of a fluorescent signal.
Advantage: the same flap may bind to many targets, generating a cascading signal amplification and thereby obviating the need for PCR amplification.
Single-tube, one-step reaction.
Ligation
One of the most specific assays, due to the high specificity of T4 ligase (oligo ligation assay) and the even higher specificity of thermostable ligases (ligation detection reaction, LDR).
Two primers are designed to anneal adjacent to one another on the target of interest.
Generally, the upstream primer (discriminating primer) contains a fluorescent label at the 5' end, with the 3' nucleotide overlapping the polymorphic base.
The fluorescent signal corresponds to the allele being queried at the 3' position of the discriminating primer.
When the discriminating primer forms a perfect complement with the target at the junction, the ligase covalently attaches the adjacent downstream primer (common primer).
The resulting product is approximately twice as long as each of the individual primers and can be easily detected by capillary electrophoresis or by display on a microarray.
Advantage: very good sensitivity and specificity.
Direct Sequencing
Microarray
Cleavage / Ligation
Electrophoretic mobility assays
Comparison of Techniques used
Direct Sequencing
Sanger dideoxy sequencing can detect any type of unknown polymorphism and its position when the majority of the DNA contains that polymorphism.
It misses polymorphisms and mutations when the DNA is heterozygous.
It has limited utility for analysis of solid tumors or pooled samples of DNA due to low sensitivity.
Once a sample is known to contain a polymorphism in a specific region, direct sequencing is particularly useful for identifying the polymorphism and its specific position.
Even if the identity of the polymorphism cannot be discerned in the first pass, multiple sequencing attempts have proven quite successful in elucidating sequence and position information.
Microarray
Variation detection arrays (VDA) scan large sequence blocks and identify regions containing unknown polymorphisms.
This methodology suffers from the same limitations in fabrication and design as observed in known polymorphism analysis, but has demonstrated much greater success in the context of unknown polymorphism detection for both SNP and tumor analysis.
With respect to SNP analysis, a recent study of chromosome 21 successfully identified approximately half of the estimated number of common SNPs (frequency of 10-50%) across the entire chromosome.
The experimental design required a sacrifice in sensitivity in order to minimize false positives. This explains the decrease in successful identification from 80% to 50%.
Cleavage/Ligation
Unknown polymorphisms can also be identified by the cleavage of mismatches in DNA-DNA heteroduplexes.
This can be achieved either chemically (chemical cleavage method, CCM) or enzymatically (T4 endonuclease VII, MutY cleavage or Cleavase).
Typically, at least two samples are PCR amplified (one sample can be sufficient for solid tumor samples with high levels of stromal contamination), denatured and then hybridized to create DNA-DNA heteroduplexes of the variant strands.
Enzymes cleave adjacent to the mismatch and products are resolved via gel or capillary electrophoresis.
Unfortunately, the cleavage enzymes often nick complementary regions of DNA as well. This increases background noise, lowers specificity, and reduces the pooling capacity of the assay.
SNP Applications
Gene discovery and mapping
Association-based candidate polymorphism testing
Diagnostics/risk profiling
Response prediction
Homogeneity testing/study design
Gene function identification
Abstract
The authors describe a high-resolution analysis of the haplotype structure across 500 kb on chromosome 5q31 using 103 SNPs in a European-derived population. They developed an analytical model for linkage disequilibrium (LD) mapping based on high-resolution haplotype blocks, which offers a coherent framework for creating a haplotype map of the human genome.
Data used
500 kb region on human chromosome 5q31 that is implicated as containing a genetic risk factor for Crohn disease.
Rioux, J. D. et al. Hierarchical linkage disequilibrium mapping of a susceptibility gene for Crohn's disease to the cytokine cluster on chromosome 5. Nature Genet. 29, 223-228 (2001)
103 common (>5% minor allele frequency) SNPs genotyped in a European-derived population.
The study describes 258 chromosomes transmitted to individuals with Crohn disease and 258 untransmitted chromosomes.
Data used
The genotype data used in the study provide the highest-resolution picture of the patterns of genetic variation across a large genomic region, with a marker density of 1 SNP roughly every 5 kb.
Study
Focus on identifying the underlying haplotypes.
The authors' initial focus was on untransmitted control chromosomes; however, the same haplotype structure was seen in the chromosomes transmitted to individuals with Crohn disease, with the only difference being that one of the haplotypes was enriched in frequency, reflecting its association with Crohn disease.
Study
It became evident during the study that the region could be largely decomposed into discrete haplotype blocks, each with a lack of diversity. As haplotype block structure was the same in both groups, they presented combined data from all chromosomes (transmitted and untransmitted).
a. Common haplotype patterns in each block of low diversity. Dashed lines indicate locations where more than 2% of all chromosomes are observed to transition from one common haplotype to a different one.
b. Percentage of observed chromosomes that match one of the common patterns exactly (total chromosomes = 258 transmitted + 258 untransmitted).
- The haplotype blocks span up to 100 kb and contain multiple (five or more) common SNPs.
- The blocks have only a few (2-4) haplotypes, which show no evidence of being derived from one another by recombination, and which account for nearly all chromosomes (>90%) in all cases in the sample.
For example, an 84 kb block shows only two distinct haplotypes that together account for 95% of the observed chromosomes (Table 1).
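The "percentage of chromosomes matching a common pattern" can be computed by simple counting once phased haplotypes are available for a block. The sketch below uses invented haplotype strings and an assumed 5% cutoff for calling a haplotype common; neither comes from the study.

```python
from collections import Counter

def common_haplotype_coverage(haplotypes, min_freq=0.05):
    """Return (common haplotype frequencies, fraction of chromosomes they cover)."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    common = {h: c / total for h, c in counts.items() if c / total >= min_freq}
    return common, sum(common.values())

# Hypothetical block: 100 chromosomes, two frequent haplotypes and two rare ones.
block = ["GGACA"] * 60 + ["AATTC"] * 35 + ["GGATC"] * 3 + ["AACCA"] * 2
common, coverage = common_haplotype_coverage(block)
print(sorted(common), round(coverage, 2))
```

In this toy block two haplotypes pass the cutoff and together cover 95 of the 100 chromosomes, mirroring the shape of the 84 kb example.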
Study
The discrete blocks are separated by intervals in which several independent historical recombination events seem to have occurred, giving rise to greater haplotype diversity for regions spanning the blocks. The most common recombination events are indicated in the previous figure by lines connecting the haplotypes. The recombination events appear to be clustered; multiple obligate exchanges must have occurred between most blocks, with little or no exchange within blocks.
Study
Although there is detectable recombination between blocks, it is modest enough for there to be clear long-range correlation (that is, LD) among blocks. The haplotypes at the various blocks can be readily assigned to one of four ancestral long-range haplotypes. Indeed, 38% of the chromosomes studied carried one of these four haplotypes across the entire length of the region.
Study
Using a hidden Markov model (HMM), they developed an approach to define the block structure formally. The HMM simultaneously assigns every position along each observed chromosome to one of the four ancestral haplotypes and estimates the maximum-likelihood values of the historical recombination frequency (Θ) between each pair of markers.
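The assignment half of such an HMM can be illustrated with a small Viterbi decoder. This is a simplified sketch, not the authors' model: the per-interval switch probabilities `theta` are supplied rather than estimated by maximum likelihood, a switch is assumed equally likely to land on any other ancestral haplotype, a single genotyping error rate `error` is assumed, and the two ancestral haplotypes are invented.

```python
import math

def viterbi_ancestry(chromosome, ancestral, theta, error=0.01):
    """Most likely ancestral haplotype index at each marker of `chromosome`.

    ancestral: list of k >= 2 haplotype strings, same length as chromosome.
    theta: switch probability for each of the len(chromosome) - 1 intervals.
    """
    k, n = len(ancestral), len(chromosome)

    def emit(state, pos):
        match = ancestral[state][pos] == chromosome[pos]
        return math.log(1 - error) if match else math.log(error)

    # score[s] = best log-probability of any path ending in state s
    score = [math.log(1 / k) + emit(s, 0) for s in range(k)]
    back = []
    for pos in range(1, n):
        stay = math.log(1 - theta[pos - 1])
        switch = math.log(theta[pos - 1] / (k - 1))
        new_score, pointers = [], []
        for s in range(k):
            best_prev = max(
                range(k),
                key=lambda p: score[p] + (stay if p == s else switch),
            )
            step = stay if best_prev == s else switch
            pointers.append(best_prev)
            new_score.append(score[best_prev] + step + emit(s, pos))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

ancestral = ["GGACAT", "AATTCG"]        # hypothetical ancestral haplotypes
theta = [0.01, 0.01, 0.2, 0.01, 0.01]   # assumed values; one interval is a hotspot
path = viterbi_ancestry("GGATCG", ancestral, theta)
print(path)
```

The observed chromosome matches the first ancestral haplotype over its first three markers and the second over the last three, so the decoded path switches once, at the interval given the highest switch probability.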
Study
The quantity Θ (the historical recombination frequency) provides a convenient summary of the degree of haplotype exchange across inter-marker intervals and relates directly to conventional measures of LD. In this study, Θ is estimated at less than 1% for 73 of the inter-marker intervals, 1-4% for 14 of the intervals, and more than 4% for only 9 of the intervals.
To ensure the ability to reconstruct multi-marker haplotypes, SNPs for haplotype analysis were selected from the set of markers for which full genotypes were available for all members. SNPs at CpG sites were not included, to prevent potential confounding of common haplotype patterns by recurrent mutations.
Methods: Haplotype counting
Haplotype percentages in the figure "Haplotype block structure in 5q31" were computed using haplotypes generated by the transmission disequilibrium test (TDT) implementation in Genehunter 2.0 (ref. 22 in the paper), followed by use of an EM-type algorithm (refs. 23, 24 in the paper) to include the minority of chromosomes that had one or more markers with ambiguous phase or where one marker was missing genotype data.
Discussion of Study
The region of chromosome 5q31 may be largely divided into discrete blocks of 10-100 kb; each block has only a few common haplotypes; and the haplotype correlation between blocks gives rise to long-range LD.
Focusing on haplotype blocks greatly clarifies LD analyses. Once the haplotype blocks are identified, they can be treated as alleles and tested for LD (instead of single-marker analyses of LD).
Discussion of Study
In analogous fashion, the haplotype structure provides a crisp approach for testing the association of genomic segments with disease. By contrast, disease association studies traditionally involve testing individual SNPs in and around a gene. Once the haplotype blocks are defined, it is straightforward to examine a subset of SNPs that uniquely distinguish the common haplotypes in each block. This allows the common variation in a gene to be tested exhaustively for association with disease.
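Picking a subset of SNPs that uniquely distinguishes the common haplotypes can be sketched as a brute-force search over SNP subsets of increasing size. The haplotypes below are invented and the function name `smallest_tag_set` is hypothetical; the exhaustive search is only practical for the handful of SNPs in a single block.

```python
from itertools import combinations

def smallest_tag_set(haplotypes):
    """Smallest tuple of SNP indices whose alleles separate all haplotypes.

    Assumes the haplotypes are distinct strings of equal length.
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in subset) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return subset
    return None  # only reached if two haplotypes are identical

# Hypothetical common haplotypes in one block (five SNPs).
block_haplotypes = ["GGACA", "AATTC", "GGTTC", "AAACA"]
tags = smallest_tag_set(block_haplotypes)
print(tags)
```

Here no single SNP separates the four haplotypes, but two of the five SNPs suffice, so only those two would need to be genotyped in a follow-up study.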
Discussion of Study
This approach provides a precise framework for creating a comprehensive haplotype map of the human genome. By testing a sufficiently large collection of SNPs, it should be possible to define all of the common haplotypes underlying blocks of LD. Once such a map is created, it will be possible to select an optimal reference set of SNPs for any subsequent genotyping study. This detailed understanding of common human variation represents an important step in the Human Genome Project.
Linkage Disequilibrium
Uses unrelated individuals.
Good for fine-scale mapping because there is greater opportunity for recombination to occur.
Map of loci that contribute to inherited genetic disorders.
States cannot be considered independent because they are related by distance and recombination, so individual haplotypes may not be the cause of disease, but rather a combination of several haplotypes in blocks.
Linkage Disequilibrium
The greater the distance between genes, the greater the chance of recombination.
The smaller the distance between genes, the smaller the chance of recombination.
Knowing the above and observing inherited alleles, one can estimate the relative distance between genes.
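The slides do not define a specific LD statistic, so the following is a sketch of the standard pairwise measures D, |D'| and r² for two biallelic markers, computed from counts of the four two-marker haplotypes. The counts are made up for illustration.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of haplotypes AB, Ab, aB, ab."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    # Largest |D| attainable with these allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# No recombinant haplotypes observed: complete LD.
print(ld_measures(50, 0, 0, 50))
# All four haplotypes equally common: no LD.
print(ld_measures(25, 25, 25, 25))
```

Closely spaced markers with little historical recombination tend toward the first case, and distant markers toward the second, which is the intuition the bullets above describe.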
What we know
LD, the non-random association of haplotypes with a disease, is likely strongest around the DS (disease susceptibility) gene.
The DS locus will most likely be where the strongest associations are.
Notation
A haplotype map M has k markers: M = (m1, ..., mk)
A haplotype pattern P on M is a vector (p1, ..., pk), where each pi is either an allele of mi or a wild-card (*)
P occurs on a haplotype vector H, which is simply a chromosome, H = (h1, ..., hk), if for every i either pi = hi or pi = *
Example:
P1 = (*, 2, 5, *, 3, *, *, *, *, *)
PC = (4, 2, 5, 1, 3, 2, 6, 4, 5, 3)
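The example can be checked mechanically. The sketch below reads the definition as "a pattern occurs on a haplotype when every non-wild-card position agrees", and assumes that PC denotes a chromosome (haplotype vector); the function name `occurs_in` is chosen for this illustration.

```python
WILDCARD = "*"

def occurs_in(pattern, haplotype):
    """True if every non-wild-card position of pattern equals the haplotype allele."""
    if len(pattern) != len(haplotype):
        raise ValueError("pattern and haplotype must cover the same markers")
    return all(p == WILDCARD or p == h for p, h in zip(pattern, haplotype))

P1 = ("*", 2, 5, "*", 3, "*", "*", "*", "*", "*")
PC = (4, 2, 5, 1, 3, 2, 6, 4, 5, 3)
print(occurs_in(P1, PC))
```

P1 fixes alleles only at markers 2, 3 and 5, and PC carries exactly those alleles there, so the pattern occurs on PC.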
Gaps in sequences
Accounts for mutations, errors, missing data, and recombination
Gap size and number can be controlled in HPM (haplotype pattern mining)
Procedure
Depth-first search finds all haplotype patterns that exceed the lower-bound threshold on the association measure.
Calculate the frequency f(mi) of marker mi with respect to (M, H, Y, x), where Y = phenotype and x = positive association threshold.
Markers with the highest frequencies are predicted to be in the area of the DS gene, assuming a DS gene is present.
Prediction is at the granularity of marker density.
Markers are ranked by frequency.
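The scoring step can be sketched as follows. Several simplifications are assumed and are not taken from the slides: the candidate patterns are supplied directly rather than enumerated by depth-first search, association is measured with a Pearson chi-square on a 2x2 table, only patterns enriched in cases count, and a pattern is taken to "contain" every marker between its first and last fixed allele. The haplotypes, patterns and threshold are toy values.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def occurs_in(pattern, haplotype):
    return all(p == "*" or p == h for p, h in zip(pattern, haplotype))

def marker_frequencies(patterns, haplotypes, is_case, threshold):
    """Count, per marker, the positively associated patterns that span it."""
    freq = [0] * len(haplotypes[0])
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    for pattern in patterns:
        fixed = [i for i, p in enumerate(pattern) if p != "*"]
        if not fixed:
            continue  # an all-wild-card pattern carries no information
        case_hits = sum(1 for h, y in zip(haplotypes, is_case) if y and occurs_in(pattern, h))
        ctrl_hits = sum(1 for h, y in zip(haplotypes, is_case) if not y and occurs_in(pattern, h))
        if case_hits * n_ctrl <= ctrl_hits * n_case:
            continue  # not enriched in cases
        score = chi_square_2x2(case_hits, n_case - case_hits, ctrl_hits, n_ctrl - ctrl_hits)
        if score >= threshold:
            for i in range(fixed[0], fixed[-1] + 1):
                freq[i] += 1
    return freq

# Toy data: four case and four control chromosomes over five markers.
haplotypes = [
    (1, 1, 2, 1, 1), (2, 1, 2, 2, 1), (1, 1, 2, 2, 2), (2, 1, 2, 1, 2),  # cases
    (1, 2, 1, 1, 1), (2, 2, 1, 2, 1), (1, 2, 2, 2, 2), (2, 1, 1, 1, 2),  # controls
]
is_case = [True] * 4 + [False] * 4
patterns = [("*", 1, 2, "*", "*"), ("*", 1, "*", "*", "*"), (1, "*", "*", "*", "*")]
freq = marker_frequencies(patterns, haplotypes, is_case, threshold=3.84)
print(freq)
```

The marker with the highest count (the second one in this toy data) would be the predicted location of the DS gene.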
List of the 11 most strongly disease-associated haplotype patterns in the simulated data
Chromosome has 101 markers
Dashed line indicates the true gene location
Frequency histogram of the previous slide's data, but with patterns exceeding the threshold of association
Dashed line indicates the true gene location
Marker 5 now has the highest frequency
a) Mutation-carrying chromosomes, denoted by A
b) Sample founder population size
c) Corrupted data
d) Missing data
Frequency vs. map location of HLA markers
___ HPM-calculated frequencies
----- Background LD frequencies
Vertical lines indicate true locations of markers
References
Introduction to SNPs: Discovery of Markers of Disease
SNP seeking long term association with complex diseases
SNP mapping using Genome-wide Unique Sequences
The Structure of Haplotypes Blocks in Human Genome
Using Haplotype blocks to map human complex trait loci
High Resolution haplotype structure in human genome
Detection of regulatory variation in mouse genes
http://linkage.rockefeller.edu/wli/lld.html
http://statwww.epfl.ch/davison/teaching/Microarrays/snp.ppt
http://www.cs.helsinki.fi/u/htoivone/pubs/ajhg_2000.pdf
Resolution of Haplotypes and Haplotype Frequencies from SNP Genotype of Pooled Samples
http://www.journals.uchicago.edu/AJHG/journal/issues/v71n6/024386/024386.html
http://www.ncbi.nlm.nih.gov/entrez/utils/fref.fcgi?http://www.sciencemag.org/cgi/pmidlookup?view=full&pmid=11452081
http://www.genome.gov/10001665
http://walnut.usc.edu/~magnus/papers/tig.pdf