RNA stands for ribonucleic acid. Compared with DNA, RNA molecules are much smaller: most are only about 100 to 1000 nucleotides long.
Nucleotide structure: Each nucleotide contains a phosphate group, which gives it its acidic character and is negatively charged under intracellular conditions. o The phosphate is covalently linked via a phosphoester bond to a pentose sugar: ribose in RNA, deoxyribose in DNA. The sugar differs between RNA and DNA at the 2′ carbon position: RNA has an OH group there, while DNA has only an H. The pentose sugar is also linked to a basic ring structure that specifies which nucleotide monomer it is. o Purines have fused ring structures and include adenine (A) and guanine (G). Guanine has a characteristic carbonyl group. o Pyrimidines have single ring structures and include thymine (T) and cytosine (C). RNA uses uracil (U) instead of thymine. Uracil and thymine differ by a methyl group attached at the double bond of the ring. Cytosine has an amino functional group attached to the ring. A nucleoside is the SUGAR and BASE without the phosphate group. Once phosphate groups are added, we are talking about nucleotides. o Ex. a nucleoside monophosphate is a nucleotide. o Multiple phosphate groups can be added to a nucleoside to make mono-, di- and triphosphates. Ex. adenosine triphosphate (ATP).
Amino acids are the building blocks of polypeptides. The shape of a protein is very important, and it is also very dynamic. A mutation in a gene that changes a single amino acid in the polypeptide can lead to a dramatic change in the polypeptide. There are 20 standard types of amino acids.
A protein's function derives from its 3-D structure, which is specified by the amino acid sequence. When we talk about 3-D structure, we talk about the conformation of the protein. Hierarchy of protein structure: primary -> secondary (local folding) -> tertiary -> quaternary -> supramolecular (interacting with other proteins). Some proteins can function on their own, while others must combine with other polypeptides to function. Primary structure is the linear sequence of amino acids that make up a polypeptide. o The polymer ALSO has directionality, like DNA: an amino N-terminus and a carboxyl C-terminus. o Amino acids are bound to each other via peptide bonds. o Residue refers to an amino acid within a polypeptide, ex. the # of residues is the # of amino acids. Secondary structure refers to interactions that occur at the local level, which are stabilized by noncovalent interactions such as hydrogen bonds. o Alpha helices A right-handed helix, like DNA. A hydrogen bond forms between the carbonyl O of one residue and the amide H of the residue 4 positions further along. The R groups point outwards! o Beta sheets Beta sheets are made up of laterally packed beta strands. The sheet is stabilized by hydrogen bonds BETWEEN neighboring beta strands. Beta strands are short, around 5-8 residues in length. Beta strands can be oriented in an antiparallel or a parallel lateral arrangement. The R groups project perpendicular to the plane of the sheet. o Beta turns To keep the polypeptide compact, there are sharp beta turns, which are also stabilized by hydrogen bonds. Glycine is very compact because its R group is a single H. Proline has a ringed R group that forces a bend in the backbone. Therefore, beta turns are rich in glycine and proline. o A region with no stabilizing noncovalent interactions forms a random coil. Tertiary structure is caused by long distance interactions within a polypeptide chain. It is the final structure for most proteins.
o Stabilized by hydrophobic interactions between nonpolar side chains, hydrogen bonds between polar side chains and disulfide bonds between cysteine residues. o Included in tertiary structures are compacted secondary structures.
Transcription is the creation of mRNA from a DNA template. Ingredients needed for transcription in prokaryotes: o A DNA template that serves as a blueprint. o Ribonucleotides (rNTPs) to serve as monomers for the mRNA. o RNA polymerase, the enzyme that catalyzes the synthesis of RNA. A gene that is going to be transcribed has several features: o The coding region is downstream from the start of transcription, which is numbered +1. Going downstream into the gene, the position numbers increase.
Transcription process: 1. RNA polymerase needs to find and bind the promoter region. a. Once bound at the promoter, the polymerase can locate the start of transcription. 2. The polymerase locally denatures the DNA near the transcription start site, forming a transcription bubble less than 15 nucleotides in length and displacing the non-template strand. a. The polymerase catalyzes the phosphodiester linkage between the first two rNTPs. b. Note that transcription requires no primer, as opposed to DNA replication. 3. The RNA is elongated as the polymerase reads the template strand of the DNA from 3′ to 5′, moving the bubble along the template and adding more rNTPs. a. During elongation, a transient DNA-RNA hybrid is produced, possibly adopting an A-form helix. 4. Each incoming rNTP (A, G, C or U) comes to the free 3′ end of the RNA (which grows 5′ to 3′). The 3′ OH group reacts with the alpha phosphate of the triphosphate, forming a phosphodiester bond.
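The base-by-base logic of elongation can be sketched in a few lines of Python. This is only an illustration of the pairing rule and the direction of growth, not a model of the polymerase itself; the function name and the example template are made up for this sketch.

```python
# Pairing rule used when an RNA strand is built on a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a DNA template written 3'->5'."""
    rna = []
    for base in template_3_to_5.upper():
        # Each new rNTP is joined to the free 3' end of the growing RNA.
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)

# Hypothetical template strand, written 3'->5'.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

Because the template is read 3′ to 5′, the RNA comes out already written 5′ to 3′, with U wherever the template has A.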
Translation: prokaryotic protein synthesis can be divided into three steps: 1. Initiation a. Starts with the 30S subunit, onto which 3 initiation factors (IFs) are loaded: IF1, IF2 bound to GTP, and IF3. Once they are bound, this forms the preinitiation complex.
Nucleotide polymerization ingredients: DNA polymerase catalyzes the polymerization of DNA. Deoxynucleoside 5′-triphosphates (A, T, C, G) serve as the dNTP monomers. o Each incoming dNTP is covalently linked through its alpha phosphate to the 3′ OH of the previous nucleotide, liberating a pyrophosphate molecule. Requires a PRIMER, which can be either DNA or RNA, as opposed to transcription. o The dNTPs are added at the free 3′ hydroxyl group on the terminal pentose sugar. As in transcription, polymerization proceeds 5′ to 3′ while the template is read 3′ to 5′.
Once again, the RNA primers need to be removed by Ribonuclease H and FEN-1, replaced with DNA by polymerase complexes, and sealed by DNA ligase.
Polymerase Chain Reaction (PCR) takes a few hours to amplify DNA. Modern rapid cycling can finish in about 20 minutes.
A gene is the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (ribozyme or polypeptide). Transcription differences between prokaryotes and eukaryotes: Prokaryotic genes are located in operons, such as the trp operon. The genes in an operon are related because they act in the same biochemical pathway. o The genes for a biosynthetic pathway are all transcribed at the same time. o The resulting transcript is called a polycistronic message. o In operons, translation starts at several different points within the transcript and produces several proteins. It starts off with one large transcript and results in multiple translated enzymes.
Noncoding DNA includes introns, UTRs (untranslated regions) and intergenic regions, such as those between one coding region and another. Promoters are regions of DNA that initiate transcription of a particular gene. Transcription will NEVER start without a promoter. o In prokaryotes, specific promoter sequences are recognized by the sigma factor of the holoenzyme. o In eukaryotic genes, promoters are very close to the start of transcription, and several DNA binding proteins recognize the motif. Cis-regulatory modules (CRMs) are regions of DNA (100-1000 bp in length) where regulatory proteins bind to regulate the level of expression of nearby genes. o Enhancers elevate the level of transcription. o Silencers suppress the level of transcription. o Insulators make sure the cis-regulatory modules only affect the nearby gene, and can interact with proteins that are associated with enhancers and silencers. o Cis-regulatory modules are DNA that sits upstream of a transcribed region and are SITES where regulatory proteins bind. o Trans-regulatory modules are the genes that encode the proteins that interact with the cis-regulatory modules. They are usually located on a different chromosome or far away from the cis-regulatory site they act on. These genes also have their own cis-regulatory modules, which are in turn acted upon by the protein of another trans-regulatory module. o There can be multiple CRMs in front of a gene to modulate its expression. Simple sequence repeats come in two types: o Minisatellite DNA is an array of tandem repeat units that are nearly identical in sequence, each around 14-100 bp in length. A unit is repeated 20-50 times, which sums to around 1-5 kbp in length. Centromeres and telomeres are often rich in minisatellite DNA. o Microsatellite DNA is just smaller, with repeat units 1-4 bp in length. Ex. AAAA. An array can sum up to around 600 bp in length.
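The microsatellite definition above (a 1-4 bp unit repeated in tandem) maps naturally onto a regular expression. The sketch below is an illustration only: the function name and the threshold of 4 copies are assumptions made for this example, not values from the notes.

```python
import re

def find_microsatellites(seq: str, min_copies: int = 4):
    """Return (start, unit, copies) for each tandem repeat of a 1-4 bp unit.

    min_copies is an illustrative threshold, not a standard cutoff.
    """
    # ([ACGT]{1,4}?) captures the shortest possible unit;
    # \1{n,} then requires at least n further copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Hypothetical sequence containing a (CA)5 array and an (A)6 array.
example = "GG" + "CA" * 5 + "TT" + "A" * 6 + "G"
print(find_microsatellites(example))
```

The same idea would work for minisatellites by widening the unit length to 14-100 bp, although real repeat units are only nearly identical, so an exact-match pattern like this would miss diverged copies.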
They all move through the same pathway: DNA -> (transcription) -> RNA -> (REVERSE TRANSCRIPTION via reverse transcriptase) -> DNA. Reverse transcriptase is actually a DNA polymerase that can use RNA as a template to produce DNA. o It can still use DNA as a template, but much less efficiently. o It follows all the rules of DNA polymerases (requires a primer, synthesizes 5′ to 3′).
To understand the function of reverse transcriptase, we will look at retroviruses, which use reverse transcriptase in their life cycle. Their genome is a SINGLE STRANDED RNA molecule. Retroviral life cycles usually do not involve killing the infected host cell. Oncoviruses are retroviruses that have ACQUIRED a host gene and lead to the formation of cancer. o The virus puts the host gene under the regulation of the retrovirus, RATHER than under the regulation of the host. Retrovirus life cycle: o The virus is enclosed in a lipid membrane envelope with various embedded proteins. Inside the envelope is a proteinaceous nucleocapsid, which holds two copies of the SINGLE STRANDED RNA genome and two molecules of reverse transcriptase. o The virus first fuses with the host cell, which releases the capsid. o Reverse transcriptase converts the single stranded RNA into double stranded DNA. o The double stranded DNA molecule needs to be INTEGRATED into the host genome. It can stay there for a relatively long time. o Transcription of the integrated DNA produces the viral genome as RNA again, and this RNA is also translated into viral proteins such as the capsid proteins.
How do you go from this single stranded RNA transcript to a double stranded DNA copy that contains the LTR? 1. The primary transcript of the retrovirus RNA genome first needs a PRIMER (as reverse transcriptase needs a primer). (5′ end R|U5|PBS|coding region|U3|R|POLY-A 3′ end) a. The primer is provided by the HOST in the form of a tRNA molecule (not produced by the virus). b. The primer binds RIGHT at the primer binding site (PBS), which lies just outside the U5 region near the 5′ end, on the side closer to the 3′ end. c. Now we have a free 3′ end on the primer. 2. The DNA polymerase, reverse transcriptase, extends DNA from the free 3′ end of the primer through U5 and R, where it reaches the END of the RNA template (the 5′ border of the original RNA transcript, which pairs with the 3′ end of the newly created DNA). a. There is a free 3′ end, but no more RNA template to copy. 3. Now a SPECIFIC RNA degradation enzyme called RNase H, which is encoded by the virus, degrades only the RNA within an RNA-DNA hybrid. a. It degrades the RNA 5′ end (U5 and R) that is paired with the extended DNA 3′ end. b. With the 5′ end of the RNA transcript gone, no further DNA polymerization is possible at this position. 4. The nucleic acid held by hydrogen bonds at the tRNA primer is released (at a low frequency), and the DNA complementary to the R section binds to the RNA R section at the 3′ end of the RNA transcript. 5. Now that there is a free DNA 3′ end and a corresponding template, reverse transcriptase continues copying from the R section at the 3′ end of the RNA towards the 5′ end of the RNA transcript. 6. RNase H once again degrades the RNA sections paired with the DNA, EXCEPT for one poly-purine rich region adjacent to the U3 border at its 5′ end! a. The degradation still occurs, but slowly enough that this RNA lasts long enough to act as a PRIMER for DNA polymerase, now reading the newly formed DNA
Lecture 11
This leads to extension of the nicked DNA via ORF2's reverse transcriptase activity, which synthesizes DNA complementary to the LINE RNA. The LINE transcript, bound by ORF2, is still associated with the nicked end so that it doesn't just float off into space. After the LINE has been copied into DNA, the LINE DNA moves back into alignment and the LINE RNA sits on the other strand, creating a DNA-RNA hybrid. There is a break in the sugar-phosphate backbone, but because ORF2 is associated with both the STAGGERED END and the LINE transcript, it reads OVER the break in the backbone and continues to polymerize.
SINEs have NO protein coding capacity, unlike LINEs. They are derived from cellular non-coding RNA genes (such as tRNA genes), and in humans specifically from the 7SL RNA gene. Basically a zombie gene: a copy that was inactivated and is no longer used, but became capable of movement and does nothing for the host. Other types of non-coding RNA genes can ALSO be mobilized. In plants, tRNA genes can be mobilized. SINEs range from around 100-400 bp in length. In humans there are around 1.6 million copies, but they make up only around 13% of the total nucleotide sequence. We do not fully understand how SINEs move, but they seem to pirate the same proteins that LINEs use: ORF1 and ORF2. THEY cannot move unless there is an active LINE somewhere around.
Processed pseudogenes are non-functional, decaying genes, but they are processed: they are mobilized copies of protein-coding genes. They arise when a mature mRNA transcript is reverse transcribed and inserted into the genome. They are extremely rare and definitely NOT as common as LINEs and SINEs. o Because mRNA is the template, they LACK introns and control regions! o Therefore, with no control regions or promoters, they are DEAD in the water.
DNA transposons are excised, so they are GONE from the location they move from. Retrotransposons, HOWEVER, NEVER leave their original place of insertion. They are simply transcribed into an RNA intermediate, which is then reverse transcribed and integrated into another place in the DNA. Therefore, as long as there are many RNA intermediates, we will get a burst of retrotransposon activity. Reverse transcriptase is VERY error prone and introduces a lot of mutations. This makes retroviruses very variable and allows them to escape host immune suppression mechanisms.
In terms of contribution to genome content: The yeast genome is very small because it has VERY FEW transposable elements. Rice, one of the model genomes for plants, has a rather high content of transposable elements: 30%. In humans, when the human genome project was finished, AT LEAST 50% of the genome was found to be made up of transposable elements. Maize has a VERY large genome with a VERY HIGH content of transposable elements: 90%. o ALMOST ENTIRELY LTR-retrotransposons. The lily genome is LARGER than the human genome because over 99% of it consists of LTR retrotransposons.
The last contributor to the non-coding DNA is spacer DNA. We have NO IDEA what it is. As computational techniques are getting better, we are getting to know regions of the spacer DNA. As transposable elements mutate and decompose, they become spacer DNA.
Transposons and Evolution: When considering evolution, many people only look at protein coding genes. There is a negative correlation between transposon mobility and fitness. NOT GOOD for natural selection.
The effects of transposons in evolution: The insertion of transposons can cause mutations! o The insertion of a DNA transposon in maize knocks out the purple color gene. o Floral colors are also effects of insertion knockouts. o In many cases insertion knockouts were SELECTED FOR through human interaction (ex. lack of purple color in maize). o An element called P in Drosophila is often inserted into the S6 kinase gene and has a dramatic effect on body size in Drosophila.
Transposons can also move together: when transposase recognizes ONE end of ONE transposon and the other end of ANOTHER transposon, everything in between gets shuffled along with them. o LINEs can also move exons around: transcription of the LINE does not stop at its own weak polyA signal, but continues and also transcribes a downstream terminal exon that is followed by a polyA signal. This transcript can then be reverse transcribed and integrated into a second gene, adding the extra exon to that gene. o This is very important in creating novel polypeptides; if you knock out the protein coding domains, the polypeptides cannot work. Transposons also play an important role in cis-regulatory module mechanisms, which are very important for the regulation of genes through their control regions. o Transposable elements have coding capacity of their own, such as transposase genes, and this capacity varies. They have their own promoters and cis-regulatory modules. o CRMs are enhancers and silencers. o When a transposable element inserts itself UPSTREAM of a gene, it adds a novel cis-regulatory module to that gene. o Usually when insecticides are introduced, the insects can develop resistance.
Final exam will be December 14th at 9AM in the McGill Gym. Transposons and evolution: Transduction is the process through which retroviruses and LTR-retrotransposons acquire genes from their host. o Applies only to LTR-retrotransposons and retroviruses. o The acquired gene does NOT have introns. Transduplication is the acquisition of host DNA by DNA transposons. o Often takes host GENES into the DNA transposon. o The acquired gene DOES have introns because there is no RNA intermediate. By acquiring these small gene fragments, LTR-retrotransposons, retroviruses and DNA transposons are fishing around for fragments that may have a specific benefit to them. o Non-beneficial ones will degrade over time. o Beneficial ones will be maintained! o An example is the acquisition of the env ORF, which marks the transition of an LTR-retrotransposon into a retrovirus.
Domesticated transposable elements were originally mobile transposons, but were then harnessed and immobilized by the host and now have a specific function for the host. Most of the time, this function is VERY necessary to the host.
Organellar DNA genomes in mitochondria and chloroplasts: These organelles are involved in respiration and photosynthesis. They were derived from free living bacteria that were endocytosed and became endosymbionts. They still look similar to prokaryotes: circular DNA, genes that lack introns (introns are specific to eukaryotic organisms) and gene products that resemble prokaryotic RNAs and proteins. In eukaryotic cells, there is only one nucleus containing the cellular DNA, while there are many mitochondria and chloroplasts, each containing multiple genomes. o There are multiple copies of the mitochondrial genome within EACH mitochondrion. The organelles were retained in cells because they offered a biological advantage: oxidative phosphorylation and photosynthesis. By eating green algae, a slug takes the algae's photosynthetic pigments into its epidermal cells, which can then photosynthesize. VERY similar to endocytosis. o Not passed on to the next generation. Over time, genes from the endosymbiont genomes of mitochondria and chloroplasts have been transferred to the nucleus. o There is ALSO evidence of the reverse! Nuclear DNA can move into mitochondria and chloroplasts, and DNA can be exchanged between the two organellar genomes as well. The organellar genes are a small subset of the original genes, those necessary to keep the organelle functioning. o Therefore the mitochondrial genome is very small. It contains only 37 genes, which are required for translation. It has lost a LOT of genes in comparison to E. coli. NO INTRONS. o The proteins coded by the mitochondrial DNA NEVER LEAVE the mitochondria. o There are also alterations to the standard genetic code: some codons that are stop codons in the nuclear code are no longer stop codons in mitochondrial DNA. o Known as codon bias between the mitochondrial and nuclear genomes.
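The effect of an altered genetic code can be illustrated with a toy translator. The notes do not name a specific codon; the sketch below uses UGA, which is a stop codon in the standard code but codes for tryptophan in vertebrate mitochondria. The codon tables are deliberately tiny (only the four codons in the example) and the mRNA is hypothetical.

```python
# Partial codon tables: only the codons used in the example below.
STANDARD_CODE = {"AUG": "M", "UGG": "W", "UGA": "*", "UAA": "*"}
# Assumption labeled above: vertebrate mitochondria read UGA as tryptophan (W).
VERT_MITO_CODE = dict(STANDARD_CODE, UGA="W")

def translate(mrna: str, code: dict) -> str:
    """Translate codon by codon from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = "AUGUGAUGGUAA"  # hypothetical message: AUG UGA UGG UAA
print(translate(mrna, STANDARD_CODE))   # M
print(translate(mrna, VERT_MITO_CODE))  # MWW
```

The same message stops after one residue under the standard code but reads through to the UAA under the mitochondrial code, which is why a mitochondrial gene moved to the nucleus would not necessarily be translated correctly.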
DNA barcoding relies on exploiting sequences in the mitochondrial and chloroplast genomes to give a unique identifier for each species on the planet. In comparison to other molecules that could uniquely identify a species: o Proteins and polysaccharides are VERY hard and expensive to sequence. o RNA is too unstable to last as a barcode. DNA is extremely stable, and PCR can amplify it to give access to the information in the genome. o There exists a barcode sequence in a gene that acts as a unique identifier. It is flanked by regions that act as primer sites for PCR. Some considerations for choosing the right DNA sequence for barcoding (it does not have to be a specific gene, but just has to FIT these criteria): o We cannot use just any sequence, because many sequences are too similar between species. o Sequence differences (divergences) have to be high enough between species to be distinguishable, but low enough within a species. o The sequence must be amplifiable by PCR, so the region must be flanked by ULTRA-CONSERVED regions (shared by all species) for primer annealing, each around 20 bp in length. o The length of the barcode sequence must not be variable: therefore the sequence must not contain introns or transposons. The difference in nuclear DNA between human and chimpanzee is only 0.9%. However, the difference in mitochondrial DNA between human and chimpanzee is 9%. o Mitochondrial DNA is better to use because of this hypervariability. o Between two species, mitochondrial genes are similar for ribosomal coding genes, but differ for protein coding genes. For most eukaryotic organisms, and plants, barcoding uses a section of DNA within the mitochondrial cytochrome c oxidase subunit 1 gene (CO1).
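The divergence figures quoted above (0.9% versus 9%) are simply the percentage of positions that differ between two aligned sequences. A minimal sketch of that calculation, assuming the two barcodes are already aligned and of equal length (the function name and the toy sequences are made up for illustration):

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)

# Two hypothetical 10 bp barcode fragments differing at one position.
print(percent_divergence("ACGTACGTAC", "ACGTTCGTAC"))  # 10.0
```

A good barcode region gives a clearly larger value between species than between individuals of the same species.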
Advantages of DNA barcoding: Very fast and cheap, because PCR and sequencing cost in the realm of $6 and are easily accessible.
Disadvantages of DNA barcoding: It is potentially inaccurate when you isolate DNA from carnivorous insects or animals, where you might find DNA from the ingested animal as well. The initial database is costly to set up.
DNA is first organised into chromatin, and finally exists as a chromosome. During interphase (the longest phase of the cell cycle), DNA exists as nucleoprotein complexes known as chromatin. Chromatin isolated in an isotonic buffer has roughly equal proportions of DNA and protein. In a low salt solution, the chromatin extends and more detail becomes visible. This is called the extended form of chromatin, or beads on a string. o The beads are nucleosomes (10 nm in diameter). Nucleosomes consist of histones and wrapped DNA. o There are 5 major types of histones: H1, H2A, H2B, H3 and H4. o They are rich in positively charged amino acids, which interact with the negatively charged phosphate groups of DNA. o Since nucleosomes are used in all eukaryotic DNA, H2A, H2B, H3 and H4 are HIGHLY conserved. H1 is slightly variable. o A nucleosome consists of ~147 bp of DNA wrapped almost two full turns around the surface of a protein core. The core is an octamer of histones, with 2 copies each of H2A, H2B, H3 and H4. The N-terminal tails of these histone polypeptides stick out of the nucleosome complex. For H2A and H2B, C-terminal tails extend out as well. o Nucleosomes are attached to one another via linker DNA, around 10-90 bp in length. At a high, physiological salt concentration, chromatin exists in a condensed, fiber-like form referred to as the 30-nm chromatin fiber. o Beads on a string are compacted by stacking every other nucleosome on top of each other, forming a zig-zag ribbon and then a two-start helix. o The diameter of the helix is constrained by the length of the linker DNA. o H1 is not part of the protein core, but rather plays a role in stabilizing the 30-nm chromatin fiber. It associates with the helix and probably the linker DNA. Note that this level of chromatin compaction, into the two-start helix, only occurs during interphase. It is also a dynamic event.
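As a rough worked example of the numbers above, the sketch below estimates how many nucleosomes fit on a stretch of DNA, assuming every nucleosome wraps 147 bp and every linker has the same length. The function name and the default 53 bp linker are illustrative assumptions (any value in the 10-90 bp range could be used); real linker lengths vary.

```python
CORE_BP = 147  # DNA wrapped around one histone octamer (from the notes)

def estimate_nucleosomes(dna_bp: int, linker_bp: int = 53) -> int:
    """Estimate the nucleosome count on dna_bp of DNA with uniform linkers."""
    if not 10 <= linker_bp <= 90:
        raise ValueError("linker length is expected to be 10-90 bp")
    repeat_bp = CORE_BP + linker_bp  # one bead plus one stretch of string
    return dna_bp // repeat_bp

# 10,000 bp with a 200 bp repeat (147 + 53).
print(estimate_nucleosomes(10_000))  # 50
```

Shorter linkers pack more beads onto the same length of string, which is one way to see why linker length constrains the geometry of the 30-nm fiber.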
An experiment that shows the importance of these three functional elements: Take a plasmid that ONLY has the gene that allows for the synthesis of leucine (LEU). Introduce this plasmid into a mutant yeast cell that CANNOT synthesize leucine, a leu- cell. Even though the introduced plasmid has a functional LEU gene, none of the yeast cells gain the ability to make leucine. o The plasmid needs an origin of replication, or else leucine will not be made. In yeast, the origin of replication is called an ARS: autonomously replicating sequence. Even when a plasmid containing both LEU and an ARS is introduced, very FEW of the yeast cells keep the gene. o This is because of mitotic segregation of the chromosomes: without a centromere the plasmid does not segregate properly and is not passed on to the daughter cells. o Yeast cells are not prokaryotic. When a centromere is added to the plasmid, the plasmid is carried over at each division along with its ARS and LEU gene. However, in eukaryotic cells the DNA is not circular but linear. If we cut the plasmid to make it linear, none of it is maintained, because the linear molecule is degraded after each DNA replication cycle. o Any degradation mechanism in the cell will degrade both the LEU gene and the ARS. Loss of either of them causes complete loss of function. After adding yeast telomeres to the linear plasmid, while keeping the centromere and the ARS, the plasmid is maintained fine.
-Omics era: Functional genomics is the study of the function of EVERY gene in the genome. Proteomics is the study of all the gene products, all the proteins produced, referred to as the proteome. Evolutionary genomics is the study of genomes to understand their underlying influence on the evolution of organisms. Transcriptomics tries to understand the sequences and nature of all the transcripts of the organism. Phenomics looks at the complete set of phenotypes of the organism. Spliceomics looks at the splice sites in the genome. All the omics have in common that they look at the OVERALL view.
Since the first genome sequence, that of the Epstein-Barr virus in 1984, advances in DNA sequencing technology have allowed thousands of other organisms to be sequenced. In 2005, only 186 microbial genomes had been sequenced. In 2012, as of a couple of days ago, 2472 genomes had already been sequenced. The influenza virus was one of the first things to be sequenced, as it has a very small genome. Researchers also tried very hard to sequence the genomes of model organisms, such as yeast, C. elegans, Drosophila, mice, zebrafish and plants. There are ~2300 completed or ongoing eukaryotic genome projects. The human genome has been resequenced SEVERAL thousand times, because now it can be done in a week for 10k. Mosquitoes and rice were also sequenced very early on, around 2000, one because of malaria and the other as a calorie provider. Many organisms that are associated with disease have been sequenced.
Dideoxy sequencing: Shotgun sequencing The read length for dideoxy sequencing is around 400-500 nucleotides, VERY SHORT. That means we have to break a large genome down into SMALL fragments to be sequenced. Mechanical shearing breaks a large genome into small bits of different lengths. We first take one fragment of DNA whose position in the overall genome we know. Then we look for OVERLAPPING fragments, which indicate that a fragment comes right after the previously placed one. o Tiling paths of overlapping fragments are called contiguous sequences, or contigs.
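The idea of building a contig from overlapping fragments can be sketched with a simple greedy merge: repeatedly join the two fragments with the longest suffix-to-prefix overlap. This is a teaching sketch under strong assumptions (error-free reads, all from the same strand, no repeats); it is not how production assemblers work, and the function names, the minimum overlap of 3 and the toy fragments are made up for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len: int = 3):
    """Merge the best-overlapping pair until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining fragments do not overlap: separate contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# Three hypothetical overlapping reads from ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

If some fragments share no overlap, the function returns more than one string, which corresponds to having several separate contigs with gaps between them.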
Next generation sequencing was created to find the specific genomic mutations that cause disease, by sequencing genomes very quickly at a very low cost. It refers to Illumina (Solexa) sequencing and Roche 454 pyrosequencing. High throughput is needed for sequencing to happen very quickly. Massively parallel means that hundreds of sequences are deciphered AT THE SAME TIME. Microfluidics are new technologies that allow very, very small amounts of liquid to be moved around. Fixed synthesis refers to fixing the template (or sometimes the DNA polymerase) at ONE coordinate, which is required for the detector to function. Read length is the maximum length of a segment that can be read. In dideoxy sequencing it is only 400-500 bp, but in newer technologies it can be much longer. General steps for next generation sequencing: 1. DNA isolation 2. Fragmentation, such as mechanical shearing 3. Producing a library, by placing the templates that are about to be sequenced at fixed coordinates 4. Amplification of the template at its specific coordinate, done through a protocol very similar to PCR 5. Sequencing chemistry, which differs between platforms 6. Assembly of the data, which can be up to gigabases in length
Illumina sequencing: Sequencing: o DNA synthesis still occurs as usual and needs a template and a primer with a free 3′ end. o The labeled nucleotides in solution act like ddNTPs, but are reversible terminators: the block can be removed so that polymerization can continue. The blocking agents are fluorescently labeled dyes. Once one has been attached and has terminated polymerization, the rest of the dyed nucleotides are removed. The detector senses which dye, and therefore which base, is at the terminus. After detection the dye is removed and the 3′ end is re-exposed. Fluorescently labeled nucleotides are added back and the cycle continues for each base. After fragmenting, the ends of the DNA need to be repaired. A tail is also added at the repaired end, similar to a poly-A tail. An adapter is added to each end of the fragment, similar to custom-made oligonucleotides. The DNA is immobilized on a matrix via cluster generation. o The various fragments are immobilized on a tray as a matrix. o Holes, called flow cells, are each used independently for sequencing. o A flow cell is just a lawn of oligonucleotides that are complementary to the added adapters. Following a denaturation step in which one strand is always removed, a process similar to PCR is carried out. o Since the oligonucleotide lawn is fixed, the two adapters at either end of a fragment bind to it and the fragment forms a loop. o All the loops in a cluster have the same single-stranded template.
454 sequencing: We need these basic ingredients: template, primer, dNTPs, DNA polymerase, ATP sulfurylase, adenosine 5′ phosphosulfate (APS), luciferase, luciferin and apyrase. DNA polymerase adds a dNTP to the growing strand as normal, which forms a phosphodiester bond and releases a pyrophosphate (PP). Then ATP sulfurylase uses the PP and APS to make sulfate and ATP. Luciferin + ATP, with luciferase as the enzyme, produces oxyluciferin and light. Apyrase is required to degrade leftover dNTPs into dNDPs, dNMPs and phosphate, and ATP into ADP.
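The read-out of this chemistry can be sketched as a flowgram: one kind of dNTP is flowed at a time, and light is only produced if that nucleotide is incorporated. The sketch below assumes, as is generally described for pyrosequencing, that the light signal is proportional to the number of identical bases incorporated in that flow, and that apyrase clears everything between flows. The flow order "TACG", the function names and the example strand are illustrative assumptions.

```python
def pyrosequence(new_strand: str, flow_order: str = "TACG", max_cycles: int = 50):
    """Return (nucleotide, signal) per flow for the strand being synthesized.

    signal = number of bases incorporated in that flow (0 means no light).
    """
    pos = 0
    flowgram = []
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            count = 0
            # The polymerase keeps adding this dNTP while it matches the next base.
            while pos < len(new_strand) and new_strand[pos] == nucleotide:
                count += 1
                pos += 1
            flowgram.append((nucleotide, count))
            if pos == len(new_strand):
                return flowgram
    return flowgram

def call_bases(flowgram) -> str:
    """Rebuild the sequence from the per-flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flowgram)

flows = pyrosequence("TTACGGA")  # hypothetical newly synthesized strand
print(flows)
print(call_bases(flows))  # TTACGGA
```

Runs of the same base (like TT or GG here) show up as a single, stronger flash rather than separate flashes, which is why long homopolymer stretches are harder to call accurately with this method.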
454 fragmentation and library prep are similar to those of Illumina: the DNA is fragmented and adapter DNA is added to both ends. Instead of being immobilized in a flow cell, the fragments are immobilized on a bead, and amplification then occurs through PCR. The bead itself is immobilized in a well on a plate, so detection can easily be carried out by the detector now that the template is fixed.
Pros/Cons
                              Dideoxy   Pyro     Illumina
Read lengths (bases)          1000      500      75
Runtime (days per gigabase)   500       2        0.5
Cost ($ per 1000 bases)       $0.10     $0.02    $0.001
Dideoxy sequencing has HUGE read lengths, but the runtime is much longer and the cost per 1000 bases is higher. Pyrosequencing has a relatively long read length, a runtime of 2 days per gigabase and a cost of 2 cents per 1000 bases. Illumina has a very small read length of 75, a runtime of 0.5 days per gigabase, and the cost DROPS SIGNIFICANTLY. o Because the fragments are smaller, the assembly software that pieces the puzzle back together must be very good. o Larger fragments are easier to reassemble. Typical genome projects use all of these sequencing methods.
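To make the comparison concrete, the figures from the table above can be scaled to a genome of a given size. The sketch below is plain arithmetic on those three rows; the function name is made up, the 3 gigabase genome is just an example size, and a single pass over the genome is assumed (real projects sequence each base many times over).

```python
# Values copied from the Pros/Cons table above.
PLATFORMS = {
    "dideoxy":  {"read_length": 1000, "days_per_gb": 500, "usd_per_kb": 0.10},
    "pyro":     {"read_length": 500,  "days_per_gb": 2,   "usd_per_kb": 0.02},
    "illumina": {"read_length": 75,   "days_per_gb": 0.5, "usd_per_kb": 0.001},
}

def estimate(platform: str, bases: int) -> dict:
    """Reads needed, runtime in days and cost in $ for one pass over `bases`."""
    p = PLATFORMS[platform]
    return {
        "reads": -(-bases // p["read_length"]),   # ceiling division
        "days": bases / 1e9 * p["days_per_gb"],
        "usd": bases / 1000 * p["usd_per_kb"],
    }

genome = 3_000_000_000  # example size: 3 gigabases
for name in PLATFORMS:
    result = estimate(name, genome)
    print(name, result["reads"], round(result["days"], 1), round(result["usd"], 2))
```

The output makes the trade-off visible: the short-read platform needs far more reads (and therefore better assembly software) but is orders of magnitude faster and cheaper per base.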
Although these DNA technologies are so popular, there are even BETTER technologies these days: 3rd generation sequencing: Pacific Biosciences, or PacBio: o Uses a single DNA molecule in each well (rather than amplifying in each well). Therefore, the sensor MUST be very sensitive! o It is the DNA polymerase that is immobilized, rather than just the template. o Read lengths of 10,000 bases. Life Technologies' Ion Torrent, or Ion: o Sequences on a semiconductor chip.
Bioinformatics: Assembly and quality assessment o Quality assessment is an important part of the assembly procedure, so that there is no incorrect input. Sequence analysis o Finding genes. Genome mining and database organization and management o How to manage all the DNA data, on the order of 10s of thousands of terabytes. Phylogenetic inference o Reconstruction of a family tree. Pattern recognition and image analysis o Using mutants to figure out globally what happens to specific genes, by looking at the global phenotype of the organism. o Also used for facial analysis. Gene and regulatory networks: o How the enzymes interact with each other in each reaction pathway. o How do gene products interact with each other? o How the enzymes encoded by one gene act on another gene. Modeling of complex biological and molecular processes.
Annotation is originally why bioinformatics got started. Where is each type of feature located: Exons or OPEN READING FRAMES. Polyadenylation sites. Where the transposable elements are located. Knowing something about non-coding DNA.
Programs for Sequence Analysis: There are two flavors of BLAST (Basic Local Alignment Search Tool). BLASTN looks at nucleotide sequence similarities. BLASTP looks at protein sequence similarities. We start off with a query sequence, which the program searches a database for. o BLASTN first looks for a perfect match to a region of your sequence of a certain size. o It then looks at the neighboring regions to extend the match. o It extends the similarity further by allowing for gaps in the alignment.
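The first two steps (find a perfect word match, then extend it) can be sketched as a small seed-and-extend search. This is a heavily simplified illustration, not the BLAST algorithm itself: it only extends through exact matches, with no scoring matrix, no mismatches and no gaps. The function name, the word size of 4 and the toy sequences are assumptions for this example.

```python
def seed_and_extend(query: str, subject: str, word_size: int = 4):
    """Return (length, query_start, subject_start) of the best exact local match."""
    # Step 1: index every word of length word_size in the subject (the "database").
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    best = (0, 0, 0)
    for i in range(len(query) - word_size + 1):
        # Step 2: every perfect word match is a seed.
        for j in index.get(query[i:i + word_size], []):
            q_start, s_start = i, j
            # Step 3: extend the seed to the left while bases keep matching.
            while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
                q_start -= 1
                s_start -= 1
            q_end, s_end = i + word_size, j + word_size
            # ...and to the right.
            while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
                q_end += 1
                s_end += 1
            if q_end - q_start > best[0]:
                best = (q_end - q_start, q_start, s_start)
    return best

length, q_pos, s_pos = seed_and_extend("TTGACCGTAGG", "AAAGACCGTAGCC")
print(length, q_pos, s_pos)  # 8 2 3
```

Indexing short words first is what makes this kind of search fast: only positions that already share a perfect word are ever examined in detail.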
Fringe Genomics: How can genomics correct a wrong, or bring back something that is extinct? The story of the Tasmanian tiger (thylacine), which is a marsupial. Back in 1936, the settlers in Tasmania put a bounty on the Tasmanian tiger, which quickly drove the animal to extinction. There was an attempt at a Tasmanian tiger project, in which the preserved DNA of a Tasmanian tiger would be injected into an embryo of a numbat, the closest relative of the Tasmanian tiger.