BIOL 200 Molecular Biology Lecture Notes

Molecular Biology BIOL 200 Notes U1
Lecture 2: Nucleotides and Amino Acids

The primary structure of a polypeptide is made up of amino acid building blocks, with a free amino end and a free carboxyl end. Nucleotides are the building blocks of nucleic acids. The name traces back to Miescher, who isolated a compound from cell nuclei that turned out to be acidic; thus, nucleic acids.
RNA stands for ribonucleic acid. RNA molecules are much shorter than DNA, typically about 100 to 1000 nucleotides long.
DNA stands for deoxyribonucleic acid. DNA molecules can be millions of nucleotides long.
Nucleotide structures: A nucleotide contains a phosphate group, which gives it its acidic character and is negatively charged under intracellular conditions. o The phosphate is covalently linked via a phosphoester bond to a pentose sugar. The sugar differs between RNA and DNA: RNA contains ribose, which has an OH group at the 2′ carbon, while DNA contains deoxyribose, which has only an H at the 2′ carbon. The pentose sugar is linked to a nitrogenous base (a ring structure) that specifies the type of nucleotide monomer. o Purines have fused double-ring structures: adenine (A) and guanine (G). Guanine has a characteristic carbonyl group. o Pyrimidines have single-ring structures: thymine (T) and cytosine (C). RNA contains uracil (U) instead of thymine. Uracil and thymine differ only by a methyl group, which thymine carries at the C5 position of the ring. Cytosine has an amino group attached to the ring. A nucleoside is the SUGAR and BASE without the phosphate group. Once phosphate groups are added, we are talking about nucleotides. o Ex. a nucleoside monophosphate is a nucleotide. o We can add one, two, or three phosphate groups onto a nucleoside to make mono-, di-, or triphosphates. Ex. adenosine triphosphate (ATP).
Amino acids are the building blocks of polypeptides. The shape of a protein is very important, and it is also very dynamic. A mutation in a gene that changes a single amino acid in the polypeptide can dramatically change the polypeptide. There are 20 standard types of amino acids.

Non-standard amino acids also exist. Each amino acid's properties are determined by its R group (side chain). o R groups vary in size, shape, charge, hydrophobicity and reactivity. Every amino acid except glycine has 2 stereoisomers (L and D), which can behave very differently in the body. o In polypeptides of living organisms, you will almost only find L isomers. The structure consists of an amino group, a carboxyl group, a hydrogen and an R group side chain, all attached to a central carbon. People take advantage of the body's preference for L isomers by manufacturing D isomers, which the body's natural mechanisms degrade poorly because they are not adapted to them. Hydrophilic amino acids include: o Lysine, arginine and histidine. o They are polar, water soluble, and carry a charge. Hydrophobic amino acids are nonpolar and insoluble in water. Special amino acids include: o Cysteine, which has a sulfhydryl group that can form disulfide bonds, helping stabilize polypeptides when they fold. o Glycine, which has only an H as its R group, making it extremely small and compact. Sharp bends in polypeptides are therefore rich in compact amino acids like glycine. o Proline, whose R group forms a ring with its own amino group, making it rigid and compact.
Lecture 3: DNA, RNA, Protein Structure:

DNA, RNA and proteins are polymers made by linking monomers: nucleotides and amino acids. Nucleic acids: An oligonucleotide is a very short nucleic acid. Nucleic acids have a prominent sugar-phosphate backbone in which phosphodiester bonds link phosphate and pentose units. Nucleic acids have directionality: the 5′ end has a free phosphate group on the 5′ carbon of the pentose sugar, and the 3′ end has a free hydroxyl group on the 3′ carbon. Since DNA and RNA are synthesized in the 5′ to 3′ direction, nucleic acid sequences are written by convention from 5′ to 3′ o 5′-CAG-3′ o When no direction is indicated, the sequence is always read 5′ to 3′. In 1962, Watson and Crick won a Nobel Prize for determining the double-helix structure of DNA. This would not have been possible without the high-resolution X-ray diffraction work of Rosalind Franklin. The double helix can be represented by a flat version o where the sugar-phosphate backbone resides on the OUTSIDE of the double helix.

The backbone is very regular and repetitive and contains NO information. o The information in a nucleic acid is coded by the base pairs on the inside of the helix, which stack in parallel planes. o The two strands are antiparallel: the 3′ end of one strand lies at the same end of the helix as the 5′ end of the other. o The bases form Watson-Crick base pairs, bonding specifically via hydrogen bonds: A pairs with T (2 hydrogen bonds) and G pairs with C (3 hydrogen bonds). This is called base complementarity. A purine (double ring) MUST pair with a pyrimidine (single ring). When viewing the double helix in a 3-D space-filling model: o There is a major groove and a minor groove, where proteins interact with the base pairs. o There are 3 conformations of the double helix, which depend on the DNA sequence context and the nuclear environment. B-DNA: a right-handed, elongated helix with base pairs oriented roughly horizontally (perpendicular to the axis). Most DNA in the body is in this conformation. A-DNA: a right-handed, squished, wider helix with slanted base pairs and more base pairs per turn (about 11, versus about 10.5 for B-DNA). Originally observed in vitro in a dehydrating environment, it also occurs in vivo, including in RNA-DNA and RNA-RNA helices. Z-DNA: a left-handed, thinner, elongated helix with roughly horizontal base pairs. It forms transiently shortly after transcription and can therefore serve as a tag for actively transcribed genes. It also forms in DNA with alternating purines and pyrimidines on the same strand. o DNA-binding proteins can deform DNA; for example, the TATA box-binding protein forces a sharp BEND in the double helix. Therefore the DNA helix is NOT a rigid rod, but a very dynamic molecule. In vitro, we can denature double-stranded DNA with heat, separating it into single strands. Cooling the DNA allows renaturation, faithfully reforming the double strand. o Because single-stranded DNA absorbs more 260 nm light than double-stranded DNA, we can follow strand separation by plotting absorbance at 260 nm against temperature.
o The temperature at which 50% of the double-stranded DNA has become single-stranded is called the melting temperature, Tm. Tm increases with G-C content because G-C pairs have 3 hydrogen bonds, so more energy is needed to separate them than A-T pairs. o Note that the single-stranded portion of the curve is still sloped: absorbance keeps rising with temperature. RNA can also form 3-D structures when a single strand folds back onto itself. o This localized folding allows base pairs to form within the strand.

o The stem-loop and hairpin secondary structures have a double-helical stem region where the bases do in fact pair. Tertiary structures include the pseudoknot, where bases in a loop pair with another region to form an additional stem. Much longer RNA molecules (around 1000 nucleotides) can form tertiary structures similar to those of proteins.
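The pairing rules and the G-C dependence of Tm above can be sketched in a few lines of Python. This is a minimal illustration, assuming plain uppercase DNA strings written 5′ to 3′. The Wallace rule used for Tm (2 °C per A/T, 4 °C per G/C) is an outside rule of thumb for short oligonucleotides, not something from the lecture, and it ignores salt, strand concentration and stacking effects.

```python
# Watson-Crick partners: A-T (2 hydrogen bonds), G-C (3 hydrogen bonds).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Partner strand of seq, also written 5'->3'.

    Complementing alone gives the partner read 3'->5' (antiparallel);
    reversing it restores the 5'->3' writing convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C for a short oligo (assumed Wallace rule of thumb):
    each A/T adds 2 and each G/C adds 4, reflecting the extra hydrogen
    bond of G-C pairs."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for oligo in ["CAG", "ATATATATAT", "GCGCATGCGC"]:
    print(oligo, reverse_complement(oligo), gc_fraction(oligo), wallace_tm(oligo))
```

Two oligos of the same length differ only in G-C content here, so the one richer in G-C pairs gets the higher estimated Tm, which is the trend the melting curve shows.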
A protein's function derives from its 3-D structure, which is specified by its amino acid sequence. The 3-D structure of a protein is called its conformation. Hierarchy of protein structure: primary->secondary (local folding)->tertiary->quaternary->supramolecular (interacting with other proteins). Some proteins can function on their own, while others must combine with other proteins to function. Primary structure is the linear sequence of amino acids that make up a polypeptide. o Like DNA, the polymer has directionality: an amino N-terminus and a carboxyl C-terminus. o Amino acids are joined to each other by peptide bonds. o A residue is a single amino acid within a polypeptide; the length of a polypeptide is given as its number of residues. Secondary structure refers to local folding, stabilized by noncovalent interactions such as hydrogen bonds. o Alpha helices A right-handed helix, like B-DNA. A hydrogen bond forms between the carbonyl O of one residue and the amide N-H of the residue 4 positions further along the chain. The R groups point outward! o Beta sheets Beta sheets are made up of laterally packed beta strands. Beta sheets are stabilized by hydrogen bonds BETWEEN adjacent beta strands. Beta strands are short, around 5-8 residues long. Beta strands can be arranged side by side in an antiparallel or a parallel orientation. The R groups project perpendicular to the plane of the sheet. o Beta turns The polypeptide changes direction through sharp beta turns, which are also stabilized by hydrogen bonds. Glycine is very compact because its R group is just an H. Proline has a ring-shaped R group that forces a bend in the chain. Therefore, beta turns are rich in glycine and proline. o Regions without these regular hydrogen-bonding patterns form random coils. Tertiary structure results from long-distance interactions within a polypeptide chain. It is the final folded structure for most proteins.
o Stabilized by hydrophobic interactions between nonpolar side chains, hydrogen bonds between polar side chains and disulfide bonds between cysteine residues. o Included in tertiary structures are compacted secondary structures.

Quaternary structure is exhibited by proteins made of multiple polypeptide chains, often identical or nearly identical copies of one another. o An example is a potassium ion channel, where 4 nearly identical polypeptides associate to form the channel. Supramolecular structures are macromolecular assemblies, often over 1 megadalton in size, in which up to hundreds of polypeptide chains form one functional unit. o An example is the assembly of many general transcription factors into the transcription preinitiation complex. Motifs and domains are particular recurring units in a polypeptide, often with a particular activity. When these motifs and domains are present in a protein, they often dictate its function. o The coiled-coil motif is two-stranded and rich in hydrophobic residues, whose R groups face inward, away from water. It is used in various fibrous proteins such as collagen and keratin. o The EF-hand motif is a helix-loop-helix motif: a helix, a loop, and another helix. A calcium ion binds in the loop, so proteins with this motif are calcium-binding proteins. A related helix-loop-helix motif is found in many transcription factors. o The zinc-finger motif has an alpha helix and 2 beta strands held together by a bound zinc ion. It allows polypeptides to interact with macromolecules such as DNA and RNA, so proteins with this motif often bind DNA or RNA. o Pyruvate kinase has three different types of domains, each with a specific function. o Motifs and domains can be conserved throughout evolution, reappearing in different functional proteins.
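To make the alpha-helix hydrogen-bonding pattern concrete, the sketch below lists which residues are bonded in a helix of a given length and where each side chain falls on a helical-wheel diagram. It assumes ideal geometry of 3.6 residues per turn (100° per residue), a standard textbook value not stated in the lecture; residues are numbered from 1 at the N-terminus, and the function names are illustrative.

```python
# Ideal alpha helix: 3.6 residues per turn, i.e. 360 / 3.6 = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100


def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """(i, i + 4) pairs: the C=O of residue i hydrogen-bonds to the N-H of
    residue i + 4. Residues are numbered 1..n from the N-terminus."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def wheel_angle(residue: int) -> int:
    """Angle (degrees) of a residue's side chain around the helix axis,
    measured from residue 1, as drawn on a helical-wheel diagram."""
    return ((residue - 1) * DEGREES_PER_RESIDUE) % 360


print(helix_hbond_pairs(8))
print([wheel_angle(r) for r in range(1, 8)])
```

Residues i and i+4 end up only 40° apart on the wheel, so side chains spaced 3 to 4 residues apart point out from roughly the same face of the helix; this is how a stripe of hydrophobic residues can line one face of a coiled coil.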
Lecture 4: Transcription and Translation:

Central dogma of protein synthesis, as studied in PROKARYOTIC models: DNA-transcription->RNA-translation->Protein. The fidelity of transcription and translation has to be extremely high, as mistakes in either will result in changes to the primary structure of the protein.
Transcription is the synthesis of mRNA from a DNA template. Ingredients needed for transcription in prokaryotes: o A DNA template that serves as a blueprint. o Ribonucleotides (rNTPs) to serve as monomers for the mRNA. o RNA polymerase, the enzyme that catalyzes RNA synthesis. A gene that is going to be transcribed has several features: o The coding region lies downstream of the transcription start site, which is numbered +1. Moving downstream into the gene, position numbers increase.

Moving upstream (5′) from the start site, position numbers decrease and are negative (-1, -2, ...); there is no position 0. o The promoter lies upstream (-) of the start site and is required for RNA polymerase to recognize where to start transcription. o The upper strand is called the nontemplate strand (written 5′ to 3′), and the bottom strand is called the template strand (3′ to 5′). The sense strand is the nontemplate strand, because it has the same sequence as the mRNA, except that it has T where the mRNA has U. The antisense strand is the template strand. RNA polymerase consists of several subunits: o The core enzyme consists of 2 α subunits and a β subunit, which are used for catalysis, while the β′ subunit binds the DNA template strand. o There is also an ω subunit that is associated with the stability of the core enzyme. o Adding the σ factor produces the functional enzyme called the holoenzyme, with the composition α2ββ′ωσ. The σ factor is responsible for identifying where the promoter sequence is located. o The holoenzyme scans the DNA sequence, looking for a promoter. Once the σ factor finds the promoter, the polymerase denatures the DNA, releases the σ factor, and initiates transcription. o σ factors recognize sequence motifs as promoter regions. In E. coli, the promoter sequence is conserved across genes (the -35 and -10 elements), but promoters vary in how closely they match it, which determines their strength and consequently the rate of transcription. Strong promoters are recognized efficiently by the σ factor, while weak promoters are recognized only sometimes. This crudely regulates the rate of transcription.
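Promoter strength can be illustrated with a toy score that counts how many bases of a promoter's two recognition hexamers match the consensus. The consensus sequences used here, TTGACA at -35 and TATAAT at -10 on the nontemplate strand, are the standard E. coli σ70 consensus, supplied as an assumption rather than taken from the lecture. Real promoter strength also depends on spacing and other factors, so this score is only a crude proxy, and the example promoters are hypothetical.

```python
# Assumed E. coli sigma-70 consensus hexamers, nontemplate strand, 5'->3'.
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_MINUS_10 = "TATAAT"


def matches(hexamer: str, consensus: str) -> int:
    """Number of positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must be the same length")
    return sum(1 for a, b in zip(hexamer.upper(), consensus) if a == b)


def promoter_score(box_35: str, box_10: str) -> int:
    """Toy strength score from 0 to 12: total matches of both boxes.
    Higher scores stand in for promoters the sigma factor recognizes more readily."""
    return matches(box_35, CONSENSUS_MINUS_35) + matches(box_10, CONSENSUS_MINUS_10)


# Hypothetical promoters (not real genes).
promoters = {
    "strong": ("TTGACA", "TATAAT"),
    "medium": ("TTGACT", "TAAAAT"),
    "weak": ("CTGCCA", "GACAGT"),
}
for name, (box_35, box_10) in promoters.items():
    print(name, promoter_score(box_35, box_10))
```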
Transcription process: 1. RNA polymerase needs to find and park over the promoter region. a. When parked over the promoter region, the polymerase can locate the transcription start site. 2. The polymerase locally denatures the DNA near the start site, forming a transcription bubble less than 15 nucleotides long and displacing the nontemplate strand. a. The polymerase catalyzes the phosphodiester linkage between the first two rNTPs. b. Note that transcription requires no primer, unlike DNA replication. 3. Elongation: the polymerase reads the template strand from 3′ to 5′, moving the bubble along the DNA and adding more rNTPs. a. During elongation, a transient DNA-RNA hybrid forms, likely as an A-form helix. 4. Each incoming rNTP (A, G, C, or U) is added to the free 3′ end of the RNA (so the RNA grows 5′ to 3′): the 3′ OH group reacts with the alpha phosphate of the triphosphate, forming a phosphodiester bond.
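The strand relationships and numbering conventions for transcription can be checked with a short sketch. It assumes DNA strings of A, C, G and T, with the template strand written 3′ to 5′ so that it lines up base-for-base with the RNA being made 5′ to 3′; the function names are hypothetical, chosen for illustration.

```python
# RNA base added opposite each template DNA base (U replaces T in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """mRNA (5'->3') made from a template strand written 3'->5'.

    The polymerase reads the template 3'->5' and adds each new rNTP to the
    free 3' OH of the growing RNA, so the RNA grows 5'->3'.
    """
    rna = []
    for base in template_3to5.upper():
        rna.append(TEMPLATE_TO_RNA[base])  # new base joins the growing 3' end
    return "".join(rna)


def sense_to_mrna(nontemplate_5to3: str) -> str:
    """The nontemplate (sense) strand already has the mRNA sequence, with T for U."""
    return nontemplate_5to3.upper().replace("T", "U")


def position_numbers(upstream: int, downstream: int) -> list[int]:
    """Positions around the start site: ..., -2, -1, then +1, +2, ... (no 0)."""
    return list(range(-upstream, 0)) + list(range(1, downstream + 1))


print(transcribe("TACGGT"))
print(sense_to_mrna("ATGCCA"))
print(position_numbers(3, 3))
```

Both routes give the same mRNA, AUGCCA, which is why the nontemplate strand is called the sense strand.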

Translation is the synthesis of an amino acid polymer by reading the mRNA. mRNA is produced from the gene by transcription. o It holds the information for the sequence of amino acids. tRNAs interpret the mRNA and deliver the amino acids that are joined into the protein. o tRNA has a cloverleaf secondary structure, which folds further into an L-shaped tertiary structure. o It is a rather short transcript, but it is processed post-transcriptionally, which includes modification of some nucleotides (e.g., to ribothymidine). o The D loop is located on the left. o The acceptor stem is 7 base pairs long; the sequence 5′-CCA-3′ is added to its free 3′ end post-transcriptionally. o The anticodon loop contains the anticodon, which base-pairs with the complementary mRNA codon (so its sequence matches the corresponding triplet of the template DNA strand, with U for T). ONLY the first (5′) position of the anticodon, which pairs with the 3rd base of the codon, allows wobble base pairs: non-standard pairs that are weaker than Watson-Crick pairs. For example, the anticodon 3′-CAG-5′ (5′-GAC-3′) can pair with the codon 5′-GUC-3′ or 5′-GUU-3′. o Each tRNA is coupled to its amino acid by an aminoacyl-tRNA synthetase, which is specific for EACH amino acid! The reaction joins the carboxyl group of the amino acid to a hydroxyl group on the terminal A of the tRNA's 3′-CCA end, forming a high-energy ester bond; the reaction is driven by ATP. o The genetic code is redundant: AUG is the start codon and codes for Met. There are 6 codons that code for Ser. There are 3 STOP codons (UAA, UAG, UGA). o In prokaryotes, there are fewer than 61 tRNAs (64 codons minus the 3 stop codons), because wobble pairing at the 1st anticodon position allows one tRNA to read multiple codons. Inosine, which is often found at this wobble position, is produced post-transcriptionally by deamination of adenosine. rRNAs (ribosomal RNAs) are components of ribosomes and are important for their catalytic and structural functions.
o Prokaryotic 70S ribosomes are made of 2 subunits, 30S + 50S (sedimentation coefficients are not additive). Both subunits are made of rRNA and proteins. tRNAs and rRNAs are encoded by DNA genes that are transcribed, but their RNA products are NEVER translated.
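The codon redundancy and wobble rules can be sketched with the standard genetic code. The 64-letter string below is the standard code written in UCAG order (an outside reference, not from the lecture), and the wobble table follows Crick's wobble rules, which the lecture illustrates only through the G-at-wobble example; both are assumptions of this sketch. Anticodons are written 5′ to 3′, so their first letter is the wobble position.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Codon 3' bases that each anticodon wobble (5') base can pair with (Crick's rules).
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}


def translate(mrna: str) -> str:
    """Translate 5'->3' from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: no tRNA matches, translation ends
            break
        protein.append(aa)
    return "".join(protein)


def codons_read_by(anticodon_5to3: str) -> list[str]:
    """Codons (5'->3') that a tRNA anticodon (5'->3') can pair with.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    base 2 with base 2, and the wobble base 1 with codon base 3.
    """
    wobble, middle, last = anticodon_5to3
    return sorted(PAIR[last] + PAIR[middle] + third for third in WOBBLE[wobble])


print(sum(aa == "S" for aa in AMINO_ACIDS), "Ser codons")
print(codons_read_by("GAC"), [CODON_TABLE[c] for c in codons_read_by("GAC")])
print(translate("AUGGUCUAAGGG"))
```

`codons_read_by("GAC")` returns GUC and GUU, the two valine codons from the wobble example; an anticodon with inosine at the wobble position, such as IAU, reads three isoleucine codons (AUU, AUC, AUA).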
Translation (prokaryotic protein synthesis) can be divided into three steps: 1. Initiation a. It starts with the 30S subunit, onto which 3 initiation factors (IFs) are loaded: IF1, IF2 bound to GTP, and IF3. Together they form the preinitiation complex.

b. With the help of IF1 and IF3, the complex is loaded onto the mRNA at the initiation codon, AUG, together with the initiator Met-tRNA (carrying N-formylmethionine in bacteria). This whole assembly is the 30S initiation complex. i. However, it binds only at an AUG that has a Shine-Dalgarno sequence upstream of it, much as the σ factor directs RNA polymerase to a promoter. This is specific to prokaryotes! c. Once the 30S initiation complex has formed, IF1 and IF2-GTP facilitate loading of the 50S subunit, forming the 70S initiation complex. i. It contains a P site, where the initiator Met-tRNA is parked. ii. It contains an A site, where each incoming aminoacyl-tRNA enters. iii. It contains an E site, from which tRNAs that have given up their amino acids are ejected. 2. Elongation a. Elongation factors (EFs) are required for the stepwise addition of amino acids. b. The peptidyl transferase reaction is carried out by a ribozyme, the 23S rRNA. i. A ribozyme is an RNA molecule with enzymatic activity. c. A peptide bond forms between the amino acids on the tRNAs in the P and A sites. d. The ribosome then moves one codon along the mRNA (translocation), shifting the A-site tRNA into the P site and the P-site tRNA into the E site, from which it is ejected. 3. Termination: a. When the ribosome reaches a stop codon, no tRNA fits into the A site; instead, release factors (RFs) bind there and mediate the termination of protein synthesis. i. There are two varieties, RF1 and RF2. ii. They mimic aminoacyl-tRNAs in shape, but they are not tRNAs! b. Once an RF is placed in the A site, it is recognized by RF3, and the growing polypeptide chain is clipped from the last tRNA.
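Start-codon selection by the Shine-Dalgarno sequence can be sketched as a simple string search. This toy model rests on assumptions not stated in the lecture: it looks only for an exact AGGAGG (a commonly cited Shine-Dalgarno consensus; real sites vary), and the 4 to 12 nucleotide window between the Shine-Dalgarno sequence and the AUG is a hypothetical parameter. The example mRNA is made up.

```python
SD_CONSENSUS = "AGGAGG"  # assumed consensus; real Shine-Dalgarno sites vary


def find_start_codon(mrna: str, min_gap: int = 4, max_gap: int = 12):
    """Index of the first AUG that begins min_gap..max_gap nucleotides after
    the end of an exact Shine-Dalgarno match, or None if there is none.
    An AUG without a suitably placed SD sequence upstream is ignored."""
    sd = mrna.find(SD_CONSENSUS)
    while sd != -1:
        sd_end = sd + len(SD_CONSENSUS)
        for start in range(sd_end + min_gap, sd_end + max_gap + 1):
            if mrna[start:start + 3] == "AUG":
                return start
        sd = mrna.find(SD_CONSENSUS, sd + 1)  # try the next SD match
    return None


# Hypothetical mRNA: a decoy AUG at 0, then SD, spacer, and the real AUG.
mrna = "AUGCC" + "AGGAGG" + "UAAAUA" + "AUGGCUUAA"
print(find_start_codon(mrna))
print(find_start_codon("AUGGCUUAA"))
```

The decoy AUG at position 0 is skipped because no Shine-Dalgarno sequence lies upstream of it; the AUG at position 17 is chosen instead.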
Lecture 5: DNA Replication

Because DNA is a duplex, each strand can serve as the template for a newly replicated strand, which will be complementary to it. Each daughter duplex therefore contains one parental strand and one new strand; this is called a semiconservative mechanism.
Nucleotide polymerization ingredients: DNA polymerase catalyzes the polymerization of DNA. Deoxynucleoside 5′-triphosphates (dNTPs: A, T, C, G) are the monomers. o Each incoming dNTP is covalently linked through its phosphate group to the 3′ OH of the preceding nucleotide, liberating pyrophosphate. DNA polymerase requires a PRIMER, which can be either DNA or RNA, unlike transcription. o The dNTPs are added at the free 3′ hydroxyl group of the terminal pentose sugar. As in transcription, polymerization proceeds 5′ to 3′ while the template is read 3′ to 5′.
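Semiconservative replication can be modeled by keeping each parental strand and pairing it with a newly made complement. In this sketch, assumed for illustration, a duplex is a pair of strands aligned base-for-base (top strand written 5′ to 3′, bottom strand 3′ to 5′), and each strand carries a label recording whether it is parental or new.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

Strand = tuple[str, str]        # (sequence, label)
Duplex = tuple[Strand, Strand]  # (top 5'->3', bottom 3'->5'), aligned base-for-base


def new_partner(template: str) -> str:
    """Strand synthesized on a template, aligned base-for-base with it
    (and therefore antiparallel to it)."""
    return "".join(COMPLEMENT[base] for base in template)


def replicate(duplex: Duplex) -> list[Duplex]:
    """Semiconservative replication: each parental strand is kept as a
    template and paired with one newly synthesized strand."""
    (top, top_label), (bottom, bottom_label) = duplex
    return [
        ((top, top_label), (new_partner(top), "new")),
        ((new_partner(bottom), "new"), (bottom, bottom_label)),
    ]


parent: Duplex = (("ATGC", "parental"), ("TACG", "parental"))
generation_1 = replicate(parent)
generation_2 = [d for duplex in generation_1 for d in replicate(duplex)]
for duplex in generation_2:
    print(duplex)
```

After one round, every daughter duplex has one parental and one new strand with the parent's sequence; after two rounds, two of the four duplexes still contain a parental strand and two are entirely new.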

Process of nucleotide polymerization in prokaryotes: The primer required by the polymerase is made of RNA, but at the end of DNA replication there is NO RNA left. o The primer is synthesized by a specialized RNA polymerase called primase (not the RNA polymerase used for transcription). DNA replication starts at an origin of replication. Prokaryotes usually have only 1, but eukaryotes have MANY, to ensure replication finishes in a timely fashion. A replication bubble is needed, formed by DNA helicase, which unzips the duplex. o Each replication bubble has two replication forks. Replication forks (growing forks) are where helicase unwinds double-stranded DNA into single strands. o As helicase unzips, DNA polymerase synthesizes DNA right behind it on the leading strand (whose template is being exposed toward its 5′ end). DNA polymerase is parked RIGHT at the fork. o Topoisomerase relieves the supercoiling that builds up ahead of the fork as the duplex is unwound; otherwise, the DNA ahead of the fork would have to spin. Leading-strand and lagging-strand synthesis differ because the two strands of DNA are antiparallel. o DNA polymerase only reads templates from 3′ to 5′, so on the leading strand it can keep reading as the fork opens toward the template's 5′ end, polymerizing TOWARD the replication fork in a continuous manner. o HOWEVER, on the lagging strand, the template is exposed toward its 3′ end, so primers have to be laid down at the replication fork, and polymerization proceeds AWAY from the fork. Replication occurs discontinuously, producing Okazaki fragments, each running from an RNA primer at its 5′ end to DNA ending at the next primer at its 3′ end. Okazaki fragments are around 1000 bp in prokaryotes and around 100-200 bp in eukaryotes. To ensure the lagging strand is synthesized as quickly as the leading strand, primase is also parked at the replication fork.
Primase lays down RNA primers at intervals: in prokaryotes, about every 1000 nucleotides of template; in eukaryotes, every 100 to 200. Once the primers are laid down, DNA polymerization continues until it reaches the next primer, but the polymerase cannot make the final phosphodiester linkage to the RNA. The RNA primer is then removed, and DNA synthesis fills in the position it occupied. The final phosphodiester linkage between the last added nucleotide and the already synthesized DNA strand is made by DNA ligase.
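The primer bookkeeping for the two strands can be made concrete with a little arithmetic. This sketch assumes a single fork copying a hypothetical 5000-nucleotide stretch of template and uses the approximate primer spacings from the notes (about 1000 nucleotides in prokaryotes, and 150 as one value within the 100-200 range for eukaryotes).

```python
import math


def primers_needed(template_length: int, primer_interval: int, lagging: bool) -> int:
    """Primers needed to copy one template strand from a single fork."""
    if not lagging:
        return 1  # leading strand: one primer, then continuous synthesis
    # Lagging strand: a new primer, and so a new Okazaki fragment, every interval.
    return math.ceil(template_length / primer_interval)


def okazaki_fragments(template_length: int, primer_interval: int) -> list[tuple[int, int]]:
    """(start, end) template spans of successive Okazaki fragments."""
    return [(start, min(start + primer_interval, template_length))
            for start in range(0, template_length, primer_interval)]


length = 5000  # hypothetical stretch of template
print("leading strand primers:", primers_needed(length, 1000, lagging=False))
print("prokaryote lagging primers:", primers_needed(length, 1000, lagging=True))
print("eukaryote lagging primers:", primers_needed(length, 150, lagging=True))
print("joins between adjacent fragments:", len(okazaki_fragments(length, 1000)) - 1)
```

Each of those joins needs primer removal, gap filling and a final seal by DNA ligase, which is why the shorter eukaryotic fragments mean far more processing per stretch of lagging strand.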
Molecular machinery involved in eukaryotic DNA replication:

Much of what we know about DNA replication in eukaryotes comes from infection with the virus SV40, because it replicates VERY quickly by pirating the replication machinery of the eukaryotic host cell. o The rate-limiting factor of the reaction is the DNA helicase, so the virus encodes a DNA helicase of its own. o It encodes large T-antigen, which forms a hexameric helicase. o It is an ideal model for studying DNA replication because replication happens REALLY fast. o Unwinding the DNA produces two single strands, which tend to collapse back onto themselves and re-form double strands. This is prevented by Replication Protein A (RPA), which binds single-stranded DNA and keeps it in an optimal conformation for DNA polymerase. RPA has two sheets that wrap around the DNA and keep it extended. RPA is stripped off as DNA polymerization occurs. o The DNA polymerase complex is made up of: DNA polymerase δ Replication factor C (Rfc) Very important for loading and unloading the complex. Proliferating cell nuclear antigen (PCNA) Acts as a clamp that keeps the complex on the template. PCNA is a homotrimeric protein (three identical subunits) with a pore in the middle for the DNA. o The lagging strand requires primers formed at intervals, which are made by the primase/Pol α complex; the two are tightly associated and together called a primosome (IN EUKARYOTES). Primase creates the RNA component of the primer, and DNA polymerase α extends the primer with DNA. This short DNA extension made by Pol α serves as the primer for DNA polymerase δ, which synthesizes the rest of the Okazaki fragment. The complex of Pol δ, Rfc, and PCNA, the same complex that synthesizes the leading strand, replaces the primosome and finishes the Okazaki fragment. Therefore, Okazaki fragments in eukaryotes consist of RNA from primase, a bit of DNA from Pol α, and then DNA from the Pol δ complex.
Ribonuclease H and FEN-1 remove the RNA at the 5′ end of each Okazaki fragment, Pol δ fills the resulting single-stranded gap with DNA, and DNA ligase seals the strand.
DNA replication proceeds bidirectionally:

This is because one replication bubble forms two replication forks, with two DNA helicases unwinding in opposite directions. The origin of replication, which is A=T rich, is denatured into a small bubble BEFORE the helicases are loaded. o This is done by the Origin Recognition Complex (ORC), a six-subunit protein that binds A=T rich origin regions and facilitates loading of the helicases (in cellular replication, the helicase is MCM, NOT the T-antigen of the SV40 virus). o Eukaryotes have MULTIPLE replication origins. To start DNA polymerization, 2 primers are laid down by the primosome, one on each parental strand. The polymerase complex (Pol δ, Rfc, PCNA) is loaded onto each free 3′ end and extends it as a leading strand. As the helicases unwind further on both sides, the DNA made from each of these first primers ends up adjacent to the lagging strand of the opposite fork. o Two more primers must then be made by the primosome to start the 2 lagging strands. Note that each parental strand is the leading-strand template at one fork and the lagging-strand template at the other. 4 DNA polymerase complexes (Pol δ, Rfc, PCNA) are therefore required in total, 2 at each replication fork: one to extend the leading strand and one to extend Okazaki fragments on the lagging strand.
The RNA primers once again need to be removed by Ribonuclease H and FEN-1, replaced with DNA by the polymerase complexes, and sealed by DNA ligase.
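Why multiple origins speed replication up, and why 4 polymerase complexes are needed per bubble, follows from simple counting. This is an idealized sketch under stated assumptions: all origins fire at the same time, are evenly spaced, and every fork moves at the same constant speed. The genome size and fork speed in the example are round hypothetical numbers, not measured values.

```python
def replication_time_s(genome_bp: int, n_origins: int, fork_speed_nt_per_s: float) -> float:
    """Idealized time to copy a genome: each origin sends two forks outward,
    so 2 * n_origins forks share the work at constant speed."""
    forks = 2 * n_origins
    return genome_bp / (forks * fork_speed_nt_per_s)


def polymerase_complexes(n_origins: int) -> int:
    """Two forks per origin, two complexes per fork (leading + lagging)."""
    return 2 * 2 * n_origins


genome = 4_000_000  # hypothetical genome size, bp
speed = 1000        # hypothetical fork speed, nucleotides per second
for origins in (1, 4, 16):
    print(origins, replication_time_s(genome, origins, speed), polymerase_complexes(origins))
```

Quadrupling the number of origins cuts the idealized time by a factor of four, which is the reasoning behind eukaryotes using many origins for their much larger genomes.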
Lecture 6: DNA Mutations and Repair

Following the central dogma, mutations in DNA are passed on to RNA and then to proteins. Mutations are defined as permanent, transmissible changes to the genetic material of a cell. They can be somatic or germline mutations; only germline mutations can be passed on through generations. Mutations can happen spontaneously, but their frequency is increased by mutagens, such as chemical compounds, UV radiation or ionizing radiation. Carcinogens are agents that cause cancer, and most carcinogens are also mutagens.
DNA repair mechanisms counteract mutations:

Mutations in DNA repair genes can cause disease, because mutations then accumulate progressively. o Xeroderma pigmentosum results from defects in nucleotide excision repair (NER) and leads to sensitivity to UV radiation and skin carcinomas. Proofreading by DNA polymerase is one of the most important, primary repair mechanisms. o The error rate of E. coli DNA polymerase is around 1 in 10,000 nucleotides, and the genome is SIGNIFICANTLY larger than that. o However, the actual, measured error rate in the DNA is 1 in 1,000,000,000, 5 orders of magnitude lower. o The DNA polymerase itself has a proofreading mechanism: the single-stranded DNA enters through the palm region of the enzyme, and at the fingers, where the free 3′ end is being extended, the enzyme recognizes any incorrectly matched pair and shifts the growing strand into another part of the enzyme that has 3′ to 5′ exonuclease activity. Exonucleases trim the sugar-phosphate backbone from the 3′ or 5′ end. Endonucleases cut the sugar-phosphate backbone within a longer strand. At the exonuclease domain, the enzyme trims off a few bases of the newly elongating DNA and then shifts it back to the elongating finger domain. o In eukaryotes: DNA polymerase α DOES NOT have exonuclease proofreading activity. It only makes a little DNA at the end of the primer, so its total amount of polymerization is very low, and its job is to synthesize quickly, so it doesn't go back and forth correcting mistakes. Base excision repair (BER) replaces a single deaminated nucleotide, which involves snipping the sugar-phosphate backbone. o It is important because of the spontaneous deamination (removal of an amine group) of cytosine to uracil, and the deamination of methylated cytosine (cytosine methylation normally occurs for gene regulation, transposon silencing and chromatin remodeling) to thymine.
o If not corrected, one round of replication turns the affected C-G pair into a T-A (or U-A) pair in one daughter duplex, while the other daughter keeps C-G. o A specific enzyme is needed that recognizes the mismatch and also which base was deaminated; these enzymes specifically remove only the T or U, not the G. o A SPECIFIC (thymine or uracil) DNA glycosylase hydrolyzes the bond between the mismatched base and the sugar-phosphate backbone, so the base can be removed. o Apurinic/apyrimidinic endonuclease 1 (APE1) then snips the DNA backbone with its endonuclease activity. o AP lyase, part of DNA polymerase β in eukaryotes, cuts the other side of the flap, allowing removal of the remaining sugar-phosphate.

The remaining gap allows 1 nucleotide to be incorporated by DNA polymerase β, and the backbone is sealed by DNA ligase. Mismatch excision repair (MMR) excises a large section of the mismatched strand after replication. o Much is known about this in prokaryotes, while study in eukaryotes, where it differs, is more recent. We will examine it in humans. o When a non-Watson-Crick base pair persists, the repair machinery must recognize which strand has the incorrect base by identifying the template strand; the misincorporation is assumed to be in the daughter strand. o Two proteins, MSH2 and MSH6, form a complex that parks on the mismatched pair and recruits MLH1 and PMS2; PMS2 has endonuclease activity and cuts near the mismatched base. o The daughter strand is then unwound and digested by a DNA exonuclease on either side, leaving a large gap. o The large gap is filled by DNA polymerase δ (as opposed to DNA Pol β in BER), and the sugar-phosphate backbone is sealed by DNA ligase. Nucleotide excision repair (NER) excises a large region of a strand distorted by chemical or radiation damage. o This repair only recognizes and repairs large distortions of the DNA duplex: Chemical adducts, which covalently link to bases in duplex DNA. Aflatoxin B1, found in contaminated peanuts, covalently links to G, causing an obvious distortion in the duplex. UV irradiation can covalently link adjacent thymine bases. o Thymine dimers are first recognized by XPC (xeroderma pigmentosum C) together with 23B, which activates XPC. o A larger complex is then recruited, consisting of TFIIH, which has DNA helicase activity, RPA, which prevents the single strands from re-pairing, and XPG, an endonuclease; this complex displaces the original one. o The addition of another endonuclease, XPF, allows removal of a large section of the distorted strand by cutting on both sides of the lesion.
o DNA polymerase δ (sometimes another polymerase, ε) and DNA ligase do the gap repair. Double-strand break repair by end joining rejoins double-strand breaks but can introduce mutations. o Double-strand or single-strand breaks can be caused by ionizing radiation or anticancer drugs like bleomycin. They can lead to loss of genetic material, or to genomic rearrangements and translocations. o End joining simply reattaches the two broken ends, but it introduces mutations at the joints. The ends become staggered because exonucleases immediately recognize them and start degrading them.

A complex formed by DNA-dependent protein kinase (DNA-PK) and Ku80/Ku70 heterodimers associates with the ends of a double-strand break. This complex, together with other proteins, polishes the staggered ends to make them even. o The two polished ends are joined by DNA ligase. o However, the nucleotides trimmed off by exonucleases or by polishing are lost forever. o At low frequency, plasmid DNA may be ADDED into these double-strand breaks before the ends are joined. Double-strand break repair by homologous recombination o After DNA replication, there is a parental and a daughter duplex. If one suffers a double-strand break, an exonuclease chews back each broken strand from its 5′ end (5′ to 3′), leaving single-stranded 3′ overhangs. o RecA (prokaryotes) or Rad51 (eukaryotes) allows the single-stranded region with a free 3′ end to invade the intact daughter duplex and form standard base pairs there. At the crossover, the broken strand passes from its own duplex into the daughter duplex. Since the invading strand has a free 3′ end and a template in the daughter strand, it is elongated, displacing the daughter's other strand. The displaced daughter strand can now form standard Watson-Crick base pairs with the other broken strand's 3′ overhang, serving as the template from which that free 3′ end is extended. After both 3′ ends have been extended, one from the displaced daughter strand and one from the daughter template strand, both can be reconnected to their original strands by DNA ligase. o However, the two duplexes are still connected by crossed strands, which can be cut in 2 ways. We can view this as a Holliday structure by rotating the crossovers in space. The Holliday structure can be cut down the middle of the reoriented crossover strands, giving two homologous strands and two hybrid strands, or down the middle of the outer strands, also giving two homologous strands and two hybrid strands.
o Both cuts restore the nucleotide sequences of both strands.
Lecture 7: Cloning, PCR, DNA Sequencing

Many DNA sequencing techniques require a large amount of DNA before it can be read. DNA cloning and PCR are used to amplify the amount of DNA. DNA cloning is accomplished via plasmids, a rather slow process taking 4-5 days:

A plasmid is a small circular double-stranded DNA molecule that encodes at most a couple of genes. Plasmids are extrachromosomal because they are not incorporated into the bacterium's chromosomal genome. They are found in bacteria, as well as in lower eukaryotes. Plasmid replication occurs before cell division and allows the plasmid to be duplicated many times. Plasmid gene structures are very well understood, so we can modify them: o There is an ORI sequence, the origin of replication, where replication of the plasmid starts. o We also need some type of resistance gene, such as amp^r, which codes for a protein that breaks down ampicillin. This resistance gene protects the bacteria that carry the plasmid, so they will be selected for. We want to place the piece of DNA that we want to replicate NOT in ORI or amp^r, but at a specific position called the polylinker, which is recognized by a series of enzymes called restriction endonucleases. o Endonucleases cleave phosphodiester bonds within the polynucleotide chain. o Restriction endonucleases recognize specific base sequences that determine where they cut, and these sequences tend to be palindromic. The specific base sequences recognized by these endonucleases are all within the polylinker. Once the plasmid is cut, it turns from a circular molecule into a linear one. o EcoRI makes a staggered cut, and the strands separate to form sticky ends. The DNA that we want to insert into the polylinker site must be cut by the same restriction enzyme as the vector, so that both have compatible ends and the fragments can anneal together. o When the ends form Watson-Crick base pairs, the sugar-phosphate backbone still needs to be rejoined by phosphodiester bonds, which is done by T4 DNA ligase. Once the DNA has been inserted into the plasmid, you have formed a recombinant plasmid.
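The EcoRI steps above can be sketched in Python. This is a minimal illustration, assuming the commonly cited EcoRI site GAATTC with a cut between G and A on each strand (G^AATTC); the plasmid and fragment sequences are made-up toy strings, and `find_sites`, `cut_circular_once`, `sticky_end` and `ends_compatible` are hypothetical helper names, not part of any cloning library.

```python
# Toy model of restriction cloning with EcoRI (site GAATTC, cut G^AATTC).
# Helper names are hypothetical; sequences are made up for illustration.
ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # top-strand cut falls after the first base (between G and A)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def sticky_end(site: str, cut: int) -> str:
    """5' overhang left when a palindromic site is cut `cut` bases in on each strand."""
    return site[cut:len(site) - cut]


def cut_circular_once(plasmid: str, site: str, cut: int) -> str:
    """Cut a circular plasmid (top strand) at its single site; return the linear top strand.

    Assumes the site does not span the arbitrary 'start' of the circular string.
    """
    sites = find_sites(plasmid, site)
    if len(sites) != 1:
        raise ValueError(f"expected exactly one site, found {len(sites)}")
    position = sites[0] + cut
    # The circle opens at the cut: read from the cut all the way around back to it.
    return plasmid[position:] + plasmid[:position]


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs can base-pair if one is the reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


if __name__ == "__main__":
    print(reverse_complement(ECORI_SITE))   # GAATTC: the site is palindromic
    print(sticky_end(ECORI_SITE, ECORI_CUT))  # AATT
    print(cut_circular_once("AAAGAATTCTTT", ECORI_SITE, ECORI_CUT))  # AATTCTTTAAAG
```

Because the vector and the insert cut with EcoRI both carry an AATT overhang, `ends_compatible("AATT", "AATT")` is True, which is the reason the notes give for cutting both with the same enzyme.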
For transformation: we mix bacterial cells with our recombinant plasmids and heat-shock the cells. This possibly works because the treatment perforates the cell wall, which then reseals; the real mechanism is unclear. Since not ALL bacterial cells take up the plasmid, we plate all the bacteria on ampicillin, which selects for the cells that carry the resistance plasmid. o The plasmid rapidly multiplies within the prokaryotic cell. o As the bacteria proliferate, they divide and form visible colonies on the ampicillin plate. o Each colony consists of genetically identical bacteria, but different colonies may carry different plasmid DNAs.
Polymerase Chain Reaction (PCR) takes a few hours to amplify DNA. Modern rapid cycling can finish it in 20 minutes:

A single-tube reaction contains: o DNA template containing our DNA of interest o Oligonucleotide primers, which act as primers for DNA synthesis and are around 20 nucleotides in length. However, this also means that we need to know something about the sequence that we are amplifying in order to design the primers. Primers are synthesized in vitro based on the sequence that we provide. We need 2 primers, one at each extreme end of the target sequence. o Taq DNA polymerase, from Thermus aquaticus, a bacterium living in hot springs; it functions at extreme temperatures (it is thermostable). o dNTPs for the synthesis of DNA. We place the ingredients in a thermocycler, which raises and lowers the temperature according to a program. o Heating denatures the duplex, and once the temperature is lowered, the strands renature faithfully. The steps of the thermocycle program: 1. Heat to denature the DNA into two single-stranded templates. 2. Cool to anneal the primers to their specific locations on the template (as we designed the primers). At this temperature the primers anneal, rather than the template strands re-forming the duplex. 3. Raise the temperature to the optimum for Taq DNA polymerase, for DNA extension. Repeat this cycle 20 to 40x. We are replicating fragments of just the target sequence of DNA, specified by our primers, and through many repeated cycles, the original, entire sequences will be outnumbered. Manual (very old) vs Automated DNA Sequencing: DNA sequencing can be done by the dideoxy chain-termination method. This is another single-tube reaction, but it can be divided into 4 reactions: Reaction 1 consists of: o DNA polymerase o Oligonucleotide primer o DNA template of the fragment that we want to sequence, such as a PCR fragment or plasmid fragment. o dNTPs (100 mM) o ddATP (1 mM). A dideoxyribonucleoside triphosphate does not have an OH at the 3' position, so once it is incorporated into a growing chain, DNA polymerase cannot add more dNTPs onto the end of the chain, and the chain is terminated.
Reactions 2, 3 and 4 are identical except that the ddATP is replaced with ddGTP, ddTTP or ddCTP, the other nucleotides. o Often, the ddNTPs are fluorescently labeled, with a different color for each base.

Because the ddNTP is at a low concentration, it will not be incorporated every time its complementary base appears on the template. Every possible position where the ddNTP can be incorporated therefore forms a segment of a different length. To take advantage of the size differences and of the fluorescent labels on the different bases, we denature the duplexes and resolve the single strands by electrophoresis. Gel electrophoresis runs the single strands through a porous polyacrylamide gel that separates fragments with a resolution of one nucleotide. When a voltage is applied across the gel, the negatively charged DNA migrates toward the positive pole: smaller fragments migrate farther, while larger fragments do not move as much. Fragments come off the gel smallest first and largest last, so reading the color of each ddNTP gives each base from the primer out towards the 3' end of the newly synthesized DNA (the template is its reverse complement).
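Both the amplification and the sequencing logic above can be checked with a small Python sketch. It is idealized, assuming perfect PCR efficiency (every strand is copied in every cycle, starting from one duplex) and, for sequencing, that the primer anneals at the 3' end of the given template so fragment lengths are counted from the primer. The strand categories and the names `pcr_strand_counts`, `termination_products` and `read_gel` are hypothetical, introduced only for illustration.

```python
# Idealized sketch; function names and strand categories are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# --- PCR: why target-length products outnumber the original template ---
def pcr_strand_counts(cycles: int) -> dict[str, int]:
    """Strand counts after `cycles` ideal cycles, starting from one template duplex.

    original: the two template strands (never increase)
    long:     start at a primer but run past the other primer site
    target:   bounded by both primers (exact target length)
    """
    counts = {"original": 2, "long": 0, "target": 0}
    for _ in range(cycles):
        new_long = counts["original"]                   # copying an original strand
        new_target = counts["long"] + counts["target"]  # copying long or target strands
        counts["long"] += new_long
        counts["target"] += new_target
    return counts


# --- Dideoxy sequencing: reading the chain-terminated fragments ---
def termination_products(template: str) -> list[tuple[int, str]]:
    """(length, terminating ddNTP base) for every fragment the four reactions can make.

    Assumes the primer anneals at the 3' end of `template` (written 5'->3'), so the new
    strand is the reverse complement of the template; lengths exclude the primer.
    """
    new_strand = reverse_complement(template)
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_gel(products: list[tuple[int, str]]) -> str:
    """Read bases from the smallest fragment (first off the gel) to the largest."""
    return "".join(base for _, base in sorted(products))


if __name__ == "__main__":
    counts = pcr_strand_counts(30)
    print(counts["target"] / sum(counts.values()))  # close to 1: the target dominates
    read = read_gel(termination_products("ATGCGT"))
    print(read)                      # ACGCAT, the new strand 5'->3'
    print(reverse_complement(read))  # ATGCGT, the template again
```

Under these assumptions, n cycles leave 2 original strands, 2n "long" strands and 2^(n+1) - 2 - 2n target-length strands. After 30 ideal cycles that is 62 non-target strands against roughly two billion target strands, which is the sense in which the original sequences are outnumbered; real reactions are less than 100% efficient, so absolute numbers are lower.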
Lecture 8: Eukaryotic Gene Structure and Non-Coding DNA

The genome is the entirety of an organism's hereditary information, consisting of DNA (RNA in some viruses). It is a mix of coding and non-coding DNA (in eukaryotic genomes, up to 99% non-coding). Genome size and organismal complexity: Yeast: 12 Mb; Drosophila: 180 Mb; Chicken: 1300 Mb; Human: 3300 Mb; Amoeba dubia: 660,000 Mb. As genome size increases, the amount of non-coding sequence increases proportionally. Gene density varies between eukaryotes depending on the size of introns and intergenic regions.
A gene is the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (a ribozyme or a polypeptide). Transcription differences between prokaryotes and eukaryotes: Prokaryotic genes are located in operons, such as the trp operon. The genes of an operon are related because their products act in the same biochemical pathway. o The genes for the enzymes of a biosynthetic pathway are all transcribed at the same time. o This single mRNA is called a polycistronic message. o In operons, translation starts at different points within the mRNA and yields several proteins: one large transcript results in multiple enzymes.

Eukaryotic genes are not in operons; they are separated on different chromosomes, often by non-coding regions. o The difference is that each product is transcribed separately, but the end result (the set of enzymes produced) is the same. o Transcripts are smaller. o Transcripts are much more heavily processed in eukaryotes than in prokaryotes. Eukaryotic transcripts are polyadenylated: a poly(A) tail is added at the 3' end. The information for where the poly(A) tail should go is in the DNA, but the tail itself is only added to the RNA. Exons hold the protein-coding information, while introns carry none. Introns in the primary transcript are spliced out and the exons are ligated to one another, forming the open reading frame (ORF) (the term can also be used to describe the coding sequence within the exons). At the 5' end of the primary RNA transcript, a cap, 7-methylguanylate, is added post-transcriptionally and is connected via a 5'-5' linkage. At the extreme ends of the gene, and of the mature mRNA, are non-coding regions called untranslated regions (UTRs), because translation starts downstream of the 5' UTR and stops upstream of the 3' UTR. o Genes are expressed in transcription units. o Simple transcription units contain: Exons, which are the regions between introns. Control regions, which are upstream of the exons. Every cell in the body contains the same genes; what differs is how the genes are expressed, which is regulated by the control regions (when the genes are turned on). The boundaries between exons and introns are called splice sites, where the introns are cut out. o Complex transcription units have various forms: A gene has an exon that is sometimes skipped (exon skipping). The splicing machinery recognizes the two introns flanking the exon but not the exon itself, so the exon is removed along with them. The resulting mRNA lacks the skipped exon, which won't be expressed in the polypeptide.
A gene has two poly(A) processing sites, one after each of two exons; if the first poly(A) site is used, the second exon is not included in the mRNA. A gene has two control regions and two cap sites, and transcription starts at only one of the control regions, leaving out the other (even if it starts at the first one). o Introns vary in size considerably, from 10,000 bp to 1 Mbp. The vast majority (95%) of the human genome is non-coding, mostly intron sequences. Plant introns tend to be very short, only 100-500 bp in length.
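Constitutive splicing and exon skipping can be illustrated with toy RNA strings. This is a minimal sketch: the exon and intron sequences are invented, each intron is simply given the GU...AG ends typical of spliceosomal introns, and `primary_transcript` and `splice` are hypothetical helpers that only join strings (no splice-site recognition is modelled).

```python
# Toy splicing sketch; sequences are invented and helper names are hypothetical.
def primary_transcript(exons: list[str], introns: list[str]) -> str:
    """Interleave exons and introns: E0 I0 E1 I1 ... E(n-1)."""
    if len(introns) != len(exons) - 1:
        raise ValueError("need exactly one intron between each pair of exons")
    parts = [exons[0]]
    for intron, exon in zip(introns, exons[1:]):
        parts += [intron, exon]
    return "".join(parts)


def splice(exons: list[str], skipped: frozenset[int] = frozenset()) -> str:
    """Mature mRNA body: introns removed, exons ligated in order, skipped exons left out."""
    return "".join(exon for i, exon in enumerate(exons) if i not in skipped)


EXONS = ["AUGGCC", "GAAUUU", "CCCUAA"]  # toy exons, each a multiple of 3 nt
INTRONS = ["GUAAAUAG", "GUCCCAG"]       # toy introns starting GU and ending AG

if __name__ == "__main__":
    print(primary_transcript(EXONS, INTRONS))  # AUGGCCGUAAAUAGGAAUUUGUCCCAGCCCUAA
    print(splice(EXONS))                       # AUGGCCGAAUUUCCCUAA
    print(splice(EXONS, frozenset({1})))       # AUGGCCCCCUAA (exon 1 skipped)
```

Because every toy exon is a multiple of three nucleotides long, skipping the middle exon keeps the reading frame (AUG GCC CCC UAA), so the polypeptide simply lacks that exon's amino acids; skipping an exon whose length is not a multiple of three would shift the frame.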

However, the intergenic regions between protein-coding genes are very large, even larger than the introns themselves. Gene size depends on intron size: larger introns result in larger genes. Only 25-50% of protein-coding genes are represented once in the genome (solitary, or single-copy, genes); the rest occur as duplicates or multiple copies. Sets of related duplicated genes are called gene families. An example is a human globin gene cluster, whose members arose through gene duplication. New copies can evolve a new function or degenerate over time and become pseudogenes.
Non-coding DNA includes introns, UTRs (untranslated regions), and intergenic regions, such as those between one coding region and another. Promoters are regions of DNA that initiate transcription of a particular gene. Transcription will NEVER start without a promoter. o In prokaryotes, the promoter consists of specific sequences recognized by the sigma factor of the RNA polymerase holoenzyme. o In eukaryotic genes, promoters lie very close to the transcription start site, and several DNA-binding proteins recognize the promoter motifs. Cis-regulatory modules (CRMs) are regions (100-1000 bp in length) that bind regulatory proteins and regulate the level of expression of nearby genes. o Enhancers elevate the level of transcription. o Silencers suppress the level of transcription. o Insulators make sure the cis-regulatory modules only affect the nearby gene, and can interact with proteins that are associated with enhancers and silencers. o Cis-regulatory modules are DNA sequences that lie upstream of a transcribed region and are the SITES where regulatory proteins bind. o Trans-regulatory modules are the genes that encode proteins that interact with cis-regulatory modules. They are usually located on different chromosomes or far away from the cis-regulatory sites they act on. These genes also have their own cis-regulatory modules, which are in turn acted upon by the protein products of other trans-regulatory genes. o You can have multiple CRMs in front of a gene to modulate its expression. Simple sequence repeats (SSRs) are of two types: o Minisatellite DNA consists of tandem arrays of nearly identical repeat units, each around 14-100 bp in length. The number of repeats ranges from 20 to 50, summing to around 1-5 kbp in length. Centromeres and telomeres are often rich in minisatellite DNA. o Microsatellite DNA is similar but with shorter repeats, 1-4 bp in length (e.g., AAAA). Arrays can sum to around 600 bp in length.

Occasionally they are found within transcription units, even within exons. Their expansion underlies several neuromuscular diseases, such as myotonic dystrophy and spinocerebellar ataxia. Anticipation (earlier onset or greater severity of a disease in successive generations) is caused by the expansion of microsatellite DNA. Expansion can occur through backward slippage: in a CAG-CAG-CAG repeat, one CAG of the newly synthesized strand can loop out so that it is no longer paired with the parental strand. Replication then continues, and the daughter strand ends up with n+1 repeats. Expansions occur very frequently, so microsatellite DNA is called hypervariable. This can be exploited in DNA fingerprinting, because the number of repeats in SSRs (simple sequence repeats) varies from individual to individual: o Determine the location of the SSR (since we have already sequenced the genome). o Design primers in front of and behind the repeat. o PCR-amplify this region. o Run the products by gel electrophoresis. o Compare the SIZES of the products at different SSR loci to determine relatedness and inheritance. SSRs can be both micro- and minisatellites. Non-coding RNA genes include the tRNA and rRNA genes. They do not have protein-coding potential. o New non-coding RNA genes are found all the time. o Their copy number in the human genome is relatively low compared with the 25,000 protein-coding genes. o They can have introns that must be spliced out for the final RNA transcript to function.
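The fingerprinting idea, that the PCR product size depends on the repeat number, can be sketched as follows. The flank sequences are hypothetical stand-ins for the DNA between each primer and the repeat, CAG is used as the repeat unit because the notes use it for slippage, and `count_tandem_repeats` and `make_allele` are illustrative helpers, not a real genotyping tool.

```python
# Toy SSR sizing sketch; flank sequences and helper names are hypothetical.
LEFT_FLANK = "GATTACTGG"   # hypothetical DNA between the forward primer and the repeat
RIGHT_FLANK = "TTCGAGTAA"  # hypothetical DNA between the repeat and the reverse primer


def count_tandem_repeats(seq: str, unit: str) -> int:
    """Length (in units) of the longest back-to-back run of `unit` in `seq`."""
    best = 0
    for start in range(len(seq)):
        n = 0
        while seq.startswith(unit, start + n * len(unit)):
            n += 1
        best = max(best, n)
    return best


def make_allele(n_repeats: int, unit: str = "CAG") -> str:
    """PCR product (flanks plus repeat) for an allele carrying `n_repeats` copies of `unit`."""
    return LEFT_FLANK + unit * n_repeats + RIGHT_FLANK


if __name__ == "__main__":
    parent = make_allele(5)
    child = make_allele(6)  # one extra CAG, as after backward slippage (n -> n+1)
    for allele in (parent, child):
        print(count_tandem_repeats(allele, "CAG"), len(allele))
    # 5 33
    # 6 36  -> the products differ by one repeat unit (3 bp) on the gel
```

Comparing product sizes at several such loci, rather than reading the sequence, is what lets the gel distinguish individuals.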
Lecture 9: DNA transposons

The non-coding DNA of promoters, cis-regulatory modules, simple sequence repeats and non-coding RNA genes together makes up only around 2% of the DNA. By far the largest contribution is from transposable elements, which are interspersed mobile DNA; there are more transposable elements than protein-coding regions. They were discovered by Barbara McClintock in her work with maize. It was groundbreaking work done before molecular biology even existed, using pure genetics. o She received the 1983 Nobel Prize in Physiology or Medicine for this work, the first woman to win that prize unshared. Transposable elements are found in both prokaryotes and eukaryotes. They are referred to as selfish DNA because they propagate to the detriment of the host. o They are also called parasitic DNA and junk DNA.

This led to a paradigm in which host genes, the intrinsic protein-coding genes, are the bright side of genomics, and transposons are seen as the dark side because they usually cause mutations. Transposable elements can be classified into two basic types based on their mode of mobility; the first uses a cut-and-paste mechanism and is known as a DNA transposon. These do not accumulate to huge numbers, because they usually excise from one site and reinsert at another. o Structure, from one side to the other: target-site direct repeat (5-11 bp) -> [inverted repeat (50 bp) -> protein-coding region -> inverted repeat] -> target-site direct repeat. o The insertion sequence (the bracketed part: the two inverted repeats and the protein-coding region) defines exactly where the transposable element begins and ends. The insertion element is around 1-2 kb in size. o Not all, but some DNA transposons have coding capacity, encoding the transposase gene. o At the terminal ends of the insertion element are the terminal inverted repeats, or TIRs. The sequence of one inverted repeat is reversed and appears only on the OPPOSITE strand. Ex.: one side: 5'-GGCTTCTAT-3'; other side (opposite strand): 3'-TATCTTCGG-5'. This is NOT a palindrome! o Outside the DNA transposon, flanking the insertion, are target-site direct repeats, or target-site duplications (TSDs). Reading along the same strand from one side to the other gives the same sequence: Ex.: 5'-TCGGA----------TCGGA-3'. TSDs are created when the element inserts itself into the genome. o DNA transposons are extremely diverse! They vary in length, TIR length, TSD length, DNA sequence and copy number. They can also nest by jumping into each other. o DNA transposons move through a cut-and-paste mechanism: the DNA transposon is excised from the host genome by transposase, which has endonuclease activity at the ends of the TIRs; it exists as a DNA intermediate; and it inserts into a new genomic location at a staggered cut, also made by transposase.
Transposase also performs the ligation when the DNA transposon is inserted, joining the element to the 5' overhangs. The strand complementary to each 5' overhang is filled in by DNA polymerase, and a final ligation is done by DNA ligase. o DNA transposons that have the transposase gene are called autonomous elements, because they can move on their own. o More commonly, DNA transposons do not have the transposase gene and are called nonautonomous elements. They can be moved by the transposase of an autonomous DNA transposon whose TIRs are similar to the TIRs of the nonautonomous element; nonautonomous elements thus move in trans. o A DNA transposon can jump into a gene and knock it out, as happened to the purple color gene in maize.
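The difference between terminal inverted repeats and target-site duplications, and how a staggered cut generates the TSD, can be shown with strings. This is a sketch under stated assumptions: the TIR is the 9-bp example from the notes, while the element interior, the target sequence and the 5-bp overhang are made up, and `insert_with_staggered_cut` is a hypothetical helper that models the cut, joining and gap fill-in only as string operations.

```python
# Toy TIR/TSD sketch; interior, target and helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_tirs(element: str, tir_len: int) -> bool:
    """True if the right end (top strand) is the reverse complement of the left end,
    i.e. the same sequence appears on the opposite strand at the other end."""
    return element[-tir_len:] == reverse_complement(element[:tir_len])


def is_palindrome(seq: str) -> bool:
    """A restriction-site-style palindrome equals its own reverse complement."""
    return seq == reverse_complement(seq)


def insert_with_staggered_cut(target: str, pos: int, element: str, overhang: int) -> str:
    """Insert `element` into `target` at a staggered cut and fill in the gaps.

    The cuts on the two strands are `overhang` bases apart, starting at `pos`. After the
    element is joined to the 5' overhangs and the single-stranded gaps are filled in,
    the `overhang` bases of the target site appear on BOTH sides of the element.
    """
    site = target[pos:pos + overhang]
    return target[:pos] + site + element + site + target[pos + overhang:]


TIR = "GGCTTCTAT"                                     # left TIR from the notes
ELEMENT = TIR + "ACGTACGT" + reverse_complement(TIR)  # toy interior between the TIRs

if __name__ == "__main__":
    print(has_tirs(ELEMENT, len(TIR)))  # True
    print(is_palindrome(TIR))           # False: inverted repeat, not a palindrome
    print(insert_with_staggered_cut("AAACCGTTTGGG", 3, ELEMENT, 5))
    # AAA CCGTT <element> CCGTT TGGG: the 5-bp target site is now a direct repeat
```

The TIRs are a property of the element itself, while the TSD copies come from the host target site, which is why TSDs differ from one insertion to the next.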

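o DNA transposons accumulate in copy number by taking advantage of DNA replication. If a transposon moves from a region the replication fork has already passed (already replicated) to a region ahead of the fork (not yet replicated), the new copy is replicated again, so the copy number increases by +1.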
Lecture 10: Infested Genome: Retrotransposons

DNA transposons move through a cut-and-paste mechanism, while the elements in this lecture move through an RNA intermediate. The elements that move via an RNA intermediate are retrotransposons (also called retroposons): LTR retrotransposons, LINEs, SINEs, and processed pseudogenes.
They all move through the same pathway: DNA -> (transcription) RNA -> (REVERSE TRANSCRIPTION via reverse transcriptase) DNA. Reverse transcriptase is actually a DNA polymerase that can also use RNA as a template to produce DNA. o It can still use DNA as a template, but much less efficiently. o It follows all the rules of DNA polymerases (it requires a primer and synthesizes 5' to 3').
To understand the function of reverse transcriptase, we will look at retroviruses, which use reverse transcriptase in their life cycle. Their genome is a SINGLE-STRANDED RNA molecule. Retroviral life cycles usually do not involve killing the infected host cells. Oncogenic retroviruses cause cancer because they have ACQUIRED a host gene. o The virus puts the host gene under the regulation of the retrovirus, RATHER than that of the host. Retrovirus life cycle: o The virus is enclosed in a lipid membrane envelope with various embedded proteins. Inside the envelope is a proteinaceous nucleocapsid containing two SINGLE-STRANDED RNA molecules, as well as reverse transcriptase. o The virus first fuses with the host cell, which releases the capsid. o Reverse transcriptase converts the single-stranded RNA into double-stranded DNA. o The double-stranded DNA molecule then needs to be INTEGRATED into the host genome, where it can stay for a relatively long time. o Transcription of the integrated viral genome produces viral RNA again, which is translated into the capsid and other viral proteins.

When the retroviral DNA is integrated into the host genome, it contains LTR (long terminal repeat) regions at the extreme ends: o The repeats on both sides read, 5' to 3', U3, R and U5. There are direct target-site repeats on either end. Between the two LTRs lies the coding region of the DNA. o Transcription starts at the border between U3 and R! This means that U3 has promoter activity. o At the R/U5 border of the 3' LTR there is a poly(A) site, where a poly(A) tail is added to the end of the transcript. Therefore, in the final transcript, going from the retroviral DNA copy to the retroviral RNA genome, complete LTRs NO LONGER EXIST. o The 5' end is missing U3, and the 3' end is missing U5.
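Where provirus transcription starts and stops can be traced at the level of segment labels. This is a bookkeeping sketch only: each LTR piece (U3, R, U5), the primer binding site (PBS), the polypurine tract (PPT) and the genes are represented by labels rather than sequences, and `transcribe_provirus` is a hypothetical helper.

```python
# Label-level sketch; segment layout is schematic and the helper name is hypothetical.
PROVIRUS = ["U3", "R", "U5", "PBS", "gag", "pol", "env", "PPT", "U3", "R", "U5"]


def transcribe_provirus(provirus: list[str]) -> list[str]:
    """Segments present in the genomic RNA transcribed from an integrated provirus.

    Transcription starts at the U3/R border of the 5' LTR (U3 acts as the promoter),
    and the RNA is cleaved and polyadenylated at the R/U5 border of the 3' LTR.
    """
    start = provirus.index("R")                          # first R: transcription start
    end = len(provirus) - 1 - provirus[::-1].index("R")  # last R: poly(A) site follows it
    return provirus[start:end + 1] + ["poly(A)"]


if __name__ == "__main__":
    print(transcribe_provirus(PROVIRUS))
    # ['R', 'U5', 'PBS', 'gag', 'pol', 'env', 'PPT', 'U3', 'R', 'poly(A)']
```

The output shows the point made above: the RNA begins with R-U5 and ends with U3-R, so neither end carries a complete U3-R-U5 LTR.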
How do you go from this single-stranded RNA transcript to a double-stranded DNA that contains both LTRs? 1. The retroviral RNA genome first needs a PRIMER (as reverse transcriptase needs a primer). (5' end R|U5|PBS|coding region|U3|R|poly(A) 3' end) a. The primer is a tRNA molecule provided by the HOST (not produced by the virus). b. The primer binds RIGHT at the primer binding site (PBS), just 3' of the U5 region near the 5' end. c. Now we have a free 3' end on the primer. 2. Reverse transcriptase (the DNA polymerase) synthesizes DNA from the free 3' end of the primer, copying the RNA toward its 5' end, and stops at R, the END of the RNA transcript (the 5' border of the original RNA transcript, and the 3' end of the newly created DNA). a. There is a free 3' DNA end, but no more RNA template. 3. Now a SPECIFIC RNA-degrading enzyme, RNase H, which degrades only the RNA in an RNA-DNA hybrid and is encoded by the virus, acts. a. It degrades the 5' RNA end (U5 and R) paired with the extended DNA. b. With the 5' end of the RNA transcript gone, no further DNA polymerization is possible. 4. The DNA, still attached to the tRNA primer, is released (at a low frequency), and its DNA copy of R binds to the R section at the 3' end of the RNA transcript (the first jump). 5. Now that there is a free 3' DNA end and a corresponding template, reverse transcriptase continues synthesizing DNA, copying the RNA from the 3' R section toward the 5' end of the RNA transcript. 6. RNase H once again degrades the RNA paired with the DNA, EXCEPT for one region next to the U3 border that is polypurine-rich (the polypurine tract)! a. This region is degraded slowly enough that the RNA lasts long enough to act as a PRIMER for DNA polymerase, which now reads the newly formed DNA strand and synthesizes the second strand towards the 5' end of the newly formed DNA, ending at the tRNA primer region. 7. Reverse transcriptase reads up to the tRNA primer at the 5' end of the DNA strand, but does not copy the whole tRNA because of its unusual (FUNKY) structure. a. It creates a DNA copy of the PBS. b. Since this creates an RNA-DNA duplex at the PBS site, the tRNA primer is removed. c. However, in this conformation, the 3' end of the newly created DNA lies outside of its complementary DNA sequence and cannot be extended by reverse transcriptase. 8. Similar to step 4, the hydrogen bonds holding the U3, R, U5 region together melt, and the PBS copy at the 3' end of the new strand binds to the complementary PBS sequence at the 3' end of the extended DNA strand (the second jump). a. This creates a free 3' end on both the extended and the new strand, allowing reverse transcriptase to continue in BOTH directions. b. This FORMS the LTR on BOTH sides. Note that the LTRs are DIRECTLY identical from one side to the other, and they HAVE TO BE for the cycle of transcription and reverse transcription to work again. Types of retrotransposons that move through an RNA intermediate: LTR retrotransposons also have LTRs at the extreme ends of the element, very similar to those of retroviruses. o They also have a target-site direct repeat (5-10 bp) directly OUTSIDE of the LTR region. o They look identical to the retroviral DNA genome. o Looking at the retroviral genes between the LTRs: the retroviral ORF order is gag (group-specific antigen), pol (polymerase) and env (envelope). Many of these genes produce polyproteins, several enzymes linked together in one amino acid chain that must be cleaved at specific sites to form separate proteins. Gag includes the proteins responsible for capsid formation. Env is necessary for envelope formation. Pol encodes a polypeptide that includes protease, reverse transcriptase and integrase, which NEED TO BE CLEAVED apart.
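The eight reverse-transcription steps can be followed with the same kind of label bookkeeping, to show how an RNA lacking U3 at its 5' end and U5 at its 3' end still yields DNA with a complete U3-R-U5 LTR at both ends. This is a sketch, assuming the RNA is written 5' to 3' as R, U5, PBS, genes, PPT, U3, R (poly(A) tail omitted); each strand is tracked as the list of RNA segments it covers, complementarity is not modelled, and `reverse_transcribe` is a hypothetical name.

```python
# Label-level sketch of the two template jumps; the helper name is hypothetical.
def reverse_transcribe(rna: list[str]) -> list[str]:
    """Segments covered by the double-stranded DNA made from a retroviral-type RNA."""
    pbs = rna.index("PBS")
    # Steps 1-2: the tRNA primer at the PBS is extended to the 5' end of the RNA,
    # copying U5 and R ("strong-stop" minus-strand DNA).
    minus_strong_stop = rna[:pbs]                 # covers R, U5
    # Steps 3-5: RNase H removes that RNA; the DNA copy of R pairs with the 3' R
    # (first jump) and the minus strand is extended through U3, PPT and genes to the PBS.
    minus = rna[pbs:] + minus_strong_stop[1:]     # PBS ... PPT, U3, R, U5
    # Step 6: the polypurine tract resists RNase H and primes the plus strand, which is
    # copied through U3, R and U5 and into the PBS carried on the tRNA primer.
    ppt = minus.index("PPT")
    plus_strong_stop = minus[ppt + 1:] + ["PBS"]  # U3, R, U5, PBS
    # Steps 7-8: the tRNA is removed, the two PBS copies pair (second jump), and both
    # strands are extended to full length.
    return plus_strong_stop[:-1] + minus


RNA = ["R", "U5", "PBS", "gag", "pol", "env", "PPT", "U3", "R"]

if __name__ == "__main__":
    dna = reverse_transcribe(RNA)
    print(dna)
    # ['U3', 'R', 'U5', 'PBS', 'gag', 'pol', 'env', 'PPT', 'U3', 'R', 'U5']
    print(dna[:3] == dna[-3:])  # True: identical direct-repeat LTRs at both ends
```

The U3 at the left end comes from the 3' end of the RNA and the U5 at the right end comes from its 5' end, which is why the two jumps are needed to rebuild complete, identical LTRs.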
o Looking at LTR retrotransposons in humans, they do NOT have an env gene and cannot make an envelope. They cannot be infectious: they cannot fuse with, or bud out of, the host cell. o This raises a chicken-or-egg question. Some say that LTR retrotransposons mutated and captured a host gene that allowed them to encode an envelope (env) and then became infectious retroviruses. Others say that retroviruses came first, lost the env gene and became trapped in the host: once retroviruses integrate into the genome, they collect mutations that cripple the env gene. These are called endogenous retroviruses, and our DNA contains MANY of them. o The overwhelming evidence is that retroviruses came from LTR retrotransposons. o Therefore, LTR retrotransposons are transcribed in a very similar way. o An elegant experiment proved that an RNA intermediate is involved. Two plasmids carrying an LTR retrotransposon were created and transformed into yeast. One plasmid has an engineered LTR promoter that is only turned on in the presence of galactose. The other plasmid also has the galactose-responsive promoter, plus an INTRON sequence, which is spliced out of the immature transcript to produce the mature RNA. After placing the plasmids into yeast cells, galactose is NOT added at first, to make sure the plasmids are transformed well. Since NO movement or replication is detected, transcription into RNA must be required first. When galactose is then added, transcription occurs from both plasmids. HOWEVER, the intron sequence in the second plasmid is spliced out when the RNA transcript is formed. Examining the transposed element and not finding the intron sequence shows that an RNA transcript is required. o LTR retrotransposons NEVER have introns because whenever they move, the introns are always spliced out. o Reverse transcription of LTR retrotransposons back into DNA is identical to that of retroviruses as well. o Integration of the element into the DNA is done by integrase, which is encoded by the region between the LTRs of the LTR retrotransposon. It is VERY similar to the transposase of DNA transposons: it makes a staggered cut in the DNA backbone and inserts the double-stranded LTR DNA, and the gaps are then filled in, creating DIRECT REPEATS. In doing so, we create the target-site direct repeat duplications. LINEs and SINEs are referred to as non-LTR retrotransposons. o Long and short interspersed elements.
o The most common LINE is L1 and the most common SINE is Alu. o LINEs are usually 6-8 kb in length. They are present at almost 860,000 copies in the human genome and contribute 21% of the human genome sequence. o They have coding capacity: ORF1 (an RNA-binding protein) and ORF2 (with reverse transcriptase and DNA endonuclease activity, used for insertion of the LINE). o The RNA-binding protein provides a mechanism to transport the reverse transcriptase and LINE complex back into the NUCLEUS.
Lecture 11

LINEs and SINEs make up the majority of transposons in the human genome. LINEs and SINEs do have an RNA intermediate, but this intermediate must interact DIRECTLY with the host DNA genome, so it must be transported back into the nucleus. LINEs are quite big (6-8 kb) and quite numerous, making up about 21% of the human genome. LINE structure: (target-site direct repeat -> A/T-rich region -> ORF1 -> ORF2 -> ___ -> target-site direct repeat). ORF1 encodes the RNA-binding protein, which is very important for getting ORF2 and the LINE RNA back into the nucleus. ORF2 has reverse transcriptase and DNA endonuclease activity. o The reverse transcriptase domain is VERY similar to that of LTR retrotransposons, so one might wonder whether LINEs gave rise to the first LTR retrotransposons. Once ORF1 gets the LINE RNA and ORF2 back into the nucleus, the ORF2-LINE RNA complex binds to a specific sequence in the duplex DNA and inserts the LINE. o ORF2's endonuclease makes a staggered cut, nicking each strand (the nick sites). o The difference between this endonuclease activity and the restriction endonucleases used in cloning is that the ORF2 endonuclease is, for the most part, NOT sequence specific, only specific for A/T-rich regions. o After the staggered cut, the nicked T-rich DNA strand serves as the primer (free 3' end), and the template is the LINE RNA, which pairs with it through its poly(A) tail.
This leads to extension of the nicked DNA by ORF2's reverse transcriptase activity, copying the LINE RNA. The ORF2-LINE transcript stays associated with the nicked end so that it doesn't just float away. After the LINE has been copied into DNA, the LINE DNA moves back into alignment and the LINE RNA ends up paired with the other strand, creating a DNA-RNA hybrid. There is a break in the sugar-phosphate backbone, but because ORF2 is associated with both the STAGGERED END and the LINE transcript, it reads OVER the break in the backbone and continues to polymerize.

Afterwards, several enzymes, including DNA polymerase and DNA ligase, fix the backbone via gap repair. The RNA part of the LINE is removed and replaced with DNA, reading 3' to 5' along the LINE DNA. Upon completion, a direct repeat has been CREATED by the insertion. LTR retrotransposon target sites TEND to be 5 base pairs in length. Families of DNA transposons have a specific target-site size: some make 8, some 2, etc. LINEs and SINEs have no specificity for target-site size or sequence, except for being A/T-rich.
SINEs have NO protein-coding capacity, unlike LINEs. They are derived from cellular non-coding RNA genes (such as tRNA genes); in humans, specifically the 7SL RNA gene. A SINE is basically a zombie gene: it was previously inactivated and no longer used, but it became capable of movement, and it does nothing for the host. Other types of non-coding RNA genes can ALSO be mobilized; in plants, tRNA genes can be mobilized. SINEs range from around 100 to 400 bp in length. In humans, there are around 1.6 million copies, but they make up only around 13% of the total nucleotide sequence. We do not understand how SINEs move, but they seem to pirate the same enzymes that LINEs use: ORF1 and ORF2. SINEs cannot move unless there is an active LINE somewhere in the genome.
Processed pseudogenes are non-functional, decaying genes that were processed (spliced) before being mobilized: they are mobilized protein-coding genes. They originate from reverse transcription of mRNA and are extremely rare, definitely NOT as common as LINEs and SINEs. They arise when mature RNA transcripts are reverse transcribed. o Because mRNA is used as the template, they LACK introns and control regions! o Therefore, with no control regions or promoters, they are DEAD in the water.
DNA transposons are excised, so they are GONE from their original location. Retrotransposons, HOWEVER, NEVER leave their original place of insertion: they are simply transcribed into an RNA intermediate, which is then integrated somewhere else in the DNA. Therefore, as long as there are many RNA intermediates, we will get a burst of retrotransposon activity. Reverse transcriptase is VERY error-prone and introduces many mutations. This allows retroviruses to become highly variable and escape host immune defenses.

However, retrotransposons cannot accumulate TOO many mutations, because the genes within the LINE would become defective and unusable. Most elements in the human genome are DEAD and can no longer be mobilized because of mutations. Even the elements that can still move do so very infrequently, barely at all per generation. Most new copies are themselves truncated, because reverse transcriptase does not stay on long enough to copy the whole RNA transcript. These elements are dead in the water too. Retrotransposons are VERY diverse. The two elements most used for discoveries were Ty1 and Ty3 in yeast.
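The contrast drawn above between excision (DNA transposons) and copying through an RNA intermediate (retrotransposons) can be expressed as operations on a set of insertion sites. This toy sketch uses made-up genomic positions and hypothetical function names, and it deliberately ignores the replication-timing effect described for DNA transposons, element death and truncation; it only shows why repeated retrotransposition raises copy number while cut-and-paste does not.

```python
# Toy copy-number sketch; positions are made up and function names are hypothetical.
def cut_and_paste(sites: frozenset[int], source: int, target: int) -> frozenset[int]:
    """DNA transposon: excised from `source`, reinserted at `target`."""
    if source not in sites:
        raise ValueError(f"no element at {source}")
    return (sites - {source}) | {target}


def copy_and_paste(sites: frozenset[int], source: int, target: int) -> frozenset[int]:
    """Retrotransposon: `source` stays put; its RNA copy is reverse transcribed into `target`."""
    if source not in sites:
        raise ValueError(f"no element at {source}")
    return sites | {target}


if __name__ == "__main__":
    dna_tn = retro = frozenset({100})
    for target in (2_000, 35_000, 410_000):  # made-up insertion positions
        dna_tn = cut_and_paste(dna_tn, min(dna_tn), target)
        retro = copy_and_paste(retro, 100, target)
    print(sorted(dna_tn))  # [410000]: still one copy, now somewhere else
    print(sorted(retro))   # [100, 2000, 35000, 410000]: copy number grew from 1 to 4
```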
In terms of genome content: yeast's genome is very small, and it has VERY FEW transposable elements. Rice, one of the model plant genomes, has a fairly high transposable element content: 30%. In humans, when the Human Genome Project was finished, AT LEAST 50% of the genome was found to be made up of transposable elements. Maize has a VERY large genome with a VERY HIGH transposable element content: 90%. o These are ALMOST ENTIRELY LTR retrotransposons. The lily genome is LARGER than the human genome because over 99% of it is due to LTR retrotransposons.
The last contributor to non-coding DNA is spacer DNA. We have NO IDEA what most of it is. As computational techniques improve, we are beginning to characterize regions of spacer DNA. As transposable elements mutate and decay, they become spacer DNA.
Transposons and Evolution: When considering evolution, many people look only at protein-coding genes. There is a negative correlation between transposon mobility and host fitness, so high mobility is NOT favored by natural selection.
The effects of transposons in evolution: The insertion of transposons can cause mutations! o The insertion of a DNA transposon into the maize purple color gene knocks it out. o Floral colors can also be the result of insertion knockouts. o In many cases, insertion knockouts were SELECTED FOR by humans (e.g., lack of purple color in maize). o An element called P in Drosophila is often inserted into the S6 kinase gene, creating a dramatic effect on body size in Drosophila.

The current idea that transposons should be ignored is a bad paradigm, because some transposons do some good! Recombination between transposable elements can cause gene and segmental duplication, MOVING genes from one chromosome to another and producing duplicated sets. o However, there is often no need for 2 copies of one gene, so the extra copy often decays and is lost. o The extra copy will persist, however, if it gains a novel function. Polypeptides are composites of different combinations of domains that may be shared between proteins with different functions. These domains are SHARED as a consequence of exon shuffling. o Exon shuffling is mediated by recombination between mobile elements. Alu and L1 elements decorate regions within and around genes, so recombination between two such elements can shuffle the sequences (exons) between them.
Transposons can also move together when transposase recognizes ONE end of one transposon and the far end of ANOTHER transposon, so everything in between is mobilized and shuffled. o LINEs can also move exons around: transcription of a LINE sometimes does not stop at its weak polyA signal, but continues into a downstream terminal exon that has its own polyA signal. This read-through transcript can then be reverse transcribed and integrated into gene 2, adding the extra exon to the new gene. o This is very important in creating novel polypeptides; if the protein-coding domains are knocked out, the polypeptides cannot work. Transposons also play an important role in cis-regulatory modules (CRMs), which regulate genes from their control regions. o Transposable elements have varying coding capacities (e.g., transposase genes), and they have their own promoters and cis-regulatory modules. o CRMs are enhancers and silencers. o When transposable elements insert UPSTREAM of a gene, they can add a novel cis-regulatory module to that gene. o For example, when insecticides are introduced, insects usually develop resistance.

Resistant fly strains carry a transposable element insertion UPSTREAM of a P450 gene, which codes for an enzyme that detoxifies the insecticide. The transposable element contributed cis-regulatory modules that dramatically enhanced P450 transcription. In maize, two transposable elements inserted in front of the tb1 gene promote its transcription and caused the branches to become cobs of the corn. A SINE insertion upstream of the insulin-like growth factor 1 (IGF1) gene changed IGF1 regulation during the domestication of dogs. The part of a transposon that contributes the CRM will be selected for over evolutionary time. However, the rest of the transposon does not contribute and is not maintained, so it eventually degenerates, leaving only the CRM.
Lecture 12: Organellar Genomes and DNA Barcoding

Midterm information: The midterm consists of 18 MC questions and 7 short-answer questions. Both are very similar to the quiz questions. Short answers are VERY short, but more involved than the quizzes, and tend to mix information from two or three lectures.
The final exam will be December 14th at 9 AM in the McGill Gym.
Transposons and evolution: Transduction is the process through which retroviruses and LTR-retrotransposons acquire genes from their host. o Applies only to LTR-retrotransposons and retroviruses. o The acquired gene does NOT have introns, because it is captured through an RNA intermediate (a spliced transcript). Transduplication is the acquisition of host DNA by DNA transposons. o DNA transposons often take in host GENES. o The acquired gene DOES have introns, because there is no RNA intermediate. By acquiring these small gene fragments, LTR-retrotransposons, retroviruses and DNA transposons are "fishing around" for fragments that may benefit them. o Non-beneficial ones degrade over time. o Beneficial ones are maintained! o An example is the acquisition of the env ORF, which marks the transition of an LTR-retrotransposon into a retrovirus.
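Why the capture route decides whether introns are kept can be shown with a minimal Python sketch. It assumes a toy gene whose intron coordinates are already known; the sequence and the helper names `splice` and `captured_copy` are hypothetical, for illustration only. A copy made through an RNA intermediate (transduction, or a processed pseudogene) carries the spliced sequence, while a copy made directly from DNA (transduplication) keeps the intron.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as half-open [start, end) coordinates, from a transcript."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])  # keep the exon before this intron
        pos = end                           # skip over the intron
    pieces.append(pre_mrna[pos:])           # keep the final exon
    return "".join(pieces)


def captured_copy(gene: str, introns: list[tuple[int, int]], via_rna: bool) -> str:
    """Sequence of a host gene after capture by a mobile element (toy model).

    via_rna=True models capture through a spliced RNA intermediate
    (LTR-retrotransposons, retroviruses); via_rna=False models direct
    DNA capture by a DNA transposon.
    """
    return splice(gene, introns) if via_rna else gene


# Toy gene: exons in upper case, one intron in lower case (GT...AG) at [6, 15)
gene = "ATGGCC" + "gtaagtcag" + "GATTAA"
print(captured_copy(gene, [(6, 15)], via_rna=True))   # ATGGCCGATTAA
print(captured_copy(gene, [(6, 15)], via_rna=False))  # ATGGCCgtaagtcagGATTAA
```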
Domesticated transposable elements were originally mobile transposons that became immobilized and harvested by the host to perform a specific function for it. Most of the time, this function is VERY important to the host.

For DNA transposons, domestication can arise from the loss of one of the terminal repeats, so that transposase no longer recognizes the transposon and it can no longer move. Again, if the immobilized transposon gene is not beneficial, it will degrade and be lost from the genome. Examples include RAG1 and RAG2 (recombination-activating genes), which are essential for assembling human immunoglobulin genes; our immune system is very dependent on them. o They were originally a DNA transposon that was incorporated into the genome. o They work very similarly to the transposase of a DNA transposon. Syncytin is important for human placental cell fusion and development. In other animals, it is involved in forming multinucleate cells at the interface between the fetus and the placenta. o It came from an LTR-retrotransposon. o Its function is very similar to that of an env protein.
Organellar genomes in mitochondria and chloroplasts: These organelles carry out respiration and photosynthesis. They were derived from free-living bacteria that were endocytosed and became endosymbionts. They still resemble prokaryotes: circular DNA, genes that lack introns (introns being characteristic of eukaryotes), and gene products that resemble prokaryotic RNAs and proteins. A eukaryotic cell has only one nucleus containing its nuclear DNA, but it can contain many mitochondria and chloroplasts, each with multiple genome copies. o There are multiple copies of the mitochondrial genome within EACH mitochondrion. The organelles were maintained in cells because they offered a biological advantage: oxidative phosphorylation and photosynthesis. For example, when a sea slug eats green algae, the algae's photosynthetic chloroplasts reach the slug's epidermal cells and continue to photosynthesize. This is VERY similar to endocytosis. o The chloroplasts are not passed on to the next generation. Over time, many genes of the endosymbionts in mitochondria and chloroplasts have been transferred to the nucleus. o There is ALSO evidence of the reverse! Nuclear DNA can move into mitochondria and chloroplasts, and DNA can be exchanged between the two organellar genomes as well. The organellar genes are the small subset of the original genes needed to keep the organelle functioning. o Therefore, the mitochondrial genome is very small. The human mitochondrial genome contains only 37 genes (13 protein-coding genes, 22 tRNA genes and 2 rRNA genes); the tRNA and rRNA genes are required for translation inside the mitochondrion. It has lost a LOT of genes compared with E. coli and has NO INTRONS. o The proteins encoded by mitochondrial DNA NEVER LEAVE the mitochondria. o There are also alterations to the standard genetic code: some codons that are stop codons in the standard (nuclear) code are not stop codons in mitochondria. o This is a variant genetic code between the mitochondrial and nuclear genomes (not to be confused with codon bias, the unequal use of synonymous codons).

UGA: normally a STOP codon; in mitochondria, UGA codes for tryptophan. o On the mitochondrial genome map, genes on the outer circle are transcribed in the CLOCKWISE direction. o Genes on the inner circle are transcribed in the COUNTERCLOCKWISE direction. o Chloroplast genomes are somewhat bigger, with 100-200 genes and around 100-200 kb. The remaining genes needed for the organelle's proper maintenance are in the nucleus; their protein products are imported into the organelle. The mitochondrial genome has been shown to be very important: o Mice with a mitochondrial DNA polymerase defective in proofreading accumulate HUGE numbers of mutations that disrupt mitochondrial genes, and they exhibit premature aging. o These mice have a much shorter life span.
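The UGA reassignment can be illustrated with a small Python sketch. It builds the standard genetic code from a compact 64-letter table (a common encoding, used here for convenience) and then overrides only UGA -> Trp, the change named in these notes; real mitochondrial codes have further, lineage-specific differences that this toy does not model. The test mRNA is made up.

```python
from itertools import product

# Standard genetic code (NCBI table 1): amino acids listed in UCAG x UCAG x UCAG codon order.
BASES = "UCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only the reassignment from the notes: UGA = Trp (W) instead of STOP.
MITO_CODE = {**STANDARD_CODE, "UGA": "W"}


def translate(mrna: str, code: dict[str, str]) -> str:
    """Translate an mRNA from its first base until the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = "AUGUGGUGAAAAUAA"
print(translate(mrna, STANDARD_CODE))  # MW   (UGA read as STOP)
print(translate(mrna, MITO_CODE))      # MWWK (UGA read as Trp)
```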
DNA barcoding exploits sequences in the mitochondrial and chloroplast genomes to give a unique identifier for each species on the planet. Compared with other molecules that could uniquely identify a species: o Proteins and polysaccharides are VERY hard and expensive to sequence. o RNA is too unstable to last as a barcode. DNA is extremely stable, and PCR can amplify it to give access to the information in the genome. o A barcode sequence within a gene acts as the unique identifier. It is flanked by regions that act as primer sites for PCR. Some considerations for choosing the right DNA sequence for barcoding (it does not have to be a specific gene, but it must FIT these criteria): o We cannot use just any sequence, because many sequences are too similar between species. o Sequence differences (divergences) must be high enough to distinguish species, while variation within a species must stay low. o The sequence must be amplifiable by PCR, so the region must be flanked by ULTRA-CONSERVED regions (across all species) for primer annealing, around 20 bp in length. o The length of the barcode sequence must not vary: therefore, the sequence must not contain introns or transposons. The difference in nuclear DNA between human and chimpanzee is only 0.9%. However, the difference in mitochondrial DNA between human and chimpanzee is 9%. o Mitochondrial DNA is better to use because of this hypervariable property. o Mitochondrial genes of two species are similar for ribosomal RNA genes, but differ for protein-coding genes. For most eukaryotic organisms (but not plants), barcoding uses a section of the mitochondrial cytochrome c oxidase subunit 1 gene (CO1).
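To illustrate why the barcode must sit between conserved primer sites, here is a minimal "in silico PCR" sketch in Python. It finds the forward primer, then the reverse complement of the reverse primer, and returns the region between them. The 6-base primers and the sequence are hypothetical toy values (real barcoding primers are around 20 bp), and exact primer matches are assumed.

```python
from typing import Optional

# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_barcode(sequence: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the region between two primer sites, or None if a site is missing.

    The reverse primer anneals to the opposite strand, so on this strand we
    search for its reverse complement downstream of the forward primer.
    """
    fwd = sequence.find(forward_primer)
    if fwd == -1:
        return None
    barcode_start = fwd + len(forward_primer)
    rev = sequence.find(reverse_complement(reverse_primer), barcode_start)
    if rev == -1:
        return None
    return sequence[barcode_start:rev]


sample = "GG" + "ACGTTG" + "AAAACCCCGGGG" + "TCATGG" + "TT"
print(extract_barcode(sample, "ACGTTG", "CCATGA"))  # AAAACCCCGGGG
```

If either primer site is missing or mutated, nothing is returned (nothing would amplify), which is why the flanking regions must be ultra-conserved.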

In plants, the barcoding region is within the chloroplast genome (cpDNA), located in a spacer between genes. To analyze a DNA barcode, one goes through these steps:
1. Take a sample of the organism.
2. Isolate the DNA.
3. Amplify the barcode region in a PCR thermocycler.
4. Sequence the DNA barcode.
5. Compare it to the database.
   a. The database that houses all the DNA barcodes has approximately 2 million barcodes deposited.
DNA barcoding can be conducted: o With DNA fragments, even ancient DNA. o Without the entire organism; only a small part of it is needed for PCR. o At ALL stages of life, which is great for distinguishing morphologically very similar eggs.
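Step 5 (comparing to the database) can be sketched as a nearest-match search by percent identity. This is a simplified stand-in, assuming aligned barcodes of equal length (consistent with the requirement that barcode length must not vary); the species names and sequences form a hypothetical toy database, not real entries.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions that match; equal-length barcodes assumed (no indels)."""
    if len(a) != len(b):
        raise ValueError("toy model assumes aligned barcodes of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)


def identify(query: str, database: dict[str, str]) -> tuple[str, float]:
    """Return the reference species with the highest identity to the query barcode."""
    best_species, best_identity = "", -1.0
    for species, reference in database.items():
        identity = percent_identity(query, reference)
        if identity > best_identity:
            best_species, best_identity = species, identity
    return best_species, best_identity


toy_database = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGTTC"}
print(identify("ACGTACGTAA", toy_database))  # ('species_A', 90.0)
```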
Lecture 14: More Organellar Genomes and Barcoding

Reasons to conduct DNA barcoding (contd): It unmasks species lookalikes. o With a collection of many similar bird species, we expect big genetic differences between species and small differences between members of the same species. o We separate specimens into different species if their barcodes differ by 2% or more. o Species with small genetic differences between them might have only recently diverged from a common ancestor and still share much of the same genome. It reduces ambiguity when animals of the same species look morphologically different as juveniles and as adults. It makes expertise go further, since students can learn to go out and do DNA barcoding. It democratizes access by allowing free access to the genetic database. Currently, 1.7 million species have been identified on the planet, but there are an estimated 10 million species. The bottleneck is the physical acquisition of the specimens themselves.
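The 2% rule can be sketched as grouping specimens whose barcodes differ by less than 2%, using single linkage (via union-find) as one simple way to form the groups; this clustering choice, the made-up 100 bp barcode and the helper `substitute` are assumptions for illustration only.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ (equal-length barcodes assumed)."""
    if len(a) != len(b):
        raise ValueError("barcodes must be aligned and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_specimens(barcodes: dict[str, str], threshold: float = 0.02) -> list[set[str]]:
    """Link specimens whose barcodes differ by less than `threshold`
    (single linkage via union-find); each group is a putative species."""
    parent = {name: name for name in barcodes}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    names = list(barcodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(barcodes[a], barcodes[b]) < threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def substitute(seq: str, positions: list[int]) -> str:
    """Hypothetical helper: change the base at each position to a different base."""
    bases = list(seq)
    for p in positions:
        bases[p] = "ACGT"[("ACGT".index(bases[p]) + 1) % 4]
    return "".join(bases)


reference = "ACGT" * 25  # a made-up 100 bp barcode
specimens = {
    "bird1": reference,
    "bird2": substitute(reference, [50]),             # 1% different
    "bird3": substitute(reference, [0, 1, 2, 3, 4]),  # 5% different
}
print(group_specimens(specimens))  # two groups: {bird1, bird2} and {bird3}
```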
Advantages of DNA barcoding: It is very fast and cheap, because PCR and sequencing cost in the realm of $6, and it is easily accessible.
Disadvantages of DNA barcoding: It is potentially inaccurate when you isolate DNA from carnivorous insects or animals, because you might also pick up DNA from the animals they ate. It is costly to set up the initial database.

Also, grant money goes to DNA barcoding rather than to the training of taxonomists, because barcoding is more politically appealing. DNA can also be degraded to the point where it can no longer be barcoded; however, all you need is ONE intact copy.
Lecture 14: Chromatin and Chromosomes

DNA molecules are far too long to fit into the nucleus as extended molecules, so there must be a way to compact the DNA to fit it into the nucleus. Lengths of DNA can be measured in centimeters, but the diameter of the nucleus is only a few microns.
DNA is first organized into chromatin, and then ultimately into a chromosome. During interphase (the longest phase of the cell cycle), DNA exists as nucleoprotein complexes known as chromatin. Chromatin isolated in an isotonic buffer contains roughly equal proportions of DNA and protein. In a low-salt solution, the chromatin extends out and becomes more visually detailed; this is called the extended form of chromatin, or "beads on a string." o The beads are nucleosomes (10 nm in diameter). Nucleosomes consist of histones with DNA wrapped around them. o There are 5 major types of histones: H1, H2A, H2B, H3 and H4. o They are rich in positively charged (+) amino acids, which interact with the negatively charged phosphate groups of DNA. o Since nucleosomes are used in all eukaryotic DNA, H2A, H2B, H3 and H4 are HIGHLY conserved; H1 is somewhat more variable. o Each nucleosome consists of ~147 bp of DNA wrapped almost two full turns around the surface of the protein core. The protein core is an octamer of histones, with 2 copies each of H2A, H2B, H3 and H4. The N-terminal tails of these histone polypeptides stick out of the nucleosome complex; H2A and H2B also have C-terminal tails extending out. o Nucleosomes are connected to one another by linker DNA, around 10-90 bp in length. In high (physiological) salt concentrations, chromatin exists in a condensed, fiber-like form referred to as the 30-nm chromatin fiber. o The beads on a string are compacted by stacking every other nucleosome on top of each other, forming a zig-zag ribbon that coils into a two-start helix. o The diameter of the helix is constrained by the length of the linker DNA. o H1 is not part of the protein core, but plays a role in stabilizing the 30-nm chromatin fiber; it associates with the helix and probably with the linker DNA. Note that this level of compaction (the two-start helix of nucleosomes) is the form seen during interphase, and it is dynamic.
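The nucleosome numbers above lend themselves to quick arithmetic. This Python sketch uses the ~147 bp core and the 10-90 bp linker range from the notes, plus the standard ~0.34 nm rise per base pair of B-form DNA; the 1 Mb stretch and 50 bp linker are hypothetical example values.

```python
CORE_DNA_BP = 147        # DNA wrapped around one histone octamer (value from the notes)
RISE_NM_PER_BP = 0.34    # approximate length of B-form DNA per base pair


def repeat_length(linker_bp: int) -> int:
    """Base pairs per nucleosome unit: core DNA plus one linker."""
    return CORE_DNA_BP + linker_bp


def nucleosome_count(dna_bp: int, linker_bp: int) -> int:
    """Whole nucleosome units that fit on a stretch of DNA."""
    return dna_bp // repeat_length(linker_bp)


# Linker DNA ranges from ~10 to ~90 bp, so one nucleosome unit spans:
print(repeat_length(10), repeat_length(90))      # 157 237
# A 1 Mb stretch (hypothetical example) with a 50 bp linker:
print(nucleosome_count(1_000_000, 50))           # 5076
# Length of the DNA wound around one ~10 nm bead, in nm:
print(round(CORE_DNA_BP * RISE_NM_PER_BP, 1))    # 50.0
```

So roughly 50 nm of DNA is wound into each bead about 10 nm across, which is the first step of compaction.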

Genes, in comparison, can be thousands of bp long, much longer than the DNA wrapped around a single nucleosome plus its linker, so many genes need to be unravelled from their nucleosome structure in order to be transcribed. o This is controlled through the N-terminal and C-terminal tails of the histones. Modifications of the histone tails regulate chromatin condensation. o The chief modification is acetylation of lysine residues in the tails. An enzyme, histone acetylase (histone acetyltransferase), adds an acetyl group to the amino group at the end of the lysine side chain. This neutralizes the positive (+) charge of the residue, so it interacts less with the negatively charged (-) DNA. o Other modifications include phosphorylation and ubiquitination (the addition of ubiquitin to the tail), which change how the histone interacts with the DNA. Transcription is correlated with the level of chromatin condensation. o In one experiment, the globin gene in decondensed chromatin was very accessible to DNase, but in condensed chromatin, DNase could not reach the globin gene. Chromatin stripped of its histones still contains non-histone proteins, which provide structural scaffolds for long chromatin loops. o These loops tend to be gene-rich and 1-4 Mb in length. o The points at which the loops attach to the scaffold are called scaffold-associated regions (SARs) or matrix-attachment regions (MARs). o Loop formation also affects transcription. Higher-order compaction: 30-nm chromatin fiber -> 100-130 nm chromonema fiber -> 200-250 nm middle-prophase chromatid -> 500-750 nm metaphase chromatid. o The organization of these levels and the proteins that orchestrate them are not well understood. Switching chromatin between the loose and compact forms is understood to regulate transcription. In interphase, after mitosis, not everything is in the open form waiting to be transcribed; it is dynamic.
o Transcription factors that bind to cis-regulatory modules can also alter chromatin condensation. o Regions called heterochromatin tend NEVER to decondense and always remain condensed (at least in the 30-nm form). They are poor in transcribable genes, rich in repetitive DNA and transposable elements, and found at centromeres and telomeres. o Euchromatin is thought to be in an open, decondensed structure. It is rich in transcriptionally active genes and poor in repetitive DNA and transposable elements.
Lecture 15: Chromatin and Chromosomes

You can always write down any concerns about questions on the back of the front page, indicating WHICH question and which answer you chose.

Structure of chromosomes: THREE functional elements are required for the stable replication and inheritance of chromosomes. Multiple origins of replication: covered in the DNA replication lecture. Centromeres are regions that allow for mitotic segregation. o NOTE that prokaryotes have neither centromeres NOR telomeres, because their DNA is circular; they do have origins of replication. o In yeast, centromeres are extremely simple: a single sequence, present in one copy. There is a central region that is A/T-rich, with nothing else specific about its precise motif. Comparing the bordering regions in all 16 yeast chromosomes, the surrounding regions are HIGHLY conserved: regions I and III are HIGHLY conserved, while region II is the A/T-rich central region. Centromere sequences are bound by proteins that attach to the spindle fibers. Special nucleosomes, containing a variant of histone H3, bind the centromere sequence. o In humans, centromere regions are also very A/T-rich and have HIGHLY CONSERVED surrounding regions. o In most other eukaryotic organisms, centromeres consist of repeat sequences that can be 10s, 100s or 1000s of bp in length. o Telomeres prevent the shortening of chromosomes. o In DNA replication, the leading strand is synthesized all the way to the end of its template without a problem. o However, on the lagging strand, the LAST RNA primer (at the 5' end of the new strand) is removed, and the gap cannot be filled because there is no free 3' end upstream to extend. o Telomeres are very gene-poor and do not code for much. o Because this problem is not resolved by normal replication, the chromosome end gets shorter and shorter with each round. Eventually, the telomere is used up and genes are lost.
o Telomere length is maintained by telomerase, a ribonucleoprotein (RNA + protein, like ribosomes). Its protein component is a telomerase reverse transcriptase (TERT), which is associated with a telomerase RNA whose important "business end" acts as the template for lengthening the telomere. Telomerase engulfs the end of the telomere that needs to be lengthened. The catalytic site is in the reverse transcriptase protein. The RNA template base-pairs with the exposed single-stranded 3' end. The goal is to extend the TEMPLATE for the lagging strand far enough to allow another Okazaki fragment.

Reverse transcriptase then uses the RNA template to add dNTPs to the 3' end, up to position 35 of the RNA, where synthesis stops, possibly because of the secondary and tertiary structure of the RNA template. The base pairs then seem to de-anneal, and the new strand may form a hairpin structure (not always). Telomerase dissociates and re-associates further along the newly extended end. This step occurs OVER and OVER again to EXTEND the lagging-strand template. Once the lagging-strand template is long enough, DNA polymerase alpha-primase can prime the synthesis of a new ~100-nucleotide Okazaki fragment. Telomeric repeats are found throughout eukaryotes and are highly G-rich. Note that in cells that are permanently differentiated and no longer divide, telomere maintenance is not needed; protein complexes associated with the chromosome ends protect them from exonuclease degradation.
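The end-replication problem and its fix can be sketched numerically in Python. The starting length, loss per division and critical length are hypothetical numbers chosen only for illustration; TTAGGG is the human G-rich telomeric repeat. The toy extension step assumes the 3' end already finishes a complete repeat, so each round adds exactly one repeat.

```python
HUMAN_TELOMERE_REPEAT = "TTAGGG"  # G-rich repeat of human telomeres, written 5'->3'


def length_after_divisions(start_bp: int, loss_per_division_bp: int, divisions: int) -> int:
    """Telomere length with no telomerase: each division loses a fixed amount
    (the end of the lagging strand that cannot be filled in)."""
    return max(0, start_bp - loss_per_division_bp * divisions)


def divisions_until_critical(start_bp: int, critical_bp: int, loss_per_division_bp: int) -> int:
    """Divisions until the telomere reaches a critical length (ceiling division)."""
    return max(0, -(-(start_bp - critical_bp) // loss_per_division_bp))


def telomerase_extend(g_strand: str, rounds: int, repeat: str = HUMAN_TELOMERE_REPEAT) -> str:
    """Each round: the RNA template anneals to the 3' end, reverse transcriptase
    copies one repeat, then the enzyme translocates (toy model)."""
    if not g_strand.endswith(repeat):
        raise ValueError("toy model expects the 3' end to finish a complete repeat")
    return g_strand + repeat * rounds


# Hypothetical numbers, for illustration only
print(length_after_divisions(10_000, 100, 30))       # 7000
print(divisions_until_critical(10_000, 4_000, 100))  # 60
print(telomerase_extend("TTAGGGTTAGGG", rounds=2))   # TTAGGGTTAGGGTTAGGGTTAGGG
```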
An experiment that shows the importance of these three functional elements: Take a plasmid that carries ONLY the gene needed for leucine synthesis (LEU). Introduce this plasmid into a mutant yeast strain that CANNOT synthesize leucine (a Leu⁻ strain). Even though the plasmid carries a functional LEU gene, none of the yeast cells gain the ability to make leucine. o An origin of replication is needed; otherwise the plasmid is not replicated and the gene is lost. In yeast, the origin of replication is called ARS (autonomously replicating sequence). Even when a plasmid containing both LEU and ARS is introduced, very FEW of the yeast cells keep the gene. o This is because, without a centromere, the plasmid does not segregate properly during mitosis, so most daughter cells do not inherit it. o Yeast cells are not prokaryotic. When a centromere is added to the plasmid (along with the ARS and LEU gene), the plasmid is carried over to daughter cells. However, eukaryotic chromosomal DNA is linear, not circular. If we cut the plasmid to make it linear, none of the cells keep it, because the linear molecule is degraded from its ends and shortened with each replication cycle. o Degradation eventually removes the LEU gene or the ARS, and the loss of either causes complete loss of function. After yeast telomeres are added to the ends of the linear plasmid, keeping the centromere and the ARS, the plasmid is maintained fine.
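The logic of this experiment can be captured in a small rule-based Python sketch. The `Construct` class and the outcome strings are hypothetical labels, and the rules simply encode the results described in these notes; they are not a quantitative model of yeast transformation.

```python
from dataclasses import dataclass


@dataclass
class Construct:
    """Elements present on a plasmid introduced into Leu- yeast (toy model)."""
    leu: bool = True         # functional LEU gene
    ars: bool = False        # origin of replication
    cen: bool = False        # centromere
    linear: bool = False     # linearized plasmid
    telomeres: bool = False  # telomeres capping the ends


def predicted_outcome(c: Construct) -> str:
    if not c.leu:
        return "no LEU gene"
    if not c.ars:
        return "not replicated"
    if c.linear and not c.telomeres:
        return "degraded from unprotected ends"
    if not c.cen:
        return "missegregated: few Leu+ cells"
    return "stably maintained"


# The successive constructs described in the experiment
steps = [
    Construct(),
    Construct(ars=True),
    Construct(ars=True, cen=True),
    Construct(ars=True, cen=True, linear=True),
    Construct(ars=True, cen=True, linear=True, telomeres=True),
]
for construct in steps:
    print(predicted_outcome(construct))
```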
Lecture 15: Genomics and Bioinformation

Goal: devise methods for deciphering what the genome sequence tells us. Genomics is the analysis of genomes: determining the DNA sequence of the genome, then annotating it, i.e., identifying the locations of genes. In the beginning, we only cared about the coding regions, the genes. The Human Genome Project was completed around a decade ago.
-Omics era: Functional genomics studies the function of EVERY gene in the genome. Proteomics is the study of all the gene products, all the proteins produced, referred to as the proteome. Evolutionary genomics studies genomes to understand their influence on the evolution of organisms. Transcriptomics tries to understand the sequences and nature of all the transcripts of the organism. Phenomics looks at the complete phenotypes of the organism. Spliceomics looks at the splice sites in the genome. All the -omics share a focus on the OVERALL view.
Since the first genome sequence, that of the Epstein-Barr virus in 1984, advances in DNA sequencing technology have allowed thousands of other organisms to be sequenced. In 2005, only 186 microbial genomes had been sequenced; as of 2012 (a couple of days before this lecture), 2472 genomes had already been sequenced. The influenza virus, with its very small genome, was one of the first to be sequenced. Researchers also worked very hard to sequence the genomes of model organisms, such as yeast, C. elegans, Drosophila, mice, zebrafish and plants. There are ~2300 completed or ongoing eukaryotic genome projects. The human genome has been resequenced SEVERAL thousand times, because it can now be done in a week for about $10k. Mosquitoes and rice were also sequenced early on, around 2000: one because it transmits malaria, the other because it is a major calorie provider. Many organisms associated with disease have been sequenced.
Dideoxy sequencing and shotgun sequencing: The read length of dideoxy sequencing is around 400-500 nucleotides, VERY SHORT compared with a genome. That means we have to break a large genome down into SMALL pieces to be sequenced. Mechanical shearing breaks a large genome into small fragments of different lengths. We start with one patch of sequence that we know belongs to the genome, then look for OVERLAPPING fragments, which indicate that a fragment comes next to the previously placed patch. o Tiling paths of overlapping fragments are called contiguous sequences, or contigs.

Assembling contigs into the whole genome leaves GAPS, where no matching fragment can be found to bridge two contigs. Gaps correspond to regions that are rich in repeated sequences, such as transposable elements. o Gaps can vary in size from 10s of bp to Mbp. o This was acceptable in the beginning, because we only wanted to know about coding genes, not transposable elements.
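The overlap-and-tile idea behind contigs can be sketched with a greedy overlap merger in Python: repeatedly join the two fragments with the longest suffix/prefix overlap until no overlaps remain. This is a deliberate simplification (real assemblers are far more elaborate), the reads are made up, and `min_length` is a hypothetical cutoff. Fragments that never overlap stay as separate contigs, which is a gap.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_length)."""
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads that lie entirely inside another read
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


print(greedy_assemble(["GGATTTCA", "ATGGCGTAC", "CGTACGGAT"]))  # ['ATGGCGTACGGATTTCA']
print(greedy_assemble(["AAAACCC", "GGGTTTT"]))                # ['AAAACCC', 'GGGTTTT']
```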
Next-generation sequencing was developed to associate specific genomic changes (mutations) with disease, by sequencing genomes very quickly at very low cost. It refers to Illumina (Solexa) sequencing and Roche 454 pyrosequencing. High throughput is needed to make sequencing very fast. Massively parallel means that a huge number of sequences are deciphered AT THE SAME TIME. Microfluidics are new technologies that allow very, very small amounts of liquid to be moved around. Fixed synthesis refers to fixing the template (or sometimes the DNA polymerase) at ONE coordinate, which is required for the detector to work. Read length is the maximum length of each segment that can be read. In dideoxy sequencing it is 400-500 bp; early next-generation reads are generally shorter. General steps for next-generation sequencing:
1. DNA isolation
2. Fragmentation, e.g., by mechanical shearing
3. Library production: fixing the templates to be sequenced at fixed coordinates
4. Amplification of the template at each coordinate, by a protocol very similar to PCR
5. Sequencing chemistry, which differs between platforms
6. Assembly of the data, which can be up to gigabases in length
Lecture 15: Fringe Genomics

Next-generation sequencing has far higher throughput than dideoxy sequencing. However, dideoxy sequencing is still used for small jobs, such as sequencing a single PCR product. Illumina sequencing and 454 sequencing both have high throughput: they are massively parallel, with millions of sequencing reactions occurring at the same time. Because the mechanics are so small, they incorporate microfluidics, the movement of very small amounts of liquid. Fixed synthesis refers to how the template (and in some systems the DNA polymerase) is fixed in place. Read length refers to the amount of DNA that can be read in one continuous read.
General sequencing pipeline:

1. DNA isolation (high-quality DNA).
2. Fragmentation (mechanical shearing).
3. Library: immobilizing the template on the matrix.
4. Amplification, by a protocol similar to PCR, for a stronger signal during detection.
5. Sequencing, whose chemistry is very different and proprietary for each type of sequencing.
6. Assembly: reconstruction of the final puzzle, which is the bottleneck of the procedure.
Illumina sequencing: Sequencing: o DNA synthesis still occurs as usual: it needs a template and a primer with a free 3' end. o The nucleotides in solution act like ddNTPs but are reversible terminators: the blocking group can be removed so that polymerization can continue. The terminators are labeled with fluorescent dyes. Once a terminator is incorporated and polymerization stops, the remaining unincorporated labeled nucleotides are washed away. The detector senses the dye corresponding to the base at the terminus. After detection, the dye and block are removed and the 3' end is re-exposed. Fresh fluorescent terminators are added and the cycle continues, one base per cycle. Library preparation: after fragmenting, the ends of the DNA need to be repaired, and a short A tail (similar to a poly A tail) is added. An adapter is added to each end of the fragment; adapters are custom-made oligonucleotides. The DNA is immobilized on a matrix through cluster generation. o The various fragments are immobilized on a tray that acts as a matrix. o The tray is the flow cell, whose lanes can be used independently for sequencing. o The flow cell surface is a lawn of oligonucleotides complementary to the adapters. After a denaturation step in which one strand is removed, a process similar to PCR is carried out (bridge amplification). o Because the oligonucleotide lawn is fixed, the adapters at the two ends of a fragment both anneal to the lawn and form a loop (a bridge). o All the loops in a cluster carry the same single-stranded template.
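The reversible-terminator cycle can be sketched for a single cluster in Python. The one-color-per-base dye assignment is hypothetical (real instruments use their own schemes), and the toy ignores sequencing errors and loss of synchrony within a cluster. The template is written 3'->5' starting just after the primer, so each cycle adds the complement of the next template base.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical one-color-per-base dye assignment; real instruments use their own schemes.
DYE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}


def run_cycles(template_3_to_5: str, cycles: int) -> list[str]:
    """Return the dye color imaged in each sequencing cycle."""
    images = []
    for position in range(min(cycles, len(template_3_to_5))):
        # 1. All four labeled reversible terminators are flowed in; the polymerase
        #    can add only one, because the terminator blocks further extension.
        added = COMPLEMENT[template_3_to_5[position]]
        # 2. Unincorporated terminators are washed away and the cluster is imaged.
        images.append(DYE[added])
        # 3. Dye and blocking group are cleaved, re-exposing the 3' end for the next cycle.
    return images


def call_bases(images: list[str]) -> str:
    """Convert the sequence of imaged colors back into bases of the new strand."""
    color_to_base = {color: base for base, color in DYE.items()}
    return "".join(color_to_base[color] for color in images)


images = run_cycles("TACG", cycles=4)
print(images)              # ['red', 'green', 'yellow', 'blue']
print(call_bases(images))  # ATGC
```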
454 sequencing (pyrosequencing): The basic ingredients are template, primer, dNTPs, DNA polymerase, ATP sulfurylase, adenosine 5'-phosphosulfate (APS), luciferase, luciferin and apyrase. DNA polymerase extends the DNA as usual when the correct dNTP is added, releasing pyrophosphate (PPi). ATP sulfurylase then uses PPi + APS to make ATP and sulfate. Luciferase uses luciferin + ATP to produce oxyluciferin and light. Apyrase degrades dNTPs into dNDPs, dNMPs and phosphate, and degrades ATP into ADP.

Apyrase is used to get rid of the excess ATP and the free, unincorporated dNTPs left in the reaction, so that the next nucleotide added can be detected cleanly and replication can continue uninhibited. Because only one type of dNTP is added at a time, a small peak means one residue was incorporated, while if several identical residues are incorporated at once (a run of the same base), the light peak is much bigger!
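The relationship between peak height and the number of bases incorporated can be sketched as a simulated flowgram in Python. The flow order "TACG" and the number of flows are assumptions for illustration, and `synthesized` is the new strand being built (the complement of the template). Each flow extends through every consecutive matching base, so the signal equals the run length, and reading the peaks back recovers the sequence.

```python
def flowgram(synthesized: str, flow_order: str = "TACG", n_flows: int = 8) -> list[tuple[str, int]]:
    """Simulate pyrosequencing light signals, one nucleotide type per flow."""
    position, signals = 0, []
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            run += 1       # one incorporation -> one PPi -> one unit of light
            position += 1
        signals.append((nucleotide, run))  # apyrase then clears leftover dNTP and ATP
    return signals


def call_sequence(signals: list[tuple[str, int]]) -> str:
    """Read the sequence back from the flowgram: peak height = number of bases."""
    return "".join(nucleotide * height for nucleotide, height in signals)


signals = flowgram("TTACGGG", n_flows=4)
print(signals)                 # [('T', 2), ('A', 1), ('C', 1), ('G', 3)]
print(call_sequence(signals))  # TTACGGG
```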
454 fragmentation and library preparation are similar to Illumina's: the DNA is fragmented and adapters are added to both ends. Instead of being immobilized in a flow cell, the fragments are immobilized on beads, and amplification occurs through PCR. Each bead is then immobilized in a well on a plate, so the detector can easily monitor it at a fixed position.
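Pros/Cons comparison:
Dideoxy sequencing: read length 1000 bases; runtime 500 days per gigabase; cost $0.10 per 1000 bases
Pyrosequencing (454): read length 500 bases; runtime 2 days per gigabase; cost $0.02 per 1000 bases
Illumina: read length 75 bases; runtime 0.5 days per gigabase; cost $0.001 per 1000 bases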
Dideoxy sequencing has HUGE read lengths, but a much longer runtime and a higher cost per 1000 bases. Pyrosequencing has a relatively long read length, a runtime of 2 days and a cost of 2 cents per 1000 bases. Illumina has a very short read length (75 bases) and a runtime of 0.5 days, and the cost DROPS SIGNIFICANTLY. o Because the fragments are shorter, the computational assembly that pieces the puzzle back together must be very good. o Longer fragments are easier to reassemble. Typical genome projects use all of these sequencing methods.
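The table's figures can be turned into per-gigabase numbers with a little arithmetic. This Python sketch uses only the lecture's read lengths and costs (2012-era values, not current prices) and `Decimal` to avoid floating-point rounding; the dictionary layout is just an illustrative choice.

```python
from decimal import Decimal

BASES_PER_GIGABASE = 10**9

# Lecture figures from the comparison table (2012-era values, not current prices)
PLATFORMS = {
    "dideoxy": {"read_length": 1000, "usd_per_1000_bases": Decimal("0.10")},
    "pyro": {"read_length": 500, "usd_per_1000_bases": Decimal("0.02")},
    "illumina": {"read_length": 75, "usd_per_1000_bases": Decimal("0.001")},
}


def cost_per_gigabase(usd_per_1000_bases: Decimal) -> Decimal:
    """Scale a per-1000-base cost up to one gigabase (10^6 blocks of 1000 bases)."""
    return usd_per_1000_bases * (BASES_PER_GIGABASE // 1000)


def reads_per_gigabase(read_length: int) -> int:
    """Minimum number of reads needed to cover one gigabase once (ceiling division)."""
    return -(-BASES_PER_GIGABASE // read_length)


for name, spec in PLATFORMS.items():
    print(name, cost_per_gigabase(spec["usd_per_1000_bases"]), reads_per_gigabase(spec["read_length"]))
```

This gives $100,000, $20,000 and $1,000 per gigabase, but 1 million, 2 million and about 13.3 million reads respectively: the cheapest platform also produces the most pieces to reassemble.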
Because these DNA technologies are so popular, even BETTER technologies are being developed, known as 3rd-generation sequencing: Pacific Biosciences (PacBio): o Uses a single DNA molecule in each well (rather than amplification in each well). Therefore, the sensor MUST be very sensitive! o The DNA polymerase is immobilized, rather than the template. o Read lengths of about 10,000 bases. Life Technologies' Ion Torrent (Ion): o Sequences on a semiconductor chip.

o It detects the pH changes that occur as dNTPs are incorporated into the DNA, rather than any dyes, which is much closer to the native environment of DNA synthesis. o Since it is microchip-based sequencing, Moore's Law should apply: integrated chip capacity DOUBLES roughly every 2 years. o The aim is to take advantage of chip technology advancements and use this approach to resequence the human genome for <$100.
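Moore's Law as stated here is a simple doubling rule, capacity(t) = capacity(0) x 2^(t/2) with t in years. A one-function Python sketch, assuming the 2-year doubling time from the notes and a hypothetical starting capacity of 1 unit:

```python
def projected_capacity(current: float, years: float, doubling_time_years: float = 2.0) -> float:
    """Moore's Law as stated in the notes: capacity doubles every ~2 years."""
    return current * 2 ** (years / doubling_time_years)


# Relative chip capacity after 10 years, starting from 1 unit today (hypothetical)
print(projected_capacity(1.0, 10))  # 32.0
```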
Bioinformatics: Assembly and quality assessment o Quality assessment is an important part of the assembly procedure, so that no incorrect input is used. Sequence analysis o Finding genes. Genome mining, and database organization and management. o How to manage all the DNA data, which can reach 10s of thousands of terabytes. Phylogenetic inference o Reconstruction of a family tree. Pattern recognition and image analysis. o Using mutants to figure out, globally, what happens when specific genes are disrupted, by looking at the global phenotype of the organism. o Also used for facial analysis. Gene and regulatory networks: o How do enzymes interact with each other in each reaction pathway? o How do gene products interact with each other? o How do the products of one gene act on other genes? Modeling of complex biological and molecular processes.
Annotation was originally why bioinformatics got started. It asks where each type of feature is located:
o Exons, or OPEN READING FRAMES.
o Polyadenylation sites.
o Where the transposable elements are located.
o What we know about non-coding DNA.
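Finding OPEN READING FRAMES is one of the simplest annotation tasks. This is a minimal sketch, assuming the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA) and scanning only the three forward reading frames; real annotation also scans the reverse complement and uses much more evidence (splice sites, homology). The minimum length `min_codons` is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon is found.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:  # codons before the stop
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```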
Programs for Sequence Analysis:
BLAST (Basic Local Alignment Search Tool) comes in two main flavors:
o BLASTN looks at nucleotide sequence similarities.
o BLASTP looks at protein sequence similarities.
We start off with a query sequence, which the program uses to probe a database:
o BLASTN first looks for a perfect match with a certain-sized region (a "word") of your sequence.
o It then looks at the locally similar regions around that match to extend it.
o Finally, it extends the similarity further by allowing gaps in the alignment.
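The seed-and-extend idea behind BLASTN can be sketched as follows. This is a simplified illustration, not the real BLAST algorithm or its scoring: the word size of 4, the +1/-2 match/mismatch scores, and the X-drop cutoff of 3 are hypothetical values, and only the first two steps (exact word seeds, then ungapped extension) are implemented; the final gapped extension is left out.

```python
def find_seeds(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Step 1: positions (i, j) where query[i:i+w] exactly matches subject[j:j+w]."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    return [(i, j)
            for i in range(len(query) - word_size + 1)
            for j in index.get(query[i:i + word_size], [])]


def extend_ungapped(query, subject, qi, sj, word_size, match=1, mismatch=-2, x_drop=3):
    """Step 2: extend a seed right, then left, without gaps.
    Each direction stops once the score falls more than x_drop below the best seen.
    Returns (score, query_start, query_end, subject_start)."""
    best = word_size * match
    # Extend to the right of the seed.
    score, right, k = best, 0, 0
    while qi + word_size + k < len(query) and sj + word_size + k < len(subject):
        score += match if query[qi + word_size + k] == subject[sj + word_size + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score > x_drop:
            break
    # Extend to the left, starting from the best right-hand score.
    score, left, k = best, 0, 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        score += match if query[qi - k - 1] == subject[sj - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score > x_drop:
            break
    return best, qi - left, qi + word_size + right, sj - left


def best_hit(query: str, subject: str, word_size: int = 4):
    """Highest-scoring ungapped hit over all seeds, or None if no word matches."""
    hits = {extend_ungapped(query, subject, qi, sj, word_size)
            for qi, sj in find_seeds(query, subject, word_size)}
    return max(hits, default=None)


print(best_hit("GGATCCTTAAGC", "TTTGGATCCTTAAGCAAA"))  # (12, 0, 12, 3)
```

Requiring an exact word match first is what makes this fast: only the few places that share a word are extended, instead of aligning the query against every position of every database sequence.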

Using BLAST is very important for understanding the function of genes or of gene products. The NF1 gene is responsible for neurofibromatosis type 1 (the "elephant-man" syndrome). HOWEVER, a BLAST search got a hit with the yeast gene Ira, a GTPase-activating protein that regulates Ras (which controls cell replication and differentiation). This allowed us to speculate that NF1 works in a similar way to Ira, which turned out to be absolutely true.
This approach must look at the RIGHT sequence:
o For example, consider the tubulin genes, which are essential for the production of microtubules.
o A gene duplication event created multiple copies of a single gene, which adapted into alpha-tubulin and beta-tubulin. Two identical copies can exist transiently, but to persist they must evolve distinct functions.
o We must be able to link α-tubulin 1 to α-tubulin 2, not to β-tubulin 2, which would lead us astray.
o One can create a gene tree showing how genes differ across speciation events.
o Orthologues: human alpha-tubulin genes are orthologues of the alpha-tubulin gene in flies. Orthologues are related through a speciation event.
o Paralogues: alpha-tubulin and beta-tubulin are paralogues, related through a gene duplication event.
o Orthologues tell us more about close ancestry than paralogues do.
We can group gene products by their specific biological functions.
o There is a HUGE fraction of gene content whose function we still don't know.
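One common way to link a gene to its orthologue rather than to a paralogue is the reciprocal best hit rule: gene a in species 1 and gene b in species 2 are called putative orthologues if b is a's best hit and a is b's best hit. This heuristic is not from the lecture, and the gene names and similarity scores below are made-up toy values; in practice the scores would come from BLAST searches run in both directions.

```python
# Toy, hypothetical similarity scores: (query gene, hit gene) -> score.
HUMAN_VS_FLY = {
    ("human_alpha_tub", "fly_alpha_tub"): 950, ("human_alpha_tub", "fly_beta_tub"): 400,
    ("human_beta_tub", "fly_beta_tub"): 900, ("human_beta_tub", "fly_alpha_tub"): 420,
}
FLY_VS_HUMAN = {
    ("fly_alpha_tub", "human_alpha_tub"): 940, ("fly_alpha_tub", "human_beta_tub"): 410,
    ("fly_beta_tub", "human_beta_tub"): 905, ("fly_beta_tub", "human_alpha_tub"): 395,
}


def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query gene, the hit gene with the highest score."""
    best: dict[str, str] = {}
    for (query, hit), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = hit
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit: putative orthologues."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


print(reciprocal_best_hits(HUMAN_VS_FLY, FLY_VS_HUMAN))
# [('human_alpha_tub', 'fly_alpha_tub'), ('human_beta_tub', 'fly_beta_tub')]
```

When a species carries several recent paralogues (for example multiple α-tubulin copies), only one of them can be the best hit, so this rule can miss true orthologues; building a gene tree is the more careful approach.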
Fringe Genomics: How can genomics correct a wrong, or bring back something that is extinct? Consider the story of the Tasmanian tiger, or thylacine, which is a marsupial. Settlers in Tasmania put a bounty on the Tasmanian tiger, which quickly drove the animal to extinction; the last known thylacine died in 1936. They tried a Tasmanian tiger project, in which they would take preserved DNA of a Tasmanian tiger and inject it into an embryo of a numbat, the closest relative of the Tasmanian tiger.
