Adrien Treuille
Carnegie Mellon University
Protein Folding
http://martin-protean.com/protein-structure.html
Protein Folding
MSFQGHGIY YIYTRLALS AYVANTRL
Amino Acid Sequence Protein Shape
Protein Folding
RNA Nanoengineering
GCUAGGCUA AUACGAUAC CAACATGA
Nucleotide Sequence Target RNA Shape
RNA Nanoengineering
Game Interface
Voting
Results
Synthesis
RNA Nanoengineering
Crowdsourcing Science
Launched 2008
Launched 2011
Protein Folding
57,000 Players
RNA Nanoengineering
25,000 Players
Computational Chemistry
Experimental Chemistry
Crowdsourcing Science
Scientists
Problem
Game
Players
Crowdsourcing Science
Scientists
Problem
Players
Game
Foldit
EteRNA
BioClipse
http://chem-bla-ics.blogspot.com/2006/04/protein-support-in-bioclipse-using.html
Pull/Bands
Lock
Wiggle
Shake
Rebuild
Tweak
Repulsive
Attractive
Solvation
Hydrogen Bonds
Issue Analysis
Foldit
EteRNA
Interactive Biology
PhD
Target Shape
Computer Designs
Computer Designs
#4 Human Learning
[Figure: Target Shape compared with Computer Designs and Player Designs. Computer Designs: ViennaRNA Design 03 (EteRNA Score: 76%), ViennaRNA Design 05 (EteRNA Score: 75%), ViennaRNA Design 02 (EteRNA Score: 73%), all by ViennaRNA Bot. Player Designs include Starry's Bulged Star III by starryjess and Mat - Bulged star v1.1 by mat747, with EteRNA Scores of 94% to 96%.]
#4 Human Learning
#4 Human Learning
#4 Human Learning
Let us not waste even one of our precious few design slots.
The optimal secondary structure in dot-bracket notation with a minimum free energy of -39.80 kcal/mol is given below.
This line should match up with the energy in EteRNA; otherwise you have chosen the wrong energy options! From here on out, I will say MFE instead of minimum free energy.
The ensemble diversity is 0.44. This is the average distance, in number of base pairs, between structures in the ensemble, so lower is better. Here we see that the remaining 20% differ by less than one base pair, on average, from the MFE. That's good!
Note there are TWO structures displayed below, the MFE and the centroid. The centroid is exactly what it sounds like: it's the middle-of-the-pack structure in the ensemble (again, distance is measured in base pairs). Since the MFE is 80% of the ensemble, the centroid is identical to the MFE, but if that percentage were lower it might not be!
The structures are colored by default by base pair probability, which is the probability that the base is in the structure that you see. They should all be close to 1 for a good structure. But it's not the end of the world if one or two base pairs don't form correctly; that's still a win if it doesn't happen very often. If it's highly likely that a few base pairs will be off, but it only happens in a few ways that preserve the rest of the MFE structure, it could still win. If it's highly likely we have wrong base pairs forming and there are many ways this can happen without preserving the MFE structure, then we are toast!
How do we measure the number of ways the fold is expected to go wrong, weighted by how likely each is? ENTROPY, which, in the words of my physics professor, is just a fancy word for the logarithm of the number of ways. You can also think of it as disorder, but how do you count an amount of disorder? Click the box that says positional entropy to see this map:
The Vienna RNA servers are here: http://rna.tbi.univie.ac.at/
The Vienna source code is here: http://www.tbi.univie.ac.at/~ivo/RNA/
THIS is a link to a discussion on how to download the sequence files for submitted designs for lab 103 one bulge cross.
I will be referring to output from the web server version for this tutorial, but if you want to do your own analysis of more than 1 sequence at a time, you're probably best off compiling and running on your own machine. It's not as hard as it sounds. You do not need to know how to program, but you do need a Unix/Linux environment to compile in. If you are running Windows, I can highly recommend Ubuntu running on VirtualBox (http://www.virtualbox.org/). Both are free software and very user-friendly to set up and use, which beats the pants off of Cygwin.
Putting it all together: So now we know the computer expects 80% of the test tube to fold perfectly, and 20% will have a defect, most likely at the green spots on the picture above. BUT we also know the average difference between structures is less than 1 bp from the target, so not all of the green spots will be wrong at the same time; the defects probably occur one at a time in individual molecules. So the MFE structure will be preserved. This is a win!
Here is a link to the results for the round 4 winning design by dimension 9, input into the RNAfold webserver. Note I have no idea how long that link will work, so I'll cut and paste the relevant bits here if you want to try and reproduce it. Use default settings except where mentioned below; you have to expand "show advanced options" to see them.
sequence: GGAAGGUUCUCUGGCGUUCGUGAAAACAUGAAUGGGAGGCAUCAAGAGAUGGCUCCGCUUGUUCAAGAGAAUAGGCCCAGAGAGCAAA
advanced settings:
unpaired bases can participate in at most one dangling end (MFE folding only)
(yes, for the super observant, this is the rule that lets only one side of a loop get a bonus from adding a red G)
So now you get a pretty output page with lots of details. What should you pay attention to?
You can see below this structure is not predicted to maintain the central hub, and the bottom arm probably doesn't form correctly. Most of the ensemble is NOT represented by the MFE, and members of the ensemble differ by 3-4 bp from each other. Note the axis on the entropy map goes to 0.8 this time. 3-4 bp is quite a lot if they are right next to each other, because that means an entire arm will form wrong. Another way to tell this is to look at the mountain plot below, where sloped lines are base-paired positions and flat lines are unpaired positions. The fact that the green (the average of the ensemble) and blue (the centroid) DON'T overlap indicates we have a problem. And since the cool colors are all clustered together in groups of 3-4 bp, we could reasonably expect misfolds to lose an entire arm or worse!
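To make the mountain plot idea concrete, here is a minimal Python sketch that turns a single dot-bracket string into mountain heights: the height rises at each "(" and falls at each ")", so paired positions are slopes and unpaired positions are flat. The function names and the exact height convention are my own assumptions, not taken from the Vienna package, and the green line on the webserver is an ensemble average rather than a single structure, so this only illustrates the shape of the plot.

```python
def mountain(structure):
    """Return one height per position of a dot-bracket string.

    Assumed convention: the height is recorded after each character is read,
    so '(' steps up, ')' steps down, and '.' stays flat.
    """
    heights = []
    height = 0
    for position, char in enumerate(structure):
        if char == "(":
            height += 1
        elif char == ")":
            height -= 1
            if height < 0:
                raise ValueError("unbalanced ')' at position %d" % position)
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
        heights.append(height)
    if height != 0:
        raise ValueError("unbalanced '(' in structure")
    return heights


def mountain_gap(structure_a, structure_b):
    """Return the positions where two mountain profiles do not overlap."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    heights_a = mountain(structure_a)
    heights_b = mountain(structure_b)
    return [i for i, (a, b) in enumerate(zip(heights_a, heights_b)) if a != b]


if __name__ == "__main__":
    print(mountain("((..))"))
    print(mountain_gap("(..)..", "..(..)"))
```

A long run of positions returned by `mountain_gap` is the kind of clustered disagreement that suggests a whole arm folding differently.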
Notice the cool colors represent values at the weakest spots of this structure, where entropy > 0. Can you see the corresponding peaks on the entropy vs. position plot below? These are the most likely positions for deviations from the MFE structure. Note the scale of the graph: 0 entropy means NO deviations, and > 0 means some deviations.
How is entropy calculated? This is a Shannon entropy from information theory, which is calculated as S = -Σ p*log(p), summed over all possible outcomes, where p is the probability of a particular outcome and log is the natural log (base e). Note that all of the probabilities added together have to sum to exactly 1.
So if there is only 1 possibility with probability 1: -1*log(1) = 0.
Say there are 2 possibilities, one with 0.99 probability and 0.01 for the other: that's -1*(0.99*log(0.99) + 0.01*log(0.01)) = 0.056, pretty darn close to 0.
Say there are 100 equally likely outcomes: -1*(0.01*log(0.01))*100 = 4.6. That's very big compared to 0 or 0.056.
So, many equally likely outcomes means entropies much greater than 0, and in the limit that there is only a single possible way for the base to be positioned, the entropy goes to 0.
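If you want to check the three entropy examples above yourself, here is a small Python sketch. It assumes the natural log as stated in the text, and the function name `shannon_entropy` is just an illustrative choice, not something from the Vienna package.

```python
import math


def shannon_entropy(probabilities, tolerance=1e-9):
    """Return -sum(p * ln(p)) over all outcomes.

    Outcomes with probability 0 contribute nothing, and the probabilities
    must sum to 1 (within a small tolerance for rounding).
    """
    if abs(sum(probabilities) - 1.0) > tolerance:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # One certain outcome, two lopsided outcomes, 100 equally likely outcomes.
    print(round(shannon_entropy([1.0]), 3))
    print(round(shannon_entropy([0.99, 0.01]), 3))
    print(round(shannon_entropy([0.01] * 100), 3))
```

The three printed values correspond to the 0, 0.056, and 4.6 worked out by hand above.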
That's all for today. In segment II, I will explain why Christmas trees are bad using the barriers and subopt RNAfold kinetics simulation program.
So how good is the prediction compared to the lab result? Here's a snapshot in target mode of the synthesis results. You can see it's not an exact prediction, but it gives a lot of the right trends. Useful!
GGAAAGUAGGAGAUGUUAGUUUGAAAGGAUUGGCCGGUGGUUUGAAAGGGCGAUUGUCUUUAGU
The optimal secondary structure in dot-bracket notation with a minimum free energy of -19.20 kcal/mol is given below. The frequency of the MFE structure in the ensemble is 27.45%. The ensemble diversity is 3.47.
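The ensemble diversity quoted here is an average base-pair distance between structures. As a rough sketch of what that means, the Python below counts the base pairs that differ between two dot-bracket strings and then takes a probability-weighted average over a tiny hand-made ensemble. The function names and the toy ensemble are assumptions for illustration only; RNAfold does not enumerate structures one by one like this.

```python
def base_pairs(structure):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    open_positions = []
    pairs = set()
    for position, char in enumerate(structure):
        if char == "(":
            open_positions.append(position)
        elif char == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % position)
            pairs.add((open_positions.pop(), position))
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if open_positions:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def bp_distance(structure_a, structure_b):
    """Count the base pairs present in one structure but not the other."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


def ensemble_diversity(structures, probabilities):
    """Probability-weighted mean base-pair distance over all structure pairs."""
    weighted = list(zip(structures, probabilities))
    return sum(
        prob_a * prob_b * bp_distance(a, b)
        for a, prob_a in weighted
        for b, prob_b in weighted
    )


if __name__ == "__main__":
    # Hypothetical two-structure ensemble: 80% MFE, 20% one pair short.
    toy_structures = ["((..))", "(....)"]
    toy_probabilities = [0.8, 0.2]
    print(bp_distance(*toy_structures))
    print(ensemble_diversity(toy_structures, toy_probabilities))
```

A value well below 1, as in the toy case, means the alternatives sit very close to the dominant structure, while a value like 3.47 means typical members disagree on several pairs.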
Create Algorithms
Inverse Crowdsourcing
(but thats not all)
Backyard Biosynth
by Joshua Weizmann
man/ 26 years
(in 6 months)
#9 #10
Crowdsourcing Science
Scientists
Problem
Game
Players
Crowdsourcing Science
Scientists
Problem
Players
Game
Crowdsourcing Science
Is backyard biosynth possible?
How can we trust it?
Who owns these designs?
Can we get the players to write a paper without us?
Major progress in the next 5 years....
Maybe we can save the world.
Crowdsourcing Science
Seth Cooper
Zoran Popović
Jee Lee
David Baker