University of Santo Tomas
College of Science
Department of Biological Sciences

 
EXPERIMENT 9

BIOINFORMATICS TOOLS FOR CELL AND MOLECULAR BIOLOGY

INTRODUCTION
Bioinformatics comprises the mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related information (Attwood and
Parry-Smith, 1999). It may also be described as the application of computer technology to the
management and analysis of biological data: computers are used to gather, store, analyse and
merge biological data. This makes the emerging field an interdisciplinary research area at the
interface between the biological and computational sciences.
Traditionally, molecular biology research was carried out entirely at the experimental laboratory
bench, but the huge increase in the scale of data being produced in this genomic era has created
a need to incorporate computers into the research process.
The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in
the mass of data and to obtain a clearer insight into the fundamental biology of organisms. This
new knowledge could have profound impacts on fields as varied as human health, agriculture, the
environment, energy and biotechnology.
In this experiment, the students will make use of MEGA (Molecular Evolutionary Genetics
Analysis), software that has been widely used since its creation in 1993; the version used here is
MEGA6. It accepts DNA sequence, protein sequence, evolutionary distance or phylogenetic tree data.
The authors’ goal was to take advantage of advances in computer power and graphical user
interfaces to make available a ‘flexible and easy-to-use genetic data analysis workbench’ (de
Vicente et al., 2004). However, instead of loading the sequences of the target gene directly into
MEGA6 during class, the students will be asked to download the sequences before their laboratory
class so that the experiment can be finished within the prescribed period (please see the
alternative procedure).

OBJECTIVES:
At the end of this exercise, you should be able to:
1. Download nucleotide/amino acid sequences from the NCBI website.
2. Align sequences using the MEGA6 software.
3. Construct a phylogenetic tree of the species included in the experiment.
4. Compute the divergence time between species.

HYPOTHESIS:
_________________________________________________________________________ 
_________________________________________________________________________ 
_________________________________________________________________________ 
_________________________________________________________________________ 

MATERIALS:
MEGA6.0 (http://megasoftware.net)
Laptop/desktop computer with internet access
PROCEDURE: 
A. ALIGNING SEQUENCES
Obtaining Sequence Data from the Internet (GenBank)
(NOTE: You are going to perform the alternative procedure. The standard procedure is given here.)
Using MEGA’s integrated browser you can fetch GenBank sequence data from the NCBI
website if you have an active internet connection.
1. From the main MEGA window, select Align | Edit/Build Alignment from the main menu.

2. When prompted, select Create New Alignment and click OK. Select Protein.

3. In M6: Alignment Explorer, activate MEGA’s integrated browser by selecting Web |
Query Genbank from the main menu.

4. When the NCBI: Protein site is loaded, enter rbcL (rubisco large subunit) followed by the
scientific name of the plant (e.g., rbcL Allium cepa) as a search term in the search box
at the top of the screen. Press the Search button.

5. When the search results are displayed, check the box next to any item(s) you wish to
import into MEGA.

6. Click on FASTA. The page will reload with the amino acid sequence in a FASTA format.
Press the Add to Alignment button (with the red + sign) located above the web address
bar. This will import the sequences into the Alignment Explorer.

7. Repeat steps 4-6 for the other plant samples.


8. With the data now displayed in the Alignment Explorer, you can close the Web
Browser window.
9. Align the new data using the steps detailed below under the heading “Aligning
Sequences by ClustalW” starting with step 12.
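For reference, a FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The following is a minimal Python sketch of how such text could be read outside MEGA. The function name parse_fasta and the two short example records are illustrative assumptions: the residues are made up and are not real rbcL data.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; remember its header.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piece by piece.
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical example text: headers and residues are made up for illustration.
example = """>plant_A example protein
MSPQTETKAS
VGFKAGVKDY
>plant_B example protein
MSPQTETKAGVGFKAGVKDY
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

In the experiment itself MEGA reads the FASTA page for you when you press Add to Alignment; the sketch only shows what the format contains.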
Alternative procedure
A. Download the rbcL amino acid sequences for the plant samples from
http://www.ncbi.nlm.nih.gov/protein. Please see “Worksheet” for the names of the plant samples.
1. In the “Search window”, type “rbcL (scientific name of plant)” (please see arrow).

2. You will be given a list of sequences. Choose the complete protein.

3. Click on “GenPept” below your protein of choice.

4. Scroll down and copy amino acid sequence.

5. Paste in an MS-Word® document. Repeat procedure for other plant species.


6. Save MS-Word® document with appropriate headings.
7. You may just copy the amino acid sequences directly into MEGA6 instead of adding from the
NCBI website.
Aligning Sequences by ClustalW
You can create a multiple sequence alignment in MEGA using either the ClustalW or Muscle
algorithms. Here we align a set of sequences using the ClustalW option. If you used the
alternative procedure, start with step 1.
1. Open “MEGA6.0.”
2. Click Align | Edit/Build Alignment. When prompted, select Create New Alignment and
click OK. Select Protein.
3. An “M6: Alignment Explorer” will open.
4. Click on Data/Create a new alignment. Click on protein.
5. Click Edit/Insert blank sequence. The area for the new sequence will be marked as
“sequence 1”. Please see screen capture below.
6. Mouse over “sequence 1” then right-click. Click on “Edit Sequence Name.”
7. Type the name of your plant then press Tab. This will move the cursor to a small box on
the right side of the name of the plant.

8. Go to your MS-Word® document containing the amino acid sequence for the plants in the
Worksheet. Copy sequence. (NOTE: The sequence has a space between rows. Make sure
you delete all spaces between the rows.)
9. Go back to M6 Alignment Explorer.
10. Click “Edit/Paste” (Ctrl+V). The rbcL amino acid sequence will appear on the right side of the
name of the plant.

11. Do the same for the rest of the plants.


12. When done, click “Edit/Select All”.
13. Select Alignment | Align by ClustalW from the main menu to align the selected
sequence data using the ClustalW algorithm. Click the “OK” button to accept the default
settings for ClustalW.

14. Once the alignment is complete, save the current alignment session by selecting Data |
Export Data from the main menu. Give the file an appropriate name, such as
"vegetable.meg". This will allow the current alignment session to be restored for future
editing.
15. Exit the Alignment Explorer by selecting Data | Exit Aln Explorer from the main
menu.
Note: We have aligned some sequences and they are now ready to be analyzed. Whenever you
need to edit/change your sequence data, you will need to open it in the Alignment Editor and edit
or align it there. Then export it to the MEGA format and open the resulting file.
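Step 8 above asks you to delete the spaces between rows before pasting a sequence into MEGA. If you have many sequences, the cleanup could also be done with a few lines of Python, as in this minimal sketch. The function name clean_sequence is hypothetical, and the pasted text (position numbers, blocks of ten residues, made-up residues) is only an assumption about how the copied text may look.

```python
def clean_sequence(pasted):
    """Keep only the letters of the pasted text and return them in upper case.

    Position numbers, spaces and line breaks are all discarded.
    """
    return "".join(ch for ch in pasted if ch.isalpha()).upper()


# Hypothetical pasted text: numbers and spacing mimic a copied sequence listing.
pasted = """
        1 mspqtetkas vgfkagvkdy
       21 kltyytpeye
"""

cleaned = clean_sequence(pasted)
print(cleaned)       # MSPQTETKASVGFKAGVKDYKLTYYTPEYE
print(len(cleaned))  # 30
```

The cleaned string can then be pasted next to the plant name in the Alignment Explorer as described in step 10.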

Estimating Evolutionary Distances


In this tutorial, we will estimate evolutionary distances between the sequences from the 10
vegetable species using the Compute Pairwise Distance option.
Estimating Evolutionary Distances Using Pairwise Distance
In MEGA, you can estimate evolutionary distances between sequences by computing the
proportion of sites that differ between each pair of sequences (the p-distance). Because the data
in this experiment are protein sequences, the differences counted are amino acid differences.
1. Open the "vegetable.meg" data file. If needed, refer to the “MEGA Basics” tutorial.
2. From the main MEGA launch bar, select Distance | Compute Pairwise Distance.
3. In the Analysis Preferences window, click the Substitutions Type pull-down and then
select the Amino acid option.
4. Click the pull-down for Model/Method and select the p-distance model. For this
example we will be using the defaults for the remaining options. Click Compute to
begin the computation.
5. A progress indicator will appear briefly and then the distance computation results will
be displayed in grid form in a new window. Leave this window open so we can
compare the results from the next steps.
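The p-distance selected in step 4 is simply the number of aligned sites at which two sequences differ, divided by the number of sites compared. The sketch below illustrates the calculation in Python. The function names and the three short aligned sequences are made up for illustration, and skipping sites where either sequence has a gap is an assumption of this sketch that may differ from the gap treatment selected in MEGA.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap are skipped (assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned):
    """Return the names and the full pairwise p-distance matrix as a grid."""
    names = list(aligned)
    grid = [[p_distance(aligned[x], aligned[y]) for y in names] for x in names]
    return names, grid


# Hypothetical aligned sequences (made-up residues, not real rbcL data).
aligned = {
    "plant_A": "MSPQTETKAS",
    "plant_B": "MSPQTETKAG",
    "plant_C": "MSPKTE-KAG",
}

names, matrix = distance_matrix(aligned)
for name, row in zip(names, matrix):
    print(name, ["%.3f" % value for value in row])
```

Here plant_A and plant_B differ at 1 of 10 sites (p = 0.100), while plant_A and plant_C differ at 2 of the 9 sites that have no gap (p = 0.222).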

Compute the Proportion of Amino Acid Differences


You can also calculate evolutionary distances based on the proportion of amino acid
differences.
Note: If the data file contains protein-coding nucleotide sequences, MEGA will automatically
translate them into amino acid sequences using the selected genetic code table. The genetic code
table can be changed with Data | Select Genetic Code Table from the main MEGA launch bar.
1. From the main MEGA window, select Distance | Compute Pairwise Distances from the
main menu. This will display the Analysis Preferences window.
2. Click the Substitutions Type pull-down, select Amino Acid and then select p-
distance under Model/Method.
3. Click the Compute button to accept the default values for the rest of the options and begin
the computation. A progress dialog box will appear briefly. As with the previous
estimation, a results viewer window will be displayed, showing the distances in a grid
format.
4. After you have inspected the results, use the File | Quit Viewer command to close the
results viewer.
5. Close the data by selecting the Close Data button on the main MEGA task bar.

Building Trees from Sequence Data


In this procedure, we will illustrate how to build trees and edit in-memory sequence data using
the commands available in the Data and Phylogeny menus. We will be using the "vegetable.meg"
file, which contains amino acid sequences of the rubisco large subunit (rbcL) from different
vegetable species.

Building a Neighbor-Joining (NJ) Tree


In this example, we will illustrate the basics of phylogenetic tree reconstruction using MEGA
and become familiar with the Tree Explorer window.
1. Activate the "vegetable.meg" data file.
2. From the main MEGA launch bar, select Phylogeny | Construct/Test Neighbor-
Joining Tree menu option.
3. In the Analysis Preferences window select the p-distance option from the Model/Method
drop-down.
4. Click Compute to accept the defaults for the rest of the options and begin the
computation. A progress indicator will appear briefly before the tree displays in the Tree
Explorer window.
5. To select a branch, click on it with the left mouse button. If you click on a branch with the
right mouse button, you will get a small options menu that will let you flip the branch and
perform various other operations on it.
6. Select a branch and then press the Up, Down, Left, and Right arrow keys to see how
the cursor moves through the tree.
7. Change the branch style by selecting the View | Tree/Branch Style command from
the Tree Explorer main menu.
8. Select the View | Topology Only command from the Tree Explorer main menu to
display the branching pattern on the screen.
9. You can display the numerical branch lengths in the Topology Only option by
selecting View | Options and clicking on the Branch tab. Check the box labeled Display
Branch Length and click Ok.
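To give an idea of what happens when you click Compute, the following is a minimal Python sketch of the neighbor-joining procedure applied to a matrix of pairwise distances. It is an illustration under stated assumptions, not MEGA's own implementation: the function name, the internal node labels (node0, node1, ...) and the five-taxon distance matrix are all made up, and ties between equally good pairs are broken simply by order of appearance.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns a list of (node, neighbor, branch_length) edges.
    """
    # Copy the matrix into a dict of dicts so the caller's data is untouched.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair with the smallest Q value.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Join the pair at a new internal node and compute both branch lengths.
        u = "node%d" % next_id
        next_id += 1
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges.append((a, u, length_a))
        edges.append((b, u, length_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes with a single branch.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical distances between five taxa (made-up values for illustration).
taxa = ["A", "B", "C", "D", "E"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

edges = neighbor_joining(taxa, distances)
for node, neighbor, length in edges:
    print(node, "-", neighbor, ":", length)
total_length = sum(length for _, _, length in edges)
```

Each printed line is one branch of the tree with its length, which corresponds to the branch lengths MEGA can display in the Tree Explorer.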

EXPERIMENT 9

BIOINFORMATICS TOOLS FOR CELL AND MOLECULAR BIOLOGY

Year & Section: ____________ Group No. : _____________


Name: _________________________ ___________________________
_________________________ ___________________________
_________________________ ___________________________

************************************************************************************************************
HYPOTHESIS:
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

RESULTS & INTERPRETATIONS:


(please see Worksheet)

CONCLUSIONS:
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

ANSWERS TO GUIDE QUESTIONS:


1) What is the relevance of evolutionary distances between species?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2) If you used DNA sequence instead of amino acid sequence, would you expect a different
phylogenetic tree? If so, to what would you attribute the difference?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3) Which do you think is a more reliable technique in systematics: morphological characters or
DNA/protein sequence? Explain.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________


EXPERIMENT 9 - BIOINFORMATICS TOOLS FOR MOLECULAR BIOLOGY

WORKSHEET FOR SEQUENCE ALIGNMENT

Group No. _________


Course/Year/Section: ___________________________ Lab Instructor/s: ____________________ _____________
Names: ____________________________________ ____________________ _____________
____________________________________ (Signature)

____________________________________ Date Performed: ____________________


____________________________________
____________________________________

No.  Order     Family    Genus             Species                   Common name

1    ________  ________  Daucus sp.        D. carota                 carrots
2    ________  ________  Cucurbita sp.     C. maxima                 squash
3    ________  ________  Solanum sp.       S. melongena              eggplant
4    ________  ________  Lycopersicon sp.  L. esculentum             tomato
5    ________  ________  Solanum sp.       S. tuberosum              potato
6    ________  ________  Cucurbita sp.     C. pepo                   cucumber
7    ________  ________  Zingiber sp.      Z. officinale             cauliflower
8    ________  ________  Brassica sp.      B. oleracea var capitata  cabbage
9    ________  ________  Brassica sp.      B. rapa var chinensis     pechay
10   ________  ________  Lactuca sp.       L. sativa                 lettuce

Draw a molecular phylogenetic tree Common name Scientific name

Calculate divergence time using the molecular clocks.

1 The average time required for one amino acid substitution in rbcL is calculated to be approximately 8 million years.
2 Divide the number of amino acid differences between the two species by 2, then multiply the result by 8 million years to obtain the divergence time.

* The number of amino acid substitutions between _______________ and ____________ is (_____) amino acids.
* The divergence time between these 2 species is ___________________ years.
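The worksheet calculation can be written as: divergence time = (number of amino acid differences / 2) x 8 million years. A minimal Python sketch of this arithmetic follows. The function name is hypothetical and the example count of 6 differences is made up; use the number you read from your own alignment.

```python
# From the worksheet: about 8 million years per rbcL amino acid substitution.
YEARS_PER_SUBSTITUTION = 8_000_000


def divergence_time_years(amino_acid_differences,
                          years_per_substitution=YEARS_PER_SUBSTITUTION):
    """Estimate divergence time following the two worksheet steps."""
    if amino_acid_differences < 0:
        raise ValueError("the number of differences cannot be negative")
    # Step 2 of the worksheet: divide the observed differences by 2.
    halved = amino_acid_differences / 2
    # Step 1 of the worksheet: each substitution takes about 8 million years.
    return halved * years_per_substitution


example_differences = 6  # hypothetical count, for illustration only
print(divergence_time_years(example_differences))  # 24000000.0
```

With 6 differences the estimate is 3 x 8 million = 24 million years.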
 
