
http://www.ebi.ac.uk/2can/bioinformatics/bioinf_what_1.html (Bioinformatics introduction)
What is Bioinformatics?
The genomic era
The genomic era has seen a massive explosion in the amount of biological information available, due to huge advances in the fields of molecular biology and genomics.
Bioinformatics is the application of computer technology to the management and analysis of biological data. As a result, computers are being used to gather, store, analyse and merge biological data.
Bioinformatics is an interdisciplinary research area at the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, the environment, energy and biotechnology.
Why is bioinformatics important?
The greatest challenge facing the molecular biology community today is to make sense of the wealth of data that has been produced by the genome sequencing projects.
Traditionally, molecular biology research was carried out entirely at the experimental laboratory bench, but the huge increase in the scale of data being produced in the genomic era has made it necessary to incorporate computers into the research process. Sequence generation and its subsequent storage, interpretation and analysis are entirely computer-dependent tasks. However, the molecular biology of an organism is very complex, and research is carried out at different levels, including the genome, proteome, transcriptome and metabolome. Following the explosion in the volume of genomic data, similar increases in data volume have been observed in the fields of proteomics, transcriptomics and metabolomics.
The first challenge facing the bioinformatics community today is the intelligent and efficient storage of this mass of data. The community must then provide easy and reliable access to these data. The data themselves are meaningless before analysis, and the sheer volume makes it impossible for even a trained biologist to begin to interpret them manually. Therefore, incisive computer tools must be developed to allow the extraction of meaningful biological information.
There are three central biological processes around which bioinformatics tools must be developed:
DNA sequence determines protein sequence
Protein sequence determines protein structure
Protein structure determines protein function
The integration of information learned about these key biological processes should allow us to achieve the long-term goal of completely understanding the biology of organisms.
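The first of these relationships can be illustrated computationally. The minimal Python sketch below translates a DNA (or RNA) coding sequence into a protein sequence using the standard genetic code. It assumes the sequence is already in the correct reading frame, starting at its first base. The names `CODON_TABLE` and `translate` are illustrative only and are not part of any EBI tool.

```python
from itertools import product

# Illustrative sketch (hypothetical names): the standard genetic code, with codons
# enumerated in T, C, A, G order at each position (TTT, TTC, TTA, TTG, TCT, ...).
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (reading frame 1) into one-letter amino acid codes."""
    dna = seq.upper().replace("U", "T")  # treat RNA uracil as thymine
    protein = []
    for i in range(0, len(dna) - 2, 3):  # a trailing partial codon is ignored
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with unknown bases
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))                 # MA
print(translate("AUGGCCUAA", to_stop=False))  # MA*
```

Translation stops at the first stop codon by default. Setting `to_stop=False` keeps the stop codons as `*` in the output.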
Biological databases
Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas. Primary or archived databases contain information and annotation of DNA and protein sequences, DNA and protein structures, and DNA and protein expression profiles.
Secondary or derived databases are so called because they contain the results of analysis on the primary resources, including information on sequence patterns or motifs, variants and mutations, and evolutionary relationships. Information from the literature is contained in bibliographic databases, such as Medline.
It is essential that these databases are easily accessible and that an intuitive query system is provided, so that researchers can obtain very specific information on a particular biological subject. The data should be provided in a clear, consistent manner, with visualisation tools to aid biological interpretation.
Specialist databases have been set up for particular subjects, for example the EMBL database for nucleotide sequence data, the UniProtKB/Swiss-Prot protein database and PDBe, a 3D protein structure database.
Scientists also need to be able to integrate the information obtained from the underlying heterogeneous databases in a sensible manner in order to get a clear overview of their biological subject. SRS (Sequence Retrieval System) is a powerful querying tool provided by the EBI that links information from more than 150 heterogeneous resources.
Biological applications
Once all of the biological data are stored consistently and are easily available to the scientific community, the next requirement is to provide methods for extracting the meaningful information from the mass of data. Bioinformatic tools are software programs that are designed to carry out this analysis step.
Factors that must be taken into consideration when designing these tools are:
The end user (the biologist) may not be a frequent user of computer technology
These software tools must be made available over the internet, given the global distribution of the scientific research community
The EBI provides a wide range of biological data analysis tools that fall into the following four major categories:
Similarity Searching Tools
Protein Function Analysis
Structural Analysis
Sequence Analysis
Similarity Searching Tools
Homologous sequences are sequences that are related by divergence from a common ancestor. The degree of similarity between two sequences can be measured, but homology is either true or false: two sequences either share a common ancestor or they do not. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
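Because similarity can be measured, it can be expressed as a number. One simple measure is percent identity over an existing alignment, which counts identical aligned positions. The sketch below assumes the two sequences have already been aligned. Tools such as BLAST and FASTA compute the alignment themselves and also estimate its statistical significance, which this sketch does not. The function name is hypothetical. A high identity is evidence for homology, not proof of it.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over the columns of an existing alignment.

    Illustrative sketch (hypothetical name). "-" marks a gap. Columns where both
    sequences have a gap are ignored. A column with a gap in only one sequence
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```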
Protein Function Analysis
This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these different pattern databases allow you to approximate the biochemical function of your query protein.
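Pattern (motif) databases describe signatures in a compact notation. As an illustration, the sketch below converts a simple PROSITE-style pattern into a Python regular expression and scans a protein sequence for it. This is a teaching sketch with hypothetical function names, not PPSearch or InterProScan. It handles only `x`, `[...]`, `{...}`, repeat counts and the `<`/`>` terminal anchors. It does not handle rarer forms such as `>` inside brackets. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Illustrative sketch (hypothetical name); it supports only a subset of the syntax.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):    # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):      # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        if element.endswith(")"):      # x(3) or x(2,4): repeat counts
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":             # any residue
            core = "."
        elif element.startswith("{"):  # {P}: any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                          # a single residue such as N, or a class such as [ST]
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(pattern: str, protein: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")  # lookahead allows overlaps
    return [m.start() for m in regex.finditer(protein.upper())]


print(prosite_to_regex("N-{P}-[ST]-{P}."))       # N[^P][ST][^P]
print(find_motif("N-{P}-[ST]-{P}", "NGSANVTK"))  # [0, 4]
```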
Structural Analysis
This set of tools allows you to compare structures with databases of known structures. The function of a protein is more directly a consequence of its structure than of its sequence, and structural homologs tend to share functions. The determination of a protein's 2D/3D structure is therefore crucial in the study of its function.
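Comparing two structures usually comes down to a distance measure. One common choice is the root-mean-square deviation (RMSD) between corresponding atoms. The sketch below assumes that both structures have already been superimposed and that atom i in one list corresponds to atom i in the other. Real structure-comparison tools find the correspondence and the optimal superposition themselves, for example with the Kabsch algorithm. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally long lists of 3D points.

    Illustrative sketch (hypothetical name). It assumes the structures are ALREADY
    superimposed and that point i in one list matches point i in the other
    (e.g. corresponding C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```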
Sequence Analysis
This set of tools allows you to carry out further, more detailed analysis of your query sequence, including evolutionary analysis and the identification of mutations, hydropathy regions, CpG islands and compositional biases. These and other biological properties are all clues that help to elucidate the specific function of your sequence.
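Compositional properties such as CpG islands can be screened for with simple sliding-window statistics. The sketch below computes GC content and the observed/expected CpG ratio, then reports the windows that pass both thresholds. The default values (200 bp windows, GC content above 50% and an observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria, used here as assumptions. Published tools differ in their exact rules. The function names are hypothetical.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (number of CG dinucleotides * length) / (C count * G count)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so str.count is exact
    return cpg * len(seq) / (c_count * g_count)


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_ratio: float = 0.6) -> List[int]:
    """Start positions of windows exceeding both thresholds (assumed default criteria)."""
    starts = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        if gc_content(chunk) > min_gc and cpg_obs_exp(chunk) > min_ratio:
            starts.append(start)
    return starts


print(gc_content("CGCGCGCG"), cpg_obs_exp("CGCGCGCG"))  # 1.0 2.0
```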
Real world applications of bioinformatics
The science of bioinformatics has many beneficial uses in the modern world. These include the following:
1. Molecular medicine
   1.1 More drug targets
   1.2 Personalised medicine
   1.3 Preventative medicine
   1.4 Gene therapy
2. Microbial genome applications
   2.1 Waste cleanup
   2.2 Climate change
   2.3 Alternative energy sources
   2.4 Biotechnology
   2.5 Antibiotic resistance
   2.6 Forensic analysis of microbes
   2.7 The reality of bioweapon creation
   2.8 Evolutionary studies
3. Agriculture
   3.1 Crops
   3.2 Insect resistance
   3.3 Improve nutritional quality
   3.4 Crops that grow in poorer soils and are drought resistant
4. Animals
5. Comparative studies
1. Molecular medicine
The human genome will have profound effects on the fields of biomedical research and clinical medicine. Every disease has a genetic component. This may be inherited (as is the case with an estimated 3000-4000 hereditary diseases, including cystic fibrosis and Huntington's disease) or may result from the body's response to an environmental stress that causes alterations in the genome (e.g. cancers, heart disease, diabetes).
The completion of the human genome sequence means that we can search for the genes directly associated with different diseases and begin to understand the molecular basis of these diseases more clearly. This new knowledge of the molecular mechanisms of disease will enable better treatments, cures and even preventative tests to be developed.
1.1 More drug targets
At present, all drugs on the market target only about 500 proteins. With an improved understanding of disease mechanisms, and by using computational tools to identify and validate new drug targets, it will be possible to develop more specific medicines that act on the cause of the disease rather than merely its symptoms. These highly specific drugs promise to have fewer side effects than many of today's medicines.
1.2 Personalised medicine
Clinical medicine will become more personalised with the development of the field of pharmacogenomics. This is the study of how an individual's genetic inheritance affects the body's response to drugs. At present, some drugs fail to make it to the market because a small percentage of the clinical patient population shows adverse effects to a drug due to sequence variants in their DNA.
As a result, potentially life-saving drugs never make it to the marketplace. Today, doctors have to use trial and error to find the best drug to treat a particular patient, because patients with the same clinical symptoms can show a wide range of responses to the same treatment. In the future, doctors will be able to analyse a patient's genetic profile and prescribe the best available drug therapy and dosage from the beginning.
1.3 Preventative medicine
As the specific details of the genetic mechanisms of diseases are unravelled, diagnostic tests that measure a person's susceptibility to different diseases may become a distinct reality. Preventative actions, such as a change of lifestyle or treatment at the earliest possible stages, when it is more likely to be successful, could result in huge advances in our struggle to conquer disease.
1.4 Gene therapy
In the not too distant future, using genes themselves to treat disease may become a reality. Gene therapy is the approach used to treat, cure or even prevent disease by changing the expression of a person's genes. This field is currently in its infancy, with clinical trials ongoing for many different types of cancer and other diseases.
2. Microbial genome applications
Microorganisms are ubiquitous; that is, they are found everywhere. They have been found surviving and thriving in extremes of heat, cold, radiation, salt, acidity and pressure. They are present in the environment, our bodies, the air, food and water.
Traditionally, a variety of microbial properties have been exploited in the baking, brewing and food industries. The arrival of complete genome sequences, and their potential to provide greater insight into the microbial world and its capacities, could have broad and far-reaching implications for environmental, health, energy and industrial applications. For these reasons, in 1994 the US Department of Energy (DOE) initiated the MGP (Microbial Genome Program) to sequence the genomes of bacteria useful in energy production, environmental cleanup, industrial processing and toxic waste reduction.
By studying the genetic material of these organisms, scientists can begin to understand these microbes at a very fundamental level and isolate the genes that give them their unique abilities to survive under extreme conditions.
2.1 Waste cleanup
Deinococcus radiodurans is known as the world's toughest bacterium, and it is the most radiation-resistant organism known. Scientists are interested in this organism because of its potential usefulness in cleaning up waste sites that contain radioactive material and toxic chemicals.
Microbial Genome Program (MGP) scientists are determining the DNA sequence of the genome of C. crescentus, one of the organisms responsible for sewage treatment.
2.2 Climate change
Increasing levels of carbon dioxide emission, mainly through the expanding use of fossil fuels for energy, are thought to contribute to global climate change. Recently, the DOE (Department of Energy, USA) launched a program to decrease atmospheric carbon dioxide levels. One method of doing so is to study the genomes of microbes that use carbon dioxide as their sole carbon source.
2.3 Alternative energy sources
Scientists are studying the genome of the microbe Chlorobium tepidum, which has an unusual capacity for generating energy from light.
2.4 Biotechnology
The archaeon Archaeoglobus fulgidus and the bacterium Thermotoga maritima have potential for practical applications in industry and government-funded environmental remediation. These microorganisms thrive in water at temperatures near the boiling point, so they may provide the DOE, the Department of Defence and private companies with heat-stable enzymes suitable for use in industrial processes.
Other industrially useful microbes include Corynebacterium glutamicum, which is of high industrial interest as a research object because the chemical industry uses it for the biotechnological production of the amino acid lysine. Lysine is one of the essential amino acids in animal nutrition. Biotechnologically produced lysine is added to feed concentrates as a source of protein and is an alternative to soybeans or meat and bone meal.
Xanthomonas campestris pv. is grown commercially to produce the exopolysaccharide xanthan gum, which is used as a viscosifying and stabilising agent in many industries.
Lactococcus lactis is one of the most important microorganisms involved in the dairy industry. It is a non-pathogenic, spherical (coccoid) bacterium that is critical for manufacturing dairy products such as buttermilk, yogurt and cheese. This bacterium, Lactococcus lactis ssp., is also used to prepare pickled vegetables, beer, wine, some breads, sausages and other fermented foods. Researchers anticipate that understanding the physiology and genetic make-up of this bacterium will prove invaluable for food manufacturers as well as for the pharmaceutical industry, which is exploring the capacity of L. lactis to serve as a vehicle for delivering drugs.
2.5 Antibiotic resistance
Scientists have been examining the genome of Enterococcus faecalis, a leading cause of bacterial infection among hospital patients. They have discovered a virulence region made up of a number of antibiotic-resistance genes that may contribute to the bacterium's transformation from a harmless gut bacterium into a menacing invader. The discovery of this region, known as a pathogenicity island, could provide useful markers for detecting pathogenic strains and help to establish controls that prevent the spread of infection in hospital wards.
2.6 Forensic analysis of microbes
Scientists used their genomic tools to help distinguish the strain of Bacillus anthracis that was used in the 2001 terrorist attack in Florida from closely related anthrax strains.
2.7 The reality of bioweapon creation
Scientists have recently built poliovirus, the virus that causes poliomyelitis, using entirely artificial means. They did this using genomic data available on the Internet and materials from a mail-order chemical supplier. The research was financed by the US Department of Defence as part of a biowarfare response program to prove to the world the reality of bioweapons. The researchers also hope their work will discourage officials from ever relaxing immunisation programs. This project has been met with very mixed feelings.
2.8 Evolutionary studies
The sequencing of genomes from all three domains of life (eukaryotes, bacteria and archaea) means that evolutionary studies can be performed in a quest to determine the tree of life and the last universal common ancestor.
For more interesting stories, check the archive at the Genome News Network (GNN). For information on structural, functional and comparative analysis of genomes and genes from a wide variety of organisms, see The Institute for Genomic Research (TIGR).
3. Agriculture
The sequencing of the genomes of plants and animals should have enormous benefits for the agricultural community. Bioinformatic tools can be used to search for the genes within these genomes and to elucidate their functions. This specific genetic knowledge could then be used to produce stronger crops that are more resistant to drought, disease and insects, and to improve the quality of livestock, making them healthier, more disease resistant and more productive.
3.1 Crops
Comparative genetics of plant genomes has shown that the organisation of their genes has remained more conserved over evolutionary time than was previously believed. These findings suggest that information obtained from model crop systems can be used to guide improvements to other food crops. Arabidopsis thaliana (thale cress) and Oryza sativa (rice) are examples of available complete plant genomes.
3.2 Insect resistance
Genes from Bacillus thuringiensis that can control a number of serious pests have been successfully transferred to cotton, maize and potatoes. This new ability of the plants to resist insect attack means that the amount of insecticides being used can be reduced and hence the nutritional quality of the crops is increased.
3.3 Improve nutritional quality
Scientists have recently succeeded in transferring genes into rice to increase levels of vitamin A, iron and other micronutrients. This work could have a profound impact in reducing occurrences of blindness and anaemia caused by deficiencies in vitamin A and iron, respectively.
Scientists have inserted a gene from yeast into the tomato, and the result is a plant whose fruit stays longer on the vine and has an extended shelf life.
3.4 Grow in poorer soils and drought resistant
Progress has been made in developing cereal varieties that have a greater tolerance for soil alkalinity, free aluminium and iron toxicities. These varieties will allow agriculture to succeed in poorer soil areas, thus adding more land to the global production base. Research is also in progress to produce crop varieties capable of tolerating reduced water conditions.
4. Animals
Sequencing projects for many farm animals, including cows, pigs and sheep, are now well under way in the hope that a better understanding of the biology of these organisms will have huge impacts on improving the production and health of livestock and will ultimately benefit human nutrition.
5. Comparative studies
Analysing and comparing the genetic material of different species is an important method for studying the functions of genes, the mechanisms of inherited diseases and species evolution. Bioinformatics tools can be used to compare the numbers, locations and biochemical functions of genes in different organisms.
Organisms that are suitable for use in experimental research are termed model organisms. They have a number of properties that make them ideal for research purposes, including short life spans, rapid reproduction, ease of handling, low cost and amenability to manipulation at the genetic level.
An example of a model organism for human biology is the mouse. Mouse and human are very closely related (98%), and for the most part there is a one-to-one correspondence between genes in the two species. Manipulation of the mouse at the molecular level and genome comparisons between the two species are revealing detailed information on the functions of human genes, the evolutionary relationship between the two species and the molecular mechanisms of many human diseases.
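The text does not name a specific method for finding such one-to-one gene correspondences. One simple and widely used heuristic is the reciprocal best hit: gene a in species A and gene b in species B are paired if each is the other's highest-scoring match. The sketch below assumes the similarity scores (for example, BLAST bit scores) have already been computed. The gene names and scores are made up, and ties are resolved in favour of whichever pair is seen first.

```python
from typing import Dict, Set, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) in which each gene is the other's highest-scoring hit.

    Illustrative sketch (hypothetical name). `scores` maps
    (gene_in_species_A, gene_in_species_B) to a similarity score, where higher is better.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        # Keep the best hit seen so far for each gene; ties keep the earlier pair.
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    return {(a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}


# Hypothetical mouse (m*) and human (h*) genes with made-up scores.
example_scores = {("m1", "h1"): 90.0, ("m1", "h2"): 40.0,
                  ("m2", "h2"): 70.0, ("m2", "h1"): 85.0}
print(reciprocal_best_hits(example_scores))  # {('m1', 'h1')}
```

In the example, m2's best hit is h1. However, h1's best hit is m1, so m2 is left unpaired.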

http://www.ebi.ac.uk/2can/databases/index.html (Access to NCBI, etc.)


Explosive growth in biological data
Recent years have seen an explosive growth in biological data. These data are usually no longer published in the conventional sense. Instead, they are deposited in a database and assigned a unique identifying number that can be quoted in publications.
Sequence data from mega-sequencing projects may not even be linked to a conventional publication. This trend, together with the need for computational analyses of the data, has made databases essential tools for biological research.
The goal of this material is to describe the different molecular biology databases available to researchers. There are so many specialised databases that it is not reasonable to list the URLs of all of them, especially since this category of databases is quite changeable and any list provided here would soon be outdated. Some resources also do not fit neatly into a single category. For example, 'Karyn's Genomes', an information resource about all organisms whose genomes have been completely sequenced, could be placed in either the 'Genomic Databases' section or the 'Bibliographic Databases' section.
However, the Expasy bioinformatics team in Switzerland does maintain a list of information sources for molecular biologists, which is kept constantly up to date.
The databases are grouped into the following topics:
Bibliographic Databases
Taxonomic Databases
Nucleotide Databases
Genomic Databases
Protein Databases
Microarray Databases
http://www.ebi.ac.uk/2can/databases/dna.html (Bioinformatics addresses: NCBI, EMBL, etc.)
The International Nucleotide Sequence Database Collaboration
This collaboration is a joint operation by EMBL-Bank at the European Bioinformatics Institute (EBI), the DNA Data Bank of Japan (DDBJ) at the Center for Information Biology (CIB) and GenBank at the National Center for Biotechnology Information (NCBI).

In Europe, the vast majority of the nucleotide sequence data produced is collected, organised and distributed by the EMBL Nucleotide Sequence Database. This database is located at the EBI in Cambridge, UK, an outstation of the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany.
The nucleotide sequence databases are data repositories that accept nucleic acid sequence data from the scientific community and make them freely available. The databases strive for completeness, with the aim of recording every publicly known nucleic acid sequence. These data are heterogeneous. They vary with respect to the source of the material (e.g. genomic versus cDNA), the intended quality (e.g. finished versus single-pass sequences), the extent of sequence annotation and the intended completeness of the sequence relative to its biological target (e.g. complete versus partial coverage of a gene or a genome). The nucleotide databases are distributed free of charge over the internet.
DDBJ, GenBank and EMBL-Bank exchange new and updated data on a daily basis to achieve optimal synchronisation. As a result, they contain exactly the same information, except for sequences that have been added in the last 24 hours.
Explanation of an EMBL entry
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) is divided into sections that reflect major taxonomic divisions.
These taxonomic divisions include:
Invertebrates
Other Mammals
Mus musculus
Organelles
Bacteriophage
Plants
Prokaryotes
Rodents
Unclassified Viruses
Other Vertebrates
Non-taxonomic sequence groups that are also part of EMBL-Bank are:
patents
htg (high-throughput genome sequences)
htc (high-throughput cDNA sequences)
gss (genome survey sequences)
wgs (whole genome shotgun sequences)
est (expressed sequence tags; some are held within species-specific files)
Each entry in a database must have a unique identifier: a string of letters and/or numbers that only that record has. This unique identifier, known as the accession number, can be quoted in the scientific literature because it will never change.
Because the accession number must always remain the same, a separate version number is used to distinguish the versions that result from sequence corrections. You should therefore always take care to quote both the unique identifier and the version number when referring to records in a nucleotide sequence database.
The nucleotide sequence identifier takes the form 'Accession.Version' (e.g. AJ000012.1). The first part is the never-changing accession number, followed by a period and a version number. The accession number part is stable, but the version part is incremented whenever the sequence changes.
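The 'Accession.Version' convention lends itself to a small parser. The sketch below splits an identifier into its stable accession and its version number, and checks whether one identifier is a later version of another. The regular expression is a loose assumption (letters, digits or underscores, then a period, then digits). It does not implement the databases' official accession-format rules, and all names are hypothetical.

```python
import re
from typing import NamedTuple


class SequenceId(NamedTuple):
    accession: str  # stable part, never changes
    version: int    # incremented when the sequence changes


# Loose, ASSUMED pattern: accession characters, a period, then a version number.
_ID_PATTERN = re.compile(r"([A-Za-z0-9_]+)\.(\d+)")


def parse_sequence_id(identifier: str) -> SequenceId:
    """Split an 'Accession.Version' identifier (illustrative sketch, hypothetical name)."""
    match = _ID_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not an Accession.Version identifier: {identifier!r}")
    return SequenceId(match.group(1), int(match.group(2)))


def is_newer_version(candidate: str, reference: str) -> bool:
    """True if `candidate` is a later version of the same accession as `reference`."""
    a, b = parse_sequence_id(candidate), parse_sequence_id(reference)
    return a.accession == b.accession and a.version > b.version


print(parse_sequence_id("AJ000012.1"))  # SequenceId(accession='AJ000012', version=1)
```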
Although the nucleotide sequence data are checked for integrity and obvious errors by the data library staff, the quality of the data is the responsibility of the submitter. As a consequence, there are many errors in the database: many sequence entries are mislabelled, contaminated, incompletely or erroneously annotated, or contain sequencing errors. In addition, the database is highly redundant, in the sense that the same sequence from the same organism may be included many times, simply reflecting the redundancy of the original scientific reports.
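Exact redundancy of this kind can be detected mechanically. The minimal sketch below hashes each sequence and groups entries that share the same organism and an identical sequence, ignoring letter case. The record identifiers are made up for illustration and the function name is hypothetical. Near-identical sequences, for example ones differing by a single sequencing error, are not caught. Finding those would need an alignment or clustering tool.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_redundant_entries(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group identifiers of entries with the same organism and identical sequence.

    Illustrative sketch (hypothetical name). Each record is (identifier, organism,
    sequence). Only groups containing more than one identifier are returned.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for identifier, organism, sequence in records:
        # Hash the case-normalised sequence so that dictionary keys stay short.
        digest = hashlib.sha256(sequence.upper().encode("utf-8")).hexdigest()
        groups[(organism, digest)].append(identifier)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up identifiers: the first two entries are redundant.
entries = [("demo_1", "Homo sapiens", "ACGTACGT"),
           ("demo_2", "Homo sapiens", "acgtacgt"),
           ("demo_3", "Mus musculus", "ACGTACGT")]
print(list(find_redundant_entries(entries).values()))  # [['demo_1', 'demo_2']]
```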
http://www.ebi.ac.uk/2can/tutorials/index.html (Bioinformatics tutorials)
Bioinformatics tools
Learn how to use the tools at the EBI to find out more about your nucleotide or protein sequences.
You will be guided through a series of exercises using sample fragments of sequence. To gain more information about these sequences, you will use a variety of tools to compare the sequences to databases and analyse them.
The following tutorials are available:
Nucleotide Analysis
Topics include: checking for vector contamination, BLAST similarity search, FASTA similarity search, whole genome search, pairwise alignment, ClustalW multiple sequence alignment and translation of DNA/RNA into protein.
Protein and Proteomic Analysis
Topics include: BLAST similarity search, FASTA similarity search, whole genome search, pairwise alignment, ClustalW multiple sequence alignment and a proteomics example.
Protein Function
Topics include: motif searching using InterProScan, FingerPRINTScan and PPSearch.
Protein Structure
Topics include: introduction to nucleotide structures, introduction to protein structures and bioinformatics for protein structure prediction.
Genome Browsing
Topics include: information on the two main genome browsers available, Ensembl in the UK and UCSC Genome Bioinformatics in the USA.
Database Browsing
Topics include: a summary of the interfaces available for searching multiple databases and a detailed explanation of the Sequence Retrieval System (SRS) provided by the EBI.
EBI Web Services
Topics include: what SOAP Web Services are, where to get them, how to use them and how to develop applications with them.
http://www.ebi.ac.uk/2can/tutorials/aa.html (Protein codes)
Protein Sequences
The one-letter and three-letter amino acid codes used, for example, in UniProtKB/Swiss-Prot are those adopted by the IUPAC-IUB Commission on Biochemical Nomenclature. They are as follows:
One-letter code   Three-letter code   Amino acid name
A                 Ala                 Alanine
R                 Arg                 Arginine
N                 Asn                 Asparagine
D                 Asp                 Aspartic acid
C                 Cys                 Cysteine
Q                 Gln                 Glutamine
E                 Glu                 Glutamic acid
G                 Gly                 Glycine
H                 His                 Histidine
J                 Xle                 Leucine or Isoleucine
L                 Leu                 Leucine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
M                 Met                 Methionine
F                 Phe                 Phenylalanine
P                 Pro                 Proline
O                 Pyl                 Pyrrolysine
U                 Sec                 Selenocysteine
S                 Ser                 Serine
T                 Thr                 Threonine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine
V                 Val                 Valine
B                 Asx                 Aspartic acid or Asparagine
Z                 Glx                 Glutamic acid or Glutamine
X                 Xaa                 Any amino acid
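The table translates directly into a lookup structure. The sketch below stores the table's one-letter to three-letter mapping and converts sequences in both directions. The function names and the hyphen separator are illustrative choices, not a standard API.

```python
from typing import Dict

# One-letter -> three-letter amino acid codes, as listed in the table.
ONE_TO_THREE: Dict[str, str] = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "J": "Xle",
    "L": "Leu", "I": "Ile", "K": "Lys", "M": "Met", "F": "Phe",
    "P": "Pro", "O": "Pyl", "U": "Sec", "S": "Ser", "T": "Thr",
    "W": "Trp", "Y": "Tyr", "V": "Val", "B": "Asx", "Z": "Glx",
    "X": "Xaa",
}
THREE_TO_ONE: Dict[str, str] = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(protein: str, sep: str = "-") -> str:
    """Convert a one-letter sequence such as 'MAG' to 'Met-Ala-Gly' (hypothetical helper)."""
    codes = []
    for residue in protein.upper():
        if residue not in ONE_TO_THREE:
            raise ValueError(f"unknown one-letter code: {residue!r}")
        codes.append(ONE_TO_THREE[residue])
    return sep.join(codes)


def to_one_letter(codes: str, sep: str = "-") -> str:
    """Convert 'Met-Ala-Gly' (any letter case) back to 'MAG' (hypothetical helper)."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes.split(sep))


print(to_three_letter("MAG"))  # Met-Ala-Gly
```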
http://www.ebi.ac.uk/2can/tutorials/nucleotide/index.html (Nucleotide tutorial)
Introduction
Suppose you are a molecular biologist who has obtained an unknown fragment of DNA from a gel and had it sequenced. You will want to find out as much as you can about it.
For example: Is it contaminated with vector sequences? Is it an already known gene? Is it related to any other genes, either by sharing a common evolutionary ancestor or by having a similar function through convergent evolution? What protein sequence could this nucleotide fragment encode if it is translated, and what might that protein be like?
Before looking at a sequence to find all this information, we will check an unknown sequence for vector contamination.
Checking for vector contamination - Introduction
DNA/RNA from a biological source is usually inserted into a cloning vector (e.g. a plasmid or phage) so that it can be cloned. Sequencing such constructs frequently produces raw sequences that include segments derived from vectors. In addition, oligonucleotides can be attached to the DNA/RNA under investigation as part of the cloning or amplification process. The sequences of these oligonucleotides are often present in raw sequences and will contaminate the finished sequence unless they are identified and removed. Transposable elements from the cloning host (generally bacteria or yeast) can also insert themselves into the cloned DNA/RNA while the clone is being propagated and will then be sequenced, as can other DNA/RNA contaminants.
More than 200 reports can be found in the public literature on the subject of contamination of the major sequence databases by vector sequences. All of these reports try to alert the scientific community to the associated pitfalls. Vector sequences can be found in the sequence databanks for various reasons. The most common cause is that submitters forget to remove them. Another is that they are deliberately submitted to the databanks because they are, after all, vectors that others might find useful.
The publication of a newly discovered gene or gene fragment requires that the sequence be submitted to the public databanks in order to obtain an accession number. This number is unique for each submission and helps scientists identify their sequence amongst the myriad of sequences being produced each day by the many worldwide sequencing efforts. Furthermore, without an accession number, there can be no publication.
Scientists are often in a hurry to submit their sequences and innocently forget to remove part of the cloning vector they used to obtain them. This is frequently the polylinker and inserts related to it. Other parts of cloning vectors, which have accidentally made their way into an otherwise genuine gene, are also found in eukaryotic sequence submissions. In general, this implies that some sort of rearrangement has taken place during the cloning experiments. Accidents do happen, and the more we sequence, the more submissions with vector contamination will occur.
To assist submitters, the EBI now provides a vector screening service called BLAST2 EVEC. It is based on NCBI BLAST2 and uses the latest implementation of the BLAST algorithm, together with a special sequence databank known as EMVEC, to check your sequences for vector contamination.
EMVEC is an extraction of sequences from the synthetic division of EMBL and contains over 2000 sequences commonly used in cloning and sequencing experiments. EMVEC is by no means a complete vector databank, but the EBI believes it is representative of the kind of material used in modern sequencing and should be useful to submitters.
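BLAST2 EVEC finds vector segments by aligning the query against EMVEC. The sketch below is a much cruder illustration of the same idea. It is not BLAST and not how BLAST2 EVEC works: it simply flags the positions in a query that share an exact k-mer with a known vector sequence on either strand. The default k = 20 is an arbitrary assumption and the function names are hypothetical. Exact matching misses vector segments that contain sequencing errors, which is one reason alignment-based tools are used in practice.

```python
from typing import List

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence; non-ACGT symbols become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(dna.upper()))


def vector_kmer_hits(query: str, vector: str, k: int = 20) -> List[int]:
    """Start positions in `query` whose k-mer also occurs in `vector` on either strand.

    Illustrative exact-match screen (hypothetical name); k = 20 is an assumed default.
    """
    query, vector = query.upper(), vector.upper()
    vector_kmers = set()
    for strand in (vector, reverse_complement(vector)):
        vector_kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return [i for i in range(len(query) - k + 1) if query[i:i + k] in vector_kmers]


# Toy sequences (made up); position 0 matches the vector's reverse strand.
print(vector_kmer_hits("GGGGAAAACCGT", "AAAACCCC", k=4))  # [0, 4, 5, 6]
```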
http://www.ebi.ac.uk/2can/tutorials/nucleotide/vector%.html (worked example)
Running a Job with BLAST2 EVEC
Using BLAST2 EVEC, you can now check two unknown sequences for contamination:
Sequence 1
Sequence 2
The form for the BLAST2 EVEC tool looks as follows:
http://www.ebi.ac.uk/Tools/sss/ncbiblast/vectors.html (how to run BLAST)
The sequence is entered into the text box in FASTA format. This consists of a one-line header starting with a ">" symbol, followed by the sequence name. The sequence itself is then entered on the following line(s). You can find out more about sequence formats here.
"Interactive" is chosen so that the results are delivered to the browser as soon as they are available. Alternatively, you can choose "email", enter your email address, and have the results delivered by email.
The title of the search is left as "Sequence", although you can give your search any title you wish to help you identify the results.
The BLASTN program is used. It is designed to search a nucleotide query sequence against a DNA databank, in this case the emvec database.
The number of scores (hits to the database) and the number of alignments of these hits against the query sequence are each set to 5 to limit the size of the output.
Other options have been left on "default".
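The FASTA layout described above is simple to read programmatically: a ">" header line followed by one or more sequence lines. The minimal sketch below splits FASTA text into (header, sequence) pairs. The function name is an assumption. Established libraries, such as Biopython's SeqIO, handle many more edge cases.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    Each record starts with a line beginning with '>'; the rest of that line
    is the header. The sequence may span several following lines, which are
    joined without whitespace. Blank lines are ignored.
    """
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:                  # close the final record
        records.append((header, "".join(chunks)))
    return records


example = """>Sequence
ACGTACGT
TTGA
>second
GG
"""
print(parse_fasta(example))  # [('Sequence', 'ACGTACGTTTGA'), ('second', 'GG')]
```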
You can now either go to the BLAST2 EVEC page and run the searches yourself, or view the sample results for sequence 1 and sequence 2.
Which sequence is contaminated?
See an explanation of the results of BLAST2 EVEC.
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html (Bioinformatics)
National Center for Biotechnology Information (NCBI)
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
BIOINFORMATICS

Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.


What Is a Biological Database?
A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record in a nucleotide sequence database typically contains information such as a contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and, often, literature citations associated with the sequence.
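The record just described can be modelled as a small data structure. The sketch below is purely illustrative. The class, its field names and the sample entries are hypothetical and do not reproduce any real databank's schema. The "database" is simply a list of records, and a query is a filter over them.

```python
from dataclasses import dataclass, field


@dataclass
class NucleotideRecord:
    """One entry in a toy sequence database (field names are illustrative)."""
    contact_name: str
    sequence: str
    molecule_type: str              # e.g. "genomic DNA" or "mRNA"
    organism: str                   # scientific name of the source organism
    citations: list[str] = field(default_factory=list)


def records_for_organism(records: list[NucleotideRecord], organism: str) -> list[NucleotideRecord]:
    """A minimal query over the single-file database: filter by source organism."""
    return [r for r in records if r.organism == organism]


database = [
    NucleotideRecord("A. Submitter", "ATGGCC", "mRNA", "Homo sapiens"),
    NucleotideRecord("B. Submitter", "ATGAAA", "genomic DNA", "Mus musculus",
                     ["Hypothetical citation"]),
]
print([r.contact_name for r in records_for_organism(database, "Mus musculus")])  # ['B. Submitter']
```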
The completion of a "working draft" of the human genome, an important milestone in the Human Genome Project, was announced in June 2000 at a press conference at the White House and was published in the February 15, 2001 issue of the journal Nature.
For researchers to benefit from the data stored in a database, two additional requirements must be met:
easy access to the information
a method for extracting only the information needed to answer a specific biological question
At NCBI, many of our databases are linked through a unique search and retrieval system called Entrez. Entrez (pronounced ahn' tray) allows a user not only to access and retrieve specific information from a single database but also to access integrated information from many NCBI databases. For example, the Entrez Protein database is cross-linked to the Entrez Taxonomy database. This allows a researcher to find taxonomic information for the species from which a protein sequence was derived. (Taxonomy is a division of the natural sciences that deals with the classification of animals and plants.)
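The same protein-to-taxonomy link can also be requested programmatically through NCBI's E-utilities (the ELink utility). This is a separate interface from the Entrez web pages described here. The sketch below only builds the request URL and does not send it. The helper name is an assumption, and "12345" is a placeholder identifier, not a real protein record.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink request asking which records in `db` are linked to `uid` in `dbfrom`."""
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": uid})
    return f"{EUTILS_BASE}elink.fcgi?{query}"


# "12345" is a placeholder, not a real protein identifier.
print(elink_url("protein", "taxonomy", "12345"))
```

Fetching such a URL (for example, with `urllib.request.urlopen`) returns XML that lists the linked Taxonomy identifiers. NCBI's usage guidelines for E-utilities apply to any real requests.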


What Is Bioinformatics?
The data in GenBank are made available in a variety of ways, each tailored to a particular use, such as data submission or sequence searching.
Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights, as well as to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.
Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities, so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
the development and implementation of tools that enable efficient access to, and use and management of, various types of information
the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences


Why Is Bioinformatics So Important?
Biology in the 21st century is being transformed from a purely lab-based science to an information science as well.
The rationale for applying computational approaches to facilitate the understanding of various biological processes includes:
a more global perspective in experimental design
the ability to capitalize on the emerging technology of database mining: the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms


Evolutionary Biology
New insight into the molecular basis of a disease may come from investigating the function of homologs of a disease gene in model organisms. In this case, homology refers to two genes sharing a common evolutionary history. Scientists also use the term homology, or homologous, simply to mean similar, regardless of the evolutionary relationship.

Equally exciting is the potential for uncovering evolutionary relationships and patterns between different forms of life. With the aid of nucleotide and protein sequences, it should be possible to find the ancestral ties between different organisms. Thus far, experience has taught us that closely related organisms have similar sequences and that more distantly related organisms have more dissimilar sequences. Proteins that show significant sequence conservation, indicating a clear evolutionary relationship, are said to be from the same protein family. By studying protein folds (distinct protein building blocks) and families, scientists are able to reconstruct the evolutionary relationship between two species and to estimate the time of divergence between two organisms since they last shared a common ancestor.
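Sequence similarity of this kind is often summarized as percent identity over an alignment. The sketch below assumes that the two sequences have already been aligned, and simply counts identical columns. Columns containing a gap ("-") are ignored; this is one of several conventions in use and is an assumption here. The function name is hypothetical. Percent identity by itself does not establish homology.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded from the count.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue                        # skip gap columns
        compared += 1
        matches += x == y                   # True counts as 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("MKTAYIAK", "MKTAHIAK"))  # 87.5 (7 of 8 columns identical)
print(percent_identity("AC-GT", "ACTGA"))        # 75.0 (3 of 4 ungapped columns)
```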
Although a human disease may not be found in exactly the same form in animals, there may be sufficient data for an animal model that allow researchers to make inferences about the process in humans.
NCBI's COGs database has been designed to simplify evolutionary studies of complete genomes and to improve functional assignment of individual proteins.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the different kinds of life on earth.


Protein Modeling
The process of evolution has resulted in the production of DNA sequences that encode proteins with specific functions. In the absence of a protein structure that has been determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, researchers can try to predict the three-dimensional structure using protein or molecular modeling. This method uses experimentally determined protein structures (templates) to predict the structure of another protein that has a similar amino acid sequence (target).
Although molecular modeling may not be as accurate as experimental methods at determining a protein's structure, it is still extremely helpful in proposing and testing various biological hypotheses. Molecular modeling also provides a starting point for researchers wishing to confirm a structure through X-ray crystallography and NMR spectroscopy. Because the different genome projects are producing more sequences, and because novel protein folds and families are being determined, protein modeling will become an increasingly important tool for scientists working to understand normal and disease-related processes in living organisms.

The Four Steps of Protein Modeling
Identify the proteins with known three-dimensional structures that are related to the target sequence
Align the related three-dimensional structures with the target sequence and determine those structures that will be used as templates
Construct a model for the target sequence based on its alignment with the template structure(s)
Evaluate the model against a variety of criteria to determine if it is satisfactory


Genome Mapping
Genomic maps serve as a scaffold for orienting sequence information. A few years ago, a researcher wanting to localize a gene, or nucleotide sequence, was forced to map the genomic region of interest manually, a time-consuming and often painstaking process. Today, thanks to new technologies and the influx of sequence data, a number of high-quality, genome-wide maps are available to the scientific community for use in their research.
Computerized maps make gene hunting faster, cheaper, and more practical for almost any scientist. In a nutshell, scientists would first use a genetic map to assign a gene to a relatively small area of a chromosome. They would then use a physical map to examine the region of interest close up, to determine a gene's precise location. In light of these advances, a researcher's burden has shifted from mapping a genome or genomic region of interest to navigating a vast number of Web sites and databases.


Map Viewer: A Tool for Visualizing Whole Genomes or Single Chromosomes
NCBI's Map Viewer is a tool that allows a user to view an organism's complete genome, integrated maps for each chromosome (when available), and/or sequence data for a genomic region of interest. When using Map Viewer, a researcher has the option of selecting either a "Whole-Genome View" or a "Chromosome or Map View". The Genome View displays a schematic for all of an organism's chromosomes, whereas the Map View shows one or more detailed maps for a single chromosome. If more than one map exists for a chromosome, Map Viewer can display these maps simultaneously.

Using Map Viewer, researchers can find answers to questions such as:
Where does a particular gene exist within an organism's genome?
Which genes are located on a particular chromosome, and in what order?
What is the corresponding sequence data for a gene that exists in a particular chromosomal region?
What is the distance between two genes?
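Two of these questions reduce to simple operations once gene coordinates are known: ordering genes on one chromosome, and measuring the gap between two genes. The sketch below is not Map Viewer's interface. The `Gene` class, the gene names and the coordinates are all hypothetical. It assumes 1-based, inclusive coordinates on a single assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int  # 1-based first base
    end: int    # 1-based last base (inclusive)


def genes_in_order(genes: list[Gene], chromosome: str) -> list[str]:
    """Names of the genes on one chromosome, ordered by start position."""
    on_chrom = [g for g in genes if g.chromosome == chromosome]
    return [g.name for g in sorted(on_chrom, key=lambda g: g.start)]


def gene_distance(a: Gene, b: Gene) -> int:
    """Number of bases lying strictly between two genes (0 if they overlap)."""
    if a.chromosome != b.chromosome:
        raise ValueError("genes are on different chromosomes")
    first, second = sorted((a, b), key=lambda g: g.start)
    return max(0, second.start - first.end - 1)


# Hypothetical genes and coordinates, for illustration only.
genes = [
    Gene("geneA", "1", 100, 200),
    Gene("geneB", "1", 501, 900),
    Gene("geneC", "2", 50, 80),
    Gene("geneD", "1", 300, 400),
]
print(genes_in_order(genes, "1"))         # ['geneA', 'geneD', 'geneB']
print(gene_distance(genes[0], genes[1]))  # 300 (bases 201 to 500)
```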

The rapidly emerging field of bioinformatics promises to lead to advances in understanding basic biological processes and, in turn, advances in the diagnosis, treatment, and prevention of many genetic diseases. Bioinformatics has transformed the discipline of biology from a purely lab-based science to an information science as well. Increasingly, biological studies begin with a scientist conducting vast numbers of database and Web site searches to formulate specific hypotheses or to design large-scale experiments. The implications of this change, for both science and medicine, are staggering.


Revised: March 29, 2004.
http://www.ncbi.nlm.nih.gov/About/model/otherorg.html
What Is a Model Organism?
Over the last century, research on a small number of organisms has played a pivotal role in advancing our understanding of numerous biological processes. This is because many aspects of biology are similar in most or all organisms, but it is frequently much easier to study a particular aspect in one organism than in others. These much-studied organisms are commonly referred to as model organisms, because each has one or more characteristics that make it suitable for laboratory study. The most popular model organisms have strong advantages for experimental research, such as rapid development with short life cycles, small adult size, ready availability, and tractability. They become even more useful when many other scientists work on them. A large amount of information can then be derived from these organisms, providing valuable data for the analysis of normal human development, gene regulation, genetic diseases, and evolutionary processes.
Arabidopsis thaliana is a small flowering plant that belongs to the mustard family (Brassicaceae), which includes species such as broccoli, cauliflower, cabbage, and radish. Because Arabidopsis has a small genome relative to other plants and is easily grown under laboratory conditions, it has become the organism of choice for basic studies of the molecular genetics of flowering plants. Scientists expect that systematic studies of Arabidopsis will offer important advantages for basic research in genetics and molecular biology. They also expect these studies to illuminate numerous features of plant biology, including those of significant value to agriculture, energy, the environment, and human health.
http://www.ncbi.nlm.nih.gov/About/outreach/glossary.html (Outreach & education)
http://www.ncbi.nlm.nih.gov/sutils/e-pcr/ (PCR)
What is e-PCR?
e-PCR identifies sequence-tagged sites (STSs) within DNA sequences. Using e-PCR, you can search for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing.
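The "order, orientation, and spacing" test can be illustrated with an exact-match sketch. The forward primer must occur on the plus strand. The reverse complement of the reverse primer must occur downstream of it. The resulting product length must fall in a given range. The function names and the toy sequences are hypothetical. Unlike e-PCR, this sketch scans only the plus strand and allows no mismatches or gaps.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sts(sequence: str, fwd: str, rev: str,
             min_size: int, max_size: int) -> list[tuple[int, int]]:
    """Exact-match electronic PCR on the plus strand.

    fwd and rev are written 5'->3' as ordered. The reverse primer anneals to
    the plus strand as its reverse complement. Returns 0-based half-open
    (start, end) spans of predicted products whose length is within
    [min_size, max_size], with the reverse site downstream of the forward site.
    """
    seq, fwd = sequence.upper(), fwd.upper()
    rev_site = revcomp(rev.upper())
    hits = []
    start = seq.find(fwd)
    while start != -1:
        for end in range(start + min_size, min(start + max_size, len(seq)) + 1):
            site_start = end - len(rev_site)
            # Correct order and spacing: reverse site lies after the forward primer.
            if site_start >= start + len(fwd) and seq[site_start:end] == rev_site:
                hits.append((start, end))
        start = seq.find(fwd, start + 1)
    return hits


template = "TTTT" + "ACGTTGCA" + "C" * 10 + "TTGGATCC" + "AAAA"
print(find_sts(template, "ACGTTGCA", "GGATCCAA", 20, 40))  # [(4, 30)]: a 26-bp product
```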
What's new
Improved Search Sensitivity
You can use multiple discontiguous words instead of a single exact word. Each of these words has groups of significant positions separated by "wildcard" positions, which are not required to match. It is also now possible to allow gaps in the primer alignments. This fuzzy matching strategy reduces the likelihood that a true STS will be missed due to mismatches.
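A discontiguous word compares only the significant positions and skips the wildcard positions. A mismatch that falls on a wildcard therefore does not prevent a hit. The sketch below encodes a word pattern as a string of "1" (significant) and "0" (wildcard). That notation, the 4-base pattern and the function names are assumptions for illustration. The actual patterns e-PCR uses are not described here.

```python
def spaced_word(seq: str, pos: int, pattern: str) -> str:
    """Letters of seq at the significant ('1') positions of pattern, starting at pos.

    '0' marks a wildcard position that is skipped, so it does not need to match.
    """
    return "".join(seq[pos + i] for i, p in enumerate(pattern) if p == "1")


def spaced_seed_hits(primer: str, target: str, pattern: str) -> list[int]:
    """Target offsets whose spaced word equals that of the primer's first len(pattern) bases."""
    key = spaced_word(primer, 0, pattern)
    return [pos for pos in range(len(target) - len(pattern) + 1)
            if spaced_word(target, pos, pattern) == key]


# Hypothetical 4-base pattern with a wildcard in the third position.
print(spaced_seed_hits("ACGT", "GGACCTGG", "1101"))  # [2]: ACGT vs ACCT differ only at the wildcard
print("ACGT" in "GGACCTGG")                         # False: an exact word would miss it
```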
Reverse Searching
Searching the human genome sequence and other large genomes is now possible. The new version of e-PCR provides a search mode that runs a query sequence against a sequence database.
PubMed References
Schuler GD. Sequence mapping by electronic PCR. (1997)
Rotmistrovsky K, Jang W, Schuler GD. A web server for performing electronic PCR. (2004)
http://www.ncbi.nlm.nih.gov/About/tools/restable_seq.html (BLAST/The Basic Local Alignment Search Tool)
TOOLS FOR SEQUENCE ANALYSIS
NCBI provides access to various sequence analysis tools, including:

BLAST
The Basic Local Alignment Search Tool (BLAST), for comparing gene and protein sequences against others in public databases, comes in several types, including PSI-BLAST, PHI-BLAST, and BLAST 2 Sequences.

Conserved Domain Database (CDD)
A collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. The CD Search Service can be used to search CDD.

Electronic-PCR (e-PCR)
Can be used to compare a query sequence to mapped sequence-tagged sites to find a possible map location for the query sequence.

Entrez Gene
Find information on sequence analyses for a particular gene and organism.

Gene Expression Omnibus (GEO)
GEO provides several tools to assist with the visualization and exploration of curated GEO data.

ORF Finder
A graphical analysis tool that finds all open reading frames of a selected minimum size in a user's sequence or in a sequence already in the database.
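The core of an ORF search can be sketched in a few lines. The sketch scans all six reading frames and reports each ATG-to-stop stretch that reaches a minimum length. It uses the standard genetic code and treats only ATG as a start codon. These are simplifying assumptions, and real ORF finders offer more options (for example, alternative genetic codes). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int) -> list[tuple[str, int, int, int]]:
    """Find ATG-to-stop ORFs in all six reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    half-open and refer to the scanned strand (the reverse complement for '-').
    min_len is the minimum ORF length in nucleotides, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTTAAGG", min_len=12))  # [('+', 2, 2, 14)]
```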

Open Mass Spectrometry Search Algorithm
OMSSA allows for identification of MS/MS peptide spectra by searching libraries of known protein sequences.

Trace Archive
Developed to store the raw sequence data underlying sequences generated by various genome projects.

VecScreen
A tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or adapter origin, before the sequence is analyzed with the sequence analysis tools or submitted.

http://www.ncbi.nlm.nih.gov/About/tools/restable_lit.html (Literature Database)
LITERATURE DATABASES
Together, NCBI's literature databases constitute an extended searchable library
of the life sciences literature.
Books
In collaboration with authors and publishers, NCBI offers biomedical books and monographs for the Web, with links to PubMed as well as to other resources. NCBI-authored resources, such as the NCBI Handbook, Coffee Break, and Genes and Disease, are also available in this database.
Journals
Searches the journals indexed in PubMed, with links to the records within those journals.
MeSH
Searches NLM's controlled vocabulary used for indexing articles.
NLM Catalog
NLM Catalog provides access to NLM bibliographic data for journals, books, audiovisuals, computer software, electronic resources, and other materials. Links to the library's holdings in LocatorPlus, NLM's online public access catalog, are also provided.
OMIA
This database is a catalog of genes, inherited disorders, and traits in animal species other than human and mouse, authored by Professor Frank Nicholas of the University of Sydney, Australia.
OMIM
This database is a catalog of human genes and genetic disorders. It is authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and was developed for the Web by NCBI.
PubMed
A service of the National Library of Medicine that provides access to over 17 million citations from MEDLINE and additional life sciences journals. PubMed includes links to many sites providing full-text articles and other related resources.
PubMed Central
PubMed Central is a digital archive of life sciences journal literature, developed and managed by NCBI. With PubMed Central, NCBI is taking the lead in preserving and maintaining open access to the electronic literature.
http://blast.ncbi.nlm.nih.gov/Blast.cgi (searching nucleotide database)
BLAST Assembled RefSeq Genomes
Choose a species genome to search, or list all genomic BLAST databases.
Human
Mouse
Rat
Arabidopsis thaliana
Oryza sativa
Bos taurus
Danio rerio
Drosophila melanogaster
Gallus gallus
Pan troglodytes
Microbes
Apis mellifera
Basic BLAST
Choose a BLAST program to run.
Program | Program description
nucleotide blast | Search a nucleotide database using a nucleotide query (algorithms: blastn, megablast, discontiguous megablast)
protein blast | Search a protein database using a protein query (algorithms: blastp, psi-blast, phi-blast)
blastx | Search a protein database using a translated nucleotide query
tblastn | Search a translated nucleotide database using a protein query
tblastx | Search a translated nucleotide database using a translated nucleotide query
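The table above amounts to a lookup from query type and database type to a program. The sketch below encodes that lookup. The function and dictionary names are hypothetical. "nucleotide blast" and "protein blast" are represented by their basic algorithms, blastn and blastp. The megablast and PSI/PHI-BLAST variants are not modelled.

```python
# (query type, database type, translate both?) -> BLAST program, following the table above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",    # translated nucleotide query
    ("protein", "nucleotide", False): "tblastn",   # translated nucleotide database
    ("nucleotide", "nucleotide", True): "tblastx", # both query and database translated
}


def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Pick the basic BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translate_both)]
    except KeyError:
        raise ValueError(
            f"no basic BLAST program for a {query_type} query against a {db_type} database"
            f" (translate_both={translate_both})"
        ) from None


print(choose_blast_program("nucleotide", "protein"))  # blastx
```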
Specialized BLAST
Choose a type of specialized search (or database name in parentheses).
Make specific primers with Primer-BLAST
Search trace archives
Find conserved domains in your sequence (cds)
Find sequences with similar conserved domain architecture (cdart)
Search sequences that have gene expression profiles (GEO)
Search immunoglobulins (IgBLAST)
Search using SNP flanks
Screen sequence for vector contamination (vecscreen)
Align two (or more) sequences using BLAST (bl2seq)
Search protein or nucleotide targets in PubChem BioAssay
Search SRA transcript and genomic libraries
Constraint Based Protein Multiple Alignment Tool
Needleman-Wunsch Global Sequence Alignment Tool
Search RefSeqGene
Search WGS sequences grouped by organism
http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome (Primer design)
PCR Template
Enter accession, gi, or FASTA sequence (a RefSeq record is preferred).
Enter the PCR template here (multiple templates are currently not supported). Whenever possible, use a RefSeq accession or GI rather than the raw DNA sequence. This allows Primer-BLAST to identify the template better and thus to check primer specificity more accurately.
A template is not required if both forward and reverse primers are entered below.
The template length is limited to 50,000 bps. If your template is longer than that, you need to use the primer range to limit the length. That is, set the forward primer "From" and reverse primer "To" fields, but leave the forward primer "To" and reverse primer "From" fields empty.
Or, upload FASTA file
Range
Forward primer: From / To
Reverse primer: From / To
Enter the position ranges if you want the primers to be located at specific sites. The positions refer to the base numbers on the plus strand of your template, so the "From" position should always be smaller than the "To" position for a given primer. Partial ranges are allowed. For example, if you want the PCR product to be located between position 100 and position 1000 on the template, you can set the forward primer "From" to 100 and the reverse primer "To" to 1000, leaving the forward primer "To" and reverse primer "From" empty.
Note that the position range of the forward primer may not overlap with that of the reverse primer.
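The range rules above can be checked mechanically. For a given primer, "From" must be smaller than "To". Empty fields are allowed. A fully specified forward range must not overlap a fully specified reverse range. The sketch below is a reading of those rules, not Primer-BLAST's own validation code. The function name and the use of `None` for an empty field are assumptions, and ranges are treated as inclusive.

```python
from typing import Optional, Tuple

# (From, To) on the plus strand; None means the field was left empty.
Range = Tuple[Optional[int], Optional[int]]


def validate_primer_ranges(forward: Range, reverse: Range) -> None:
    """Raise ValueError if the ranges break the rules described above."""
    for name, (start, stop) in (("forward", forward), ("reverse", reverse)):
        if start is not None and stop is not None and start >= stop:
            raise ValueError(f"{name} primer: 'From' must be smaller than 'To'")
    f_from, f_to = forward
    r_from, r_to = reverse
    # Overlap can only be judged when both ranges are fully specified.
    if None not in (f_from, f_to, r_from, r_to) and f_from <= r_to and r_from <= f_to:
        raise ValueError("forward and reverse primer ranges overlap")


# The example from the text: product between positions 100 and 1000.
validate_primer_ranges((100, None), (None, 1000))  # passes silently
```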

Primer Parameters
Use my own forward primer (5'-3' on plus strand)
Optionally enter your pre-designed forward primer. Always use the actual primer sequence (i.e., 5'-3' on the plus strand of the template).
Use my own reverse primer (5'-3' on minus strand)
Optionally enter your pre-designed reverse primer. Always use the actual primer sequence (i.e., 5'-3' on the minus strand of the template).
PCR product size: Min / Max
# of primers to return
Primer melting temperatures (Tm): Min / Opt / Max / Max Tm difference
The Tm calculation is controlled by the "Table of thermodynamic parameters" and "Salt correction formula" settings (under advanced parameters). The default table of thermodynamic parameters is "SantaLucia 1998", and the default salt correction formula is "SantaLucia 1998", as recommended by the primer3 program.
Exon/intron selection
A RefSeq mRNA sequence as PCR template input is required for the options in this section.
A RefSeq mRNA sequence (for example, an Entrez sequence record with an accession starting with NM_) allows the program to identify the corresponding genomic DNA properly and thus find the correct exon/intron boundaries.
Exon junction span
This controls whether the primer should span an exon junction on your mRNA template. The option "Primer must span an exon-exon junction" will direct the program to return at least one primer (within a given primer pair) that spans an exon-exon junction. This is useful for limiting amplification to mRNA only. You can also exclude such primers if you want to amplify mRNA as well as the corresponding genomic DNA.
Exon junction match
Exon at 5' side / Exon at 3' side: minimal number of bases that must anneal to exons at the 5' or 3' side of the junction
This specifies the minimal number of bases that the primer must anneal to the template at the 5' side (i.e., toward the start of the primer) or the 3' side (i.e., toward the end of the primer) of the exon-exon junction. Annealing to both exons is necessary, as this ensures annealing to the exon-exon junction region and not to either exon alone. Note that this option is effective only if you select "Primer must span an exon-exon junction" for the "Exon junction span" option.
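The junction rule can be expressed as two base counts, one on each side of the junction, each compared with its minimum. The sketch below handles a forward primer on the plus strand of the mRNA template only. The function name is hypothetical, and the minimums of 7 and 4 in the example are illustrative values rather than documented defaults.

```python
def spans_exon_junction(primer_start: int, primer_len: int, junction: int,
                        min_5p: int, min_3p: int) -> bool:
    """Check whether a forward primer spans an exon-exon junction on an mRNA template.

    Coordinates are 0-based on the plus strand. `junction` is the index of the
    first base of the downstream exon, so the upstream exon ends at junction - 1.
    The primer must anneal to at least `min_5p` bases of the upstream exon (its
    5' side) and at least `min_3p` bases of the downstream exon (its 3' side).
    """
    bases_5p = junction - primer_start               # primer bases before the junction
    bases_3p = primer_start + primer_len - junction  # primer bases after the junction
    return bases_5p >= min_5p and bases_3p >= min_3p


print(spans_exon_junction(90, 20, 100, min_5p=7, min_3p=4))  # True: 10 + 10 bases
print(spans_exon_junction(98, 20, 100, min_5p=7, min_3p=4))  # False: only 2 bases on the 5' side
```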
Intron inclusion
Primer must be separated by at least one intron on the corresponding genomic DNA
With this option on, the program will try to find primer pairs that are separated by at least one intron on the corresponding genomic DNA, using the mRNA-genomic DNA alignment from NCBI. This makes it easy to distinguish between amplification from mRNA and from genomic DNA, because the product from genomic DNA is longer due to the presence of an intron.
Intron length range: Min / Max
This specifies the range of total intron length on the corresponding genomic DNA that would separate the forward and reverse primers.
Primer Pair Specificity Checking Parameters
Specificity check: Enable search for primer pairs specific to the intended PCR template
With this option on, the program will search the primers against the selected database. It will then determine whether a primer pair can generate a PCR product on any targets in the database, based on their matches to the targets and their orientations. The program will return, if possible, only primer pairs that do not generate a valid PCR product on unintended sequences and are therefore specific to the intended template. Note that specificity is checked not only for the forward-reverse primer pair, but also for forward-forward and reverse-reverse primer pairs.
Organism: Homo sapiens
Enter an organism name or taxonomy id, or select from the suggestion list as you type.
This will limit the primer specificity checking to the specified organism. It is strongly recommended that you always specify the organism if you are amplifying DNA from a specific organism. Searching all organisms is much slower, and off-target priming from other organisms is irrelevant. Click on the "Add more organisms" label if you want to restrict the check to multiple organisms (enter only one organism in each input box).
Database
Genome database (reference assembly from selected organisms): sequences from selected organisms, including Apis mellifera, Arabidopsis, Bos taurus, Danio rerio, dog, Drosophila melanogaster, Gallus gallus, human, mouse, O. sativa (japonica cultivar-group), Pan troglodytes, and rat. This is the genome database of choice if it covers your organism, as it contains minimal redundancy and the search is faster.
Genome database (chromosomes from all organisms): sequences from the NCBI chromosome database (see BLAST database descriptions), except that sequences whose accessions start with AC_ (alternate assemblies) are excluded to reduce redundancy.
See BLAST database descriptions for information on other databases.
Entrez query
You can use a regular Entrez query to limit the database search for primer specificity. For example, enter a GenBank accession number to limit the search to that particular sequence only. (Caution: this means the primer specificity will NOT be checked against any sequences other than the specified one.)
Primer specificity stringency
Primer must have at least ___ total mismatches to unintended targets, including at least ___ mismatches within the last ___ bps at the 3' end.
The more mismatches there are between the primers and unintended targets (especially toward the 3' end), the more specific the primer pair is to your template, i.e., the harder it will be for the primers to anneal to and amplify unintended targets. However, specifying a larger mismatch value may make it more difficult to find such specific primers. In that case, try lowering the mismatch value.
Ignore targets that have ___ or more mismatches to the primer.
This is another parameter that can be used to adjust primer specificity stringency. If the total number of mismatches between target and primer is equal to or greater than the specified number (regardless of the mismatch locations), then any such targets will be ignored in the primer specificity check. For example, if you are only interested in targets that perfectly match the primers, you can set the value to 1. In that case, you can also lower the E value (see advanced parameters) to speed up the search, as the high default E value is not necessary for detecting targets with few mismatches to the primers.
Additionally, the program has a limit on detecting targets that are too different from the primers: it will detect targets that have up to 35% mismatches to the primer sequences (i.e., a total of 7 mismatches for a 20-mer).
You may need to choose more sensitive BLAST parameters (under advanced parameters) if you want to detect targets with a higher number of mismatches than the default.
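The stringency settings depend on two counts: total mismatches, and mismatches within the last few bases at the primer's 3' end. The sketch below computes both for an equal-length, ungapped primer/site pair. It then applies the "at least N total, including at least M in the last K bases" rule as worded above. The function names and example sequences are hypothetical, and the check is a simplified reading of the text, not Primer-BLAST's algorithm. The 35% limit is computed with integer arithmetic to reproduce the "7 mismatches for a 20-mer" figure.

```python
def mismatch_profile(primer: str, site: str, window: int) -> tuple[int, int]:
    """Count mismatches between a primer and an equal-length target site.

    Both strings are written 5'->3' in the primer's orientation, so the last
    `window` characters are the primer's 3' end. Returns
    (total mismatches, mismatches within the last `window` bases).
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    flags = [a != b for a, b in zip(primer.upper(), site.upper())]
    return sum(flags), sum(flags[len(flags) - window:])


def unlikely_to_amplify(primer: str, site: str, min_total: int, min_3p: int, window: int) -> bool:
    """True if the site differs enough from the primer under the stringency rule above."""
    total, tail = mismatch_profile(primer, site, window)
    return total >= min_total and tail >= min_3p


def max_detectable_mismatches(primer_len: int, percent: int = 35) -> int:
    """Upper limit on detectable mismatches, using the 35% figure quoted above."""
    return primer_len * percent // 100


primer = "ACGTACGTACGTACGTACGT"
off_target = "TCGTACGTACGTACGTACCA"  # mismatches at positions 0, 18 and 19
print(mismatch_profile(primer, off_target, window=5))  # (3, 2)
print(max_detectable_mismatches(20))                   # 7
```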
Misprimed product size deviation
This specifies the size variation of off-target PCR products relative to that of your intended PCR product. Only those primer pairs producing an off-target PCR product within the specified range will be tagged as non-specific.
Splice variant handling
Allow primer to amplify mRNA splice variants (requires RefSeq mRNA sequence as PCR template input)
If enabled, the program will NOT exclude primer pairs that can amplify one or more mRNA splice variants of the same gene as your PCR template. This makes the primers gene specific rather than transcript specific. Note that the option is NOT intended to generate primers that anneal to all variants. It only means that the primers may amplify one or more other splice variants in addition to the one you have specified. This option requires you to enter a RefSeq mRNA accession, gi, or FASTA sequence as PCR template input, because other types of input may not allow the program to interpret the result properly.
Show results in a new window
Use new graphic view
This option enables the new graphic view, which offers much more detail for your template and primers. It will replace the current graphic view in the future.
Advanced parameters
