
Physica A xx (xxxx) xxx–xxx
Minireview
Protein contact maps: A binary depiction of protein 3D structures
Arnold Emerson Isaac ∗, Amala Arumugam
Department of Biotechnology, School of Bio Sciences and Technology, VIT University, Vellore-632014, Tamil Nadu,
India
article info
Article history: Received 4 January 2016; Received in revised form 25 June 2016; Available online xxxx
Keywords: Protein structure prediction; Contact map; Complex network
abstract
In recent years, there has been considerable interest in examining the structure and dynamics of complex
networks. Proteins in 3D space may also be considered as complex systems that emerge through the interactions of
their constituent amino acids. This representation provides a powerful framework for uncovering the general
organizing principles of protein contact networks. Here we review protein contact maps in terms of protein structure
prediction and analysis. In addition, we discuss the various computational techniques for predicting protein
contact maps and the tools for visualizing them.
© 2016 Elsevier B.V. All rights reserved.
Contents
1. Introduction
2. Construction of protein contact map
3. Protein structures as complex networks
4. Common methods in protein 3D structure prediction
5. Protein 3D structure prediction from contact maps
6. Protein contact map prediction
6.1. Sequence similarity and multiple sequence alignment methods
6.2. Machine learning approaches
7. Summary of protein contact maps
8. Conclusion
Acknowledgements
References
1. Introduction
Proteins are the most abundant organic molecules in living systems. These molecules are much more diverse in structure and function than other classes of macromolecules. At any given time, a single cell in a living system contains thousands of proteins, each with a unique function. Proteins play a wide array of roles in a cell or organism, from enzymes to hormones. The shape of a protein is typically described using four levels of structural complexity: the primary, secondary, tertiary, and quaternary levels. For some proteins, a single polypeptide chain folded into its proper three-dimensional structure creates the final protein. Protein structures are complex systems with several tens, hundreds or even thousands of residues interacting with each other to help stabilize the tertiary structure so that specific functions can be realized in vivo. In this sense, the network modelling approach is suitable for characterizing and analysing protein structures: residues correspond to the vertices of the network, and an interaction (or any other type of relationship) between residues is represented as an edge linking the corresponding nodes.
∗ Corresponding author. E-mail address: i_arnoldemerson@yahoo.com (A.E. Isaac).
http://dx.doi.org/10.1016/j.physa.2016.08.033 0378-4371/© 2016 Elsevier B.V. All rights reserved.
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/physa
Understanding complex systems often requires a bottom-up approach: breaking the system into small, elementary constituents and mapping out the interactions between these components. In biological systems, networks emerge in many guises, from food webs in ecology to various biochemical networks in molecular biology. In particular, the wide range of interactions between genes, proteins and metabolites in a cell is best represented by various complex networks. An alternative way to conceptualize and model protein structures is to consider the contacts between atoms in amino acid residues as a network of interactions, irrespective of secondary structure and fold type.
So far, many studies have investigated protein structures as complex networks of interacting residues [1–7]. Recently, we analysed membrane protein structures in terms of complex networks [8–10] and also investigated the structurally and functionally critical residues in protein structures using the k-core decomposition algorithm [11]. More detailed reviews are available on proteins as networks, protein dynamics and the linking of topological characteristics to protein folding [12–16]. In this review, we describe how protein contact maps enable the study of protein structure prediction, and we survey the tools available for the analysis and prediction of contact maps. Protein 3D structure coordinates can be represented in a more reduced form called a protein contact map. It is possible to reconstruct the 3D coordinates of a protein from its contact map using several computational techniques [17,18].
2. Construction of protein contact map
Fig. 1 shows the steps in constructing a protein contact map. The Cα atom of each amino acid is taken as a vertex of the corresponding protein contact network, as shown in Fig. 1(a). The distance between every pair of residues is computed as the Euclidean distance; part of the distance matrix is shown in Fig. 1(b). The diagonal of the distance matrix is always zero, since the distance between a residue and itself is zero. Two residues are considered connected if the distance between them is less than or equal to the cut-off value of 7 Å. The choice of cut-off distance is based on the range of the non-covalent interactions that are responsible for folding the polypeptide chain into its native state. Various cut-offs, ranging from 5 Å [7] to 7 Å [8] to 8.5 Å [6], have been used in earlier studies. The protein contact map is derived using this cut-off value and represented as a 2-dimensional binary matrix (Fig. 1(c)). If two residues are connected, the corresponding matrix cell is set to 1 (black colour); otherwise it is set to 0 (white colour) (Fig. 1(d)). In recent years, several tools have been developed to analyse protein contact maps, as shown in Table 1.
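The construction described in this section (Cα coordinates, Euclidean distance matrix, 7 Å cut-off, binary matrix) can be illustrated with a minimal Python sketch. The function name contact_map and the toy coordinates are assumptions made for illustration only; real coordinates would be read from a PDB file, which is not shown here.

```python
import numpy as np

def contact_map(ca_coords, cutoff=7.0):
    """Return (distance matrix, binary contact map) for a list of C-alpha coordinates.

    cutoff is in angstroms; 7.0 follows the value used in the text.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise coordinate differences via broadcasting: shape (N, N, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A cell is 1 when the two residues are within the cut-off, else 0.
    # Applied literally, the rule also marks the diagonal (distance 0) as 1.
    contacts = (dist <= cutoff).astype(int)
    return dist, contacts

# Hypothetical coordinates: four residues spaced 3.8 angstroms apart on a line
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dist, cmap = contact_map(toy_coords)
print(cmap)
```

In this toy chain only sequence neighbours fall within 7 Å, so the map is a band around the diagonal. When the map is used as a network, the diagonal is usually ignored so that no residue is linked to itself.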
3. Protein structures as complex networks
Several studies have focused on modelling biological systems as complex systems and, more specifically, as complex networks [34–43]. Among all biological systems, such as cells, tissues or even the human body, proteins are considered to be among the most important macromolecules. Proteins perform a vast array of functions within living organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, and transporting molecules from one location to another. All of this makes proteins a very interesting system to study as a ‘complex dynamical system’. Traditionally, the three-dimensional folds of proteins have been perceived as a construct based on elements of secondary structure and fold arrangement [44–47]. An alternative way of conceptualizing and modelling protein structures is to consider the contacts between atoms in amino acids as a network of interactions, irrespective of secondary structure and fold type. Contacts fall naturally into two types: long-range and short-range interactions. Long-range interactions occur between residues that are distant from each other in the primary structure but much closer in the tertiary structure. These interactions are important for defining the overall topology. Short-range interactions occur between residues that are local to each other in the primary, secondary and tertiary structures.
For most networks, what is termed a node and a link is fairly straightforward. When looking at protein transition states, several studies have considered the Cα atoms to be the nodes and established a link between two nodes if the atoms were within 8.5 Å of each other [3,48]. In chemical terms, however, this is a simplification of the interactions within a protein. Side chains play a pivotal role in forming and fixing a protein structure, and any information on their orientation is lost in an analysis based solely on Cα atoms. Another study used native structures, considering each amino acid to be a node and establishing a link between two nodes when any two atoms from the two amino acids are within 5 Å of each other [4]. These efforts reflect the study of protein structures as complex networks, representing amino acids as nodes and the interactions between them as edges. Analysing protein 3D structures as complex networks helps us to understand many different aspects, including structural flexibility, the key residues stabilizing the 3D structure, the folding nucleus, important functional residues, the mixing behaviour of the amino acids and the hierarchy of the structure [3,7,48–51]. Several mathematical concepts from network theory have been applied to analyse protein contact networks. These include the bell-shaped Poisson distribution, small-world and scale-free properties, betweenness centrality and degrees of separation. Results revealed that protein structures have small-world properties: protein contact networks have a high clustering coefficient C with a relatively short characteristic path length L [3–6]. This concept was originally introduced by Watts and Strogatz in their analysis of a worm neural network, collaborations among film actors and the power grid of the Western United States [52].

Fig. 1. Construction of protein contact map. (a) PDB atomic coordinates for the protein bovine rhodopsin (PDB_ID: 1U19a). (b) Distance matrix calculated using Euclidean distance. (c) Protein contact map obtained using a cut-off distance. (d) Visualization of protein contact map.

Table 1 Tools for analysing protein contact maps.
S.No. | Name | Description | References
1. | CMA: Contact Map Analysis | This program analyses contacts between two chains or within one chain in a given PDB file. | [19]
2. | Protein contact maps | This tool allows the user to easily generate contact maps and distance maps for protein molecules. | [20]
3. | Contact map plugin | The contact map plugin provides an easy-to-use interface for viewing residue–residue contacts between two sets of selected atoms from molecules loaded into VMD. | [21]
4. | Structer and Dotter | Structer calculates contact maps from three-dimensional molecular structural data. The contact map matrix can then be viewed in the graphical matrix-visualization program Dotter. | [22]
5. | Con-Struct Map | Con-Struct Map is a graphical tool for the comparative study of protein structures. | [23]
6. | CMView | CMView allows the user to display the contact map and interact with it, as well as to show features of the contact map in the corresponding 3-dimensional structure by using the PyMol molecular viewer. | [24]
7. | CMWeb | CMWeb is an interactive on-line web application to examine contact maps together with linked 3D structures, MSAs, secondary structures, sequence conservation and five commonly used prediction methods. | [25]
8. | PConPy | PConPy is an open-source Python module for generating protein contact maps, distance maps and hydrogen bond plots. | [26]
9. | RaptorX contact prediction server | The server predicts inter-residue contacts for a protein sequence. | [27]
10. | RR distance maps | RR distance maps create a distance map, a generalization of a protein contact map in which residue–residue distances are shown with colour gradations. | [28]
11. | FT-COMAR | Fault Tolerance Reconstruction of 3D structure from protein contact maps. | [29]
12. | BBcontacts | BBcontacts is a Python program predicting residue-level contacts between beta-strands by detecting patterns in matrices of predicted couplings. | [30]
13. | BND server | Protein contact prediction using balanced network deconvolution. | [31]
14. | CMAPpro | CMAPpro is a server for the prediction of maps of contacts between protein residues. | [32]
15. | C2S: Contacts-to-Structure | C2S is an automated procedure for building full atom protein structures based on contact maps. | [33]

Fig. 2. Modelling of protein structure as a complex network. (a) NMR structure of Trp-cage miniprotein shown in secondary structures: alpha helix is depicted in red and coils or loops in green (PDB_ID:1L2Y). (b) 2D binary symmetric matrix, or adjacency matrix, representing the protein contact map. (c) Protein contact map represented as an adjacency matrix. (d) Protein contact network of Trp-cage miniprotein, where nodes or amino acids are depicted as red circles and edges or interactions as blue lines. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
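The two small-world quantities named in this section, the clustering coefficient C and the characteristic path length L, can be computed directly from a binary contact matrix. The following is a minimal sketch under stated assumptions: C is taken as the average of the local clustering coefficients, nodes with fewer than two neighbours contribute 0, and L is averaged over connected node pairs only. The function names and the four-node toy network are hypothetical.

```python
from collections import deque
import numpy as np

def clustering_coefficient(adj):
    """Average local clustering coefficient C of an undirected 0/1 matrix."""
    adj = np.asarray(adj)
    n = len(adj)
    values = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] and j != i]
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # convention assumed here for degree < 2
            continue
        # Count edges that exist among the neighbours of node i
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if adj[nbrs[a]][nbrs[b]])
        values.append(2.0 * links / (k * (k - 1)))
    return sum(values) / n

def characteristic_path_length(adj):
    """Mean shortest-path length L over connected node pairs (breadth-first search)."""
    adj = np.asarray(adj)
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v != u and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for node, d in dist.items() if node != s)
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# Hypothetical toy network: a triangle (0, 1, 2) with one pendant node (3)
toy_adj = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 1],
           [0, 0, 1, 0]]
C = clustering_coefficient(toy_adj)
L = characteristic_path_length(toy_adj)
print(C, L)
```

A network is usually called small-world when C is much higher than in a random graph of the same size and density while L remains comparably short; that comparison against a random reference is not part of this sketch.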
Fig. 2 shows the relationship between a protein contact map and a complex network. Constructing a protein contact network requires the Cartesian (xyz) coordinates, which can be obtained from the RCSB Protein Data Bank (www.rcsb.org) [53]. The secondary structure of the Trp-cage miniprotein (20 amino acids) was visualized using Rasmol, an open source molecular graphics visualization tool [54]. The protein contact map was determined with the 7 Å cut-off distance, as shown in Fig. 2(b); this distance denotes the non-covalent interactions. A network can be represented by its adjacency matrix (Fig. 2(c)), i.e., the binary depiction of the protein contact map. The rows or columns of the matrix denote the nodes or vertices, and the elements of the matrix represent the links or edges. The element $a_{ij}$ of the matrix is equal to 1 whenever there is an edge connecting the vertices $i$ and $j$, and equal to 0 otherwise. When the graph is undirected, the adjacency matrix is symmetric, i.e., $a_{ij} = a_{ji}$ for any $i$ and $j$. Each element of the adjacency matrix represents a connection between two nodes. For instance, as node 1 is connected to the adjacent nodes 2, 3, 4 and 5, we have $a_{12} = a_{13} = a_{14} = a_{15} = 1$ and, for the symmetric elements, $a_{21} = a_{31} = a_{41} = a_{51} = 1$. This matrix can then be visualized as an undirected network, as shown in Fig. 2(d); it was visualized using Pajek, a program for large network analysis [55].
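The worked example in this paragraph, where node 1 is linked to nodes 2, 3, 4 and 5, can be reproduced with a short Python sketch. The helper adjacency_matrix is a hypothetical name; it takes 1-based node labels, as in the text, and fills both symmetric elements for every undirected edge.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from 1-based undirected edges."""
    a = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        a[i - 1, j - 1] = 1  # element a_ij
        a[j - 1, i - 1] = 1  # symmetric element a_ji
    return a

# Node 1 connected to nodes 2, 3, 4 and 5, as in the example in the text
edges = [(1, 2), (1, 3), (1, 4), (1, 5)]
A = adjacency_matrix(5, edges)
degree = A.sum(axis=1)  # number of links per node
print(A)
print(degree)
```

The row sums give the degree of each node, which is the simplest network quantity that can be read off a contact map.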
4. Common methods in protein 3D structure prediction
The 3D structure of a protein can be determined either by experimental methods or by structure prediction methods. Determining 3D structures experimentally is tedious, as it involves complex procedures. X-ray crystallography produces very good results, but it requires a pure protein sample that forms relatively flawless crystals. NMR is limited to small soluble proteins. In addition, large-scale sequencing projects such as the Human Genome Project produce protein sequences at a very fast rate. Thus, there is a huge gap between the number of known protein sequences and the number of solved protein 3D structures. Protein structure prediction from the amino acid sequence aims at reducing this gap and is a key issue in molecular biology. Early methods of secondary structure prediction, introduced in the 1960s and early 1970s, focused on identifying likely alpha helices and were based mainly on helix-coil transition models [56–61]. The tertiary structure of a protein involves the folding of its secondary structural elements. The physical properties that determine the fold are the backbone rigidity and the interactions between the amino acids, which include electrostatic interactions, van der Waals interactions, hydrogen and disulphide bonds, and interactions with water.
Protein 3D structure prediction methods can be classified into three groups: homology modelling, fold recognition or threading, and ab initio methods. The progress and challenges in protein structure prediction have been reviewed by Zhang (2008) [62]. The basic steps of protein structure prediction are depicted as a flowchart (Fig. 3). If the protein sequence has homologous proteins with known structures, then homology or comparative modelling is the best choice for structure prediction. This method requires at least 25% sequence identity with the target protein, while sequences falling below 20% sequence identity may have very different structures. The second choice is based on the protein fold: if the protein sequence adopts a known protein fold, then the fold recognition or protein threading method can be used. The basic principle of threading is based on the observation that a large percentage of proteins adopt one of a limited number of folds. Fold recognition identifies the correct fold by comparing the sequence with known folds rather than by relying on sequence similarity. Therefore, the fold recognition method can be used when a protein has less than 25% sequence similarity to a template structure. If both methods fail to model the structure, then ab initio or de novo prediction is the final choice. This method attempts to predict the tertiary structure from the sequence alone, which governs its own folding, and it does not require a template structure. It therefore uses theoretical calculations based on statistical thermodynamics and quantum mechanics to find the minimum free energy of the protein fold [63]. The most representative soft computing methods for solving the tertiary structure prediction problem can be found in a recent review [64].
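The order in which the three classes of methods are tried can be summarized as a small decision function. This is only a sketch of the selection logic described in this section, not a reproduction of Fig. 3: the function name, its two arguments and the use of a single 25% threshold are assumptions made for illustration.

```python
def choose_prediction_method(best_template_identity, matches_known_fold):
    """Pick a structure prediction strategy following the order given in the text.

    best_template_identity: percent sequence identity to the best template of
        known structure, or None when no homologue is found (hypothetical input).
    matches_known_fold: True when fold recognition finds a compatible known fold.
    """
    # 1. Homologue with at least 25% identity: homology (comparative) modelling
    if best_template_identity is not None and best_template_identity >= 25.0:
        return "homology modelling"
    # 2. Below the threshold but a known fold fits: fold recognition / threading
    if matches_known_fold:
        return "fold recognition (threading)"
    # 3. Neither applies: template-free prediction
    return "ab initio prediction"

print(choose_prediction_method(40.0, True))
print(choose_prediction_method(18.0, True))
print(choose_prediction_method(None, False))
```

In practice the boundary is not sharp, and the text itself notes that sequences below 20% identity may differ strongly in structure, so the threshold here should be read as a rule of thumb.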
5. Protein 3D structure prediction from contact maps
Very few studies have used protein contact maps for the prediction of 3D structure. As an example, the 3D structure of the crambin protein (PDB_ID:1CRN) is depicted as a protein contact map in Fig. 4. The helix regions H1 and H2 appear as thick bands along the main diagonal, the beta sheet regions S1 and S2 appear as thin strips along the main diagonal, and scattered contacts represent the contacts in the loop or coil regions. The first study used a stochastic method, a fast and reliable algorithm, to find a chain conformation whose contact map is nearly identical to the target. In addition, this method is able to find a good candidate structure even when the target map has been corrupted with nonphysical contacts. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation [65].
A case-based reasoning (CBR) approach was used to predict the structure of novel proteins from their contact maps. The authors considered contact maps computed from existing structures in the PDB. The underlying hypothesis of using CBR is that proteins with similar contact maps tend to share similar 3D structures, and such methods were able to recover the structure even from very noisy maps [66]. Other studies combined contact maps with a hybrid mining approach to predict the structure of an unknown protein. Mining contacts using HMMSTR for local structures and mining frequent dense patterns are the two mining approaches that have been used to model the local propagation of structure [67]. Recently, similar mining approaches were used to extract conserved patterns from contact maps in order to identify secondary structural elements. The triangle subdivision method (TSM) was implemented to capture the locations of the dense clusters. This mining approach was found to be a simple and computationally inexpensive algorithm that successfully characterizes the off-diagonal interactions in the contact map for predicting specific folds [68,69]. The last two findings suggest that mining techniques applied to protein contact maps may provide a novel method for fold prediction. However, this technique led us to focus more on three main aspects: the prediction of protein stability, kinetics and structure.
Other interesting studies adopted the protein contact map to identify remote homologous proteins, in a method called remote-C3D. This method was found to achieve a higher accuracy than the composition-based and profile-based methods [70]. Similarly, contact maps were used to peel proteins into small successive compact units along the sequence, called protein units (PUs). The authors analysed the PUs at different levels of cutting, using a non-redundant protein databank to determine the preferential amino acid interactions inside and between PUs [71]. In future, protein contact maps can lead to further interesting studies, such as the identification of patterns related to different folds and the structural and functional classification of different protein families.
6. Protein contact map prediction
In the previous section, we discussed the various methods for predicting the 3D protein structure; this section describes three emerging areas in the prediction of protein contact maps, focusing first on sequence similarity and multiple sequence alignment methods, secondly on neural networks and finally on other machine learning approaches. We also show how evolutionary and physical constraints increase the prediction accuracy.
Fig. 3. Guidance in protein structure prediction methods.
6.1. Sequence similarity and multiple sequence alignment methods
Contact maps are a matrix representation of protein residue–residue contacts within a distance threshold and provide an avenue for predicting protein 3D structure [72,73]. Prediction of protein inter-residue contacts is one of the most important intermediate steps in solving the protein folding problem and in protein structure prediction [74,75]. Predicting a protein fold from sequence information alone is a difficult task. Correlations in mutational behaviour between different positions in a multiple sequence alignment have been used to predict contact maps for a group of protein families [76]. To predict the fold of a given amino acid sequence, a contact map is first predicted that sufficiently approximates the structure of the corresponding protein. The similarity of this contact map to the representative contact map of each fold is then analysed; the fold that corresponds to the closest match is the predicted fold for the input sequence [77]. Energy minimization is one of the most widely used methods for solving protein folding and protein structure prediction problems. Several energy functions were analysed for their ability to reproduce the contact map of the native state of proteins through an energy minimization procedure [78]. Both multiple sequence alignment and several energy functions, as described in this review, could be effectively modelled by the evolution of contact maps. Comparison of similar contact maps can simultaneously identify changes at the sequence and the structure level. More importantly, it can also identify dependencies between the changes at the two levels, which embody the evolutionary constraints that sequence and structure impose on each other.

Fig. 4. Alpha helix and beta sheet regions of crambin protein (PDB_ID:1CRN). (a) Protein contact map showing alpha helix contacts (H1, H2) and beta sheet contacts (S1, S2). (b) 3D structure of crambin protein showing alpha helix in red, beta sheet in yellow and coils or loops in green. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
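The fold-assignment step described in this section, in which a predicted contact map is compared with one representative map per fold and the closest match is chosen, can be sketched as follows. The similarity measure used here (the fraction of shared contacts, i.e. the Jaccard index) is an assumption made for illustration and is not necessarily the measure used in [77]; the function names, fold labels and toy maps are hypothetical, and all maps are assumed to have the same size.

```python
import numpy as np

def contact_overlap(map_a, map_b):
    """Jaccard overlap between two binary contact maps of equal shape."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("contact maps must have the same shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty maps are treated as identical
    return float(np.logical_and(a, b).sum() / union)

def closest_fold(predicted_map, fold_maps):
    """Return the fold label whose representative map best matches the prediction."""
    return max(fold_maps, key=lambda name: contact_overlap(predicted_map, fold_maps[name]))

# Hypothetical 3-residue maps, for illustration only
predicted = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
fold_maps = {
    "chain-like": [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    "compact":    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
}
best = closest_fold(predicted, fold_maps)
print(best, contact_overlap(predicted, fold_maps["compact"]))
```

Real proteins differ in length, so a practical comparison would first need an alignment between the two maps; that step is outside the scope of this sketch.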
6.2. Machine learning approaches
Artificial Neural Networks (ANNs) are considered a paradigm of machine learning and cognitive science. In this section, we first review protein contact map prediction based on ANNs and then prediction based on other machine learning methods.
(i) Artificial Neural Network (ANN) based methods
Artificial Neural Networks play an important role in protein structure prediction at all levels of protein structure. Several studies have made use of neural networks. A novel binary input encoding scheme was utilized for every possible residue pair to train a neural network [79]; inter-residue contact maps at different thresholds (5, 6, 7 and 8 Å) were predicted using a radial basis neural network, based on 53 globulin protein sequences and the coordinates of their amino acid residues. Correlated mutations from multiple sequence alignments were also used for contact prediction with neural networks. By incorporating mutual information in sparse contingency tables, the statistic proved to be a good predictor of residue–residue contacts [80]. The state-of-the-art neural network-based contact map predictor NNcon was ranked among the best methods in the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008 [81]. 2D-Recursive Neural Network (2D-RNN) models were used to predict both general residue–residue contacts and specific beta contacts [82]. Similarly, DNcon is a new sequence-based residue–residue contact predictor that employs deep networks and boosting techniques. Compared with the current state of the art, DNcon performed favourably on medium- and long-range contact predictions [83]. However, only a few studies have been reported on two-layered neural networks, and these reached an accuracy of no more than 47% [84–86]. Combining Artificial Neural Networks with different groups of protein families may increase the prediction accuracy; moreover, training the neural network requires a refined dataset of representative protein structures.
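The idea behind correlated-mutation features, which several of the methods above feed into their predictors, can be illustrated by computing the mutual information between two columns of a multiple sequence alignment. This is a plain, uncorrected mutual information and not the sparse contingency table statistic of [80]; the function name and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Hypothetical alignment: positions 2 and 4 mutate together (K/E swap)
alignment = ["AKLE", "AELK", "AKLE", "AELK"]
columns = ["".join(seq[k] for seq in alignment) for k in range(4)]
mi_coupled = column_mutual_information(columns[1], columns[3])
mi_conserved = column_mutual_information(columns[0], columns[1])
print(mi_coupled, mi_conserved)
```

Column pairs with high mutual information are candidates for residues in contact, whereas a fully conserved column carries no covariation signal. Practical methods add corrections for small alignments, gaps and phylogenetic bias, none of which are handled here.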
(ii) Other machine learning based methods
In addition to artificial neural networks, other machine learning approaches have also played a major role in tertiary structure prediction, in both threading and ab initio methods. A segmental semi-Markov model (SSMM) was developed for protein secondary structure prediction; it incorporates multiple sequence alignment profiles to improve predictive performance. By incorporating information from long-range interactions in beta-sheets, this model was also found to be capable of predicting protein contact maps [87]. Accurate prediction of the protein contact map is an important step towards reconstructing a protein’s 3D structure. In spite of continuous progress in developing contact map predictors, highly accurate prediction is still an unresolved problem. JUSTcon is a protein contact map predictor that utilizes a new machine-learning model based on an adaptive neuro-fuzzy inference system (ANFIS) and the K nearest neighbours (KNN) algorithm. The model produces a set of predicted amino acid pairs that are more likely to be in contact, and it is novel in protein contact prediction in terms of both its architecture and its accuracy [88]. ProC_S3 is the first residue–residue contact predictor based on the Random Forest algorithm; it utilizes a new amino acid residue contact propensity matrix and a new set of seven amino acid groups based on contact preference. One of its advantages is that it does not require a time-consuming optimization process and makes predictions considerably faster than many other algorithms such as SVM [89]. Thus, these few machine learning approaches have shown that the accuracy of the prediction is more likely to be close to the ensemble contacts.
The new CMAPpro predictor employs several new ideas for contact prediction using a multi-stage machine learning approach with increasingly refined levels of resolution. It uses 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements, an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands, and a deep neural network architecture to organize and progressively refine the prediction of contacts. The accuracy of the predictor is close to 30%, a significant increase over existing approaches [32]. Generalized input–output HMMs (GIOHMMs) are a class of Bayesian networks; these architectures can be trained from examples to yield a better contact map predictor. The current version of this method was found to accurately predict 60.5% of contacts at a distance cut-off of 8 Å and 45% of distant contacts at 10 Å [90]. Most existing methods predict the contact map matrix element by element, ignoring correlations among contacts and the physical feasibility of the whole contact map. A novel method, PhyCMAP, integrates both evolutionary and physical restraints for contact map prediction by machine learning and integer linear programming [27]. SVMcon is another contact map prediction method; it uses support vector machines together with a large amount of useful information, including profiles, secondary structure, solvent accessibility, contact potentials, residue types and protein-level information. This method yielded a 4% improvement over the CMAPpro predictor [91].
Multiple methods have been developed to predict contact maps. Some of them are based on ab initio approaches, homology, fold recognition, templates, machine learning and other methods [32,92–97]. Despite several earlier attempts, the prediction accuracy had reached only up to 50%. To improve the classification accuracy, the FoDT classifier recently introduced a new algorithm for the prediction of protein contact maps. This method is based on an ensemble of 400 classifiers, one for each amino acid pair, which was found suitable for protein contact map prediction. Internal validation has shown that FoDT predicts contacts with an average accuracy of over 57% [98]. Many methods were evaluated in the CASP10 experiment on residue–residue contact prediction, which assessed the prediction accuracy of 26 prediction groups [99]. Thus, for the prediction of protein contact maps one may use either tools or web servers that are available online. Among the three classes of protein contact map predictors, machine learning ensembles combined with evolutionary information have been shown to be the better classification method across the different protein contact pairs, while neural network methods have been useful in predicting contacts for smaller proteins (fewer than 300 residues) with higher accuracy.
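The idea of keeping one classifier per amino acid pair can be illustrated with a short Python sketch. It only shows how 20 × 20 = 400 ordered residue-type pairs can index an ensemble; the placeholder model, the function names and the single numeric feature are hypothetical and do not reproduce the FoDT algorithm of [98].

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def make_placeholder_classifier(pair):
    """Return a stand-in model for one residue-type pair (hypothetical).

    A real ensemble member would be trained on examples of this pair;
    here every model simply thresholds a single numeric feature.
    """
    def classify(feature_value, threshold=0.5):
        return feature_value >= threshold
    return classify


# One ensemble member per ordered amino acid pair: 20 * 20 = 400 models.
ensemble = {
    pair: make_placeholder_classifier(pair)
    for pair in product(AMINO_ACIDS, repeat=2)
}


def predict_contact(sequence, i, j, feature_value):
    """Dispatch the residue pair (i, j) to the model of its amino acid types."""
    model = ensemble[(sequence[i], sequence[j])]
    return model(feature_value)


n_models = len(ensemble)
example = predict_contact("MKVLA", 0, 3, feature_value=0.8)
```

Keying the dictionary on ordered pairs keeps the count at 400; treating (A, B) and (B, A) as the same pair would instead give 210 models, so the choice of ordered pairs here simply mirrors the number quoted in the text.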
7. Summary of protein contact maps
Protein contact maps (PCMs) are two-dimensional representations of three-dimensional protein structures. Studies on protein contact maps have focused on three major areas: mining or analysis of PCMs, generation or visualization of PCMs and, finally, prediction of PCMs. In analysing or mining contact maps, several features have been explored, including the extraction of conserved patterns and structural motifs from contact maps. Moreover, the contact map forms a structural “fingerprint” of a protein, and thus each protein can be identified based on its contact map. Recently, contact maps have proven useful in predicting protein folds and in the 3D structure prediction of proteins, where contact maps are used as template structures. Secondly, by examining the contact maps of several proteins, one can easily recognize the distinct features of contact maps. For the generation as well as the visualization of contact maps, many online web applications and Python modules are available. Interactive visualization and analysis of contact maps are possible using the VMD plugin and PyMOL software tools. Finally, predicting contact maps from sequence information has been actively researched in recent years. In addition, multiple sequence alignments and evolutionary information have also been used in contact map prediction. Many machine-learning methods have been developed for protein contact prediction in the past decades, and most of these methods are available as software tools.
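The generation step mentioned above can be sketched in a few lines of Python. This is a minimal illustration that assumes one representative atom per residue (for example Cα) and an 8 Å threshold; the function name and the toy coordinates are made up for the example and do not correspond to any of the cited tools.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Build a binary contact map from a list of (x, y, z) coordinates.

    Entry [i][j] is 1 when residues i and j are closer than `cutoff` Å,
    else 0. The diagonal is left at 0. Using one atom per residue and an
    8 Å threshold are assumptions of this sketch.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1  # the map is symmetric
    return cmap


# Made-up coordinates for four residues placed along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

The nested loop visits each pair once and mirrors the result, so the output is a symmetric N × N binary matrix; for large proteins a vectorised distance computation would be the more practical choice.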
8. Conclusion
The contact map provides a host of useful information about a protein’s structure, and valuable information can be extracted from it. For example, clusters of contacts represent certain secondary structures and also capture non-local interactions, giving clues to the tertiary structure. The secondary structure, fold topology and side-chain packing patterns can also be visualized conveniently and read from the contact map. Thus, protein contact maps are two-dimensional representations of the three-dimensional layout of protein structures. Finally, it is evident that protein structures share characteristics with other complex network systems such as the world-wide web, social networks, power grids and neural networks. We believe that this review may help in understanding novel methods for protein structure prediction and in discovering protein folding pathways. More specifically, it provides guidance for researchers working on the prediction of protein contact maps.
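As a rough illustration of how local and non-local contacts can be told apart in a contact map, the sketch below groups contacts by sequence separation |i − j|. The range boundaries (6–11 short, 12–23 medium, 24 or more long) follow a convention commonly used in CASP contact assessments and are an assumption here, as are the function name and the toy map; the review itself does not prescribe them.

```python
def contacts_by_range(cmap):
    """Group contacts of a binary contact map by sequence separation.

    Contacts with separation below 6 are reported as 'local'; they lie
    close to the main diagonal, where helical contacts typically appear.
    """
    groups = {"local": [], "short": [], "medium": [], "long": []}
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if not cmap[i][j]:
                continue
            sep = j - i
            if sep < 6:
                groups["local"].append((i, j))
            elif sep <= 11:
                groups["short"].append((i, j))
            elif sep <= 23:
                groups["medium"].append((i, j))
            else:
                groups["long"].append((i, j))
    return groups


# Toy 30-residue map with three hand-placed contacts (illustrative only).
size = 30
toy = [[0] * size for _ in range(size)]
for a, b in [(0, 4), (2, 10), (1, 29)]:
    toy[a][b] = toy[b][a] = 1
ranges = contacts_by_range(toy)
```

Only the upper triangle is scanned because the map is symmetric; the 'long' group collects the non-local interactions that carry information about the tertiary structure.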
Acknowledgements
The authors thank VIT University for the computational facility provided for this study and also thank the anonymous reviewers for their constructive comments, which helped to improve this review article.
References
[1] A. Aszoi, W.R. Taylor, Connection topology of proteins, Comput. Appl. Biosci.: CABIOS 9 (5) (1993) 523–529.
[2] N. Kannan, S. Vishveshwara, Identification of side-chain clusters in protein structures by a graph spectral method, J. Mol. Biol. 292 (2) (1999) 441–464.
[3] M. Vendruscolo, et al., Small-world view of the amino acids that play a key role in protein folding, Phys. Rev. E 65 (6) (2002) 061910.
[4] L.H. Greene, V.A. Higman, Uncovering network systems within protein structures, J. Mol. Biol. 334 (4) (2003) 781–791.
[5] A.R. Atilgan, P. Akan, C. Baysal, Small-world communication of residues and significance for protein dynamics, Biophys. J. 86 (1) (2004) 85–91.
[6] G. Bagler, S. Sinha, Network properties of protein structures, Physica A 346 (1) (2005) 27–33.
[7] G. Amitai, et al., Network analysis of protein structures identifies functional residues, J. Mol. Biol. 344 (4) (2004) 1135–1146.
[8] I.A. Emerson, K.M. Gothandam, Network analysis of transmembrane protein structures, Physica A 391 (3) (2012) 905–916.
[9] I.A. Emerson, K.M. Gothandam, Residue centrality in alpha helical polytopic transmembrane protein structures, J. Theoret. Biol. 309 (2012) 78–87.
[10] I.A. Emerson, P.T. Louis, Detection of active site residues in bovine rhodopsin using network analysis, Trends Bioinform. 8 (2) (2015) 63.
[11] A.E. Isaac, S. Sinha, Analysis of core-periphery organization in protein contact networks reveals groups of structurally and functionally critical residues, J. Biosci. 40 (4) (2015) 683–699.
[12] A. Giuliani, et al., Proteins as networks: usefulness of graph theory in protein science, Curr. Protein Pept. Sci. 9 (1) (2008) 28–38.
[13] C. Bode, et al., Network analysis of protein dynamics, FEBS Lett. 581 (15) (2007) 2776–2782.
[14] S. Khor, Static and dynamic characteristics of protein contact networks, 2010. ArXiv Preprint arXiv:1011.2222.
[15] L. Di Paola, et al., Protein contact networks: an emerging paradigm in chemistry, Chem. Rev. 113 (3) (2012) 1598–1613.
[16] L. Di Paola, A. Giuliani, Protein contact network topology: a natural language for allostery, Curr. Opin. Struct. Biol. 31 (2015) 43–48.
[17] M.J. Pietal, J.M. Bujnicki, L.P. Kozlowski, GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function, Bioinformatics 31 (21) (2015) 3499–3505.
[18] M. Vassura, et al., Reconstruction of 3D structures from protein contact maps, IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 5 (3) (2008) 357–367.
[19] V. Sobolev, et al., SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment, Nucleic Acids Res. 33 (suppl 2) (2005) W39–W43.
[20] B. Rafferty, Z.C. Flohr, A. Martini, Protein Contact Maps, 2014.
[21] http://www.ks.uiuc.edu/Research/vmd/plugins/contactmap/.
[22] E.L.L. Sonnhammer, J.C. Wootton, Dynamic contact maps of protein structures, J. Mol. Graph. Model. 16 (1) (1998) 1–5.
[23] J.-L. Chung, et al., Con-Struct Map: a comparative contact map analysis tool, Bioinformatics 23 (18) (2007) 2491–2492.
[24] C. Vehlow, et al., CMView: interactive contact map visualization and analysis, Bioinformatics 27 (11) (2011) 1573–1574.
[25] D. Kozma, I. Simon, G.E. Tusnády, CMWeb: an interactive on-line tool for analysing residue–residue contacts and contact prediction methods, Nucleic Acids Res. 40 (W1) (2012) W329–W333.
[26] H.K. Ho, M.J. Kuiper, R. Kotagiri, PConPy - a Python module for generating 2D protein maps, Bioinformatics 24 (24) (2008) 2934–2935.
[27] Z. Wang, J. Xu, Predicting protein contact map using evolutionary and physical constraints by integer programming, Bioinformatics 29 (13) (2013) i266–i273.
[28] J.E. Chen, C.C. Huang, T.E. Ferrin, RRDistMaps: a UCSF Chimera tool for viewing and comparing protein distance maps, Bioinformatics (2015) btu841.
[29] M. Vassura, et al., FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps, Bioinformatics 24 (10) (2008) 1313–1315.
[30] J. Andreani, J. Soeding, bbcontacts: prediction of beta-strand pairing from direct coupling patterns, Bioinformatics (2015) btv041.
[31] H.-P. Sun, et al., Improving accuracy of protein contact prediction using balanced network deconvolution, Proteins: Struct. Funct. Bioinform. 83 (3) (2015) 485–496.
[32] P. Di Lena, K. Nagata, P. Baldi, Deep architectures for protein contact map prediction, Bioinformatics 28 (19) (2012) 2449–2457.
[33] B.M. Konopka, et al., Automated procedure for contact-map-based protein structure reconstruction, J. Membr. Biol. 247 (5) (2014) 409–420.
[34] H. Atlan, The living cell as a paradigm for complex natural systems, ComplexUs 1 (1) (2003) 1–3.
[35] Z.N. Oltvai, A.-L. Barabasi, Life’s complexity pyramid, Science 298 (5594) (2002) 763–764.
[36] C. Koch, G. Laurent, Complexity and the nervous system, Science 284 (5411) (1999) 96–98.
[37] P. Smaglik, For my next trick, Nature 407 (6806) (2000) 828–829.
[38] J. Knight, Physics meets biology: Bridging the culture gap, Nature 419 (6904) (2002) 244–246.
[39] M.E. Csete, J.C. Doyle, Reverse engineering of biological complexity, Science 295 (5560) (2002) 1664–1669.
[40] E. Alm, A.P. Arkin, Biological networks, Curr. Opin. Struct. Biol. 13 (2) (2003) 193–202.
[41] S.R. Proulx, D.E.L. Promislow, P.C. Phillips, Network thinking in ecology and evolution, Trends Ecol. Evolut. 20 (6) (2005) 345–353.
[42] A.-L. Barabasi, Z.N. Oltvai, Network biology: understanding the cell’s functional organization, Nat. Rev. Genet. 5 (2) (2004) 101–113.
[43] S.H. Strogatz, Exploring complex networks, Nature 410 (6825) (2001) 268–276.
[44] C. Chothia, L. Michael, Structural patterns in globular proteins, Nature 261 (1976) 552–558.
[45] C.A. Orengo, et al., CATH – a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093–1109.
[46] A.G. Murzin, et al., SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol. 247 (4) (1995) 536–540.
[47] A.M. Lesk, Introduction to protein architecture, 2001.
[48] N.V. Dokholyan, et al., Topological determinants of protein folding, Proc. Natl. Acad. Sci. 99 (13) (2002) 8637–8641.
[49] A. del Sol, et al., Residues crucial for maintaining short paths in network communication mediate signaling in proteins, Mol. Syst. Biol. 2 (1) (2006).
[50] S. Kundu, Amino acid network within protein, Physica A 346 (1) (2005) 104–109.
[51] M. Aftabuddin, S. Kundu, Weighted and unweighted network of amino acids within protein, Physica A 369 (2) (2006) 895–904.
[52] D.J. Watts, Six Degrees: The Science of a Connected Age, Vintage, London, ISBN: 978-0-09-944496-1, 2003.
[53] H.M. Berman, et al., The protein data bank, Nucleic Acids Res. 28 (1) (2000) 235–242.
[54] R.A. Sayle, E.J. Milner-White, RASMOL: biomolecular graphics for all, Trends Biochem. Sci. 20 (9) (1995) 374–376.
[55] V. Batagelj, A. Mrvar, Pajek-program for large network analysis, Connections 21 (2) (1998) 47–57.
[56] M. Schiffer, A.B. Edmundson, Use of helical wheels to represent the structures of proteins and to identify segments with helical potential, Biophys. J. 7 (2) (1967) 121.
[57] A.V. Guzzo, The influence of amino acid sequence on protein structure, Biophys. J. 5 (6) (1965) 809.
[58] J.W. Prothero, Correlation between the distribution of amino acids and alpha helices, Biophys. J. 6 (3) (1966) 367.
[59] D. Kotelchuck, H.A. Scheraga, The influence of short-range interactions on protein conformation, II. A model for predicting the α-helical regions of proteins, Proc. Natl. Acad. Sci. 62 (1) (1969) 14–21.
[60] P.N. Lewis, D. Kotelchuck, H.A. Scheraga, Helix probability profiles of denatured proteins and their correlation with native structures, Proc. Natl. Acad. Sci. 65 (4) (1970) 810–815.
[61] M. Froimowitz, G.D. Fasman, Prediction of the secondary structure of proteins using the helix-coil transition theory, Macromolecules 7 (5) (1974) 583–589.
[62] Y. Zhang, Progress and challenges in protein structure prediction, Curr. Opin. Struct. Biol. 18 (3) (2008) 342–348.
[63] G.S.M. John, C. Rose, S. Takeuchi, Understanding Tools and Techniques in Protein Structure Prediction, INTECH Open Access Publisher.
[64] A.E. Marquez-Chamorro, et al., Soft computing methods for the prediction of protein tertiary structures: A survey, Appl. Soft Comput. 35, 398–410.
[65] M. Vendruscolo, E. Kussell, E. Domany, Recovery of protein structure from contact maps, Fold. Des. 2 (5) (1997) 295–306.
[66] J. Glasgow, T. Kuo, J. Davies, Protein structure from contact maps: A case-based reasoning approach, Inf. Syst. Front. 8 (1) (2006) 29–36.
[67] M.J. Zaki, Mining protein contact maps, in: The 3rd ACM SIGKDD Workshop on Data Mining in Bioinformatics, BIOKDD, 2003.
[68] M.O. Swaroopa, K.S. Vani, Mining dense patterns from off diagonal protein contact maps, Int. J. Comput. Appl. 49 (12) (2012).
[69] S. Bhavani, S. Sinha, Mining of protein contact maps for protein fold prediction, Wiley Interdiscip. Rev.: Data Min. Knowl. Discovery 1 (4) (2011) 362–368.
[70] O. Bedoya, I. Tischer, Reducing dimensionality in remote homology detection using predicted contact maps, Comput. Biol. Med. 59 (2015) 64–72.
[71] G. Faure, A.l. Bornot, A.G. De Brevern, Analysis of protein contacts into protein units, Biochimie 91 (7) (2009) 876–887.
[72] R. Bonneau, et al., Contact order and ab initio protein structure prediction, Protein Sci. 11 (8) (2002) 1937–1944.
[73] L. Bartoli, et al., The pros and cons of predicting protein contact maps, in: Protein Structure Prediction, Springer, 2008, pp. 199–217.
[74] Y. Shi, et al., Protein contact order prediction from primary sequences, BMC Bioinformatics 9 (1) (2008) 255.
[75] A. Ramanathan, P.K. Agarwal, C.J. Langmead, Using tensor analysis to characterize contact-map dynamics of proteins, 2008.
[76] U. Gobel, et al., Correlated mutations and residue contacts in proteins, Proteins-Struct. Funct. Genet. 18 (4) (1994) 309–317.
[77] N. Gupta, N. Mangal, S. Biswas, Evolution and similarity evaluation of protein structures in contact map space, Proteins: Struct. Funct. Bioinform. 59 (2) (2005) 196–204.
[78] K. Park, M. Vendruscolo, E. Domany, Toward an energy function for the contact map representation of proteins, Proteins: Struct. Funct. Bioinform. 40 (2) (2000) 237–248.
[79] G.-Z. Zhang, D.-S. Huang, Z.-H. Quan, Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction, Pattern Recognit. Lett. 26 (10) (2005) 1543–1553.
[80] G. Shackelford, K. Karplus, Contact prediction using mutual information and neural nets, Proteins: Struct. Funct. Bioinform. 69 (S8) (2007) 159–164.
[81] J. Moult, et al., Critical assessment of methods of protein structure prediction-Round VII, Proteins: Struct. Funct. Bioinform. 69 (S8) (2007) 3–9.
[82] A.N. Tegge, et al., NNcon: improved protein contact map prediction using 2D-recursive neural networks, Nucleic Acids Res. 37 (suppl 2) (2009) W515–W518.
[83] J. Eickholt, J. Cheng, Predicting protein residue-residue contacts using deep networks and boosting, Bioinformatics 28 (23) (2012) 3066–3072.
[84] B. Xue, E. Faraggi, Y. Zhou, Predicting residue-residue contact maps by a two-layer, integrated neural-network method, Proteins: Struct. Funct. Bioinform. 76 (1) (2009) 176–183.
[85] A. Vullo, I. Walsh, G. Pollastri, A two-stage approach for improved prediction of residue contact maps, BMC Bioinformatics 7 (1) (2006) 1.
[86] P. Chen, et al., Predicting contact map using radial basis function neural network with conformational energy function, Int. J. Bioinform. Res. Appl. 4 (2) (2008) 123–136.
[87] W. Chu, et al., Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction, IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 3 (2) (2006) 98–113.
[88] A.A. Abu-Doleh, O.M. Al-Jarrah, A. Alkhateeb, Protein contact map prediction using multi-stage hybrid intelligence inference systems, J. Biomed. Inform. 45 (1) (2012) 173–183.
[89] Y. Li, Y. Fang, J. Fang, Predicting residue-residue contacts using random forest models, Bioinformatics 27 (24) (2011) 3379–3384.
[90] G. Pollastri, P. Baldi, Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners, Bioinformatics 18 (suppl 1) (2002) S62–S70.
[91] J. Cheng, P. Baldi, Improved residue contact prediction using support vector machines and a large feature set, BMC Bioinformatics 8 (1) (2007) 1.
[92] P. Fariselli, et al., Prediction of contact maps with neural networks and correlated mutations, Protein Eng. 14 (11) (2001) 835–843.
[93] C.W. Howe, M.S. Mohamad, Protein residue contact prediction using support vector machine, World Acad. Sci. Eng. Technol. (60) (2011) 1985–1990.
[94] C.E. Santiesteban-Toca, et al., Short-Range interactions and decision tree-based protein contact map predictor, in: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Springer, 2012, pp. 224–233.
[95] A.K. Mandle, P. Jain, S.K. Shrivastava, Protein structure prediction using support vector machine, Int. J. Soft Comput. (IJSC) 3 (1) (2012) 67–78.
[96] C.E.S. Toca, M. Garcia-Borroto, J.S.A. Ruiz, Using short-range interactions and simulated genetic strategy to improve the protein contact map prediction, in: Pattern Recognition, Springer, 2012, pp. 166–175.
[97] A. Deka, K.K. Sarma, Artificial neural network aided protein structure prediction, Int. J. Comput. Appl. 48 (18) (2012) 33–37.
[98] C.E. Santiesteban-Toca, J.S. Aguilar-Ruiz, A new multiple classifier system for the prediction of protein’s contacts map, Inform. Process. Lett. 115 (12) (2015) 983–990.
[99] B. Monastyrskyy, et al., Evaluation of residue-residue contact prediction in CASP10, Proteins: Struct. Funct. Bioinform. 82 (S2) (2014) 138–153.
