
Mathematical Biosciences 262 (2015) 147–156

Contents lists available at ScienceDirect

Mathematical Biosciences
journal homepage: www.elsevier.com/locate/mbs

Bioinformatics in protein kinases regulatory network and drug discovery


Qingfeng Chen a,b,*, Haiqiong Luo c, Chengqi Zhang d, Yi-Ping Phoebe Chen e,*

a School of Computer, Electronic and Information, Guangxi University, Nanning, 530004, China
b State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, China
c School of Public Health, Guangxi Medical University, Nanning, 530021, China
d Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney P.O. Box 123, Broadway, NSW 2007, Australia
e Department of Computer Science and Computer Engineering, La Trobe University, Vic 3086, Australia

Article info

Article history:
Received 18 July 2014
Revised 16 January 2015
Accepted 22 January 2015
Available online 2 February 2015

Keywords:
Bioinformatics
Data mining
Protein kinase
Drug
Pathway
Disease

Abstract

Protein kinases have been implicated in a number of diseases, where kinases participate in many aspects that control cell growth, movement and death. The deregulated kinase activities and the knowledge of these disorders are of great clinical interest for drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment for less cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics, including genetic, genomic, mathematical and computational technologies, has become the most promising option for effective drug discovery, and has shown its potential in the early stages of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arising from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets, and describes their potential application in the pharmaceutical industry.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Protein kinases are viewed as the second most important group of drug targets after G-protein-coupled receptors [22]. There are several groups of protein kinases, and each group is further classified into families or subfamilies [42]. Protein kinases are clinically relevant, and abnormal kinase activity is a frequent cause of a number of human diseases. Nearly 400 human diseases have been reported to be connected to protein kinases, such as cancer [74,87], cardiovascular disease [73,100], neurological disorders [40,77], diabetes [76,83], rheumatoid arthritis [84,101], and asthma [39,95,99]. Kinase activity is highly regulated by phosphorylation, by binding of activator proteins or inhibitor proteins [65], or by changes in cellular location. Statistics indicate that the human genome contains about 500 protein kinase genes, nearly 2% of all human genes [64]. Kinase activity has a significant effect on up to 30% of all human proteins. Thus, the investigation of features of kinase activity is an attractive therapeutic and pharmaceutical strategy for drug design and the treatment of human diseases.

The traditional approach to drug discovery depends on trial-and-error testing of new chemical entities on cultured cells or animals, and matching the apparent effects to treatments. Ligand-based drug design (indirect drug design) and structure-based drug design (direct drug design) are the two major types of drug design [43]. However, a drug developed by traditional methods might not be appropriate for all patients. Some patients may be at risk of suffering serious side effects from a new drug. Further, traditional methods that largely depend on organism-level experimentation are time-consuming and costly.

Bringing a new drug to market from scratch typically takes 15 years and costs about $500 million. The pharmaceutical industry is constantly searching for a better understanding of fundamental disease mechanisms, and for tools for early diagnosis and even pre-diagnosis of disease [52]. The successful sequencing of the genomes of human and other organisms in the past few years has opened the way to an entirely new approach to drug design. A wealth of information on the genetic makeup of patients is available thanks to bioinformatics. A number of algorithms and tools from data mining, machine learning, artificial intelligence and statistics have been successfully used for gene identification and classification, secondary structure prediction and function annotation [25,32]. In particular, some of them have been applied to complement disease diagnosis and the treatment of illnesses [77]. They are likely to play an essential role in drug target discovery.

* Corresponding author. Tel.: +61 394796768.
E-mail addresses: qingfeng@gxu.edu.cn (Q. Chen), hqluo@163.com (H. Luo), chengqi.zhang@uts.edu.au (C. Zhang), phoebe.chen@latrobe.edu.au (Y.-P.P. Chen).

http://dx.doi.org/10.1016/j.mbs.2015.01.010
0025-5564/© 2015 Elsevier Inc. All rights reserved.

Recent studies have shown that structural variations [36] of the genome are implicated in a number of diseases and medical conditions, ranging from genetic disorders to cancer, and are seen as increasingly important in pharmaceutical research and development and in medical diagnostics [69]. The genome approach offers a blueprint for efficient and personalized drug development. The tremendous volume of genome data makes computational biology central to developing more advanced and highly customized therapies [31]. In addition, the technology may increase the efficiency and effectiveness of tests for the diagnosis of disease and patient-specific risk factors, and of tools for identifying individual patients likely to suffer side effects from taking certain drugs.

Many valuable genome data are generated by high-throughput biological techniques. Their management becomes a critical issue. This may include gene data collection, gene annotation, structure data modeling, standardization and normalization of data, database establishment, and so forth. Further, communication between heterogeneous biological databases is another key issue and often requires data integration by removing inconsistent and noisy data [16].

There has been an explosion of knowledge about signal transduction pathways, impacting virtually all areas of biology and medicine. Protein kinases are key regulators of cell function that constitute one of the largest and most functionally diverse gene families [49]. Further, many diseases, such as cancer, diabetes and neurodegeneration, indicate underlying gene correlations to the disease phenotype. Gaining in-depth knowledge of kinase pathways and their possible role in disease states is a big challenge [56]. There is still a long way to go in combining molecular biology, biochemistry, genetics and bioinformatics for modern drug development [15].

Many protein kinase databases have been created to explore the genomics, function and evolution of protein kinases. However, the relevant data analysis and knowledge extraction for modern drug discovery remain underdeveloped. It is impractical to handle this arduous and challenging problem by relying on traditional biological experiments alone. In this regard, bioinformatics [53], which includes aspects of computer science, mathematics and molecular biology, has become integral to the process in that field.

Bioinformatics has been widely applied to investigate the regulatory mechanisms of protein kinases, including their structural and functional features [15]. Further, some researchers have attempted to build a human protein kinase gene family and repository, by which to identify kinases that have a high probability of impacting human disease based on data analysis [93]. This makes it possible to discover useful information from existing valuable data resources and greatly benefits the pharmaceutical industry by increasing accuracy and saving cost. Although there have been many successful applications of bioinformatics to kinases and drug discovery, both its importance and its resource requirements are still easily underestimated [82]. This article aims to provide a literature review of the recent application and development of bioinformatics, protein kinase regulatory networks and drug discovery.

2. Protein kinase and diseases

Consistent with the complex role of post-translational modification in the cell, protein kinases can be regulated by activator proteins, inhibitor proteins, ligand binding to regulatory subunits, cofactors, and phosphorylation by other proteins or by themselves (autophosphorylation) [46,48]. Edmond H. Fischer and Edwin G. Krebs were awarded the 1992 Nobel Prize in Physiology or Medicine for discovering reversible protein phosphorylation as a biological regulatory mechanism.

Protein kinases have been viewed as a very attractive target class for therapeutic interventions in many disease states such as cancer, diabetes, obesity, autoimmune disorders, inflammation, vascular diseases, and arthritis, owing to their families' key function in signal transduction for all organisms. In this regard, protein kinases represent as much as 30% of all protein targets under investigation by pharmaceutical companies. Protein kinases are novel and excellent drug targets of the post-genomic era. Recent successful launches of drugs with kinase inhibition as the mode of action demonstrate the ability to deliver kinase inhibitors as drugs with the appropriate selectivity, potency, and pharmacokinetic properties [22,96].

Bioinformatics is widely applied in identifying kinase-disease associations [81]. Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, www.ingenuity.com) was used to construct sub-networks significantly associated with OSA (obstructive sleep apnea) [11]. The results indicate a novel association of phosphoinositide 3-kinase, the STAT family of proteins and its related pathways with OSA. IPA, MetaCore (http://www.genego.com/metacore.php) and the sigPathway algorithm (http://watson.nci.nih.gov/bioc_mirror/packages/2.3/bioc/html/sigPathway.html) were used to understand the basis for drug efficacy in the mouse model, and thereby to map similarities in mTOR pathways in human lupus nephritis [72]. Gene set enrichment analysis (GSEA) version 2.0 [89] identified biological pathways associated with resistance for each chemotherapy agent tested. Based on the rank-ordered gene list provided by GSEA, the top genes up-regulated and down-regulated for docetaxel resistance were analyzed using the connectivity map (cmap) to link genes associated with a phenotype with potential therapeutic agents, such as the association between phosphatidylinositol 3-kinase/AKT and docetaxel resistance.

The kinase-disease associations can be seen at http://www.cellsignal.com/reference/kinase_disease.html. It provides information on kinases, including their groups, disease types and molecular basis. Using Hanks' classification scheme [41,42], the human protein kinases can be clustered into groups, families and subfamilies on the basis of the amino acid sequence similarity of their catalytic domains. The reported diseases mainly consist of cancer, diabetes, cardiovascular, behavior, cardiopulmonary, neurodegeneration, vision, cognition, hypertension and inflammation. Table 1 presents a summary of kinase-disease associations. It is observed that a disease type can be related to multiple kinase groups, and several diseases can arise from a common set of kinase groups. As a result, the investigation of their associations is useful for creating regulatory pathways for drug discovery. For example, AMPK (AMP-activated protein kinase) may have a regulatory role in metabolism [38], and the therapeutic value of activating AMPK in diabetes or metabolic syndrome is described in [20].

Table 1
Summary of kinase-disease associations.

Disease types       Kinase group
Cancer              AGC, atypical, CAMK, CK1, CMGC, RGC, TK, TKL, STE,
Development         AGC, atypical, CMGC, RGC, STE, TK, TKL
Diabetes            AGC, CMGC, TK
Cardiovascular      AGC, CAMK, CMGC, TKL
Behavior            CK1, TKL
Hypertension        AGC, CAMK, RGC
Neurodegeneration   AGC, CAMK, CMGC, CK1
Inflammation        CMGC, STE, TKL
Vision              AGC, RGC, TK
Epilepsy            CAMK, TK
Cognition           AGC, CMGC, STE, TKL
Immunity            AGC, TK
Reproduction        AGC, TKL

Abnormal protein phosphorylation has been proved to be a primary cause of disease. There has been growing interest in developing active kinase inhibitors. A number of kinase-targeted drugs have been validated by experiments. Despite some unsuccessful cases, some of them have been approved for clinical trials. Protein kinases have now become the second most

important group of drug targets after G-protein-coupled receptors [22]. The target kinases for the approved drugs can be seen at http://en.wikipedia.org/wiki/Protein_kinase_inhibitor, which presents some examples of kinase inhibitors in clinical use or trials and a comparison of available agents used as human medicines.

The first oncogene was shown to be a protein kinase in 1978 [24]. It provided the first connection between abnormal protein phosphorylation and disease. In the early 1980s, the first protein-kinase inhibitors, naphthalene sulfonamides, were introduced by Hiroyoshi, from which several compounds were developed as inhibitors of protein kinases. Although many other compounds that were developed subsequently are of relatively low potency in inhibiting protein kinases, HA1077 (also known as AT877) progressed to human clinical use in the early 1990s and was approved in Japan in 1995 for the treatment of cerebral vasospasm. At the beginning, all inhibitors were ATP competitive. Given the three-dimensional structure of protein kinases, it then proved difficult to develop protein-kinase inhibitors with the requisite potency and specificity, since the residues involved in binding ATP are conserved from kinase to kinase. In 1994, the first nanomolar inhibitors of receptor protein tyrosine kinases were developed, and cytokine-synthesis anti-inflammatory drugs were shown to inhibit p38 MAPK. Gleevec (CGP57148, STI-571) was in human clinical trials for the treatment of CML in 1996. Only a small fraction of protein-kinase inhibitors can be used as drugs, because of problems with toxicity, pharmacology or solubility; SB203580 is one example. Such compounds can nevertheless be very useful research reagents.

The AMP-activated protein kinase (AMPK) system is a regulator of energy balance at both the cellular and whole-body levels that, once activated by low energy status, effects a switch from ATP-consuming anabolic pathways to ATP-producing catabolic pathways. It now appears to be the major target for two existing classes of drug used to treat type 2 diabetes, i.e., the biguanides and thiazolidinediones. However, in both cases these activate AMPK indirectly, and an interesting question is whether a drug that directly activated AMPK would retain the therapeutic benefits of the existing drugs while eliminating unwanted side effects. AMPK activators also now have potential as anticancer drugs.

The rapid development of bioinformatics, such as high-throughput techniques and biological big data analysis, provides a smarter way to identify or predict inhibitors of the activity and activation of particular protein kinases, based on a detailed understanding of their catalytic and regulatory characteristic patterns, and is likely to play an increasingly important role in clinical use.

A number of kinase-targeted drugs have been developed in the past few years. Although most were approved by the FDA, some are still in clinical trials, such as Fostamatinib by Rigel Pharmaceuticals and Lenvatinib by Eisai Co. Several inhibitors have proved unsuccessful and have possibly been abandoned or withdrawn, such as Mubritinib by Takeda and Vandetanib by AstraZeneca. There were 27 FDA-approved protein kinase inhibitors by April 2014.

Protein kinases have been viewed as the most important class of drug targets for cancer [23] in the pharmaceutical industry. In the past decade, 20 kinase-targeted drugs have been approved for clinical use, and hundreds more are undergoing clinical trials. Further, the recent approval of the first protein kinase inhibitors for the treatment of inflammatory diseases, in combination with an enhanced understanding of the signaling networks that control the immune system, could lead to a surge of interest in this area in the future. For example, there have been efforts to investigate the connections between kinases and signaling networks, such as targeting protein kinases in the MyD88 signaling network for the development of drugs to treat chronic inflammatory and autoimmune diseases, targeting the protein kinases SPAK/OSR1 and LRRK2 for the treatment of hypertension and Parkinson's disease, and the progress in developing LRRK2 inhibitors.

3. Bioinformatics and protein kinase

3.1. Kinase data

Owing to the application of advanced biological techniques, there has been a rapid growth of diverse protein kinase data. This tremendous volume of data is not only a valuable resource for a deep understanding of kinase pathways but also a big challenge for further systematic data analysis. A number of kinase databases have been established for storage, management and other purposes. Table 2 presents a collection of data sources of protein kinases.

KKB (kinase knowledgebase) is Eidogen-Sertanty's database of kinase structure-activity and chemical synthesis data. Eidogen-Sertanty's knowledgebase provides high quality training sets for computational scientists to build predictive QSAR models. It is designed to support medicinal chemists during all project stages of drug discovery, and aims to ensure that clients receive all the relevant information around their targets and anti-targets.

KinBase explores the functions, evolution and diversity of protein kinases, the key controllers of cell behavior. It focuses on the kinome, the full complement of protein kinases in any sequenced genome. The kinome of an organism represents the set of protein kinases in its genome [50]. The term was first used by Gerard Manning and colleagues in their papers analyzing the 518 human protein kinases [64] and the evolution of protein kinases throughout eukaryotes. KinBase holds information on over 3000 protein kinase genes found in the genomes of human and many other sequenced organisms. Users can search the database by different gene names and accessions, or in terms of the sequence-based classification. It provides BLAST analysis of the human kinome by comparing the mouse kinome to the human kinome, and carries out kinome analysis of several important organisms.

Kinweb includes a collection of protein kinases encoded in the human genome. It provides: a comprehensive analysis of functional domains of each gene product; a prediction of secondary and tertiary structure motifs by using machine learning based programs; a

Table 2
Resources of information about kinases.

Data source                                                            Contents
KinMutBase (http://bioinf.uta.fi/KinMutBase/)                          Information about kinases, gene and mutation in KinMutBase, and alignments of sequences in KinMutBase
Kinasource (http://www.kinasource.co.uk/Database/substrates.html)      Kinase substrate
Kinase.com (http://kinase.com/)                                        Kinome, the full complement of protein kinases in any sequenced genome
Kinasedb (http://kinasedb.ontology.ims.u-tokyo.ac.jp:8081/)            Classification of protein kinases and their functional conservation ortholog tables among species, protein-protein, protein-gene, and protein-compound interaction data, domain information, and structural information
Kinweb (http://www.itb.cnr.it/kinweb/)                                 A collection of protein kinases encoded in the human genome
ProQinase (http://www.proqinase.com/content/view/82)                   Protein Kinase Technology Platform for preclinical drug development of protein kinase inhibitors in oncology and other therapeutic areas
KSD (http://sequoia.ucsf.edu/ksd/)                                     A comprehensive compilation of aliases for each kinase

collection of conserved sequence elements identified by comparative analysis of human kinase genes and their murine counterparts, useful for the identification of additional coding sequences. Users can query the database by gene name, or by selecting a classification group or a protein domain.

KSD (kinase sequence database) is a collection of protein kinase sequences grouped into families by homology of their catalytic domains. The aligned sequences are available in MS Excel format, as well as in HTML. The current version of the database features a total of 287 families, which contain 7128 protein kinases from 948 organisms.

KPD (kinase pathway database) is an integrated database involving major completely sequenced eukaryotes. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). With this database, pathways can be compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables.

Despite the valuable protein kinase data, their deep analysis is insufficient and communication between the data sources is underdeveloped. This prevents us from obtaining a comprehensive understanding of the kinase pathways and their potential usage in drug discovery. Thus, it is critical to apply bioinformatics to deal with protein kinase data for accurate and efficient drug design.

3.2. Phosphorylation

Protein phosphorylation is one type of post-translational modification of proteins, in which a serine (S), threonine (T) or tyrosine (Y) residue is phosphorylated by a protein kinase through the addition of a phosphate group. Regulation of proteins by phosphorylation, namely phosphoregulation, has been viewed as one of the most widely used modes of activating, deactivating, or otherwise changing protein function. Phosphorylation has been proved to play central roles in regulating the subcellular location and activity of the transcriptional repressor [30].

Kinases have been recognized as a metabolic master switch regulating the majority of cellular pathways. A protein kinase alters other proteins by chemically adding phosphate groups to them. Phosphorylation usually leads to a functional alteration of the target protein by changing enzyme activity, cellular location, or association with other proteins [9,12]. Kinases regulate many aspects that control cell growth, movement and death. Thus, it is critical to explore the correlations between phosphorylation sites and protein kinases, especially the distribution properties of amino acids around the phosphorylation site of a specific kinase group.

In recent years, large-scale phosphorylation experiments have been performed to investigate the characteristic patterns of phosphopeptides. A large number of phosphorylation sites have accumulated in various databases, such as Phospho.ELM and PhosphoSitePlus. There have been considerable efforts in phosphorylation prediction. Computational methods for the prediction of eukaryotic phosphorylation sites were recently reviewed by Trost and Kusalik [94], including the machine learning techniques used for handling sequence information and structural information. This review summarizes, classifies and compares the computational techniques for phosphorylation site prediction, and offers an overview of the challenges that are faced when designing predictors and how they have been addressed. It helps users to choose an appropriate phosphorylation site predictor for their specific or general biological application, or to attempt to extend or adapt existing techniques in the future.

The reversible phosphorylation of proteins is responsible for almost all aspects of cell life, while abnormal phosphorylation is a cause or consequence of many diseases [21]. Mutations in particular protein kinases and phosphatases lead to a number of disorders, and many naturally occurring toxins and pathogens exert their effects by changing the phosphorylation states of intracellular proteins. There are about 518 protein kinase genes, constituting about 2% of all human genes [64]. It is estimated that one third of the proteins can be phosphorylated and about half of the kinase groups are connected to diseases like cancer [97]. Some specific inhibitors of protein kinases have been reported for the treatment of cancer and chronic inflammatory diseases [37]. For example, the protein kinase GSK3 may facilitate the development of drugs to treat diabetes [86].

A variety of methods have been developed to explore phosphorylation. They focus on applying biological experiments, such as mass spectrometry [63], to identify phosphorylation sites. However, traditional experimental methods are not only costly but also easily affected by many other reactions. Further, most discovered phosphorylation sites carry no kinase information at all. For example, fewer than 12% of the records in the Phospho.ELM database [29] are annotated with kinase groups. A number of computational methods have been proposed to deal with the rapid growth of phosphorylated protein data, such as KinasePhos2.0 using SVM [97], NetPhosK using a neural network [8], PPSP using statistical methods, and PredPhospho based on a support vector machine [55]. Unfortunately, these algorithms aim to identify phosphorylation sites, and are only able to deal with short peptide sequences around phosphorylation sites. This may result in substantial loss of information. Moreover, simply increasing the sequence fragment length may introduce useless information, which can have an unexpected impact on the predicted results.

Protein kinases not only bind the domain adjacent to the phosphorylation site of the protein substrate, but also attempt to bind sequences from remote domains, such as docking sites [19]. In addition, there also exist some known PBDs (phosphopeptide-binding domains) relevant to phosphopeptide binding [90]. Although docking sites and PBDs are somewhat distant from the phosphorylation site, they are important for its identification since both show some degree of sequence conservation. Thus, it is necessary to take the remote domains into account to find interesting feature patterns of phosphorylation sites.

3.3. Mathematical models of kinase activity

Mathematical modeling is one of the most important methods for depicting the integrated qualitative, dynamic and topological behavior or activity of an intracellular signaling network, including the diverse relationships between the components involved. Many computational modeling techniques are applied for different purposes, such as Bayesian networks, HMMs (hidden Markov models), neural networks, etc. Mathematical models can take many forms, including dynamic systems, differential equations, statistical models, or logical models. Although a model is initially a topological representation of its components and their correlations, its description of the biological system's dynamic behavior gives it predictive power. The most important task is to uncover and interpret the unexpected behavior encoded in the design of biological networks. A collection of models of the MAPK pathway are discussed in [57] with different emphases, such as dual phosphorylation and feedback loops. In general, a mathematical model represents a system as a set of variables and a set of equations that establish relationships between the variables.

Kinetic equations are applied to describe all molecular interactions in [80]. The dissociation reaction is written as:

$$A + B \underset{k_-}{\overset{k_+}{\rightleftharpoons}} A{\cdot}B \underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}} C \qquad (1)$$

where $k_+$ and $k_{\mathrm{on}}$ represent the diffusion-dependent movement toward the $A{\cdot}B$ complex, and the following reaction, respectively.
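As a rough illustration of the two-step scheme in Eq. (1), the following Python sketch integrates the corresponding mass-action rate equations with a simple explicit Euler step. The function name, the rate constants and the initial concentrations are hypothetical choices for demonstration only; they are not values taken from [80], and a production model would normally use a dedicated ODE solver.

```python
def simulate_two_step_binding(a0, b0, k_plus, k_minus, k_on, k_off,
                              dt=1e-3, steps=20000):
    """Integrate A + B <-> A.B <-> C with explicit Euler steps.

    Returns the final concentrations (A, B, AB, C). All arguments are
    illustrative; their units only need to be mutually consistent.
    """
    a, b, ab, c = a0, b0, 0.0, 0.0
    for _ in range(steps):
        # Net flux of the diffusion-dependent encounter step A + B -> A.B.
        encounter = k_plus * a * b - k_minus * ab
        # Net flux of the subsequent reaction step A.B -> C.
        reaction = k_on * ab - k_off * c
        a -= encounter * dt
        b -= encounter * dt
        ab += (encounter - reaction) * dt
        c += reaction * dt
    return a, b, ab, c


if __name__ == "__main__":
    # Hypothetical rate constants and initial concentrations.
    a, b, ab, c = simulate_two_step_binding(
        1.0, 1.0, k_plus=1.0, k_minus=0.5, k_on=2.0, k_off=1.0)
    print(f"A={a:.4f} B={b:.4f} AB={ab:.4f} C={c:.4f}")
```

With these assumed values the concentrations approach the steady state A = B = 1/3, A·B = 2/9 and C = 4/9, and the total amount of A (free, complexed and converted) stays constant throughout, which is a quick sanity check on the rate equations.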

Suppose $k_f$ and $k_r$ are the overall forward rate constant and the overall reverse rate constant, respectively. They are defined as

$$k_f = \frac{k_{\mathrm{on}}\,k_+}{k_{\mathrm{on}} + k_+} \qquad (2)$$

$$k_r = \frac{k_{\mathrm{off}}\,k_-}{k_{\mathrm{off}} + k_-} \qquad (3)$$

Kinetic parameters and state variables are the two primary components of the mathematical model; the state variables represent the state of the system at a given time, such as the number of molecules of a specific compound. As indicated by the authors, the kinetic parameters are derived from the literature and include Michaelis-Menten constants, turnover numbers, and rate constants of association and dissociation. The model is based on ordinary differential equations (ODEs) and comprises 94 state variables and 95 parameters. Another kinetic rate-based model [92] is proposed for the MAP kinase pathway, which comprises a cytosolic subsystem and a nuclear subsystem. In particular, proteomic data are used for inferring kinetic parameters and developing a more accurate signaling pathway. A genetic algorithm is applied to estimate the parameters, and a MATLAB toolbox is used to infer the unknown rate constants.

Macromolecular crowding has been proved to affect intracellular signaling pathways. However, the understanding of how macromolecular crowding affects the overall reaction rate and the processivity of enzymes is still insufficient. Mathematical models of reaction kinetics are proposed to address the effect of thermodynamic activity, viscosity and processivity in an environment with macromolecular crowding [3]. Further, they provide empirical validation using in vitro ERK MAP kinase phosphorylation.

A mathematical theory [45] is developed for modeling linear kinase-phosphatase cascades, as well as systems containing feedback interactions, crosstalk with other signaling pathways, and/or scaffolding and G proteins. Three key questions must be answered for any signal transduction system: (1) how fast the signal arrives at its destination, (2) how long the signal lasts, and (3) how strong the signal is. The authors introduce three parameters to answer these questions: (1) the signal time $\tau_i$, which is the average time to activate kinase $i$, (2) the signal duration $\vartheta_i$, which is the average time during which kinase $i$ is activated, and (3) the signal amplitude $S_i$, which is the average concentration of activated kinase $i$. A phosphorylation step is viewed as a reaction between the phosphorylated form $X_{i-1}$ of kinase $i-1$ within the pathway and the nonphosphorylated form $\tilde{X}_i$ of a downstream kinase $i$. The phosphorylation rate for each reaction is expressed as

$$v_{p,i} = \tilde{\alpha}_i\,X_{i-1}\,\tilde{X}_i \qquad (4)$$

where $\tilde{\alpha}_i$ is the second order rate constant for phosphorylation by the $i$th kinase.

The dephosphorylation rate of the $i$th kinase is defined as

and present relative changes instead of changes in absolute concentrations. Without standardization of measurements, users have to compare the data from different laboratories in a qualitative form. This blocks the use of ordinary differential equations (ODEs), since they require absolute changes in concentrations and exact reaction rates as input. As a result, users expect to move seamlessly between different levels of abstraction that can describe individual reactions, network modules and whole networks.

3.4. Kinase data analysis for drug discovery

Kinase data analysis covers different aspects, such as features of kinase sequences and structures, and kinase classification and identification. Many methods and tools are used to handle kinase data, and it is not feasible to describe them all in this article. We focus on introducing typical bioinformatics algorithms or tools with respect to protein kinases and drug discovery.

Discovery of kinase pathways. To fully understand protein kinase networks, it is critical to identify regulators and substrates of kinases, especially for weakly expressed proteins. To our knowledge, the bioinformatics techniques used include:

• identifying kinase docking sites;
• clustering and visualization of phosphorylation profiles of regulators;
• mining associations between kinase subunits, and between subunits and stimuli;
• others.

A hybrid computational search algorithm has been developed, which integrates machine learning and expert knowledge to discover kinase docking sites. This algorithm was used to search the human genome for novel MAP kinase substrates and regulators, focusing on the JNK family of MAP kinases [91].

Cluster 3.0 [28] is used to classify the phosphorylation profiles of regulators, which are constructed by quantifying response regulator bands in each profile. Further, the profiles are visualized using Java Treeview [75]. This approach is applied to dissect the basis of phosphotransfer specificity in two-component signaling pathways in [10].

Association rule mining is used to analyze AMPK regulation data derived from published experimental results [15]. A number of rules of interest are discovered from mining AMPK data, such as A = {moderate intensity treadmill} ⇒ B = {high expression of 2a, high expression of 2a}. They reveal numerous potential associations between the states of subunit isoforms of AMPK, or between the stimulus factor and the state of isoforms, many of which are useful for drug design. Further, negative association rules [98], of the forms X ⇒ ¬Y, ¬X ⇒ Y or ¬X ⇒ ¬Y, can be applied to investigate potential inhibitive regulatory correlations between the subunit isoforms of protein kinases and the stimulus factors.
Advanced bioinformatics and systems biology tools assist in un-
vd,i = i Xi (5)
locking network properties and modeling kinase pathways. Recent
where i is the rate constant for dephosphorylation by the ith phos- examples are the development of a predictor for breast cancer prog-
phatase. nosis based on the modularity of protein interaction networks and
The concentration of each activated kinase i (except the rst ac- the identication of cancer-associated phosphorylation networks
tivated kinase in the pathway ) is represented as a function of time, through the combined alignment of conserved phosphorylation sites
namely Xi (t), and is given by differential equations: and kinase-substrate networks [58]. Moreover, Fuzzy logic model is
applied to encode the dynamics of a complex intracellular signal-
dXi
= vp,i vd,i = i Xi1 Xi i Xi (6) ing network of protein kinases [2]. In the similar way, an ecient
dt framework based on hidden Markov models (HMMs) is presented for
Computational model has been widely applied to understand the nding homologous pathways in a network of interest [71].
behavior of complex biological systems by offering useful informa- Kinases in gene expression. A rational approach is presented
tion. Although it also performs many provocative predictions that for identifying and ranking protein kinases that are likely respon-
were not experimentally tested, most of the data that biologists gen- sible for observed changes in gene expression [5]. By combining pro-
erate cannot be directly amenable to dynamic modelling. They are moter analysis, it can identify and rank candidate protein kinases
often qualitative instead of quantitative, include sparse time series, for knock-down, or other types of functional validations, based on
genome-wide changes in gene expression. The work describes how protein kinase candidate identification and ranking can be made robust by cross-validation with phosphoproteomics data as well as through a literature-based text-mining approach. Data integration can rapidly advance drug target discovery and help unravel drug mechanisms of action.

Inappropriate activation of AKT signaling is a relatively common occurrence in human tumors, and can be caused by activation of components of this signaling pathway, or by loss or decreased activity of its inhibitors. Causal network modeling is a systematic computational analysis that identifies upstream changes in gene regulation that can serve as explanations for observed changes in gene expression [60,61].

Kinase inhibition. While systems biology and personalized medicine are becoming increasingly important, there is an urgent need to map the inhibition profile of a compound on a large panel of targets by using both experimental and computational methods. This is especially important for kinase inhibitors, given the high similarity at the binding site level for the 518 kinases in the human genome. A new method is proposed and validated to predict the inhibition map of a compound by comparison of binding pockets [67].

Kinase inhibitors have potential for treatment of many diseases. Currently there are a number of drugs approved or in development that target protein kinases. More details can be seen at http://en.wikipedia.org/wiki/Protein_kinase_inhibitor and will not be repeated herein. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor–kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity. Proteochemometric modelling is applied to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors to the interaction dissociation constant ($K_d$) of the respective combinations [62]. Protein kinases as targets for inhibitor design are reported in [65] as follows:

• Receptor tyrosine kinases. Dysregulation of growth factor signaling networks has been reported in multiple human cancers. Binding of growth factors to extracellular domains of receptor tyrosine kinases activates the intracellular kinase domain. Different therapeutic agents targeting EGFR (epidermal growth factor receptor) and VEGFR (vascular endothelial growth factor receptor) have been developed as anticancer drugs.
• Nonreceptor tyrosine kinases. About one third of tyrosine kinases are classed as nonreceptor tyrosine kinases.
• Phosphatidylinositol 3-kinase (PI3K). Resistance to radiation treatment in a number of cancers has been linked to activation of the PI3K-AKT pathway, which suggests that inhibition of PI3K to overcome resistance and to improve the efficacy of radiation treatment is an attractive clinical goal [66].
• Signal-transducing serine-threonine kinases. Selective inhibition of p38 could be a therapeutically useful route to treatment of a number of inflammatory and autoimmune diseases [70].
• Cyclin-dependent kinases (CDKs) and other cell cycle control kinases. Nature has provided many useful CDK inhibitors that have reached clinical trials. High-throughput screening of compound libraries has also discovered many CDK inhibitors.

Computer-aided efforts, including molecular dynamics (MD) simulations and anisotropic network model (ANM) normal mode analysis, are used for generating potential ligand-bound conformers starting from the apo state of p38 [6]. A structure modeling-based method is developed for sequence and structure analysis, which is helpful in identifying the ligand binding sites and molecular function of the Leishmania specific mitogen-activated protein kinase [78]. The SiteAlign algorithm [79] is used to measure the similarity of druggable protein-ligand binding sites to the ATP-binding site of Pim-1 kinase (PDB entry 1yhs with bound inhibitor staurosporine) [27].

Kinase-inhibitor profiling panels are another important kinase-specific topic. Many researchers have proposed high-throughput profiling studies of the interactions between many protein kinases and many chemical compounds. Some of these large data sets are publicly accessible, such as a molecule–kinase interaction map for clinical kinase inhibitors [33] and a systematic interaction map of validated kinase inhibitors with Ser/Thr kinases [34]. A number of informatics studies based on these data have been published, and were recently summarized in [35]. That summary presents machine learning methods either for classification (binds/does not bind) or for regression on the measured inhibition values. Variants of the naïve Bayesian classifier, support vector machine, decision tree and k-nearest neighbors were used to learn associations between kinase residues and compound fragments. Another class of works aims to obtain a more accurate representation of kinase binding sites by taking advantage of kinase 3D structures.

4. Kinase genes and drug targets

Many kinase genes have been identified in the human genome and in other sequenced genomes. For example, KinBase holds information on over 3000 protein kinase genes. Many databases provide tools for searching by a variety of gene names and accessions, or according to the sequence-based classification. The challenge to bioinformatics is evolving from that of identifying long lists of genes to that of determining short lists of the targets most likely to play central roles in diseases.

A variety of gene identification algorithms have been developed in drug discovery, including the identification of positions, secondary structures and functions. Nevertheless, the discussion of these algorithms is not the key point of this article. Instead, this paper focuses on describing the associations between kinase genes and drug targets. It is presented in terms of the following aspects.

• Kinase gene and diseases
• Human kinome in drug discovery

4.1. Kinase genes and diseases

The gene family encoding protein kinases is the most commonly mutated in human cancer. Moreover, mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The project at http://www.sanger.ac.uk/genetics/CGP/Kinases/ examined the full coding sequence of the protein kinase genes (518 protein kinases, more than 1.3 Mb of DNA per sample) in primary cancers and cancer cell lines. Targeting receptor protein tyrosine kinases (RPTKs) has long been a compelling approach to cancer chemotherapy. Preclinical and clinical data strongly support the involvement of specific RPTKs in the formation and progression of a subset of solid and liquid tumors.

As mentioned above, various kinases are related to different diseases, such as cancer and diabetes. To explore the functions, evolution and diversity of protein kinases, it is necessary to turn our focus to the relevant sequenced genomes. This assists us in obtaining a deep and comprehensive understanding of kinase pathways with respect to drug discovery.

It has been commonly recognized that gene mutation, including deletion, insertion and duplication, can result in diseases. No matter which case occurs, the protein made by the gene may not function properly. In other words, the mutation can alter the function of the resulting protein. Thus, the investigation of disease relevant kinase gene
families is critical. Meanwhile, their analysis, including sequences and structure, is also a big challenge for us.

Table 3
An example of kinase-gene associations.

Gene     Species   Kinase classification
AKT1     Human     AGC: Akt
AKT2     Human     AGC: Akt
AKT3     Human     AGC: Akt
BARK1    Human     AGC: GRK: BARK
MAST1    Human     AGC: MAST: MAST

Table 3 presents a list of genes with respect to specified kinase categories. If a gene is found to be related to a kinase group, or to its families or subfamilies, we are able to link genes and diseases via kinases. For example, in Table 1, the AGC kinase group is a frequent cause of many diseases. Kinase types can be viewed as the key by which a comprehensive regulatory network for protein kinases, genes and diseases can be generated.

The leader gene approach, a data mining method for gene search in a specific process, is used to identify the common genomic pathways between periodontitis and type 2 diabetes [26]. ENDEAVOUR software is applied to prioritize all genes of the whole genome in relation to type 2 diabetes [88]. A regularized Bayesian integration system, HEFalMp (Human Experimental/Functional Mapper, http://function.princeton.edu/hefalmp), aims to provide maps of functional activity and interaction networks in over 200 areas of human cellular biology. It allows users to interactively explore functional maps integrating evidence from thousands of genomic experiments, focusing as desired on specific genes, processes, or diseases of interest [54].

4.2. Kinase gene identification

A number of algorithms and tools have been developed for gene identification. As described in [59], they include software tools for (1) ab initio gene prediction, (2) splicing site prediction, and (3) homology-based prediction. Ab initio gene prediction relies on the information in the DNA sequence itself, such as promoters, introns or exons, and uses statistical parameters to predict genes. In contrast, homology-based prediction depends primarily on finding homologous sequences in other genomes and/or in public databases using BLAST or Smith–Waterman algorithms. This method compares newly obtained sequence data from experiments with known gene information. It uses sequence alignment to construct gene models and validates the predicted genes through similarity searches, including sequence similarity or structure similarity. Splicing site prediction is commonly used as a complement in gene prediction tools.

Many software tools are publicly accessible for ab initio gene prediction. Representative works include GeneMark (http://exon.gatech.edu/GeneMark/), GENSCAN (http://genes.mit.edu/GENSCAN.html) and GeneBuilder (http://zeus2.itb.cnr.it/~webgene/genebuilder.html/). GeneMark employs a non-homogeneous Markov model to classify DNA regions into protein-coding, non-coding, and non-coding but complementary to coding. GENSCAN uses a complex probabilistic model of the gene structure that is based on actual biological information about the properties of transcriptional, translational, and splicing signals. GeneBuilder is an integrated computing system for protein-coding gene prediction.

NNSplice (http://www.fruitfly.org/seq_tools/splice.html), Spliceview (http://zeus2.itb.cnr.it/~webgene/wwwspliceview.html) and NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/) are three typical methods for splicing site prediction. NNSplice uses a decision tree method called maximal dependence decomposition (MDD), and enhances it with Markov models that capture additional dependencies among neighboring bases in a region around the splice site. Spliceview is based on prediction of splice signals by a classification approach. In NetGene2, neural networks in combination with sequence similarity are trained to recognize splice sites.

PipMaker (http://pipmaker.bx.psu.edu/pipmaker/) and GeneWise (http://www.ebi.ac.uk/Tools/Wise2/index.html) are tools for homology-based prediction. The former compares two long DNA sequences to identify conserved segments using the alignment engine BlastZ and a scoring matrix. The latter was developed from a combination of hidden Markov models to predict gene structure using similar protein sequences.

Sequence similarity and structure similarity are widely used strategies in gene identification for functional genomics. A variety of approaches have been developed for measuring the similarities, such as BLAST and FASTA for sequence similarity, DALI (a distance alignment matrix method that uses Monte Carlo simulation) [47] for protein alignment and FOLDALIGN for RNA structural alignment [44]. Recent studies indicate that RNA structures perform many important regulatory, structural, and catalytic roles in the cell. They have been found in most organisms and comprise functional domains within ribozymes, self-splicing introns, ribonucleoprotein complexes, viral genomes, and many other biological systems [1]. Association rule mining is applied to identify hidden structure–function patterns within RNA [13,14].

There are also a number of reported approaches and case studies that specialize in kinase gene identification. Thirty-four CDPK genes, which are widespread in plants and constitute a multigene family, were identified by a genome-wide analysis of Arabidopsis CDPKs [17]. However, only a limited number of rice CDPK family members have been characterized. A database search was performed to identify CDPK family genes in the rice genome using entire amino acid sequences from previously identified rice CDPKs and the full-length rice cDNA clones encoding CDPKs [4]. The genomic sequences were selected as candidates according to their TBLASTN scores, and the obtained candidate sequences were analyzed by the BLASTX program and motif scanning to separate CDPKs from other related protein kinases. This search discovered 29 rice CDPKs.

A set of host kinase genes required for influenza virus replication and the regulatory role of microRNAs were identified [7], in which a small interfering RNA (siRNA) screen of 720 HPKs (human protein kinases) was performed. It assists in understanding the role of HPKs and the signaling networks that impact influenza virus replication. Up-regulation and down-regulation of transcript expression of induced protein kinases by miRNAs and protein expression are applied by performing an RNAi-based genetic screen.

Eukaryotic protein kinases (PKs) represent one of the largest protein superfamilies [18]. PKs are related by the presence of a conserved kinase domain (or catalytic domain). Based on substrate specificity, PKs can be further divided into protein tyrosine kinases (PTKs) and protein serine/threonine kinases (PSKs). Genomic DNA is used as a template to perform PCR (polymerase chain reaction) to screen for protein kinase (pk) genes of the round-spotted pufferfish Tetraodon fluviatilis. Forty-one T. fluviatilis pk genes encoding 7 receptor tyrosine kinases, 14 nonreceptor tyrosine kinases, 16 serine/threonine kinases, 1 dual kinase and 3 novel kinases have been identified.

4.3. Challenges in drug discovery

The completion of the Human Genome Project assembled a genetic blueprint for human beings. Genome science has accelerated genetic research and revolutionized the diagnosis, prevention and treatment of various diseases. Genetics has shown its significance in clinical medicine. It collects and studies medical histories and DNA samples from family members. Genetic analysis of these samples assists researchers in finding genes or patterns of genes that differ between affected and unaffected family members and that may be related to the disease. Genetic testing aids in defining clinical indicators of a hereditary predisposition to develop cancers.
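The homology-based prediction described in Section 4.2 relies on local sequence alignment, for which the Smith–Waterman algorithm is named as one option. The following is only a minimal sketch of that algorithm with a linear gap penalty. The function name `smith_waterman` and the scoring values (match = 2, mismatch = −1, gap = −2) are illustrative assumptions and are not taken from any of the tools cited in this review.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    The scoring parameters are illustrative assumptions, not values used by
    any specific tool discussed in the text.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy DNA fragments; only the shared "CGT" segment aligns well.
    print(smith_waterman("AAACGTAA", "CCCGTCC"))
```

Production tools such as BLAST use heuristics and substitution matrices rather than this exhaustive quadratic-time search, so the sketch is meant only to make the underlying recurrence concrete.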
In recent years, there has been an explosive growth in the number of disease genes, such as oncogenes, found through high-throughput screening. For example, the Cancer Gene Index (https://cabig.nci.nih.gov/inventory/data-resources/cancer-gene-index) is a collection of records on 6955 human genes identified from the literature as having an association with cancer. The Diabetes Disease Portal (http://rgd.mcw.edu/) provides gene information for several diseases, such as diabetes, obesity and metabolic diseases. It is observed that genes and diseases might be cross-correlated. Bioinformatics is an efficient way to validate those discovered targets and to sort out short lists of the targets most likely to be critical in disease.

The identification of susceptibility genes for cancers allows testing before symptoms become apparent. If a single gene is responsible, testing during pregnancy or at any other appropriate time of life for the specific cancer may predict a high risk or eliminate concern about that particular cancer. For example, if a person is found to carry an inherited mutation in one of the cancer mutation repair genes, he/she could benefit from a fixed annual examination. Thus, any pathological changes would be detected and eliminated before they progress to a potentially invasive cancer.

Microarrays allow the gene expression profiles of thousands of genes to be measured and compared with each other in cells and tissues between healthy and diseased states. Genes that are functionally associated usually share similar expression profiles. To ensure accuracy, the microarray data are combined with pathways that exist in the context of complex protein–protein interaction networks. It is necessary to use gene ontologies (GO), which describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

Most genetic disorders originate from a mutation in one gene. One of the most difficult issues is to determine how genes contribute to diseases that have a complex pattern of inheritance, such as diabetes, asthma and cancer. In other words, no single gene can definitely determine whether a person has a disease or not. It is possible that several mutations occur before the disease is evident, and many genes may each partially contribute to a person's susceptibility to a disease; genes may also affect a person's reactions to environmental factors. It is a big challenge to unravel these networks of events. This will undoubtedly be assisted by the availability of the sequence of the human genome [51].

Kinase SARfari is an integrated chemogenomics database for kinases, which can be accessed at https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari/. The system focuses on the protein kinase family of drug targets. It combines chemical and biological data and provides a platform that integrates kinase bioactivity data and links bioactivities to kinase sequence, structure, compounds and screening data. It comes populated with diverse chemical and biological data resources, which comprise all human protein kinase sequences and a large number of model organism orthologs, as well as protein kinase clinical candidates and FDA-approved drugs. In particular, more than 700 3D structural domains from the Protein Data Bank (PDB) with complete binding-site focused structural superposition, more than 14,000 protein kinase compounds and more than 57,000 structure–activity relationship (SAR) screening data points are included. This greatly increases productivity and facilitates knowledge discovery.

Most kinase inhibitors under development as drugs act by directly competing with ATP at the ATP-binding site of the kinase. There are more than 500 protein kinases, and the ATP-binding site is highly conserved among them. Therefore selectivity is an essential requirement for clinically effective drugs, and understanding the structural characteristics of ATP-binding sites is of crucial importance. The objective of the study in [68] was to elucidate the structural characteristics of the adenosine-binding site of four major kinase groups, AGC (PKA, PKG, and PKC families), CaMK (calcium/calmodulin-dependent protein kinases), CMGC (CDK, MAPK, GSK3, and CLK families), and TK (tyrosine kinases). To do this, the kinases are classified into groups by using feed-forward multilayer perceptron (MLP) neural networks and structural, electronic, and hydrophobic descriptors of the amino acids at the adenosine-binding site [68].

In general, traditional virtual drug screening has a limitation, namely that it assumes a well-defined binding site. A ligand binding pathway analysis for kinases using long-timescale molecular dynamics simulation was performed by Shan et al. to identify the target binding site [85]. Unlike traditional methods, it does not require any prior knowledge of the binding site's location. This may assist in the development of allosteric inhibitors that target previously undiscovered binding sites. Further, because they can identify previously unknown binding sites, molecular dynamics (MD) simulations of protein-ligand binding may greatly extend the applicability of computational techniques to drug development.

5. Conclusions

The emergence of varied diseases is a big challenge to the current health care system. Much evidence indicates that protein kinases are involved in a number of diseases. Many diseases have their roots in genes. Thus, the study of abnormalities, disorders or mutations of kinase genes is a promising and efficient route to drug target discovery, because they can change the function of genes and have an effect on relevant diseases. The discovery of protein kinase networks, including kinase types, diseases, inhibitors and the kinome, assists us in understanding the root cause of diseases. Bioinformatics, combining genetics and genomics technologies, has become a critical aspect of drug discovery. Applying bioinformatics to target discovery and data management can not only increase the efficiency and accuracy of drug design but also enable personalized drug development. This article reviews the techniques for protein kinase networks, including kinase data, kinase inhibitors, kinase gene families, and their properties. Further, the relevant algorithms, tools and the associations between the kinome and drug discovery are highlighted.

Acknowledgments

The work reported in this paper was partially supported by a National Natural Science Foundation of China project 61363025 and two key projects of Natural Science Foundation of Guangxi, Guangxi, China 053006 and 019029.

References

[1] D.P. Aalberts, N.O. Hodas, Asymmetry in RNA pseudoknots: Observation and theory, Nucleic Acids Res. 33 (7) (2005) 2210–2214.
[2] B.B. Aldridge, J. Saez-Rodriguez, J.L. Muhlich, P.K. Sorger, D.A. Lauffenburger, Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/INSULIN-induced signaling, PLoS Comput. Biol. 5 (4) (2009) e1000340.
[3] K. Aoki, K. Takahashi, K. Kaizu, M. Matsuda, A quantitative model of ERK MAP kinase phosphorylation in crowded media, Sci. Rep. 3 (2013) 1541.
[4] T. Asano, N. Tanaka, G. Yang, N. Hayashi, S. Komatsu, Genome-wide identification of the rice calcium-dependent protein kinase and its closely related kinase gene families: comprehensive analysis of the CDPKs gene family in rice, Plant Cell Physiol. 46 (2) (2004) 356–366.
[5] M. Avi, C.H. John, Protein kinase target discovery from genome-wide messenger RNA expression profiling, Mt. Sinai J. Med. 77 (4) (2010) 345–349.
[6] A. Bakan, I. Bahar, Computational generation inhibitor-bound conformers of p38 map kinase and comparison with experiments, Proc. Pac. Symp. Biocomput. (2011) 181–192.
[7] A. Bakre, L.E. Andersen, V. Meliopoulos, et al. Identification of host kinase genes required for influenza virus replication and the regulatory role of microRNAs, PLOS One 8 (2013) e66796.
[8] N. Blom, T. Sicheritz-Ponten, R. Gupta, S. Gammeltoft, S. Brunak, Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence, Proteomics 4 (6) (2004) 1633–1649.
[9] D. Bungard, B.J. Fuerth, P.Y. Zeng, B. Faubert, N.L. Mass, B. Viollet, D. Carling, C.B. Thompson, R.G. Jones, S.L. Berger, Signaling kinase AMPK activates stress-promoted transcription via histone H2B phosphorylation, Science 329 (5996) (2010) 1201–1205.
[10] E.J. Capra, B.S. Perchuk, E.A. Lubin, O. Ashenberg, J.M. Skerker, et al. Systematic dissection and trajectory-scanning mutagenesis of the molecular interface that ensures specificity of two-component signaling pathways, PLoS Genet. 6 (11) (2010) e1001220.
[11] E. Castiglioni, et al. Sequence variations in mitochondrial ferritin: distribution in healthy controls and different types of patients, Genet. Test. Mol. Biomarkers 14 (6) (2010) 793–796.
[12] Q.F. Chen, Y.P.P. Chen, Mining inhibition pathways for protein kinases on skeletal muscle, IEEE Intell. Syst. 27 (5) (2012) 19–26.
[13] Q.F. Chen, Y.P.P. Chen, Mining characteristic relations bind to RNA secondary structures, IEEE Trans. Inform. Technol. Biomed. 14 (1) (2010) 10–15.
[14] Q.F. Chen, Y.P.P. Chen, Discovery of structural and functional features in RNA pseudoknots, IEEE Trans. Knowl. Data Eng. 21 (7) (2009) 974–984.
[15] Q.F. Chen, Y.P.P. Chen, Mining frequent patterns for AMP-activated protein regulation on skeletal muscle, BMC Bioinform. 7 (2006) 394.
[16] C. QF, Y.P.P. Chen, C.Q. Zhang, Detecting inconsistency in biological molecular databases using ontology, Data Min. Knowl. Discov. 15 (2) (2007) 275–296.
[17] S.H. Cheng, M.R. Willmann, H.C. Chen, J. Sheen, Calcium signaling through protein kinases. The arabidopsis calcium-dependent protein kinase gene family, Plant Physiol. 129 (2002) 469–485.
[18] C.M. Chou, W.C. Lin, J.H. Leu, T.L. Su, C.K. Chou, C.J. Huang, Isolation and identification of novel protein kinase genes from the round-spotted pufferfish (Tetraodon fluviatilis) genomic DNA, J. Biomed. Sci. 5 (2) (1998) 127–134.
[19] A.J. Bardwell, E. Frankson, L. Bardwell, Selectivity of docking sites in MAPK kinases, J. Biol. Chem. 284 (2009) 13165–13173.
[20] M.P. Coghlan, D.M. Smith, Introduction to the kinases in diabetes biochemical society focused meeting: are protein kinases good targets for antidiabetic drugs? Biochem. Soc. Trans. 33 (2) (2005) 339–342.
[21] P. Cohen, The role of protein phosphorylation in human health and disease, Eur. J. Biochem. 268 (19) (2001) 5001–5010.
[22] P. Cohen, Protein kinases – the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 1 (4) (2002) 309–315.
[23] P. Cohen, D.R. Alessi, Kinase drug discovery – what's next in the field? ACS Chem. Biol. 8 (1) (2013) 96–104.
[24] M.S. Collett, R.L. Erikson, Protein kinase activity associated with the avian sarcoma virus src gene product, Proc. Natl. Acad. Sci. USA 75 (1978) 2021–2024.
[25] G. Cong, K.L. Tan, K.H. Anthony, T. Xin, X. Xu, Mining top-K covering rule groups for gene expression data, in: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005, pp. 670–681.
[26] U. Covani, S. Marconcini, G. Derchi, A. Barone, L. Giacomelli, Relationship between human periodontitis and type 2 diabetes at a genomic level: a data-mining study, J. Periodontol. 80 (8) (2009) 1265–1273.
[27] E. Defranchi, C. Schalon, M. Messa, F. Onofri, F. Benfenati, D. Rognan, Binding of protein kinase inhibitors to synapsin I inferred from pair-wise binding site similarity measurements, PLoS One 5 (8) (2010) e12214.
[28] M.J. de Hoon, S. Imoto, J. Nolan, S. Miyano, Open source clustering software, Bioinformatics 20 (2004) 1453–1454.
[45] R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of protein kinase signal transduction, Mol. Cell 9 (2002) 957–970.
[46] S. Higashiyama, H. Iwabuki, C. Morimoto, M. Hieda, H. Inoue, N. Matsushita, Membrane-anchored growth factors, the epidermal growth factor family: beyond receptor ligands, Cancer Sci. 99 (2) (2008) 214–220.
[47] L. Holm, C. Sander, Mapping the protein universe, Science 273 (1996) 595–603.
[48] http://www.abgent.com/docs/article_kinases.
[49] http://www.cellsignal.com/reference/kinase/index.html.
[50] http://kinase.com/.
[51] http://www.ncbi.nlm.nih.gov/books/NBK22183/.
[52] http://www.sciencemag.org/site/products/drugdiscnew.xhtml.
[53] L. Hunter (Ed.), Artificial Intelligence and Molecular Biology, MIT Press, 1993.
[54] C. Huttenhower, E.M. Haley, M.A. Hibbs, V. Dumeaux, D.R. Barrett, H.A. Coller, O.G. Troyanskaya, Exploring the human genome with functional maps, Genome Res. 19 (6) (2009) 1093–1096.
[55] J.H. Kim, J. Lee, B. Oh, K. Kimm, I. Koh, Prediction of phosphorylation sites using SVMs, Bioinformatics 20 (2004) 3179–3184.
[56] H. Kitano, A robustness-based approach to systems-oriented drug design, Nat. Rev. Drug Discov. 6 (2007) 202–210.
[57] W. Kolch, M. Calder, D. Gilbert, When kinases meet mathematics: the systems biology of MAPK signalling, FEBS Lett. 579 (8) (2005) 1891–1895.
[58] W. Kolch, A. Pitt, Functional proteomics to dissect tyrosine kinase signalling pathways in cancer, Nat. Rev. Cancer 10 (9) (2010) 618–629.
[59] E.V. Koonin, M.Y. Galperin, Sequence-Evolution-Function: Computational Approaches in Comparative Genomics, Kluwer Academic, 2003.
[60] R. Kumar, S.J. Blakemore, C.E. Ellis, E.F. Petricoin, D. Pratt, M. Macoritto, A.L. Matthews, J.J. Loureiro, K. Elliston, Causal reasoning identifies mechanisms of sensitivity for a novel AKT kinase inhibitor, GSK690693, BMC Genom. 11 (2010) 419.
[61] J. Lamb, E.D. Crawford, D. Peck, et al. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease, Science 313 (2006) 1929–1935.
[62] M. Lapins, J.E.S. Wikberg, Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques, BMC Bioinform. 11 (2010) 339.
[63] J. Lin, Z. Xie, H. Zhu, J. Qian, Understanding protein phosphorylation on a systems level, Brief. Funct. Genom. 9 (1) (2010) 32–42.
[64] G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of the human genome, Science 298 (2002) 1912–1934.
[65] E.M. Noble Martin, A. Endicott Jane, N. Johnson Louise, Protein kinase inhibitors: insights into drug design from structure, Science 303 (5665) (2004) 1800–1805.
[66] W.G. McKenna, R.J. Muschel, Genes chromosomes, Cancer 38 (2003) 330.
[67] F. Milletti, A. Vulpetti, Predicting polypharmacology by binding site similarity: from kinases to the protein universe, J. Chem. Inf. Model 50 (2010) 1418–1431.
[68] T. Niwa, Elucidation of characteristic structural features of ligand binding sites of protein kinases: a neural network approach, J. Chem. Inf. Model. 46 (5) (2006)
[29] H. Dinkel, C. Chica, A. Via, C.M. Gould, L.J. Jensen, T.J. Gibson, F. Diella, Phos-
pho.ELM: a database of phosphorylation sitesupdate, Nucleic Acids Res. 39 21582166.
(Database issue) (2011) D261D267. [69] C.T. ODushlaine, R.J. Edwards, S.D. Park, D.C. Shields, Tandem repeat copy-
[30] D. Domnguez, B. Montserrat-Sents, A. Virgs-Soler, et al. Phosphorylation reg- number variation in protein-coding regions of human genes, Genome Biol. 6
ulates the subcellular location and activity of the snail transcriptional repressor, (2006) R69.
Mol. Cell Biol. 23 (14) (2003) 50785089. [70] G. Pearson, et al. Mitogen-activated protein (MAP) kinase pathways: regulation
[31] J.T. Dudley, Y. Pouliot, R. Chen, A.A. Morgan, A.J. Butte, Translational bioinfor- and physiological functions, Endocr. Rev. 22 (2001) 153.
matics in the cloud: an affordable alternative, Genome Med. 2 (2010) 51. [71] X.N. Qian, S.H. SZE, B.J. Yoon, An ecient framework based on hidden markov
[32] S.R. Eddy, Non-coding RNA genes and the modern RNA world, Nat. Rev. Genet. models (HMMS) that can be used for nding homologous pathways in a network
2 (2001) 919929. of interest, J. Comput. Biol. 16 (2) (2009) 145157.
[33] M.A. Fabian, W.H. Biggs, D.K. Treiber, et al. A small molecule-kinase interaction [72] P.S. Reddy, H.M. Legault, J.P. Sypek, M.J. Collins, E. Goad, S.J. Goldman, W. Liu,
map for clinical kinase inhibitors, Nat. Biotechnol. 23 (3) (2005) 329336. S. Murray, A.J. Dorner, M. OToole, Mapping similarities in mTOR pathway per-
[34] O. Fedorov, B. Marsden, V. Pogacic, P. Rellos, S. Mller, A.N. Bullock, J. Schwaller, turbations in mouse lupus nephritis models and human lupus nephritis, Arthritis
M. Sundstrm, S. Knapp, A systematic interaction map of validated kinase in- Res. Ther. 10 (6) (2008) R127.
hibitors with Ser/Thr kinases, Proc. Natl. Acad. Sci. USA 104 (51) (2007) 20523 [73] B.A. Rose, T. Force, Y. Wang, Mitogen-activated protein kinase signaling in the
20528. heart: angels versus demons in a heart-breaking tale, Physiol. Rev. 90 (4) (2010)
[35] F. Ferr, A. Palmeri, M. Helmer-Citterich, Computational methods for analysis 15071546.
and inference of kinase/inhibitor relationships, Front. Genet. 5 (2014) 196. [74] N.S. Ruppender, A.R. Merkel, T.J. Martin, G.R. Mundy, J.A. Sterling, S.A. Guelcher,
[36] L. Feuk, A.R. Carson, S.W. Schererc, Structural variation in the human genome, Matrix rigidity induces osteolytic gene expression of metastatic breast cancer
Nat. Rev. Genet. 7 (2006) 8597. cells, PLoS One 5 (11) (2010) e15451.
[37] J.G. Foster, M.D. Blunt, E. Carter, S.G. Ward, Inhibition of PI3K signaling spurs new [75] A.J. Saldanha, Java Treeviewextensible visualization of microarray data, Bioin-
therapeutic opportunities in inammatory/autoimmune diseases and hemato- formatics 20 (2004) 32463248.
logical malignancies, Pharmacol. Rev. 64 (4) (2012) 10271054. [76] S. Salinthone, V. Yadav, R.V. Schillace, D.N. Bourdette, D.W. Carr, Lipoic acid
[38] L.G.D. Fryer, D. Carling, AMP-activated protein kinase and the metabolic syn- attenuates inammation via cAMP and protein kinase a signaling, PLoS One 5
drome, Biochem. Soc. Trans. 33 (2) (2005) 362366. (9) (2010) e13058.
[39] J.M. Goldberg, G. Manning, A. Liu, P. Fey, K.E. Pilcher, Y.J. Xu, J.L. Smith, The dic- [77] A. Sanz-Clemente, J.A. Matta, J.T. Isaac, K.W. Roche, Casein kinase 2 regulates
tyostelium kinomeanalysis of the protein kinases from a simple model organism, the NR2 subunit composition of synaptic NMDA receptors, BMC Bioinform. 10
PLoS Genet. 2 (3) (2006) 02910303. (Suppl. 12) (2010) S6.
[40] Z. Guo, S. Kozlov, M.F. Lavin, M.D. Person, T.T. Paull, ATM activation by oxidative [78] P. Saravanan, S.K. Venkatesan, C.G. Mohan, S. Patra, V.K. Dubey, Mitogen-
stress, Science 330 (6003) (2010) 517521. activated protein kinase 4 of Leishmania parasite as a therapeutic target, Eur. J.
[41] S. Hanks, A. Quinn, T. Hunter, The protein kinase family: conserved features and Med. Chem. 45 (12) (2010) 56625670.
deduced phylogeny of the catalytic domains, Science 241 (1988) 4252. [79] C. Schalon, J.S. Surgand, E. Kellenberger, D. Rognan, A simple and fuzzy method
[42] S.K. Hanks, T. Hunter, Protein kinases 6. The eukaryotic protein kinase superfam- to align and compare druggable ligand-binding sites, Proteins 71 (4) (2008)
ily: kinase (catalytic) domain structure and classication, FASEB J. 9 (8) (1995) 17551778.
576596. [80] B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, G. Mller, Computational modeling of
[43] H. Jhoti, A.R. Leach, Structure-Based Drug Discovery, Springer, (2007). the dynamics of the MAP kinase cascade activated by surface and internalized
[44] J.H. Havgaard, R.B. Lyngso, G.D. Stormo, J. Gorodkin, Pairwise local structural EGF receptors, Nat. Biotechnol. 20 (4) (2002) 370375.
alignment of RNA sequences with sequence similarity less than 40%, Bioinfor- [81] E. Schwarz, F. Markus Leweke, S. Bahn, P. Li, Clinical bioinformatics for complex
matics 21 (9) (2005) 18151824. disorders: a schizophrenia case study, Neuron 67 (6) (2009) 984996.
[82] D.B. Searls, Using bioinformatics in gene and drug discovery, Drug Discov. Today [92] T. Tian, J. Song, Mathematical modelling of the MAP kinase pathway using pro-
5 (4) (2000) 135143. teomic datasets, PLoS One 7 (8) (2012) e42230.
[83] S. Sengupta, T.R. Peterson, D.M. Sabatini, Regulation of the mTOR complex 1 [93] A. Torkamani, N.J. Schork, Accurate prediction of deleterious protein kinase poly-
pathway by nutrients, growth factors, and stress, Mol. Cell 40 (2) (2010) 310 morphisms, Bioinformatics 23 (21) (2007) 29182925.
322. [94] B. Trost, A. Kusalik, Computational prediction of eukaryotic phosphorylation
[84] L. Shao, J.J. Goronzy, C.M. Weyand, DNA-dependent protein kinase catalytic sub- sites, Bioinformatics 27 (21) (2011) 29272935.
unit mediates T-cell loss in rheumatoid arthritis, EMBO Mol. Med. 2 (10) (2010) [95] P. Verdino, D.A. Witherden, W.L. Havran, I.A. Wilson, The molecular interaction
415427. of CAR and JAML recruits the central cell signal transducer PI3K, Science 329
[85] Y.B. Shan, E. Kim, M.P. Eastwood, R.O. Dror, M.A. Seeliger, D.E. Shaw, How does (5996) (2010) 12101214.
a drug molecule nd its target binding site? J. Am. Chem. Soc. 133 (24) (2011) [96] M. Vieth, et al. Kinomics-structural biology and chemogenomics of ki-
91819183. nase inhibitors and targets, Biochim. Biophys. Acta 1697 (12) (2004) 243
[86] S. Shina, L. Wolgamotta, Y.H. Yub, J. Blenisb, S.O. Yoon, Glycogen synthase kinase 257.
(GSK)-3 promotes p70 ribosomal protein S6 kinase (p70S6K) activity and cell [97] Y.H. Wong, T.Y. Lee, H.K. Liang, et al. KinasePhos 2.0: a web server for iden-
proliferation, Proc. Natl. Acad. Sci. USA 108 (47) (2011) E1204E1213. tifying protein kinase-specic phosphorylation sites based on sequences and
[87] G. Song, H. Zeng, J. Li, L. Xiao, Y. He, Y. Tang, Y. Li, miR-199a regulates the tumor coupling patterns, Nucleic Acids Res. 35 (Web Server issue) (2007) W588
suppressor mitogen-activated protein kinase kinase kinase 11 in gastric cancer, W594.
Biol. Pharm. Bull. 33 (11) (2010) 18221827. [98] X.D. Wu, C.Q. Zhang, S.C. Zhang, Ecient mining of both positive and negative
[88] S. Sookoian, T.F. Gianotti, M. Schuman, C.J. Pirola, ENDEAVOUR software is ap- association rules, ACM Trans. Inform. Syst. 22 (3) (2004) 381405.
plied to prioritize all genes of the whole genome in relation to type 2 diabetes, [99] J.Q. Yang, H. Liu, M.T. Diaz-Meco, J. Moscat, NBR1 is a new PB1 signalling adapter
Genet. Med. 11 (5) (2009) 338343. in Th2 differentiation and allergic airway inammation in vivo, EMBO J. 29 (19)
[89] A. Subramanian, P. Tamayo, V.K. Mootha, et al. Gene set enrichment analysis: a (2010) 34213433.
knowledge-based approach for interpreting genome-wide expression proles, [100] M. Zahid, B.E. Phillips, S.M. Albers, N. Giannoukakis, S.C. Watkins, P.D. Rob-
Proc. Natl. Acad. Sci. USA 102 (2005) 1554515550. bins, Identication of a cardiac specic protein transduction domain by in vivo
[90] K. Takeshita, T. Tezuka, Y. Isozaki, et al. Structural exibility regulates biopanning using a M13 phage peptide display library in mice, PLoS One 5 (8)
phosphopeptide-binding activity of the tyrosine kinase binding domain of Cbl-c, (2010) e12252.
J. Biochem. 152 (5) (2009) 487495. [101] A. Zanin-Zhorov, Y. Ding, S. Kumari, M. Attur, K.L. Hippen, M. Brown, B.R. Blazar,
[91] C.W. Thomas, T.H. David, W.B. Ryan, S.R. Jeffrey, M.K. Robyn, A.G. Elizabeth, S.B. Abramson, J.J. Lafaille, M.L. Dustin, Protein kinase C-theta mediates neg-
H. Lan, B. Pierre, B. Lee, Computational prediction and experimental verication ative feedback on regulatory T cell function, Science 328 (5976) (2010) 372
of new MAP kinase docking sites and substrates including Gli transcription 376.
factors, PLoS Comput. Biol. 6 (8) (2010) 121.
