Decoding of Exon Splicing Patterns in The Human RUNX1-RUNX1T1

The International Journal of Biochemistry & Cell Biology 68 (2015) 48–58
Decoding of exon splicing patterns in the human RUNX1–RUNX1T1 fusion gene
Vasily V. Grinev a,*, Alexandr A. Migas b, Aksana D. Kirsanava a, Olga A. Mishkova b,
Natalia Siomava c, Tatiana V. Ramanouskaya a, Alina V. Vaitsiankova a, Ilia M. Ilyushonak a,
Petr V. Nazarov d, Laurent Vallar d, Olga V. Aleinikova b
a Department of Genetics, Faculty of Biology, Belarusian State University, Minsk, Belarus
b Laboratory of the Genetic Biotechnology, Department of Research, Belarusian Research Center for Pediatric Oncology, Hematology and Immunology, Minsk, Belarus
c Department of Developmental Biology, University of Göttingen, Göttingen, Germany
d Genomics Research Unit, Luxembourg Institute of Health, Luxembourg
Article info
Article history:
Received 1 May 2015
Received in revised form 12 August 2015
Accepted 24 August 2015
Available online 29 August 2015
Keywords:
RUNX1–RUNX1T1 fusion gene
Alternative splicing
Data mining
Exons-hubs
Power-law behavior
Abstract
The t(8;21) translocation is the most widespread genetic defect found in human acute myeloid leukemia.
This translocation results in the RUNX1–RUNX1T1 fusion gene that produces a wide variety of alternative
transcripts and influences the course of the disease. The rules of combinatorics and splicing of exons in the
RUNX1–RUNX1T1 transcripts are not known. To address this issue, we developed an exon graph model of
the fusion gene organization and evaluated its local exon combinatorics by the exon combinatorial index
(ECI). Here we show that the local exon combinatorics of the RUNX1–RUNX1T1 gene follows a power-law
behavior and (i) the vast majority of exons have a low ECI, (ii) only a small part is represented by
"exons-hubs" of splicing with very high ECI values, and (iii) it is scale-free and very sensitive to targeted
skipping of "exons-hubs". Stochasticity of the splicing machinery and preferred usage of exons in alternative
splicing can explain such behavior of the system. Stochasticity may explain up to 12% of the ECI variance
and results in a number of non-coding and unproductive transcripts that can be considered as noise.
The half-life of these transcripts is increased due to the deregulation of some key genes of the
nonsense-mediated decay system in leukemia cells. On the other hand, preferred usage of exons may explain up
to 75% of the ECI variability. Our analysis revealed a set of splicing-related cis-regulatory motifs that
can explain the attractiveness of exons in alternative splicing but only when they are considered together.
Cis-regulatory motifs are guides for splicing trans-factors and we observed a leukemia-specific profile of
expression of the splicing genes in t(8;21)-positive blasts. Altogether, our results show that alternative
splicing of the RUNX1–RUNX1T1 transcripts follows strict rules and that the power-law component of
the fusion gene organization confers a high flexibility to this process.
© 2015 Elsevier Ltd. All rights reserved.
* Corresponding author at: Department of Genetics, Faculty of Biology, Belarusian
State University, Nezavisimosti Avenue 4, 220030 Minsk, Belarus.
E-mail address: grinev vv@bsu.by (V.V. Grinev).
http://dx.doi.org/10.1016/j.biocel.2015.08.017
1357-2725/© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
The t(8;21) translocation occurs in 4–12% of adult and 12–30% of
pediatric cases of acute myeloid leukemia (AML) and represents the
most common genetic abnormality in human leukemias (Müller
et al., 2008). The main outcome of the translocation is the fusion
gene RUNX1–RUNX1T1, which produces a wide range of different
transcripts (Era et al., 1995; Erickson et al., 1992; Kozu et al., 1993,
2005; LaFiura et al., 2008; Lasa et al., 2002; Mannari et al., 2010;
Miyoshi et al., 1993; Nisson et al., 1992; Saunders et al., 1996; Tighe
and Calabi, 1994; Van de Locht et al., 1994; Yan et al., 2006; Zhang
et al., 1997). Some of these transcripts are protein-coding, the
others are non-coding. Both full-length and truncated isoforms of the
fusion protein have also been found experimentally. These isoforms are
transcriptional regulators with different activity (Kozu et al., 2005;
LaFiura et al., 2008; Mannari et al., 2010; Yan et al., 2006). It is
believed that RUNX1–RUNX1T1 proteins play a critical role in
the initiation and persistence of t(8;21)-positive AML (Hatlen
et al., 2012).
The large diversity of RUNX1–RUNX1T1 transcripts raises the
question of whether exon combination and splicing follow any rules. To
date, only some elements of this puzzle are known. Tighe and
Calabi (1994) showed that the structure of the breakpoint region of
the fusion gene influences the variety of its transcripts. LaFiura et al.
(2008) found a connection between inclusion of cassette exons
from this region and formation of premature termination codons
(PTCs) in transcripts. At the same time, usage of some other alternative
exons does not lead to a PTC but produces active isoforms
of the protein (Mannari et al., 2010; Yan et al., 2006). However, the
available data are insufficient for a full understanding of the splicing
principles of the RUNX1–RUNX1T1 transcripts. Such
knowledge would clarify the organization of the fusion
gene, its properties and its functional role in leukemogenesis.
Our goal was to find out whether there is any pattern in the local
exon combinatorics of the fusion gene. In this article, the term "local
exon combinatorics" refers to the set of alternative splicing events
generating different mRNA isoforms from a given exon, whereas the
exon combinatorial index (ECI) is a quantitative measure of the
local exon combinatorics. Instead of the conventional linear model,
we used an exon graph model of the fusion gene in which the ECI
is equivalent to the topological index "node degree", that is, the
number of unique splicing events that involve an exon.
Here we show that the empirical distribution of ECI values of the
RUNX1–RUNX1T1 exons follows a power-law function and has
some specific properties: the vast majority of exons have a low ECI
while a small part is represented by "exons-hubs" of splicing with
high ECI values, and the distribution is scale-free and sensitive to
targeted skipping of "exons-hubs". This distribution is formed by
stochasticity of the splicing machinery and preferred usage of exons
in alternative splicing, where the attractiveness of an exon is mostly
determined by a set of sequence-related features. Altogether, our
results show that alternative splicing of the RUNX1–RUNX1T1
transcripts follows strict rules and that the power-law component
of the fusion gene organization confers a high flexibility to this
process.
2. Materials and methods
2.1. Cell line, patients and healthy donors samples
The t(8;21)-positive AML cell line Kasumi-1 (ATCC CRL-2724™) was obtained from the ATCC (LGC Standards GmbH,
Germany) and cultivated according to the standard protocol.
Twelve young patients with t(8;21)-positive AML were
diagnosed and treated at the Belarusian Research Center for Pediatric Oncology, Hematology and Immunology (Minsk, Belarus).
Mononuclear cells were isolated using Histopaque (Sigma–Aldrich,
St Louis, USA) from patients' bone marrow samples obtained before
the treatment and/or at the time of remission.
Bone marrow mononuclear cells (BMMNC) and peripheral blood
mononuclear cells (PBMNC) were obtained from primary material
of healthy donors using Histopaque (Sigma–Aldrich, St Louis, USA).
CD34+ hematopoietic progenitor/stem cells (HPSC) were isolated
from BMMNC of healthy individuals by magnetic separation with
EasySep Human CD34 Positive Selection Kit (StemCell Technologies
SARL, Grenoble, France). For the further total RNA isolation, we used
only cell samples with purity of CD34+ HPSC 99%.
This study was approved by the institutional ethical committee
and our research team followed the principles of the Declaration of
Helsinki for research involving human subjects.
2.2. cDNA synthesis, standard RT-PCR and real-time PCR
Total cellular RNA was isolated from cells using TRI Reagent
(Sigma–Aldrich, St Louis, USA) according to the manufacturer's
instructions.
For cDNA synthesis, we used 1 µg of total cellular RNA in a
final reaction volume of 20 µl with Oligo-dT and the SuperScript III
Reverse Transcriptase Kit (Life Technologies, Carlsbad, USA). PCR
was performed with the Platinum Taq DNA Polymerase Kit (Life Technologies,
Carlsbad, USA), 0.3–0.5 µM of each primer and 2 µl of total
cDNA as a template. Real-time PCR was performed in duplicate
on a StepOnePlus Real-time PCR System (Life Technologies, Foster
City, USA) using the QuantiTect SYBR Green PCR Kit (Qiagen GmbH,
Hilden, Germany) in a 12.5 µl volume with 0.3 µM of each primer
and 1 µl of total cDNA diluted 1:2 (final dilution 1:25) as a template.
Abundance of target transcripts was normalized relative to
the expression level of the TBP gene, which encodes the TATA box binding
protein, as previously described (Migas et al., 2014) and quantified
according to Ruijter's approach (Ruijter et al., 2009).
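The normalization step above can be illustrated with a short sketch. It assumes the efficiency-corrected form N0 = Nq / E^Cq for the starting concentration, where Nq is the fluorescence threshold, E the amplification efficiency and Cq the quantification cycle; the function names and the numbers are hypothetical and are not taken from the study.

```python
def starting_concentration(threshold, efficiency, cq):
    """Efficiency-corrected starting concentration: N0 = Nq / E**Cq."""
    if efficiency <= 1.0:
        raise ValueError("efficiency must exceed 1 (2.0 means perfect doubling)")
    return threshold / efficiency ** cq


def normalized_abundance(target, reference):
    """Ratio of the target N0 to the reference-gene N0.

    Each argument is a (threshold, efficiency, cq) tuple for one reaction.
    """
    return starting_concentration(*target) / starting_concentration(*reference)


# Hypothetical values: the target crosses the threshold three cycles
# later than the reference gene (TBP), both with perfect efficiency.
ratio = normalized_abundance((1.0, 2.0, 25.0), (1.0, 2.0, 22.0))
print(ratio)  # 0.125
```

With equal efficiencies the ratio reduces to E raised to the Cq difference, so a three-cycle delay corresponds to an eight-fold lower abundance.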
2.3. cDNA library
Single-stranded cDNA from leukemia blasts was converted
into double-stranded cDNA and amplified by primers specific
to annotated 5′ UTRs and 3′ UTRs of the RUNX1–RUNX1T1 gene
in standard PCR. PCR products were ligated into the pTZ57R/T
cloning vector (Thermo Scientific, Lithuania) that was used for the
further transformation of the XL1-Blue Escherichia coli strain. Recombinant
DNA was purified from the positive clones and sequenced.
The obtained sequences were aligned against the human reference genome
GRCh37/hg19 by BLAT (Karolchik et al., 2014), the exon structure of
transcripts was described, and new variants were deposited in GenBank
(Supplementary Table 1).
2.4. DNA gel-electrophoresis, purification and sequencing
Amplicons were separated in 1–2% agarose gel and extracted
with the QIAquick Gel Extraction Kit (Qiagen GmbH, Hilden, Germany).
Capillary DNA gel-electrophoresis was performed with an Agilent
2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA) using the
DNA 7500 Kit according to the protocol of the manufacturer.
The sequencing reaction was performed using the BigDye v3.1 Terminator
Cycle Sequencing Kit (Applied Biosystems, Austin, USA).
Products of the reaction were cleaned up by ethanol precipitation
and analyzed on a 3130 Genetic Analyzer (Hitachi, Tokyo, Japan)
according to the standard procedure.
2.5. Exon graph reconstruction and manipulations
The exon graph of the RUNX1–RUNX1T1 gene was reconstructed
according to the previously described approaches (Heber et al.,
2002; Majoros et al., 2014). The ECIs, the shortest distances of exons
in an exon graph, Kleinberg's authority scores and the assortativity
coefficient were calculated by the R package igraph
v.0.6.5-2 (Csardi and Nepusz, 2006).
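Because the ECI is the node degree of an exon in the exon graph, it can be sketched in a few lines. The example below is plain Python rather than the igraph calls used in the study; the toy transcripts are hypothetical, and treating incoming edges as the in-ECI and outgoing edges as the out-ECI is an assumption about how the decomposed indices map onto a 5′ to 3′ directed graph.

```python
from collections import defaultdict


def exon_combinatorial_index(transcripts):
    """Return {exon: (in_eci, out_eci, total_eci)} for a set of transcripts.

    Each transcript is an ordered list of exon names (5' to 3'). Every pair
    of consecutive exons is one splicing event; an event seen in several
    transcripts is counted once.
    """
    events = set()
    for exons in transcripts:
        events.update(zip(exons, exons[1:]))

    incoming = defaultdict(int)  # events that end at the exon
    outgoing = defaultdict(int)  # events that start at the exon
    for donor, acceptor in events:
        outgoing[donor] += 1
        incoming[acceptor] += 1

    all_exons = {e for exons in transcripts for e in exons}
    return {
        e: (incoming[e], outgoing[e], incoming[e] + outgoing[e])
        for e in sorted(all_exons)
    }


# Hypothetical toy transcripts, not real RUNX1-RUNX1T1 isoforms.
toy = [
    ["e1", "e2", "e3", "e4"],
    ["e1", "e3", "e4"],
    ["e1", "e2", "e4"],
]
for exon, (in_eci, out_eci, total) in exon_combinatorial_index(toy).items():
    print(exon, in_eci, out_eci, total)
```

In this toy graph exons e2 and e3 take part in three unique splicing events each, while the terminal exons e1 and e4 take part in two.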
Kleinberg's authority score is a local topological index that indicates
whether there is a tendency for splicing of exons with high ECI
values together (Kleinberg, 1999; Newman, 2003). Potential clustering
of exons by Kleinberg's authority score was evaluated by the
k-means method implemented in R. We used Akaike and Bayesian
information criteria to identify the optimal number of clusters and the
R/Bioconductor package ConsensusClusterPlus v.1.22.0 (Wilkerson
and Waltman, 2015) with 1000 subsamples to investigate the consensus between the clusters.
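For orientation, the authority score can be sketched by power iteration. This is a minimal illustration, not the igraph routine used in the study: it assumes a directed exon graph (donor exon to acceptor exon), scales the scores so that the maximum is 1, and uses a hypothetical edge list.

```python
def authority_scores(edges, iterations=100):
    """Kleinberg's authority scores by power iteration, scaled to a maximum of 1.

    edges: iterable of (donor_exon, acceptor_exon) pairs of a directed graph.
    """
    edges = list(edges)
    if not edges:
        return {}
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # An exon is a good authority if good hubs point to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # An exon is a good hub if it points to good authorities.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        top_auth = max(auth.values()) or 1.0
        top_hub = max(new_hub.values()) or 1.0
        auth = {n: v / top_auth for n, v in auth.items()}
        hub = {n: v / top_hub for n, v in new_hub.items()}
    return auth


# Hypothetical toy graph: two exons splice into e3, which splices into e4.
scores = authority_scores([("e1", "e3"), ("e2", "e3"), ("e3", "e4")])
for exon, score in scores.items():
    print(exon, round(score, 6))
```

Exon e3 receives the top score because two hubs point to it, whereas e4 is reached only from a weak hub and its score decays toward zero.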
The assortativity coefficient is a global characteristic of an exon
graph (Newman, 2002). If this coefficient is 1, the graph is perfectly
assortative and exons strongly prefer splicing with similar
exons (in terms of ECI values). Conversely, when the coefficient is
−1, the graph is completely disassortative and exons with high
ECI values are spliced with exons with low ECIs and vice versa.
Finally, in the absence of any splicing preferences, the graph
is non-assortative and the coefficient is 0.
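The three cases above follow from the definition of the coefficient as the Pearson correlation between the degrees at the two ends of each edge. The sketch below assumes an undirected simple graph and hypothetical toy edge lists; the settings used in the study (for example, whether edge direction was taken into account) are not stated in the text.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    edges: iterable of (u, v) pairs of an undirected simple graph.
    Returns NaN when all degrees are equal (the coefficient is undefined).
    """
    edges = list(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    if n == 0:
        return math.nan
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else math.nan


star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(degree_assortativity(star))  # one hub joined only to low-degree nodes
print(degree_assortativity(path))
```

A star graph, in which a single hub is joined only to degree-1 nodes, is the extreme disassortative case and yields a coefficient of −1.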
2.6. Fitting of statistical models to empirical data
Initially, we rejected those statistical models that clearly do
not fit the empirical distribution and obtained the five closest models:
power-law, power-law with exponential cut-off, exponential,
stretched exponential (or complementary cumulative Weibull) and
log-normal distributions. Then, we fitted the selected statistical models
to the empirical distribution according to the $x_{\min}$ paradigm (Clauset
et al., 2009). Finally, a goodness-of-fit test, log-likelihood ratio test,
Kolmogorov–Smirnov test and Akaike and Bayesian information
criteria were used to assess the plausibility of the statistical hypothesis
and for the direct comparison of alternative statistical models
(Clauset et al., 2009; Klaus et al., 2011; Vuong, 1989).
2.7. Identification of the significant open reading frames (ORFs)
and PTC in transcripts
First, all possible ATG-ORFs were identified in the transcript(s)
of interest. Next, for each empirical transcript, 100 random
sequences with the same length were generated using a multinomial
model (Ababneh et al., 2006). This new set of artificial
transcripts was used to identify ORFs. Finally, the 99th percentile
of the distribution formed by the lengths of artificial ORFs was used
as a threshold for identification of the true ORF(s) in the empirical
transcript. Transcripts with no significant ORFs were classified
as non-coding. To identify PTCs, the exonic structure and the coordinates
of ORF(s) in the transcript of interest were matched. A transcript
was annotated as PTC-containing if the end of its ORF was localized
upstream of the last exon–exon junction in the transcript.
2.8. Development of a short-list of the most important
nonsense-mediated decay (NMD) and splicing genes
We used a three-step approach to select genes into a short-list.
First, we downloaded microarray data for t(8;21)-positive AML and
normal hematopoietic cells (Supplementary Table 2) from the NCBI
GEO repository (Barrett et al., 2011). We used this set of microarrays
for two-class differential gene expression analysis with limma
v.3.22.1 (Smyth, 2005) and selected genes with at least a 2-fold
statistically significant difference in expression. Second, we used the set
of differentially expressed genes from the first step and leukemia
microarray data to reconstruct a gene regulatory network with the
ARACNE2 algorithm (Margolin et al., 2006). For genes from this
network, we calculated combined centrality scores (del Rio et al.,
2009). Finally, we functionally annotated top-scored genes from the
second step and selected only hub-like entities into the final short-list.
This three-step approach allowed us to focus only on the most
interesting NMD and splicing genes and to verify their differential
expression by real-time PCR in limited clinical material.
2.9. Data mining by regression random forests
All important sequence features were selected with Boruta
v.3.1.0 (Kursa and Rudnicki, 2010). Machine learning was carried
out with the package randomForest v.4.6-7 (Breiman et al., 2013; Liaw
and Wiener, 2002) in regression forests mode for nonlinear multiple
regression. Importance of each feature was determined via
calculation of the mean decrease in accuracy of ECI value prediction
after random permutation of the original values of the feature.
Accuracy of the prediction was evaluated by Spearman's ρ between
empirical and predicted values of the ECI and by the coefficient of
determination. For integrated representation of the complex data,
an implementation of the Circos plot in the R package circlize v.0.2.5 was
used (Gu, 2015; Krzywinski et al., 2009).
2.10. Modeling of the exon skipping
For this kind of analysis, we used a general approach developed
by Trajanovski et al. (2013). Theoretically expected exon
graph-generated transcripts were identified with the full crawl of the
graph. In order to produce stable and reproducible results, 1000
simulations were made for each fraction of the skipped exons.
3. Results
3.1. The RUNX1–RUNX1T1 gene is a source of unprecedented
diversity of mRNA products
To reconstruct the exon graph, we created a comprehensive collection
of transcripts of the fusion gene. We identified 102 unique
full-length transcripts and 8 unique expressed sequence tags (ESTs)
of the gene of interest in the PubMed, GenBank and ChimerDB 2.0
databases (Benson et al., 2013; Era et al., 1995; Erickson et al., 1992;
Kim et al., 2010; Kozu et al., 1993, 2005; LaFiura et al., 2008; Lasa
et al., 2002; Mannari et al., 2010; Miyoshi et al., 1993; Nisson et al.,
1992; Saunders et al., 1996; Sayers et al., 2012; Tighe and Calabi,
1994; Van de Locht et al., 1994; Yan et al., 2006; Zhang et al., 1997).
In these sources, the exon structure of all transcripts was described,
but the nucleotide sequence of some rare and unique exons was
not published. Therefore, we were able to fully reconstruct the
nucleotide sequence for 61.8% of full-length transcripts, and the
sequence of the remaining transcripts was restored only partially.
To complete our collection, we created a cDNA library. The
library is based on cDNA from bone marrow samples of 12 young
patients with t(8;21)-positive AML (Supplementary Table 3) and
Kasumi-1 cells. For cDNA amplification, we used forward primers
directed to 5′ UTR exons 1, 4a/4b, 7a/7c, 7d, 8a and 11a and reverse
primers directed to 3′ UTR exons 12a, 15a, 17a and 17 of the fusion
gene. We also used primers specific to internal exons to amplify
rare and poorly detected transcripts (Fig. 1; Supplementary Tables
4 and 5).
In our cDNA library, we identified 33 new full-length and 55
short EST-like transcripts (Supplementary Table 1). This
significantly expanded the list of known transcripts of the fusion
gene: the current collection includes 135 full-length and 63 EST-like
sequences. Of the 55 newly found ESTs, 30 sequences matched the
full-length transcripts of the fusion gene only partially. This means
that t(8;21)-positive leukemia harbors a subset of rare or poorly
amplified full-length transcripts that have not been identified so far.
3.2. Power-law behavior of the local combinatorics of the
RUNX1–RUNX1T1 exons
To find out the character of the local exon combinatorics, we
developed an exon graph of the fusion gene organization. This exon
graph is based on full-length transcripts and includes 99 exons
connected by 163 splicing events (Fig. 2A).
We quantified the exon usage in different alternative splicing
events by exon graph topology analysis and expressed this
metric as ECI values. This index ranges from 1 to 34
with a high standard deviation of 5.1. Visual inspection of the ECI
value distribution led us to the hypothesis that this index follows
a power-law function. To test this hypothesis, we used a three-step
approach (Section 2) based on the mathematical formalism
of Clauset et al. (2009) and Virkar and Clauset (2012). Our statistical
tests supported the power-law model $y = x^{-2.31}$ of the observed
distribution (Fig. 2B).
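The exponent of a power-law tail is usually estimated by maximum likelihood for a chosen lower bound. The sketch below shows only this estimation step, under the assumption that the lower bound is already known; the full procedure also selects the lower bound and runs the goodness-of-fit tests described in Section 2. The function name and the sample values are hypothetical.

```python
import math


def estimate_alpha(values, xmin, discrete=True):
    """Maximum-likelihood estimate of the power-law exponent for x >= xmin.

    With discrete=True the common approximation for integer data,
    alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))), is used; otherwise the
    continuous estimator with xmin itself as the lower bound.
    """
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    bound = xmin - 0.5 if discrete else xmin
    log_sum = sum(math.log(x / bound) for x in tail)
    if log_sum <= 0:
        raise ValueError("tail must contain values above the lower bound")
    return 1.0 + len(tail) / log_sum


# Hypothetical ECI-like values, not the values measured in the study.
sample = [1, 1, 1, 2, 3, 10]
print(round(estimate_alpha(sample, xmin=1), 3))
```

The discrete approximation is known to be less accurate for very small lower bounds, so an exact discrete fit would be preferable for real data of this kind.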
Fig. 1. A set of primers specific for terminal or internal exons of human RUNX1 and RUNX1T1 genes was used for the RUNX1–RUNX1T1 gene cDNA library construction.
Analysis of the cumulative distribution shows that approximately 80% of
exons of the RUNX1–RUNX1T1 gene have a small ECI
value ≤3. This exon group represents cassette (mainly UTR exons
and exons from the breakpoint region) and constitutive (most of
the exons from the 3′-RUNX1T1 part of the fusion gene) exons that are
not involved in alternative splicing.
At the same time, the remaining 20% or so of exons have a high
combinatorial index ≥4. These exons form a heavy right tail of
the empirical distribution. They are constitutive and are widely
used in alternative splicing: the exons of this group account
for about 80% of the total diversity of splicing events that occur in
the fusion gene transcripts. Notably, exons 5, 6, 8b, 9, 10 and
11 are the most interesting: about 64% of the diversity of splicing
events involves these exons. Exons 5 and 6
encode almost the entire DNA-binding Runt homology domain (RHD)
of the RUNX1–RUNX1T1 protein (Meyers et al., 1993) and exon 8
encodes a polypeptide bridge that connects the RUNX1 and RUNX1T1
parts of the fusion protein. As for exons 9, 10 and 11, they encode
the first conserved domain, NHR1, of the RUNX1T1 part of the
RUNX1–RUNX1T1 protein (Davis et al., 2003).
To clarify the relationship between the two groups of exons
mentioned above, we evaluated splicing preferences of these exons
by Kleinberg's authority score and the assortativity coefficient. We
found that according to the authority score all exons can be grouped
into three stable clusters with consensus higher than 0.95. The first
cluster included exons with an extremely low authority score between
4.4e−18 and 5.4e−2 (dark-green balls, Fig. 2C), the second cluster
was composed of exons with a moderate authority score ranging
from 6.6e−2 to 0.3 (red balls, Fig. 2C) and, finally, the outlying exon
8b was always considered as the third cluster (blue ball, Fig. 2C).
The second cluster is represented by exons with ECI values
ranging from 2 to 31 (mean 4.4), which is on average 2.1 times
higher (p = 0.0006, Mann–Whitney U test) than for exons of the
first cluster with ECI values ranging from 1 to 9 (mean 2.1). Despite
this, the assortativity coefficient for the whole exon graph is 0.38,
which is apparently due to a significant predominance of exons
with a moderate or low authority score and low ECI values in the
graph.
Fig. 2. The local combinatorics of RUNX1–RUNX1T1 exons follows a power-law behavior. (A) Exon graph of the RUNX1–RUNX1T1 gene. Exons were clustered into 23 groups
(E) based on the genomic origin and/or overlapping of sequences. For each group, a well-known reference exon is shown in parentheses. (B) The power-law behavior of
the local combinatorics of RUNX1–RUNX1T1 exons is supported by statistical tests on plausibility. The power-law function fits well (red dashed line) to the heavy
right tail of empirical data (blue diamonds) and has the lowest Kolmogorov–Smirnov distance D and the highest bootstrap p-value among competing statistical models (see
Section 2). (C) Exons can be grouped into three stable clusters based on Kleinberg's authority score. However, most exons have extremely low or moderate values of the
authority score. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Stochastic noise in the splicing machinery and the positional distribution of exons in transcripts make a minor contribution to the variance of ECI values. (A) There
is a clear and significant difference between the cumulative curve of the empirical ECI values (line 1) and the theoretical cumulative curve for a random exon graph (line 2)
($p = 1.4 \times 10^{-8}$, Mann–Whitney U test). No significant difference was found between empirical and non-coding (line 3) or unproductive (line 4) noise corrected cumulative
distributions (p > 0.05, Mann–Whitney U test). Distributions were normalized to their max values. (B) Exons with high ECI values tend to occupy a position close to the center
of transcripts that include the exon of interest. In this figure, the corresponding position of an exon that is close to the 5′ end (left to the center, indicated by 0) is displayed
by a negative value, while an exon close to the 3′ end is indicated by a positive value.
Axes: (A) normalized ECI value vs. cumulative probability; (B) position of exon in transcripts vs. ECI value.
3.3. Stochasticity makes a minor contribution to the variance of
ECI values
The power-law distribution cannot result only from random
splicing of the RUNX1–RUNX1T1 exons: the cumulative distribution
of the empirical ECI values is clearly and significantly
different from the theoretical curve for a random graph (Fig. 3A).
Nevertheless, we evaluated the contribution of randomness to the
local combinatorics of the RUNX1–RUNX1T1 exons because it is
an important source of diversity of alternative splicing events in
the human transcriptome (Melamud and Moult, 2009; Pickrell et al.,
2010).
For this purpose, we first identified "noise" splicing events that
lead to the formation of non-coding transcripts or unproductive
transcripts with a PTC. These two categories of noise account
for about 13% and 26% of the splicing events diversity in the
RUNX1–RUNX1T1 transcripts, respectively. However, the empirical
cumulative distribution of ECI values becomes slightly different
only after correction for unproductive splicing but not after correction
for splicing events that lead to non-coding transcripts (Fig. 3A).
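The "unproductive" category relies on the PTC annotation rule stated in Section 2.7: a transcript contains a PTC if its ORF ends upstream of the last exon–exon junction. A minimal sketch of that rule follows; the coordinate convention (0-based, end-exclusive) and the exon lengths are assumptions made for illustration.

```python
def last_junction_position(exon_lengths):
    """Transcript coordinate (0-based) of the first base of the last exon."""
    if len(exon_lengths) < 2:
        return None  # single-exon transcript: no exon-exon junction
    return sum(exon_lengths[:-1])


def has_ptc(exon_lengths, orf_end):
    """True if the ORF ends upstream of the last exon-exon junction.

    orf_end: 0-based, end-exclusive transcript coordinate of the ORF,
    that is, the position just past the stop codon.
    """
    junction = last_junction_position(exon_lengths)
    return junction is not None and orf_end <= junction


# Hypothetical transcript with three exons of 100, 150 and 200 nt:
# the last junction lies at transcript position 250.
print(has_ptc([100, 150, 200], orf_end=240))  # True
print(has_ptc([100, 150, 200], orf_end=400))  # False
```

Note that this sketch applies the rule exactly as worded in the text and does not add any minimum distance between the stop codon and the junction.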
Additionally, we evaluated the relationship between the position
of an exon in transcripts and its ECI value. We performed this
analysis because the fusion gene is characterized by a large variety
of cassette UTR exons and exons from the breakpoint region.
We expected that such an organization of the gene gives
the nearest constitutive exons a chance to obtain a high ECI. However, we
found only a moderate correlation between the positional distribution
of the exons and the distribution of their ECI values (correlation coefficient 0.455,
$p = 2.2 \times 10^{-6}$; Fig. 3B).
Using random forests-based nonlinear multiple regression, we
found that the noise splicing and the positional chance explained
no more than 12% of the ECI variance. Therefore, stochasticity is
only a minor factor in the formation of the ECI value.
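The feature importance measure used with the regression forests (Section 2.9) is the loss of prediction accuracy after the values of one feature are randomly permuted. The sketch below illustrates the idea with a hypothetical stand-in predictor instead of a trained forest and measures the error on the supplied rows, whereas a random forest implementation would typically use out-of-bag samples; all names and numbers are illustrative.

```python
import random


def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, repeats=20, seed=0):
    """Mean increase in squared error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        increases.append(mean_squared_error(predict, shuffled, targets) - baseline)
    return sum(increases) / repeats


# Hypothetical stand-in for a trained model: uses feature 0, ignores feature 1.
def toy_model(row):
    return 2.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(1, 9)]
targets = [toy_model(r) for r in rows]
print(permutation_importance(toy_model, rows, targets, feature=0))
print(permutation_importance(toy_model, rows, targets, feature=1))
```

A feature that the model never uses gets an importance of exactly zero, while shuffling an informative feature raises the error.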
3.4. Deregulation of the NMD genes in leukemia cells may explain
a high abundance of unproductive RUNX1–RUNX1T1 transcripts
In our dataset, about 38% of mRNA molecules are PTC-containing
transcripts. Although these transcripts are potential targets for the
NMD system, their expression remains at a relatively high level.
For example, inclusion of exon 15a as an internal exon (amplicon
"exons 15a-15", Fig. 4A) always leads to the formation of transcripts
with a PTC, whose expression is comparable with that of some
transcripts without a PTC (for instance, mRNAs terminating in
exon 17a; amplicon "exons 16-17a", Fig. 4B).
The high frequency of PTC-containing transcripts suggests that the
NMD system may be dysfunctional in t(8;21)-positive AML.
To check this hypothesis, we developed a shortlist of the most
important NMD genes that are responsible for different steps of
decay of PTC-containing mRNA molecules. Real-time PCR confirmed
differential expression of some of these genes in leukemia
cells compared with normal CD34+ HPSC, BMMNC and PBMNC
(Fig. 4C). In particular, we found an imbalanced expression of
some key components of the exon junction complexes (EJCs) in
leukemia blasts: the CASC3 gene was 3.2–3.9-fold downregulated,
whereas the MAGOH and RBM8A genes were 1.6 to 3.3 times
upregulated. It has been shown that the MAGOH–RBM8A heterodimer,
through interaction with the EJC regulator WIBG/PYM, leads
to the disassembly of EJCs in the cytoplasm and enhances translation
of EJC-bearing spliced mRNAs by recruiting them to the
ribosomal 48S preinitiation complex (Gehring et al., 2005). Another
observation was that the expression of the SMG1 and UPF2 genes, which
encode important components of the NMD machinery, is significantly
reduced (on average 1.6–4.4 times) and the expression of the UPF1
gene, which encodes the key effector of the whole NMD process, tends to
decrease, with statistical significance observed only in the comparison
with PBMNC. There was also a 1.8–5.0-fold decrease in
the expression of the UPF3A (compared with BMMNC and PBMNC) and
GSPT1 (compared with CD34+ HPSC and BMMNC) genes, which encode
proteins responsible for the recruitment of UPF1 to ribosomes
stalled on PTC-containing mRNAs. A similar decrease was found for the
SMG5, SMG6 and SMG7 genes, which encode downstream effectors that
are involved in the degradation of transcripts marked for NMD. Finally,
expression of the DCP1B (compared only with CD34+ HPSC) and DCP2
(compared with all types of normal hematopoietic cells) genes,
which encode core components of the mRNA decapping complex, was
reduced 2.4 to 6.5 times.
In addition, we observed a significant correlation between the
expression of NMD genes and some of the mRNA isoforms of the fusion
gene (Fig. 4D). Altogether, these results indicate that NMD genes
from different steps of decay of PTC-containing mRNAs have a specific
expression profile in t(8;21)-positive AML that presumably
contributes to the diversity of RUNX1–RUNX1T1 transcripts.
3.5. Different attractiveness of the RUNX1RUNX1T1 exons for
alternative splicing is associated with sequence-related features
The simplest explanation for the observed power-law distribution is a preferential attachment (Albert and Barabasi, 2002). In
53
Fig. 4. Activity of the NMD system in children t(8;21)-positive AML cells is deregulated. (A) In NMD study, ve different RUNX1RUNX1T1 cDNA-based amplicons were
quantied by real-time PCR. Quantity of the amplicon exons 15a-15 indicates the expression level of transcripts comprising exon 15a as an internal exon. When exon 15a
is used as an internal exon, it introduces a PTC in the mature transcript. (B) According to real-time PCR and statistical analysis, RUNX1RUNX1T1 mRNA isoforms containing
exons 11-12a, 15a, 15a-15, 16-17a or 16-17 are differentially expressed in leukemia cells. Herewith, expression level of transcripts with internal exon 15a is similar to
transcripts with exons 16-17a, which do not include a PTC (p = 0.79, MannWhitney U test). However, it is assumed that exon 15a can be not only an internal but also a
3 UTR exon. In particular, the overall expression level of transcripts containing exon 15a is signicantly higher than level of the PTC-containing transcripts with exons 15a-15
(p < 0.001, MannWhitney U test). (C) Expression of NMD genes is signicantly increased or decreased in leukemia cells in comparison with normal hematopoietic cells.
Statistical signicance of differences was conrmed with MannWhitney U test. (D) For some mRNA isoforms, we found a strong correlation with expression of NMD genes.
the context of splicing, preferential attachment corresponds to the
preferred use of some exons by the alternative splicing machinery and the ECI serves as a measure of an exon attractiveness for
this process. Our in silico experiments confirmed plausibility of this
hypothesis: cumulative distribution of the empirical ECI values is
similar to the theoretical curve based on the model of graph with
preferential attachment (Fig. 5).
We assumed that attractiveness of some RUNX1-RUNX1T1
exons in alternative splicing may be caused by cis-regulatory elements. To verify this hypothesis, we assembled a compendium of
2801 sequence features and subdivided them into three classes
(Fig. 6A; Supplementary Table 6). By regression random forests,
we found that 221 entities out of 2801 sequence features are significant in determination of the total-ECI (sum of all alternative
splicing events that involve the exon) or its decomposed variants:
the in-ECI (sum of all 5′ alternative splicing events that involve the
exon) and the out-ECI (sum of all 3′ alternative splicing events that
involve the exon). Herewith, only 32 features are common for all
three types of the ECI (Fig. 6B, Venn diagram).
All selected features have different importance for prediction
of the ECI (Fig. 6B, line plot; Fig. 6C). The most important were
regional counts of GC- and AT-rich short motifs in exons and flanking introns, strength of the splice sites of target exons, frequency
of RESCUE ESE hexamers, PESE octamers, motifs MAHCE #03 and
MAHCE #10 in upstream neighboring exons, Yeo's ISREs in flanking 3′ introns of the upstream neighboring exons, motif MAHCE
#16 and ESRS hexamers in the downstream neighboring exons
and their conservatism (Supplementary Table 7). Importantly, we
observed a clear prevalence of positive correlations over negative
ones between values of selected features and the ECI value.

[Fig. 5 plot. Axes: cumulative probability versus normalized ECI value. Curves: empirical; power 0.8; power 1; power 1.1; power 1.2; power 1.4; power 1.6.]

Fig. 5. The power-law behavior of RUNX1-RUNX1T1 exons during the splicing process may be explained in the frame of the Barabási-Albert preferential attachment model.
The figure shows that the cumulative curve of the empirical ECI values is close
to the theoretical cumulative curve for the exon graph with preferential attachment and power of attractiveness close to 1.1. The result is based on one thousand
simulations for each theoretical curve with a different power of the attractiveness.
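
The comparison summarized in Fig. 5 can be illustrated with a minimal sketch of a growing graph in which each new vertex attaches to an existing vertex with probability proportional to its degree raised to a power of attractiveness. The function names, the graph size, the one-edge-per-step growth rule and the use of vertex degree as a stand-in for the ECI are assumptions made for illustration; this is not the simulation protocol used in the study.

```python
import random


def grow_graph(n_vertices, power, seed=0):
    """Grow a graph one vertex at a time.

    Each new vertex links to one existing vertex chosen with probability
    proportional to degree ** power (power = 1 is linear preferential
    attachment). Returns the list of vertex degrees.
    """
    rng = random.Random(seed)
    degree = [1, 1]  # start from a single edge between two vertices
    for _ in range(n_vertices - 2):
        weights = [d ** power for d in degree]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        degree[target] += 1   # the chosen vertex gains an edge
        degree.append(1)      # the new vertex enters with degree 1
    return degree


def empirical_cdf(values):
    """Return values normalized to the maximum (sorted) and their
    cumulative probabilities."""
    top = max(values)
    xs = sorted(v / top for v in values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


if __name__ == "__main__":
    # Hypothetical comparison across the powers shown in the figure legend.
    for power in (0.8, 1.0, 1.1, 1.2, 1.4, 1.6):
        deg = grow_graph(200, power, seed=1)
        xs, _ = empirical_cdf(deg)
        low = sum(1 for x in xs if x <= 0.2) / len(xs)
        print(f"power={power}: max degree={max(deg)}, "
              f"share of vertices with normalized degree <= 0.2: {low:.2f}")
```

Averaging such cumulative curves over many random seeds and comparing them with the empirical curve is one way to judge which power of attractiveness fits best.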
Fig. 6. Sequence features of human RUNX1-RUNX1T1 exons and flanking introns may determine the value of the ECI. (A) All sequence features were extracted from three
classes of mRNA structure elements. The first class includes features of the target exon (exon of interest; $S_E^{TE}$) and its flanking 5′ ($S_{5'1}^{TE}$) and 3′ ($S_{3'1}^{TE}$) intronic sequences. The
second class contains features of the upstream first neighboring exons ($S_E^{USE}$) and their flanking 5′ ($S_{5'1}^{USE}$) and 3′ ($S_{3'1}^{USE}$) intronic sequences. Finally, the third class includes
features of the downstream first neighboring exons ($S_E^{DSE}$) and corresponding flanking 5′ ($S_{5'1}^{DSE}$) and 3′ ($S_{3'1}^{DSE}$) intronic sequences. (B) Sequence features are not equal in
importance for the prediction of the ECI value. The important features were ranked according to the mean decrease in the accuracy of the ECI value prediction after the
random permutation of the original feature values. An inset Venn diagram shows the overlap between the selected important features for the three types of the ECI.
(C) A complex relationship between the sequence features and the ECI value. None of the sequence features alone can reliably predict the ECI value. Such predictions can be made
on a compendium of features. The inner track of the Circos plot includes sectors of the combined set of features that were selected as significant in prediction of the value of the in-,
out- and/or total-ECI. Width of each sector is proportional to the strength of the corresponding feature effect on the ECI value. Positive or negative character of this effect was
inferred from the correlation analysis. The outer track of the plot contains features of different subclasses. (D) Our compendium of the sequence features permits prediction of
the values of the ECI by regression random forests with high accuracy. The line plot demonstrates a binned distribution of Spearman's ρ between the real values of the
ECI from the test subset of empirical data and the predicted values. This plot is based on 1000 simulations of the original and randomly permuted ECI values. Lines 1 and
1′ represent the original and permuted total-ECI, lines 2 and 2′ show the original and permuted in-ECI, and lines 3 and 3′ indicate the original and permuted out-ECI,
respectively.
Model experiments demonstrated that the selected features permit
prediction of the ECI value with high accuracy. For instance, the
median of Spearman's ρ between values predicted by the trained
algorithm and empirical values of the total-ECI is 0.86 (Fig. 6D),
and the adjusted coefficient of determination equals 0.75. We
observed the same results for in- and out-ECIs (Fig. 6D). Altogether,
our data provide evidence that sequence features and the ECI
value of the RUNX1-RUNX1T1 exons are closely interrelated.
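
The two accuracy measures quoted above, Spearman's ρ and the adjusted coefficient of determination, can be computed as in the following minimal sketch. The helper names and the toy vectors are hypothetical; the study itself relied on R packages, and nothing here reproduces its data or its exact procedure.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def adjusted_r2(observed, predicted, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


if __name__ == "__main__":
    # Hypothetical observed and predicted ECI values for five exons.
    observed = [1, 2, 3, 4, 5]
    predicted = [1, 2, 3, 4, 6]
    print(spearman_rho(observed, predicted))
    print(adjusted_r2(observed, predicted, n_features=1))
```

Because ρ depends only on ranks, it rewards a model that orders exons correctly even when the predicted magnitudes are off, whereas the adjusted coefficient of determination penalizes both magnitude errors and the number of predictors.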
3.6. Differential expression of splicing genes correlates with
abundance of the RUNX1-RUNX1T1 isoforms
It is well known that cis-regulatory motifs serve as guide marks
for splicing trans-factors (Wang et al., 2012). Therefore, a theoretically discovered interconnection between a sequence feature and
exon splicing may be only a statistical phenomenon if cells lack
the corresponding trans-acting protein. To verify some theoretical
findings from the above-mentioned data mining, we developed a short list of the most important splicing genes and evaluated
their expression by real-time PCR.
The most interesting observation was related to the expression of the RBFOX3 gene. This gene is not expressed or is expressed
below the threshold of real-time PCR sensitivity in normal
hematopoietic cells. However, both qualitative and quantitative
analyses confirmed its expression in t(8;21)-positive leukemia cells
(Fig. 7A and B). Moreover, we found RBFOX3 binding sites in flanking introns of some RUNX1-RUNX1T1 exons. Frequency of these
sites was selected as an important feature by the regression random forests algorithm (Supplementary Table 7), and we observed a
significant correlation between expression of the RBFOX3 gene and
expression of some mRNA isoforms of the fusion gene in leukemia
cells (Fig. 7D).
The differential expression and the significant correlation were
confirmed for other splicing genes as well, in particular, for SRSF6,
RBM25, PTBP1 and TIA1 genes (Fig. 7C and D). Therefore, a number
of splicing-related genes are differentially expressed in t(8;21)-positive leukemia cells. This fact may contribute to the diversity
of mRNA products of the fusion gene.
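
Differential expression of this kind is assessed in Fig. 7C with the Mann-Whitney U test. The sketch below shows how the U statistic and an approximate two-sided p-value can be obtained. It uses the normal approximation without tie or continuity correction, which is an assumption made for brevity; with groups as small as those in Fig. 7, an exact test would normally be preferred. The function name and the example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, p) for two independent samples.

    U is the smaller of the two U statistics; p is a two-sided p-value from
    the normal approximation (no tie or continuity correction).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Tied observations share the mean of the ranks they occupy.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


if __name__ == "__main__":
    # Hypothetical normalized expression values in two groups of samples.
    normal = [1, 2, 3]
    leukemia = [4, 5, 6]
    print(mann_whitney_u(normal, leukemia))
```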
3.7. Exons with high ECI values are hot points of the
RUNX1-RUNX1T1 mRNA splicing
A power-law graph is highly sensitive to targeted attacks against
important vertices (Iyer et al., 2013; Schneider et al., 2011). The
RUNX1-RUNX1T1 exon graph has a power-law component and it
may have the same property. To check this hypothesis, we modeled
a skipping of exons by the splicing system and evaluated the outcome of such
a skip with five metrics (Fig. 8).
Fig. 7. Genes of splicing factors differentially expressed in t(8;21)-positive AML blasts. (A) The RBFOX3 gene is not expressed or expressed under the threshold of detection
by RT-PCR in normal hematopoietic cells but this gene is expressed in leukemia cells. The lanes on the upper electrophoregram: Fermentas GeneRuler™ 100 bp DNA Ladder
Plus (1), amplification of cDNA of the TBP gene from Kasumi-1 cells (2), amplification of cDNA of the RBFOX3 gene from normal PBMNC (3, 5), BMMNC (7, 9), CD34+ HPSC
(11, 13) and from Kasumi-1 cells (15) and amplification of cDNA of the RBFOX3 gene from respective RT negative controls (4, 6, 8, 10, 12, 14, 16). The lanes on the bottom
electrophoregram: Fermentas GeneRuler™ 100 bp DNA Ladder Plus (1), amplification of cDNA of the RBFOX3 gene from the bone marrow samples of nine children with
t(8;21)-positive AML (2, 4, 6, 8, 10, 12, 14, 16, 18) and respective RT negative controls (3, 5, 7, 9, 11, 13, 15, 17, 19). (B) Real-time PCR confirms the differential expression
of the RBFOX3 gene in normal and malignant hematopoietic cells. Expression of the RBFOX3 gene was normalized relative to the expression of the TBP gene, and then
re-normalized to the expression of this gene in Kasumi-1 cells. The picture shows an averaged expression of the RBFOX3 gene in 4 samples of normal CD34+ HPSC, 5 samples
of normal BMMNC, 5 samples of normal PBMNC and 9 bone marrow samples of children with t(8;21)-positive AML. (C) There is a significant (according to the Mann-Whitney U
test) differential expression of the splicing factor genes in leukemia cells in comparison with normal hematopoietic cells. (D) Correlation between expression of the splicing
factor genes and mRNA isoforms of the RUNX1-RUNX1T1 gene.
We found that targeted skipping of exons with the top ranked
ECI values leads to a very rapid drop in values of all five metrics. At
the same time, we observed a rather slow decline of metrics values
when low ECI exons were skipped, and that was proportional to
the fraction of excluded exons. These observations
held for both experimentally detected and theoretically
possible transcripts that can be generated by the exon graph. Interestingly, the set of expected transcripts includes 43,486 entities, of which
experimentally verified transcripts represent only 0.3%. Altogether,
these results indicate that the power-law component of the fusion
gene organization confers a high flexibility to alternative splicing
of RUNX1-RUNX1T1 transcripts.
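
The contrast between skipping a hub exon and skipping a low-ECI exon can be sketched on a toy exon graph. Everything in the block is an assumption made for illustration: the eight-exon graph is invented, the ECI is approximated by the number of incoming plus outgoing splice junctions, skipping is modeled as removal of the exon and its junctions, and transcript diversity is taken as the number of first-to-last-exon paths. It is not the model or the data used in the study.

```python
# Hypothetical acyclic exon graph: exon -> exons it can be spliced to.
EXON_GRAPH = {
    "e1": ["e2", "e3"],
    "e2": ["e4"],
    "e3": ["e4", "e5"],
    "e4": ["e5", "e6", "e7"],
    "e5": ["e8"],
    "e6": ["e8"],
    "e7": ["e8"],
    "e8": [],
}


def toy_eci(graph):
    """Approximate the ECI as incoming plus outgoing junctions per exon."""
    score = {exon: len(targets) for exon, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            score[target] += 1
    return score


def count_transcripts(graph, source, sink, skipped=frozenset()):
    """Count source-to-sink paths, ignoring any exon listed in `skipped`."""
    if source in skipped or sink in skipped:
        return 0
    memo = {}

    def walk(node):
        if node == sink:
            return 1
        if node not in memo:
            memo[node] = sum(walk(nxt) for nxt in graph[node]
                             if nxt not in skipped)
        return memo[node]

    return walk(source)


if __name__ == "__main__":
    eci = toy_eci(EXON_GRAPH)
    hub = max(eci, key=eci.get)
    print("transcripts, no skipping:", count_transcripts(EXON_GRAPH, "e1", "e8"))
    print("hub exon:", hub, "with toy ECI", eci[hub])
    print("transcripts, hub skipped:",
          count_transcripts(EXON_GRAPH, "e1", "e8", skipped={hub}))
    print("transcripts, low-ECI exon e6 skipped:",
          count_transcripts(EXON_GRAPH, "e1", "e8", skipped={"e6"}))
```

In this toy graph, removing the hub collapses the seven possible transcripts to one, while removing a low-ECI exon leaves five, which mirrors the qualitative pattern described above.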
4. Discussion
In this work, we showed that local combinatorics of the
RUNX1-RUNX1T1 exons follow a power-law behavior. This behavior is also typical for exons of normal RUNX1 and RUNX1T1 genes
and for the whole set of exons of the human transcriptome (data not
shown).
The observed power-law distribution has four key properties.
First, the vast majority of exons have low values of the ECI. These
exons are mostly represented by constitutive exons encoding conserved RUNX1T1 domains of the fusion protein, UTR exons and
cassette exons from the breakpoint region. In fact, the NLS-NHR2-NHR3-NHR4 coding part of the fusion gene (from the 3′-end of exon
11 to the 5′-end of exon 17) is the most constant in terms of splicing.
This is consistent with the empirical data showing that RUNX1T1
domains are very important for the fusion protein function and any
alternative splicing events in this area may radically change activity of the protein (Park et al., 2009; Sun et al., 2013; Yan et al., 2006).
At the same time, a high diversity and a low individual abundance
in transcripts may be the main reasons why UTR exons and exons
from the breakpoint region fall into the group with low values of
the ECI.
Second, a small part of exons have high ECI values. This part
includes constitutive exons-hubs that participate in different
splicing mechanisms and mostly contribute to alternative splicing.
Thus, about 80% of the splicing events with exon 5 use alternative
5′ splice sites of 5′ UTR exons. At the same time, about 70% of the
splicing events involving exons 6 and 8b use alternative 5′ or 3′
splice sites of cassette exons from the breakpoint region. Most of
these exons were found by LaFiura et al. (2008); we identified only
two new sequences from the breakpoint region. Perhaps, this is due
to patient specificity and rarity of such exons.
It is interesting to note that most of the exons with high ECI values belong to the second and the third clusters based on Kleinberg's
authority score. Moreover, these exons encode Runt homology
domain RHD, NHR1 domain and the polypeptide bridge, uniting
RUNX1- and RUNX1T1-parts of the fusion protein. Herewith, the
RHD domain is responsible for specific DNA binding and the NHR1
domain provides heterodimerization of the fusion protein with
[Fig. 8 plots, panels A and B. Axes: normalized value of metric versus fraction of skipped exons. Legend: diversity of transcripts; average size (in number of exons) of transcripts; average length (in number of nucleotides) of transcripts; average length of ORF; portion of transcripts containing PTC.]
Fig. 8. In silico modeling supports a strong sensitivity of splicing of RUNX1-RUNX1T1 transcripts to skipping of exons with high ECI values. (A) Skipping of exons that were
listed in the descending order of their ECI values: experimentally verified transcripts (on the top), predicted transcripts (on the bottom). (B) This picture is similar to (A), but
exons were excluded from the splicing process in the ascending order of their ECI values.
other transcriptional regulators (Hug and Lazar, 2004; Tahirov et al.,
2001; Zhang et al., 2004). Consequently, intense alternative splicing of these exons can affect DNA-binding activity of the fusion
protein(s), ways of combining the RUNX1- and RUNX1T1-parts and the
ability to form multimeric regulatory complexes.
Third, the ratio between the number of exons with low and high
values of the ECI is constant because the power-law distribution is
scale-free (Newman, 2005). In fact, this ratio did not change and
exons did not alter ranks when we expanded the RUNX1-RUNX1T1
exon graph by ESTs (ρ = 0.99, p < 0.001). The scale-free property also
means that exons with high values of the ECI have the highest probability to undergo alternative splicing generating mRNA isoforms
of the fusion gene not identified so far.
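
The scale-free property can be made concrete with a minimal sketch that estimates a power-law exponent by maximum likelihood for continuous data, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{min})$ (Clauset et al., 2009), and checks that the estimate is unchanged when all values are rescaled. The synthetic sample, the chosen exponent and the function names are assumptions made for illustration; ECI values are discrete counts, so the continuous estimator is only an approximation for them.

```python
import math
import random


def powerlaw_alpha(values, x_min):
    """Maximum-likelihood exponent of a continuous power law for x >= x_min."""
    tail = [v for v in values if v >= x_min]
    return 1 + len(tail) / sum(math.log(v / x_min) for v in tail)


def sample_powerlaw(n, alpha, x_min, seed=0):
    """Draw n values from p(x) ~ x**(-alpha), x >= x_min, by inverse transform."""
    rng = random.Random(seed)
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]


if __name__ == "__main__":
    data = sample_powerlaw(5000, alpha=2.5, x_min=1.0, seed=42)
    print("estimated exponent:", powerlaw_alpha(data, 1.0))
    # Rescaling every value (and x_min) by the same factor leaves the
    # estimate unchanged: the distribution has no characteristic scale.
    rescaled = [10 * v for v in data]
    print("after rescaling:   ", powerlaw_alpha(rescaled, 10.0))
```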
Fourth, the power-law distribution is an indication that the system is close to the critical point of a phase transition (Newman,
2005; Stauffer, 2012). Being near the critical point, the system can
quickly adapt itself to changing environmental conditions due to
the redundant inner diversity (Nykter et al., 2008). Our in silico
experiments showed that the fusion gene has a hidden potential to produce different mRNA isoforms. Moreover, involving or
skipping of the exons with high ECI values during the splicing process
can dramatically change diversity of RUNX1-RUNX1T1 transcripts.
This observation means that splicing of primary RUNX1-RUNX1T1
transcripts is very flexible and may quickly change.
We believe that the observed distribution is an outcome of
several overlapping mechanisms. A part of the distribution could
be explained by stochasticity in the splicing machinery. Artifacts
caused by sample processing (Dvinge et al., 2014), template switching of reverse transcriptase during cDNA synthesis (Houseley and
Tollervey, 2010) and possible misannotation of transcript variants
may influence values of the ECI. Nevertheless, the main mechanism of the power-law distribution is a preferential attachment
(Albert and Barabási, 2002). In the context of splicing, preferential attachment means a preferred usage of an exon in alternative
splicing and the ECI serves as a measure of attractiveness. Data mining revealed that attractiveness of exons is determined by a
set of sequence-related features. Among these features, there are
well-known cis-regulators of splicing as well as our newly predicted
motifs. It is interesting that none of these features has a high importance in determination of the ECI value, and only together they may
explain up to 75% of ECI variance. We assume that the observed
redundancy of important sequence features is necessary for relatively robust splicing of the RUNX1-RUNX1T1 transcripts.
Cis-regulatory motifs are just attachment sites for trans-factors
of splicing. As a result, deregulated expression of splicing factor genes may also contribute to the variance of the ECI. In fact,
we observed a statistically significant differential expression of
splicing genes in t(8;21)-positive AML cells. Apparently, deregulated expression of splicing genes is a frequent situation for
leukemias (Maciejewski and Padgett, 2012) and this should lead
to the establishment of a new leukemia-specific configuration of
the intracellular network that regulates the splicing process.
Finally, dysfunction of the NMD system in t(8;21)-positive
AML may be an additional cause that contributes to the diversity of mRNA molecules originating from the RUNX1-RUNX1T1 gene.
Indeed, in leukemia cells, we observed down- or overexpression
of genes coding the key regulators and effectors of the NMD
process (Kashima et al., 2006; Kervestin and Jacobson, 2012;
Nicholson et al., 2010). This may lead to the increased survival
of the PTC-containing mRNA molecules and expansion of the
observed diversity of these molecules. Moreover, misspliced mRNA
molecules can be translated into truncated RUNX1-RUNX1T1 proteins, which apparently are typical for t(8;21)-positive leukemia
cells and play an essential role in leukemogenesis (LaFiura et al., 2008;
Mannari et al., 2010; Migas et al., 2014; Yan et al., 2006).
Thus, our results shed light on some patterns of local combinatorics of RUNX1-RUNX1T1 exons during splicing of primary
transcripts. At the same time, these results give rise to new questions. In particular, we used a model that does not allow us to establish
patterns of the global exon combinatorics during formation of entire transcripts.
Extremely interesting questions concern the obvious
discrepancy between the number of transcripts that can be predicted by the RUNX1-RUNX1T1 exon graph versus the number of
experimentally verified transcripts, and identification of hidden
factors that could explain the remaining 13% of ECI variance.
Additionally, our results indicate that DNA-tropic systems, like
CRISPR/Cas, should be more efficient than RNA interference for
the control of the fusion gene expression in functional genomics
and experimental gene therapy. These results should also be considered for the RT-PCR-based monitoring of the minimal residual
disease, because approaches used in clinical practice do not take
into account the full diversity of transcripts of the fusion gene. Finally,
further in-depth study and monitoring
of the activity of splicing and NMD genes in individual AML patients is of great interest,
as they affect the transcriptome state of leukemia cells.
Conflict of interests
None of the authors has a financial disclosure.
Acknowledgements
This work was supported by the Ministry of Education and the
Ministry of Health of the Republic of Belarus (interdepartmental
grant #554/54 (1.1.31)).
Appendix A. Supplementary data
Supplementary data associated with this article can be found,
in the online version, at http://dx.doi.org/10.1016/j.biocel.2015.08.017.
References
Ababneh, F., Jermiin, L.S., Robinson, J., 2006. Generation of the exact distribution
and simulation of matched nucleotide sequences on a phylogenetic tree. J.
Math. Model. Algorithm 5, 291-308.
Albert, R., Barabási, A.L., 2002. Statistical mechanics of complex networks. Rev. Mod.
Phys. 74, 47-97.
Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Evangelista, C., Kim, I.F., et al., 2011.
NCBI GEO: archive for functional genomics data sets - 10 years on. Nucleic
Acids Res. 39, D1005-D1010.
Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J.,
et al., 2013. GenBank. Nucleic Acids Res. 41, D36-D42.
Breiman, L., Cutler, A., Liaw, A., Wiener, M., Package 'randomForest'. Version 4.6-7.
https://cran.r-project.org/web/packages/randomForest/index.html (accessed
29.08.13).
Clauset, A., Shalizi, C.R., Newman, M.E.J., 2009. Power-law distributions in
empirical data. SIAM Rev. 51, 661-703.
Csardi, G., Nepusz, T., 2006. The igraph software package for complex network
research. Int. J. Complex Syst., 1695.
Davis, J.N., McGhee, L., Meyers, S., 2003. The ETO (MTG8) gene family. Gene 303,
1-10.
del Rio, G., Koschützki, D., Coello, G., 2009. How to identify essential genes from
molecular networks? BMC Syst. Biol. 3, 102.
Dvinge, H., Ries, R.E., Ilagan, J.O., Stirewalt, D.L., Meshinchi, S., Bradley, R.K., 2014.
Sample processing obscures cancer-specific alterations in leukemic
transcriptomes. Proc. Natl. Acad. Sci. U. S. A. 111, 16802-16807.
Era, T., Asou, N., Kunisada, T., Yamasaki, H., Asou, H., Kamada, N., et al., 1995.
Identification of two transcripts of AML1/ETO-fused gene in t(8;21) leukemic
cells and expression of wild type ETO gene in hematopoietic cells. Genes
Chromosomes Cancer 13, 25-33.
Erickson, P., Gao, J., Chang, K.S., Look, T., Whisenant, E., Raimondi, S., et al., 1992.
Identification of breakpoints in t(8;21) acute myelogenous leukemia and
isolation of a fusion transcript, AML1/ETO, with similarity to Drosophila
segmentation gene, runt. Blood 80, 1825-1831.
Gehring, N.H., Kunz, J.B., Neu-Yilik, G., Breit, S., Viegas, M.H., Hentze, M.W., et al.,
2005. Exon-junction complex components specify distinct routes of
nonsense-mediated mRNA decay with differential cofactor requirements. Mol.
Cell 20, 65-75.
Gu, Z., Package 'circlize'. Version 0.2.4. https://cran.r-project.org/web/packages/
circlize/index.html (accessed 20.03.15).
Hatlen, M.A., Wang, L., Nimer, S.D., 2012. AML1-ETO driven acute leukemia:
insights into pathogenesis and potential therapeutic approaches. Front. Med. 6,
248-262.
Heber, S., Alekseyev, M., Sze, S.H., Tang, H., Pevzner, P.A., 2002. Splicing graphs and
EST assembly problem. Bioinformatics 18, S181-S188.
Houseley, J., Tollervey, D., 2010. Apparent non-canonical trans-splicing is
generated by reverse transcriptase in vitro. PLoS ONE 5, e12271.
Hug, B.A., Lazar, M.A., 2004. ETO interacting proteins. Oncogene 23, 4270-4274.
Iyer, S., Killingback, T., Sundaram, B., Wang, Z., 2013. Attack robustness and
centrality of complex networks. PLoS ONE 8, e59613.
Karolchik, D., Barber, G.P., Casper, J., Clawson, H., Cline, M.S., Diekhans, M., et al.,
2014. The UCSC genome browser database: 2014 update. Nucleic Acids Res. 42
(Database issue), D764-D770.
Kashima, I., Yamashita, A., Izumi, N., Kataoka, N., Morishita, R., Hoshino, S., et al.,
2006. Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon
junction complex triggers Upf1 phosphorylation and nonsense-mediated
mRNA decay. Genes Dev. 20, 355-367.
Kervestin, S., Jacobson, A., 2012. NMD: a multifaceted response to premature
translational termination. Nat. Rev. Mol. Cell Biol. 13, 700-712.
Kim, P., Yoon, S., Kim, N., Lee, S., Ko, M., Lee, H., et al., 2010. ChimerDB 2.0: a
knowledgebase for fusion genes updated. Nucleic Acids Res. 38, D81-D85.
Klaus, A., Yu, S., Plenz, D., 2011. Statistical analyses support power law
distributions found in neuronal avalanches. PLoS ONE 6, e19779.
Kleinberg, J.M., 1999. Authoritative sources in a hyperlinked environment. J. ACM
46, 604-632.
Kozu, T., Fukuyama, T., Yamami, T., Akagi, K., Kaneko, Y., 2005. MYND-less splice
variants of AML1-MTG8 (AML1-CBFA2T1) are expressed in leukemia with
t(8;21). Genes Chromosomes Cancer 43, 45-53.
Kozu, T., Miyoshi, H., Shimizu, K., Maseki, N., Kaneko, Y., Asou, H., et al., 1993.
Junctions of the AML1/MTG8(ETO) fusion are constant in t(8;21) acute myeloid
leukemia detected by reverse transcription polymerase chain reaction. Blood
82, 1270-1276.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., et al.,
2009. Circos: an information aesthetic for comparative genomics. Genome Res.
19, 1639-1645.
Kursa, M.B., Rudnicki, W.R., 2010. Feature selection with the Boruta package. J. Stat.
Softw. 36, 1-13.
LaFiura, K.M., Edwards, H., Taub, J.W., Matherly, L.H., Fontana, J.A., Mohamed, A.N.,
et al., 2008. Identification and characterization of novel AML1-ETO fusion
transcripts in pediatric t(8;21) acute myeloid leukemia: a report from the
Children's Oncology Group. Oncogene 27, 4933-4942.
Lasa, A., Nomdedeu, J.F., Carnicer, M.J., Llorente, A., Sierra, J., 2002. ETO sequence
may be dispensable in some AML1-ETO leukemias. Blood 100, 4243-4244.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News
2, 18-22.
Maciejewski, J.P., Padgett, R.A., 2012. Defects in spliceosomal machinery: a new
pathway of leukaemogenesis. Br. J. Haematol. 158, 165-173.
Majoros, W.H., Lebeck, N., Ohler, U., Li, S., 2014. Improved transcript isoform
discovery using ORF graphs. Bioinformatics 30, 1958-1964.
Mannari, D., Gascoyne, D., Dunne, J., Chaplin, T., Young, B., 2010. A novel exon in
AML1-ETO negatively influences the clonogenic potential of the t(8;21) in
acute myeloid leukemia. Leukemia 24, 891-894.
Margolin, A.A., Wang, K., Lim, W.K., Kustagi, M., Nemenman, I., Califano, A., 2006.
Reverse engineering cellular networks. Nat. Protoc. 1, 662-671.
Melamud, E., Moult, J., 2009. Stochastic noise in splicing machinery. Nucleic Acids
Res. 37, 4873-4886.
Meyers, S., Downing, J.R., Hiebert, S.W., 1993. Identification of AML-1 and the
(8;21) translocation protein (AML-1/ETO) as sequence-specific DNA-binding
proteins: the runt homology domain is required for DNA binding and
protein-protein interactions. Mol. Cell. Biol. 13, 6336-6345.
Migas, A.A., Mishkova, O.A., Ramanouskaya, T.V., Ilyushonak, I.M., Aleinikova, O.V.,
Grinev, V.V., 2014. RUNX1T1/MTG8/ETO gene expression status in human
t(8;21)(q22;q22)-positive acute myeloid leukemia cells. Leukemia Res. 38,
1102-1110.
Miyoshi, H., Kozu, T., Shimizu, K., Enomoto, K., Maseki, N., Kaneko, Y., et al., 1993.
The t(8;21) translocation in acute myeloid leukemia results in production of an
AML1-MTG8 fusion transcript. EMBO J. 12, 2715-2721.
Müller, A.M.S., Duque, J., Shizuru, J.A., Lübbert, M., 2008. Complementing
mutations in core binding factor leukemias: from mouse models to clinical
applications. Oncogene 27, 5759-5773.
Newman, M.E.J., 2002. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701.
Newman, M.E.J., 2003. The structure and function of complex networks. SIAM Rev.
45, 167-256.
Newman, M.E.J., 2005. Power laws, Pareto distributions and Zipf's law. Contemp.
Phys. 46, 323-351.
Nicholson, P., Yepiskoposyan, H., Metze, S., Zamudio Orozco, R., Kleinschmidt, N.,
Mühlemann, O., 2010. Nonsense-mediated mRNA decay in human cells:
mechanistic insights, functions beyond quality control and the double-life of
NMD factors. Cell. Mol. Life Sci. 67, 677-700.
Nisson, P.E., Watkins, P.C., Sacchi, N., 1992. Transcriptionally active chimeric gene
derived from the fusion of the AML1 gene and a novel gene on chromosome 8
in t(8;21) leukemic cells. Cancer Genet. Cytogenet. 63, 81-88.
Nykter, M., Price, N.D., Larjo, A., Aho, T., Kauffman, S.A., Yli-Harja, O., et al., 2008.
Critical networks exhibit maximal information diversity in
structure-dynamics relationships. Phys. Rev. Lett. 100, 058702.
Park, S., Chen, W., Cierpicki, T., Tonelli, M., Cai, X., Speck, N.C., et al., 2009. Structure
of the AML1-ETO eTAFH domain-HEB peptide complex and its contribution to
AML1-ETO activity. Blood 113, 3558-3567.
Pickrell, J.K., Pai, A.A., Gilad, Y., Pritchard, J.K., 2010. Noisy splicing drives mRNA
isoform diversity in human cells. PLoS Genet. 6, e1001236.
Ruijter, J.M., Ramakers, C., Hoogaars, W.M.H., Karlen, Y., Bakker, O., van den Hoff,
M.J.B., et al., 2009. Amplification efficiency: linking baseline and bias in the
analysis of quantitative PCR data. Nucleic Acids Res. 37 (6), e45,
http://dx.doi.org/10.1093/nar/gkp045.
Saunders, M.J., Tobal, K., Keeney, S., Liu Yin, J.A., 1996. Expression of diverse
AML1/MTG8 transcripts is a consistent feature in acute myeloid leukemia with
t(8;21) irrespective of disease phase. Leukemia 10, 1139-1142.
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., et al., 2012.
Database resources of the National Center for Biotechnology Information.
Nucleic Acids Res. 40, D13-D25.
Schneider, C.M., Moreira, A.A., Andrade, J.S., Havlin, S., Herrmann, H.J., 2011.
Mitigation of malicious attacks on networks. Proc. Natl. Acad. Sci. U. S. A. 108,
3838-3841.
Smyth, G.K., 2005. Limma: linear models for microarray data. In: Gentleman, R.,
Carey, V., Dudoit, S., Irizarry, R., Huber, W. (Eds.), Bioinformatics and
Computational Biology Solutions using R and Bioconductor. Springer, New
York, pp. 397-420.
Stauffer, D., 2012. Phase transitions on fractals and networks. In: Meyers, R.A. (Ed.),
Mathematics of complexity and dynamical systems. Springer
Science + Business Media, LLC, New York, pp. 1400-1406.
Sun, X.J., Wang, Z., Wang, L., Jiang, Y., Kost, N., Soong, T.D., et al., 2013. A stable
transcription factor complex nucleated by oligomeric AML1-ETO controls
leukaemogenesis. Nature 500, 93-98.
Tahirov, T.H., Inoue-Bungo, T., Morii, H., Fujikawa, A., Sasaki, M., Kimura, K., et al.,
2001. Structural analyses of DNA recognition by the AML1/Runx-1 Runt
domain and its allosteric control by CBFbeta. Cell 104, 755-767.
Tighe, J.E., Calabi, F., 1994. Alternative, out-of-frame runt/MTG8 transcripts are
encoded by the derivative (8) chromosome in the t(8;21) of acute myeloid
leukemia M2. Blood 84, 2115-2121.
Trajanovski, S., Martin-Hernandez, J., Winterbach, W., Van Mieghem, P., 2013.
Robustness envelopes of networks. J. Complex Netw. 1, 44-62.
Van de Locht, L.T., Smetsers, T.F., Wittebol, S., Raymakers, R.A., Mensink, E.J., 1994.
Molecular diversity in AML1/ETO fusion transcripts in patients with t(8;21)
positive acute myeloid leukaemia. Leukemia 8, 1780-1784.
Virkar, Y., Clauset, A., 2012. Power-law distributions in binned empirical data. Ann.
Appl. Stat., 1-33.
Vuong, Q.H., 1989. Likelihood ratio tests for model selection and non-nested
hypotheses. Econometrica 57, 307-333.
Wang, Y., Ma, M., Xiao, X., Wang, Z., 2012. Intronic splicing enhancers, cognate
splicing factors and context dependent regulation rules. Nat. Struct. Mol. Biol.
19, 1044-1052.
Wilkerson, M., Waltman, P., Package 'ConsensusClusterPlus'. Version 1.22.0.
http://bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html
(accessed 13.07.15).
Yan, M., Kanbe, E., Peterson, L.F., Boyapati, A., Miao, Y., Wang, Y., et al., 2006. A
previously unidentified alternatively spliced isoform of t(8;21) transcript
promotes leukemogenesis. Nat. Med. 12, 945-949.
Zhang, J., Kalkum, M., Yamamura, S., Chait, B.T., Roeder, R.G., 2004. E protein
silencing by the leukemogenic AML1-ETO fusion protein. Science 305,
1286-1289.
Zhang, Y.W., Bae, S.C., Huang, G., Lu, J., Ahn, M.Y., Kanno, Y., et al., 1997. A novel
transcript encoding an N-terminally truncated AML1/PEBP2 alpha B protein
interferes with transactivation and blocks granulocytic differentiation of
32Dcl3 myeloid cells. Mol. Cell. Biol. 17, 4133-4145.
