Professional Documents
Culture Documents
a r t i c l e
i n f o
Article history:
Received 17 December 2010
Received in revised form 8 June 2013
Accepted 10 June 2013
Available online 19 June 2013
Keywords:
Multidimensional rule
Sequential rule
Activity prediction
Mobile Personalized Marketing
Data mining
a b s t r a c t
Personalized marketing via mobile devices, also known as Mobile Personalized Marketing (MPM), has become an increasingly important marketing tool because the ubiquity, interactivity and localization of mobile
devices offers great potential for understanding customers' preferences and quickly advertising customized
products or services. A tremendous challenge in MPM is to factor a mobile user's context into the prediction
of the user's preferences. This paper proposes a novel framework with a three-stage procedure to discover
the correlation between contexts of mobile users and their activities for better predicting customers' preferences. Our framework helps not only to discover sequential rules from contextual data, but also to overcome
a common barrier in mining contextual data, i.e. elimination of redundant rules that occur when multiple
dimensions of contextual information are used in the prediction. The effectiveness of our framework is evaluated through experiments conducted on a mobile user's context dataset. The results show that our framework can effectively extract patterns from a mobile customer's context information for improving the
prediction of his/her activities.
2013 Elsevier B.V. All rights reserved.
1. Introduction
Personalized Marketing (PM), also known as one-to-one marketing, is the process of delivering targeted products and services
to a customer based on the customer's prole [36,41]. The main objective of PM is to identify the needs of a customer and offer products and services that appeal to that particular customer. Recently,
with the dazzling proliferation of mobile commerce, personalized
marketing via mobile devices (Mobile Personalized Marketing,
MPM) has become an increasingly important marketing tool because the ubiquity, interactivity, and localization of mobile devices
offers great potential for collecting customers' information, understanding their preferences and quickly advertising customized
products [16,37,60]. Recent studies have predicted that the volume
of business transactions associated with MPM will soon become the
primary contributor to revenue growth in one-to-one marketing
[39].
In personalized marketing, it is important to consider the contextual information, i.e. the environment where a customer is located, in
order to understand the needs of the customer. Since it is possible to
Corresponding author at: Av. Padre Toms Pereira Taipa, Macao, China. Tel.: +853
83974172; fax: +853 83978320.
E-mail addresses: hengtang@umac.mo (H. Tang), issliao@cityu.edu.hk (S.S. Liao),
sherry.sun@cityu.edu.hk (S.X. Sun).
0167-9236/$ see front matter 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.dss.2013.06.004
In the example above, multiple dimensions of the contextual information provide important clues to the customer's preferences under a
more specic circumstance (e.g., location, weather and time). As such,
accuracy of the prediction whether a customer will likely accept an
offer of a service can be improved when such multidimensional information about the customer's context is incorporated into a prediction
method for personalized marketing.
The estimation of a customer's preference for a service can be
regarded as a mapping from the customer, context, and service to a
probability, i.e. p = f (Customer, Context, Service). In recommender
systems, the extent to which a customer prefers a service is reected
by the User Rating [2]. In this study, the probability of a customer
preferring a service is obtained through analyzing the correlation between a sequence of contexts and the activities of a customer based
on historical data. Then, for a given context, the service or product
with the highest probability of being preferred can be proactively offered to the customer. The correlations between a series of contexts
and activities of a customer are represented as sequential rules,
often represented as x leads to y indicating y happens after x has
happened [5]. Users' activities, notably, can also be viewed as an important type of contextual information, since they can offer valuable
clues for predicting future moves. This sequential rule based solution
enables the service provider to not only tailor services for customers,
but also deliver the services in advance.
Example 2. A simple sequential rule is given as follows, showing the
correlation between the contexts and the activities of the customer in
Example 1.
{ofce, afternoon}, {shopping mall, night} leads to redeeming a
mobile coupon for the food court (probability = 70%).
This sequential rule indicates that the customer is likely to
accept a mobile coupon when going from ofce to shopping
mall at night. Location and time are the involved dimensions of
the two contexts in sequence. As this rule has a comparatively
high probability, it can be used to make predictions. Then,
whenever the antecedent, i.e. the contexts {ofce, afternoon}
and {shopping mall, night}, occur again, a prediction can be
made that a mobile coupon for the food court will be preferred
by the customer.
Given that in practice, a huge amount of contextual information
with various dimensions can be collected using mobile devices, it is
a great challenge to effectively identify the sequential rules that are
most useful for prediction of customer preferences. Moreover, in
order to proactively address customers' needs, it is also critical to
quickly identify situations where a sequential rule is applicable.
As different combinations of dimensions of a context can be used
for prediction of customer preferences, accuracy of the prediction
undoubtedly depends on the set of dimensions for various contexts.
Rather than enhancing the predictiveness, incorporation of additional contextual dimensions sometimes results in redundancy
[63], which is generally known as a phenomenon wherein parts
of knowledge are in fact corollaries of other parts of knowledge
[38].
Example 3. Seven sequential rules shown as follows are derived from
the sequential rule in Example 2 but incorporate one new dimension,
i.e. day of the week:
(1) {ofce, afternoon, Monday}, {shopping mall, night, Monday}
leads to redeeming a mobile coupon, and
(2) {ofce, afternoon, Tuesday}, {shopping mall, night, Tuesday}
leads to redeeming a mobile coupon.
235
236
ofce and entertainment, etc.), which can be very special from one
individual to another.
The mining algorithm we propose in this paper attempts to deal
with the second type of data format, i.e. a single sequence, which is
more applicable for context-specic MPM. In this category, Mannila
and Toivonen [32] use Episode to describe frequently recurring
subsequences and propose two efcient algorithms, WINEPI and
MINEPI. Many others have also focused on episode mining, including [8,10,27,33]. For example, Bettini et al. [10] address the problem of mining event structures with multiple time granularity;
Laxman et al. [27] extend the episode mining approach by explicitly bringing event duration constraints into the concept of episode.
The difference between episode mining techniques and ours is
that the former are not directly applicable to multidimensional sequence mining.
2.2. Multidimensional sequence
The term multidimensional used in this paper comes originally
from the multidimensional data model used for data warehousing
and On-Line Analytical Processing (OLAP) [13]. The problem of discovering multidimensional sequential rules for prediction studied in
this paper is a new issue. To the best of our knowledge, no related
work has directly addressed this. However, the general concept of
mining multidimensional sequential rules has been addressed in several studies and dimension is also referred to as attribute [47]. Yu
and Chen [59] investigate the episode mining problem for a
multidimensional sequence. However, the term multidimensional
used in their work refers to multiple granularities in terms of the
time dimension of occurrence of events. It is thus a concept different
from the way the term is used in our study. Attempts to detect sequential patterns from a multidimensional transactional database
have been made by Pinto et al. [42], in which sequence refers to
purchase sequence segments of a certain customer. The research
problem discussed in this paper is different in that our approaches
aim to extract patterns from an entire sequence rather than from a
database of short sequence segments.
2.3. Rule reduction
Knowledge redundancy is known as a common problem of
knowledge-based systems, as it reduces maintainability and efciency of the knowledge base [49]. In this paper, we concentrate on the
redundancy problem associated with rule-based systems.
In general, a rule base with problematic rules, including redundant
ones, can be validated via conducting post-analysis by either domain
experts or through an automatic process. The approach of involving
domain experts is known to be exible and highly applicable. For example, Adomavicius and Tuzhilin [3] propose an expert-driven
framework for validation of a given rule base. The automatic process
for redundancy check normally relies on precisely dening the measure of redundancy. For instance, Zaki [62] proposes the concept of
Closed Itemset based on Formal Concept Analysis, and proves that
237
a closed itemset can be used to capture all information about a conventional frequent itemset. Moreover, a redundant rule is dened to
be a super rule with the same frequency and condence as its
sub-rules [62]. Similar redundancy denitions have also been proposed in some other studies [31,55]. In Ashra et al.'s approach
[6,7], given a rule r, if r's sub-rules with a higher condence are
found in the rule base, then the rule r should be regarded as redundant. [14,15] dene a -tolerance association rule (-TAR) mining
task. The itemset identied using the -TAR method only includes
items with dramatic frequency changes, and rules excluded in the
-TAR are viewed as redundant.
Despite these prior efforts on redundancy reduction, to the best of our
knowledge, few works have addressed this issue in multidimensional
settings. Specically, our research investigates the redundancy problem
introduced by the multidimensionality of sequential rules.
Denition 2. Event
An event e is a subset of the state space, denoted e D.
Denition 3. Sequence
Given T = {t1,t2,tn}, a sequence is a list seqT = {S(t), t T} whose
elements are ordered ascendingly by t. Assuming t1 b t2 b b tn, the
overall time span of seqT is span(seqT) = tn t1.
We say an event e occurs in sequence seqT at time t, denoted e t seqT,
if there exists a time point t such that S(t) e. For brevity it is also denoted e seqT in the case that occurring time does not matter. For example,
given S(t1) = { Street , Morning , Raining , Purchase } dened
on a 4-dimensional state space D and e2 = {v4 = Purchase }, since
S(t1) e2, we say e2 occurs in seqT, denoted e2 seqT. Note that there
can be, in a general case, more than one event occurring at the same
time point.
To simplify the notation in the paper, we use a tuple (e, t) to represent an event occurring in a sequence at a specic time point,
where e is the event label taking value from a nite alphabet and t
is the occurring time of e. Thus a sequence could be denoted by, for
example, seqT1 = b(x,11), (x,12), (x,13), (w,14), (y,15), (x,16),
(z,17), (x,18)>, where w, x, y and z are events occurring in seqT.
A rule is a type of pattern representing the hidden correlation
among events in a sequence dened as follows.
Denition 4. Rule
A rule is a list of events denoted by r = e1,e2, ,el, where
e1 ; ; el D.
Having introduced the notion of a rule, we are now able to dene
what is meant by its occurrence in a sequence so as to formulate its signicance. Shortly, an occurrence of a rule r is considered as a series of
time points recording the occurring time of the corresponding events
of r in a sequence. For practical purposes, we should allow some time
gap between the adjacent events in an occurrence, hence two thresholds g and w are introduced into our formulation: g (or gap) is the maximum allowed time difference between the occurring time of any two
neighboring event types, while w (or width) is the maximum allowed
time difference between the occurring time of the rst and the last
event types. The rigorous denition is provided as follows.
238
Denition 5. Occurrence
T ;g;w
2)
239
events that are either actional or with the dimension set dimset. In the
loop between lines 2 and 6, Freqi is used to store frequent rules with
length i, and the pruning method (denoted prune() in line 4) is then
applied to eliminate rules violating either of the thresholds g or w.
Each concatenatable rule pair is then merged to generate a new
candidate with length i + 1. The above procedure is iterated until
all frequent rules with length l are identied. In addition, since the
goal is to extract sequential rules in which actional events do not appear in the antecedent, rules ending with actional event thus will not
be selected as the left rule when conducting concatenation (line 5, denoted r1.tail.dim {Dm + 1}). This restriction on rule pair selection is
another effective pruning in the algorithm. Subsequently, lines 7 and
8 compute Freq1 so as to nalize the loop.
Ultimately, only sequential rules with the minimum length of 2
are chosen (line 9 in Fig. 2). Since the support of all rules have been
calculated and stored, computing their condence is straightforward.
Rules with condence greater than min_conf are considered as valid
rules to output.
4. Rule reduction
We now consider the redundancy problem motivated in the examples in Section 1. By specifying different dimension sets, the
rule learning algorithm can identify sequential rules with various
congurations of contextual dimensions. Rules with high dimensionality could be of great interest because they offer more specic
and accurate description of context, they may nevertheless also be
considered redundant if they do not carry additional knowledge
than their lower dimensional variations. Traditionally, redundant
dimensions can be spotted by various dimension (feature) selection
methods, such as applying heuristics [19,26], so that they can be excluded from rule extraction in the rst place. However, conventional
dimension selection methods consider redundancy problem from
the dimension level, but ignore the fact that a valueless contextual
dimension in terms of some rules could possibly be valuable for
some others. To this end, this study focuses on the redundancy
problem on rule level.
In the research of association rule mining, a number of criteria are
proposed to determine whether a rule r1 : XY e is redundant with
regard to its closure r2 : X e, where X and Y are items. The most
representative criteria fall into two categories, that is: Association
rule r1 is redundant in terms of r2 if and only if:
(1) conf(r1) conf(r2) [6,7], or
(2) conf(r1) = conf(r2) and supp(r1) supp(r2) [31,55,62]
The above criteria can be straightforwardly applied in multidimensional settings if states in different dimensions are considered
as items. In other words, to determine whether a state Y Dk is
240
On the other hand, based on the assumption of the Principle of Indifference [20,23], we assume that the remaining infrequent specializations are equally probable. Thus the probability of each infrequent
specialization is the average of the remaining possibility, which can be
suppr
calculated by pinf 1 rj Rr;Dm supprj =specr; Dm Rr; Dm ,
where the denominator is the estimated number of infrequent specializations of r. Because spec(r,Dm) is the set of all combinations of
m-dimensional specializations with length l, we can calculate
spec(r,Dm) = dom(Dm)l, where dom(Dm) is the value domain of
D m.
Therefore, the total information amount of all infrequent specializations can be estimated by
Iinf r; Dm pinf logpinf specr; Dm Rr; Dm :
5
Fig. 3. Algorithm 2Reducing a rule base.
Using bounds [0, log(spec(r,Dm))], we normalize the entropybased redundancy degree into [0,1], which is the ratio of the overall
entropy of the specializations over the upper bound, that is,
Redunr; Dm
I freq r; Dm Iinf r; Dm
:
logspecr; Dm
0:08 0:09 0:02
=73 0:0125
1
0:2
0:2
0:2
I inf 0:0125 log0:012573 0:316
0:08
0:08
0:09
0:09
0:02
0:02
log
log
log
1:379:
I freq
0:2
0:2
0:2
0:2
0:2
0:2
P inf
Then the redundancy of the dimension Day of the week with regard
to r is calculated as: Redun(r,Dm) = (0.316 + 1.379)/log 7 0.604.
From the calculation above, we nd that the new dimension Day
of the week carries extra knowledge for the given rule r because its
redundancy level is signicantly smaller than 1 (based on our
trial-and-error simulations, when dom(Dm) is not a large set,
Redun value greater than 0.9 could be considered signicantly redundant). In fact, all supports given above have already intuitively
shown the sequential occurrence of Ofce and actional event e is
signicantly more frequent on 2 days (Sun and Sat) than on other
week days. Additionally, a byproduct of the above process is the
generalization of context in the conceptual hierarchy, i.e. some
values in a contextual dimension could be generalized (e.g., Monday
Friday to Weekdays) if they result in high information amount. This
topic is out of the scope of the current paper and will be discussed in
our other works.
Based on the denition of the redundancy measure, the algorithm
for rule reduction is sketched in Fig. 3 as follows. Various rules with
different dimension sets are stored in the original rule base. First,
each rule r and its specializations in the original rule base are
examined (line 3), where D \ (r.dim) is the set of dimensions not involved in rule r. If the redundancy level of the specialization being
tested does not exceed a given threshold, frequent rules in this specialization set are added into RBr (line 5 in Fig. 3); otherwise rule r
is incorporated while its specialization set is discarded (line 7 in
Fig. 3). The threshold in line 4 can be determined by trialand-error through simulations.
5. Generating prediction by rule matching
In order to predict an actional event, we need to match the data
stream with previously identied rules relevant to the actional
event. In this study, we have developed a generic approach to allow
an inconsecutive matching while factoring the time thresholds (g and
w). This approach is underpinned by the n-gram prediction mechanism
originally used for sequence comparison and has been applied to predict Web requests on the basis of the historical access patterns
[46,58]. In order to allow neighboring events in a rule to nd their corresponding occurrences inconsecutively in the incoming sequence, approximate matching [17] is implemented in the matching process. The
rule matching process is sketched in Algorithm 3 in Fig. 4, which is an
innite procedure that keeps monitoring and processing the next incoming event. Suppose that RB is the rule base with k rules, in which
all rules are with sufciently high support and condence. Three tables
are used to maintain the current matching status of each rule: matched
[1k] maintains the last matched event of each rule, time_last[1k]
maintains the time point of the last matched event of each rule, and
time_rst[1k] records the time point of the rst matched event of
each rule.
Assume that r : e1,e2, ,el el + 1 is the rule to be compared
with, and r[k] is used to denote the k-th event ek in rule r. Given a
matched portion r he1 ; e2 ; ; ei i; i 1l and an incoming snapshot S(t), if S(t) ei + 1, the current input state S(t) is considered
matching r. Consequently, the conditional probability p(e|e1,e2,
,ei + 1) is examined: if it is no less than the predened threshold
, the entire rule r is considered a condent match hence its corresponding actional event e can be triggered (line 79 in Fig. 4). Otherwise, we record the current match and wait for the next incoming
state (line 12 in Fig. 4), since a longer matching is expected to
achieve higher condence [56]. Whenever either of the time thresholds g or w is violated, the current matching with rule r is considered
a failure (line 4 in Fig. 4), we then can start over again by resetting
the matching status maintained in the three tables and wait for the
new incoming state (line 5 in Fig. 4). In particular, owing to the
level-wise nature of the rule extraction algorithm during the learning stage, the conditional probabilities associated with all prexes
of a rule are readily available thus there is no need to calculate during
241
the matching process. For each new state, the algorithm scans all
rules in the rule base and attempts to nd a match from the current
matching position of each rule stored in the table matched.
Note that multiple rules can be applicable simultaneously since
more than one rule can have a conditional probability no less than
the threshold in an iteration of scan. While the algorithm in Fig. 4
only chooses the rst matched rule (line 9 in Fig. 4), an extension
can be easily made to adopt different criteria, such as the shortest
rule, the largest probability, and a predened prot/cost matrix
[1,57], to determine the most applicable rules.
For real-time MPM applications, such on-line processing method serially handling the input piece-by-piece as depicted in Algorithm 3 is
preferable. In traditional string comparison algorithms, inconsecutively
comparing elements in two strings can be solved by using Dynamic
Programming [17] within the time complexity O(MN), where M
and N are lengths of the two strings. Algorithm 3 attempts to align
an input sequence with a xed rule incrementally, hence the maximum time complexity is reduced to O(N) where N is the length of
input sequence. As such, the time complexity of comparing one incoming state with a rule becomes constant, and comparing a state
with all rules in a rule base has complexity O(RB) where RB is
the size of the rule base RB.
6. Evaluation
To validate the methods proposed for extracting, selecting and
ranking sequential rules, we conduct experiments on a context database referred to as Nokia Context Data [18]. The dataset consists of a
sequence of contextual data. In addition to validating the framework
proposed in this study, the dataset is used as a manifestation of the
MPM scenario mentioned in Section 1. The evaluation encompasses
two experiments:
Experiment I. In order to examine the ability of the proposed framework to identify effective multi-dimensional rules, we apply the
learning and matching algorithms to the Nokia Context Data to
242
Table 1
Data samples from the Nokia context dataset.
Record type
Exemplar dimensions
Value domains/format
Example
Context
v1: Location
v2: Day name
v3: Day period
Launch an application
Access a Web page
Initiate communication
1,3
1
1
a,2
b,13
c,3
6.2. Experiment I
We employ the rule learning algorithm on various dimension sets.
The identied rules are denoted in the format of, for example, {v1 =
1, 1, v2 = 1, v3 = 4}{v1 = 1, 3, v2 = 1, v3 = 4}{v1 = 1, 4,
v2 = 1, v3 = 4} {v4 = c, 1}, in which {v4 = c, 1} is an actional
event meaning making a phone call to a certain number. This rule
implies that if on Monday evening the mobile device carrier visits
three places 1,1,1,3 and 1,4 in turns, it is very likely that s/he
will call a number afterwards (in the same time window). Using the
matching algorithm, prediction of an actional event (i.e. call a number)
is done by comparing the sequence of context that has been received
with the extracted rules. As another example, extracted rule {v1 = 3,
46, v3 = 2} {v4 = a, 12} implies that if the device carrier visits
location 3,46 in the morning, s/he tends to launch a certain application afterwards, denoted a,12.
Legend:
Fig. 5. Performance of rule bases versus different min_conf thresholds (in a, b, and c) and min_supp thresholds (in d, e, and f).
0.08
0.09
0.1
# of rules
L
LD
LP
LDP
191
98
39
150
70
31
209
139
83
67
25
3
243
6.3. Experiment II
In this experiment, we compare the proposed entropy-based
method (referred to as ENT) for rule reduction with the two methods
Legend:
Fig. 6. Performance of compound rule bases versus different min_conf thresholds.
244
Table 3
Reduction results after applying different approaches on various rule bases.
Method
LD
LP
LDP
Overall
ORIGINAL
RM1
RM2
ENT
RM1 + ENT
39
39
39
39
39
31
28 (9.68%)
31 (0.00%)
29 (6.45%)
26 (16.13%)
83
79 (4.82%)
83 (0.00%)
74 (10.84%)
73 (12.05%)
3
3 (0.00%)
3 (0.00%)
2 (33.33%)
2 (33.33%)
156
149 (4.49%)
156 (0.00%)
144 (7.69%)
140 (10.26%)
Note: The integers indicate the number of rules remaining after reduction and
percentages in parentheses indicate the ratio of reduction compared with the original rule
base. Labels L, LD, LP, and LDP are the same meaning as those described in Experiment I.
Numbers in bold indicate the size of rule bases being actually reduced.
Legend:
Fig. 7. Comparison of Precision, Recall, and F1 in reduced rule bases.
Symbol
Denition
T = {t1,t2, ,tn}
D D1 D2 Dm1
S(t) = {v1,
,vm + 1},
tT
e
e. dim
seqT
r = b e1, e2, , el >
r. dim
g
w
or = b to1, to2, , tol >
Occr(r, seqT, g, w)
win = [ti, tj)
W(seqT, w)
Wr(seqT, g, w)
concat(r1,r2)
Candi
Freqi
supp(r, seqT, g, w)
min_supp
conf(r, seqT, g, w)
min_conf
spec(r,Dm)
R(r,Dm)
Redun(r,Dm)
matched[]
time_last[]
time_rst[]
245
References
[1] A.S. Abrahams, A. Becker, D. Fleder, I.C. MacMillan, Handling generalized cost
functions in the partitioning optimization problem through sequential binary
programming, Proceedings of the 5th IEEE International Conference on Data Mining, IEEE Computer Society, Houston, Texas, 2005, pp. 39.
[2] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating contextual information in recommender systems using a multidimensional approach,
ACM Transactions on Information Systems 23 (1) (2005) 103145.
[3] G. Adomavicius, A. Tuzhilin, Expert-driven validation of rule-based user models
in personalization applications, Data Mining and Knowledge Discovery 5 (1)
(2001) 3358.
[4] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proc. of the 20th Internat. Conf. on Very Large Data Bases, Morgan Kaufmann
Publishers Inc., Santiago, Chile, 1994, pp. 487499.
[5] R. Agrawal, R. Srikant, Mining sequential patterns, Proc. of the 11th Internat. Conf.
on Data Engrg, 1995, pp. 314.
[6] M.Z. Ashra, D. Taniar, K. Smith, A new approach of eliminating redundant association rules, Database And Expert Systems Appl.: 15th Internat. Conf, Springer,
Zaragoza, Spain, 2004, pp. 465474.
[7] M.Z. Ashra, D. Taniar, K. Smith, Redundant association rules reduction techniques, AI 2005: Adv. in Articial Intelligence: 18th Australian Joint Conf. on Articial Intelligence, Springer, Sydney, Australia, 2005, pp. 2963.
[8] M. Atallah, R. Gwadera, W. Szpankowski, Detection of signicant sets of episodes in event sequences, Proc. of the 4th Internat. Conf. on Data Mining, 2004,
pp. 310.
[9] M. Baldauf, S. Dustdar, F. Rosenberg, A survey on context-aware systems, International Journal of Ad Hoc and Ubiquitous Computing 2 (4) (2007) 263277.
[10] C. Bettini, X.S. Wang, S. Jajodia, J.L. Lin, Discovering frequent event patterns with
multiple granularities in time sequences, IEEE Transactions on Knowledge and
Data Engineering 10 (2) (1998) 222237.
[11] I. Bose, X. Chen, A framework for context sensitive services: a knowledge discovery based approach, Decision Support Systems 48 (1) (2009) 158168.
[12] C. Chateld, Time-series Forecasting, Chapman & Hall/CRC, 2001.
[13] S. Chaudhuri, U. Dayal, An overview of data warehousing and OLAP technology,
ACM Sigmod Record 26 (1) (1997) 6574.
[14] J. Cheng, Y. Ke, W. Ng, Delta-tolerance closed frequent itemsets, Proc. of the 6th
Internat. Conf. on Data Mining, IEEE Computer Society, Hong Kong, China, 2006,
pp. 139148.
[15] J. Cheng, Y. Ke, W. Ng, Effective elimination of redundant association rules, Data
Mining and Knowledge Discovery 16 (2) (2008) 221249.
[16] D.-Y. Choi, Personalized local internet in the location-based mobile web search,
Decision Support Systems 43 (1) (2000) 3145.
[17] M. Crochemore, C. Hancart, T. Lecroq, Algorithms on Strings, Cambridge University
Press, 2007.
[18] A. Flanagan, Nokia Context Data. , December 13, 2010 http://www.pervasive.jku.
at/Research/Context_Database/index.php2004, (retrieved).
[19] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of
Machine Learning Research 3 (2003) 11571182.
[20] I.A.N. Hacking, Equipossibility theories of probability, The British Journal for the
Philosophy of Science 22 (4) (2004) 339355.
[21] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann,
2006.
[22] J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative ltering recommender systems, ACM Transactions on Information Systems (TOIS) 22
(1) (2004) 553.
[23] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press,
2003.
[24] H.A. Karimi, X. Liu, A predictive location model for location-based services, Proc.
of the 11th ACM Internat. symposium on Adv. in geographic information systems,
2003, pp. 126133, , (ACM, New Orleans, Louisiana, USA).
[25] J. Kogan, C. Nicholas, M. Teboulle, Grouping multidimensional data: recent advances in clustering, Springer-Verlag, New York, 2006.
[26] R. Kohavi, G.H. John, Wrappers for feature subset selection, Articial Intelligence
97 (12) (1997) 273324.
[27] S. Laxman, P.S. Sastry, K.P. Unnikrishnan, Discovering frequent generalized episodes
when events persist for different durations, IEEE Transactions on Knowledge and
Data Engineering 19 (9) (2007) 11881201.
[28] K.F. Lee, H.W. Hon, R. Reddy, An overview of the SPHINX speech recognition
system, IEEE Transactions on Acoustics, Speech, and Signal Processing 38 (1)
(1990) 3545.
[29] J. Li, X. Zhang, G. Dong, K. Ramamohanarao, Q. Sun, Efcient mining of high condence association rules without support thresholds, Lecture Notes in Computer
Science 1704/1999 (1999) 406411.
[30] J. Lin, Divergence measures based on the Shannon entropy, IEEE Transactions on
Information Theory 37 (1) (1991) 145151.
[31] C. Loglisci, D. Malerba, Mining multiple level non-redundant association rules
through two-fold pruning of redundancies, Proc. of the 6th Internat. Conf. on Machine Learn. and Data Mining in Pattern Recognition, Springer, 2009, p. 265.
[32] H. Mannila, H. Toivonen, Discovering generalized episodes using minimal occurrences, Proc. of the 2nd Internat. Conf. on Knowledge Discovery in Databases and
Data Mining, 1996, pp. 146151.
[33] H. Mannila, H. Toivonen, A. Inkeri Verkamo, Discovery of frequent episodes in
event sequences, Data Mining and Knowledge Discovery 1 (3) (1997) 259289.
[34] D. Mladenic, M. Grobelnik, Feature selection on hierarchy of web documents,
Decision Support Systems 35 (1) (1994) 4587.
246
[35] B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, M.C. Hsu, Mining sequential patterns
by pattern-growth: the PrexSpan Approach, IEEE Transactions on Knowledge
and Data Engineering 16 (11) (2004) 14241440.
[36] B.P.S. Murthi, S. Sarkar, The role of the management sciences in research on personalization, Management Science 49 (10) (2003) 13441362.
[37] E.W.T. Ngai, A. Gunasekaran, A review for mobile commerce research and applications, Decision Support Systems 43 (1) (2007) 315.
[38] T.A. Nguyen, W.A. Perkins, T.J. Laffey, D. Pecora, Knowledge-base verication, AI
Magazine 8 (2) (1987) 6975.
[39] D. O'Shea, Small screen for rent, Telephony Online, 2007. (December 13, 2010
http://telephonyonline.com/mag/telecom_small_screen_rent/, retrieved).
[40] B. Padmanabhan, Z. Zheng, S. Kimbrough, An empirical analysis of the value of
complete information for eCRM models, MIS Quarterly 30 (2) (2006) 247267.
[41] D. Peppers, M. Rogers, Enterprise One to One: Tools for Competing in the Interactive Age, Currency-Doubleday, New York, 1997.
[42] H. Pinto, J. Han, J. Pei, K. Wang, Q. Chen, U. Dayal, Multi-dimensional sequential
pattern mining, Proc. of the 10th Internat. Conf. on Inform. and Knowledge Management, ACM Press, New York, 2001, pp. 8188.
[43] M. Plantevit, T. Charnois, J. Klema, C. Rigotti, B. Cremilleux, Combining sequence and
itemset mining to discover named entities in biomedical texts: a new type of pattern,
International Journal of Data Mining, Modelling and Management 1 (2) (2009)
119148.
[44] J.H. Schiller, A. Voisard, Location-based Services, Morgan Kaufmann, San Francisco,
2004.
[45] R. Srikant, R. Agrawal, Mining sequential patterns: generalizations and performance improvements, Proc. of the 5th Internat. Conf. on Extending Database
Tech, Springer, Avignon, France, 1996, pp. 317.
[46] Z. Su, Q. Yang, Y. Lu, H. Zhang, WhatNext: a prediction system for Web requests
using n-gram sequence models, Proc. of the 1st Internat. Conf. on Web Inform.
Systems Engrg., 2000, pp. 214221.
[47] P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley,
2005.
[48] C.J. Van Rijsbergen, Information Retrieval, Butterworth-Heinemann Newton, MA,
USA, 1979.
[49] A.I. Vermesan, Quality assessment of knowledge-based software: some certication considerations, Proc. of the 3rd Internat. Software Engrg. Standards Sympos,
1997, pp. 16.
[50] J. Wang, J. Han, C. Li, Frequent closed sequence mining without candidate maintenance, IEEE Transactions on Knowledge and Data Engineering 19 (8) (2007)
10421056.
[51] K. Wang, Y. He, D.W. Cheung, Mining condent rules without support requirement, Proc. of the 10th Internat. Conf. on Inform. and Knowledge Management,
ACM, Atlanta, Georgia, 2001, pp. 8996.
[52] W. Wang, J. Yang, Mining Sequential Patterns From Large Data Sets, Springer,
2005.
[53] W.W.S. Wei, Time Series Analysis: Univariate and Multivariate Methods, Addison
Wesley, 1989.
[54] O. White, T. Dunning, G. Sutton, M. Adams, J.C. Venter, C. Fields, A quality control
algorithm for DNA sequencing projects, Nucleic Acids Research 21 (16) (1993)
38293838.
[55] X. Yan, J. Han, R. Afshar, CloSpan: mining closed sequential patterns in large
datasets, SIAM Internat. Conf. on Data Mining, Society for Industrial and Applied
Mathematics, San Francisco, CA, 2003.
[56] Q. Yang, T. Li, K. Wang, Building association-rule based sequential classiers for
Web-document prediction, Data Mining and Knowledge Discovery 8 (3) (2004)
253273.
[57] Q. Yang, J. Yin, C. Ling, R. Pan, Extracting actionable knowledge from decision
trees, IEEE Transactions on Knowledge and Data Engineering 18 (16) (2006).
[58] Q. Yang, H.H. Zhang, Web-log mining for predictive Web caching, IEEE Transactions on Knowledge and Data Engineering 15 (4) (2003) 10501053.
[59] C.C. Yu, Y.L. Chen, Mining sequential patterns from multidimensional sequence
data, IEEE Transactions on Knowledge and Data Engineering 17 (1) (2005)
136140.
[60] S.T. Yuan, C. Cheng, Ontology-based personalized couple clustering for heterogeneous
product recommendation in mobile marketing, Expert Systems with Applications 26
(4) (2004) 461476.
[61] M.J. Zaki, SPADE: an efcient algorithm for mining frequent sequences, Machine
Learning 42 (1) (2001) 3160.
[62] M.J. Zaki, Mining non-redundant association rules, Data Mining and Knowledge
Discovery 9 (3) (2004) 223248.
[63] Y. Zhao, S. Zhang, Generalized dimension-reduction framework for recent-biased
time series analysis, IEEE Transactions on Knowledge and Data Engineering 18
(2) (2006) 231244.
Heng Tang received his Ph.D degree in the Department of Information Systems, City
University of Hong Kong. He is currently an assistant professor at the Department of
Accounting and Information Management, University of Macau. His research interests
are in the areas of Data Mining, Recommender Systems, and Mobile Computing.
Stephen Shaoyi Liao is currently a professor at the Department of Information Systems, City University of Hong Kong. He obtained his Ph.D degree in Information Systems from Aix-Marseille University, France. His main research areas include mobile
applications and Intelligent Transportation Systems.
Sherry Xiaoyun Sun received M.S. and Ph.D degrees in Management Information Systems from Eller College of Management, The University of Arizona, Tucson, Arizona,
USA. She is currently an assistant professor at the Department of Information Systems,
City University of Hong Kong. Her research mainly focuses on business process management, electronic commerce, and service oriented computing. Her research work
has been published in Information Systems Research, IEEE Transactions on Systems,
Man, and Cybernetics, Journal of Systems and Software, Information Systems Frontier,
Knowledge-Based Systems and others.