Professional Documents
Culture Documents
2
Dean,
School of Science and Humanities,
Kongu Engineering College,
Perundurai, Erode
Abstract
Association rule mining aims to explore large Keywords: Association Rule Mining, Weighted
transaction databases for association rules. Association Rule Mining, HITS, Online HITS,
Classical Association Rule Mining (ARM) Dynamic Content
model assumes that all items have the same
significance without taking their weight into 1. Introduction
account. It also ignores the difference between
the transactions and importance of each and Association Rule Mining aims to explore large
every itemsets. But, the Weighted Association transaction databases for association rules, which
Rule Mining (WARM) does not work on may reveal the implicit relationships among the
databases with only binary attributes. It makes data attributes [1]. It has number of practical
use of the importance of each itemset and applications, including classification, text
transaction. WARM requires each item to be mining, Web log analysis, Share Market and
given weight to reflect their importance to the recommendation systems. The classical model of
user. The weights may correspond to special association rule mining employs the support
promotions on some products, or the profitability measure, which treats every transaction equally.
of different items. In contrast, different transactions have different
This research work first focused on a weight weights in real-life data sets. For example, in the
assignment based on a directed graph where market basket data, each transaction is recorded
nodes denote items and links represent with some profit. Much effort has been dedicated
association rules. A generalized version of HITS to association rule mining with preassigned
is applied to the graph to rank the items, where weights. However, most data types do not come
all nodes and links are allowed to have weights. with such preassigned weights, such as Web site
This research then uses enhanced HITS click-stream data. There should be some notion
algorithm by developing an online eigenvector of importance in those data. For instance,
calculation method that can compute the results transactions with a large amount of items should
of mutual reinforcement voting in case of be considered more important than transactions
frequent updates. For Example in Share Market with only one item. Current methods, though, are
Shares price may go down or up. So we need to not able to estimate this type of importance and
carefully watch the market and our association adjust the mining results by emphasizing the
rule mining has to produce the items that have important transactions [1].
undergone frequent changes. These are done by The concept of association rule mining proposes
estimating the upper bound of perturbation and the support-confidence measurement framework
postponing of the updates whenever possible. and reduced association rule mining to the
Next we prove that enhanced algorithm is more discovery of frequent item sets. WARM
efficient than the original HITS under the context generalizes the traditional model to the case
of dynamic data. where items have weights. WARM requires for
each item to be given weight to reflect their
IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 2, No 5, March 2010 17
ISSN (Online): 1694-0784
ISSN (Print): 1694-0814
importance to the user. The weights may the principal eigenvectors of the matrix ATA and
correspond to the profitability of different items. AAT, respectively. Even a single update to the
As more user access will correspond to a perturbation of
data is gathered, which are frequently getting the A matrix. Depending on the weight function
updated, the construction of the graph should be selected, it can change the behaviour of a single
dynamic instead of static. Using Online Hits element or a row of elements of the matrix A. In
algorithm, the graph can be constructed either case the changes to the behaviour are
dynamically and the cost can be reduced by local. These changes in behaviour may cause
postponing updates whenever possible. By variations to the principal eigenvector of ATA
calculating Eigen values we are enforcing the (and AAT).If we can find the relationship
mutual reinforcement relationship between the between the variation of x and y and the
items. behavioural changes to the matrix A, we can
check each update to see whether it will cause
This HITS algorithm is suitable only for static too much changes to x and y. If the change is
content. within acceptable precision, we can postpone
1) They work well in environments where no applying the update thus avoid running Online
dynamic updation is possible. HITS for it. When the accumulated updates
2) They fail to capture the rich informations that cause too many changes to the behaviour, we
lie within the patterns of user access or in the apply all the changes together and run Online
structure that can be defined by user group HITS once. In this way the cost of running the
implicitly. algorithm frequently can be reduced. Such an
algorithm is called as Online HITS. Another
In this paper, we propose to replace the HITS advantage of this approach is that service of user
algorithm with Online HITS algorithm which queries and updating the matrix A and running of
reduces the cost by postponing the updates Online HITS can be made separate. The system
whenever possible and makes it more suitable for can update the matrix A and run Online HITS in
Dynamic environments. HITS algorithm background, and continue servicing user queries
normally used for web pages, but it can also be with old results that we are confident to be
used for transactional datasets from which we within certain range from the latest ones. Users
can calculate the hub and authorities, based on can enjoy the service without any disturbances.
which the graph is constructed. The general For us, the continued accumulation of data and
HITS algorithm is too costly to run on every constructing the graph based on that is an
update. When the updates are accumulated we inherent feature and the results we produce are
run online HITS once. This way cost is reduced. the best guess based on the data we have so far.
Another advantage of online HITS is that service It is too good for the results to undergo dramatic
of user queries, updating the Eigen vector of the change, which reflects the update of the latest
given Matrix A and running of Online HITS can trends and techniques about the world. Rather,
be made as separate activities. we are interested in the bound of the change so
The rest of the paper is arranged as follows: that we can perform the tasks more efficiently. In
section II Introduction to the basic concepts of addition, the conclusions are considered only to
Online HITS algorithm; Section III gives an apply to unweighted graphs represented by
evaluation and analysis of the HITS algorithm in adjacency matrices. Online HITS algorithm uses
dynamic Environment; section IV addresses the 2 main concepts, namely the computations of
evaluation and analysis of Online HITS eigengap and perturbation. They have to be
algorithm in Dynamic Environments. Section V performed efficiently otherwise the cost of
concludes the paper. computing them would affect the saving of not
running HITS. They will be addressed in the
following subsections.
2. Basic concept of Online HITS algorithm
3. Computation of Eigengap
Consider the Matrix A, which is created based
on the transactions and the items within each of Eigengap is nothing but the difference between
those transactions. It will be having the values of the largest and second largest Eigen values
the rules that are going to be considered for namely λ1 and λ2. The original HITS algorithm
mining. The rankings say, x and y, correspond to is essentially a power method to compute the
principal eigenvector of S. It can be revised
IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 2, No 5, March 2010 18
ISSN (Online): 1694-0784
ISSN (Print): 1694-0814
easily, without adding complexity, to produce λ1 Table1: Results of Complaint database using
and ¸ λ2 as byproducts. HITS Algorithm
Two modifications to the original HITS
algorithm are introduced: Complaint Support
1. Find the two eigenvectors λ1 and λ2, instead Type Values
of finding only the principal eigenvector. This Complaint 1 0.97
can be done by using the block power method. Complaint 2 0.95
Initially, start with two orthogonal vectors,
Complaint 3 0.98
multiply them all by S, then apply Gram-
Schmidt to orthogonalize them. This is a single Complaint 4 0.97
step. Iterate until they converge. In the Hits Complaint 5 0.98
algorithm replace this step with the step that Complaint 6 0.99
calculates principle Eigen vector. Complaint 7 1.00
2. HITS ensure convergence by normalizing the Complaint 8 0.95
vector at each step to unit length. Instead, we
normalize each vector by dividing them by their
first non-zero element. They still converge to the
two eigenvectors and the scaling factors
converge to λ1 and λ2.
The above modifications introduce extra
computation of one eigenvector and Gram-
Schmidt orthogonalization. The Extra
computation doubles the work of HITS and
the order of doing orthogonalization is O(n).
The total complexity is the same as HITS
which is O (mn).
that the variation is within acceptable threshold generalizations are in two directions. First, we
level of precision, we can postpone applying the replaced the construction of static graph to the
update thus avoid running Online HITS for it. construction of dynamic graph by finding out the
Thus the Online HITS takes into account the Eigengap. Second, we created an online eigenvector
calculation method that can compute the results of
current variations that happen dynamically, and mutual reinforcement voting efficiently in case of
the algorithm reflects a true dramatically frequent updates by estimating the upper bound of
changed, updated system knowledge of the perturbation and postponing applying the updates
current world. Online HITS constantly monitors whenever possible. Ie. Till the variations are within
the changes and performs operations. The results the applicable limit, we won’t run the Online HITS
of our test are shown in the following figures. algorithm. Both theoretical analysis and experiments
show that our generalized online algorithm is more
efficient than the original HITS under the context of
Table2: Results of Complaint database using dynamic data.
Online HITS Algorithm We are going to enhance the association rule for
dynamic content based on fuzzy logic and mutual
reinforcement voting.
Complaint Support
Type Values References
Complaint 1 0.98
Complaint 2 0.96 [1]. Mining Weighted Association Rules without
Complaint 3 0.99 Preassigned Weights. Ke Sun and Fengshan Bai,
Complaint 4 0.99 IEEE transactions on knowledge and data
Complaint 5 0.99 engineering, vol. 20, no. 4, april 2008
Complaint 6 0.99 [2]. Mining association rules with weighted
Complaint 7 1.00 items Cai, C.H.; Fu, A.W.C.; Cheng, C.H.;
Kwong, W.W. Database Engineering and
Complaint 8 0.97 Applications Symposium, 1998. Proceedings.
IDEASapos;98. International Volume, Issue, 8-
For the above table the figure is shown as below: 10 Jul 1998 Page(s):68 - 77
[3]. Utility Sentient Frequent Itemset Mining and
Online HITS
Association Rule Mining: A Literature Survey
1.01 and Comparative Study. S.Shankar,
1 T.Purusothaman. International Journal of Soft
Support Values
8
nt
nt
nt
nt
nt
nt
nt
nt
ai
ai
ai
ai
ai
ai
ai
pl
pl
pl
pl
pl
pl
pl
pl
m
Co
Co
Co
Co
Co
Co
Co