Professional Documents
Culture Documents
RESEARCH ARTICLE
OPEN ACCESS
ABSTRACT
Time-honored association rule mining and frequent Twitter message cannot meet the demands arising from
some real applications. By considering the different types user of individual items as utilities, utility mining
focuses on identifying the itemsets with high utilities. Recently, high utility Twitter mesages is one of the most
important research issues in data mining. In this paper, present a utility mining using an algorithm named
Generalized Linear Models anothername called Simple High Utility Algorithm (SHU-Algorithm) to efficiently
prune down the number of candidates and to obtain the complete set of high utility itemsets. If we are given a
large transactional Social Media consisting of different transactions this find all the high utility items efficiently
using the proposed techniques. The proposed algorithm uses bottom up approach, where high utility items are
extended one item at a time. This algorithm is more efficient compared to other state -of-the-art algorithms
especially when transactions keep on adding to the Social Media from time to time. With the proposed SHUAlgorithm, incremental data Twitter mining can be done efficiently to provide the ability to use previous data
structures in order to reduce unnecessary calculations when a Social Media is updated tweets, or when the
minimum threshold is changed.
Keywords:- Twiiter,High utility mining, data mining, minimum utility, simple high utility algorithm, utility
mining, Generalized Linear Models .
I.
INTRODUCTION
ISSN: 2347-8578
www.ijcstjournal.org
Page 78
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
business. This gives the motivation to develop a
mining model to discover the itemsets contributing
to the majority of the profit.
B.High Utility Mining
Recently a few utility mining models [5]
were defined to discover more important
knowledge from a Social Media. Utility mining is
concerned with both the purchased quantities of
items and unit profit of each item. So by utility
mining several important business decisions like
what are the items to be put on display, how the
items can be arranged in shelves, how the coupons
can be designed etc can be made. Mining high
utility items from Social Medias is a difficult task
since download closure property [1] applicable in
frequent Twitter mining does not hold in this i.e., a
superset of a low utility itemset may be a high
utility itemset. Hence finding all the high utility
itemsets from a transactional Social Media without
any miss is a big challenge in high utility mining.
C. Tree Based Approach
Existing studies [30] using tree based
approach, first build a tree scanning the Social
Media many times and using the minimum utility
value. This tree is then mined to find high utility
itemsets. But if the minimum utility value is
changed, or if transactions are added from time to
time, the tree has to be built starting all over again
and then mined. To address these issues we
propose a novel SHU- algorithm, with which we
can use the previous data structures and avoid
unnecessary repeated calculations , when the Social
Media is updated or the mining threshold is
changed. The rest of this paper is organized as
follows: In section 2, we introduce the related work
for high utility mining and in section 3, the
proposed algorithm is explained in details.
II.
RESEARCH METHODOLOGY
ISSN: 2347-8578
www.ijcstjournal.org
Page 79
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
order to reduce unnecessary calculations when a
Social Media is updated, or when the minimum
threshold is changed.
Item
TID
T1
T2
12
T3
15
T4
T5
10
T6
T7
T8
A.Algorithm(SHU)
Algorithm Largest Number Twitter User To
transcation TM
Input: A list of numbers L ->user List TU .
Output: The largest number in the list L>Language Twitter.
if L.size = 0 return null Tweet
largest L[0] Return of Relevancy
Profit( Rs)
(per unit)
15
Media
ISSN: 2347-8578
(showing
different
www.ijcstjournal.org
Page 80
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
C. Proposed Algorithm:
Simple High Utility
Algorithm (SHU-Algorithm)
In the first scan of the Social Media we are
collecting frequency and profit of different items
and filling the entries in the frequency table, profit
table and total frequency table as shown in Table 3,
4 and 5 respectively. Frequency table stores the
total frequency of occurrences of each item. In the
example Social Media there are 5 items, a, b,
c, d and e present in different transactions . The
item a is present in transactions T1, T2, T4, T7
and T8 with the quantity of a purchased as 2,3,4,1
and 2 respectively. So total frequency of
occurrence of a is 2+3+4+1+2 =12. Similarly
occurrences of b, c, d and e are 19, 32, 13
and 22 respectively.
Table 3: Frequency Table (showing frequency of
each item in the whole Social Media)
Item
Profit( Rs)
(per unit)
12
19
32
13
22
Profit( Rs)
(per unit)
24
285
96
104
154
ISSN: 2347-8578
www.ijcstjournal.org
Page 81
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
Frequency of {ac} = (0:0) + (3:12) = (3:12)
Frequency of {ad} = (0:0) + (3:4) = (3:4)
Frequency of {ae} = (0:0) + (3:2) = (3:2)
Frequency of {cd} = (0:0) + (12:4) = (12:4)
Frequency of {ce} = (0:0) + (12:2) = (12:2)
Frequency of {de} = (0:0) + (4:2) = (4:2)
Consider the next transaction T3 from Table1. T3
consists of items c and e with frequencies 15
and 3. The super sets possible with c and e is
only {ce}. So under {ce} store old value of {ce}
plus (15:3); i.e., (12:2) + (15:3) = (27:5).
Consider the next transaction T4 from Table1. T4
consists of item a only and so no super set with
cardinality 2 is possible with item a only.Consider
the next transaction T5 from Table1. T5 consists of
items b,d and e with frequencies 10, 8 and 9.
The super sets possible with b, d and e are
{bd}, {be}and {de}.
Frequency of {bd} = (0:0) + (10:8) = (10:8);
Fig. 1 Twitter Algorithms Analysis
ISSN: 2347-8578
www.ijcstjournal.org
Page 82
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
ab
ac
ad
ae
bc
bd
be
cd
ce
De
frequency
(2:2)
(4:14)
(5:5)
(6:6)
(7:3)
(10:8)
(17:13)
(12:4)
(32:10)
(13:14)
Now first we will find all the high utility itemsets containing 1 item, 2 items, etc. From Table 4 i.e., Profit Table
showing profit in rupees of each item in the whole Social Media, we can find all the high utility items consisting
of only one item. Out of the items a, b, c, d and e only b is a h igh utility item, since its utility
(U(b)=285) is greater than minimum utility i.e., 200.
The items b, c, d and e are low utility items with utilities 24, 96, 104 and 154 respectively which are all
less than 200. Now we need to consider itemset containing 2 items. Consider each of the supersets and their
frequencies listed in Table5, they are ab, ac, ad, etc. Considering each of these one by one, f (ab) i.e.,
frequency of ab in the whole Social Media is (2:2). The quantity of a is 2, quantity of b is 2 and from Utility
Table, unit profit of a is 2 and unit profit of b is 15. From this utility of ab denoted as U(ab)=2*2 + 2*15
= 34, which is <200; so ab is a low utility itemset. Consider the next itemset ac. From Table 5, f (ac) =
(4:14).
ISSN: 2347-8578
www.ijcstjournal.org
Page 83
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
ISSN: 2347-8578
www.ijcstjournal.org
Page 84
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
D.Discussion
ISSN: 2347-8578
www.ijcstjournal.org
Page 85
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
REFERENCES
[1]
IV. CONCLUSION
In this paper, we have proposed a SHU-Algorithm
for finding all the high utility itemsets present in a
transactional Social Media. This SHU algorithm
can be then used for efficient mining of all the high
utility itemsets from transaction Social Media.
PHUIs can be efficiently generated using only one
Social Media scan. Moreover, we have developed
an approach which can be applied to a Social
Media to which transactions keep on adding from
time to time. Whenever a new transaction is added
we need to update only a few corresponding values
in the tables 3 and 5. Whenever minimum threshold
is changed or a few transactions are added to a
Social Media, we can still use the previous values
with only a few changes i.e., incremental data
mining can be done efficiently to provide the
ability to use previous data structures in order to
ISSN: 2347-8578
[2]
[3]
www.ijcstjournal.org
Page 86
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
ISSN: 2347-8578
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
www.ijcstjournal.org
Page 87
International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 3, May Jun 2016
[20]
[21]
[22]
ISSN: 2347-8578
[23]
[24]
www.ijcstjournal.org
Page 88