You are on page 1of 5

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 20

Comparison of Id3, Fuzzy Id3 and


Probabilistic Id3 Algorithms in the
Evaluation of Learning Achievements
Semra ERPOLAT and Ersoy ÖZ

Abstract— In inductive inference one of the widely used practical methods is learning with decision tree. The method is used at
discrete structure objective function that is shown as the decision tree of the learning function. In this study ID3, fuzzy ID3
(F_ID3) and probabilistic fuzzy ID3 (PF_ID3) decision tree algorithms are used in a comparative manner for evaluation of
learning achievements of students. For this purpose, test difficulty (high, middle, low), assignment time (long, average, short),
and score (very good, good, average, bad, very bad) attributes are taken for the tests (A, B, C, D, E) which are applied to
randomly selected students from determined classroom. According to the academic average students are classified as
“success” and “fail” and from the obtained results it was observed that the algorithms with fuzzy and probabilistic structures
given better results.

Index Terms—Decision problems, Fuzzy set, Interactive systems, Probabilistic algotithms.

——————————  ——————————

1 INTRODUCTION

D ECISION trees that have estimative and descriptive


properties have the widest range of use among the
classification models that are used in data mining
rithm developed by Quinlan based on entropy. The main
ideas behind the ID3 algorithm are [6]:
 Each non-leaf node of a decision tree corresponds to
due to their low cost, easy interpretability, ability to be an input attribute, and each arc to a possible value of
easily integrated with database systems and better relia- that attribute. A leaf node corresponds to the ex-
bility [8]. The rules they deduct can be comprehensibly pected value of the output attribute when the input
written, which is their most important property that attributes are described by the path from the root
makes them superior over other classification methods node to that leaf node.
(artificial neural networks, fuzzy logic, Bayes technique  In a “good” decision tree, each non-leaf node should
etc.). Moreover, while the rules of decision trees indicate correspond to the input attribute which is the most in-
certainty, the rules of other methods produce approx- formative about the output attribute amongst all the
imate results [7]. input attributes not yet considered in the path from
One of the most important problems in decision trees the root node to that node. This is because we would
is to know according to which criterion the division, in like to predict the output attribute using the smallest
other words branching, will be done after any root [9]. It possible number of questions on average.
is possible to divide the algorithms that correspond to a
different decision tree algorithm for every distinct crite- Entropy is used to determine how informative a par-
rion into subheads such as; algorithms based on entropy ticular input attribute is about the output attribute for a
(ID3, C4.5 and so on), classification and regression trees subset of the training data. Entropy is a measure of uncer-
(CART, towing, gini) and memory based classification tainty in communication systems introduced by [1]. It is
algorithms (k-nearest neighbors). fundamental in modern information theory.
The details of the ID3, F_ID3 and PF_ID3 algorithms The formulation of entropy, the measure of uncertain-
will be given in the following sections. ty in the system, concerning a set which is containing
values of any set S is shown in (1) [9].
N
2 METHOD
H (S )    p log
i 1
i 2 ( pi ) (1)
2.1 ID3 Algorithm
ID3 (Interactive Dichotomizer 3) is a decision tree algo- The more uncertain a receiver is about a source of mes-
sages, the more information that receiver will need in
———————————————— order to know what message has been sent. For example,
 Semra ERPOLAT, Department of Statistics, Mimar Sinan Fine Arts Uni- if a message source always sends exactly the same mes-
versity, Turkey. sage, the receiver does not need any information to know
 Ersoy ÖZ, Department of Technical Programs, Yildiz Technical Universi- what message has been sent, it’s always the same! The
ty, Turkey.

© 2010 Journal of Computing Press, NY, USA, ISSN 2151-9617


http://sites.google.com/site/journalofcomputing/
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 21

entropy of such a source is zero: there is no uncertainty at 2.3 Probabilistic Fuzzy ID3 Algorithm
all. On the other hand, if a source can send n possible Another algorithm used to convert unstable classifiers
messages and each message occurs independently of the into stable structures is PF_ID3 which is made of the
preceding message with equal probability, then the un- combination of the ID3 and F_ID3 algorithms.
certainty of the receiver is maximized [6]. Probabilistic Fuzzy ID3 Algorithm [2]:
In order to measure the information which is obtained
1. Create a Root node that has a set of fuzzy data with
by dividing the database S according to test A, the ex-
membership value 1 that fits the condition of well-
pression given in (2) that is called the information gain
defined sample space.
measure is referred [9].
2. Execute the Fuzzy ID3 algorithm from step 2 to end.


Sv
G(S , A )  H ( S )  H (S v ) (2)
AndValues ( A )
S The entropy and the gain measure which are used in the
algorithm are shown in (5) and (6) respectively [3]. The
In order to decide which attribute to split upon, the criteria about vr and vn are also valid for this algorithm.
ID3 algorithm computes the information gain for each C
attribute, and selects the one with the highest gain [6]. H ob (S , A)    E
c 1
AC ( x) log 2 ( E(  AC ( x)))  (5)

2.2 Fuzzy ID3 Algorithm


S
Si
G(S , A)  H ob (S)  * H ob (Si ) (6)
ID3 is known as one of the best methods to building deci-
i
sion tree, but there exist two major difficults when used
3. Create a Root node that has a set of fuzzy data with
for fuzzy partition.
membership value 1 that fits the condition of well-
 ID3 requires features to have discrete values, so it is
defined sample space.
not able to deal with continuous data, which serious
4. Execute the Fuzzy ID3 algorithm from step 2 to end.
limits the range of its applications.
 ID3 algorithm is suitable for crisp partition. In order 2.4 Comparing the algorithms among ID3, FID3 and
to obtain fuzzy partition, the result need to be fuzzi- PFID3
fied. This is inconvenient. Because FID3 and PFID3 are based on ID3, these three
methodologies have similar algorithms. However, there
To overcome the above problems, we propose a fuzzy also exist some differences [2].
ID3 algorithm for fuzzy partition. The basic assumption Data representation: The data representation of ID3 is
of this approach is simple that each node in the decision crisp while for FID3 and PFID3, they are fuzzy, with con-
tree is represented with fuzzy set [5]. tinuous attributes. Moreover, the membership functions
ID3 algorithm contains too many unstable classifiers of PFID3 must satisfy the condition of well-defined sample
depending on small uncertainties in the training data. space. The sum of all the membership values for all data
One of the attempts to convert these unstable classifiers value xi must be equal to 1.
into stable structures is the F_ID3 algorithm which is
Termination criteria:
made of the combination of the ID3 algorithm and fuzzy
 ID3: if all the samples in a node belong to one class or
logic. Given H b (S)  H (S) , the fuzzy entropy and the gain
in other words, if the entropy equals to null, the tree
measure which are used in the algorithm are shown in (3) is terminated. Sometimes, people stop learning when
and (4) respectively [4]. the proportion of a class at the node is greater than or
N N
equal to a predefined threshold. This is called prun-
C  ij  ij ing. The pruned ID3 tree stops early because the re-
H b (S , A)    j
S
log 2
j
S
(3) dundant branches have been pruned.
i 1  FID3 & PFID3: there are three criteria’s.
N 1. If the proportion of the dataset of a class is greater
S
Sv
Gb (S , A)  H b (S)  *H b (Sv , A) (4) than or equal to a threshold vr
v A
2. If the number of a data set is less than another thre-
There are also two criteria used in expanding the tree in
shold vn
the F_ID3 algorithm, fuzziness control threshold value
3. If there are no more attributes at the node to be classi-
( vr ) and leaf decision threshold value ( vn ) , which are
fied
determined by the user. Tree expansion continues as long
If one of these three criteria’s is fulfilled, the learning is
as the quality groups in the subset are smaller than the vr
terminated.
value and the number of samples in any node is equal to Entropy:
or greater than the vn value.  ID3: (3).
 FID3 & PFID3: (5).
Reasoning: The reasoning of the classical decision tree
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 22

begins from the root node of the tree, and then branch
one edge to test the attribute of the sub node. Repeat the 3.2 Evaluating the Learning Achievement
testing until the leaf node is reached. The result of ID3 is In the study, only the compared results for the “A test” of
the class attached to the leaf node. the ID3, F_ID3 and PF_ID3 algorithms that are used to
The reasoning of the fuzzy decision trees is different. It evaluate the learning achievement are given. Similar re-
does not branch one edge, but all the edges of the tree. It sults are obtained from the other tests.
begins from the root node through the branches to the In Table 2, the corresponding normal and fuzzy values
leaf nodes until all the leaf nodes have been tested. Each for the levels of each quality for the A test are given. The
normal values are used for ID3 algorithm whereas the
leaf node has various proportions of all the classes. In
fuzzy values for the F_ID3 algorithm obtained with the
other words, each leaf node has own certainties of the
help of the value ranges in Table 1 and (7) are also used
classes. The result is the aggregation of the certainties at for the PF_ID3 algorithm without any need to change
all the leaf nodes. since the sum of all values on every level of every quality
TABLE 2
3 FINDINGS THE NORMAL AND FUZZY VALUES THAT CORRESPOND TO
3.1 Regulation of the Data Set THE LEVELS OF QUALITIES

The use of ID3, F_ID3 and PF_ID3 algorithms in evaluat-


ing the learning achievements of students are compara-
tively examined in the study. For this purpose, five dif-
ferent tests A, B, C, D and E are carried out on 10 random-
ly selected students from a selected class. The students
are requested to grade the difficulty of the test (TD) as
high (H), middle (M), low (L), and for each test, to state
the amount of time they worked on that test (WT) as long
(L), average (A) or short (S). At the end of the test, the
success levels of the students according to the scores they
gained (SG) are graded as very good (VG), good (G), av-
erage (A), bad (B), very bad (VB). The students are di-
vided into two groups as “success” and “fail” depending
on their academic GPAs.
The levels and value ranges used to measure learning is 1.
achievements determined by the above mentioned quali-
TABLE 1 4 CONCLUSION
THE VALUES TAKEN BY THE SPECIFIED LEVELS OF THE QUALI-
TIES. The obtained findings for the ID3, F_ID3 and PF_ID3 al-
gorithms for test A are given in the table below.
TABLE 3
OBTAINED FINDINGS FOR THE ID3, F_ID3 AND PF_ID3 ALGO-
RITHMS

ties are shown in Table 1.


As the result of the trials, the best member function to
express the levels of the qualities for the F_ID3 and
PF_ID3 algorithms is determined to be the trapezoid giv-
en in (7).
*: The quality that gives the best gain.
x a
ba , axb
d  x
 cxd
 A ( x; a , b , c , d)   , (7) It is seen from the table that the best quality for all
dc
1 , bxc three methods is the WT. The methods that give the best
 gain for the WT quality is PF_ID3 with 0,716, then comes
 0 , x d , x  a
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 23

F_ID3 and ID3 with the values 0,697 and 0,557 respective- thods for the A level of the SG quality is lower than 95%,
ly. The fuzziness control threshold value is taken as the expansion continues. The expansion stopped for the G
vr  %95 and the leaf decision threshold value is taken as and VG levels since the ratio of being successful is 100%
vn  %2. In other words, the expansion is stopped where TABLE 6
the ratio of the quality groups in the subset is smaller OBTAINED FINDINGS FOR THE SG QUALITY
than 95% and the number of samples in a node are small-
er than 2.
The examination is done only for the S and A levels of
the WT quality since they satisfy the specified criteria.
The obtained learning achievement results are given in
TABLE 4
OBTAINED FINDINGS FOR THE WT QUALITY FOR THE ID3,
F_ID3 AND PF_ID3 ALGORITHMS
in all three methods.
Therefore, the A level of the SG quality is selected after
the S level of the WT quality and the only remaining qual-
ity, TD, is examined. Only the M and H levels of the TD
quality are considered since they satisfy the leaf decision
threshold criterion. The results obtained are “fail” for the
M level and “success” for the H level.
It is observed from the results that the algorithms con-
Table 4. taining fuzzy and probabilistic structures give better re-
It is seen from the table that the ratio of successful sults. The decision tree reached for all three algorithms is
students for the S level of the WT quality is 90% in reality.
The methods that give the closest value to this ratio are
F_ID3 and PF_ID3 with a ratio of 92,3% and the ID3 me-
thod gives 73%. All the methods give with a ratio of 100%
the same results obtained in reality for the A level of the
WT quality. Therefore, the expansion is carried out since
the ratios obtained for the S level of the WT quality are
smaller than 95% whereas the expansion is stopped since
the ratios obtained for the A level are greater than 95%.
Hence, the entropy and gain values are calculated accord-
ing to the three methods only for the S level of the WT
quality. The obtained results are given in Table 5.
TABLE 5
OBTAINED FINDINGS FOR LEVEL S OF THE WT QUALITY Fig. 1. Final decision tree.
shown in Figure 1.

REFERENCES
[1] C.E. Shannon, “A mathematical theory of communication”, Bell
System Technical Journal, 27 pp.379–423 and 623–656, 1948.
[2] G. Liang, “A Comparative Study of Three Decision Tree Algo-
rithms: ID3, Fuzzy ID3 and Probabilistic Fuzzy ID3”, Bachelor
Thesis, Informatics & Economics, Erasmus University
Rotterdam, Rotterdam, the Netherlands, 2005.
[3] J.V. Berg and U. Kaymak, “On the Notion of Statistical Fuzzy
*: The quality that gives the best gain.
Entropy”, Soft Methodology and Random Information Systems,
Advances in Soft Computing, Heidelberg: Physica Verlag, pp.535–
542, 2004.
It is seen from the table that the best quality for all
[4] M. Umano, H. Okamoto, I. Hatono, H. Tamura, F. Kawachi, S.
three methods is the SG. The methods that give the best
Umedzu and J. Kinoshita, “Fuzzy Decision Trees by Fuzzy ID3
gain for the WT quality are PF_ID3 and ID3 with 0,306
algorithm and Its Application to Diagnosis Systems”, In
and F_ID3 method comes after them with the value 0,092.
Proceedings of the third IEEE Conference on Fuzzy Systems,
The expansion is carried out only for the A, G and VG
Orlando, vol. 3, pp.2113-2118,1994.
levels of the SG quality since they satisfy the vr and vn
[5] Q. Yun-Tao and X. Wei-Xin, “Building Fuzzy Neural Classifiers
criteria. Obtained results are given in Table 6.
By Fuzzy ID3 Algorithm”, Proceedings of ICSP, 1996.
Since the ratio of being successful in all the three me-
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 24

[6] The ID3 Decision Tree Algorithm, CSE5230 Tutorial, Monash


University, Faculty of Informatioan Technology, 2004,
www.dss.dpem.tuc.gr/pdf/decisiontreesTute.pdf, (04.11.2010).
[7] T.M. Mitchell, Machine Learning, Singapore: MIT Press and The
McGraw-Hill Companies Inc., 1997.
[8] V. Sugumaran, V. Muralidharan and K.I. Ramachandran, “Fea-
ture Selection Using Decision Tree and Classification Through
Proximal Support Vector Machine for Fault Diagnostics of
Roller Bearing”, Mechanical Systems and Signal Processing, 21(2),
pp.930-942, 2007.
[9] Y. Özkan, Veri Madenciliği Yöntemleri, İstanbul, Türkiye: Papatya
Yayıncılık, 2008.

Semra ERPOLAT has obtained her Ph.D. degrees in Operations


Research at Marmara University in Turkey in 2009 and Statistics at
Mimar Sinan Fine Arts University in Turkey in 2007. His works are
about neural networks, data mining and decision support systems.
She is assistant professor doctor in the Statistics department at Mi-
mar Sinan Fine Arts University.

Ersoy ÖZ has obtained his Ph.D. degree in Operations Research at


Marmara University in Turkey in 2009. His works are about markov
chains, hidden markov models and geometric programming. He is
lecturer in the Computer Programming department at Yildiz Technic-
al University.

You might also like