
A New Measure for Comparing Stopping Criteria of Fuzzy Decision Tree

Mohsen Zeinalkhani
Department of Computer Engineering
Shahid Bahonar University of Kerman
Kerman, Iran
zeinalkhani@gmail.com
Mahdi Eftekhari
Department of Computer Engineering
Shahid Bahonar University of Kerman
Kerman, Iran
m.eftekhari@mail.uk.ac.ir


Abstract: Fuzzy decision trees (FDT) successfully merge the approximate reasoning offered by fuzzy representation with decision trees, while preserving the advantages of both: the uncertainty handling and gradual processing of the former, and the comprehensibility, popularity and ease of application of the latter. The size and accuracy of an FDT can be controlled by stopping criteria. Comparing stopping criteria based on the accuracy of the generated FDTs is the simplest comparison method, but it does not consider all of their aspects. In this paper, a new measure for comparing stopping criteria, named Growth Control Capability (GCC), is introduced; it determines a criterion's ability to control the number of node expansions by changing its threshold value. Different stopping criteria are used for FDT induction and are compared based on the proposed measure. The obtained results show that the number-of-instances stopping criterion can control FDT growth better than the other ones. Therefore, one can use this stopping criterion in order to produce an FDT with a predefined number of nodes.

Keywords: classification; fuzzy decision tree; stopping criteria
I. INTRODUCTION
Crisp decision tree classifiers [1-4] partition the instance
space in a recursive manner. The decision tree induction
algorithm starts with all the training data in the root node,
and recursively selects a test attribute from the available ones to split the node. The selected test attribute should maximize some measure, such as the information gain. The splitting process stops based on stopping criteria to avoid overspecialization. The constructed tree can be used for classification when coupled with a simple inference procedure: match a new instance against the tree, select the leaf that matches it, and report the decision associated with that leaf [5].
Fuzzy sets provide the basis for fuzzy representation. Fuzzy sets and fuzzy logic allow the modeling of language-related uncertainties, and together with approximate reasoning methods they provide the ability to model fine knowledge details. Accordingly, fuzzy representation is becoming increasingly popular in dealing with problems of uncertainty, noise and inexact data. The approximate reasoning offered by fuzzy representation has been combined with decision trees to build fuzzy decision trees, in order to preserve the advantages of both: the uncertainty handling and gradual processing of the former, and the comprehensibility, popularity and ease of application of the latter [6].
Fuzzy decision trees allow instances to follow down multiple branches simultaneously, with different satisfaction degrees in [0, 1]. This property leads to the presence of an instance in many leaves, each with a different satisfaction degree. To implement this characteristic, fuzzy decision trees use fuzzy linguistic terms to specify the branch conditions of nodes. This is actually advantageous, as it provides more graceful behavior, especially when dealing with noisy or incomplete information. However, from a computational point of view, fuzzy decision tree induction is slower than the crisp one. This is the price paid for having a more accurate but still interpretable classifier [7, 8].
Making a trade-off between the complexity (number of nodes) and the accuracy of a fuzzy decision tree is a significant problem in FDT induction. Fuzzy decision trees with a large number of nodes lead to poor generalization performance. Moreover, they require more computation to classify a new instance. In the literature, two general approaches are employed to achieve this trade-off: post-pruning and pre-pruning methodologies. Post-pruning methodologies allow the FDT to overfit the training data; the over-fitted tree is then cut back into a smaller one by removing sub-branches that do not contribute to the generalization accuracy. On the other hand, pre-pruning methodologies employ stopping criteria for this purpose and control the size of the FDT by defining a threshold for some stopping criterion. The FDT growing phase continues until a stopping criterion is triggered. Employing tight stopping criteria tends to create small, under-fitted fuzzy decision trees. On the other hand, using loose stopping criteria tends to generate large decision trees that are over-fitted to the training data. The main advantage of pre-pruning methods is their lower computational requirement; their most significant problem is finding a proper threshold value.
The threshold value defined on a stopping criterion
controls the size of fuzzy decision tree and therefore controls
its accuracy. Various threshold values lead to FDTs of
different sizes. But the question is whether the FDTs of
arbitrary size can be generated by changing the threshold
value or not. This paper tries to answer this question by defining a Growth Control Capability (GCC) measure which determines the ability of each stopping criterion to control the size of the fuzzy decision tree. The remainder of this paper is organized as follows. Section 2 briefly describes the fuzzy decision tree induction method. Section 3 presents the proposed growth control capability measure. The experimental results are provided in Section 4. Finally, Section 5 concludes this manuscript.

Figure 1. A typical expanded node in a fuzzy decision tree.
II. FUZZY DECISION TREE INDUCTION
Fuzzy decision tree induction has two major components: a procedure for fuzzy decision tree construction, and an inference procedure to assign class labels to new instances. This section briefly reviews the Fuzzy ID3 (FID3) induction algorithm, which is the most commonly used method for FDT induction. FID3 is the fuzzy extension of the well-known ID3 crisp decision tree induction algorithm. The training data used in FDT induction should provide two more pieces of information than usual datasets (which only provide the value of each attribute for each instance): 1) the predefined membership functions and 2) the membership degree of each example. The predefined membership functions superimposed on the attribute domains are the branching conditions in the generated FDTs. The membership grade of an example determines the degree of its contribution to the dataset. Each node in the FDT has a dataset. For all nodes, the data may have the same attribute values and membership functions but different membership grades. For such a dataset S, the number of examples (i.e., |S|) is defined as the sum of the membership grades of all examples.
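As an illustration, the following minimal Python sketch (the dataset layout and names are illustrative assumptions, not taken from the paper) shows how |S| is computed from membership grades rather than row counts:

```python
# A node's dataset: each example carries its attribute values, class label,
# and a membership grade expressing how strongly it belongs to this node.
examples = [
    {"x": [5.1, 3.5], "y": "c1", "grade": 1.0},   # fully present
    {"x": [4.9, 3.0], "y": "c2", "grade": 0.5},   # partially present
    {"x": [6.2, 2.9], "y": "c1", "grade": 0.75},
]

# |S| is the sum of the membership grades, not the number of rows.
def fuzzy_cardinality(dataset):
    return sum(ex["grade"] for ex in dataset)

print(fuzzy_cardinality(examples))  # 2.25, even though there are 3 rows
```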
FID3 starts with a dataset with predefined membership functions in which the membership grades of the examples are equal to one. Then, the root of the tree containing this dataset is created. In the root node, the branching attribute is selected based on the fuzzy information gain (FIG): the attribute with the maximum FIG value is the most suitable one for branching. After the branching attribute $A_b$ is found, a new child node is created for each membership function defined on it. The datasets of these new child nodes are the same as that of their parent node, except for the membership grades of the examples. Suppose $\mu_S(X_i)$ is the membership grade of the $i$-th example of dataset $S$, $X_i = [x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(k)}, y_i]^T$, in which $x_i^{(j)}$ is the value of the $j$-th attribute and $y_i$ is the class label of $X_i$. And suppose $S'$ is the dataset of the child node corresponding to a membership function $F$ ($F$ is defined on the domain of the branching attribute $A_b$). The membership grade of example $X_i$ in $S'$ is calculated as:

$$\mu_{S'}(X_i) = \mu_S(X_i) \cdot \mu_F(x_i^{(b)}) \qquad (1)$$

where $\mu_F(x_i^{(b)})$ is the membership degree of the $b$-th attribute value of example $X_i$ in the membership function $F$. This calculation is performed for all examples in each child node. The above process is repeated for each node that can be expanded, until the stopping criterion is triggered.
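The grade propagation of Eq. (1) can be sketched in a few lines of Python. The triangular membership function and all names below are illustrative assumptions, not the paper's implementation:

```python
# Sketch of Eq. (1): propagating membership grades from a parent node S to
# the child node S' associated with membership function F on attribute A_b.

def triangular(a, b, c):
    """Return a triangular membership function with support [a, c] and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

mu_F = triangular(0.0, 5.0, 10.0)   # F, defined on the domain of A_b
b = 0                               # index of the branching attribute A_b

parent = [
    {"x": [2.5], "y": "c1", "grade": 1.0},
    {"x": [7.5], "y": "c2", "grade": 0.8},
]

# Eq. (1): mu_S'(X_i) = mu_S(X_i) * mu_F(x_i^(b))
child = [{**ex, "grade": ex["grade"] * mu_F(ex["x"][b])} for ex in parent]
print([round(ex["grade"], 2) for ex in child])  # [0.5, 0.4]
```

Note that every example appears in every child; an example effectively "leaves" a branch only when its grade drops to zero, which is exactly the multi-branch behavior described in the introduction.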
The branching attribute in the parent node is selected based on the FIG measure. The FIG of attribute $A_i$ relative to dataset $S$ is defined as:

$$FIG(S, A_i) = FE(S) - \sum_{j=1}^{r_i} w_j \, FE(S_j) \qquad (2)$$

where $FE(S)$ is the fuzzy entropy of dataset $S$, $r_i$ is the number of membership functions defined on the domain of attribute $A_i$ (or, equivalently, the number of child nodes generated after expansion of the parent node based on attribute $A_i$), $S_j$ is the dataset of the $j$-th child node generated from the parent node expansion using attribute $A_i$, and $w_j$ is the fraction of examples which belong to the $j$-th child node. $w_j$ is defined as:

$$w_j = \frac{|S_j|}{\sum_{k=1}^{r_i} |S_k|} \qquad (3)$$

where $|S_j|$ is the number of examples in the $j$-th child node and $r_i$ is the number of child nodes. The fuzzy entropy of dataset $S$ is defined as:

$$FE(S) = -\sum_{i=1}^{m} \frac{|S_{y=c_i}|}{|S|} \log_2 \frac{|S_{y=c_i}|}{|S|} \qquad (4)$$

where $m$ is the number of class labels, $|S|$ is the number of examples of dataset $S$, and $|S_{y=c_i}|$ is the number of examples of $S$ which belong to class $c_i$. A typical node expansion is shown in Fig. 1; in this figure, $F_i$ is the $i$-th membership function defined on the branching attribute $A_b$. Other methods for selecting the branching attribute can be found in [7-10]. After the growing phase of the FDT is terminated, for each leaf node, the fraction of examples belonging to each class is assigned as the label of that class.
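These formulas translate directly into code. The following Python sketch (my own naming, reusing the illustrative dataset layout from above) computes FE and FIG for one candidate split:

```python
# Sketch of Eqs. (2)-(4): fuzzy entropy FE and fuzzy information gain FIG,
# using the same {"y": ..., "grade": ...} dataset layout as above.
import math

def fuzzy_cardinality(S):
    return sum(ex["grade"] for ex in S)

def fuzzy_entropy(S):  # Eq. (4)
    total = fuzzy_cardinality(S)
    ent = 0.0
    for c in {ex["y"] for ex in S}:
        # |S_{y=c}| / |S|: class proportion measured by membership grades
        p = sum(ex["grade"] for ex in S if ex["y"] == c) / total
        if p > 0.0:
            ent -= p * math.log2(p)
    return ent

def fig(S, children):  # Eq. (2); `children` are S_1..S_r from one candidate split
    sizes = [fuzzy_cardinality(Sj) for Sj in children]
    total = sum(sizes)
    # w_j (Eq. (3)) weights each child's entropy by its share of the examples.
    return fuzzy_entropy(S) - sum(
        (sz / total) * fuzzy_entropy(Sj) for sz, Sj in zip(sizes, children)
    )
```

The attribute whose split maximizes `fig(S, children)` is then selected for branching, in line with FID3.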
III. PROPOSED MEASURE
Finding a proper value for the threshold of a stopping criterion is its most significant problem. This problem arises from the fact that, knowing the threshold value, predicting the number of nodes in the generated FDT is not possible. Knowing the approximate number of nodes in the generated FDT helps to avoid over-fitting or under-fitting. Because of this problem, the threshold value is determined by a trial-and-error approach. Whether FDTs of arbitrary size can be generated by changing the value of the threshold is the question considered in this section.
Each node expansion adds more than one child node to the FDT. For example, when each node expansion generates two child nodes, the number of nodes of the FDT is one of the values 1, 3, 5, ... .

TABLE I. NUMBER OF NODES AND NUMBER OF INTERNAL NODES OF DIFFERENT FDTS

Number of internal nodes    Number of nodes
 0                            1
 1                            3
10                           18
18                           30
19                           32
20                           33
24                           39
27                           44
29                           47
30                           48
46                           80
47                           81
49                           84
62                          104

TABLE II. PROPERTIES OF DATASETS

Dataset          Number of Instances   Number of Attributes   Number of Classes
Wine             178                   13                     3
Breast Cancer    699                   10                     2
Breast Tissue    106                    9                     6

For this reason, generating FDTs of arbitrary size is not possible. The number of new nodes added to the FDT by changing the threshold value of a stopping criterion is determined by two factors: the number of node expansions and the number of child nodes generated by each expansion. Based on these explanations, the above question can be restated as whether FDTs with an arbitrary number of expanded nodes can be generated by changing the value of the threshold.
There are two kinds of stopping criteria used for FDT induction. In the first kind, lower threshold values lead to FDTs with a small number of nodes, and higher threshold values cause the generation of FDTs with a large number of nodes. In the second kind, there is an inverse relation between the threshold value and the size of the FDT: lower threshold values lead to FDTs with a large number of nodes, and small trees are generated if high threshold values are employed. The accuracy stopping criterion is an example of the first kind and the number-of-instances stopping criterion is an example of the second kind. For ease of understanding, the remainder of this section is based on stopping criteria of the first kind. The generalization to the second kind is quite straightforward.
The value of a stopping criterion for a particular node reflects some characteristics of that node. For a specific threshold value T, all the nodes with a stopping criterion value lower than T should be expanded. For an FDT constructed with a threshold value T, applying the minimum amount of increase in T such that only one leaf in the current tree violates the new threshold value, and then growing the tree with this new threshold, leads to the minimum increase in the number of nodes (and also the minimum increase in the number of expanded nodes). The details of this process can be found in [11]. In the same FDT, performing these operations for different stopping criteria results in different numbers of node expansions. The number of expanded nodes depends on the utilized stopping criterion. This characteristic of a stopping criterion is considered as its Growth Control Capability (GCC). The proposed measure, GCC, determines for a stopping criterion the minimum number of nodes which can be expanded by changing its threshold value. GCC can take any real value greater than or equal to one. FDTs with an arbitrary number of expanded nodes can be generated if we have control over the growth of nodes. For a particular stopping criterion, this purpose is achieved when the value of GCC is equal to one. A stopping criterion whose GCC is close to one has the potential to generate FDTs with an arbitrary (user-defined) number of expanded nodes. The rest of this section describes the method for calculating the GCC value.
The GCC value is calculated based on the number of FDTs of different sizes which can be generated by changing the threshold value. Suppose that the Largest Not over-Fitted FDT is abbreviated as LNF. For a stopping criterion (SC), the GCC is calculated as:

$$GCC(SC) = \frac{g(SC)}{f(SC)} \qquad (5)$$

where $g(SC)$ is the number of internal nodes of the LNF constructed by employing SC as the stopping criterion, and $f(SC)$ is the number of FDTs of different sizes which can be generated by changing the threshold value of SC such that their sizes, in terms of number of internal nodes, are less than or equal to that of the LNF. Table I shows the number of nodes and the number of internal nodes of different FDTs generated by changing the threshold value of the accuracy stopping criterion. In this example, the LNF is supposed to have 104 nodes. This means that FDTs with more than 104 nodes are supposed to be over-fitted.

In this example, 62 of the total 104 nodes of the LNF are internal nodes; in other words, $g(accuracy)$ is equal to 62. $f(accuracy)$ is equal to 14, because there are 14 different FDTs whose numbers of internal nodes are less than or equal to the LNF's size (i.e., FDTs with 1, 3, 18, ... nodes). Therefore $GCC(accuracy)$ is equal to $62/14 \approx 4.43$, which means that for this example, by changing the threshold value of accuracy, at least 4.43 nodes are expanded on average.
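This worked example can be reproduced in a few lines of Python; the sketch below (illustrative naming) feeds the Table I internal-node counts into Eq. (5):

```python
# Sketch of Eq. (5) using the Table I data: internal_nodes[i] is the number of
# internal nodes of the i-th FDT obtained by gradually loosening the accuracy
# threshold; the LNF is the largest tree that is not over-fitted.
internal_nodes = [0, 1, 10, 18, 19, 20, 24, 27, 29, 30, 46, 47, 49, 62]

def gcc(internal_nodes, lnf_internal_nodes):
    g = lnf_internal_nodes                                          # g(SC)
    f = sum(1 for n in internal_nodes if n <= lnf_internal_nodes)   # f(SC)
    return g / f

print(round(gcc(internal_nodes, 62), 2))  # 4.43: ~4.43 expansions per threshold step
```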
IV. EXPERIMENTAL RESULTS
In this section the GCC measure is calculated for different stopping criteria, including: 1) accuracy, 2) number of instances, 3) fuzzy information gain, 4) fuzzy entropy and 5) tree depth. These stopping criteria are used for FDT induction on three datasets taken from the UCI machine learning repository [12]. These datasets and their properties are listed in Table II.

FDTs with more than 100 nodes are supposed to be over-fitted. On the domain of each attribute, eight triangular membership functions are considered. In order to make the GCC value more robust and reliable, 10-fold cross-validation is repeated 10 times and 100 FDTs are constructed (i.e., one FDT per fold). Then, the value of GCC is set to the mean of these 100 GCC values. The obtained results are listed in Table III below; in this table, STD stands for standard deviation. As is apparent, the number of instances as a stopping criterion achieves the best average GCC value, while its STD is zero. Consequently, it has better control on FDT growth than the others. In other words, when one utilizes this stopping criterion for FDT construction, changing the value of the threshold results in expanding one node on average. Thus, using the number of instances as the stopping criterion is advised, due to its ability to produce FDTs with a predefined number of nodes.
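The evaluation protocol can be summarized in a short Python sketch. Here `induce_fdt` and `gcc_of_tree` are hypothetical placeholders for the paper's FDT construction and GCC computation (neither is specified as code in the paper), and scikit-learn's KFold supplies the fold splits:

```python
# Sketch of the evaluation protocol: 10 repetitions of 10-fold cross-validation
# give 100 GCC values, whose mean (and STD) is reported. `induce_fdt` and
# `gcc_of_tree` are hypothetical placeholders; X and y are assumed to be
# numpy arrays.
import statistics
from sklearn.model_selection import KFold

def mean_gcc(X, y, induce_fdt, gcc_of_tree, repeats=10, folds=10):
    values = []
    for seed in range(repeats):
        kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
        for train_idx, _ in kf.split(X):
            tree = induce_fdt(X[train_idx], y[train_idx])  # one FDT per fold
            values.append(gcc_of_tree(tree))
    return statistics.mean(values), statistics.stdev(values)
```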
V. CONCLUSIONS
In this paper, the ability of each stopping criterion to control the size of the FDT in the induction process is studied. Stopping criteria can control the size of the FDT through their threshold values: changing the threshold value changes the number of nodes in the FDT. In order to determine the capability of each stopping criterion to generate an FDT of user-defined size by changing the threshold value, the GCC (Growth Control Capability) measure is introduced. This measure is calculated for some datasets taken from the literature, and the obtained results show that the number of instances as a stopping criterion has the best control on FDT growth and can generate FDTs with an arbitrary number of internal nodes.

REFERENCES
[1] L. Rokach, O. Maimon, Data mining with decision trees: theory and
applications, World Scientific, 2008.
[2] L. Rokach, O. Maimon, Top-down induction of decision tree classifiers - a survey, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, Vol. 35, 2005, pp. 476-487.
[3] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[4] H. Zhang, B. H. Singer, Recursive partitioning and application, 2nd
ed., Springer, 2010.
[5] C. Z. Janikow, K. Kawa, Fuzzy decision tree FID, Annual Meeting
of the North American Fuzzy Information Processing Society, 2005,
pp. 379-384.
[6] C. Z. Janikow, Fuzzy decision trees: issues and methods, IEEE
Transactions on Systems, Man and Cybernetics-Part B: Cybernetics,
Vol. 28, 1998, pp. 1-14.
[7] Y. L. Chen, T. Wang, B. S. Wang, Z. J. Li, A survey of fuzzy
decision tree classifier, Fuzzy Information and Engineering, Vol. 1,
2009, pp. 149-159.
[8] T. Wang, Z. Li, Y. Yan, H. Chen, A survey of fuzzy decision tree
classifier methodology, Fuzzy Information and Engineering, Vol.
40, 2007, pp. 959-968.
[9] Y. Yuan, M. J. Shaw, Induction of fuzzy decision trees, Fuzzy Sets
and Systems, Vol. 69, 1995, pp. 125-139.
[10] X. Wang, C. Borgelt, Information Measures in fuzzy decision trees,
Proceedings of 13th IEEE International Conference on Fuzzy
Systems, 2004, pp. 85-90.
[11] Iterative deepening fuzzy ID3: A novel approach for fuzzy decision
tree induction via stopping criteria, unpublished.
[12] A. Frank, A. Asuncion, UCI machine learning repository
[http://archive.ics.uci.edu/ml], Irvine, CA: University of California,
School of Information and Computer Science, 2010.


TABLE III. MEAN GCC FOR DIFFERENT STOPPING CRITERIA

                 Accuracy        Number of       Fuzzy Info.     Fuzzy           Tree
                                 Instances       Gain            Entropy         Depth
Dataset          Mean    STD     Mean    STD     Mean    STD     Mean    STD     Mean    STD
Wine             1.86    0.77    1.00    0.00    1.80    0.21    1.84    0.87    19.75   0.44
Breast Cancer    3.05    0.72    1.00    0.00    7.07    2.19    3.11    0.67    23.73   0.21
Breast Tissue    4.66    1.84    1.00    0.00    1.67    0.20    2.62    0.72    16.52   0.50
Average          3.19    1.11    1.00    0.00    3.51    0.87    2.52    0.76    20.00   0.38
