
Intelligent Data Analysis 7 (2003) 59-73, IOS Press
Integrating rough set theory and fuzzy neural network to discover fuzzy rules
Shi-tong Wang^a, Dong-jun Yu^b and Jing-yu Yang^b

^a Department of Computer Science, School of Information, Southern Yangtse University, Jiangsu, P.R. China, 214036
^b Department of Computer Science, Nanjing University of Science & Technology, Nanjing, Jiangsu, P.R. China, 210094
Received 15 April 2002
Revised 15 June 2002
Accepted 25 June 2002

Abstract. Most fuzzy systems use the complete combination rule set based on partitions to discover the fuzzy rules, thus often resulting in low generalization capability and high computational complexity. To a large extent, the reason originates from the fact that such fuzzy systems do not utilize the field knowledge contained in the data. In this paper, based on rough set theory, a new generalized incremental rule extraction algorithm (GIREA) is presented to extract rough domain knowledge, namely, certain and possible rules. Then, a fuzzy neural network (FNN) is used to refine the obtained rules and further produce the fuzzy rule set. Our approach and experimental results demonstrate the superiority in both rule length and the number of fuzzy rules.

Keywords: Rough set, fuzzy set, neural networks, incremental rule extraction
1. Introduction

In the real world, almost every question finally leads to processing data characterized by uncertainty and imprecision. To date, many scholars have developed all kinds of approaches, such as neural networks [1], fuzzy systems [2], rough set theory [3], genetic algorithms, etc. Each approach has its own advantages and disadvantages. In order to provide more flexible and robust information processing systems, using only one approach is not enough. There is already a trend to integrate different computing paradigms such as neural networks, fuzzy systems, rough set theory, genetic algorithms and so on to generate more efficient hybrid systems, such as neural-fuzzy systems [4]. Typically, a fuzzy neural network (FNN) embodies the advantages of both neural networks (NN) and fuzzy systems. In other words, an FNN can be used to construct a knowledge-based NN, i.e. human beings' field knowledge can be incorporated into the NN, so the FNN can be more suitable for the question to be solved. But there still exist problems. For example, in some circumstances, people cannot even derive appropriate rules for a given system. Of course, we can divide every input dimension into several fuzzy subsets, and then combine all the fuzzy subsets in every input dimension to construct the complete rule set. However, such an FNN contains no field knowledge, i.e. this kind of FNN may not fit the given system at the very beginning. In recent years, rough set theory has been attracting more and more

attention and has been used in various applications, due to its excellent capability of extracting knowledge from data. In this paper, we first apply rough set theory to extract certain and possible rules, which are then used to determine the initial structure of the FNN, such that the FNN works from the very beginning with this type of useful knowledge.

As to fuzzy rule extraction, there are two important problems worth studying. One is how to extract a rule set from data. The other is how to refine/simplify the obtained rule set. Several approaches [1] can be applied to extract rules from data, such as fuzzy rule extraction based on product space clustering, fuzzy rule extraction based on ellipsoidal covariance learning, fuzzy rule extraction based on direct matching, etc. The fuzzy rule simplification approach [12] based on a similarity measure can effectively reduce the number of fuzzy rules by merging similar fuzzy sets in the rules. This paper aims at solving the above two problems from a different perspective. The contribution of our approach mainly lies in effectively integrating rough set theory and FNN together to discover fuzzy rules from data. Concisely, this approach first extracts certain and possible rules from data in an incremental mode by using the new generalized incremental rule extraction algorithm GIREA, and then applies the FNN to refine/simplify the extracted fuzzy rules.

This paper is organized as follows: Section 2 gives a brief description of fuzzy systems and the FNN. Section 3 introduces basic concepts of rough set theory. In Section 4, the new generalized incremental rule extraction algorithm GIREA is presented. Section 5 deals with the method of mapping the rule set to the corresponding FNN. Simulation results are demonstrated in Section 6. Section 7 concludes this paper.
2. Fuzzy system and its fuzzy neural network

Generally speaking, a fuzzy system consists of a set of fuzzy rules as follows [5]:

Rule 1: if $x_1$ is $A_1^1$ and $x_2$ is $A_2^1$ and ... and $x_n$ is $A_n^1$, then $y$ is $B^1$
Rule 2: if $x_1$ is $A_1^2$ and $x_2$ is $A_2^2$ and ... and $x_n$ is $A_n^2$, then $y$ is $B^2$
......
Rule N: if $x_1$ is $A_1^N$ and $x_2$ is $A_2^N$ and ... and $x_n$ is $A_n^N$, then $y$ is $B^N$
Fact: $x_1$ is $A'_1$ and $x_2$ is $A'_2$ and ... and $x_n$ is $A'_n$
Conclusion: $y$ is $B'$.

With max-product inference and centroid defuzzification, the final output of this fuzzy system can be written as:
$$y = \frac{\int \mu_{B'}(y)\, y\, dy}{\int \mu_{B'}(y)\, dy} \tag{1}$$

where

$$\mu_{B'}(y) = \max_{1 \le j \le N}\ \sup_{x_1, x_2, \ldots, x_n} \left[ \prod_{i=1}^{n} \mu_{A'_i}(x_i) \prod_{i=1}^{n} \mu_{A_i^j}(x_i)\, \mu_{B^j}(y) \right].$$

L.X. Wang [6] has proved that the system of Eq. (1) is a universal approximator.

Fig. 1. The FNN implementation of the fuzzy system.
In practice, one can often consider that the output fuzzy sets $B^j$ are singletons $\beta^j$, i.e.,

$$\mu_{B^j}(y) = \begin{cases} 1, & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \ldots, N \tag{2}$$

thus, we have

$$\mu_{B^j}(y) = \begin{cases} \prod_{i=1}^{n} \mu_{A_i^j}(x_i), & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \ldots, N \tag{3}$$
then the final output can be rewritten as follows:

$$y = \frac{\sum_{j=1}^{N} \beta^j \prod_{i=1}^{n} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{N} \prod_{i=1}^{n} \mu_{A_i^j}(x_i)} \tag{4}$$
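As a concrete reading of Eq. (4), the following minimal Python sketch (ours, not the paper's; the Gaussian membership functions and all parameter values are illustrative assumptions) evaluates the singleton fuzzy system output:

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership function mu(x)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def fuzzy_output(x, centers, sigmas, betas):
    """Singleton fuzzy system output of Eq. (4).

    x       : input vector of length n
    centers : (N, n) array; centers[j, i] parameterizes mu_{A_i^j}
    sigmas  : (N, n) array of membership widths
    betas   : (N,) array of output singletons beta^j
    """
    # firing strength of rule j: product over i of mu_{A_i^j}(x_i)
    w = np.prod(gaussian_mf(x, centers, sigmas), axis=1)
    # normalized weighted sum of the singletons
    return np.dot(w, betas) / np.sum(w)

# three illustrative rules over two inputs
centers = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
sigmas = np.full((3, 2), 0.3)
betas = np.array([-1.0, 0.0, 1.0])
print(fuzzy_output(np.array([0.4, 0.6]), centers, sigmas, betas))
```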
The I/O relationship of the fuzzy system defined in Eq. (4) can be implemented by a corresponding FNN. The FNN consists of four components: the input layer, the fuzzification layer, the inference layer and the defuzzification layer, as shown in Fig. 1. Generally speaking, an FNN can be utilized in two modes, one being the series-parallel mode and the other the parallel mode [13,14]; see Figs 2(a) and (b), where TDL represents time-delayed logic, RS represents the real system, FNN represents the fuzzy neural network, $u_k$ is the activation signal, $y_k$ and $\hat{y}_k$ are the outputs of the RS and the FNN, respectively, and $e_k$ is the difference between $y_k$ and $\hat{y}_k$. Figure 2(a) shows the series-parallel mode and Fig. 2(b) the parallel mode. When the FNN works in series-parallel mode, all the delayed output data (used as the input data of the FNN) are the observation data of the real system. In this circumstance, high observation precision is needed; too much observation noise will greatly degrade the performance of the FNN. In parallel mode, by contrast, all the delayed output data (used as the input data of the FNN) are independent of the observation data of the real system and relate only to the FNN itself.

Fig. 2. Two modes in which the FNN can be applied: (a) series-parallel mode; (b) parallel mode.
No matter which mode is used, once the FNN approximates the real system well enough, it can be applied independently. FNNs have been widely used, but there still exists the question described in Section 1: when there is no prior field knowledge, how can people obtain appropriate rules to construct the FNN so as to reduce its search space and training time? The rest of this paper tries to solve this problem.
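The difference between the two modes can be sketched as follows (our illustration; `fnn` stands for any trained one-step predictor and `real_system` for the plant, both hypothetical callables):

```python
def identify(fnn, real_system, u, y0, y1, parallel=False):
    """One-step-ahead identification of y(t+1) = f(y(t), y(t-1), u(t)).

    Series-parallel mode feeds the FNN with *measured* plant outputs;
    parallel mode feeds it with its *own* previous predictions.
    """
    y_real = [y0, y1]    # plant trajectory
    y_hat = [y0, y1]     # FNN trajectory
    for t in range(1, len(u)):
        y_real.append(real_system(y_real[t], y_real[t - 1], u[t]))
        if parallel:
            # parallel mode: delayed inputs come from the FNN itself
            y_hat.append(fnn(y_hat[t], y_hat[t - 1], u[t]))
        else:
            # series-parallel mode: delayed inputs are plant observations
            y_hat.append(fnn(y_real[t], y_real[t - 1], u[t]))
    return y_real, y_hat
```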
3. Rough set, decision matrix and rule extraction

3.1. Basic concepts of rough sets

Here, we just introduce the necessary concepts needed in this paper; for details, please refer to [3]. An information system is $K = (U, C \cup D)$, where U denotes the domain of discourse, C denotes a non-empty condition attribute set, and D denotes a non-empty decision attribute set. Let $A = C \cup D$; an attribute $a$ $(a \in A)$ can be regarded as a function from the domain of discourse U to the value set $Val_a$. An information system may be represented in the form of an attribute-value table, in which rows are labeled by the objects in the domain of discourse and columns by the attributes. For every subset of attributes $B \subseteq C$, an equivalence relation $I_B$ on U can be defined as:

$$I_B = \{(x, y) \in U \times U : \text{for every } a \in B,\ a(x) = a(y)\} \tag{5}$$

thus, the equivalence class of the object $x \in U$ relative to $I_B$ can be defined as:

$$[x]_B = \{y \mid y \in U,\ y\,I_B\,x\} \tag{6}$$

An equivalence class is also called an indiscernibility class, because any two objects in an equivalence class are indiscernible. The lower and upper approximations are another two important concepts in rough set theory. Given $X \subseteq U$ and $B \subseteq C$, X's B-lower and B-upper approximations are defined as $\underline{B}X = \{x \in U : [x]_B \subseteq X\}$ and $\overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}$, respectively. The boundary set $BN_B(X)$ is defined as $BN_B(X) = \overline{B}X - \underline{B}X$. If $BN_B(X) \neq \emptyset$, i.e. $\underline{B}X \neq \overline{B}X$, then X is B-rough; otherwise, X is B-exact.
3.2. Rule extraction using decision matrix

The decision matrix is a generalization, within rough set theory, of the discernibility matrix [8]; it can be used to compute the decision rules and reducts of an information system, and it provides a way to generate the simplest set of rules while preserving all classification information [9].

Table 1
Consistent information table

Object     Headache   Temperature   Flu
Object1    Yes        Normal        No
Object2    Yes        High          Yes
Object3    Yes        Very High     Yes
Object4    No         Normal        No
Object5    No         High          No
Object6    No         Very High     Yes
Table 2
Decision matrix for class 0 (flu infected)

i   Obj     j = 1: Obj1    j = 2: Obj4    j = 3: Obj5
1   Obj2    (T,1)          (T,1)(H,0)     (H,0)
2   Obj3    (T,2)          (T,2)(H,0)     (T,2)(H,0)
3   Obj6    (H,1)(T,2)     (T,2)          (T,2)
3.2.1. Rule extraction from a consistent information table

Let us first introduce the decision matrix. For an information system $K = (U, C \cup D)$, suppose U is divided into m classes $(c_1, c_2, \ldots, c_m)$ by the equivalence relation defined on D. Given any class $c \in (c_1, c_2, \ldots, c_m)$, all objects that belong to this class are numbered with subscripts $i$ $(i = 1, 2, \ldots, \gamma)$, and all objects that do not belong to it with subscripts $j$ $(j = 1, 2, \ldots, \rho)$. The decision matrix $M(K) = (M_{ij})$ of the information system K is defined as a matrix whose entry at position $(i, j)$ is a set of attribute-value pairs:

$$M_{ij} = \{(a, a(i)) : a(i) \neq a(j)\}, \quad (i = 1, 2, \ldots, \gamma;\ j = 1, 2, \ldots, \rho) \tag{7}$$

where $a(i)$ is the value of attribute a on object i. For a given object $i$ $(i = 1, 2, \ldots, \gamma)$ belonging to class $c \in (c_1, c_2, \ldots, c_m)$, we can compute its minimal-length decision rule

$$|B_i| = \bigwedge_j \bigvee M_{ij} \tag{8}$$

where $\wedge$ and $\vee$ are generalized conjunction and disjunction operators, respectively. So for the given class $c \in (c_1, c_2, \ldots, c_m)$, its decision rule set can be represented as follows:

$$RUL = \bigvee_i |B_i|, \quad (i = 1, 2, \ldots, \gamma) \tag{9}$$
Let H represent Headache, and T and F represent Temperature and Flu, respectively. $Val_H = \{0, 1\}$ represents $Val_{Headache} = \{\text{Yes}, \text{No}\}$; $Val_T = \{0, 1, 2\}$ represents $Val_{Temperature} = \{\text{Normal}, \text{High}, \text{Very High}\}$; and $Val_F = \{0, 1\}$ represents $Val_{Flu} = \{\text{Yes}, \text{No}\}$. Tables 2 and 3 show the decision matrices for class 0 (flu infected) and class 1 (flu not infected), respectively. Let $|B_i^0|$ $(i = 1, 2, 3)$ denote the i-th minimal-length rule in the decision matrix of class 0. So,

$$|B_1^0| = (T,1) \wedge ((T,1) \vee (H,0)) \wedge (H,0) = (T,1) \wedge (H,0)$$

Table 3
Decision matrix for class 1 (flu not infected)

i   Obj     j = 1: Obj2    j = 2: Obj3    j = 3: Obj6
1   Obj1    (T,0)          (T,0)          (H,0)(T,0)
2   Obj4    (T,0)(H,1)     (T,0)(H,1)     (T,0)
3   Obj5    (H,1)          (H,1)(T,1)     (T,1)
Table 4
Inconsistent information table

Object     Headache   Temperature   Flu
Object1    Yes        Normal        No
Object2    Yes        High          Yes
Object3    Yes        Very High     Yes
Object4    No         Normal        No
Object5    No         High          No
Object6    No         Very High     Yes
Object7    No         High          Yes
Object8    No         Very High     No
$$|B_2^0| = (T,2) \wedge ((T,2) \vee (H,0)) \wedge ((T,2) \vee (H,0)) = (T,2)$$
$$|B_3^0| = ((H,1) \vee (T,2)) \wedge (T,2) \wedge (T,2) = (T,2)$$

Similarly, the i-th minimal-length rule in the decision matrix of class 1 can be computed as follows:

$$|B_1^1| = (T,0) \wedge (T,0) \wedge ((H,0) \vee (T,0)) = (T,0)$$
$$|B_2^1| = ((T,0) \vee (H,1)) \wedge ((T,0) \vee (H,1)) \wedge (T,0) = (T,0)$$
$$|B_3^1| = (H,1) \wedge ((H,1) \vee (T,1)) \wedge (T,1) = (T,1) \wedge (H,1)$$

The final minimal-length decision rule sets for class 0 and class 1 can be represented as

$$RUL^0 = (T,2) \vee ((T,1) \wedge (H,0)), \qquad RUL^1 = (T,0) \vee ((T,1) \wedge (H,1))$$
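A compact sketch of Eqs. (7)-(9) (our code, not the paper's; computing each minimal-length rule as a minimum hitting set over its row of matrix entries is our implementation choice) reproduces the class-0 rules above from Table 1:

```python
from itertools import combinations

# Table 1, coded values (H: 0=Yes, 1=No; T: 0=Normal, 1=High, 2=Very High)
rows = {1: {"H": 0, "T": 0}, 2: {"H": 0, "T": 1}, 3: {"H": 0, "T": 2},
        4: {"H": 1, "T": 0}, 5: {"H": 1, "T": 1}, 6: {"H": 1, "T": 2}}
flu = {1: 1, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0}      # 0 = flu infected

def matrix_row(rows, i, others):
    """Row i of the decision matrix, Eq. (7): for each object j outside
    the class, the attribute-value pairs of i that discern it from j."""
    return [frozenset((a, v) for a, v in rows[i].items() if v != rows[j][a])
            for j in others]

def minimal_rule(rows, i, others):
    """Minimal-length rule of Eq. (8): the smallest conjunction of
    attribute-value pairs intersecting every entry M_ij of row i."""
    mij = matrix_row(rows, i, others)
    pairs = sorted(set().union(*mij))
    for size in range(1, len(pairs) + 1):
        for combo in combinations(pairs, size):
            if all(set(combo) & m for m in mij):
                return set(combo)

others = [o for o in rows if flu[o] == 1]        # Obj1, Obj4, Obj5
print([minimal_rule(rows, i, others) for i in (2, 3, 6)])
# -> [{('H', 0), ('T', 1)}, {('T', 2)}, {('T', 2)}]
```

Taking the disjunction of these per-object rules gives exactly $RUL^0$ of Eq. (9).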
3.3. Rule extraction from an inconsistent information table using the decision matrix

In real-life applications, a consistent information table often does not exist, so inconsistent information has to be coped with. Suppose we add Object7 and Object8 to Table 1, obtaining Table 4. Table 4 is an inconsistent information table, for there exist objects that have the same condition attribute values but different decision attribute values. For example, Object5 and Object7 have the same condition attribute values, but different decision attribute values. From Table 4, we can get two concepts, $X_1 = \{Object2, Object3, Object6, Object7\}$ and $X_2 = \{Object1, Object4, Object5, Object8\}$, representing flu infected and flu not infected, respectively. These two concepts are rough, because neither of them is definable. In order to extract rules from an inconsistent information table, the lower and upper approximations are needed. Rules extracted from the lower approximation are certain rules; rules extracted from the upper approximation are possible rules.
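These definitions are mechanical to check in code. The following minimal sketch (ours, not the paper's; the dictionary encoding of Table 4, with H: 0 = Yes, 1 = No and T: 0 = Normal, 1 = High, 2 = Very High, is an assumption) computes the approximations used below:

```python
def equivalence_classes(objects, attrs):
    """Partition object ids by their values on the attributes in attrs."""
    classes = {}
    for oid, row in objects.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(oid)
    return list(classes.values())

def lower_upper(X, classes):
    """B-lower and B-upper approximations of a concept X (a set of ids)."""
    lower, upper = set(), set()
    for c in classes:
        if c <= X:
            lower |= c      # class entirely inside the concept
        if c & X:
            upper |= c      # class overlapping the concept
    return lower, upper

# Table 4, condition attributes only
table4 = {1: {"H": 0, "T": 0}, 2: {"H": 0, "T": 1}, 3: {"H": 0, "T": 2},
          4: {"H": 1, "T": 0}, 5: {"H": 1, "T": 1}, 6: {"H": 1, "T": 2},
          7: {"H": 1, "T": 1}, 8: {"H": 1, "T": 2}}
X1 = {2, 3, 6, 7}    # flu infected
print(lower_upper(X1, equivalence_classes(table4, ["H", "T"])))
# -> ({2, 3}, {2, 3, 5, 6, 7, 8})
```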

Table 5
Decision matrix for computing concept X1's certain rules

i   Object    j=1: Object1   j=2: Object4   j=3: Object5   j=4: Object6   j=5: Object7   j=6: Object8
1   Object2   (T,1)          (H,0)(T,1)     (H,0)          (H,0)(T,1)     (H,0)          (H,0)(T,1)
2   Object3   (T,2)          (H,0)(T,2)     (H,0)(T,2)     (H,0)          (H,0)(T,2)     (H,0)

Table 6
Decision matrix for computing concept X1's possible rules

i   Object    j=1: Object1   j=2: Object4
1   Object2   (T,1)          (T,1)(H,1)
2   Object3   (T,2)          (T,2)(H,1)
3   Object5   (H,1)(T,1)     (T,1)
4   Object6   (H,1)(T,2)     (T,2)
5   Object7   (H,1)(T,1)     (T,1)
6   Object8   (H,1)(T,2)     (T,2)
Firstly, we compute concepts $X_1$'s and $X_2$'s lower and upper approximations:

$$\underline{B}X_1 = \{Object2, Object3\}, \qquad \underline{B}X_2 = \{Object1, Object4\}$$
$$\overline{B}X_1 = \{Object2, Object3, Object5, Object6, Object7, Object8\}$$
$$\overline{B}X_2 = \{Object1, Object4, Object5, Object6, Object7, Object8\}$$

Let $|B_i^0|_{certain}$ $(i = 1, 2)$ denote the i-th minimal-length certain rule in the decision matrix of class 0. Using the method proposed in Section 3.2.1, we can compute the certain rules for concept $X_1$ (class 0) from Table 5 as follows:

$$|B_1^0|_{certain} = (T,1) \wedge ((T,1) \vee (H,0)) \wedge (H,0) \wedge ((T,1) \vee (H,0)) \wedge (H,0) \wedge ((T,1) \vee (H,0)) = (T,1) \wedge (H,0)$$
$$|B_2^0|_{certain} = (T,2) \wedge ((T,2) \vee (H,0)) \wedge ((T,2) \vee (H,0)) \wedge (H,0) \wedge ((T,2) \vee (H,0)) \wedge (H,0) = (T,2) \wedge (H,0)$$

thus, we obtain the certain rule set for class 0:

$$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0))$$

For certain rules, we define the belief function df = 1. In other words, rules with df = 1 are positively believable. Let $|B_i^0|_{possible}$ denote the i-th minimal-length possible rule in the decision matrix of class 0. Similarly, we can use the same method to compute the possible rules for concept $X_1$ from Table 6 as follows:

$$|B_1^0|_{possible} = (T,1) \wedge ((T,1) \vee (H,1)) = (T,1)$$
$$|B_2^0|_{possible} = (T,2) \wedge ((T,2) \vee (H,1)) = (T,2)$$
$$|B_3^0|_{possible} = ((H,1) \vee (T,1)) \wedge (T,1) = (T,1)$$

$$|B_4^0|_{possible} = ((H,1) \vee (T,2)) \wedge (T,2) = (T,2)$$
$$|B_5^0|_{possible} = ((H,1) \vee (T,1)) \wedge (T,1) = (T,1)$$
$$|B_6^0|_{possible} = ((H,1) \vee (T,2)) \wedge (T,2) = (T,2)$$

thus, we can obtain the possible rule set for class 0 as follows:

$$RUL^0_{possible} = (T,1) \vee (T,2) \vee (T,1) \vee (T,2) \vee (T,1) \vee (T,2) = (T,1) \vee (T,2)$$

For possible rules, we define their belief function

$$df = 1 - \frac{card(\overline{B}X - \underline{B}X)}{card(U)}$$

where $card(\cdot)$ denotes the cardinality of a set. In other words, possible rules are believable with degree df, 0 < df < 1. The rationale of this definition is intuitive: the greater the difference between $\underline{B}X$ and $\overline{B}X$, the more inexact the concept X is, so the belief degree of the possible rules extracted from X should decrease accordingly; when $\underline{B}X$ approaches $\overline{B}X$, df approaches 1. For example, for concept $X_1$ we have $\overline{B}X_1 - \underline{B}X_1 = \{Object5, Object6, Object7, Object8\}$ and $card(U) = 8$, so df = 1 - 4/8 = 0.5. Similarly, we can compute concept $X_2$'s certain and possible rules.
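Expressed in code (ours), continuing the sketch from the beginning of this section:

```python
def belief_degree(lower, upper, universe_size):
    """df = 1 - card(upper - lower) / card(U)."""
    return 1 - len(upper - lower) / universe_size

# approximations of X1 computed above
print(belief_degree({2, 3}, {2, 3, 5, 6, 7, 8}, 8))   # -> 0.5
```

This df = 0.5 is exactly the belief degree attached to rules (4)-(6) in Section 5.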
4. New generalized incremental rule extraction algorithm (GIREA)

Suppose we have extracted certain and possible rules from an information table. When new objects are added to it, the rule set may change. In this circumstance, an incremental rule extraction algorithm is required; otherwise it will take much longer to re-compute the rule set from the very beginning. It should be pointed out that the incremental rule extraction algorithm in [9] does not compute certain and possible rules while coping with both consistent and inconsistent information tables simultaneously. The new generalized incremental rule extraction algorithm (GIREA) presented here, a generalization of the algorithm in [9], can not only deal with both consistent and inconsistent information tables, but can also extract certain and possible rule sets at the same time. The main idea of this algorithm can be summarized as follows. Given a newly added object:

(1) Determine whether the new object gives rise to a new concept. If it does, update the concept set.
(2) Collision detection: $Object_a$ collides with $Object_b$ if and only if $Object_a$ and $Object_b$ have the same condition attribute values but different decision attribute values. For example, Object6 and Object8 in Table 4 collide with each other (see Section 3.3).
(3) Update the certain and possible rule sets in terms of the collision detection.

Using this algorithm, when a new object is added to the information system, it is unnecessary to re-compute the rule sets from the very beginning; we can update the rule sets by partly modifying the original rule sets, so a lot of time is saved. This is especially useful when extracting rules from large databases.

GIREA Algorithm:
Condition: the rule set and the concept set $X = \{X_1, X_2, \ldots, X_n\}$ have been computed from the given information system. A new object $Object_{new}$ is added to the information system.

BEGIN
STEP 1. Determine which concept the newly added object belongs to; if it does not belong to any concept in the concept set $X = \{X_1, X_2, \ldots, X_n\}$, create a new concept $X_{n+1}$ and add it to X, i.e. $X = X \cup \{X_{n+1}\}$.
STEP 2. // Collision detection
    IF ($Object_{new}$ collides with original objects in the information table) FLAG = 1;
    ELSE FLAG = 0;
STEP 3. Get a concept $X_i$ from X, and $X = X - \{X_i\}$.
    IF (FLAG = 0) // no collision
    {
        IF ($Val(X_i) = Val(Object_{new})$)
        {
            add a new row to concept $X_i$'s certain and possible decision matrices respectively (labeled with k1 and k2 respectively):
                $(M_{k1,j}) = \{(a, a(k1)) \mid a(k1) \neq a(j)\}$, $(M_{k2,j}) = \{(a, a(k2)) \mid a(k2) \neq a(j)\}$
            compute the decision rule for the added row respectively:
                $|B_{k1}| = \bigwedge_j \bigvee M_{k1,j}$, $|B_{k2}| = \bigwedge_j \bigvee M_{k2,j}$
            update concept $X_i$'s certain and possible rule sets as follows:
                $RUL^i_{certain} = RUL^i_{certain} \vee |B_{k1}|$, $RUL^i_{possible} = RUL^i_{possible} \vee |B_{k2}|$
        }
        ELSE
        {
            add a new column to the concept's certain and possible decision matrices respectively (labeled with k1 and k2 respectively):
                $(M_{i,k1}) = \{(a, a(i)) \mid a(i) \neq a(k1)\}$, $(M_{i,k2}) = \{(a, a(i)) \mid a(i) \neq a(k2)\}$
            compute the decision rule for every row respectively:
                $|B_i|_{certain} = |B_i|_{certain} \wedge \bigvee M_{i,k1}$, $|B_i|_{possible} = |B_i|_{possible} \wedge \bigvee M_{i,k2}$
            update concept $X_i$'s certain and possible rule sets as follows:
                $RUL^i_{certain} = \bigvee |B_i|_{certain}$, $RUL^i_{possible} = \bigvee |B_i|_{possible}$
        }
    }
    ELSE // collision detected
    {
        IF ($Object_{new}$ collides with an $Object_l$ which exists in concept $X_i$'s lower approximation)
        {
            delete the row containing $Object_l$ from the certain decision matrix of concept $X_i$ (labeled with l), and update the certain rule set as follows:
                $RUL^i_{certain} = RUL^i_{certain} - |B_l|_{certain}$.
            Then add a new column to the certain decision matrix of concept $X_i$ (labeled with k), and update every row's decision rule as follows:
                $|B_i|_{certain} = |B_i|_{certain} \wedge \bigvee M_{i,k}$
            Update the final certain rule set as follows:
                $RUL^i_{certain} = \bigvee |B_i|_{certain}$
            Add a new row to the possible decision matrix of concept $X_i$ (labeled with k):
                $(M_{k,j}) = \{(a, a(k)) \mid a(k) \neq a(j)\}$
            compute the possible decision rule for this row:
                $|B_k|_{possible} = \bigwedge_j \bigvee M_{k,j}$
            and update the final possible rule set as follows:
                $RUL^i_{possible} = RUL^i_{possible} \vee |B_k|_{possible}$
        }
        ELSE IF ($Val(X_i) = Val(Object_{new})$)
        {
            add a new column to the certain decision matrix of concept $X_i$ (labeled with k):
                $(M_{i,k}) = \{(a, a(i)) \mid a(i) \neq a(k)\}$
            update every row's decision rule as follows:
                $|B_i|_{certain} = |B_i|_{certain} \wedge \bigvee M_{i,k}$
            update the final certain rule set as follows:
                $RUL^i_{certain} = \bigvee |B_i|_{certain}$
            delete the column containing the collided object from the possible decision matrix of concept $X_i$ and add a new row ($Object_{new}$) to it; calculate each row's possible rule $|B_i|_{possible}$;
            calculate $RUL^i_{possible}$ as: $RUL^i_{possible} = \bigvee |B_i|_{possible}$
        }
        ELSE
        {
            add a new column to the concept's certain and possible decision matrices respectively (labeled with k1 and k2 respectively):
                $(M_{i,k1}) = \{(a, a(i)) \mid a(i) \neq a(k1)\}$, $(M_{i,k2}) = \{(a, a(i)) \mid a(i) \neq a(k2)\}$
            compute the decision rule for every row respectively:
                $|B_i|_{certain} = |B_i|_{certain} \wedge \bigvee M_{i,k1}$, $|B_i|_{possible} = |B_i|_{possible} \wedge \bigvee M_{i,k2}$
            update concept $X_i$'s certain and possible rule sets as follows:
                $RUL^i_{certain} = \bigvee |B_i|_{certain}$, $RUL^i_{possible} = \bigvee |B_i|_{possible}$
        }
    }
STEP 4. IF ($X \neq \emptyset$) GOTO STEP 3.

    ELSE STOP.
END

A question one may raise here is that when a new object is added to the domain of discourse U, the cardinality of U changes, so the belief degrees of the possible rules must be recomputed; this affects the entire learned rule set, seemingly making the algorithm non-incremental. We analyze this as follows: according to the definition of the belief function in Section 3.3, the belief degrees of the possible rules extracted from the same concept are equal. When a new object is added, recomputing each concept's belief function yields the belief degrees of all possible rules. Moreover, the incrementability of the proposed algorithm is acquired by properly modifying the already existing rules; belief degree recomputation is just a small part of this modification. Compared with the computational cost of rule modification, the computational cost of belief degree recomputation is rather small.
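The bookkeeping that makes GIREA incremental can be illustrated with the following simplified, runnable sketch (ours, not the paper's pseudocode verbatim: the certain/possible split and the collision branches of STEP 3 are omitted for brevity, and the `rows`/`flu` encoding of Table 1 from the Section 3.2.1 sketch is reused). A new object contributes one row to its own class's decision matrix and one column entry to each row of every other class, so only the touched entries are derived:

```python
from itertools import combinations

class GIREASketch:
    def __init__(self):
        self.rows, self.dec = {}, {}   # id -> condition dict / decision value
        self.matrix = {}               # class -> {row id: {col id: M_ij}}

    def _diff(self, i, j):             # entry M_ij of Eq. (7)
        return frozenset((a, v) for a, v in self.rows[i].items()
                         if v != self.rows[j][a])

    def add_object(self, oid, cond, d):
        self.rows[oid], self.dec[oid] = cond, d
        for cls, mat in self.matrix.items():     # new column elsewhere
            if cls != d:
                for i in mat:
                    mat[i][oid] = self._diff(i, oid)
        self.matrix.setdefault(d, {})            # new row in its own class
        self.matrix[d][oid] = {j: self._diff(oid, j)
                               for j, dj in self.dec.items() if dj != d}

    def rule(self, i):                 # Eq. (8) on the stored row, as before
        mij = list(self.matrix[self.dec[i]][i].values())
        pairs = sorted(set().union(*mij))
        for size in range(1, len(pairs) + 1):
            for combo in combinations(pairs, size):
                if all(set(combo) & m for m in mij):
                    return set(combo)

g = GIREASketch()
for oid in rows:                       # feed Table 1 one object at a time
    g.add_object(oid, rows[oid], flu[oid])
print([g.rule(i) for i in (2, 3, 6)])  # same class-0 rules as the batch run
```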
5. Mapping rules into the FNN

When certain and possible rules have been extracted from the information table, we need to map them into the corresponding FNN, just like mapping fuzzy rules to the FNN as described in Section 2. Taking the rules extracted in Section 3.3 as an example, there are 3 certain rules and 3 possible rules in the rule set, as follows:

Certain rules:
$$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0)), \qquad RUL^1_{certain} = (T,0)$$
Possible rules:

$$RUL^0_{possible} = (T,1) \vee (T,2), \qquad RUL^1_{possible} = (H,1)$$
We can describe these rules in the form of natural language as follows:

(1) If Temperature is High and Headache is Yes, then flu is infected. ($df_1$ = 1)
(2) If Temperature is Very High and Headache is Yes, then flu is infected. ($df_2$ = 1)
(3) If Temperature is Normal, then flu is not infected. ($df_3$ = 1)

Rules (1), (2) and (3) are certain rules, whose belief degrees (df) are all 1, i.e., these certain rules are definitely believable.

(4) If Temperature is High, then flu is infected. ($df_4$ = 0.5)
(5) If Temperature is Very High, then flu is infected. ($df_5$ = 0.5)
(6) If Headache is No, then flu is not infected. ($df_6$ = 0.5)

Rules (4), (5) and (6) are possible rules, whose belief degrees (df) lie between 0 and 1, i.e., these possible rules are partially believable. As there are two kinds of rules (certain and possible), the inference layer of the corresponding FNN consists of two parts, as shown in Fig. 3: one is the certain part, which contains the certain rules, and the other is the possible part, which contains the possible rules.

Fig. 3. Mapping rules to the FNN.

Let $df_i$ be the belief degree of the i-th rule. The final fitness of the i-th rule in the FNN can be measured by $df_i \alpha_i$, where $\alpha_i$ is the fitness of the i-th rule in the conventional meaning. Let x be the input variable of Headache and y be the input variable of Temperature, and let $C_1$ represent flu not infected and $C_2$ represent flu infected. Define two fuzzy sets, Yes and No, on input dimension x, and three fuzzy sets, N, H and V, on input dimension y, where N, H and V represent Normal, High and Very High, respectively. Then the six rules described above can be mapped into the FNN as shown in Fig. 3.
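As a concrete reading of the $df_i \alpha_i$ weighting (our sketch; the crisp membership stand-ins and the per-class max aggregation are simplifying assumptions, not the paper's FNN):

```python
# rules as (conditions, df, class); conditions are attribute-value pairs
rules = [
    ({"T": 1, "H": 0}, 1.0, "infected"),       # certain rules
    ({"T": 2, "H": 0}, 1.0, "infected"),
    ({"T": 0}, 1.0, "not infected"),
    ({"T": 1}, 0.5, "infected"),               # possible rules
    ({"T": 2}, 0.5, "infected"),
    ({"H": 1}, 0.5, "not infected"),
]

def classify(sample, membership):
    """Score each class by df_i * alpha_i, where alpha_i is the usual
    product firing strength computed from the membership functions."""
    scores = {}
    for cond, df, label in rules:
        alpha = 1.0
        for attr, val in cond.items():
            alpha *= membership(attr, val, sample[attr])
        scores[label] = max(scores.get(label, 0.0), df * alpha)
    return max(scores, key=scores.get)

# crisp stand-in for the fuzzy sets Yes/No and N/H/V of Fig. 3
crisp = lambda attr, val, x: 1.0 if x == val else 0.0
print(classify({"T": 2, "H": 0}, crisp))       # -> 'infected'
```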
6. Numerical simulations

In this section, numerical simulations are demonstrated to show our approach's superiority over the rule extraction approach using only the conventional FNN [1]. Given a nonlinear system:

$$y(t+1) = \frac{y(t)\, y(t-1)\, (y(t) + 2.5)}{1 + y^2(t) + y^2(t-1)} + u(t) \tag{10}$$

where $u(t) = \sin(2\pi t / 25)$ is the activation function, and $y(0) = 0.9$, $y(1) = 0.5$.
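The benchmark plant of Eq. (10) is straightforward to reproduce (our sketch):

```python
import math

def plant(y_t, y_tm1, u_t):
    """Nonlinear system of Eq. (10)."""
    return y_t * y_tm1 * (y_t + 2.5) / (1 + y_t**2 + y_tm1**2) + u_t

y = [0.9, 0.5]                             # y(0), y(1)
for t in range(1, 200):
    u = math.sin(2 * math.pi * t / 25)     # activation signal u(t)
    y.append(plant(y[t], y[t - 1], u))
```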
Method 1: Use the conventional FNN [1]. First, we divide the input interval into three equal subintervals on each dimension, and then define three fuzzy subsets on them (see Fig. 4). Figure 4 shows how to define fuzzy sets on the subintervals, where S, M and L represent the fuzzy sets Small, Middle and Large, respectively; $y_{min}$ and $y_{max}$ are the minimum and the maximum that may be taken on dimension y, respectively.
Fig. 4. Defining fuzzy sets on the y dimension.

Table 7
Performance comparison between method 1 and method 2

           R    ARL   No. of iterations
Method 1   27   3     200
Method 2   20   2.2   89
We define the average rule length ARL as:

$$ARL = \frac{\sum_{i=1}^{R} P_i}{R} \tag{11}$$

where R is the number of rules and $P_i$ is the number of premise variables in the i-th rule. Using the complete combination rule set, there will be 27 ($3 \times 3 \times 3$) rules, and the ARL is 3 (because there are 3 premise variables in each rule).
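Eq. (11) in code (ours), checked against the Method 1 numbers:

```python
def average_rule_length(premise_counts):
    """ARL of Eq. (11): mean number of premise variables per rule."""
    return sum(premise_counts) / len(premise_counts)

# Method 1: complete combination rule set, 3 premise variables in each
# of the 27 (= 3 * 3 * 3) rules
print(average_rule_length([3] * 27))   # -> 3.0
```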
Method 2: Use the approach in this paper, i.e.:

(1) Discretize the samples (quantify the continuous attribute values). In order to compare with Method 1, the input interval is also divided into 3 equal subintervals on each dimension, as in Method 1.
(2) To demonstrate the incrementability of the proposed algorithm GIREA, set the information table null at the beginning, then gradually add samples into it (one at a time), extracting certain and possible rules using GIREA until all samples have been processed.
(3) Map the rules to the FNN, and use the FNN to refine the rules obtained in the above step.

Using Method 2, we got 20 rules and the average rule length is 2.5. In our experiment, in order to approximate to the same level, the numbers of iterations for Method 1 and Method 2 are 200 and 89, respectively. Figure 5 shows the final identification results of Method 1 and Method 2, respectively (using each FNN independently after training finished, with initial state values (y(0) = 0.4, y(1) = 0.2) different from the real system's (y(0) = 0.9, y(1) = 0.5); the two FNNs use the same initial state values). Table 7 compares the performances of Method 1 and Method 2. From Fig. 5 we can see that, compared with Method 1, Method 2 has the simpler rule set and the quicker learning speed. The reason is that the FNN based on our approach contains knowledge obtained from the sample data. Figure 6 also shows the final identification results of Method 1 and Method 2 after 20% white Gaussian noise is added, respectively. It is easy to see that the FNN based on Method 2 has better robustness than the FNN based on Method 1. Another experiment is done here to demonstrate the performance superiority of the proposed GIREA over the conventional rule extraction algorithm.

Fig. 5. (a) and (b) are identification results using method 1 and method 2, respectively. Small dots: real system (initial state y(0) = 0.9, y(1) = 0.5); big dots: FNN (initial state y(0) = 0.4, y(1) = 0.2).
Fig. 6. (a) and (b) are identification results using method 1 and method 2, respectively (20% white Gaussian noise added). Small dots: real system (initial state y(0) = 0.9, y(1) = 0.5); big dots: FNN (initial state y(0) = 0.4, y(1) = 0.2).
Suppose there are 100 samples in the original sample set, and rules have been extracted from it using the conventional rule extraction algorithm; let the time used be the benchmark time 1. Now another 20 samples are added to the sample set. The time for re-extracting rules using the conventional rule extraction algorithm is 1.19, while the time for re-extracting rules using GIREA is only 1.08, as shown in Table 8. The reason is that when new objects are added, the proposed GIREA updates the rule set by partly modifying the original rule set, while the conventional rule extraction algorithm needs to re-compute the rule set from the very beginning.

7. Conclusions

How to get rules from data without expert knowledge is the bottleneck of knowledge discovery. Our approach attempts to integrate rough set theory and FNN together to discover knowledge. The rule set obtained by GIREA has the characteristics of fewer rules and shorter rule length. Simulation results show the effectiveness of our approach and its advantages over the conventional FNN. The reason is that our approach utilizes the distribution characteristics of the sample data to extract a better rule set, so the FNN based on this better rule set has a better topology and, accordingly, better robustness and learning speed.

Table 8
Performance comparison between the conventional rule extraction algorithm and GIREA (Note: the times listed are relative to the benchmark time 1)

Algorithm                                      Time used
The conventional rule extraction algorithm     1.19
GIREA                                          1.08
Further studies should focus on the theoretical and practical study of static/dynamic topology-changeable FNN and knowledge discovery.

Acknowledgement

The work here is financially supported by the National Science Foundation of China. The authors would like to thank the anonymous reviewers for their valuable comments.

About the authors

Wang Shi-tong: Professor in computer science. Yu Dong-jun: Ph.D. candidate in computer science. Yang Jing-yu: Professor in computer science.

References
[1] S.T. Wang, Fuzzy Systems and Fuzzy Neural Networks, Shanghai Science and Technology Press, 1998, 1st edition.
[2] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965), 338-353.
[3] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer, Dordrecht, The Netherlands, 1991.
[4] M. Banerjee et al., Rough fuzzy MLP: knowledge encoding and classification, IEEE Trans. Neural Networks 9(6) (1998), 1203-1216.
[5] C.T. Lin, Neural Fuzzy Systems, Prentice-Hall Press, USA, 1997.
[6] L.X. Wang, A Course on Fuzzy Systems, Prentice-Hall Press, USA, 1999.
[7] S. Wang and D. Yu, Error analysis in nonlinear system identification using fuzzy system, Journal of Software Research 11(4) (2000), 447-452.
[8] A. Skowron and C. Rauszer, The discernibility matrices and functions in information systems, in: Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., Kluwer, Dordrecht, The Netherlands, 1992, pp. 331-362.
[9] N. Shan and W. Ziarko, An incremental learning algorithm for constructing decision rules, in: Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, ed., Springer-Verlag, 1994, pp. 326-334.
[10] P. Wang, Constructive theory for fuzzy systems, Fuzzy Sets and Systems 88(2) (1997), 1040-1045.
[11] Z. Mao et al., Topology-changeable neural network, Control Theory and Applications 16(1), 54-60.
[12] M. Setnes et al., Similarity measures in fuzzy rule base simplification, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 28(3) (June 1998).
[13] K.S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks 1(1) (March 1990), 4-23.
[14] J. Lu, W. Xu and Z. Han, Research on parallel identification algorithm of neural networks, Control Theory and Applications 15(5) (1998), 741-745.
