
To my Parents
This book attempts to provide the reader with the basic concepts and engineering applications of Fuzzy Logic and Neural Networks. Back in 2000, the absence of any state-of-the-art textbook in the Indian domain prompted me to write this book.
Some of the material in this book is timely and may therefore change considerably over the years. The choice of engineering applications coincides with the Fuzzy Logic and Neural Network research interests of the readers.
Modeling and control of dynamic systems belong to the fields in which fuzzy set techniques have received considerable attention, not only from the scientific community but also from industry. Many systems are not amenable to conventional modeling approaches due to the lack of precise, formal knowledge about the system, due to strongly non-linear behaviour, due to the high degree of uncertainty, or due to the time-varying characteristics. Fuzzy modeling, along with other related techniques such as neural networks, has been recognized as a powerful tool which can facilitate the effective development of models.
The approach adopted in this book aims at the development of transparent rule-based fuzzy models which can accurately predict the quantities of interest and at the same time provide insight into the system that generated the data. Attention is paid to the selection of appropriate model structures in terms of the dynamic properties, as well as the internal structure of the fuzzy rules.
The field of neural networks has a history of some five decades but has found solid application only in the past fifteen years, and the field is still developing rapidly. Thus, it is distinctly different from the fields of control systems or optimization, where the terminology, basic mathematics, and design procedures have been firmly established and applied for many years. Neural networks are useful for industry, education and research. This book is intended to cover primarily the topics of neural computing, neural modeling, neural learning, and neural memory.
Recently, a great deal of research activity has focused on the development of methods to build or update fuzzy models from numerical data. Most approaches are based on neuro-fuzzy systems, which exploit the functional similarity between fuzzy reasoning systems and neural networks. This combination of fuzzy systems and neural networks enables a more effective use of optimization techniques for building fuzzy systems, especially with regard to their approximation accuracy. Neuro-fuzzy models can be regarded as black-box models, which provide little insight to help understand the underlying process.
The orientation of the book is towards methodologies that in the author's experience proved to be practically useful. The presentation reflects theoretical and practical issues in a balanced way, aiming at a readership from the academic world and also from industrial practice. Examples are given throughout the text and six selected real-world applications are presented in detail.
Chennakesava R. Alavala
Contents

Chapter 1: Introduction
1.1 Fuzzy Logic (FL)
1.2 Neural Networks (NN)
1.3 Similarities and Dissimilarities Between FL and NN
1.4 Applications
Question Bank
References

Part I: Fuzzy Logic

Chapter 2: Fuzzy Sets and Fuzzy Logic
2.1 Introduction
2.2 What is Fuzzy Logic?
2.3 Historical Background
2.4 Characteristics of Fuzzy Logic
2.5 Characteristics of Fuzzy Systems
2.6 Fuzzy Sets
2.6.1 Fuzzy Set
2.6.2 Support
2.6.3 Normal Fuzzy Set
2.6.4 α-Cut
2.6.5 Convex Fuzzy Set
2.6.6 Fuzzy Number
2.6.7 Quasi Fuzzy Number
2.6.8 Triangular Fuzzy Number
2.6.9 Trapezoidal Fuzzy Number
2.6.10 Subsethood
2.6.11 Equality of Fuzzy Sets
2.6.12 Empty Fuzzy Set
2.6.13 Universal Fuzzy Set
2.6.14 Fuzzy Point
2.7 Operations on Fuzzy Sets
2.7.1 Intersection
2.7.2 Union
2.7.3 Complement
Question Bank
References

Chapter 3: Fuzzy Relations
3.1 Introduction
3.2 Fuzzy Relations
3.2.1 Classical N-Array Relation
3.2.2 Reflexivity
3.2.3 Anti-Reflexivity
3.2.4 Symmetricity
3.2.5 Anti-Symmetricity
3.2.6 Transitivity
3.2.7 Equivalence
3.2.8 Partial Order
3.2.9 Total Order
3.2.10 Binary Fuzzy Relation
3.3 Operations on Fuzzy Relations
3.3.1 Intersection
3.3.2 Union
3.3.3 Projection
3.3.4 Cartesian Product of Two Fuzzy Sets
3.3.5 Shadow of Fuzzy Relation
3.3.6 Sup-Min Composition of Fuzzy Relations
Question Bank
References

Chapter 4: Fuzzy Implications
4.1 Introduction
4.2 Fuzzy Implications
4.3 Modifiers
4.3.1 Linguistic Variables
4.3.2 The Linguistic Variable Truth
Question Bank
References

Chapter 5: The Theory of Approximate Reasoning
5.1 Introduction
5.2 Translation Rules
5.2.1 Entailment Rule
5.2.2 Conjunction Rule
5.2.3 Disjunction Rule
5.2.4 Projection Rule
5.2.5 Negation Rule
5.2.6 Compositional Rule of Inference
5.3 Rational Properties
5.3.1 Basic Property
5.3.2 Total Indeterminance
5.3.3 Subset
5.3.4 Superset
Question Bank
References

Chapter 6: Fuzzy Rule-Based Systems
6.1 Introduction
6.2 Triangular Norm
6.3 Triangular Conorm
6.4 t-norm-based Intersection
6.5 t-conorm-based Union
6.6 Averaging Operators
6.6.1 An Averaging Operator is a Function
6.6.2 Ordered Weighted Averaging
6.7 Measure of Dispersion or Entropy of an OWA Vector
6.8 Mamdani System
6.9 Larsen System
6.10 Defuzzification
Question Bank
References

Chapter 7: Fuzzy Reasoning Schemes
7.1 Introduction
7.2 Fuzzy Rule-base System
7.3 Inference Mechanisms in Fuzzy Rule-base Systems
7.3.1 Mamdani Inference Mechanism
7.3.2 Tsukamoto Inference Mechanism
7.3.3 Sugeno Inference Mechanism
7.3.4 Larsen Inference Mechanism
7.3.5 Simplified Fuzzy Reasoning
Question Bank
References

Chapter 8: Fuzzy Logic Controllers
8.1 Introduction
8.2 Basic Feedback Control System
8.3 Fuzzy Logic Controller
8.3.1 Two-Input-Single-Output (TISO) Fuzzy Systems
8.3.2 Mamdani Type of Fuzzy Logic Control
8.3.3 Fuzzy Logic Control Systems
8.4 Defuzzification Methods
8.4.1 Center-of-Area/Gravity
8.4.2 First-of-Maxima
8.4.3 Middle-of-Maxima
8.4.4 Max-Criterion
8.4.5 Height Defuzzification
8.5 Effectivity of Fuzzy Logic Control Systems
Question Bank
References

Chapter 9: Fuzzy Logic Applications
9.1 Why use Fuzzy Logic?
9.2 Applications of Fuzzy Logic
9.3 When Not to use Fuzzy Logic?
9.4 Fuzzy Logic Model for Prevention of Road Accidents
9.4.1 Traffic Accidents and Traffic Safety
9.4.2 Fuzzy Logic Approach
9.4.3 Application
9.4.4 Membership Functions
9.4.5 Rule Base
9.4.6 Output
9.4.7 Conclusions
9.5 Fuzzy Logic Model to Control Room Temperature
9.5.1 The Mechanics of Fuzzy Logic
9.5.2 Fuzzification
9.5.3 Rule Application
9.5.4 Defuzzification
9.5.5 Conclusions
9.6 Fuzzy Logic Model for Grading of Apples
9.6.1 Apple Defects Used in the Study
9.6.2 Materials and Methods
9.6.3 Application of Fuzzy Logic
9.6.4 Fuzzy Rules
9.6.5 Determination of Membership Functions
9.6.6 Defuzzification
9.6.7 Results and Discussion
9.6.8 Conclusion
9.7 An Introductory Example: Fuzzy v/s Non-fuzzy
9.7.1 The Non-Fuzzy Approach
9.7.2 The Fuzzy Approach
9.7.3 Some Observations
Question Bank
References

Part II: Neural Networks

Chapter 10: Neural Networks Fundamentals
10.1 Introduction
10.2 Biological Neural Network
10.3 A Framework for Distributed Representation
10.3.1 Processing Units
10.3.2 Connections between Units
10.3.3 Activation and Output Rules
10.4 Network Topologies
10.5 Training of Artificial Neural Networks
10.5.1 Paradigms of Learning
10.5.2 Modifying Patterns of Connectivity
10.6 Notation and Terminology
10.6.1 Notation
10.6.2 Terminology
Question Bank
References

Chapter 11: Perceptron and Adaline
11.1 Introduction
11.2 Networks with Threshold Activation Functions
11.3 Perceptron Learning Rule and Convergence Theorem
11.3.1 Perceptron Learning Rule
11.3.2 Convergence Theorem
11.4 Adaptive Linear Element (Adaline)
11.5 The Delta Rule
11.6 Exclusive-OR Problem
11.7 Multi-layer Perceptrons Can do Everything
Question Bank
References

Chapter 12: Back-Propagation
12.1 Introduction
12.2 Multi-Layer Feed-Forward Networks
12.3 The Generalised Delta Rule
12.3.1 Understanding Back-Propagation
12.4 Working with Back-Propagation
12.4.1 Weight Adjustments with Sigmoid Activation Function
12.4.2 Learning Rate and Momentum
12.4.3 Learning Per Pattern
12.5 Other Activation Functions
12.6 Deficiencies of Back-Propagation
12.6.1 Network Paralysis
12.6.2 Local Minima
12.7 Advanced Algorithms
12.8 How Good are Multi-layer Feed-forward Networks?
12.8.1 The Effect of the Number of Learning Samples
12.8.2 The Effect of the Number of Hidden Units
12.9 Applications
Question Bank
References

Chapter 13: Recurrent Networks
13.1 Introduction
13.2 The Generalised Delta-Rule in Recurrent Networks
13.2.1 The Jordan Network
13.2.2 The Elman Network
13.2.3 Back-Propagation in Fully Recurrent Networks
13.3 The Hopfield Network
13.3.1 Description
13.3.2 Hopfield Network as Associative Memory
13.3.3 Neurons with Graded Response
13.3.4 Hopfield Networks for Optimization Problems
13.4 Boltzmann Machines
Question Bank
References

Chapter 14: Self-Organising Networks
14.1 Introduction
14.2 Competitive Learning
14.2.1 Clustering
14.2.2 Vector Quantisation
14.2.3 Counterpropagation
14.2.4 Learning Vector Quantisation
14.3 Kohonen Network
14.4 Principal Component Networks
14.4.1 Normalized Hebbian Rule
14.4.2 Principal Component Extractor
14.4.3 More Eigenvectors
14.5 Adaptive Resonance Theory
14.5.1 Background: Adaptive Resonance Theory
14.5.2 ART1: The Simplified Neural Network Model
14.5.3 Operation
14.5.4 ART1: The Original Model
14.5.5 Normalization of the Original Model
14.5.6 Contrast Enhancement
Question Bank
References

Chapter 15: Reinforcement Learning
15.1 Introduction
15.2 The Critic
15.3 The Controller Network
15.4 Barto's Approach: The ASE-ACE Combination
15.4.1 Associative Search
15.4.2 Adaptive Critic
15.4.3 The Cart-Pole System
15.5 Reinforcement Learning Versus Optimal Control
Question Bank
References

Chapter 16: Neural Networks Applications
16.1 Introduction
16.2 Robot Control
16.2.1 Forward Kinematics
16.2.2 Inverse Kinematics
16.2.3 Dynamics
16.2.4 Trajectory Generation
16.2.5 End-Effector Positioning
16.2.5a Involvement of Neural Networks
16.2.6 Camera-Robot Coordination in Function Approximation
16.2.6a Approach 1: Feed-forward Networks
16.2.6b Approach 2: Topology Conserving Maps
16.2.7 Robot Arm Dynamics
16.3 Detection of Tool Breakage in Milling Operations
16.3.1 Unsupervised Adaptive Resonance Theory (ART) Neural Networks
16.3.2 Results and Discussion
Question Bank
References

Part III: Hybrid Fuzzy Neural Networks

Chapter 17: Hybrid Fuzzy Neural Networks
17.1 Introduction
17.2 Hybrid Systems
17.2.1 Sequential Hybrid Systems
17.2.2 Auxiliary Hybrid Systems
17.2.3 Embedded Hybrid Systems
17.3 Fuzzy Logic in Learning Algorithms
17.4 Fuzzy Neurons
17.5 Neural Networks as Pre-processors or Post-processors
17.6 Neural Networks as Tuners of Fuzzy Logic Systems
17.7 Advantages and Drawbacks of Neurofuzzy Systems
17.8 Committee of Networks
17.9 FNN Architecture Based on Back Propagation
17.9.1 Strong L-R Representation of Fuzzy Numbers
17.9.2 Simulation
17.10 Adaptive Neuro-fuzzy Inference System (ANFIS)
17.10.1 ANFIS Structure
Question Bank
References

Chapter 18: Hybrid Fuzzy Neural Networks Applications
18.1 Introduction
18.2 Tool Breakage Monitoring System for End Milling
18.2.1 Methodology: Force Signals in the End Milling Cutting Process
18.2.2 Neural Networks
18.2.3 Experimental Design and System Development
18.2.4 Neural Network-BP System Development
18.2.5 Findings and Conclusions
18.3 Control of Combustion
18.3.1 Adaptive Neuro-fuzzy Inference System
18.3.2 Learning Method of ANFIS
18.3.3 Model of Combustion
18.3.4 Optimization of the PI-Controllers using Genetic Algorithms
Question Bank
References

Index
CHAPTER 1
Introduction

Nowadays, fuzzy logic and neural networks have taken root in many application areas (expert systems, pattern recognition, system control, etc.). Although these methodologies seem to be different, they have many common features, like the use of basis functions (fuzzy logic has membership functions and neural networks have activation functions) and the aim to estimate functions from sample data or heuristics.
Fuzzy logic is mainly associated with imprecision, approximate reasoning and computing with words, and neural networks with learning and curve fitting (also with classification). These methods have in common that they
are non-linear,
have the ability to deal with non-linearities,
follow more human-like reasoning paths than classical methods,
utilize self-learning.
1.1 FUZZY LOGIC (FL)

The concept of Fuzzy Logic was conceived by Lotfi A. Zadeh, a professor at the University of California at Berkeley, and presented not as a control methodology, but as a way of processing data by allowing partial set membership rather than crisp set membership or non-membership. This approach to set theory was not applied to control systems until the 70's due to insufficient small-computer capability prior to that time. Professor Zadeh reasoned that people do not require precise, numerical information input, and yet they are capable of highly adaptive control. If feedback controllers could be programmed to accept noisy, imprecise input, they would be much more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not been so quick to embrace this technology, while the Europeans and Japanese have been aggressively building real products around it.
Basically, FL is a multivalued logic that allows intermediate values to be defined between conventional evaluations like true/false, yes/no, high/low, etc. Notions like rather tall or very fast can be formulated mathematically and processed by computers, in order to apply a more human-like way of thinking in the programming of computers.
Fuzzy systems are an alternative to traditional notions of set membership and logic that has its origins in ancient Greek philosophy. The precision of mathematics owes its success in large part to the efforts of Aristotle and the philosophers who preceded him. In their efforts to devise a concise theory of logic, and later mathematics, the so-called 'Laws of Thought' were posited. One of these, the 'Law of the Excluded Middle', states that every proposition must either be True or False. Even when Parmenides proposed the first version of this law (around 400 B.C.) there were strong and immediate objections: for example, Heraclitus proposed that things could be simultaneously True and not True. It was Plato who laid the foundation for what would become fuzzy logic, indicating that there was a third region (beyond True and False) where these opposites 'tumbled about'. But it was Lukasiewicz who first proposed a systematic alternative to the bi-valued logic of Aristotle.
If the conventional techniques of system analysis cannot be successfully applied to the modeling or control problem, the use of heuristic linguistic rules may be the most reasonable solution to the problem. For example, there is no mathematical model for the truck-and-trailer reversing problem, in which the truck must be guided from an arbitrary initial position to a desired final position. Humans and fuzzy systems can perform this nonlinear control task with relative ease by using practical and at the same time imprecise rules such as 'If the trailer turns slightly left, then turn the wheel slightly left.'
The most significant application area of FL has been the control field. It has been made a rough guess that 90% of applications are in control (the main part deals with rather simple applications, see Fig. 1.1). Fuzzy control includes fans, complex aircraft engines and control surfaces, helicopter control, missile guidance, automatic transmission, wheel slip control, industrial processes and so on. Commercially most significant have been various household and entertainment electronics, for example washing machine controllers and autofocus cameras. The most famous controller is the subway train controller in Sendai, Japan. The fuzzy system performs better (uses less fuel, drives smoother) when compared with a conventional PID controller. Companies that have fuzzy research include General Electric, Siemens, Nissan, Mitsubishi, Honda, Sharp, Hitachi, Canon, Samsung, Omron, Fuji, McDonnell Douglas, Rockwell, etc.
Fig. 1.1 Example of a control problem: a fuzzy controller in a feedback loop receives the input and the fed-back control output and drives the plant to be controlled.
1.2 NEURAL NETWORKS (NN)

The study of neural networks started with the publication of McCulloch and Pitts [1943]. Single-layer networks, with threshold activation functions, were introduced by Rosenblatt [1959]. These types of networks were called perceptrons. In the 1960s it was experimentally shown that perceptrons could solve many problems, but that many other problems, which did not seem to be more difficult, could not be solved. These limitations of the one-layer perceptron were mathematically shown by Minsky and Papert in their book Perceptrons [1969]. The result of this publication was that neural networks lost their appeal for almost two decades. In the mid-1980s, the back-propagation algorithm was reported by Rumelhart, Hinton, and Williams [1986], which revived the study of neural networks. The significance of this new algorithm was that multilayer networks could be trained by using it.
NN makes an attempt to simulate the human brain. The simulation is based on the present knowledge of brain function, and this knowledge is even at its best primitive. So, it is not absolutely wrong to claim that artificial neural networks probably have no close relationship to the operation of human brains. The operation of the brain is believed to be based on simple basic elements called neurons, which are connected to each other with transmission lines called axons and receptive lines called dendrites (see Fig. 1.2). The learning may be based on two mechanisms: the creation of new connections, and the modification of connections. Each neuron has an activation level which, in contrast to Boolean logic, ranges between some minimum and maximum value.
Fig. 1.2 Simple illustration of a biological and an artificial neuron (perceptron): dendrites, nucleus, axon and synapse on the biological side; weighted inputs, a summing unit and a threshold on the artificial side.
In artificial neural networks the inputs of the neuron are combined in a linear way with different weights. The result of this combination is then fed into a non-linear activation unit (activation function), which can in its simplest form be a threshold unit (see Fig. 1.2).
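To make the weighted-sum-and-threshold idea concrete, the following short Python sketch (not part of the original text; the weights, inputs and threshold are illustrative values) implements the artificial neuron of Fig. 1.2.

```python
# Minimal sketch of the artificial neuron of Fig. 1.2 (illustrative values only).

def neuron(inputs, weights, threshold):
    """Linear combination of the inputs followed by a hard threshold unit."""
    s = sum(w * x for w, x in zip(weights, inputs))  # weighted sum (summing unit)
    return 1 if s >= threshold else 0                # threshold activation

# Example: two inputs with hypothetical weights and threshold.
print(neuron([0.6, 0.2], weights=[1.0, 2.0], threshold=0.9))  # -> 1
```

Replacing the hard threshold by a differentiable activation function turns this structure into the building block of the multilayer networks discussed in Part II.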
Neural networks are often used to enhance and optimize fuzzy logic based systems, e.g., by giving them a learning ability. This learning ability is achieved by presenting a training set of different examples to the network and using a learning algorithm which changes the weights (or the parameters of the activation functions) in such a way that the network will reproduce the correct output for the correct input values. The difficulty is how to guarantee generalization and to determine when the network is sufficiently trained.
Neural networks offer nonlinearity, input-output mapping, adaptivity and fault tolerance. Nonlinearity is a desired property if the generator of the input signal is inherently nonlinear. The high connectivity of the network ensures that the influence of errors in a few terms will be minor, which ideally gives a high fault tolerance.
1.3 SIMILARITIES AND DISSIMILARITIES BETWEEN FL AND NN

There are similarities between fuzzy logic and neural networks:
estimate functions from sample data
do not require a mathematical model
are dynamic systems
can be expressed as a graph which is made up of nodes and edges
convert numerical inputs to numerical outputs
process inexact information inexactly
have the same state space
produce bounded signals
a set of n neurons defines n-dimensional fuzzy sets
learn some unknown probability function
can act as associative memories
can model any system provided the number of nodes is sufficient.
The main dissimilarity between fuzzy logic and neural networks is that FL uses heuristic knowledge to form rules and tunes these rules using sample data, whereas NN forms 'rules' based entirely on data.
1.4 APPLICATIONS

Applications can be found in signal processing, pattern recognition, quality assurance and industrial inspection, business forecasting, speech processing, credit rating, adaptive process control, robotics control, natural-language understanding, etc. Possible new application areas are programming languages, user-friendly application interfaces, automated programming, computer networks, database management, fault diagnostics and information security.
In many cases, good results have been achieved by combining both methods. The number of such hybrid systems is growing. A very interesting combination is the neuro-fuzzy architecture, in which the good properties of both methods are brought together. Most neuro-fuzzy systems are fuzzy rule-based systems in which techniques of neural networks are used for rule induction and calibration. Fuzzy logic may also be employed to improve the performance of optimization methods used with neural networks.
QUESTION BANK.

1. What is the ancient philosophy of fuzzy logic?
2. What are the various applications of fuzzy logic?
3. What is the historical evolution of neural networks?
4. What are the similarities and dissimilarities between fuzzy logic and neural networks?
5. What are the various applications of neural networks?
REFERENCES.

1. W.S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.
2. F. Rosenblatt, Principles of Neurodynamics, New York: Spartan Books, pp. 23-26, 1959.
3. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, pp. 338-353, 1965.
4. S. Korner, Laws of thought, Encyclopedia of Philosophy, Vol. 4, pp. 414-417, MacMillan, NY, 1967.
5. C. Lejewski, Jan Lukasiewicz, Encyclopedia of Philosophy, Vol. 5, pp. 104-107, MacMillan, NY, 1967.
6. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.
7. L.A. Zadeh, Making computers think like people, IEEE Spectrum, Vol. 8, pp. 26-32, 1984.
8. D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning representations by back-propagating errors, Nature, Vol. 323, pp. 533-536, 1986.
9. U.S. Loses Focus on Fuzzy Logic, Machine Design, June 21, 1990.
10. Europe Gets into Fuzzy Logic, Electronic Engineering Times, Nov. 11, 1991.
11. T. Smith, Why the Japanese are going in for this fuzzy logic, Business Week, p. 39, Feb. 20, 1993.
12. L.A. Zadeh, Soft computing and fuzzy logic, IEEE Software, Vol. (November), pp. 48-56, 1994.
CHAPTER 2
Fuzzy Sets and Fuzzy Logic

2.1 INTRODUCTION

Fuzzy sets were introduced by L.A. Zadeh in 1965 to represent/manipulate data and information possessing nonstatistical uncertainties. The theory was specifically designed to mathematically represent uncertainty and vagueness and to provide formalized tools for dealing with the imprecision intrinsic to many problems.
Fuzzy logic provides an inference morphology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning. The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, the approaches based on first order logic and classical probability theory do not provide an appropriate conceptual framework for dealing with the representation of commonsense knowledge, since such knowledge is by its nature both lexically imprecise and noncategorical. The development of fuzzy logic was motivated in large measure by the need for a conceptual framework which can address the issue of uncertainty and lexical imprecision.
2.2 WHAT IS FUZZY LOGIC?

Fuzzy logic is all about the relative importance of precision: How important is it to be exactly right when a rough answer will do? All books on fuzzy logic begin with a few good quotes on this very topic, and this is no exception. Here is what some clever people have said in the past:
Precision is not truth.
Henri Matisse
Sometimes the more measurable drives out the most important.
Rene Dubos
Vagueness is no more to be done away with in the world of logic than friction in mechanics.
Charles Sanders Peirce
I believe that nothing is unconditionally true, and hence I am opposed to every statement of positive truth and every man who makes it.
H.L. Mencken
So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality.
Albert Einstein
As complexity rises, precise statements lose meaning and meaningful statements lose precision.
L.A. Zadeh
Some pearls of folk wisdom also echo these thoughts:
Don't lose sight of the forest for the trees.
Don't be penny wise and pound foolish.
Fuzzy logic is a fascinating area of research because it does a good job of trading off between significance and precision, something that humans have been managing for a very long time (Fig. 2.1).
Fuzzy logic sometimes appears exotic or intimidating to those unfamiliar with it, but once you become acquainted with it, it seems almost surprising that no one attempted it sooner. In this sense fuzzy logic is both old and new because, although the modern and methodical science of fuzzy logic is still young, the concepts of fuzzy logic reach right down to our bones.
Fuzzy logic is a convenient way to map an input space to an output space. This is the starting point for everything else, and the great emphasis here is on the word 'convenient'. What do I mean by mapping input space to output space? Here are a few examples: You tell me how good your service was at a restaurant, and I'll tell you what the tip should be. You tell me how hot you want the water, and I'll adjust the faucet valve to the right setting. You tell me how far away the subject of your photograph is, and I'll focus the lens for you. You tell me how fast the car is going and how hard the motor is working, and I'll shift the gears for you.
Fig. 2.1 Precision and significance in the real world: "A 1500 kg mass is approaching your head at 45.3 m/sec." (precision) versus "LOOK OUT!!" (significance).
2.3 HISTORICAL BACKGROUND

Almost forty years have passed since the publication of the first paper on fuzzy sets. Where do we stand today? In viewing the evolution of fuzzy logic, three principal phases may be discerned.
The first phase, from 1965 to 1973, was concerned in the main with fuzzification, that is, with generalization of the concept of a set, with the two-valued characteristic function generalized to a membership function taking values in the unit interval or, more generally, in a lattice. The basic issues and applications which were addressed were, for the most part, set-theoretic in nature, and logic and reasoning were not at the center of the stage.
In the second phase, 1973-1999, two key concepts were introduced: (a) the concept of a linguistic variable; and (b) the concept of a fuzzy if-then rule. Today, almost all applications of fuzzy set theory and fuzzy logic involve the use of these concepts.
The term fuzzy logic was used for the first time in 1974. Today, fuzzy logic is used in two different senses: (a) a narrow sense, in which fuzzy logic, abbreviated as FLn, is a logical system which is a generalization of multivalued logic; and (b) a wide sense, in which fuzzy logic, abbreviated as FL, is a union of FLn, fuzzy set theory, possibility theory, calculus of fuzzy if-then rules, fuzzy arithmetic, calculus of fuzzy quantifiers and related concepts and calculi. The distinguishing characteristic of FL is that in FL everything is, or is allowed to be, a matter of degree. Today, the term fuzzy logic is used, for the most part, in its wide sense.
Perhaps the most striking development during the second phase of the evolution was the naissance and rapid growth of fuzzy control, alongside the boom in fuzzy logic applications, especially in Japan. There were many other major developments in fuzzy-logic-related basic and applied theories, among them the genesis of possibility theory and possibilistic logic, knowledge representation, decision analysis, cluster analysis, pattern recognition, fuzzy arithmetic, fuzzy mathematical programming, fuzzy topology and, more generally, fuzzy mathematics. Fuzzy control applications proliferated, but their dominance in the literature became less pronounced.
Soft computing came into existence in 1991, with the launching of BISC (Berkeley Initiative in Soft Computing) at UC Berkeley. Basically, soft computing is a coalition of methodologies which collectively provide a foundation for the conception, design and utilization of intelligent systems. The principal members of the coalition are: fuzzy logic, neurocomputing, evolutionary computing, probabilistic computing, chaotic computing, rough set theory and machine learning. The basic tenet of soft computing is that, in general, better results can be obtained through the use of constituent methodologies of soft computing in combination rather than in a stand-alone mode. A combination which has attained wide visibility and importance is that of neuro-fuzzy systems. Other combinations, e.g., neuro-fuzzy-genetic systems, are appearing, and the impact of soft computing is growing on both theoretical and applied levels.
An important development in the evolution of fuzzy logic, marking the beginning of the third phase (1996 onward), is the genesis of computing with words and the computational theory of perceptions. Basically, the development of computing with words and perceptions brings together earlier strands of fuzzy logic and suggests that scientific theories should be based on fuzzy logic rather than on Aristotelian, bivalent logic, as they are at present. A key component of computing with words is the concept of Precisiated Natural Language (PNL). PNL opens the door to a major enlargement of the role
of natural languages in scientific theories. It may well turn out to be the case that, in coming years, one of the most important application areas of fuzzy logic, and especially PNL, will be the Internet, centering on the conception and design of search engines and question-answering systems.
From its inception, fuzzy logic has been (and to some degree still is) an object of skepticism and controversy. In part, skepticism about fuzzy logic is a reflection of the fact that, in English, the word fuzzy is usually used in a pejorative sense. But, more importantly, for some, fuzzy logic is hard to accept because by abandoning bivalence it breaks with the centuries-old tradition of basing scientific theories on bivalent logic. It may take some time for this to happen, but eventually abandonment of bivalence will be viewed as a logical development in the evolution of science and human thought.
2.4 CHARACTERISTICS OF FUZZY LOGIC

Some of the essential characteristics of fuzzy logic relate to the following:
In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.
In fuzzy logic, everything is a matter of degree.
In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.
Inference is viewed as a process of propagation of elastic constraints.
Any logical system can be fuzzified.
2.5 CHARACTERISTICS OF FUZZY SYSTEMS

There are two main characteristics of fuzzy systems that give them better performance for specific applications:
Fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems with a mathematical model that is difficult to derive.
Fuzzy logic allows decision making with estimated values under incomplete or uncertain information.
2.6 FUZZY SETS

2.6.1 Fuzzy Set

Let X be a nonempty set. A fuzzy set A in X is characterized by its membership function (Fig. 2.2)
μA : X → [0, 1]   ...(2.1)
and μA(x) is interpreted as the degree of membership of element x in fuzzy set A for each x ∈ X.
It is clear that A is completely determined by the set of tuples
A = {(u, μA(u)) | u ∈ X}   ...(2.2)
Frequently we will write A(x) instead of μA(x). The family of all fuzzy sets in X is denoted by F(X). If X = {x1, ..., xn} is a finite set and A is a fuzzy set in X then we often use the notation
A = μ1/x1 + ... + μn/xn   ...(2.3)
where the term μi/xi, i = 1, ..., n, signifies that μi is the grade of membership of xi in A and the plus sign represents the union.
Example 2.1: The membership function (Fig. 2.3) of the fuzzy set of real numbers "close to 1" can be defined as
A(t) = exp(-β(t - 1)²)
where β is a positive real number.

Fig. 2.2 A discrete membership function for "x is close to 1".
Fig. 2.3 A membership function for "x is close to 1".

Example 2.2: Assume someone wants to buy a cheap car. Cheap can be represented as a fuzzy set on a universe of prices, and depends on his purse (Fig. 2.4). For instance, from the figure, cheap is roughly interpreted as follows:
Below Rs. 300000 cars are considered as cheap, and prices make no real difference to the buyer's eyes.
Between Rs. 300000 and Rs. 450000, a variation in the price induces a weak preference in favor of the cheapest car.
Between Rs. 450000 and Rs. 600000, a small variation in the price induces a clear preference in favor of the cheapest car.
Beyond Rs. 600000 the costs are too high (out of consideration).

Fig. 2.4 Membership function of "cheap".
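As an illustration, the following Python sketch (not from the book; the piecewise-linear reading of Fig. 2.4 is an assumption, since the exact curve is only given graphically) evaluates the membership function of Example 2.1 and a simplified version of "cheap" from Example 2.2.

```python
import math

def close_to_1(t, beta=1.0):
    """Example 2.1: A(t) = exp(-beta * (t - 1)^2), beta > 0 (beta chosen arbitrarily here)."""
    return math.exp(-beta * (t - 1.0) ** 2)

def cheap(price):
    """A piecewise-linear reading of Fig. 2.4; the actual curve in the figure may bend differently."""
    if price <= 300000:
        return 1.0
    if price >= 600000:
        return 0.0
    return (600000 - price) / 300000  # falls from 1 to 0 between Rs. 300000 and Rs. 600000

print(close_to_1(1.0), close_to_1(2.0))   # 1.0 and exp(-1) ~ 0.37
print(cheap(250000), cheap(450000))       # 1.0 and 0.5
```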
2.6.2 Support

Let A be a fuzzy subset of X; the support of A, denoted supp(A), is the crisp subset of X whose elements all have nonzero membership grades in A:
supp(A) = {x ∈ X | A(x) > 0}   ...(2.5)
2.6.3 Normal Fuzzy Set

A fuzzy subset A of a classical set X is called normal if there exists an x ∈ X such that A(x) = 1. Otherwise A is subnormal.
2.6.4 α-Cut

An α-level set of a fuzzy set A of X is a non-fuzzy set denoted by [A]^α and is defined by
[A]^α = {t ∈ X | A(t) ≥ α}   if α > 0
      = cl(supp A)           if α = 0      ...(2.6)
where cl(supp A) denotes the closure of the support of A.
Example 2.3: Assume
X = {-2, -1, 0, 1, 2, 3, 4} and
A = 0.0/-2 + 0.3/-1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3 + 0.0/4.
In this case
[A]^α = {-1, 0, 1, 2, 3}   if 0 ≤ α ≤ 0.3
      = {0, 1, 2}          if 0.3 < α ≤ 0.6
      = {1}                if 0.6 < α ≤ 1
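A small Python sketch (not from the book) that reproduces the α-cuts of Example 2.3 for a discrete fuzzy set:

```python
# Discrete fuzzy set of Example 2.3, stored as a mapping x -> A(x).
A = {-2: 0.0, -1: 0.3, 0: 0.6, 1: 1.0, 2: 0.6, 3: 0.3, 4: 0.0}

def alpha_cut(fuzzy_set, alpha):
    """For alpha > 0 return {x : A(x) >= alpha}; for alpha = 0 the closure of the
    support reduces, for a finite X, to the elements with A(x) > 0."""
    if alpha > 0:
        return {x for x, mu in fuzzy_set.items() if mu >= alpha}
    return {x for x, mu in fuzzy_set.items() if mu > 0}

print(sorted(alpha_cut(A, 0.3)))  # [-1, 0, 1, 2, 3]
print(sorted(alpha_cut(A, 0.5)))  # [0, 1, 2]
print(sorted(alpha_cut(A, 0.9)))  # [1]
```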
2.6.5 Convex Fuzzy Set

A fuzzy set A of X is called convex if [A]^α is a convex subset of X for all α ∈ [0, 1]. An α-cut of a triangular fuzzy number is shown in Fig. 2.5.

Fig. 2.5 An α-cut of a triangular fuzzy number.

In many situations people are only able to characterize numeric information imprecisely. For example, people use terms such as about 5000, near zero, or essentially bigger than 5000. These are examples of what are called fuzzy numbers. Using the theory of fuzzy subsets we can represent these fuzzy numbers as fuzzy subsets of the set of real numbers. More exactly,
2.6.6 Fuzzy Number

A fuzzy number A (Fig. 2.6) is a fuzzy set of the real line with a normal, (fuzzy) convex and continuous membership function of bounded support. The family of fuzzy numbers will be denoted by F.
2.6.7 Quasi Fuzzy Number

A quasi-fuzzy number A is a fuzzy set of the real line with a normal, fuzzy convex and continuous membership function satisfying the limit conditions
lim A(t) = 0 as t → ∞ and as t → -∞   ...(2.7)
Let A be a fuzzy number. Then [A]^γ is a closed convex (compact) subset of ℝ for all γ ∈ [0, 1]. Let us introduce the notations
a1(γ) = min [A]^γ,  a2(γ) = max [A]^γ   ...(2.8)
In other words, a1(γ) denotes the left-hand side and a2(γ) denotes the right-hand side of the γ-cut. It is easy to see that
if α ≤ β then [A]^α ⊇ [A]^β   ...(2.9)
Furthermore, the left-hand side function
a1 : [0, 1] → ℝ   ...(2.10)
is monotone increasing and lower semicontinuous, and the right-hand side function
a2 : [0, 1] → ℝ   ...(2.11)
is monotone decreasing and upper semicontinuous.

Fig. 2.6 Fuzzy number.
We shall use the notation
[A]^γ = [a1(γ), a2(γ)]   ...(2.12)
The support of A is the open interval (a1(0), a2(0)), as illustrated in Fig. 2.7.

Fig. 2.7 The support of A is (a1(0), a2(0)).

If A is not a fuzzy number then there exists a γ ∈ [0, 1] such that [A]^γ is not a convex subset of ℝ. Such a set is shown in Fig. 2.8.

Fig. 2.8 Not a fuzzy number.
2.6.8 Triangular Fuzzy Number

A fuzzy set A is called a triangular fuzzy number with peak (or center) a, left width α > 0 and right width β > 0 if its membership function has the following form
A(t) = 1 - (a - t)/α   if a - α ≤ t ≤ a
     = 1 - (t - a)/β   if a ≤ t ≤ a + β
     = 0               otherwise          ...(2.13)
and we use the notation A = (a, α, β). It can easily be verified that
[A]^γ = [a - (1 - γ)α, a + (1 - γ)β],  ∀γ ∈ [0, 1]   ...(2.14)
The support of A is (a - α, a + β).
A triangular fuzzy number (Fig. 2.9) with center a may be seen as a fuzzy quantity "x is approximately equal to a".
2.6.9 Trapezoidal Fuzzy Number

A fuzzy set A is called a trapezoidal fuzzy number with tolerance interval [a, b], left width α and right width β if its membership function has the following form
A(t) = 1 - (a - t)/α   if a - α ≤ t ≤ a
     = 1               if a ≤ t ≤ b
     = 1 - (t - b)/β   if b ≤ t ≤ b + β
     = 0               otherwise          ...(2.15)
and we use the notation A = (a, b, α, β). It can easily be shown that
[A]^γ = [a - (1 - γ)α, b + (1 - γ)β],  ∀γ ∈ [0, 1]   ...(2.16)
The support of A is (a - α, b + β).
A trapezoidal fuzzy number (Fig. 2.10) may be seen as a fuzzy quantity "x is approximately in the interval [a, b]".

Fig. 2.9 Triangular fuzzy number.
Fig. 2.10 Trapezoidal fuzzy number.
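The membership functions (2.13) and (2.15) translate directly into code. The following Python sketch (not from the book; the parameter values are illustrative) evaluates a triangular and a trapezoidal fuzzy number at a point:

```python
def triangular(t, a, alpha, beta):
    """Triangular fuzzy number (a, alpha, beta): peak a, left width alpha, right width beta."""
    if a - alpha <= t <= a:
        return 1.0 - (a - t) / alpha
    if a <= t <= a + beta:
        return 1.0 - (t - a) / beta
    return 0.0

def trapezoidal(t, a, b, alpha, beta):
    """Trapezoidal fuzzy number (a, b, alpha, beta) with tolerance interval [a, b]."""
    if a <= t <= b:
        return 1.0
    if a - alpha <= t < a:
        return 1.0 - (a - t) / alpha
    if b < t <= b + beta:
        return 1.0 - (t - b) / beta
    return 0.0

print(triangular(2.5, a=3.0, alpha=1.0, beta=2.0))          # 0.5
print(trapezoidal(6.5, a=3.0, b=6.0, alpha=1.0, beta=2.0))  # 0.75
```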
2.6.10 Subsethood

Let A and B be fuzzy subsets of a classical set X. We say that A is a subset of B if A(t) ≤ B(t), ∀t ∈ X. Subsethood is illustrated in Fig. 2.11.
2.6.11 Equality of Fuzzy Sets

Let A and B be fuzzy subsets of a classical set X. A and B are said to be equal, denoted A = B, if A ⊂ B and B ⊂ A. We note that A = B if and only if A(x) = B(x) for all x ∈ X.

2.6.12 Empty Fuzzy Set

The empty fuzzy subset of X is defined as the fuzzy subset ∅ of X such that ∅(x) = 0 for each x ∈ X. It is easy to see that ∅ ⊂ A holds for any fuzzy subset A of X.

2.6.13 Universal Fuzzy Set

The largest fuzzy set in X, called the universal fuzzy set (Fig. 2.12) in X, denoted by 1X, is defined by 1X(t) = 1, ∀t ∈ X. It is easy to see that A ⊂ 1X holds for any fuzzy subset A of X.

Fig. 2.11 A is a subset of B.
Fig. 2.12 The graph of the universal fuzzy subset in X = [0, 10].

2.6.14 Fuzzy Point

Let A be a fuzzy number. If supp(A) = {x0}, then A is called a fuzzy point (Fig. 2.13) and we use the notation A = x̄0.

Fig. 2.13 Fuzzy point.
Let A = x̄0 be a fuzzy point. It is easy to see that
[A]^γ = [x0, x0] = {x0},  ∀γ ∈ [0, 1]   ...(2.17)
2.7 OPERATIONS ON FUZZY SETS

We extend the classical set theoretic operations from ordinary set theory to fuzzy sets. We note that all those operations which are extensions of crisp concepts reduce to their usual meaning when the fuzzy subsets have membership degrees that are drawn from {0, 1}. For this reason, when extending operations to fuzzy sets we use the same symbol as in set theory.
Let A and B be fuzzy subsets of a nonempty (crisp) set X.

2.7.1 Intersection

The intersection of A and B is defined as
(A ∩ B)(t) = min{A(t), B(t)} = A(t) ∧ B(t)  for all t ∈ X   ...(2.18)
The intersection of A and B is shown in Fig. 2.14.

2.7.2 Union

The union of A and B is defined as
(A ∪ B)(t) = max{A(t), B(t)} = A(t) ∨ B(t)  for all t ∈ X   ...(2.19)
The union of two triangular fuzzy numbers is shown in Fig. 2.15.

Fig. 2.14 Intersection of two triangular fuzzy numbers.
Fig. 2.15 Union of two triangular fuzzy numbers.
2.7.3 Complement

The complement of a fuzzy set A is defined as
(¬A)(t) = 1 - A(t)   ...(2.20)
A closely related pair of properties which hold in ordinary set theory are the law of excluded middle
A ∨ ¬A = X   ...(2.21)
and the law of non-contradiction principle
A ∧ ¬A = ∅   ...(2.22)
It is clear that ¬1X = ∅ and ¬∅ = 1X; however, the laws of excluded middle and non-contradiction are not satisfied in fuzzy logic.

Lemma 2.1: The law of excluded middle is not valid. Let A(t) = 1/2, ∀t ∈ ℝ; then it is easy to see that
(¬A ∨ A)(t) = max{¬A(t), A(t)} = max{1 - 1/2, 1/2} = 1/2 ≠ 1

Lemma 2.2: The law of non-contradiction is not valid. Let A(t) = 1/2, ∀t ∈ ℝ; then it is easy to see that
(¬A ∧ A)(t) = min{¬A(t), A(t)} = min{1 - 1/2, 1/2} = 1/2 ≠ 0

However, fuzzy logic does satisfy De Morgan's laws
¬(A ∧ B) = ¬A ∨ ¬B,  ¬(A ∨ B) = ¬A ∧ ¬B
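The pointwise operations (2.18)-(2.20) and the failure of the two classical laws can be checked numerically; the following Python sketch (not from the book) does so at a single membership value:

```python
# Pointwise fuzzy operations and a check of Lemmas 2.1 and 2.2 at one point.
f_and = min                # intersection, eq. (2.18)
f_or = max                 # union, eq. (2.19)
f_not = lambda a: 1 - a    # complement, eq. (2.20)

a = 0.5                                  # A(t) = 1/2 as in the lemmas
print(f_or(a, f_not(a)))                 # 0.5, not 1: law of excluded middle fails
print(f_and(a, f_not(a)))                # 0.5, not 0: law of non-contradiction fails

b = 0.7
print(f_not(f_and(a, b)) == f_or(f_not(a), f_not(b)))  # True: De Morgan's law holds
```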
QUESTION BANK.

1. What is fuzzy logic?
2. Explain the evolution phases of fuzzy logic.
3. What are the characteristics of fuzzy logic?
4. What are the characteristics of fuzzy systems?
5. What are the different fuzzy sets? Define them.
6. What are the roles of the α-cut in fuzzy set theory?
7. What are the different fuzzy numbers? Define them.
8. Define the following: (i) equality of fuzzy sets, (ii) empty fuzzy set, (iii) universal fuzzy set.
9. What are the operations on fuzzy sets? Explain with examples.
10. Given A = {a, b, c, 1, 2} and B = {1, 2, 3, b, c}. Find A ∪ B and A ∩ B.
11. Given X = {1, 2, 3, 4, 5, 6} and A = {2, 4, 6}. Find ¬A.
12. Let A be a fuzzy set defined by A = 0.5/x1 + 0.4/x2 + 0.7/x3 + 0.8/x4 + 1/x5. List all α-cuts.
REFERENCES.

1. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, pp. 338-353, 1965.
2. L.A. Zadeh, Fuzzy algorithms, Information and Control, Vol. 12, pp. 94-102, 1968.
3. J.G. Brown, A note on fuzzy sets, Information and Control, Vol. 18, No. 1, pp. 32-39, 1971.
4. L.A. Zadeh, A fuzzy-set-theoretic interpretation of linguistic hedges, Journal of Cybernetics, Vol. 2, No. 2, pp. 4-34, 1972.
5. A. DeLuca and S. Termini, Algebraic properties of fuzzy sets, Journal of Mathematical Analysis and Applications, Vol. 40, No. 2, pp. 373-386, 1972.
6. L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-3, No. 1, pp. 28-44, 1973.
7. L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Part 1, Information Sciences, Vol. 8, pp. 199-249, 1974.
8. L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Part 2, Information Sciences, Vol. 8, pp. 301-357, 1975.
9. P. Albert, The algebra of fuzzy logic, Fuzzy Sets and Systems, Vol. 1, No. 3, pp. 203-230, 1978.
10. D. Dubois and H. Prade, Operations on fuzzy numbers, International Journal of Systems Science, Vol. 9, pp. 613-626, 1978.
11. S. Watanabe, A generalized fuzzy set theory, IEEE Transactions on Systems, Man and Cybernetics, Vol. 8, No. 10, pp. 756-759, 1978.
12. S. Gottwald, Set theory for fuzzy sets of higher level, Fuzzy Sets and Systems, Vol. 2, No. 2, pp. 125-151, 1979.
13. L.A. Zadeh, Possibility theory and soft data analysis, in Mathematical Frontiers of the Social and Policy Sciences, L. Cobb and R.M. Thrall (eds.), pp. 69-129, Westview Press, Boulder, 1981.
14. L.A. Zadeh, Making computers think like people, IEEE Spectrum, Vol. 8, pp. 26-32, 1984.
15. L.A. Zadeh, Fuzzy logic, IEEE Computer, Vol. 21, No. 4, pp. 83-93, 1988.
16. U. Hohle and L.N. Stout, Foundations of fuzzy sets, Fuzzy Sets and Systems, Vol. 40, No. 2, pp. 257-296, 1991.
17. L.A. Zadeh, Soft computing and fuzzy logic, IEEE Software, Vol. (November), pp. 48-56, 1994.
18. L.A. Zadeh, Fuzzy logic, neural networks and soft computing, Communications of the ACM, Vol. 37, No. 3, pp. 77-84, 1994.
19. L.A. Zadeh, Fuzzy logic = computing with words, IEEE Transactions on Fuzzy Systems, Vol. 4, No. 2, pp. 103-111, 1996.
20. L.A. Zadeh, Roles of soft computing and fuzzy logic in the conception, design and deployment of information/intelligent systems, in Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications, O. Kaynak, L.A. Zadeh, B. Turksen, and I.J. Rudas (eds.), Springer-Verlag, Berlin, pp. 10-37, 1998.
CHAPTER 3
Fuzzy Relations

3.1 INTRODUCTION

A classical relation can be considered as a set of tuples, where a tuple is an ordered pair. A binary tuple is denoted by (u, v), an example of a ternary tuple is (u, v, w) and an example of an n-ary tuple is (x1, ..., xn).

Example 3.1: Let X be the domain of men {John, Charles, James} and Y the domain of women {Diana, Rita, Eva}; then the relation "married to" on X × Y is, for example,
{(Charles, Diana), (John, Eva), (James, Rita)}
3.2 FUZZY RELATIONS

3.2.1 Classical N-Array Relation

Let X1, ..., Xn be classical sets. The subsets of the Cartesian product X1 × ... × Xn are called n-ary relations. If X1 = ... = Xn and R ⊂ X^n, then R is called an n-ary relation in X. Let R be a binary relation in ℝ. Then the characteristic function of R is defined as
χR(u, v) = 1   if (u, v) ∈ R
         = 0   otherwise            ...(3.1)

Example 3.2: Consider the following relation
(u, v) ∈ R ⟺ u ∈ [a, b] and v ∈ [0, c]   ...(3.2)
χR(u, v) = 1   if (u, v) ∈ [a, b] × [0, c]
         = 0   otherwise
Consider the relation "mod 3" on natural numbers:
{(m, n) | (n - m) mod 3 = 0}
This is an equivalence relation.
3.2.10 Binary Fuzzy Relation

Let X and Y be nonempty sets. A fuzzy relation R is a fuzzy subset of X × Y. In other words, R ∈ F(X × Y). If X = Y then we say that R is a binary fuzzy relation in X.
Let R be a binary fuzzy relation on ℝ. Then R(u, v) is interpreted as the degree of membership of the ordered pair (u, v) in R.

Example 3.5: A simple example of a binary fuzzy relation on U = {1, 2, 3}, called "approximately equal", can be defined as
R(1, 1) = R(2, 2) = R(3, 3) = 1
R(1, 2) = R(2, 1) = R(2, 3) = R(3, 2) = 0.8
R(1, 3) = R(3, 1) = 0.3
The membership function of R is given by
R(u, v) = 1     if u = v
        = 0.8   if |u - v| = 1
        = 0.3   if |u - v| = 2
In matrix notation it can be represented as

         1     2     3
   1  [ 1.0   0.8   0.3 ]
   2  [ 0.8   1.0   0.8 ]
   3  [ 0.3   0.8   1.0 ]
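The following Python sketch (not from the book) builds the matrix of Example 3.5 directly from its membership rule:

```python
# The "approximately equal" relation of Example 3.5 as a matrix over U = {1, 2, 3}.
U = [1, 2, 3]

def approx_equal(u, v):
    """Membership rule of Example 3.5, written in terms of |u - v|."""
    return {0: 1.0, 1: 0.8, 2: 0.3}[abs(u - v)]

R = [[approx_equal(u, v) for v in U] for u in U]
for row in R:
    print(row)   # [1.0, 0.8, 0.3] / [0.8, 1.0, 0.8] / [0.3, 0.8, 1.0]
```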
3.3 OPERATIONS ON FUZZY RELATIONS

Fuzzy relations are very important because they can describe interactions between variables. Let R and S be two binary fuzzy relations on X × Y.

3.3.1 Intersection

The intersection of R and S is defined by
(R ∩ S)(u, v) = min{R(u, v), S(u, v)}   ...(3.3)
Note that R : X × Y → [0, 1], i.e. the domain of R is the whole Cartesian product X × Y.
3.3.2 Union

The union of R and S is defined by
(R ∪ S)(u, v) = max{R(u, v), S(u, v)}   ...(3.4)
Example 3.6: Let us define two binary relations
R = "x is considerably larger than y"

         y1    y2    y3    y4
   x1 [ 0.8   0.1   0.1   0.7 ]
   x2 [ 0.0   0.8   0.0   0.0 ]
   x3 [ 0.9   1.0   0.7   0.8 ]

S = "x is very close to y"

         y1    y2    y3    y4
   x1 [ 0.4   0.0   0.9   0.6 ]
   x2 [ 0.9   0.4   0.5   0.7 ]
   x3 [ 0.3   0.0   0.8   0.5 ]

The intersection of R and S means that "x is considerably larger than y" and "x is very close to y":

(R ∩ S)(x, y) =
         y1    y2    y3    y4
   x1 [ 0.4   0.0   0.1   0.6 ]
   x2 [ 0.0   0.4   0.0   0.0 ]
   x3 [ 0.3   0.0   0.7   0.5 ]

The union of R and S means that "x is considerably larger than y" or "x is very close to y":

(R ∪ S)(x, y) =
         y1    y2    y3    y4
   x1 [ 0.8   0.1   0.9   0.7 ]
   x2 [ 0.9   0.8   0.5   0.7 ]
   x3 [ 0.9   1.0   0.8   0.8 ]
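Since intersection and union of fuzzy relations are elementwise minimum and maximum, the matrices of Example 3.6 can be reproduced with a few lines of Python (a sketch, not from the book):

```python
R = [[0.8, 0.1, 0.1, 0.7],   # "x is considerably larger than y"
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]

S = [[0.4, 0.0, 0.9, 0.6],   # "x is very close to y"
     [0.9, 0.4, 0.5, 0.7],
     [0.3, 0.0, 0.8, 0.5]]

# Elementwise min gives the intersection (3.3), elementwise max the union (3.4).
intersection = [[min(r, s) for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]
union = [[max(r, s) for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]

print(intersection[0])  # [0.4, 0.0, 0.1, 0.6]
print(union[0])         # [0.8, 0.1, 0.9, 0.7]
```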
Consider a classical relation R on ℝ:
R(u, v) = 1   if (u, v) ∈ [a, b] × [0, c]
        = 0   otherwise                  ...(3.5)
It is clear that the projection (or shadow) of R on the X-axis is the closed interval [a, b] and its projection on the Y-axis is [0, c].
If R is a classical relation in X × Y, then
ΠX = {x ∈ X | ∃y ∈ Y : (x, y) ∈ R}   ...(3.6)
ΠY = {y ∈ Y | ∃x ∈ X : (x, y) ∈ R}   ...(3.7)
where ΠX denotes projection on X and ΠY denotes projection on Y.
3.3.3 Projection

Let R be a binary fuzzy relation on X × Y. The projection of R on X is defined as
ΠX(x) = sup{R(x, y) | y ∈ Y}   ...(3.8)
and the projection of R on Y is defined as
ΠY(y) = sup{R(x, y) | x ∈ X}   ...(3.9)
Example 3.7: Consider the relation
R = "x is considerably larger than y"

         y1    y2    y3    y4
   x1 [ 0.8   0.1   0.1   0.7 ]
   x2 [ 0.0   0.8   0.0   0.0 ]
   x3 [ 0.9   1.0   0.7   0.8 ]

then the projection on X means that
x1 is assigned the highest membership degree from the tuples (x1, y1), (x1, y2), (x1, y3), (x1, y4), i.e. ΠX(x1) = 0.8, which is the maximum of the first row;
x2 is assigned the highest membership degree from the tuples (x2, y1), (x2, y2), (x2, y3), (x2, y4), i.e. ΠX(x2) = 0.8, which is the maximum of the second row;
x3 is assigned the highest membership degree from the tuples (x3, y1), (x3, y2), (x3, y3), (x3, y4), i.e. ΠX(x3) = 1, which is the maximum of the third row.

Fig. 3.2 Shadows of a fuzzy relation.
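Equations (3.8) and (3.9) reduce, for a finite relation matrix, to row and column maxima. A Python sketch (not from the book) for Example 3.7:

```python
R = [[0.8, 0.1, 0.1, 0.7],
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]

proj_X = [max(row) for row in R]        # sup over y for each x, eq. (3.8)
proj_Y = [max(col) for col in zip(*R)]  # sup over x for each y, eq. (3.9)

print(proj_X)  # [0.8, 0.8, 1.0]
print(proj_Y)  # [0.9, 1.0, 0.7, 0.8]
```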
3.3.4 Cartesian Product of Two Fuzzy Sets

The Cartesian product of A ∈ F(X) and B ∈ F(Y) is defined as
(A × B)(u, v) = min{A(u), B(v)}   ...(3.10)
for all u ∈ X and v ∈ Y.
It is clear that the Cartesian product of two fuzzy sets (Fig. 3.3) is a fuzzy relation in X × Y.

Fig. 3.3 Cartesian product of two fuzzy sets.

If A and B are normal, then ΠY(A × B) = B and ΠX(A × B) = A. Really,
ΠX(x) = sup{(A × B)(x, y) | y}
      = sup{A(x) ∧ B(y) | y}
      = min{A(x), sup{B(y) | y}}
      = min{A(x), 1}
      = A(x)                       ...(3.11)
3.3.5 Shadow of Fuzzy Relation

The sup-min composition of a fuzzy set C ∈ F(X) and a fuzzy relation R ∈ F(X × Y) is defined as
(C ∘ R)(y) = sup{min{C(x), R(x, y)} | x ∈ X}   ...(3.12)
for all y ∈ Y.
The composition of a fuzzy set C and a fuzzy relation R can be considered as the shadow of the relation R on the fuzzy set C (Fig. 3.4).
Example 3.8: Let A and B be fuzzy numbers and let
R = A × B
be a fuzzy relation. Observe the following property of composition:
A ∘ R = A ∘ (A × B) = A
B ∘ R = B ∘ (A × B) = B

Example 3.9: Let C be a fuzzy set in the universe of discourse {1, 2, 3} and let R be a binary fuzzy relation in {1, 2, 3}. Assume that C = 0.2/1 + 1/2 + 0.2/3 and

         1     2     3
   1  [ 1.0   0.8   0.3 ]
   2  [ 0.8   1.0   0.8 ]
   3  [ 0.3   0.8   1.0 ]

Using the definition of sup-min composition we get
C ∘ R = (0.2/1 + 1/2 + 0.2/3) ∘ R = 0.8/1 + 1/2 + 0.8/3

Fig. 3.4 Shadow of fuzzy relation R on the fuzzy set C.
Example 3.10: Let C be a fuzzy set in the universe of discourse [0, 1] and let R be a binary fuzzy relation in [0, 1]. Assume that C(x) = x and R(x, y) = 1 - |x - y|.
Using the definition of sup-min composition we get
(C ∘ R)(y) = sup{min{x, 1 - |x - y|} | x ∈ [0, 1]} = (1 + y)/2
for all y ∈ [0, 1].
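For a finite universe, the sup-min composition (3.12) is a maximum over minima; the following Python sketch (not from the book) recomputes Example 3.9:

```python
# Sup-min composition of the fuzzy set C with the relation R of Example 3.9.
C = [0.2, 1.0, 0.2]                # C = 0.2/1 + 1/2 + 0.2/3
R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.8],
     [0.3, 0.8, 1.0]]

CoR = [max(min(C[i], R[i][j]) for i in range(len(C))) for j in range(len(R[0]))]
print(CoR)   # [0.8, 1.0, 0.8], i.e. 0.8/1 + 1/2 + 0.8/3
```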
3.3.6 Sup-Min Composition of Fuzzy Relations

Let R ∈ F(X × Y) and S ∈ F(Y × Z). The sup-min composition of R and S, denoted by R ∘ S, is defined as
(R ∘ S)(u, w) = sup{min{R(u, v), S(v, w)} | v ∈ Y}   ...(3.13)
It is clear that R ∘ S is a binary fuzzy relation in X × Z.
Example 3.11: Consider two fuzzy relations
R = "x is considerably larger than y"

         y1    y2    y3    y4
   x1 [ 0.8   0.1   0.1   0.7 ]
   x2 [ 0.0   0.8   0.0   0.0 ]
   x3 [ 0.9   1.0   0.7   0.8 ]

S = "y is very close to z"

         z1    z2    z3
   y1 [ 0.4   0.9   0.3 ]
   y2 [ 0.0   0.4   0.0 ]
   y3 [ 0.9   0.5   0.8 ]
   y4 [ 0.6   0.7   0.5 ]

Then their composition is

R ∘ S =
         z1    z2    z3
   x1 [ 0.6   0.8   0.5 ]
   x2 [ 0.0   0.4   0.0 ]
   x3 [ 0.7   0.9   0.7 ]
Formally, composing the 3 × 4 matrix of R with the 4 × 3 matrix of S yields the 3 × 3 matrix of R ∘ S displayed above, i.e., the composition of R and S is nothing else but the classical product of the matrices R and S, with the difference that instead of addition we use maximum and instead of multiplication we use the minimum operator.
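The max-min "matrix product" described above can be written directly; the following Python sketch (not from the book) reproduces the composition of Example 3.11:

```python
R = [[0.8, 0.1, 0.1, 0.7],
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]

S = [[0.4, 0.9, 0.3],
     [0.0, 0.4, 0.0],
     [0.9, 0.5, 0.8],
     [0.6, 0.7, 0.5]]

def sup_min_composition(R, S):
    """Ordinary matrix product with + replaced by max and * replaced by min, eq. (3.13)."""
    return [[max(min(R[i][k], S[k][j]) for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]

for row in sup_min_composition(R, S):
    print(row)   # [0.6, 0.8, 0.5] / [0.0, 0.4, 0.0] / [0.7, 0.9, 0.7]
```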
QUESTION BANK.

1. What are fuzzy relations? Explain them.
2. Explain the operations on fuzzy relations.
3. Given any n-ary relation, how many different projections of the relation can be taken?
4. Let A = {(x1, 0.1), (x2, 0.5), (x3, 0.3)} and B = {(y1, 0.3), (y2, 0.4)} be two fuzzy sets on the universes of discourse X = {x1, x2, x3} and Y = {y1, y2} respectively. Find the Cartesian product of A and B.
5. Let X = {x1, x2, x3, x4} be four varieties of paddy plants, D = {d1, d2, d3, d4} the various diseases affecting the plants and Y = {y1, y2, y3, y4} the common symptoms of the diseases. Find the sup-min composition.
REFERENCES.

1. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, pp. 338-353, 1965.
2. L.A. Zadeh, Similarity relations and fuzzy orderings, Information Sciences, Vol. 2, No. 2, pp. 177-200, 1971.
3. D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, NY, 1980.
4. J.F. Baldwin and N.C.F. Guild, Modelling controllers using fuzzy relations, Kybernetes, Vol. 9, No. 3, pp. 223-229, 1980.
5. R.R. Yager, Some properties of fuzzy relationships, Cybernetics and Systems, Vol. 12, No. 2, pp. 123-140, 1981.
6. S.V. Ovchinnikov, Structure of fuzzy binary relations, Fuzzy Sets and Systems, Vol. 6, No. 2, pp. 169-195, 1981.
7. B. Bouchon, G. Cohen and P. Frankl, Metrical properties of fuzzy relations, Problems of Control and Information Theory, Vol. 11, No. 5, pp. 389-396, 1982.
8. W. Bandler and L.J. Kohout, On new types of homomorphisms and congruences for partial algebraic structures and n-ary relations, International Journal of General Systems, Vol. 12, No. 2, pp. 149-157, 1986.
9. L.A. Zadeh, Fuzzy logic, IEEE Computer, Vol. 21, No. 4, pp. 83-93, 1988.
10. U. Hohle, Quotients with respect to similarity relations, Fuzzy Sets and Systems, Vol. 27, No. 1, pp. 31-44, 1988.
11. W. Kolodziejczyk, Decomposition problem of fuzzy relations: Further results, International Journal of General Systems, Vol. 14, No. 4, pp. 307-315, 1988.
12. A. Kaufmann and M.M. Gupta, Introduction to Fuzzy Arithmetic: Theory and Applications, Van Nostrand Reinhold, NY, 1991.
13. J.C. Fodor, Traces of fuzzy binary relations, Fuzzy Sets and Systems, Vol. 50, No. 3, pp. 331-341, 1991.
14. J.X. Li, An upper bound on indices of finite fuzzy relations, Fuzzy Sets and Systems, Vol. 49, No. 3, pp. 317-321, 1992.
15. B. De Baets and E.E. Kerre, Fuzzy relational compositions, Fuzzy Sets and Systems, Vol. 60, No. 1, pp. 109-120, 1993.
16. J. Vrba, General decomposition problem of fuzzy relations, Fuzzy Sets and Systems, Vol. 54, No. 1, pp. 69-79, 1993.
17. P. Faurous and J.P. Fillard, A new approach to the similarity in the fuzzy set theory, Information Sciences, Vol. 75, No. 3, pp. 213-221, 1993.
18. R. Kruse, J. Gebhardt and F. Klawonn, Foundations of Fuzzy Systems, Wiley, Chichester, 1994.
19. J. Mordeson and C.S. Peng, Operations on fuzzy graphs, Information Sciences, Vol. 79, No. 3, pp. 159-170, 1994.
20. G.J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall, Upper Saddle River, NJ, 1995.
21. T.J. Ross, Fuzzy Logic with Engineering Applications, McGraw-Hill, Inc., New York, NY, pp. 134-146, 1995.
CHAPTER 4
Fuzzy Implications

4.1 INTRODUCTION

Let p = "x is in A" and q = "y is in B" be crisp propositions, where A and B are crisp sets for the moment. The implication p → q is interpreted as ¬(p ∧ ¬q).   ...(4.1)
"p entails q" means that it can never happen that p is true and q is not true. It is easy to see that
p → q = ¬p ∨ q   ...(4.2)
The full interpretation of the material implication p → q is that the degree of truth of p → q quantifies to what extent q is at least as true as p, i.e.
p → q is true ⟺ τ(p) ≤ τ(q)   ...(4.3)
p → q = 1   if τ(p) ≤ τ(q)
      = 0   otherwise

The truth table for the material implication:

p   q   p → q
1   1     1
0   1     1
0   0     1
1   0     0

Example 4.1: Let p = "x is bigger than 10" and let q = "x is bigger than 9". It is easy to see that p → q is true, because it can never happen that x is bigger than 10 and x is not bigger than 9.
This property of material implication can be interpreted as:
if X ⊂ Y then X → Y   ...(4.4)
4
Fuzzy ImpIications
+ 0 ) 2 6 - 4
3 IIZZY LCIC AND NLIRAL NLTWRKS
Other interpretation oI the impIication operator is
X Y supZ,X Z c Y} ...(4.5)
4.2 FUZZY IMPLICATIONS
Consider the implication statement, "if pressure is high then volume is small".

Fig. 4.1 Membership function for "big pressure".

The membership function of the fuzzy set A, big pressure, illustrated in Fig. 4.1, can be interpreted as:
1 is in the fuzzy set big pressure with grade of membership 0
2 is in the fuzzy set big pressure with grade of membership 0.25
4 is in the fuzzy set big pressure with grade of membership 0.75
x is in the fuzzy set big pressure with grade of membership 1, for all x ≥ 5

A(u) = 1 if u ≥ 5; 1 − (5 − u)/4 if 1 ≤ u ≤ 5; 0 otherwise ...(4.6)

The membership function of the fuzzy set B, small volume, can be interpreted as (see Fig. 4.2):

Fig. 4.2 Membership function for "small volume".

5 is in the fuzzy set small volume with grade of membership 0
4 is in the fuzzy set small volume with grade of membership 0.25
2 is in the fuzzy set small volume with grade of membership 0.75
x is in the fuzzy set small volume with grade of membership 1, for all x ≤ 1

B(v) = 1 if v ≤ 1; 1 − (v − 1)/4 if 1 ≤ v ≤ 5; 0 otherwise ...(4.7)
If p is a proposition of the form

x is A

where A is a fuzzy set, for example, big pressure, and q is a proposition of the form

y is B

for example, small volume, then we define the fuzzy implication A → B as a fuzzy relation.

It is clear that (A → B)(u, v) should be defined pointwise and likewise, i.e. (A → B)(u, v) depends only on A(u) and B(v). That is

(A → B)(u, v) = I(A(u), B(v)) = A(u) → B(v) ...(4.8)

In our interpretation A(u) is considered as the truth value of the proposition "u is big pressure", and B(v) is considered as the truth value of the proposition "v is small volume", that is

u is big pressure → v is small volume ≡ A(u) → B(v)

Remembering the full interpretation of the material implication

p → q = 1 if t(p) ≤ t(q); 0 otherwise ...(4.9)

one possible extension of material implication to implications with intermediate truth values can be

A(u) → B(v) = 1 if A(u) ≤ B(v); 0 otherwise ...(4.10)

"4 is big pressure" → "1 is small volume" = A(4) → B(1) = 0.75 → 1 = 1
However, it is easy to see that this fuzzy implication operator (called Standard Strict) is sometimes not appropriate for real-life applications. Namely, let A(u) = 0.8 and B(v) = 0.8. Then we have

A(u) → B(v) = 0.8 → 0.8 = 1

Suppose there is a small error of measurement in B(v), and instead of 0.8 we have 0.7999. Then

A(u) → B(v) = 0.8 → 0.7999 = 0

This example shows that small changes in the input can cause a big deviation in the output, i.e. our system is very sensitive to rounding errors of digital computation and small errors of measurement.

A smoother extension of the material implication operator can be derived from the equation

X → Y = sup{Z | X ∩ Z ⊂ Y} ...(4.11)

That is

A(u) → B(v) = sup{z | min{A(u), z} ≤ B(v)} ...(4.12)

so

A(u) → B(v) = 1 if A(u) ≤ B(v); B(v) otherwise ...(4.13)

This operator is called Godel implication. Another possibility is to extend the original definition

p → q = ¬p ∨ q

using the definitions of negation and union:

A(u) → B(v) = max{1 − A(u), B(v)} ...(4.14)

This operator is called Kleene-Dienes implication.

In many practical applications Mamdani's implication operator is used to model the causal relationship between fuzzy variables. This operator simply takes the minimum of the truth values of the fuzzy predicates

A(u) → B(v) = min{A(u), B(v)} ...(4.15)

It is easy to see that this is not a correct extension of material implication, because 0 → 0 yields zero. However, in knowledge-based systems we are usually not interested in rules whose antecedent part is false.

Larsen:                    x → y = xy ...(4.16)
Lukasiewicz:               x → y = min{1, 1 − x + y} ...(4.17)
Mamdani:                   x → y = min{x, y} ...(4.18)
Standard Strict:           x → y = 1 if x ≤ y; 0 otherwise ...(4.19)
Godel:                     x → y = 1 if x ≤ y; y otherwise ...(4.20)
Gaines:                    x → y = 1 if x ≤ y; y/x otherwise ...(4.21)
Kleene-Dienes:             x → y = max{1 − x, y} ...(4.22)
Kleene-Dienes-Lukasiewicz: x → y = 1 − x + xy ...(4.23)
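These operators are straightforward to compute. The following Python sketch (an illustration added here, not part of the original text) implements equations (4.16)-(4.23) and reproduces the sensitivity of the Standard Strict operator discussed above.

def larsen(x, y): return x * y
def lukasiewicz(x, y): return min(1.0, 1.0 - x + y)
def mamdani(x, y): return min(x, y)
def standard_strict(x, y): return 1.0 if x <= y else 0.0
def godel(x, y): return 1.0 if x <= y else y
def gaines(x, y): return 1.0 if x <= y else y / x
def kleene_dienes(x, y): return max(1.0 - x, y)
def kleene_dienes_luk(x, y): return 1.0 - x + x * y

# Sensitivity of the Standard Strict operator to a tiny measurement error:
print(standard_strict(0.8, 0.8))     # 1.0
print(standard_strict(0.8, 0.7999))  # 0.0 - a small input change flips the output
print(godel(0.8, 0.7999))            # 0.7999 - the Godel operator degrades smoothly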
4.3 MODIFIERS
Let A be a fuzzy set in X. Then we can define the fuzzy sets "very A" and "more or less A" by

(very A)(x) = A(x)², (more or less A)(x) = √A(x) ...(4.24)

The use of fuzzy sets provides a basis for a systematic way for the manipulation of vague and imprecise concepts. In particular, we can employ fuzzy sets to represent linguistic variables.
Fig. 4.3 "Very old".

A linguistic variable can be regarded either as a variable whose value is a fuzzy number or as a variable whose values are defined in linguistic terms.

Fig. 4.4 "More or less old".
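As a small illustration (the piecewise-linear membership function for "old" below is an assumption, not taken from the book), the hedges of equation (4.24) can be applied directly to any membership function:

import math

def old(age):
    # assumed membership function: 0 below 30, rising linearly to 1 at 60
    return min(1.0, max(0.0, (age - 30.0) / 30.0))

def very(mf):
    return lambda x: mf(x) ** 2

def more_or_less(mf):
    return lambda x: math.sqrt(mf(x))

very_old = very(old)
more_or_less_old = more_or_less(old)
print(old(45), very_old(45), more_or_less_old(45))   # 0.5 0.25 0.707...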
4.3.1 LINGUISTIC VARIABLES
A linguistic variable is characterized by a quintuple

(x, T(x), U, G, M) ...(4.25)

in which
x is the name of the variable;
T(x) is the term set of x, that is, the set of names of linguistic values of x, with each value being a fuzzy number defined on U;
G is a syntactic rule for generating the names of values of x; and
M is a semantic rule for associating with each value its meaning.

For example, if speed is interpreted as a linguistic variable, then its term set T(speed) could be

T = {slow, moderate, fast, very slow, more or less fast, ...}

where each term in T(speed) is characterized by a fuzzy set in a universe of discourse U = [0, 100]. We might interpret "slow" as "a speed below about 40 mph", "moderate" as "a speed close to 55 mph", and "fast" as "a speed above about 70 mph". These terms can be characterized as fuzzy sets whose membership functions are shown in Fig. 4.5.

Fig. 4.5 Values of linguistic variable speed.
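A possible realization of this term set is sketched below; the breakpoints are assumptions chosen only to match the verbal descriptions above, not values given in the text:

def slow(u):
    return 1.0 if u <= 40 else max(0.0, (55.0 - u) / 15.0)

def moderate(u):
    return max(0.0, 1.0 - abs(u - 55.0) / 15.0)     # triangular, centred at 55 mph

def fast(u):
    return 1.0 if u >= 70 else max(0.0, (u - 55.0) / 15.0)

for u in (35, 55, 62, 80):
    print(u, round(slow(u), 2), round(moderate(u), 2), round(fast(u), 2))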
In many practical applications we normalize the domain of inputs and use the type of fuzzy partition shown in Fig. 4.6.

Fig. 4.6 A possible partition of [-1, 1].
Here we used the abbreviations
NB = Negative Big,
NM = Negative Medium,
NS = Negative Small,
ZE = Zero,
PS = Positive Small,
PM = Positive Medium,
PB = Positive Big.
4.3.2 The Linguistic Variable Truth

Truth = {Absolutely false, Very false, False, Fairly true, True, Very true, Absolutely true}.

One may define the membership function of linguistic terms of truth as

True(u) = u ...(4.26)

for each u ∈ [0, 1].

False(u) = 1 − u ...(4.27)

for each u ∈ [0, 1].

Absolutely false(u) = 1 if u = 0; 0 otherwise ...(4.28)

Absolutely true(u) = 1 if u = 1; 0 otherwise ...(4.29)

The interpretations of absolutely false and absolutely true are shown in Fig. 4.7.

Fig. 4.7 Interpretation of absolutely false and absolutely true.
The word "Fairly" is interpreted as "more or less".

Fairly true(u) = √u ...(4.30)

for each u ∈ [0, 1].

Very true(u) = u² ...(4.31)

for each u ∈ [0, 1].

Fig. 4.8 Interpretation of fairly true and very true.

Similarly,

Fairly false(u) = √(1 − u) ...(4.32)

for each u ∈ [0, 1].

Very false(u) = (1 − u)² ...(4.33)

for each u ∈ [0, 1].

Fig. 4.9 Interpretation of fairly false and very false.

Suppose we have the fuzzy statement "x is A". Let τ be a term of the linguistic variable Truth. Then the statement "x is A is τ" is interpreted as "x is τ ∘ A", where

(τ ∘ A)(u) = τ(A(u)) ...(4.34)

for each u ∈ [0, 1].
For example, let τ = "true". Then

"x is A is true"

is defined by "x is τ ∘ A" = "x is A", because

(τ ∘ A)(u) = τ(A(u)) = A(u)

for each u ∈ [0, 1]. This is why everything we write is considered to be true.

Fig. 4.10 Interpretation of "A is true".

Let τ = "absolutely true". Then the statement "x is A is Absolutely true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = 1 if A(x) = 1; 0 otherwise ...(4.35)

Fig. 4.11 Interpretation of "A is absolutely true".

Let τ = "absolutely false". Then the statement "x is A is Absolutely false" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = 1 if A(x) = 0; 0 otherwise ...(4.36)

Fig. 4.12 Interpretation of "A is absolutely false".

Let τ = "Fairly true". Then the statement "x is A is Fairly true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = √A(x) ...(4.37)

Fig. 4.13 Interpretation of "A is fairly true".

Let τ = "Very true". Then the statement "x is A is Very true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = (A(x))² ...(4.38)

Fig. 4.14 Interpretation of "A is very true".

QUESTION BANK.

1. What are the fuzzy implications? Explain with examples.
2. What are the fuzzy modifiers? Explain with an example.
3. What are the linguistic variables? Give examples.
4. Explain the linguistic variable TRUTH with examples.
5. Given the set of people in the following age groups:
   0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80 and above,
   represent graphically the membership functions of "young", "middle-aged" and "old".
REFERENCES.
1. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning,
Part 1, Information Sciences, VoI. 8, pp. 199-249, 1975.
2. E.H. Mamdani, Advances in the Iinguistic synthesis oI Iuzzy controIIers, International Journal of
Man-Machine Studies, VoI. 8, No. 6, pp. 669-678, 1976.
3. E.H. Mamdani, AppIications oI Iuzzy Iogic to approximate reasoning using Iinguistic systems,
IEEE 1ransactions on Systems, Man, and Cybernetics, VoI. 26, No. 12, pp. 1182-1191, 1977.
4. J.F. BaIdwin and B.W. PiIsworth, Axiomatic approach to impIication Ior approximate reasoning
with Iuzzy Iogic, Fuzzy Sets and Systems, VoI. 3, No. 2, pp. 193-219, 1980.
5. W. BandIer and L.J. Kohout, Fuzzy power sets and Iuzzy impIication operators, Fuzzy Sets and
Systems, VoI. 4, No. 1, pp. 13-30, 1980.
6. W. BandIer, and L.J. Kohout, Semantics oI impIication operators and Iuzzy reIationaI products,
International Journal of Man-Machine Studies, VoI. 12, No. 1, pp. 89-116, 1980.
7. R. WiIImott, Two Iuzzier impIication operators in the theory oI Iuzzy power sets, Fuzzy Sets and
Systems, VoI. 4, No. 1, pp. 31-36, 1980.
8. S. Weber, A generaI concept oI Iuzzy connectives, negations and impIications based on t-norms
and t-conorms, Fuzzy Sets and Systems, VoI. 11, No. 2, pp. 115-134, 1983.
9. D. Dubois and H. Prade, A theorem on impIication Iunctions deIined Irom trianguIar norms,
Stochastica, VoI. 8, No. 3, pp. 267-279, 1984.
10. E. TriIIas and L. VaIverde, On mode and impIications in approximate reasoning, In: M. M. Gupta,
A. KandeI, W. BandIer and J.B. Kisska |Eds.|, Approximate Reasoning in Expert Systems, North-
HoIIand, New York, pp. 157-166, 1985.
11. J.E. AhIquist, AppIication oI Iuzzy impIication to probe nonsymmetric reIations: Part 1, Fuzzy
Sets and Systems, VoI. 22, No. 3, pp. 229-244, 1987.
12. K.W. Oh and W. BandIer, Properties oI Iuzzy impIication operators, International Journal of
Approximate Reasoning, VoI. 1, No. 3, pp. 273-285, 1987.
13. P. Smets and P. Magrez, ImpIication in Iuzzy Iogic, International Journal of Approximate
Reasoning, VoI. 1, No. 4, pp. 327-347, 1987.
14. P. Smets and P. Magrez, The measure oI the degree oI truth and the grade oI membership, Fuzzy
Sets and Systems, VoI. 25, No. 1, pp. 67-72, 1988.
15. Z. Cao and A. KandeI, Applicability of Some Fuzzy Implication Operators, VoI. 31, No. 2, pp.
151-186, 1989.
16. R. Da, E.E. Kerre, G. De Cooman, B. CappeIIe and F. Vanmassenhove, InIIuence oI the Iuzzy
impIication operator on the method-oI-cases inIerence ruIe, International Journal of
Approximate Reasoning, VoI. 4, No. 4, pp. 307-318, 1990.
17. J.C. Fodor, On Iuzzy impIication operators, Fuzzy Sets and Systems, VoI. 42, No. 3, pp. 293-300,
1991.
18. A. Piskunov, Fuzzy impIication in Iuzzy systems controI, Fuzzy Sets and Systems, VoI. 45, No. 1,
pp. 25-35, 1992.
19. D. Ruan and E.E Kerre, Fuzzy impIication operators and generaIized Iuzzy method oI cases,
Fuzzy Sets and Systems, VoI. 54, No. 1, pp. 23-37, 1993.
20. J.L. Castro, M. DeIgado and E. TriIIas, Inducing impIication reIations, International Journal of
Approximate Reasoning, VoI. 10, No. 3, pp. 235-250, 1994.
21. W.M. Wu, Commutative impIications on compIete Iattices, International Journal of Uncertainty,
Fuzziness and Knovledge-Based Systems, VoI. 2, No. 3, pp. 333-341, 1994.
CHAPTER 5

The Theory of Approximate Reasoning
5.1 INTRODUCTION
In 1975 Zadeh introduced the theory of approximate reasoning. This theory provides a powerful framework for reasoning in the face of imprecise and uncertain information. Central to this theory is the representation of propositions as statements assigning fuzzy sets as values to variables.

Suppose we have two interactive variables x ∈ X and y ∈ Y and the causal relationship between x and y is completely known. Namely, we know that y is a function of x:

y = f(x)

Then we can make inferences easily:

premise: y = f(x)
fact: x = x'
consequence: y = f(x')

This inference rule says that if we have y = f(x), ∀x ∈ X, and we observe that x = x', then y takes the value f(x').

More often than not we do not know the complete causal link f between x and y; we only know the values of f(x) for some particular values of x:

R1: if x = x1 then y = y1
also
R2: if x = x2 then y = y2
also
...
also
Rn: if x = xn then y = yn

Fig. 5.1 Simple crisp inference.
Suppose that we are given an x' ∈ X and want to find a y' ∈ Y which corresponds to x' under the rule-base:

R1: if x = x1 then y = y1
also
R2: if x = x2 then y = y2
also
...
also
Rn: if x = xn then y = yn
fact: x = x'
Consequence: y = y'

This problem is frequently quoted as interpolation.

Let x and y be linguistic variables, e.g. "x is high" and "y is small". The basic problem of approximate reasoning is to find the membership function of the consequence C from the rule-base {R1, ..., Rn} and the fact A.
R1: if x is A1 then y is C1
also
R2: if x is A2 then y is C2
also
...
also
Rn: if x is An then y is Cn
fact: x is A
Consequence: y is C

Zadeh introduced a number of translation rules, which allow us to represent some common linguistic statements in terms of propositions in our language.
5.2 TRANSLATION RULES
5.2.1 Entailment Rule

x is A               Menaka is very young
A ⊂ B                very young ⊂ young
x is B               Menaka is young

5.2.2 Conjunction Rule

x is A
x is B
x is A ∩ B

Temperature is not very high
Temperature is not very low
Temperature is not very high and not very low

5.2.3 Disjunction Rule

x is A
or x is B
x is A ∪ B

Temperature is not very high
or Temperature is not very low
Temperature is not very high or not very low

5.2.4 Projection Rule

(x, y) have relation R
x is ΠX(R)

(x, y) have relation R
y is ΠY(R)

(x, y) is close to (3, 2)
x is close to 3

(x, y) is close to (3, 2)
y is close to 2

5.2.5 Negation Rule

not (x is A)
x is ¬A

not (x is high)
x is not high
In fuzzy logic and approximate reasoning, the most important fuzzy implication inference rule is the Generalized Modus Ponens (GMP). The classical Modus Ponens inference rule says:

premise: if p then q
fact: p
consequence: q

This inference rule can be interpreted as: if p is true and p → q is true then q is true.

The fuzzy implication inference is based on the compositional rule of inference for approximate reasoning suggested by Zadeh.
5.2.6 Compositional Rule of Inference

premise: if x is A then y is B
fact: x is A'
consequence: y is B'

where the consequence B' is determined as a composition of the fact and the fuzzy implication operator

B' = A' ∘ (A → B) ...(5.1)

that is,

B'(v) = sup{min{A'(u), (A → B)(u, v)} | u ∈ U}, v ∈ V ...(5.2)

The consequence B' is nothing else but the shadow of A → B on A'.

The Generalized Modus Ponens, which reduces to classical modus ponens when A' = A and B' = B, is closely related to the forward data-driven inference which is particularly useful in Fuzzy Logic Control.
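A minimal numerical sketch of the compositional rule of inference (5.1)-(5.2) on small discrete universes is given below; the membership values and the choice of Mamdani's operator for A → B are illustrative assumptions:

U = [1, 2, 3, 4]
V = [1, 2, 3, 4]
A  = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5}    # antecedent "x is A"
B  = {1: 1.0, 2: 0.7, 3: 0.3, 4: 0.0}    # consequent "y is B"
A1 = {1: 0.0, 2: 0.3, 3: 1.0, 4: 0.8}    # observed fact "x is A'"

def implication(a, b):
    return min(a, b)                      # Mamdani's operator, used here for illustration

# B'(v) = sup_u min{ A'(u), (A -> B)(u, v) }
B1 = {v: max(min(A1[u], implication(A[u], B[v])) for u in U) for v in V}
print(B1)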
The classical Modus Tollens inference rule says: if p → q is true and q is false then p is false. The Generalized Modus Tollens,

premise: if x is A then y is B
fact: y is B'
consequence: x is A'

which reduces to Modus Tollens when B' = ¬B and A' = ¬A, is closely related to the backward goal-driven inference which is commonly used in expert systems, especially in the realm of medical diagnosis.
5.3 RATIONAL PROPERTIES
Suppose that A, B and A' are fuzzy numbers. The Generalized Modus Ponens should satisfy some rational properties.

5.3.1 Basic Property

if x is A then y is B
x is A
y is B

if pressure is big then volume is small
pressure is big
volume is small

Fig. 5.2 Basic property.

5.3.2 Total Indeterminance

if x is A then y is B
x is ¬A
y is unknown

if pressure is big then volume is small
pressure is not big
volume is unknown

Fig. 5.3 Total indeterminance.
5.3.3 Subset

if x is A then y is B
x is A' ⊂ A
y is B

if pressure is big then volume is small
pressure is very big
volume is small

Fig. 5.4 Subset property.

5.3.4 Superset

if x is A then y is B
x is A'
y is B' ⊃ B

Fig. 5.5 Superset property.
Suppose that A, B and A' are fuzzy numbers. We show that the Generalized Modus Ponens with Mamdani's implication operator does not satisfy all the four properties listed above.

Example 5.1 (The GMP with Mamdani implication):

if x is A then y is B
x is A'
y is B'

where the membership function of the consequence B' is defined by

B'(y) = sup{A'(x) ∧ A(x) ∧ B(y) | x ∈ R}, y ∈ R

Basic property: Let A' = A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A(x), min{A(x), B(y)}} = sup_x min{A(x), B(y)}
      = min{B(y), sup_x A(x)} = min{B(y), 1} = B(y)

So the basic property is satisfied.
Total indeterminance: Let A' = ¬A = 1 − A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{1 − A(x), min{A(x), B(y)}} = sup_x min{A(x), 1 − A(x), B(y)}
      = min{B(y), sup_x min{A(x), 1 − A(x)}} = min{B(y), 1/2} = 1/2 ∧ B(y) < 1

This means that the total indeterminance property is not satisfied.

Subset: Let A' ⊂ A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A'(x), min{A(x), B(y)}} = sup_x min{A'(x), A(x), B(y)}
      = min{B(y), sup_x A'(x)} = min{B(y), 1} = B(y)

So the subset property is satisfied.
Superset: Let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A'(x), min{A(x), B(y)}} = sup_x min{A'(x), A(x), B(y)} ≤ B(y)

So the superset property of GMP is not satisfied by Mamdani's implication operator.

Fig. 5.6 The GMP with Mamdani's implication operator.
Example 5.2 (The GMP with Larsen's product implication):

if x is A then y is B
x is A'
y is B'

where the membership function of the consequence B' is defined by

B'(y) = sup_x min{A'(x), A(x)B(y)}, x ∈ R, y ∈ R

Basic property: Let A' = A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A(x), A(x)B(y)} = B(y)

So the basic property is satisfied.

Total indeterminance: Let A' = ¬A = 1 − A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{1 − A(x), A(x)B(y)} = B(y)/(1 + B(y)) < 1

This means that the total indeterminance property is not satisfied.
Subset: Let A' ⊂ A and let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A'(x), A(x)B(y)} = sup_x min{A(x), A(x)B(y)} = B(y)

So the subset property is satisfied.

Superset: Let y ∈ R be arbitrarily fixed. Then we have

B'(y) = sup_x min{A'(x), A(x)B(y)} ≤ B(y)

So, the superset property is not satisfied.

Fig. 5.7 The GMP with Larsen's implication operator.
QUESTION BANK.
1. Explain the theory of approximate reasoning.
2. What are the translation rules? Explain them with examples.
3. What are the rational properties? Explain them.
4. Explain generalized modus ponens with Mamdani's implication.
5. Explain generalized modus ponens with Larsen's implication.
6. Given
   C ∨ D
   ¬H → (A ∧ ¬B)
   (C ∨ D) → ¬H
   (A ∧ ¬B) → (R ∨ S)
   Can (R ∨ S) be inferred from the above?
REFERENCES.
1. L.A. Zadeh, Fuzzy Iogic and approximate reasoning, Syntheses, VoI. 30, No. 1, pp. 407-428,
1975.
2. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning-
Part I, Information Sciences, VoI. 8, pp. 199-251, 1975.
3. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning-
Part II, Information Sciences, VoI. 8, pp. 301-357, 1975.
4. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning-
Part III, Information Sciences, VoI. 9, pp. 43-80, 1975.
5. B.R. Gainess, Foundations oI Iuzzy reasoning, International Journal of Man-Machine Studies,
VoI. 8, No. 6, pp. 623-668, 1976.
6. E.H. Mamdani, AppIication oI Iuzzy Iogic to approximate reasoning using Iinguistic systems,
IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 26, No. 12, pp. 1182-1191, 1977.
7. J.F. BaIdwin, A new approach to approximate reasoning using a Iuzzy Iogic, Fuzzy Sets and
Systems, VoI. 2, No. 4, pp. 309-325, 1979.
8. M. Mizumoto and H.J. Zimmermann, Comparison oI Iuzzy reasoning methods, Fuzzy Sets and
Systems, VoI. 2, No. 4, pp. 309-325, 1979.
9. R. GiIes, Semantics Ior Iuzzy reasoning, International Journal of Man-Machine Studies, VoI. 17,
No. 4, pp. 401-415, 1982.
10. L.A. Zadeh, Linguistic variabIes, approximate reasoning and dispositions, MedicaI InIormation,
VoI. 8, pp. 173-186, 1983.
11. L.A. Zadeh, The roIe oI Iuzzy Iogic in the management oI uncertainty in expert systems, Fuzzy
Sets and Systems, VoI. 11, No. 3, pp. 199-228, 1983.
12. M. Sugeno and T. Takagi, Multi-dimensional Fuzzy Reasoning, Fuzzy Sets and Systems, VoI. 9,
No. 3, pp. 313-325, 1983.
13. L.A. Zadeh, SyIIogistic reasoning in Iuzzy Iogic and its appIication to usuaIIy and reasoning with
dispositions, IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 15, No. 6, pp. 754-765,
1985.
14. H. Prade, A computationaI approach to approximate and pIausibIe reasoning with appIications to
expert systems, IEEE 1ransactions on Pattern Analysis and Machine Intelligence, VoI. 7, No. 3,
pp. 260-283, 1985.
15. R.R. Yager, Strong truth and ruIes oI inIerence in Iuzzy Iogic and approximate reasoning,
Cybernetics and Systems, VoI. 16, No. 1, pp. 23-63, 1985.
16. H. Farreny and H. Prade, DeIauIt and inexact reasoning with possibiIity degress, IEEE
1ransactions on Systems, Man and Cybernetics, VoI. 16, No. 2, pp. 270-276, 1986.
17. W.M. Wu, Fuzzy reasoning and Iuzzy reIationaI equations, Fuzzy Sets and Systems, VoI. 20, No.
1, pp. 67-78, 1986.
18. L.A. Zadeh, A computationaI theory oI dispositions, International Journal of Intelligent Systems,
VoI. 2, No. 1, pp. 39-63, 1987.
19. M.B. GorzaIczany, A method oI inIerence in approximate reasoning based on intervaI-vaIued
Iuzzy sets, Fuzzy Sets and Systems, VoI. 21, No. 1, pp. 1-17, 1987.
20. N.S. Lee, Y.L Grize and K. Dehnad, QuaIitative modeIs Ior reasoning under uncertainty in
knowIedgebased expert systems, International Journal of Intelligent Systems, VoI. 2, No. 1, pp.
15-38, 1987.
21. I.B. Turksen, Approximate reasoning Ior production pIanning, Fuzzy Sets and Systems, VoI. 26,
No. 1, pp. 23-37, 1988.
22. M.S. Ying, Some notes on muIti-dimensionaI Iuzzy reasoning, Cybernetics and Systems, VoI. 19,
No. 4, pp. 281-293, 1988.
23. Z. Cao, A. KandeI and L. Li, A new modeI oI Iuzzy reasoning, Fuzzy Sets and Systems, VoI. 36,
No. 3, pp. 311-325, 1989.
24. P. Magrez and P. Smets, Fuzzy modus ponens: A new modeI suitabIe Ior appIications in
knowIedge-based systems, International Journal of Intelligent Systems, VoI. 4, No. 4, pp. 181-
200, 1989.
25. P. Torasso and L. ConsoIe, approximate reasoning and prototypicaI knowIedge, International
Journal of Approximate Reasoning, VoI. 3, No. 2, pp. 157-178, 1989.
26. R. Kruse and E. Schwecke, Fuzzy reasoning in a muIti-dimensionaI space oI hypothesis,
International Journal of Approximate Reasoning, VoI. 4, No. 1, pp. 47-68, 1990.
27. C.Z. Luo and Z.P. Wang, Representation oI compositionaI reIations in Iuzzy reasoning, Fuzzy
Sets and Systems, VoI. 36, No. 1, pp. 77-81, 1990.
28. C. Feng, Quantitative evaIuation oI university teaching quaIity An appIication oI Iuzzy set and
approximate reasoning, Fuzzy Sets and Systems, VoI. 37, No. 1, pp. 1-11, 1990.
29. J.M. KeIIer, D, Subhangkasen, K. UnkIesbay and N. UnkIesbay, An approximate reasoning
technique Ior recognition in coIor images oI beeI steaks, International Journal of General
Systems, VoI. 16, No. 4, pp. 331-342, 1990.
30. R. Lopez de Mantaras, Approximate Reasoning Models, EIIis Horwood, Chischster, 1990.
31. D. Dubois and H. Prade, Fuzzy sets in approximate reasoning, Part 1: InIerence with PossibiIity
distributions, Fuzzy Sets and Systems, VoI. 40, No. 1, pp. 143-202, 1991.
32. D. Dubois and H. Prade, Fuzzy sets in approximate reasoning, Part 2: LogicaI approaches, Fuzzy
Sets and Systems, VoI. 40, No. 1, pp. 202-244, 1991.
33. M.M. Gupta and J. Qi, Theory oI t-norms and Iuzzy inIerence methods, Fuzzy Sets and Systems,
VoI. 40, No. 3, pp. 431-450, 1991.
34. S. Dutta, approximate spatiaI reasoning: Integrating quaIitative and quantitative constraints,
International Journal of Approximate Reasoning, VoI. 5, No. 3, pp. 307-330, 1991.
35. D.G. Schwartz, A system for Reasoning vith Imprecise Linguistic Information, VoI. 5, No. 5, pp.
463-488, 1991.
36. X. T. Peng, A. KandeI and P.Z. Wang, Concepts, ruIes and Iuzzy reasoning: A Iactor space
approach, IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 21, No. 1, pp. 194-205,
1991.
37. E.H. Ruspini, Approximate reasoning: past, present, Iuture, Information Sciences, VoI. 57, pp.
297-317, 1991.
38. D.L. Hudson, M.E. Cohen and M.F. Anderson, Approximate reasoning with IF-THEN-UNLESS
ruIe in a medicaI expert system, International Journal of Intelligent Systems, VoI. 7, No. 1, pp.
71-79, 1992.
39. K. Lano, FormaI Irameworks Ior approximate reasoning, Fuzzy Sets and Systems, VoI. 51, No. 2,
pp. 131-146, 1992.
40. S.M. Chen, An improved aIgorithm Ior inexact reasoning based on extended Iuzzy production
ruIes, Cybernetics and Systems, VoI. 23, No. 5, pp. 463-481, 1992.
41. S. Raha and K.S. Ray, AnaIogy between approximate reasoning and the method oI interpoIation,
Fuzzy Sets and Systems, VoI. 51, No. 3, pp. 259-266, 1992.
42. P.V. Reddy and M.S, Babu, Some methods oI reasoning Ior Iuzzy conditionaI propositions, Fuzzy
Sets and Systems, VoI. 52, No. 3, pp. 229-250, 1992.
43. H.L. Larsen and R.R. Yager, The use oI Iuzzy reIationaI thesauri Ior cIassiIicatory probIem
soIving in inIormation retrievaI and expert systems, IEEE 1ransactions, on Systems, Man and
Cybernetics, VoI. 23, No. 1, pp. 31-41, 1993.
44. V.A. Niskanen, Metric truth as a basis Ior Iuzzy Iinguistic reasoning, Fuzzy Sets and Systems, VoI.
57, No. 1, pp. 1-25, 1993.
45. L.T. Koczy and K. Hirota, Approximate reasoning by Iinear ruIe interpoIation and generaI
approximation, International Journal of Approximate Reasoning, VoI. 9, No. 3, pp. 197-225,
1993.
46. Z. Zhang, An approximate reasoning system: Design and impIementation, InternationaI journaI
oI approximate reasoning, VoI. 9, No. 4, pp. 315-326, 1993.
47. Z. Bien and M.G. Chun, An inIerence network Ior bi-directionaI approximate reasoning based on
an equaIity measure, IEEE 1ransactions on Fuzzy Systems, VoI. 2, No. 2, pp. 177-180, 1994.
48. Q. Zhao and B. Li, An approximate reasoning system, International Journal of Pattern
Recognition and Artificial Intelligence, VoI. 7, No. 3, pp. 431-440, 1993.
49. J. Kacprzyk, On measuring the speciIicity oI IF-THEN ruIes, International Journal of
Approximate Reasoning, VoI. 11, No. 1, pp. 29-53, 1994.
50. C. EIbert, Rule-based Fuzzy Classification for Softvare Quality Control, VoI. 63, No. 3, pp. 349-
358, 1994.
CHAPTER 6

Fuzzy Rule-Based Systems
6.1 INTRODUCTION
Triangular norms were introduced by Schweizer and Sklar to model distances in probabilistic metric spaces. Functions that qualify as fuzzy intersections and fuzzy unions are usually referred to in the literature as t-norms and t-conorms, respectively. The standard fuzzy intersection (min operator) produces, for any given fuzzy sets, the largest fuzzy set among those produced by all possible fuzzy intersections (t-norms). The standard fuzzy union (max operator) produces, on the contrary, the smallest fuzzy set among the fuzzy sets produced by all possible fuzzy unions (t-conorms). That is, the standard fuzzy operations occupy specific positions in the whole spectrum of fuzzy operations: the standard fuzzy intersection is the weakest fuzzy intersection, while the standard fuzzy union is the strongest fuzzy union. In fuzzy set theory, triangular norms are extensively used to model the logical connective and.
6.2 TRIANGULAR NORM
A mapping T: [0, 1] × [0, 1] → [0, 1] is a triangular norm (t-norm for short) if it is symmetric, associative, non-decreasing in each argument and T(a, 1) = a for all a ∈ [0, 1]. In other words, any t-norm T satisfies the properties:

Symmetricity: T(x, y) = T(y, x), ∀x, y ∈ [0, 1] ...(6.1)
Associativity: T(x, T(y, z)) = T(T(x, y), z), ∀x, y, z ∈ [0, 1] ...(6.2)
Monotonicity: T(x, y) ≤ T(x', y') if x ≤ x' and y ≤ y' ...(6.3)
One identity: T(x, 1) = x, ∀x ∈ [0, 1] ...(6.4)

These axioms attempt to capture the basic properties of set intersection. The basic t-norms are:

minimum: min(a, b) = min{a, b} ...(6.5)
Lukasiewicz: TL(a, b) = max{a + b − 1, 0} ...(6.6)
product: TP(a, b) = ab ...(6.7)
weak: TW(a, b) = min{a, b} if max{a, b} = 1; 0 otherwise ...(6.8)
Hamacher: Hγ(a, b) = ab / (γ + (1 − γ)(a + b − ab)), γ ≥ 0 ...(6.9)
Dubois and Prade: Dα(a, b) = ab / max{a, b, α}, α ∈ (0, 1) ...(6.10)
Yager: Yp(a, b) = 1 − min{1, [(1 − a)^p + (1 − b)^p]^(1/p)}, p > 0 ...(6.11)
Frank: Fλ(a, b) = min{a, b} if λ = 0; TP(a, b) if λ = 1; TL(a, b) if λ = ∞; 1 − log_λ[1 + (λ^(1−a) − 1)(λ^(1−b) − 1)/(λ − 1)] otherwise ...(6.12)
All t-norms may be extended, through associativity, to n > 2 arguments. The minimum t-norm is automatically extended, and

TP(a1, a2, ..., an) = a1 a2 ... an ...(6.13)

TL(a1, a2, ..., an) = max{ Σ(i=1..n) ai − n + 1, 0 } ...(6.14)

A t-norm T is called strict if T is strictly increasing in each argument. Triangular co-norms are extensively used to model the logical connective or.
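The basic t-norms are easy to code; the sketch below (an added illustration, with parameter values chosen arbitrarily) implements equations (6.5)-(6.11) and checks numerically that each value lies below the minimum, as Lemma 6.1 later states:

def t_min(a, b): return min(a, b)
def t_luk(a, b): return max(a + b - 1.0, 0.0)
def t_prod(a, b): return a * b
def t_weak(a, b): return min(a, b) if max(a, b) == 1.0 else 0.0
def t_hamacher(a, b, gamma=0.5): return a * b / (gamma + (1 - gamma) * (a + b - a * b))
def t_dubois_prade(a, b, alpha=0.5): return a * b / max(a, b, alpha)
def t_yager(a, b, p=2.0): return 1.0 - min(1.0, ((1 - a) ** p + (1 - b) ** p) ** (1.0 / p))

a, b = 0.7, 0.4
for t in (t_luk, t_prod, t_weak, t_hamacher, t_dubois_prade, t_yager):
    assert t(a, b) <= t_min(a, b) + 1e-12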
6.3 TRIANGULAR CONORM
A mapping S: [0, 1] × [0, 1] → [0, 1] is a triangular co-norm (t-conorm) if it is symmetric, associative, non-decreasing in each argument and S(a, 0) = a for all a ∈ [0, 1]. In other words, any t-conorm S satisfies the properties:

Symmetricity: S(x, y) = S(y, x) ...(6.15)
Associativity: S(x, S(y, z)) = S(S(x, y), z) ...(6.16)
Monotonicity: S(x, y) ≤ S(x', y') if x ≤ x' and y ≤ y' ...(6.17)
Zero identity: S(x, 0) = x, ∀x ∈ [0, 1] ...(6.18)

If T is a t-norm then the equality

S(a, b) = 1 − T(1 − a, 1 − b) ...(6.19)

defines a t-conorm and we say that S is derived from T. The basic t-conorms are:

maximum: max(a, b) = max{a, b} ...(6.20)
Lukasiewicz: SL(a, b) = min{a + b, 1} ...(6.21)
probabilistic: SP(a, b) = a + b − ab ...(6.22)
strong: STRONG(a, b) = max{a, b} if min{a, b} = 0; 1 otherwise ...(6.23)
Hamacher: HORγ(a, b) = (a + b − (2 − γ)ab) / (1 − (1 − γ)ab), γ ≥ 0 ...(6.24)
Yager: YORp(a, b) = min{1, (a^p + b^p)^(1/p)}, p > 0 ...(6.25)
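Equation (6.19) gives a mechanical way to obtain these t-conorms from the corresponding t-norms; a short illustrative sketch:

def dual_conorm(t_norm):
    # S(a, b) = 1 - T(1 - a, 1 - b), eq. (6.19)
    return lambda a, b: 1.0 - t_norm(1.0 - a, 1.0 - b)

t_luk = lambda a, b: max(a + b - 1.0, 0.0)
s_luk = dual_conorm(t_luk)                    # yields min{a + b, 1}, eq. (6.21)
s_prob = dual_conorm(lambda a, b: a * b)      # yields a + b - ab, eq. (6.22)
print(s_luk(0.4, 0.5), s_prob(0.4, 0.5))      # 0.9 0.7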
Lemma 6.1: Let T be a t-norm. Then the following statement holds:

TW(x, y) ≤ T(x, y) ≤ min{x, y}, ∀x, y ∈ [0, 1]

Proof: From monotonicity, symmetricity and the extremal condition we get

T(x, y) ≤ T(x, 1) ≤ x
T(x, y) = T(y, x) ≤ T(y, 1) ≤ y

This means that T(x, y) ≤ min{x, y}.

Lemma 6.2: Let S be a t-conorm. Then the following statement holds:

max{a, b} ≤ S(a, b) ≤ STRONG(a, b), ∀a, b ∈ [0, 1]

Proof: From monotonicity, symmetricity and the extremal condition we get

S(x, y) ≥ S(x, 0) ≥ x
S(x, y) = S(y, x) ≥ S(y, 0) ≥ y

This means that S(x, y) ≥ max{x, y}.

Lemma 6.3: T(a, a) = a holds for any a ∈ [0, 1] if and only if T is the minimum norm.

Proof: If T(a, b) = min(a, b) then T(a, a) = a holds obviously. Suppose T(a, a) = a for any a ∈ [0, 1], and a ≤ b ≤ 1. We can obtain the following expression using monotonicity of T:

a = T(a, a) ≤ T(a, b) ≤ min{a, b}

From commutativity of T it follows that

a = T(a, a) ≤ T(b, a) ≤ min{b, a}

These equations show that T(a, b) = min{a, b} for any a, b ∈ [0, 1].

Lemma 6.4: The distributive law of a t-norm T over the max operator holds for any a, b, c ∈ [0, 1]:

T(max{a, b}, c) = max{T(a, c), T(b, c)}
6.4 T-NORM-BASED INTERSECTION

Let T be a t-norm. The T-intersection of A and B is defined as

(A ∩ B)(t) = T(A(t), B(t)) ...(6.26)

for all t ∈ X.

Example 6.1: Let T(x, y) = LAND(x, y) = max{x + y − 1, 0} be the Lukasiewicz t-norm. Then we have

(A ∩ B)(t) = max{A(t) + B(t) − 1, 0}

for all t ∈ X.

Let A and B be fuzzy subsets of X = {x1, x2, x3, x4, x5, x6, x7} and be defined by

A = 0.0/x1 + 0.3/x2 + 0.6/x3 + 1.0/x4 + 0.6/x5 + 0.3/x6 + 0.0/x7
B = 0.1/x1 + 0.3/x2 + 0.9/x3 + 1.0/x4 + 1.0/x5 + 0.3/x6 + 0.2/x7

Then A ∩ B has the following form:

A ∩ B = 0.0/x1 + 0.0/x2 + 0.5/x3 + 1.0/x4 + 0.6/x5 + 0.0/x6 + 0.0/x7
The operation union can be defined with the help of triangular conorms.

6.5 T-CONORM-BASED UNION

Let S be a t-conorm. The S-union of A and B is defined as

(A ∪ B)(t) = S(A(t), B(t)) ...(6.27)

for all t ∈ X.

Example 6.2: Let S(x, y) = LOR(x, y) = min{x + y, 1} be the Lukasiewicz t-conorm. Then we have

(A ∪ B)(t) = min{A(t) + B(t), 1} ...(6.28)

for all t ∈ X.

Let A and B be fuzzy subsets of X = {x1, x2, x3, x4, x5, x6, x7} and be defined by

A = 0.0/x1 + 0.3/x2 + 0.6/x3 + 1.0/x4 + 0.6/x5 + 0.3/x6 + 0.0/x7
B = 0.1/x1 + 0.3/x2 + 0.9/x3 + 1.0/x4 + 1.0/x5 + 0.3/x6 + 0.2/x7

Then A ∪ B has the following form:

A ∪ B = 0.1/x1 + 0.6/x2 + 1.0/x3 + 1.0/x4 + 1.0/x5 + 0.6/x6 + 0.2/x7

If we are given an operator C such that

min{a, b} ≤ C(a, b) ≤ max{a, b}, ∀a, b ∈ [0, 1]

then we say that C is a compensatory operator.
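The computations of Examples 6.1 and 6.2 can be reproduced with a few lines of Python (added here as an illustration):

A = {'x1': 0.0, 'x2': 0.3, 'x3': 0.6, 'x4': 1.0, 'x5': 0.6, 'x6': 0.3, 'x7': 0.0}
B = {'x1': 0.1, 'x2': 0.3, 'x3': 0.9, 'x4': 1.0, 'x5': 1.0, 'x6': 0.3, 'x7': 0.2}

land = lambda a, b: max(a + b - 1.0, 0.0)   # Lukasiewicz t-norm
lor  = lambda a, b: min(a + b, 1.0)         # Lukasiewicz t-conorm

intersection = {x: round(land(A[x], B[x]), 2) for x in A}   # Example 6.1
union        = {x: round(lor(A[x], B[x]), 2) for x in A}    # Example 6.2
print(intersection)
print(union)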
6.6 AVERAGING OPERATORS
A typical compensatory operator is the arithmetical mean defined as

MEAN(a, b) = (a + b)/2 ...(6.29)

Fuzzy set theory provides a host of attractive aggregation connectives for integrating membership values representing uncertain information. These connectives can be categorized into three classes: union, intersection and compensation connectives.

Union connectives produce a high output whenever any one of the input values representing degrees of satisfaction of different features or criteria is high. Intersection connectives produce a high output only when all of the inputs have high values. Compensative connectives have the property that a higher degree of satisfaction of one of the criteria can compensate for a lower degree of satisfaction of another criterion to a certain extent.

In this sense, union connectives provide full compensation and intersection connectives provide no compensation. In a decision process the idea of trade-offs corresponds to viewing the global evaluation of an action as lying between the worst and the best local ratings. This occurs in the presence of conflicting goals, when a compensation between the corresponding compatibilities is allowed. Averaging operators realize trade-offs between objectives by allowing a positive compensation between ratings.

6.6.1 An Averaging Operator is a Function

M: [0, 1] × [0, 1] → [0, 1] ...(6.30)

satisfying the following properties:

Idempotency: M(x, x) = x, ∀x ∈ [0, 1] ...(6.31)
Commutativity: M(x, y) = M(y, x), ∀x, y ∈ [0, 1] ...(6.32)
Extremal conditions: M(0, 0) = 0, M(1, 1) = 1 ...(6.33)
Monotonicity: M(x, y) ≤ M(x', y') if x ≤ x' and y ≤ y' ...(6.34)
Continuity: M is continuous.
Averaging operators represent a wide class of aggregation operators. We prove that, whatever the particular definition of an averaging operator M, the global evaluation of an action will lie between the worst and the best local ratings:

Lemma 6.5: If M is an averaging operator then

min{x, y} ≤ M(x, y) ≤ max{x, y}, ∀x, y ∈ [0, 1]

Proof: From idempotency and monotonicity of M it follows that

min{x, y} = M(min{x, y}, min{x, y}) ≤ M(x, y)

and

M(x, y) ≤ M(max{x, y}, max{x, y}) = max{x, y}

which ends the proof.

Averaging operators have the following interesting properties:

Property 1: A strictly increasing averaging operator cannot be associative.

Property 2: The only associative averaging operators are defined by

M(x, y, α) = med(x, y, α) = y if x ≤ y ≤ α; α if x ≤ α ≤ y; x if α ≤ x ≤ y ...(6.35)

where α ∈ (0, 1).
An important family of averaging operators is formed by the quasi-arithmetic means

M(a1, ..., an) = f⁻¹( (1/n) Σ(i=1..n) f(ai) )

This family has been characterized by Kolmogorov as the class of all decomposable continuous averaging operators. For example, the quasi-arithmetic mean of a1 and a2 is defined by

M(a1, a2) = f⁻¹( (f(a1) + f(a2))/2 ) ...(6.36)
The next table shows the most often used mean operators.

Table 6.1 Mean operators

Name                      M(x, y)
Harmonic mean             2xy / (x + y)
Geometric mean            √(xy)
Arithmetic mean           (x + y)/2
Dual of geometric mean    1 − √((1 − x)(1 − y))
Dual of harmonic mean     (x + y − 2xy) / (2 − x − y)
Median                    med(x, y, α), α ∈ (0, 1)
Generalized p-mean        ((x^p + y^p)/2)^(1/p), p ≥ 1
6.6.2 Ordered Weighted Averaging

The process of information aggregation appears in many applications related to the development of intelligent systems. One sees aggregation in neural networks, fuzzy logic controllers, vision systems, expert systems and multi-criteria decision aids. In 1988 Yager introduced a new aggregation technique based on the ordered weighted averaging (OWA) operators.

An OWA operator of dimension n is a mapping F: Rⁿ → R that has an associated weighting vector W = (w1, w2, ..., wn)ᵀ such that wi ∈ [0, 1], 1 ≤ i ≤ n, and w1 + w2 + ... + wn = 1. Furthermore,

F(a1, a2, ..., an) = w1 b1 + w2 b2 + ... + wn bn = Σ(j=1..n) wj bj ...(6.37)

where bj is the j-th largest element of the bag ⟨a1, ..., an⟩.

Example 6.3: Assume W = (0.4, 0.3, 0.2, 0.1)ᵀ. Then

F(0.7, 1, 0.2, 0.6) = 0.4 × 1 + 0.3 × 0.7 + 0.2 × 0.6 + 0.1 × 0.2 = 0.75
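The OWA computation of equation (6.37) is easily coded; the sketch below reproduces Example 6.3:

def owa(weights, values):
    ordered = sorted(values, reverse=True)      # b_j is the j-th largest argument
    return sum(w * b for w, b in zip(weights, ordered))

W = (0.4, 0.3, 0.2, 0.1)
print(owa(W, (0.7, 1.0, 0.2, 0.6)))             # 0.75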
A fundamental aspect of this operator is the re-ordering step; in particular, an aggregate ai is not associated with a particular weight wi, but rather a weight is associated with a particular ordered position of the aggregates. When we view the OWA weights as a column vector we shall find it convenient to refer to the weights with the low indices as weights at the top and those with the higher indices as weights at the bottom.

It is noted that different OWA operators are distinguished by their weighting function. In 1988 Yager pointed out three important special cases of OWA aggregation:

F*: In this case W = W* = (1, 0, ..., 0)ᵀ and

F*(a1, a2, ..., an) = max{a1, a2, ..., an} ...(6.38)

F_*: In this case W = W_* = (0, 0, ..., 1)ᵀ and

F_*(a1, a2, ..., an) = min{a1, a2, ..., an} ...(6.39)

F_A: In this case W = W_A = (1/n, ..., 1/n)ᵀ and

F_A(a1, a2, ..., an) = (a1 + ... + an)/n ...(6.40)
A number of important properties can be associated with the OWA operators. We shall now discuss some of these. For any OWA operator F,

F_*(a1, a2, ..., an) ≤ F(a1, a2, ..., an) ≤ F*(a1, a2, ..., an) ...(6.41)

Thus the upper and lower star OWA operators are its boundaries. From the above it becomes clear that for any F,

min{a1, a2, ..., an} ≤ F(a1, a2, ..., an) ≤ max{a1, a2, ..., an} ...(6.42)
The OWA operator can be seen to be commutative. Let ⟨a1, a2, ..., an⟩ be a bag of aggregates and let {d1, ..., dn} be any permutation of the ai. Then for any OWA operator

F(a1, a2, ..., an) = F(d1, d2, ..., dn) ...(6.43)

A third characteristic associated with these operators is monotonicity. Assume ai and ci are two collections of aggregates, i = 1, ..., n, such that for each i, ai ≥ ci. Then

F(a1, a2, ..., an) ≥ F(c1, c2, ..., cn) ...(6.44)

where F is some fixed-weight OWA operator.

Another characteristic associated with these operators is idempotency. If ai = a for all i, then for any OWA operator

F(a1, ..., an) = a ...(6.45)

From the above we can see that the OWA operators have the basic properties associated with an averaging operator.
Example 6.4: A window-type OWA operator takes the average of the m arguments around the centre. For this class of operators we have

wi = 0 if i < k; 1/m if k ≤ i < k + m; 0 if i ≥ k + m

Fig. 6.1 Window type OWA operator.
In order to classify OWA operators with regard to their location between and and or, a measure of orness associated with any vector W was introduced by Yager as follows:

orness(W) = (1/(n − 1)) Σ(i=1..n) (n − i) wi

It is easy to see that for any W the orness(W) is always in the unit interval. Furthermore, note that the nearer W is to an or, the closer its measure is to one; while the nearer it is to an and, the closer it is to zero.

Lemma 6.6: Let us consider the vectors W* = (1, 0, ..., 0)ᵀ, W_* = (0, 0, ..., 1)ᵀ and W_A = (1/n, ..., 1/n)ᵀ.
Then it can easily be shown that

orness(W*) = 1, orness(W_*) = 0 and orness(W_A) = 0.5

A measure of andness is defined as

andness(W) = 1 − orness(W)

Generally, an OWA operator with much of its non-zero weight near the top will be an orlike operator, that is,

orness(W) ≥ 0.5

and when much of the weight is non-zero near the bottom, the OWA operator will be andlike, that is,

andness(W) ≥ 0.5
Example 6.5: Let W = (0.8, 0.2, 0.0)ᵀ. Then

orness(W) = (1/2)(2 × 0.8 + 1 × 0.2) = 0.9

and

andness(W) = 1 − orness(W) = 1 − 0.9 = 0.1

This means that the OWA operator, defined by

F(a1, a2, a3) = 0.8 b1 + 0.2 b2 + 0.0 b3

where bj is the j-th largest element of the bag ⟨a1, a2, a3⟩, is an orlike aggregation.
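A short sketch of the orness and andness measures (added for illustration); it reproduces Example 6.5 and a boundary case of Lemma 6.6:

def orness(W):
    n = len(W)
    return sum((n - i) * w for i, w in enumerate(W, start=1)) / (n - 1)

def andness(W):
    return 1.0 - orness(W)

W = (0.8, 0.2, 0.0)
print(orness(W), andness(W))    # 0.9 0.1 -> an orlike operator
print(orness((1.0, 0.0, 0.0)))  # 1.0, the vector W* of Lemma 6.6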
The following theorem shows that as we move weight up the vector we increase the orness, while moving weight down causes us to decrease orness(W).

Theorem 6.1 (Yager, 1993): Assume W and W' are two n-dimensional OWA vectors such that

W = (w1, ..., wn)ᵀ and W' = (w1, ..., wj + ε, ..., wk − ε, ..., wn)ᵀ ...(6.46)

where ε > 0 and j < k. Then orness(W') > orness(W).

Proof: From the definition of the measure of orness we get

orness(W') = (1/(n − 1)) Σi (n − i) w'i = (1/(n − 1)) [ Σi (n − i) wi + (n − j)ε − (n − k)ε ]

orness(W') = orness(W) + (1/(n − 1)) ε (k − j)

Since k > j, orness(W') > orness(W).
6.7 MEASURE OF DISPERSION OR ENTROPY OF AN OWA VECTOR
In 1988 Yager defined the measure of dispersion (or entropy) of an OWA vector by

disp(W) = − Σi wi ln wi ...(6.47)

We can see that, when using the OWA operator as an averaging operator, disp(W) measures the degree to which we use all the aggregates equally.
Fig. 6.2 Fuzzy singleton.

Suppose now that the fact of the GMP is given by a fuzzy singleton (see Fig. 6.2). Then the process of computation of the membership function of the consequence becomes very simple.

For example, if we use Mamdani's implication operator in the GMP then

Rule 1: if x is A1 then z is C1
Fact: x is x̄0
Consequence: z is C

where the membership function of the consequence C is computed as

C(v) = sup_u min{x̄0(u), (A1 → C1)(u, v)} = sup_u min{x̄0(u), min{A1(u), C1(v)}} ...(6.48)

for all v. Observing that x̄0(u) = 0 for all u ≠ x0, the supremum turns into a simple minimum

C(v) = x̄0(x0) ∧ A1(x0) ∧ C1(v) = 1 ∧ A1(x0) ∧ C1(v) = min{A1(x0), C1(v)} ...(6.49)

for all v.
If we use the Godel implication operator in the GMP, then

C(v) = sup_u min{x̄0(u), (A1 → C1)(u, v)} = A1(x0) → C1(v) ...(6.50)

for all v. So,

C(v) = 1 if A1(x0) ≤ C1(v); C1(v) otherwise ...(6.51)

Fig. 6.3 Inference with Mamdani's implication operator.

Fig. 6.4 Inference with Godel implication operator.
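The single-rule, singleton-input case of equations (6.49) and (6.51) can be illustrated numerically; the triangular membership functions below are invented for the example:

def A1(u): return max(0.0, 1.0 - abs(u - 5.0) / 3.0)   # antecedent "x is A1"
def C1(w): return max(0.0, 1.0 - abs(w - 2.0) / 2.0)   # consequent "z is C1"

x0 = 4.0                # crisp (singleton) observation
firing = A1(x0)         # A1(x0) = 2/3

def C_mamdani(w):       # eq. (6.49): min{A1(x0), C1(w)}
    return min(firing, C1(w))

def C_godel(w):         # eq. (6.51): 1 if A1(x0) <= C1(w), else C1(w)
    return 1.0 if firing <= C1(w) else C1(w)

for w in (0.0, 1.0, 2.0, 3.0):
    print(w, round(C_mamdani(w), 3), round(C_godel(w), 3))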
More generally, for an arbitrary implication operator:

Rule 1: if x is A1 then z is C1
Fact: x is x̄0
Consequence: z is C

where the membership function of the consequence C is computed as

C(v) = sup_u min{x̄0(u), (A1 → C1)(u, v)} = A1(x0) → C1(v) ...(6.52)

for all v.
Consider a block of fuzzy IF-THEN rules:

R1: if x is A1 then z is C1
also
R2: if x is A2 then z is C2
also
...
also
Rn: if x is An then z is Cn
fact: x is x̄0
Consequence: z is C

The i-th fuzzy rule from this rule-base

Ri: if x is Ai then z is Ci ...(6.53)

is implemented by a fuzzy implication Ri and is defined as

Ri(u, v) = (Ai → Ci)(u, v) = Ai(u) → Ci(v) ...(6.54)

for i = 1, ..., n.

Find C from the input x0 and from the rule base

R = {R1, ..., Rn} ...(6.55)
Interpretation of:
sentence connective "also"
implication operator "then"
compositional operator "o"

We first compose x̄0 with each Ri, producing the intermediate result

C'i = x̄0 ∘ Ri ...(6.56)

for i = 1, ..., n. C'i is called the output of the i-th rule:

C'i(v) = Ai(x0) → Ci(v) ...(6.57)

for each v. Then combine the C'i component-wise into C by some aggregation operator:

C = ∪(i=1..n) C'i = x̄0 ∘ R1 ∪ ... ∪ x̄0 ∘ Rn ...(6.58)

C(v) = A1(x0) → C1(v) ∨ ... ∨ An(x0) → Cn(v)
So, the inference process is the following:

input to the system is x0
fuzzified input is x̄0
firing strength of the i-th rule is Ai(x0)
the i-th individual rule output is C'i(v) = Ai(x0) → Ci(v) ...(6.59)
overall system output (action) is C = C'1 ∪ ... ∪ C'n ...(6.60)

Overall system output = union of the individual rule outputs.
6.8 MAMDANI SYSTEM

For the Mamdani system (Fig. 6.5),

a → b ≡ a ∧ b ...(6.61)

input to the system is x0
fuzzified input is x̄0
firing strength of the i-th rule is Ai(x0)
the i-th individual rule output is

C'i(v) = Ai(x0) ∧ Ci(v) ...(6.62)

overall system output (action) is

C(v) = ∨(i=1..n) Ai(x0) ∧ Ci(v) ...(6.63)

6.9 LARSEN SYSTEM

For the Larsen system (Fig. 6.6),

a → b ≡ ab ...(6.64)

input to the system is x0
fuzzified input is x̄0
firing strength of the i-th rule is Ai(x0)
the i-th individual rule output is

C'i(v) = Ai(x0) Ci(v) ...(6.65)

overall system output (action) is

C(v) = ∨(i=1..n) Ai(x0) Ci(v) ...(6.66)
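The sketch below contrasts the Mamdani output (6.63) and the Larsen output (6.66) for a two-rule, single-input system; the membership functions and the input x0 are assumptions made only for illustration:

def A1(u): return max(0.0, 1.0 - abs(u - 2.0) / 2.0)
def A2(u): return max(0.0, 1.0 - abs(u - 4.0) / 2.0)
def C1(w): return max(0.0, 1.0 - abs(w - 1.0) / 2.0)
def C2(w): return max(0.0, 1.0 - abs(w - 3.0) / 2.0)

x0 = 2.5
alpha = (A1(x0), A2(x0))                     # firing strengths of the two rules

def C_mamdani(w):
    return max(min(alpha[0], C1(w)), min(alpha[1], C2(w)))

def C_larsen(w):
    return max(alpha[0] * C1(w), alpha[1] * C2(w))

for w in (1.0, 2.0, 3.0):
    print(w, round(C_mamdani(w), 2), round(C_larsen(w), 2))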
Fig. 6.5 Illustration of Mamdani system.
Fig. 6.6 Illustration of Larsen system.
6.10 DEFUZZIFICATION

The output of the inference process so far is a fuzzy set, specifying a possibility distribution of the (control) action. In on-line control, a non-fuzzy (crisp) control action is usually required. Consequently, one must defuzzify the fuzzy control action (output) inferred from the fuzzy reasoning algorithm, namely:
z0 = defuzzifier(C) ...(6.67)

where z0 is the crisp action and defuzzifier is the defuzzification operator.

Defuzzification is a process to select a representative element from the fuzzy output C inferred from the fuzzy control algorithm.
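One widely used defuzzifier is the centre-of-gravity (centroid) method; the sketch below applies it to a discretised fuzzy output C (the sample values are invented, and other defuzzifiers are of course possible):

def centroid(zs, memberships):
    num = sum(z * m for z, m in zip(zs, memberships))
    den = sum(memberships)
    return num / den if den > 0 else 0.0

zs = [0, 1, 2, 3, 4]
C  = [0.0, 0.3, 0.8, 0.5, 0.1]
z0 = centroid(zs, C)        # the crisp action of eq. (6.67)
print(round(z0, 3))         # 2.235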
QUESTION BANK.
1. What is t-norm?
2. What are the properties to be satisIied by a t-norm?
3. What are the various basic t-norms?
4. What is t-conorm?
5. What are the properties to be satisIied by a t-conorm?
6. What are the various basic t-conorms?
7. Let T be a t-norm. Prove the following statement:
   TW(x, y) ≤ T(x, y) ≤ min{x, y}, ∀x, y ∈ [0, 1]
8. Let S be a t-conorm. Prove the following statement:
   max{a, b} ≤ S(a, b) ≤ STRONG(a, b), ∀a, b ∈ [0, 1]
9. What is t-norm based intersection? ExpIain with an exampIe.
10. What is t-conorm based union? ExpIain with an exampIe.
11. What are the averaging operators?
12. What are the important properties oI averaging operators?
13. ExpIain order weighted averaging with an exampIe.
14. ExpIain the Measure oI dispersion.
15. What is entropy oI an ordered weighted averaging (OWA) vector?
16. ExpIain the inIerence with Mamdani`s impIication operator.
17. ExpIain the inIerence with GodeI`s impIication operator.
18. ExpIain Mamdani ruIe-based system.
19. ExpIain Larsen ruIe-based system.
20. What is deIuzziIication?
REFERENCES.
1. B. Schwartz and A. SkIar, StatisticaI metric spaces, Pacific Journal of Mathematics, VoI. 10, pp.
313-334, 1960.
2. B. Schwartz and A. SkIar, Associative Iunctions and statisticaI triangIe inequaIities, Publication
Mathematics, Debrecen, VoI. 8, pp. 169-186, 1961.
3. B. Schwartz and A. SkIar, Associative Iunctions and abstract semigroups, Publication
Mathematics, Debrecen, VoI. 10, pp. 69-81, 1963.
4. E. CzogaIa and W. Pedrycz, Fuzzy ruIe generation Ior Iuzzy controI, Cybernetics and Systems,
VoI. 13, No. 3, pp. 275-29358, 1982.
5. R.R. Yagar, Measures oI Iuzziness based on t-norms, Stochastica, VoI. 6, No. 1, pp. 207-229,
1982.
6. R.R. Yagar, Strong truth and ruIes oI inIerence in Iuzzy Iogic and approximate reasoning,
Cybernetics and Systems, VoI. 16, No. 1, pp. 23-63, 1985.
7. J.A. Bernard, Use oI ruIe-based system Ior process controI, IEEE Control Systems Magazine, VoI.
8, No. 5, pp. 3-13, 1988.
8. V. Novak and W. Pedrucz, Fuzzy sets and t-norms in the Iight oI Iuzzy Iogic, International
Journal of Man-machine Studies, VoI. 29, No. 1, pp. 113-127, 1988.
9. M.H. Lim and T. TakeIuji, ImpIementing Iuzzy ruIe-based systems on siIicon chips, IEEE Expert,
VoI. 5, No. 1, pp. 31-45, 1990.
10. X.T. Peng, Generating ruIes Ior Iuzzy Iogic controIIers by Iunctions, Fuzzy Sets and Systems, VoI.
36, No. 1, pp. 83-89, 1990.
11. D.P. FiIev and R.R. Yagar, A generaIized deIuzziIication method via bad distributions,
International Journal of Intelligent Systems, VoI. 6, No. 7, pp. 687-687, 1991.
12. J.C. Fodor, A remark on constructing t-norms, Fuzzy Sets and Systems, VoI. 41, No. 2, pp. 195-
199, 1991.
13. M.M. Gupta and J. Qi, Theory oI t-norms and Iuzzy inIerence methods, Fuzzy Sets and Systems,
VoI. 40, No. 3, pp. 431-450, 1991.
14. A. NaIarich and J.M. KeIIer, A Iuzzy Iogic ruIe-based automatic target recognition, International
Journal of Intelligent Systems, VoI. 6, No. 3, pp. 295-312, 1991.
15. R.R. Yagar, A generaI approach to ruIe aggregation in Iuzzy Iogic controI, Applied Intellignece,
VoI. 2, No. 4, pp. 335-351, 1992.
16. L.X. Wang, and J.M. MendeI, Generating Iuzzy ruIes by Iearning through exampIes, IEEE
1ransactions on Systems, Man and Cybernetics, VoI. 22, No. 6, pp. 1414-1427, 1992.
17. F. BousIama and A Ichikawa, Fuzzy controI ruIes and their naturaI controI Iaws, Fuzzy Sets and
Systems, VoI. 48, No. 1, pp. 65-86, 1992.
18. J.J. BukIey, A generaI theory oI uncertainty based on t-conorms, Fuzzy Sets and Systems, VoI. 48,
No. 3, pp. 289-296, 1992.
19. D. Dubois and H. Prade, GraduaI inIerence ruIes in approximate reasoning, Information
Sciences, VoI. 61, No. 1, pp. 103-122, 1992.
20. R. FuIIer and H.J. Zimmerman, On computation oI the compositionaI ruIe oI inIerence under
trianguIar norms, Fuzzy Sets and Systems, VoI. 51, No. 3, pp. 267-275, 1992.
21. D.L. Hudson, M.E. Coben and M.F. Anderson, Approximate reasoning with IF-THEN-UNLESS
ruIe in a medicaI expert system, International Journal of Intelligent Systems, VoI. 7, No. 1, pp.
71-79, 1992.
22. F.C.H. Rhee and R. Krishanpuram, Fuzzy ruIe generation methods Ior high-IeveI computer
vision, Fuzzy Sets and Systems, VoI. 60, No. 3, pp. 245-258, 1993.
23. B. Cao, Input-output mathematicaI modeI with t-Iuzzy sets, Fuzzy Sets and Systems, VoI. 59,
No. 1, pp. 15-23, 1993.
24. P. Doherry, P. Driankov and H. HeIIendoom, Fuzzy IF-THEN-UNLESS ruIes and their
impIementation, International Journal of Uncertainity, Fuzziness and Knovledge-based
Systems, VoI. 1, No. 2, pp. 167-182, 1993.
25. S. Dutta and P.P. Bonissone, Integrating case and ruIe-based reasoning, International Journal of
Approximate Reasoning, VoI. 8, No. 3, pp. 163-204, 1993.
26. T. Sudkamp, SimiIarity, interpoIation and Iuzzy ruIe construction, Fuzzy Sets and Systems, VoI.
58, No. 1, pp. 73-86, 1993.
27. Y.Tian and I.B. Turksen, Combination oI ruIes or their consequences in Iuzzy expert systems,
Fuzzy Sets and Systems, VoI. 58, No. 1, pp. 3-40, 1993.
28. E. Uchino, T. Yamakawa, T. Miki and S. Nakamura, Fuzzy ruIe-based simpIe interpoIation
aIgorithm Ior discrete signaI, Fuzzy Sets and Systems, VoI. 59, No. 3, pp. 259-270, 1993.
29. T. ArnouId and S. Tano, A ruIe-based method to caIcuIate exactIy the widest soIutions sets oI a
max-min Iuzzy reIations inequaIity, Fuzzy Sets and Systems, VoI. 64, No. 1, pp. 39-58, 1994.
30. V. Cross and T. Sudkamp, Patterns oI Iuzzy-ruIe based interIerence, International Journal of
Approximate Reasoning, VoI. 11, No. 3, pp. 235-255, 1994.
31. C. Ebert, RuIe-based Iuzzy cIassiIication Ior soItware quaIity controI, Fuzzy Sets and Systems,
VoI. 63, No. 3, pp. 349-358, 1994.
32. J. Kacprzyk, On measuring the speciIicity oI IF-THEN ruIes, International Journal of
Approximate Reasoning, VoI. 11, No. 1, pp. 29-53, 1994.
33. W. Pedrycz, Why trianguIar membership Iunctions? Fuzzy Sets and Systems, VoI. 64, No. 1, pp.
21-30, 1994.
CHAPTER 7

Fuzzy Reasoning Schemes
7.1 INTRODUCTION
This chapter focuses on different inference mechanisms in fuzzy rule-based systems, with examples. The inference engine of a fuzzy expert system operates on a series of production rules and makes fuzzy inferences.

There exist two approaches to evaluating the relevant production rules. The first is data-driven and is exemplified by the generalized modus ponens. In this case, available data are supplied to the expert system, which then uses them to evaluate the relevant production rules and draw all possible conclusions. An alternative method of evaluation is goal-driven; it is exemplified by the generalized modus tollens form of logical inference. Here, the expert system searches for data specified in the IF clauses of production rules that will lead to the objective; these data are found either in the knowledge base, in the THEN clauses of other production rules, or by querying the user.

Since the data-driven method proceeds from IF clauses to THEN clauses in the chain through the production rules, it is commonly called forward chaining. Similarly, since the goal-driven method proceeds backward from THEN clauses to the IF clauses in its search for the required data, it is commonly called backward chaining. Backward chaining has the advantage of speed, since only the rules leading to the objective need to be evaluated.
7.2 FUZZY RULE-BASE SYSTEM
R1: if x is A1 and y is B1 then z is C1
R2: if x is A2 and y is B2 then z is C2
...
Rn: if x is An and y is Bn then z is Cn
fact: x is x̄0 and y is ȳ0
consequence: z is C
The i-th fuzzy rule from this rule-base

Ri: if x is Ai and y is Bi then z is Ci

is implemented by a fuzzy relation Ri and is defined as

Ri(u, v, w) = (Ai × Bi → Ci)(u, v, w) = [Ai(u) ∧ Bi(v)] → Ci(w) ...(7.1)

for i = 1, ..., n.

Find C from the input (x0, y0) and from the rule base

R = {R1, ..., Rn} ...(7.2)

Interpretation of:
logical connective "and"
sentence connective "also"
implication operator "then"
compositional operator "o"

We first compose x̄0 × ȳ0 with each Ri, producing the intermediate result

C'i = x̄0 × ȳ0 ∘ Ri ...(7.3)

for i = 1, ..., n. Here C'i is called the output of the i-th rule:

C'i(w) = [Ai(x0) ∧ Bi(y0)] → Ci(w) ...(7.4)

for each w. Then combine the C'i component-wise into C by some aggregation operator:

C = ∪(i=1..n) C'i = x̄0 × ȳ0 ∘ R1 ∪ ... ∪ x̄0 × ȳ0 ∘ Rn

C(w) = [A1(x0) ∧ B1(y0)] → C1(w) ∨ ... ∨ [An(x0) ∧ Bn(y0)] → Cn(w) ...(7.5)

So, the inference process is the following:

input to the system is (x0, y0)
fuzzified input is (x̄0, ȳ0)
firing strength of the i-th rule is Ai(x0) ∧ Bi(y0)
the i-th individual rule output is C'i(w) = [Ai(x0) ∧ Bi(y0)] → Ci(w)
overall system output is C = C'1 ∪ ... ∪ C'n

Overall system output = union of the individual rule outputs.
7.3 INFERENCE MECHANISMS IN FUZZY RULE-BASE SYSTEMS
We present five well-known inference mechanisms in fuzzy rule-based systems. For simplicity we assume that we have two fuzzy IF-THEN rules of the form

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
fact: x is x̄0 and y is ȳ0
Consequence: z is C
7.3.1 Mamdani Inference Mechanism
The fuzzy implication is modelled by Mamdani's minimum operator, and the sentence connective "also" is interpreted as oring the propositions and defined by the max operator.
The firing levels of the rules, denoted by αi, i = 1, 2, are computed by

α1 = A1(x0) ∧ B1(y0),  α2 = A2(x0) ∧ B2(y0)    ...(7.6)

The individual rule outputs are obtained by

C'1(w) = α1 ∧ C1(w),  C'2(w) = α2 ∧ C2(w)    ...(7.7)

Then the overall system output is computed by oring the individual rule outputs

C(w) = C'1(w) ∨ C'2(w) = (α1 ∧ C1(w)) ∨ (α2 ∧ C2(w))    ...(7.8)

Finally, to obtain a deterministic control action, we employ any defuzzification strategy.
Fig. 7.1 Inference with Mamdani's implication operator.
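The sketch below (not from the book; NumPy-based, with assumed triangular consequents and assumed firing levels 0.3 and 0.6) shows how equations (7.6)-(7.8) followed by a centre-of-gravity defuzzification could be carried out numerically on a discretized output universe:

```python
import numpy as np

w = np.linspace(0.0, 10.0, 201)            # output universe
C1 = np.maximum(1 - np.abs(w - 3.0)/2, 0)  # consequent of R1 (assumed triangle)
C2 = np.maximum(1 - np.abs(w - 7.0)/2, 0)  # consequent of R2 (assumed triangle)

alpha1, alpha2 = 0.3, 0.6                  # firing levels A1(x0)^B1(y0), A2(x0)^B2(y0)

# Eq. (7.7): clip each consequent at its firing level (min operator)
C1_prime = np.minimum(alpha1, C1)
C2_prime = np.minimum(alpha2, C2)

# Eq. (7.8): "also" as max (oring the rule outputs)
C_out = np.maximum(C1_prime, C2_prime)

# One possible defuzzification (centre of gravity) to obtain a crisp action
z0 = np.trapz(w * C_out, w) / np.trapz(C_out, w)
print(round(z0, 2))
```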
7.3.2 Tsukamoto Inference Mechanism
All linguistic terms are supposed to have monotonic membership functions. The firing levels of the rules, denoted by αi, i = 1, 2, are computed by

α1 = A1(x0) ∧ B1(y0),  α2 = A2(x0) ∧ B2(y0)    ...(7.9)
In this mode of reasoning the individual crisp control actions z1 and z2 are computed from the equations

α1 = C1(z1),  α2 = C2(z2)    ...(7.10)

and the overall crisp control action is expressed as

z0 = (α1 z1 + α2 z2) / (α1 + α2)    ...(7.11)

i.e. z0 is computed by the discrete Center-of-Gravity method.
If we have n rules in our rule-base then the crisp control action is computed as

z0 = (Σ αi zi) / (Σ αi),  the sums taken over i = 1, ..., n    ...(7.12)

where αi is the firing level and zi is the (crisp) output of the i-th rule, i = 1, ..., n.
Example 7.1: We illustrate Tsukamoto's reasoning method by the following simple example

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2

fact: x is x0 and y is y0
consequence: z is C

Then according to Fig. 7.2 we see that

A1(x0) = 0.7,  B1(y0) = 0.3

Therefore, the firing level of the first rule is

α1 = min{A1(x0), B1(y0)} = min{0.7, 0.3} = 0.3

and from

A2(x0) = 0.6,  B2(y0) = 0.8

it follows that the firing level of the second rule is

α2 = min{A2(x0), B2(y0)} = min{0.6, 0.8} = 0.6

The individual rule outputs z1 = 8 and z2 = 4 are derived from the equations

C1(z1) = 0.3,  C2(z2) = 0.6

and the crisp control action is

z0 = (8 × 0.3 + 4 × 0.6)/(0.3 + 0.6) ≈ 5.33
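The Tsukamoto scheme can be sketched as follows (the constant antecedent memberships and the monotonic inverse consequent functions below are assumptions chosen only so that the numbers of Example 7.1 are reproduced):

```python
def tsukamoto(rules, x0, y0):
    """rules: list of (A, B, C_inv) where A, B are antecedent membership functions
    and C_inv maps a firing level back to the crisp output z with C(z) = alpha
    (possible because the consequents are monotonic)."""
    num = den = 0.0
    for A, B, C_inv in rules:
        alpha = min(A(x0), B(y0))        # firing level, Eq. (7.9)
        z = C_inv(alpha)                 # crisp rule output, Eq. (7.10)
        num += alpha * z
        den += alpha
    return num / den                     # discrete centre of gravity, Eq. (7.12)

# Assumed values: A1(x0)=0.7, B1(y0)=0.3, A2(x0)=0.6, B2(y0)=0.8, and monotonic
# consequents with C1^-1(0.3) = 8 and C2^-1(0.6) = 4, as in Example 7.1.
rules = [
    (lambda x: 0.7, lambda y: 0.3, lambda a: 10 - a/0.15),  # C1 decreasing
    (lambda x: 0.6, lambda y: 0.8, lambda a: a/0.15),       # C2 increasing
]
print(round(tsukamoto(rules, x0=0.0, y0=0.0), 2))   # about 5.33
```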
7.3.3 Sugeno Inference Mechanism
Sugeno and Takagi use the following architecture

R1: if x is A1 and y is B1 then z1 = a1x + b1y
also
R2: if x is A2 and y is B2 then z2 = a2x + b2y

fact: x is x0 and y is y0
consequence: z0

The firing levels of the rules are computed by

α1 = A1(x0) ∧ B1(y0),  α2 = A2(x0) ∧ B2(y0)    ...(7.13)

then the individual rule outputs are derived from the relationships

z*1 = a1x0 + b1y0,  z*2 = a2x0 + b2y0    ...(7.14)

and the crisp control action is expressed as

z0 = (α1 z*1 + α2 z*2)/(α1 + α2)    ...(7.15)

If we have n rules in our rule-base then the crisp control action is computed as

z0 = (Σ αi z*i)/(Σ αi),  the sums taken over i = 1, ..., n    ...(7.16)

where αi denotes the firing level of the i-th rule, i = 1, ..., n.
Fig. 7.2 Tsukamoto's inference mechanism.
Example 7.2: We illustrate Sugeno's reasoning method by the following simple example

R1: if x is BIG and y is SMALL then z1 = x + y
also
R2: if x is MEDIUM and y is BIG then z2 = 2x − y

fact: x is 3 and y is 2
consequence: z0

Then according to Fig. 7.4 we see that

μBIG(x0) = μBIG(3) = 0.8
μSMALL(y0) = μSMALL(2) = 0.2

Therefore, the firing level of the first rule is

α1 = min{μBIG(x0), μSMALL(y0)} = min{0.8, 0.2} = 0.2

and from

μMEDIUM(x0) = μMEDIUM(3) = 0.6,  μBIG(y0) = μBIG(2) = 0.9

it follows that the firing level of the second rule is

α2 = min{μMEDIUM(x0), μBIG(y0)} = min{0.6, 0.9} = 0.6

The individual rule outputs are computed as

z*1 = x0 + y0 = 3 + 2 = 5,  z*2 = 2x0 − y0 = 2 × 3 − 2 = 4

So the crisp control action is

z0 = (5 × 0.2 + 4 × 0.6)/(0.2 + 0.6) = 4.25
Fig. 7.3 Sugeno's inference mechanism.
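A corresponding sketch of the Sugeno scheme; the antecedent membership degrees are hard-coded assumptions that simply reproduce Example 7.2:

```python
def sugeno(rules, x0, y0):
    """rules: list of (A, B, f) with antecedent membership functions A, B and a
    crisp consequent function f(x, y).  Returns the weighted average (7.16)."""
    num = den = 0.0
    for A, B, f in rules:
        alpha = min(A(x0), B(y0))       # firing level, Eq. (7.13)
        z = f(x0, y0)                   # crisp rule output, Eq. (7.14)
        num += alpha * z
        den += alpha
    return num / den

# Membership degrees hard-coded to match Example 7.2:
# BIG(3)=0.8, SMALL(2)=0.2, MEDIUM(3)=0.6, BIG(2)=0.9.
rules = [
    (lambda x: 0.8, lambda y: 0.2, lambda x, y: x + y),      # R1: z1 = x + y
    (lambda x: 0.6, lambda y: 0.9, lambda x, y: 2*x - y),    # R2: z2 = 2x - y
]
print(sugeno(rules, x0=3, y0=2))   # 4.25
```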
7.3.4 Larsen Inference Mechanism
The fuzzy implication is modelled by Larsen's product operator, and the sentence connective "also" is interpreted as oring the propositions and defined by the max operator.
Let αi denote the firing level of the i-th rule, i = 1, 2:

α1 = A1(x0) ∧ B1(y0),  α2 = A2(x0) ∧ B2(y0)    ...(7.17)

Then the membership function of the inferred consequence C is pointwise given by

C(w) = (α1 C1(w)) ∨ (α2 C2(w))    ...(7.18)

To obtain a deterministic control action, we employ any defuzzification strategy.
If we have n rules in our rule-base then the consequence C is computed as

C(w) = ∨ (αi Ci(w)),  the maximum taken over i = 1, ..., n    ...(7.19)

where αi denotes the firing level of the i-th rule, i = 1, ..., n.
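A sketch of Larsen-type inference on a discretized output universe (the consequents, firing levels and the centre-of-gravity defuzzification step are assumptions for illustration); note that the consequents are scaled by the firing levels rather than clipped, which is the only change with respect to the Mamdani sketch:

```python
import numpy as np

w = np.linspace(0.0, 10.0, 201)
C1 = np.maximum(1 - np.abs(w - 3.0)/2, 0)   # assumed consequent of R1
C2 = np.maximum(1 - np.abs(w - 7.0)/2, 0)   # assumed consequent of R2
alpha1, alpha2 = 0.3, 0.6                   # assumed firing levels

# Eq. (7.18): scale (product) instead of clip (min), then take the max
C_out = np.maximum(alpha1 * C1, alpha2 * C2)
z0 = np.trapz(w * C_out, w) / np.trapz(C_out, w)   # centre-of-gravity defuzzification
print(round(z0, 2))
```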
7.3.5 Simplified Fuzzy Reasoning
Here the rule consequents are crisp constants:

R1: if x is A1 and y is B1 then z1 = c1
also
R2: if x is A2 and y is B2 then z2 = c2

fact: x is x0 and y is y0
consequence: z0
Fig. 7.4 Example of Sugeno's inference mechanism.
The firing levels of the rules are computed by

α1 = A1(x0) ∧ B1(y0),  α2 = A2(x0) ∧ B2(y0)    ...(7.20)

then the individual rule outputs are c1 and c2, and the crisp control action is expressed as

z0 = (α1 c1 + α2 c2)/(α1 + α2)    ...(7.21)

If we have n rules in our rule-base then the crisp control action is computed as

z0 = (Σ αi ci)/(Σ αi),  the sums taken over i = 1, ..., n    ...(7.22)

where αi denotes the firing level of the i-th rule, i = 1, ..., n.
Fig. 7.5 Inference with Larsen's product operation rule.
Fig. 7.6 Simplified fuzzy reasoning.
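A sketch of the simplified reasoning scheme (7.20)-(7.22); the triangular antecedents and the crisp consequents are invented for the example:

```python
def simplified_reasoning(rules, x0, y0):
    """rules: list of (A, B, c) with antecedent membership functions and a crisp
    consequent c; returns the weighted average of Eq. (7.22)."""
    num = den = 0.0
    for A, B, c in rules:
        alpha = min(A(x0), B(y0))   # firing level, Eq. (7.20)
        num += alpha * c
        den += alpha
    return num / den

# Triangular antecedents and crisp consequents assumed for illustration.
tri = lambda a, b, c: (lambda u: max(min((u - a)/(b - a), (c - u)/(c - b)), 0.0))
rules = [(tri(0, 2, 4), tri(0, 3, 6), 10.0),
         (tri(2, 5, 8), tri(3, 6, 9), 40.0)]
print(round(simplified_reasoning(rules, x0=3, y0=4), 2))
```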
QUESTION BANK.
1. What are the different approaches to evaluating relevant production rules? Explain them.
2. Explain the Mamdani inference mechanism.
3. Explain the Tsukamoto inference mechanism.
4. Explain the Sugeno inference mechanism.
5. Explain the Larsen inference mechanism.
6. Explain the simplified fuzzy reasoning scheme.
REFERENCES.
1. L.A. Zadeh, Fuzzy Iogic and approximate reasoning, Synthese, VoI. 30, No. 1, pp. 407-428, 1975.
2. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning I,
Information Sciences, VoI. 8, pp. 199-251, 1975.
3. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning II,
Information Sciences, VoI. 8, pp. 301-357, 1975.
4. L.A. Zadeh, The concept oI a Iinguistic variabIe and its appIication to approximate reasoning III,
InIormation sciences, VoI. 9, pp. 43-80, 1975.
5. B.R. Gaines, Foundations oI Iuzzy reasoning, International Journal of Man-machine Studies,
VoI. 8, No. 6, pp. 623-668, 1976.
6. E.H. Mamdani, Applications of fuzzy logic to approximate reasoning using linguistic systems,
IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 26, No. 12, pp. 1182-1191, 1977.
7. J.F. BaIdwin, Fuzzy Iogic and reasoning, International Journal of Man-machine Studies, VoI. 11,
No. 4, pp. 465-480, 1979.
8. E.H. Mamdani and B.R. Gaines, Fuzzy Reasoning and Its Applications, Academic Press,
London, 1981.
9. M. Sugeno and T. Takagi, MuItidi-mensionaI Iuzzy reasoning, Fuzzy Sets and Systems, VoI. 9,
No. 3, pp. 313-325, 1983.
10. W. Pedrycz, AppIications oI Iuzzy reIationaI equations Ior methods oI reasoning in presence oI
Iuzzy data, Fuzzy Sets and Systems, VoI. 16, No. 2, pp. 163-175, 1985.
11. H. Farreny and H. Prade, DeIauIt and inexact reasoning with possibiIity degrees, IEEE
1ransactions on Systems, Man and Cybernetics, VoI. 16, No. 2, pp. 270-276, 1986.
12. M.B. GorzaIczany, A method oI inIerence in approximate reasoning based on intervaI-vaIued
Iuzzy sets, Fuzzy Sets and Systems, VoI. 21, No. 1, pp. 1-17, 1987.
13. F. Sanchez and L.A. Zadeh, Approximate Reasoning in Intelligent Systems, Decision and
Control, Pergamon Press, OxIord, U.K, 1987.
14. I.B. Turksen, Approximate reasoning Ior production pIanning, Fuzzy Sets and Systems, VoI. 26,
No. 1, pp. 23-37, 1988.
15. I.B. Turksen, Four methods oI approximate reasoning with intervaI-vaIued Iuzzy sets,
International Journal of Approximate Reasoning, VoI. 3, No. 2, pp. 121-142, 1989.
16. A. Basu and A. Dutta, Reasoning with imprecise knowIedge to enhance inteIIigent decision
support, IEEE 1ransactions on Systems, man and cybernetics, VoI. 19, No. 4, pp. 756-770, 1989.
17. Z. Cao, A. KandeI and L. Li, A new modeI Ior Iuzzy reasoning, Fuzzy Sets and Systems, VoI. 36,
No. 3, pp. 311-325, 1990.
18. R. Kruse and F. Schwecke, Fuzzy reasoning in a muItidimensionaI space oI hypotheses,
International Journal of Approximate Reasoning, VoI. 4, No. 1, pp. 47-68, 1990.
19. C.Z. Luo and Z.P. Wang, Representation oI compositionaI reIations in Iuzzy reasoning, Fuzzy
Sets and Systems, VoI. 36, No. 1, pp. 77-81, 1990.
20. D. Dubois and H. Prade, Fuzzy sets in approximate reasoning, Part I: InIerence with possibiIity
distributions, Fuzzy Sets and Systems, VoI. 40, No. 1, pp. 143-202, 1991.
21. S. Dutta, Approximate spatiaI reasoning: Integrating quaIitative and quantitative constraints,
International Journal of Approximate Reasoning, VoI. 5, No. 3, pp. 307-330, 1991.
22. Z. Pawlak, Rough sets: Theoretical aspects of reasoning about data, Kluwer, Boston, 1991.
23. E.H. Ruspini, Approximate reasoning: past, present, future, Information Sciences, Vol. 57, pp.
297-317, 1991.
24. S.M. Chen, A new improved aIgorithm Ior inexact reasoning based on extended Iuzzy production
ruIes, Cybernetics and Systems, VoI. 23, No. 5, pp. 409-420, 1992.
25. D.L. Hudson, M.E. Cohen and M.F. Anderson, Approximate reasoning with IF-THEN-UNLESS
ruIe in a medicaI expert system, International Journal of Intelligent Systems, VoI. 7, No. 1, pp.
71-79, 1992.
26. H. Nakanishi, I.B. Turksen and M. Sugeno, A review and comparison oI six reasoning methods,
Fuzzy Sets and Systems, VoI. 57, No. 3, pp. 257-294, 1993.
27. Z. Bien and M.G. Chun, An inIerence network Ior bidirectionaI approximate reasoning based on
an equaIity measure, IEEE 1ransactions on Fuzzy Systems, VoI. 2, No. 2, pp. 177-180, 1994.
8
Fuzzy Logic Controllers
C H A P T E R
8.1 INTRODUCTION
Conventional controllers are derived from control theory techniques based on mathematical models of the open-loop process, called the system, to be controlled. The purpose of the feedback controller is to guarantee a desired response of the output y. The process of keeping the output y close to the set point (reference input) y*, despite the presence of disturbances of the system parameters and measurement noise, is called regulation. The output of the controller (which is the input of the system) is the control action u.
8.2 BASIC FEEDBACK CONTROL SYSTEM
The general form of the discrete-time control law is

u(k) = f(e(k), e(k−1), ..., e(k−τ), u(k−1), ..., u(k−τ))    ...(8.1)

providing a control action that describes the relationship between the input and the output of the controller. Here
e represents the error between the desired set point y* and the output of the system y,
the parameter τ defines the order of the controller, and
f is in general a non-linear function.
Fig. 8.1 A basic feedback control system.
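A toy closed-loop sketch of Fig. 8.1 (the first-order plant model, the gains and the particular control law below are invented for illustration, not taken from the book) shows how a discrete control law of the form (8.1) is applied step by step:

```python
def simulate(controller, setpoint, y0=0.0, steps=20):
    """Closed loop of Fig. 8.1 with an assumed first-order plant
    y(k+1) = 0.8*y(k) + 0.2*u(k)."""
    y, u_prev, e_prev = y0, 0.0, setpoint - y0
    for _ in range(steps):
        e = setpoint - y                      # error between set point y* and output y
        u = controller(e, e_prev, u_prev)     # u(k) = f(e(k), e(k-1), u(k-1)), order tau = 1
        y = 0.8 * y + 0.2 * u                 # plant response
        e_prev, u_prev = e, u
    return y

# An incremental (PI-like) control law with assumed gains
pi_like = lambda e, e_prev, u_prev: u_prev + 2.0 * (e - e_prev) + 0.5 * e
print(round(simulate(pi_like, setpoint=1.0), 3))
```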
8.3 FUZZY LOGIC CONTROLLER
L.A. Zadeh (1973) introduced the idea of formulating the control algorithm by logical rules. In a fuzzy logic controller (FLC), the dynamic behaviour of a fuzzy system is characterized by a set of linguistic description rules based on expert knowledge. The expert knowledge is usually of the form
IF (a set of conditions is satisfied) THEN (a set of consequences can be inferred).
Since the antecedents and the consequents of these IF-THEN rules are associated with fuzzy concepts (linguistic terms), they are often called fuzzy conditional statements.
In our terminology, a fuzzy control rule is a fuzzy conditional statement in which the antecedent is a condition in its application domain and the consequent is a control action for the system under control. Basically, fuzzy control rules provide a convenient way of expressing control policy and domain knowledge. Furthermore, several linguistic variables might be involved in the antecedents and the conclusions of these rules. When this is the case, the system will be referred to as a multi-input-multi-output (MIMO) fuzzy system.
8.3.1 Two-Input-Single-Output (TISO) Fuzzy Systems
For example, in the case of two-input-single-output fuzzy systems, fuzzy control rules have the form

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
. . .
also
Rn: if x is An and y is Bn then z is Cn

where x and y are the process state variables, z is the control variable, Ai, Bi, and Ci are linguistic values of the linguistic variables x, y and z in the universes of discourse U, V, and W, respectively, and an implicit sentence connective "also" links the rules into a rule set or, equivalently, a rule-base.
8.3.2 Mamdani Type of Fuzzy Logic Control
We can represent the FLC in a form similar to the conventional control law

u(k) = F(e(k), e(k−1), ..., e(k−τ), u(k−1), ..., u(k−τ))    ...(8.2)

where the function F is described by a fuzzy rule-base. However, it does not mean that the FLC is a kind of transfer function or difference equation.
The knowledge-based nature of the FLC dictates a limited usage of the past values of the error e and control u, because it is rather unreasonable to expect meaningful linguistic statements for e(k−3), e(k−4), ..., e(k−τ).
A typical FLC describes the relationship between the change of the control

Δu(k) = u(k) − u(k−1)    ...(8.3)

on the one hand, and the error e(k) and its change

Δe(k) = e(k) − e(k−1)    ...(8.4)

on the other hand. Such a control law can be formalized as

Δu(k) = F(e(k), Δe(k))    ...(8.5)

and is a manifestation of the general FLC expression with τ = 1.
The actual output of the controller u(k) is obtained from the previous value of control u(k−1), which is updated by Δu(k):

u(k) = u(k−1) + Δu(k)    ...(8.6)

This type of controller was suggested originally by Mamdani and Assilian in 1975 and is called the Mamdani-type FLC. A prototypical rule-base of a simple FLC realizing the control law above is listed in the following:
R1: if e is "positive" and Δe is "near zero" then Δu is "positive"
R2: if e is "negative" and Δe is "near zero" then Δu is "negative"
R3: if e is "near zero" and Δe is "near zero" then Δu is "near zero"
R4: if e is "near zero" and Δe is "positive" then Δu is "positive"
R5: if e is "near zero" and Δe is "negative" then Δu is "negative"
Fig. 8.2 Membership functions for the error.
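A minimal sketch of the incremental controller (8.5)-(8.6) built from the five rules above. For brevity the consequents "negative"/"near zero"/"positive" are represented here by singletons and combined with a weighted average, and the triangular membership functions are assumptions, so this is a simplification rather than the book's exact Mamdani procedure:

```python
def tri(a, b, c):
    return lambda u: max(min((u - a)/(b - a), (c - u)/(c - b)), 0.0)

N, ZE, P = tri(-2, -1, 0), tri(-1, 0, 1), tri(0, 1, 2)      # sets for e and delta-e
DU = {"negative": -1.0, "near zero": 0.0, "positive": 1.0}   # singleton outputs for delta-u

RULES = [(P, ZE, "positive"), (N, ZE, "negative"), (ZE, ZE, "near zero"),
         (ZE, P, "positive"), (ZE, N, "negative")]            # rules R1-R5

def delta_u(e, de):
    num = den = 0.0
    for A, B, label in RULES:
        alpha = min(A(e), B(de))          # firing level of the rule
        num += alpha * DU[label]
        den += alpha
    return num / den if den else 0.0

u = 0.0
for e, de in [(0.8, -0.1), (0.5, -0.3), (0.2, -0.3)]:   # assumed (e(k), delta-e(k)) samples
    u += delta_u(e, de)                   # u(k) = u(k-1) + delta-u(k), Eq. (8.6)
print(round(u, 3))
```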
So, our task is to find a crisp control action z0 from the fuzzy rule-base and from the actual crisp inputs x0 and y0:

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
. . . .
also
Rn: if x is An and y is Bn then z is Cn

input: x is x0 and y is y0
output: z0
Of course, the inputs of fuzzy rule-based systems should be given by fuzzy sets, and therefore we have to fuzzify the crisp inputs. Furthermore, the output of a fuzzy system is always a fuzzy set, and therefore to get a crisp value we have to defuzzify it.
Fig. 8.3 Fuzzy logic controller.
8.3.3 Fuzzy Logic Control Systems
Fuzzy logic control systems (Fig. 8.3) usually consist of four major parts:
Fuzzification interface,
Fuzzy rule-base,
Fuzzy inference machine, and
Defuzzification interface.
A fuzzification operator has the effect of transforming crisp data into fuzzy sets. In most of the cases we use fuzzy singletons as fuzzifiers

fuzzifier(x0) := x̄0    ...(8.7)

where x0 is a crisp input value from a process.
Fig. 8.4 Fuzzy singleton as fuzzifier.
Suppose now that we have two input variables x and y. A fuzzy control rule

Ri: if (x is Ai and y is Bi) then (z is Ci)

is implemented by a fuzzy implication Ri and is defined as

Ri(u, v, w) = [Ai(u) and Bi(v)] → Ci(w)    ...(8.8)

where the logical connective "and" is implemented by the minimum operator, i.e.

[Ai(u) and Bi(v)] → Ci(w) = [Ai(u) ∧ Bi(v)] → Ci(w) = min{Ai(u), Bi(v)} → Ci(w)    ...(8.9)
Of course, we can use any t-norm to model the logical connective "and". Fuzzy control rules are combined by using the sentence connective "also". Since each fuzzy control rule is represented by a fuzzy relation, the overall behaviour of a fuzzy system is characterized by these fuzzy relations. In other words, a fuzzy system can be characterized by a single fuzzy relation, namely the combination of the individual rule relations by means of the sentence connective "also".
Symbolically, if we have the collection of rules

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
. . .
also
Rn: if x is An and y is Bn then z is Cn

the procedure for obtaining the fuzzy output of such a knowledge base consists of the following three steps:
Find the firing level of each of the rules.
Find the output of each of the rules.
Aggregate the individual rule outputs to obtain the overall system output.
To infer the output z from the given process states x, y and fuzzy relations Ri, we apply the compositional rule of inference:

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
. . .
also
Rn: if x is An and y is Bn then z is Cn

input: x is x0 and y is y0
consequence: z is C
where the consequence is computed by

consequence = Agg(fact o R1, ..., fact o Rn)    ...(8.10)

That is,

C = Agg(x̄0 × ȳ0 o R1, ..., x̄0 × ȳ0 o Rn)    ...(8.11)

taking into consideration that

x̄0(u) = 0 for u ≠ x0    ...(8.12)

and

ȳ0(v) = 0 for v ≠ y0    ...(8.13)

The computation of the membership function of C is very simple:

C(w) = Agg{A1(x0) ∧ B1(y0) → C1(w), ..., An(x0) ∧ Bn(y0) → Cn(w)}    ...(8.14)

for all w ∈ W.
The procedure for obtaining the fuzzy output of such a knowledge base can be formulated as follows:
The firing level of the i-th rule is determined by

Ai(x0) ∧ Bi(y0)    ...(8.15)

The output of the i-th rule is calculated by

C'i(w) = Ai(x0) ∧ Bi(y0) → Ci(w) for all w ∈ W    ...(8.16)

The overall system output, C, is obtained from the individual rule outputs C'i by

C(w) = Agg{C'1, ..., C'n} for all w ∈ W    ...(8.17)
Example 8.1: If the implication is modelled by the minimum operator and the sentence connective "also" is interpreted as oring the rules, then the membership function of the consequence is computed as

C = (x̄0 × ȳ0 o R1) ∪ ... ∪ (x̄0 × ȳ0 o Rn)

That is,

C(w) = [A1(x0) ∧ B1(y0) ∧ C1(w)] ∨ ... ∨ [An(x0) ∧ Bn(y0) ∧ Cn(w)]

for all w ∈ W.
8.4 DEFUZZIFICATION METHODS
The output of the inference process so far is a fuzzy set, specifying a possibility distribution of control action. In on-line control, a nonfuzzy (crisp) control action is usually required. Consequently, one must defuzzify the fuzzy control action (output) inferred from the fuzzy control algorithm, namely:

z0 = defuzzifier(C)    ...(8.18)

where z0 is the nonfuzzy control output and defuzzifier is the defuzzification operator.
Defuzzification is a process to select a representative element from the fuzzy output C inferred from the fuzzy control algorithm. The most often used defuzzification operators are:
8.4.1 Center-of-Area/Gravity
The defuzzified value of a fuzzy set C is defined as its fuzzy centroid:

z0 = ∫ z C(z) dz / ∫ C(z) dz    ...(8.19)

The calculation of the Center-of-Area defuzzified value is simplified if we consider a finite universe of discourse W and thus a discrete membership function C(zj):

z0 = Σ zj C(zj) / Σ C(zj)    ...(8.20)
8.4.2 First-of-Maxima
The defuzzified value of a fuzzy set C is its smallest maximizing element, i.e.

z0 = min{z | C(z) = max_w C(w)}    ...(8.21)
Fig. 8.5 First-of-maxima defuzzification method.
8.4.3 Middle-of-Maxima
The defuzzified value of a discrete fuzzy set C is defined as the mean of all values of the universe of discourse having maximal membership grades:

z0 = (1/N) Σ zj,  the sum taken over j = 1, ..., N    ...(8.22)

where {z1, ..., zN} is the set of elements of the universe W which attain the maximum value of C.
If C is not discrete then the defuzzified value of a fuzzy set C is defined as

z0 = ∫_G z dz / ∫_G dz    ...(8.23)

where G denotes the set of maximizing elements of C.
Fig. 8.6 Middle-of-maxima defuzzification method.
8.4.4 Max-Criterion
This method chooses an arbitrary value from the set of maximizing elements of C, i.e.

z0 ∈ {z | C(z) = max_w C(w)}    ...(8.24)
8.4.5 Height Defuzzification
The elements of the universe of discourse W that have membership grades lower than a certain level α are completely discounted, and the defuzzified value z0 is calculated by applying the Center-of-Area method to those elements of W whose membership grades are not less than α:

z0 = ∫_[C]α z C(z) dz / ∫_[C]α C(z) dz    ...(8.25)

where [C]α denotes the α-level set of C as usual.
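The sketch below (an assumed bimodal fuzzy output C on an assumed output universe; NumPy) compares several of these defuzzification operators numerically:

```python
import numpy as np

w = np.linspace(0.0, 10.0, 1001)                 # output universe W
C = np.maximum.reduce([np.minimum(0.4, np.maximum(1 - np.abs(w - 3)/2, 0)),
                       np.minimum(0.8, np.maximum(1 - np.abs(w - 7)/2, 0))])
# C is an assumed clipped, bimodal fuzzy control output, used only to compare methods.

coa = np.trapz(w * C, w) / np.trapz(C, w)        # Center-of-Area/Gravity, Eq. (8.19)
fom = w[np.argmax(C)]                            # First-of-Maxima, Eq. (8.21)
mom = w[np.isclose(C, C.max())].mean()           # Middle-of-Maxima, Eq. (8.22)

level = 0.5                                      # Height defuzzification, Eq. (8.25)
mask = C >= level
height = np.trapz(w[mask] * C[mask], w[mask]) / np.trapz(C[mask], w[mask])

print(round(coa, 2), round(fom, 2), round(mom, 2), round(height, 2))
```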
Example 8.2: Consider a fuzzy controller steering a car in a way to avoid obstacles. If an obstacle occurs right ahead, the plausible control action depicted in Fig. 8.7 could be interpreted as "turn right or left".
Both the Center-of-Area and Middle-of-Maxima defuzzification methods result in a control action "drive ahead straightforward", which causes an accident.
A suitable defuzzification method would have to choose between the different control actions (choose one of the two triangles in the figure) and then transform the fuzzy set into a crisp value.
8.5 EFFECTIVITY OF FUZZY LOGIC CONTROL SYSTEMS
Using the Stone-Weierstrass theorem, Wang (1992) showed that fuzzy logic control systems of the form

Ri: if x is Ai and y is Bi then z is Ci,  i = 1, ..., n

with

Gaussian membership functions

Ai(u) = exp(−(u − αi1)²/(2σi1²))
Bi(v) = exp(−(v − αi2)²/(2σi2²))    ...(8.26)
Ci(w) = exp(−(w − αi3)²/(2σi3²))
Singleton fuzzifier

fuzzifier(x) := x̄,  fuzzifier(y) := ȳ    ...(8.27)

Product fuzzy conjunction

[Ai(u) and Bi(v)] = Ai(u) Bi(v)    ...(8.28)

Product fuzzy implication (Larsen implication)

[Ai(u) and Bi(v)] → Ci(w) = Ai(u) Bi(v) Ci(w)    ...(8.29)
Fig. 8.7 Undesired result by Center-of-Area and Middle-of-Maxima defuzzification methods.
Centroid defuzzification method

z = Σ αi3 Ai(x) Bi(y) / Σ Ai(x) Bi(y),  the sums taken over i = 1, ..., n    ...(8.30)

where αi3 is the center of Ci,

are universal approximators, i.e. they can approximate any continuous function on a compact set to arbitrary accuracy. Namely, he proved the following theorem.
Theorem 8.1: For a given real-valued continuous function g on the compact set U and arbitrary ε > 0, there exists a fuzzy logic control system with output function f such that

sup_{x ∈ U} ||g(x) − f(x)|| ≤ ε    ...(8.31)
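A small sketch of Wang's architecture (8.26)-(8.30); the rule parameters below are hand-picked assumptions, and the function simply evaluates the resulting input-output map at one point:

```python
import numpy as np

def fuzzy_system(x, y, params):
    """Gaussian antecedents, singleton fuzzifier, product inference and centroid
    defuzzification as in Eqs. (8.26)-(8.30).  `params` is a list of tuples
    (a1, s1, a2, s2, c): centres/widths of Ai and Bi, and the centre of Ci."""
    num = den = 0.0
    for a1, s1, a2, s2, c in params:
        weight = np.exp(-0.5*((x - a1)/s1)**2) * np.exp(-0.5*((y - a2)/s2)**2)
        num += c * weight
        den += weight
    return num / den

# Three hand-picked rules (all parameter values assumed for illustration)
params = [(0.0, 1.0, 0.0, 1.0, -1.0),
          (1.0, 1.0, 1.0, 1.0,  0.5),
          (2.0, 1.0, 2.0, 1.0,  2.0)]
print(round(fuzzy_system(0.5, 0.5, params), 3))
```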
Castro in 1995 showed that Mamdani's fuzzy logic controllers

Ri: if x is Ai and y is Bi then z is Ci,  i = 1, ..., n

with

Symmetric triangular membership functions

Ai(u) = 1 − |ai − u|/αi if |ai − u| ≤ αi, and 0 otherwise
Bi(v) = 1 − |bi − v|/βi if |bi − v| ≤ βi, and 0 otherwise    ...(8.32)
Ci(w) = 1 − |ci − w|/γi if |ci − w| ≤ γi, and 0 otherwise

Singleton fuzzifier

fuzzifier(x0) := x̄0    ...(8.33)

Minimum-norm fuzzy conjunction

[Ai(u) and Bi(v)] = min{Ai(u), Bi(v)}    ...(8.34)

Minimum-norm fuzzy implication

[Ai(u) and Bi(v)] → Ci(w) = min{Ai(u), Bi(v), Ci(w)}    ...(8.35)

Maximum t-conorm rule aggregation

Agg(R1, R2, ..., Rn) = max(R1, R2, ..., Rn)    ...(8.36)

Centroid defuzzification method

z = Σ ci min{Ai(x), Bi(y)} / Σ min{Ai(x), Bi(y)},  the sums taken over i = 1, ..., n    ...(8.37)

where ci is the center of Ci,

are also universal approximators.
QUESTION BANK.
1. What is a fuzzy logic controller?
2. Explain the two-input-single-output fuzzy system.
3. Explain the Mamdani type of fuzzy logic controller.
4. What are the various parts of a fuzzy logic control system? Explain them.
5. What are the various defuzzification methods? Explain them.
6. What is the effectivity of fuzzy logic control systems?
REFERENCES.
1. L.A. Zadeh, A rationale for fuzzy control, Journal of Dynamic Systems, Measurement and
Control, Vol. 94, No. 1, pp. 3-4, 1971.
2. E.H. Mamdani and S. AssiIian, An experiment in Iinguistic synthesis with a Iuzzy Iogic
controIIer, International Journal of Man, Machine Studies, VoI. 7, No. 1, pp. 1-13, 1975.
3. E.H. Mamdani, Advances in the Iinguistic synthesis oI Iuzzy controIIers, International Journal of
Machine Studies, VoI. 8, No. 6, pp. 669-678, 1976.
4. P.J. King and E.H. Mamdani, The appIication oI Iuzzy controI systems to industriaI process,
Automatica, VoI. 13, No. 3, pp. 235-242, 1977.
5. W.J.M. Kickert and E.H. Mamdani, AnaIysis oI a Iuzzy Iogic controIIer, Fuzzy sets and systems,
VoI. 1, No. 1, pp. 29-44, 1978.
6. M. Brase and D.A. RutherIord, SeIection oI parameters Ior a Iuzzy Iogic controIIer, Fuzzy Sets
and Systems, VoI. 2, No. 3, pp. 185-199, 1979.
7. C.C. Lee, SeIection oI parameters Ior a Iuzzy Iogic controIIer, Fuzzy Sets and Systems, VoI. 2,
No. 3, pp. 185-199, 1979.
8. E. CzogaIa and W. Pedrycz, ControI probIems in Iuzzy systems, Fuzzy Sets and Systems, VoI. 7,
No. 3, pp. 257-274, 1982.
9. E. CzogaIa and W. Pedrycz, Fuzzy ruIe generation Ior Iuzzy controI, Cybernetics and Systems,
VoI. 13, No. 3, pp. 275-293, 1982.
10. K.S. Ray and D. Dutta Majumdar, AppIication oI circIe criteria Ior stabiIity anaIysis oI Iinear
SISO and MIMO systems associated with Iuzzy Iogic controIIer, IEEE 1ransactions of on
Systems, Man and Cybernetics, VoI. 14, No. 2, pp. 345-349, 1984.
11. M. Sugeno, An introductory survey oI Iuzzy controI, Infromation Sciences, VoI. 36, No. 1, pp. 59-
83, 1985.
12. T. Takagi and M. Sugeno, Fuzzy identiIication oI systems and its appIications to modeIing and
controI, IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 15, No. 1, pp. 116-132, 1985.
13. M.M. Gupta, J.B. Kiszks and G.M. Trojan, MuItivariabIe structure oI Iuzzy controI systems,
IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 16, No. 5, pp. 638-656, 1986.
14. J.A. Bernard, Use oI ruIe-based system Ior process controI, IEEE Control Systems Magazine,
VoI. 8, No. 5, pp. 3-13, 1988.
15. B.P. Graham and R.B. NeweII, Fuzzy identiIication and controI oI a Iiquid IeveI rig, Fuzzy Sets
and Systems, VoI. 26, No. 3, pp. 255-273, 1988.
16. J.J. BuckIey, Fuzzy v/s non-Iuzzy controIIers, Control and Cybernetics, VoI. 18, No. 2, pp. 127-
130, 1989.
17. X.T. Peng, Generating ruIes Ior Iuzzy Iogic controIIers by Iunctions, Fuzzy Sets and Systems,
VoI. 36, No. 1, pp. 83-89, 1990.
18. J.F. Baldwin and N.C.F. Guild, Modeling controllers using fuzzy relations, Kybernetes, Vol. 9,
No. 3, pp. 223-229, 1991.
19. J.J. BuckIey, Theory oI the Iuzzy controIIer: an introduction, Fuzzy Sets and Systems, VoI. 51,
No. 3, pp. 249-258, 1992.
20. K. Tanaka and M. Sugeno, StabiIity anaIysis and design oI Iuzzy controI systems, Fuzzy Sets and
Systems, VoI. 45, No. 2, pp. 135-156, 1992.
21. G.M. AbdeInour, C.H. Chang, F.H. Huang and J.Y. Cheung, Design oI a Iuzzy controIIer using
input and output mapping Iactors, IEEE 1ransactions on Systems, Man and Cybernetics, VoI. 21,
No. 5, pp. 925-960, 1991.
22. F. BouIIama and A. Ichikawa, Fuzzy controI ruIes and their naturaI controI Iaws, Fuzzy Sets and
Systems, VoI. 48, No .1, pp. 65-86, 1992.
23. A. KandeI , L.H. Li and Z.Q. Cao, Fuzzy inIerence and its appIicabiIity to controI systems, Fuzzy
Sets and Systems, VoI. 48, No. 1, pp. 99-111, 1992.
24. R.R. Yager, A generaI approach to ruIe aggregation in Iuzzy Iogic controI, Applied Intelligence,
VoI. 2, No. 4, pp. 335-351, 1992.
25. C. Wong, C. Chou and D. Mon, Studies on the output oI Iuzzy controIIer with muItipIe inputs,
Fuzzy Sets and Systems, VoI. 57, No. 2, pp. 149-158, 1993.
26. R. Ragot and M. Lamotte, Fuzzy Iogic controI, International Journal of Systems Sciences, VoI.
24, No. 10, pp. 1825-1848, 1993.
27. B. Chung and J. Oh, ControI oI dynamic systems using Iuzzy Iearning aIgorithm, Fuzzy Sets and
Systems, VoI. 59, No. 1, pp. 1-14, 1993.
28. J.Q. Chen and L.J. Chen, Study on stabiIity oI Iuzzy cIosed-Ioop controI systems, Fuzzy Sets and
Systems, VoI. 57, No. 2, pp. 159-168, 1993.
29. N. KiupeI and P.M. Frank, Fuzzy controI oI steam turbines, Journal of Systems Science, VoI. 24,
No. 10, pp.1905-1914, 1993.
30. D.P. FiIev and R.R. Yagar, Three modeIs oI Iuzzy Iogic controIIers, Cybernetics and Systems,
VoI. 24, No. 2, pp. 91-114, 1993.
31. W. Pedrycz, Fuzzy controIIers: PrincipIes and architectures, Asia-Pacific Engineering Journal,
VoI. 3, No. 1, pp. 1-32, 1993.
32. J.Y. Han and V. Mc Murray, Two-Iayer muItipIe-variabIe Fuzzy Iogic controIIer, IEEE
1ransactions of Systems, Man and Cybernetics, VoI. 23, No. 1, pp. 277-285, 1993.
33. C.V. AItrock, H.O. Arend, B. Krause, C. SteIIess and E.B. RommIer, Adaptive Iuzzy controI
appIied to home heating system, Fuzzy Sets and Systems, VoI. 61, No. 1, pp. 29-36, 1994.
34. R.R. Yager, and D.P. FiIev, Essentials of Fuzzy Modeling and Control, John WiIey, New York,
1994.
35. A.J. Bugarin, S. Barro and R. Ruiz, Fuzzy controI architectures, Journal of Intelligent and Fuzzy
Systems, VoI.2, No.2, pp.125-146, 1994.
9
Fuzzy Logic Applications
C H A P T E R
9.1 WHY USE FUZZY LOGIC?
Here is a list of general observations about fuzzy logic:
1. Fuzzy logic is conceptually easy to understand.
The mathematical concepts behind fuzzy reasoning are very simple. What makes fuzzy nice is the
naturalness of its approach and not its far-reaching complexity.
2. Fuzzy logic is flexible.
With any given system, it's easy to massage it or layer more functionality on top of it without
starting again from scratch.
3. Fuzzy logic is tolerant of imprecise data.
Everything is imprecise if you look closely enough, but more than that, most things are imprecise
even on careful inspection. Fuzzy reasoning builds this understanding into the process rather than
tacking it onto the end.
4. Fuzzy logic can model nonlinear functions of arbitrary complexity.
You can create a fuzzy system to match any set of input-output data. This process is made
particularly easy by adaptive techniques like ANFIS (Adaptive Neuro-Fuzzy Inference Systems),
which are available in the Fuzzy Logic Toolbox.
5. Fuzzy logic can be built on top of the experience of experts.
In direct contrast to neural networks, which take training data and generate opaque, impenetrable
models, fuzzy logic lets you rely on the experience of people who already understand your
system.
6. Fuzzy logic can be blended with conventional control techniques.
Fuzzy systems don't necessarily replace conventional control methods. In many cases fuzzy
systems augment them and simplify their implementation.
7. Fuzzy logic is based on natural language.
The basis for fuzzy logic is the basis for human communication. This observation underpins
many of the other statements about fuzzy logic.
The last statement is perhaps the most important one and deserves more discussion. Natural
language, that which is used by ordinary people on a daily basis, has been shaped by thousands of years
of human history to be convenient and efficient. Sentences written in ordinary language represent a
triumph of efficient communication. We are generally unaware of this because ordinary language is, of
course, something we use every day. Since fuzzy logic is built on the structures of qualitative description used in everyday language, it is easy to use.
9.2 APPLICATIONS OF FUZZY LOGIC
Fuzzy logic deals with uncertainty in engineering by attaching degrees of certainty to the answer to a
logical question. Why should this be useful? The answer is commercial and practical. Commercially,
fuzzy logic has been used with great success to control machines and consumer products. In the right
application fuzzy logic systems are simple to design, and can be understood and implemented by non-
specialists in control theory.
In most cases someone with an intermediate technical background can design a fuzzy logic
controller. The control system will not be optimal but it can be acceptable. Control engineers also use it
in applications where the on-board computing is very limited and adequate control is enough. Fuzzy
logic is not the answer to all technical problems, but for control problems where simplicity and speed of
implementation is important then fuzzy logic is a strong candidate. A cross section of applications that
have successfully used fuzzy control includes:
1. Environmental
Air Conditioners
Humidifiers
2. Domestic Goods
Washing Machines/Dryers
Vacuum Cleaners
Toasters
Microwave Ovens
Refrigerators
3. Consumer Electronics
Television
Photocopiers
Still and Video Cameras Auto-focus, Exposure and Anti-shake
Hi-Fi Systems
4. Automotive Systems
Vehicle Climate Control
Automatic Gearboxes
Four-wheel Steering
Seat/Mirror Control Systems
9.3 WHEN NOT TO USE FUZZY LOGIC?
Fuzzy logic is not a cure-all. When should you not use fuzzy logic? Fuzzy logic is a convenient way to
map an input space to an output space. If you find it is not convenient, try something else. If a simpler
solution already exists, use it. Fuzzy logic is the codification of common sense-use common sense when
you implement it and you will probably make the right decision. Many controllers, for example, do a
fine job without using fuzzy logic. However, if you take the time to become familiar with fuzzy logic,
you will see it can be a very powerful tool for dealing quickly and efficiently with imprecision and non-
linearity.
9.4 FUZZY LOGIC MODEL FOR PREVENTION OF ROAD ACCIDENTS
Traffic accidents are rare and random. However, many people die or are injured because of traffic accidents all over the world. When the statistics are examined, India is the most dangerous country in terms of the number of traffic accidents among Asian countries. Many reasons contribute to these results, mainly driver fault, lack of infrastructure, environment, literacy, weather conditions, etc. The cost of traffic accidents is roughly 3% of the gross national product. However, it is generally agreed that this rate is higher in India, since many traffic accidents are not recorded, for example single-vehicle accidents or some accidents without injury or fatality.
In this study, using the fuzzy logic method, which has an increasing usage area in Intelligent Transportation Systems (ITS), a model was developed to maintain the vehicle pursuit distance automatically. Using the velocity of the vehicle and the pursuit distance, which can be measured with a sensor on the vehicle, a model has been established for the brake pedal (slowing down) by fuzzy logic.
9.4.1 Traffic Accidents And Traffic Safety
The general goal of traffic safety policy is to eliminate deaths and casualties in traffic.
This goal forms the background for the present traffic safety program. The program is partly based on
the assumption that high speed contributes to accidents. Many researchers support the idea of a positive
correlation between speed and traffic accidents. One way to reduce the number of accidents is to reduce
average speeds. Speed reduction can be accomplished by police surveillance, but also through physical
obstacles on the roads. Obstacles such as flower pots, road humps, small circulation points and elevated
pedestrian crossings are frequently found in many residential areas around India. However, physical
measures are not always appreciated by drivers. These obstacles can cause damages to cars, they can
cause difficulties for emergency vehicles, and in winter these obstacles can reduce access for snow
clearing vehicles. An alternative to these physical measures is different applications of Intelligent
Transportation Systems (ITS). The major objectives with ITS are to achieve traffic efficiency, by for
instance redirecting traffic, and to increase safety for drivers, pedestrians, cyclists and other traffic
groups.
One important aspect when planning and implementing traffic safety programs is therefore drivers' acceptance of different safety measures aimed at speed reduction. Another aspect is whether the individual's acceptance, when there is a certain degree of freedom of choice, might also be reflected in a higher acceptance of other measures, and whether acceptance of safety measures is also reflected in their perception of road traffic, and might reduce dangerous behaviour in traffic.
9.4.2 Fuzzy Logic Approach
The basic elements of each fuzzy logic system are, as shown in Figure 9.1, rules, fuzzifier, inference
engine, and defuzzifier. Input data are most often crisp values. The task of the fuzzifier is to map crisp
numbers into fuzzy sets (cases are also encountered where inputs are fuzzy variables described by fuzzy
membership functions). Models based on fuzzy logic consist of If-Then rules. A typical If-Then
rule would be:
If the ratio between the flow intensity and capacity of an arterial road is SMALL
Then vehicle speed in the flow is BIG
The fact following If is called a premise or hypothesis or antecedent. Based on this fact we can
infer another fact that is called a conclusion or consequent (the fact following Then). A set of a large
number of rules of the type:
If premise
Then conclusion is called a fuzzy rule base.
Fig. 9.1 Basic elements of a fuzzy logic.
In fuzzy rule-based systems, the rule base is formed with the assistance of human experts; recently,
numerical data has been used as well as through a combination of numerical data-human experts. An
interesting case appears when a combination of numerical information obtained from measurements
and linguistic information obtained from human experts is used to form the fuzzy rule base. In this case,
rules are extracted from numerical data in the first step. In the next step this fuzzy rule base can (but
need not) be supplemented with the rules collected from human experts. The inference engine of the
fuzzy logic maps fuzzy sets onto fuzzy sets. A large number of different inferential procedures are found
in the literature. In most papers and practical engineering applications, minimum inference or product
inference is used. During defuzzification, one value is chosen for the output variable. The literature also
contains a large number of different defuzzification procedures. The final value chosen is most often
either the value corresponding to the highest grade of membership or the coordinate of the center of
gravity.
9.4.3 Application
In the study, a model was established which estimates brake rate using fuzzy logic. The general
structure of the model is shown in Fig. 9.2.
9.4.4 Membership Functions
In the established model, different membership functions were formed for speed, distance and brake
rate. Membership functions are given in Figures 9.3, 9.4, and 9.5. For maximum allowable car speed (in
motorways) in India, speed scale selected as 0-120 km/h on its membership function. Because of the
fact that current distance sensors perceive approximately 100-150 m distance, distance membership
function is used 0-150 m scale. Brake rate membership function is used 0-100 scale for expressing
percent type.
Fig. 9.2 General structure of fuzzy logic model.
Fig. 9.3 Membership function of speed.
Fig. 9.4 Membership function of distance.
9.4.5 Rule Base
We need a rule base to run the fuzzy model. The Fuzzy Allocation Map (rules) of the model was constituted for the membership functions shown in the figures and is given in Table 9.1. Note that the rules were not written for every possible combination. Fig. 9.6 shows the relationship between the inputs, speed and distance, and the brake rate.
Table 9.1: Fuzzy allocation map of the model
Speed Distance Brake rate
LOW LOW LOW
LOW MEDIUM LOW
LOW HIGH MEDIUM
MEDIUM LOW MEDIUM
MEDIUM MEDIUM LOW
MEDIUM HIGH LOW
HIGH LOW HIGH
HIGH MEDIUM MEDIUM
HIGH HIGH LOW
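A possible implementation sketch of this rule base follows. The triangular membership break points and the singleton brake-rate outputs are assumptions read off Figs. 9.3-9.5, not values stated in the text:

```python
def tri(a, b, c):
    return lambda u: max(min((u - a)/(b - a), (c - u)/(c - b)), 0.0)

# Assumed triangular memberships on 0-120 km/h (speed) and 0-150 m (distance)
SPEED = {"LOW": tri(-1, 0, 60),  "MEDIUM": tri(0, 60, 120),  "HIGH": tri(60, 120, 121)}
DIST  = {"LOW": tri(-1, 0, 75),  "MEDIUM": tri(0, 75, 150),  "HIGH": tri(75, 150, 151)}
BRAKE = {"LOW": 10.0, "MEDIUM": 50.0, "HIGH": 90.0}   # assumed singleton outputs (percent)

RULES = [("LOW","LOW","LOW"), ("LOW","MEDIUM","LOW"), ("LOW","HIGH","MEDIUM"),
         ("MEDIUM","LOW","MEDIUM"), ("MEDIUM","MEDIUM","LOW"), ("MEDIUM","HIGH","LOW"),
         ("HIGH","LOW","HIGH"), ("HIGH","MEDIUM","MEDIUM"), ("HIGH","HIGH","LOW")]

def brake_rate(speed_kmh, distance_m):
    num = den = 0.0
    for s, d, b in RULES:                        # Table 9.1, row by row
        alpha = min(SPEED[s](speed_kmh), DIST[d](distance_m))
        num += alpha * BRAKE[b]
        den += alpha
    return num / den if den else 0.0

print(round(brake_rate(110.0, 20.0), 1))         # fast car, short pursuit distance
```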
9.4.6 Output
Fuzzy logic is also an estimation algorithm. For this model, various alternatives can be cross-examined using the developed model. Fig. 9.6 is an example of such a case.
9.4.7 Conclusions
Many people die or are injured because of traffic accidents in India. Many reasons contribute to these results, for example mainly driver fault, lack of infrastructure, environment, weather conditions, etc. In this study, a model was established for estimation of the brake rate using a fuzzy logic approach. The car brake rate is estimated with the developed model from speed and distance data. So, it can be said that this fuzzy logic approach can be effectively used to reduce the traffic accident rate. This model can be adapted to vehicles.
Fig. 9.5 Membership function of brake rate.
9.5 FUZZY LOGIC MODEL TO CONTROL ROOM TEMPERATURE
Although the behaviour of complex or nonlinear systems is difficult or impossible to describe using
numerical models, quantitative observations are often required to make quantitative control decisions.
These decisions could be the determination of a flow rate for a chemical process or a drug dosage in
medical practice. The form of the control model also determines the appropriate level of precision in the
result obtained. Numerical models provide high precision, but the complexity or non-linearity of a
process may make a numerical model unfeasible. In these cases, linguistic models provide an
alternative. Here the process is described in common language.
The linguistic model is built from a set of if-then rules, which describe the control model. Although
Zadeh was attempting to model human activities, Mamdani showed that fuzzy logic could be used to
develop operational automatic control systems.
9.5.1 The Mechanics of Fuzzy Logic
The mechanics of fuzzy mathematics involve the manipulation of fuzzy variables through a set of
linguistic equations, which can take the form of if-then rules. Much of the fuzzy literature uses set
theory notation, which obscures the ease of the formulation of a fuzzy controller. Although the
controllers are simple to construct, the proof of stability and other validations remain important topics.
The outline of fuzzy operations will be shown here through the design of a familiar room thermostat.
A fuzzy variable is one of the parameters of a fuzzy model, which can take one or more fuzzy
values, each represented by a fuzzy set and a word descriptor. The room temperature is the variable
shown in Fig. 9.7. Three fuzzy sets: hot, cold and comfortable have been defined by membership
distributions over a range of actual temperatures.
The power of a fuzzy model is the overlap between the fuzzy values. A single temperature value at
an instant in time can be a member of both of the overlapping sets. In conventional set theory, an object
(in this case a temperature value) is either a member of a set or it is not a member. This implies a crisp
Fig. 9.6 Relationship between inputs and brake rate.
boundary between the sets. In fuzzy logic, the boundaries between sets are blurred. In the overlap
region, an object can be a partial member of each of the overlapping sets. The blurred set boundaries
give fuzzy logic its name. By admitting multiple possibilities in the model, the linguistic imprecision is
taken into account.
The membership functions defining the three fuzzy sets shown in Fig. 9.7 are triangular. There are
no constraints on the specification of the form of the membership distribution. The Gaussian form from
statistics has been used, but the triangular form is commonly chosen, as its computation is simple. The
number of values and the range of actual values covered by each one are also arbitrary. Finer resolution
is possible with additional sets, but the computation cost increases.
Guidance for these choices is provided by Zadeh's Principle of Incompatibility: As the complexity
of a system increases, our ability to make precise and yet significant statements about its behaviour
diminishes until a threshold is reached beyond which precision and significance (or relevance) become
almost mutually exclusive characteristics.
The operation of a fuzzy controller proceeds in three steps. The first is fuzzification, where
measurements are converted into memberships in the fuzzy sets. The second step is the application of
the linguistic model, usually in the form of if-then rules. Finally the resulting fuzzy output is converted
back into physical values through a defuzzfication process.
9.5.2 Fuzzification
For a single measured value, the fuzzification process is simple, as shown in Fig. 9.7. The membership
functions are used to calculate the memberships in all of the fuzzy sets. Thus, a temperature of 15C
becomes three fuzzy values, 0.66 cold, 0.33 comfortable and 0.00 hot.
Fig. 9.7 Room temperature.
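A short fuzzification sketch; the triangular break points are assumptions read off Fig. 9.7, chosen so that a reading of 15°C reproduces the memberships quoted above:

```python
def tri(a, b, c):
    return lambda t: max(min((t - a)/(b - a), (c - t)/(c - b)), 0.0)

# Assumed break points: "cold" falls from 1 at 10 deg C to 0 at 25 deg C,
# "comfortable" rises over the same range, "hot" starts above 25 deg C.
cold, comfortable, hot = tri(-5, 10, 25), tri(10, 25, 40), tri(25, 40, 55)

t = 15.0
print(round(cold(t), 2), round(comfortable(t), 2), round(hot(t), 2))
# -> roughly 0.67 cold, 0.33 comfortable, 0.00 hot
```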
A series of measurements are collected in the form of a histogram and use this as the fuzzy input as
shown in Fig. 9.8. The fuzzy inference is extended to include the uncertainty due to measurement error
as well as the vagueness in the linguistic descriptions. In Fig. 9.8 the measurement data histogram is
normalized so that its peak is a membership value of 1.0 and it can be used as a fuzzy set. The
membership of the histogram in cold is given by: max {min [m
cold
(T), m
histogram
(T)]} where the
maximum and minimum operations are taken using the membership values at each point T over the
temperature range of the two distributions.
The minimum operation yields the overlap region of the two sets and the maximum operation gives
the highest membership in the overlap. The membership of the histogram in cold, indicated by the
arrow in Fig. 9.8, is 0.73. By similar operations, the membership of the histogram in comfortable and
hot are 0.40 and 0.00. It is interesting to note that there is no requirement that the sum of all
memberships be 1.00.
9.5.3 Rule Application
The linguistic model of a process is commonly made of a series of if-then rules. These use the
measured state of the process, the rule antecedents, to estimate the extent of control action, the rule
consequents. Although each rule is simple, there must be a rule to cover every possible combination of
fuzzy input values. Thus, the simplicity of the rules trades off against the number of rules. For complex
systems the number of rules required may be very large.
The rules needed to describe a process are often obtained through consultation with workers who
have expert knowledge of the process operation. These experts include the process designers, but more
Fig. 9.8 Fuzzification with measurement noise.
importantly, the process operators. The rules can include both the normal operation of the process as
well as the experience obtained through upsets and other abnormal conditions. Exception handling is a
particular strength of fuzzy control systems.
For very complex systems, the experts may not be able to identify their thought processes in
sufficient detail for rule creation. Rules may also be generated from operating data by searching for
clusters in the input data space. A simple temperature control model can be constructed from the
example of Fig. 9.7:
Rule 1 : IF (Temperature is Cold) THEN (Heater is On)
Rule 2 : IF (Temperature is Comfortable) THEN (Heater is Off)
Rule 3 : IF (Temperature is Hot) THEN (Heater is Off)
In Rule 1, (Temperature is Cold) is the membership value of the actual temperature in the cold set.
Rule 1 transfers the 0.66 membership in cold to become 0.66 membership in the heater setting on.
Similar values from rules 2 and 3 are 0.33 and 0.00 in the off setting for the heater. When several rules
give membership values for the same output set, Mamdani used the maximum of the membership
values. The result for the three rules is then 0.66 membership in on and 0.33 membership in off.
The rules presented in the above example are simple yet effective. To extend these to more complex
control models, compound rules may be formulated. For example, if humidity was to be included in the
room temperature control example, rules of the form: IF (Temperature is Cold) AND (Humidity is High)
THEN (Heater is ON) might be used. Zadeh defined the logical operators as AND = Min (m
A
, m
B
) and
OR = Max (m
A
, m
B
), where m
A
and m
B
are membership values in sets A and B respectively. In the above
rule, the membership in on will be the minimum of the two antecedent membership values. Zadeh also
defined the NOT operator by assuming that complete membership in the set A is given by m
A
= 1. The
membership in NOT (A) is then given by mNOT (A) = 1 m
A
. This gives the interesting result that A
AND NOT (A) does not vanish, but gives a distribution corresponding to the overlap between A and its
adjacent sets.
9.5.4 Defuzzification
The results of rule application are membership values in each of the consequent or output sets. These
can be used directly where the membership values are viewed as the strength of the recommendations
provided by the rules. It is possible that several outputs are recommended and some may be
contradictory (e.g. heater on and heater off). In automatic control, one physical value of a controller
output must be chosen from multiple recommendations. In decision support systems, there must be a
consistent method to resolve conflict and define an appropriate compromise. Defuzzification is the
process for converting fuzzy output values to a single value or final decision. Two methods are
commonly used.
The first is the maximum membership method. All of the output membership functions are
combined using the OR operator and the position of the highest membership value in the range of the
output variable is used as the controller output. This method fails when there are two or more equal
maximum membership values for different recommendations. Here the method becomes indecisive and
does not produce a satisfactory result.
The second method uses the center of gravity of the combined output distribution to resolve this
potential conflict and to consider all recommendations based on the strengths of their membership
values. The center of gravity is given by X_F = ∫ x μ(x) dx / ∫ μ(x) dx, where x is a point in the output range, μ(x) is the combined output membership and X_F is the final control value. These integrals are taken over the entire range of the output. By taking the center of gravity, conflicting rules essentially cancel and a fair weighting is obtained.
The output values used in the thermostat example are singletons. Singletons are fuzzy values with a
membership of 1.00 at a single value rather than a membership function between 0 and 1 defined over
an interval of values. In the example there were two, off at 0% power and on at 100% power. With
singletons, the center of gravity equation integrals become a simple weighted average. Applying the rules gave m_ON = 0.67 and m_OFF = 0.33. Defuzzifying these gives a control output of 67% power.
Although only two singleton output functions were used, with center of gravity defuzzification, the
heater power decreases smoothly between fully on and fully off as the temperature increases between
10C and 25C.
In the histogram input case, applying the same rules gave m_ON = 0.73 and m_OFF = 0.40. Center-of-gravity defuzzification gave, in this case, a heater power of 65%. The sum of the membership functions was normalized by the denominator of the center of gravity calculation.
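A compact sketch of the whole thermostat example (fuzzification, rules 1-3 and singleton centre-of-gravity defuzzification); the membership break points are the same assumed values used in the fuzzification sketch of Section 9.5.2:

```python
def tri(a, b, c):
    return lambda t: max(min((t - a)/(b - a), (c - t)/(c - b)), 0.0)

# Assumed membership functions (as in the earlier sketch) and singleton outputs
cold, comfortable, hot = tri(-5, 10, 25), tri(10, 25, 40), tri(25, 40, 55)
ON, OFF = 100.0, 0.0                     # heater power singletons, in percent

def thermostat(t):
    m_on = cold(t)                       # Rule 1: IF cold THEN heater on
    m_off = max(comfortable(t), hot(t))  # Rules 2-3 combined with max
    return (m_on * ON + m_off * OFF) / (m_on + m_off)   # centre of gravity of singletons

print(round(thermostat(15.0)))           # about 67, matching the example in the text
```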
9.5.5 Conclusions
Linguistic descriptions in the form of membership functions and rules make up the model. The rules are
generated a priori from expert knowledge or from data through system identification methods. Input
membership functions are based on estimates of the vagueness of the descriptors used. Output
membership functions can be initially set, but can be revised for controller tuning.
Once these are defined, the operating procedures for the calculations are well set out. Measurement
data are converted to memberships through fuzzification procedures. The rules are applied using
formalized operations to yield memberships in output sets. Finally, these are combined through
defuzzification to give a final control output.
9.6 FUZZY LOGIC MODEL FOR GRADING OF APPLES
Agricultural produce is subject to quality inspection for optimum evaluation in the consumption cycle.
Efforts to develop automated fruit classification systems have been increasing recently due to the
drawbacks of manual grading such as subjectivity, tediousness, labor requirements, availability, cost
and inconsistency.
However, applying automation in agriculture is not as simple as automating the industrial
operations. There are two main differences. First, the agricultural environment is highly variable, in
terms of weather, soil, etc. Second, biological materials, such as plants and commodities, display high
variation due to their inherent morphological diversity. Techniques used in industrial applications, such
as template matching and fixed object modeling are unlikely to produce satisfactory results in the
classification or control of input from agricultural products. Therefore, self-learning techniques such as
neural networks (NN) and fuzzy logic (FL) seem to represent a good approach.
Fuzzy logic can handle uncertainty, ambiguity and vagueness. It provides a means of translating
qualitative and imprecise information into quantitative (linguistic) terms. Fuzzy logic is a non-
parametric classification procedure, which can infer with nonlinear relations between input and output
categories, maintaining flexibility in making decisions even on complex biological systems.
Fuzzy logic was successfully used to determine field trafficability, to decide the transfer of dairy
cows between feeding groups, to predict the yield for precision farming, to control the start-up and shut-
down of food extrusion processes, to steer a sprayer automatically, to predict corn breakage, to manage
crop production, to reduce grain losses from a combine, to manage a food supply and to predict peanut
maturity.
The main purpose of this study was to investigate the applicability of fuzzy logic to constructing
and tuning fuzzy membership functions and to compare the accuracies of predictions of apple quality by
a human expert and the proposed fuzzy logic model. Grading of apples was performed in terms of
characteristics such as color, external defects, shape, weight and size. Readings of these properties were
obtained from different measurement apparatuses, assuming that the same measurements can be done
using a sensor fusion system in which measurements of features are collected and controlled
automatically. The following objectives were included in this study:
1. To design a FL technique to classify apples according to their external features developing
effective fuzzy membership functions and fuzzy rules for input and output variables based on
quality standards and expert expectations.
2. To compare the classification results from the FL approach and from sensory evaluation by a
human expert.
3. To establish a multi-sensor measuring system for quality features in the long term.
9.6.1 Apple Defects Used in the Study
No defect formation practices by applying forces on apples were performed. Only defects occurring
naturally or forcedly on apple surfaces during the growing season and handling operations were
accounted for in terms of number and size, ignoring their age. Scars, bitter pit, leaf roller, russeting,
punctures and bruises were among the defects encountered on the surfaces of Golden Delicious apples.
In addition to these defects, a size defect (lopsidedness) was also measured by taking the ratio of
maximum height of the apple to the minimum height.
9.6.2 Materials and Methods
Five quality features, color, defect, shape, weight and size, were measured. Color was measured using a
CR-200 Minolta colorimeter in the domain of L, a and b, where L is the lightness factor and a and b are
the chromaticity coordinates. Sizes of surface defects (natural and bruises) on apples were determined
using a special figure template, which consisted of a number of holes of different diameters. Size defects
were determined by measuring the maximum and minimum heights of apples using a Mitutoyo electronic
caliper. Maximum circumference measurement was performed using a Cranton circumference
measuring device. Weight was measured using an electronic scale. Programming for fuzzy membership
functions, fuzzification and defuzzification was done in Matlab.
The number of apples used was determined based on the availability of apples with quality features
of the 3 quality groups (bad, medium and good). A total of 181 golden delicious apples were graded first
by a human expert and then by the proposed fuzzy logic approach. The expert was trained on the
external quality criteria for good, medium and bad apple groups defined by USDA standards (USDA,
1976). The USDA standards for apple quality explicitly define the quality criteria so that it is quite
straightforward for an expert to follow up and apply them. Extremely large or small apples were already
excluded by the handling personnel. Eighty of the apples were kept at room temperature for 4 days
while another 80 were kept in a cooler (at about 3°C) for the same period to create color variation on the
surfaces of apples. In addition, 21 of the apples were harvested before the others and kept for 15 days at
room temperature for the same purpose of creating a variation in the appearance of the apples to be
tested.
The Hue angle (tan⁻¹(b/a)), which was used to represent the color of apples, was shown to be the best representation of human recognition of color. To simplify the problem, defects were collected under a single numerical value, Defect, after normalizing each defect component such as bruises, natural defects, russeting and size defects (lopsidedness).
Defect = 10 B + 5 ND + 3 R + 0.3 SD    ...(9.1)
where B is the amount of bruising, ND is the amount of natural defects, such as scars and leaf roller, as
total area (normalized), R is the total area of russeting defect (normalized) and SD is the normalized size
defect. Similarly, circumference, blush (reddish spots on the cheek of an apple) percentage and weight
were combined under Size using the same procedure as with Defect:
Size = 5 C + 3 W + 5 BL    ...(9.2)
where C is the circumference of the apple (normalized), W is weight (normalized) and BL is the normalized blush percentage. Coefficients used in the above equations were subjectively selected, based on the expert's expectations and USDA standards (USDA, 1976).
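As a minimal illustration of equations (9.1) and (9.2), the composite scores can be computed directly from normalized feature values; the variable names and the example values below are hypothetical, and the normalization step is assumed to map each raw measurement to a common scale.
% Hypothetical normalized measurements for one apple (assumed values)
B  = 0.2;   % bruising (normalized)
ND = 0.1;   % natural defects such as scars and leaf roller (normalized area)
R  = 0.05;  % russeting area (normalized)
SD = 0.3;   % size defect, i.e. lopsidedness (normalized)
C  = 0.8;   % circumference (normalized)
W  = 0.7;   % weight (normalized)
BL = 0.1;   % blush percentage (normalized)
% Composite scores as in equations (9.1) and (9.2)
defectScore = 10*B + 5*ND + 3*R + 0.3*SD;
sizeScore   = 5*C + 3*W + 5*BL;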
Although it was measured at the beginning, firmness was excluded from the evaluation, as it was
difficult for the human expert to quantify it nondestructively. After the combinations of features given
in the above equations, input variables were reduced to three: defect, size and color. Along with the
measurements of features, the apples were graded by the human expert into three quality groups, bad,
medium and good, depending on the expert's experience, expectations and USDA standards (USDA,
1976). Fuzzy logic techniques were applied to classify apples after measuring the quality features. The
grading performance of fuzzy logic proposed was determined by comparing the classification results
from FL and the expert.
9.6.3 Application of Fuzzy Logic
Three main operations were applied in the fuzzy logic decision making process: selection of fuzzy
inputs and outputs, formation of fuzzy rules, and fuzzy inference. A trial and error approach was used to
develop membership functions. Although triangular and trapezoidal functions were used in establishing
membership functions for defects and color (Fig. 9.9 and 9.10), an exponential function with the base
of the irrational number e was used to simulate the inclination of the human expert in grading apples in
terms of size (Fig. 9.11).
Size = e^x    ...(9.3)
where e is approximately 2.71828 and x is the value of the size feature.
Fig. 9.9 Membership functions for the defect feature (low, medium and high, over defect values from about 0.2 to 7.6).
Fig. 9.10 Membership functions for the color feature (yellow, greenish-yellow and green, over hue values from about 90 to 117).
Fig. 9.11 Membership functions for the size feature (small, medium and big, over size values from about 6.05 to 11.27).
9.6.4 Fuzzy Rules
At this stage, human linguistic expressions were involved in fuzzy rules. The rules used in the
evaluations of apple quality are given in Table 9.2. Two of the rules used to evaluate the quality of
Golden Delicious apples are given below:
If the color is greenish, there is no defect, and it is a well formed large apple, then quality is very good (rule Q1,1 in Table 9.2).
Table 9.2: Fuzzy rule tabulation

       C1+S1   C1+S2   C1+S3   C2+S1   C2+S2   C2+S3   C3+S1   C3+S2   C3+S3
D1     Q1,1    Q1,2    Q2,3    Q1,3    Q2,5    Q3,8    Q2,6    Q2,7    Q3,15
D2     Q2,1    Q2,2    Q3,3    Q2,4    Q3,6    Q3,9    Q3,11   Q3,13   Q3,16
D3     Q3,1    Q3,2    Q3,4    Q3,5    Q3,7    Q3,10   Q3,12   Q3,14   Q3,17
Where, C1 is the greenish color quality (desired), C2 is greenish-yellow color quality (medium), and C3 is yellow color quality (bad); S1, on the other hand, is well formed size (desired), S2 is moderately formed size (medium), and S3 is badly formed size (bad). Finally, D1 represents a low amount of defects (desired), while D2 and D3 represent moderate (medium) and high (bad) amounts of defects, respectively. For quality groups represented with Q in Table 9.2, the first subscript 1 stands for the best quality group, while 2 and 3 stand for the moderate and bad quality groups, respectively. The second subscript of Q shows the number of rules for the particular quality group, which ranges from 1 to 17 for the bad quality group.
If the color is pure yellow (overripe), there are a lot of defects, and it is a badly formed (small) apple, then quality is very bad (rule Q3,17 in Table 9.2).
A fuzzy set is defined by the expression below:
D = {(x, m_D(x)) | x ∈ X}    ...(9.4)
m_D(x): X → [0, 1]
where X represents the universal set, D is a fuzzy subset of X and m_D(x) is the membership function of fuzzy set D. The degree of membership for any set ranges from 0 to 1. A value of 1.0 represents 100% membership while a value of 0 means 0% membership. If there are three subgroups of size, then three memberships are required to express the size values in a fuzzy rule.
Three primary set operations in fuzzy logic are AND, OR and the complement, which are given as follows:
AND: m_C ∧ m_D = min {m_C, m_D}    ...(9.5)
OR: m_C ∨ m_D = max {m_C, m_D}    ...(9.6)
Complement: m_D̄ = 1 − m_D    ...(9.7)
The minimum method given by equation (9.5) was used to combine the membership degrees from each rule established. The minimum method chooses the most certain output among all the membership degrees. An example of the fuzzy AND (the minimum method) used in if-then rules to form the Q1,1 quality group in Table 9.2 is given as follows:
Q1,1 = (C1 ∧ S1 ∧ D1) = min {C1, S1, D1}    ...(9.8)
On the other hand, the fuzzy OR (the maximum method) rule was used in evaluating the results of
the fuzzy rules given in Table 9.2; determination of the quality group that an apple would belong to, for
instance, was done by calculating the most likely membership degree using equations 9.9 through 9.13.
If,
k1 = (Q1,1, Q1,2, Q1,3)    ...(9.9)
k2 = (Q2,1, Q2,2, Q2,3, Q2,4, Q2,5, Q2,6)    ...(9.10)
k3 = (Q3,1, Q3,2, Q3,3, Q3,4, Q3,5, Q3,6, Q3,7, Q3,8, Q3,9, Q3,10, Q3,11, Q3,12, Q3,13, Q3,14, Q3,15, Q3,16, Q3,17)    ...(9.11)
where k is the quality output group that contains different class membership degrees, then the output vector y given in equation (9.12) below determines the probabilities of belonging to a quality group for an input sample before defuzzification:
y = [max (k1)  max (k2)  max (k3)]    ...(9.12)
where, for example,
max (k1) = (Q1,1 ∨ Q1,2 ∨ Q1,3) = max {Q1,1, Q1,2, Q1,3}    ...(9.13)
and equation (9.13) produces the membership degree for the best class (Lee, 1990).
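A compact way to see equations (9.5) through (9.13) at work is the small MATLAB sketch below. It assumes the membership degrees of one apple in each color, size and defect class have already been computed from Figs. 9.9 to 9.11; the variable names and example values are hypothetical, and only the three rules of the best quality group are spelled out, following Table 9.2.
% Hypothetical membership degrees for one apple (assumed values)
C = [0.7 0.3 0.0];   % color:  C1 (green), C2 (greenish-yellow), C3 (yellow)
S = [0.6 0.4 0.0];   % size:   S1 (well formed), S2 (moderate), S3 (bad)
D = [0.8 0.2 0.0];   % defect: D1 (low), D2 (moderate), D3 (high)
% Rule strength for every color/size/defect combination (fuzzy AND, eqs. 9.5 and 9.8)
Q = zeros(3,3,3);
for c = 1:3
  for s = 1:3
    for d = 1:3
      Q(c,s,d) = min([C(c), S(s), D(d)]);
    end
  end
end
% The three rules assigned to the good quality group in Table 9.2:
% Q1,1 = C1+S1+D1, Q1,2 = C1+S2+D1, Q1,3 = C2+S1+D1
k1 = [Q(1,1,1), Q(1,2,1), Q(2,1,1)];
y1 = max(k1);        % fuzzy OR, eq. (9.13): membership degree of the best class
% k2 and k3 would be assembled from the remaining rules in the same way.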
9.6.5 Determination of Membership Functions
Membership functions are in general developed by using intuition and qualitative assessment of the
relations between the input variable(s) and output classes. Since more than one membership function may apply to a given input, which is in the nature of the fuzzy logic approach, the challenge is to assign input data to one or more of the overlapping membership functions. These functions can be defined
either by linguistic terms or numerical ranges, or both. The membership function used in this study for
defect quality in general is given in equation 9.4. The membership function for high amounts of defects,
for instance, was formed as given below:
If the input vector x is given as x = [defects, size, color], then the membership function for the class of a high amount of defects (D3) is
m(D3) = 0,                     when x(1) < 1.75
m(D3) = (x(1) − 1.75)/2.77,    when 1.75 ≤ x(1) ≤ 4.52    ...(9.14)
m(D3) = 1,                     when x(1) > 4.52
For a medium amount of defects (D2), the membership function is
m(D2) = 0,                     when the defect input x(1) < 0.24 or x(1) > 7.6
m(D2) = (x(1) − 0.24)/1.76,    when 0.24 ≤ x(1) ≤ 2    ...(9.15)
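The piecewise membership functions of equations (9.14) and (9.15) can be coded directly. The MATLAB sketch below is a minimal illustration; the function name is hypothetical and only the branches given above are implemented.
function m = memberHighDefect(x1)
% Membership in the high-defect class D3, following equation (9.14);
% x1 is the (combined) defect value, i.e. the first element of the input vector x.
if x1 < 1.75
    m = 0;
elseif x1 <= 4.52
    m = (x1 - 1.75)/2.77;
else
    m = 1;
end
end
% The medium-defect membership m(D2) of equation (9.15) follows the same pattern,
% using the breakpoints 0.24, the slope 1/1.76 and the upper cut-off 7.6.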
9.6.8 Conclusion
Fuzzy logic was successfully applied to serve as a decision support technique in grading apples. Grading
results obtained from fuzzy logic showed a good general agreement with the results from the human
expert, providing good flexibility in reflecting the expert's expectations and grading standards into the
results. It was also seen that color, defects and size are three important criteria in apple classification.
However, variables such as firmness, internal defects and some other sensory evaluations, in addition to
the features mentioned earlier, could increase the efficiency of decisions made regarding apple quality.
9.7 AN INTRODUCTORY EXAMPLE: FUZZY V/S NON-FUZZY
To illustrate the value of fuzzy logic, fuzzy and non-fuzzy approaches are applied to the same problem.
First the problem is solved using the conventional (non-fuzzy) method, writing MATLAB commands
that spell out linear and piecewise-linear relations. Then, the same system is solved using fuzzy logic.
Consider the tipping problem: what is the right amount to tip your waitperson? Given a number
between 0 and 10 that represents the quality of service at a restaurant (where 10 is excellent), what
should the tip be?
This problem is based on tipping as it is typically practiced in the United States. An average tip for
a meal in the U.S. is 15%, though the actual amount may vary depending on the quality of the service
provided.
9.7.1 The Non-Fuzzy Approach
Let's start with the simplest possible relationship (Fig. 9.13). Suppose that the tip always equals 15% of
the total bill.
tip = 0.15
Fig. 9.13 Constant tipping.
This does not really take into account the quality of the service, so we need to add a new term to the
equation. Since service is rated on a scale of 0 to 10, we might have the tip go linearly from 5% if the
service is bad to 25% if the service is excellent (Fig. 9.14). Now our relation looks like this:
tip = 0.20/10 * service + 0.05
Fig. 9.14 Linear tipping.
The formula does what we want it to do, and it is pretty straightforward. However, we may want the tip to reflect the quality of the food as well. This extension of the problem is defined as follows:
Given two sets of numbers between 0 and 10 (where 10 is excellent) that respectively represent the quality of the service and the quality of the food at a restaurant, what should the tip be? Let's see how the formula will be affected now that we've added another variable (Fig. 9.15). Suppose we try:
tip = 0.20/20 * (service + food) + 0.05
Fig. 9.15 Tipping depends on service and quality of food.
In this case, the results look pretty good, but when you look at them closely, they do not seem quite right.
Suppose you want the service to be a more important factor than the food quality. Let's say that the
service will account for 80% of the overall tipping grade and the food will make up the other 20%.
Try:
servRatio = 0.8;
tip = servRatio*(0.20/10*service + 0.05) + (1-servRatio)*(0.20/10*food + 0.05);
The response is still somewhat too uniformly linear. Suppose you want more of a flat response in the
middle, i.e., you want to give a 15% tip in general, and will depart from this plateau only if the service
is exceptionally good or bad (Fig. 9.16).
Fig. 9.16 Tipping with service treated as a more important factor than the food quality.
This, in turn, means that those nice linear mappings no longer apply. We can still salvage things by
using a piecewise linear construction (Fig. 9.17). Let's return to the one-dimensional problem of just
considering the service. You can string together a simple conditional statement using breakpoints like
this:
if service < 3,
    tip = (0.10/3)*service + 0.05;
elseif service < 7,
    tip = 0.15;
elseif service <= 10,
    tip = (0.10/3)*(service-7) + 0.15;
end
If we extend this to two dimensions (Fig. 9.18), where we take food into account again, we get
something like this:
servRatio = 0.8;
if service < 3,
    tip = ((0.10/3)*service + 0.05)*servRatio + (1-servRatio)*(0.20/10*food + 0.05);
elseif service < 7,
    tip = (0.15)*servRatio + (1-servRatio)*(0.20/10*food + 0.05);
else,
    tip = ((0.10/3)*(service-7) + 0.15)*servRatio + (1-servRatio)*(0.20/10*food + 0.05);
end
Fig. 9.17 Tipping using a piecewise linear construction.
Fig. 9.18 Tipping with two-dimensional variation.
The plot looks good, but the function is surprisingly complicated. It was a little tricky to code this
correctly, and it is definitely not easy to modify this code in the future. Moreover, it is even less apparent
how the algorithm works to someone who did not witness the original design process.
9.7.2 The Fuzzy Approach
It would be nice if we could just capture the essentials of this problem, leaving aside all the factors that
could be arbitrary. If we make a list of what really matters in this problem, we might end up with the
following rule descriptions:
1. If service is poor, then tip is cheap
2. If service is good, then tip is average
3. If service is excellent, then tip is generous
The order in which the rules are presented here is arbitrary. It does not matter which rules come
first. If we wanted to include the food's effect on the tip, we might add the following two rules:
4. If food is rancid, then tip is cheap
5. If food is delicious, then tip is generous
In fact, we can combine the two different lists of rules into one tight list of three rules like so:
1. If service is poor or the food is rancid, then tip is cheap
2. If service is good, then tip is average
3. If service is excellent or food is delicious, then tip is generous
These three rules are the core of our solution. And coincidentally, we have just defined the rules for
a fuzzy logic system. Now if we give mathematical meaning to the linguistic variables (what is an
average tip, for example?) we would have a complete fuzzy inference system. Of course, there's a lot
left to the methodology of fuzzy logic that we're not mentioning right now, things like:
How are the rules all combined?
How do I define mathematically what an average tip is?
The details of the method do not really change much from problem to problem - the mechanics of
fuzzy logic are not terribly complex. What matters is what we have shown in this preliminary
exposition: fuzzy is adaptable, simple, and easily applied.
Fig. 9.19 Tipping using fuzzy logic.
Here is the picture associated with the fuzzy system that solves this problem (Fig. 9.19). The
picture above was generated by the three rules above.
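For readers who want to experiment, a hand-rolled version of the three-rule fuzzy tipper can be sketched in a few lines of plain MATLAB. The membership breakpoints, the triangular/ramp shapes and the weighted-average defuzzification below are assumptions made for this sketch, not the exact system that produced Fig. 9.19.
% Minimal three-rule fuzzy tipper (membership breakpoints are assumed)
service = 8; food = 7;                    % example inputs on a 0-10 scale
% Simple membership helpers: a triangle and two ramps
tri     = @(x,a,b,c) max(min((x-a)/(b-a), (c-x)/(c-b)), 0);
rampdn  = @(x,a,b) max(min((b-x)/(b-a), 1), 0);   % 1 below a, falling to 0 at b
rampup  = @(x,a,b) max(min((x-a)/(b-a), 1), 0);   % 0 below a, rising to 1 at b
% Input memberships
poor = rampdn(service,0,5); good = tri(service,0,5,10); excellent = rampup(service,5,10);
rancid = rampdn(food,0,5);  delicious = rampup(food,5,10);
% Rule strengths (fuzzy OR = max for the combined rules)
w_cheap    = max(poor, rancid);
w_average  = good;
w_generous = max(excellent, delicious);
% Defuzzify by a weighted average of representative tips (5%, 15%, 25%)
tip = (0.05*w_cheap + 0.15*w_average + 0.25*w_generous) / ...
      (w_cheap + w_average + w_generous + eps);
The weighted-average step is a simplification of full Mamdani-style defuzzification, but it already shows the behaviour the three rules describe: a flat 15% region with smooth departures toward 5% and 25%.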
9.7.3 Some Observations
Here are some observations about the example so far. We found a piecewise linear relation that solved
the problem. It worked, but it was something of a nuisance to derive, and once we wrote it down as code,
it was not very easy to interpret. On the other hand, the fuzzy system is based on some common sense
statements. Also, we were able to add two more rules to the bottom of the list that influenced the shape
of the overall output without needing to undo what had already been done. In other words, the
subsequent modification was pretty easy.
Moreover, by using fuzzy logic rules, the maintenance of the structure of the algorithm decouples
along fairly clean lines. The notion of an average tip might change from day to day, city to city, country
to country, but the underlying logic remains the same: if the service is good, the tip should be average. You can
recalibrate the method quickly by simply shifting the fuzzy set that defines average without rewriting
the fuzzy rules.
You can do this sort of thing with lists of piecewise linear functions, but there is a greater likelihood
that recalibration will not be so quick and simple. For example, here is the piecewise linear tipping
problem slightly rewritten to make it more generic. It performs the same function as before, only now
the constants can be easily changed.
% Establish constants
lowTip = 0.05; averTip = 0.15; highTip = 0.25;
tipRange = highTip - lowTip;
badService = 0; okayService = 3;
goodService = 7; greatService = 10;
serviceRange = greatService - badService;
badFood = 0; greatFood = 10;
foodRange = greatFood - badFood;
% If service is poor or food is rancid, tip is cheap
if service < okayService,
    tip = (((averTip - lowTip)/(okayService - badService)) ...
        *service + lowTip)*servRatio + ...
        (1 - servRatio)*(tipRange/foodRange*food + lowTip);
% If service is good, tip is average
elseif service < goodService,
    tip = averTip*servRatio + (1 - servRatio)* ...
        (tipRange/foodRange*food + lowTip);
% If service is excellent or food is delicious, tip is generous
else,
    tip = (((highTip - averTip)/ ...
        (greatService - goodService))* ...
        (service - goodService) + averTip)*servRatio + ...
        (1 - servRatio)*(tipRange/foodRange*food + lowTip);
end
Notice the tendency here, as with all code, for creeping generality to render the algorithm more and
more opaque, threatening eventually to obscure it completely. What we are doing here is not that
complicated. True, we can fight this tendency to be obscure by adding still more comments, or perhaps
by trying to rewrite it in slightly more self-evident ways, but the medium is not on our side.
The truly fascinating thing to notice is that if we remove everything except for three comments,
what remain are exactly the fuzzy rules we wrote down before:
% If service is poor or food is rancid, tip is cheap
% If service is good, tip is average
% If service is excellent or food is delicious, tip is generous
If, as with a fuzzy system, the comment is identical with the code, think how much more likely your
code is to have comments! Fuzzy logic lets the language that's clearest to you, high-level comments,
also have meaning to the machine, which is why it is a very successful technique for bridging the gap
between people and machines.
QUESTION BANK.
1. Why use fuzzy logic?
2. What are the applications of fuzzy logic?
3. When should fuzzy logic not be used?
4. Compare non-fuzzy logic and fuzzy logic approaches.
REFERENCES.
1. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, pp. 338-353, 1965.
2. USDA Agricultural Marketing Service, United States Standards for Grades of Apples,
Washington, D.C., 1976.
3. W.J.M. Kickert and H.R. Van Nauta Lemke, Application of a fuzzy controller in a warm water plant, Automatica, Vol. 12, No. 4, pp. 301-308, 1976.
4. C.P. Pappis and E.H. Mamdani, A fuzzy logic controller for a traffic junction, IEEE Transactions
on Systems, Man and Cybernetics, Vol. 7, No. 10, pp. 707-717, 1977.
5. M. Sugeno and M. Nishida, Fuzzy control of model car, Fuzzy Sets and Systems, Vol. 16, No. 2,
pp. 103-113, 1985.
6. B.P. Graham and R.B. Newell, Fuzzy identification and control of a liquid level rig, Fuzzy Sets
and Systems, Vol. 26, No. 3, pp. 255-273, 1988.
7. E. Czogala and T. Rawlik, Modeling of a fuzzy controller with application to the control of
biological processes, Fuzzy Sets and Systems, Vol. 31, No. 1, pp. 13-22, 1989.
8. C.C. Lee, Fuzzy logic in control systems: Fuzzy logic controller- Part I and Part II, IEEE
Transactions on Systems, Man and Cybernetics, 20: 404-435, 1990.
9. S. Thangavadivelu and T.S. Colvin, Trafficability determination using fuzzy set theory,
Transactions of the ASAE, Vol. 34, No. 5, pp. 2272- 2278, 1991.
10. T. Tobi and T. Hanafusa, A practical application of fuzzy control for an air-conditioning system,
International Journal of Approximate Reasoning, Vol. 5, No. 3, pp. 331-348, 1991.
11. U. Ben-Hannan, K. Peleg and P.O. Gutman, Classification of fruits by a Boltzman perceptron
neural network, Automatica, Vol. 28, pp. 961-968, 1992.
12. R. Palm, Control of a redundant manipulator using fuzzy rules, Fuzzy Sets and Systems, Vol. 45,
No. 3, pp. 279-298, 1992.
13. Q. Yang, Classification of apple surface features using machine vision and neural networks,
Computer, Electron. Agriculture, Vol. 9, pp. 1-12, 1993.
14. J.J. Song and S. Park, A Fuzzy Dynamic Learning Controller for Chemical Process Control, Vol.
54, No. 2, pp. 121-133, 1993.
15. S. Kikuchi, V. Perincherry, P. Chakroborty and H. Takahasgi, Modeling of driver anxiety during
signal change intervals, Transportation research record, No. 1339, pp. 27-35, 1993.
16. N. Kiupel and P.M. Frank, Fuzzy control of steam turbines, International Journal of Systems
Science, Vol. 24, No. 10, pp. 1905-1914, 1993.
17. T.S. Liu and J.C. Wu, A model for rider-motorcycle system using fuzzy control, IEEE
Transactions on Systems, Man and Cybernetics, Vol. 23, No. 1, pp. 267-276, 1993.
18. J.R. Ambuel, T.S. Colvin and D.L. Karlen, A fuzzy logic yield simulator for prescription farming,
Transactions of the ASAE, Vol. 37, No. 6, pp. 1999-2009, 1994.
19. A. Hofaifar, B. Sayyarodsari and J.E. Hogans, Fuzzy controller robot arm trajectory, Information
Sciences: Applications, Vol. 2, No. 2, pp. 69-83, 1994.
20. S. Chen and E.G. Roger, Evaluation of cabbage seedling quality by fuzzy logic, ASAE Paper No.
943028, St. Joseph, MI, 1994.
21. P. Grinspan, Y. Edan, E.H. Kahn and E. Maltz, A fuzzy logic expert system for dairy cow transfer
between feeding groups, Transactions of the ASAE, Vol. 37, No. 5, pp. 1647-1654, 1994.
22. P.L. Chang and Y.C. Chen, A fuzzy multi-criteria decision making method for technology
transfer strategy selection in biotechnology, Fuzzy Sets and Systems, Vol. 63, No. 2, pp. 131-139,
1994.
23. A. Marell and K. Westin, Intelligent Transportation System and Traffic Safety Drivers Perception
and Acceptance of Electronic Speed Checkers, Transportation Research Part C, Vol. 7 pp. 131-
147, USA, 1999.
24. D. Teodorovic, Fuzzy Logic Systems for Transportation Engineering: The State Of The Art,
Transportation Research Part A, Vol. 33, pp. 337-364, USA, 1999.
25. R. Elvik, How much do road accidents cost the national economy? Accident Analysis and Prevention, Vol. 32, pp. 849-851, 2000.
26. M.A. Shahin, B.P. Verma, and E.W. Tollner, Fuzzy logic model for predicting peanut maturity,
Transactions of the ASAE, Vol. 43, No. 2, pp. 483-490, 2000.
CHAPTER 10
Neural Networks Fundamentals

10.1 INTRODUCTION
The artificial neural networks, which we describe in this book, are all variations on the parallel distributed processing (PDP) idea. The architecture of each network is based on very similar building blocks, which perform the processing. In this chapter we first discuss these processing units and the different network topologies. Learning strategies as a basis for an adaptive system will be presented in the last section.
10.2 BIOLOGICAL NEURAL NETWORK
The term 'neural network' comes from the intended analogy with the functioning of the human brain, adopting simplified models of the 'biological neural network'. The human brain consists of nearly 10^11 neurons (nerve cells) of different types. In a typical neuron, one can find the nucleus, with which the connections with other neurons are made through a network of fibres called dendrites. Extending out from the nucleus is the axon, which transmits, by means of a complex chemical process, electric potentials to the neurons with which the axon is connected (Fig. 10.1). When the signals received by a neuron become equal to or surpass their threshold values, it 'triggers' the sending of an electric signal of constant level and duration through the axon. In this way, the message is transferred from one neuron to the other.
In the neural network, the neurons or processing units may have several input paths corresponding to the dendrites. The units usually combine, by a simple summation, the weighted values of these paths (Fig. 10.2). The weighted value is passed to the neuron, where it is modified by a threshold function such as a sigmoid function. The modified value is directly presented to the next neuron.
10.3 A FRAMEWORK FOR DISTRIBUTED REPRESENTATION
An artificial network consists of a pool of simple processing units, which communicate by sending signals to each other over a large number of weighted connections. A set of major aspects of a parallel distributed model can be distinguished as:
• a set of processing units ('neurons', 'cells');
• a state of activation y_k for every unit, which is equivalent to the output of the unit;
• connections between the units. Generally each connection is defined by a weight w_jk which determines the effect which the signal of unit j has on unit k;
• a propagation rule, which determines the effective input s_k of a unit from its external inputs;
• an activation function F_k, which determines the new level of activation based on the effective input s_k(t) and the current activation y_k(t) (i.e., the update);
• an external input (aka bias, offset) θ_k for each unit;
• a method for information gathering (the learning rule);
• an environment within which the system must operate, providing input signals and, if necessary, error signals.
Fig. 10.1 Schematic representation of a biological neuron network (dendrite, cell body, nucleus, axon, myelin sheath, nerve ending, synapse).
Fig. 10.2 Schematic representation of a mathematical neuron network (inputs x_1, x_2, ..., x_i, ..., x_n with weights w_1j, w_2j, ..., w_ij, ..., w_nj).
Figure 10.3 illustrates these basics, some of which will be discussed in the next sections.
Fig. 10.3 The basic components of an artificial neural network. The propagation rule used here is the 'standard' weighted summation.
10.3.1 Processing Units
Each unit performs a relatively simple job: receive input from neighbours or external sources and use this to compute an output signal, which is propagated to other units. Apart from this processing, a second task is the adjustment of the weights. The system is inherently parallel in the sense that many units can carry out their computations at the same time.
Within neural systems it is useful to distinguish three types of units: input units (indicated by an index i) which receive data from outside the neural network, output units (indicated by an index o) which send data out of the neural network, and hidden units (indicated by an index h) whose input and output signals remain within the neural network.
During operation, units can be updated either synchronously or asynchronously. With synchronous updating, all units update their activation simultaneously; with asynchronous updating, each unit has a (usually fixed) probability of updating its activation at a time t, and usually only one unit will be able to do this at a time. In some cases the latter model has some advantages.
10.3.2 Connections between Units
In most cases we assume that each unit provides an additive contribution to the input of the unit with which it is connected. The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term θ_k:
s_k(t) = Σ_j w_jk(t) y_j(t) + θ_k(t)    ...(10.1)
The contribution for positive w_jk is considered as an excitation and for negative w_jk as inhibition. In some cases more complex rules for combining inputs are used, in which a distinction is made between excitatory and inhibitory inputs. We call units with a propagation rule (10.1) sigma units.
A different propagation rule, introduced by Feldman and Ballard, is known as the propagation rule for the sigma-pi unit and is given by
s_k(t) = Σ_j w_jk(t) Π_m y_jm(t) + θ_k(t)    ...(10.2)
Often, the y_jm are weighted before multiplication. Although these units are not frequently used, they have their value for gating of input, as well as for the implementation of lookup tables.
10.3.3 Activation and Output Rules
We also need a rule which gives the effect of the total input on the activation of the unit. We need a function F_k which takes the total input s_k(t) and the current activation y_k(t) and produces a new value of the activation of the unit k:
y_k(t + 1) = F_k(y_k(t), s_k(t))    ...(10.3)
Often, the activation function is a non-decreasing function of the total input of the unit:
y_k(t + 1) = F_k(s_k(t)) = F_k( Σ_j w_jk(t) y_j(t) + θ_k(t) )    ...(10.4)
although activation functions are not restricted to non-decreasing functions. Generally, some sort of threshold function is used: a hard limiting threshold function (a sgn function), or a linear or semi-linear function, or a smoothly limiting threshold (see Fig. 10.4). For this smoothly limiting function often a sigmoid (S-shaped) function like
y_k = F(s_k) = 1 / (1 + e^(−s_k))    ...(10.5)
is used. In some applications a hyperbolic tangent is used, yielding output values in the range [−1, 1].
Fig. 10.4 Various activation functions for a unit: sgn, semi-linear and sigmoid.
In some cases, the output of a unit can be a stochastic function of the total input of the unit. In that case the activation is not deterministically determined by the neuron input, but the neuron input determines the probability p that a neuron gets a high activation value:
p(y_k ← 1) = 1 / (1 + e^(−s_k/T))    ...(10.6)
in which T (temperature) is a parameter which determines the slope of the probability function.
In all networks we consider, the output of a neuron is taken to be identical to its activation level.
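As a small illustration of the propagation rule (10.1) and the sigmoid activation (10.5), the MATLAB fragment below computes the output of a single unit; the weights, inputs and bias values are made up for the example.
% Inputs, weights and bias of a single unit k (values assumed for illustration)
y_in  = [0.2; -0.5; 0.9];        % outputs y_j of the units feeding into k
w_jk  = [0.4;  0.1; -0.3];       % connection weights w_jk
theta = 0.05;                    % bias (offset) of unit k
s_k = w_jk' * y_in + theta;      % propagation rule, eq. (10.1)
y_k = 1 / (1 + exp(-s_k));       % sigmoid activation, eq. (10.5)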
10.4 NETWORK TOPOLOGIES
In the previous section we discussed the properties of the basic processing unit in an artificial neural network. This section focuses on the pattern of connections between the units and the propagation of data.
As for this pattern of connections, the main distinction we can make is between:
• Feed-forward networks, where the data flow from input to output units is strictly feed-forward. The data processing can extend over multiple (layers of) units, but no feedback connections are present, that is, connections extending from outputs of units to inputs of units in the same layer or previous layers.
• Recurrent networks that do contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process such that the network will evolve to a stable state in which these activations do not change anymore. In other applications, the changes of the activation values of the output neurons are significant, such that the dynamical behaviour constitutes the output of the network.
Classical examples of feed-forward networks are the Perceptron and Adaline, which will be discussed in the next chapter. Examples of recurrent networks have been presented by Anderson, Kohonen, and Hopfield and will be discussed in subsequent chapters.
10.5 TRAINING OF ARTIFICIAL NEURAL NETWORKS
A neural network has to be configured such that the application of a set of inputs produces (either 'directly' or via a relaxation process) the desired set of outputs. Various methods to set the strengths of the connections exist. One way is to set the weights explicitly, using a priori knowledge. Another way is to 'train' the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.
10.5.1 Paradigms of Learning
We can categorize the learning situations in two distinct sorts. These are:
• Supervised learning or Associative learning, in which the network is trained by providing it with input and matching output patterns. These input-output pairs can be provided by an external teacher, or by the system which contains the network (self-supervised).
• Unsupervised learning or Self-organization, in which an (output) unit is trained to respond to clusters of patterns within the input. In this paradigm the system is supposed to discover statistically salient features of the input population. Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather the system must develop its own representation of the input stimuli.
10.5.2 Modifying Patterns of Connectivity
Both learning paradigms discussed above result in an adjustment of the weights of the connections between units, according to some modification rule. Virtually all learning rules for models of this type can be considered as a variant of the Hebbian learning rule. The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes to modify the weight w_jk with
Δw_jk = γ y_j y_k    ...(10.7)
where γ is a positive constant of proportionality representing the learning rate. Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:
Δw_jk = γ y_j (d_k − y_k)    ...(10.8)
in which d_k is the desired activation provided by a teacher. This is often called the Widrow-Hoff rule or the delta rule, and will be discussed in the next chapter.
Many variants (often very exotic ones) have been published over the last few years. In the next chapters some of these update rules will be discussed.
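A one-line MATLAB sketch of each update rule, with a hypothetical learning rate and activations, makes the difference between equations (10.7) and (10.8) concrete.
gamma = 0.1;                       % learning rate (assumed)
y_j = 0.8; y_k = 0.6; d_k = 1.0;   % activations and desired activation (assumed)
dw_hebb   = gamma * y_j * y_k;           % Hebbian rule, eq. (10.7)
dw_widrow = gamma * y_j * (d_k - y_k);   % Widrow-Hoff (delta) rule, eq. (10.8)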
10.6 NOTATION AND TERMINOLOGY
10.6.1 Notation
We use the following notation in our formulae. Note that not all symbols are meaningful for all networks, and that in some cases subscripts or superscripts may be left out (e.g., p is often not necessary) or added (e.g., vectors can, contrariwise to the notation below, have indices) where necessary. Vectors are indicated with a bold non-slanted font:
j, k, ...  the units j, k, ...;
i  an input unit;
h  a hidden unit;
o  an output unit;
x^p  the p-th input pattern vector;
x_j^p  the j-th element of the p-th input pattern vector;
s^p  the input to a set of neurons when input pattern vector p is clamped (i.e., presented to the network); often: the input of the network by clamping input pattern vector p;
d^p  the desired output of the network when input pattern vector p was input to the network;
d_j^p  the j-th element of the desired output of the network when input pattern vector p was input to the network;
y^p  the activation values of the network when input pattern vector p was input to the network;
y_j^p  the activation value of element j of the network when input pattern vector p was input to the network;
W  the matrix of connection weights;
w_j  the weights of the connections which feed into unit j;
w_jk  the weight of the connection from unit j to unit k;
F_j  the activation function associated with unit j;
γ_jk  the learning rate associated with weight w_jk;
θ  the biases to the units;
θ_j  the bias input to unit j;
U_j  the threshold of unit j in F_j;
E^p  the error in the output of the network when input pattern vector p is input;
E  the energy of the network.
10.6.2 Terminology
Output vs. activation of a unit: Since there is no need to do otherwise, we consider the output and the activation value of a unit to be one and the same thing. That is, the output of each neuron equals its activation value.
Bias, offset, threshold: These terms all refer to a constant (i.e., independent of the network input but adapted by the learning rule) term which is input to a unit. They may be used interchangeably, although the latter two terms are often envisaged as a property of the activation function. Furthermore, this external input is usually implemented (and can be written) as a weight from a unit with activation value 1.
Number of layers: In a feed-forward network, the inputs perform no computation and their layer is therefore not counted. Thus a network with one input layer, one hidden layer, and one output layer is referred to as a network with two layers. This convention is widely though not yet universally used.
Representation vs. learning: When using a neural network one has to distinguish two issues which influence the performance of the system. The first one is the representational power of the network, the second one is the learning algorithm.
The representational power of a neural network refers to the ability of a neural network to represent a desired function. Because a neural network is built from a set of standard functions, in most cases the network will only approximate the desired function, and even for an optimal set of weights the approximation error is not zero.
The second issue is the learning algorithm. Given that there exists a set of optimal weights in the network, is there a procedure to (iteratively) find this set of weights?
QUESTION BANK.
1. What are the major aspects of a parallel distributed model?
2. Explain the biological neural network.
3. What are the basic components of an artificial neural network?
4. What are the network topologies?
5. What are the various activation functions? Explain them schematically.
6. What are the paradigms of neural network learning?
REFERENCES.
1. D.O. Hebb, The Organization of Behaviour, New York: Wiley, 1949.
2. B. Widrow, Generalization and Information Storage in Networks of Adaline Neurons, in Self Organizing Systems 1962, ed. M.C. Jovitz, G.T. Jacobi and G. Goldstein, Washington, D.C.: Spartan Books, pp. 435-461, 1962.
3. J.A. Anderson, Neural models with cognitive implications, in D. LaBerge and S.J. Samuels (Eds.), Basic Processes in Reading Perception and Comprehension Models, pp. 27-90, Hillsdale, NJ: Erlbaum, 1977.
4. T. Kohonen, Associative Memory: A System-Theoretical Approach, Springer-Verlag, 1977.
5. J.A. Feldman and D.H. Ballard, Connectionist models and their properties, Cognitive Science, Vol. 6, pp. 205-254, 1982.
6. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
7. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press, 1986.
8. B.A. Pearlmutter, Learning state space trajectories in recurrent neural networks, Neural Computation, Vol. 1, No. 2, pp. 263-269, 1989.
9. B.W. Mel, Connectionist Robot Motion Planning, San Diego, CA: Academic Press, 1990.
CHAPTER 11
Perceptron and Adaline

11.1 INTRODUCTION
This chapter describes single layer neural networks, including some of the classical approaches to the neural computing and learning problem. In the first part of this chapter we discuss the representational power of single layer networks and their learning algorithms and will give some examples of using the networks. In the second part we will discuss the representational limitations of single layer networks.
Two 'classical' models will be described in the first part of the chapter: the Perceptron, proposed by Rosenblatt, and the Adaline, presented by Widrow and Hoff.
11.2 NETWORKS WITH THRESHOLD ACTIVATION FUNCTIONS
A single layer feed-forward network consists of one or more output neurons o, each of which is connected with a weighting factor w_io to all of the inputs i. In the simplest case the network has only two inputs and a single output, as sketched in Fig. 11.1 (we leave the output index o out). The input of the neuron is the weighted sum of the inputs plus the bias term. The output of the network is formed by the activation of the output neuron, which is some function of the input:
Fig. 11.1 Single layer network with one output and two inputs (inputs x_1 and x_2 with weights w_1 and w_2, bias input +1 with weight θ, output y).
y = F( Σ_{i=1}^{2} w_i x_i + θ )    ...(11.1)
The activation function F can be linear, so that we have a linear network, or non-linear. In this section we consider the threshold (sgn) function:
F(s) = { +1  if s > 0
         −1  otherwise    ...(11.2)
The output of the network thus is either +1 or −1, depending on the input. The network can now be used for a classification task: it can decide whether an input pattern belongs to one of two classes. If the total input is positive, the pattern will be assigned to class +1, if the total input is negative, the sample will be assigned to class −1. The separation between the two classes in this case is a straight line, given by the equation:
w_1 x_1 + w_2 x_2 + θ = 0    ...(11.3)
The single layer network represents a linear discriminant function.
A geometrical representation of the linear threshold neural network is given in Fig. 11.2. Equation (11.3) can be written as
x_2 = −(w_1/w_2) x_1 − θ/w_2    ...(11.4)
and we see that the weights determine the slope of the line and the bias determines the 'offset', i.e. how far the line is from the origin. Note that also the weights can be plotted in the input space: the weight
Fig. 11.2 Geometric representation of the discriminant function and the weights.
vector is always perpendicular to the discriminant function.
Now that we have shown the representational power of the single layer network with linear threshold units, we come to the second issue: how do we learn the weights and biases in the network? We will describe two learning methods for these types of networks: the 'perceptron' learning rule and the 'delta' or 'LMS' rule. Both methods are iterative procedures that adjust the weights. A learning sample is presented to the network. For each weight the new value is computed by adding a correction to the old value. The threshold is updated in the same way:
w_i(t + 1) = w_i(t) + Δw_i(t)    ...(11.5)
θ(t + 1) = θ(t) + Δθ(t)    ...(11.6)
The learning problem can now be formulated as: how do we compute Δw_i(t) and Δθ(t) in order to classify the learning patterns correctly?
11.3 PERCEPTRON LEARNING RULE AND CONVERGENCE THEOREM
11.3.1 Perceptron Learning Rule
Suppose we have a set of learning samples consisting of an input vector x and a desired output d(x). For a classification task the d(x) is usually +1 or −1. The perceptron learning rule is very simple and can be stated as follows:
1. Start with random weights for the connections;
2. Select an input vector x from the set of training samples;
3. If y ≠ d(x) (the perceptron gives an incorrect response), modify all connections w_i according to: Δw_i = d(x)x_i;
4. Go back to 2.
Note that the procedure is very similar to the Hebb rule; the only difference is that, when the network responds correctly, no connection weights are modified. Besides modifying the weights, we must also modify the threshold θ. This θ is considered as a connection w_0 between the output neuron and a 'dummy' predicate unit which is always on: x_0 = 1. Given the perceptron learning rule as stated above, this threshold is modified according to:
Δθ = { 0      if the perceptron responds correctly
       d(x)   otherwise    ...(11.7)
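A minimal MATLAB sketch of the perceptron learning rule as stated above is given below. The training set, the zero initialization (the text suggests random weights) and the fixed number of sweeps are assumptions made for the illustration.
% A small two-class training set (assumed, linearly separable), targets in {-1,+1}
X = [ 2  1;  1  2; -1 -1; -2 -1];
d = [ 1;  1; -1; -1];
w = [0 0]; theta = 0;              % start from zero weights for simplicity
for sweep = 1:20                   % fixed number of sweeps (assumption)
  for n = 1:size(X,1)
    if sign(X(n,:)*w' + theta) ~= d(n)   % update only on incorrect responses
      w = w + d(n)*X(n,:);               % delta w_i = d(x) x_i
      theta = theta + d(n);              % delta theta = d(x)
    end
  end
end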
11.3.2 Convergence Theorem
For the learning rule there exists a convergence theorem, which states the following: "If there exists a set of connection weights w* which is able to perform the transformation y = d(x), the perceptron learning rule will converge to some solution (which may or may not be the same as w*) in a finite number of steps for any initial choice of the weights."
Proof: Given the fact that the length of the vector w* does not play a role (because of the sgn operation), we take ||w*|| = 1. Because w* is a correct solution, the value |w*·x|, where · denotes the dot or inner product, will be greater than 0, or: there exists a δ > 0 such that |w*·x| > δ for all inputs x.
Now define cos α = (w·w*)/||w||.
When, according to the perceptron learning rule, connection weights are modified at a given input x, we know that Δw = d(x)x, and the weight after modification is w′ = w + Δw. From this it follows that:
w′·w* = w·w* + d(x)·(w*·x)
      = w·w* + sgn(w*·x)·(w*·x)
      > w·w* + δ
||w′||² = ||w + d(x)x||²
        = ||w||² + 2d(x)(w·x) + ||x||²
        < ||w||² + ||x||²          (because d(x) = −sgn[w·x])
        = ||w||² + M
After t modifications we have:
w(t)·w* > w·w* + tδ
||w(t)||² < ||w||² + tM
such that
cos α(t) = (w*·w(t))/||w(t)|| > (w*·w + tδ)/√(||w||² + tM)
From this it follows that
lim_{t→∞} cos α(t) = lim_{t→∞} (δ/√M)·√t = ∞
while cos α ≤ 1.
The conclusion is that there must be an upper limit t_max for t: the system modifies its connections only a limited number of times. In other words, after maximally t_max modifications of the weights the perceptron is correctly performing the mapping. t_max will be reached when cos α = 1. If we start with connections w = 0,
t_max = M/δ²    ...(11.8)
Example 11.1: A perceptron is initialized with the following weights: w_1 = 1, w_2 = 2, θ = −2. The perceptron learning rule is used to learn a correct discriminant function for a number of samples, sketched in Fig. 11.3.
The first sample A, with values x = (0.5, 1.5) and target value d(x) = +1, is presented to the network. From equation (11.1) it can be calculated that the network output is +1, so no weights are adjusted. The same is the case for point B, with values x = (−0.5, 0.5) and target value d(x) = −1; the network output is negative, so no change. When presenting point C with values x = (0.5, 0.5) the network output will be −1, while the target value d(x) = +1.
According to the perceptron learning rule, the weight changes are:
Δw_1 = 0.5, Δw_2 = 0.5, Δθ = 1.
The new weights are now:
w_1 = 1.5, w_2 = 2.5, θ = −1,
and sample C is classified correctly.
In Fig. 11.3 the discriminant function before and after this weight update is shown.
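The numbers in Example 11.1 can be checked directly with equation (11.1); the short MATLAB snippet below, written for this text as an illustration, evaluates point C before and after the weight update.
xC = [0.5 0.5]; dC = 1;                 % sample C and its target
w = [1 2]; theta = -2;                  % initial weights and threshold
y_before = sign(xC*w' + theta);         % gives -1: misclassified
w = w + dC*xC; theta = theta + dC;      % perceptron update for sample C
y_after  = sign(xC*w' + theta);         % gives +1: correctly classified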
11.4 ADAPTIVE LINEAR ELEMENT (Adaline)
An important generalisation of the perceptron training algorithm was presented by Widrow and Hoff as the 'least mean square' (LMS) learning procedure, also known as the delta rule. The main functional difference with the perceptron training rule is the way the output of the system is used in the learning rule. The perceptron learning rule uses the output of the threshold function (either +1 or −1) for learning. The delta rule uses the net output without further mapping into output values +1 or −1.
The learning rule was applied to the 'adaptive linear element', also named Adaline, developed by Widrow and Hoff. In a simple physical implementation (Fig. 11.4) this device consists of a set of controllable resistors connected to a circuit which can sum up currents caused by the input voltage signals. Usually the central block, the summer, is also followed by a quantiser, which outputs either +1 or −1, depending on the polarity of the sum.
Although the adaptive process is here exemplified in a case when there is only one output, it may be clear that a system with many parallel outputs is directly implementable by multiple units of the above kind.
If the input conductances are denoted by w_i, i = 0, 1, ..., n, and the input and output signals by x_i and y, respectively, then the output of the central block is defined to be
Fig. 11.3 Discriminant function before and after weight update.
y = Σ_{i=1}^{n} w_i x_i + θ    ...(11.9)
where θ = w_0. The purpose of this device is to yield a given value y = d^p at its output when the set of values x_i^p, i = 1, 2, ..., n, is applied at the inputs. The problem is to determine the coefficients w_i, i = 0, 1, ..., n, in such a way that the input-output response is correct for a large number of arbitrarily chosen signal sets. If an exact mapping is not possible, the average error must be minimised, for instance, in the sense of least squares. An adaptive operation means that there exists a mechanism by which the w_i can be adjusted, usually iteratively, to attain the correct values. For the Adaline, Widrow introduced the delta rule to adjust the weights.
11.5 THE DELTA RULE
For a single layer network with an output unit with a linear activation function the output is simply given by
y = Σ_j w_j x_j + θ    ...(11.10)
Such a simple network is able to represent a linear relationship between the value of the output unit and the value of the input units. By thresholding the output value, a classifier can be constructed (such as Adaline), but here we focus on the linear relationship and use the network for a function approximation task. In high dimensional input spaces the network represents a (hyper)plane and it will be clear that also multiple output units may be defined.
Suppose we want to train the network such that a hyperplane is fitted as well as possible to a set of training samples consisting of input values x^p and desired (or target) output values d^p. For every given input sample, the output of the network differs from the target value d^p by (d^p − y^p), where y^p is the actual output for this pattern. The delta rule now uses a cost- or error-function based on these differences to adjust the weights.
Fig. 11.4 The adaline.
The error function, as indicated by the name least mean square, is the summed squared error. That is, the total error E is defined to be
E = Σ_p E^p = ½ Σ_p (d^p − y^p)²    ...(11.11)
where the index p ranges over the set of input patterns and E^p represents the error on pattern p. The LMS procedure finds the values of all the weights that minimize the error function by a method called gradient descent. The idea is to make a change in the weight proportional to the negative of the derivative of the error as measured on the current pattern with respect to each weight:
Δ_p w_j = −γ ∂E^p/∂w_j    ...(11.12)
where γ is a constant of proportionality. The derivative is
∂E^p/∂w_j = (∂E^p/∂y^p)·(∂y^p/∂w_j)    ...(11.13)
Because of the linear units, eq. (11.10),
∂y^p/∂w_j = x_j    ...(11.14)
and
∂E^p/∂y^p = −(d^p − y^p)    ...(11.15)
such that
Δ_p w_j = γ δ^p x_j    ...(11.16)
where δ^p = d^p − y^p is the difference between the target output and the actual output for pattern p.
The delta rule modifies weights appropriately for target and actual outputs of either polarity and for both continuous and binary input and output units. These characteristics have opened up a wealth of new applications.
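A minimal MATLAB sketch of the LMS/delta rule for a single linear unit is given below; the data, the learning rate and the number of epochs are assumptions made for the illustration.
% Fit y = w*x + theta to noisy samples of a line (assumed data)
x = linspace(0, 1, 20);
d = 2*x - 0.5 + 0.05*randn(size(x));   % targets with a little noise
w = 0; theta = 0; gamma = 0.1;         % initial weights and learning rate (assumed)
for epoch = 1:200
  for p = 1:numel(x)
    y = w*x(p) + theta;                % linear unit output, eq. (11.10)
    delta = d(p) - y;                  % error delta^p = d^p - y^p
    w = w + gamma*delta*x(p);          % gradient descent step, eq. (11.16)
    theta = theta + gamma*delta;
  end
end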
11.6 EXCLUSIVE-OR PROBLEM
In the previous sections we have discussed two learning algorithms for single layer networks, but we have not discussed the limitations on the representation of these networks.
Table 11.1 Exclusive-or truth table.
x_1    x_2    d
−1     −1     −1
−1      1      1
 1     −1      1
 1      1     −1
One of Minsky and Papert's most discouraging results shows that a single layer perceptron cannot represent a simple exclusive-or function. Table 11.1 shows the desired relationships between inputs and output units for this function.
In a simple network with two inputs and one output, as depicted in Fig. 11.1, the net input is equal to:
s = w_1 x_1 + w_2 x_2 + θ    ...(11.17)
According to eq. (11.1), the output of the perceptron is zero when s is negative and equal to one when s is positive. In Fig. 11.5 a geometrical representation of the input domain is given. For a constant θ, the output of the perceptron is equal to one on one side of the dividing line which is defined by:
w_1 x_1 + w_2 x_2 = −θ    ...(11.18)
and equal to zero on the other side of this line.
To see that such a solution cannot be found, take a look at Fig. 11.5. The input space consists of four points, and the two solid circles at (1, −1) and (−1, 1) cannot be separated by a straight line from the two open circles at (−1, −1) and (1, 1). The obvious question to ask is: how can this problem be overcome? Minsky and Papert prove that for binary inputs, any transformation can be carried out by adding a layer of predicates which are connected to all inputs. The proof is given in the next section.
Fig. 11.5 Geometric representation of the input space for the AND, OR and XOR functions.
For the specific XOR problem we geometrically show that by introducing hidden units, thereby extending the network to a multi-layer perceptron, the problem can be solved. Fig. 11.6a demonstrates that the four input points are now embedded in a three-dimensional space defined by the two inputs plus the single hidden unit. These four points are now easily separated by a linear manifold (plane) into two groups, as desired. This simple example demonstrates that adding hidden units increases the class of problems that are soluble by feed-forward, perceptron-like networks. However, by this generalization of the basic architecture we have also incurred a serious loss: we no longer have a learning rule to determine the optimal weights.
(a) The perceptron of Fig. 11.1 with an extra hidden unit. With the indicated values of the weights w_ij (next to the connecting lines) and the thresholds θ_i (in the circles) this perceptron solves the XOR problem. (b) This is accomplished by mapping the four points of Fig. 11.6 onto the four points indicated here; clearly, separation (by a linear manifold) into the required groups is now possible.
Fig. 11.7 Solution of the XOR problem.
11.7 MULTI-LAYER PERCEPTRONS CAN DO EVERYTHING
In the previous section we showed that by adding an extra hidden unit, the XOR problem can be solved. For binary units, one can prove that this architecture is able to perform any transformation given the correct connections and weights. The most primitive proof is the following. For a given transformation y = d(x), we can divide the set of all possible input vectors into two classes:
X⁺ = {x | d(x) = 1}  and  X⁻ = {x | d(x) = −1}    ...(11.19)
Since there are N input units, the total number of possible input vectors x is 2^N. For every x^p ∈ X⁺ a hidden unit h can be reserved of which the activation y_h^p is 1 if and only if the specific pattern p is present at the input: we can choose its weights w_ih equal to the specific pattern x^p and the bias θ_h equal to 1 − N such that
y_h^p = sgn( Σ_i w_ih x_i^p + 1 − N )    ...(11.20)
is equal to 1 for x^p = w_h only. Similarly, the weights to the output neuron can be chosen such that the output is one as soon as one of the M predicate neurons is one:
y_o^p = sgn( Σ_{h=1}^{M} y_h^p + M − ½ )    ...(11.21)
This perceptron will give y_o = 1 only if x ∈ X⁺: it performs the desired mapping. The problem is the large number of predicate units, which is equal to the number of patterns in X⁺, which is maximally 2^N. Of course we can do the same trick for X⁻, and we will always take the minimal number of mask units, which is maximally 2^(N−1). A more elegant proof is given by Minsky and Papert, but the point is that for complex transformations the number of required units in the hidden layer is exponential in N.
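To make the geometric argument concrete, the following MATLAB fragment checks one particular two-layer threshold network on the XOR table of Table 11.1. The weights are a choice made for this sketch, not necessarily those shown in Fig. 11.7.
% XOR truth table with inputs and targets in {-1,+1}
X = [-1 -1; -1 1; 1 -1; 1 1];
d = [-1; 1; 1; -1];
for n = 1:4
  x1 = X(n,1); x2 = X(n,2);
  h1 = sign(x1 + x2 - 1.5);            % hidden unit: +1 only for (1,1)
  h2 = sign(x1 + x2 + 1.5);            % hidden unit: +1 unless input is (-1,-1)
  y  = sign(h2 - h1 - 0.5);            % output unit combining the hidden units
  fprintf('x = (%2d,%2d)  y = %2d  target = %2d\n', x1, x2, y, d(n));
end
Running the loop reproduces the XOR targets for all four input points, which a single threshold unit, as shown above, cannot do.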
QUESTION BANK.
1. Explain a single layer neural network with one output and two inputs.
2. Describe the perceptron learning rule.
3. Derive the convergence theorem for the perceptron learning rule.
4. Explain the Adaline neural network.
5. Explain the delta rule used to adjust the weights of the Adaline network.
6. A single layer perceptron cannot represent exclusive-OR. Justify this statement.
7. What are the advantages of the multilayer perceptron over the single layer perceptron?
REFERENCES.
1. F. Rosenblatt, Principles of Neurodynamics, New York: Spartan Books, 1959.
2. B. Widrow and M.E. Hoff, Adaptive Switching Circuits, in 1960 IRE WESCON Convention Record, 1960.
3. D.O. Hebb, The Organization of Behaviour, New York: Wiley, 1949.
4. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.
CHAPTER 12
Back-Propagation
12. 1 INTRODUCTION
As we have seen in the previous chapter, a single-layer network has severe restrictions: the class of tasks that can be accomplished is very limited. In this chapter we will focus on feed-forward networks with layers of processing units.
Minsky and Papert showed in 1969 that a two-layer feed-forward network can overcome many restrictions, but did not present a solution to the problem of how to adjust the weights from input to hidden units. An answer to this question was presented by Rumelhart, Hinton and Williams in 1986, and similar solutions appeared to have been published earlier (Parker, 1985; Cun, 1985).
The central idea behind this solution is that the errors for the units of the hidden layer are determined by back-propagating the errors of the units of the output layer. For this reason the method is often called the back-propagation learning rule. Back-propagation can also be considered as a generalization of the delta rule for non-linear activation functions and multi-layer networks.
12.2 MULTI-LAYER FEED-FORWARD NETWORKS
A feed-forward network has a layered structure. Each layer consists of units, which receive their input from units in a layer directly below and send their output to units in a layer directly above the unit. There are no connections within a layer. The N_i inputs are fed into the first layer of N_h,1 hidden units. The input units are merely 'fan-out' units; no processing takes place in these units. The activation of a hidden unit is a function F_i of the weighted inputs plus a bias, as given in eq. (10.4). The output of the hidden units is distributed over the next layer of N_h,2 hidden units, until the last layer of hidden units, of which the outputs are fed into a layer of N_o output units (see Fig. 12.1).
Although back-propagation can be applied to networks with any number of layers, just as for networks with binary units (section 11.7) it has been shown (Cybenko, 1989; Funahashi, 1989; Hornik, Stinchcombe, & White, 1989; Hartman, Keeler, & Kowalski, 1990) that only one layer of hidden units suffices to approximate any function with finitely many discontinuities to arbitrary precision, provided the activation functions of the hidden units are non-linear (the universal approximation theorem).
In most applications a feed-forward network with a single layer of hidden units is used with a sigmoid activation function for the units.
12.3 THE GENERALISED DELTA RULE
Since we are now using units with nonlinear activation functions, we have to generalise the delta rule, which was presented in chapter 11 for linear functions, to the set of non-linear activation functions. The activation is a differentiable function of the total input, given by

y_k^p = F(s_k^p)   ...(12.1)

in which

s_k^p = Σ_j w_jk y_j^p + θ_k   ...(12.2)

To get the correct generalization of the delta rule as presented in the previous chapter, we must set

Δ_p w_jk = −γ ∂E^p/∂w_jk   ...(12.3)

The error E^p is defined as the total quadratic error for pattern p at the output units:

E^p = ½ Σ_{o=1}^{N_o} (d_o^p − y_o^p)²   ...(12.4)
Fig. 12.1 A multi-layer network with layers of units.
where d_o^p is the desired output for unit o when pattern p is clamped. We further set E = Σ_p E^p as the summed squared error. We can write

∂E^p/∂w_jk = (∂E^p/∂s_k^p)(∂s_k^p/∂w_jk)   ...(12.5)

By equation (12.2) we see that the second factor is

∂s_k^p/∂w_jk = y_j^p   ...(12.6)

When we define

δ_k^p = −∂E^p/∂s_k^p   ...(12.7)

we will get an update rule which is equivalent to the delta rule as described in the previous chapter, resulting in a gradient descent on the error surface if we make the weight changes according to:

Δ_p w_jk = γ δ_k^p y_j^p   ...(12.8)
The trick is to figure out what δ_k^p should be for each unit k in the network. The interesting result, which we now derive, is that there is a simple recursive computation of these δ's which can be implemented by propagating error signals backward through the network.
To compute δ_k^p we apply the chain rule to write this partial derivative as the product of two factors, one factor reflecting the change in error as a function of the output of the unit and one reflecting the change in the output as a function of changes in the input. Thus, we have

δ_k^p = −∂E^p/∂s_k^p = −(∂E^p/∂y_k^p)(∂y_k^p/∂s_k^p)   ...(12.9)

Let us compute the second factor. By equation (12.1) we see that

∂y_k^p/∂s_k^p = F′(s_k^p)   ...(12.10)

which is simply the derivative of the squashing function F for the kth unit, evaluated at the net input s_k^p to that unit. To compute the first factor of equation (12.9), we consider two cases. First, assume that unit k is an output unit k = o of the network. In this case, it follows from the definition of E^p that
∂E^p/∂y_o^p = −(d_o^p − y_o^p)   ...(12.11)

which is the same result as we obtained with the standard delta rule. Substituting this and equation (12.10) in equation (12.9), we get

δ_o^p = (d_o^p − y_o^p) F′(s_o^p)   ...(12.12)
for any output unit o. Secondly, if k is not an output unit but a hidden unit k = h, we do not readily know the contribution of the unit to the output error of the network. However, the error measure can be written as a function of the net inputs from hidden to output layer, E^p = E^p(s_1^p, s_2^p, ..., s_j^p, ...), and we use the chain rule to write

∂E^p/∂y_h^p = Σ_{o=1}^{N_o} (∂E^p/∂s_o^p)(∂s_o^p/∂y_h^p) = Σ_{o=1}^{N_o} (∂E^p/∂s_o^p) ∂/∂y_h^p ( Σ_{j=1}^{N_h} w_jo y_j^p ) = Σ_{o=1}^{N_o} (∂E^p/∂s_o^p) w_ho = −Σ_{o=1}^{N_o} δ_o^p w_ho   ...(12.13)

Substituting this in equation (12.9) yields

δ_h^p = F′(s_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho   ...(12.14)

Equations (12.12) and (12.14) give a recursive procedure for computing the δ's for all units in the network, which are then used to compute the weight changes according to equation (12.8). This procedure constitutes the generalized delta rule for a feed-forward network of non-linear units.
12.3.1 Understanding Back-Propagation
The equations derived in the previous section may be mathematically correct, but what do they actually mean? Is there a way of understanding back-propagation other than reciting the necessary equations?
The answer is, of course, yes. In fact, the whole back-propagation process is intuitively very clear. What happens in the above equations is the following. When a learning pattern is clamped, the activation values are propagated to the output units, and the actual network output is compared with the desired output values; we usually end up with an error in each of the output units. Let's call this error e_o for a particular output unit o. We have to bring e_o to zero.
The simplest method to do this is the greedy method: we strive to change the connections in the neural network in such a way that, next time around, the error e_o will be zero for this particular pattern. We know from the delta rule that, in order to reduce an error, we have to adapt its incoming weights according to

Δw_ho = (d_o − y_o) y_h   ...(12.15)
That is step one. But it alone is not enough: when we only apply this rule, the weights from input to hidden units are never changed, and we do not have the full representational power of the feed-forward network as promised by the universal approximation theorem.
In order to adapt the weights from input to hidden units, we again want to apply the delta rule. In this case, however, we do not have a value for δ for the hidden units. This is solved by the chain rule, which does the following: distribute the error of an output unit o to all the hidden units that it is connected to, weighted by this connection. Differently put, a hidden unit h receives a delta from each output unit o equal to the delta of that output unit weighted with (i.e., multiplied by) the weight of the connection between those units. In symbols: δ_h = Σ_o δ_o w_ho. Well, not exactly: we forgot the activation function of the hidden unit; F′ has to be applied to the delta before the back-propagation process can continue.
12.4 WORKING WITH BACK-PROPAGATION
The application of the generalised delta rule thus involves two phases: During the first phase the input x is presented and propagated forward through the network to compute the output values y_o^p for each output unit. This output is compared with its desired value d_o, resulting in an error signal δ_o^p for each output unit.
The second phase involves a backward pass through the network during which the error signal is passed to each unit in the network and appropriate weight changes are calculated.
12.4.1 Weight Adjustments with Sigmoid Activation Function
The results from the previous section can be summarised in three equations:
The weight of a connection is adjusted by an amount proportional to the product of an error signal δ on the unit k receiving the input and the output of the unit j sending this signal along the connection:

Δ_p w_jk = γ δ_k^p y_j^p   ...(12.16)

If the unit is an output unit, the error signal is given by

δ_o^p = (d_o^p − y_o^p) F′(s_o^p)   ...(12.17)

Take as the activation function F the 'sigmoid' function as defined in chapter 2:

y^p = F(s^p) = 1 / (1 + e^{−s^p})   ...(12.18)
In this case the derivative is equal to

F′(s^p) = ∂/∂s^p ( 1 / (1 + e^{−s^p}) ) = ( 1 / (1 + e^{−s^p})² ) e^{−s^p} = y^p (1 − y^p)   ...(12.19)

such that the error signal for an output unit can be written as:

δ_o^p = (d_o^p − y_o^p) y_o^p (1 − y_o^p)   ...(12.20)
The error signal for a hidden unit is determined recursively in terms of the error signals of the units to which it directly connects and the weights of those connections. For the sigmoid activation function:

δ_h^p = F′(s_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho = y_h^p (1 − y_h^p) Σ_{o=1}^{N_o} δ_o^p w_ho   ...(12.21)
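As an aid to the reader, the following Python/NumPy fragment sketches one learning step of eqs. (12.16)-(12.21) for a network with one sigmoid hidden layer and sigmoid output units. It is an illustration added to this text, not the author's implementation; the parameter names (W_ih, th_h, W_ho, th_o, gamma) are assumptions made for the example.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def backprop_step(x, d, W_ih, th_h, W_ho, th_o, gamma=0.5):
        # Forward pass, eqs. (12.1)-(12.2) with the sigmoid of eq. (12.18)
        y_h = sigmoid(x @ W_ih + th_h)
        y_o = sigmoid(y_h @ W_ho + th_o)
        # Error signals: eq. (12.20) for output units, eq. (12.21) for hidden units
        delta_o = (d - y_o) * y_o * (1.0 - y_o)
        delta_h = y_h * (1.0 - y_h) * (W_ho @ delta_o)
        # Weight changes, eq. (12.16); biases are treated as weights from a unit with activation 1
        W_ho += gamma * np.outer(y_h, delta_o)
        th_o += gamma * delta_o
        W_ih += gamma * np.outer(x, delta_h)
        th_h += gamma * delta_h
        return 0.5 * np.sum((d - y_o) ** 2)      # E^p of eq. (12.4)

Calling backprop_step once per pattern corresponds to the 'learning per pattern' scheme discussed in section 12.4.3.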
12.4.2 Learning Rate and Momentum
The learning procedure requires that the change in weight is proportional to −∂E^p/∂w. True gradient descent requires that infinitesimal steps are taken. The constant of proportionality is the learning rate γ. For practical purposes we choose a learning rate that is as large as possible without leading to oscillation. One way to avoid oscillation at large learning rates is to make the change in weight dependent on the past weight change by adding a momentum term:

Δw_jk(t + 1) = γ δ_k^p y_j^p + α Δw_jk(t)   ...(12.22)

where t indexes the presentation number and α is a constant which determines the effect of the previous weight change.
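In code, the momentum term only requires remembering the previous weight change; the fragment below is a hypothetical sketch of eq. (12.22), with arbitrary example values for γ and α.

    import numpy as np

    def momentum_update(W, delta_k, y_j, dW_prev, gamma=0.1, alpha=0.9):
        # Eq. (12.22): new change = gamma * delta_k^p * y_j^p + alpha * previous change
        dW = gamma * np.outer(y_j, delta_k) + alpha * dW_prev
        return W + dW, dW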
The role of the momentum term is shown in Fig. 12.2. When no momentum term is used, it takes a long time before the minimum has been reached with a low learning rate, whereas for high learning rates the minimum is never reached because of the oscillations. When adding the momentum term, the minimum will be reached faster.
Fig. 12.2 The descent in weight space: (a) for small learning rate; (b) for large learning rate: note the oscillations; and (c) with large learning rate and momentum term added.
12.4.3 Learning Per Pattern
Although, theoretically, the back-propagation algorithm performs gradient descent on the total error only if the weights are adjusted after the full set of learning patterns has been presented, more often than not the learning rule is applied to each pattern separately, i.e., a pattern p is applied, E^p is calculated, and the weights are adapted (p = 1, 2, ..., P). There exists empirical indication that this results in faster convergence. Care has to be taken, however, with the order in which the patterns are taught. For example, when using the same sequence over and over again the network may become focused on the first few patterns. This problem can be overcome by using a permuted training method.
Example 12.1: A feed-forward network can be used to approximate a function from examples. Suppose we have a system (for example a chemical process or a financial market) of which we want to know the characteristics. The input of the system is given by the two-dimensional vector x and the output is given by the one-dimensional vector d. We want to estimate the relationship d = f(x) from 80 examples {x^p, d^p} as depicted in Fig. 12.3 (top left). A feed-forward network was programmed with two inputs, 10 hidden units with sigmoid activation function and an output unit with a linear activation function. Check for yourself how equation (12.20) should be adapted for the linear instead of sigmoid activation function. The network weights are initialized to small values and the network is trained for 5,000 learning iterations with the back-propagation training rule, described in the previous section. The relationship between x and d as represented by the network is shown in Fig. 12.3 (top right), while the function which generated the learning samples is given in Fig. 12.3 (bottom left). The approximation error is depicted in Fig. 12.3 (bottom right). We see that the error is higher at the edges of the region within which the learning samples were generated. The network is considerably better at interpolation than extrapolation.
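A small script in the spirit of this example might look as follows. It is a sketch only: the target function used to generate the 80 samples is an arbitrary stand-in (the real system in the example is unknown), and the learning rate and initialization are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

    # 80 learning samples {x^p, d^p}; the target is a stand-in for the unknown system
    X = rng.uniform(-1.0, 1.0, size=(80, 2))
    d = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])

    # two inputs, 10 sigmoid hidden units, one linear output unit
    W1 = 0.1 * rng.standard_normal((2, 10)); b1 = np.zeros(10)
    W2 = 0.1 * rng.standard_normal(10);      b2 = 0.0
    gamma = 0.05

    for it in range(5000):                       # 5,000 learning iterations
        p = rng.integers(len(X))                 # learning per pattern
        h = sigmoid(X[p] @ W1 + b1)
        y = h @ W2 + b2                          # linear output unit: F'(s) = 1
        delta_o = d[p] - y                       # eq. (12.17) with a linear activation
        delta_h = h * (1.0 - h) * W2 * delta_o   # eq. (12.21)
        W2 += gamma * delta_o * h;  b2 += gamma * delta_o
        W1 += gamma * np.outer(X[p], delta_h); b1 += gamma * delta_h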
Fig. 12.3 Example of function approximation with a feed forward network. Top left: The original learning samples; Top
right: The approximation with the network; Bottom left: The function which generated the learning samples;
Bottom right: The error in the approximation.
12.5 OTHER ACTIVATION FUNCTIONS
Although sigmoid functions are quite often used as activation functions, other functions can be used as well. In some cases this leads to a formula which is known from traditional function approximation theories.
For example, from Fourier analysis it is known that any periodic function can be written as an infinite sum of sine and cosine terms (Fourier series):

f(x) = Σ_{n=0}^{∞} (a_n cos nx + b_n sin nx)   ...(12.23)

We can rewrite this as a summation of sine terms

f(x) = a_0 + Σ_{n=1}^{∞} c_n sin (nx + θ_n)   ...(12.24)

with c_n = √(a_n² + b_n²) and θ_n = arctan (b_n/a_n). This can be seen as a feed-forward network with a single input unit for x, a single output unit for f(x), and hidden units with an activation function F = sin(s). The factor a_0 corresponds with the bias of the output unit, the factors c_n correspond with the weights from hidden to output unit; the phase factor θ_n corresponds with the bias term of the hidden units and the factor n corresponds with the weights between the input and hidden layer.
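The correspondence can be written down directly in code. The sketch below (an illustration with assumed coefficient arrays a0, c and theta) evaluates the truncated series of eq. (12.24) as a one-hidden-layer network with fixed integer input weights n and sine activations.

    import numpy as np

    def fourier_net(x, a0, c, theta):
        # Input-to-hidden weights are the fixed integers n = 1..len(c); hidden biases are theta_n.
        n = np.arange(1, len(c) + 1)
        hidden = np.sin(n * x + theta)      # F = sin(s)
        return a0 + hidden @ c              # c_n are the hidden-to-output weights, a0 the output bias

    # e.g. fourier_net(0.3, a0=0.0, c=np.array([1.0, 0.5]), theta=np.zeros(2))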
The basic difference between the Fourier approach and the back-propagation approach is that in the Fourier approach the 'weights' between the input and the hidden units (these are the factors n) are fixed integer numbers which are analytically determined, whereas in the back-propagation approach these weights can take any value and are typically learned using a learning heuristic.
To illustrate the use of other activation functions we have trained a feed-forward network with one output unit, four hidden units, and one input with ten patterns drawn from the function f(x) = sin(2x) sin(x). The result is depicted in Fig. 12.4. The same function (albeit with other learning points) is learned with a network with eight sigmoid hidden units (see Fig. 12.5). From the figures it is clear that it pays off to use as much knowledge of the problem at hand as possible.
12.6 DEFICIENCIES OF BACK-PROPAGATION
Despite the apparent success oI the back-propagation Iearning aIgorithm, there are some aspects, which
make the aIgorithm not guaranteed to be universaIIy useIuI. Most troubIesome is the Iong training
process. This can be a resuIt oI a non-optimum Iearning rate and momentum. A Iot oI advanced
aIgorithms based on back-propagation Iearning have some optimized method to adapt this Iearning rate,
as wiII be discussed in the next section. Outright training IaiIures generaIIy arise Irom two sources:
network paraIysis and IocaI minima.
Fig. 12.4 The periodic function f(x) = sin(2x) sin(x) approximated with sine activation functions.
Fig. 12.5 The periodic function f(x) = sin(2x) sin(x) approximated with sigmoid activation functions.
12.6.1 Network Paralysis
As the network trains, the weights can be adjusted to very large values. The total input of a hidden unit or output unit can therefore reach very high (either positive or negative) values, and because of the sigmoid activation function the unit will have an activation very close to zero or very close to one. As is clear from equations (12.20) and (12.21), the weight adjustments which are proportional to y_k^p (1 − y_k^p) will be close to zero, and the training process can come to a virtual standstill.
12.6.2 Local Minima
The error surface of a complex network is full of hills and valleys. Because of the gradient descent, the network can get trapped in a local minimum when there is a much deeper minimum nearby. Probabilistic methods can help to avoid this trap, but they tend to be slow. Another suggested possibility is to increase the number of hidden units. Although this will work because of the higher dimensionality of the error space, and the chance to get trapped is smaller, it appears that there is some upper limit on the number of hidden units which, when exceeded, again results in the system being trapped in local minima.
12.7 ADVANCED ALGORITHMS
Many researchers have devised improvements of and extensions to the basic back-propagation algorithm described above. It is too early for a full evaluation: some of these techniques may prove to be fundamental, others may simply fade away. A few methods are discussed in this section.
Maybe the most obvious improvement is to replace the rather primitive steepest descent method with a direction set minimization method, e.g., conjugate gradient minimization. Note that minimization along a direction u brings the function f to a place where its gradient is perpendicular to u (otherwise minimization along u is not complete).
Instead of following the gradient at every step, a set of n directions is constructed which are all conjugate to each other, such that minimization along one of these directions u_j does not spoil the minimization along one of the earlier directions u_i, i.e., the directions are non-interfering. Thus one minimization in the direction of u_i suffices, such that n minimizations in a system with n degrees of freedom bring this system to a minimum (provided the system is quadratic). This is different from gradient descent, which directly minimizes in the direction of the steepest descent (Press, Flannery, Teukolsky, & Vetterling, 1986).
Suppose the function to be minimized is approximated by its Taylor series

f(x) = f(p) + Σ_i (∂f/∂x_i)|_p x_i + ½ Σ_{i,j} (∂²f/∂x_i∂x_j)|_p x_i x_j + ... ≈ ½ xᵀAx − bᵀx + c   ...(12.25)

where ᵀ denotes transpose, and

c ≡ f(p),   b ≡ −∇f|_p,   [A]_ij ≡ (∂²f/∂x_i∂x_j)|_p   ...(12.26)
A is a symmetric positive definite n × n matrix, the Hessian of f at p. The gradient of f is

∇f = Ax − b   ...(12.27)

such that a change of x results in a change of the gradient as

δ(∇f) = A(δx)   ...(12.28)

Now suppose f was minimized along a direction u_i to a point where the gradient g_{i+1} of f is perpendicular to u_i, i.e.,

u_iᵀ g_{i+1} = 0   ...(12.29)
and a new direction u_{i+1} is sought. In order to make sure that moving along u_{i+1} does not spoil minimization along u_i we require that the gradient of f remain perpendicular to u_i, i.e.,

u_iᵀ g_{i+2} = 0   ...(12.30)

otherwise we would once more have to minimise in a direction which has a component of u_i. Combining (12.29) and (12.30), we get

0 = u_iᵀ (g_{i+1} − g_{i+2}) = u_iᵀ δ(∇f) = u_iᵀ A u_{i+1}   ...(12.31)

When eq. (12.31) holds for two vectors u_i and u_{i+1} they are said to be conjugate.
Now, starting at some point p_0, the first minimization direction u_0 is taken equal to g_0 = −∇f(p_0), resulting in a new point p_1. For i ≥ 0, calculate the directions

u_{i+1} = g_{i+1} + γ_i u_i   ...(12.32)

where γ_i is chosen to make u_iᵀ A u_{i+1} = 0 and the successive gradients perpendicular, i.e.,

γ_i = (g_{i+1}ᵀ g_{i+1}) / (g_iᵀ g_i)   with g_k = −∇f|_{p_k} for all k ≥ 0   ...(12.33)

Next, calculate p_{i+2} = p_{i+1} + λ_{i+1} u_{i+1}, where λ_{i+1} is chosen so as to minimize f(p_{i+2}).
It can be shown that the u's thus constructed are all mutually conjugate (e.g., see (Stoer & Bulirsch, 1980)). The process described above is known as the Fletcher-Reeves method, but there are many variants, which work more or less the same (Hestenes & Stiefel, 1952; Polak, 1971; Powell, 1977).
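A bare-bones sketch of the Fletcher-Reeves scheme of eqs. (12.32)-(12.33) is given below; it is illustrative only, and the crude grid-based line minimization stands in for the proper line search used in practice.

    import numpy as np

    def fletcher_reeves(f, grad, p, iters=50, tol=1e-8):
        g = -grad(p)                 # g_0 = -grad f(p_0)
        u = g.copy()                 # first search direction u_0
        for _ in range(iters):
            # line minimization of f along u (coarse grid stands in for a real line search)
            lams = np.linspace(-2.0, 2.0, 401)
            lam = lams[np.argmin([f(p + l * u) for l in lams])]
            p = p + lam * u
            g_new = -grad(p)
            if np.linalg.norm(g_new) < tol:
                break
            gamma_i = (g_new @ g_new) / (g @ g)   # eq. (12.33)
            u = g_new + gamma_i * u               # eq. (12.32)
            g = g_new
        return p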
Although only n iterations are needed for a quadratic system with n degrees of freedom, due to the fact that we are not minimizing quadratic systems, as well as a result of round-off errors, the n directions have to be followed several times (see Fig. 12.6). Powell introduced some improvements to correct for behaviour in non-quadratic systems. The resulting cost is O(n), which is significantly better than the linear convergence of steepest descent.
Some improvements on back-propagation have been presented based on an independent adaptive learning rate parameter for each weight.
Van den Boomgaard and Smeulders (Boomgaard & Smeulders, 1989) show that for a feed-forward network without hidden units an incremental procedure to find the optimal weight matrix W needs an adjustment of the weights with

ΔW(t + 1) = γ(t + 1) [d(t + 1) − W(t) x(t + 1)] x(t + 1)   ...(12.34)

in which γ is not a constant but a variable (N_i + 1) × (N_i + 1) matrix which depends on the input vector. By using a priori knowledge about the input signal, the storage requirements for γ can be reduced.
Silva and Almeida (Silva & Almeida, 1990) also show the advantages of an independent step size for each weight in the network. In their algorithm the learning rate is adapted after every learning pattern:

γ_jk(t + 1) = u γ_jk(t)   if ∂E/∂w_jk(t) and ∂E/∂w_jk(t − 1) have the same signs,
γ_jk(t + 1) = d γ_jk(t)   if ∂E/∂w_jk(t) and ∂E/∂w_jk(t − 1) have opposite signs.   ...(12.35)
Fig. 12.6 Slow decrease with conjugate gradient in non-quadratic systems. [The hills on the left are very steep, resulting in a large search vector u_i. When the quadratic portion is entered the new search direction is constructed from the previous direction and the gradient, resulting in a spiraling minimization. This problem can be overcome by detecting such spiraling minimizations and restarting the algorithm with u_0 = −∇f.]
where u and d are positive constants with values slightly above and below unity, respectively. The idea is to decrease the learning rate in case of oscillations.
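A direct transcription of eq. (12.35) is shown below; the values of u and d are arbitrary examples of 'slightly above and below unity'.

    import numpy as np

    def adapt_step_sizes(gamma, grad, grad_prev, u=1.2, d=0.8):
        # Per-weight learning rates, eq. (12.35): grow where the gradient keeps its sign,
        # shrink where it flips (an indication of oscillation).
        same_sign = np.sign(grad) == np.sign(grad_prev)
        return np.where(same_sign, u * gamma, d * gamma)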
12.8 HOW GOOD ARE MULTI-LAYER FEED-FORWARD NETWORKS?
From the example shown in Fig. 12.3 it is clear that the approximation of the network is not perfect. The resulting approximation error is influenced by:
1. The learning algorithm and number of iterations. This determines how well the error on the training set is minimized.
2. The number of learning samples. This determines how well the training samples represent the actual function.
3. The number of hidden units. This determines the 'expressive power' of the network. For 'smooth' functions only a few hidden units are needed; for wildly fluctuating functions more hidden units will be needed.
In the previous sections we discussed the learning rules such as back-propagation and the other gradient based learning algorithms, and the problem of finding the minimum error. In this section we particularly address the effect of the number of learning samples and the effect of the number of hidden units.
We first have to define an adequate error measure. All neural network training algorithms try to minimize the error of the set of learning samples which are available for training the network. The average error per learning sample is defined as the learning error rate:
E_learning = (1/P_learning) Σ_{p=1}^{P_learning} E^p   ...(12.36)

in which E^p is the difference between the desired output value and the actual network output for the learning samples:

E^p = ½ Σ_{o=1}^{N_o} (d_o^p − y_o^p)²   ...(12.37)
This is the error which is measurable during the training process.
It is obvious that the actual error of the network will differ from the error at the locations of the training samples. The difference between the desired output value and the actual network output should be integrated over the entire input domain to give a more realistic error measure. This integral can be estimated if we have a large set of samples.
We now define the test error rate as the average error of the test set:

E_test = (1/P_test) Σ_{p=1}^{P_test} E^p   ...(12.38)

In the following subsections we will see how these error measures depend on learning set size and number of hidden units.
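Both error rates can be computed with the same helper; the sketch below assumes net(x) returns the network output vector for input x.

    import numpy as np

    def error_rate(net, X, D):
        # Average error per sample, eqs. (12.36)-(12.38), with E^p as in eq. (12.37)
        return np.mean([0.5 * np.sum((d - net(x)) ** 2) for x, d in zip(X, D)])

    # E_learning = error_rate(net, X_learning, D_learning)
    # E_test     = error_rate(net, X_test, D_test)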
12.8.1 The Effect of the Number of Learning Samples
A simple problem is used as an example: a function y = f(x) has to be approximated with a feed-forward neural network. A neural network is created with an input, 5 hidden units with sigmoid activation function and a linear output unit. Suppose we have only a small number of learning samples (e.g., 4) and the network is trained with these samples. Training is stopped when the error does not decrease anymore.
The original (desired) function is shown in Fig. 12.7A as a dashed line. The learning samples and the approximation of the network are shown in the same figure. We see that in this case E_learning is small (the network output goes perfectly through the learning samples) but E_test is large: the test error of the network is large. The approximation obtained from 20 learning samples is shown in Fig. 12.7B. The E_learning is larger than in the case of 4 learning samples, but the E_test is smaller.
Fig. 12.7 Effect of the learning set size on the generalization. The dashed line gives the desired function, the learning
samples are depicted as circles and the approximation by the network is shown by the drawn line. 5 hidden
units are used. a) 4 learning samples. b) 20 learning samples.
This experiment was carried out with other learning set sizes, where for each learning set size the experiment was repeated 10 times. The average learning and test error rates as a function of the learning set size are given in Fig. 12.8. Note that the learning error increases with an increasing learning set size, and the test error decreases with increasing learning set size.
A low learning error on the (small) learning set is no guarantee for a good network performance! With an increasing number of learning samples the two error rates converge to the same value. This value depends on the representational power of the network: given the optimal weights, how good is the approximation. This error depends on the number of hidden units and the activation function. If the learning error rate does not converge to the test error rate the learning procedure has not found a global minimum.
12.8.2 The Effect of the Number of Hidden Units
The same function as in the previous subsection is used, but now the number of hidden units is varied. The original (desired) function, learning samples and network approximation are shown in Fig. 12.9A for 5 hidden units and in Fig. 12.9B for 20 hidden units. The effect visible in Fig. 12.9B is called overtraining. The network fits exactly with the learning samples, but because of the large number of hidden units the function which is actually represented by the network is far more wild than the original one. Particularly in the case of learning samples which contain a certain amount of noise (which all real-world data have), the network will fit the 'noise' of the learning samples instead of making a smooth approximation.
This example shows that a large number of hidden units leads to a small error on the training set but not necessarily to a small error on the test set. Adding hidden units will always lead to a reduction of E_learning. However, adding hidden units will first lead to a reduction of E_test, but then lead to an increase of E_test. This effect is called the peaking effect. The average learning and test error rates as a function of the number of hidden units are given in Fig. 12.10.
12.9 APPLICATIONS
Back-propagation has been applied to a wide variety of research applications.
Sejnowski and Rosenberg (1986) produced a spectacular success with NETtalk, a system that converts printed English text into highly intelligible speech.
A feed-forward network with one layer of hidden units has been described by Gorman and Sejnowski (1988) as a classification machine for sonar signals.
Fig. 12.8 Effect of the learning set size on the error rate. The average learning error rate and the average test error rate as a function of the number of learning samples.
Fig. 12.9 Effect of the number of hidden units on the network performance. The dashed line gives the desired function,
the circles denote the learning samples and the drawn line gives the approximation by the network. 12 learning
samples are used. a) 5 hidden units. b) 20 hidden units.
Fig. 12.10 The average learning error rate and the average test error rate as a function of the number of hidden units.
A multi-layer feed-forward network with a back-propagation training algorithm is used to learn an unknown function between input and output signals from the presentation of examples. It is hoped that the network is able to generalize correctly, so that input values which are not presented as learning patterns will result in correct output values. An example is the work of Josin (1988), who used a two-layer feed-forward network with back-propagation learning to perform the inverse kinematic transform which is needed by a robot arm controller.
QUESTION BANK
1. Explain the multi-layer feed-forward networks.
2. Describe the generalized delta rule.
3. What is the back-propagation algorithm? Explain.
4. How are the weights adjusted with the sigmoid activation function? Explain with an example.
5. Explain learning rate and momentum in back-propagation with an example.
6. Explain the sine activation function with an example.
7. What are the deficiencies of the back-propagation algorithm?
8. Explain various methods employed to overcome the deficiencies of the back-propagation algorithm.
9. How good are multi-layer feed-forward networks? Explain.
10. Explain the effect of the number of learning samples in multi-layer feed-forward networks.
11. Explain the effect of the number of hidden units in multi-layer feed-forward networks.
12. What are the applications of the back-propagation algorithm?
REFERENCES.
1. M. Minsky, and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT
Press, 1969.
2. D.E. RumeIhart, G.E. Hinton, and R.J. WiIIiams, Learning representations by back-propagating
errors, Nature, VoI. 323, pp. 533-536, 1986.
3. D.B. Parker, Learning-Logic (Tech. Rep. No. TR-47), Cambridge, MA: Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science, 1985.
4. Y. Le Cun, Une procedure d'apprentissage pour reseau a seuil assymetrique, Proceedings of Cognitiva, Vol. 85, pp. 599-604, 1985.
5. K. Hornik, M. Stinchcombe, and H. White, Multilayer feed-forward networks are universal approximators, Neural Networks, Vol. 2, No. 5, pp. 359-366, 1989.
6. K.I. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks, Vol. 2, No. 3, pp. 183-192, 1989.
7. G. Cybenko, Approximation by superpositions oI a sigmoidaI Iunction. Mathematics of Control,
Signals, and Systems, VoI. 2, No. 4, pp. 303-314, 1989.
8. E.J. Hartman, J.D. KeeIer, and J.M. KowaIski, Layered neuraI networks with Gaussian hidden
units as universaI approximations, Neural Computation, VoI. 2, No. 2, pp. 210-215, 1990.
9. W.H. Press, B.P. FIannery, S.A. TeukoIsky, and W.T. VetterIing, Numerical Recipes: 1he Art of
Scientific Computing, Cambridge: Cambridge University Press, 1986.
10. J. Stoer, and R. BuIirsch, Introduction to Numerical Analysis, New York-HeideIberg- BerIin:
Springer-VerIag, 1980.
11. M.R. Hestenes, and E. StieIeI, Methods oI conjugate gradients Ior soIving Iinear systems,
Journal of National Bureau of Standards, VoI. 49, pp. 409-436, 1952.
12. E. PoIak, Computational Methods in Optimization, New York: Academic Press, 1971.
13. M.J.D. PoweII, Restart procedures Ior the conjugate gradient method, Mathematical
Programming, VoI. 12, pp. 241-254, 1977.
14. T.J. Sejnowski, and C.R. Rosenberg, NETtalk: A Parallel Network that Learns to Read Aloud (Tech. Rep. No. JHU/EECS-86/01), The Johns Hopkins University Electrical Engineering and Computer Science Department, 1986.
15. R.P. Gorman, and T.J. Sejnowski, AnaIysis oI hidden units in a Iayered network trained to cIassiIy
sonar targets, Neural Netvorks, VoI. 1, No. 1, pp. 75-89, 1988.
16. G. Josin, NeuraI-space generaIization oI a topoIogicaI transIormation, Biological Cybernetics,
VoI. 59, pp. 283-290, 1988.
CHAPTER 13
Recurrent Networks
13.1 INTRODUCTION
The learning algorithms discussed in the previous chapter were applied to feed-forward networks: all data flows in a network in which no cycles are present.
But what happens when we introduce a cycle? For instance, we can connect a hidden unit with itself over a weighted connection, connect hidden units to input units, or even connect all units with each other. Although, as we know from the previous chapter, the approximation capabilities of such networks do not increase, we may obtain decreased complexity, network size, etc. to solve the same problem.
An important question we have to consider is the following: what do we want to learn in a recurrent network? After all, when one is considering a recurrent network, it is possible to continue propagating activation values until a stable point (attractor) is reached. As we will see in the sequel, there exist recurrent networks which are attractor based, i.e., the activation values in the network are repeatedly updated until a stable point is reached, after which the weights are adapted, but there are also recurrent networks where the learning rule is used after each propagation (where an activation value is traversed over each weight only once), while external inputs are included in each propagation. In such networks, the recurrent connections can be regarded as extra inputs to the network (the values of which are computed by the network itself).
In this chapter, recurrent extensions to the feed-forward network will be discussed. The theory of the dynamics of recurrent networks extends beyond the scope of a one-semester course on neural networks. Yet the basics of these networks will be discussed.
Also some special recurrent networks will be discussed: the Hopfield network, which can be used for the representation of binary patterns; subsequently we touch upon Boltzmann machines, therewith introducing stochasticity in neural computation.
13.2 THE GENERALISED DELTA-RULE IN RECURRENT NETWORKS
The back-propagation learning rule, introduced in chapter 12, can be easily used for training patterns in recurrent networks. Before we will consider this general case, however, we will first describe networks where some of the hidden unit activation values are fed back to an extra set of input units (the Elman network), or where output values are fed back into hidden units (the Jordan network).
A typical application of such a network is the following. Suppose we have to construct a network that must generate a control command depending on an external input, which is a time series x(t), x(t − 1), x(t − 2), ... With a feed-forward network there are two possible approaches:
1. Create inputs x_1, x_2, ..., x_n which constitute the last n values of the input vector. Thus a 'time window' of the input vector is input to the network.
2. Create inputs x, x′, x″, ... Besides only inputting x(t), we also input its first, second, etc. derivatives. Naturally, computation of these derivatives is not a trivial task for higher-order derivatives.
The disadvantage is, of course, that the input dimensionality of the feed-forward network is multiplied with n, leading to a very large network, which is slow and difficult to train. The Jordan and Elman networks provide a solution to this problem. Due to the recurrent connections, a window of inputs need not be input anymore; instead, the network is supposed to learn the influence of the previous time steps itself.
13.2.1 The Jordan Network
One of the earliest recurrent neural networks was the Jordan network. An example of this network is shown in Fig. 13.1. In the Jordan network, the activation values of the output units are fed back into the input layer through a set of extra input units called the state units. There are as many state units as there are output units in the network. The connections between the output and state units have a fixed weight of +1; learning takes place only in the connections between input and hidden units as well as hidden and output units. Thus all the learning rules derived for the multi-layer perceptron can be used to train this network.
Fig. 13.1 The Jordan network. Output activation values are fed back to the input layer, to a set of extra neurons called the state units.
13.2.2 The Elman Network
In the Elman network a set of context units is introduced, which are extra input units whose activation values are fed back from the hidden units. Thus the network is very similar to the Jordan network, except that (1) the hidden units instead of the output units are fed back; and (2) the extra input units have no self-connections.
The schematic structure of this network is shown in Fig. 13.2.
Fig. 13.2 The Elman network. With this network, the hidden unit activation values are fed back to the input layer, to a set of extra neurons called the context units.
Again the hidden units are connected to the context units with a fixed weight of value +1. Learning is done as follows:
1. The context units are set to 0; t = 1;
2. Pattern x_t is clamped, the forward calculations are performed once;
3. The back-propagation learning rule is applied;
4. t ← t + 1; go to 2.
The context units at step t thus always have the activation value of the hidden units at step t − 1.
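The steps above translate into a short training loop. The sketch below is an illustration (not the author's code): it uses a linear output unit, assumed array shapes xs (T × n_in) and ds (T,), and ordinary back-propagation for the trainable weights, while the context units simply copy the previous hidden state.

    import numpy as np

    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

    def train_elman(xs, ds, n_hidden=3, gamma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n_in = xs.shape[1]
        W_in = 0.1 * rng.standard_normal((n_in + n_hidden, n_hidden))  # inputs + context units
        W_out = 0.1 * rng.standard_normal(n_hidden)
        context = np.zeros(n_hidden)                 # step 1: context units set to 0
        for x, d in zip(xs, ds):                     # steps 2-4, one pass through the sequence
            z = np.concatenate([x, context])
            h = sigmoid(z @ W_in)
            y = h @ W_out                            # linear output unit
            delta_o = d - y
            delta_h = h * (1.0 - h) * W_out * delta_o
            W_out += gamma * delta_o * h             # ordinary back-propagation step
            W_in += gamma * np.outer(z, delta_h)
            context = h.copy()                       # context <- hidden activations (fixed weight +1)
        return W_in, W_out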
Example 13.1: As we mentioned above, the Jordan and Elman networks can be used to train a network on reproducing time sequences. The idea of the recurrent connections is that the network is able to 'remember' the previous states of the input values. As an example, we trained an Elman network on controlling an object moving in 1D. This object has to follow a pre-specified trajectory x_d. To control the object, forces F must be applied, since the object suffers from friction and perhaps other external forces.
To tackle this problem, we use an Elman net with inputs x and x_d, one output F, and three hidden units. The hidden units are connected to three context units. In total, five units feed into the hidden layer.
The results of training are shown in Fig. 13.3. The same test can be done with an ordinary feed-forward network with sliding window input. We tested this with a network with five inputs, four of
Fig. 13.3 Training an Elman network to control an object. The solid line depicts the desired trajectory x_d; the dashed line the realized trajectory. The third line is the error.
Fig. 13.4 Training a feed-forward network to control an object. The solid line depicts the desired trajectory x_d; the dashed line the realized trajectory. The third line is the error.
which constituted the sliding window x_{−3}, x_{−2}, x_{−1} and x_0, and one the desired next position of the object. Results are shown in Fig. 13.4. The disappointing observation is that the results are actually better with the ordinary feed-forward network, which has the same complexity as the Elman network.
13.2.3 Back-Propagation in Fully Recurrent Networks
More complex schemes than the above are possible. For instance, independently of each other Pineda (1987) and Almeida (1987) discovered that error back-propagation is in fact a special case of a more general gradient learning method, which can be used for training attractor networks. However, also when a network does not reach a fixed point, a learning method can be used: back-propagation through time (Pearlmutter, 1989, 1990). This learning method, the discussion of which extends beyond the scope of our course, can be used to train a multi-layer perceptron to follow trajectories in its activation values.
13.3 THE HOPFIELD NETWORK
One of the earliest recurrent neural networks reported in the literature was the auto-associator independently described by Anderson (1977) and Kohonen (1977). It consists of a pool of neurons with connections between each unit i and j, i ≠ j (see Fig. 13.5). All connections are weighted.
Hopfield (1982) brings together several earlier ideas concerning these networks and presents a complete mathematical analysis.
Fig. 13.5 The auto-associator network. All neurons are both input and output neurons, i.e., a pattern is clamped, the network iterates to a stable state, and the output of the network consists of the new activation values of the neurons.
13.3.1 Description
The Hopfield network consists of a set of N interconnected neurons (Fig. 13.5), which update their activation values asynchronously and independently of other neurons. All neurons are both input and output neurons. The activation values are binary. Originally, Hopfield chose activation values of 1 and 0, but using values +1 and −1 presents some advantages discussed below. We will therefore adhere to the latter convention.
The state of the system is given by the activation values y = (y_k). The net input s_k(t + 1) of a neuron k at cycle t + 1 is a weighted sum

s_k(t + 1) = Σ_{j≠k} y_j(t) w_jk + θ_k   ...(13.1)

A simple threshold function (Fig. 10.2) is applied to the net input to obtain the new activation value y_k(t + 1) at time t + 1:

y_k(t + 1) = +1 if s_k(t + 1) > U_k;  −1 if s_k(t + 1) < U_k;  y_k(t) otherwise   ...(13.2)

i.e., y_k(t + 1) = sgn(s_k(t + 1)). For simplicity we henceforth choose U_k = 0, but this is of course not essential.
A neuron k in the Hopfield network is called stable at time t if, in accordance with equations (13.1) and (13.2),

y_k(t) = sgn(s_k(t − 1))   ...(13.3)

A state α is called stable if, when the network is in state α, all neurons are stable. A pattern x^p is called stable if, when x^p is clamped, all neurons are stable.
When the extra restriction w_jk = w_kj is made, the behaviour of the system can be described with an energy function

E = −½ Σ_j Σ_{k≠j} y_j y_k w_jk − Σ_k θ_k y_k   ...(13.4)
Theorem 13.1: A recurrent network with connections w_jk = w_kj in which the neurons are updated using rule (13.2) has stable limit points.
Proof: First, note that the energy expressed in eq. (13.4) is bounded from below, since the y_k are bounded from below and the w_jk and θ_k are constant. Secondly, E is monotonically decreasing when state changes occur, because

ΔE = −Δy_k ( Σ_{j≠k} y_j w_jk + θ_k )   ...(13.5)

is always negative when y_k changes according to eqs. (13.1) and (13.2).
The advantage of a +1/−1 model over a 1/0 model then is symmetry of the states of the network. For, when some pattern x is stable, its inverse is stable, too, whereas in the 1/0 model this is not always true (as an example, the pattern 00...00 is always stable, but 11...11 need not be). Similarly, both a pattern and its inverse have the same energy in the +1/−1 model.
Removing the restriction of bidirectional connections (i.e., w_jk = w_kj) results in a system that is not guaranteed to settle to a stable state.
13.3.2 Hopfield Network as Associative Memory
A primary application of the Hopfield network is an associative memory. In this case, the weights of the connections between the neurons have to be set in such a way that the states of the system corresponding with the patterns which are to be stored in the network are stable. These states can be seen as 'dips' in energy space. When the network is cued with a noisy or incomplete test pattern, it will render the incorrect or missing data by iterating to a stable state, which is in some sense 'near' to the cued pattern.
The Hebb rule can be used to store P patterns:

w_jk = Σ_p x_j^p x_k^p   if j ≠ k,   and 0 otherwise   ...(13.6)

i.e., if x_j^p and x_k^p are equal, w_jk is increased, otherwise decreased by one (note that, in the original Hebb rule, weights only increase). It appears, however, that the network gets saturated very quickly, and that about 0.15N memories can be stored before recall errors become severe.
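The storage rule (13.6) and the update rule (13.2) fit in a few lines of code. The sketch below is illustrative; patterns are assumed to be ±1 vectors and the thresholds θ_k and U_k are taken as zero.

    import numpy as np

    def hebb_store(patterns):
        # Eq. (13.6): w_jk = sum_p x_j^p x_k^p for j != k, zero on the diagonal
        W = patterns.T @ patterns
        np.fill_diagonal(W, 0)
        return W

    def recall(W, x, steps=200, seed=0):
        # Asynchronous updates with rule (13.2), thresholds set to zero
        rng = np.random.default_rng(seed)
        y = x.copy()
        for _ in range(steps):
            k = rng.integers(len(y))
            s = W[:, k] @ y
            if s != 0:
                y[k] = 1 if s > 0 else -1
        return y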
There are two problems associated with storing too many patterns:
1. The stored patterns become unstable;
2. Spurious stable states appear (i.e., stable states which do not correspond with stored patterns).
The first of these two problems can be solved by an algorithm proposed by Bruce et al. (Bruce, Canning, Forrest, Gardner, & Wallace, 1986).
Algorithm 13.1: Given a starting weight matrix W = [w_jk], for each pattern x^p to be stored and each element x_k^p in x^p define a correction ε_k such that

ε_k = 0 if y_k is stable and x^p is clamped, and 1 otherwise   ...(13.7)

Now modify w_jk by Δw_jk = y_j y_k (ε_j + ε_k) if j ≠ k. Repeat this procedure until all patterns are stable.
It appears that, in practice, this algorithm usually converges. There exist cases, however, where the algorithm remains oscillatory (try to find one)!
The second problem stated above can be alleviated by applying the Hebb rule in reverse to the spurious stable state, but with a low learning factor (Hopfield, Feinstein, & Palmer, 1983). Thus these patterns are weakly unstored and will become unstable again.
13.3.3 Neurons with Graded Response
The network described in section 13.3.1 can be generalized by allowing continuous activation values. Here, the threshold activation function is replaced by a sigmoid. As before, this system can be proved to be stable when a symmetric weight matrix is used (Hopfield, 1984).
13.3.4 Hopfield Networks for Optimization Problems
An interesting application of the Hopfield network with graded response arises in a heuristic solution to the NP-complete travelling salesman problem (Garey & Johnson, 1979). In this problem, a path of minimal distance must be found between n cities, such that the begin- and end-points are the same.
Hopfield and Tank (1985) use a network with n × n neurons. Each row in the matrix represents a city, whereas each column represents the position in the tour. When the network is settled, each row and each column should have one and only one active neuron, indicating a specific city occupying a specific position in the tour. The neurons are updated using rule (13.2) with a sigmoid activation function between 0 and 1. The activation value y_Xj = 1 indicates that city X occupies the jth place in the tour.
An energy function describing this problem can be set up as follows. To ensure a correct solution, the following energy must be minimized:

E = (A/2) Σ_X Σ_j Σ_{k≠j} y_Xj y_Xk + (B/2) Σ_j Σ_X Σ_{Y≠X} y_Xj y_Yj + (C/2) ( Σ_X Σ_j y_Xj − n )²   ...(13.8)

where A, B, and C are constants. The first and second terms in equation (13.8) are zero if and only if there is a maximum of one active neuron in each row and column, respectively. The last term is zero if and only if there are exactly n active neurons.
To minimise the distance of the tour, an extra term

E = (D/2) Σ_X Σ_{Y≠X} Σ_j d_XY y_Xj (y_{Y,j+1} + y_{Y,j−1})   ...(13.9)

is added to the energy, where d_XY is the distance between cities X and Y and D is a constant. For convenience, the subscripts are defined modulo n.
The weights are set as follows:

w_{Xj,Yk} = −A δ_XY (1 − δ_jk)   (inhibitory connections within each row)
            −B δ_jk (1 − δ_XY)   (inhibitory connections within each column)
            −C                   (global inhibition)
            −D d_XY (δ_{k,j+1} + δ_{k,j−1})   (data term)   ...(13.10)

where δ_jk = 1 if j = k and 0 otherwise. Finally, each neuron has an external bias input Cn.
Although this application is interesting from a theoretical point of view, the applicability is limited. Whereas Hopfield and Tank state that the network converges to a valid solution in 16 out of 20 trials while 50% of the solutions are optimal, other reports show less encouraging results. For example, Wilson and Pawley (1988) find that in only 15% of the runs a valid result is obtained, few of which lead to an optimal or near-optimal solution. The main problem is the lack of global information. Since, for an N-city problem, there are N! possible tours, each of which may be traversed in two directions as well as started in N points, the number of different tours is N!/2N. Differently put, the N-dimensional hypercube in which the solutions are situated is 2N-fold degenerate. The degenerate solutions occur evenly within the hypercube, such that all but one of the final 2N configurations are redundant. The competition between the degenerate tours often leads to solutions which are piecewise optimal but globally inefficient.
13.4 BOLTZMANN MACHINES
The Boltzmann machine, as first described by Ackley, Hinton, and Sejnowski in 1985, is a neural network that can be seen as an extension of Hopfield networks to include hidden units, and with a stochastic instead of deterministic update rule. The weights are still symmetric. The operation of the network is based on the physics principle of annealing. This is a process whereby a material is heated and then cooled very, very slowly to a freezing point. As a result, the crystal lattice will be highly ordered, without any impurities, such that the system is in a state of very low energy. In the Boltzmann machine this system is mimicked by changing the deterministic update of equation (13.2) into a stochastic update, in which a neuron becomes active with a probability p,

p(y_k = +1) = 1 / (1 + e^{−ΔE_k/T})   ...(13.11)

where T is a parameter comparable with the (synthetic) temperature of the system. This stochastic activation function is not to be confused with neurons having a sigmoid deterministic activation function.
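A single stochastic update can be sketched as follows. The coding of the units as ±1 and the expression for the energy gap ΔE_k (twice the net input, which follows from the energy function (13.4) for that coding) are assumptions made for this illustration.

    import numpy as np

    def stochastic_update(W, theta, y, T, rng):
        # Pick a unit at random and set it to +1 with the probability of eq. (13.11)
        k = rng.integers(len(y))
        net = W[:, k] @ y - W[k, k] * y[k] + theta[k]   # net input of unit k (no self-connection)
        delta_E = 2.0 * net                             # energy gap for the +1/-1 coding assumed here
        p_on = 1.0 / (1.0 + np.exp(-delta_E / T))
        y[k] = 1 if rng.random() < p_on else -1
        return y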
In accordance with a physical system obeying a Boltzmann distribution, the network will eventually reach 'thermal equilibrium' and the relative probability of two global states α and β will follow the Boltzmann distribution

P_α / P_β = e^{−(E_α − E_β)/T}   ...(13.12)

where P_α is the probability of being in the αth global state, and E_α is the energy of that state. Note that at thermal equilibrium the units still change state, but the probability of finding the network in any global state remains constant.
At low temperatures there is a strong bias in favour of states with low energy, but the time required to reach equilibrium may be long. At higher temperatures the bias is not so favourable but equilibrium is reached faster. A good way to beat this trade-off is to start at a high temperature and gradually reduce it. At high temperatures, the network will ignore small energy differences and will rapidly approach equilibrium. In doing so, it will perform a search of the coarse overall structure of the space of global states, and will find a good minimum at that coarse level. As the temperature is lowered, it will begin to respond to smaller energy differences and will find one of the better minima within the coarse-scale minimum it discovered at high temperature.
Like multi-layer perceptrons, the Boltzmann machine consists of a non-empty set of visible and a possibly empty set of hidden units. Here, however, the units are binary-valued and are updated stochastically and asynchronously. The simplicity of the Boltzmann distribution leads to a simple learning procedure, which adjusts the weights so as to use the hidden units in an optimal way (Ackley et al., 1985). This algorithm works as follows:
First, the input and output vectors are clamped. The network is then annealed until it approaches thermal equilibrium at a temperature of 0. It then runs for a fixed time at equilibrium and each connection measures the fraction of the time during which both the units it connects are active. This is repeated for all input-output pairs so that each connection can measure ⟨y_j y_k⟩_clamped, the expected probability, averaged over all cases, that units j and k are simultaneously active at thermal equilibrium when the input and output vectors are clamped.
Similarly, ⟨y_j y_k⟩_free is measured when the output units are not clamped but determined by the network.
In order to determine optimal weights in the network, an error function must be determined. Now, the probability P_free(Y^p) that the visible units are in state Y^p when the system is running freely can be measured. Also, the desired probability P_clamped(Y^p) that the visible units are in state Y^p is determined by clamping the visible units and letting the network run.
Now, if the weights in the network are correctly set, both probabilities are equal to each other, and the error E in the network must be 0. Otherwise, the error must have a positive value measuring the discrepancy between the network's internal mode and the environment. For this effect, the 'asymmetric divergence' or 'Kullback information' is used:

E = Σ_p P_clamped(Y^p) log [ P_clamped(Y^p) / P_free(Y^p) ]   ...(13.13)
Now, in order to minimize E using gradient descent, we must change the weights according to

Δw_jk = −γ ∂E/∂w_jk   ...(13.14)

It is not difficult to show that

∂E/∂w_jk = −(1/T) [ ⟨y_j y_k⟩_clamped − ⟨y_j y_k⟩_free ]   ...(13.15)

Therefore, each weight is updated by

Δw_jk = γ [ ⟨y_j y_k⟩_clamped − ⟨y_j y_k⟩_free ]   ...(13.16)
QUESTION BANK.
1. What happens when a cycle is introduced into a feed-forward network?
2. Explain the generalized delta rule in recurrent networks.
3. Describe the Jordan network with an example.
4. Describe the Elman network with an example.
5. Describe the Hopfield network.
6. Describe the Hopfield network as an associative memory.
7. Describe the Hopfield network for optimization problems.
8. Describe the Boltzmann machine.
9. What problems result when storing too many patterns in an associative memory? How can these problems be solved?
REFERENCES.
1. M.I. Jordan, Attractor dynamics and paraIIeIism in a connectionist sequentiaI machine, In
Proceedings oI the Eighth AnnuaI ConIerence oI the Cognitive Science Society, HiIIsdaIe, NJ:
ErIbaum, pp. 531-546, 1986.
2. M.I. Jordan, SeriaI Order: A ParaIIeI Distributed Processing Approach (Tech. Rep. No. 8604).
San Diego, La JoIIa, CA: Institute Ior Cognitive Science, University oI CaIiIornia, 1986.
3. J.L. EIman, Finding structure in time. Cognitive Science, VoI. 14, pp. 179-211, 1990.
4. F. Pineda, Generalization of back-propagation to recurrent neural networks, Physical Review Letters, Vol. 59, pp. 2229-2232, 1987.
5. L.B. AImeida, A Iearning ruIe Ior asynchronous perceptrons with Ieedback in a combinatoriaI
environment, In Proceedings of the First International Conference on Neural Netvorks, VoI. 2,
pp. 609-618,1987.
6. B.A. PearImutter, Learning state space trajectories in recurrent neuraI networks, Neural
Computation, VoI. 1, No. 2, pp. 263-269, 1989.
7. B.A. PearImutter, Dynamic Recurrent Neural Netvorks (Tech. Rep. Nos. CMU-CS-90-196),
Pittsburgh, PA 15213: SchooI oI Computer Science, Carnegie MeIIon University, 1990.
8. J.A. Anderson, Neural Models vith Cognitive Implications. In D. LaBerge and S.J. SamueIs
(Eds.), Basic Processes in Reading Perception and Comprehension ModeIs, HiIIsdaIe, NJ:
ErIbaum, pp. 27-90, 1977.
9. T. Kohonen, Associative Memory: A System-1heoretical Approach, Springer-VerIag, 1977.
10. J.J. HopIieId, NeuraI networks and physicaI systems with emergent coIIective computationaI
abiIities, Proceedings of the National Academy of Sciences, VoI. 79, pp. 2554-2558, 1982.
11. A.D. Bruce, A. Canning, B. Forrest, E. Gardner, and D.J. WaIIace, Learning and memory
properties in IuIIy connected networks, In J.S. Denker (Ed.), AIP Conference Proceedings 151,
Neural Netvorks for Computing, pp. 65-70, DUNNO, 1986.
12. J.J. Hopfield, D.I. Feinstein, and R.G. Palmer, 'Unlearning' has a stabilizing effect in collective memories, Nature, Vol. 304, pp. 158-159, 1983.
13. J.J. HopIieId, Neurons with graded response have coIIective computationaI properties Iike those
oI two-state neurons, Proceedings of the National Academy of Sciences, VoI. 81, pp. 3088-3092,
1984.
14. M.R. Garey, and D.S. Johnson, Computers and Intractability. New York: W.H. Freeman, 1979.
15. J.J. HopIieId, and D.W. Tank, neuraI computation oI decisions in optimization probIems,
Biological Cybernetics, VoI. 52, pp. 141-152, 1985.
16. G.V. WiIson, and G.S. PawIey, On the stabiIity oI the traveIing saIesman probIem aIgorithm oI
HopIieId and tank, Biological Cybernetics, VoI. 58, pp. 63-70, 1988.
17. D.H. AckIey, G.E. Hinton, and T.J. Sejnowski, (1985). A Iearning aIgorithm Ior BoItzmann
machines, Cognitive Science, VoI. 9, No. 1, pp. 147-169, 1985.
CHAPTER 14
Self-Organizing Networks
14.1 INTRODUCTION
In the previous chapters we discussed a number of networks, which were trained to perform a mapping
F : ℜ^n → ℜ^m by presenting the network 'examples' (x^p, d^p) with d^p = F(x^p) of this mapping. However,
problems exist where such training data, consisting of input and desired output pairs, are not available,
but where the only information is provided by a set of input patterns x^p. In these cases the relevant
information has to be found within the (redundant) training samples x^p.
Some exampIes oI such probIems are:
Clustering: the input data may be grouped in cIusters` and the data processing system has to
Iind these inherent cIusters in the input data. The output oI the system shouId give the cIuster
IabeI oI the input pattern (discrete output);
Vector quantisation: this probIem occurs when a continuous space has to be discretized. The
input oI the system is the n-dimensionaI vector x, the output is a discrete representation oI the
input space. The system has to Iind optimaI discretization oI the input space;
Dimensionality reduction: the input data are grouped in a subspace, which has Iower
dimensionaIity than the dimensionaIity oI the data. The system has to Iearn an optimaI mapping,
such that most oI the variance in the input data is preserved in the output data;
Feature extraction: the system has to extract Ieatures Irom the input signaI. This oIten means a
dimensionaIity reduction as described above.
In this chapter we discuss a number oI neuro-computationaI approaches Ior these kinds oI
probIems. Training is done without the presence oI an externaI teacher. The unsupervised weight
adapting aIgorithms are usuaIIy based on some Iorm oI gIobaI competition between the neurons.
There are very many types oI seII-organizing networks, appIicabIe to a wide area oI probIems. One
oI the most basic schemes is competitive Iearning as proposed by RumeIhart and Zipser (1985). A very
simiIar network but with diIIerent emergent properties is the topoIogy-conserving map devised by
Kohonen. Other self-organizing networks are ART, proposed by Carpenter and Grossberg (1987), and
the cognitron, proposed by Fukushima (1975).
14.2 COMPETITIVE LEARNING
14.2.1 Clustering
Competitive Iearning is a Iearning procedure that divides a set oI input patterns in cIusters that are
inherent to the input data. A competitive Iearning network is provided onIy with input vectors x and thus
impIements an unsupervised Iearning procedure. We wiII show its equivaIence to a cIass oI traditionaI`
cIustering aIgorithms shortIy. Another important use oI these networks is vector quantisation.
An example of a competitive learning network is shown in Fig. 14.1. All output units o are
connected to all input units i with weights w_io. When an input pattern x is presented, only a single output
unit of the network (the winner) will be activated. In a correctly trained network, all x in one cluster will
have the same winner. For the determination of the winner and the corresponding learning rule, two
methods exist.
Winner Selection: Dot Product
For the time being, we assume that both input vectors x and weight vectors w_o are normalized to unit
length. Each output unit o calculates its activation value y_o according to the dot product of input and
weight vector:

    y_o = Σ_i w_io x_i = w_o^T x    ...(14.1)
In a next pass, output neuron k is selected with maximum activation

    ∀o ≠ k : y_o ≤ y_k    ...(14.2)

Activations are reset such that y_k = 1 and y_{o≠k} = 0. This is the competitive aspect of the network, and
we refer to the output layer as the winner-take-all layer. The winner-take-all layer is usually
implemented in software by simply selecting the output neuron with highest activation value. This
function can also be performed by a neural network known as MAXNET (Lippmann, 1989). In
MAXNET, all neurons o are connected to other units o' ≠ o with inhibitory links and to itself with an
excitatory link:
    w_{o,o'} = +1 if o = o', and −ε otherwise    ...(14.3)
Fig. 14.1 A simple competitive learning network. Each of the four outputs o is connected to all inputs i.
It can be shown that this network converges to a situation where onIy the neuron with highest initiaI
activation survives, whereas the activations oI aII other neurons converge to zero. From now on, we wiII
simpIy assume a winner k is seIected without being concerned which aIgorithm is used.
Once the winner k has been seIected, the weights are updated according to:
    w_k(t + 1) = (w_k(t) + γ(x(t) − w_k(t))) / ||w_k(t) + γ(x(t) − w_k(t))||    ...(14.4)
where the divisor ensures that all weight vectors w are normalized. Note that only the weights of winner
k are updated.
The weight update given in equation (14.4) effectively rotates the weight vector w_o towards the
input vector x. Each time an input x is presented, the weight vector closest to this input is selected and is
subsequently rotated towards the input. Consequently, weight vectors are rotated towards those areas
where many inputs appear: the clusters in the input. This procedure is visualized in Fig. 14.2.
Fig. 14.2 Example of clustering in 3D with normalized vectors, which all lie on the unity sphere. The three weight vectors
are rotated towards the centers of gravity of the three different input clusters.
Winner selection: Euclidean distance
Previously it was assumed that both inputs x and weight vectors w were normalized. Using the
activation function given in equation (14.1) gives a 'biologically plausible' solution. In Fig. 14.3 it is
shown how the algorithm would fail if unnormalized vectors were used. Naturally one would like to
accommodate the algorithm for unnormalized input data. To this end, the winning neuron k is selected
with its weight vector w_k closest to the input pattern x, using the Euclidean distance measure:

    k : ||w_k − x|| ≤ ||w_o − x||   ∀o    ...(14.5)
It is easiIy checked that equation (14.5) reduces to (14.1) and (14.2) iI aII vectors are normaIized.
The EucIidean distance norm is thereIore a more generaI case oI equations (14.1) and (14.2). Instead oI
rotating the weight vector towards the input as perIormed by equation (14.4), the weight update must be
changed to impIement a shiIt towards the input:
    w_k(t + 1) = w_k(t) + γ(x(t) − w_k(t))    ...(14.6)
Again onIy the weights oI the winner are updated.
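A minimal sketch of this winner selection and weight shift, assuming a weight matrix W with one row per output neuron (the function and variable names are illustrative, not the author's notation):

```python
import numpy as np

def competitive_step(W, x, gamma=0.1):
    """One competitive learning step: eq. (14.5) selects the winner,
    eq. (14.6) shifts the winner's weight vector towards the input x."""
    distances = np.linalg.norm(W - x, axis=1)   # Euclidean distance to every weight vector
    k = int(np.argmin(distances))               # winning neuron
    W[k] += gamma * (x - W[k])                  # shift only the winner towards x
    return k
```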
A point of attention in these recursive clustering techniques is the initialization. Especially if the
input vectors are drawn from a large or high-dimensional input space, it is not beyond imagination that
a randomly initialized weight vector w_o will never be chosen as the winner and will thus never be moved
and never be used. Therefore, it is customary to initialize weight vectors to a set of input patterns {x}
drawn from the input set at random. Another more thorough approach that avoids these and other
problems in competitive learning is called leaky learning. This is implemented by expanding the weight
update given in equation (14.6) with

    w_l(t + 1) = w_l(t) + γ'(x(t) − w_l(t))   ∀l ≠ k    ...(14.7)

with γ' << γ the leaky learning rate. A somewhat similar method is known as frequency sensitive
competitive learning (Ahalt, Krishnamurthy, Chen, & Melton, 1990). In this algorithm, each neuron
records the number of times it is selected winner. The more often it wins, the less sensitive it becomes to
competition. Conversely, neurons that consistently fail to win increase their chances of being selected
winner.
Cost function: EarIier it was cIaimed, that a competitive network perIorms a cIustering process on the
input data. i.e., input patterns are divided in disjoint cIusters such that simiIarities between input
patterns in the same cIuster are much bigger than simiIarities between inputs in diIIerent cIusters.
SimiIarity is measured by a distance Iunction on the input vectors, as discussed beIore. A common
criterion to measure the quaIity oI a given cIustering is the square error criterion, given by
    E = Σ_p ||w_k − x^p||²    ...(14.8)
where k is the winning neuron when input x^p is presented. The weights w are interpreted as cluster
centres. It is not difficult to show that competitive learning indeed seeks to find a minimum for this
square error by following the negative gradient of the error-function.
Fig. 14.3 Determining the winner in a competitive learning network. a. Three normalized vectors. b. The three vectors
having the same directions as in a., but with different lengths. In a., vectors x and w_1 are nearest to each other,
and their dot product x^T w_1 = |x||w_1| cos α is larger than the dot product of x and w_2. In b., however, the pattern
and weight vectors are not normalized, and in this case w_2 should be considered the 'winner' when x is
applied. However, the dot product x^T w_1 is still larger than x^T w_2.
Theorem 14.1: The error function for pattern x^p

    E^p = ½ Σ_i (w_ki − x_i^p)²    ...(14.9)

where k is the winning unit, is minimised by the weight update rule in eq. (14.6).
Proof: As in eq. (3.12), we calculate the effect of a weight change on the error function. So we have
that

    Δ_p w_io = −γ ∂E^p/∂w_io    ...(14.10)
where γ is a constant of proportionality. Now, we have to determine the partial derivative of E^p:

    ∂E^p/∂w_io = (w_io − x_i^p) if unit o wins, and 0 otherwise    ...(14.11)
such that

    Δ_p w_io = −γ(w_io − x_i^p) = γ(x_i^p − w_io)    ...(14.12)

which is eq. (14.6) written down for one element of w_o.
Therefore, eq. (14.8) is minimized by repeated weight updates using eq. (14.6).
Example 14.1: In Fig. 14.4, 8 clusters of 6 data points each are depicted. A competitive learning
network using Euclidean distance to select the winner was initialized with all weight vectors w_o = 0. The
network was trained with γ = 0.1 and a leaky learning rate γ' = 0.001, and the positions of the weights after
500 iterations are shown.
Fig. 14.4 Competitive learning for clustering data. The data are given by +. The positions of the weight vectors after
500 iterations are given by o.
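A small training loop in the spirit of Example 14.1 can be sketched as follows. The cluster centres generated below are illustrative (the book's actual data are not reproduced); the learning rates and the leaky update follow eqs. (14.6) and (14.7).

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 clusters of 6 points each (illustrative data, not the book's)
centres = rng.uniform(0.1, 0.9, size=(8, 2))
data = np.vstack([c + 0.02 * rng.standard_normal((6, 2)) for c in centres])

n_outputs = 8
W = np.zeros((n_outputs, 2))          # all weight vectors start at 0, as in the example
gamma, gamma_leak = 0.1, 0.001        # learning rate and leaky learning rate

for _ in range(500):                  # 500 iterations, as in the example
    x = data[rng.integers(len(data))]
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))   # winner, eq. (14.5)
    for o in range(n_outputs):
        rate = gamma if o == k else gamma_leak           # eq. (14.6) / eq. (14.7)
        W[o] += rate * (x - W[o])

print(W)   # the weight vectors should end up near the cluster centres
```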
14.2.2 Vector Quantisation
Another important use oI competitive Iearning networks is Iound in vector quantisation. A vector
quantisation scheme divides the input space in a number oI disjoint subspaces and represents each input
vector x by the IabeI oI the subspace it IaIIs into (i.e., index k oI the winning neuron). The diIIerence
with cIustering is that we are not so much interested in Iinding cIusters oI simiIar data, but more in
quantising the entire input space. The quantisation perIormed by the competitive Iearning network is
said to track the input probabiIity density Iunction`: the density oI neurons and thus subspaces is
highest in those areas where inputs are most IikeIy to appear, whereas a more coarse quantisation is
obtained in those areas where inputs are scarce. An exampIe oI tracking the input density is sketched in
Figure 14.5. Vector quantisation through competitive Iearning resuIts in a more Iine-grained
discretization in those areas oI the input space where most input have occurred in the past.
Fig. 14.5 This figure visualizes the tracking of the input density. The input patterns are drawn from ℜ²; the weight vectors
also lie in ℜ². In the areas where inputs are scarce, the upper part of the figure, only few (in this case two)
neurons are used to discretize the input space. Thus, the upper part of the input space is divided into two
large separate regions. The lower part, however, where many more inputs have occurred, five neurons
discretize the input space into five smaller subspaces.
In this way, competitive learning can be used in applications where data has to be compressed, such
as telecommunication or storage. However, competitive learning has also been used in combination with
supervised learning methods, and been applied to function approximation problems or classification
problems. We will describe two examples: the "counter propagation" method and the "learning vector
quantisation".
14.2.3 Counter Propagation
In a Iarge number oI appIications, networks that perIorm vector quantisation are combined with another
type oI network in order to perIorm Iunction approximation. An exampIe oI such a network is given in
Fig. 14.6. This network can approximate a function f : ℜ^n → ℜ^m by associating with each neuron o a
function value [w_1o, w_2o, ..., w_mo]^T which is somehow representative for the function values f(x) of
inputs x represented by o. This way of approximating a function effectively implements a 'look-up
table': an input x is assigned to a table entry k with ∀o ≠ k: ||x − w_k|| ≤ ||x − w_o||, and the function value
[w_1k, w_2k, ..., w_mk]^T in this table entry is taken as an approximation of f(x). A well-known example of
such a network is the Counter propagation network (Hecht-Nielsen, 1988).
Fig. 14.6 A network combining a vector quantisation layer with a 1-layer feed-forward neural network. This network can
be used to approximate functions from ℜ² to ℜ²; the input space ℜ² is discretized into 5 disjoint subspaces.
Depending on the application, one can choose to perform the vector quantisation before learning
the function approximation, or one can choose to learn the quantisation and the approximation layer
simultaneously. As an example of the latter, the network presented in Fig. 14.6 can be supervisedly
trained in the following way:
1. Present the network with both input x and function value d = f(x);
2. Perform the unsupervised quantisation step. For each weight vector, calculate the distance from
its weight vector to the input pattern and find winner k. Update the weights w_ih with equation
(14.6);
3. Perform the supervised approximation step:

    w_ko(t + 1) = w_ko(t) + γ(d_o − w_ko(t))    ...(14.13)
This is simply the δ-rule with y_o = Σ_h y_h w_ho = w_ko when k is the winning neuron and the desired
output is given by d = f(x).
If we define a function g(x, k) as:

    g(x, k) = 1 if k is winner, and 0 otherwise    ...(14.14)
It can be shown that this learning procedure converges to

    w_ho = ∫_{ℜ^n} y_o g(x, h) dx    ...(14.15)
i.e., each tabIe entry converges to the mean Iunction vaIue over aII inputs in the subspace represented by
that tabIe entry. As we have seen beIore, the quantisation scheme tracks the input probabiIity density
Iunction, which resuIts in a better approximation oI the Iunction in those areas where input is most
IikeIy to appear.
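The simultaneous training scheme of steps 1-3 above can be sketched compactly. The array names and the prediction helper below are illustrative assumptions; the two update lines follow eqs. (14.6) and (14.13).

```python
import numpy as np

def counterprop_step(V, W_out, x, d, gamma_q=0.1, gamma_a=0.1):
    """One combined training step for the network of Fig. 14.6.

    V      : (n_units, n_in)  quantisation weight vectors
    W_out  : (n_units, n_out) per-unit stored function values
    x, d   : input vector and desired output d = f(x)
    """
    k = int(np.argmin(np.linalg.norm(V - x, axis=1)))  # unsupervised step: find winner
    V[k] += gamma_q * (x - V[k])                       # eq. (14.6)
    W_out[k] += gamma_a * (d - W_out[k])               # supervised step, eq. (14.13)
    return k

def counterprop_predict(V, W_out, x):
    """Look-up table: return the stored function value of the winning unit."""
    k = int(np.argmin(np.linalg.norm(V - x, axis=1)))
    return W_out[k]
```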
Not all functions are represented accurately by this combination of quantisation and approximation
layers, e.g., a simple identity or combinations of sines and cosines are much better approximated by
multilayer back-propagation networks if the activation functions are chosen appropriately. However, if
we expect our input to be (a subspace of) a high dimensional input space ℜ^n and we expect our function
f to be discontinuous at numerous points, the combination of quantisation and approximation is not
uncommon and probably very efficient. Of course this combination extends much further than the
presented combination of a single-layer competitive learning network and a single-layer
feed-forward network. The latter could be replaced by a reinforcement learning procedure (see
chapter 15).
The quantisation Iayer can be repIaced by various other quantisation schemes, such as Kohonen
networks or octree methods (Jansen, Smagt, and Groen, 1994). In Iact, various modern statisticaI
Iunction approximation methods (Breiman, Friedman, OIshen, and Stone, 1984; Friedman, 1991) are
based on this very idea, extended with the possibiIity to have the approximation Iayer inIIuence the
quantisation Iayer (e.g., to obtain a better or IocaIIy more Iine-grained quantisation).
14.2.4 Learning Vector Quantisation
It is an unpIeasant habit in neuraI network Iiterature, to aIso cover Learning Vector Quantisation (LVQ)
methods in chapters on unsupervised cIustering. Granted that these methods aIso perIorm a cIustering or
quantisation task and use simiIar Iearning ruIes, they are trained supervisedIy and perIorm discriminant
anaIysis rather than unsupervised cIustering. These networks attempt to deIine decision boundaries` in
the input space, given a Iarge set oI exempIary decisions (the training set); each decision couId, e.g., be
a correct cIass IabeI.
A rather large number of slightly different LVQ methods is appearing in recent literature. They are
all based on the following basic algorithm:
1. With each output neuron o, a class label (or decision of some other kind) y_o is associated;
2. A learning sample consists of input vector x^p together with its correct class label y_o^p;
3. Using distance measures between weight vectors w_o and input vector x^p, not only the winner k_1 is
determined, but also the second best k_2:

    ||x^p − w_k1|| < ||x^p − w_k2|| < ||x^p − w_o||   ∀o ≠ k_1, k_2

4. The labels y_k1^p, y_k2^p are compared with d^p. The weight update rule given in equation (14.6) is used
selectively based on this comparison.
An example of the last step is given by the LVQ2 algorithm by Kohonen (1977), using the
following strategy:

    if y_k1^p ≠ d^p and d^p = y_k2^p and ||x^p − w_k2|| − ||x^p − w_k1|| < ε
    then w_k2(t + 1) = w_k2(t) + γ(x − w_k2(t))
    and  w_k1(t + 1) = w_k1(t) − γ(x − w_k1(t))
i.e., w_k2 with the correct label is moved towards the input vector, while w_k1 with the incorrect label is
moved away from it.
The new LVQ algorithms that are emerging all use different implementations of these different
steps, e.g., how to define class labels y_o, how many 'next-best' winners are to be determined, how to
adapt the number of output neurons, and how to selectively use the weight update rule.
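A sketch of one LVQ2 update as described above; the window threshold ε, the label bookkeeping, and the parameter values are assumptions made for this illustration.

```python
import numpy as np

def lvq2_step(W, labels, x, d, gamma=0.05, eps=0.1):
    """One LVQ2 step: find winner k1 and runner-up k2; if k1 carries the wrong
    label, k2 the correct one, and x lies near the border between them,
    attract k2 towards x and repel k1 from x."""
    dist = np.linalg.norm(W - x, axis=1)
    k1, k2 = np.argsort(dist)[:2]              # winner and second-best
    near_border = (dist[k2] - dist[k1]) < eps  # window condition
    if labels[k1] != d and labels[k2] == d and near_border:
        W[k2] += gamma * (x - W[k2])           # correct label: move towards x
        W[k1] -= gamma * (x - W[k1])           # incorrect label: move away from x
```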
14.3 KOHONEN NETWORK
The Kohonen network (1982, 1984) can be seen as an extension to the competitive Iearning network,
aIthough this is chronoIogicaIIy incorrect. AIso, the Kohonen network has a diIIerent set oI
appIications.
In the Kohonen network, the output units in S are ordered in some Iashion, oIten in a two-
dimensionaI grid or array, aIthough this is appIication-dependent. The ordering, which is chosen by the
user, determines which output neurons are neighbours.
Now, when learning patterns are presented to the network, the weights to the output units are thus
adapted such that the order present in the input space ℜ^N is preserved in the output, i.e., the neurons in
S. This means that learning patterns which are near to each other in the input space (where 'near' is
determined by the distance measure used in finding the winning unit) must be mapped on output units,
which are also near to each other, i.e., the same or neighbouring units. Thus, if inputs are uniformly
distributed in ℜ^N and the order must be preserved, the dimensionality of S must be at least N.
The mapping, which represents a discretization of the input space, is said to be topology preserving.
However, if the inputs are restricted to a subspace of ℜ^N, a Kohonen network of lower dimensionality
can be used. For example: data on a two-dimensional manifold in a high dimensional input space
can be mapped onto a two-dimensional Kohonen network, which can for example be used for
visualization of the data.
Usually, the learning patterns are random samples from ℜ^N. At time t, a sample x(t) is generated and
presented to the network. Using the same formulas as in section 14.2, the winning unit k is determined.
Next, the weights to this winning unit as well as its neighbours are adapted using the learning rule

    w_o(t + 1) = w_o(t) + γ g(o, k)(x(t) − w_o(t))    ...(14.16)

Here, g(o, k) is a decreasing function of the grid-distance between units o and k, such that g(k, k) = 1.
For example, for g(·) a Gaussian function can be used, such that (in one dimension!) g(o, k) = exp(−(o − k)²)
(see Fig. 14.7). Due to this collective learning scheme, input signals which are near to each
other will be mapped on neighbouring neurons. Thus the topology inherently present in the input
signals will be preserved in the mapping, such as depicted in Fig. 14.8.
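A minimal sketch of the update rule (14.16) for output units arranged on a one-dimensional chain, with a Gaussian neighbourhood function; the grid layout and the width parameter sigma are illustrative assumptions.

```python
import numpy as np

def kohonen_step(W, x, gamma=0.1, sigma=1.0):
    """One Kohonen update (eq. 14.16); W holds one weight vector per grid position."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))     # winning unit
    grid = np.arange(len(W))
    g = np.exp(-((grid - k) ** 2) / (2 * sigma ** 2))     # neighbourhood g(o, k), g(k, k) = 1
    W += gamma * g[:, None] * (x - W)                     # every unit moves, scaled by g
    return k
```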
II the intrinsic dimensionaIity oI S is Iess than N, the neurons in the network are IoIded` in the input
space, such as depicted in Fig. 14.9.
The topoIogy-conserving quaIity oI this network has many counterparts in bioIogicaI brains. The
brain is organized in many pIaces so that aspects oI the sensory environment are represented in the Iorm
oI two-dimensionaI maps. For exampIe, in the visuaI system, there are severaI topographic mappings oI
visuaI space onto the surIace oI the visuaI cortex. There are organized mappings oI the body surIace
Iteration 0 Iteration 200 Iteration 600 Iteration 1900
Fig. 14.8 A topology-conserving map converging. The weight vectors of a network with two inputs and 8 x 8 output
neurons arranged in a planar grid are shown. A line in each figure connects weight w_{i,(o1,o2)} with weights
w_{i,(o1+1,o2)} and w_{i,(o1,o2+1)}. The leftmost figure shows the initial weights; the rightmost when the map is
almost completely formed.
Fig. 14.7 Gaussian neuron distance function g( ). In this case, g( ) is shown for a two-dimensional grid because it looks
nice.
Fig. 14.9 The mapping of a two-dimensional input space on a one-dimensional Kohonen network.
onto the cortex in both motor and somatosensory areas, and tonotopic mappings oI Irequency in the
auditory cortex. The use oI topographic representations, where some important aspect oI a sensory
modaIity is reIated to the physicaI Iocations oI the ceIIs on a surIace, is so common that it obviousIy
serves an important inIormation processing Iunction.
It does not come as a surprise, thereIore, that aIready many appIications have been devised oI the
Kohonen topoIogy-conserving maps. Kohonen himseII has successIuIIy used the network Ior phoneme-
recognition (Kohonen, Makisara, and Saramaki, 1984). AIso, the network has been used to merge
sensory data Irom diIIerent kinds oI sensors, such as auditory and visuaI, Iooking` at the same scene
(GieIen, Krommenhoek, and Gisbergen, 1991).
To expIain the pIausibiIity oI a simiIar structure in bioIogicaI networks, Kohonen remarks that the
IateraI inhibition between the neurons couId be obtained via eIIerent connections between those
neurons. In one dimension, those connection strengths Iorm a Mexican hat` (see Figure 14.10).
Fig. 14.10 Mexican hat. Lateral interaction around the winning neuron as a function of distance: excitation to nearby
neurons, inhibition to farther off neurons.
14.4 PRINCIPAL COMPONENT NETWORKS
The networks presented in the previous sections can be seen as (nonIinear) vector transIormations,
which map an input vector to a number oI binary output eIements or neurons. The weights are adjusted
in such a way that they couId be considered as prototype vectors (vectoriaI means) Ior the input patterns
Ior which the competing neuron wins.
The seII-organizing transIorm described in this section rotates the input space in such a way that the
vaIues oI the output neurons are as uncorreIated as possibIe and the energy or variances oI the patterns
is mainIy concentrated in a Iew output neurons. An exampIe is shown in Figure 14.11.
The two-dimensional samples (x_1, x_2) are plotted in the figure. It can easily be seen that x_1 and x_2 are
related, such that if we know x_1 we can make a reasonable prediction of x_2 and vice versa, since the
points are centered around the line x_1 = x_2. If we rotate the axes over π/4 we get the (e_1, e_2) axes as
plotted in the figure. Here the conditional prediction has no use because the points have uncorrelated
coordinates. Another property of this rotation is that the variance or energy of the transformed patterns
is maximized on a lower dimension. This can be intuitively verified by comparing the spreads (d_x1, d_x2)
and (d_e1, d_e2) in the figures. After the rotation, the variance of the samples is large along the e_1 axis and
small along the e_2 axis.
This transIorm is very cIoseIy reIated to the eigenvector transIormation known Irom image
processing where the image has to be coded or transIormed to a Iower dimension and reconstructed
again by another transIorm as weII as possibIe.
The next section describes a Iearning ruIe which acts as a Hebbian Iearning ruIe, but which scaIes
the vector Iength to unity. In the subsequent section we wiII see that a Iinear neuron with a normaIised
Hebbian Iearning ruIe acts as such a transIorm, extending the theory in the Iast section to muIti-
dimensionaI outputs.
14.4.1 Normalized Hebbian Rule
The model considered here consists of one linear neuron with input weights w. The output y_o(t) of this
neuron is given by the usual inner product of its weight w and the input vector x:

    y_o(t) = w(t)^T x(t)    ...(14.17)
As seen in the previous sections, all models are based on a kind of Hebbian learning. However, the
basic Hebbian rule would make the weights grow uninhibitedly if there were correlation in the input
patterns. This can be overcome by normalising the weight vector to a fixed length, typically 1, which
leads to the following learning rule

    w(t + 1) = (w(t) + γ y(t) x(t)) / L(w(t) + γ y(t) x(t))    ...(14.18)
where L(·) indicates an operator which returns the vector length, and γ is a small learning parameter.
Compare this learning rule with the normalized learning rule of competitive learning. There the delta
rule was normalized, here the standard Hebb rule is.
Now the operator which computes the vector length, the norm of the vector, can be approximated
by a Taylor expansion around γ = 0:

    L(w(t) + γ y(t) x(t)) = 1 + γ ∂L/∂γ |_{γ=0} + O(γ²)    ...(14.19)
Fig. 14.11 Distribution of input samples.
When we substitute this expression for the vector length in equation (14.18), it resolves for small γ to

    w(t + 1) = (w(t) + γ y(t) x(t)) (1 − γ ∂L/∂γ |_{γ=0} + O(γ²))    ...(14.20)
Since ∂L/∂γ |_{γ=0} = y(t)², discarding the higher order terms of γ leads to

    w(t + 1) = w(t) + γ y(t)(x(t) − y(t) w(t))    ...(14.21)
which is called the 'Oja learning rule' (Oja, 1982). This learning rule thus modifies the weight in the
usual Hebbian sense, the first product term is the Hebb rule y_o(t) x(t), but normalizes its weight vector
directly by the second product term −y_o(t) y_o(t) w(t). What exactly does this learning rule do with the
weight vector?
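The Oja rule is short enough to state in code. The sketch below (illustrative, with a fixed learning rate and made-up correlated data) applies eq. (14.21) to a stream of zero-mean samples, so that w should end up roughly unit-length and aligned with the dominant eigenvector of the correlation matrix.

```python
import numpy as np

def oja_step(w, x, gamma=0.01):
    """One step of the Oja learning rule (eq. 14.21)."""
    y = float(w @ x)                 # linear neuron output, eq. (14.17)
    w += gamma * y * (x - y * w)     # Hebbian term minus the normalizing term
    return w

# usage sketch: zero-mean, correlated 2-D samples (illustrative distribution)
rng = np.random.default_rng(1)
C = np.array([[3.0, 1.2], [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], C, size=5000)
w = rng.standard_normal(2)
for x in X:
    oja_step(w, x)
# w now points (approximately) along the eigenvector of C with the largest eigenvalue
```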
14.4.2 Principal Component Extractor
Remember probability theory? Consider an N-dimensional signal x(t) with
mean μ = E(x(t));
correlation matrix R = E((x(t) − μ)(x(t) − μ)^T).
In the following we assume the signal mean to be zero, so μ = 0.
From equation (14.21) we see that the expectation of the weights for the Oja learning rule equals

    E(w(t + 1) | w(t)) = w(t) + γ(R w(t) − (w(t)^T R w(t)) w(t))    ...(14.22)
which has a continuous counterpart

    d/dt w(t) = R w(t) − (w(t)^T R w(t)) w(t)    ...(14.23)
Theorem 14.2: Let the eigenvectors e_i of R be ordered with descending associated eigenvalues λ_i
such that λ_1 > λ_2 > ... > λ_N. With equation (14.23) the weights w(t) will converge to e_1.
Proof: Since the eigenvectors of R span the N-dimensional space, the weight vector can be
decomposed as

    w(t) = Σ_{i=1}^{N} α_i(t) e_i    ...(14.24)

Substituting this in the differential equation and concluding the theorem is left as an exercise.
14.4.3 More Eigenvectors
In the previous section it was shown that a singIe neuron`s weight converges to the eigenvector oI the
correIation matrix with maximum eigenvaIue, i.e., the weight oI the neuron is directed in the direction
oI highest energy or variance oI the input patterns. Here we tackIe the question oI how to Iind the
remaining eigenvectors oI the correIation matrix given the Iirst Iound eigenvector.
Consider the signal x which can be decomposed into the basis of eigenvectors e_i of its correlation
matrix R,

    x = Σ_{i=1}^{N} α_i e_i    ...(14.25)
If we now subtract the component in the direction of e_1, the direction in which the signal has the
most energy, from the signal x

    x̃ = x − α_1 e_1    ...(14.26)

we are sure that when we again decompose x̃ into the eigenvector basis, the coefficient α_1 = 0, simply
because we just subtracted it. We call x̃ the deflation of x.
If now a second neuron is taught on this signal x̃, then its weights will lie in the direction of the
remaining eigenvector with the highest eigenvalue. Since the deflation removed the component in the
direction of the first eigenvector, the weight will converge to the remaining eigenvector with maximum
eigenvalue. In the previous section we ordered the eigenvalues in magnitude, so according to this
definition in the limit we will find e_2. We can continue this strategy and find all the N eigenvectors
belonging to the signal x.
We can write the deflation in neural network terms if we see that

    y_o = w^T x = e_1^T Σ_i α_i e_i = α_1    ...(14.27)
since

    w = e_1    ...(14.28)

So that the deflated vector x̃ equals

    x̃ = x − y_o w    ...(14.29)
The term subtracted Irom the input vector can be interpreted as a kind oI a back-projection or
expectation. Compare this to ART described in the next section.
14.5 ADAPTIVE RESONANCE THEORY
The Iast unsupervised Iearning network we discuss diIIers Irom the previous networks in that it is
recurrent; as with networks in the next chapter, the data is not onIy Ied Iorward but aIso back Irom
output to input units.
14.5.1 Background: Adaptive Resonance Theory
In 1976, Grossberg introduced a modeI Ior expIaining bioIogicaI phenomena. The modeI has three
cruciaI properties:
1. A normaIization oI the totaI network activity. BioIogicaI systems are usuaIIy very adaptive to
Iarge changes in their environment. For exampIe, the human eye can adapt itseII to Iarge
variations in Iight intensities;
2. Contrast enhancement oI input patterns. The awareness oI subtIe diIIerences in input patterns can
mean a Iot in terms oI survivaI. Distinguishing a hiding panther Irom a resting one makes aII the
diIIerence in the worId. The mechanism used here is contrast enhancement.
3. Short-term memory (STM) storage oI the contrast-enhanced pattern. BeIore the input pattern can
be decoded, it must be stored in the short-term memory. The Iong-term memory (LTM)
impIements an arousaI mechanism (i.e., the cIassiIication), whereas the STM is used to cause
graduaI changes in the LTM.
The system consists of two layers, F_1 and F_2, which are connected to each other via the LTM (see
Fig. 14.12). The input pattern is received at F_1, whereas classification takes place in F_2. As mentioned
before, the input is not directly classified. First a characterization takes place by means of extracting
features, giving rise to activation in the feature representation field.
Fig. 14.12 The ART architecture.
The expectations, residing in the LTM connections, translate the input pattern to a categorization in
the category representation field. The classification is compared to the expectation of the network,
which resides in the LTM weights from F_2 to F_1. If there is a match, the expectations are strengthened;
otherwise the classification is rejected.
14.5.2 ART1: The Simplified Neural Network Model
The ART1 simplified model consists of two layers of binary neurons (with values 1 and 0), called F_1
(the comparison layer) and F_2 (the recognition layer) (see Fig. 14.13). Each neuron in F_1 is connected to
all neurons in F_2 via the continuous-valued forward long term memory (LTM) W^f, and vice versa via the
binary-valued backward LTM W^b. The other modules are gain 1 and 2 (G_1 and G_2), and a reset module.
Each neuron in the comparison layer receives three inputs: a component of the input pattern, a
component of the feedback pattern, and a gain G_1. A neuron outputs a 1 if and only if at least two of
these three inputs are high: the 'two-thirds rule'.
The neurons in the recognition layer each compute the inner product of their incoming (continuous-
valued) weights and the pattern sent over these connections. The winning neuron then inhibits all the
other neurons via lateral inhibition.
Gain 2 is the logical 'or' of all the elements in the input pattern x.
Gain 1 equals gain 2, except when the feedback pattern from F_2 contains any 1; then it is forced to
zero.
Finally, the reset signal is sent to the active neuron in F_2 if the input vector x and the output of F_1
differ by more than some vigilance level.
14.5.3 Operation
The pattern is sent to F_2, and in F_2 one neuron becomes active. This signal is then sent back over the
backward LTM, which reproduces a binary pattern at F_1. Gain 1 is inhibited, and only the neurons in F_1
which receive a 'one' from both x and F_2 remain active.
If there is a substantial mismatch between the two patterns, the reset signal will inhibit the neuron in
F_2 and the process is repeated.
Instead of following Carpenter and Grossberg's description of the system using differential
equations, we use the notation employed by Lippmann (1987):
1. Initialization:

    w_ji^b(0) = 1
    w_ij^f(0) = 1/(1 + N)

where N is the number of neurons in F_1, M the number of neurons in F_2, 0 ≤ i < N, and 0 ≤ j < M.
Also, choose the vigilance threshold ρ, 0 ≤ ρ ≤ 1;
2. Apply the new input pattern x;
3. Compute the activation values of the neurons in F_2:

    y_j = Σ_{i=1}^{N} w_ij^f(t) x_i    ...(14.30)

4. Select the winning neuron k (0 ≤ k < M);
5. Vigilance test: if

    (w_k^b(t) · x) / (x · x) > ρ    ...(14.31)

where · denotes inner product, go to step 7, else go to step 6. Note that w_k^b · x essentially is the
inner product x* · x, which will be large if x* and x are near to each other;
6. Neuron k is disabled from further activity. Go to step 3;
7. Set for all l, 0 ≤ l < N:

    w_kl^b(t + 1) = w_kl^b(t) x_l
    w_lk^f(t + 1) = w_kl^b(t) x_l / (½ + Σ_{i=1}^{N} w_ki^b(t) x_i)

8. Re-enable all neurons in F_2 and go to step 2.
Fig. 14.13 The ART 1 neural network.
Fig. 14.14 shows exempIar behaviour oI the network.
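A compact sketch of the Lippmann-style procedure above for binary input vectors; the array names, the tie-breaking, and the handling of exhausted categories are illustrative assumptions of this sketch.

```python
import numpy as np

def art1(patterns, n_categories, rho=0.7):
    """Simplified ART1 (steps 1-8) for a 2-D array of binary patterns of length N."""
    N = patterns.shape[1]
    Wb = np.ones((n_categories, N))            # backward LTM, w^b(0) = 1
    Wf = np.ones((n_categories, N)) / (1 + N)  # forward LTM, w^f(0) = 1/(1+N)
    labels = []
    for x in patterns:
        disabled = np.zeros(n_categories, dtype=bool)
        while True:
            y = Wf @ x                         # F2 activations, eq. (14.30)
            y[disabled] = -np.inf
            k = int(np.argmax(y))              # winning neuron
            match = (Wb[k] @ x) / max(x @ x, 1e-12)
            if match > rho:                    # vigilance test, eq. (14.31)
                Wb[k] = Wb[k] * x              # step 7: backward LTM update
                Wf[k] = Wb[k] / (0.5 + Wb[k].sum())
                labels.append(k)
                break
            disabled[k] = True                 # step 6: disable and search on
            if disabled.all():
                labels.append(-1)              # no category accepted this pattern
                break
    return labels, Wb, Wf
```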
14.5.4 ART 1: The Original Model
In Iater work, Carpenter and Grossberg (1987) present severaI neuraI network modeIs to incorporate
parts oI the compIete theory. We wiII onIy discuss the Iirst modeI, ART 1.
The network incorporates a IoIIow-the-Ieader cIustering aIgorithm (Hartigan, 1975). This
aIgorithm tries to Iit each new input pattern in an existing cIass. II no matching cIass can be Iound, i.e.,
the distance between the new pattern and aII existing cIasses exceeds some threshoId, a new cIass is
created containing the new pattern.
The noveIty in this approach is that the network is abIe to adapt to new incoming patterns, whiIe the
previous memory is not corrupted. In most neuraI networks, such as the back- propagation network, aII
patterns must be taught sequentiaIIy; the teaching oI a new pattern might corrupt the weights Ior aII
previousIy Iearned patterns. By changing the structure oI the network rather than the weights, ART1
overcomes this probIem.
14.5.5 Normalization of the Original Model
We will refer to a cell in F_1 or F_2 with k.
Each cell k in F_1 or F_2 receives an input s_k and responds with an activation level y_k.
In order to introduce normalization in the model, we set I = Σ_k s_k and let the relative input
intensity Θ_k = s_k I^{-1}.
So we have a model in which the change of the response y_k of an input at a certain cell k
- depends inhibitorily on all other inputs and the sensitivity of the cell, i.e., the surroundings of
each cell have a negative influence on the cell: −y_k Σ_{l≠k} s_l;
- has an excitatory response as far as the input at the cell is concerned: +B s_k;
- has an inhibitory response for normalization: −y_k s_k;
- has a decay: −A y_k.
Fig. 14.14 An example of the behaviour of the Carpenter Grossberg network for letter patterns. The binary input
patterns on the left were applied sequentially. On the right the stored patterns (i.e., the weights of W^b for the
first four output units) are shown.
Here, A and B are constants. The differential equation for the neurons in F_1 and F_2 now is

    dy_k/dt = −A y_k + (B − y_k) s_k − y_k Σ_{l≠k} s_l    ...(14.32)

with 0 ≤ y_k(0) ≤ B because the inhibitory effect of an input can never exceed the excitatory input.
At equilibrium, when dy_k/dt = 0, and with I = Σ_k s_k we have that

    y_k (A + I) = B s_k    ...(14.33)
Because of the definition of Θ_k = s_k I^{-1} we get

    y_k = Θ_k B I / (A + I)    ...(14.34)

Therefore, at equilibrium y_k is proportional to Θ_k, and, since

    B I / (A + I) ≤ B    ...(14.35)

the total activity y_total = Σ_k y_k never exceeds B: it is normalized.
14.5.6 Contrast Enhancement
In order to make F_2 react better on differences in neuron values in F_1 (or vice versa), contrast
enhancement is applied: the contrasts between the neuronal values in a layer are amplified. We can
show that eq. (14.32) does not suffice anymore. In order to enhance the contrasts, we chop off all the
equal fractions (uniform parts) in F_1 or F_2. This can be done by adding an extra inhibitory input
proportional to the inputs from the other cells with a factor C:

    dy_k/dt = −A y_k + (B − y_k) s_k − (y_k + C) Σ_{l≠k} s_l    ...(14.36)
At equilibrium, when we set B = (n − 1) C where n is the number of neurons, we have

    y_k = ( n C I / (A + I) ) ( Θ_k − 1/n )    ...(14.37)

Now, when an input in which all the s_k are equal is given, then all the y_k are zero: the effect of C is
enhancing differences. If we set B ≤ (n − 1)C or C/(B + C) > 1/n, then more of the input shall be chopped
off.
The description oI ART1 continues by deIining the diIIerentiaI equations Ior the LTM. Instead oI
IoIIowing Carpenter and Grossberg`s description, we wiII revert to the simpIiIied modeI as presented by
Lippmann.
QUESTION BANK.
1. What are the advantages oI seII-organizing networks?
2. What is competitive Iearning network?
3. ExpIain various methods oI determining the winner and the corresponding Iearning ruIe.
4. Describe the square error criterion to measure the quaIity oI a given cIustering.
5. Describe the vector quantisation scheme.
6. ExpIain the counter propagation network.
7. Describe the Iearning vector quantisation method.
8. What is Kohonen network? ExpIain.
9. ExpIain normaIized Hebbian ruIe.
10. Describe adaptive resonance theory.
11. ExpIain ART 1 neuraI network.
12. Describe the normaIization oI ART 1.
REFERENCES.
1. D.E. RumeIhart, and D. Zipser, Feature discovery by competitive Iearning. Cognitive Science,
VoI. 9, pp. 75-112, 1985.
2. G.A. Carpenter, and S. Grossberg, A massiveIy paraIIeI architecture Ior a seII-organizing neuraI
pattern recognition machine, Computer Jision, Graphics, and Image Processing, VoI. 37, pp. 54-
115, 1987.
3. S. Grossberg, Adaptive pattern cIassiIication and universaI recoding I & II, Biological
Cybernetics, VoI. 23, pp. 121-134, 187-202, 1976.
4. K. Fukushima, Cognitron: A seII-organizing muItiIayered neuraI network, Biological
Cybernetics, VoI. 20, pp. 121-136, 1975.
5. K. Fukushima, Neocognitron: A hierarchicaI neuraI network capabIe oI visuaI pattern
recognition, Neural Netvorks, VoI. 1, pp. 119-130, 1988.
6. R.P. Lippmann, Review oI neuraI networks Ior speech recognition, Neural Computation,VoI. 1,
pp. 1-38, 1989.
7. S.C. AhaIt, A.K. Krishnamurthy, P. Chen, D. MeIton, Competitive Iearning aIgorithms Ior vector
quantisation, Neural Netvorks, VoI. 3, pp. 277-290, 1990.
8. R.H. NieIsen, Counterpropagation networks, Neural Netvorks, VoI. 1, No. 131-139, 1988.
9. A. Jansen, P.P. Smagt, P. Van der, and F.C.A. Groen, Nested networks Ior robot controI, In A.F.
Murray (Ed.), Neural Netvork Applications, KIuwer Academic PubIishers, 1994.
10. L. Breiman, J.H. Friedman, R.A. OIshen, and C.J. Stone, Classification and Regression 1rees.
Wadsworth and Broks/CoIe, 1984.
11. J.H. Friedman, MuItivariate adaptive regression spIines, Annals of Statistics, VoI. 19, pp. 1-141,
1991.
12. T. Kohonen, Associative Memory: A System-1heoretical Approach, Springer-VerIag, 1977.
13. T. Kohonen, SeII-organized Iormation oI topoIogicaIIy correct Ieature maps, BioIogicaI
Cybernetics, 43, 59-69, 1982.
14. T. Kohonen, Self-Organization and Associative Memory, BerIin: Springer-VerIag, 1984.
15. T. Kohonen, M. Makisara, and T. Saramaki, Phonotopic maps,insightIuI representation oI
phonoIogicaI Ieatures Ior speech recognition, In Proceedings of the 7th IEEE International
Conference on Pattern Recognition, DUNNO, 1984.
16. C. GieIen, K. Krommenhoek, and J. Gisbergen, A procedure Ior seII-organized sensor-Iusion in
topoIogicaIIy ordered maps, In T. Kanade, F.C.A. Groen, and L.O. Hertzberger (Eds.),
Proceedings of the Second International Conference on Autonomous Systems, EIsevier Science
PubIishers, pp. 417-423, 1991.
17. E. Oja, A simpIiIied neuron modeI as a principaI component anaIyzer, Journal of Mathematical
Biology, VoI. 15, pp. 267-273, 1982.
18. S. Grossberg, Adaptive pattern cIassiIication and universaI recoding I & II, Biological
Cybernetics, VoI. 23, 121-134, 187-202, 1976.
19. R.P. Lippmann, An introduction to computing with neuraI nets, IEEE 1ransactions on Acoustics,
Speech, and Signal Processing, VoI. 2 No.4, pp. 4-22, 1987.
20. G.A. Carpenter, and S. Grossberg, A massiveIy paraIIeI architecture Ior a seII-organizing neuraI
pattern recognition machine, Computer Jision, Graphics, and Image Processing, VoI. 37, pp. 54-
115, 1987.
21. G.A. Carpenter, and S. Grossberg, ART 2: SeII-organization oI stabIe category recognition codes
Ior anaIog input patterns, Applied Optics, 26(23), 4919-4930, 1987.
22. J.A. Hartigan, Clustering Algorithms, New York: John WiIey & Sons, 1975.
CHAPTER 15
Reinforcement Learning
15.1 INTRODUCTION
In the previous chapters a number of supervised training methods have been described in which the
weight adjustments are calculated using a set of 'learning samples', consisting of input and desired output
values. However, such a set of learning examples is not always available. Often the only information is
a scalar evaluation r, which indicates how well the neural network is performing. Reinforcement
learning involves two subproblems. The first is that the 'reinforcement' signal r is often delayed since
it is a result of network outputs in the past. This temporal credit assignment problem is solved by
learning a 'critic' network which represents a cost function J predicting future reinforcement. The
second problem is to find a learning procedure which adapts the weights of the neural network such that
a mapping is established which minimizes J. The two problems are discussed in the next paragraphs,
respectively. Fig. 15.1 shows a reinforcement-learning network interacting with a system.
Fig. 15.1 Reinforcement learning scheme.
15.2 THE CRITIC
The Iirst probIem is how to construct a critic, which is abIe to evaIuate system perIormance. II the
objective oI the network is to minimize a direct measurabIe quantity r, perIormance Ieedback is
straightIorward and a critic is not required. On the other hand, how is current behaviour to be evaIuated
iI the objective concerns Iuture system perIormance? The perIormance may Ior instance be measured by
the cumulative or future error. Most reinforcement learning methods (Barto, Sutton and Anderson, 1983)
use the temporal difference (TD) algorithm (Sutton, 1988) to train the critic.
Suppose the immediate cost of the system at time step k is measured by r(x_k, u_k, k), as a function of
system states x_k and control actions (network outputs) u_k. The immediate measure r is often called the
external reinforcement signal in contrast to the internal reinforcement signal in Fig. 15.2. Define the
performance measure J(x_k, u_k, k) of the system as a discounted cumulative sum of future cost. The task of the
critic is to predict the performance measure:

    J(x_k, u_k, k) = Σ_{i=k}^{∞} γ^{i−k} r(x_i, u_i, i)    ...(15.1)
in which γ ∈ [0, 1] is a discount factor (usually ≈ 0.95).
The relation between two successive predictions can easily be derived:

    J(x_k, u_k, k) = r(x_k, u_k, k) + γ J(x_{k+1}, u_{k+1}, k + 1)    ...(15.2)
If the network is correctly trained, the relation between two successive network outputs Ĵ should
be:

    Ĵ(x_k, u_k, k) = r(x_k, u_k, k) + γ Ĵ(x_{k+1}, u_{k+1}, k + 1)    ...(15.3)
If the network is not correctly trained, the temporal difference δ(k) between two successive
predictions is used to adapt the critic network:

    δ(k) = [ r(x_k, u_k, k) + γ Ĵ(x_{k+1}, u_{k+1}, k + 1) ] − Ĵ(x_k, u_k, k)    ...(15.4)
A learning rule for the weights of the critic network w_c(k), based on minimizing δ²(k), can be
derived:

    Δw_c(k) = α δ(k) ∂Ĵ(x_k, u_k, k)/∂w_c(k)    ...(15.5)
in which α is the learning rate.
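For a critic implemented as a linear function of a state feature vector, eqs. (15.4) and (15.5) reduce to a few lines. The feature representation, the rates, and the terminal-state handling below are illustrative assumptions of this sketch.

```python
import numpy as np

def critic_update(w_c, phi_k, phi_next, r_k, gamma=0.95, alpha=0.1, terminal=False):
    """Temporal-difference update of a linear critic J(x) ~ w_c . phi(x).

    phi_k, phi_next : feature vectors of the current and next state
    r_k             : immediate cost/reinforcement at step k
    """
    J_k = w_c @ phi_k
    J_next = 0.0 if terminal else w_c @ phi_next
    delta = (r_k + gamma * J_next) - J_k      # temporal difference, eq. (15.4)
    w_c += alpha * delta * phi_k              # gradient of J w.r.t. w_c is phi_k, eq. (15.5)
    return delta
```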
15.3 THE CONTROLLER NETWORK
II the critic is capabIe oI providing an immediate evaIuation oI perIormance, the controIIer network can
be adapted such that the optimaI reIation between system states and controI actions is Iound. Three
approaches are distinguished:
1. In case oI a Iinite set oI actions U, aII actions may virtuaIIy be executed. The action which
decreases the perIormance criterion most is seIected:
    u_k = min_{u∈U} Ĵ(x_k, u, k)    ...(15.6)
The RL-method with this controIIer` is caIIed Q-Iearning (Watkins & Dayan, 1992). The method
approximates dynamic programming which wiII be discussed in the next section.
2. If the performance measure J(x_k, u_k, k) is accurately predicted, then the gradient with respect to
the controller command u_k can be calculated, assuming that the critic network is differentiable. If
the measure is to be minimized, the weights of the controller w_r are adjusted in the direction of the
negative gradient:

    Δw_r(k) = −β ∂Ĵ(x_k, u_k, k)/∂u(k) · ∂u(k)/∂w_r(k)    ...(15.7)

with β being the learning rate. Werbos (1992) has discussed some of these gradient based
algorithms in detail. Sofge and White (1992) applied one of the gradient based methods to
optimize a manufacturing process.
3. A direct approach to adapt the controller is to use the difference between the predicted and the
'true' performance measure as expressed in equation (15.3). Suppose that the performance measure
is to be minimized. If control actions result in negative differences, i.e., the true performance is
better than was expected, then the controller has to be 'rewarded'. On the other hand, in case of a
positive difference, the control action has to be 'penalized'. The idea is to explore the set of
possible actions during learning and incorporate the beneficial ones into the controller. Learning
in this way is related to trial-and-error learning studied by psychologists in which behavior is
selected according to its consequences.
Generally, the algorithms select actions probabilistically from a set of possible actions and update
action probabilities on the basis of the evaluation feedback. Most of the algorithms are based on a
look-up table representation of the mapping from system states to actions (Barto et al., 1983).
Each table entry has to learn which control action is best when that entry is accessed. It may
also be possible to use a parametric mapping from system states to action probabilities. Gullapalli
(1990) adapted the weights of a single layer network.
15.4 BARTO'S APPROACH: THE ASE-ACE COMBINATION
Barto, Sutton and Anderson (1983) have IormuIated reinIorcement Iearning` as a Iearning strategy,
which does not need a set oI exampIes provided by a teacher`. The system described by Barto expIores
the space oI aIternative input-output mappings and uses an evaIuative Ieedback (reinIorcement signaI)
on the consequences oI the controI signaI (network output) on the environment. It has been shown that
such reinIorcement Iearning aIgorithms are impIementing an on-Iine, incrementaI approximation to the
dynamic programming method Ior optimaI controI, and are aIso caIIed heuristic` dynamic
programming (Werbos, 1990).
The basic buiIding bIocks in the Barto network are an Associative Search EIement (ASE) which
uses a stochastic method to determine the correct reIation between input and output and an Adaptive
Critic EIement (ACE) which Iearns to give a correct prediction oI Iuture reward or punishment
(Fig. 15.2). The externaI reinIorcement signaI r can be generated by a speciaI sensor (Ior exampIe a
coIIision sensor oI a mobiIe robot) or be derived Irom the state vector. For exampIe, in controI
appIications, where the state s oI a system shouId remain in a certain part A oI the controI space,
reinIorcement is given by:
    r = 0 if s ∈ A, and −1 otherwise    ...(15.8)
15.4.1 Associative Search
In its most elementary form the ASE gives a binary output value y_o(t) ∈ {0, 1} as a stochastic function
of an input vector. The total input of the ASE is, similar to the neuron presented in chapter 2, the
weighted sum of the inputs, with the exception that the bias input in this case is a stochastic variable N
with mean zero normal distribution:

    s(t) = Σ_j w_Sj x_j(t) + N    ...(15.9)
The activation function F is a threshold such that

    y_o(t) = y(t) = 1 if s(t) > 0, and 0 otherwise    ...(15.10)
For updating the weights, a Hebbian type of learning rule is used. However, the update is weighted
with the reinforcement signal r(t) and an 'eligibility' e_j is defined instead of the product y_o(t) x_j(t) of
input and output:

    w_Sj(t + 1) = w_Sj(t) + α r(t) e_j(t)    ...(15.11)

where α is a learning factor. The eligibility e_j is given by
Fig. 15.2: Architecture of a reinforcement learning scheme with critic element.
    e_j(t + 1) = δ e_j(t) + (1 − δ) y_o(t) x_j(t)    ...(15.12)

with δ the decay rate of the eligibility. The eligibility is a sort of 'memory'; e_j is high if the signals from
the input state unit j and the output unit are correlated over some time.
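A sketch of the ASE computations (eqs. 15.9-15.12) for a binary state vector; the noise scale and the rates below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def ase_output(w_s, x, noise_std=0.01):
    """Stochastic binary output of the ASE (eqs. 15.9 and 15.10)."""
    s = w_s @ x + rng.normal(0.0, noise_std)   # weighted sum plus zero-mean noise
    return 1 if s > 0 else 0

def ase_update(w_s, e, r_hat, y, x, alpha=0.5, delta=0.9):
    """Weight and eligibility updates (eqs. 15.11 and 15.12).

    r_hat : reinforcement signal (the internal signal from the ACE in practice)
    e     : eligibility trace vector, one value per input line
    """
    w_s += alpha * r_hat * e                   # eq. (15.11)
    e[:] = delta * e + (1 - delta) * y * x     # eq. (15.12)
    return w_s, e
```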
Using r(t) in expression (15.11) has the disadvantage that learning only takes place when there is an
external reinforcement signal. Instead of r(t), usually a continuous internal reinforcement signal r̂(t),
given by the ACE, is used.
Barto and Anandan (1985) proved convergence for the case of a single binary output unit and a set
of linearly independent patterns x^p. In control applications, the input vector is the (n-dimensional) state
vector s of the system. In order to obtain a linearly independent set of patterns x^p, often a 'decoder' is
used, which divides the range of each of the input variables s_i in a number of intervals. The aim is to
divide the input (state) space in a number of disjunct subspaces or boxes. The input vector can therefore
only be in one subspace at a time. The decoder converts the input vector into a binary valued vector x,
with only one element equal to one, indicating which subspace is currently visited. It has been shown
(Krose and Dam, 1992) that instead of a-priori quantisation of the input space, a self-organizing
quantisation, based on methods described in the previous chapter, results in a better performance.
15.4.2 Adaptive Critic
The Adaptive Critic Element (ACE, or 'evaluation network') is basically the same as described in
section 15.2. An error signal is derived from the temporal difference of two successive predictions (in this
case denoted by p) and is used for training the ACE:

    r̂(t) = r(t) + γ p(t) − p(t − 1)    ...(15.13)

p(t) is implemented as a series of 'weights' w_Ck to the ACE such that

    p(t) = w_Ck    ...(15.14)
if the system is in state k at time t, denoted by x_k = 1. The function is learned by adjusting the w_Ck's
according to a 'delta-rule' with an error signal δ given by r̂(t):

    Δw_Ck(t) = β r̂(t) h_k(t)    ...(15.15)

where β is the learning parameter and h_k(t) indicates the 'trace' of neuron x_k:

    h_k(t) = λ h_k(t − 1) + (1 − λ) x_k(t − 1)    ...(15.16)
This trace is a low-pass filter or momentum, through which the credit assigned to state k increases
while state k is active and decays exponentially after the activity of k has expired. If r̂(t) is positive, the
action u of the system has resulted in a higher evaluation value, whereas a negative r̂(t) indicates a
deterioration of the system. r̂(t) can be considered as an internal reinforcement signal.
15.4.3 The Cart-Pole System
An exampIe oI such a system is the cart-poIe baIancing system (see Fig. 15.3). Here, a dynamics
controIIer must controI the cart in such a way that the poIe aIways stands up straight. The controIIer
appIies a IeIt` or right` Iorce F oI Iixed magnitude to the cart, which may change direction at discrete
time intervaIs. The modeI has Iour state variabIes:
x : the position of the cart on the track,
θ : the angle of the pole with the vertical,
ẋ : the cart velocity, and
θ̇ : the angular velocity of the pole.
Furthermore, a set oI parameters speciIy the poIe Iength and mass, cart mass, coeIIicients oI Iriction
between the cart and the track and at the hinge between the poIe and the cart, the controI Iorce
magnitude, and the Iorce due to gravity. The state space is partitioned on the basis oI the IoIIowing
quantisation threshoIds:
1. x : ±0.8, ±2.4 m
2. θ : 0, ±1, ±6, ±12°
3. ẋ : ±0.5, ±∞ m/s
4. θ̇ : ±50, ±∞ °/s
This yields 3 × 6 × 3 × 3 = 162 regions corresponding to all of the combinations of the intervals. The
decoder output is a 162-dimensional vector. A negative reinforcement signal is provided when the state
vector gets out of the admissible range:
when x > 2.4, x < −2.4, θ > 12° or θ < −12°.
The system has proved to solve the problem in about 75 learning steps.
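The decoder that maps the four continuous state variables onto one of the 162 boxes can be sketched as follows; the exact treatment of the interval edges is an illustrative choice of this sketch, and angles are taken in degrees.

```python
import numpy as np

# interval edges from the quantisation thresholds listed above
X_INNER        = [-0.8, 0.8]           # x        : 3 intervals within +/- 2.4 m
THETA_INNER    = [-6, -1, 0, 1, 6]     # theta    : 6 intervals within +/- 12 degrees
XDOT_EDGES     = [-0.5, 0.5]           # x-dot    : 3 intervals
THETADOT_EDGES = [-50, 50]             # theta-dot: 3 intervals

def box_index(x, theta, x_dot, theta_dot):
    """Index of the box (0..161) containing the state, or -1 on failure
    (x or theta outside the admissible range)."""
    if not (-2.4 <= x <= 2.4) or not (-12 <= theta <= 12):
        return -1
    ix  = int(np.searchsorted(X_INNER, x))                  # 0..2
    ith = int(np.searchsorted(THETA_INNER, theta))          # 0..5
    ixd = int(np.searchsorted(XDOT_EDGES, x_dot))           # 0..2
    itd = int(np.searchsorted(THETADOT_EDGES, theta_dot))   # 0..2
    return ((ix * 6 + ith) * 3 + ixd) * 3 + itd             # 3 x 6 x 3 x 3 = 162 boxes

def decode(x, theta, x_dot, theta_dot):
    """162-dimensional binary state vector with a single 1 at the active box."""
    v = np.zeros(162)
    k = box_index(x, theta, x_dot, theta_dot)
    if k >= 0:
        v[k] = 1.0
    return v
```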
15.5 REINFORCEMENT LEARNING VERSUS OPTIMAL CONTROL
The objective oI optimaI controI is to generate controI actions in order to optimize a predeIined
perIormance measure. One technique to Iind such a sequence oI controI actions which deIine an optimaI
controI poIicy is Dynamic Programming (DP). The method is based on the principIe oI optimaIity,
IormuIated by BeIIman (1957): Whatever the initiaI system state, iI the Iirst controI action is contained
Fig. 15.3: The cart-pole system.
in an optimaI controI poIicy, then the remaining controI actions must constitute an optimaI controI
poIicy Ior the probIem with as initiaI system state the state remaining Irom the Iirst controI action. The
BeIIman equations` IoIIow directIy Irom the principIe oI optimaIity. SoIving the equations backwards
in time is caIIed dynamic programming.
Assume that a performance measure J(x_k, u_k, k) = Σ_{i=k}^{N} r(x_i, u_i, i), with r being the immediate costs, is
to be minimized. The minimum costs J_min of cost J can be derived by the Bellman equations of DP. The
equations for the discrete case are (White & Jordan, 1992):
    J_min(x_k, u_k, k) = min_{u∈U} [ J_min(x_{k+1}, u_{k+1}, k + 1) + r(x_k, u_k, k) ]    ...(15.17)

    J_min(x_N) = r(x_N)    ...(15.18)
The strategy for finding the optimal control actions is solving equations (15.17) and (15.18) from
which u_k can be derived. This can be achieved backwards, starting at state x_N. The requirements are a
bounded N, and a model, which is assumed to be an exact representation of the system and the
environment. The model has to provide the relation between successive system states resulting from
system dynamics, control actions and disturbances. In practice, a solution can be derived only for a
small N and simple systems. In order to deal with large or infinite N, the performance measure could be
defined as a discounted sum of future costs as expressed by equation (15.1).
Reinforcement learning provides a solution for the problem stated above without the use of a model of the system and environment. RL is therefore often called a 'heuristic' dynamic programming technique (Barto, Sutton, & Watkins, 1990), (Sutton, Barto, & Wilson, 1992), (Werbos, 1992). The RL technique most directly related to DP is Q-learning (Watkins & Dayan, 1992). The basic idea in Q-learning is to estimate a function, Q, of states and actions, where Q is the minimum discounted sum of future costs J_min(x_k, u_k, k) (the name 'Q-learning' comes from Watkins' notation). For convenience, the notation with J is continued here:
Ĵ(x_k, u_k, k) = γ J_min(x_{k+1}, u_{k+1}, k+1) + r(x_k, u_k, k)   ...(15.19)
The optimal control rule can be expressed in terms of Ĵ by noting that an optimal control action for state x_k is any action u_k that minimizes Ĵ according to equation (7.6). The estimate of minimum cost Ĵ is updated at time step k + 1 according to equation (7.5). The temporal difference r(k) between the 'true' and expected performance is again used:
r(k) = [ γ min_{u∈U} Ĵ(x_{k+1}, u_{k+1}, k+1) + r(x_k, u_k, k) ] − Ĵ(x_k, u_k, k)   ...(15.20)
Watkins has shown that the function converges under some pre-specified conditions to the true optimal Bellman equation (Watkins & Dayan, 1992): (1) the critic is implemented as a look-up table; (2) the learning parameter α must converge to zero; (3) all actions continue to be tried from all states.
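A minimal tabular sketch of this update rule is given below, written with the common Q(x, u) notation rather than the Ĵ notation above; the decaying learning rate and the occasional random action are illustrative choices made to satisfy the convergence conditions, not part of the text.

```python
import numpy as np

def q_learning(n_states, n_actions, step, r, gamma=0.95, steps=5000):
    """Tabular Q-learning: Q approaches the minimum discounted cost-to-go when
    every action keeps being tried in every state and the learning rate decays
    to zero (Watkins & Dayan, 1992)."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    x = 0
    for _ in range(steps):
        # keep trying all actions from all states (condition 3)
        u = np.random.randint(n_actions) if np.random.rand() < 0.1 else int(Q[x].argmin())
        x_next = step(x, u)
        visits[x, u] += 1
        alpha = 1.0 / visits[x, u]                          # decaying rate (condition 2)
        td = gamma * Q[x_next].min() + r(x, u) - Q[x, u]    # temporal difference, cf. eq. (15.20)
        Q[x, u] += alpha * td
        x = x_next
    return Q
```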
QUESTION BANK.
1. Explain the reinforcement learning scheme.
2. What are the various approaches of control networks used to find an optimal relation between system states and control actions?
3. Describe the Barto network of reinforcement learning.
4. What are the building blocks of the Barto network? Explain them.
5. Explain the cart-pole balancing scheme.
6. Describe dynamic programming to find a sequence of control actions.
REFERENCES.
1. A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning problems, IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, pp. 834-846, 1983.
2. R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, pp. 9-44, 1988.
3. C.J.C.H. Watkins, and P. Dayan, Q-learning, Machine Learning, Vol. 8, pp. 279-292, 1992.
4. P. Werbos, Approximate dynamic programming for real-time control and neural modeling, In D. Sofge & D. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
5. D. Sofge, and D. White, Applied learning: optimal control for manufacturing, In D. Sofge & D. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
6. A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning problems, IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, pp. 834-846, 1983.
7. V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks, Vol. 3, pp. 671-692, 1990.
8. P.W. Werbos, A menu for designs of reinforcement learning over time, In W.T. Miller III, R.S. Sutton, & P.J. Werbos (Eds.), Neural Networks for Control, MIT Press/Bradford, 1990.
9. A.G. Barto, and P. Anandan, Pattern-recognizing stochastic learning automata, IEEE Transactions on Systems, Man and Cybernetics, Vol. 15, pp. 360-375, 1985.
10. B.J.A. Krose, and J.W.M. Dam, Learning to avoid collisions: A reinforcement learning paradigm for mobile robot manipulation, In Proceedings of the IFAC/IFIP/IMACS International Symposium on Artificial Intelligence in Real-Time Control, Delft: IFAC, Luxemburg, pp. 295-300, 1992.
11. R. Bellman, Dynamic Programming, Princeton University Press, 1957.
12. D. White, and M. Jordan, Optimal control: a foundation for intelligent control, In D. Sofge & D. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
13. A.G. Barto, R.S. Sutton, and C. Watkins, Sequential decision problems and neural networks, In D. Touretzky (Ed.), Advances in Neural Information Processing Systems II, 1990.
14. R.S. Sutton, A. Barto, and R. Wilson, Reinforcement learning is direct adaptive optimal control, IEEE Control Systems, Vol. 6, pp. 19-22, 1992.
15. P. Werbos, Approximate dynamic programming for real-time control and neural modeling, In D. Sofge & D. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
16. C.J.C.H. Watkins, and P. Dayan, Q-learning, Machine Learning, Vol. 8, pp. 279-292, 1992.
CHAPTER 16
Neural Networks Applications
16.1 INTRODUCTION
A list of some applications mentioned in the literature follows:
1. Aerospace: High performance aircraft autopilot, flight path simulation, aircraft control systems, autopilot enhancements, aircraft component simulation, aircraft component fault detection.
2. Automotive: Automobile automatic guidance system, warranty activity analysis.
3. Banking: Check and other document reading, credit application evaluation.
4. Credit Card Activity Checking: Neural networks are used to spot unusual credit card activity that might possibly be associated with loss of a credit card.
5. Defense: Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, signal/image identification.
6. Electronics: Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, nonlinear modeling.
7. Entertainment: Animation, special effects, market forecasting.
8. Financial: Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit-line use analysis, portfolio trading programs, corporate financial analysis, currency price prediction.
9. Industrial: Neural networks are being trained to predict the output gasses of furnaces and other industrial processes. They then replace complex and costly equipment used for this purpose in the past.
10. Insurance: Policy application evaluation, product optimization.
11. Manufacturing: Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer-chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, dynamic modeling of chemical process systems.
12. Medical: Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, emergency-room test advisement.
13. Oil and Gas: Exploration.
14. Robotics: Trajectory control, forklift robots, manipulator controllers, vision systems.
15. Speech: Speech recognition, speech compression, vowel classification, text-to-speech synthesis.
16. Securities: Market analysis, automatic bond rating, stock trading advisory systems.
17. Telecommunications: Image and data compression, automated information services, real-time translation of spoken language, customer payment processing systems.
18. Transportation: Truck brake diagnosis systems, vehicle scheduling, routing systems.
16.2 ROBOT CONTROL
An important area of application of neural networks is in the field of robotics. Usually, these networks are designed to direct a manipulator, which is the most important form of the industrial robot, to grasp objects, based on sensor data. Other applications include the steering and path-planning of autonomous robot vehicles.
In robotics, the major task involves making movements dependent on sensor data. There are four related problems to be distinguished (Craig, 1989):
Forward kinematics
Inverse kinematics
Dynamics
Trajectory generation
16.2.1 Forward Kinematics
Kinematics is the science of motion, which treats motion without regard to the forces which cause it. Within this science one studies the position, velocity, acceleration, and all higher order derivatives of the position variables. A very basic problem in the study of mechanical manipulation is that of forward kinematics. This is the static geometrical problem of computing the position and orientation of the end-effector ('hand') of the manipulator. Specifically, given a set of joint angles, the forward kinematic problem is to compute the position and orientation of the tool frame relative to the base frame (see Fig. 16.1).
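For a concrete (if simplified) picture, the sketch below computes the forward kinematics of a planar two-link arm; the link lengths and joint-angle convention are illustrative assumptions and are not taken from the manipulator of Fig. 16.1.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=0.8):
    """Position and orientation of the tool frame of a planar two-link arm,
    expressed in the base frame, for given joint angles (radians)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    phi = theta1 + theta2          # orientation of the end-effector
    return x, y, phi
```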
16.2.2 Inverse Kinematics
This problem is posed as follows: given the position and orientation of the end-effector of the manipulator, calculate all possible sets of joint angles which could be used to attain this given position and orientation. This is a fundamental problem in the practical use of manipulators.
The inverse kinematic problem is not as simple as the forward one. Because the kinematic equations are nonlinear, their solution is not always easy or even possible in a closed form. Also, the questions of the existence of a solution, and of multiple solutions, arise. Solving this problem is a least requirement for most robot control systems.
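Continuing the illustrative two-link example above, a closed-form inverse solution exists and already shows the typical multiplicity of solutions ('elbow up' versus 'elbow down'); this is a sketch under the same assumed geometry, not a general-purpose solver.

```python
import math

def inverse_kinematics(x, y, l1=1.0, l2=0.8, elbow_up=True):
    """Joint angles of a planar two-link arm reaching the point (x, y).
    Returns None when the target lies outside the workspace."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return None                                   # target not reachable
    s2 = math.sqrt(1.0 - c2 * c2)
    if not elbow_up:
        s2 = -s2                                      # the second solution
    theta2 = math.atan2(s2, c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2
```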
16.2.3 Dynamics
Dynamics is a field of study devoted to studying the forces required to cause motion. In order to accelerate a manipulator from rest, glide at a constant end-effector velocity, and finally decelerate to a stop, a complex set of torque functions must be applied by the joint actuators. In dynamics not only the geometrical properties (kinematics) are used, but also the physical properties of the robot are taken into account. Take for instance the weight (inertia) of the robot arm, which determines the force required to change the motion of the arm. The dynamics introduces two extra problems to the kinematic problems.
The robot arm has a 'memory': its response to a control signal depends also on its history (e.g., previous positions, speed, acceleration).
If a robot grabs an object then the dynamics change but the kinematics don't, because the weight of the object has to be added to the weight of the arm (that's why robot arms are made so heavy, making the relative weight change very small).
16.2.4 Trajectory Generation
To move a manipulator from here to there in a smooth, controlled fashion, each joint must be moved via a smooth function of time. Exactly how to compute these motion functions is the problem of trajectory generation.
In the first section of this chapter we will discuss the problems associated with the positioning of the end-effector (in effect, representing the inverse kinematics in combination with sensory transformation).
16.2.5 End-Effector Positioning
The final goal in robot manipulator control is often the positioning of the hand or end-effector, in order to be able to, e.g., pick up an object. With the accurate robot arms that are manufactured, this task is often relatively simple, involving the following steps:
Determine the target coordinates relative to the base of the robot. Typically, when this position is not always the same, this is done with a number of fixed cameras or other sensors which observe the work scene; from the image frame determine the position of the object in that frame, and perform a pre-determined coordinate transformation;
Fig. 16.1 An exemplar robot manipulator (tool frame, base frame and joints 1-4).
With a precise model of the robot (supplied by the manufacturer), calculate the joint angles to reach the target (i.e., the inverse kinematics). This is a relatively simple problem;
Move the arm (dynamics control) and close the gripper. Gripper control is not a trivial matter at all, but we will not focus on that.
16.2.5a Involvement of Neural Networks
So if these parts are relatively simple to solve with a high accuracy, why involve neural networks? The reason is the applicability of robots. When 'traditional' methods are used to control a robot arm, accurate models of the sensors and manipulators (in some cases with unknown parameters which have to be estimated from the system's behavior, yet still with accurate models as starting point) are required, and the system must be calibrated. Also, systems which suffer from wear-and-tear need frequent recalibration or parameter determination. Finally, the development of more complex (adaptive!) control methods allows the design and use of more flexible (i.e., less rigid) robot systems, both on the sensory and motor side.
16.2.6 Camera-Robot Coordination in Function Approximation
The system we focus on in this section is a work floor observed by fixed cameras, and a robot arm. The visual system must identify the target as well as determine the visual position of the end-effector.
The target position x_target together with the visual position of the hand x_hand are input to the neural controller N(.). This controller then generates a joint position θ for the robot:
θ = N(x_target, x_hand)   ...(16.1)
We can compare the neurally generated θ with the optimal θ_0 generated by a fictitious perfect controller R(.):
θ_0 = R(x_target, x_hand)   ...(16.2)
The task of learning is to make N generate an output 'close enough' to θ_0.
There are two problems associated with teaching N(.):
1. Generating learning samples which are in accordance with eq. (16.2). This is not trivial, since in useful applications R(.) is an unknown function. Instead, a form of self-supervised or unsupervised learning is required. Some examples to solve this problem are given below;
2. Constructing the mapping N(.) from the available learning samples. When the (usually randomly drawn) learning samples are available, a neural network uses these samples to represent the whole input space over which the robot is active. This is evidently a form of interpolation, but has the problem that the input space is of a high dimensionality, and the samples are randomly distributed.
We will discuss three fundamentally different approaches to neural networks for robot end-effector positioning. In each of these approaches, a solution will be found for both the learning sample generation and the function representation.
16.2.6a Approach 1: Feed-Forward Networks
When using a feed-forward system for controlling the manipulator, a self-supervised learning system must be used. One such system has been reported by Psaltis, Sideris and Yamamura (1988). Here, the network, which is constrained to two-dimensional positioning of the robot arm, learns by experimentation. Three methods are proposed:
1. Indirect learning: In indirect learning, a Cartesian target point x in world coordinates is generated, e.g., by two cameras looking at an object. This target point is fed into the network, which generates an angle vector θ. The manipulator moves to position θ, and the cameras determine the new position x' of the end-effector in world coordinates. This x' is again input to the network, resulting in θ'. The network is then trained on the error ε_1 = θ − θ' (see Fig. 16.2). However, minimization of ε_1 does not guarantee minimization of the overall error ε = x − x'. For example, the network often settles at a 'solution' that maps all x's to a single θ (i.e., a constant mapping).
2. General learning: The method is basically very much like supervised learning, but here the plant input θ must be provided by the user. Thus the network can directly minimize |θ − θ'|. The success of this method depends on the interpolation capabilities of the network. Correct choice of θ may pose a problem.
Fig. 16.2 Indirect learning system for robotics. In each cycle, the network is used in two different places: first in the forward step, then for feeding back the error.
3. Specialized learning: Keep in mind that the goal of the training of the network is to minimize the error at the output of the plant: ε = x − x'. We can also train the network by 'backpropagating' this error through the plant (compare this with the backpropagation of the error in Chapter 12). This method requires knowledge of the Jacobian matrix of the plant. A Jacobian matrix of a multidimensional function F is a matrix of partial derivatives of F, i.e., the multidimensional form of the derivative. For example, if we have Y = F(X), i.e.,
y_1 = f_1(x_1, x_2, ..., x_n)
y_2 = f_2(x_1, x_2, ..., x_n)
...
y_m = f_m(x_1, x_2, ..., x_n)
then
∂y_1 = (∂f_1/∂x_1) ∂x_1 + (∂f_1/∂x_2) ∂x_2 + ... + (∂f_1/∂x_n) ∂x_n
∂y_2 = (∂f_2/∂x_1) ∂x_1 + (∂f_2/∂x_2) ∂x_2 + ... + (∂f_2/∂x_n) ∂x_n
...
∂y_m = (∂f_m/∂x_1) ∂x_1 + (∂f_m/∂x_2) ∂x_2 + ... + (∂f_m/∂x_n) ∂x_n
or ∂Y = (∂F/∂X) ∂X   ...(16.3)
Eq. (16.3) is also written as
∂Y = J(X) ∂X   ...(16.4)
where J is the Jacobian matrix of F. So, the Jacobian matrix can be used to calculate the change in the function when its parameters change.
Now, in this case we have
J = [ ∂P_i(θ)/∂θ_j ]   ...(16.5)
where P_i(θ) is the i-th element of the plant output for input θ. The learning rule applied here regards the plant as an additional and unmodifiable layer in the neural network. The total error ε = x − x' is propagated back through the plant by calculating the δ_j:
δ_j = F'(s_j) Σ_i δ_i ∂P_i(θ)/∂θ_j,  with δ_i = x_i − x'_i   ...(16.6)
When the plant is an unknown function, ∂P_i(θ)/∂θ_j can be approximated by
∂P_i(θ)/∂θ_j ≈ [ P_i(θ + h_j e_j) − P_i(θ) ] / h_j   ...(16.7)
where e_j is the j-th unit vector, used to change the scalar perturbation h_j into a vector. This approximate derivative can be measured by slightly changing the input of the plant and measuring the changes in the output.
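A minimal sketch of this finite-difference approximation of the plant Jacobian is given below; the plant is passed in as an opaque function, mirroring the 'unknown function' assumption, and the step size h is an illustrative choice.

```python
import numpy as np

def approximate_jacobian(plant, theta, h=1e-3):
    """Estimate J_ij = dP_i/dtheta_j of an unknown plant by slightly changing
    each input in turn and measuring the change in the output (cf. eq. 16.7)."""
    theta = np.asarray(theta, dtype=float)
    base = np.asarray(plant(theta))
    J = np.zeros((base.size, theta.size))
    for j in range(theta.size):
        e_j = np.zeros_like(theta)
        e_j[j] = 1.0                       # unit vector: scalar step -> vector step
        J[:, j] = (np.asarray(plant(theta + h * e_j)) - base) / h
    return J
```

The columns of the returned matrix can then be used to propagate the end-effector error x − x' back through the plant, as in eq. (16.6).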
A somewhat similar approach is taken in (Krose, Korst, and Groen, 1990) and (Smagt and Krose, 1991). Again a two-layer feed-forward network is trained with back-propagation. However, instead of calculating a desired output vector, the input vector which should have invoked the current output vector is reconstructed, and back-propagation is applied to this new input vector and the existing output vector.
Fig. 16.3 The system used for specialized learning.
The configuration used consists of a monocular manipulator, which has to grasp objects. Due to the fact that the camera is situated in the hand of the robot, the task is to move the hand such that the object is in the centre of the image and has some predetermined size (in a later article, a biologically inspired system is proposed (Smagt, Krose, and Groen, 1992) in which the visual flow-field is used to account for the monocularity of the system, such that the dimensions of the object need not be known anymore to the system).
One step towards the target consists of the following operations:
1. Measure the distance from the current position to the target position in camera domain, x;
2. Use this distance, together with the current state θ of the robot, as input for the neural network. The network then generates a joint displacement vector Δθ;
3. Send Δθ to the manipulator;
4. Again measure the distance from the current position to the target position in camera domain, x';
5. Calculate the move made by the manipulator in visual domain, x − R x', where R is the rotation matrix of the second camera image with respect to the first camera image;
6. Teach the learning pair (x − R x', θ; Δθ) to the network.
This system has shown to learn correct behavior in only tens of iterations, and to be very adaptive to changes in the sensor or manipulator (Smagt & Krose, 1991; Smagt, Groen, & Krose, 1993).
By using a feed-forward network, the available learning samples are approximated by a single, smooth function consisting of a summation of sigmoid functions. A feed-forward network with one layer of sigmoid units is capable of representing practically any function. But how are the optimal weights determined in finite time to obtain this optimal representation?
Experiments have shown that, although a reasonable representation can be obtained in a short period of time, an accurate representation of the function that governs the learning samples is often not feasible or extremely difficult (Jansen et al., 1994). The reason for this is the global character of the approximation obtained with a feed-forward network with sigmoid units: every weight in the network has a global effect on the final approximation that is obtained.
Building local representations is the obvious way out: every part of the network is responsible for a small subspace of the total input space. Thus accuracy is obtained locally (keep it small and simple). This is typically obtained with a Kohonen network.
16.2.6b Approach 2: Topology Conserving Maps
Ritter, Martinetz, and Schulten (1989) describe the use of a Kohonen-like network for robot control. We will only describe the kinematics part, since it is the most interesting and straightforward.
The system described by Ritter et al. consists of a robot manipulator with three degrees of freedom (orientation of the end-effector is not included), which has to grab objects in 3D-space. The system is observed by two fixed cameras which output the (x, y) coordinates of the object and the end-effector (see Fig. 16.4).
Fig. 16.4 A Kohonen network merging the output of two cameras.
Each run consists of two movements. In the gross move, the observed location of the object x (a four-component vector) is input to the network. As with the Kohonen network, the neuron k with highest activation value is selected as winner, because its weight vector w_k is nearest to x.
The neurons, which are arranged in a 3-dimensional lattice, correspond in a 1-1 fashion with subregions of the 3D workspace of the robot, i.e., the neuronal lattice is a discrete representation of the workspace. With each neuron a vector θ and Jacobian matrix A are associated. During the gross move, θ_k is fed to the robot, which makes its move, resulting in retinal coordinates x_g of the end-effector.
To correct for the discretization of the working space, an additional move is made which is dependent on the distance between the neuron and the object in space, w_k − x; this small displacement in Cartesian space is translated to an angle change using the Jacobian A_k:
θ_final = θ_k + A_k (x − w_k)   ...(16.8)
which is a first-order Taylor expansion of θ_final. The final retinal coordinates of the end-effector after this fine move are in x_f.
Learning proceeds as follows: when an improved estimate (θ, A)* has been found, the following adaptations are made for all neurons j:
w_j^new = w_j^old + γ(t) g_jk(t) (x − w_j^old)
(θ, A)_j^new = (θ, A)_j^old + γ'(t) g'_jk(t) ((θ, A)* − (θ, A)_j^old)
If g_jk(t) = g'_jk(t) = δ_jk, this is similar to perceptron learning. Here, as with the Kohonen learning rule, a distance function is used such that g_jk(t) and g'_jk(t) are Gaussians depending on the distance between neurons j and k, with a maximum at j = k. An improved estimate (θ, A)* is obtained as follows:
θ* = θ_k + A_k (x − x_f)   ...(16.9)
A* = A_k + A_k (x − w_k − x_f + x_g) (x_f − x_g)^T / ||x_f − x_g||^2 = A_k + (Δθ − A_k Δx) Δx^T / ||Δx||^2   ...(16.10)
In eq. (16.9), the final error x − x_f in Cartesian space is translated to an error in joint space via multiplication by A_k. This error is then added to θ_k to constitute the improved estimate θ* (steepest descent minimization of error).
In eq. (16.10), Δx = x_f − x_g, i.e., the change in retinal coordinates of the end-effector due to the fine movement, and Δθ = A_k (x − w_k), i.e., the related joint angles during the fine movement. Thus eq. (16.10) can be recognized as an error-correction rule of the Widrow-Hoff type for Jacobians A. It appears that after 6,000 iterations the system approaches correct behavior, and that after 30,000 learning steps no noteworthy deviation is present.
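The sketch below shows one learning step of this scheme for the winning neuron and its neighbours; the neighbourhood function and learning rates are illustrative choices, and the data structures simply mirror the (w, θ, A) triple described above.

```python
import numpy as np

def ritter_step(w, theta, A, k, x, x_f, x_g, neighbours, gamma=0.1):
    """One Ritter-style learning step for a lattice of neurons.
    w[j]     : visual weight vector of neuron j
    theta[j] : joint-angle estimate, A[j] : Jacobian estimate of neuron j
    k        : index of the winning neuron, x : observed object location
    x_f, x_g : retinal end-effector positions after the fine and gross moves
    neighbours(k) : yields (j, g) pairs, g a Gaussian neighbourhood weight."""
    dx = x_f - x_g                                   # move in visual domain
    dtheta = A[k] @ (x - w[k])                       # corresponding joint change
    theta_star = theta[k] + A[k] @ (x - x_f)         # improved estimate, eq. (16.9)
    A_star = A[k] + np.outer(dtheta - A[k] @ dx, dx) / (dx @ dx)   # eq. (16.10)
    for j, g in neighbours(k):
        w[j] += gamma * g * (x - w[j])
        theta[j] += gamma * g * (theta_star - theta[j])
        A[j] += gamma * g * (A_star - A[j])
```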
16.2.7 Robot Arm Dynamics
While end-effector positioning via sensor-robot coordination is an important problem to solve, the robot itself will not move without dynamic control of its limbs. Again, accurate control with non-adaptive controllers is possible only when accurate models of the robot are available, and the robot is not too susceptible to wear-and-tear. This requirement has led to the current-day robots that are used in many factories. But the application of neural networks in this field changes these requirements.
One of the first neural networks which succeeded in doing dynamic control of a robot arm was presented by Kawato, Furukawa, and Suzuki (1987). They describe a neural network which generates motor commands from a desired trajectory in joint angles. Their system does not include the trajectory generation or the transformation of visual coordinates to body coordinates.
The network is extremely simple. In fact, the system is a feed-forward network, but by carefully choosing the basis functions, the network can be restricted to one learning layer, such that finding the optimal weights is a trivial task. In this case, the basis functions are chosen such that the function that is approximated is a linear combination of those basis functions.
Dynamics model: The manipulator used consists of three joints, as the manipulator in Fig. 16.1 without the wrist joint. The desired trajectory θ_d(t), which is generated by another subsystem, is fed into the inverse-dynamics model (Fig. 16.5). The error between θ_d(t) and θ(t) is fed into the neural model.
The neural model, which is shown in Fig. 16.6, consists of three perceptrons, each one feeding one joint of the manipulator. The desired trajectory θ_d = (θ_d1, θ_d2, θ_d3) is fed into 13 nonlinear subsystems. The resulting signals are weighted and summed, such that
T_ik(t) = Σ_{l=1}^{13} w_lk x_lk,   (k = 1, 2, 3)   ...(16.11)
with
x_l1 = f_l(θ_d1(t), θ_d2(t), θ_d3(t))
x_l2 = x_l3 = g_l(θ_d1(t), θ_d2(t), θ_d3(t))
and f_l and g_l as in Table 16.1.
The feedback torque T_fk(t) in Fig. 16.5 consists of
T_fk(t) = K_pk (θ_dk(t) − θ_k(t)) − K_vk dθ_k(t)/dt,   (k = 1, 2, 3)
K_vk = 0 unless |θ_k(t) − θ_dk(objective point)| < ε
The feedback gains K_p and K_v were computed as (517.2, 746.0, 191.4)^T and (16.2, 37.2, 8.4)^T.
Next, the weights adapt using the delta rule
γ dw_lk/dt = x_lk T_k = x_lk (T_fk + T_ik),   (k = 1, 2, 3)   ...(16.12)
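Because there is only a single learning layer, this update is a plain delta rule: each weight w_lk moves in proportion to the product of its subsystem output x_l and a torque-based learning signal for joint k. The sketch below implements that outer-product update; which torque signal is used (the feedback torque alone or the total torque as eq. (16.12) is read here) is left to the caller, and the time constant is an illustrative value.

```python
import numpy as np

def delta_rule_update(w, x, torque_signal, gamma=100.0, dt=0.001):
    """Delta-rule update of the 13 x 3 weight matrix of the Kawato network.
    x             : outputs of the 13 nonlinear subsystems (basis functions)
    torque_signal : per-joint learning signal T_k, cf. eq. (16.12)
    gamma         : adaptation time constant (illustrative value)."""
    w += (dt / gamma) * np.outer(x, torque_signal)
    return w

# One update step with placeholder signals.
w = np.zeros((13, 3))
w = delta_rule_update(w, np.random.randn(13), np.random.randn(3))
```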
A desired movement pattern is shown in Fig. 16.7. After 20 minutes of learning, the feedback torques are nearly zero, such that the system has successfully learned the transformation. Although the applied patterns are very dedicated, training with a repetitive pattern sin(ω_k t), with ω_1 : ω_2 : ω_3 = 1 : 2 : 3, is also successful.
Fig. 16.7 The desired joint pattern for joint 1 (θ_1 ranging between −π and π over about 30 s). Joints 2 and 3 have similar time patterns.
Fig. 16.5 The neural model proposed by Kawato et al. (blocks: inverse-dynamics model and manipulator; signals: desired trajectory θ_d(t), actual trajectory θ(t), network torque T_i(t), feedback torque T_f(t) and total torque T(t)).
Fig. 16.6 The neural network used by Kawato et al. There are three neurons, one per joint in the robot arm. Each neuron
feeds from thirteen nonlinear subsystems. The upper neuron is connected to the rotary base joint (cf. joint 1 in
Figure 16.1), the other two neurons to joints 2 and 3.
Table 16.1: Nonlinear transformations used in the Kawato model.
l    f_l(θ_1, θ_2, θ_3)                                 g_l(θ_1, θ_2, θ_3)
1    θ̈_1                                               θ̈_2
2    θ̈_1 sin²θ_2                                       θ̈_3
3    θ̈_1 cos²θ_2                                       θ̈_2 cos θ_3
4    θ̈_1 sin²(θ_2 + θ_3)                               θ̈_3 cos θ_3
5    θ̈_1 cos²(θ_2 + θ_3)                               θ̇_1² sin θ_2 cos θ_2
6    θ̈_1 sin θ_2 sin (θ_2 + θ_3)                        θ̇_1² sin (θ_2 + θ_3) cos (θ_2 + θ_3)
7    θ̇_1 θ̇_2 sin θ_2 cos θ_2                            θ̇_1² sin θ_2 cos (θ_2 + θ_3)
8    θ̇_1 θ̇_2 sin (θ_2 + θ_3) cos (θ_2 + θ_3)            θ̇_1² cos θ_2 sin (θ_2 + θ_3)
9    θ̇_1 θ̇_2 sin θ_2 cos (θ_2 + θ_3)                    θ̇_2² sin θ_3
10   θ̇_1 θ̇_2 cos θ_2 sin (θ_2 + θ_3)                    θ̇_3² sin θ_3
11   θ̇_1 θ̇_3 sin (θ_2 + θ_3) cos (θ_2 + θ_3)            θ̇_2 θ̇_3 sin θ_3
12   θ̇_1 θ̇_3 sin θ_2 cos (θ_2 + θ_3)                    θ̇_2
13   θ̇_1                                               θ̇_3
The usefulness of neural algorithms is demonstrated by the fact that novel robot architectures, which no longer need a very rigid structure to simplify the controller, are now constructed.
16.3 DETECTION OF TOOL BREAKAGE IN MILLING OPERATIONS
The recent trend in manufacturing is to achieve integrated and self-adjusting machining systems, which are capable of machining varying parts without the supervision of operators. The absence of human supervision requires on-line monitoring of the machining operation, ensuring safe and efficient metal removal rates and taking corrective actions in the event of failures and disturbances (Yusuf, 1998). One of the most important monitoring requirements is a system capable of detecting tool breakages on-line. Unless recognized in time, tool breakage can lead to irreparable damage to the workpiece and possibly to the machine tool itself.
The cutting force variation characteristics of normal and broken tools are different. With the normal and broken tool cutting force variation signals it is possible to train neural networks. The milling operations can then be monitored with the neural network after training. The use of an adaptive resonance theory (ART) type neural network was evaluated for detection of tool breakage in this study (Ibrahim and McLaughlin, 1993). Also, simulation-based training is proposed to reduce the cost of preparing the systems that monitor the real cutting signals.
Neural networks with parallel processing capability and robust performance provide a new approach to adaptive pattern recognition. Adaptive Resonance Theory (ART 2) architectures are neural networks that carry out stable self-organization of recognition codes for arbitrary sequences of input patterns. Artificial neural networks refer to a group of architectures inspired by the brain (Cheng and Sheng, 1995). Neural networks are also classified as supervised and unsupervised according to their learning characteristics. In unsupervised learning, the neural network classifies the signals by itself.
In this section, an ART type unsupervised neural network paradigm was used for detection of tool breakage. The ART paradigm was used for the following reasons:
(a) The training of the paradigm is much faster than the back-propagation technique;
(b) The back-propagation technique generalizes the given information in order to store it inside the initially selected hidden layers. The back-propagation technique cannot give reliable decisions on the sufficiency of previous training; and
(c) ART has a very important advantage, since it can be trained in the field and continuously updates previous experience.
The unsupervised ART neural networks can monitor the signal based on previous experience and can update themselves automatically while monitoring the signals (Carpenter and Grossberg, 1991). When an ART network receives an input pattern, this bottom-up pattern is compared to the top-down, or already known, patterns. If the input pattern is matched with a known pattern in memory, the weights of the model are changed to update the category. If the new pattern cannot be classified in a known category, it is coded and classified as a new category.
Another important issue is the training of the neural network. It is extremely expensive and time consuming to collect cutting force data at different cutting conditions with normal and broken tools. To overcome this problem, simulation-based training of neural networks was introduced. Simulated data was used to select the best vigilance of the ART 2 type neural network and to evaluate the performance of the paradigm. The theoretical background of the ART 2 type neural network, the proposed data monitoring system and their performance are presented below.
16.3.1 Unsupervised Adaptive Resonance Theory (ART) Neural Networks
The theory of adaptive resonance networks was first introduced by Carpenter and Grossberg (1987). Adaptive resonance occurs when the input to a network and the feedback expectancies match. The ART 2 neural networks developed by Carpenter and Grossberg (1991) self-organize recognition codes in real time.
The basic ART 2 architecture consists of two types of nodes: the short term memory (STM) nodes, which are temporary and flexible, and the long term memory (LTM) nodes, which are permanent and stable. The input pattern (s_i) is received by the STM, where it is normalized, matched, learned, and stored in the LTM (z_ji). The STM is divided into two sets of nodes, F_1 and F_2. The STM F_1 nodes are used for normalization, control, gain and learning procedures. The F_1 field in ART 2 includes a combination of normalization and noise suppression, in addition to the comparison of the bottom-up and top-down signals needed for the reset mechanism. To accomplish this, F_1 uses the following equations to calculate the nodes:
u_i = v_i / (e + ||v||)   ...(16.13)
w_i = s_i + a u_i   ...(16.14)
p_i = u_i + Σ_j g(y_j) z_ji   ...(16.15)
q_i = p_i / (e + ||p||)   ...(16.16)
v_i = f(x_i) + b f(q_i)   ...(16.17)
x_i = w_i / (e + ||w||)   ...(16.18)
Here ||p||, ||v|| and ||w|| denote the norms of the vectors p, v and w, and s_i is the input. The non-linear signal function in equation (16.17) is used for noise suppression. The activation function f is given by
f(x) = x if x ≥ θ;  0 if 0 ≤ x < θ   ...(16.19)
where θ is an appropriate constant. The function f filters the noise from the signal. θ can be set to zero for the case where filtering is not desired. The constants a, b, and e are selected based on the particular application.
The STM F_2 nodes are used for the matching procedure. The F_2 equations select or activate nodes in the LTM. When F_2 chooses a node, all other nodes in the LTM are inhibited, and only one is allowed to interact with the STM. The node that gives the largest sum with the F_1, F_2 input pattern (bottom-up) is the key property that is used for node selection.
Bottom-up inputs are calculated as in ART 2 (Fausett, 2004):
T_j = Σ_i p_i z_ij   ...(16.20)
The J-th node is selected if equation (16.21) is satisfied:
T_J = max{ T_j : j = 1, 2, ..., N }   ...(16.21)
Competition on F_2 results in contrast enhancement, where a single winning node is chosen each time. The output function of F_2 is given by
g(y_j) = d if T_j = max{ T_j : j = 1, 2, ..., N };  0 otherwise   ...(16.22)
Equation (16.15) then takes the following form:
p_i = u_i if F_2 is inactive;  u_i + d z_Ji if the J-th node of F_2 is active   ...(16.23)
The bottom-up and top-down LTM equations are
bottom-up (F_1 → F_2):  dz_ij/dt = g(y_j) [ p_i − z_ij ]   ...(16.24)
top-down (F_2 → F_1):  dz_ji/dt = g(y_j) [ p_i − z_ji ]   ...(16.25)
When F_2 is active, equations (16.24) and (16.25) are modified, using equation (16.22), to:
dz_iJ/dt = d [ p_i − z_iJ ]   ...(16.26)
where d is a constant (0 < d < 1). An orienting ART 2 subsystem is used to decide whether a new pattern can be matched to a known pattern, by comparing with a given vigilance parameter ρ:
r_i = (u_i + c p_i) / (e + ||u|| + c ||p||)   ...(16.27)
If ||r|| < ρ − e, then F_2 resets another node. If ||r|| ≥ ρ − e, a match has been found: the LTM node weights are recalculated and the new pattern is learned by the system. If no match has been found after all nodes have been activated, a new node is created, and the new pattern is stored.
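The category-assignment behaviour described above can be pictured with the highly simplified sketch below: prototypes play the role of LTM nodes, a similarity score replaces the full F_1/F_2 dynamics, and the vigilance test decides between updating an existing category and creating a new one. It illustrates the matching logic only, not the complete ART 2 equations; the learning fraction beta is an assumption.

```python
import numpy as np

def art_classify(patterns, rho=0.96, beta=0.5):
    """Toy ART-like clustering: assign each input pattern to an existing
    category if it passes the vigilance test, otherwise create a new one."""
    prototypes, labels = [], []
    for s in patterns:
        s = s / (1e-9 + np.linalg.norm(s))               # normalise the input
        if prototypes:
            sims = [float(p @ s) for p in prototypes]    # bottom-up match scores
            j = int(np.argmax(sims))                     # winning node
            if sims[j] >= rho:                           # vigilance test passed
                prototypes[j] = (1 - beta) * prototypes[j] + beta * s
                prototypes[j] /= np.linalg.norm(prototypes[j])
                labels.append(j)
                continue
        prototypes.append(s)                             # no match: new category
        labels.append(len(prototypes) - 1)
    return labels
```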
16.3.2 Results and Discussion
The experimental data was collected with a four flute end mill of 12.07 mm diameter at various cutting conditions. The ART neural network monitored the profile of the resultant force in different tests. In the three tests, experiments were done at different feed rates with the good and broken tool. The spindle speed, feed rate, and depth of cut of these different conditions are outlined in Tables 16.2-16.5. The neural network did not have any prior information at the beginning of each test. In each one, the neural network inspected the resultant force profile and placed it into a category, or initiated a new category if it was found to be different.
The vigilance of the ART 2 was selected as either 0.96 or 0.98 in all the tests. The ART assigned 2, 2, 3, 1 and 3 different categories for the good tool. For the broken tool, 2, 1, 1, 1 and 3 different categories were selected. In all the tests, the neural network classified the good and broken tools in different categories. As seen in Tables 16.2-16.5, the neural network generated only one category in the 2nd (Table 16.3), 3rd (Table 16.4) and 4th (Table 16.5) tests for the broken tools. On the other hand, the neural network assigned more nodes to the signal of a good tool with offset. This indicates that the broken tool signals are more similar to each other at different cutting conditions compared to the force patterns of normal tools.
Table 16.2: Classification of experimental data with the ART. Vigilance of the neural network was 0.96. The ART used four categories to classify all of the data.
Spindle speed (rpm)  Depth of cut (mm)  Feed rate (mm/min)  Tool condition  Category
500 1.016 50.8 G 1
500 1.016 50.8 B 2
500 1.016 101.6 G 1
500 1.016 101.6 B 3
500 1.016 203.2 G 1
500 1.016 203.2 B 3
500 1.016 254 G 4
500 1.016 254 B 5
Table 16.3: Classification of experimental data with the ART. Vigilance of the neural network was 0.96. The ART used three categories to classify all of the data.
Spindle speed (rpm)  Depth of cut (mm)  Feed rate (mm/min)  Tool condition  Category
500 1.524 50.8 G 1
500 1.524 50.8 B 2
500 1.524 101.6 G 3
500 1.524 101.6 B 2
500 1.524 203.2 G 1
500 1.524 203.2 B 2
500 1.524 254 G 3
500 1.524 254 B 2
Table 16.4: Classification of experimental data with the ART 2. Vigilance of the neural network was 0.98. The ART 2 used four categories to classify all of the data.
Spindle speed (rpm)  Depth of cut (mm)  Feed rate (mm/min)  Tool condition  Category
700 1.016 50.8 G 1
700 1.016 50.8 B 2
700 1.016 101.6 G 3
700 1.016 101.6 B 2
700 1.016 203.2 G 4
700 1.016 203.2 B 2
700 1.016 254 G 4
700 1.016 254 B 2
Table 16.5: Classification of experimental data with the ART 2. Vigilance of the neural network was 0.96. The ART 2 used two categories to classify all of the data.
Spindle speed (rpm)  Depth of cut (mm)  Feed rate (mm/min)  Tool condition  Category
700 1.524 50.8 G 1
700 1.524 50.8 B 2
700 1.524 101.6 G 1
700 1.524 101.6 B 2
700 1.524 203.2 G 1
700 1.524 203.2 B 2
700 1.524 254 G 1
700 1.524 254 B 2
The ART gained first experience on the simulation data, and later the neural network inspected the incoming signals and continued to assign new categories when different types of signals were encountered. After simulation training, the neural network started to monitor the experimental data collected at different conditions. The studies focused on selection of the best vigilance, which requires a minimum number of nodes and has an acceptable error rate. When a vigilance of 0.98 is used, the network classified the perfect tool input data into seven different categories and classified the broken tool input data into four different categories.
QUESTION BANK.
1. Enumerate various applications of neural networks.
2. Explain the application of self-supervised learning to control a manipulator.
3. Explain the application of the Kohonen network for robot control.
4. Explain the application of neural networks for robot arm dynamics.
5. Explain the application of ART for machining applications.
REFERENCES.
1. J.J. Craig, Introduction to Robotics, Addison-Wesley Publishing Company, 1989.
2. D. Psaltis, A. Sideris, and A.A. Yamamura, A multilayer neural network controller, IEEE Control Systems Magazine, Vol. 8, No. 2, pp. 17-21, 1988.
3. B.J.A. Krose, M.J. Korst, and F.C.A. Groen, Learning strategies for a vision based neural controller for a robot arm, In O. Kaynak (Ed.), IEEE International Workshop on Intelligent Motor Control, pp. 199-203, IEEE, 1990.
4. P.P. Smagt, and B.J.A. Krose, A real-time learning neural robot controller, In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Proceedings of the 1991 International Conference on Artificial Neural Networks, North-Holland/Elsevier Science Publishers, pp. 351-356, 1991.
5. B.J.A. Krose, M.J. Korst, and F.C.A. Groen, Using time-to-contact to guide a robot manipulator, In Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1992.
6. P.P. Smagt, and B.J.A. Krose, A real-time learning neural robot controller, In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Proceedings of the 1991 International Conference on Artificial Neural Networks, North-Holland/Elsevier Science Publishers, pp. 351-356, 1991.
7. B.J.A. Krose, F.C.A. Groen, and B.J.A. Krose, Robot Hand-Eye Coordination Using Neural Networks, Tech. Rep. No. CS-93-10, Department of Computer Systems, University of Amsterdam, 1993.
8. A. Jansen, P.P. Smagt, and F.C.A. Groen, Nested networks for robot control, In A.F. Murray (Ed.), Neural Network Applications, Kluwer Academic Publishers, 1994.
9. H.J. Ritter, T.M. Martinetz, and K.J. Schulten, Topology-conserving maps for learning visuo-motor-coordination, Neural Networks, Vol. 2, pp. 159-168, 1989.
10. M. Kawato, K. Furukawa, and R. Suzuki, A hierarchical neural-network model for control and learning of voluntary movement, Biological Cybernetics, Vol. 57, pp. 169-185, 1987.
11. A. Yusuf, In-process detection of tool breakages using time series monitoring of cutting forces, Int. J. Mach. Tools Manufact., Vol. 28, No. 2, pp. 157-172, 1998.
12. N.T. Ibrahim and C. McLaughlin, Detection of tool breakage in milling II: The neural network approach, Int. J. Mach. Tools Manufact., Vol. 33, No. 4, pp. 545-558, 1993.
13. A.H. Cheng and F. Sheng, Adaptive hamming net: A fast learning ART 1 model without searching, Int. J. of Neural Networks, Vol. 8, No. 4, pp. 605-618, 1995.
14. Carpenter and Grossberg, ART 2-A: Adaptive resonance algorithm for rapid category learning and recognition, Int. J. of Neural Networks, Vol. 4, pp. 493-504, 1991.
15. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Pearson Education, 2nd Edition, 2004.
16. G.A. Carpenter, and S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing, Vol. 37, pp. 54-115, 1987.
17. G.A. Carpenter, and S. Grossberg, ART 2: Self-organization of stable category recognition codes for analog input patterns, Applied Optics, Vol. 26, No. 23, pp. 4919-4930, 1987.
CHAPTER 17
Hybrid Fuzzy Neural Networks
17.1 INTRODUCTION
Neural networks and fuzzy systems try to emulate the operation of the human brain. Neural networks concentrate on the structure of the human brain, i.e., on the 'hardware', whereas fuzzy logic systems concentrate on the 'software'. Combining neural networks and fuzzy systems in one unified framework has become popular in the last few years.
17.2 HYBRID SYSTEMS
The designation 'neurofuzzy' has several different meanings. Sometimes 'neurofuzzy' is associated with hybrid systems, which act on two distinct subproblems. In that case, a neural network is utilized in the first subproblem (e.g., in signal processing) and fuzzy logic is utilized in the second subproblem (e.g., in a reasoning task). Normally, when talking about neurofuzzy systems, the link between these two soft computing methods is understood to be stronger. In the following, light is shed on the most common interpretations.
Hybrid systems are those for which more than one technology is employed to solve the problem. Hybrid systems are classified as:
Sequential hybrids
Auxiliary hybrids
Embedded hybrids
17.2.1 Sequential Hybrid Systems
Sequential hybrid systems make use of technologies in a pipeline-like structure. The output of one technology becomes the input to another technology, and so on. A simple sequential hybrid system is shown in Fig. 17.1. This is one of the weakest forms of hybridization, because an integrated combination of the technologies is not present.
17.2.2 Auxiliary Hybrid Systems
In this case, one technology calls the other as a subroutine to process or manipulate information needed by it. The second technology processes the information provided by the first and hands it over for further use. An auxiliary hybrid system is shown in Fig. 17.2. The auxiliary hybrid system is better than the sequential hybrid system.
Fig. 17.1 A sequential hybrid system (inputs pass through Technology A and then Technology B to the outputs).
Fig. 17.2 An auxiliary hybrid system.
17.2.3 Embedded Hybrid Systems
In embedded hybrid systems, the participating technologies are integrated in such a manner that they appear intertwined. The fusion is so complete that it would appear that no technology could be used without the others for solving the problem. Fig. 17.3 illustrates an embedded hybrid system. The embedded hybrid system is better than sequential and auxiliary hybrid systems.
17.3 FUZZY LOGIC IN LEARNING ALGORITHMS
A common approach is to use fuzzy logic in neural networks to improve the learning ability. For example, a fuzzy control of back-propagation is illustrated in Fig. 17.4. The purpose is to achieve a faster rate of convergence by controlling the learning rate parameter with fuzzy rules. Rules are of the type:
Rule 1: IF (GoE is NB) AND (CoGoE is NB) THEN (CoLP is NS)
...
Rule 13: IF (GoE is ZE) AND (CoGoE is ZE) THEN (CoLP is PS)
...
Rule 25: IF (GoE is PB) AND (CoGoE is PB) THEN (CoLP is NS)
where CoLP is the change of the learning parameter, GoE is the gradient of the error surface, CoGoE is the change of GoE (an approximation to the second order gradient), and NB, NS, ZE, PS and PB are the fuzzy sets 'negative big', 'negative small', 'zero equal', 'positive small' and 'positive big'. (They also incorporated in the rules information about the sign change of the gradient and information about the momentum constant.)
Fig. 17.3 An embedded hybrid system (Technology A and Technology B intertwined between inputs and outputs).
Fig. 17.4 Learning rate control by fuzzy logic. FLC is the fuzzy logic controller, MLP is the multilayer perceptron; the FLC adjusts the learning parameter of the MLP from the error between desired and actual performance.
Simulation results show that the convergence of fuzzy back-propagation is faster than standard back-propagation.
17.4 FUZZY NEURONS
Definition 17.1: (hybrid neural network) A hybrid neural network is a network with real valued inputs x_i ∈ [0, 1] (usually membership degrees) and real valued weights w_i ∈ [0, 1]. Input and weight are combined (i.e., the product is replaced) using a t-norm T(x_i, w_i), a t-conorm S(x_i, w_i), or some other continuous operation. These combinations are again combined (i.e., the addition is replaced) using a t-norm, t-conorm, or some continuous operation. The activation function g can be any continuous function. If we choose a linear activation function, the t-norm (min) for the addition and the t-conorm (max) for the product, we get an AND fuzzy neuron,
y = T(S(x_1, w_1), ..., S(x_d, w_d))   ...(17.1)
and if they are chosen the other way round, we get an OR fuzzy neuron,
y = S(T(x_1, w_1), ..., T(x_d, w_d))   ...(17.2)
The output of (17.1) corresponds to the min-max composition and (17.2) corresponds to the max-min composition known from fuzzy logic.
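A minimal sketch of these two fuzzy neurons, with min as the t-norm and max as the t-conorm, might look as follows; the input and weight values are illustrative.

```python
def and_fuzzy_neuron(x, w):
    """AND fuzzy neuron (min-max composition): aggregate S(x_i, w_i) with a t-norm."""
    return min(max(xi, wi) for xi, wi in zip(x, w))

def or_fuzzy_neuron(x, w):
    """OR fuzzy neuron (max-min composition): aggregate T(x_i, w_i) with a t-conorm."""
    return max(min(xi, wi) for xi, wi in zip(x, w))

# Example with membership degrees as inputs and weights in [0, 1].
x = [0.2, 0.7, 0.5]
w = [0.9, 0.4, 0.6]
print(and_fuzzy_neuron(x, w))   # 0.6
print(or_fuzzy_neuron(x, w))    # 0.5
```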
Another way to implement a fuzzy neuron is to extend the weights and/or inputs and/or outputs (or targets) to fuzzy numbers. The choices are as follows:
1. crisp input, fuzzy weights, fuzzy output (or crisp output by defuzzification);
2. fuzzy input, crisp weights, fuzzy output;
3. fuzzy input, fuzzy weights, fuzzy output;
which can be used to implement fuzzy IF-THEN rules. In addition there exists a type of network where the weights and targets are crisp and the inputs are fuzzy. Networks of this type are used in classification problems to map fuzzy input vectors to crisp classes.
Definition 17.2: (regular fuzzy neural network, RFNN) A regular fuzzy neural network is a network with fuzzy signals and/or fuzzy weights, a sigmoidal activation function, and all operations defined by the extension principle.
Example 17.1: Consider a simple network y = g(w_1 x_1 + w_2 x_2), where the inputs and weights are fuzzy numbers. We use the extension principle to calculate w_i x_i. The output fuzzy set Y is computed by the extension principle:
Y(y) = (w_1 x_1 + w_2 x_2)(g^{-1}(y)) if 0 ≤ y ≤ 1;  0 otherwise   ...(17.3)
where g^{-1}(y) = ln y − ln(1 − y) is simply the inverse function of the logistic sigmoid g(z) = 1/(1 + e^{−z}).
The problem with regular fuzzy neural networks is that they are monotonic, which means that fuzzy neural nets based on the extension principle are universal approximators only for monotonic functions.
Theorem 17.1: g is an increasing function of its arguments, i.e., if x_1 ⊂ x'_1 and x_2 ⊂ x'_2 (x_i, x'_i, w_i are fuzzy numbers), then
g(w_1 x_1 + w_2 x_2) ⊂ g(w_1 x'_1 + w_2 x'_2)   ...(17.4)
Proof: Because min and max are increasing functions,
(w_1 x_1 + w_2 x_2) ⊂ (w_1 x'_1 + w_2 x'_2)   ...(17.5)
which means that a regular fuzzy neural network is not a universal approximator. This is a serious drawback for networks of this type.
Definition 17.3: (hybrid fuzzy neural network, HFNN) A hybrid fuzzy neural network is a network with fuzzy valued inputs x_i ∈ [0, 1] and/or fuzzy valued weights w_i ∈ [0, 1]. Input and weight are combined using a t-norm T(x_i, w_i), a t-conorm S(x_i, w_i), or some other continuous operation. These combinations are again combined using a t-norm, t-conorm, or some continuous operation. The activation function g can be any continuous function.
Many researchers working with fuzzy neurons follow the basic principles described above, but there is no standard path to follow.
17.5 NEURAL NETWORKS AS PRE-PROCESSORS OR POST-PROCESSORS
One of the biggest problems with fuzzy systems is the curse of dimensionality. When the dimension of the problem increases, the size of the fuzzy model (and the size of the training set needed) grows exponentially. The use of more than 4 inputs may be impractical. The number of combinations of input terms (possibly also the number of rules) is
∏_{i=1}^{d} m_i   ...(17.6)
where m_i is the number of fuzzy sets on axis i. For example, if we have five inputs and each input space is partitioned into seven fuzzy sets, the number of combinations is 7^5 = 16,807. Therefore, there is a strong need for data reduction. The smallest number of input variables should be used to explain a maximal amount of information.
The most common method to decrease the dimension of the input space is principal component analysis (PCA). The main goal of identifying principal components is to preserve as much relevant information as possible. Selecting M attributes from d is equivalent to selecting M basis vectors which span the new subspace, and projecting the data onto this M-dimensional subspace. Principal component analysis thus reduces the dimensionality of data in which there is a large number of correlated variables, while retaining as much as possible of the variation present in the data. This reduction is achieved via a set of linear transformations, which transform the input variables to a new set of variables (uncorrelated principal components).
variabIes (uncorreIated principaI components). The aIgorithm goes as IoIIows:
1. compute the mean oI inputs in data and subtract it oII
2. caIcuIate covariance matrix and its eigenvectors and eigenvaIues
222 IIZZY LCIC AND NLIRAL NLTWRKS
3. retain eigenvectors corresponding to the M Iargest eigenvaIues
4. project input vectors onto the eigenvectors
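The following sketch carries out exactly these four steps with NumPy; the choice of M and the example data shape are illustrative.

```python
import numpy as np

def pca(X, M):
    """Project the rows of X (n samples x d inputs) onto the M principal components."""
    X_centered = X - X.mean(axis=0)                  # step 1: subtract the mean
    cov = np.cov(X_centered, rowvar=False)           # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # step 2: eigen-decomposition
    order = np.argsort(eigvals)[::-1][:M]            # step 3: M largest eigenvalues
    components = eigvecs[:, order]
    return X_centered @ components                   # step 4: projection

# Example: reduce 5 correlated inputs to 2 uncorrelated principal components.
X = np.random.randn(100, 5)
Z = pca(X, M=2)
```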
Neural networks may be used to perform the dimensionality reduction. A two-layer perceptron with linear output units (the number of hidden units is M, with M < d), which is trained to map input vectors onto themselves by minimization of a sum-of-squares error, is able to perform a linear principal component analysis. If two additional nonlinear hidden layers are put into the network, the network can be made to perform a non-linear principal component analysis.
17.6 NEURAL NETWORKS AS TUNERS OF FUZZY LOGIC SYSTEMS
The similarities between neural networks and fuzzy logic systems were noticed, which led to the development of neurofuzzy systems. The original purpose of neurofuzzy systems was to incorporate learning (and classification) capability into fuzzy systems or, alternatively, to achieve a similar transparency (intelligibility) in neural networks as in fuzzy systems.
Learning is assumed to reduce design costs, increase flexibility, improve performance and decrease human intervention. If prior knowledge is unavailable and/or the plant is time-varying, then the most sensible (possibly the only) solution is to utilize learning capabilities.
In the 1980s, a computationally efficient training algorithm for multi-layer neural networks was discovered. It was named error back-propagation. The principle of the method is that it finds the derivatives of an error function with respect to the weights in the network. The error function can then be minimized by using gradient based optimization algorithms. Since back-propagation can be applied to any feed-forward network, some researchers began to represent fuzzy logic systems as feed-forward networks. The idea was to use the training algorithm to adjust the weights, centers and widths of the membership functions.
In Fig. 17.5, a fuzzy logic system with d inputs and one output, an approximator, is represented as a network.
Fig. 17.5 Neurofuzzy network for back-propagation: multivariate (Gaussian) membership functions on the inputs x_1, ..., x_d feed a weighted sum with weights W_1, ..., W_M, normalized by the denominator term.
The most common way to represent a neurofuzzy architecture is shown in Fig. 17.6. Although it looks different from the network in Fig. 17.5, it is basically the same network; only the way of illustrating the network differs.
17.7 ADVANTAGES AND DRAWBACKS OF NEUROFUZZY SYSTEMS
The advantages are as follows:
1. Weights are the centers of the THEN-part fuzzy sets (clear meaning);
2. Other parameters (width and position of membership functions) have a clear meaning;
3. Initial weights can be chosen appropriately (linguistic rules).
The drawback is the curse of dimensionality.
17.8 COMMITTEE OF NETWORKS
The method of combining networks to form a committee has been used to improve the generalization ability of the networks. The performance of a committee can be better than the performance of isolated networks. It can consist of networks with different architectures, e.g., different types of neural networks, fuzzy logic systems and conventional models.
Fig. 17.6 Neurofuzzy network for function approximation and classification problems: linguistic variables A_1, A_2, B_1, B_2 on the inputs x_1, x_2 feed a multiplier and normalizer stage with consequent parameters (function approximator), or And/Or layers with a max defuzzifier assigning one of the classes 1-5 (classifier).
The committee prediction (output) is calculated as an average of the outputs of the q networks:
y_COM(x) = (1/q) Σ_{i=1}^{q} y_i(x)   ...(17.7)
The reduction of error arises from the reduced variance due to averaging.
Kosko (1991) has proposed the use of a weighted average to combine different fuzzy systems that try to predict the same input-output relation. He does not restrict the form of the fuzzy system to be an additive or SAM system. The only difference with (17.7) is that Kosko weights the fuzzy system outputs y_i(x) by credibilities w_i ∈ [0, 1], such that at least one system has nonzero credibility.
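Both the plain committee average of eq. (17.7) and the credibility-weighted variant attributed to Kosko can be written in a few lines; the member models below are arbitrary callables and the credibility values are illustrative assumptions.

```python
import numpy as np

def committee(models, x):
    """Plain committee prediction: the average of the q member outputs, eq. (17.7)."""
    outputs = np.array([m(x) for m in models])
    return outputs.mean(axis=0)

def weighted_committee(models, credibilities, x):
    """Kosko-style combination: outputs weighted by credibilities w_i in [0, 1]."""
    outputs = np.array([m(x) for m in models])
    return np.average(outputs, axis=0, weights=credibilities)
```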
17.9 FNN ARCHITECTURE BASED ON BACK PROPAGATION
The input-output reIation is numericaIIy caIcuIated by intervaI arithmetic via IeveI sets (i.e., o-cuts) oI
Iuzzy weights and Iuzzy inputs. Next we deIine the strong L-R type Iuzzy number, and show its good
properties in intervaI arithmetic. WhiIe deIining a cost Iunction Ior IeveI sets oI Iuzzy outputs and Iuzzy
targets, we propose a Iearning aIgorithm Irom the cost Iunction Ior adjusting three parameters oI each
strong L-R type Iuzzy weight. LastIy we examine the abiIity oI proposed Iuzzy neuraI network
impIementing on Iuzzy iI-then ruIes.
In the Iuzzy neuraI networks based on BP, neurons are organized into a number oI Iayers and the
signaIs IIow in one direction. There are no interactions and Ieedback Ioops among the neurons oI same
Iayer, Fig. 17.7 shows this modeI Iuzzy neuraI network.
[Figure 17.7: a three-layered network with input units x_1, x_2, a hidden layer, and output units y_1, y_2, y_3.]
Fig. 17.7 A three-layered fuzzy neural network.
According to the type of inputs and weights we define three different kinds of fuzzy neural networks as follows:
I. crisp weights and fuzzy inputs;
II. fuzzy weights and crisp inputs; and
III. fuzzy weights and fuzzy inputs.
Type (III) of fuzzy feed-forward neural networks is presented here. In this model, the connections between the layers may be illustrated as a matrix of fuzzy weights W_{fi}, which gives the fuzzy weight of the connection between the ith neuron of the input layer and the fth neuron of the hidden layer. The total fuzzy input of the fth neuron in the second layer is defined as:
Net_{pf} = \sum_{i=1}^{N} W_{fi} \cdot o_{pi} + \Theta_f     ...(17.8)
where Net_{pf} is the total fuzzy input of the fth neuron of the hidden layer, o_{pi} = x_{pi} is the ith fuzzy input of that neuron, and \Theta_f is the fuzzy bias of the fth neuron. The fuzzy output of the fth neuron is defined with the transfer function

o_{pf} = f(Net_{pf}),   f = 1, 2, ..., N_H     ...(17.9)
Furthermore, the fuzzy output of the kth neuron of the output layer is defined as follows:

Net_{pk} = \sum_{f=1}^{N_H} W_{kf} \cdot o_{pf} + \Theta_k     ...(17.10)

o_{pk} = f(Net_{pk}),   k = 1, 2, ..., N_O     ...(17.11)
The fuzzy output is numerically calculated for level sets (i.e., α-cuts) of the fuzzy inputs, fuzzy weights and fuzzy biases. Next, we need to find a type of fuzzy number to denote the fuzzy inputs, fuzzy weights and fuzzy biases; this type of fuzzy number should have good properties so that it can be easily adapted to interval arithmetic.
Furthermore, let (x_p, T_p) be the fuzzy input-output pairs, where T_p = (T_{p1}, T_{p2}, ..., T_{pN_O}) is the N_O-dimensional fuzzy target vector corresponding to the fuzzy input vector x_p.
The cost function for the input-output pair (x_p, T_p) is obtained as

e_p = \sum_{h} e_{ph}     ...(17.12)
The cost function for the h-level sets of the fuzzy output vector O_p and the fuzzy target vector is defined as

e_{ph} = \sum_{k=1}^{N_O} e_{pkh}     ...(17.13)
where

e_{pkh} = e^L_{pkh} + e^U_{pkh}

e^L_{pkh} = h \cdot ([T_{pk}]^L_h - [O_{pk}]^L_h)^2 / 2

e^U_{pkh} = h \cdot ([T_{pk}]^U_h - [O_{pk}]^U_h)^2 / 2
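For a single h-level, the cost in (17.13) simply compares the lower and upper endpoints of the fuzzy output and fuzzy target intervals. A minimal sketch of this level-set cost for one output unit is given below; the interval values used in the example are made up for illustration.

```python
def level_set_cost(target_interval, output_interval, h):
    """e_pkh = e^L_pkh + e^U_pkh for one output unit at level h, eq. (17.13).

    target_interval, output_interval: (lower, upper) endpoints of the h-level sets.
    """
    tL, tU = target_interval
    oL, oU = output_interval
    eL = h * (tL - oL) ** 2 / 2.0
    eU = h * (tU - oU) ** 2 / 2.0
    return eL + eU

# Hypothetical h-level sets of a fuzzy target and a fuzzy output.
print(level_set_cost((0.8, 1.0), (0.6, 1.1), h=0.5))
```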
In the next section we introduce the strong L-R type fuzzy number and put forward an FNN algorithm based on BP.
17.9.1 Strong L-R Representation of Fuzzy Numbers
Definition 17.4: A function, usually denoted L or R, is a reference function of fuzzy numbers if
1. S(0) = 1;
2. S(x) = S(-x);
3. S is non-increasing on [0, ∞),
where S stands for either L or R.
Definition 17.5: A fuzzy number M is said to be an L-R type fuzzy number if

M(x) = L((m - x)/a)   if x ≤ m, a > 0
M(x) = R((x - m)/b)   if x ≥ m, b > 0     ...(17.14)

L is the left and R the right reference function, m is the mean value of M, and a and b are called the left and right spreads; symbolically, we write M = (m, a, b)_{LR}.
Definition 17.6: A fuzzy number M is said to be a strong L-R type fuzzy number if L(1) = R(1) = 0.
This kind of fuzzy number has the following properties:
1. The α-cuts of every fuzzy number are closed intervals of real numbers.
2. Fuzzy numbers are convex fuzzy sets.
3. Since L(1) = R(1) = 0, the membership function reaches zero at m - a on the left and at m + b on the right, so the support of every strong L-R type fuzzy number is a bounded interval of real numbers.
These properties are essential for defining meaningful arithmetic operations on fuzzy numbers. Since each fuzzy set is uniquely represented by its α-cuts, and these are closed intervals of real numbers, arithmetic operations on fuzzy numbers can be defined in terms of arithmetic operations on closed intervals of real numbers. These operations are the cornerstone of interval analysis, which is a well-established area of classical mathematics. We will utilize them in the next section to define arithmetic operations on fuzzy numbers.
The strong L-R type is an important kind of fuzzy number. The triangular fuzzy number (T.F.N.) is a special class of the strong L-R type fuzzy number. We can write any strong L-R type fuzzy number symbolically as M = (m, a, b)_{LR}; in other words, the strong L-R type fuzzy number can be uniquely represented by three parameters. Accordingly we can adjust three parameters of each strong L-R type fuzzy weight and fuzzy bias.
We express the strong L-R type fuzzy weights W_{fi}, W_{kf} and fuzzy biases \Theta_f, \Theta_k by these parameters as

W_{kf} = (w^L_{kf}, w^m_{kf}, w^U_{kf})_{LR}
W_{fi} = (w^L_{fi}, w^m_{fi}, w^U_{fi})_{LR}
\Theta_k = (\theta^L_k, \theta^m_k, \theta^U_k)_{LR}     ...(17.15)
\Theta_f = (\theta^L_f, \theta^m_f, \theta^U_f)_{LR}

where the superscripts L, m and U denote the lower limit, the mean value and the upper limit of the fuzzy number, respectively.
Let

c_{kf} = \frac{w^m_{kf} - w^L_{kf}}{w^U_{kf} - w^m_{kf}},  c_{fi} = \frac{w^m_{fi} - w^L_{fi}}{w^U_{fi} - w^m_{fi}},  c_k = \frac{\theta^m_k - \theta^L_k}{\theta^U_k - \theta^m_k},  and  c_f = \frac{\theta^m_f - \theta^L_f}{\theta^U_f - \theta^m_f}.
Then

w^m_{kf} = \frac{w^L_{kf} + c_{kf} w^U_{kf}}{1 + c_{kf}},

and w^m_{fi}, \theta^m_k and \theta^m_f have the same form as w^m_{kf}.
We discuss how to learn the strong L-R type fuzzy weight W_{kf} = (w^L_{kf}, w^m_{kf}, w^U_{kf})_{LR} between the fth hidden unit and the kth output unit. Similar to Rumelhart, we can compute the quantity of adjustment for each parameter from the cost function:
\Delta w^L_{kf}(t) = -\eta \frac{\partial e_{ph}}{\partial w^L_{kf}} + \alpha \Delta w^L_{kf}(t-1)     ...(17.16)

\Delta w^U_{kf}(t) = -\eta \frac{\partial e_{ph}}{\partial w^U_{kf}} + \alpha \Delta w^U_{kf}(t-1)     ...(17.17)

where \eta is the learning rate and \alpha the momentum coefficient.
The derivatives above can be written as follows:

\frac{\partial e_{ph}}{\partial w^L_{kf}} = \frac{\partial e_{ph}}{\partial [w_{kf}]^L_h} \cdot \frac{\partial [w_{kf}]^L_h}{\partial w^L_{kf}} + \frac{\partial e_{ph}}{\partial [w_{kf}]^U_h} \cdot \frac{\partial [w_{kf}]^U_h}{\partial w^L_{kf}}     ...(17.18)

\frac{\partial e_{ph}}{\partial w^U_{kf}} = \frac{\partial e_{ph}}{\partial [w_{kf}]^L_h} \cdot \frac{\partial [w_{kf}]^L_h}{\partial w^U_{kf}} + \frac{\partial e_{ph}}{\partial [w_{kf}]^U_h} \cdot \frac{\partial [w_{kf}]^U_h}{\partial w^U_{kf}}     ...(17.19)
Since W_{kf} is a strong L-R type fuzzy number, its h-level and 0-level sets are related as follows:
[w_{kf}]^L_h = \frac{w^L_{kf} + c_{kf} w^U_{kf}}{1 + c_{kf}} - \frac{c_{kf}(w^U_{kf} - w^L_{kf})}{1 + c_{kf}} L^{-1}(h)     ...(17.20)

[w_{kf}]^U_h = \frac{w^L_{kf} + c_{kf} w^U_{kf}}{1 + c_{kf}} + \frac{w^U_{kf} - w^L_{kf}}{1 + c_{kf}} R^{-1}(h)     ...(17.21)
Therefore,

\frac{\partial e_{ph}}{\partial w^L_{kf}} = \frac{\partial e_{ph}}{\partial [w_{kf}]^L_h} \cdot \frac{1 + c_{kf} L^{-1}(h)}{1 + c_{kf}} + \frac{\partial e_{ph}}{\partial [w_{kf}]^U_h} \cdot \frac{1 - R^{-1}(h)}{1 + c_{kf}}     ...(17.22)

\frac{\partial e_{ph}}{\partial w^U_{kf}} = \frac{\partial e_{ph}}{\partial [w_{kf}]^L_h} \cdot \frac{c_{kf}(1 - L^{-1}(h))}{1 + c_{kf}} + \frac{\partial e_{ph}}{\partial [w_{kf}]^U_h} \cdot \frac{c_{kf} + R^{-1}(h)}{1 + c_{kf}}     ...(17.23)
These relations explain how the error signals \partial e_{ph} / \partial [w_{kf}]^L_h and \partial e_{ph} / \partial [w_{kf}]^U_h for the h-level set propagate to the 0-level of the strong L-R type fuzzy weight W_{kf}, and then the fuzzy weight is updated by the following rules:

w^L_{kf}(t + 1) = w^L_{kf}(t) + \Delta w^L_{kf}     ...(17.24)

w^U_{kf}(t + 1) = w^U_{kf}(t) + \Delta w^U_{kf}     ...(17.25)
We assume that n values of h (i.e., h_1, h_2, ..., h_n) are used for the learning of the fuzzy neural network. The learning algorithm of the fuzzy neural network can then be defined as follows:
1. Initialize the fuzzy weights and the fuzzy biases.
2. Repeat step 3 for h = h_1, h_2, ..., h_n.
3. Repeat the following procedure for p = 1, 2, ..., m (the m input-output pairs (x_p, T_p)).
Forward calculation: calculate the h-level set of the fuzzy output vector O_p corresponding to the fuzzy input vector x_p.
Back-propagation: adjust the fuzzy weights and the fuzzy biases using the cost function e_{ph}.
4. If a pre-specified stopping condition (e.g., the total number of iterations) is not satisfied, go to step 2.
Here (x_p, T_p) are the fuzzy input-output pairs, and T_p = (T_{p1}, T_{p2}, ..., T_{pN_O}) is the N_O-dimensional fuzzy target vector corresponding to the fuzzy input vector x_p.
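To make the level-set computation in the forward calculation step concrete, the sketch below propagates one h-level interval through a single neuron with triangular (hence strong L-R) fuzzy weights, using ordinary interval arithmetic. The weight, bias and input values are invented for illustration, and the sigmoid is used as the transfer function.

```python
import math

def h_cut_triangular(a, m, b, h):
    """h-level set [lower, upper] of a triangular fuzzy number (a, m, b)."""
    return (a + h * (m - a), b - h * (b - m))

def interval_mul(x, y):
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))

def interval_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def sigmoid_interval(x):
    f = lambda t: 1.0 / (1.0 + math.exp(-t))
    return (f(x[0]), f(x[1]))  # the sigmoid is monotone, so endpoints suffice

# Hypothetical fuzzy weights, bias and fuzzy inputs (a, m, b), evaluated at level h = 0.5.
h = 0.5
weights = [(0.2, 0.5, 0.9), (-0.4, -0.1, 0.3)]
bias = (-0.2, 0.0, 0.2)
inputs = [(0.8, 1.0, 1.2), (0.4, 0.5, 0.7)]

net = h_cut_triangular(*bias, h)
for w, x in zip(weights, inputs):
    net = interval_add(net, interval_mul(h_cut_triangular(*w, h), h_cut_triangular(*x, h)))
print("h-level output interval:", sigmoid_interval(net))
```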
17.9.2 Simulation
We consider an n-dimensional classification problem. It can be described by IF-THEN rules as follows:
IF x_{p1} is A_{p1} and, ..., and x_{pn} is A_{pn}
THEN x_p = (x_{p1}, ..., x_{pn}) belongs to G_p,
where p = 1, 2, ..., k. A_{pi} is a linguistic term, for example 'large', 'small', etc. For convenience of computation, we assume that A_{pi} is a symmetrical strong L-R type fuzzy number, that is,

L = R = max(0, 1 - |x|^2).
We can solve the above problem by using the fuzzy neural network we have discussed. We denote the fuzzy input as A_p = (A_{p1}, ..., A_{pn}), and the target output T_p can be defined as follows:

T_p = 1   if A_p ∈ class 1
T_p = 0   if A_p ∈ class 2     ...(17.26)
according to the target output T and the real output O. We define the error function:

e_{ph} = max { (t_p - o_p)^2 / 2 : o_p ∈ [y_p]_h }     ...(17.27)
We should train this network in order to make e_{ph} minimal. It is easy to see that the error function becomes the classical error function

e = \sum_{p=1}^{k} (t_p - o_p)^2 / 2

of the BP algorithm when the input vector A_p and y_p are real numbers. We train the fuzzy neural network with the h-level sets h = 0.2, 0.4, 0.6, 0.8; the error function of the pth pair is:

e_p = \sum_{h} max { h (t_p - o_p)^2 / 2 : o_p ∈ [y_p]_h }     ...(17.28)
The result of the trained fuzzy neural network is shown in Fig. 17.8. We assume A_1-A_4 belong to class 1 and A_5-A_8 belong to class 2. Using the proposed learning algorithm, we obtain a satisfactory decision curve after 300 epochs.
Fig. 17.8 The result of fuzzy classification
17.10 ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
A fuzzy system can be considered to be a parameterized nonlinear map, called f. Let us write the expression of f explicitly:
f(x) = \frac{\sum_{l=1}^{m} y^l \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)}{\sum_{l=1}^{m} \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)}     ...(17.29)
where y^l is the place of the output singleton if Mamdani reasoning is applied, or a constant if Sugeno reasoning is applied. The membership function \mu_{A_i^l}(x_i) corresponds to the input x = (x_1, ..., x_n) of the rule l. The and connective in the premise is carried out by a product and defuzzification by the center-of-gravity method.
This can be further written as

f(x) = \sum_{l=1}^{m} w_l b_l(x)     ...(17.30)

where w_l = y^l and

b_l(x) = \frac{\prod_{i=1}^{n} \mu_{A_i^l}(x_i)}{\sum_{l=1}^{m} \prod_{i=1}^{n} \mu_{A_i^l}(x_i)}
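Equation (17.30) says that a fuzzy system is a normalized basis-function expansion. A minimal numerical sketch, assuming Gaussian membership functions and singleton (zero-order Sugeno) consequents chosen purely for illustration, is given below.

```python
import numpy as np

def fuzzy_map(x, centers, sigmas, y):
    """f(x) = sum_l w_l * b_l(x), eq. (17.30), with Gaussian membership functions.

    centers, sigmas: (m rules, n inputs) arrays of MF parameters.
    y: (m,) rule consequents (singletons), i.e. the weights w_l.
    """
    x = np.asarray(x, dtype=float)
    # Product of per-input Gaussian memberships gives each rule's firing strength.
    firing = np.prod(np.exp(-((x - centers) / sigmas) ** 2), axis=1)
    b = firing / firing.sum()            # normalized basis functions b_l(x)
    return float(np.dot(y, b))

# Two rules on a two-dimensional input (all numbers are hypothetical).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([[0.5, 0.5], [0.5, 0.5]])
y = np.array([0.0, 1.0])
print(fuzzy_map([0.2, 0.3], centers, sigmas, y))
```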
If F is a continuous, nonlinear map on a compact set, then f can approximate F to any desired accuracy, i.e.,

||F - f|| ≤ ε     ...(17.31)
Well-known theorems from approximation theory for polynomials can be extended to fuzzy systems (e.g., Wang: A Course in Fuzzy Systems and Control, Prentice Hall, 1997).
The following theorems are found in R.F. Curtain and A.J. Pritchard: Functional Analysis in Modern Applied Mathematics as corollaries of the Orthogonal Projection Theorem.
Theorem 17.5: Let F be a bounded function on [a, b], and E = (x_1, ..., x_k) a set of points in [a, b]. Then there exists the least squares polynomial of degree ≤ n, p_k^n, which minimizes

\sum_{i=1}^{k} | F(x_i) - p(x_i) |^2     ...(17.32)

over all polynomials p of degree ≤ n.
Theorem 17.6: If F ∈ C[a, b], then for any n ≥ 0 there exists a best approximating polynomial r_n of degree ≤ n such that

||F - r_n||_∞ ≤ ||F - p||_∞     ...(17.33)

over all polynomials p of degree ≤ n.
Remark: The message of Theorem 17.6 is that polynomials are dense in the space of continuous functions. The same can also be said of trigonometric functions. We can also consider the simpler problem of approximating at finitely many points.
Theorem 17.7: If F is a bounded function on [a, b], and E = (x_1, ..., x_k) is a set of points in [a, b], then there exists a best approximating polynomial r_k^n of degree ≤ n which minimizes

max_{0 ≤ i ≤ k} | F(x_i) - p(x_i) |     ...(17.34)

over all polynomials p of degree ≤ n.
17.10.1 ANFIS Structure
Consider a Sugeno type of fuzzy system having the rule base
1. If x is A_1 and y is B_1, then f_1 = p_1 x + q_1 y + r_1
2. If x is A_2 and y is B_2, then f_2 = p_2 x + q_2 y + r_2
Let the membership functions of the fuzzy sets A_i, B_i, i = 1, 2, be \mu_{A_i}, \mu_{B_i}.
In evaluating the rules, choose product for the T-norm (logical and).
1. Evaluating the rule premises results in

w_i = \mu_{A_i}(x) \mu_{B_i}(y),   i = 1, 2     ...(17.35)

2. Evaluating the implication and the rule consequences gives

f(x, y) = \frac{w_1(x, y) f_1(x, y) + w_2(x, y) f_2(x, y)}{w_1(x, y) + w_2(x, y)}     ...(17.36)

or, leaving the arguments out,

f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2}     ...(17.37)

This can be separated into phases by first defining

\bar{w}_i = \frac{w_i}{w_1 + w_2}     ...(17.38)

Then f can be written as

f = \bar{w}_1 f_1 + \bar{w}_2 f_2     ...(17.39)
[Figure 17.9: the ANFIS network for the two rules, with membership nodes A_1, A_2, B_1, B_2, product nodes giving w_1 and w_2, normalization nodes giving \bar{w}_1 and \bar{w}_2, consequent nodes computing \bar{w}_1 f_1 and \bar{w}_2 f_2, and a summation node producing f.]
Fig. 17.9 ANFIS structure
All computations can be presented in a diagram form.
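The whole forward pass of this two-rule Sugeno system, layers 1 through 5 of the ANFIS diagram, can also be traced in a few lines of code. The sketch below uses generalized bell membership functions and hypothetical parameter values; none of the numbers come from the text.

```python
def bell_mf(v, a, b, c):
    """Generalized bell membership function 1 / (1 + |(v - c)/a|^(2b))."""
    return 1.0 / (1.0 + abs((v - c) / a) ** (2 * b))

def anfis_two_rules(x, y, premise, consequent):
    """Forward pass of the two-rule Sugeno system of eqs. (17.35)-(17.39)."""
    (A1, A2, B1, B2) = premise
    w1 = bell_mf(x, *A1) * bell_mf(y, *B1)       # rule firing strengths (17.35)
    w2 = bell_mf(x, *A2) * bell_mf(y, *B2)
    w1n, w2n = w1 / (w1 + w2), w2 / (w1 + w2)    # normalization (17.38)
    (p1, q1, r1), (p2, q2, r2) = consequent
    f1 = p1 * x + q1 * y + r1                     # rule consequents
    f2 = p2 * x + q2 * y + r2
    return w1n * f1 + w2n * f2                    # weighted sum (17.39)

# Hypothetical premise parameters (a, b, c) and consequent parameters (p, q, r).
premise = ((1.0, 2.0, 0.0), (1.0, 2.0, 2.0), (1.0, 2.0, 0.0), (1.0, 2.0, 2.0))
consequent = ((1.0, 0.5, 0.0), (-0.3, 1.0, 2.0))
print(anfis_two_rules(0.7, 1.4, premise, consequent))
```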
QUESTION BANK.
1. What are the different types of hybrid systems? Explain them schematically.
2. Explain the use of fuzzy logic in neural networks to improve the learning ability.
3. Define the following:
Hybrid neural network
Regular fuzzy neural network
Hybrid fuzzy neural network
4. Explain the role of neural networks as pre-processors or post-processors of data.
5. What is a committee of networks? Explain.
6. Describe the FNN architecture based on back-propagation.
7. Describe the adaptive neuro-fuzzy inference system.
REFERENCES.
1. B. Kosko, Neural Networks and Fuzzy Systems, Prentice Hall, Englewood Cliffs, NJ, 1991.
2. G.A. Carpenter and S. Grossberg, The ART of adaptive pattern recognition by a self-organizing neural network, IEEE Computer Magazine, Vol. 21, No. 3, pp. 77-88.
3. S. Haykin, Neural Networks: A Comprehensive Foundation, IEEE Computer Society Press, Macmillan, NY.
4. S. Kartalopoulos, Understanding Neural Networks and Fuzzy Logic, IEEE Press, NY.
5. H.M. Lee and B.H. Liu, Fuzzy BP: a neural network model with fuzzy inference, Proceedings of the International Conference on Artificial Neural Networks, pp. 1583-1588, 1994.
CHAPTER 18
Hybrid Fuzzy Neural Networks: Applications
18.1 INTRODUCTION
The hybrid fuzzy neural networks have a tremendous potential to solve engineering problems. It is improper to expect that if the individual technologies are good then a hybridization of the technologies should automatically turn out to be even better. Hybridization is performed for the purpose of investigating better methods of problem solving.
18.2 TOOL BREAKAGE MONITORING SYSTEM FOR END MILLING
In recent years, global competition in industry has led to exploration of new means of more efficient production. In particular, flexible manufacturing systems (FMS) have been investigated as a tool for raising manufacturing productivity and product quality while decreasing production costs. One type of FMS is the Unmanned Flexible Manufacturing System (UFMS), which has received a great deal of attention because it replaces human operators with robotic counterparts in manufacturing and assembly cells. In addition to performing the same function as the FMS, the UFMS reduces direct labour costs and prevents personal oversights.
However, since human operators are absent in these systems, electronic sensors associated with a decision-making system must monitor the process. The decision-making system analyzes the information provided by the sensors to take appropriate control actions. In order to ensure efficiency within the system, monitoring equipment and algorithms for the adaptation of the manufacturing process must be executed accurately (Altintas, Yellowley, and Tlusty, 1988).
To apply the UFMS effectively, manufacturers must confirm that the tool is in good condition in process; therefore, automatic and rapid detection of tool breakage is central to successful UFMS operation. By themselves, computer numerical control (CNC) machines are not typically capable of tool breakage detection. Since CNC machines cannot detect tool conditions, they cannot halt the process if the tool becomes damaged. Materials costs increase and product quality suffers if a broken tool is used in production. To reduce the costs of materials and prevent damaged tools from negatively affecting production, a detection technology for unexpected tool breakage is needed (Lan and Naerheim, 1986).
An in-process tool breakage detection system was developed in an end milling operation, with cutting force and the machining parameters of spindle speed, feed rate, and depth of cut selected as input factors. The neural networks approach was employed as the decision-making system that judges tool conditions.
The common method of detecting tool breakage in process involves force signals resulting from tool processes on raw materials. A dynamometer sensor is the main device used to measure force signals in different machining operations. Lan and Naerheim (1986) proposed a time series auto-regression (AR) model of force signals to detect tool breakage. The time-series-based tooth period model technique (TPMT), which used the fast a posteriori error sequential technique (FAEST), was applied by Tansel and McLaughlin (1993) to detect tool breakage in milling. Tarng and Lee (1993) proposed using the average and median force of each tooth in the milling operation. Measured by sensors, the average and median forces of each tooth were used as input information. An appropriate threshold was built to analyze the information and detect tool conditions.
Jemielniak (1992) proposed that sudden changes in the average level of force signals could be due to catastrophic tool failure (CTF) in turning operations. In this study, analyzing force signals and determining amplitude fluctuations allowed on-line tool breakage detection. Zhang, Han, and Chen (1995) used a telemeter technique, a battery-powered sensing force/torque cutter holder mounted on the spindle head with the transmitter, to measure force in milling operations.
The application of neural networks and fuzzy logic in detecting tool breakage has also been studied in recent years. Tae and Dong (1992) developed a fuzzy pattern recognition technique and a time-series AR model to detect tool wear in turning operations. The variation of the dynamic cutting force was used to construct the fuzzy dispersion pattern needed to distinguish tool conditions. Ko, Cho, and Jung (1994) introduced an unsupervised self-organized neural network combined with an adaptive time-series AR modeling algorithm to monitor tool breakage in milling operations. The machining parameters and average peak force were used to build the AR model and neural network.
Chen and Black (1997) also introduced a fuzzy-nets system to distinguish tool conditions in milling operations. The fuzzy-nets system was designed to build the rule-bank and solve conflicting rules with a computer. The variance of adjacent peak forces was selected as an input parameter to train the system and build a rule-bank for detecting tool breakage.
18.2.1 Methodology: Force Signals in the End Milling Cutting Process
Milling is a fundamental machining process in the operation of CNC machines, and milling operations can be of two varieties: peripheral and face milling. Milling is an interrupted cutting process, meaning that each cutting tooth moving in the same direction generates a cyclic cutting force ranging from zero to maximum force, and back to zero. This cyclic force is graphed as a series of peaks. The principle of cutting force can be further defined by the resultant force F_R generated in the x and y directions. The resultant force F_{ri}, generated from the x and y directions, was used in this experiment and is expressed as:
F_{ri} = \sqrt{F_{xi}^2 + F_{yi}^2}     ...(18.1)
where F_{ri} is the resultant force at point i, F_{xi} is the force in the X direction at point i, and F_{yi} is the force in the Y direction at point i.
Tool conditions and machining parameters affect the magnitude of the resultant force; therefore, if the tool condition is good, the peak measurement of each tooth's force should be roughly the same from tooth to tooth during one revolution of the cutting process. Comparatively, if a tooth is broken, it generates a smaller peak force because it carries a smaller chip load. As a result, the tooth that follows a broken tooth generates a higher peak force as it extracts the chip that the broken tooth could not.
Applying the force principle, two main differences can be used to detect tool breakage:
1. The maximum peak force in each revolution should differ between good and broken tools, and the maximum peak force of a broken tool must be larger than that of a good tool;
2. The maximum variance of force between adjacent peaks should differ between good and broken tools, and the maximum variance of adjacent peak forces of broken tools must be larger than in undamaged tools.
Figure 18.1 illustrates the force signals of undamaged and broken tools. In this work, an in-process tool breakage detection system was developed in an end milling operation. The cutting force and machining parameters, such as spindle speed, feed rate, and depth of cut, were selected as input factors. Finally, the neural networks approach was used as a decision-making system using input from sensors to judge tool conditions.
18.2.2 Neural Networks
In this work, a back propagation neural network (BPNN) was chosen as the decision-making system because it is the most representative and commonly used algorithm. It is relatively easy to apply, has been proven effective in dealing with this kind of task, and has also proven successful in practical applications.
Back propagation is intended for training layered (i.e., nodes are grouped in layers), feed-forward (i.e., the arcs joining nodes are unidirectional, and there are no cycles) nets, as shown in Figure 18.1. This approach involves supervised learning, which requires a 'teacher' that knows the correct output for any input, and uses gradient descent on the error provided by the teacher to train the weights. Once the weights of the neural network were obtained, the prediction function was achieved via the weight information. The propagation rule, also called a summation or aggregation function, was used to combine or aggregate the inputs passing through the connections from other neurons. It can be expressed as
S_f = \sum_i a_i W_{fi} + a_0 W_{f0}     ...(18.2)

S_k = \sum_f a_f W_{kf} + a_0 W_{k0}     ...(18.3)
where i is an input neuron, f is a hidden neuron, and k is an output neuron; W_{fi} and W_{kf} denote the weights from input to hidden neuron and from hidden to output neuron, respectively, while a_0 represents the bias, usually 1, and W_{f0} and W_{k0} are the weights of the bias.
The transfer function, also called the 'output' or 'squashing' function, is used to produce an output based on the level of activation. Many different transfer functions can be used to transfer data; one is called the sigmoid function, expressed as:

O_y = \frac{1}{1 + e^{-ay}}     ...(18.4)

where ay is a function of S_f and S_k, respectively.
Comparing the actual output of the neural network to the desired output, the process is repeated until the error percentage falls into a reasonable range.
Fig. 18.1 The amplitude of cutting force of a good and broken tool.
[Figure 18.1: force amplitude traces over one revolution for a good tool and a broken tool; a = force signal, b = revolution signal; cutting parameters: speed = 650 rpm, feed = 15 ipm, depth = 0.08 inch.]
18.2.3 Experimental Design and System Development
This experiment employed a CNC vertical machining center. A dynamometer was mounted on the table to measure cutting force. A proximity sensor was installed near the spindle to confirm the data in each revolution. Four /-inch double-end, four-flute high-speed steel cutters were used. In each cutter, one side of the tool was in proper working order and the other side was broken. The experimental setup is shown in Fig. 18.2. The broken side of the tool possessed varying degrees of breakage (Fig. 18.3). The cutting parameters were set as:
1. Five levels of spindle speed (740, 500, 550, 600, and 650 revolutions per minute),
2. Five levels of feed rate (6, 12, 18 and 24 inches per minute), and
3. Five levels of depth of cut (0.06, 0.07, 0.08, 0.09 and 0.1 inches).
Fig. 18.2 The experimental setup.
[Figure 18.2 components: proximity sensor, dynamometer, workpiece, charge amplifier, DC power supply, A/D board, and the VMC40 machining center.]
[Figure 18.3: the four cutters T1-T4, with breakage sizes between 1.5 and 3 mm.]
Fig. 18.3 Diagram of broken tool.
The cutters used to execute the experiment were selected randomly. The cutting force was measured in volts by the charge amplifier and transformed to Newtons (N) via computer.
18.2.4 Neural Network-BP System Development
To develop back propagation of neural networks as a decision-making system, MATLAB software was applied to analyze the data. Seven steps were conducted. In step one, prediction factors were determined in order to perform the training process. Step 2 was necessary to analyze the differences between scaling and unscaling data. Step 3 dealt with separating the data into training and testing categories. From steps 4 through 6, parameters were developed for the training process, including the hidden layers/hidden neurons, learning rate, and momentum factor. Finally, in step 7, information from the training process was used to predict tool conditions.
Step 1. Determine the factors
Five input neurons were used for the tool breakage prediction data:
1. Spindle speed;
2. Feed rate;
3. Depth of cut;
4. Maximum peak force; and
5. Maximum variance of adjacent peak forces.
Output neurons were either (1) Good, or (2) Broken.
Three hundred data points were used in this work. Good tools contributed half of these and broken tools contributed the rest, and all data were randomized using MS Excel software.
Step 2. Analyze unscaling and scaling data
In order to avoid experimental errors resulting from the larger values of some data sets, some pre-processing was needed to obtain good training and prediction results. Since the histograms of all data sets were uniform or normal distributions, the simple linear mapping method was employed for scaling. To compare the difference between the two treatments, some parameters were first set and fixed.
The number of hidden neurons was set at 4, the learning rate was set at 1, and the momentum item was 0.5. The number of training cycles was 2000, and the testing period was 5. Table 18.1 shows the comparison between scaling and unscaling data. As one can see, the errors with scaling are smaller than without scaling.

Table 18.1 Difference between scaling and non-scaling data

          Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
          layer   neuron  rate   factor    error  error    of training  of testing
Unscale   1       4       1      0.5       0.505  0.550    0.508        0.515
Scale     1       4       1      0.5       0.040  0.160    0.197        0.388
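Step 2's simple linear mapping is ordinary min-max scaling of each input column to a common range. A small sketch, assuming a [0, 1] target range (the exact range used in the study is not stated in the text), is shown below.

```python
import numpy as np

def simple_linear_mapping(data, low=0.0, high=1.0):
    """Scale each column of `data` linearly into [low, high]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return low + (data - col_min) * (high - low) / (col_max - col_min)

# Hypothetical rows: spindle speed, feed rate, depth of cut, peak force, peak variance.
raw = np.array([[600, 12, 0.09, 904.5, 45.8],
                [550, 10, 0.06, 780.1, 368.6],
                [650, 15, 0.06, 847.1, 537.5]])
print(simple_linear_mapping(raw))
```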
Step 3. Impact of the ratio of training and testing data
The 300 original data records were randomized and separated into three groups. The first group had 200 training and 100 testing data (200/100), the second had 225 training and 75 testing data (225/75), and the third had 250 training and 50 testing data (250/50). Table 18.2 shows the back propagation neural network (BPNN) with different sample sizes of training and testing data. The last four columns of Table 18.2 show the training and testing errors.
The training, testing, and RMS (root mean square) errors of training of the second group were smaller than in the other groups. The RMS errors of the testing data of the second group were larger than in the first; however, the RMS errors of each sample size were very similar. If samples had similar error percentages, the sample with the largest training sample size was selected because it provided sufficient information to predict the testing data. From the experimental design, the ideal ratio between training and testing data was 3:1 for the neural networks. The 225/75 sample size was employed in this analysis.

Table 18.2 Different sample sizes of training and testing data

Tra/Tes   Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
          layer   neuron  rate   factor    error  error    of training  of testing
200/100   1       4       1      0.5       0.040  0.160    0.197        0.388
225/75    1       4       1      0.5       0.036  0.093    0.185        0.298
250/50    1       4       1      0.5       0.044  0.106    0.204        0.285
Step 4. Impact of the hidden layers and hidden neurons
In the beginning, the number of hidden neurons was set at 5, and the number of hidden layers was set at 1. Different numbers of hidden neurons and layers were tested to determine which values would lead to the smallest error percentage. To this end, the hidden neurons were set at 4 and 6, and the hidden layers were set at 1 and 2. Table 18.3 shows the BPN with different numbers of hidden neurons and layers. According to this data, the percentage error of the trial with 4 hidden neurons and 1 hidden layer was less than in all the other trials. Thus, the configuration of the 4 hidden neuron/1 hidden layer experiment was chosen because it led to the best results. The formula (input neurons + output neurons)/2 was useful for determining the number of hidden neurons at the beginning.

Table 18.3 Different numbers of hidden neurons and layers

Neurons     Neurons     Learn  Momentum  Train  Testing  RMS error    RMS error
in layer-1  in layer-2  rate   factor    error  error    of training  of testing
3           -           1      0.5       0.080  0.200    0.256        0.410
4           -           1      0.5       0.036  0.093    0.185        0.298
5           -           1      0.5       0.049  0.093    0.193        0.288
3           3           1      0.5       0.267  0.320    0.338        0.375
4           2           1      0.5       0.489  0.453    0.512        0.504
4           3           1      0.5       0.316  0.333    0.407        0.425
4           4           1      0.5       0.164  0.227    0.362        0.414
Step 5. Impact of the learning rate
This step was necessary to determine the optimal learning rate. The initial learning rate was 1. Three additional learning rates, 0.5, 2, and 10, were used for comparison with the initial one. Table 18.4 shows the BPN with different learning rate values. Table 18.4 shows that the error percentages for the learning rates of 0.5 and 1 were the same, in addition to being lower than for all other learning rates. To achieve the objective of finding the smallest error percentage, the learning rate of 1 was used, because the software originally recommended that value.

Table 18.4 Different learning rate values

Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
layer   neuron  rate   factor    error  error    of training  of testing
1       4       0.5    0.5       0.036  0.093    0.185        0.298
1       4       1      0.5       0.036  0.093    0.185        0.298
1       4       2      0.5       0.116  0.133    0.306        0.319
1       4       10     0.5       0.111  0.133    0.286        0.317
Step 6. Impact of the momentum factor
The final step of data analysis was to change the value of the momentum item to obtain the configuration leading to the lowest error percentage. The initial value of the momentum item was 0.5. Another three values, 0.3, 0.6, and 0.8, were selected for comparison with the initial value. Table 18.5 shows the BPN with different values for the momentum item. Table 18.5 shows that the percentage errors for momentum items of 0.3 and 0.5 are the same, and smaller than all others. To achieve the smallest error percentage, the 0.5 momentum item was used, because the software originally recommended that value.
Step 7. Prediction
After completing the analysis and obtaining information about the weights and input factors, equations to predict tool conditions were constructed. The variables a_1, a_2, ..., a_5 represent the 5 input factors: maximum peak force, spindle speed, feed rate, depth of cut, and maximum variance of adjacent peak forces, respectively. By application of equations (18.5)-(18.8), the weighted values of the hidden factors a_{h1}, a_{h2}, a_{h3}, a_{h4} can be expressed as:
a_{h1} = \frac{1}{1 + \exp[-((4.652)a_1 + (0.448)a_2 + (0.947)a_3 + (25.237)a_4 + (0.853)a_5 + (0.221))]}     ...(18.5)

a_{h2} = \frac{1}{1 + \exp[-((40.457)a_1 + (39.421)a_2 + (15.261)a_3 + (7.317)a_4 + (21.054)a_5 + (44.505))]}     ...(18.6)

a_{h3} = \frac{1}{1 + \exp[-((10.224)a_1 + (3.444)a_2 + (24.252)a_3 + (3.449)a_4 + (4.215)a_5 + (1.289))]}     ...(18.7)

a_{h4} = \frac{1}{1 + \exp[-((1.321)a_1 + (24.736)a_2 + (0.202)a_3 + (0.79)a_4 + (0.015)a_5 + (0.829))]}     ...(18.8)

a_{out1} = \frac{1}{1 + \exp[-((11.697)a_{h1} + (16.977)a_{h2} + (12.295)a_{h3} + (11.807)a_{h4} + (2.945))]}     ...(18.9)

a_{out2} = \frac{1}{1 + \exp[-((11.697)a_{h1} + (16.977)a_{h2} + (12.295)a_{h3} + (11.807)a_{h4} + (2.945))]}     ...(18.10)
Finally, the output information was used to judge the tool condition:
If a_{out1} > a_{out2}, then the tool condition is good;
If a_{out1} < a_{out2}, then the tool is broken.
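The prediction step is just two matrix-vector sigmoid evaluations followed by the comparison above. The sketch below shows the computation with placeholder weight matrices; the actual trained weights are the coefficients appearing in (18.5)-(18.10), and the scaled input values used here are invented.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def judge_tool(a, W_hidden, b_hidden, W_out, b_out):
    """Evaluate the trained BPNN and return 'Good' or 'Broken'.

    a: 5 scaled input factors (peak force, speed, feed, depth, peak variance).
    W_hidden, b_hidden: 4x5 weights and 4 biases of the hidden layer, eqs. (18.5)-(18.8).
    W_out, b_out: 2x4 weights and 2 biases of the output layer, eqs. (18.9)-(18.10).
    """
    a_hidden = sigmoid(W_hidden @ np.asarray(a, dtype=float) + b_hidden)
    a_out = sigmoid(W_out @ a_hidden + b_out)
    return "Good" if a_out[0] > a_out[1] else "Broken"

# Placeholder weights and a hypothetical scaled input vector.
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(4, 5)), rng.normal(size=4)
W_o, b_o = rng.normal(size=(2, 4)), rng.normal(size=2)
print(judge_tool([0.6, 0.4, 0.5, 0.7, 0.1], W_h, b_h, W_o, b_o))
```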
18.2.5 Findings and Conclusions
To operate the UFMS successfully, in-process sensing techniques linked to rapid-response decision-making systems are required. In this research, a neural networks model was developed to judge the cutting force for accurate in-process tool breakage detection in milling operations. The neural networks were capable of detecting tool conditions accurately and in process. The accuracy on the training data was 96.4% and the accuracy on the testing data was 90.7%. Partial results of the training and testing data are shown in Tables 18.6 and 18.7.
Table 18.5 Percent error of momentum items

Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
layer   neuron  rate   factor    error  error    of training  of testing
1       4       1      0.3       0.036  0.093    0.185        0.298
1       4       1      0.5       0.036  0.093    0.185        0.298
1       4       1      0.6       0.049  0.093    0.212        0.288
1       4       1      0.8       0.116  0.133    0.306        0.322
Table 18.6 Partial results of training data

Tool        a_1      a_2  a_3  a_4   a_5       a_out1  a_out2  Prediction
condition
Good        904.5    600  12   0.09  45.8      1       0       Good
Broken      1220.54  600  12   0.09  954.04    0       1       Broken
Good        634.64   550  10   0.06  274.64    0.98    0.02    Good
Broken      780.14   550  10   0.06  368.64    0.03    0.97    Broken
Good        674.36   650  15   0.06  248.64    1       0       Good
Broken      847.06   650  15   0.06  537.54    0.02    0.98    Broken
Good        1239.4   500  18   0.07  225.2     1       0       Good
Broken      1677.92  500  18   0.07  1,159.64  0       1       Broken
Good        1413.76  450  15   0.08  300.56    0       1       Broken
Broken      1861.72  450  15   0.08  1,250.92  0.01    0.99    Broken
Table 18.7 Partial results of testing data

Tool        a_1      a_2  a_3  a_4   a_5       a_out1  a_out2  Prediction
condition
Good        711.94   550  15   0.06  177.02    1       0       Good
Broken      1296.56  550  15   0.06  481.66    0.01    0.99    Broken
Good        723.32   550  12   0.07  311.52    0.35    0.65    Broken
Broken      1215.96  550  12   0.07  1,042.50  0       1       Good
Good        1084.32  550  18   0.07  192.5     0.99    0.01    Broken
Broken      1542.92  550  18   0.07  1,303.92  0       1       Good
Good        1024.46  600  18   0.07  172.22    0.98    0.02    Broken
Broken      1253.28  600  18   0.07  580.3     0.03    0.97    Broken
Good        1507.18  450  20   0.08  550.06    0.98    0.02    Broken
Broken      1876.74  450  20   0.08  1,062.02  0       1       Broken
The weights of the hidden factors and output factors were generated from the pre-trained neural networks, and a program was written to process these weights in order to respond to the tool conditions. Therefore, the in-process detection system demonstrated a very short response time to tool conditions. Since tool conditions could be monitored in a real-time situation, a broken tool could be replaced immediately to prevent damage to the machine and mis-machining of the product. However, since the weights were obtained from the pre-trained process, they were fixed when they were put into the detection program. Therefore, the whole system does not have the adaptive ability to feed information back into the system.
In this work, depth of cut was employed as one input factor. However, in actual industrial environments, the surface of work materials is often uneven, implying that the depth of cut set in the computer might differ from that used to cut the workpiece. Under these circumstances, the neural networks might generate a wrong decision and misjudge the tool conditions due to fluctuating depths of cut across machining.
18.3 CONTROL OF COMBUSTION
Besides the economical and environmental advantages, there are several difficulties with burning biofuels and municipal wastes. The combustion of those fuels or fuel mixtures has different properties compared to the conventional fuels (coal, gas, and oil). Biofuels and municipal wastes are very inhomogeneous. Their properties (heat value, moisture content, homogeneity, density, mixability) may vary over a large range. This causes non-steady, agitated combustion conditions, even if a steady fuel feed volume is maintained, leading to an increase in the emission level and variation of the generated heat flow. Those property variations are not predictable or directly measurable; only their effects on the combustion, on the steam generation and on the power production can be observed through the O_2 content of the flue gas.
This topic presents an ANFIS system, which determines the amount of fuel fed to the combustion chamber. Combined with a stoichiometric model, it predicts the flue gas properties, including the O_2 content.
18.3.1 Adaptive Neuro-Fuzzy Inference System
Fuzzy logic controllers (FLC) have played an important role in the design and enhancement of a vast number of applications. The proper selection of the number, the type and the parameters of the fuzzy membership functions and rules is crucial for achieving the desired performance.
Adaptive neuro-fuzzy inference systems are fuzzy Sugeno models put in the framework of adaptive systems to facilitate learning and adaptation. Such a framework makes the FLC more systematic and less reliant on expert knowledge. To present the ANFIS architecture, let us consider two fuzzy rules based on a first order Sugeno model:
Rule 1: if (x is A_1) and (y is B_1), then f_1 = p_1 x + q_1 y + r_1
Rule 2: if (x is A_2) and (y is B_2), then f_2 = p_2 x + q_2 y + r_2
One possible ANFIS architecture to implement these two rules is shown in Fig. 18.4. Note that a circle indicates a fixed node, whereas a square indicates an adaptive node (its parameters are changed during training). In the following presentation O_{L,i} denotes the output of node i in layer L.
Layer 1: All the nodes in this layer are adaptive nodes; the node output is the degree of membership of the input to the fuzzy membership function (MF) represented by the node:

O_{1,i} = \mu_{A_i}(x),   i = 1, 2
O_{1,i} = \mu_{B_{i-2}}(y),   i = 3, 4

A_i and B_i can be any appropriate fuzzy sets in parametric form. For example, if the bell MF is used, then
\mu_{A_i}(x) = \frac{1}{1 + \left| \frac{x - c_i}{a_i} \right|^{2 b_i}},   i = 1, 2     ...(18.11)

where a_i, b_i and c_i are the parameters of the MF.
Layer 2: The nodes in this layer are fixed (not adaptive). They are labelled M to indicate that they play the role of a simple multiplier. The outputs of these nodes are given by:

O_{2,i} = w_i = \mu_{A_i}(x) \mu_{B_i}(y),   i = 1, 2     ...(18.12)

The output of each node in this layer represents the firing strength of the rule.
Layer 3: The nodes in this layer are also fixed nodes. They are labelled N to indicate that they perform a normalization of the firing strengths from the previous layer. The output of each node in this layer is given by:

O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2},   i = 1, 2     ...(18.13)
Layer 4: All the nodes in this layer are adaptive nodes. The output of each node is simply the product of the normalized firing strength and a first order polynomial:

O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),   i = 1, 2     ...(18.14)

where p_i, q_i and r_i are design parameters (consequent parameters, since they deal with the then-part of the fuzzy rule).
Fig. 18.4 ANFIS.
[Figure 18.4: the five-layer ANFIS network for the two rules, with the forward and backward signal paths indicated.]
Layer 5: This layer has only one node, labelled S to indicate that it performs the function of a simple summer. The output of this single node is given by:

O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}     ...(18.15)
The ANFIS architecture is not unique. Some layers can be combined and still produce the same output. In this ANFIS architecture, there are two adaptive layers (Layers 1 and 4). Layer 1 has three modifiable parameters (a_i, b_i and c_i) pertaining to the input MFs [13]. These parameters are called premise parameters. Layer 4 also has three modifiable parameters (p_i, q_i and r_i) pertaining to the first order polynomial. These parameters are called consequent parameters.
18.3.2 Learning Method of ANFIS
The task of the training algorithm for this architecture is to tune all the modifiable parameters to make the ANFIS output match the training data [14]. Note here that a_i, b_i and c_i describe the sigma, slope and the center of the bell MFs, respectively. If these parameters are fixed, the output of the network becomes:

f = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 = \bar{w}_1 f_1 + \bar{w}_2 f_2     ...(18.16)
= \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2)
= (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + (\bar{w}_1) r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + (\bar{w}_2) r_2
This is a linear combination of the modifiable consequent parameters. From this observation, we can divide the parameter set S into two sets:

S = S_1 ⊕ S_2     ...(18.17)

S = set of total parameters
S_1 = set of premise (nonlinear) parameters
S_2 = set of consequent (linear) parameters
⊕ = direct sum
For the forward path (see Fig. 18.4), we can apply the least squares method to identify the consequent parameters. Now, for a given set of values of S_1, we can plug in the training data and obtain a matrix equation:

A \Theta = y     ...(18.18)

where \Theta contains the unknown parameters in S_2. This is a linear least-squares problem, and the solution for \Theta which minimizes ||A\Theta - y||^2 is the least squares estimator:
\Theta^* = (A^T A)^{-1} A^T y     ...(18.19)

We can also use the recursive least squares estimator in the case of on-line training.
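For fixed premise parameters, the consequent parameters of the two-rule system enter linearly, so each training pair contributes one row [w̄_1 x, w̄_1 y, w̄_1, w̄_2 x, w̄_2 y, w̄_2] of A. The sketch below builds A from hypothetical firing strengths and solves (18.18)-(18.19) with a standard least-squares routine.

```python
import numpy as np

def consequent_lse(inputs, normalized_w, targets):
    """Least squares estimate of (p1, q1, r1, p2, q2, r2), eq. (18.19).

    inputs: (N, 2) array of (x, y) pairs.
    normalized_w: (N, 2) array of normalized firing strengths (w1_bar, w2_bar).
    targets: (N,) desired outputs.
    """
    x, y = inputs[:, 0], inputs[:, 1]
    w1, w2 = normalized_w[:, 0], normalized_w[:, 1]
    A = np.column_stack([w1 * x, w1 * y, w1, w2 * x, w2 * y, w2])
    theta, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return theta

# Hypothetical data: 20 samples with made-up firing strengths and targets.
rng = np.random.default_rng(1)
X = rng.uniform(0, 2, size=(20, 2))
w1 = rng.uniform(0.1, 0.9, size=20)
W = np.column_stack([w1, 1.0 - w1])
t = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.01, size=20)
print(consequent_lse(X, W, t))
```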
For the backward path (Fig. 18.4), the error signals propagate backwards. The premise parameters are updated by a descent method [15], through minimizing the overall quadratic cost function

J(\Theta) = \frac{1}{2} \sum_{k=1}^{N} | y(k) - \hat{y}(k, \Theta) |^2     ...(18.20)
in a recursive manner with respect to the premise parameter set S_1. The update of the parameter \hat{\theta}_i^L in the ith node of layer L can be written as

\hat{\theta}_i^L(k) = \hat{\theta}_i^L(k - 1) - \eta \frac{\partial E(k)}{\partial \hat{\theta}_i^L(k)}     ...(18.21)

where \eta is the learning rate and the gradient is

\frac{\partial E}{\partial \hat{\theta}_i^L} = \varepsilon_{L,i} \frac{\partial \hat{z}_{L,i}}{\partial \hat{\theta}_i^L}     ...(18.22)

\hat{z}_{L,i} being the node's output and \varepsilon_{L,i} the back-propagated error signal.
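A compact way to realize this backward pass is to let numerical differentiation supply ∂J/∂θ for the premise parameters and apply the descent step of (18.21). The sketch below uses a simple finite-difference gradient on the two-rule ANFIS output; it is illustrative only and not the derivation used in the text, and all parameter values are hypothetical.

```python
import numpy as np

def anfis_output(x, y, a, b, c, theta):
    """Two-rule ANFIS with bell MFs; a, b, c are 4-vectors for (A1, A2, B1, B2)."""
    mu = 1.0 / (1.0 + np.abs((np.array([x, x, y, y]) - c) / a) ** (2 * b))
    w = np.array([mu[0] * mu[2], mu[1] * mu[3]])
    wn = w / w.sum()
    f = np.array([theta[0] * x + theta[1] * y + theta[2],
                  theta[3] * x + theta[4] * y + theta[5]])
    return float(wn @ f)

def premise_step(data, a, b, c, theta, eta=0.01, eps=1e-5):
    """One gradient-descent update of the premise parameters, as in eq. (18.21)."""
    params = np.concatenate([a, b, c])
    cost = lambda p: 0.5 * sum((t - anfis_output(x, y, p[0:4], p[4:8], p[8:12], theta)) ** 2
                               for x, y, t in data)
    grad = np.array([(cost(params + eps * e) - cost(params - eps * e)) / (2 * eps)
                     for e in np.eye(len(params))])
    params = params - eta * grad
    return params[0:4], params[4:8], params[8:12]

# Hypothetical premise parameters, consequents and training data.
a = np.ones(4); b = 2 * np.ones(4); c = np.array([0.0, 2.0, 0.0, 2.0])
theta = np.array([1.0, 0.5, 0.0, -0.3, 1.0, 2.0])
data = [(0.5, 1.0, 1.2), (1.5, 0.5, 1.0), (1.0, 1.5, 1.4)]
print(premise_step(data, a, b, c, theta))
```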
18.3.3 Model of Combustion
The role of the combustion process is to produce the required heat energy for steam generation at the highest possible combustion efficiency. The efficiency depends on the completeness of burning and on the waste heat taken away in the flue gas by the excess airflow. The higher the burning rate and the smaller the waste heat, the higher the efficiency. However, excess air is required for ensuring complete burning. The O_2 content of the flue gas is directly related to the amount of excess air. The aim of the combustion control, from the efficiency point of view, is to keep the O_2 content around 3-5% [16]. In multi-fuel fired fluidised bed power plants (Fig. 18.5), this is a difficult task due to the inhomogeneous properties of the fuel.
The combustion model, utilising the ANFIS structure based on [20], calculates the combustion power (P_comb) and the flue gas components (C_f), including the oxygen content, from the fuel screw signal Q_Hz, the primary airflow F_p and the secondary airflow F_s (see Fig. 18.6).
The oxygen and combustion power controller (Fig. 18.6) consists of two parallel PI controllers. The error signal from the oxygen content drives the PI controller of the fuel screw signal, while the combustion power is controlled by the primary airflow. The structure of the PI controllers is

U_i(s) = K_{p_i} + \frac{K_{I_i}}{s},   i = 1, 2     ...(18.23)
The reference signals for the fuel screw Q_Hz, the primary airflow F_p and the secondary airflow F_s are calculated by the linearization model as functions of the reference of the combustion power, such as:

Q_Hz = 0.2662 P_comb - 9.7207
F_p = 0.0737 P_comb - 10.912     ...(18.24)
F_s = 0.2662 P_comb - 4.005

Fig. 18.5 Fluidized bed power plant.
18.3.4 Optimization of the PI Controllers Using Genetic Algorithms
Standard genetic or genetic searching algorithms are used for numerical parameter optimization and are based on the principles of evolutionary genetics and the natural selection process [17].
A general genetic algorithm usually contains the following three procedures: selection, crossover and mutation. These procedures are responsible for the 'global search' minimization of the fitness function without testing all the solutions. Selection corresponds to keeping the best members of the population for the next generation, to preserve the individuals with good performance (elite individuals) in the fitness function. Crossover originates new members for the population by a process of mixing genetic information from both parents; depending on the selected parents, the growth of the fitness of the population is faster or
slower. Among many other solutions, the parent selection can be done with the roulette method, by tournament, or at random [18]. Mutation is a process by which a percentage of the genes are selected in a random fashion and changed. In the implemented algorithm a small population of 20 individuals and an elitism of 2 individuals were used, the crossover of one-site splicing is performed, and all the members are subjected to mutation except the elite. The mutation operator is a binary mask generated randomly according to a selected rate that is superposed on the existing binary codification of the population, changing some of the bits [19]. Crossover is performed over half of the population, always including the elite. The individuals are randomly selected with equal opportunity to create the new population. The fitness function is
J = \frac{1}{N} \sum_{1}^{N} (\hat{y}_{comb} - y_{comb})^2 + k \frac{1}{N} \sum_{1}^{N} (\hat{y}_{O_2} - y_{O_2})^2     ...(18.25)
where k is a weighting factor. In our case k = 2, to emphasize the importance of the oxygen content, which is directly related to the flue gas emissions.
The performance of the controller based on the ANFIS model is compared to the performance of the real process. The reference signal for the combustion power is taken from the measurement data. The simulation shows that by applying the new controller structure together with the ANFIS model, a much smaller deviation in the oxygen content can be achieved while satisfying the same demand for combustion power.
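The GA described above only has to search a handful of PI gains against the fitness (18.25). A compressed sketch of that loop is given below; the gain ranges, the mutation rate and the plant simulation are placeholders, since the text does not specify them beyond the population size of 20, the elitism of 2 and the weighting k = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2.0  # weighting factor of the oxygen term, eq. (18.25)

def simulate_plant(gains):
    """Placeholder for the closed-loop simulation; returns (comb, O2) mean square errors."""
    kp1, ki1, kp2, ki2 = gains
    return (kp1 - 0.8) ** 2 + (ki1 - 0.1) ** 2, (kp2 - 0.4) ** 2 + (ki2 - 0.05) ** 2

def fitness(gains):
    mse_comb, mse_o2 = simulate_plant(gains)
    return mse_comb + K * mse_o2

pop = rng.uniform(0.0, 1.0, size=(20, 4))           # population of 20 gain vectors
for generation in range(50):
    pop = pop[np.argsort([fitness(p) for p in pop])]
    elite = pop[:2].copy()                           # elitism of 2 individuals
    parents = pop[:10]                               # crossover over half the population
    children = []
    while len(children) < 18:
        a, b = parents[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, 4)                     # one-site splicing
        child = np.concatenate([a[:cut], b[cut:]])
        mask = rng.random(4) < 0.1                   # mutation rate (assumed)
        child = np.where(mask, rng.uniform(0.0, 1.0, size=4), child)
        children.append(child)
    pop = np.vstack([elite, children])
pop = pop[np.argsort([fitness(p) for p in pop])]
print("best gains:", pop[0], "fitness:", fitness(pop[0]))
```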
Fig. 18.6 Control system of combustion process.
[Figure 18.6: block diagram of the control system; the combustion power reference feeds the linearization model and the oxygen and combustion power controllers, whose outputs Q_Hz, F_p and F_s drive the ANFIS fuel flow model and the stoichiometric combustion and flue gas model, producing P_comb and C_f.]
Fig. 18.7 Fitness function by the generation of the GA
Fig. 18.8 PI combustion power controller optimization with GA.
Fig. 18.9 PI oxygen content controller optimization.
Fig. 18.10 Combustion power response: comparison of the achievement in real process and in the simulated control
system.
QUESTION BANK.
1. Explain the application of a hybrid fuzzy neural network for the tool breakage monitoring of end milling.
2. Explain the application of ANFIS to control the combustion process.
REFERENCES.
1. Y. Altintas, I. Yellowley, and J. Tlusty, The detection of tool breakage in milling operations, Engineering for Industry, Vol. 110, pp. 271-277, 1988.
2. J.C. Chen, A fuzzy-nets tool-breakage detection system for end-milling operations, International Journal of Advanced Manufacturing Technology, Vol. 12, pp. 153-164, 1996.
3. J.C. Chen and J.T. Black, A fuzzy-nets in-process (FNIP) system for tool-breakage monitoring in end-milling operations, International Journal of Machining, Tools Manufacturing, Vol. 37, No. 6, pp. 783-800, 1997.
4. J.S. Jang, C.T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, New Jersey, 1997.
5. K. Jemielniak, Detection of cutting edge breakage in turning, Annals of the CIRP, Vol. 41, No. 1, pp. 97-100, 1992.
6. M. Lan and Y. Naerheim, In-process detection of tool breakage in milling, Engineering for Industry, Vol. 108, pp. 191-197, 1986.
Fig. 18.11 Oxygen content response: comparison of the achievement in real process and in the simulated control
system.
7. T.J. Ko, D.W. Cho, and M.Y. Jung, On-line monitoring of tool breakage in face milling using a self-organized neural network, Journal of Manufacturing Systems, Vol. 14, No. 2, pp. 80-90, 1990.
8. I.N. Tansel and C. McLaughlin, Detection of tool breakage in milling operation - I. The time series analysis approach, International Journal of Machining, Tools Manufacturing, Vol. 33, No. 4, pp. 531-544, 1993.
9. I.N. Tansel and C. McLaughlin, Detection of tool breakage in milling operation - II. The neural network approach, International Journal of Machining, Tools Manufacturing, Vol. 33, No. 4, pp. 545-588, 1993.
10. Y.S. Tarng and B.Y. Lee, A sensor for the detection of tool breakage in NC milling, Journal of Materials Processing Technology, Vol. 36, pp. 259-272, 1993.
11. D. Zhang, Y. Han, and D. Chen, On-line detection of tool breakage using telemetering of cutting forces in milling, International Journal of Machining, Tools Manufacturing, Vol. 35, No. 1, pp. 19-27, 1995.
12. R. Jang, C. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computation, Prentice Hall, NJ, 1997.
13. F.A. Alturki and A.B. Abdennour, Neuro-fuzzy control of a steam boiler turbine unit, Proceedings of the 1999 IEEE International Conference on Control Applications, Hawaii, USA, pp. 1050-1055, 1999.
14. E. Ikonen and K. Najim, Fuzzy neural networks and application to the FBC process, IEE Proc. - Control Theory Applications, Vol. 143, pp. 259-269, May 1996.
15. S.H. Kim, Y.H. Kim, K.B. Sim, and H.T. Jeon, On developing an adaptive neural-fuzzy control system, Proc. IEEE/RSJ Conference on Intelligent Robots and Systems, Yokohama, Japan, pp. 950-957, July 1993.
16. K. Leppkoski and J. Kovacs, Hybrid model of oxygen content in flue gas, Proc. IASTED International Conference on Applied Modelling and Simulation, Cambridge, MA, USA, pp. 341-346, Nov. 2002.
17. J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, 1992.
18. J.S. Yang and M.L. West, A case study of PID controller tuning by genetic algorithm, Proceedings of the IASTED International Conference on Modelling and Control, Innsbruck, 2001.
19. J. Vieria and A. Mota, Water gas heater nonlinear physical model: optimization with genetic algorithms, Proceedings of the IASTED International Conference on Modelling, Identification and Control, Grindelwald, Switzerland, 2004.
20. Z. Himer, V. Wertz, J. Kovacs, and U. Kortela, Neuro-fuzzy model of flue gas oxygen content, Proceedings of the IASTED International Conference on Modeling, Identification and Control, Grindelwald, Switzerland, 2004.
Index
A
A back propagation neural network 235
α-Cut 11
Activation function 124
Adaline 125, 129, 133
Adaptive Critic 194
Adaptive Critic Element 192
Adaptive linear element 133
Adaptive Resonance Theory 182
Andlike 62
Andness 62
ANFIS 229
ANFIS Structure 231
Anti-Reflexivity 20
Anti-Symmetricity 20
Applications of fuzzy logic 95
Approximate reasoning 41
Arithmetic mean 59
Arithmetical mean 58
ART 1 169, 183, 185
ART 2 211
Artificial network 122
Artificial neural 3
Artificial neural network 3, 123
ASE-ACE combination 192
Associative learning 125
Associative Memory 163
Associative Search 193
Associative Search Element 192
Associativity 54, 55
Asymmetric divergence 166
Auto-associator network 161
Auxiliary Hybrid Systems 218
Auxiliary hybrids 217
Average error 151
Averaging operators 58
Axon 3, 121
B
Back-propagation 3, 139, 142
Barto network 192
Barto's approach 192
Basic property 45
Bellman equations 196
Bias 127
Binary fuzzy relation 21
Biological neural network 121, 122
Boltzmann machines 165
Boolean logic 3
C
Classical modus ponens 45
Cart-Pole system 194
Cartesian product 24
Cells 122
Center-of-gravity method 74
Center-of-area 87
Centroid defuzzification 90, 9
Characteristic of fuzzy systems 9
Classical modus ponens 44
Classical modus tollens 45
Classical N-array relation 19
Clustering 169, 170
Committee 223
Committee of networks 223
Commutative 61
Commutativity 58
Compensatory 58
Competitive learning 170
Complement 17
Component extractor 181
Compositional rule 44
Conjugate gradient minimization 148
Conjunction rule 43
Contrast enhancement 187
Control of combustion 243
Control room temperature 100
Controller network 191
Convergence theorem 131
Convex fuzzy set 11
Cost function 172
Counter propagation 174
Critic 190
D
Defuzzification 67, 103
Defuzzifier 68, 86
Delta 131
Delta rule 134
Dendrites 3, 121
Dimensionality reduction 169
Direction set minimization method 148
Discrete membership function 10
Disjunction Rule 43
Dot product 170
Dubois and Prade 55
Dynamic Programming 195
Dynamics 201
E
Eigenvectors 181
Elman Network 159
Embedded Hybrid Systems 218
Embedded hybrids 217
Empty Fuzzy Set 15
End-effector positioning 201
End milling 233
End milling cutting process 234
Entailment Rule 43
Entropy 63
Equivalence 20
Error back-propagation 222
Error function 173
Euclidean distance 171
Evaluation network 194
Exclusive-or 135
Expressive power 151
Extremal conditions 58
F
Feature extraction 169
Feed-forward network 139, 150
Feed-forward networks 125
Feedback control system 81
First-of-Maxima 87
Flexible manufacturing systems 233
FNN architecture 224
Follow-the-leader clustering algorithm 185
Forward Kinematics 200
Frank 55
Fuzzification 8
Fuzzy Approach 116
Fuzzy control 8
Fuzzy implication operator 32
Fuzzy implications 30
Fuzzy logic 1, 1, 6, 7
Fuzzy logic controller 82, 84
Fuzzy Logic Controllers 243
Fuzzy logic is 1, 1
Fuzzy neuron 220
Fuzzy Number 12
Fuzzy Point 15
Fuzzy relations 19, 21
Fuzzy rule-base system 71
Fuzzy set 9
Fuzzy singleton 63
Fuzzy systems 2
G
Gaussian membership functions 89
Generalised delta rule 140
Generalized Modus Ponens 44
Geometric mean 59
Godel implication 32
Graded response 164
Grading of apples 104
Gravity 87
H
Hamacher 55, 56
Harmonic mean 59
Hebbian learning 126
Hebbian Rule 180
Height defuzzification 88
Hidden Units 123, 153
Hopfield network 161
Human brain 3, 121
Hybrid fuzzy neural network 221
Hybrid neural network 220
Hybrid systems 217
I
Idempotency 58, 61
Inference mechanisms 72
Input units 123
Interpolation 42
Intersection 16, 21
Inverse Kinematics 200
J
Jordan Network 158
K
Kleene-Dienes implication 32
Kohonen network 177
Kullback information 166
L
Larsen inference Mechanism 77
Larsen system 66
Law of the excluded middle 2
Laws of Thought 2
Learning 127
Learning Rate 144
Learning Samples 152, 190
Least mean square 133, 135
Linear discriminant function 130
Linear threshold neural network 130
Linguistic variable 33, 34
Linguistic variable truth 35
LMS 131
LMS rule 131
Local Minima 148
Long-term memory 183
M
Mamdani inference Mechanism 73
Mamdani system 66
Mamdani's implication operator 32, 63
Material implication 29
Mathematical neuron network 122
Max-Criterion 88
Maximum 56
Measure of dispersion 63
Median 59
Membership function 10
Middle-of-Maxima 87
Milling 210
Minimum 54
Modifiers 33
Momentum 144
Monotonicity 54, 55, 58, 61
Multi-layer network 140
Multi-layer perceptrons 137
Multi-input-multi-output 82
N
Negation rule 44
Network paralysis 148
Neural networks 2, 121
Neuro-fuzzy systems 8
Neuro-fuzzy-genetic systems 8
Neurofuzzy network 222
Neurons 3, 122, 164
Non-fuzzy approach 112
Normal fuzzy set 11
Normalization 186
Number of layers 127
O
Offset 127
One identity 54
Ordered weighted averaging 60
Original model 185
Orlike operator 62
Orness 62
Output units 123
P
Paradigms of learning 125
Partial order 20
Perceptron 2, 125, 129
Perceptron learning rule 131
Perceptron learning rule 131
Perceptrons 2
Precisiated natural language 8
Precision 7
Principal component analysis 221
Principle of incompatibility 101
Principle of optimality 195
Probabilistic 56
Processing Units 123
Product 55
Product fuzzy conjunction 89
Product fuzzy implication 89
Projection 23
Projection Rule 44
Q
Q-learning 196
Quasi fuzzy number 12
R
Recurrent networks 125, 157
Reflexivity 20
Regular fuzzy neural network 220
Reinforcement learning 192
Reinforcement learning scheme 190
Representation 127
Road accidents 96
Robot arm dynamics 207
Robot control 200
S
Self-organization 125
Self-organizing networks 169
Semi-linear 124
Sequential hybrids 217
Sgn function 124
Shadow of fuzzy relation 24
Short-term memory 183
Sigmoid 124
Significance 7
Simplified fuzzy Reasoning 77
Single layer feed-forward network 129
Single layer network 129, 130, 134
Singleton fuzzifier 89
Soft computing 8
Standard Strict 32
Stochastic function 193
Strong 56
Subset 46
Subsethood 14
Sugeno Inference Mechanism 75
Summed squared error 135
Sup-Min Composition 26
Superset 47
Supervised learning 125
Support 11
Symmetricity 20, 54
Symmetry 55
T
T 54, 56
t-conorm-based union 57
t-norm-based intersection 57
Taylor series 148
Test error rate 151
The linguistic variable truth 35
Threshold 127
Threshold (sgn) 130
Tool breakage 233
Total error 135
Total indeterminance 46
Total order 20
Traffic accidents and traffic safety 96
Trajectory generation 201
Transitivity 20
Translation rules 43
Trapezoidal fuzzy number 14
Triangular conorm 55
Triangular Fuzzy Number 13
Triangular norm 54
Tsukamoto inference mechanism 73
Two layer feed-forward network 139
Two-input-single-output 82
U
Union 16, 22
Universal approximation theorem 142
Universal approximators 91
Universal fuzzy set 15
Unmanned flexible manufacturing system 233
Unsupervised learning 125
V
Vector quantisation 169, 174
W
Weak 55
Winner Selection 170
Y
Yager 55, 56
Z
Zero identity 55