

Page i

Expert Systems for Engineers


James N. Siddall Faculty of Engineering McMaster University Hamilton, Ontario, Canada

Page ii

Library of Congress Cataloging-in-Publication Data

Siddall, James N.
Expert Systems for Engineers / James N. Siddall.
p. cm.
ISBN 0-8247-8360-3 (alk. paper)
1. Computer-aided engineering. 2. Expert systems (Computer science) I. Title.
TA345.S565 1990
620'.0042'0285633--dc20     89-71509
CIP

This book is printed on acid-free paper.

Copyright 1990 by MARCEL DEKKER, INC. All Rights Reserved

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

MARCEL DEKKER, INC.
270 Madison Avenue, New York, New York 10016

Current printing (last digit):
10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA

Page iii

Preface
This book is based to a considerable extent on the thesis that engineers can and should write their own expert systems. Special "knowledge engineers" are not required. The software packages that are commercially available for assisting in the writing of expert systems, commonly called "shells", are unnecessary, and even undesirable. And special languages, such as LISP and PROLOG, which are widely represented as highly desirable if not essential, are also unnecessary, and may even be undesirable. Most engineers are at home with a high level language, such as FORTRAN, BASIC, PASCAL, or C. All of these do an excellent job of codifying expert systems. Also, the engineer is an expert on the engineering method, as well as being a specialist, or near specialist, in some phase of engineering. And thirdly, the engineer has an almost unique training and experience in intuitive decision making, which is the whole basis of expert systems. Finally, engineers are highly creative, and can cope well with the creativity requirements of designing expert systems. This expertise, combined with the rigorous training in logical thinking that all engineers possess, makes them uniquely well qualified to

Page iv

develop expert systems. And incidentally, to use them intelligently. This book is also an attempt to remove the cloud of mystery that surrounds the subject of expert systems. Although the subject seems to be cursed by a compulsion among authors to present it as highly advanced and arcane, it is really rather simple and straightforward. Also, many books on artificial intelligence present a broad conceptual picture of the subject, with a tendency to gloss over practical details. The emphasis here is on down-to-earth practical details. The book includes the use of uncertainty methods and Boolean algebra, with which not all engineers may be familiar, but useful expert systems can be written even without these. And all of the necessary introductory theory is included. The subject of data structures is an important basis for creating expert systems, and is the one that engineers are most likely to be unfamiliar with. Some introductory theory is given in the Appendix, but large systems depend heavily on data structures, and the designer of such systems may find it necessary to extend his or her knowledge in this field. Expert systems is a very new subject in engineering, and not yet in widespread use; but it has great potential for exploiting the power of the computer in engineering practice. Expertise, until recently only available from live human experts, can now, to a circumscribed extent, be

Page v

widely available to technologists at all levels, in many different applications. It is one of the new subjects in engineering that is only practicable to use when computers are available. Its most important contribution is that it makes possible day-to-day use, by all engineers, of complex engineering knowledge - knowledge and expertise that otherwise has lain unused, and even often unknown. The chapters containing the more advanced material are essentially independent of each other. These are Chapter 9, "Expert Systems Incorporating Uncertainty"; Chapter 10, "Machine Learning Expert Systems"; and Chapter 11, "Boolean Algebra". The book contains a number of algorithms in pseudo code that should provide a convenient basis for preparing software in the language of the reader's choice. However, no claim is made that they are optimum, and suggestions to the author for improving them would be welcomed. JAMES N. SIDDALL

Page vii

Contents
1. Introduction to Applied Artificial Intelligence   1
   1.1 Introduction   3
   1.2 Languages in Artificial Intelligence   12
   1.3 Using Artificial Intelligence in Engineering   17
   1.4 Applications of Expert Systems   26
2. Introduction to Expert Systems   35
3. The General Structure of Expert Systems   41
4. Phases of Developing an Expert System   55
5. Knowledge Base   63
   5.1 Introduction   65
   5.2 Acquiring the Rules   66
6. Inputs, Intermediate Conclusions, and Outputs   91
   6.1 Introduction   93
   6.2 Inputs   93
   6.3 Outputs   94
   6.4 Intermediate Conclusions   95

Page viii

7. Control Structures   97
   7.1 Introduction   99
   7.2 Sequential Structure with Iteration   101
   7.3 Exhaustive Search   101
   7.4 Forward and Backward Chaining   105
   7.5 Structuring   106
   7.6 Pruning   107
   7.7 Decomposition   108
   7.8 Heuristic Search   109
   7.9 Use of the User's Judgement   110
   7.10 General Production Systems   110
   7.11 Hard Wired Systems and Parallel Processing   112
   7.12 Selection of a Control Structure   113
8. Data Structures   119
   8.1 Data Structured Control   121
   8.2 Frames   130
   8.3 Blackboards   136
   8.4 Interfacing with Commercial Data Management Software   137
9. Expert Systems Incorporating Uncertainty   141
   9.1 Introduction   143
   9.2 Probability Concepts   145
   9.3 Use of Subjective Probability and Probability Laws in Expert Systems   155
   9.4 Monte Carlo Simulation   164

Page ix

   9.5 Bayes' Theorem in Expert Systems   168
   9.6 MYCIN Method   175
   9.7 Control Structures with Uncertainty   178
   9.8 Discussion   181
10. Machine Learning Expert Systems   187
   10.1 Introduction   189
   10.2 The Precedent Rule Method   190
   10.3 Generation of Rules Using Entropy   195
   10.4 Multiple Valued Attributes   212
   10.5 Example   213
   10.6 Discussion   216
11. Boolean Algebra   221
   11.1 Introduction   223
   11.2 Definitions and Postulates   225
   11.3 Manipulation of Boolean Functions   230
   11.4 Simplification of Boolean Functions   235
   11.5 Multiple Functions   250
   11.6 Conversion of Expert Systems to Boolean Form   252
   11.7 Uncertainty with Boolean Representation   261
12. Design Systems   267
   12.1 The Use of Expert Systems in Design   269
   12.2 The Special Nature of Design Systems   269
   12.3 Some Typical Rules for the Bearing System   276
   12.4 An Extension of the Bearing Example   285
   12.5 Using Frames with the Bearing Selection System   295

Page x

13. Expert System Development   297
   13.1 Small Systems   299
   13.2 Developing Larger Systems   301
   13.3 Testing and Updating   311
Appendix - Data Structures   313
   A.1 Stacks and Queues   315
   A.2 List Processing   320
   A.3 Search Methods   324
   A.4 Sorting   327
Index   333

Page 1

1 Introduction to Applied Artificial Intelligence

Page 3

1.1 Introduction The purpose of this chapter is to provide a general overview of applied artificial intelligence, of which expert systems is only one topic, although the most widely used. It is difficult to get a general insight into the subject. In investigating the subject of artificial intelligence, one soon learns that it is many things to many people. It also has a somewhat different meaning as a fundamental research subject than as an applied research subject, or as an applied technological method. For computer scientists in fundamental research, it appears to represent an attempt to set up algorithmic principles for simulating thinking processes of the brain, to develop some sort of general problem solving or human-like intellectual behaviour. In neural sciences it is the study of the neural brain processes, with a view to copying them or simulating them in hardware. Philosophers are also interested in artificial intelligence, in the hope of gaining a better insight into the nature of human thought (Boden 1977, McCorduck 1979). Psychologists are concerned with similar goals. It seems to be generally conceded that not too much progress has been made in fundamental research into artificial intelligence over the past 20 to 30 years that it has existed as a science. This is not to suggest that it should be abandoned as a worthwhile research subject. The goals are of vital importance. On the other hand, rather

Page 4

strong claims are now being made for the progress and utility of applied research in artificial intelligence, primarily by computer scientists in the field. Their goals, and the nature of their research, appear to be more modest than those in pure research. They are developing algorithms and programs that they believe represent, in some sense, specific rather than general human thinking processes. Engineers have invaded the field of applied research, as they do in any field when they find that, in their particular application, not enough is known to proceed directly to the desired result. It may be simply a minor adaptation of existing theory, or it may be a major research program. It is extremely difficult to set bounds on the area of artificial intelligence. After all, almost anything a computer does for us simulates human thinking processes. It surely takes intelligence to do complex numerical calculations, or complex sorting of data. Computer scientists tend to define artificial intelligence by what it has done, rather than by any clear-cut principles or concepts. The usual areas of application that are cited are as follows (Nilsson 1980, Hayes-Roth et al 1983, Winston and Brown 1979, Winston 1977, Charniak and McDermott 1985, Schutzer 1987). 1 Interpretation of Natural Language Computer systems have been developed that, in a limited way, interpret the

Page 5

meaning of natural language, either for translation to another language, or to provide instructions for a computer or control process. 2 Expert Systems These have been called automatic consulting systems, because they simulate the role of an expert in solving some problem, using an information database provided by a real expert, plus rules for interpreting the data in terms of a particular problem that is within the scope of the package. It is this application of artificial intelligence that appears to have the most promise for the partial automation of the design process, and the solving of other engineering problems requiring expertise, such as fault diagnosis. It also seems to have had the most practical success of any applications of artificial intelligence, and has great promise as a tool in engineering work of all kinds. 3 Theorem Proving and Mathematical Analysis Computers can be programmed to apply logic rules automatically to check the validity of a mathematical theorem, and software has been developed to do analytical integration of complex algebraic expressions. 4 Control of Robots This appears to imply more than just programming the motion of the robot's linkages; and seems to require the inclusion of software for planning the motions of the robot, and perhaps requiring the control software to make dynamic decisions while the robot is working. A control engineer might have some difficulty un-

Page 6

derstanding how this is different from conventional process control by computers, other than the fact that the linkage somewhat resembles human morphology. 5 Perception or Pattern Recognition This topic is sometimes called machine vision. Video hardware is used to feed data to a computer in the form of pixel brightness levels or colors. Pixels are micro-sized photocells, arranged in an array at the camera's focal plane, and thus replacing conventional film in a camera. This gives the computer image data that can be processed, and it can be programmed to interpret the data in a way that corresponds to some extent to the way humans interpret visual input. This could be a simple outline of a part which is analyzed to accurately determine if it matches a prescribed outline within tolerances; or it could be a system that analyzes a jumble of unoriented parts and directs a robot to orient its hand correctly so as to pick up one of the parts. Or, on an even more complex level, it may be a system that examines a scene, and selects from the pattern an identifying classification from its reference database, such as an obstacle for an autonomous vehicle, controlled by artificial intelligence. An artificial intelligence specialist might be unhappy about calling the first example artificial intelligence, but it is difficult to see any conceptual difference between the examples. 6 Combinatorial Examples This category includes games and puzzles. Chess and checker playing computers have

Page 7

reached quite a high level of expertise (McCorduck 1979, Fleck and Silverman 1984). It also includes assignment problems (Nilsson 1980) such as the travelling salesman problem in which it is desired to minimize the travel path of a salesman who has to visit a specified set of cities, in any order. This topic seems to have a great deal in common with the mathematical topic of combinatorial optimization (Lawler 1976) or integer programming (Greenberg 1971), which certainly requires use of the computer. However there is little cross reference between them, and the difference seems to be in the terminology used, or even the computer language used. 7 Automatic Programming Limited success has been achieved in translating natural language algorithms into high level computer languages. Computer program design languages, or pseudo codes (Horowitz and Sahni 1976 and Van Tassel 1978), are a formalized form of describing an algorithm, and, if in sufficient detail, can almost directly be translated into a high level language. However, workers in artificial intelligence seem to have in mind a more interactive approach, where the user presents to the computer a rather brief or vague statement of the algorithm, and the computer worms out of him or her more details of what is really required. 8 Machine Learning In this topic the computer learns from a set of examples the appropriate way to make decisions, in a limited problem region (Charniak and McDermott

Page 8

1985). Some expert systems have successfully used machine learning, for example Wu and Siddall (1988). These classifications of the types of things that applied artificial intelligence workers are doing, or hope to do, give some insight into what it is and what it might do for engineering. However, one would like to have available some general principles of how artificial intelligence is used. Unfortunately, such principles do not seem to be available, as confirmed by the following quotation from Hayes-Roth et al (1983), page 6.
These and other accomplishments indicate that the field of expert systems is maturing rapidly. The scientific and technical bases that support this field, however, have achieved only limited development. Each new application requires creative and challenging work, although some principles and systematizations have emerged. At this point expert systems is a highly experimental field with little in the way of general theory. Nevertheless, core problems have surfaced and numerous tools and techniques now exist that transfer from one application to the next.

The authors were discussing expert systems, rather than artificial intelligence generally, but the statement would seem to be rather generally applicable, and still currently true. In reviewing all of the above, an experienced application software developer might have some difficulty in discerning how artificial intelligence is different from

Page 9

other applications work. Certainly the examples illustrate great ingenuity in software design, but this is richly demonstrated in many other kinds of application software. And the fact that programs in artificial intelligence sometimes do better than humans is not an unusual achievement for computers. It is also somewhat difficult to accept that programs in artificial intelligence are more "intelligent" than other kinds of applications. It would seem desirable not to be too concerned with the name "artificial intelligence", which was rather an unfortunate choice. Intelligence is extremely difficult to define, and the term conjures up much more than it should for real life, practical, and current problems. There are, however, special criteria that distinguish the field, and it is important to identify them, so that for a given problem type, in our case design automation and other engineering applications, we can decide if the methods of artificial intelligence are applicable. First, artificial intelligence applications work mainly with symbols rather than numbers. This is not a sufficient condition to distinguish them; data management systems certainly work a great deal with symbols. A data file containing mailing lists, for example, can be manipulated in all kinds of elegant ways. However it is an important characteristic and has influenced the kinds of languages used for artificial intelligence. Symbols are used to represent knowledge, and Hayes-Roth (1983) has

Page 10

emphasized that expert systems use knowledge more than formal reasoning methods. And he really means expert knowledge; and this includes not only knowledge that can be found in books, but also privileged knowledge known only to highly trained and experienced experts in a field. Such knowledge is primarily intuitive. In a given situation an expert makes an intuitive decision based on judgement and experience and the context of the problem, and the decision has a high probability of being correct. Engineering offices are full of expert designers whose expertise is often lost when they leave or retire. Computerized expert systems should be a gold mine in this field. These expert designers cannot enumerate all of the knowledge upon which a decision is based; much of it is buried in the unconscious mind. A central problem of expert systems would seem to be how to utilize such unconscious knowledge. One of the difficulties in coming to grips with the literature of artificial intelligence (AI), and in particular expert systems, is the terminology, which tends to be unnecessarily arcane. Hayes-Roth goes on to say on page 5,
In contrast to traditional data processing systems, AI applications generally involve several distinguishing features, which include symbolic representation, symbolic inference, and heuristic search. In fact, each of these corresponds to a well studied core topic within AI, and a simple AI task often yields to one of the formal approaches developed for these core problems.

Page 11

These topics are not unfamiliar to many engineers, but perhaps not by the same names. Symbolic representation simply means representing things or procedures by symbolic names, usually natural language ones. Symbolic inference is the IF-THEN algorithmic structure, widely used in procedural high level languages such as FORTRAN, BASIC, PASCAL and C. Simple examples are
IF a gear (must be compact AND have long life AND be silent)
THEN Use a case-hardened alloy steel

IF (the process temperature exceeds 1000 degrees AND
    the temperature is increasing at a rate greater than 10 degrees per sec)
THEN Shut down the process
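To underline that such symbolic inference is just the IF-THEN construct familiar from procedural languages, the second rule above might be coded in C roughly as follows. This is only a minimal sketch; the function name, variable names, threshold constants, and sensor values are invented for the illustration.

#include <stdio.h>
#include <stdbool.h>

/* Illustrative thresholds taken from the second rule above. */
#define TEMP_LIMIT 1000.0   /* degrees */
#define RATE_LIMIT   10.0   /* degrees per second */

/* Returns true when the rule "fires", i.e. the process should be shut down. */
static bool shutdown_rule(double temperature, double rate_of_rise)
{
    return (temperature > TEMP_LIMIT) && (rate_of_rise > RATE_LIMIT);
}

int main(void)
{
    double temperature = 1050.0;   /* hypothetical sensor readings */
    double rate_of_rise = 12.5;

    if (shutdown_rule(temperature, rate_of_rise))
        printf("Shut down the process\n");
    else
        printf("Continue operation\n");
    return 0;
}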

Some authors also call this abduction. Symbolic inference also includes what is sometimes called predicate calculus by artificial intelligence writers. It is essentially the same as Boolean event logic1, used in combinatorial mathematics, probability theory, digital circuit analysis, and fault trees in reliability theory. The only difference is in the terminology and notation, where the list structure of the language LISP is used, as in the following examples.
(NOT (AND A B)) (OR C D)

1 For detailed information on Boolean algebra, see Chapter 11.

Page 12

The term predicate appears to be simply an event, or a fact, or a characteristic, that can either be true or false. Thus in LISP notation, the fact that the component x is a gear could be represented by the predicate
(GEAR X)

Heuristic search is a well known procedure in optimization, widely used in engineering, management science, military science, and other fields. It is described in detail in Chapter 7. 1.2 Languages in Artificial Intelligence The most commonly used languages in artificial intelligence are LISP and PROLOG. LISP is the major contender, and special computer systems of hardware and software have been marketed for using LISP in software development in artificial intelligence applications. LISP is one of the oldest high level languages, having been created in 1960 (Wegner 1976). In general LISP does not evaluate expressions as a sequence of assignment statements, such as the following pattern.
x, a, and b have known values in the computer memory
y = ax + b

Page 13

z = y² + sin(y)
output z

Instead the computer is presented with a function, which it evaluates, and then prints out the result, in the following pattern.
x, a, and b have known values in the computer memory
(f2 (f1 x a b))

where x, a, and b are arguments of an operator or function f1, as defined in the following example.
f1 = (PLUS (TIMES a x) b)

and f1 is an argument of a function f2 defined by


f2 = (PLUS (EXPT f1 2) (SIN f1))

Note that the function name precedes the arguments, each is separated by a space, and the whole is surrounded by parentheses. The successive evaluations are built up by nesting, rather than by sequential statements. Note also that the trigonometric and arithmetic operators work in the same manner as nested functions. In most applications of LISP, algebraic type expressions like the above are not too commonly used, and the

Page 14

functions are mostly operators that manipulate symbols. Symbols are used to represent not only physical quantities that have a numerical value, but also physical things and concepts. The following is a simple example.
(SETQ TOOLS (LIST 'HAMMER 'SCREWDRIVER 'SAW 'WRENCH 'AWL 'FILE))
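For comparison, and in keeping with the book's thesis that ordinary procedural languages handle symbols perfectly well, the same list of symbols might be held in C as a simple array of strings. This is a minimal sketch; the variable names are arbitrary.

#include <stdio.h>

int main(void)
{
    /* The same symbols held as an array of character strings. */
    const char *tools[] = { "HAMMER", "SCREWDRIVER", "SAW",
                            "WRENCH", "AWL", "FILE" };
    int n = sizeof(tools) / sizeof(tools[0]);

    for (int i = 0; i < n; i++)
        printf("%s\n", tools[i]);
    return 0;
}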

PROLOG, or extensions of it, is the language of choice of the Japanese in their Fifth Generation project (Feigenbaum and McCorduck 1983). It is more widely used in Britain and Europe, but seems to be gaining popularity in North America. It is really a higher level language than the others referred to here, since it is specially designed for logic expressions (IF/THEN), and contains a special procedure for solving systems of logic expressions. FORTH has been used for at least one expert system (Johnson and Bonissone 1983). And PASCAL and C are gaining popularity. However the overwhelming leader is LISP, and for many workers LISP and the applied subject of artificial intelligence are very closely interwoven; indeed they sometimes seem almost synonymous; if you are programming in LISP you are applying artificial intelligence, and vice versa. The language of LISP was created for artificial intelligence work (Wegner 1976). Thus one is inclined to the impression, if not to the conviction, that applied artificial intelligence is simply software created using a

Page 15

symbolic language such as LISP, and no other criteria are needed to define it. These languages are symbolically oriented, in that they are said to be designed particularly to work with word symbols rather than numerical calculations and analytical algorithms. This is quoted as a basis for using them as languages of preference in artificial intelligence work. This is somewhat puzzling since procedural languages have been overwhelmingly used in data management software, which is surely symbolically oriented. Another language feature said to be of vital importance in artificial intelligence work is recursion. It permits a function to call itself, and is a feature of LISP, as in the following example.
Y = FUNC(A, FUNC(E, F, G), B)
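For readers more familiar with procedural languages, recursion is simply a function that calls itself. The following minimal C sketch uses the factorial as a standard illustration; it is not an example from the book.

#include <stdio.h>

/* A recursive function: factorial(n) calls itself with a smaller argument. */
static long factorial(int n)
{
    if (n <= 1)
        return 1;                 /* base case ends the recursion */
    return n * factorial(n - 1);  /* the function calls itself */
}

int main(void)
{
    printf("5! = %ld\n", factorial(5));
    return 0;
}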

Recursion is also a standard procedure in PASCAL. Wagner (1982) shows that, although recursion is not a legal feature of FORTRAN 77, it can be done quite easily by indirect means. A third feature of LISP that is commonly quoted as important is the facility for dynamic memory allocation, by which memory space can be allocated automatically by the computer as required, rather than in advance by DIMENSION type statements. ALGOL 60, APL and PASCAL have it, and other languages such as FORTRAN 77 can incorporate it by

Page 16

including data management in the coding, using linked lists (Horowitz and Sahni 1976). As languages are being evolved these distinctions appear to be disappearing. It has been proposed, for example, that the next version of standard FORTRAN will have recursion and dynamic memory allocation. It is the author's belief that there are no important reasons for not using any high level language in artificial intelligence work, and the procedural languages can easily duplicate the coding of LISP or PROLOG. The coding may not be as elegant, or as compact, or as efficient. This does not seem to have been demonstrated as necessarily true, but in any event these may not be primary concerns. Other considerations may be of equal or more importance - such as portability, running time, convenience of access, ease of reading, ease of maintaining and updating, flexibility, interfacing with other languages, familiarity of the programmer with the language, and time to code. For example, procedural languages using recursion are said to have a longer running time than alternate procedures, and are more difficult to debug. The more widespread use of artificial intelligence applications in engineering may in fact be encouraged if this tight bond with special languages is broken, and engineers realize that artificial intelligence applications are just another type of algorithm, for which the more familiar general purpose languages can quite well be used.
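As an illustration of the point above about obtaining dynamic memory allocation in a procedural language by managing linked lists in the coding, the following minimal C sketch allocates each list node only as it is needed. The structure and names are invented for the illustration.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node of a linked list of symbols; nodes are created on demand. */
struct node {
    char symbol[20];
    struct node *next;
};

/* Allocate a new node at run time and link it onto the front of the list. */
static struct node *push(struct node *head, const char *symbol)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;                        /* allocation failed; leave list unchanged */
    strncpy(n->symbol, symbol, sizeof n->symbol - 1);
    n->symbol[sizeof n->symbol - 1] = '\0';
    n->next = head;
    return n;
}

int main(void)
{
    struct node *list = NULL;
    list = push(list, "HAMMER");
    list = push(list, "SAW");

    for (struct node *p = list; p != NULL; p = p->next)
        printf("%s\n", p->symbol);

    while (list != NULL) {                  /* release the memory again */
        struct node *next = list->next;
        free(list);
        list = next;
    }
    return 0;
}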

Page 17

1.3 Using Artificial Intelligence in Engineering 1.3.1 Areas of Application Our particular concern is the application of artificial intelligence methods in engineering. They are confined, in commercial applications, mainly to two topics - machine vision and expert systems. Engineers do have a strong research interest in using interpretation of natural language for communication with factory processes and robots, the control of robots and other autonomous systems, and machine learning. All of these are closely related to expert systems. 1.3.2 Expert Systems We cannot really get a conceptual grasp of expert systems until later chapters, but it will be helpful at this point to give the definition of the British Computer Society.
"An expert system is regarded as the embodiment within a computer of a knowledge-based component from an expert skill in such a form that the system can offer intelligent advice or take an intelligent decision about a processing function. A desirable additional characteristic, which many would consider fundamental, is the capability of the system, on demand, to justify its own line of reasoning in a manner directly intelligible to the enquirer. The style adopted to attain these characteristics is rule based programming."

Page 18

Rule based programming simply means the use of IF/THEN algorithmic structures to codify the decision rules from the expert. Engineering use of expert systems is becoming fairly widespread, although many reported on are at the research stage. The engineer is essentially a generalist, and could be greatly assisted by expertise in several engineering specialties. Expert systems could also be provided to non-professionals working in technology. We can categorize the following different areas of expertise. 1 Synthesis It is synthesis that is the engineering designer's unique contribution to the art of engineering. He or she learns the laws of nature and procedures of analysis from the physicist, the chemist, the biologist, and the mathematician. Synthesis, the art of putting together a new machine, structure, chemical process, electrical circuit, or system, is a special talent of the engineer, and the basis of our industrial civilization. Duplicating this process in an expert system is far beyond our present knowledge of human thinking, both conscious and unconscious, however some work has been done on low level engineering synthesis. 2 Engineering Modeling We should not imply, however, that the engineer does not make, as well, an important contribution to analysis - the verification of the design. Analysis is an essential component of design. It ensures that the design is feasible; that is, it will meet the per-

Page 19

formance requirements expected of it. But even more importantly, analysis may be used to ensure that the design is feasible in an optimum way. Let us examine the analytical method. To make an analysis, we must first substitute for the physical problem a simplified and idealized abstraction that is commonly called the engineering model, a prototype or archetype of the design. The problem is not otherwise subject to analysis - we cannot analyze the highly complex real life device. We can only hope that the engineering model will represent the real device or system in all of its essential features. An important part of this model building is deciding what failure modes are likely to be significant, and should be included in the model for predicting feasibility. Will a roof beam fail from excessive yielding of the material, or collapse by buckling, or shearing of the end fasteners? What level of snow loading should be applied? Will a highway carry enough vehicles, have an adequate life, be safe, convenient, not require excessive maintenance, and permit snow removal under extreme conditions? Will an aircraft design have adequate life, reliability, capacity, controllability, and so on? Will a transmission line carry the required current; or will an electrical motor provide the required power? It requires engineering skill, experience, intuition and judgement to set up models for such predictions. The mathematicians and scientists cannot do it for the engineer. Modeling uncertainty is a major element of deci-

Page 20

sion making for modeling. If the engineer's judgement is poor, the model will not adequately predict the behaviour of the actual device or system. Thus the engineer's unique contribution in analysis, as in synthesis, is again the intuitive component, the one based on scientific knowledge and practicing experience. 3 Taxonomy and Morphology Another part of the expertise required in engineering is an intimate knowledge of all of the physical elements that are used, or might be used, to synthesize designs. New designs are primarily new combinations of existing elements. Only rarely are completely new elements incorporated. A really new element such as the transistor can lead to major new technological thrusts. But such a new element is not required in order to achieve brilliant new designs of electrical circuits. A similar requirement is a knowledge of materials. What is the best steel for both strength and corrosion resistance? the best lubricant for long life, minimum noise, minimum friction loss, and minimum maintenance? 4 Manufacturing Methods The engineer must also be an expert on manufacturing and building processes - how metal parts are formed, cut, or cast, how roads and buildings are constructed, how microchips for electrical systems are produced, and so on. 5 Maintenance Maintenance methods are also an important engineering expertise, requiring a knowledge of maintenance methods and testing procedures, including

Page 21

special skills in diagnostic procedures and use of sensors. 6 Applications No engineer can hope to be successful unless he is an expert on the field of application of the device or system being designed. If an agricultural mowing machine for hay making is being designed, the designer must be an expert on the mowing of hay. 7 Tools The engineer must be trained to use the tools of the trade. These include drawing, both manual and computerized, the use of calculators and computers, and a familiarity with procedures for finding information. 8 Optimization Criteria And finally, and perhaps most important of all, the designer must be an expert on the criteria for choosing the best possible design. Engineering designers tend to work with what might be called immediate criteria, which are the performance criteria that they wish to maximize or minimize - minimum weight for an aircraft, minimum cost of a lathe, maximum capacity of a railway car, and so on. Sometimes these can be achieved by analytical optimization, but more often by judgement. But the real objectives for a design are more abstract, and because they are abstract, we tend to satisfy them indirectly by combining judgement with the immediate criteria. These objectives may not have an obvious connection to the technological device, but they cannot be ignored. The owner of a motorcycle may really be more interested in deriving excitement, status, and pleasure from technological devices, rather than just using it as a transportation

Page 22

convenience. Examples of human values, and their application to engineering, can be found in Siddall (1982). Over and above these categories of expertise, there is a kind of super expertise permeating all of engineering work, and the engineer, particularly the engineering designer, is also the primary expert in these. One aspect of it is aesthetics. The sense of judgement of the aesthetics of an engineering design would seem to be the basis for judging the overall optimization of a device, and not just the external appearance or style. Styling is used to satisfy certain particular values, such as the type mentioned above for motorcycles. Closely related to this is the human-machine interaction, commonly called ergonomics. It is appropriate at this point to bring into the discussion the role of the other groups of technological designers - the architects and industrial designers. These designers tend to have a greater contribution to make in ergonomics, and the art of satisfying subtle human values by an appropriate general arrangement and external appearance. The second type of super expertise is in risk judgment. This is perhaps the most critical and difficult expertise required. We are referring to the risk levels associated with failure of a device to meet one or more performance requirements. Not only must the engineer make judgements about the level of risk that is acceptable, but

Page 23

he or she must also make subjective judgments about probability distributions associated with design parameters, so that the actual risk levels are equal to or better than the required risk levels. In the current state of the art, risk judgments are most often coded in simple factors of safety, but the computer has made practical a codification in terms of probabilities, based on the explicit use of probability distributions and probabilistic analysis. This topic is discussed in more detail in Siddall (1983). Our next concern is the mechanism for exercising expertise in engineering. Expert systems use expert knowledge, some of which has an esoteric character that is beyond what is available in the literature. Humans are very limited as to how much formal knowledge they can hold in the conscious mind, and formulate at will as a basis for decision making. This is even more true of knowledge acquired by experience. A specialist with a large background in formal knowledge is usually an expert at following information trails back to knowledge that he or she wants to reformulate. And therefore he can relatively quickly recapture knowledge needed for specific applications, particularly if assisted by computerized information retrieval systems. However most expert decision making is based on knowledge, either from formal learning or experience, that lies in the unconscious mind, and cannot be resummoned. This is the use of intuitive judgement.

Page 24

In specific cases of intuitive judgment, related to expertise in synthesis, the designer might be asked questions like the following.

Q. Why did you use ductile iron rather than steel for that part?
A. Ductile iron is cheaper than steel for a part that must be cast, and it has excellent ductility for this service. Also it has worked well in previous similar applications.

Q. Why did you not heat treat that steel to give a yield strength of 250,000 psi (1724 MPa) instead of only 125,000 psi (862 MPa)?
A. I would like to have had the high yield strength that is possible with that steel, but the ductility, impact strength, and resistance to crack growth are too low at that hardness. So I had to sacrifice yield strength to achieve them. The exact trade-off level that I chose is based on my experience in this application - it is intuitive judgement.

Q. Why did you use three bearings instead of two, in order to support that shaft in your gear box?
A. We have found in this kind of application that only two bearings results in excessive shaft deflection with corresponding misalignment of the gear teeth, and excessive wear and noise.

Q. Why did you select 300 rev/min as the speed for the cutting element in that hay processing machine?
A. I cannot give you any explicit reasons. I just "feel" that 300 rev/min will work satisfactorily.

Page 25

It remains to be seen if this is confirmed by field tests, but we have to start somewhere. While all of the above kinds of expertise represent potential applications of expert systems, it will become clear, after we examine in more detail the nature of expert systems, that the highly intuitive and creative kinds of expertise are unlikely candidates for expert systems. This would particularly include synthesis, engineering modeling, aesthetic judgement, and risk judgement. 1.3.3 Machine Vision We can be briefer and more specific in examining engineering application of machine vision. Some applications are well established commercially and militarily. These include image enhancement and inspection. The former process can be used to remove distortion, blurring, graininess, and to detect edges of features in the image. Inspection has become an important representative of computerized automation of production processes; and can inspect the dimensions of components accurately and at high speed. Another application in automated production is seam following for welding processes. Image detection is beginning to be used with robots for identifying components and their orientation, when robots are picking up parts and installing them onto fixtures, or assembling them. More esoteric applications are at the research level;

Page 26

such as image identification and feature extraction for autonomous robots and vehicles. 1.4 Applications of Expert Systems A flood of engineering applications is appearing in the literature, and it can be difficult to determine if individual ones are really significant. As in any "hot" new topic, there is a considerable bandwagon effect. The following gives an indication of the scope of applications; and is not intended to be comprehensive. The systems may also be proprietary. The source is The CRI Directory of Expert Systems (Smart and Langeland-Knudsen 1986). The status shown is that most recently reported, and may not be up to date.
DESCRIPTION | STATUS
Maintenance of telephone cables, and fault-finding for defective cables | Operational
A design system for digital circuit design | Being developed
Automatic forging design | Being developed

Design of chemical engineering process flow sheets to solve a chemical processing problem | Operational

(table continued on next page)

Page 27

(table continued from previous page)


DESCRIPTION | STATUS
Control strategies for channel routing for VLSI micro-circuit chips | Operational
Improving agricultural productivity in crop pest control, plant disease treatment, and management recommendations | Pilot system operational
Diagnosis of faulty modules in large electronic systems | Being developed
Control of aircraft moving through an air traffic control center | Being developed
Architectural design assistant for house design | Operational
Design fixtures for holding and positioning a workpiece for machining | Operational
Convert camera image of a drawing into a CAD file | Commercially available
Assist in use of a CAD system for design of digital logic circuits | Operational
Diagnosis and repair of diesel electric engines | Operational
Maintenance of telephone switching systems | In field use
Interpretation of data concerning soil layers in geotechnical engineering | Prototype developed
Consultant for operators of nuclear power plants | Prototype developed

(table continued on next page)

Page 28

(table continued from previous page)


DESCRIPTION | STATUS
Ship design assistant | Under development
Interprets logs from dipmeter probes to determine geologic formations for oil exploration | Commercially available
Diagnoses problems in turbofan aircraft engines | Being developed
Flight status monitor to oversee and control aircraft system monitoring and alarm functions | Being developed
Detection of early stages of failure in large power distribution transformers | Operational
Aids in preliminary structural design of high rise buildings | Being developed
Diagnoses faults in steel rolling mill | In use
Analyses models of machines to understand their function | Operational
Designs single-board computers automatically from high level prescriptions | Operational
Selects analysis strategy to use in a structural-mechanical system design | Operational
Aids in the design of ocean harbour wave protection jetties | Being developed

(table continued on next page)

Page 29

(table continued from previous page)


DESCRIPTION | STATUS
Assesses damage of existing structures after an earthquake | Preliminary prototype
Configures DEC VAX minicomputers | In use by DEC

A number of civil engineering expert systems are described in Maher (1987); and engineering design systems in Rychener (1988). 1.5 Conclusions Because of the newness of applied artificial intelligence, and the sometimes exaggerated claims for it, it is a worthwhile exercise to attempt to decide if the subject, particularly expert systems, can usefully be applied in engineering. It is difficult not to read too much into the term artificial intelligence, and therefore difficult not to be disappointed when less than expected is forthcoming. Some writers deliberately exploit these expectations and mask their writings in arcane terminology and phrases so that they seem to deliver more than they do. It seems to be a more severe disease in this field than in most others in science. Workers in the field tend to follow more the speculative approach of the social sciences and philosophy, rather than the physical and life sciences. It makes the research literature particularly difficult to penetrate. However many writers do use the term artificial in-

Page 30

telligence openly and honestly; they do not intend it to mean more than a modicum of intelligence, and do not pretend more, either in writings or in applications. Our purpose here is not to evaluate artificial intelligence as such, although it is difficult to avoid this as a side issue. The work of people in the field of artificial intelligence is a valuable contribution to software engineering. They are applying well known techniques (which they tend not to acknowledge) in somewhat new ways, with new jargon and terminology; but they are developing new and important applications of computers. However one must view with caution any claim that they are making an important and unique contribution to the understanding of human intelligence, or the development of real artificial intelligence. They are doing no more than anyone else who is writing real and original algorithms. All computer work can be considered a contribution to artificial intelligence. In some areas computer intelligence is already far beyond that of humans, and as the field develops we shall gradually evolve a computing capacity that slowly approaches human intelligence in those areas where it does not. We can conclude that engineers can learn much from the experience of workers in this field on how to build expert or consulting systems in engineering; and that such systems would be a very valuable aid to engineering, both in design and in operations. It is the author's strong belief that

Page 31

engineers should be the ones to write these programs, with the assistance of specialists in the field of expertise. Engineers know best exactly what is needed in a consulting system; and have sufficient familiarity with the field, and with software development, to create the most effective systems in the shortest possible time. Expert systems notoriously require a very large number of man-hours.

References

Boden, M. (1977). Artificial Intelligence and Natural Man, Basic Books, N.Y.

Charniak, E. and McDermott, D. (1985). Introduction to Artificial Intelligence, Addison-Wesley, Reading, Mass.

Feigenbaum, E. A. and McCorduck, P. (1983). The Fifth Generation, Addison-Wesley, Reading, Mass.

Fleck, E. and Silverman, J. (1984). SPOC: the Chess Master, BYTE, Vol. 9, No. 3, March, pp. 288-294.

Greenberg, H. (1971). Integer Programming, Academic Press, N.Y.

Hayes-Roth, F., Waterman, D. and Lenat, D. (1983). Building

Page 32

Expert Systems, Addison-Wesley, Reading, Mass.

Horowitz, E. and Sahni, S. (1976). Fundamentals of Data Structures, Computer Science Press, 11 Taft Court, Rockville, MD 20850.

Johnson, H. E. and Bonissone, P. P. (1983). Expert System for Diesel Electric Locomotive Repair, Jour. of FORTH Applications and Research, Vol. 1, Sept., pp. 7-16.

Lawler, E. L. (1976). Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, New York.

Maher, M. L. (1987). Expert Systems for Engineers; Technology and Applications, American Society of Civil Engineers, New York.

McCorduck, P. (1979). Machines Who Think, W. H. Freeman, N.Y.

Nilsson, N. J. (1980). Principles of Artificial Intelligence, Tioga Publishing Co., P.O. Box 98, Palo Alto, CA 94032.

Rychener, M. D. (ed.) (1988). Expert Systems for Engineering Design, Academic Press, San Diego, California.

Page 33

Schutzer, D. (1987). Artificial Intelligence; an Applications Oriented Approach, Van Nostrand Reinhold, N.Y.

Siddall, J. N. (1982). Optimal Engineering Design; Principles and Applications, Marcel Dekker, N.Y.

Siddall, J. N. (1983). Probabilistic Engineering Design; Principles and Applications, Marcel Dekker, N.Y.

Smart, G. and Langeland-Knudsen, J. (1986). The CRI Directory of Expert Systems, Learned Information (Europe) Ltd., Oxford.

Van Tassel, D. (1974). Program Style, Design, Efficiency, Debugging and Testing, Prentice-Hall, Englewood Cliffs, N.J.

Wagner, J. L. (1980). FORTRAN 77: Principles of Programming, Wiley, N.Y.

Wegner, P. (1976). Programming Languages - the First 25 Years, IEEE Trans. on Computers, Dec., pp. 1207-1225.

Winston, P. H. (1977). Artificial Intelligence, Addison-Wesley, Reading, Mass.

Page 34

Winston, P. H. and Brown, R. H. (eds.) (1979). Artificial Intelligence; an MIT Perspective. Vol. 1: Expert Problem Solving, Natural Language Understanding, Intelligent Computer Coaches, Representation and Learning, MIT Press, Cambridge, Mass.

Wu, Z. and Siddall, J. N. (1988). A Model for Intuitive Reasoning in Expert Systems, Computers in Engineering 1988, Proceedings of ASME International Computers in Engineering Conference, San Francisco, pp. 459-466.

Page 35

2 Introduction to Expert Systems

Page 37

An expert system is an algorithm for making automatic decisions, or predictions, related to some specific problem area, requiring special expertise. We wish to be able to decide on problems such as 1. What will the weather be like tomorrow, given weather data for the previous week? 2. What material should we use for a given application? 3. What are the reasons why a particular device will not work correctly, given the error features? 4. Will this heat exchanger transfer a specified amount of heat, given its dimensions, fluid flows, and material properties? 5. What disease does a patient have, given a set of symptoms? The first question that an engineer will ask is how this differs from conventional engineering modeling. We are all familiar with many computer packages that do such things, using abstract models and physical laws. And in fact, the fourth item above is done that way. And yet it would not appear to be intrinsically different from the others. The reasons that computer programs for the others are commonly called expert systems, would appear to be the following. 1. Physical modeling is not possible, and intuitive relationships are used for the predictions, provided by an expert in the field.

Page 38

2. There are a large number of inputs or variables that enter into the prediction. 3. The relationships between the outputs and inputs, or in other terminology, between the dependent and independent variables, are all in the form of IF/THEN logic rules. Why, then, do so many writers consider the first three and the fifth examples to be manifestations of human intelligence, or a simulation in some sense of human intelligence, while the fourth item is not? There would seem to be no valid reason. Efforts in the literature to maintain the illusion that expert systems are somehow "more intelligent" are becoming rather extreme. Authors are now talking about "shallow" versus "deep" reasoning expert systems. It is sometimes difficult to penetrate the jargon, but they essentially seem to mean that shallow systems are the type that have conventionally been developed up until recently, using heuristic and intuitive relationships as a basis for the IF/THEN rules that provide the modeling, while the more advanced systems, the deep systems, include the physical modeling so familiar to engineers. Even the commonly used term, knowledge based systems, is rather pretentious, and implies much more than it should. All engineering modeling is knowledge based. However we must not let concern about terminology, or about exaggeration by workers in the field, imply that developments in expert systems are not significant and of

Page 39

potential value to engineers. It is a different kind of modeling than we customarily use, and has its own special methods. If we must call this kind of modeling "expert systems", or "artificial intelligence", so be it; as long as we do not deceive ourselves about its real nature. Expert systems were really the first successful application of artificial intelligence; and the first system was developed in the area of medical diagnosis over ten years ago. Despite a great deal of activity and public discussion in the field, relatively few systems appear to be yet openly used in practice. It is likely that a large unknown number are secret proprietary systems within companies. A number of so-called shells are commercially available, some for microcomputers. These are software packages that can be used to build expert systems; their purpose being to save the programmer a significant amount of the considerable programming time required for the development of useful systems. They are also called expert system tools. There are a considerable number of commercial shells available; and the companies providing them claim that they have had many applications. A more complete discussion of shells can be found in Harmon, Maus and Morrissey (1988). A large percentage of the development of an expert system is devoted to the user interface. We have suggested that there are three criteria for distinguishing expert systems from conventional engineering modeling. The first criterion was that physical modeling

Page 40

is not possible, and intuitive relationships are used for the predictions. This should be qualified in two ways. Firstly, it is always possible to combine intuition based rules with engineering modeling based on physical laws, or dimensional analysis. The second point is that solution methods given in the following chapters can be equally well applied to complex logic systems. These may be very complex and highly nested IF/THEN/ELSE procedures, required in some kinds of engineering modeling. It may be easier to codify and solve such systems if they are formulated as a set of IF/THEN logic rules of the type used in expert systems. They might be considered as pseudo expert systems. This point will be further discussed in later chapters. Reference Harmon, P., Maus, R. and Morrissey, W. (1988). Expert Systems; Tools and Applications, Wiley, N.Y.

Page 41

3 The General Structure of Expert Systems

Page 43

Expert systems are primarily a kind of intuitive modeling, although it is quite possible to incorporate empirical or physical modeling at any stage. In intuitive modeling a human expert sets up a set of IF/THEN rules that can be used for predictions. It is an attempt to implement in computer code the expert's intuitive decision making processes in a rather narrow field. This decision making is based on knowledge and experience, and the decisions essentially come to the expert from his or her unconscious mind. These rules have the following form.
IF (a set of variables have been observed to have values, I1)
THEN (a set of output variables will have outcome O1)

IF (an input set I2 is observed)
THEN (an output set O2 will occur)
...
...

Note that the IF/THEN/ELSE structure, which is common in high level languages and could be used here, is not used. It is important to understand why this is so, and that expert system type IF/THEN rules are actually equivalent, in their final result, to nested IF/THEN/ELSE algorithm blocks. All of us who work with procedural computer languages like FORTRAN, C, BASIC and PASCAL are familiar with the IF/THEN/ELSE algorithm, such as in this example.

Page 44

IF (X1=a OR X2>b) THEN
    A1 = .true.
ELSEIF (E3=.true. AND E4=.false.) THEN
    A2 = .true.
ELSEIF (E5=.true. AND 97<X6<220) THEN
    A3 = .true.
ENDIF

We can convert this to a set of simple logic statements by first converting all IF arguments or premises to logic events.
E1 = .true. if X1=a is true
E2 = .true. if X2>b is true
E6 = .true. if 97<X6<220 is true

Now the logic statements can be written as a sequence of simple IF/THEN expressions, or "rules".
IF (E1 OR E2) THEN A1
IF (E3 AND Ē4) THEN A2
IF (E5 AND E6) THEN A3

All of the variables are logic variables with value true or false. The variables must be true for the expression to be satisfied, unless there is a bar over the symbol, which indicates that the variable must be false. This set of expressions gives the same result for the A's as the IF/THEN/ELSE block. This same conversion can be done even

Page 45

if the IF/THEN/ELSE block is nested, except now it may be necessary to iterate the evaluation of the expressions, if they are not in the correct order implied by the nesting. But even out of order, they can be evaluated. In expert systems, these are rules. In Boolean notation, they look like this

A1 = E1 + E2
A2 = E3 · Ē4
A3 = E5 · E6
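To make the point about iteration concrete, the following minimal C sketch evaluates such rules by repeated passes until no new conclusion appears, so the order in which the rules are written does not matter. The input values, and the fourth (hierarchical) rule using A1 and A2, are invented purely for the illustration.

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    /* Input events; the values are chosen arbitrarily for the demonstration. */
    bool E1 = false, E2 = true, E3 = true, E4 = false, E5 = false, E6 = false;

    /* Conclusions, initially not established. */
    bool A1 = false, A2 = false, A3 = false, A4 = false;

    /* Repeated passes over the rules until a pass concludes nothing new. */
    bool changed = true;
    while (changed) {
        changed = false;
        /* An invented hierarchical rule, deliberately placed out of order:
           it depends on A1 and A2, which are only established by the rules below. */
        if (A1 && A2 && !A4)    { A4 = true; changed = true; }
        if ((E1 || E2) && !A1)  { A1 = true; changed = true; }
        if ((E3 && !E4) && !A2) { A2 = true; changed = true; }
        if ((E5 && E6) && !A3)  { A3 = true; changed = true; }
    }

    printf("A1=%d A2=%d A3=%d A4=%d\n", A1, A2, A3, A4);
    return 0;
}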

Boolean methods are discussed in Chapter 11. Rules are not, of course, created by conversion from IF/THEN/ELSE blocks, but the equivalent to nested IF/THEN/ELSE blocks is hierarchical rules, in which the output from one rule may be an input to another. Thus systems of rules are commonly represented as logic networks, in which rules, now called "gates", may share inputs with other rules, so that there can be many interconnections and levels.1 Simple logic statements such as these are preferable to IF/THEN/ELSE blocks in expert systems for several reasons. The IF/THEN/ELSE blocks can become very large and complex nested systems, which are extremely difficult to work with. And in the simple rule form, new rules can be easily added without necessarily disturbing the existing
1Logic networks are discussed in Chapter 5.

Page 46

rules. It makes the Boolean representation possible, with all of the scope available in the methods of Boolean algebra. It makes methods incorporating uncertainty easier to handle. And it makes possible the use of data structured programming, discussed in Chapter 8. In the previous chapter we discussed pseudo expert systems, in which the rules are not intuition based. It is perhaps now possible to see the advantages of using the expert system approach in these complex logic systems. An example of a rule from an engineering expert system for robot planning is the following.2
IF (the robot is not holding anything
    AND there is an object that is removable
    AND the object needs to be moved
    AND the object is sitting on another, wrong object
    AND there is a physical space designed to act as a buffer)
THEN (place the part on the buffer)
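In a procedural language such a rule can be sketched as a single independent statement over descriptively named logic variables. The following C fragment is only an illustration; the names are our own choices, not those of the system quoted above.

    #include <stdbool.h>

    /* sketch of the robot planning rule as one independent rule function */
    bool place_part_on_buffer(bool hand_empty, bool object_removable,
                              bool object_needs_move, bool on_wrong_object,
                              bool buffer_space_available)
    {
        return hand_empty && object_removable && object_needs_move &&
               on_wrong_object && buffer_space_available;
    }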

There are two major difficulties in explaining the procedure for setting up the set of rules for this type of expert system. First, no standardized procedure exists that is generally applicable, and the programmer must use his or her own ingenuity to develop a workable algorithm. Secondly, it is difficult to illustrate with examples, because meaningful examples must be too large, whereas
2Provided by Peter Tang.

small ones appear trivial. Commonly there may be many inputs and potential outputs. And there may be a single unique selected output, or several candidates of similar merit may be selected. The set of rules also usually contains intermediate stages, and can be thought of as having a tree or hierarchical structure. The rules that represent the modeling are called the knowledge base in the terminology of artificial intelligence. The inputs to the system are sometimes called the evidence, representing observed facts or data that define the specific application for which the expert system is to be used. They can also be specifications, if the system is design related. The outputs from the system are sometimes called the conclusions, advice, or recommendations. Because there can be many inputs and outputs, as well as hundreds or even thousands of rules in the knowledge base, it is usually not a trivial problem to search through the rules and find the appropriate final output. The algorithm for doing this is called the control system, or inference engine.3 It is essentially a binary search problem, quite similar to 0-1 programming in the field of optimization. However workers in the two fields rarely acknowledge one another. Just what constitutes appropriate expertise for a
3Some authors make a distinction between the control structure and the inference engine. See for example Taylor (1988).

workable expert system is difficult to pin down. This expertise appears to be so intuitive and complex that it cannot be formulated in textbooks or instruction manuals. The expert knows when he or she examines the evidence, what the conclusion should be. The expert does not, in fact, consciously formulate the logic rules when reaching the conclusion, and one of the greatest difficulties in developing a system is to extract these rules from the expert's decision processes. Most writers in the field like to call the person, who works with the expert to do this, a knowledge engineer. We are all experts at many things in life - no one could survive without having a great deal of this day-to-day expertise. We are, for example, experts at sizing up a situation when involved with a group of people, and instantly deciding on a course of action. People skilled in physical games use this kind of expertise in addition to their manual skills. A tennis player decides instantly what kind of a shot to return as the ball approaches. A football quarterback sizes up the situation during a play, and instantly decides what to do. The essence of intuitive decision making is that it is done rapidly by the unconscious mind, using a vast store of previous experience and knowledge, with no formulation of reasons, logic or background knowledge. However, it is not always true that professional experts go from evidence to conclusion in one jump. It is common that there will be intermediate conclusions, and after these are reached, the

expert may ask for additional evidence before proceeding to the next stage. A typical engineering example would be an expert system for material selection. In the first stage the expert might select a broad class of material, say aluminum alloy. Additional questions would then be asked about aspects of the application, and the expert would then go on to decide which aluminum alloy, what heat treatment, and so on. It thus uses hierarchical concepts, going from the more general to the more specific. It is not surprising, then, that it is difficult to extract rules that only exist in the unconscious mind, and actually in an unknown format. Procedures are available which could be considered a kind of machine learning, in which the rules are generated by an analysis of examples for which an expert has given solutions. See Chapter 10. This would actually seem to better simulate the human acquisition of intuition based expertise. An important feature of human expertise is uncertainty. Intuition based judgement is well known to be error prone, and a professional expert would acknowledge this and take it into consideration when delivering a judgement. He or she would, or should, give some indication of the degree of certainty embedded in the conclusion or recommendation, so that anyone acting on the advice would know the risk involved. Most expert systems should correspondingly have a mechanism for propagating uncertainty through the logic chain. There is considerable variation in methods used to

do this. The different approaches include conventional probability, Bayes' theorem, fuzzy set theory, and a kind of pseudo probability. A lot of the difficulty appears to be associated with the reluctance of computer scientists to accept the concept of subjective probability, insisting on a strict frequency definition. The author has argued elsewhere [Siddall 1983] that in engineering design, the only meaningful conceptual basis for probability is to define it as a measure of one's personal belief that an event will occur, based on experience and judgement. Observed data is considered only additional information, to assist in making probabilistic judgements. This approach to probability would also seem to be very appropriate to expert systems, or indeed any application based on intuitive decision making. There are a number of capabilities that a good expert system is said to require in addition to providing a final recommendation (Weiss and Kulikowski 1984); although one must always be careful of overgeneralizing in a discussion of expert systems, and they may not be appropriate to all systems. However they should always be given consideration when developing one. They include the following. 1. The system will provide, on request, a tracing of the logic chain that led to the conclusion. This is said to be important so that the user has some insight into how the decision was made, and can apply judgement to possibly

modifying it. It is also intended to help in the evaluation and development of a system. This tracing is rather difficult to achieve; and it is somewhat surprising that a computer is asked to do something, while simulating an expert, that the real expert can rarely do well. If you ask a football quarterback why he made the play that he did in a game, he may give some justification by citing the set of events that existed at the time, and suggesting bad things that might have resulted if he had made a somewhat different decision. But it will tend to be not very convincing or logical. Can we really expect more from the computer than we do from the real life expert? Possibly this is one of the superior aspects of computers over humans, that also occur in other applications such as numerical analysis; but we may in this case be deluding ourselves about the capabilities of the computer. 2. The system will permit the user to correct previous inputs, and volunteer information not asked for. 3. The control system operates sequentially, or appears to, so that the user can at any stage ask for the system's interim conclusions. 4. The system has a data file for storing and retrieving known examples of the related expertise. 5. It would enhance program development and debugging to be able to do a trace for a case, so that if the system had been modified or extended, the programmer could easily determine how the conclusion was affected for a trial case,

and where any deviation occurred. 6. It could be helpful to analyze the effects of incomplete data. The system may be designed so that it would still give the best possible recommendation, even in the face of missing information. This, in fact, would be a nice feature and correspond to real life situations. But in this event, some indication would be desirable as to the effect on the probability associated with the conclusion, and the effect if the additional information were available and had a certain form. Would a radically different conclusion have been reached? 7. It would not be unlikely for inconsistencies to creep into the expert based rules. It might be possible to incorporate some automatic check for these; and if they cannot be resolved, at least the user can be informed of their existence, if they appear in some specific cases. 8. Possibly the most important facility is the ability of the system to accept new rules. A good system should be able to grow, possibly because new experts have been consulted, or new expertise has been developed, or new candidates become available, such as a new plastic in a material selection system. It should be possible to add these new rules without a major revision of the coding. Although the primary role of expert systems is the coding of intuitive expertise in an algorithm, it is important to again emphasize that all rules need not be intuitive. And we may even have all rules based on logic or

modeling, with no intuitive input. This would be a pseudo expert system, referred to earlier.

References

Siddall, J. N. (1983). Probabilistic Engineering Design; Principles and Applications, Marcel Dekker, N.Y.

Taylor, W. A. (1988). What Every Engineer Should Know About Artificial Intelligence, MIT Press, Cambridge, Mass., USA.

Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman & Allanheld, Totowa, New Jersey.

Problems

3.1 Find or devise any nested IF/THEN/ELSE block with at least three levels. Convert it to a set of IF/THEN rules. Demonstrate by an evaluation of each form that they are equivalent in result. Show that this is true by iteration, even if the second set is out of the order implied by the IF/THEN/ELSE block nesting.

4 Phases of Developing an Expert System

An expert system need not and should not be created as one complete fully rounded system, before testing and development begins. It should be created in such a way that it can be started small and gradually expanded. Most writers strongly emphasize a general rule that the knowledge base and the control structure be quite separate. It will also be clear that a high degree of modularization is desirable, with careful design discipline in keeping track of inputs and outputs to the modules, and of design changes. The development phases can be usefully categorized in order to get a broad picture of the procedure used in creating an expert system. It has much in common with the general engineering design procedure. 1 Recognition of Need It is important to carefully circumscribe the limits of the expert system; and they must usually be rather narrow. If the need is very broad in nature, then the system should be decomposed into subsystems. 2 Determination of the True Role of the System It must carefully be determined exactly who will use the system, and for what purpose. 3 Determination of the Outputs The outputs will have the form of conclusions, recommendations, or predictions. 4 Determination of the Inputs The inputs will have the form of evidence, or data, or observations, or possibly

goals of the user. 5 Formulation of the Knowledge Base The human expert will provide the knowledge that is transcribed into logic rules. This phase also includes the structure of the rules - the order in which they are arranged and classified. There is likely to be some iteration between 3, 4, and 5. 6 Development of the Control System The algorithm for searching through the rules to find the logically true conclusion is done at this stage. Although in the beginning of this section it was stated as a design rule that the knowledge base and the control system should be kept separate, this and the previous stage cannot really be done separately. The structure of the knowledge base, and the binary search procedure, are really rather closely interconnected, although structurally separate, and iterations of these two stages will be required. This sort of internal looping is, of course, typical of the design process. 7 Development of the User Interface This is usually interactive, with the user responding to questions from the computer; but inputs could have other forms, such as sensor measurements, the result of searches through data files, the results of engineering modeling, and so on. Inputs may be sequential, occurring at some intermediate conclusions reached by the control system, so that all potential system inputs are not necessarily activated. 8 Writing of the Prototype System Pseudocode Just as in any development of long and complex software systems, it

is highly desirable to develop a detailed algorithm represented by pseudocode. This is more appropriate to procedural languages such as FORTRAN, rather than a functional programming language like LISP, or a logic programming language such as PROLOG. However, even when using these latter two languages, which are said to be nonalgorithmic, some detailed planning of program structure must be done. The experimental prototype should be a subset of the complete system, so that not all inputs, outputs, and rules would be included. The user interface would also be rudimentary. The use of this subset makes it easier to debug the algorithm and coding, to test the control system, and the accuracy of the conclusions for trial cases. The testing at this stage cannot be conclusive, but the subset should be selected with a view to being a typical representation of the full system. 9 Writing of the Prototype Code The selection of a language for artificial intelligence applications has been discussed in Chapter 1 . Accumulating evidence from the literature seems to indicate that the traditional use of LISP is waning somewhat, and is not necessary or even particularly desirable. Practitioners are favoring PROLOG in increasing numbers, but some appear to look on it unfavorably. Forsyth (1984, page 16), for example, is quite abusive of PROLOG. The procedural languages most commonly used appear to be C, FORTRAN and PASCAL. 10 Testing All testing of design prototypes leads to

modifications, and these will commonly be extensive in expert system development, because of the intuitive nature of the program design. 11 Development and Refinement of the Full System This stage will include the addition of more components, a refinement of the user interface, and considerable testing of cases. 12 Modifications during Application Any dynamic device or system will continue to evolve and improve during its application lifetime. New expertise should be incorporated, and new inputs and outputs added. Provision must be made to accommodate this in the structure of the system. It is said (Winston and Prendergast 1984) that ". . . developing a substantial expert system with real performance takes at least five man-years of effort . . .", and the reference goes on to show that some of the well known expert systems described in the literature have taken much longer, up to thirty or forty man-years. These are rather formidable figures, but it is doubtful if expert systems written by engineers would take so long. The engineer will likely be something of an expert in the subject of the expert system, and even though he or she may call on other experts for substantial assistance, the rapport will be much better, and the engineer will have better insight into the problem than a general purpose "knowledge engineer".

References

Forsyth, R. (ed.) (1983). Expert Systems; Principles and Case Studies, Chapman and Hall, London.

Winston, P. H. and Prendergast, K. A. (1984). The AI Business; the Commercial Use of Artificial Intelligence, MIT Press, Cambridge, Mass.

Suggested Reading

Harmon, P., Maus, R. and Morrissey, W. (1988). Expert Systems; Tools and Applications, Wiley, N.Y., pp. 171-159.

Parsaye, K. and Chignell, M. (1988). Expert Systems for Experts, Wiley, N.Y., pp. 295-304.

Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman & Allanheld, Totowa, New Jersey, pp. 13-14.

Winston, P. H. and Prendergast, K. A. (eds.) (1984). The AI Business: The Commercial Uses of Artificial Intelligence, MIT Press, Cambridge, Massachusetts, pp. 135.

5 Knowledge Base

5.1 Introduction We are concerned here with how the rules are developed; how they are structured so as to make programming, debugging, modifications and control optimum; and the various ways that the knowledge rules can be represented. Acquiring and structuring the rules is a fundamental problem in coding an expert system, and the area which writers tend to gloss over, and be least helpful. This would indicate that there are no standard procedures, and the programmer must simply use ingenuity to devise techniques for finding the rules, and the best structure for them. It might seem more logical to first discuss inputs and outputs, but it is difficult to get a good grasp of these until one has some insight into the knowledge base procedures. We shall avoid any concern about uncertainty in this section, deferring it to a later discussion. The use of machine learning will also be left to a special section. We shall not discuss the use of shells; an engineer, with programming experience, and some expertise in the subject, does not really need to use a shell, and may find them lacking in adequate flexibility in designing the control system and the management of uncertainty. In the event that a decision is made to use a shell, it is not difficult to do so with the background information in this book.

5.2 Acquiring the Rules Let us begin by looking at an example that is familiar, easy to set up, and the expertise is well known. It is not really an expert system, although it has many of its characteristics; but if programmed, it might be a convenient piece of software, particularly if expanded. It is adapted from Forsyth (1984). Let us call it an expert system for identifying the species of an animal or bird, using observed physical characteristics or behaviour patterns. It illustrates the importance of class concepts, and class hierarchies, in developing and structuring the knowledge base. We shall severely limit the number of species that we might identify. These outputs are shown at the top of Figure 5.1. We shall use AND and OR gates to graphically illustrate logic statements. Observed features are associated with circles; conclusions, either intermediate or final, are enclosed in rectangles. This same representation is used in the design of digital electronics when it is a physical layout drawing, and in the analysis of failure of complex systems when it is called a fault tree. Note that there are some intermediate conclusions - mammal, carnivore, ungulate and bird. Note that if certain intermediate conclusions are reached, we do not need to ask for all of the potential inputs. For example, if both "hair" and "gives milk" are true, then an intermediate conclusion is "mammal", and we need not ask if "feathers",

Fig. 5.1 Logic diagram for an expert system to identify certain animals and birds.

"flies", and "lays eggs" are true. We could think of Figure 5.1 as representing a preliminary sub-set of the complete system, used to test the program in a preliminary way. It is characteristic of expert system rules that OR and AND logic combinations are avoided; although this is not necessarily true. The concept of representation by logic gates automatically enforces this. The next question is - how shall we structure the program that represents the logic diagram? We shall go into this in more detail when discussing control systems, but for now we shall use the simplest possible approach - putting them down in sequence from the bottom up in Figure 5.1. The following algorithm is represented in pseudocode. Note that parentheses () are used to indicate input items; and brackets [] are used to indicate a conclusion. Algorithm for Identifying an Animal Input The following logical variables pointed teeth, claws, forward eyes, hair, gives milk, eats meat, hoofs, chews cud, feathers, flies, lays eggs, tawny color, dark spots, long legs, black stripes, long neck, cannot fly, black and white, swims, flies

Procedure
1. Obtain evidence from user
2. # Determine conclusion #
   R1.  IF (pointed teeth) AND (claws) AND (forward eyes) THEN [T1]
   R2.  IF (hair) OR (gives milk) THEN [mammal]
   R3.  IF (eats meat) OR [T1] THEN [carnivore]
   R4.  IF [mammal] AND (hoofs) THEN [T2]
   R5.  IF [mammal] AND (chews cud) THEN [T3]
   R6.  IF (flies) AND (lays eggs) THEN [T4]
   R7.  IF [T2] OR [T3] THEN [ungulate]
   R8.  IF (feathers) OR [T4] THEN [bird]
   R9.  IF [mammal] AND [carnivore] AND (tawny color) AND (dark spots) THEN [cheetah]
        PRINT "cheetah"
        STOP
   R10. IF [mammal] AND (tawny color) AND [carnivore] AND (black stripes) THEN [tiger]
        PRINT "tiger"
        STOP
   R11. IF (dark spots) AND (long legs) AND [ungulate] AND (long neck) THEN [giraffe]
        PRINT "giraffe"
        STOP
   R12. IF (black stripes) AND [ungulate] THEN [zebra]
        PRINT "zebra"
        STOP
   R13. IF (long neck) AND [bird] AND (cannot fly) AND (black and white) THEN [ostrich]
        PRINT "ostrich"
        STOP
   R14. IF (cannot fly) AND [bird] AND (black and white) AND (swims expertly) THEN [penguin]
        PRINT "penguin"
        STOP
   R15. IF [bird] AND (flies well) THEN [albatross]
        PRINT "albatross"
        STOP
3. Go to 2
4. END
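Before testing the algorithm, it may be helpful to see how a few of these rules could look in a procedural language. The following C fragment is only a sketch: the evidence is set directly in the code rather than obtained from the user, only rules R1, R2, R3, R9 and R10 are represented, and it is not intended as a solution to Problem 5.1 at the end of the chapter.

    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        /* evidence supplied by the user (sample values for the tiger case) */
        bool pointed_teeth = true, claws = true, forward_eyes = true;
        bool hair = true, gives_milk = true, eats_meat = true;
        bool tawny_color = true, dark_spots = false, black_stripes = true;

        /* intermediate conclusions */
        bool T1 = false, mammal = false, carnivore = false;

        /* R1 - R3 */
        if (pointed_teeth && claws && forward_eyes) T1 = true;
        if (hair || gives_milk)                     mammal = true;
        if (eats_meat || T1)                        carnivore = true;

        /* R9 and R10: final conclusions, kept as separate simple rules */
        if (mammal && carnivore && tawny_color && dark_spots) {
            printf("cheetah\n");
            return 0;                               /* STOP */
        }
        if (mammal && tawny_color && carnivore && black_stripes) {
            printf("tiger\n");
            return 0;                               /* STOP */
        }
        return 0;
    }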

We can now test the system with some evidence. Assume that the following input items were determined to be true.

hair, gives milk, eats meat, pointed teeth, claws, forward eyes, tawny color, black stripes, cannot fly

The algorithm is now executed and the following rules will have fired.
R1 - [T1]
R2 - [mammal]
R3 - [carnivore]
R10 - [tiger]

Note that we did not need the (cannot fly) evidence, suggesting that it might be desirable to only supply evidence as needed, in order to avoid unnecessary questioning of the user. This could be significant in large systems. Note also that there is a hierarchy in Figure 5.1, which was used to order the rules. This avoids having a "don't know" at any point, and also avoids the need for the iteration feature that was included in the algorithm. However, in complex systems, this hierarchy may be difficult to maintain or determine, and it is still possible to order the rules randomly. Sufficient iterations will eventually fire the required rules. However we would need a "don't know" feature, so that a rule containing an unknown intermediate conclusion would be skipped. It is generally recommended that the rules be as

simple as possible. Gates could be combined in an IF statement; indeed everything could be combined in an IF/THEN/ELSE block, but this is avoided. The simple form of the rules, and the random ordering, make changes and additions easy to do. A new rule can be simply added on to the bottom of the list. However we shall see that some structure for the rules is desirable. It is also interesting at this point to think about some ways to make the algorithm a little more sophisticated. We could, for example, associate a flag with each rule, indicating whether or not it has been fired. As each rule is encountered, its flag could be checked, and if the rule has fired, it would be bypassed. The flag could also be used as a tracer, permitting the fired rules to be listed after a run. It also might be desirable to have a data file associated with the system, so that if a tiger, for example, has been identified, the user could ask for a lot of additional information about the nature of tigers. Having gained some insight into how the rules are set up, and how a simple control structure works, we can now begin to think about how to formulate rules for a new system being developed. The only real guideline that is available for establishing rules is to use the principle of class concepts and hierarchical arrangement of classes. This is sometimes called taxonomy, and one of its most familiar applications is the classification of biological species, using classes such as orders, genera, families,

species, sub-species and varieties. This is not, unfortunately, a very revealing guideline, since humans automatically think this way in any event. The whole structure of our thought and intuitive behaviour is based on class concepts. Consider the simple engineering example of bearings; the partial taxonomy is shown in Figure 5.2. If an engineer decides he wants to select a bearing, he first asks himself - do I want a radial or thrust bearing? And he looks for evidence to support a decision. He then asks - should it be a rolling element bearing or a sliding contact bearing? If the application requires very low friction, high rigidity, and permanent lubrication, he will likely choose a rolling element bearing; and so on through the hierarchy.

Fig. 5.2 Class hierarchy for bearings.

Thus inputs are related to conclusions through this hierarchical class approach. Each subclass represents an intermediate conclusion. Another engineering example of an actual expert system is one for selecting a welding process and welding rod material for welding metals (Jones and Turpin 1986). This system has over 100 rules and the inputs include parent material and hardness, material thickness, weldment position for welding, and so on. The input might include other application details which would be used by the expert to determine if there was danger from hydrogen embrittlement, which would be an intermediate conclusion. Finally, consider the fault tree diagram shown in Figure 5.3, taken from Siddall (1983). It represents the possible failure modes of a power plant. Again, we are ignoring failure probabilities. This logic diagram uses some additional symbols, shown in Figure 5.4, plus some symbols that are not used, but could be in other applications. They correspond to symbolism used in fault tree analysis in reliability theory. The logic diagram is translated into the following algorithm for an equivalent expert system, used to determine which subsystem failure caused breakdown of the plant. Our outputs are therefore pump system failure, heat exchanger failure, and boiler failure. Intermediate conclusions represent successive failures of lower level subsystems in the hierarchy.

Fig. 5.3 Fault tree for failure modes of a power plant.

Fig. 5.3 Continued

NAME             LOGIC
AND              Output event occurs if all input events occur
OR               Output event occurs if one or more input events occur
INHIBIT          Output event occurs when the input event occurs, if the conditional event has occurred
INVERTER         Converts the value of a variable to its complement
EXCLUSIVE OR     Output event occurs if one and only one input event occurs
Input event
Input event      (Undeveloped event in reliability fault trees)
Conclusion
Transfer symbol  To join separated parts of a network

Fig. 5.4 Logic symbols.
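If a logic diagram of this kind is carried into code, each gate can be held as a small record. The following C declarations are a minimal sketch of one possible representation; the field names and the fixed array size are our own assumptions, not a prescribed format.

    #include <stdbool.h>

    enum gate_type { AND_GATE, OR_GATE, INHIBIT_GATE, INVERTER, EXCLUSIVE_OR };

    /* one gate (rule) in the logic network */
    struct gate {
        enum gate_type type;   /* logic applied to the inputs                 */
        int  n_inputs;         /* number of entries used in input[]           */
        int  input[8];         /* indices of input events or of other gates   */
        bool fired;            /* has the gate's output become true?          */
        bool known;            /* false while any input is still a don't know */
    };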

Algorithm for Identification of the Subsystem Failure for a Power Plant

Input
I1 - motor 1 fails
I2 - valve 1 closes inadvertently
I3 - valve 2 opens inadvertently
I4 - motor 2 fails
I5 - valve 1 fails to close
I6 - valve 2 fails to open
I7 - heat exchanger fails externally
I8 - heat exchanger fails internally
I9 - internal heat exchanger failure damages system
I10 - external leak in No. 1 boiler
I11 - external leak in No. 2 boiler
I12 - internal leak in No. 1 boiler
I13 - internal leak in No. 2 boiler
I14 - structural failure in No. 1 boiler
I15 - structural failure in No. 2 boiler
I16 - NaK plug in No. 1 boiler
I17 - NaK plug in No. 2 boiler
I18 - mercury side of No. 1 boiler plugs
I19 - performance degrades on mercury side of No. 1 boiler
I20 - erosion or corrosion of mercury side of No. 1 boiler
I21 - mercury side of No. 2 boiler plugs
I22 - performance degrades on mercury side of No. 2 boiler
I23 - erosion or corrosion of mercury side of No. 2 boiler
I24 - pump No. 1 leaks externally
I25 - pump No. 2 leaks externally
I26 - valve No. 1 leaks externally
I27 - valve No. 2 leaks externally

Procedure
1. Obtain evidence from user or from sensors
2. # Determine conclusion #
   R1.  IF (I1) OR (I2) OR (I3) THEN [No. 1 motor/valve failure]
   R2.  IF (I4) OR (I5) OR (I6) THEN [No. 2 motor/valve failure]
   R3.  IF (I8) THEN
           # Check for system damage #
           IF (I9) THEN [T1]
        ENDIF
   R4.  IF (I7) OR [T1] THEN [heat exchanger failure]
        PRINT "System failure was due to heat exchanger failure"
        STOP
   R5.  IF (I10) OR (I11) THEN [external leak]
   R6.  IF (I12) OR (I13) THEN [internal leak]
   R7.  IF (I14) OR (I15) THEN [structural failure]
   R8.  IF (I16) OR (I17) THEN [NaK plug]
   R9.  IF (I18) OR (I19) OR (I20) THEN [No. 1 boiler fails on mercury side]
   R10. IF (I21) OR (I22) OR (I23) THEN [No. 2 boiler fails on mercury side]
   R11. IF (I24) OR (I25) THEN [pump leaks]
   R12. IF (I26) OR (I27) THEN [external valve leak]
   R13. IF [pump leaks] OR [external valve leak] OR [motor and valve failure] THEN [pump system failure]
        PRINT "System failure was due to pump system failure"
        STOP
   R14. IF [No. 1 motor/valve failure] AND [No. 2 motor/valve failure] THEN [motor and valve failure]
   R15. IF [external leak] OR [internal leak] THEN [leak]
   R16. IF [No. 1 boiler fails on mercury side] AND [No. 2 boiler fails on mercury side] THEN [boiler fails on mercury side]
   R17. IF [leak] OR [structural failure] OR [NaK plug] OR [boiler fails on mercury side] THEN [boiler failure]
        PRINT "System failure was due to boiler failure"
        STOP
3. Go to 2
4. END
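A minimal sketch in C of how the nested request for additional evidence in Rule R3 (input I9) might be handled interactively is given below; the wording of the questions and the simple yes/no interface are our own assumptions, not part of the original system.

    #include <stdbool.h>
    #include <stdio.h>

    /* ask a yes/no question and return the answer as a logic value */
    static bool ask(const char *question)
    {
        int c, d;
        printf("%s (y/n)? ", question);
        c = getchar();
        while ((d = getchar()) != '\n' && d != EOF)
            ;                                  /* discard rest of the line */
        return c == 'y' || c == 'Y';
    }

    int main(void)
    {
        bool T1 = false;
        bool I8 = ask("Did the heat exchanger fail internally");

        if (I8) {
            /* only now is the second, possibly expensive, item requested */
            bool I9 = ask("Did the internal failure damage the system");
            if (I9) T1 = true;
        }
        printf("T1 = %s\n", T1 ? "true" : "false");
        return 0;
    }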

There are two features to note in this algorithm. In Rule 3 we have an example of the system asking for additional information, which might be expensive or time consuming to acquire, and so is not requested unless it is actually needed. And Rule 13 is out of strict hierarchical order, and cannot fire during the first pass, because [motor and valve failure] will be a "don't know". Let's use this expert system as a model for trying to conceive how one might go about acquiring the knowledge base. There are three possible systematic approaches that might be used; or combinations of these. The process could be started at either the top or the bottom of the logic tree. It is rather arbitrary, but might be influenced by whether it is easier to identify inputs or outputs. 1. Build the complete logic tree, starting at either the top or bottom, and then write the rules. This corresponds essentially to what we did above with the fault analysis. Start at the top and ask the expert - the failure of what major subsystem alone would cause a system breakdown? The software designer and the expert are then forced to identify subsystem classes that would combine with OR logic, and these would represent the outputs. One subsystem branch would be arbitrarily selected, and a similar question asked - what lower subsystems could contribute to the failure of this subsystem, and how are they logically connected? More generally, the question would be - what

previous conclusions, or evidence, would be required to reach this conclusion? The corresponding portion of the logic tree would then be constructed. This process would be continued until a subclass has been reached which appears to be an appropriate observable input. Then a new branch would be started from the top. The bottom up approach would begin by identifying the lowest strata of subclasses, or conclusions, or some of them. One would be selected, and the software designer would ask the expert - what observed inputs could lead to this conclusion and by what logic combination? Or the question might be framed somewhat differently - what combination of evidence will permit an interim conclusion? This portion of the logic tree would then be drawn. As these bottom groups are accumulated, a system structure would begin to emerge, and higher level groups would be conceived and added, until it becomes clear that a subclass has been reached that would cause system failure, or more generally, that a final conclusion or recommendation has been achieved. 2. Use the same top down approach but write the rules immediately, and do not attempt to design a full logic tree. This may make it more difficult to structure the rules in an optimum way. 3. Use the same bottom up approach as in procedure 1, but again do not attempt to design a logic tree. Simply write down the rules as you come to them.

The above procedures would seem straightforward and rather simple, particularly if no uncertainty is involved. There could be uncertainty about whether or not a given input item, or lower conclusion, really does contribute to the current conclusion: does it ever have any effect, even with a small likelihood? Should it be there at all? Or the uncertainty may be reflected in the question: what is the probability that the given combination of evidence and interim conclusions will actually fire the rule? A third possibility is that the expert may believe that some evidence is unreliable, representing in some cases a faulty observation, or a sensor error. However, probabilistic considerations aside for now, this apparent simplicity may be misleading; it may be a characteristic only of some engineering systems, in which hierarchical classifications are rather clear cut and obvious, because engineering systems tend to be designed by decomposition. One of the strongest messages in the literature on expert systems is that attempting to set up standard procedures such as the above is dangerous, and the approach to every expert system must be tailor made. A typical statement is by Radig (1986): "It is well recognized that the design and evolution of knowledge based systems is a complex process which is not properly understood." It is a basic characteristic of any conceptual design that no pat procedures exist, and the ideas burst out of the designer's mind. So this difficulty should not

be surprising, if we consider the design of an expert system analogous to designing any engineering system. Both are conceptual design problems, and require a high level of creativity, unless one is satisfied to use a stereotyped configuration, and it is possible to do so. However the above procedures may provide a start; the software designer could pick one and begin, and see how things go, but always keeping an open mind for new and better approaches appropriate to his or her particular problem. Some authors have attempted to describe more fully techniques for extracting knowledge from experts. See for example the following sources in the reading list - Hart (1986), Kidd (1987), Liebowitz (1988), and Parsaye and Chignell (1988). After the initial rule set has been generated, more attention can be paid to structuring the rules. This cannot be completely divorced from the problem of defining the control system, but some attention should be given to it at an early stage. For example - can the rules be reordered or grouped to reduce the likely amount of required evidence? In other words, search paths should be set up that will avoid unnecessary inputs. Large sets of rules may require some kind of index or catalog system, so that they can be easily found. Provision must be made for easily adding new rules. Consideration should be given at this time to providing the capabilities described in Chapter 3.

References

Forsyth, R. (ed.) (1983). Expert Systems; Principles and Case Studies, Chapman and Hall, London.

Jones, J. E. and Turpin, W. (1986). Developing an Expert System for Engineering, Computers in Mechanical Engineering, November, 1986, pp. 10-16.

Radig, B. (1986). Design and Application of Expert Systems, in Winter, H., Artificial Intelligence and Man-Machine Systems, Proc. Int. Sem. Organized by Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (DFVLR), Bonn, Germany, Springer-Verlag, Berlin.

Siddall, J. N. (1983). Probabilistic Engineering Design; Principles and Applications, Marcel Dekker, N.Y.

Suggested Reading

Alty, J. L. and Coombs, M. J. (1984). Expert Systems; Concepts and Examples, NCC Publications, The National Computing Centre Ltd., Oxford Road, Manchester M1 7ED, England, pp. 71-76.

Brachman, R. J. (1983). What IS-A Is and Isn't: An Analysis of Taxonomic Links in Semantic Networks, Computer, Vol. 16, No. 10, October, pp. 30-36.

Forsyth, R. (ed.) (1983). Expert Systems; Principles and Case Studies, Chapman and Hall, London, pp. 112-132.

Hart, A. (1986). Knowledge Acquisition for Expert Systems, McGraw-Hill, N.Y., pp. 49-69.

Henley, E. J. and Kumamoto, H. (1985). Designing for Reliability and Safety Control, Prentice-Hall, Englewood Cliffs, New Jersey, pp. 223-238.

Liebowitz, J. (1988). Introduction to Expert Systems, Mitchell Publishing, Santa Cruz, California, pp. 35-45.

Parsaye, K. and Chignell, M. (1988). Expert Systems for Experts, Wiley, N.Y., pp. 342-351.

Radig, B. (1986). Design and Application of Expert Systems, in Winter, H., Artificial Intelligence and Man-Machine Systems, Proc. Int. Sem. Organized by Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (DFVLR), Bonn, Germany, Springer-Verlag, Berlin.

Schubert, L. K., Papalaskaris, M. A. and Taugher, J. (1983). Determining Type, Part, Color, and Time Relationships, Computer, Vol. 16, No. 10, pp. 53-60.

Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman & Allanheld, Totowa, New Jersey, pp. 83-91.

Woods, W. A. (1983). What's Important About Knowledge Representation?, Computer, Vol. 16, No. 10, pp. 22-26.

Problems

5.1 The purpose of this problem is to gain some initial familiarity with an expert system, and how it is programmed. Convert the algorithm for the animal species expert system into computer code. Use any high level language with an IF/THEN algorithmic structure. Make the first version of your program exactly as indicated in the algorithm, with embellishments if you wish, and then a second version with the order of the rules arbitrarily changed, so that it will not run in one pass. Include in your report an introduction, a discussion of the results, a listing, and the computer printout of the output of the two versions.

6 Inputs, Intermediate Conclusions, and Outputs

6.1 Introduction The inputs are all of the potential items of information or evidence that the user can provide. The outputs are all of the potential conclusions that the expert must provide. And intermediate conclusions are those that are either convenient or essential, as part of the process of progressing towards final recommendations. Possibly the best way to obtain a feel for how to identify these is by some examples. A complete set of all three of these items would usually be evolved during the design of the expert system, and only a preliminary subset identified at the beginning. 6.2 Inputs In our example of identifying animal species, the inputs are the basic items of evidence that the observer can provide. These are detailed characteristics of the animal's structure or behaviour. In the fault diagnosis system, described in Section 5.2, the inputs are associated with the smallest subsystem or subassembly in which a fault can be observed. It may be possible to eventually analyze the problem further. For example, if motor failure is observed, the motor could be disassembled in order to determine which component part failed, such as a bearing failure, but this is not what is normally observed directly. However it is conceivable that the bearing failure is due to some cause external to the

motor, such as excessive environmental temperature, and the expert system then might at some point, if motor failure is reported, ask the user to determine this environmental condition as additional evidence. In the welding process example (Jones & Turpin, 1986), referred to above, the inputs were the parent metal materials and thicknesses, weldment position, and service environment. An expert system to select a bearing type would require inputs such as shaft configuration, service characteristics of the machine such as load and speed, life required, operating environment, servicing procedures, and so on. A full set is given in Chapter 12. It is clear from these examples that it is difficult to anticipate in advance all possible required inputs. 6.3 Outputs The outputs for the animal identification system are rather clearly defined, being simply a list of species. The knowledge base would likely be developed from the top down, working from a given species, down to the necessary input evidence. The fault diagnosis expert system would, on the other hand, likely be developed from the bottom up, beginning with the observable failures as inputs, and "discovering" what outputs would be developed. When a major subsystem failure is reached that would imply system failure, it

would be defined as an output. The welding process expert system would have rather readily identified outputs - all of the available welding processes and rod or wire materials. An expert would also have no difficulty in making a comprehensive list of all possible bearing types in the bearing selection example. A full set of conclusions for the bearing selection system is given in Chapter 12. 6.4 Intermediate Conclusions Intermediate conclusions are most likely to materialize as the expert and programmer cooperate in developing the system. We shall see that systems using machine learning still require the expert to identify the intermediate conclusions in the case studies that are provided to the computer as the "experience" input. It seems to be characteristic of human thinking to create hierarchical class concepts, so this leads naturally to appropriate intermediate conclusions. The reading list below gives additional, more detailed descriptions of expert system examples, showing the inputs, outputs, and intermediate conclusions.

References

Jones, J. E. and Turpin, W. (1986). Developing an Expert System for Engineering, Computers in Mechanical Engineering, November, pp. 10-16.

Reading

Charniak, E. and McDermott, D. (1985). Introduction to Artificial Intelligence, Addison-Wesley, Reading, Mass., pp. 461-468.

Jardine, T. J. (1986). A Machinability Knowledge Based System, in Knowledge Based Problem Solving (Kowalik, J. S., ed.), Prentice-Hall, Englewood Cliffs, N.J., pp. 257-269.

Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman & Allanheld, Totowa, N.J., pp. 43-73.

Winston, P. H. and Prendergast, K. A. (eds.) (1984). The AI Business; the Commercial Uses of Artificial Intelligence, MIT Press, Cambridge, Mass., pp. 41-91.

7 Control Structures

7.1 Introduction The control structure is the algorithm used to decide on the search path for finding a sequence of "true" rules, leading to the final recommendation. In most cases these are heuristic algorithms, in the sense that they are essentially arbitrary, but possibly with some rational features incorporated. It may be possible to incorporate a goal or merit function, the objective function of optimization, which can be used as a basis for a rational hill climbing type search procedure. A hill climbing algorithm assumes that the knowledge base has a structure that permits the merit function to guide the search. It is worthwhile to review the basic procedure of an expert system. Some authors call this a production system, and it can be applied to other types of problem solving than expert systems (Nilsson 1980, Charniak and McDermott 1985). The general production system algorithm is given below. It is assumed that there is no uncertainty.

General Production System Algorithm

Data Base
List of inputs
List of outputs
List of intermediate conclusions

Input
Evidence - items in input list that are true

Procedure
1. DO to 4
2.    Select some rule in the set of rules, following some control algorithm
3.    IF rule fires THEN
         IF final conclusion found THEN
            EXIT
         ENDIF
      ENDIF
4. ENDDO

Item 2 in the above algorithm is the control structure. This, the data base, and the set of rules are all separate and distinct. It should be possible to modify or add to each of these three components without affecting the others. A rule should not call another rule, and rules should not be nested in IF/THEN/ELSE blocks. We shall examine in the following sections some of the standardized control strategies that have been used in expert systems. We shall eventually compare methods, but it is difficult to give general guidelines, and the designer of the expert system must use judgement to decide which to use, or whether a unique control strategy is required.
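A minimal sketch in C of this sequential control structure, with a fired flag for every rule as suggested in Section 7.2 below, is given here. The three routines that are declared but not defined (rule_ready, rule_fires, is_final) are assumed to be supplied by the knowledge base module; keeping them outside the loop reflects the separation of control structure, data base, and rules just described.

    #include <stdbool.h>

    #define N_RULES 52

    extern bool rule_ready(int i);    /* all inputs of rule i are known   */
    extern bool rule_fires(int i);    /* evaluate the logic of rule i     */
    extern bool is_final(int i);      /* rule i yields a final conclusion */

    bool fired[N_RULES];              /* one flag per rule                */

    int run_inference(void)
    {
        bool progress = true;
        while (progress) {                        /* repeated passes      */
            progress = false;
            for (int i = 0; i < N_RULES; i++) {
                if (fired[i] || !rule_ready(i))
                    continue;                     /* skip fired/not ready */
                if (rule_fires(i)) {
                    fired[i] = true;
                    progress = true;
                    if (is_final(i))
                        return i;                 /* EXIT of step 3       */
                }
            }
        }
        return -1;                                /* no conclusion found  */
    }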

7.2 Sequential Structure with Iteration We have noted earlier that the simplest control structure is to execute the rules in any order, and iterate through the list until a "true" conclusion has been found. A perfectly ordered set of rules would mean that no rule would be encountered that would have an unknown input, as we have seen in the earlier examples. With this kind of perfect ordering, only one pass is required. Otherwise the number of passes required for any particular application is unpredictable. It would be useful to have a flag associated with every rule, indicating if it has fired yet or not. A fired rule would then be skipped in later passes. It might also be useful to have an indicator associated with every rule to show if it is ready to fire, with all inputs known. If it was not ready, it too would be skipped. 7.3 Exhaustive Search Sequential search does not necessarily concern itself with the network structure of the set of rules. We have seen in our earlier examples that even quite small knowledge bases can have a rather complex logic network. We now begin to examine the network structure, in order to apply some systematic algorithm for searching through it for a final "true" conclusion. Figure 7.1 illustrates an idealized logic network with four hierarchical levels. There are four final conclusions, with triple branching into each final and intermediate conclusion. The logic

gates associated with each box are not shown since their nature is not relevant to the purpose of the illustration. The resulting number of rules is 52, and the number of inputs, sometimes called leaves, is 156. Breadth first is an exhaustive search, starting at the bottom left intermediate conclusion in Figure 7.1, and progressing horizontally across until all bottom hierarchy rules are tested. It then jumps to the left end of the next level and proceeds across. This continues until the

Fig. 7.1 An idealized logic network for a knowledge base. The network is not completely drawn.

top row is reached and a final conclusion is fired. The ordering of the final conclusions in the top row is arbitrary. The perfectly ordered sequential structure is identical to breadth first. A depth first procedure begins at the bottom, and then follows an upward path, with reference to the logic diagram in Figure 7.1, using the following algorithm. A blocked rule is one that has been tried and been found false.

Depth First Algorithm

Bi = ith bottom intermediate conclusion. A bottom conclusion is one having at least one input; and the order is arbitrary.
Ri = rule associated with Bi
TR = current top rule in chain
BR = current rule in back tracking
Comments are enclosed in # symbols

1.  # Initialize #
    i = 0
2.  i = i + 1
3.  IF (i > total number of bottom rules) THEN
       PRINT, "No bottom rule fired"
       STOP
    ENDIF
    # Test Ri #
    IF (Ri does not fire) THEN
       Block rule Ri
       Go to 2
    ENDIF
4.  Move up to next rule or final conclusion that is above
    Define this rule as TR
5.  IF (TR is an OR rule) THEN
       # Rule has fired, go to next rule #
       Go to 4
    ELSEIF (TR is an AND rule) THEN
6.     DO for all associated paths leading into this gate
          Backtrack to the next rule that is below
          Redefine BR as this rule
7.        IF (BR has previously fired) THEN
             Go to ENDDO
8.        ELSEIF (BR can fire) THEN
             Fire it
             Go to ENDDO
9.        ELSEIF (BR has been blocked) THEN
             Block output from rule BR and block TR
             Go to 2    # Give up on this chain #
10.       ELSE
             # Backtrack #
             BR is an unknown rule
             Go to ENDDO
          ENDIF
       ENDDO
       # AND rule OK - continue #
       Fire rule TR
       Go to 4
    ELSE
       # TR is a final conclusion #
       PRINT, output of result
       STOP
    ENDIF
11. End
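The depth first idea can also be expressed very compactly by recursion, an option raised in Section 7.4 below. The following C sketch is goal directed, and so is closer to backward chaining than to the exact algorithm above; the knowledge base access routines it declares are assumed to exist elsewhere, each conclusion is assumed to be produced by exactly one rule, and the network is assumed to contain no cycles.

    #include <stdbool.h>

    extern int  rule_for(int conclusion);           /* rule producing it      */
    extern int  n_inputs(int rule);
    extern int  input(int rule, int k);             /* k-th input of the rule */
    extern bool input_is_evidence(int rule, int k); /* leaf observation?      */
    extern bool evidence_value(int rule, int k);    /* ask user or sensor     */
    extern bool is_or_gate(int rule);

    /* establish a conclusion by working down towards the evidence */
    bool establish(int conclusion)
    {
        int  r = rule_for(conclusion);
        bool value = is_or_gate(r) ? false : true;  /* identity element */

        for (int k = 0; k < n_inputs(r); k++) {
            bool v = input_is_evidence(r, k)
                       ? evidence_value(r, k)       /* leaf: ask for it  */
                       : establish(input(r, k));    /* recurse downward  */
            value = is_or_gate(r) ? (value || v) : (value && v);
        }
        return value;
    }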

7.4 Forward and Backward Chaining Forward chaining corresponds to the depth first search described above. This procedure assumes that the inputs at the bottom of the logic tree are the known quantities, and the appropriate conclusion is to be determined. We shall, incidently, find that when we begin to use a probabilistic approach, there is not necessarily a unique conclusion; there may be several with varying levels of likelihood of occurring. Rather than arbitrarily follow a forward chaining path, some control systems check to find all rules "ahead" that can fire. Conflict resolution is used to decide which of these rules to fire next. So-called meta-rules are used to decide. Some systems may be used in reverse; so that we wish to determine which of the inputs would lead to a specified conclusion. It would then be more convenient to use backward chaining, working from the top down, using an algorithm similar to forward chaining described above. However, some control structures use backward chaining even though the inputs are known and the conclusions

unknown. The logic programming language PROLOG has this algorithm built in; and the user has no option but to use it (Forsyth 1984, p. 101). It is essentially the same as the backward chaining described above, except we do not know which is the correct conclusion, and have to keep trying them in succession, until a successful downward set of rules is found. Back tracking would be necessary. Backward chaining can reduce the number of inputs, since there is a significant probability that a conclusion will be found before all inputs are solicited from the user. Also the user can augment this probability if he or she has some insight into the most likely conclusion in a specific application. Although an algorithm similar to that given in the previous section could be used for backward chaining, consideration should also be given to the use of recursion1 and stacks2, for both forward and backward chaining. 7.5 Structuring We are now concerned with methods of reducing the amount of search required. The methods described above are in a sense arbitrary and random. On the average, we would have to search one half the total number of rules, even with perfectly structured rules; and we cannot predict the
1This procedure is available in PASCAL and C, and can be simulated in FORTRAN - see Wagner (1980) or Day (1972). 2See Appendix.

search time in advance. We have shown in Section 5.2, in the discussion of the animal species system, that the effectiveness or efficiency of the control structure can be improved by appropriate ordering of the rules. It may be possible in any system to gain this kind of enhancement by appropriate structuring. A form of this is sometimes called conflict resolution (Charniak and McDermott 1984, Winston 1984, Jackson 1986, Parsaye and Chignell 1988), or the use of metaknowledge (Hayes-Roth et al 1984). When a rule is fired the control structure looks ahead and determines all possible rules that could be fired next. It uses some criterion for deciding which is best to choose. In some systems its purpose is just to speed up solution of the system; in others the choice of one rule will preclude the others from even firing; or it may prevent erroneous results by incorrect ordering of the rules. The actual strategy used is very problem dependent, and requires creative design. 7.6 Pruning In the same example we can see that certain input information can be used to automatically disqualify certain conclusions immediately, and the chains leading to them can be blocked. Thus, if "lays eggs" is true, then all nonbird conclusions can be blocked. This also seems to be the procedure described in Forsyth (1984, p. 192), Hayes-Roth,

Waterman and Lenat (1983, p. 71), and Alty and Coombs (1984, p. 137) as generate and test. 7.7 Decomposition This procedure, called abstraction by Alty and Coombs (1984, p. 153) and Hayes-Roth, Waterman and Lenat (1983, p.70), decomposes the problem into a sequence of essentially separate problems, in which the output of one can be the input of the next. An example given is the R1 system (also known as XCON), used by Digital Equipment Corp. to configure a suitable VAX computer arrangement, given the customer's particular requirements. It is decomposed into six tasks (following closely the descriptions of Alty and Coombs); a) to determine if there are any inconsistencies in the configuration; b) to put the correct components in CPU subassemblies; c) to develop the arrangement for the unibus cabinets; d) to put names in the unibus expansion cabinets; e) to design the system and lay out its diagram; f) to design the cabling. No backtracking or searching is required between these tasks. This kind of decomposition comes naturally when

applying the engineering method. A more complete description of R1 can be found in several references (McDermott 1982, Smart and Langeland-Knudsen 1986). 7.8 Heuristic Search In the literature of expert systems, heuristic search appears to mean that the search is guided by an objective or evaluation function, essentially similar to heuristic search techniques in optimization theory. In the optimization field, a heuristic algorithm for finding a minimum implies that certain rules are used based on intuitive logic, and the probability of achieving an optimum is much lower than for most rigorous methods. This is not too satisfactory a definition, since the likelihood of achieving even a true local minimum cannot be predicted for any method in optimization, except for linear formulations (Siddall 1982). Although examples are given in the literature of artificial intelligence related to general problem solving production systems (Hayes-Roth, Waterman and Lenat 1983, p. 68), it is difficult to visualize an engineering expert system that would have a meaningful evaluation function; however the possibility should not be ruled out, at least for parts of the system. Some authors consider all of the above search procedures to be heuristic. See for example Pearl (1984).

7.9 Use of the User's Judgement It is always desirable to take advantage of the user's judgement if it would contribute to the problem solution, or if the user believes it would. The user may have some intuitive knowledge of the most likely final conclusion. Significant search time might be saved, in this event, if the user can specify backward chaining, starting at his or her choice of final conclusion. If the user is wrong, no real harm is done, unless the system is so large that a control structure is being used that permits a small possibility of missing the correct solution. 7.10 General Production Systems The purpose of this section is not to describe a control system procedure, but rather to make clear the distinction between a general production system and an expert system. Confusion of the two can make reading the literature rather difficult, because control structures are sometimes illustrated by examples from general production systems, which are not applicable to expert systems. Production systems can be considered a generalization of expert systems, in which problem solving systems are represented graphically in network form, and the solution is found by searching through the network, following certain rules, and linked by some control structure. They are most commonly illustrated by the problem of playing some game, or solving some puzzle; and this category of a

production system is quite different from an expert system. The rules at nodes in the network are now not logic gates, but rather rules for changing the state of a system. A commonly used example is the eight-puzzle. It is a square tray containing nine cells and eight pieces, numbered from one to eight, thus leaving one cell always empty (Pearl 1984, Hayes-Roth, Waterman and Lenat 1983). This is called a state space system, with the starting and goal states shown in Figure 7.2. The player moves one of the tiles adjacent to the empty cell into it, and thus any given state can be transformed into two, three, or four new possible states, depending on where the empty cell lies. The purpose is to reach the goal state as quickly as possible. There are clearly a very large number of possible moves,

Fig. 7.2 The 8-puzzle, showing only the start of the network, and the goal state.


and heuristic rules can be applied that greatly speed up the search of the state space. Backward chaining is used, but the algorithm would be quite different from that used in an expert system. 7.11 Hard Wired Systems and Parallel Processing Research is in progress on the intriguing possibility of creating hard wired expert systems, or more generally, production systems. Robinson (1985) and Togai and Watanabe (1986) have used hard wiring for some aspects of expert systems. Lu, Siddall and Verhaeghe (1989) have demonstrated an approach in which the logic gate circuits are actually incorporated onto a microchip, so that there is a direct mapping of the system logic on the chip. When input and output interfaces are added, there is a potential for very high speed processing of the control structure, since all rules are simultaneously evaluated. An attractive possibility for doing this is to use erasable programmable logic devices, with which logic gate systems can be created by easily used software on a microcomputer, and embodied on a commercially available chip having room for up to 9000 gates (Electronics 1988). Inputs for the system would be converted to input voltages by the controlling computer; and output voltages converted to conclusions. The software available provides a graphics facility for creating and displaying the logic system. It can even simulate the operation of the chip. It would be desirable, in large


systems, to first design the logic system in hierarchical form, and then reduce it to a minimal two level system. The concept of minimal logic circuits is discussed in Chapter 11. Parallel processors are being introduced as the newest generation of computers. They are considered to be very promising for simultaneously processing many chains in expert systems, thus speeding up the process considerably. Speed is vital in real time expert systems, such as in robot applications. 7.12 Selection of a Control Structure As in other aspects of developing expert systems, it is best to start off with a very simple control structure; and as the design takes shape, give consideration to a more complicated one, particularly if speed becomes a problem. The preferred initial choice would thus be the simple, unstructured sequential system with iteration. It is likely that most engineering expert systems would have relatively few levels; and in this situation a breadth first algorithm would be effective. If the user could select a most likely final recommendation for a given application, then depth first search with backward chaining might be desirable. As the system matures, the possibility of taking advantage of structuring, decomposition, and pruning should be considered.


Although considerable emphasis is placed, in the artificial intelligence literature, on how very large production system trees tend to be, it is not as likely in engineering expert systems that they would reach the enormous sizes commonly used as examples for general systems. Typical examples quoted are games such as chess, and other problem solving systems. To do an exhaustive search in chess would take essentially infinite time. Engineering systems would likely have relatively few levels in the network, and therefore the control structure problem may not be too serious. Examples in the literature of engineering expert systems rarely seem to pay much attention to this problem. R1 is a fairly large system, having about 2000 rules, but we have seen that it has considerable decomposition, which alleviates the problem. Even the well known systems in other fields do not appear to exceed about 600 rules. On the other hand, if minimum search time is of the essence, as in real time systems such as those used with robots, production processes, and vehicles, then considerable attention must be paid to optimizing the control structure. References Alty, J. L. and Coombs, M. J. (1984). Expert Systems; Concepts and Examples, The National Computing Centre Ltd., Oxford Road, Manchester, England, M1 7ED


Charniak, E. and McDermott, D. (1984). Introduction to Artificial Intelligence, Addison-Wesley, Reading, Mass. Day, A.C. (1972). FORTRAN Techniques; with Special Reference to Non-Numerical Applications, Cambridge University Press, Cambridge. Electronics, May 12, 1988, p. 61. Forsyth, R. ed. (1984). Expert Systems; Principles and Case Studies, Chapman and Hall, London. Hayes-Roth, F., Waterman, D. A. and Lenat, D. B. eds. (1983). Building Expert Systems, Addison-Wesley, Reading, Mass. Jackson, P. (1986). Introduction to Expert Systems, Addison-Wesley, Reading, Mass. Lu, P., Siddall, J. N. and Verhaeghe, J. (1989). Expert System Built on a Chip, Computers in Engineering 1989, Proc. 1989 ASME Design Automation Conference, Montreal, Canada. McDermott, J. (1982). R1: a Rule Based Configurer of Computer Systems, Artificial Intelligence, Vol. 19, pp. 39-88.


Nilsson, N. J. (1980). Principles of Artificial Intelligence, Tioga Publishing Co., Palo Alto, California. Parsaye, K. and Chignell, M. (1988). Expert Systems for Experts, Wiley, N.Y. Pearl, J. (1984). Heuristics; Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, Mass. Robinson, P. (1985). The SUM: an AI Coprocessor, BYTE, June, 1985, pp. 169-180. Siddall, J. N. (1982). Optimal Engineering Design; Principles and Applications, Marcel Dekker, N.Y. Smart, G. and Langeland-Knudsen, J. (1986). The CRI Directory of Expert Systems, Learned Information (Europe) Ltd., Oxford. Togai, M. and Watanabe, H. (1986). Expert System on a Chip: an Engine for Real-Time Approximate Reasoning, IEEE Expert, Fall, 1986, pp. 55-62. Wagner, J. L. (1980). FORTRAN77: Principles of Programming, Wiley, N.Y.


Winston, P. H. (1984). Artificial Intelligence, 2nd ed., Addison-Wesley, Reading, Mass. Suggested Reading Alty, J. L. and Coombs, M. J. (1984). Expert Systems; Concepts and Examples, NCC Publications, The National Computing Centre Ltd., Oxford Road, Manchester, England, M1 7ED, pp. 22-23, 54-55, 74-82, 91-92. Charniak, E., Riesbeck, C. K. and McDermott, D. V. (1980). Artificial Intelligence Programming, Lawrence Erlbaum Associates, Hillsdale, N.J., pp. 257-258. Forsyth, R. ed. (1984). Expert Systems; Principles and Case Studies, Chapman and Hall, London, pp. 104-105, 124-132. Hayes-Roth, F., Waterman, D. A. and Lenat, D. B. eds. (1983). Building Expert Systems, Addison-Wesley, Reading, Mass., pp. 66-72, 92. Merry, M. ed. (1985). Expert Systems 85, Proceedings of the Fifth Technical Conference of the British Computer Society Specialist Group on Expert Systems, Cambridge University Press, Cambridge, pp. 21-30.


Nilsson, N. J. (1980). Principles of Artificial Intelligence, Tioga, Palo Alto, California, pp. 53-130. Pearl, J. (1984). Heuristics; Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, Reading, Mass., pp. 33-72. Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman & Allanheld, Totowa, N.J., pp. 41-42, 50-51, 92-101. Problems 7.1 Write an algorithm for backward chaining when the inputs are known and the conclusions unknown. Use a format similar to the depth first algorithm given in Section 7.3; or recursion or stacks can be used as an alternative.


8 Data Structures


8.1 Data Structured Control We are concerned in this chapter with the use of data structures in the development of an expert system program. We shall use only relatively simple procedures described in any book on data processing or data structures. See for example, Horowitz and Sahni (1983), or Date (1986). A brief description of the necessary theory for data structures is given in Appendix A. We have given algorithms earlier for two simple examples of expert systems - the animal identification system and the fault determination system. These algorithms could be translated directly and explicitly into programming code, so that the code would look very much like the algorithm, if a procedural language is used. We shall designate this approach to the programming of an expert system as explicit code. This procedure is probably desirable in the initial development stages of an expert system, so that the program designer can easily see exactly what is happening. However, as the program becomes longer and more complex, it is desirable to switch to the use of data structures. We shall call this data structured code. This approach makes it easier to keep track of changes and additions, to include messages, and to permit the incorporation of the features described in Chapter 3, such as finding and displaying a trace of the logic chain that led to the conclusion for a specific application of the expert system. All of the information is contained in arrays, and


pointers are used to keep track of the order of items in an array. This facilitates changes and additions to the lists. The procedure described below for incorporating data structures is one possible approach, and not necessarily the best for any given system. Developing optimum data structured code for a new expert system is an exercise in design ingenuity. We shall begin by defining in Table 8.1 a set of arrays needed to incorporate the data structure. But first we must define some terms used.

Node - an input, a gate (intermediate conclusion), or a final conclusion
Arc - a connection between nodes; a line on a network diagram
Record - similar to a line in a file, each field containing an associated array element
Pointer - an array element used to define the next item in an associated array
Size - number of elements in the array

NODTOT - total number of nodes in the system
ARCTOT - total number of arcs in the system
INPTOT - total number of inputs in the system
RULTOT - total number of rules in the system

TABLE 8.1 Arrays for Use with Data Structured Code

RECORD  DESCRIPTION                                      NAME     TYPE        SIZE
Node    ASCII description of node                        DESCR    CHARACTER   NODTOT
        Input prompts or node messages                   MESSAG   CHARACTER   NODTOT
        Node states (1 - true, 2 - false, 3 - unknown)   NODSTA   INTEGER     NODTOT
        Node type (1 - input, 2 - AND gate, 3 - OR gate) NODTYP   INTEGER     NODTOT
        Pointer to next node                             PNNOD    INTEGER     NODTOT
        Pointer to first input arc                       PINARC   INTEGER     NODTOT
        Pointer to first output arc                      POUTAR   INTEGER     NODTOT
Arc     Pointer to next adjacent input arc (0 - no more) PINADJ   INTEGER     ARCTOT
        Pointer to next adjacent output arc (0 - no more) POUTAD  INTEGER     ARCTOT
        Arc states (1 - true, 2 - false, 3 - unknown)    ARCSTA   INTEGER     ARCTOT


CONTOT - total number of final conclusions
INPFRS - first node that is an input
RULFRS - first node that is a rule
CONFRS - first node that is a final conclusion
ORUNK - flag to show if an OR rule has an unknown input arc
        = 0 , no
        = 1 , yes

We can now write an algorithm that uses data structured code, and a control structure incorporating the sequential procedure with iteration. We shall make the first subset of the node arrays correspond to input items; the second set will be rules; and the third set final conclusions. These can be mixed up in the arrays, as long as the pointers are set correctly.

Algorithm: Use of Data Structured Code and a Sequential Control Structure with Iteration

Arrays with Known Values
DESCR, MESSAG, NODTYP, PNNOD, PINARC, POUTAR, PINADJ, POUTAD

User Input
NODSTA(I), I = INPFRS to RULFRS-1

Output to User

True values of MESSAG(I)

Procedure
1. # Initialize: set all states to default as unknown #
   DO for I = 1 to NODTOT
      NODSTA(I) = 3
   ENDDO
   DO for I = 1 to ARCTOT
      ARCSTA(I) = 3
   ENDDO
2. # Read input data values from user #
   INPCOU = 1                        # Initialize count of inputs #
   I = INPFRS                        # INPFRS = first input node #
3. WRITE, MESSAG(I)
   READ, NODSTA(I)
   # Set state of arcs out of input node #
   J = POUTAR(I)                     # POUTAR(I) = first output arc #
4. ARCSTA(J) = NODSTA(I)
   K = POUTAD(J)                     # POUTAD(J) = next output arc #
   IF K = 0 THEN
      Go to 5                        # Go to next input node #
   ELSE
      J = K
      Go to 4
   ENDIF
5. K = PNNOD(I)                      # Determine next input node #
   INPCOU = INPCOU + 1               # Increment count of inputs #
   IF INPCOU > INPTOT THEN           # Check if all inputs tested #
      Go to 6
   ENDIF
   I = K
   Go to 3

6. # Scan all rules and determine states #
   I = RULFRS                        # First rule in list #
7. IF NODTYP(I) = 3 THEN             # OR rule #
      ORUNK = 0
      J = PINARC(I)                  # PINARC(I) = first input arc #
8.    IF ARCSTA(J) = 1 THEN          # Arc has been fired #
         NODSTA(I) = 1               # Rule fires because it is an OR rule #
         Go to 12
      ELSEIF ARCSTA(J) = 3 THEN
         ORUNK = 1                   # Rule has an unknown input arc #
      ENDIF
9.    IF PINADJ(J) = 0 THEN          # Next input arc does not exist #
         IF ARCSTA(J) = 2 AND ORUNK = 0 THEN    # Previous arc has not fired #
            NODSTA(I) = 2            # Node state is false #
         ELSE
            NODSTA(I) = 3            # Node is unknown #
         ENDIF
      ELSE                           # Try next arc #
         K = PINADJ(J)
         J = K
         Go to 8
      ENDIF

   ELSEIF NODTYP(I) = 2 THEN         # AND rule #
      J = PINARC(I)
10.   IF ARCSTA(J) = 1 THEN          # Arc has been fired #
         Go to 11                    # Control jumps to 11 #
      ELSEIF ARCSTA(J) = 2 THEN      # Arc has not been fired #
         NODSTA(I) = 2               # Rule cannot fire #
         Go to 13
      ELSE                           # Arc is unknown #
         Go to 13                    # Node is unknown #

      ENDIF
      # Only gets to here if all previous arcs into the AND node have fired #
11.   IF PINADJ(J) = 0 THEN          # Next input arc does not exist #
         NODSTA(I) = 1               # Rule is fired #
         Go to 12
      ELSE                           # Try next adjacent input arc #
         K = PINADJ(J)
         J = K
         Go to 10
      ENDIF
   ELSE                              # Error - rule is mistaken for input #
      WRITE, "ERROR - RULE NODE", I, "MISTAKEN FOR INPUT"
      STOP
   ENDIF
12. IF POUTAR(I) = 0 AND NODSTA(I) = 1 THEN    # Final conclusion #
      Go to 16                       # Solution !! #
    ENDIF

13. # Set state of arcs out of node #
    J = POUTAR(I)
    IF POUTAR(I) = 0 THEN            # Final conclusion, no arc out #
       Go to 15
    ENDIF
14. ARCSTA(J) = NODSTA(I)
    K = POUTAD(J)
    IF K ≠ 0 THEN
       J = K
       Go to 14
    ENDIF                            # All arcs tagged #
15. K = PNNOD(I)                     # Do next rule #
    I = K

    Go to 7
16. # Solution #
    WRITE, MESSAG(I)
    STOP
    END
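As an illustration of how such data structured code might look in a conventional high level language, the following C sketch evaluates a tiny invented rule net by repeated sequential scanning. It follows the spirit of the algorithm above, but for brevity each arc stores its source and destination node directly instead of using the separate pointer arrays (PINARC, PINADJ, and so on); the example net, states, and messages are hypothetical.

   /* A minimal C sketch of data-structured code for a rule net.  States:
      1 = true, 2 = false, 3 = unknown.  Node types: 1 = input, 2 = AND gate,
      3 = OR gate.  The example net is invented purely for illustration. */
   #include <stdio.h>

   #define NODTOT 5
   #define ARCTOT 4

   /* nodes 0,1,2 = inputs; node 3 = AND(0,1); node 4 = OR(3,2) = conclusion */
   static int NODTYP[NODTOT] = {1, 1, 1, 2, 3};
   static int NODSTA[NODTOT];
   static int ARCSTA[ARCTOT];
   static const int ARCFROM[ARCTOT] = {0, 1, 2, 3};
   static const int ARCTO[ARCTOT]   = {3, 3, 4, 4};
   static const char *MESSAG[NODTOT] = {
       "input 1 true? (1=yes,2=no,3=unknown): ",
       "input 2 true? (1=yes,2=no,3=unknown): ",
       "input 3 true? (1=yes,2=no,3=unknown): ",
       "intermediate conclusion T1 holds",
       "final conclusion C1 holds"
   };

   int main(void)
   {
       int i, j, changed, pass;

       for (i = 0; i < NODTOT; i++) NODSTA[i] = 3;   /* default: unknown */
       for (j = 0; j < ARCTOT; j++) ARCSTA[j] = 3;

       /* read the input states from the user */
       for (i = 0; i < NODTOT; i++) {
           if (NODTYP[i] != 1) continue;
           printf("%s", MESSAG[i]);
           if (scanf("%d", &NODSTA[i]) != 1) return 1;
       }

       /* sequential scan with iteration: repeat until nothing changes */
       for (pass = 0, changed = 1; changed && pass < 100; pass++) {
           changed = 0;
           for (j = 0; j < ARCTOT; j++)           /* copy node states onto arcs */
               if (ARCSTA[j] != NODSTA[ARCFROM[j]]) {
                   ARCSTA[j] = NODSTA[ARCFROM[j]];
                   changed = 1;
               }
           for (i = 0; i < NODTOT; i++) {         /* evaluate each gate */
               int state;
               if (NODTYP[i] == 1) continue;
               state = (NODTYP[i] == 2) ? 1 : 2;  /* AND starts true, OR false */
               for (j = 0; j < ARCTOT; j++) {
                   if (ARCTO[j] != i) continue;
                   if (NODTYP[i] == 2) {          /* AND: false dominates, then unknown */
                       if (ARCSTA[j] == 2) state = 2;
                       else if (ARCSTA[j] == 3 && state != 2) state = 3;
                   } else {                       /* OR: true dominates, then unknown */
                       if (ARCSTA[j] == 1) state = 1;
                       else if (ARCSTA[j] == 3 && state != 1) state = 3;
                   }
               }
               if (NODSTA[i] != state) { NODSTA[i] = state; changed = 1; }
           }
       }

       for (i = 0; i < NODTOT; i++)               /* report all fired conclusions */
           if (NODTYP[i] != 1 && NODSTA[i] == 1)
               printf("%s\n", MESSAG[i]);
       return 0;
   }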

It would be important to have an additional module for adding nodes and arcs, devised so that it is not necessary to edit arrays directly. Instead, the program would use adequate prompts and internal logic to adjust the lists, and also the total numbers and initial locations of nodes, arcs, inputs, intermediate conclusions, and final conclusions. Algorithms for adding and removing items from linked lists are given in Section 2 in the Appendix. We have developed, in this algorithm, the essential parts of a shell. For example, it could be used to program either the animal species system or the fault analysis system, as long as we are satisfied with the built-in control structure. Figure 5.1, showing the animal identification logic net, is repeated in Figure 8.1 to illustrate the data structured algorithm. Although most adjacent inputs and arcs are numbered sequentially, they need not be, because of the use of pointers. Examples of pointers are shown in Figure 8.1. It would be desirable to incorporate in the above algorithm a counter of the number of iterations through all


Fig. 8.1 Logic diagram for an expert system to identify certain animals and birds, illustrating data structuring.


of the rules, and have an escape statement if some arbitrary maximum number of iterations has been exceeded. The sequential control structure with iteration is relatively simple to incorporate in data structured code. Using other control procedures offers considerably more challenge. 8.2 Frames It should be made clear at the outset of this section that frames are just one way of handling data management in rather large complex systems, and are not necessarily required when developing a system, particularly small and medium size systems. The frame is one of the most difficult concepts in expert systems with which to come to grips. The jargon is unusually fierce, even for artificial intelligence topics. One encounters terms like "schema", "script", "data dependencies", "situation calculus", "ontology", "property inheritance", "slots", "links", "isa", "aka", and "demons". And like most artificial intelligence ideas, it is essentially simple if one can penetrate this verbal jungle. The difficulty is compounded because it is often presented in the symbolism of predicate calculus or LISP. We shall try to translate the concept into terminology more familiar to engineers. Frames are a device used to structure or decompose expert systems, so as to make them easier to work with,


both by the developer of the system, and the users. The concept is particularly useful in shells, as a means of enabling the shell user to organize the system as coherently as possible. A frame is also used to represent an item of information that is too complex to have an obvious single feature identification. The item could be any of our three types - inputs, intermediate conclusions, and final conclusions. It is really a kind of data structure, which accounts for its inclusion in this chapter. Frames, contexts, schemas, and objects seem to be essentially interchangeable terms in the literature of expert systems, although some authors may give them somewhat different interpretations. Frames thus have a kind of dual role - to provide an organization for a system of rules, and to provide a structure to the representation of information about some item (object) in the system. Let us first examine the structured representation of information, and consider the example of selecting a material. Each material would have a frame, which would be incorporated in a data structure. The frame would be simply the set of specifications describing the material - yield strength, fatigue strength, ductility, hardness, corrosion resistance, wear resistance, and so on. The frame would also include as a descriptive item the next


higher class concept. Thus, if the material were an aluminum alloy, the next higher concept would be metal. Aluminum alloy is a kind of ("aka") metal. Lower level concepts could also be included as items, and in our example would be specific aluminum alloys. Each alloy could have a frame, containing more specific information. Each alloy frame would also imply all of the properties of its parent concepts ("property inheritance"). This information is organized in a data structure, using the familiar concepts of records ("frames") and fields ("slots"), chained together with pointers ("links"). A slot would be used for each material property, for the next higher class concept (parent frame), and for the next lower level class concepts (child frames). A frame can occasionally have more than one parent frame, and commonly more than one child frame. A slot containing a variable must also have provision for storing the specific value of the variable. Thus the parameter might be "yield strength" and the value "33,000 psi". One way of using frames as a unit of information in an expert system is as follows. A certain set of material specifications may have been established, resulting from specific inputs, and the control structure would attempt to match the required specifications with a frame containing the same matching set. The rules doing this might look like the following.

IF (the specification set is "S1") THEN
   Search through material frames until a match is found (such as the frame "metal")
IF (metal is specified AND another specification set S2 is required) THEN
   Search through metal frames until a match is found (such as "aluminum alloy")

The second role of frames is to provide an organization for a system of rules. A root frame is established at the top of the frame hierarchy, having a set of subframes, each of which may in turn have associated subframes, and so on. Thus a network of frames can be built up similar to the rule network. Each frame has associated with it a set of rules, thus organizing or decomposing the rule system into more manageable components. In a material selection expert system, the root frame might be "structural material" or "material for a gear". The first level of subframes could be items associated with the gear application, such as "aircraft transmission gear" and "crane hoist gear". This organization could be used to narrow the scope of the system, and eliminate many inputs and rules from consideration. Rule sets may also be decomposed by frames so that a certain segment of the rules can be re-evaluated without affecting the rest of the system. For example there may be a set of rules associated with manufacturability of the gear, assuming certain tools are available. In a given


specific application (consultation), the user may wish to rapidly determine the effect of having different kinds of tooling available, without affecting the final conclusions of other subframes. Evaluating a subframe for its conclusions is sometimes called instantiation of the frame. Some authors require that a child frame can inherit the values in a parent frame, but that the parent frame cannot have access to parameter values in its child frames. They also may require that a frame have access to all rules in its descendent frames, but not to rules in its ancestor frames. These requirements can be used to control the dissemination of knowledge through the system. In a frame based system, the control system must also incorporate features for managing the use of frames. Metarules may be associated with controlling the order in which frames are activated (instantiated). A frame would thus also have a slot containing information on activating the frame, which would be inherited from its parent frame. Or it may have control slots for logic that decides on the local search path for its rule set. The frames must incorporate a pointer system for recording what rules are associated with it. Or a slot can be provided for every rule in a frame set. A slot may also define a procedure ("demon") which is automatically called for calculating something. It might be an algorithm, or simply a function, such as the determination of yield strength as a function of hardness.
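As a concrete illustration of these ideas, a frame can be rendered in a conventional language as a record whose fields are the slots, with links to parent and child frames and a function pointer acting as a demon. The C sketch below is hypothetical: the slot names, the values, and the hardness-to-strength formula are invented for illustration and are not taken from the text.

   /* A frame rendered as a C record: descriptive slots, links to parent and
      child frames, and a "demon" implemented as a function pointer.  The
      slot names, values, and the demon formula are assumed for illustration. */
   #include <stdio.h>

   #define MAXCHILD 4

   struct frame {
       const char   *name;
       struct frame *parent;                 /* "a kind of" link              */
       struct frame *child[MAXCHILD];        /* more specialized frames       */
       double        yield_strength_psi;     /* descriptive slots; 0 = unset  */
       double        hardness_bhn;
       double      (*demon)(const struct frame *); /* computed-slot procedure */
   };

   /* demon: a rough, purely illustrative estimate of tensile strength
      from Brinell hardness */
   static double strength_from_hardness(const struct frame *f)
   {
       return 500.0 * f->hardness_bhn;
   }

   /* property inheritance: walk up the parent chain until a slot is filled */
   static double get_yield_strength(const struct frame *f)
   {
       for (; f != NULL; f = f->parent)
           if (f->yield_strength_psi > 0.0)
               return f->yield_strength_psi;
       return 0.0;                           /* unknown */
   }

   int main(void)
   {
       struct frame steel = {"plain carbon steel", NULL, {0}, 0.0, 0.0, NULL};
       struct frame aisi  = {"AISI 1020", &steel, {0}, 43000.0, 120.0,
                             strength_from_hardness};
       steel.child[0] = &aisi;

       printf("yield strength of %s: %.0f psi\n",
              aisi.name, get_yield_strength(&aisi));
       printf("demon estimate of tensile strength: %.0f psi\n",
              aisi.demon(&aisi));
       return 0;
   }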


In general, slots can be used for any item of information or control associated with a frame, including character strings for displaying on the monitor. An incomplete example of a parent and child frame is shown in Figure 8.2. It is apparent from this example that the child frame must inherit, or have available, the information in the parent frame. A script is similar in concept to a frame. It is a standardized series of events, identified by standard event descriptions; so that scripts can be compared or matched by comparing the event sequence. The script can have branching, so that alternate outcomes are possible. A script might be used in an expert system for robot control in the following way.
IF (xxxxxxxxxx) THEN
   The robot will be directed to have sequence of actions (script) A
IF (sequence A = standard script number 7) THEN
   Action yyyyy

The use of frames and scripts is really just common sense; and an engineer would very likely revert to the use of them when appropriate, without realizing that he or she was using an esoteric artificial intelligence concept. 8.3 Blackboards A blackboard is again essentially a tool for working

FRAME NAME: AISI 1020 Plain Carbon Steel
  Parent - Plain carbon steel
  Child 1 - Hot rolled
  Child 2 - Cold drawn
  DESCRIPTIVE SLOTS
    Carbon content - 0.18 to 0.23%
    Manganese content - 0.30 to 0.60%
    Sulphur content - 0.05% maximum
  KNOWLEDGE STRUCTURE SLOTS
    First rule in slot - R43
    Control logic - . . . . .
    Demon - . . . . .
  CHARACTER STRING SLOTS
    Character string 1 - . . . . .
    Character string 2 - . . . . .

FRAME NAME: Hot Rolled AISI 1020 Plain Carbon Steel
  Parent - AISI 1020 plain carbon steel
  Child - nil
  SLOTS
    Tensile strength - 65,000 psi
    Yield strength - 43,000 psi
    Elongation - 36
    First rule in set - R209
    Control logic - . . . . .
    Demon - . . . . .
    Character string 1 - . . . . .

Fig. 8.2 Example of parent and child frames.


with data structures. It is basically a module, which may include a display, that dynamically records such items as the control system currently being used, the next sequence of rules that are to be examined, the current candidate conclusion that is being processed if backward chaining is to be used, any intermediate or final conclusions that have been established, and generally anything that indicates the current state of the system. The blackboard is particularly useful when the user can interact with the system while it is running. It is questionable if there is any real need for the term "blackboard"; the system designer would automatically do these things when required. 8.4 Interfacing with Commercial Data Management Software Most commercial data management software will accommodate interfacing with user written software. This may be limited to the facility of putting the data files in a standard format, so that they can be read by any high level or assembly language. An example is dBase III or IV. The most recent development in these systems is an attempt to establish a standard command language, called structured query language (SQL)1, which will work with different commercial systems which would otherwise be incompatible. Although this simplifies the interfacing problem, commercial systems have the disadvantage of being somewhat
1A general description of SQL can be found in PC Tech Journal, Vol. 5, No. 12, December, 1987.


inflexible, and carrying a lot of features and overhead that may be unnecessary for a custom expert system. The SQL commands can be incorporated as subprogram calls in most high level languages. A rather theoretical approach to integrating data structures into expert systems can be found in Addis (1985). References Addis, T. R. (1985). Designing Knowledge Based Systems, Prentice-Hall, Englewood Cliffs, N.J. Date, C. J. (1986). An Introduction to Database Systems, Vol. 1, 4th ed., Addison-Wesley, Reading, Mass. Horowitz, E. and Sahni, S. (1983). Fundamentals of Data Structures, Computer Science Press, 11 Taft Court, Rockville, MD 20850. Suggested Reading Data Structures Addis, T. R. (1985). Designing Knowledge Based Systems, Prentice-Hall, Englewood Cliffs, N.J., pp. 79-112.


Hayes-Roth, F., Waterman, D. A. and Lenat, D. B. eds., (1983). Building Expert Systems, Addison-Wesley, Reading, Mass., p. 86. Williams, G. (1981). Tree Searching; Part 1: Basic Techniques, Byte, Sept., p. 86. Frames Alty, J. L. and Coombs, M. J. (1984). Expert Systems; Concepts and Examples, NCC Publications, The National Computing Centre Ltd., Oxford Road, England, M1 7ED, pp. 67-71. Charniak, E. and McDermott, D. (1984). Introduction to Artificial Intelligence, Addison-Wesley, Reading, Mass., pp. 393-451. Jackson, P. (1986). Introduction to Expert Systems, Addison-Wesley, Reading, Mass., pp. 56-60, 142-147. Parsaye, K. and Chignell, M. (1988). Expert Systems for Experts, Wiley, N.Y., pp. 161-210. Schutzer, D. (1987). Artificial Intelligence; An Applications-Oriented Approach, Van Nostrand Reinhold, N.Y., pp. 27-31.


Blackboards Hayes-Roth, F., Waterman, D. A. and D. B. Lenat eds., (1983). Building Expert Systems, Addison-Wesley, Reading, Mass., p. 16. Problems 8.1 The purpose of this problem is to gain familiarity with data structured code for an expert system. Write a program using the algorithm given in this chapter, and set up the arrays to represent the animal identification system. However, you may, if you prefer, develop your own data structured algorithm. Add a segment to the program that can be used to add new items to the inputs, rules, and final conclusions. Make it a user's option, when program execution begins, to decide whether to run it as an expert system problem, or in a program development mode.


9 Expert Systems Incorporating Uncertainty


9.1 Introduction It might well be argued that all expert systems should incorporate uncertainty, since by definition they represent an intuitive decision making process; and all intuitive decisions are subject to uncertainty. However, as in all engineering work, some decisions can be considered, for practical purposes, to be based on certainty. And so it may be reasonable to treat some expert systems as exact decision making procedures. We need some simple notation before proceeding. The probability of an event A occurring, or being "true", will be represented as P(A) or P[A]. The "AND" combined probability of events A1, A2, and A3 will be shown as

P(A1A2A3)
The "OR" combination will have the form

P(A1 + A2 + A3)
Uncertainty may exist in an expert system rule in two possible ways. Type 1 Uncertainty There is uncertainty about the truth of the initial inputs, or at least some of them. This uncertainty is then propagated through the rules to the final conclusion


using some type of probability law for combined events. To illustrate this consider the rule in the animal identification example which says
IF (pointed teeth) AND (claws) AND (forward eyes) THEN [T1]

T1 was a dummy intermediate conclusion. The IF form of the rule would now be replaced by a probability expression.
P[T1] = P[(pointed teeth)(claws)(forward eyes)]

Although we could retain the following IF form, it is not as concise. We shall insert some numbers for illustration.
IF (pointed teeth is true with a probability of 0.70) AND (claws is true with a probability of 0.99) AND (forward eyes is true with a probability of 0.85) THEN [T1 is true with a probability of P(T1)]

We clearly need a combined probability rule to calculate P(T1); and such rules are one of our major concerns in this section. Type 2 Uncertainty The second type of uncertainty exists when there is uncertainty about the validity of the rule in all circumstances, even if the input events are certainly true. This implies that there is some additional required


evidence that is missing from the input to the rule. It will be important to distinguish between these two kinds of uncertainty. 9.2 Probability Concepts Probability and statistics is one of the most difficult scientific subjects to cope with, because it is based on opposing (and controversial) concepts, and also uses many alternative procedures with no clear criteria for selection of a preferable one. This difficulty complicates the application of uncertainty to expert systems. We shall examine alternative approaches that are commonly used in the literature, but some do not appear to be too applicable to engineering expert systems. Very often in the literature the choice of a probabilistic or statistical procedure is simply dictated by the one that is the most convenient to use. Probability theory is concerned with uncertain events. We cannot, for example, predict in advance if a component will fail within a specified time interval, or whether a resistor will have exactly a specified resistance. Such unpredictable events can be called random events. The most general definition of probability is that it is a measure of our uncertainty about the likelihood of an event occurring; and the convention is that it equals 1 when we are certain that it will occur, and 0 when we are certain


that it will not occur. Controversy begins when we begin to try to be more specific about how to define probability. It is common in engineering to think of the probability of an event as the expected relative frequency with which it will occur in the future. Thus if 10,000 "identical" machines are put in service, and the predicted probability of any one machine surviving, without breakdown, up to a specified life of 100,000 hr is 0.908, then it is meaningful to an engineer to say that we can expect about (10,000 - 10,000 × 0.908) or 920 of them to fail in that period. This is a useful way to interpret a probability figure, but not to estimate one. The classical definition of probability is that it is the limit of the ratio of the number of observed occurrences of an event to the total number of trials, as the number of trials increases without limit. A trial is the procedure of observing the possible occurrence of an event, in order to determine if it actually occurs or not. This is commonly called the frequency definition of probability. This is a rather abstract definition, since we never have an infinite number of trials, and rarely even a very large number in engineering work; but some mathematicians go even further and define probability as a set of abstract axioms, and develop the whole theory of probability from this. This may be an exciting approach to a mathematician, but to an engineer it is nonsense; and there is no need


for the two groups to use the same approach. In engineering work the only really plausible definition of probability is that it is a subjective judgement by an individual of the likelihood of an event occurring. It is a measure of a person's degree of belief that an event will occur. This is called subjective probability. Observed relative frequencies can be considered simply as part of the information that an individual uses as a basis for his or her probability judgement. This does not rule out relying completely on observed relative frequencies if no other information is available. An engineer is continually faced with making intuitive risk judgements, most often with little or no frequency of occurrence data to assist in the decision. If this is difficult to accept, how else do we justify the use of factors of safety, which are a crude way of codifying risk of failure based solely on experience and judgement? We shall briefly review some of the simple probability theorems used for calculating the probabilities of combined events. If we define A as the occurrence of an event, and Ā as the non-occurrence of the event, then

P(A) + P(Ā) = 1
If we have two events A1 and A2 then

P(A1A2) = 1 - P(Ā1 + Ā2)

or

P(A1 + A2) = 1 - P(Ā1Ā2)                                    (9.2.3)
These can be extended to more than two events, and thus we can always convert an AND combination to an OR combination, or vice versa. The general OR combination expression is

This can also be extended to more than two events, but the form becomes somewhat complicated.

If the events are mutually exclusive, then

P(A1 + A2) = P(A1) + P(A2)
The general AND combination expression is

P(A1A2 . . . An) = P(A1) P(A2) . . . P(An)                  (9.2.7)

However equation (9.2.7) only applies if the events are all stochastically independent. Stochastic independence is not a very easy concept to grasp. An event A1 is independent of an event A2 if P(A1) is not affected by the occurrence or non-occurrence of A2. It may be helpful to think of it in frequency terms. Suppose that we have a number of trials, say nt, in which both A1 and A2 are being observed, and n1 and n2 are the number of occurrences of A1 and A2 respectively. Then, if A1 and A2 are independent, the ratio n1/nt should not be significantly different from the ratio n1,2/n2, where n1,2 is the number of trials in which both occur. If they are dependent events then the relative frequency measure of the probabilities would be

P(A1) = n1/nt          P(A1|A2) = n1,2/n2
where P(A1|A2) represents the probability of A1 occurring, given that A2 has occurred, or the conditional probability of A1. It follows that if the events are independent, then

P(A1|A2) = P(A1)
Now, if A1 and A2 are dependent, then the AND rule for combinations is

P(A1A2) = P(A2) P(A1|A2)

The general AND rule for n dependent events is the following.

P(A1A2 . . . An) = P(A1) P(A2|A1) P(A3|A1A2) . . . P(An|A1A2 . . . An-1)
By playing games with these basic relationships, many more elaborate theorems can be derived (Siddall, 1983). We need one more important relationship.

P(A1A2) = P(A2) P(A1|A2) = P(A1) P(A2|A1)                   (9.2.11)
Bayes' theorem is commonly used with expert systems incorporating probability. We equate the two versions of (9.2.11) as follows.

P(A1|A2) = P(A1) P(A2|A1) / P(A2)
This is one form of Bayes' theorem, and it is just another probability relationship, until the quantities are given a special definition. Bayesian statistics is sometimes considered to imply a completely different definition of


probability, and a Bayesian group of mathematicians in probability and statistics consider their methods quite distinct from the classical relative frequency school of thought. See for example Lindley (1965) and Savage (1954). The basic difference is that they accept the idea that probabilities can be subjective, although not in quite as general a sense as our subjective definition given above. The probability P(A1) is called the prior probability of A1, and it is accepted that P(A1) can be given a value subjectively, without support of frequency data. This would violate the strict relative frequency definition of probability, given above. P(A1|A2) is called the posterior probability of A1, and event A2 is defined as new data that becomes available which provides evidence about the occurrence of A1. Thus P(A1) is the probability estimate of A1 prior to knowing A2, and P(A1|A2) is the estimate after A2 is observed or known. In our use of Bayes' theorem it is helpful to think of A1 as a conclusion, and A2 as evidence, or a previous intermediate conclusion. So we shall change our symbols, replacing A1 with C and A2 with E. Now Bayes' theorem has the form

P(C|E) = P(E|C) P(C) / P(E)                                 (9.2.15)
So Bayes' theorem can be thought of as a means of updating


the estimate of P(C), given new evidence. Conceptually, P(E) may be somewhat difficult to cope with, or it may be difficult to estimate. Another result of playing with basic probability expressions leads to

P(E) = P(E|C) P(C) + P(E|C̄) P(C̄)
And now (9.2.15) has the form

P(C|E) = P(E|C) P(C) / [P(E|C) P(C) + P(E|C̄) P(C̄)]         (9.2.17)
It may be easier to estimate P(E|C) and P(E|C̄) rather than P(E).

Bayes' theorem only applies when there is uncertainty about the validity of the hypothesis, even if all of the evidence is certain. So it is type 2 uncertainty. We shall later go into more detail on applying Bayes' theorem to expert systems, but it will help to understand the meaning of the various probabilities if we look for a moment at the animal identification system, and imagine there is uncertainty about the validity of the rules, simply because all of the inputs provided are not sufficient evidence to be sure of any conclusion, even though all available evidence is known for certain. When Bayes' theorem is applied to a rule, only one input of evidence or intermediate conclusion is considered at a time; we shall


see in Section 9.5 how the attributes are combined. Let us consider Rule 10.
IF [mammal] AND (tawny color) AND [carnivore] AND (black stripes) THEN [tiger]

If we use (9.2.15), and consider E as (black stripes) and C as [tiger], then it has the form

P([tiger]|(black stripes)) = P((black stripes)|[tiger]) P[tiger] / P(black stripes)
P(C), or P[tiger], is interpreted as the probability that any animal observed will be a tiger, if no evidence at all is available. This can be considered to be the ratio of all tigers observed to all animals observed, over many trials with the system. However, if even this knowledge is not available, it can be taken as the ratio

P[tiger] = 1/(total number of species included in the system)
Thus each animal is considered equally likely to occur, if there is no evidence. This is a common assumption for the prior probability in the use of Bayes' theorem; and it can be argued that it is a weakness of the procedure, since it ignores the possibility of using intuitive judgement. We always know something about the risk associated with the prior event.


We may be confused about the meaning of P(E), or P(black stripes). In this context it does not mean that we are uncertain about whether or not the stripes observed are black, or if they are stripes or spots; we assume that we know black stripes for certain when we see them. Rather P(E), when using Bayes' theorem, estimates the fraction of all animals observed that would have black stripes. We must make some kind of estimate of this, either based on observed frequencies or pure judgement. The final one is P((black stripes)|[tiger]), and in this case it must be 1, because all tigers are presumed to have black stripes. However some species have variations which could make this kind of probability less than 1. For example, if "grey squirrels" (which is a species name) were in the system, P((grey color)|[grey squirrel]) would be less than 1, because a certain fraction of "grey squirrels" (as a species name) are actually black rather than grey. In our example, (9.2.17) becomes

P([tiger]|(black stripes)) = P((black stripes)|[tiger]) P[tiger] / {P((black stripes)|[tiger]) P[tiger] + P((black stripes)|[not a tiger]) P[not a tiger]}
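The arithmetic of this updating is easily programmed. The short C sketch below applies Bayes' theorem with the expanded denominator to the tiger rule; all of the numerical values are assumed purely for illustration and are not taken from the text.

   /* Bayes' theorem with the denominator expanded, applied to the tiger rule.
      All numerical values are assumed for illustration only. */
   #include <stdio.h>

   /* posterior P(C|E) from prior P(C), P(E|C) and P(E|not C) */
   static double bayes_posterior(double p_c, double p_e_c, double p_e_notc)
   {
       double p_e = p_e_c * p_c + p_e_notc * (1.0 - p_c);
       return p_e_c * p_c / p_e;
   }

   int main(void)
   {
       double p_tiger            = 1.0 / 7.0;  /* assumed: 7 equally likely species */
       double p_stripes_tiger    = 1.0;        /* all tigers have black stripes     */
       double p_stripes_nottiger = 0.10;       /* assumed fraction of other animals */

       printf("P(tiger | black stripes) = %.3f\n",
              bayes_posterior(p_tiger, p_stripes_tiger, p_stripes_nottiger));
       return 0;
   }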

9.3 Use of Subjective Probability and Probability Laws in Expert Systems In this approach we argue that, in engineering at least, probabilities associated with inputs to an expert system should be considered subjective, in the sense described in section 9.2. The user providing the inputs is not hampered by any need to cling to relative frequencies as the source of these estimates, but rather can call on all of his or her experience and judgement in codifying risk estimates. The user of the expert system is best qualified to make these type 1 risk judgements, since the user is making the observations and reporting on them to the system. It may be appropriate in some systems for the system to provide some advice or guidance. On the other hand, for type 2 uncertainty, related to the probability that a rule will fire even if all input arcs are true, it is the system expert that can and should make the risk estimates; it is part of his or her specialized expertise. It may not be possible to codify this expert risk knowledge except by intuitive decisions; and yet it would be folly to discard this unconscious knowledge because it cannot be supported by extensive controlled trials. Engineers are constantly making decisions on the basis of intuition. The procedure described below is mathematically more rigorous than other methods mentioned in the literature. We can now begin to construct algorithms to propagate


these subjective probabilities through the system to the final conclusions. There will not now be a discrete number of unique final conclusions; many or all of them will be candidates, and the choice will be related to their relative probabilities of being true. We shall use the same notation, where

C = event of an intermediate or final conclusion associated with a node being true
Ei = event of the ith lower level premise associated with a node being true - the premise may be either an input or an intermediate conclusion
E = event of the combined Ei's being true
Ē = event of the combined Ei's not being true
P[C|E] = probability of the rule firing if the combined premises of the rule are true - value assigned by the expert
P[C|Ē] = probability of the rule firing if the combined premises of the rule are not true - value assigned by the expert
P(E) = probability that the combined premises are true
P(Ei) = probability that the ith premise is true - if the premise is an evidential input, the user assigns the value
P[CE] = probability that the conclusion at a node is true and the combined premises are true
P[CĒ] = probability that the conclusion at a node is true and the combined premises are not true
P[C] = probability that the rule fires

It should be noted that P[C|E] and P[C|Ē] are being interpreted quite differently than when used in Bayes' rule; they are considered simply conditional probabilities. Also P(E) is now considered a straightforward uncertainty about the evidence, as demonstrated below. We first invoke equation (9.2.13), converted to the present terminology.

P[C] = P[CE] + P[CĒ]

Next we use (9.2.12) for both terms on the right hand side to get the probability that the rule will fire, based on combined type 1 and type 2 uncertainty.

P[C] = P(E) P[C|E] + P(Ē) P[C|Ē]
     = P(E) P[C|E] + [1 - P(E)] P[C|Ē]                      (9.3.2)
It is conceptually helpful in understanding the meaning of the above quantities if we imagine that we are engaged in a large number of trials, in each of which the inputs are observed to be the same.

nt = total number of trials
nC = number of trials in which the rule fires
nE = number of trials in which the premise is true
nĒ = number of trials in which the premise is not true
nC,E = number of trials in which both the premise is true and the rule fires
nC,Ē = number of trials in which the rule fires and the premise is not true

The probabilities in (9.3.2) must reflect the following frequency ratios.

P[C] = nC/nt     P(E) = nE/nt     P[C|E] = nC,E/nE     P[C|Ē] = nC,Ē/nĒ

The expression reduces to an identity. P[C|E] and P[C|Ē] are estimated for each node by the system expert. We are now faced with the problem of evaluating P(E), the remaining unknown on the right side of (9.3.2). For an AND gate, the expression is

P(E) = P(E1E2 . . . En)                                     (9.3.4)
where n is the total number of premises entering the gate. If Ei is a bottom input, then it has been given a value by the user. If it is the intermediate conclusion from a lower node, then it will have been evaluated in a previous calculation; otherwise the current node must be tagged unknown. By using (9.2.12), the general expansion of (9.3.4) is

P(E) = P(E1) P(E2|E1) P(E3|E1E2) . . . P(En|E1E2 . . . En-1)       (9.3.5)

If all Ei's are independent, then this equation reduces to

P(E) = P(E1) P(E2) . . . P(En)
The full dependence that is implied by (9.3.5) would be rare, and if dependence exists it would more likely be limited to a subset, with a result like the following example, where only E1, E2 and E3 are dependent.

P(E) = P(E1) P(E2|E1) P(E3|E1E2) P(E4) P(E5) . . . P(En)
If the node is an OR gate, then the expression for P(E) is

P(E) = P(E1 + E2 + . . . + En)
In this case it is more convenient to convert it to an AND combination, using an extension of (9.2.3).

P(E) = 1 - P(Ē1Ē2 . . . Ēn)
Now P(Ē1Ē2 . . . Ēn) can be treated just like P(E1E2 . . . En) in equation (9.3.5).
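If the premises can be taken as independent, the propagation at a single node reduces to a few lines of code. The following C sketch (with invented probability values) computes P(E) for an AND gate and for an OR gate, and then applies equation (9.3.2) to obtain the probability that the rule fires.

   /* Propagation of type 1 and type 2 uncertainty through one node, assuming
      independent premises.  The numerical values are invented for illustration. */
   #include <stdio.h>

   /* P(E) for an AND gate: product of the premise probabilities */
   static double p_and(const double p[], int n)
   {
       double prod = 1.0;
       for (int i = 0; i < n; i++) prod *= p[i];
       return prod;
   }

   /* P(E) for an OR gate: 1 minus the product of the complement probabilities,
      i.e. the conversion of the OR combination to an AND combination */
   static double p_or(const double p[], int n)
   {
       double prod = 1.0;
       for (int i = 0; i < n; i++) prod *= 1.0 - p[i];
       return 1.0 - prod;
   }

   /* equation (9.3.2): P[C] = P(E) P[C|E] + (1 - P(E)) P[C|not E] */
   static double p_rule_fires(double p_e, double p_c_e, double p_c_note)
   {
       return p_e * p_c_e + (1.0 - p_e) * p_c_note;
   }

   int main(void)
   {
       double prem[3] = {0.70, 0.99, 0.85};   /* assumed premise probabilities */

       double pe_and = p_and(prem, 3);
       double pe_or  = p_or(prem, 3);

       printf("AND gate: P(E) = %.3f, P[C] = %.3f\n",
              pe_and, p_rule_fires(pe_and, 0.95, 0.01));
       printf("OR gate : P(E) = %.3f, P[C] = %.3f\n",
              pe_or, p_rule_fires(pe_or, 0.95, 0.01));
       return 0;
   }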


Dependent probabilities are a considerable complication. If Ei is an input observed by the user, and it is independent, then estimating its associated probability is a rather straightforward exercise of judgement. If it is an observed input dependent on other inputs, say E5|E4, then the user could ask himself or herself - over the total number of times that I might observe a set of inputs, what fraction of E4 observations would I expect to also include E5? The user may be able to make a meaningful judgement of this. Fortunately, more complicated dependencies are likely uncommon. The problem becomes nastier if some or all of the premises to the node are intermediate conclusions, with their roots in dependent observations, or observations shared by other intermediate conclusions. The only current method to handle this appears to be Monte Carlo simulation. We shall discuss this below, but first let us imagine applying the above procedure to our two earlier examples - the animal species identification, and the fault analysis systems. The animal species system is a somewhat difficult example to use for illustration because it is incomplete. However, let us imagine that it also included bats and turtles. "Flies" and "lays eggs" would occur together whenever the object was a bird; and "flies" and "claws" would occur together whenever it was a bat; and "lays eggs'' and "swims expertly" whenever a turtle. There is clearly some dependence involved in some sense. But it does not


involve our definition of type 1 uncertainty,1 which is concerned only with how likely the user is to distinguish whether, for example, an animal can "fly", "fly well", or "cannot fly". He or she could be fooled by a flying squirrel or a flying fish, which can glide but not really fly; or the user might identify the flight of a Canada goose as "flies well", whereas the expert may not have put it in this category. So, in the sense that we are applying type 1 uncertainty, all inputs are independent. This would also be true of the fault identification system; and very likely of most engineering expert systems. Other procedures, discussed below, generally ignore the effect of dependent probabilities. They assume that the effect is negligible, particularly if the only concern is with relative magnitudes of probabilities associated with final conclusions. We can gain insight into the use of type 2 uncertainty by again looking at our examples. In the animal identification system, it would seem possible that a biologist could quote a rule which would share the same premises with another rule, if minor but unobtainable information is required to distinguish them. The biologist might be able to quote a probability that the conclusion would be one rather than the other. Similarly, there may be uncertainty about the following rule from the power plant fault system.
1This distinction is important also in comparing expert system logic networks with reliability fault tree networks. The two networks are discussed further at the end of this section.

IF (motor 1 fails) OR (valve 1 closes inadvertently) OR (valve 2 opens inadvertently) THEN [no. 1 motor/valve failure]

There may be some possibility that the no. 1 motor/valve will not fail, even though the premises are true. This would correspond to P[C|E]. It is also possible that no. 1 motor/valve will fail even though the premises are not true, because of some (probably rare) unknown cause. An estimate of this gives P[C|Ē]. Consider also an example outside the field of engineering. In the well known MYCIN system (Hayes-Roth, Waterman and Lenat 1983, Shortcliffe 1976) for medical diagnosis, there is the following rule.
IF (the infection is primary-bacteremia) AND (the site of the culture is one of the sterile sites) AND (the suspected portal of entry of the organism is the gastro-intestinal tract) THEN [there is suggestive evidence that the identity of the organism is bacteroids with a likelihood2 of 0.7]

2MYCIN actually uses "approximate implication" rather than probabilities.

Because of the close similarity between expert system logic networks and reliability fault tree networks, it will be useful to clarify the difference between them. In a reliability fault tree there is normally no type 2 uncertainty.


The only uncertainty is in the bottom component failures, which is similar to, but not quite the same as, type 1 uncertainty in expert systems. In fault trees, the uncertainty is whether the bottom component will fail or not. In expert systems, it is whether the evidence (failure) has been observed correctly or not. Thus, for the system in Figure 9.1, if it is a reliability fault tree, the probability of the top event is given by

Fig. 9.1 Comparison of reliability fault trees and expert system logic networks.


The expert system represented by Figure 9.1 would, with no type 2 uncertainty, have the following probability for the top event.

9.4 Monte Carlo Simulation Monte Carlo simulation is the most general procedure for solving any combined probability problem (Siddall, 1983). It does not require the use of equations, only definitions of combined events. For example, P(E1E2 . . .En) can be evaluated without recourse to equation (9.3.7). It can therefore rigorously solve systems containing inputs that are conditional probabilities. The algorithm is quite simple. A sequence of trials are executed that simulate a real life case. The probability of an input occurring will be represented by P(Ei). The inputs are given appropriate assumed probabilities. The algorithm begins, for a trial, with the inputs; each one is simulated by generating a random number uniformly distributed between 0 and 1. Most computer systems have a random number generator in their library; if not, one can easily be written (Tocher, 1963 or Rubinstein,


1981). If the random number is less than P(Ei), then it is assumed that the input is true for the current trial. All nodes are then systematically processed, in an order defined by the control structure. The definition of the event that a node fires is applied to the node. E will be true for an AND node if (E1E2 . . .En) is true, and for an OR node if (E1+E2+ . . .+En) is true. We can determine if C is true by using two new random numbers. If E is true and the first number is less than the preassigned value of P[C|E], or E is false and the second number is less than , then for this trial C is true. An iteration will end with one final conclusion being true. This procedure must be repeated thousands of times, giving statistical data on the relative frequency of occurrence of each final conclusion. This, then, yields an estimate of their probabilities. The procedure is given in detail in the following algorithm. It uses a function for random number generation called RAN(NSEED), where the argument, NSEED, is the starting integer for the generator. Algorithm Monte Carlo Simulation System Inputs
NO_INPUTS = number of inputs NO_RULES = number of rules or logic gates

P[Cj|E] = probability of the jth rule firing if the combined premises are true, j = 1 to NO_RULES
P[Cj|Ē] = probability of the jth rule firing if the combined premises are not true

The control structure is used to search through nodes. This would normally be the same control structure used in the actual system; however, it must evaluate all nodes and conclusions.
NSEED = seed number for random number generator P(Ei ) = array of type 1 probabilities for inputs NSAMP = sample size

Internal Variables
NODE(I) = "AND" , if ith node is AND type
        = "OR" , if ith node is OR type
INPUT(I) = array of simulated inputs for each simulation trial
        = 1 , if input is true
        = 2 , if input is false
En      = 1 , if combined Nth node premises fire in the simulation
        = 0 , if combined Nth node premises do not fire
        = 2 , Nth node (rule) has not been tested
NO_FIRE(N) = counter for number of times the Nth node has fired, initialize at 0

Output
PN = probability that the Nth node (rule) will fire; this includes rules for the final conclusions, which will be a subset of the array


Procedure
1. # Repeat simulation for NSAMP trials #
   DO for M = 1 to NSAMP
2.    # Generate inputs; initialize all En to 2 #
      DO for i = 1 to NO_INPUTS
         IF (RAN(NSEED) > P(Ei)) THEN       # Simulate firing of inputs #
            INPUT(i) = .false.
         ELSE
            INPUT(i) = .true.
         ENDIF
      ENDDO
3.    # Using the control structure, do an exhaustive test of all nodes #
      # Current node is the Nth node #
      Check each node in the usual way to see if it would fire or not,
      if the system were determinate.
         En = 1 , in a determinate system, node would fire
         En = 0 , in a determinate system, node would not fire
4.    IF (En = 1 AND RAN(NSEED) < P[C|E]) OR (En = 0 AND RAN(NSEED) < P[C|Ē]) THEN
         NO_FIRE(N) = NO_FIRE(N) + 1
      ENDIF
5.    # Try another node #
      IF (all nodes are known) THEN
         Go to 6
      ELSE
         Select next node using control structure
         Go to 3
      ENDIF
6. ENDDO
7. # Calculate probabilities of each node (rule) firing #

   DO for N = 1 to NO_RULES
      PN = NO_FIRE(N)/NSAMP
   ENDDO
8. END
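As a concrete illustration, the following C sketch estimates by Monte Carlo simulation the probability that a single AND rule fires, with type 1 uncertainty on three inputs and type 2 uncertainty on the rule itself. The net and all probability values are invented for illustration, and the standard library function rand() stands in for the random number generator RAN(NSEED).

   /* Monte Carlo estimate of the probability that one AND rule fires, with
      type 1 uncertainty on three inputs and type 2 uncertainty on the rule.
      The net and all probability values are assumed for illustration only. */
   #include <stdio.h>
   #include <stdlib.h>

   #define NSAMP 100000

   static double ran(void)            /* uniform random number in [0,1) */
   {
       return (double)rand() / ((double)RAND_MAX + 1.0);
   }

   int main(void)
   {
       const double p_input[3] = {0.70, 0.99, 0.85};  /* type 1 probabilities */
       const double p_c_e    = 0.95;  /* P[C|E]     : rule fires, premises true  */
       const double p_c_note = 0.01;  /* P[C|not E] : rule fires, premises false */
       long n_fire = 0;

       srand(12345);                  /* seed for reproducibility */
       for (long t = 0; t < NSAMP; t++) {
           int e = 1;                 /* simulate the three inputs */
           for (int i = 0; i < 3; i++)
               if (ran() >= p_input[i]) e = 0;    /* this input is false */

           /* simulate the firing of the rule itself (type 2 uncertainty) */
           double p_fire = e ? p_c_e : p_c_note;
           if (ran() < p_fire) n_fire++;
       }

       printf("estimated  P[C] = %.4f\n", (double)n_fire / NSAMP);
       printf("analytical P[C] = %.4f\n",
              0.70 * 0.99 * 0.85 * p_c_e + (1.0 - 0.70 * 0.99 * 0.85) * p_c_note);
       return 0;
   }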

This is a bare bones algorithm, which does not take into detailed consideration the possibility in most systems that, at some stage, one or more premises of a rule are undefined (En=2), and the rule is correspondingly undefined. If the expert system is at all large and complicated, it is clear that the Monte Carlo method will consume a large amount of computer time. As a rough rule of thumb, it requires about 10,000 trials in order to estimate probabilities to three significant figures. Nevertheless it may still have important uses. If the system designer is concerned about the significance of conditional probabilities, then a few simulations with some typical inputs will give an indication of the error caused by using the analytical methods, such as the one described in the previous section, when the effect of stochastic dependence is ignored. A second use during the design stage, is to again use a few simulations to confirm that an analytical method has used the correct probability expressions in the rules. 9.5 Bayes' Theorem in Expert Systems Bayes' theorem is used for type 2 uncertainty - uncertainty in the rule. We shall begin with a general AND


rule, represented by
IF (E1)(E2)...(En) THEN [C]

Bayes' theorem is applied by first assuming that there is only one premise, E, and it is known to be true or false. Using the version of Bayes' theorem given by equation (9.2.17), we get

P[C|E] = P(E|C) P[C] / {P(E|C) P[C] + P(E|C̄) P[C̄]}         (9.5.1)
We also need P[C|Ē], in the event that the premise is not true. It is not independent of P[C|E], and is calculated by an expression analogous to (9.5.1), in which E is replaced by Ē.

P[C|Ē] = P(Ē|C) P[C] / {P(Ē|C) P[C] + P(Ē|C̄) P[C̄]}         (9.5.2)
We also have the following relationships by the definition of probability.

P(Ē|C) = 1 - P(E|C)                                         (9.5.3)
P(Ē|C̄) = 1 - P(E|C̄)                                        (9.5.4)
P[C̄] = 1 - P[C]                                             (9.5.5)

It is important to note that P[C|E] and P[C|Ē] have a different meaning here than in Section 9.3. The system expert now assigns values for each node to P[C], P(E|C) and P(E|C̄). Equations (9.5.3), (9.5.4) and (9.5.5) are then used to calculate the remaining unknowns in (9.5.1) and (9.5.2).

Where E is comprised only of user provided evidence, with no uncertainty in the observation of E, this procedure is rigorous. However, when nodes are at a higher level in the network, where premises are uncertain because they are outputs from lower nodes, a rigorous application of Bayes' theorem fails. Now E is the combined result of C|E from incoming arcs from lower nodes, as indicated in Figure 9.2, where E for node 4 is the result of combining E1, E2 and E3, which are posterior events from lower nodes. This

Fig. 9.2 Transmission of events through a network.


implies uncertainty for E coming into a node, which Bayes' theorem cannot handle. One procedure for handling this is a heuristic method used in PROSPECTOR (Duda, Gaschnig and Hart 1979 and Alty and Coombs 1984). The expert assigns probabilities P[C], P(E|C) and P(E|C̄) to all nodes. The corresponding values of P[C|E] and P[C|Ē] are calculated as shown above. These values are then "adjusted" during a given application of the system using an algorithm based on Figure 9.3. P(E) is the probability assigned E by the expert in the usual way for Bayes' theorem. It can be based on the expression

P(E) = P(E|C) P[C] + P(E|C̄) P[C̄]                           (9.5.6)
On the other hand, P(P) is the actual propagated probability of the premise, indicated by Figure 9.2. The probability that the rule will fire is now P[C |P]. Using Figure 9.3, we can calculate P[C|P] as follows.
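The calculation usually described for this adjustment is a piecewise-linear interpolation; a sketch, under the assumption that the book's Figure 9.3 is the construction used in PROSPECTOR, together with the expression (9.5.6) for P(E) referred to above, is

\[
P(E) = P(E\mid C)\,P[C] + P(E\mid\bar C)\,\bigl(1 - P[C]\bigr),
\]
\[
P[C\mid P] =
\begin{cases}
P[C\mid\bar E] + \dfrac{P[C] - P[C\mid\bar E]}{P(E)}\,P(P), & P(P) \le P(E),\\[8pt]
P[C] + \dfrac{P[C\mid E] - P[C]}{1 - P(E)}\,\bigl(P(P) - P(E)\bigr), & P(P) > P(E),
\end{cases}
\]

so that P[C|P] passes through P[C|Ē] at P(P) = 0, through P[C] at P(P) = P(E), and through P[C|E] at P(P) = 1.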

The problem of type 1 uncertainty for input evidence is also somewhat difficult with Bayes' theorem. It is con-


Fig. 9.3 Obtaining the probability of the premise event of a node.

ceptually very difficult to separate out the meaning of P(E) as a frequency ratio in the sense in which it is used in Bayes' theorem, and P(E) as a subjective probability, for which the user has uncertainty as to whether he or she has correctly observed the evidence. It becomes even more difficult at intermediate levels, where E is an intermediate conclusion from a lower level in the logic chart, and may combine evidence with lower level conclusions. It has been proposed in the literature [Alty and Coombs (1984) and Forsyth (1984)] that heuristic methods be used incorporating certainty factors. The concept of a certainty factor is introduced to represent the user's degree of belief that the evidence has been correctly observed. It is arbitrarily given the range -5 (certain it is not true) to +5 (certain it is true), with 0 corresponding to "don't know". The graph relating


the certainty factor, CF, and the probability that the evidence is true is shown in Figure 9.4. P(E) is based on the usual Bayes' theorem definition of the probability of the evidence being true, and can also be calculated using (9.5.6). The expressions are
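A sketch of the piecewise-linear mapping that Figure 9.4 appears to describe, writing P′(E) for the probability actually assigned to the evidence; the specific form is an assumption based on the stated end points CF = -5, 0 and +5.

\[
P'(E) =
\begin{cases}
P(E) + \dfrac{CF}{5}\,\bigl(1 - P(E)\bigr), & 0 \le CF \le +5,\\[8pt]
P(E)\,\Bigl(1 + \dfrac{CF}{5}\Bigr), & -5 \le CF < 0,
\end{cases}
\]

which gives P′(E) = 0 at CF = -5, the prior P(E) at CF = 0 ("don't know"), and 1 at CF = +5.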

This procedure for handling Type 1 uncertainty is non-probabilistic, and would likely be used by an engineer with some reluctance. The final problem with Bayes' theorem is how to handle combined premises at a gate. Naylor (1984) has suggested a

Fig. 9.4 Converting an input uncertainty to a probability.


procedure for AND gates which treats each premise sequentially as additional accumulated evidence. The general AND rule has the form
IF (E1) AND (E2) AND . . . AND (En) THEN [C]

Bayes' theorem is applied by first imagining that only E1 has been observed. Using the version of Bayes' theorem given by equation (9.2.17), we get

C1 is the posterior event after the first updating by E1, and now becomes the prior event for the subsequent updating by E2. The general iterative expression is
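Writing Pk[C] for the probability of the conclusion after the kth premise has been accepted as true, with P0[C] = P[C], a sketch of the iteration being described (the indexing is an assumption) is

\[
P_k[C] = \frac{P(E_k\mid C)\,P_{k-1}[C]}
{P(E_k\mid C)\,P_{k-1}[C] + P(E_k\mid\bar C)\,\bigl(1 - P_{k-1}[C]\bigr)},
\qquad k = 1, 2, \ldots, n,
\]

which treats the premises as conditionally independent given C and given C̄.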

And after the last updating,

The expert must provide estimates of all P(Ei|C) and P(Ei|C̄)


probabilities. This procedure requires that E1, E2, . . ., En be independent; the method cannot handle dependence. It is not clear from the literature how some proponents of Bayes' theorem apply the method with an OR gate. It could be argued that there is really only one item of evidence for updating the prior probability, and that is E, where

Thus the probability of an OR rule firing would be
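A sketch of what such a combination would look like, assuming the items of evidence are treated as independent:

\[
E = E_1 + E_2 + \cdots + E_n, \qquad
P(E) = 1 - \prod_{i=1}^{n}\bigl(1 - P(E_i)\bigr),
\]

with Bayes' theorem then applied once, using this E as the single item of evidence.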

This is philosophically not easy to swallow, because of the way the E's are defined. PROSPECTOR, which applies Bayes' theorem as described earlier, uses a very simple heuristic procedure for both AND and OR gates, one that an engineer would also adopt only reluctantly: for an AND gate the minimum premise probability is used, and for an OR gate the maximum.

9.6 MYCIN Method

The MYCIN method does not seem to be well adapted


to engineering expert systems; however it is mentioned briefly here because it was one of the earliest expert systems developed (Shortcliffe 1976), and is frequently discussed in the literature. It is an application in the field of medical diagnosis. The uncertainty procedure in MYCIN was developed to respond to the claim that, in medical diagnostic systems at least, and inferentially in many others, the expert that is assigning levels of uncertainty in rules is unwilling to accept the following axiom from probability theory.
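The axiom in question is presumably the complement rule for conditional probabilities,

\[
P[\,C \mid E\,] + P[\,\bar C \mid E\,] = 1 ,
\]

which is exactly what the two examples below exercise.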

Thus for an engineering example in fault analysis, if the expert estimated that the probability for a rule was as follows,
P[motor/valve failure being true|motor fails OR valve closes inadvertently OR valve fails to close] = .94

Then the expert would not necessarily assume that the following relationship is correct.
P[motor/valve failure being false|motor fails OR valve closes inadvertently OR valve fails to close] = 1 - .94 = .06

Another example, which is non-diagnostic, is from material


selection.
P[material X is best choice|set of specified performance characteristics Y] = 1 - P[material X is not best choice|set of specified performance characteristics Y]

These examples are presented only to illustrate what Shortcliffe claimed might represent inconsistencies, if probability theory were used. They are not examples where an engineer would have difficulty accepting the formalism of probability theory, and indeed it is difficult to formulate such an example. If an engineering expert did encounter such an anomaly, his best approach would be to reexamine the nature of the items of evidence in the rule, in the conviction that the anomaly must be due to missing evidence. Shortcliffe also seems to have developed his method because he believed that conventional probability theory is not valid unless the input probabilities are based strictly on adequate frequency data. We have argued in Section 9.1 that this concern is unjustified if we accept the concept of subjective probabilities; and it was argued further that this concept should be accepted in engineering systems. Certainty factors are assigned by the expert to each rule, and by the user to each item of evidence. Each certainty factor would equal -1 if the expert, or user, is certain the conclusion will not be true, would equal +1 if he or she is certain that the conclusion will be true. It


would be set to 0 if it is believed that the ith input has no effect. Heuristically derived expressions are used to propagate the uncertainty through the system. All evidence or intermediate conclusion items are assumed to be applied or observed sequentially, and in this sense the method resembles the sequential use of Bayes' theorem. The method is too complex to justify a detailed explanation, and the reader is referred to Shortcliffe (1976).

9.7 Control Structures with Uncertainty

When an expert system incorporates uncertainty, it is theoretically necessary to do an exhaustive search of all rules, and then select the final conclusion having the highest probability of being true. This could involve considerably more computer time than for a determinate system which incorporates a control structure other than straightforward exhaustive search, such as forward or backward chaining. However, reasonable procedures can be used to improve this situation, and may actually give a better control structure than would be possible with a corresponding determinate system. Once a rule set has been developed, it is useful to do an expertise evaluation by first setting all evidential inputs to a probability of one or zero, while maintaining consistency, and then doing an exhaustive search. Each final conclusion will have its highest possible probability


of being true in any application. A sample of different applications would also be required. The system designer then must review these, and decide if the system is sufficiently positive about most conclusions for it to be really considered "expert". If, for example the best possible probability of a conclusion being true was .44, then the system designer should confront the expert with this. If the expert insists that this is too low, and that he is certain that his expert judgements have much higher possible maximum probabilities, then it will be necessary for the expert to review his rule probabilities, and attempt to assign values that lead to more consistent conclusion probabilities. However, in the worst case, the expert may be forced to the realization that his judgements are not as valid as he thought, and the expert system is not feasible. The results of this evaluation can also be used for the control structure. A decision threshold is subjectively established for each final conclusion, which is a percentage of the maximum value determined above. All final conclusions would not necessarily have the same threshold, but a simple common figure could be used, such as "the first conclusion that has a probability exceeding 0.90 times its maximum possible value will be accepted as the true result". In some systems it is desirable to determine the evidence sequentially, and do a probabilistic analysis after each input. Weiss and Kulikowski (1984) have sugges-


ted some considerations for ordering questions in this kind of system.

1. Each item of evidence is preassigned a "cost" measure in some sense. It may be an estimate of the relative difficulty of answering the prompt, particularly if it involves doing some tests, or performing a difficult or time consuming observation. The least cost evidence is determined at each stage.

2. If Bayes' theorem is being used, then at each stage there may be one or more intermediate conclusions at an AND type node that have partially established probabilities. It would be reasonable to select the next item of evidence that contributes to the highest ranked of these.

3. Using the threshold probability referred to above, sequential consideration of evidence is terminated as soon as any hypothesis reaches its threshold.

Naylor (1984) has suggested a procedure for selecting the order with which bottom nodes are considered, when using sequential evidence. Each node is given a value, defined as follows.

The summation on j is for all rules affected by the ith remaining evidence. P[Cj|Ei] is defined as the probability that the jth node conclusion is true, given that only the ith remaining evidence is known and is true; and P[Cj|Ēi] is the probability that the jth node conclusion is true, given that only the ith remaining evidence is known and is false.
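One value measure consistent with this description (an assumption, not a quotation of Naylor's formula) is the total swing in the affected node conclusions,

\[
V_i = \sum_j \bigl|\,P[C_j\mid E_i] - P[C_j\mid\bar E_i]\,\bigr| ,
\]

so that the evidence expected to move the conclusions the most is asked for first.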


9.8 Discussion

We have not discussed all procedures that have been proposed in the literature, and the reader is referred to Kanal and Lemmer (1986) for a philosophical discussion of the various methods. Other leading candidates are the use of fuzzy sets (Kandel 1986, Negoita 1985), relative entropy inference (Shore and Johnson 1980), and belief functions. MYCIN is a representative of the last of these, but a more general treatment of belief functions is given by Shafer (1981). The difficulty of making a choice among probabilistic methods was referred to in Section 9.1, and these methods have been somewhat arbitrarily left out of consideration as not being the most suitable for engineering expert systems. A general basis for them all seems to be an unwillingness to accept the concept of subjective probability, which leads to rather abstract and convoluted approaches in an attempt to circumvent straightforward subjective probability estimates. Bayes' theorem is the most common approach used. It does not permit as direct an expression of uncertainty judgement as does the subjective method of Section 9.3. An engineer, either expert or user, can make subjective estimates for P(Ei), P[C|E], and P[C|Ē], which are used in this


method, much more easily than for those required in Bayes' theorem, P[C], P(E|C), and P(E|C̄), unless there is a significant amount of special statistical data available. This kind of data would be unusual in engineering systems. The method of Section 9.3 also avoids the ambiguity of type 1 uncertainty when using Bayes' theorem, which requires the use of non-rigorous methods. Bayes' theorem also will not accommodate dependent inputs. One of the difficulties associated with the use of shells to create an expert system is the fact that the shell supplier's literature may not make entirely clear what the theoretical basis is for how it handles uncertainty. It is never wise for an engineer to apply theories or procedures blindly. The Monte Carlo method is the most powerful, and is worth consideration if computer time is not too important. There are two additional interesting considerations in probabilistic expert systems. The conclusion with the highest likelihood of occurring is not necessarily the best choice. It also depends on the penalty involved if a conclusion is incorrect in a specific instance. It may be preferable to select a conclusion that is 80% likely to be right, rather than one at 90%, if the cost of being wrong about the latter choice is much higher. The second consideration is the question of what should be done if an item of information is unavailable, or the delay cost in getting it is very high. In this event a


reasonable action would be to assign it a type 1 probability of 0.5.

References

Alty, J. L. and Coombs, M. J. (1984). Expert Systems; Concepts and Examples, NCC Publications, The National Computing Centre Ltd., Oxford Road, Manchester, England, M1 7ED.

Duda, R. O., Gaschnig, J. and Hart, P. E. (1979). Model Design in the PROSPECTOR Consultant Program for Mineral Exploration, Expert Systems in the Microelectronic Age (Michie, D., ed.), Edinburgh University Press, Edinburgh.

Forsyth, R., ed. (1984). Expert Systems; Principles and Case Studies, Chapman and Hall, London.

Hayes-Roth, F., Waterman, D. A. and Lenat, D. B. (1983). Building Expert Systems, Addison-Wesley, Reading, Mass.

Kanal, L. N. and Lemmer, J. F., eds. (1986). Uncertainty in Artificial Intelligence, North-Holland, Amsterdam.

Kandel, A. (1986). Fuzzy Mathematical Techniques with Applications, Addison-Wesley, Reading, Mass.


Lindley, D. V. (1965). Introduction to Probability and Statistics; from a Bayesian Viewpoint, Cambridge University Press, Cambridge, England.

Naylor, C. (1983). Build Your Own Expert System, Sigma, Wilmslow, U.K.

Negoita, C. V. (1985). Expert Systems and Fuzzy Systems, Benjamin/Cummings, Menlo Park, California.

Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method, Wiley, N.Y.

Savage, L. J. (1954). The Foundation of Statistics, Wiley, N.Y.

Shafer, G. (1981). Constructive Probability, Synthese, Vol. 48, pp. 1-60.

Shore, J. E. and Johnson, R. W. (1980). Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy, IEEE Trans. Inform. Theory, Vol. IT-26, pp. 26-37.

Shortcliffe, E. H. (1976). Computer Based Medical Consultations: MYCIN, Elsevier, N.Y.

Siddall, J. N. (1983). Probabilistic Engineering Design;


Principles and Applications, Marcel Dekker, N.Y.

Tocher, K. D. (1963). The Art of Simulation, The English Universities Press, London.

Weiss, S. M. and Kulikowski, C. A. (1984). A Practical Guide to Designing Expert Systems, Rowman and Allanheld, Totowa, N.J.

Suggested Reading

Hart, A. (1986). Knowledge Acquisition for Expert Systems, Kogan Page, London.

Liebowitz, J. (1988). Introduction to Expert Systems, Mitchell Publishing, Inc., Santa Cruz, California, pp. 64-68.

Mamdani, A. (1985). "Inference Under Uncertainty", in Merry, M. (ed.), Expert Systems 85, Cambridge University Press, Cambridge.

Parsaye, K. and Chignell, M. (1988). Expert Systems for Experts, Wiley, N.Y., pp. 211-250.


Problems

9.1 Develop a probabilistic version of your animal species or fault diagnosis program, using the method of Section 9.3. Insert arbitrary values of the type 2 probabilities associated with each rule. Demonstrate that it runs, using arbitrary values for the type 1 input uncertainties. Do a Monte Carlo simulation of the system, using the same values for the type 1 input uncertainties. Run it for 100, 1000, and 10,000 trials; and for the final trial size, repeat it for a different seed number for the random number generator. Compare the results of the two programs, both in final probability values and running times. Try to use a set of examples that give some insight into probability propagation through a system, such as the effect of one or two "unknowns"; or how high the certainty in inputs must be to get adequate certainty levels in final conclusions; or by doing an "expertise evaluation", in which a consistent set of inputs are made determinate.


10 Machine Learning Expert Systems


10.1 Introduction

In machine learning or self-taught expert systems, the system designer must provide the system with a set of inputs, or evidential premises, and a set of final conclusions. It may be possible to formulate rules of the type that we have used above, which employ hierarchic concepts and are called hierarchic rules, but it is more common to use attribute rules with no intermediate conclusions. The system designer and the expert may be unable to formulate a hierarchical structure. It is also assumed that many examples are available, in each of which the state of all inputs is known, and it is known which final conclusion is true. These examples may have been observed, or they may be provided by expert intuition, in which case the expert can jump by judgement from an input state to a conclusion. An expert may not even be necessary to determine the nature of the final conclusion in an example. The rules are created in an adaptive learning mode for the system, in which the computer "learns" an appropriate set of rules by experience. The more examples there are, the more "expert" the system will be. Considerable research work is currently being done on expert systems incorporating machine learning, and on machine learning as a general topic in artificial intelligence.


10.2 The Precedent Rule Method

The method described in this section is one of the simplest forms of machine learning. All the system developer need do is provide a set of inputs, a set of conclusions, and a set of examples. It is important to realize that, in this method, the rules are defined as attribute rules, which can be set up using only evidential input and final conclusions. If we refer back to the logic network in Figure 5.1 for the animal species system, we can follow different paths to the final conclusions, in which only inputs are used, as shown in the following examples.
IF (hair AND eats meat AND tawny color AND dark spots) THEN [cheetah]
IF (hair AND pointed teeth AND claws AND forward eyes AND tawny color AND dark spots) THEN [cheetah]
IF (gives milk AND eats meat AND tawny color AND dark spots) THEN [cheetah]
IF (gives milk AND pointed teeth AND claws AND forward eyes AND tawny color AND dark spots) THEN [cheetah]
..............

IF (long neck AND feathers AND cannot fly AND black AND white) THEN [ostrich]
..............

It should be noted that these rules are all AND rules. It should also be noted in the above rules that we are omitting some of the inputs, because of the expertise built into Figure 5.1 by means of the hierarchic classifications. However we cannot do this in a self learning system because we do not know in advance what inputs are relevant to each rule. Thus, without some additional guidelines, we must create a potential rule for every possible combination of input states. In the animal species example we have seven conclusions, or species, and 20 inputs. The total potential number of rules of this type is
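With 20 two-state inputs, the count implied here is presumably

\[
2^{20} = 1\,048\,576
\]

possible combinations of input states, each of which is a potential rule.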

There will be a subset of rules applicable to each conclusion; and some rules will not confirm any conclusion. So for the animal species, the number of possible rules of this type is a very large number. The advantage of using hierarchic rules with class concepts is quite clear. In the learning mode, each example defines a rule; and with enough examples, all active rules would be defined.


By an active rule, we mean one that gives a conclusion. As a rule is defined, it is assigned to the set for the corresponding conclusion. We must imagine, in our system, that we have observed in an example the value of all 20 inputs, sometimes called attributes in this method, and we have taken a picture of the animal and shown it to an expert, who immediately says "that is a giraffe". In other systems, it may not even require an expert to identify the classification, and almost any knowledgeable person could do so. So this would establish one rule for a giraffe. It is clear however that many of the premises will be redundant, in the sense that all inputs are not required in order to identify a given animal. However, for this kind of a system, they must all be consistent. A consistent redundant input for the giraffe would be "lays eggs" is false. These variations in this rule would never occur in an example, but we do not know this, and so they must be retained as part of the rule. There are other types of systems where, in some rules, certain attributes are randomly redundant, and the result would be the same even if there were different values for the redundant inputs. In some systems, many or even all rules would contain such redundant premises. The redundant attributes are really "don't care" attributes, in that the value of such attributes could be either true or false, and there would be no inconsistency. The presence of such attributes in rules might be disclosed by segregating rules


by conclusions, and generating a histogram for the frequency of occurrence of each attribute being true in a rule. The "don't care" attributes would tend to have 50% relative frequency, whereas significant attributes should have zero or 100% relative frequency. It may be that there would be some correlation between a "don't care" attribute and the conclusion, giving a relative frequency different than 50%; but it may still be identifiable. However, if an attribute is redundant in the sense of those described above for the animal identification system, then there will be 100% correlation, and no filtering is possible by this method. We now imagine that we have processed a large number of examples, and generated a corresponding set of rules. All rules would have to be screened as they are generated, in order to make sure that they do not duplicate a previous result. We can now incorporate a control structure; but we do not have the insight into selecting a type of control structure that we had in the hierarchic type rule system. However the application is similar; we observe an animal, record its attributes, and search for a match between the set of observations and an existing rule. If one cannot be found, it means that no animal of this type has ever been used in an example. It is conceivable that, in some systems, an expert might prefer to set up the rules in this form. The particular nature of these rules lend themselves


to the use of a different type of control structure, called bit matching. Each premise for a rule is coded as true or false by setting a corresponding bit of an integer word at either 1 or 0. Thus a 32 bit integer could handle all 20 premises of the animal species system, so a typical rule would be represented as
11101001111000100101 plus 12 additional unused bits set at zero

Each rule could be given a corresponding decimal number. If the rules were put in an array RULE(I), and a given application is put in similar coding in TRIAL, then the rules could be rapidly searched by a DO loop structure.

Algorithm for Bit Matching

1. DO for I=1 to nr
      IF TRIAL = RULE(I) THEN
         Go to 2
      ENDIF
   ENDDO
   WRITE, "Solution not found"
   STOP
2. WRITE, "Conclusion is", C(I)
   END
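A minimal sketch of the same bit-matching idea in Python, using integers as bitmasks; the rule base and attribute patterns are hypothetical, and much shorter than the 20 attributes of the animal system.

def encode(premises):
    # Pack a list of booleans into an integer bitmask (premise 0 -> lowest bit)
    word = 0
    for i, value in enumerate(premises):
        if value:
            word |= 1 << i
    return word

# Hypothetical rule base: bitmask -> conclusion string
rules = {
    encode([True, True, False, True]): "cheetah",
    encode([True, False, True, True]): "tiger",
}

def match(trial):
    # Exact match of the observed attribute pattern against the stored rules
    return rules.get(encode(trial), "Solution not found")

print(match([True, True, False, True]))    # cheetah
print(match([False, False, False, True]))  # Solution not found

In Python the integer is unbounded, so a single bitmask can hold any number of attributes.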

The array C(I) contains the character string explaining the conclusion for the Ith rule. If there are more premises in the rules than are avai-


lable in the maximum integer word size, two or more words can be combined as necessary to represent a rule and a trial case. The testing statement would now be
IF TRIAL1 = RULE1(I) AND TRIAL2 = RULE2(I) THEN

Although it would be possible to scan through a set of rules in order to determine the existence of any premises that are never active, and delete them from the system, this is risky if new examples are continuing to be submitted. The premise might eventually pop up. The system designer must always be on the alert for the existence of new premises which must be added. Attribute rules can also be processed by Boolean techniques, as described in Chapter 11.

10.3 Generation of Rules Using Entropy

This method was originated by Quinlan (1983, 1986), and is commonly called the ID3 algorithm. It is the basis for some of the commercial expert system shells that incorporate learning, working autonomously from examples. It could be considered a refinement of the previous method, in which the irrelevant attributes are now stripped out of the attribute rules. We shall first assume that there are no intermediate conclusions, so that the rules have the general form
IF (wi1L1 AND wi2L2 AND . . . AND wikLk AND . . . AND wi,naLna) THEN [conclusion of rule i is correct]

There are nc conclusions and
   Lk = kth item of evidence, called a literal, which may be true (Ek) or false (Ēk)
   wik = 1 , if kth attribute is present in the ith rule
       = 0 , otherwise

Alternate rules may be generated for each conclusion, but otherwise no OR rules are possible. Our basic purpose is to determine the wik array. This kind of rule generation is sometimes called machine induction, or inductive inference. We generate a rule by selecting one of the attributes, Ek; the sample is then divided up into two subsets, one containing those for which Ek is true, numbering ek, and the other containing those where it is false. In each subset we observe how many times conclusion Cj is true if Ek is true, cj|Ek, and how many times Cj is true if Ēk is true, cj|Ēk. We shall first assume that the selection of the attribute is arbitrary. The procedure is repeated with a second attribute, first to the one sample subset, and then to the second subset. The examples are subdivided on the basis of attributes in this way until one of the branches has one of the Cj all true. At this point we could discard all remaining attributes from this rule, on the grounds that they cannot affect the result. This rule is now represented by the sequence of divisions in the branch. Other branches are continued in the same way, until a rule is found for each conclusion. If we were lucky enough to perfectly order the selection of attributes, then we would


have minimized the processing time required, and also have avoided the burying of irrelevant attributes in the rules. So our next concern is to develop an algorithm for optimally ordering the attribute selection. It may be helpful to visualize the procedure to represent it graphically, as in Figure 10.1. This is called a classification network, and is not the same as the logic gate network used earlier. Not all systems necessarily have rules with redundant attributes. In this case every rule would represent a full branch. When this is so, there is no basic advantage for the ID3 algorithm over the precedent rule method of Section 10.2, except computational efficiency.

Fig. 10.1 Classification network.


The ID3 algorithm as described in the literature appears to assume that the attribute which is the basis for the first division must appear as either true or false in all rules generated. However, in many systems, every rule will not contain this primary attribute; and some means is needed to filter it out from the rules in which it is redundant. The same situation can occur with other attributes that are high up in the decision tree. The histogram method described in Section 10.2 could be tried. Redundant attributes will be much more apparent if the ID3 algorithm is used. The optimization of the ordering is done using concepts from information theory.1 If the example set is large, it can be thought of as representing a statistical population, containing the random variables Cj (true or false), and the random variables Ek (true or false). We can make an estimate of the probability of the jth conclusion being true by using relative frequencies.

1It is beyond the scope of this book to attempt to explain the concepts of information and entropy. A discussion can be found in Siddall (1983), related primarily to probability theory. The reader is referred also to the many books available on communications theory.


where mj = number of occurrences of Cj ns = size of sample of examples Our real concern is the situation that exists at the current division, where we estimate probabilities using frequencies from the subsample that satisfies all literals, L, in the current branch prior to the division. We shall designate this sub-sample . We can estimate the probability of the kth attribute being true in this subset, considering only attributes not already in the current branch.

where = number of occurrences of Ek true in the subset = vector of all literals in the current branch, up to but not including the current division nk = size of sub-sample containing After the kth division, based on Ek, we can estimate the following conditional probabilities.


where nk = size of current subset of sample = number of occurrences of Cj true in subset, given that Ek is true = number of occurrences of Cj true in subset, given that Ek is false = event that Cj is true in subset, given that Ek is true = event that Cj is true in subset, given that Ek is false The entropy, or average information, for the population of conclusions is

The entropy for the conclusions in the current subset of examples, given the kth item of evidence being either true or false, is as follows. Only attributes not already in the current branch are included. 2

2It should be noted that in entropy calculations, if P(A) is zero, then the product P(A) log[P(A)] can be taken as zero.


The average entropy, or information content, from using the new evidence Ek is

And the gain in information from using Ek is
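Collecting the quantities defined above into explicit expressions gives the following sketch; the assignment of these formulas to the equation numbers (10.3.1) through (10.3.9) cited in the text is an assumption.

\[
P(C_j) \approx \frac{m_j}{n_s}, \qquad
P(E_k) \approx \frac{e_k}{n_k}, \qquad
P(C_j\mid E_k) \approx \frac{c_{j|E_k}}{e_k}, \qquad
P(C_j\mid\bar E_k) \approx \frac{c_{j|\bar E_k}}{n_k - e_k},
\]
\[
S_c = -\sum_{j=1}^{n_c} P(C_j)\log P(C_j), \quad
S_k = -\sum_{j=1}^{n_c} P(C_j\mid E_k)\log P(C_j\mid E_k), \quad
\bar S_k = -\sum_{j=1}^{n_c} P(C_j\mid\bar E_k)\log P(C_j\mid\bar E_k),
\]
\[
B(E_k) = P(E_k)\,S_k + \bigl(1 - P(E_k)\bigr)\bar S_k, \qquad
G(E_k) = S_c - B(E_k).
\]

Maximizing the gain G(Ek) at each division is the same as minimizing the average entropy B(Ek), which is the form used in the algorithm later in this section.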

The items of evidence are now ordered, actually selected at each stage, so as to give the maximum gain in information at the current division. The logarithms in the above expressions are usually presented as base 2, in the literature of information theory. However, the base is actually arbitrary, since we are only concerned with relative values, so the more convenient base e, or base 10, can be used. The process of creating a rule now continues until, for some conclusion j,

P(Cj|Ek) = 1

or

P(Cj|Ēk) = 1

or until there is no increase in total entropy. We thus


continue along a branch, and after each division, we have a trial rule which is tested by the above equations. There is actually no need to calculate Sc, since it remains constant during the generation of a rule. So the criterion is to minimize B(Ek). The concept is illustrated by the example in Figure 10.2, which shows a network segment. The process continues dividing the sample based on a new Ek each time until all of one conclusion in the subset are true and the rest are false. We then have achieved a rule, which, from the figure, is
IF (E1 AND E2 AND E3 AND E4) THEN [C3]

The subscripts will not necessarily be in order. They are shown so in Figure 10.2 for convenience. It is also possible that a branch may be followed to

Fig. 10.2 Segment of a network to illustrate the generation of a rule.


the last possible literal, and still no rule found based on (10.3.10) or (10.3.11). This would be quite legitimate, and the algorithm must backtrack one step, or more if necessary, and try a new branch. An attribute can be repeated in different branches, because an attribute may appear in more than one rule. Every possible branch is followed until terminated by a rule; or until the full branch is explored. Algorithm for Generating Rules by Machine Learning Using Entropy Input
Note - all arrays are shown with index subscripts in parentheses. ns = size of teaching sample nc = number of conclusions na = number of attributes CONCL(L,J) = array of values of conclusions in sample = 1 , Jth conclusion in Lth example is true = 0 , otherwise EVID(L,K) = array of values of evidence in sample = 1 , Kth attribute in Lth example is true = 0 , otherwise

Internal Variables In the hope of adding clarity, the caret (^) is used as a separator in mnemonic names for variables. The current rule or part rule is kept in a stack. See the Appendix for a discussion of stacks.
e(K) = number of occurrences of Ek in division subset nk = size of sub-sample containing m(J) = number of occurrences of a conclusion in division subset

Page 204 cE(J,K) = array giving count of number of occurrences of Cj true in the division subset, given that Ek is true cEBAR(J,K) = array giving count of number of occurrences of Cj true in the division subset, given that Ek is false S(J,K) = array of entropy values of Jth conclusion, given Ek is true SBAR(J,K) = array of entropy values of Jth conclusion, given Ek is false B(K) = array of average entropy values from using evidence Ek PROBE(K) = array containing probability that Kth attribute is true, in the division subset PROBC(J,K) = array containing probability that Cj true, given Ek true, in the division subset PROBCBAR(J,K) = array containing probability that Cj true, given Ek false, in the division subset ENTR(K) = array containing entropies Sk ENTRBAR(K) = array containing entropies STACK(M) = attribute in Mth position in stack defining branch SIGN(M) = 1 , attribute in Mth position in stack is true = 0 , attribute is false STACK^SIZE = number of attributes in current branch LAST = last attribute in stack START = 1 , procedure is at start = 0 , otherwise FLAG = 0 , if search is in branch where the first attribute is true (left branch in Figure 10.1) = 1 , if search is in branch where the first attribute is false FINISH = 1 , all rules are found = 0 , otherwise INCLUDE(L) = 1, if example is a member of current subset = 0, otherwise OK(K) = 1, if Lk is a candidate for division = 0, otherwise


Output
w(I,K) = array defining rules = 11 , if Kth attribute is present in Ith rule, and is true = 10 , if Kth attribute is present in Ith rule, and is false = 0 , otherwise RULE^CONC(I) = number of conclusion in Ith rule

Procedure
1. # Initialize # I = 1 # Rule counter # START = 1 2. # Iterate until all possible rules found. Each loop creates a rule # DO UNTIL (FINISH=.true.) # Loop to step 15 # # See step 13 for calculation of finish # 3. # Start a division # # Count frequencies at division # 4. # Zero all counters and entropies # nk = 0 DO for K=1 to na e(K) = 0 DO for J=1 to nc m(J) = 0 cE(J,K) = 0 cEBAR(J,K) = 0 ENTR(K) = 0 ENTRBAR(K) = 0 ENDDO ENDDO 5. # Count frequencies # DO for L=1 to ns # Loop on sample # # First determine if example is a member of sub-sample. To qualify, example must contain all members of

stack. At start, sub-sample corresponds to original sample # IF (START=1) THEN nk = ns INCLUDE(L) = 1 ELSE DO for M=1 to STACK^SIZE # Loop on stack # IF (SIGN(M)=EVID(L,STACK(M)) THEN INCLUDE(L) = 1 # Example is included # ELSE INCLUDE(L) = 0 Go to 6 ENDIF ENDDO IF (INCLUDE(L) = 1 THEN nk = nk + 1 ENDIF ENDIF # Count each Ek in sub-example, which is not in stack # IF (INCLUDE(L)=1) THEN DO for K=1 to na # Loop on attributes # IF (START=1) THEN Go to 7 ENDIF DO for M=1 to STACK^SIZE IF (STACK(M)=K) THEN OK(K) = 0 # Ek not counted # Go to 8 ENDIF ENDDO OK(K) = 1 IF (EVID(L,K)=1) THEN e(K) = e(K) + 1 ENDIF # Count frequencies for conclusions # DO for J=1 to nc # Loop on conclusions #

6.

7.

# Use only attributes not yet in branch # IF (CONCL(L,J)=1) THEN IF (EVID(L,K)=1) THEN cE(J,K) = cE(J,K) + 1 ELSEIF (EVID(L,K)=0) THEN cEBAR(J,K) = cEBAR(J,K) + 1 ENDIF ENDIF ENDDO # End of loop on conclusions # ENDDO # End of loop on attributes # ENDIF ENDDO # End of loop on sample # START = 0 # Calculate probabilities at division # DO for K=1 to na # Loop on attributes # IF (OK(K)=1) THEN # Ek is a candidate # DO for J=1 to nc # Loop on conclusions #

8.

9.

10.

# Calculate entropies, using (10.3.6) and (10.3.7) # ENTR(K) = ENTR(K) + PROBC(J,K) * LOG(PROBC(J,K)) ENTRBAR(K) = ENTRBAR(K) + PROBCBAR(J,K) * LOG(PROBCBAR(J,K)) ENDDO # End of loop on conclusions # B(K) = PROBE(K) * ENTR(K) + (1-PROBE(K)) * ENTRBAR(K) ENDIF ENDDO # End of loop on attributes #

11. # Find attribute with lowest B(K), using procedure MIN # # MIN finds minimum B(K) in array B and returns in LAST the subscript for lowest value # CALL MIN(B,LAST) 12. # Add LAST to stack # STACK^SIZE = STACK^SIZE + 1 STACK(STACK^SIZE) = LAST # Test for completion of rule # IF (PROBC(J,LAST)=1) THEN # Rule found # SIGN(LAST) = 1 STACK^SIZE = STACK^SIZE + 1 # Call sub-procedure for defining a rule # CALL RULE(I,LAST,SIGN(LAST),STACK^SIZE,w, RULE^CONC) # Set FLAG # IF (Start=1) THEN FLAG = 0 ENDIF # Start new rule # # Since last attribute in previous rule is true, switch to false branch # # Push false attribute into stack # IF (STACK^SIZE=1 AND FLAG=0) THEN FLAG = 1 ENDIF SIGN(STACK^SIZE) = 0 Go to 3 ENDIF IF (PROBCBAR(LAST)=1) THEN # Rule found # # Push new attribute into stack # SIGN(LAST) = 0 CALL RULE(I,LAST,SIGN(LAST),STACK^SIZE,w,RULE^CONC) # Set FLAG # IF (START=1) THEN FLAG = 1 ENDIF

# Start new rule # # Backtrack until true attribute found # STACK^SIZE = STACK^SIZE - 1 IF (STACK^SIZE=0 AND FLAG=1) THEN FINISH = .true. # All branches explored # Go to 16 ENDIF IF (SIGN(STACK^SIZE)=0) THEN Go to 13 ENDIF SIGN(STACK^SIZE) = 0 Go to 3 ENDIF # Rule not found, continue developing branch # # First check if branch completed # IF (STACK^SIZE=na) THEN # Branch complete # # Must backtrack one step in stack before starting a new branch # STACK^SIZE = STACK^SIZE - 1 LAST = STACK(STACK^SIZE) IF (SIGN(LAST)=1) THEN # Push false attribute into stack # SIGN(LAST) = 0 ELSE # Backtrack until true attribute found # STACK^SIZE = STACK^SIZE - 1 IF (STACK^SIZE=0 AND FLAG= 1 ) THEN FINISH = .true. # All branches explored # Go to 16 ENDIF IF (SIGN(STACK^SIZE=0) THEN Go to 14 ENDIF SIGN( STACK-SIZE) = 0 Go to 3 ENDIF

13.

14.

ELSE SIGN(LAST) = 1 # Always follow left (true) branch # # Select next division # Go to 3 ENDIF 15. ENDDO 16. Output results 17. END

Sub-Procedure RULE(I,LAST,SIGN(LAST),STACK^SIZE,w,RULE^CONC) This procedure creates a rule from the contents of the stack.
1. # Define rule # I=I+1 2. # Put stack into new rule # DO for K=1 to na DO for M=1 to STACK^SIZE IF (STACK(M)=K) THEN IF (SIGN(M)=1) THEN w(I,K) = 11 ELSE w(I,K) = 10 ENDIF ELSE w(I,K) = 0 ENDIF ENDDO ENDDO 3. RETURN 4. END
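As a compact cross-check on the selection criterion used in the pseudocode above, the following Python sketch builds attribute rules recursively by always splitting on the attribute with the smallest average entropy B(Ek). The data layout and names are hypothetical, and the bookkeeping (a simple recursion over subsets) is not the stack-based backtracking of the algorithm above.

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of conclusion labels (natural log; the base is arbitrary)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def average_entropy(examples, k):
    # B(Ek): weighted entropy of the conclusions after splitting on attribute k
    true_part = [c for attrs, c in examples if attrs[k]]
    false_part = [c for attrs, c in examples if not attrs[k]]
    n = len(examples)
    b = 0.0
    if true_part:
        b += len(true_part) / n * entropy(true_part)
    if false_part:
        b += len(false_part) / n * entropy(false_part)
    return b

def id3_rules(examples, attributes, branch=()):
    # Return a list of (branch, conclusion) rules; branch is a tuple of (attribute, value)
    conclusions = [c for _, c in examples]
    if len(set(conclusions)) == 1:        # subset is pure: a rule has been found
        return [(branch, conclusions[0])]
    if not attributes:                    # no attributes left (noise or duplicates)
        return [(branch, Counter(conclusions).most_common(1)[0][0])]
    best = min(attributes, key=lambda k: average_entropy(examples, k))
    remaining = [a for a in attributes if a != best]
    rules = []
    for value in (True, False):
        subset = [(attrs, c) for attrs, c in examples if attrs[best] == value]
        if subset:
            rules += id3_rules(subset, remaining, branch + ((best, value),))
    return rules

# Hypothetical teaching sample: attribute dictionaries and a conclusion label
sample = [
    ({"E1": True,  "E2": True},  "C1"),
    ({"E1": True,  "E2": False}, "C2"),
    ({"E1": False, "E2": True},  "C2"),
    ({"E1": False, "E2": False}, "C2"),
]
for branch, conclusion in id3_rules(sample, ["E1", "E2"]):
    print(branch, "->", conclusion)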

If the teaching sample is so large that there is a problem keeping it in central memory, then Quinlan (1983) has suggested a modification to the above algorithm, described in the following brief algorithm.


Algorithm for Extension to ID3

1. Select at random a subset of the sample, called a window, small enough to be handled in central memory.
2. Form a rule using ID3 based on the window.
3. Scan through the rest of the original sample in order to locate exceptions to the rule.
4. Form a new window from the current window and the above exceptions.
5. Redetermine the rule using the enlarged window.
6. Repeat for the rule until there are no more exceptions.
7. Go on to the next rule using ID3.

The training sample sometimes must be rather large, perhaps in the thousands, in order to "catch" all possible rules; every possible application must be covered by an example in order to be sure of a perfect system. However this is not true in relatively small systems, as illustrated in the example given below. There also must be no "noise" in the examples. This noise is essentially our type 1 and type 2 uncertainties. In the first type, errors have been made in observing the attributes in the examples. And in the second type, either errors have been made in observing the conclusions, or the system itself inherently has inconsistencies. An example of a manifestation of noise is when two examples have the same set of attributes, but a different conclusion. The presence of noise can mean that the branches (rules) become very long, or no rules may be found at all because of inconsistencies. One simple heuristic way to cope with the


problem is to terminate a branch at some preset probability for P(Cj|Ek) or P(Cj|Ēk) less than one, say 0.95.

Quinlan (1986) has suggested other procedures for coping with noise. If two rules are generated with the same set of attributes, but having different conclusions, then one way of deciding on the appropriate rule would be to pick the one having the most examples. Or, alternatively, neither conclusion would be ruled out; each would be assigned a probability of being correct, equal to the ratio of its number of successes over the combined total number of examples in which the duplicate rules occur. If a branch does not terminate due to irrelevant attributes not being filtered out, then Quinlan has suggested a rule based on a statistical determination of whether a suspected attribute is really independent of the conclusion. Aside from questions of noise, the method cannot be guaranteed to work, and may need considerable tuning. Problems may be due to an inappropriate or incomplete choice of attributes or conclusions. And if the sample is not sufficiently large or representative, the resulting rules may not encompass all possible or eventual applications; a conclusion may not even have a rule generated for it.

10.4 Multiple Valued Attributes

It is not difficult in principle to extend the ID3 algorithm to accommodate attributes having multiple values,


rather than just true or false. These could be intervals assigned to a continuously varying quantity, such as ambient temperature. Such an attribute would be represented now by a two dimensional array, Ekp, where the index p can range from 1 to qk. The upper limit qk must be a function of k, since every multiple valued variable could have a different number of values or states. The key equations can be quite easily modified to handle this situation. Equations (10.3.6) and (10.3.7) become one expression.

Equation (10.3.8) becomes
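A sketch of one natural generalization, with Ekp denoting the pth value of the kth attribute; the exact expressions used in the text are assumed to be of this form.

\[
S_k^{(p)} = -\sum_{j=1}^{n_c} P(C_j\mid E_{kp})\,\log P(C_j\mid E_{kp}),
\qquad
B(E_k) = \sum_{p=1}^{q_k} P(E_{kp})\,S_k^{(p)} .
\]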

The algorithm can also be easily modified to accommodate multiple valued attributes.

10.5 Example

This simple hypothetical example for material selection illustrates the concepts. It uses two multiple valued attributes and three binary attributes. The binary


attributes are not logical variables, but have similar characteristics. The attributes are operating specifications for the material to be used; their possible values are as follows.

   Number of cycles            - high, low
   Acceptable number of cracks - 0, 1, 2, 3
   Operating temperature       - low, medium, high
   Maintenance                 - good, poor
   Environment                 - good, poor

There are three candidate materials, simply designated by number: 1, 2 or 3. The sample is shown in Table 10.1. After processing by the ID3 algorithm, the rules are found to be as follows.

1. IF (E21)                  THEN [C1]
2. IF (E22 AND E31)          THEN [C2]
3. IF (E22 AND E32 AND E51)  THEN [C2]
4. IF (E22 AND E32 AND E52)  THEN [C1]
5. IF (E22 AND E33)          THEN [C1]
6. IF (E23)                  THEN [C2]
7. IF (E24 AND E31)          THEN [C3]
8. IF (E24 AND E32 AND E41)  THEN [C3]
9. IF (E24 AND E33)          THEN [C2]

Page 215 Table 10.1 Data Sample for Example SPECIFICATIONS Number Operating Acceptable Number of of TemperaCracks Cycles ture E1p E2p E3p High 1 Medium Low 3 High High 1 Medium Low 1 Medium Low 0 High High 0 Medium Low 2 Medium Low 1 Medium High 0 Medium Low 1 High Low 3 Low High 3 High Low 3 Low High 1 High High 1 Low High 1 High High 1 Low Low 3 Low High 1 Low Low 2 High Low 3 Low Low 1 High High 0 Low Low 1 High Low 3 Medium Low 3 Medium Low 1 Low Low 3 High Low 1 Low High 3 Medium High 2 Low Low 2 High CONCLUSIONS Maint.Envir. 1 E4p E5p Poor Poor Good Poor Good Poor Poor Poor Poor Poor Poor Poor Good Good Good Poor Good Good Poor Poor Good Good Poor Poor Good Good Good Poor Good Poor Poor Good Good Poor Poor Poor Poor Good Poor Poor Poor Poor Good Good Poor Poor Poor Good Good Poor Good Poor Good Good Poor Good Good Good Good Good Good Good Poor Poor 2 3

Sample Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x


It is apparent that some attributes are redundant in some rules. And the analysis discloses that the environment has no effect on any of the rules. It should be dropped with caution however, because more examples may disclose that it can be a factor in a new rule not yet found.

10.6 Discussion

Expert systems based on machine learning may be useful in circumstances where it is not possible to formulate rules based on expertise, because of the complexity of the problem. In these systems, the results are unpredictable, or nearly so, even for an expert. Weather forecasting is commonly used as an example, where certain current weather features, such as temperature, humidity, wind speed and direction, presence of rain or clouds, and so on, are treated as the attributes, and forecasted conditions for the next day are the conclusions. The sample of examples must be large, and there may be problems with noise, and incorrect choice of the attribute set. Naylor (1983, 1984) has described procedures for an expert system that can use examples to teach a system to reach conclusions, using essentially only one AND rule containing all of the attributes. The black box decides in each application which conclusion is most likely true. It is unlikely that it is possible to extend the ID3 algorithm to handle a multilevel system, containing inter-


mediate conclusions. If the hierarchical structure can be well defined, then each intermediate conclusion could be treated as a separate system for use in ID3. The "attributes" for an intermediate conclusion would now include all bottom level inputs and all lower level intermediate conclusions. The problem is that items of evidence that should not feed directly into an intermediate or final conclusion, would not be independent of that conclusion, and would not therefore be filtered out. A general conceptual description of a number of machine learning methods, including ID3, can be found in Forsyth and Rada (1986). Papers describing research work in machine learning are available in Michalski, Carbonell and Mitchell (1983, 1986).

References

Forsyth, R. and Rada, R. (1986). Machine Learning Applications in Expert Systems and Information Retrieval, Ellis Horwood, Chichester, U.K.

Michalski, R. S., Carbonell, J. G. and Mitchell, T. M., eds. (1983). Machine Learning; an Artificial Intelligence Approach, Tioga Publishing Company, Palo Alto, California.


Michalski, R. S., Carbonell, J. G. and Mitchell, T. M., eds. (1986). Machine Learning; an Artificial Intelligence Approach, Vol. II, Morgan Kaufmann Publishers, Inc., 95 First Street, Los Altos, California.

Naylor, C. (1983). Build Your Own Expert System, Sigma Press, Wilmslow, U.K.

Naylor, C. (1984). "How to Build an Inferencing Engine", in Forsyth, R. (ed.), Expert Systems; Principles and Case Studies, Chapman and Hall, London.

Quinlan, J. R. (1983). Learning Efficient Classification Procedures and Their Application to Chess End Games, Machine Learning; an Artificial Intelligence Approach (Michalski, R. S., Carbonell, J. G. and Mitchell, T. M., eds.), Tioga Publishing Company, Palo Alto, California.

Quinlan, J. R. (1986). The Effect of Noise on Concept Learning, Machine Learning; an Artificial Intelligence Approach, Vol. II (Michalski, R. S., Carbonell, J. G. and Mitchell, T. M., eds.), Morgan Kaufmann Publishers, Inc., 95 First Street, Los Altos, California.

Siddall, J. N. (1983). Probabilistic Engineering Design; Principles and Applications, Marcel Dekker, New York.


Suggested Reading

Hart, A. (1986). Knowledge Acquisition for Expert Systems, Kogan Page, London, pp. 109-132.

Problems

10.1 Draw a logic diagram for all of the attribute rules for tiger and cheetah, from the animal identification system.

10.2 Write a computer program for the ID3 algorithm in Section 10.3. Test your program by using the data for the example in Section 10.5.


11 Boolean Algebra


11.1 Introduction

Boolean algebra is a subject in mathematics, and basic information on it is available from several types of textbooks. The more "pure mathematics" type of book is typically very abstract, rigorous, and difficult to read. See for example Rudeanu (1974). The more applied books, such as Hohn (1966), are more suitable for use by an engineer to become familiar with the basic concepts. A third approach is through switching theory (Hill and Peterson, 1974 and Muroga, 1979). Boolean algebra is actually a rather simple subject, essentially based on common sense logic, codified for easy manipulation. The main difficulty is becoming familiar with the terminology used; and this difficulty is compounded by the fact that different authors may use different terminology. Boolean algebra is also essentially similar to set theory, the concept of events in probability theory, mathematical logic theory or propositional calculus, and the algebra of 0-1 mathematical programming. All of these use different terminology, except set theory and probability theory. The basic concept running through all of these related subjects is that the variables that are used have only two states, true or false. And usually 1 is considered equivalent to "true", and 0 to "false". In switching theory the states are physically a finite voltage, say 10 v, and zero


voltage level, although they are still represented by 1 and 0, when Boolean algebra is used. The concept of logic gates, physically embodied in switching devices, is also useful as a representation in other applications of Boolean algebra and set theory. Examples are expert systems and reliability fault trees. The algorithmic structure IF/THEN, common in high level computer languages, is a language codification of Boolean expressions and logic gates. For example
IF (A=.TRUE. AND C=.TRUE.) THEN
   G=.TRUE.
ENDIF

The Boolean algebra codification is

G = AC

and the logic gate is shown in Fig. 11.1. We thus see that it should be possible to convert a set of expert system rules to a system of Boolean expressions. And in many cases it is possible to achieve a con-

Fig. 11.1 AND type logic gate.


siderable simplification of the rule system by manipulating the Boolean expressions.

11.2 Definitions and Postulates

In Boolean algebra there are four operators, shown in Table 1. The first symbols for OR and AND, + and ., are commonly used in Boolean algebra; the second symbols are commonly used in logic theory; and the third set in set theory and probability theory. However this is not always the case. We shall use the first symbol in each row, but usually the period in the conjunction will be omitted. The OR symbol is strictly defined as the inclusive OR, and

C = A + B

really means that C is true if A is true, or B is true, or both are true. This is the most common situation, and the
TABLE 1 Boolean Operators

   Symbol                    Purpose                            Example
   +, ∨, ∪                   OR, union                          A+B, A∨B, A∪B
   ., ∧, ∩                   AND, conjunction or intersection   A.B or AB, A∧B, A∩B
   =                         equivalent to                      (A+B)C = D
   overbar, prime or acute   complement                         Ā = 1 - A, A′ = 1 - A


only one encompassed by Boolean algebra. However in logic or truth functions it is also possible to have an exclusive OR,

which means that C is true if A is true or B is true, but both cannot be true. We are using the equal sign to represent equivalence.

More carefully, this should be defined to mean A is true if and only if B is true. This means that the reverse is also always true, and A can be substituted for B in an expression, or vice versa. This is the IF type statement represented in Boolean logic by the equal sign. Some authors use . However logic relationships may not always have strict equivalence. The expression

means that if A is true then B must be true, but does not imply that A will be necessarily true if B is true. Only if both

are applicable can we conclude that

Where the distinction is important, we shall use IFF to represent "if and only if". An illustration of this distinction occurs in the following engineering example.

which represents the truth or logic statement


"Use bearing type 1 if specifications 1, 2 and 3 apply to the design of a machine" However it could also be possible that B1 could be selected based on a different set of specifications.

and all of the first three specifications need not necessarily be true. These statements could be represented by the Boolean expressions

This simply indicates that even if B1 is 1, the left hand side can be 0 or 1, but if the left hand side is 1, then B1 must be 1. However they can be combined into a Boolean equivalence expression.

It is important to realize that this cannot be decomposed into the following Boolean IFF type statements.

Engineers will likely be most comfortable with the notation and terminology used in books on switching theory, such as Hill and Peterson (1974). The only constants that appear in our operations with Boolean algebra will be 0 and 1. Boolean algebra is based on the following properties, or postulates.1 It can be shown that these are not actually all independent, and all can be derived from a smaller subset.
1Other axiomatic approaches are also commonly used.
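As a reminder of the standard postulate set being referred to, the usual identities are listed below; this is a sketch, since only the distributive laws can be matched with certainty to the numbers (11.2.10) and (11.2.11) cited later.

\[
\begin{aligned}
&A + 0 = A, \quad A\cdot 1 = A, \quad A + 1 = 1, \quad A\cdot 0 = 0,\\
&A + A = A, \quad A\,A = A, \quad A + \bar A = 1, \quad A\,\bar A = 0, \quad \bar{\bar A} = A,\\
&A + B = B + A, \quad A\,B = B\,A,\\
&A + (B + C) = (A + B) + C, \quad A\,(B\,C) = (A\,B)\,C,\\
&A\,(B + C) = A\,B + A\,C, \quad A + B\,C = (A + B)(A + C),\\
&\overline{A + B} = \bar A\,\bar B, \quad \overline{A\,B} = \bar A + \bar B .
\end{aligned}
\]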


These expressions, and in fact any Boolean relationship, can be verified by using a truth table. All possible combinations of values of all variables are itemized, and the corresponding values for each term. The values of the left side must equal the corresponding values of the right side for all states. The truth of equation (11.2.10), for example, is verified by the truth table in Table 2. In terms of common algebra, if A and B are both 1, then

Table 2 Truth Table for Equation (11.2.10)

   VARIABLES         LEFT SIDE                  RIGHT SIDE
   A  B  C    BC   A+BC   A+B   A+C    (A+B)(A+C)
   T  T  T    T    T      T     T      T
   T  T  F    F    T      T     T      T
   T  F  T    F    T      T     T      T
   T  F  F    F    T      T     T      T
   F  T  T    T    T      T     T      T
   F  T  F    F    F      T     F      F
   F  F  T    F    F      F     T      F
   F  F  F    F    F      F     F      F

whereas in Boolean algebra,

An alternative way of expressing this is
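The relations here reduce to a single arithmetic contrast; a sketch (the exact displayed expressions are assumed):

\[
A + B = 2 \;\;\text{(ordinary algebra)}, \qquad
A + B = 1 \;\;\text{(Boolean algebra)}, \qquad
\text{that is, } 1 + 1 = 1 .
\]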

A union of conjunctions is called a disjunctive form, or Boolean polynomial, or sum of products, or general Boolean function, with the general form
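A sketch of the general form meant here, with the ith term a product of literals:

\[
F = E_{11}E_{12}\cdots E_{1k_1} + E_{21}E_{22}\cdots E_{2k_2} + \cdots + E_{m1}E_{m2}\cdots E_{mk_m} .
\]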

where Eij represents the jth variable in the ith term. An


example is

A literal is defined as a variable or its complement, such as A or Ā, and D or D̄, and so on.

11.3 Manipulation of Boolean Functions

It is intuitively obvious that any Boolean function can be converted to a sum of products of literals, using the postulates. Consider the following examples.

The procedure is to use De Morgan's laws if there are multiple variables complemented, and otherwise use the distributive laws (11.2.10) and (11.2.11)


plus (11.2.6) and (11.2.7). The conversion process is not necessarily unique. If we note in the first example that

Then the first example can be converted to the form

When every variable appears as a literal in every term, as above, then the terms are called canonic products, standard products, or minterms. A function of this type is called a canonic function or standard sum of products, or normal sum of products. In general, if there are n variables, then the total possible number of minterms is 2^n. When there are four variables, we have 16 combinations.

This represents all possible combinations of literal events.

TABLE 3 Binary and Decimal Equivalent of Minterms

   Decimal code    A  B  C  D
        0          0  0  0  0
        1          0  0  0  1
        2          0  0  1  0
        3          0  0  1  1
        4          0  1  0  0
        5          0  1  0  1
        6          0  1  1  0
        7          0  1  1  1
        8          1  0  0  0
        9          1  0  0  1
       10          1  0  1  0
       11          1  0  1  1
       12          1  1  0  0
       13          1  1  0  1
       14          1  1  1  0
       15          1  1  1  1

Minterms are more easily generated and represented by representing them as binary numbers, with any variable represented by 1 and its complement by 0. Table 3 illustrates this for the above set. It is apparent that a minterm can also be conveniently represented by the decimal equivalent of its binary number. This is illustrated in the following example.

or


A simple rule can be used to convert any Boolean function into canonic form. Convert the expression to a sum of products of literals. For each product, scan through the set of minterms and add to the new expression any minterm that contains the product. Consider the four variable case A, B, C, D. For the term ABC, we can scan through the list of minterms, and find that ABC is contained in the minterms ABCD and ABCD̄. It is clear that if either of these is true, then ABC is true. Thus ABC can be replaced by ABCD + ABCD̄.

We can avoid scanning all minterms by generalizing the procedure indicated in the above example. Convert the expression to a sum of products of literals. Scan each product in turn in order to determine the missing variables. For each missing variable V, include in the product the quantity (V + V̄). Expand each term into products. Thus using the same example, we replace ABC by ABC(D + D̄) = ABCD + ABCD̄.

It is intuitively obvious that the sum of all minterms is one, since they represent all possible mutually exclusive combinations of events.

Page 234

There is a corresponding dual to the sum of products, in which an expression is given the form of the product of sums of a full set of literals. The conversion procedure is also the dual, as follows. Convert the expression to a product of sums of literals. Scan each sum in turn in order to determine the missing variables. For each missing variable V, include in the sum the quantity VV̄. Expand into a product of sums, using (11.2.10). Any expression can be converted to a product of sums of literals by first using De Morgan's laws if there are multiple variables complemented, then using the distributive law (11.2.10), and finally applying (11.2.1) to remove forms such as A + Ā. Our previous two examples are converted to this form as follows.

Page 235

A function already in the form of a sum of products could be converted to a product of sums by using (11.2.10) repeatedly. A function such as the following can be converted to a product of sums by first using (11.2.2).

A sum, in a product of sums, that contains a literal representing every variable is called a maxterm, and a product of sums which are all maxterms is called a standard product of sums.

11.4 Simplification of Boolean Functions

We have seen that Boolean functions can be manipulated and changed into many different forms, all of which are equivalent. Our goal here is to discover the simplest equivalent form of a given Boolean function. To begin our study we require the concept of an implicant. Any conjunction, or product of literals, which is either explicitly or implicitly contained in a Boolean function, and that implies that the function has a value of one if the product has a value of one, is called an implicant. Every term of a sum of products is automatically an implicant.

Page 236

A set of prime implicants has no implicants that contain another implicant. The following example is shown as a sum of products, and then as a normal sum of products.

All terms in the second form are implicants, but only and C are prime implicants. We would intuitively expect that the set of prime implicants would give a very simple form, whereas the minterms represent the longest form. It turns out that there can be redundant prime implicants, so our final goal is to determine the smallest number of these. We would anticipate that the representation of a set of expert system rules by a minimal set of prime implicants would be highly desirable in some applications.

An early numerical procedure for finding the prime implicants is due to Quine (1952). Although more recent methods have supplanted the algorithm, it does give insight into the concepts involved. The algorithm is as follows.

Algorithm for Finding Prime Implicants

1. Convert the function to a normal sum of products.
2. Convert the minterms to binary form.
3. Scan the terms and find all possible combinations of pairs that differ only by one digit.
4. IF pairs exist THEN

Page 237

5.    Combine the pairs into a new product that contains all literals except the one in which they differ, or in terms of binary numbers, all digits except the differing one, which is replaced by an x. Scan all terms and eliminate any terms that contain another term. Go to (3).
6. ELSE
      Convert remaining binary numbers to products of literals, omitting any variable represented by an x. Report these products as a set of prime implicants.
   ENDIF
7. END

When two products are coalesced in the above algorithm, it is equivalent to the simple operation shown in the following example.
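In a program the coalescing is conveniently done on the binary codes. A minimal sketch (in Python, illustrative only) of step 5 is:

    # Sketch: combine two binary-coded terms that differ in exactly one digit
    def combine(t1, t2):
        diff = [i for i in range(len(t1)) if t1[i] != t2[i]]
        if len(diff) != 1:
            return None               # not combinable: they differ in more than one digit
        i = diff[0]
        return t1[:i] + "x" + t1[i + 1:]

    print(combine("1110", "1111"))    # -> "111x"
    print(combine("1010", "0111"))    # -> None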

We shall see that all prime implicants may not be necessary to represent the original function, and our purpose now is to develop an algorithm for discovering an optimal minimal set. A minimal set will commonly include the essential prime implicants, defined by the algorithm below. It will be useful to first define the concept of covering. A product of literals is said to cover a larger different product if the smaller is contained in the

Page 238

larger. The larger is also said to subsume the smaller. It is apparent that if the covering or subsumed conjunction is true, then the covered or subsuming conjunction is redundant. This can also be demonstrated by using minterms. Suppose the function variables are A,B,C,D,E. Then

The minterm sum for ABC includes those for ABCD, so the latter is unnecessary. We can write the function in terms of the prime implicants.

where P i = the ith prime implicant. We can also write each prime implicant in terms of its covered minterms.

where
Mj  = jth minterm of the previously determined set for the function
Wij = 1, if the jth minterm is present in the ith prime implicant
    = 0, otherwise

Page 239

If we combined the above two equations we would have

There will commonly be some duplication of minterms in the above expression, if we were to actually write it out. So we actually only need a set of prime implicants that just covers all of the minterms in the set; their sum is sometimes called an irredundant sum-of-products. These can be determined by the following algorithm.

Algorithm for Determining the Minimal Prime Implicants Using Minterms

Input
Pi  = set of prime implicants, in binary code
Mj  = set of minterms representing function, in binary code (not strictly necessary)
Wij = matrix indicating presence of the jth minterm in the ith prime implicant
n   = number of prime implicants
m   = number of minterms

Procedure
1. # Initialize #
   DO for j = 1 to m
      Nprimej = 0   # Nprimej is a counter to record the number of prime implicants covering a minterm #
      FLAG1j = 0    # FLAG1j is a flag to mark uncovered minterms #

continues

Page 240

continued
   ENDDO
   DO for i = 1 to n
      EPIi = 0   # EPIi is a flag to mark essential prime implicants #
   ENDDO
2. DO for j = 1 to m   # For all minterms #
      DO for i = 1 to n   # For all prime implicants #
         IF Wij = 1 THEN
            Nprimej = Nprimej + 1
         ENDIF
      ENDDO
   ENDDO
3. # Determine group of essential prime implicants #
   DO for j = 1 to m
      IF Nprimej = 1 THEN
         DO for i = 1 to n
            IF Wij = 1 THEN
               EPIi = 1   # A flag to indicate that the ith prime implicant is an essential prime implicant #
            ENDIF
         ENDDO
      ELSEIF Nprimej = 0 THEN
         # Check for error in input sets #
         WRITE "Error in minterms or prime implicants"
         STOP
      ENDIF
   ENDDO
   # There may be no essential prime implicants found at this stage #
4. # The set of minimal prime implicants must cover all of the minterms. Check for uncovered minterms. If any exist, then the next criterion is to select a new prime implicant that covers the most uncovered minterms #
   OK = 1   # Flag to show if any minterms are uncovered #

continues

Page 241

continued
   DO for j = 1 to m
      FLAG1j = 1   # Assume the jth minterm is uncovered #
      DO for i = 1 to n
         IF EPIi = 1 AND Wij = 1 THEN
            FLAG1j = 0   # The jth minterm is covered by a selected prime implicant #
         ENDIF
      ENDDO
      IF FLAG1j = 1 THEN
         OK = 0   # Uncovered minterm exists #
      ENDIF
   ENDDO
5. # Check to see if there are any uncovered minterms #
   IF OK = 1 THEN
      # No more uncovered minterms - all minimal prime implicants found #
      Go to 8   # Output results #
   ENDIF
6. # Select next minimal prime implicant #
   Nmax = 0
   DO for i = 1 to n
      IF EPIi = 0 THEN
         # Count number of uncovered minterms that each prime implicant covers #
         Nminti = 0   # Counter to record the number of uncovered minterms covered by this prime implicant #
         DO for j = 1 to m
            IF FLAG1j = 1 AND Wij = 1 THEN
               Nminti = Nminti + 1
            ENDIF
         ENDDO
         IF Nminti > Nmax THEN
            Nmax = Nminti
            NEQUAL = 1   # Counts number of equal best #
            BEST = i     # Flag to mark current best prime implicant #

continues

Page 242

continued
            POINT = i    # Pointer index #
         ELSEIF Nminti = Nmax THEN
            NEQUAL = NEQUAL + 1
            EQUAL(POINT) = i   # Use pointer to mark equally good prime implicants #
            POINT = i
         ENDIF
      ENDIF
   ENDDO
   # Select smallest of current best set as next minimal prime implicant #
   KOUNT = 0
   SMALL = P(BEST)
   ISMALL = BEST
   POINT = BEST
7. K = EQUAL(POINT)   # K is the subscript of the next equal prime implicant #
   KOUNT = KOUNT + 1
   IF KOUNT = NEQUAL THEN
      # New minimal prime implicant found #
      EPI(ISMALL) = 1
      Go to 4
   ENDIF
   IF P(K) < SMALL THEN
      SMALL = P(K)
      ISMALL = K
   ENDIF
   POINT = K
   Go to 7
8. # Output #
   DO for i = 1 to n
      IF EPIi = 1 THEN
         WRITE Pi
      ENDIF
   ENDDO
9. END
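The heart of the algorithm is a covering problem: take the essential prime implicants first, then repeatedly add the prime implicant that covers the most still-uncovered minterms. The compact sketch below (in Python, an illustration only; it uses a hypothetical coverage matrix and omits the tie-breaking on fewest literals) shows the same idea, with W[i][j] = 1 if the ith prime implicant covers the jth minterm.

    def minimal_cover(W):
        n, m = len(W), len(W[0])
        chosen = set()
        # essential prime implicants: any minterm covered by only one implicant
        for j in range(m):
            covering = [i for i in range(n) if W[i][j]]
            if len(covering) == 1:
                chosen.add(covering[0])
        covered = {j for i in chosen for j in range(m) if W[i][j]}
        # greedy selection for the remaining minterms
        while len(covered) < m:
            best = max((i for i in range(n) if i not in chosen),
                       key=lambda i: sum(1 for j in range(m) if W[i][j] and j not in covered))
            chosen.add(best)
            covered |= {j for j in range(m) if W[best][j]}
        return sorted(chosen)

    # hypothetical coverage matrix: 4 prime implicants, 5 minterms
    W = [[1, 1, 0, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 1, 1]]
    print(minimal_cover(W))   # -> [0, 1, 3]: the two essential implicants plus one more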

Page 243

Although the formal algorithm looks rather long and complicated, it is conceptually quite simple if illustrated by a truth table of minterms and prime implicants. Consider the following example from Becher (1977), with the results shown in Table 4. Blanks represent zeros.

TABLE 4 Example for Determining the Minimal Prime Implicants Using Minterms MINTERMS FUNCTION TERMS 1 2 3 4 xyz 1 001 1 1 3 011 1 0 000 1 1 4 100 1 1 6 110 1

The second term can be dropped because it is covered by the last term. In this simple example it is possible to tell at a glance that either term 2 or term 3 is not required in order to cover all of the minterms. So the minimum functional form is

or

Page 244

There may be alternate sets of minimal prime implicants, but the algorithm gives the best set based on two criteria. (1) The primary criterion is to achieve a minimum number of prime implicants. (2) The secondary criterion is to select those with the fewest literals.

An alternate procedure for finding all of the prime implicants uses the consensus method, first proposed by Quine (1955). Two products can be combined in a consensus if one and only one variable occurs in one product, and occurs as its complement in the other. This variable is deleted from the consensus, repeated literals are condensed, and the new consensus term is added to the original function. The brief algorithm for the consensus method is as follows. Proof that all prime implicants are found is given by Quine (1955).

Algorithm for Finding Prime Implicants by the Consensus Method

1. Convert the expression to a sum of products of literals.
2. Scan through products and eliminate any that are covered by another.
3. Scan through remaining products and combine as many as possible by the consensus. Retain the combined terms as well as

Page 245

the new ones. After each application of the consensus method, check the new product using step 2. These prime implicants could be condensed to a minimal set of prime implicants, in the manner described above. The validity of the consensus method is demonstrated by the following example.

We can demonstrate that this relationship is valid by writing a reduced truth table, Table 5, showing only the valid cases, which actually correspond to the minterms. If the consensus is true, then at least one of the minterms is true, and the function is true. Note, however, that the consensus does not adequately replace the original terms. However it may cover one of them, or some other term, which can then be eliminated. A useful lemma of the consensus algorithm is that we can quickly determine if a given sum of products represents
TABLE 5 Truth Table to Demonstrate Consensus

                ORIGINAL TERMS        CONSENSUS
VARIABLES       ABCD       AB̄E        ACDE
A B C D E
1 0 0 0 1       0          1          0
1 0 0 1 1       0          1          0
1 0 1 0 1       0          1          0
1 0 1 1 1       0          1          1
1 1 1 1 0       1          0          0
1 1 1 1 1       1          0          1
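The consensus operation itself is simple to program. The following minimal sketch (in Python, illustrative only) forms the consensus of the two terms of Table 5, ABCD and AB̄E, giving ACDE.

    # Sketch: consensus of two products; a product maps variable -> True (uncomplemented)
    # or False (complemented).  The consensus exists only when exactly one variable
    # appears in opposite forms in the two products.
    def consensus(p, q):
        opposed = [v for v in p if v in q and p[v] != q[v]]
        if len(opposed) != 1:
            return None
        v = opposed[0]
        return {**{k: s for k, s in p.items() if k != v},
                **{k: s for k, s in q.items() if k != v}}

    t1 = {"A": True, "B": True, "C": True, "D": True}   # ABCD
    t2 = {"A": True, "B": False, "E": True}             # AB̄E
    print(consensus(t1, t2))   # -> {'A': True, 'C': True, 'D': True, 'E': True}, i.e. ACDE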

Page 246

only prime implicants, for two special cases. If all literals are either uncomplemented, or all are complemented, and none covers another, then they must all be prime implicants. However it may still be possible to eliminate terms by subsuming. We shall consider two simple examples of using the consensus method. The first is taken from Tison (1967).

Table 6 shows a truth table for minterms, illustrating what is happening as the consensus method is applied to terms 1 and 3, 2 and 3, and 2 and 4. Blanks are taken to be zeros. It is clear that the original terms represent the minimal set, and cannot be replaced by either
TABLE 6 Truth Table for Consensus Example MINTERMS FUNCTION CONSENSUS TERMS 4 5 6 1 2 3 1, 3 2, 3 2, 4 abcd| ac ad ab 1010 1 1011 1 1 1110 1 1 1111 1 1 1 0100 1 1 0110 1 1100 1 1 1 1110 1 1 0001 1 0101 1 1 1001 1 1 1101| 1 1 1 1

Page 247

of the additional prime implicants that were generated. Thus the minimal form is

The second example is from Quine (1955).

The procedure is shown in Table 7. Note that the subsuming


TABLE 7 Truth Table for an Example of the Consensus Method MINTERMS FUNCTION CONSENSUS TERMS 7 8 12 3 4 5 6 1, 5 3, 5 pqrst ps prs pqrt pqr 10010 1 10011 1 1 10110 1 1 10111 1 1 1 11010 1 11011 1 11110 1 1 1 11111 1 1 1 1 00000 1 00001 1 1 00100 1 00101 1 1 1 01000 1 01001 1 01100 1 1 01101 1 1 1 00011 1 00111 1 10001 1 10101 1 1 11100 1 1 11101 1 1 1 1

9 4, 8 prt

1 1

Page 248

rule eliminates term 6, pqrt. It is also apparent that the minimal form is

or

Tison (1967) has provided a method for finding all prime implicants that is a simple extension of Quine's consensus method and would seem to be more efficient. We must first define a monoform variable as one that appears in the whole function only in its uncomplemented form, or only in its complemented form. A biform variable is one that occurs in both forms. The algorithm is as follows.

Tison's Consensus Method for Obtaining All Prime Implicants

1. Apply subsuming rule.
2. Scan the function and determine the first biform variable that occurs, that has not already been processed.
   IF an unprocessed biform variable is found THEN
3.    Apply the consensus rule to all combinations where this variable appears, and add new terms to the function.
4.    Apply subsuming rule.
      Go to 2
5. ELSE
      All prime implicants have been found.
   ENDIF

Quine (1955) has provided a minimization method that does not require minterms. It is equivalent to making sure

Page 249

that all minterms are covered. The algorithm is as follows.

Quine's Method for Minimizing a Set of Prime Implicants

1. Beginning with the longer terms, take each term in turn and set it equal to 1. This establishes that each literal must equal 1.
2. Substitute these values for the variables into the function, which is redefined to delete the term being tested.
3. If the redefined function equals 1, then the term is dispensable, and is deleted; otherwise it must be left in the function.

The previous example had the following set of prime implicants.

Applying the above algorithm, and testing for prt, we have

or

Substituting these values into the function gives

which reduces to

Therefore, prt can be rejected. If we compare this operation with Table 7, it is clear that it is equivalent to making sure that all minterms are covered, and if other terms cover the minterms for prt, then it is unnecessary.
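Quine's test is also easy to mechanize by enumeration. The sketch below (in Python, illustrative only, using a small hypothetical three-variable function rather than the example above) sets the literals of the candidate term to 1 and checks whether the remaining terms already equal 1 for every assignment of the free variables.

    from itertools import product

    def dispensable(candidate, others, variables):
        free = [v for v in variables if v not in candidate]
        for values in product([True, False], repeat=len(free)):
            state = dict(candidate)
            state.update(zip(free, values))
            if not any(all(state[v] == sign for v, sign in term.items()) for term in others):
                return False      # some state satisfying the candidate is not covered
        return True

    # hypothetical function AB + ĀC + BC: the consensus term BC is dispensable
    t_AB = {"A": True, "B": True}
    t_AC = {"A": False, "C": True}    # ĀC
    t_BC = {"B": True, "C": True}
    print(dispensable(t_BC, [t_AB, t_AC], ["A", "B", "C"]))   # True
    print(dispensable(t_AB, [t_AC, t_BC], ["A", "B", "C"]))   # False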

Page 250

Other methods have been suggested in the literature for combining minimization with the consensus algorithm (Tison, 1967 and Wilson, 1982). They avoid the need to generate all prime implicants, but the algorithms are more complex. Wilson's method appears best adapted to computerization. Methods have been developed for heuristically selecting an approximate optimum set of the prime implicants when there are a very large number of variables. See for example Biswas (1986).

11.5 Multiple Functions

In the procedures that we have been considering in the previous section, we have, in effect, been reducing a multilevel network to the simplest possible bilevel network, consisting of a group of AND gates, and one OR gate. We are now interested in doing the same thing when we have multiple functions or outputs, rather than a single one. The result would still be bilevel, but with a group of OR gates, rather than a single one. In the animal identification system the functions or outputs are mutually exclusive, if the system is determinate. Only one can be true for a given set of inputs. More generally, and perhaps more commonly, more than one output can be true, and therefore the functions may share one or more prime implicants. When multiple functions are being reduced to minimal

Page 251

sets of prime implicants, we could apply one of the algorithms of the previous section to each function independently. However, this may not give the overall minimum number of AND gates for the system, because of the sharing of prime implicants. It may be better to retain a widely shared prime implicant, even though it would be discarded if each function was minimized separately. A new step should therefore be added to our minimization algorithms. In Quine's minimization method, for example, we would first rank all prime implicants in the system in order of highest to lowest number of times they appear in different functions. High ranking implicants would be given low priority for deletion, even though they may have more literals than low ranking implicants. The same principle is illustrated for the minterm method of minimization, using the following functions.

The truth table below includes additional terms obtained by consensus. The terms marked in the last row were deleted in the minimization, giving finally the following functions.

Page 252 TABLE 8 Truth Table for Applying the Consensus Method to a Multiple Function MINTERMS Y1 Y2 1 2 1, 2 1 2 3 2, 3 1, 3 abcd ad bd 3 0011 1 1 1 1 11 1 0 1 1 1 1 1 5 0101 1 1 1 7 0111 1 1 1 1 1 9 1001 1 13 1 1 0 1 1 1 15 1 1 1 1 1 1 2 0010 6 0110 10 1 0 1 0 14 1 1 1 0 Y3 1 1 1 2 3 1, 2 1 1 2, 3

1 1 1 1

Although we have not reduced the number of terms in this example, we have eliminated the marked terms from the full set of terms.

11.6 Conversion of Expert System Rules to Boolean Form

Logic rules can be represented in Boolean form. This is illustrated by the following rules from the animal identification system, translated into Boolean expressions. In the notation used, Ri represents the ith rule; Aj represents the jth arc or intermediate conclusion; Ek represents the kth input; and Cl is the corresponding final conclusion. All variables have a value of 0 or 1, with 1

Page 253

representing "true", and 0 representing "false". The numbering scheme shown in Fig. 8.1 is used. Input premises are represented by the input number, while all other premises are represented by the arc number.

The numbering system used above with the animal identification system is not necessarily optimum. It requires the use of additional expressions to represent the fact that some nodes have more than one arc leaving. An example would be the node for "BIRD", for which the following expressions are required.

The Ek's are the known quantities in this set of expressions. The solution of these expressions, which would give the values of the C's, is not necessarily any easier than

Page 254

the search procedure described in an earlier chapter. However Boolean methods can be used to reduce the expressions to a more compact form, and possibly make solutions easier or faster. The attribute rule system was described in the previous chapter, with a general Boolean form as follows.

where n is the number of evidential inputs, Nr is the number of rules, and the Ek's are actually literals representing either a variable or its complement. The left hand side can be recognized as a minterm. This must be qualified to state that it will only be an equality (equivalence), if the rules are IFF rules. If this is not the case, a Ci will not necessarily have one unique rule, and will be a sum of minterms, described as follows.

where
Mj  = jth minterm
aij = 1, if jth minterm is present in ith function
    = 0, otherwise

It should be possible, using Boolean methods, to convert a system of rules in the form of (11.6.1) to a

Page 255

corresponding set in the form of (11.6.4). However we would expect, from our discussion of Boolean procedures, that the full attribute set could be reduced in size. It could be desirable to convert either system of rules to a reduced form using Boolean methods. This would be particularly attractive for enhancing the use of bit matching, or other Boolean methods, as a control procedure. We shall now examine the procedure for reducing a set of expert system rules, in either form. Consider for example the following first three rules from the animal identification expert system.

The A's represent intermediate conclusions, which can be eliminated. The literal A25 occurs in both the right hand side of the first rule, and on the left side in the third rule; and it therefore could be eliminated by substitution. So now the first and third rules would be reduced to one rule.

All intermediate conclusions could be eliminated by continuing this process of substitution. An equivalent consensus type procedure can be used that is more amenable to algorithmic procedures (Lu and Siddall, 1989). Each rule is multiplied by the complement of its right hand side.

Page 256

The second and third new forms can be broken up into separate expressions, since if a sum of terms is false, each term must be false. So we now have

Any of these quantities can be recombined as a sum of products, so, using the consensus procedure as a guide, we combine the two where Ā25 and A25 occur, (11.6.12) and (11.6.16).

The consensus method reduces this to

Note that this consensus procedure involves dropping the combined terms, in contrast to the previous consensus method. This process is repeated for all of the rules, until all A's have disappeared. Expressions containing a common final conclusion are then grouped and recombined, as in this example.

Page 257

The final attribute rule is

This rule could be broken down into separate rules such as the following, using the first term.

Generally speaking this is not an equivalence, and really should be written as an inequality. The nature of the animal identification rules results in this expression actually being an equivalence; but this is a special case. However the rule is valid in either event. It is intuitively clear, from examination of the logic diagram, that we could substitute an inequality for the equals sign in all of the consensus operations, and the result would be the same. Equation (11.6.23) would, however, always be an "equality". The remaining rules for the animal identification system are as follows.

Page 258

For some systems it may be possible to now use methods of minimization to reduce the simultaneous set of rules even further. In some cases, as above, all inputs are uncomplemented literals, and it is immediately apparent that consensus cannot be used. However they should be checked for covering, and it may still be possible to eliminate some terms by using a minimization procedure. Because of the special nature of the animal system rules, we only actually need one term from each rule of (11.6.25), so the rule system, and number of inputs, could be reduced even further by selecting a single term set having the fewest inputs. A possible set would be as follows.

Inputs E1, E2, E3, E5, E7, E8, and E10 are found to be redundant. It is quite possible to work directly with the attribute rules of the minterm type shown in (11.6.24), by comparing an observed set of the attributes such as

Page 259

or 0 1 1 0 0 1 0 . . . . . . 0 to each rule in turn, and if the attributes in the rule match, the conclusion is true. This can be easily done in assembly languages and some high level languages, which can work directly with binary numbers, but it is not too difficult, if less efficient, to do it in a high level language like FORTRAN. The following algorithm could be used.

Algorithm for Solving Minterm Attribute Rules by Bit Matching
C(I) = Ith conclusion
     = 1, if true
     = 0, if false
RULE(I,J) = Jth attribute of Ith rule
     = 1, if true
     = 0, if false
     = -1, if don't care
O(J) = value of Jth attribute in observation set
     = 1, if true
     = 0, if false
nr = number of rules
na = number of attributes

1. DO for I = 1 to nr
      DO for J = 1 to na
         IF RULE(I,J) ≠ -1 THEN
            IF O(J) ≠ RULE(I,J) THEN
               C(I) = 0
               Go to 3
            ENDIF
         ENDIF

continues

Page 260

continued
2.    ENDDO
      C(I) = 1
      # Algorithm could be stopped here unless system is being checked #
3. ENDDO
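When the host language can operate on whole binary words and logic operations (a point taken up again below), one possible scheme, shown in the sketch that follows (it is an illustration only, not the book's answer to problem 11.1), is to store each rule as two words: a "care" mask with 1s where the rule specifies an attribute, and a "value" word giving the required values there. A rule then fires when the observation agrees with the rule on every cared-for bit.

    def fires(observation, care, value):
        return (observation & care) == (value & care)

    # hypothetical 8-attribute rule: attributes 7, 6 and 2 must be 1, 0 and 1;
    # the rest are don't care
    care, value = 0b11000100, 0b10000100
    print(fires(0b10110101, care, value))   # True
    print(fires(0b11000100, care, value))   # False (attribute 6 is 1, but the rule requires 0)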

The above algorithm assumes that the "don't cares" are known, and all minterms with matching "don't cares" can be combined. If they are not known then each must be treated as a separate rule, and the "don't Know" option would be deleted. Bit matching could also be used to solve the expressions when they are a minimal set of prime implicants, in the form of (11.6.25) or (11.6.26). In this situation the same algorithm could be used, by simply embedding each prime implicant in a minterm, with "don't cares" filling in for the missing literals. If a language is used that can work directly with binary numbers and logic operations, then an even faster solution is possible. Now however, the "don't care" representation is not possible, and a way must be found to get around this difficulty. See problem 11.1. Although the use of attribute rules is attractive in some applications, it does have two important, but not irredeemable, drawbacks. If it is desirable to give the user a justification for a decision, by displaying the chain of fired hierarchical rules, then this cannot be done directly with attribute rules, and a file of the original rules must be linked to the attribute rules, for use in

Page 261

tracing through the decision chain. A similar problem occurs if the expert system is to be modified. The expert system engineer must go back to the original hierarchical system, and modify it. The new system would then be reprocessed into attribute rule form.

11.7 Uncertainty with Boolean Representation

It is interesting to consider the problem of uncertainty in the context of Boolean representations. If we only have type 1 uncertainty, or only uncertainty in the inputs, then the conversion of attribute rules might be quite worthwhile. In probabilistic terms, a typical full attribute or minterm type rule would have the following form, using the terminology of equation (11.6.4). Since the minterms are mutually exclusive events, the summation law applies, equation (9.5).

P(Ci) = Σj aij P(Mj)

A typical minterm would have the form shown below.

Mj = E1 Ē2 E3 · · · En

Now if all of the input events are stochastically independent, then the simple product rule can be applied to the right hand side, and P(Mj) can be readily evaluated.
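A minimal sketch of this evaluation (in Python, illustrative only, with hypothetical input probabilities) is:

    def prob_minterm(minterm, p):
        # product rule for independent inputs; minterm maps input number -> True/False
        prod = 1.0
        for k, positive in minterm.items():
            prod *= p[k] if positive else (1.0 - p[k])
        return prod

    def prob_conclusion(minterms, p):
        # the minterms are mutually exclusive, so their probabilities simply add
        return sum(prob_minterm(m, p) for m in minterms)

    # hypothetical two-input rule Ci = E1 E2 + Ē1 Ē2, with P(E1) = 0.8 and P(E2) = 0.6
    p = {1: 0.8, 2: 0.6}
    rule = [{1: True, 2: True}, {1: False, 2: False}]
    print(prob_conclusion(rule, p))   # 0.8*0.6 + 0.2*0.4 = 0.56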

Page 262

If some are not independent, it may still be feasible to use equation (9.2.12). If the rules are in the form of sums of prime implicants, then the expressions are similar, except that the terms are not mutually exclusive, and equation (9.2.4) or (9.2.5) would be used. However, it is important to note that all terms of an attribute rule must be included. If we have type 2 uncertainty, on the other hand, which is uncertainty in the rules, then the problem is considerably more difficult. The only approach would seem to be to abandon any attempt to codify type 2 uncertainty in terms of hierarchical rules, and subjectively assign type 2 probabilities directly to the attribute rules.

References

Becher, W. B. (1977). Logical Design Using Integrated Circuits, Hayden, Rochelle Park, N.J.

Biswas, N. N. (1986). Computer-Aided Minimization Procedure for Boolean Functions, IEEE Trans. on Computer-Aided Design, Vol. CAD-5, No. 2, pp. 303-304.

Hill, F. J. and Peterson, G. R. (1974). Switching Theory and Logical Design, 2nd ed., Wiley, N.Y.

Page 263

Lu, P. and J. N. Siddall (1989). Determination of the Minimum Evidence Set in Engineering Systems Using Boolean Methods, Computers in Engineering 1989, Proc. 1989 ASME Design Automation Conference, Montreal, Canada.

Muroga, S. (1979). Logic Design and Switching Theory, Wiley, N.Y.

Quine, W. V. O. (1952). The Problem of Simplifying Truth Functions, Amer. Math. Monthly, Vol. 59, No. 8, pp. 521-531.

Quine, W. V. O. (1955). A Way to Simplify Truth Functions, Amer. Math. Monthly, Vol. 62, pp. 627-631.

Rudeanu, S. (1974). Boolean Functions and Equations, North-Holland, Amsterdam.

Tison, P. (1967). Generalization of Consensus Theory and Application to the Minimization of Boolean Functions, IEEE Trans. on Electronic Computers, Vol. EC-16, No. 4, pp. 446-456.

Wilson, J. M. (1982). A Compact Method for the Minimization of Boolean Polynomials, Inter. Jour. Computer Math., Vol. 12, pp. 3-12.

Page 264

Suggested Reading

Hohn, F. E. (1966). Applied Boolean Algebra; an Elementary Introduction, 2nd ed., Macmillan, N.Y.

Kaye, F. E. (1966). Boolean Systems, Longmans, London.

Whitesitt, J. E. (1961). Boolean Algebra and Its Applications, Addison-Wesley, Reading, Mass.

Problems

11.1 For either the failure identification system, or for an expert system of your own, convert the hierarchic rules to a set of attribute rules. Do this manually using Boolean algebra representation, or write a computer program using the consensus method. Then write a computer program that will determine the appropriate conclusion(s) corresponding to any given input set. Use a simple control structure that is a bit matching technique. If the language that you are using permits working directly with binary numbers and logic operations, then use this type of bit matching. Otherwise, devise a method appropriate to your language, but indicate in an algorithm how you would do it if binary numbers and logic operations were available.

Page 265

11.2 In Section 11.6, the animal identification system is converted to a minimal Boolean form in (11.6.26). However it may be useful to retain the set of complete attribute rules, in order to utilize the redundancy in the event that the user erred in observing one or more inputs. Show how this could be done.

11.3 Rewrite the algorithm for Quine's minimization method when there are multiple functions.

11.4 Draw a logic diagram for the example in Section 11.5.

11.5 It is desirable to be able to provide the user with a justification for a decision, by displaying the fired hierarchical rules. Suggest a procedure when using Boolean methods.

Page 267

12 Design Systems

Page 269

12.1 The Use of Expert Systems in Design One must take care in the development of a general category of expert systems called design systems. There are many different kinds and levels of design activity. These have been discussed in Chapter 1 in some detail. Many engineering expert systems in the non-design area have been developed, and they are basically similar in type to non-engineering systems. They include systems for maintenance, fault diagnosis, data interpretation, system operation, interface with large engineering modeling systems (Kowalik 1986), and classification. But design systems must be given special consideration. There is considerable potential for applying expert systems in the various areas mentioned in Chapter 1, but in the near term most systems will likely be limited to component selection, material selection, and stereotype configuration selection. 12.2 The Special Nature of Design Systems Design systems have a unique difference from most other expert systems; they are concerned with a hypothetical physical system, rather than a real one. We shall illustrate this concept by means of an expert system for selection of bearing types. The user enters a set of inputs that are mostly design specifications, which are performance criteria that the design must meet. For the example of bearing selection, some typical input specifications are shown in Table 12.1.

Page 270

The inputs are the specifications, itemized in Table 12.1. The items are set up in hierarchical fashion to accommodate multivalued specifications. This type of input can have several forms.

1. Can be none, or one or more, and none "don't care". The user can specify that none of the inputs in the set are required, or that one or more are required. Also, "don't care" is not acceptable.
2. Must be one or more, and none "don't care". The user can specify that any one or more of the set must be required. "Don't care" is not acceptable.
3. Must be one only, and none "don't care". The user must specify one and only one item as being required. "Don't care" is not acceptable for any.
4. Must be one only, or all "don't care".
5. Can be none or one only, and none "don't care".

The system conclusions are represented by bearing types; and a given set of input specifications should lead to a final conclusion in the form of the optimum bearing type selection. A typical set is shown in Table 12.2. Note that the table of bearing types represents a hierarchy of conclusions, with three levels, representing successively more restricted class concepts of bearings. The hypothetical nature of the design, as represented by these specifications and conclusions, causes differences from the earlier examples in some important ways that

Page 271 TABLE 12.1 SYSTEM PERFORMANCE CHARACTERISTICS BEARING SPECIFICATIONS CODE NUMBER DESCRIPTION 1.1 No contamination - food and pharmaceuticals 1.2 No contamination - textiles 1.3 Intermittent use 1.4 Varying speed (can be none, or one or more) 1.4.1 Start-stop 1.4.2 Alternating low and high speeds 1.5 Vacuum 1.6 Vibration 1.7 Permanent lubrication 1.8 Bearing must be stiff 1.9 Speed level (must be one or more, and none don't care) 1.9.1 Very low or stationary 1.9.2 Low 1.9.3 Medium 1.9.4 High 1.10 Wear rate (must be one only, and none don't care) 1.10.1 Low 1.10.2 Medium 1.10.3 High 1.11 Friction level (must be one only, or all don't care) 1.11.1 Low 1.11.2 Medium 1.11.3 High 1.12 Clearance level (must be one only, or all don't care) 1.12.1 Low 1.12.2 Medium 1.12.3 Large 1.13 Space limitations (must be one only, or all don't care) 1.13.1 Minimum diametral space 1.13.2 Minimum axial space 1.14 Low cost 1.15 Bearing size (must be one only, and none don't care) 1.15.1 Small 1.15.2 Medium 1.15.3 Large 1.16 Load level (must be one or more, and none don't care) 1.16.1 Low 1.16.2 Medium

(table continued on next page)

Page 272

(table continued from previous page)


CODE NUMBER DESCRIPTION 1.16.3 High 1.16.4 Shock 1.17 Good electrical conductivity Ambient temperature level (must be one or more, and none don't 1.18 care) 1.18.1 Below -70 C 1.18.2 -70 to -30 1.18.3 -30 to 0 1.18.4 0 to 100 1.18.5 100 to 250 1.18.6 250 to 500 1.18.7 500 to 1000 1.18.8 Above 1000 C 1.19 Temperature range (must be one only, and none don't care) 1.19.1 Small 1.19.2 Medium 1.19.3 Large Contamination present (can be none, or one or more, and none don't 1.20 care) 1.20.1 Dust or grit (can be one only, and none don't care) 1.20.1.1 Moderate 1.20.1.2 Severe 1.20.2 Water 1.20.3 Gasoline 1.20.4 Chemicals 1.20.5 Reactive gas environment 1.21 Flammability risk 1.22 Oil feed cannot be used 1.23 Long period without operation 1.24 Reliability even after loss of lubricant 1.25 Low weight 1.26 Relative thrust present (must be one only, and none don't care) 1.26.1 Low 1.26.2 Medium 1.26.3 High 1.27 Very long life 1.28 Pressure to be sealed (must be one only, and none don't care) 1.28.1 None 1.28.2 Low 1.28.3 Medium 1.28.4 High 1.29 Angular misalignment present 1.30 Shaft position accuracy (must be one only, or all don't care) 1.30.1 Low 1.30.2 Medium

(table continued on next page)

Page 273

(table continued from previous page)


CODE NUMBER 1.30.3 1.31 1.32 1.33 1.34 DESCRIPTION High Low noise Ease of disassembly High starting load with low friction torque Instrument pivot

TABLE 12.2 BEARING TYPES CODE DESCRIPTION NUMBER 3.1 Rolling element 3.1.1 Ball 3.1.1.1 Deep groove 3.1.1.2 Spherical 3.1.1.3 Angular contact 3.1.2 Cylindrical rollers 3.1.3 Spherical roller 3.1.4 Needle 3.1.5 Tapered rollers 3.2 Journal 3.2.1 Hydrodynamic 3.2.1.1 Liquid lubricant 3.2.1.2 Gas lubricant 3.2.2 Mixed lubrication 3.2.3 Boundary lubrication 3.2.4 Hydrostatic 3.2.4.1 Liquid lubricant 3.2.4.2 Gas lubricant 3.2.5 Steel backed soft bearing metal 3.2.6 Spherical plain bearing Hardened steel pin and hole with low viscosity synthetic oil (instrument 3.3 pivots) 3.4 Porous bronze with oil 3.5 Dry 3.5.1 Teflon (PFTE) 3.5.2 Nylon 3.5.3 Acetals 3.5.4 Metals 3.6 Composites 3.6.1 Porous bronze with Teflon and lead powder 3.6.2 Porous bronze with graphite 3.6.3 Glass fibre with Teflon

Page 274

affect the algorithm.

1. It is quite possible for the user to specify a set of inputs that result in no conclusion; all outputs would then be false.
2. The user can assign three possible values to inputs: true, or a required specification; false, or a specification that must not be satisfied; and "don't care" if the specification is satisfied or not. The "don't care" category is treated in the rules the same as a "true" input.
3. There can be more than one true final conclusion; and therefore the program cannot stop as soon as one final conclusion has been fired. Every rule must be tested until its state has been determined.
4. Although this would likely be designed as a deterministic system, provision has been made for the user to assign "don't know" to an input. This must be propagated through the system so that the rules have a possible "don't know" state. It would be useful output information to give the complete state of such a rule.
5. Some of the inputs are multivalued. These are of different types, defined above, and the user is asked to make a selection.
6. Many intermediate and final conclusions are constrained by negative premises. That is, if a particular input is set as false, then a corresponding conclusion is precluded. It is easy to overlook these because we

Page 275

tend to make decisions based on positive premises, such as - if low starting friction is desired, then use rolling element bearings. However a negative attribute might be needed, such as - if very long life is required, do not use rolling element bearings. The user might innocently specify that very long life is desired, and unless the constraint of a negative attribute is applied, a rolling element bearing could be incorrectly recommended. These constraints could be included as premises to any rule having the conclusion as the rule recommendation. However, if the conclusion is burdened by a large number of these, it is likely easier to check a conclusion for constraint violation whenever the conclusion is thrown up by the rule search. It would be desirable to advise the user whenever a conclusion has been rejected by such constraints, so that he or she can review the specifications in order to decide if certain active constraints are really necessary. Items 1, 2 and 6 in the above set of characteristics of the system are due to the hypothetical nature of design systems. A diagnostic system, or a classification system, is a representation of a real physical system, and the inputs represent a real physical state. Therefore the inputs are inherently consistent, unless the user has made an error in observing them; and it is unnecessary to use

Page 276

constraints of the type described in item 6, or the "don't care" category for inputs, mentioned in item 2. However the inputs, or most of them, in a design system are hypothetical, and therefore not necessarily consistent. This also explains why a user could enter a set of inputs that lead to no solution, as mentioned in item 1. The welding expert system, mentioned briefly as an example in earlier chapters, is a design system, but it is a type in which the inputs are all real and can be observed, such as the orientation of the weld. It seems unlikely, but it may be possible for the user to ask for a welding situation where welding is impossible, making constraints necessary.

12.3 Some Typical Rules for the Bearing System

In this section a few typical rules are given for the bearing selection expert system, having the specifications (inputs) and bearing types (conclusions) given above. We have chosen to incorporate constraints by using negative premises. The following notation is used in the description of the rules.

.    AND combination
+    OR combination
±    This symbol indicates that the item can be present or not, entered as "don't care". It is

Page 277

always used as input to an AND gate, as shown in Figure 12.1.
[x]  A negative input is indicated by enclosing the premise in square brackets.

The ± is a useful representation when an input to a rule is a required condition for firing a rule if it is true, but it need not be true to fire the rule. It is therefore a sufficient but not necessary condition. It is a "don't care" input, and it can be either true or false to fire the rule. It is also useful when a rule will be satisfied by any one of a set of mutually exclusive inputs. This is illustrated by the following example, in algorithmic notation.
IF (C AND low speed AND medium speed AND NOT low wear) THEN O is true

Fig. 12.1 The "plus or minus" gate.

Page 278

Using the notation in the rules given below, it has the form
IF C . ±low speed . ±medium speed . [low wear] THEN O

This is represented in Figure 12.2, where low speed is represented by A, and medium speed by B. This is intended to codify the concept that the selected design will be satisfactory for either low or medium speed, but not high speed. A and B are mutually exclusive. It should be noted that the exclusive OR is used in the Boolean expressions shown in Figures 12.1 and 12.2.
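One way such a rule might be tested in a program is sketched below (in Python, an illustration only; the premise lists and the treatment of the ± items are assumptions, not the book's implementation). Plain premises must be true, bracketed premises must not be true, and the ± premises do not constrain firing.

    def rule_fires(inputs, required, negative):
        # plain premises must be true
        if any(inputs.get(p) is not True for p in required):
            return False
        # negative premises (shown in square brackets) must not be true
        if any(inputs.get(p) is True for p in negative):
            return False
        return True

    # the rule IF C . ±low speed . ±medium speed . [low wear] THEN O:
    # the ± premises are "don't care" for firing, so only C and [low wear] are checked
    inputs = {"C": True, "low speed": True, "medium speed": False, "low wear": False}
    print(rule_fires(inputs, required=["C"], negative=["low wear"]))   # True
    inputs["low wear"] = True
    print(rule_fires(inputs, required=["C"], negative=["low wear"]))   # False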

Fig. 12.2 Example of "plus or minus" notation.

Page 279

Although the quantity ( ) is always equal to 1, it cannot be eliminated from the Boolean representation of a rule, because the user must be assured that specification A is satisfied, if it is specified in the input. Thus input A cannot be left hanging; the user must know that it is satisfied by the final recommendation of the system. The first set of rules are dummy OR rules, used to separate AND gates from OR gates.

Dummy Gates

These gates are all OR type gates.
No. D1 D2 D3 Description Vibration +shock Low speed +large clearance +shock +start-stop Temperature less than -70 C +temperature between -70 and -30 C Temperature < -70 C +temperature -70 to -30 C +temperature -30 to 0 c +temperature 0 to 100 C +temperature 100 to 250 C Temperature 250 to 500 C +temperature 500 to 1000 C +temperature > 1000 C D4 +temperature 250 to 500 C Code 1.6 1.16.4 1.9.2 1.12.3 1.16.4 1.14.1 1.18.1 1.18.2 1.18.1 1.18.2 1.18.3 1.18.4 1.18.5 1.18.6 1.18.7 1.18.8 1.18.6

D4

D5 D6

(table continued on next page)

Page 280

(table continued from previous page)


No. Description D6 D7 +temperature 500 to 1000 C Low speed D8 +start-stop +vibration Temperature < -70 C D9 +temperature 500 to 1000 C + temperature > 1000 C D10 Temperature < -70 to > 1000 C Code 1.18.7 1.9.2 1.4.1 1.6 1.18.1 1.18.7 1.18.8 1.18.1 to 1.18.8

The next set of rules is a sample of the rules for bearing selection. These rules are given for illustrative purposes only; and while they are likely a reasonably good representation of expertise in the domain, they should not be used as an uncritical basis for a bearing selection expert system. Bearing Rules
No. Description Minimum dia-metral space .low cost .low shaft position accuracy B.1 .medium shaft position accuracy .low noise .[bearing must be stiff] IF THEN Code Description Code 1.13.1 1.14 1.30.1 Journal bearing 1.30.2 1.31 1.8 3.2

(table continued on next page)

Page 281

(table continued from previous page)


No. Description .[low wear rate] .[low clearance level] .[good electrical conductivity] .[high shaft position accuracy] .[angular misalignment present] Journal bearing .low load .low friction .medium wear B.2 .high wear rate .temperature level -70 to 250 C .start-stop .chemical environment .oil feed cannot be used Journal bearing .low load .low speed .medium speed B.3 .large clearance .water .start-stop .[high speed] Journal bearing B.4 .low speed .low load .low cost IF THEN Code Description Code 1.10.1 1.12.1 1.17 1.30.3 1.29 3.2 1.16.1 1.11.1 1.10.2 1.10.3 Teflon D4 1.4.1 1.20.4 1.22 3.2 1.16.1 1.9.2 1.9.3 Nylon 1.12.3 1.20.2 1.4.1 1.9.4 3.2 Porous 1.9.2 bronze 1.16.1 with oil 1.14

3.5.1

3.5.2

3.4 3.4

(table continued on next page)

Page 282

(table continued from previous page)


No. Description .oil feed cannot be used .small size .medium size .[low wear rate] .[low friction level] .[moderate dust or grit contamination] .[severe dust or grit contamination] .[gasoline contamination] .[chemical contamination] B.5 Journal bearing .medium speed .medium load .low wear .low friction .temperature level -70 to 250 C .start-stop .[low wear rate] .[severe dust or grit contamination] Journal bearing .high load B.6 .low friction .low wear rate IF Code Description 1.22 1.15.1 1.15.2 1.10.1 1.11.1 1.20.1.1 1.20.1.2 1.20.3 1.20.4 3.2 Porous bronze 3.6.1 1.9.3 1.16.2 1.10.1 1.11.2 with Teflon D4 and lead powder 1.4.1 1.10.1 1.20.1.2 3.2 1.16.3 Glass fibre with Teflon 3.6.3 1.11.1 1.10.1 THEN Code

(table continued on next page)

Page 283

(table continued from previous page)


No. IF Description .temperature level -70 to 250 C .start-stop .chemical environment .[severe dust or grit contamination] Journal bearing .high speed B.7 .shock .very long life .low wear rate Low friction .high speed .low load .(temperature below -70 +temperature level above 500 C) .food .bearing must be stiff .low wear rate B.8 .low friction level .low clearance level .very long life .low noise .large temperature range .dust or grit contamination moderate .[varying speed] .[vacuum] THEN Code Description Code D4 1.4.1 1.20.4 1.20.1.2 3.2 1.9.4 Hydrodynamic 1.16.4 bearing 1.27 1.10.1 1.11.1 1.9.4 1.16.1 D9 1.1 1.8 1.10.1 Hydrodynamic 1.11.1 journal 1.12.1 bearing with 1.27 gas lubricant 1.31 1.19.3 1.20.1.1 1.4 1.5

3.2.1

3.2.1.2

(table continued on next page)

Page 284

(table continued from previous page)


No. IF Description .[low cost] .[shock] .[severe dust or grit contamination] THEN Code Description Code 1.14 1.14 1.16.4 1.16.4 1.20.1.21.20.1.2 1.11.1 D10 1.4.1 1.9.4 1.9.3 1.9.2 1.10.1 1.16.1 1.16.2 1.19.3 1.20.1.1 1.27 1.31 1.5 1.14 1.20.1.2 1.4.1 1.8 1.12.1 1.17 Rolling element 1.26.1 bearing 1.26.2 1.26.3 1.30.3

Low friction .(temperature below -70 +temperature level above 1000 C) .start-stop .high speed .medium speed .low speed .low wear rate B.9 .low load .medium load .large temperature range .dust or grit contamination moderate .very long life .low noise .[vacuum] .[low cost] .[severe dust or grit contamination] Start-stop .stiff .low clearance level .good electrical conductivity B.10 .low thrust .medium thrust .high thrust .high shaft position accuracy

3.1

(table continued on next page)

Page 285

(table continued from previous page)


No. IF Description .high starting load and low friction minimum axial space .[very long life] Code 1.33 1.13.2 1.27 THEN Description Code

This sample of rules illustrates the hierarchical nature of the rule system; the intermediate conclusion "journal bearing", generated in rule B1, occurs as a premise in rules B2, B6 and B7. This particular design system has an unusually large number of premises in the rules, because bearing selection is sensitive to quite a large number of specifications. Some ingenuity will be desirable in the design of the control system, in order to minimize the total number of inputs required from the user.

12.4 An Extension of the Bearing Example

We shall now assume that the system has been expanded to include the selection of the lubricant, the seal, and the lubrication system; all of these are interconnected with the selection of the bearing itself. We now require a set of lubrication specifications, shown in Table 12.3.

Page 286 Table 12.3 LUBRICANT PERFORMANCE CHARACTERISTICS Compatibility with contacting materials (can be none, or one or more, 2.1 and don't care is not acceptable) 2.1.1 Natural or SBR rubber 2.1.2 Copper, brass or bronze Process fluids as lubricants (can be none or one only, and none don't 2.2 care) 2.2.1 Refrigerants or cyrogenic liquids 2.2.2 Foods or food processing components 2.2.3 Treatment liquids 2.2.4 Brines 2.2.5 Melts 2.2.6 Paints 2.3 Viscosity (must be one only, or all don't care) 2.3.1 Low 2.3.2 Intermediate 2.3.3 High 2.4 Long storage life Table 12.4 SEAL TYPES Radial lip contact Gland O-ring Mechanical (axial contact) Clearance Fixed bush Floating bush Labyrinth 4.5.3.1 Plain 4.5.3.2 Helical Slinger Grease only No seal

4.1 4.2 4.3 4.4 4.5 4.5.1 4.5.2 4.5.3

4.6 4.7 4.8

Page 287 TABLE 12.5 TYPE OF LUBRICANT Liquid 5.1.1 Mineral oil 5.1.1.1 Paraffinic 5.1.1.2 Napthenic 5.1.2 Silicone 5.1.2.1 Phenyl methyl silicone 5.1.2.2 Methyl silicone 5.1.2.3 Chlorinated silicone 5.1.3 Di-ester 5.1.4 Polyglycol 5.1.5 Chlorinated diphenyl 5.1.6 Polyphenyl ether 5.1.7 Fluorocarbon 5.1.8 Water 5.2 Gas 5.2.1 Air 5.2.2 Helium 5.2.3 Nitrogen 5.3 Solid or dry 5.3.1 Graphite 5.3.2 Molybdenum disulfide 5.3.3 PFTE (Teflon or Fluon) 5.3.4 Nylon 5.3.5 Acetals 5.3.6 Metals 5.4 Grease or semi-solid 5.4.1 Consistency level - NLGI number 5.4.1.1 000 5.4.1.2 00 5.4.1.3 0 5.4.1.4 1 5.4.1.5 2 5.4.1.6 3 5.4.1.7 4 5.4.1.8 5 5.4.1.9 6 5.4.2 Components Base oil Thickener 5.4.2.1 Mineral oil Calcium soap 5.4.2.2 Mineral oil Sodium soap 5.4.2.3 Mineral oil Lithium soap 5.4.2.4 Mineral oil Bentonite clay 5.4.2.5 Petrolatum (vaseline) 5.4.2.6 White oils Suitable type 5.4.2.7 Polyethylene Suitable type 5.4.2.8 Polybutene Suitable type 5.4.2.9 Methyl silicon Suitable type 5.1

(table continued on next page)

Page 288

(table continued from previous page)


5.4.2.10 Castor oil 5.4.2.11 Di-ester Lithium soap 5.4.2.12 Di-ester Bentonite clay 5.4.2.13 Silicone Lithium soap 5.4.2.14 Silicone Dye 5.4.2.15 Silicone Silica 5.5 Anti-seize and anti-scuffing compounds Grease or petrolatom (vaseline) and molybdenum disulphide 5.5.1 powder 5.5.2 Graphite and petrolatum or polyglycol Paste of graphite with a volatile solvent (ethyl alcohol or 5.5.3 acetone) Paste of graphite with water or or a non-flammable liquid 5.5.4 such as fluorocarbon Paste of low-friction metal powders (lead and copper) and 5.5.5 petroleum or polyglycol Paste of low friction metal powders and a low-melting point 5.5.6 polymer or flux TABLE 12.6 LUBRICATION SYSTEM TYPE Manual oil feed Automatic gravity oil feed Wick oil feed Oil mist lubrication Ring, disc or splash lubrication Oil circulation system External lubricant cooling

6.1 6.2 6.3 6.4 6.5 6.6 6.7

Page 289

We next need additional conclusions for the seal types, the lubricant types, and the lubrication system types, shown in Tables 12.4, 12.5 and 12.6. We are now able to formulate rules for the selection of the lubricant, the seal, and the lubrication system.
LUBRICANT RULES No. IF Description Vacuum .intermittent use .start-stop .low speed .medium speed Food .low speed .low load .temperature 0 to -30 Food .medium speed .grease Ditto Ditto Ditto Ditto Temperature range high .grease Code 1.5 1.3 1.4.1 1.9.2 1.9.3 1.1 1.9.2 1.16.1 1.18.3 1.1 1.9.3 5.4 Description Grease THEN Code 5.4

L.1

L.2

Petrolatum

5.4.2.5

L.3 L.4 L.5 L.6 L.7 L.8

White oil Polyethylene Polybutene Methyl silicone Castor oil

5.4.2.6 5.4.2.7 5.4.2.8 5.4.2.9 5.4.2.10 5.4.2.12

1.19.3 5.4

Di-ester & clay

(table continued on next page)

Page 290

(table continued from previous page)


No. IF Description Low weight L.9 .temperature <250 C :Low weight L.10 .temperature >250 C High load L.11.very low or stationary speed Anti-sieze compound L.12.temperature <250 C Anti-sieze compound L.13.electrical conductivity .temperature <500 C Anti-sieze compound L.14 .temperature <1000 C L.15Ditto L.16Start-stop Low cost L.17 .low load L.18External lubricant cooling L.19Long storage life THEN Code Description 1.25 Grease D4 1.25 Liquid lubricant D5 1.16.3 Anti-seize compound 1.9.1 5.5 D4 disulphide 5.5 Graphite & petrolatum 1.17 or polyglycol D6 Low friction 5.5 metal powders & petrolatum D7 or polyglycol Metal powder & low melting polymer or flux 1.4.1 Grease 1.14 Grease 1.16.1 6.7 Liquid lubricant 2.4 Liquid lubricant 5.5.2 Grease or petrolatum & molybdenum Code 5.4 5.1 5.5

5.5.1

5.5.5 5.5.6 5.4 5.4 5.1 5.1

(table continued on next page)

Page 291

(table continued from previous page)


No. IF Description Cylindrical rollers L.20.high speed .grease Cylindrical rollers L.21.large bearing size .grease Cylindrical rollers L.22.(vibration +shock) .grease L.23Journal bearing .(low speed +large clearance +shock +startstop) Journal bearing L.24 .grease Journal bearing L.25.grease .temperature level <-30 C Journal bearing L.26.grease .temperature level -30 to 0 C L.27Ditto L.28Ditto THEN Code Description 3.1.2 1.9.4 Consistency level - 3 5.4 3.1.2 1.15.3Consistency level - 4 5.4 3.1.2 D1 Consistency level - 4 5.4 3.2 Grease D2 3.2 5.4 3.2 5.4 D3 3.2 Consistency level - 1 Di-ester & lithium soap 5.4.1.4 Code 5.4.1.6 5.4.1.7 5.4.1.7 5.4

Mineral oil & 5.4 lithium soap 1.18.3 Mineral oil & bentonite clay Di-ester & bentonite clay

(table continued on next page)

Page 292

(table continued from previous page)


No. IF Description L.29Ditto L.30Ditto Journal bearing L.31 .grease .temperature level 0 to 100 C L.32Ditto Journal bearing L.33 .grease .temperature level 100 to 250 C L.34Ditto Journal bearing .high load L.35 .(low speed +start-stop +vibration) .large clearance .temperature level -70 to 500 C Journal bearing .high load .low speed L.36 .temperature level -70 to 250 C .vacuum .low friction .start-stop THEN Code Description Code Silicone & lithium soap Silicone & silica 3.2 Mineral oil 5.4 & calcium soap 1.18.4 Mineral oil & sodium soap 5.4.2.2 3.2 5.4 Di-ester & bentonite clay 5.4.2.12 1.18.5 Silicone & silica 3.2 1.16.3 D8 1.12.3 D6 3.2 1.16.3 1.9.2 D4 1.5 1.11.1 1.4.1 5.4.2.15

Graphite

5.3.1

Molybdenum disulphide

5.3.2

Page 293 SEAL RULES IF Description Code 3.1 Rolling element bearing 5.1 .liquid lubricant 1.9.1 .very low or stationary speed S.1 .low speed 1.9.2 .high pressure 1.28.4 .temperature level D6 3.1 Rolling element bearing 5.1 .liquid lubricant 1.9.1 .very low or stationary speed S.2 .low speed 1.9.2 .medium pressure 1.28.3 .temperature level 0 to 100 C 1.18.4 Rolling element bearings 3.1 .liquid lubricant 5.1 .medium speed 1.9.3 .high speed 1.9.4 S.3 .no pressure 1.28.1 .low pressure 1.28.2 .moderate dust or grit 1.20.1.1 1.18.1 to 1.18.6 .temperature level D6 No. THEN Description Code

Gland

4.2

O-ring

4.3

Radial lip

4.1

(table continued on next page)

Page 294

(table continued from previous page)


No. S.4 IF Description Rolling element bearing .liquid lubricant .medium speed .high speed .medium pressure .high pressure .severe dust or grit Rolling element bearing .liquid lubricant .no pressure Journal bearing .liquid lubricant .no pressure .Low clearance Journal bearing .liquid lubricant .no pressure .medium clearance .large clearance Vacuum .intermittent use .start-stop .low speed .medium speed Code 3.1 5.1 1.9.3 1.9.4 1.28.3 1.28.4 1.20.1.2 3.1 5.11 1.28.1 3.2 5.11 1.28.1 1.12.1 3.2 5.11 1.28.1 1.12.2 1.12.3 1.5 1.3 1.4.1 1.9.2 1.9.3 THEN Description Mechanical seal Code 4.4

S.5

Labyrinth

4.5.3

S.6

Fixed bush seal

4.5.1

S.7

Floating bush seal

4.5.2

S.8

Labyrinth seal

4.5.3

(table continued on next page)

Page 295

(table continued from previous page)


No. S.9 S.10 Description Food .medium speed .grease Low cost .low load IF Code 1.1 1.9.3 5.4 1.14 1.16.1 Description Mechanical seal Grease only seal THEN Code 4.4 4.7

Rules for the lubrication system are not illustrated here. Considerably more work is required to complete the design of the system. More rules are required. A data structure is needed. The control structure will need considerable care, because of the large number of inputs. Speed of solution is not likely very important; but, because of the possibility of an input set leading to no solution, the control must be designed so as to give the user as much guidance as possible in finding a feasible solution. Also the user interface will be a large component of this expert system design.

12.5 Using Frames with the Bearing Selection System

We are now ready to suggest a possible design for using frames with this system. However, it should be emphasized that frames are not necessarily the best approach to data management for this system. It depends to some extent on how much more complex the final system becomes, and how the system is eventually used. Even if

Page 296

Fig. 12.3 A possible frame system for the bearing selection example.

frames were ultimately selected, the arrangement illustrated may not be the final one. A possible system is shown in Figure 12.3. Only three levels of frames are shown; but up to three more levels are required to fully define all levels of class concepts. Each subframe would have associated with it the rules for which it is the conclusion.

Reference

Kowalik, J. S., ed. (1986). Coupling Symbolic and Numerical Computing in Expert Systems, Elsevier, Amsterdam.

Page 297

13 Expert System Development

Page 299

13.1 Small Systems

Sometimes, when reading the literature of expert systems, the impression is received that all useful expert systems are very large, with thousands of rules. This is not the case, and many small, but very useful expert systems have been written, in engineering and other fields. The experience of the AI division at DuPont is one illustration of this (Press, 1988). They are reported to have developed over 200 expert systems in a company teaching program attended by personnel from the operating divisions. These have averaged about 80 rules, and about half the systems were actually implemented in operations. A commercial shell was used. Bahill, Harris and Senn (1988) have reported a similar experience on 25 small expert systems developed in a teaching environment, using a shell. They had an average of 150 rules, and took about 100 hours. However there was no report on how many were actually implemented. The author also has a similar experience; but the use of shells or PROLOG was not allowed. The choice of language was otherwise left to the students, and the languages used have included FORTRAN, PASCAL, C and LISP. The students had the theoretical background material that is in this book. The reader who is seriously interested in getting involved in expert system development is strongly urged to do a relatively small "practice" expert system, without using a shell. It is difficult otherwise to achieve

Page 300

insight into the design of expert systems. It need not be large; indeed, it should not have more than about 75 rules, and can usefully have as few as 30. And it need not be technical; any subject of interest to the engineer is adequate to achieve the desired result. It is excellent if it is related to a project associated with the reader's technical activities, but the initial version should not be too ambitious; it can be a more circumscribed version of the future full system. The control structure in the learning version should not be made too elaborate. The use of data structured code1 is not necessary, but if used together with provision for a development mode2, it has the advantage of providing a basic shell for future expert systems, or for expansion of the practice version (a minimal sketch of rules stored as data is given at the end of this section). However, it is a shell that is completely flexible, since the system designer can modify it easily to suit the characteristics of different system designs. It would be desirable, but not essential, to incorporate uncertainty into the system. It need not be a "pure" expert system; practice systems are commonly just complex logic systems, with no intuitive expertise. The development time for these kinds of practice expert systems varies over an approximate range of 20 to 100 hours.
1See Section 8.1.
2See Problem 8.1.
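As an illustration of what is meant by storing the rules as data rather than as program statements, the following is a minimal sketch in C. The rule contents and parameter names are invented, the control is a single sequential pass, and nothing here should be read as the structure of Section 8.1; it is only meant to show how little machinery a practice system needs to get started.

   #include <stdio.h>
   #include <string.h>

   #define MAX_IF 4

   /* One rule: every predicate in the IF part must be true for the
      THEN part to be asserted.  All names are illustrative only. */
   struct rule {
       const char *if_part[MAX_IF];   /* unused entries are NULL */
       const char *then_part;
   };

   static const char *facts[50];      /* inputs plus fired conclusions */
   static int n_facts = 0;

   static int known(const char *name)
   {
       for (int i = 0; i < n_facts; i++)
           if (strcmp(facts[i], name) == 0)
               return 1;
       return 0;
   }

   int main(void)
   {
       struct rule rules[] = {
           { { "liquid lubricant", "high speed" }, "mechanical seal" },
           { { "grease", "low cost" },             "grease only seal" },
       };
       int n_rules = 2;

       /* inputs for this consultation */
       facts[n_facts++] = "liquid lubricant";
       facts[n_facts++] = "high speed";

       /* one sequential pass: fire every rule whose IF part is satisfied */
       for (int r = 0; r < n_rules; r++) {
           int satisfied = 1;
           for (int p = 0; p < MAX_IF && rules[r].if_part[p] != NULL; p++)
               if (!known(rules[r].if_part[p]))
                   satisfied = 0;
           if (satisfied) {
               facts[n_facts++] = rules[r].then_part;
               printf("Rule %d fired: %s\n", r + 1, rules[r].then_part);
           }
       }
       return 0;
   }

Repeating the pass until no new conclusions are asserted turns this into a crude forward chaining control; adding a small routine to read the rules from a file, instead of compiling them in, gives the beginnings of the development mode mentioned above.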

Page 301

13.2 Developing Larger Systems

13.2.1 Introduction

The first question that arises in the development of a larger system, after the domain of expertise has been tentatively established, is whether or not to use a commercial shell. It is very tempting to do so, since it would seem that the system designer can immediately plunge into creation of the knowledge base, without needing to be concerned with programming problems. However, the situation is far from being this simple; in the following, we shall consider the features that larger systems may have, and discuss these features in the context of using shells. We shall call a system design a custom design if it is not based on a shell. When developing medium to large complex systems, a designer doing a custom design will still find it desirable to create his or her own development software. We have mentioned in the previous section the advantages of using structured code together with a simple algorithm for creating new rules, or changing existing ones. This can provide the foundation for building a system with as many of the features described below as may be judged appropriate, depending on the size and complexity of the system. The features discussed also provide a basis for selecting a commercial shell, in the event that a system designer decides to go that route. No one shell has all of

Page 302

the features listed. A full discussion of the features of many commercial shells, and their relative advantages and disadvantages, can be found in Harmon, Maus and Morrissey (1988). Reviews of shells can also frequently be found in publications such as Byte, AI Expert, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, IEEE Expert, PC AI, and Expert Systems: the International Journal of Knowledge Engineering.

13.2.2 Languages

There would appear to be a considerable trend away from the use of LISP as the dominant language for expert systems. Almost all of the commercial shells, except the very largest and oldest, are written in C, PASCAL, COBOL or FORTRAN. Also, LISP is generally considered too slow for frequent consultations, unless a LISP compiler or chip is available. Some shells permit the incorporation of user written modules in a developed system, but almost invariably only one language is accommodated, which may not be the designer's language of choice. There are clearly language advantages if a designer uses a custom design rather than a shell. This would be particularly true if there is a substantial component of the system that uses engineering modeling, or if the system is a front or rear end of a large modeling package such as optimization, circuit design, process simulation, kinematic analysis, or finite element analysis.

Page 303

13.2.3 Knowledge Base

It will commonly be desirable to provide for multiple value inputs, and not all shells have this feature. The custom designer must make provision for this in his or her data structure. As rules accumulate during knowledge base development, and the logic network becomes more complex, some facility for keeping track of it will be useful. It is desirable to begin by using a logic network with AND and OR gates shown, but once the number of gates exceeds 50 or so this becomes rather difficult and time consuming. A simple drawing tool such as Autosketch3 is very helpful, but once the display has to be broken down into more than one sheet, and there are many interconnections, other bookkeeping methods are likely essential. At least one commercial shell4 provides a graphical network display as an integrated automatic feature. The development of the logic circuit for a programmable logic chip is quite similar to developing the logic system for an expert system. The Altera Corporation, which manufactures programmable logic chips, provides a software development package that can create, and plot, a graphical logic network display of AND and OR gates. This can be done by direct user
3Made by Autodesk, Inc.
4NEXPERT, Neuron Data Company

Page 304

input input, or it can be done automatically, using Boolean expressions provided by the user to define the system. Since the Boolean expressions for a logic chip are essentially similar to those for an expert system, the same feature can be used for expert systems. This software even has a simulation feature, so that the operation of a circuit design can be checked to see if the anticipated performance of a logic chip has been achieved. This feature could, in principle, also be used to test the logic of an expert system. However, the developer must use the control system incorporated in the software, which is not revealed. A simple feature for rule management is the display of an index of rules. It would then be desirable to be able to display the predicates for any rule and the rule or input from which each comes. This display should also show each rule's conclusion and make available the destination rules or final conclusion of the rule output arc. There would be a similar index for all inputs and final conclusions. If frames are being used, a graphical display of the frame network would be useful. The rule management scheme described above could be extended to incorporate the frame structure. There is considerable scope for ingenuity in designing rule management and debugging features. One could include, for example, a procedure for checking to determine if any rule has an in or out arc that is not connected to any-

Page 305

thing. It would also be desirable to have a routine to check for rule duplication, or to check whether one rule subsumes another, in the sense of Chapter 11. Another possibly valuable feature would be the preparation of a set of conflicting rule predicate pairs, such as A and B in
IF (A AND B) THEN C

If A and B can never both be true by their inherent nature, then they should never occur together in an AND rule. A routine could be included that would scan all rules for such conflicts. Another possibility is to save a special file that lists all names (parameters) that are used to represent inputs and outputs. Any name entered by the designer in setting up the knowledge base could then be checked to ensure that it is in the master list. This would help debug typographical errors. In the same vein, it would be helpful to have a facility for going through a list of inputs or conclusions, and, after selecting any one, to seek out every instance where it occurs in a rule. Other important features are facilities for rearranging rules, editing rules, deleting rules, and so on, with the same facilities for inputs and final conclusions. Another knowledge base consideration is the use of frames. If the designer has decided to use them, it should be noted that the smaller and less expensive commercial shells cannot incorporate frames, and frames considerably complicate the use of those that do.
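As a small illustration of the master name list idea described above, the following C sketch checks newly entered parameter names against such a list and flags any that do not appear. The names, and the use of a simple array rather than a file to hold the list, are invented for the example.

   #include <stdio.h>
   #include <string.h>

   /* Master list of legal parameter names (illustrative entries only).
      In a real system this would be read from the special file. */
   static const char *master[] = { "liquid lubricant", "high speed",
                                   "no pressure", "mechanical seal" };
   static const int n_master = 4;

   static int in_master(const char *name)
   {
       for (int i = 0; i < n_master; i++)
           if (strcmp(master[i], name) == 0)
               return 1;
       return 0;
   }

   int main(void)
   {
       /* names typed in while a new rule was being entered */
       const char *entered[] = { "liquid lubricant", "hgih speed" };

       for (int i = 0; i < 2; i++)
           if (!in_master(entered[i]))
               printf("Warning: \"%s\" is not in the master list"
                      " -- possible typing error\n", entered[i]);
       return 0;
   }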

Page 306

These smaller shells can cost less than $500 and handle up to about 500 rules, whereas larger frame based systems can cost much more, but can handle thousands of rules. Some expert systems could be substantially enhanced by incorporating graphics displays to illustrate more clearly the nature of inputs and outputs. This can be done by using commercial graphics software for generating diagrams, or hardware and software that can scan and store photographs and existing drawings. Some shells can display figures created by specific commercial graphics packages.

13.2.4 Control System

It is probably safe to say that no shell gives the system designer the complete flexibility in control system design that he or she has with one that is custom designed. Most commercial shells use either backward or forward chaining, or a combination of both, and usually provide control of the order of rule evaluation. Some also permit initial pruning of inputs, which enables the user in a consultation to have the program ignore certain grouped segments of inputs. Some shells, during a consultation, provide for multiple evaluation (instantiation) of a rule or rule groups, without changing the remainder of the system. This enables a kind of simulation, in which the effects of changing only one or a few inputs can be quickly determined. A similar feature is, during both development and consulta-

Page 307

tion, to be able to list all inputs after a consultation, modify one or more, and rerun without reentering all inputs.

13.2.5 Interface with User

It is an important feature of an expert system that the user be able to get started quickly and easily in a consultation, without the need for extensive training or study of a manual. This requires a carefully designed hierarchical menu system that leads the user easily through the system with the aid of prompts, and possibly "help" or "explain" displays in more complex systems. Screens that provide fill-in slots may be useful. Commercial screen and menu making utilities are available that could be used in a custom design. Most shells provide these facilities to a greater or lesser degree.

13.2.6 Uncertainty

Most shells have some facility for incorporating the propagation of uncertainty in a system design. Either the MYCIN method or the Bayes' theorem method is usually used. It is rare that provision is made for permitting the user to incorporate his or her own scheme for handling uncertainty. None make provision for using Monte Carlo simulation. If uncertainty is important in an engineering expert system, and it is decided to use rigorous

Page 308

probability methods, then it would seem mandatory to use a custom design. In the author's opinion, it would be quite difficult to justify the non-use of rigorous probability estimates in most engineering expert systems. Another uncertainty feature that may be desirable, in some systems, is some means for coping with the situation where the user in a consultation is forced to enter "unknown" in response to a prompt for an input. The system should still give the best possible estimate of the correct result. (A small numerical sketch of a single Bayes' theorem update is given following Section 13.2.7 below.)

13.2.7 Access to Data Bases and Control of Data Management

The custom designer may decide to have the data structured code use data files created with commercial data management packages. Such data files can usually be accessed by designer written code.5 Some shells also permit access to data files created in this way. An important characteristic of custom designs with their own data structures and data management systems is that the designer has complete flexibility in tailoring the data system to the specific needs of his or her design. All of the overhead and unneeded features of commercial data management packages can be avoided. A similar advantage in flexibility would exist over commercial shells.
5See Section 8.3.
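The following is the small sketch referred to in Section 13.2.6: a single application of Bayes' theorem in C, updating the probability of a hypothesis when one piece of evidence is observed. The numerical values are invented, and the repeated, chained application of such updates through a rule network is a separate design question not addressed here.

   #include <stdio.h>

   /* One Bayes' theorem update:
      P(H|E) = P(E|H)P(H) / [ P(E|H)P(H) + P(E|not H)P(not H) ] */
   static double bayes(double p_h, double p_e_given_h, double p_e_given_not_h)
   {
       double p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h);
       return p_e_given_h * p_h / p_e;
   }

   int main(void)
   {
       double prior       = 0.30;  /* P(hypothesis), e.g. a bearing fault */
       double likelihood  = 0.80;  /* P(evidence | hypothesis)            */
       double false_alarm = 0.10;  /* P(evidence | hypothesis false)      */

       printf("posterior = %.3f\n", bayes(prior, likelihood, false_alarm));
       return 0;
   }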

Page 309

13.2.8 Cost

There are several aspects to comparing costs for different approaches to designing an expert system. The initial cost of shells can be less than $500 for small shells without provision for frames and many of the features discussed here; these can run on low level personal computers and can accommodate up to about 500 rules. Midsize shells cost from $500 to $5000, and require the use of higher end personal computers, such as the 386, or workstations. And the very large shells can cost up to $60,000 and must be used on large workstations, mainframes, or quite expensive special LISP machines. The cost depends to a large extent on the number of features available. Another factor in cost is learning time. The material in this book can be learned in the equivalent of a one semester course. To learn to use midsize and large shells requires training courses of a week or more, and large systems commonly require familiarity with the language used, usually LISP. Small to medium size systems also require considerable time for becoming familiar with the manuals and with the use of all of the features. A point worth noting is that design of a custom system involves the creation of a structure that is basically a shell itself, but one easily adapted to other, different expert systems. It is therefore an investment in future expert system work. The same can be said of the learning time devoted to familiarization with a commercial shell.

Page 310

13.2.9 Report

Shells commonly provide for an output file containing a report that summarizes the results of a consultation. This would be a desirable and fairly easy feature to incorporate in a custom system. It may be useful to have options controlling how extensive the report is to be.

13.2.10 Real Time Systems

Expert systems have great potential value as control devices in engineering processes, circuits and mechanisms. In this role it is important to have provision for easy interfacing with sensors for inputs, and between outputs and control actuators. Shells that do this directly are not common, but it can rather easily be done with custom designs, using interaction with input and output ports of the microprocessor. The expert system on a chip, described in Section 7.11, has great potential for applications where high speed, simplicity, compactness, and lightness are important.

13.2.11 Interaction with Operating System Files

A somewhat similar feature to interaction with data bases, discussed in Section 13.2.7, is the ability of a system to interact with operating system data files. Some shells can incorporate operating system commands to execute

Page 311

files. This could be used to read data files and write to them, or to access external microprocessor ports. This facility could be useful for real time applications.

13.3 Testing and Updating

Testing is an important stage in the development of an expert system. The first stage of testing is to debug system errors. Every test run should keep track of all rules fired, and this record should be examined for anomalies (a minimal sketch of such a trace is given at the end of this section). Some other debugging features are suggested in Section 13.2.3. The next stage is to test the system's expertise. Possibly the ideal way to do this is to have available cases where the result can be checked by actual experimentation, or by manual evaluation. Almost equally valuable is a history of examples of expertise in the domain of the system, where the recommendations have been proven valid by their application. Not quite as definitive, but still excellent, is to have new test cases that are evaluated independently by one or more experts in the domain, and by the system. The final testing stage is to determine user acceptance by field trials. Does it do what the user expects? Does it solve the kinds of problems that the user thinks it will? Is its user interface friendly? Does it give reports that are easy to interpret? Is the manual easy to use?
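The rule-firing record mentioned at the beginning of this section can be very simple. The sketch below, in C, appends one line to a trace file each time a rule fires; the file name and message format are arbitrary choices for the example.

   #include <stdio.h>

   /* Append one line to the trace file each time a rule fires, so that
      the record can be examined for anomalies after the test run. */
   static void log_fired(int rule_no, const char *conclusion)
   {
       FILE *fp = fopen("trace.log", "a");
       if (fp == NULL)
           return;
       fprintf(fp, "rule %d fired -> %s\n", rule_no, conclusion);
       fclose(fp);
   }

   int main(void)
   {
       log_fired(4, "mechanical seal");
       log_fired(9, "grease only seal");
       return 0;
   }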

Page 312

Some feedback here will almost certainly lead to revisions of these aspects of the system. Testing and updating must be an ongoing process for most expert systems, as experience with applications accumulates and new knowledge develops. It is clearly important to design the system so that this can be done as easily as possible, and by personnel other than the original system designers. This requires well written code and good documentation. The temptation to skimp on this aspect is almost overwhelming, because of cost and time pressures; but there is ample evidence of expert systems that have eventually died, after substantial investment in them, because this guideline was not faithfully adhered to.

References

Bahill, A.T., Harris, P.N. and Senn, E. (1988). Lessons Learned Building Expert Systems, AI Expert, Vol. 3, No. 9, pp. 36-45.

Harmon, P., Maus, R. and Morrissey, W. (1988). Expert Systems: Tools & Applications, Wiley, N.Y.

Press, L. (1988). Eight-Product Wrap-Up: PC Shells, AI Expert, Vol. 3, No. 9, pp. 61-65.

Page 313

Appendix: Data Structures

Page 315

A.1 Stacks and Queues

A stack is a list of items stored in a certain way as a convenience in handling data. It is a vector or array of numbers, or character strings. It is convenient to think of it physically as a stack of objects: items are added by "pushing" them onto the stack, and removed by "popping" them from the stack. It is also called a last in, first out list. It is thus used when a set of numbers is to be stored and later recalled from memory by always recalling the last data item stored. A counter (STACK_SIZE) is used to keep track of how many items are in the stack at any time.

Algorithm to Add an Item to a Stack

Input
STACK_SIZE = number of items in stack
STACK_MAX = maximum stack size
LAST = index or subscript number for new data item to be stored in stack
STACK(I) = array of pointers to data items in stack
         = subscript of data item in Ith position in stack

Procedure
1. STACK_SIZE = STACK_SIZE + 1
2. IF (STACK_SIZE < STACK_MAX) THEN
      STACK(STACK_SIZE) = LAST
   ELSE
      # Stack has overflowed #
      WRITE "Stack has overflowed"
      STOP
   ENDIF

Page 316

If DATA is the array of quantities that could be stored in the stack, in any order, then DATA(LAST) will now be at the top of the stack. There may not actually be any need to place a limit on the stack size.

Algorithm to Remove an Item from the Stack

Inputs
STACK(I) = array of pointers to data items in stack
         = subscript of data item in Ith position in stack
STACK_SIZE = size of stack

Procedure
1. IF (STACK_SIZE ≠ 0) THEN
      NEW = STACK(STACK_SIZE)
   ELSE
      # Stack was empty #
      WRITE "Stack empty"
      STOP
   ENDIF
2. STACK_SIZE = STACK_SIZE - 1

DATA(NEW) will be the data item popped from the stack. The contents of the stack can be determined at any time by using STACK, since it points to the subscript numbers for the data items in the array DATA that are in the stack.

Algorithm for Transferring Contents of Stack to a New Array

Input
STACK_SIZE = size of stack
STACK(I) = array of pointers to elements in DATA that are in the stack

Page 317

Procedure
DO for I=1 to STACK_SIZE
   NEW_DATA(I) = DATA(STACK(I))
ENDDO

Output
NEW_DATA(I) = array containing data elements from DATA in the stack

Each position in the stack may have items from more than one data array associated with it. As long as they share the same subscript number, the single vector STACK can be used to point to them all. Thus these parallel vectors could act as fields in a data record of, for example, a part specification.

Queues are similar to stacks, except that they are a first in, first out list of data items. Thus items are added at one end, and removed from the other. The queue is most commonly used in circular form, with the number of elements in the queue set at a desired size, QUEUE_SIZE. QUEUE(I) is the vector that points to the data items actually stored in the queue. Pointers are used for the head and tail of the queue. The queue is considered empty when the head pointer is zero; initially both pointers are at zero.

Page 318

Algorithm for Adding a Data Item to a Queue

Input


QUEUE(I) = pointer to the Ith element in the data vector
QUEUE_SIZE = preset maximum number of items that can be stored in the queue
HEAD_POINT = pointer to the subscript of QUEUE at the head of the queue
TAIL_POINT = pointer to the tail of the queue
LAST = subscript of data array that is to be added to the back of the queue

Procedure
1. # Increment the tail pointer #
   TAIL_POINT = TAIL_POINT + 1
   # If the tail pointer is greater than the maximum queue size, it is set to 1 #
   IF (TAIL_POINT > QUEUE_SIZE) THEN
      TAIL_POINT = 1
   ENDIF
2. # If the pointers are equal and greater than zero, then the queue has overflowed #
   IF (HEAD_POINT=TAIL_POINT AND HEAD_POINT ≠ 0) THEN
      WRITE "Queue has overflowed"
      STOP
   ELSE
      # Add item to queue #
      QUEUE(TAIL_POINT) = LAST
   ENDIF
3. # If the new item is the first to be placed in an empty queue, then set the head pointer equal to the tail pointer #
   IF (HEAD_POINT=0) THEN
      HEAD_POINT = TAIL_POINT
   ENDIF

Page 319

Algorithm for Removing an Element from the Head of a Queue

Input


QUEUE(I) = array containing data pointers
         = pointer to the Ith element in the data vector
QUEUE_SIZE = preset maximum number of items that can be stored in the queue
HEAD_POINT = pointer to the subscript of QUEUE at the head of the queue
TAIL_POINT = pointer to the tail of the queue

Procedure
1. # Make sure queue is not already empty #
   IF (HEAD_POINT = 0) THEN
      WRITE "Queue was already empty"
      STOP
   ELSE
      # Take the value from the head of the queue #
      NEW = QUEUE(HEAD_POINT)
      IF (HEAD_POINT=TAIL_POINT) THEN
         # Queue is now empty #
         HEAD_POINT = 0
      ELSEIF (HEAD_POINT = QUEUE_SIZE) THEN
         # If the head pointer is at the end of the queue array, it must wrap around to 1 #
         HEAD_POINT = 1
      ELSE
         HEAD_POINT = HEAD_POINT + 1
      ENDIF
   ENDIF

Output
NEW = subscript of data item to be taken from the queue

The array QUEUE points to all the data items in the queue, and could be used to dump the complete contents of the

Page 320

queue to a new data array, or to print them. A common example of using a queue would be a running record of unprocessed orders in a factory. Again, the queue could actually contain a set of fields in each element, such as the customer name, item ordered, price, and so on.

A.2 List Processing

Special data structures are required if we wish to be able to add items to a list, or remove items, or associate items with items in another list, all in an unordered way. It is desirable not to have to maintain a strict order in lists when we wish to be able to add and remove items. This is done by using chained lists, which associate a pointer array with the data array or arrays. Each element of the pointer array has a subscript corresponding to the data array subscript, and points to the next data item. A variable, START_LIST, is required to record the subscript of the first data item in the list. So the actual data can be stored sequentially, but the correct order is recorded by the pointer array. A simple example of adding a new data item, DATA(NEW), between two adjacent items in the list, DATA(X) and DATA(Y), is shown in Figure A.1. As shown, the list has seven items, starting at DATA(3). The letters in the DATA cells represent any kind of data item, such as character strings, or numbers. The actual order of data items in

Page 321

Fig. A.1 Illustration of a simple linked list.

this list is C, A, E, G, B, F, D. Suppose that we wish to assign a new value to DATA(8), and insert it between cells 5 and 7. This can be done with two statements.
LINK(8) = LINK(5)    # Which equals 7 #
LINK(5) = 8

Generally, to insert a new data item, DATA(NEW), between adjacent cells X and Y, we have
LINK(NEW) = LINK(X)    # Which equals Y #
LINK(X) = NEW

We may also wish to remove data items, but still keep the list compact; it is therefore desirable to keep track of the empty cells, so that new items can be inserted into empty locations instead of always at the end, as above. An additional available space list is used, which is also a chained list with an associated pointer list. We shall call these AVAIL_SPACE(I) and LINK_SPACE(I). A pointer is needed for the first item in the available space list, START_SPACE. The available space list must be initialized, before any data is stored, by the following steps.

Page 322

DO for I=1 to LIST_SIZE - 1
   LINK_SPACE(I) = I + 1
ENDDO
LINK_SPACE(LIST_SIZE) = 0   # Marks the end of the available space list #
START_SPACE = 1             # Shows beginning of available space list #
START_LIST = 0              # Shows list is empty #

Algorithm for Adding an Item to a List

Input


X = pointer to some item in the list, after which we wish to insert the new item
VALUE = value of data item to be inserted
LINK_SPACE(I) = array of values pointing to Ith available space
LINK(I) = array of values pointing to the next data item in the list
START_SPACE = pointer to beginning of available space list
START_LIST = pointer to first item in the data list

Procedure
1. # Check for any space available #
   IF (START_SPACE=0) THEN
      WRITE "No space left in list"
      STOP
   ELSE
2.    # Get a free cell from the available space list #
      NEXT = START_SPACE
      START_SPACE = LINK_SPACE(START_SPACE)
3.    # Link to the freed cell #
      IF (START_LIST=0) THEN
         # Insert into empty list #
         START_LIST = NEXT
         LINK(NEXT) = 0
      ELSE
         # Insert after X #
         LINK(NEXT) = LINK(X)
         LINK(X) = NEXT
      ENDIF

continues

Page 323

continued
4.    # Enter the new data item #
      DATA(NEXT) = VALUE
   ENDIF

Algorithm for Removing an Item from a List

Input


Y = pointer to some item in the list that is to be removed
X = pointer to node preceding Y
START_LIST = pointer to beginning of list
START_SPACE = pointer to beginning of available space list
LINK_SPACE(I) = array of values pointing to Ith available space
LINK(I) = array of values pointing to the Ith data item

Procedure
1. # Check for empty list #
   IF (START_LIST = 0) THEN
      WRITE "List is empty"
      STOP
   ELSE
      IF (X=0) THEN
         # Remove first item #
         START_LIST = LINK(START_LIST)
      ELSE
         # Remove an interior item #
         LINK(X) = LINK(Y)
      ENDIF
      # Return the freed cell to the available space list #
      LINK_SPACE(Y) = START_SPACE
      START_SPACE = Y
   ENDIF

When adding or removing items, we normally would specify the data item itself, rather than its pointer. We therefore need to be able to locate the pointer to a given

Page 324

data item in order to use the above procedures. One possible way to do this is to set up an additional array, POINT(I), which contains the pointer for the Ith data item. An alternative way is to loop through LINK(I) until the pointer to the particular data item is found. Pointer arrays can be used in many different ways to tie lists together. For example, networks can be built up by tying an element at a node to the next node in the branch.
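A small C rendering of the second approach is given below; it walks along the links from the start of the list until the required data value is found, returning both the cell holding it and the preceding cell, which is what the removal algorithm above needs. The arrays reuse the small list shown in Section A.5, and element 0 is left unused so that a zero can mean "none", as in the algorithms above.

   #include <stdio.h>

   /* Walk the chained list from start_list until the cell whose DATA
      value equals the one sought is found.  Returns 1 and sets *y to
      that cell and *x to its predecessor (0 if it is the first item);
      returns 0 if the value is not in the list. */
   static int locate(const int data[], const int link[], int start_list,
                     int value, int *x, int *y)
   {
       int prev = 0;
       for (int k = start_list; k != 0; prev = k, k = link[k]) {
           if (data[k] == value) {
               *x = prev;
               *y = k;
               return 1;
           }
       }
       return 0;
   }

   int main(void)
   {
       /* order is defined by LINK, not by position in DATA */
       int data[] = { 0, 8, 3, 2, 15, 9, 5 };
       int link[] = { 0, 5, 6, 2, 0,  4, 1 };
       int start_list = 3;        /* list order: 2, 3, 5, 8, 9, 15 */
       int x, y;

       if (locate(data, link, start_list, 9, &x, &y))
           printf("value 9 is in cell %d, preceded by cell %d\n", y, x);
       return 0;
   }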
A.3 Search Methods

In working with lists, it is commonly necessary to find the location of an item in a list, or to determine whether an item exists in a list. We shall, for the moment, ignore the question of linked lists, and assume that we are working with a simple list or array. The simplest way to do a search is a sequential search of the list from the top, or bottom, until the match is found.

Algorithm for Sequential Search

Input

LIST(I) = array of items in the list, I=0 to N
VALUE = specified value to be found
LIST(0) is a dummy element in the array, used to eliminate an end of array test (I<1) in the DO WHILE loop.

Page 325

Procedure
1. LIST(0) = VALUE
   I = N
2. # Find specified value #
   DO WHILE (LIST(I) ≠ VALUE)
      I = I - 1
   ENDDO
3. # If search leaves loop at I=0, the value is not in the list #
   IF (I=0) THEN
      WRITE "Value specified is not in list"
   ENDIF
   ITEM = I

Output
ITEM = subscript of LIST(I) indicating the location of VALUE

The simple feature of using the dummy element LIST(0) gives a surprising reduction in search time (Horowitz and Sahni, 1983). The sequential search method is the only method available for unordered lists. However, if the list is ordered numerically for arrays containing numbers, or alphabetically for arrays containing character strings, then much more efficient search methods are possible. For larger lists it is therefore worth sorting the list into numerical or alphabetical order before the search is done. In the following section we shall look briefly at sorting methods. We shall only consider the binary search method for ordered lists; it is usually considered the simplest and best method for most applications. In this method, the

Page 326

specified value is first compared with the middle value of the sorted array. The result can be used to decide which half of the list contains the specified value. This is repeated with the selected half, and continued until the value is found. The following algorithm assumes that the order is from low to high in the array.

Algorithm for Binary Search

Input
LIST(I) = array of items in the list, I=1 to N
VALUE = specified value to be found

Procedure
1. L = 1
   U = N
2. # Search the indicated half of the subset #
   DO WHILE (L ≤ U)
      M = (L+U)/2       # Index of middle item #
      IF (VALUE = LIST(M)) THEN
         # Solution found #
         ITEM = M
         STOP
      ELSEIF (VALUE > LIST(M)) THEN
         L = M + 1
      ELSE
         U = M - 1
      ENDIF
   ENDDO
3. # If the loop is completed, then the value is not in list #
   WRITE "Value specified is not in list"

Output
ITEM = subscript of LIST(I) indicating location of VALUE

Page 327

A.4 Sorting

We have noted above that sorting of a list is a useful adjunct to searching. There are many other applications of sorting in data management work. One common one is converting the order of a set of records containing associated lists, or fields: the ordering is changed from one based on one field to one based on another field. An example would be a file of employee names with associated salaries. The file might first be ordered alphabetically by name, and then resorted numerically by salary level. There are many sorting algorithms, and each may have advantages for special applications. We shall first look at the algorithm for a very simple method, called the insertion method, which uses the idea of first ordering the first two elements; then inserting the third in its proper position relative to the first two; then inserting the fourth in order with the first three; and so on until all elements are ordered.

Algorithm for the Insertion Method of Sorting

Input
LIST(I) = array of items in the list, I=0 to N
SMALL = any arbitrary number known to be smaller than any in the list
LIST(0) is a dummy element having the value of SMALL

Page 328

Procedure
1. # Loop through all items after the first #
   DO for I=2 to N
      TEMP = LIST(I)
      J = I - 1
2.    # Insert the current element in the ordered location #
      DO WHILE (TEMP < LIST(J))
         LIST(J+1) = LIST(J)
         J = J - 1
      ENDDO
      LIST(J+1) = TEMP
   ENDDO

Output
LIST(I) = sorted array of items in the list, I=1 to N

The insertion method is very good for fairly short lists of less than 20 to 25 items. We shall next examine the quicksort method, which is said to have an excellent average efficiency for general purpose applications. The use of recursion gives the most elegant algorithm for this method. If the language being used does not allow recursion, then it must be simulated (Wagner, 1980).

Algorithm for the Quicksort Method of Sorting

Input
LIST(I) = array of items in the list, I=1 to N+1
LIST(N+1) = LARGE, where LARGE is an arbitrary number greater than all list items

Page 329

Procedure Main
P = 1
Q = N
CALL QUICKSORT(P,Q)
END

Procedure QUICKSORT(P,Q)
1. IF (P<Q) THEN
      I = P
      J = Q+1
      TEMP = LIST(P)
2.    I = I+1
      IF (LIST(I)<TEMP) THEN
         GO TO 2
      ENDIF
3.    J = J-1
      IF (LIST(J)>TEMP) THEN
         GO TO 3
      ENDIF
4.    IF (I<J) THEN
         CALL SWITCH(LIST(I),LIST(J))
         # Continue the scans after the exchange #
         GO TO 2
      ENDIF
      CALL SWITCH(LIST(P),LIST(J))
      CALL QUICKSORT(P,J-1)
      CALL QUICKSORT(J+1,Q)
   ENDIF
5. RETURN

Procedure SWITCH(A,B)
1. # Switches values in A and B #
   TEMP = A
   A = B
   B = TEMP
   RETURN

Page 330

A.5 Searching and Sorting with Chained Lists

If chained or linked lists are being worked with, then the search and sort algorithms can be adapted so that, instead of the actual data array values being changed, the link array is changed. In searching, the algorithm finds the link array value that gives the subscript of the required data value. Similarly, in sorting, the solution gives the array values linked together in the proper order, while leaving the actual data array unsorted and, in fact, physically unchanged. The following algorithm illustrates the concept for the binary search method. It would be used when the list has been sorted by adjusting the links, so that the correct order of the LIST values is defined by the pointer array LINK while LIST itself remains physically unsorted. This situation is illustrated by the following simple example.
I       = 1, 2, 3, 4,  5, 6
LINK(I) = 5, 6, 2, 0,  4, 1
LIST(I) = 8, 3, 2, 15, 9, 5
START_SPACE = 3

Algorithm for Binary Search with a Chained List

Input


LIST(I) = array of items in the list, I=1 to N
LINK(I) = array of pointers, I=1 to N
START_SPACE = pointer to first item on the data list
VALUE = specified value to be found

Page 331

Procedure
1. L = 1
   U = N
   START = START_SPACE
2. # Search the indicated half of the subset #
   DO WHILE (L ≤ U)
      M = (L+U)/2       # Position of middle item in the sorted order #
      # Find the mid-value by a linear walk along the links, starting at position L #
      K = START
      KOUNT = L
3.    IF (KOUNT < M) THEN
         KOUNT = KOUNT + 1
         K = LINK(K)
         GO TO 3
      ENDIF
      # K now defines the mid-value in LIST #
4.    IF (VALUE = LIST(K)) THEN
         # Solution found #
         ITEM = K
         STOP
      ELSEIF (VALUE > LIST(K)) THEN
         # Discard the lower half; the walk now starts after the mid-value #
         L = M + 1
         START = LINK(K)
      ELSE
         U = M - 1
      ENDIF
   ENDDO
5. # If the loop is completed, then the value is not in list #
   WRITE "Value specified is not in list"

Output
ITEM = subscript of LIST(I) indicating the location of VALUE

Page 332

References

Horowitz, E. and Sahni, S. (1983). Fundamentals of Data Structures, Computer Science Press, Rockville, Maryland.

Wagner, J.L. (1980). FORTRAN77: Principles of Programming, Wiley, N.Y.

Problems

A.1 Develop an algorithm for the insertion sort method when a linked list is being used. Translate the algorithm into computer code in any language, and test it on a simple example.

Page 333

Index
A
Abduction, 11
Abstraction, 108
Aesthetics, 22
Animal identification system, 66, 128, 190, 192
Applications, 26
Arc, 122
Artificial intelligence, 3, 29
Attribute rules, 189
B
Backward chaining, 105
Bayes' theorem, 150, 181
   in expert systems, 168, 180
Bearing example, 269
Belief functions, 181
Bit matching, 194, 259
Boolean methods, 11, 223
   multiple functions, 250
   postulates, 227
   uncertainty with, 261
Boolean polynomial, 229
Breadth first search, 102
C
Capabilities of an expert system, 50
Classification network, 197
Complex logic systems, 40
Conclusions, 47
Conflict resolution, 105, 107
Consensus method, 244
Constraints, 275
Control system (structure), 47, 87, 99
   abstraction, 108
   breadth first, 102
   decomposition, 108
   depth first, 103
   exhaustive search, 101
   forward chaining, 105

   pruning, 107
   selection of, 113
   sequential iteration, 72, 101
   structuring, 106
   using bit matching, 194
Cost of expert systems, 309
D
Data structures, 315
   in expert systems, 121, 137, 308
Debugging expert systems, 304
Deep reasoning, 38
Demon, 134
Depth first search, 103
Design, use of expert systems in, 18, 269
Dynamic memory allocation, 14
E
Engineering modeling, 18, 37
Entropy, 198
Evidence, 47
Expertise, 47
F
Frames, 130, 295, 305
FORTH, 14
Forward chaining, 105
Fuzzy sets, 181
G
Games, 6, 111
Graphics display, 303, 306
H
Hard wired systems, 112
Heuristic search, 12, 109
Hierarchical systems, 8, 189
I
ID3 algorithm, 195
IF/THEN rules, 43
