Table of Contents
1.0 Summary:
2.0 Introduction:
3.0 Basic Concepts and Terminology:
4.0 System Model:
5.0 Solution Approach:
5.1 Model 1 - Duplex Control System
5.2 Model 2 - Triplex Control System (2 of 3)
5.3 Model 3 - Triplex Control System (1 of 3)
6.0 Simulation Results:
6.1 Input Section
6.2 Output Section
6.3 Model 1 - Duplex Control System
6.4 Model 2 - Triplex Control System (2 of 3)
6.5 Model 3 - Triplex Control System (1 of 3)
7.0 Conclusions:
8.0 Additional Research
9.0 Appendix A
10.0 Appendix B
11.0 Appendix C
12.0 References:
04/30/01 Page 2 of 19
CPRE 545 Project Turbine Control System Reliability Analysis Greg Johnson
1.0 Summary:
This paper is composed of five sections. The first describes a common problem with reliability analysis in
complex real-time control systems. The second provides a description of a typical real-time control system
and lists the individual elements which will be included in the model. The third section describes the Fault
Tree and Markov models used for the reliability analysis. The fourth section compares the results from the
various simulations. The fifth section describes additional research that could be done in this area and
comments on some of the limitations of the research included in this paper.
2.0 Introduction:
When a control system is analyzed for reliability, the analysis typically calculates the
reliability of the control system alone. The analysis may also compare the reliability of a
duplex system against a triplex system. These analyses use concrete mathematical comparisons
and, depending on the complexity of the model, may include repair rates and various other
assumptions. These analyses do not, however, account for the complex I/O environment
surrounding the control system. The purpose of this report is to compare a
duplex and a triplex system, including the external elements necessary for complete real-time control
of a process.
A control system can generally be defined as consisting of three components: the operating
environment, the controlled system and the controlling system'. This paper will expand the
reliability analysis to include the controlled system as well as the controlling system. This will
include the sensors which monitor the operating environment, the electronics which read these
sensors and perform the control algorithms, and the actuators which modify the operating
environment. We will refer to the system which includes the controlled system and the controlling
system as the Complete System.
[Figure 1: diagram not recoverable; surviving labels: Sensors, Operating Environment]
The purpose of this project is to determine the effect on the overall reliability of a Complete
System of changes in the reliability of the electronic controlling system. Specifically, this
paper will compare the reliability of a duplex control system vs. a triplex control system. The
control system used for this study will be a typical steam turbine and compressor train.
There are many documents and papers showing the theoretical values and reliability increases
gained by increasing the redundancy of the control system from a simplex system to a duplex
system. There are similar documents showing the theoretical increase in reliability of a triplex
system compared to a duplex system. However, most of these studies limit the scope of the system
to those components directly contained in the control system.
As an example of this sequence, consider a simple switching power supply that powers the current loop for a
pressure signal transmitter. A resistor fails, causing the output of the switching power supply to go to zero
volts; this is the failure. As the voltage drops to zero, the pressure signal ramps to zero as well, so the
signal from the transmitter incorrectly indicates the state of the operating environment; this is the error.
The control system responds according to what the erroneous data indicates, as if the pressure had actually
changed to the incorrect value; this is the fault.
The system can address the problem at any of these three stages. If it is addressed at the Failure stage,
it is considered to be Fault Avoidance. If it is resolved at the Error stage, it is Fault Masking. If it is
handled at the Fault stage, it is considered Fault Tolerance.
As each of these (Fault Avoidance, Fault Masking and Fault Tolerance) is enhanced, the system reliability
is increased. In the above example, the resistor could have been sized differently, increased from 1/4 watt to
1 watt, so that there was less chance of it failing in the circuit. This would be an example of Fault
Avoidance, and is the primary design practice responsible for the development of a reliable system.
If the power supply were in parallel with another power supply, the failure of the first power supply would be
unnoticed by the transmitter. This is Fault Masking. This method of providing increased reliability always
requires some level of hardware redundancy.
Taking this example further, as the transmitter fails due to the loss of the power supply, the control software
could detect the failure and switch to another transmitter or use a simplified algorithm which allows for
acceptable control without this particular transmitter. Both of these solutions incorporate some type of
redundancy. The alternate transmitter obviously includes a redundant transmitter while the simplified
algorithm requires a level of software redundancy — there need to be two different algorithms available to
the control system.
This paper is not intended to address the methods of providing fault tolerance or the associated cost issues,
but certainly the methods used to provide redundancy can dramatically affect the overall system cost.
Redundant transmitters can be expensive, while redundant algorithms can provide equivalent or certainly
adequate levels of fault tolerance.
In common literature Fault Masking and Fault Tolerance are lumped together and considered to be Fault
Tolerance. While these two techniques may seem similar, the significant difference is in the ability of the
system to diagnose the failure. In the case of Fault Masking, the system may not be aware that a Failure
has occurred. For example, a memory system with error correction codes will correct an internal RAM failure
but will not indicate that there is a problem. In this case no maintenance will be done and the system is in a
less reliable state than before.
To help clarify some of these terms, this paper will make use of the term Dependability. This term is used
in place of the term reliability to resolve the confusion caused when reliability is used both as a concrete
mathematical attribute of a system (system reliability) and as a general description of a system (a
reliable system). This term is introduced and further defined in an IEEE paper titled "Dependable Computing:
From Concepts to Design Diversity". A dependable system includes the attributes of reliability, availability,
and maintainability. While these latter attributes have precise mathematical definitions, the concept of a
Dependable system is a combination of them that provides for the overall quality of the service delivered by
the system.
The control system was divided into three sections: the processor, the input/output board and the
power supply. These models assume that the processor, input/output board and power supply are
independent. This means that a failure of one of these sub-systems does not depend on or cause
the failure of another sub-system. For example, in the case of the triplex system (2 out of 3) there
could be one failed processor, one failed input/output board and one failed power supply. The
system was modeled this way because it provides the most comprehensive set of test data; the
alternative case, where a processor is paired with a specific input/output board, is a subset of this
case and is easily determined from the data in this paper.
This model assumes 100% diagnostic coverage of the redundant elements. This
means that if a processor fails, the backup unit always assumes control if it is available. The same
diagram is used for the IOC and the power supply. These can be seen in Appendix C.
[Figure: diagram for the triplex (2 of 3) model not recoverable; surviving labels: 2 of 3, proc board, Power]
In this model the IOC and the Processor modules are considered failed if two of the three units
have failed. The power supply module is duplex.
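The voting rules used in the three models can be illustrated with the standard k-of-n formula for independent, identical modules. This is only a sketch under simplifying assumptions: it ignores repair, whereas the models in this paper are repairable and solved as Markov models, and the per-module reliability `r` is a hypothetical value, not one taken from the project data.

```python
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent, identical modules work.

    Assumes no repair, which is a simplification of the models in this paper.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r = 0.99  # hypothetical per-module reliability over the mission time

duplex = k_of_n_reliability(1, 2, r)        # works if 1 of 2 modules works
triplex_2of3 = k_of_n_reliability(2, 3, r)  # works if 2 of 3 modules work
triplex_1of3 = k_of_n_reliability(1, 3, r)  # works if 1 of 3 modules works

print(f"duplex (1 of 2):  {duplex:.6f}")
print(f"triplex (2 of 3): {triplex_2of3:.6f}")
print(f"triplex (1 of 3): {triplex_1of3:.6f}")
```

Even in this simplified form, the 2 of 3 arrangement tolerates only one module failure out of three, while the duplex arrangement tolerates one out of two and the 1 of 3 arrangement tolerates two out of three.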
[Figure: diagram for the triplex (1 of 3) model not recoverable]
7.0 Conclusions:
The simulations produced two very interesting results. The first is that a duplex control system provides better
reliability than a triplex (2 of 3) control system. An analysis of the Markov maps for the two systems shows
that this is due to the increased probability of an additional failure on the triplex system during
the time after the first module has failed and before it can be repaired. During this time the duplex system
has only one remaining module, while the triplex system has two remaining modules. Since a single
additional failure on either system will cause the failure of the system, the triplex system has twice the
probability of experiencing this additional failure.
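The argument above can be sketched with the textbook closed-form MTTF of a three-state repairable Markov chain. The sketch assumes identical modules with a constant failure rate `lam`, a single repair rate `mu` out of the degraded state, and perfect diagnostics. The rate values are hypothetical and are not the values used in the HIMAP models.

```python
def mttf_duplex(lam: float, mu: float) -> float:
    """MTTF of a repairable 1-of-2 system (states: 2 good, 1 good, failed)."""
    return (3 * lam + mu) / (2 * lam**2)


def mttf_triplex_2of3(lam: float, mu: float) -> float:
    """MTTF of a repairable 2-of-3 system (states: 3 good, 2 good, failed)."""
    return (5 * lam + mu) / (6 * lam**2)


# Hypothetical rates, not taken from the project data.
lam = 1e-5   # module failure rate, per hour
mu = 1 / 24  # repair rate, per hour (one repair per day on average)

# Rate of the system-failing second failure while one module is down.
second_failure_rate_duplex = 1 * lam   # one remaining module
second_failure_rate_triplex = 2 * lam  # two remaining modules

ratio = mttf_duplex(lam, mu) / mttf_triplex_2of3(lam, mu)
print(f"duplex MTTF / triplex (2 of 3) MTTF = {ratio:.3f}")
```

When repair is much faster than failure, this ratio approaches 3: a factor of 2 from the doubled second-failure rate, and a factor of 1.5 because a system with three modules enters the degraded state more often than a system with two.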
The triplex (1 of 3) control system provides a significant reliability gain for the control system. The Mean
Time to Failure (MTTF) for all the systems is calculated as follows:
$$\mathrm{MTTF} = \frac{-1000}{\ln(\mathrm{reliability})}$$
For the triplex system (1 of 3) the MTTF is 9359 years. The duplex system has an MTTF of 363 years. This
appears to be a significant difference, and at first glance it would appear that the triplex (1 of 3) system is
significantly better than the duplex system. However, when the reliability of the inputs and
outputs is incorporated into the complete system, the MTTF of the triplex (1 of 3) system drops to 2.027 years
while the MTTF of the duplex system drops to 2.016 years.
The MTTF values of all three complete systems are within 20 days of each other. This result clearly indicates
that for complete systems which have repairable modules, there is no significant reliability difference among
the duplex, triplex (1 of 3) and triplex (2 of 3) systems.
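The MTTF figures can be checked with a short calculation. The sketch below takes the 1000 hour reliabilities reported in the HIMAP output in Appendix C and makes two assumptions that the text does not state explicitly: the complete system is a series combination of the control, input and output blocks (so reliabilities multiply), and a year is 8760 hours.

```python
import math

MISSION_HOURS = 1000.0
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year


def mttf_years(reliability: float, mission_hours: float = MISSION_HOURS) -> float:
    """MTTF = -t / ln(R), converted from hours to years."""
    return -mission_hours / math.log(reliability) / HOURS_PER_YEAR


# 1000 hour reliabilities from the HIMAP output (Appendix C).
control_blocks = {
    "duplex": 0.999685878157,
    "triplex (2 of 3)": 0.999082138176,
    "triplex (1 of 3)": 0.999987803755,
}
R_INPUTS = 0.959784274949
R_OUTPUTS = 0.984865702249

complete_mttf = {}
for name, r_control in control_blocks.items():
    # Assumed series structure: all three blocks must work.
    r_complete = r_control * R_INPUTS * R_OUTPUTS
    complete_mttf[name] = mttf_years(r_complete)
    print(f"{name}: control {mttf_years(r_control):.1f} years, "
          f"complete {complete_mttf[name]:.3f} years")

spread_days = (max(complete_mttf.values()) - min(complete_mttf.values())) * 365
print(f"spread between complete systems: {spread_days:.1f} days")
```

Under these assumptions the input and output blocks dominate the product, which is why the large differences between the control blocks almost disappear in the complete system.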
It also indicates that the reliability of the sensor elements is significant in determining the overall
dependability of the final complete control system. Any enhancements, whether they be adding transmitter
redundancy, decreasing individual components in the I/O loop, or adding software algorithms capable of
dealing with sensor failures will significantly improve the overall complete control system dependability.
4. HIMAP would often crash or just shut down. For example, when trying to view the System1o3
Markov map, HIMAP will unload.
10.0 Appendix A
System Diagram — P&I Drawing
11.0 Appendix B
Component Reliability Values.
Entire Project2compressorsrev2.vsd
Description               Reliability (1000 hours)   Failure rate (per hour)   MTBF (hours)
Venture Valve             0.9987507809246
Magnetic Speed Pickup     0.9990004998334            0.000001
Flow Transmitter          0.9982471520706
I to I isolator           0.9987507809246            0.00000125
I to P converter          0.9966722160545
Temperature Transmitter   0.9982471520706            1.75439E-06               570000
100 feet of wire          0.9995001249792
AntiSurge Controller      (values not recoverable)
InputOutputCard           (values not recoverable)

Several components appear more than once in the original table; each distinct entry is listed once here.
Values that survive without a recoverable description:
  reliability 0.9987507809246 paired with failure rate 0.00000125, and with MTBF 800000
  MTBF 300000 (appears directly after the I to P converter entry)
  failure rate 1.49254E-05 and MTBF 67000
  failure rate 2.85714E-06 (appears directly after the InputOutputCard entry)
The original table also had "Verified" and "In HIMAP" columns whose marks are not recoverable.
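The component values in this appendix appear to follow the constant failure rate (exponential) model, in which the failure rate is the reciprocal of the MTBF and the reliability over a mission time t is exp(-t / MTBF). The sketch below checks this against the Temperature Transmitter entry (reliability 0.9982471520706, failure rate 1.75439E-06, MTBF 570000) and against the pairing of reliability 0.9987507809246 with MTBF 800000. The exponential model is an inference from these numbers, not something stated in the text.

```python
import math

MISSION_HOURS = 1000.0


def failure_rate_from_mtbf(mtbf_hours: float) -> float:
    """Constant failure rate (per hour), assuming an exponential model."""
    return 1.0 / mtbf_hours


def reliability_from_mtbf(mtbf_hours: float,
                          mission_hours: float = MISSION_HOURS) -> float:
    """Reliability over the mission time, R = exp(-t / MTBF)."""
    return math.exp(-mission_hours / mtbf_hours)


# Values as listed in this appendix.
temperature_transmitter_mtbf = 570000.0
print(f"failure rate: {failure_rate_from_mtbf(temperature_transmitter_mtbf):.5E}")
print(f"reliability:  {reliability_from_mtbf(temperature_transmitter_mtbf):.13f}")
print(f"reliability for MTBF 800000: {reliability_from_mtbf(800000.0):.13f}")
```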
12.0 Appendix C
HIMAP output
System1o2.BRE
Con1o2.rel
Control Block
1 of 2 with diagnostics
All modules are repairable
1000 hour mission time
1 1
MISSION 1
PHASE1
REL: .999685878157D+00
UNREL: .314121842780D-03
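The REL and UNREL values are printed with a Fortran-style double-precision exponent (`D+00`, `D-03`), which most languages do not parse directly. A minimal sketch of reading them, assuming the values are copied from the output as plain strings; the helper name `parse_fortran_double` is hypothetical and not part of HIMAP:

```python
def parse_fortran_double(text: str) -> float:
    """Convert a Fortran D-exponent literal such as '.314D-03' to a float."""
    return float(text.strip().upper().replace("D", "E"))


rel = parse_fortran_double(".999685878157D+00")
unrel = parse_fortran_double(".314121842780D-03")

# Reliability and unreliability should sum to 1 within rounding.
print(f"REL = {rel}, UNREL = {unrel}, sum = {rel + unrel}")
```

The same check can be applied to each REL and UNREL pair in this appendix to catch transcription errors.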
HIMAP- Hierarchical Modeling Analysis Package DCNL Iowa State University
System2o3.BRE
Con2o3.rel
Control Block
Triplex 2 of 3 system
all modules repairable
1000 hour mission time
1 1
MISSION 1
PHASE1
REL: .999082138176D+00
UNREL: .917861823644D-03
System1o3.BRE
Con1o3.rel
Control Block
1 of 3 Triplex System
All modules repairable
1000 hour mission time
1 1
MISSION 1
PHASE1
REL: .999987803755D+00
UNREL: .121962451197D-04
Input Block
Repair for MPU's
1 of 3 for MPU's
All other signals are simplex
1000 hour mission time
MISSION 1
PHASE1
REL: .959784274949D+00
UNREL: .402157250515D-01
Outputs.rel
Output Block
No Repair
Valves are Simplex
1000 hour mission time
1 1
MISSION 1
PHASE1
REL: .984865702249D+00
UNREL: .151342977510D-01
[HIMAP diagrams for the input and output blocks not recoverable; surviving labels: Input Block, SIC output, UIC1 outputs, UIC2 outputs]