
BETM 4613

RELIABILITY, MAINTAINABILITY & RISK

SEM 1 2014/2015

CHAPTER 1
The History of Reliability and Safety
Technology

Failure Data


Throughout the history of engineering, reliability improvement (also called reliability


growth), arising as a natural consequence of the analysis of failure, has long been a
central feature of development.
This test and correct principle was practiced long before the development of formal
procedures for data collection and analysis, for the reason that failure is usually
self-evident and thus leads, inevitably, to design modifications.
The design of safety-related systems (for example, railway signaling) has evolved partly
in response to the emergence of new technologies but largely as a result of lessons
learnt from failures.
The application of technology to hazardous areas requires the formal application of this
feedback principle in order to maximize the rate of reliability improvement.
Nevertheless, as mentioned above, all engineered products will exhibit some degree of
reliability growth even without formal improvement programs.
Nineteenth- and early twentieth-century designs were less severely constrained by the
cost and schedule pressures of today. Thus, in many cases, high levels of reliability were
achieved as a result of over-design. The need for quantified reliability assessment
techniques during the design and development phase was not therefore identified.

Failure rates of engineered components were therefore not required, as they are now, for use
in prediction techniques, and consequently there was little incentive for the formal collection
of failure data.
Another factor was that component parts were individually fabricated in a craft environment.
The advent of the electronic age, accelerated by the Second World War, led to the need for
more complex mass-produced component parts with a higher degree of variability in the
parameters and dimensions involved.
The experience of poor field reliability of military equipment throughout the 1940s and 1950s
focused attention on the need for more formal methods of reliability engineering. This gave
rise to the collection of failure information from both the field and from the interpretation of
test data.
Failure rate databanks were created in the mid-1960s as a result of work at such
organizations as UKAEA (UK Atomic Energy Authority) and RRE (Royal Radar Establishment,
UK), and RADC (Rome Air Development Center, US).
The availability and low cost of desktop personal computing (PC) facilities, together with
versatile and powerful software packages, have permitted the listing and manipulation of
incident data with an order of magnitude less effort. Fast automatic sorting of data
encourages the analysis of failures into failure modes.
With the rapid growth of built-in test and diagnostic features in equipment, a future trend
ought to be the emergence of automated fault reporting.

Hazardous Failures
In the early 1970s the process industries became aware that, with larger plants involving
higher inventories of hazardous material, the practice of learning by mistakes was no longer
acceptable.
Methods were developed for identifying hazards and for quantifying the consequences of
failures. They were evolved largely to assist in the decision-making process when developing
or modifying plants. External pressures to identify and quantify risk were to come later.
By the mid-1970s there was already concern over the lack of formal controls for regulating
those activities which could lead to incidents having a major impact on the health and safety
of the general public.
The Flixborough incident in June 1974 resulted in 28 deaths and focused public and media
attention on this area of technology. Successive events such as the tragedy at Seveso in Italy
in 1976, right through to the Piper Alpha offshore disaster and the more recent Paddington rail
and Texaco Oil Refinery incidents, have kept the interest alive and resulted in guidance and
legislation.
The techniques for quantifying the predicted frequency of failures were originally applied to
assessing plant availability, where the cost of equipment failure was the prime concern. Over
the last twenty years these techniques have also been used for hazard assessment.
Maximum tolerable risks of fatality have been established according to the nature of the risk
and the potential number of fatalities. These are then assessed using reliability techniques.
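The comparison of an assessed risk against a maximum tolerable risk can be sketched as follows. All the numerical values here are hypothetical illustrations, not figures from the text; real targets depend on the regulatory regime and the class of risk.

```python
# Sketch: comparing a predicted fatality risk against a maximum tolerable
# risk target. All numbers are hypothetical illustrations.

hazardous_event_rate = 1e-3        # predicted hazardous events per year (from a model)
prob_fatality_given_event = 0.05   # conditional probability an event causes a fatality

predicted_risk = hazardous_event_rate * prob_fatality_given_event  # fatalities per year

max_tolerable_risk = 1e-4          # illustrative target for this class of risk

meets_target = predicted_risk <= max_tolerable_risk
```

The point of the calculation is that the reliability techniques supply the predicted event frequency, while the tolerable-risk target is set separately according to the nature of the risk.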

Reliability and Risk Prediction


System modeling, using failure mode analysis and fault tree analysis methods, has been
developed over the last thirty years and now involves numerous software tools which enable
predictions to be updated and refined throughout the design cycle.
The criticality of the failure rates of specific component parts can be assessed and, by
successive computer runs, adjustments to the design configuration (e.g. redundancy) and to
the maintenance philosophy (e.g. proof test frequencies) can be made early in the design cycle
in order to optimize reliability and availability.
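The fault tree arithmetic behind such predictions can be sketched for independent basic events. The component failure probabilities and the example tree below are illustrative assumptions, not taken from the text.

```python
# Sketch of fault tree gate arithmetic for independent basic events.
# Failure probabilities are illustrative only.

def or_gate(*probs):
    # Top event occurs if at least one input fails:
    # 1 minus the product of the survival probabilities.
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

def and_gate(*probs):
    # Top event occurs only if all inputs fail (e.g. a redundant pair).
    fail = 1.0
    for p in probs:
        fail *= p
    return fail

# Hypothetical tree: a redundant pump pair (AND) in series with a valve (OR).
pump_a, pump_b, valve = 0.01, 0.01, 0.001
top_event = or_gate(and_gate(pump_a, pump_b), valve)
```

Successive runs with altered inputs (e.g. adding redundancy to the valve) show how design changes propagate to the top-event probability, which is what the software tools mentioned above automate.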

The need for failure rate data to support these predictions has therefore increased and
Chapter 4 examines the range of data sources and addresses the problem of variability within
and between them.

Figure 1.1 illustrates the relationship between a


component failure rate based reliability or risk
prediction and the eventual field performance. In
practice, prediction addresses the component-based
design reliability, and it is necessary to take account of
the additional factors when assessing the integrity of a
system.
Figure 1.1: Design v. achieved reliability

In fact, Figure 1.1 gives some perspective to the idea of reliability growth. The design
reliability is likely to be the figure suggested by a prediction exercise. However, there will be
many sources of failure in addition to the simple random hardware failures predicted in this
way.
Thus the achieved reliability of a new product or system is likely to be an order of magnitude,
or even more, below the design reliability. Reliability growth is the improvement that takes place as
modifications are made as a result of field failure information. A well-established item, perhaps
with tens of thousands of field hours, might start to approach the design reliability. Section
12.3 deals with methods of plotting and extrapolating reliability growth.
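One common way to plot and extrapolate such growth (the text defers the details to Section 12.3) is the Duane power-law model, in which cumulative MTBF grows as a power of cumulative operating time. The parameter values below are illustrative assumptions.

```python
# Sketch of the Duane reliability growth model:
#   theta_c(T) = theta_1 * (T / T_1) ** alpha
# where theta_c is cumulative MTBF after T hours of experience.
# All parameter values are illustrative.

theta_1 = 50.0   # cumulative MTBF (hours) observed at the first data point
T_1 = 1_000.0    # cumulative hours at that first data point
alpha = 0.4      # growth slope; an active improvement program might show 0.2-0.5

def cumulative_mtbf(T):
    return theta_1 * (T / T_1) ** alpha

# Extrapolation: projected cumulative MTBF after 10,000 hours of field experience.
projected = cumulative_mtbf(10_000.0)
```

On log-log axes this model is a straight line, which is why Duane plots are a convenient way to judge whether field modifications are actually delivering growth.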

Achieving Reliability and Safety-Integrity


Characteristics of design and construction:
Complexity: the fewer the component parts and the fewer the types of material used then, in
general, the greater is the likelihood of a reliable item.
Duplication/replication: the use of additional, redundant, parts whereby a single failure does
not cause the overall system to fail is a method of achieving reliability. It is probably the
major design feature that determines the order of reliability that can be obtained.
Nevertheless, it adds capital cost, weight, maintenance and power consumption.
Furthermore, reliability improvement from redundancy often affects one failure mode at the
expense of another type of failure.
Excess strength: deliberate design to withstand stresses higher than are anticipated will
reduce failure rates. Small increases in strength for a given anticipated stress result in
substantial improvements. This applies equally to mechanical and electrical items. Modern
commercial pressures lead to the optimization of tolerance and stress margins that just
meet the functional requirement. The probability of the tolerance-related failures
mentioned above is thus further increased.
The latter two of the above methods are costly, and the cost of reliability improvements needs to
be paid for by a reduction in failure and operating costs.

Three main areas (activities) in achieving reliability, safety and maintainability


1. Design:
reduction in complexity
duplication to provide fault tolerance
derating of stress factors
qualification testing and design review
feedback of failure information to provide reliability growth.
2. Manufacture:
control of materials, methods, changes
control of work methods and standards.
3. Field use:
adequate operating and maintenance instructions
feedback of field failure information
proof testing to reveal dormant failures
replacement and spares strategies (e.g. early replacement of items with a known
wearout characteristic).
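The value of proof testing for dormant failures can be sketched with the widely used approximation that, for a dormant failure of constant rate lam tested every T hours, the average probability of failure on demand is roughly lam * T / 2. The numbers below are illustrative assumptions.

```python
# Sketch: effect of proof test interval on dormant (unrevealed) failures.
# Average probability of failure on demand (PFD) for a proof-tested item
# with constant failure rate lam is approximately lam * T / 2.
# Numbers are illustrative only.

lam = 1e-6           # dormant failure rate, per hour
T_annual = 8760.0    # proof test once per year
T_monthly = 730.0    # proof test once per month

pfd_annual = lam * T_annual / 2.0
pfd_monthly = lam * T_monthly / 2.0
```

Shortening the proof test interval by a factor of twelve reduces the average PFD by the same factor, which is why proof test frequency appears alongside redundancy as a design-stage decision.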
It is much more difficult, and expensive, to add reliability/safety after the design stage. The
quantified parameters, dealt with in Chapter 2, must be part of the design specification and can no
more sensibly be specified retrospectively than power consumption, weight, signal-to-noise ratio,
etc.

The RAMS Cycle

The life-cycle model shown in Figure 1.2 provides a


visual link between RAMS activities and a typical
design cycle.
The feedback loops shown in Figure 1.2 represent RAMS-related
activities as follows:
A review of the system RAMS feasibility calculations against the initial
RAMS targets
(loop [1]).
A formal (documented) review of the conceptual design RAMS
predictions against the RAMS targets (loop [2]).
A formal (documented) review, of the detailed design, against the
RAMS targets (loop [3]).
A formal (documented) design review of the RAMS tests, at the end of
design and development, against the requirements (loop [4]). This is
the first opportunity (usually somewhat limited) for some level of real
demonstration of the project/contractual requirements.
A formal review of the acceptance demonstration, which involves
RAMS tests against the requirements (loop [5]). These are frequently
carried out before delivery but would preferably be extended into, or
even totally conducted in, the field (loop [6]).
An ongoing review of field RAMS performance against the targets
(loops [7,8,9]) including subsequent improvements.

Figure 1.2: RAMS-cycle model

Not every one of the above review loops will be applied to


each contract and the extent of review will depend on the size
and type of project.

Test, although shown as a single box in this simple RAMS-cycle


model, will usually involve a test hierarchy consisting of
component, module, subsystem and system tests. These must
be described in the project documentation.

The maintenance strategy (i.e. maintenance program) is


relevant to RAMS since both preventive and corrective
maintenance affect reliability and availability. Repair times
influence unavailability as do preventive maintenance
parameters. Loop [10] shows that maintenance is considered
at the design stage where it will impact on the RAMS
predictions. At this point the RAMS predictions can begin to
influence the planning of maintenance strategy (e.g. periodic
replacements/overhauls, proof-test inspections, auto-test
intervals, spares levels, number of repair crews).

For completeness, the RAMS-cycle model also shows the


feedback of field data into a reliability growth programme and
into the maintenance strategy (loops [8], [9] and [11]).
Sometimes the growth program is a contractual requirement
and it may involve targets beyond those in the original design
specification.

Contractual and Legal Pressures


It is now common for reliability (including safety) parameters to be specified in invitations to
tender and other contractual documents.
Failure rates, probabilities of failure on demand, availabilities, and so on, are specified and
quantified for both cost- and safety-related failure modes.
This is for two main reasons:
1. Cost of failure: failure may lead to huge penalty costs. The halting of industrial processes can
involve the loss of millions of pounds per week. Rail and other transport failures can each involve
hundreds of thousands of pounds in penalty costs. Therefore system availability is frequently
specified as part of the functional requirements.
2. Legal implications: there are various legal and implied legal reasons (Chapters 19-21),
including fear of litigation, for specifying safety-related parameters (e.g. failure rates, safety
integrity levels) in contracts.
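The cost-of-failure point above can be sketched by converting a contractual availability target into an expected outage cost; the availability figure and penalty rate are hypothetical.

```python
# Sketch: turning an availability specification into expected outage cost.
# The availability target and cost rate are hypothetical illustrations.

availability_spec = 0.999           # contractual availability target
hours_per_year = 8760.0
downtime_cost_per_hour = 20_000.0   # pounds per hour of lost production, illustrative

expected_downtime = (1.0 - availability_spec) * hours_per_year   # hours per year
expected_penalty = expected_downtime * downtime_cost_per_hour    # pounds per year
```

Even a 99.9% availability target leaves several hours of expected downtime per year, which is why availability is specified and quantified in the functional requirements rather than left implicit.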
