
Software Reliability Engineering

The Patriot Missile


Used to destroy incoming Iraqi Scud missiles
Hailed for effectiveness
Operated for 100 consecutive hours
28 American soldiers killed
Cause: software failure

The Patriot Missile


A Learning Experience
The software can be redesigned
A new Patriot Missile can be built
The fate of the 28 soldiers remains the same
THE MORAL: Software engineers need to find a way to engineer reliability into software.

Objectives
Definition of Software Reliability
Importance of Reliability Engineering
Why Reliability Engineering is Difficult
Reliability Engineering Processes
Weibull
Musa
Monte Carlo

Conclusion

What is Software Reliability?


IEEE Definition:
The ability of a system or component to perform its
required functions under stated conditions for a
specified period of time.
The definition allows for a "just right" level of reliability for software
Software reliability and hardware reliability share the same definition

Why is Software Reliability Important?
Manager View
Reliable software means satisfied customers
Reliable software means repeat customers
Reliable software is ethical
Legal liability

Customer View
Reliable software saves time
Reliable software increases efficiency

Why Software Reliability is Difficult to Calculate
Without considering program evolution, the failure rate is statistically nonexistent
Failures can arise from design defects, which have many possible causes

Why Software Reliability is Difficult to Calculate
Errors can occur without warning
Software quality cannot be improved by duplicating identical software components
Periodic restarts can sometimes help fix problems
Errors are caused by incorrect logic, incorrect statements, or incorrect input data
Software may require infinite testing
Software reliability models do not always fit the data points well

Overview
There are many models to choose from when calculating software reliability
Focus on three:
Weibull Failure Time Model
Musa's Basic Execution Time Model
Monte Carlo Simulation

Each model has its own strengths and limitations

Weibull Failure Time

About the Weibull Failure Model

Used to model failure processes of hardware
One of the first models to be applied to software reliability modeling
Flexible: accommodates increasing, decreasing, or constant failure rates
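
The flexibility mentioned above comes from the shape parameter of the Weibull distribution. The following is a minimal sketch, assuming the standard two-parameter Weibull hazard rate h(t) = (shape/scale) * (t/scale)^(shape - 1); the function name and the parameter values are illustrative and do not come from the slides.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard (failure) rate at time t > 0.

    shape < 1 gives a decreasing rate, shape == 1 a constant rate,
    and shape > 1 an increasing rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative shape values only: one per failure-rate behavior.
for shape in (0.5, 1.0, 2.0):
    rates = [round(weibull_hazard(t, shape), 3) for t in (1.0, 2.0, 4.0)]
    print(f"shape={shape}: hazard at t=1,2,4 -> {rates}")
```

A shape below 1 is the case usually associated with software under test, where the failure rate falls as faults are removed.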

Weibull Failure Model


Weibull Failure Model Assumptions:
There is a fixed number of faults in the software being tested
Faults are detected, and counted, in successive time intervals (0, t1), (t1, t2), ...

Limitations:
Flexibility allows for a greater chance of making the wrong assumption
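
The two assumptions can be turned into a small calculation. This is a sketch only, assuming the commonly used Weibull mean value function m(t) = N * (1 - exp(-(t/scale)^shape)), where N is the fixed number of faults; the interval boundaries and parameter values below are hypothetical, not data from the presentation.

```python
import math


def expected_faults(t, total_faults, shape, scale):
    """Expected cumulative number of faults detected by time t."""
    return total_faults * (1.0 - math.exp(-((t / scale) ** shape)))


def faults_per_interval(boundaries, total_faults, shape, scale):
    """Expected faults detected in each interval (t_i, t_i+1)."""
    cumulative = [expected_faults(t, total_faults, shape, scale) for t in boundaries]
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical test campaign: 100 faults, four intervals of 10 time units.
boundaries = [0, 10, 20, 30, 40]
counts = faults_per_interval(boundaries, total_faults=100, shape=1.5, scale=20.0)
for (start, end), count in zip(zip(boundaries, boundaries[1:]), counts):
    print(f"({start}, {end}): {count:.1f} expected faults")
```

Fitting the model means choosing the shape and scale so that these expected counts match the observed counts per interval, which is where a wrong assumption about the shape can mislead.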

Weibull Failure Model Example


Notice how the model follows the actual data

Musa

About Musa's Basic Execution Time Model

Developed by John Musa of AT&T Bell Laboratories
One of the first models to use the actual execution time of software components rather than calendar time
Time between failures is expressed in terms of CPU time

Musa's Basic Execution Time Model

Uses a Poisson distribution
Model Assumptions:
The execution times between failures are exponentially distributed
The hazard rate for a single fault is constant

Limitations:
Assumes no new faults are introduced when a fault is corrected
Assumes the number of faults decreases over time
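
A minimal sketch of the model's two usual formulas, assuming the textbook form of the basic execution time model: failure intensity lambda(tau) = lambda0 * exp(-lambda0 * tau / nu0) and expected failures mu(tau) = nu0 * (1 - exp(-lambda0 * tau / nu0)), where tau is CPU time, lambda0 the initial failure intensity, and nu0 the total expected failures. The numeric values are hypothetical.

```python
import math


def failure_intensity(tau, initial_intensity, total_failures):
    """Failures per CPU hour after tau CPU hours of execution."""
    return initial_intensity * math.exp(-initial_intensity * tau / total_failures)


def expected_failures(tau, initial_intensity, total_failures):
    """Expected cumulative failures after tau CPU hours of execution."""
    return total_failures * (1.0 - math.exp(-initial_intensity * tau / total_failures))


# Hypothetical program: 10 failures per CPU hour at the start, 100 failures in total.
lambda0, nu0 = 10.0, 100.0
for tau in (0.0, 5.0, 10.0, 20.0):
    print(
        f"tau={tau:5.1f} CPU h  "
        f"intensity={failure_intensity(tau, lambda0, nu0):6.3f}  "
        f"failures so far={expected_failures(tau, lambda0, nu0):6.2f}"
    )
```

The intensity only ever falls in this sketch, which reflects the two limitations listed above: every correction is assumed to succeed and to introduce nothing new.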

Musa's Basic Execution Time Model Example

Notice how the model follows the actual data

Monte Carlo Simulation

About Monte Carlo Simulation


Developed in the 1940s as part of the atomic bomb program
Named after Monte Carlo, Monaco, because the city's casinos featured games of chance like dice and roulette
Today Monte Carlo simulations are used in many applications, including physics, finance, and system reliability

Monte Carlo Simulation


Used for very complex problems that are difficult to solve or for which no analytical solution exists
Uses statistics to mathematically model real-life processes and then estimates the probability of possible outcomes
Involves fitting a curve to a process and then using the fitted curve to model the process over time
Dice Example
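
The details of the dice example are not preserved in the text, so the following is only one plausible illustration, not the presenter's own: it estimates the probability that two fair dice sum to 7 by simulated rolling and compares the estimate with the exact value of 1/6. The function name, trial count, and seed are assumptions.

```python
import random


def estimate_sum_probability(target, trials, seed=0):
    """Estimate P(two fair dice sum to target) by simulated rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == target:
            hits += 1
    return hits / trials


estimate = estimate_sum_probability(target=7, trials=100_000)
print(f"estimated: {estimate:.4f}, exact: {1 / 6:.4f}")
```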

Monte Carlo Simulation Process


Determine a probability function
Weibull distribution: best for the failure process
Lognormal distribution: best for the repair process

Determine the random number generator, the source of random numbers that are distributed uniformly on the unit interval
Determine a sampling rule for selecting samples for the model, given random numbers on the unit interval
Record a count of successes and failures
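
The four steps can be sketched end to end as follows. This is an illustration under stated assumptions: the probability function is a Weibull failure-time distribution, the generator is Python's `random.Random`, and the sampling rule is inverse-transform sampling, t = scale * (-ln(1 - u))^(1/shape). The mission time and parameter values are hypothetical.

```python
import math
import random


def sample_weibull(rng, shape, scale):
    """Sampling rule: map a uniform number u in [0, 1) to a Weibull failure time."""
    u = rng.random()
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def estimate_failure_probability(mission_time, shape, scale, trials, seed=0):
    """Count trials in which the sampled failure time falls within the mission."""
    rng = random.Random(seed)  # the random number generator
    failures = 0
    for _ in range(trials):
        if sample_weibull(rng, shape, scale) <= mission_time:
            failures += 1
    successes = trials - failures
    return failures / trials, successes, failures


# Hypothetical values: 100-hour mission, shape 1.5, scale 200 hours.
estimate, successes, failures = estimate_failure_probability(100.0, 1.5, 200.0, 50_000)
exact = 1.0 - math.exp(-((100.0 / 200.0) ** 1.5))
print(f"successes={successes}, failures={failures}")
print(f"estimated failure probability={estimate:.4f}, closed form={exact:.4f}")
```

The closed-form value is available here only because the example is deliberately simple; the simulation approach pays off when failure and repair processes are combined and no such formula exists.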

Monte Carlo Example

Select a random location within the rectangle


If the selected location is blue, record a hit
Repeat 10,000 times
Blue Area = (Hits / 10,000) * Area of Rectangle
Note: The standard error in the result is inversely proportional to the square root of the sample size
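
The hit-counting procedure above can be sketched as follows. The blue region in the original figure is not available, so this sketch assumes a hypothetical one: a quarter circle of radius 1 inside the unit square, whose true area is pi/4. Only the region test would change for a different shape.

```python
import math
import random


def is_blue(x, y):
    """Hypothetical blue region: quarter circle of radius 1 centered at the origin."""
    return x * x + y * y <= 1.0


def estimate_blue_area(samples, width=1.0, height=1.0, seed=0):
    """Blue Area = (Hits / samples) * Area of Rectangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(0.0, width)   # random location within the rectangle
        y = rng.uniform(0.0, height)
        if is_blue(x, y):
            hits += 1
    return (hits / samples) * (width * height)


area = estimate_blue_area(10_000)
print(f"estimated area: {area:.4f}, true area: {math.pi / 4:.4f}")
```

Because the standard error shrinks with the square root of the sample size, cutting the error in half takes roughly four times as many samples.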

Monte Carlo Software Example


An arbitrary three-component subsystem

The failure probability of each component is given in the diagram
If the first component fails, then the second is checked
If the second component fails, then the third component is checked
If the third component fails, then the entire subsystem fails
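
The checking rule above can be simulated directly. The component failure probabilities from the diagram are not preserved in the text, so the values below are hypothetical placeholders, and the components are assumed to fail independently. Under that assumption the exact subsystem failure probability is the product of the three component probabilities.

```python
import random

# Hypothetical failure probabilities; the original diagram's values are not available.
COMPONENT_FAILURE_PROBS = (0.1, 0.2, 0.3)


def subsystem_fails(rng, failure_probs):
    """Check components in order; the subsystem fails only if every one fails.

    all() stops at the first component that works, so later components
    are only checked after the earlier ones have failed.
    """
    return all(rng.random() < p for p in failure_probs)


def simulate(failure_probs, trials, seed=0):
    """Fraction of trials in which the whole subsystem fails."""
    rng = random.Random(seed)
    failures = sum(subsystem_fails(rng, failure_probs) for _ in range(trials))
    return failures / trials


exact = 1.0
for p in COMPONENT_FAILURE_PROBS:
    exact *= p

estimate = simulate(COMPONENT_FAILURE_PROBS, trials=200_000)
print(f"simulated: {estimate:.4f}, exact product: {exact:.4f}")
```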

Monte Carlo Software Example


The actual failure of the subsystem is:

The results of the actual simulation are:

Conclusion

Conclusion
Engineering reliable software is important to both the engineer and the end user
Engineering reliable software is not an easy task to accomplish
There are methods available for measuring reliability
Each method has its strengths and weaknesses
At this time, no one method is superior

Questions

References
Pai, Ganesh. Survey of Software Reliability Models. Fall 2002.
Korver, Brian. The Monte Carlo Method and Software Reliability Theory. Portland State University Computer Science, Portland, Oregon, 1994.
Lyu, Michael R., Editor. Handbook of Software Reliability Engineering. IEEE Computer Society Press, McGraw-Hill, 1996.
Vouk, Mladen A. Software Reliability Engineering. Tutorial presented at the Annual Reliability and Maintainability Symposium, 1998.
Pham, Hoang. Software Reliability. Springer-Verlag, 2000.
