Professional Documents
Culture Documents
of Computing Systems
Jean-Claude Laprie
Conclusion
1
Propagation
Causation
FailuresCausationFaultsActivationErrors
Failures
Faults
Attributes
Means
Fault
Prevention
Fault
Tolerance
Fault
Removal
Fault
Forecasting
Availability
Reliability
Attributes
Safety
Authorized
actions
Security
Confidentiality
Integrity
Maintainability
Fault Prevention
Dependability
Means
Fault Tolerance
Fault Removal
Fault Forecasting
Faults
Threats
Errors
Failures
3
Attributes
Availability
Reliability
Safety
Confidentiality
Integrity
Maintainability
Fault
prevention
Fault
tolerance
Dependability
Error detection
System recovery
Means
Development
Fault
removal
Fault
forecating
Threats
Faults
Errors
Failures
Operational life
Error handling
Fault handling
Static verification
Verification
Dynamic verification
Diagnosis
Correction
Non-regression verification
Corrective or preventive maintenance
Modeling
Operational testing
Development
Physical
Interaction
4
Dependability definition
First part: delivery of service that can justifiably be trusted
Dependability attributes
Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability:
Property-based attributes
Event-based attributes
Robustness: persistence of dependability with respect to external faults
Survivability: persistence of dependability in the presence of active fault(s)
Resilience: persistence of dependability when facing functional,
environmental, or technological, changes
Attributes based on distinguishing among various types of (meta-)information
Accountability: availability and integrity of the person who performed an
operation
Authenticity: integrity of a message content and origin, and possibly some
other information, such as the time of emission
Non-repudiability: availability and integrity of the identity of the sender of a
message (non-repudiation of the origin), or of the receiver (nonrepudiation of reception)
6
Dependability Threats
Failure
Adjudged or
hypothesized
cause of an
error
Fault
Internal
fault
Context
dependent Computation
process
Error
Activation
External
fault
Failure
Propagation
Part of system
state that may
cause a
subsequent
failure
Taking advantage of
a vulnerability, i.e.,
an internal fault
which enables the
system to be harmed
Fault
Causation
Deviation of the
delivered service
from correct service,
i.e., implementing
the system function
System does
not comply with
the specification
describing the
system function
Specification
does not
adequately
describe the
system function
Causation
Failures
Faults
Activation
Errors
Propagation
Causation
Failures
Domain
Faults
Content failures
Timing failures
Consistent failures
Consistency
Detectability
Inconsistent
(Byzantine) failures
Signaled failures
Unsignaled failures
Minor failures
Consequences
Catastrophic failures
Identification
Extent
Latent errors
Detected errors
Single errors
Multiple related errors
8
Causation
Failures
Faults
Activation
Errors
Propagation
Failures
Style
Development faults
Operational faults
System
boundaries
Internal faults
Phenomenological
cause
Natural faults
Intent
Activation
reproducibility
External faults
Human-made faults
Malicious faults
Non-malicious faults
Accidental faults
Capability
Persistence
Dimension
Omission faults
Commission faults
Permanent faults
Transient faults
Solid faults
Elusive faults
Software faults
System faults
Plurality
Correlation
Deliberate faults
Incompetence faults
Faults
Hardware faults
Manifestation
Origin
Phase of creation
or occurrence
Causation
Damage
Status
Single faults
Multiple faults
Independent faults
Related faults
Soft faults
Hard faults
Dormant faults
Active faults
9
Faults
by origin
Phase of creation
or occurrence
Development
System boundaries
Phenomenological
cause
Intent
Capability
Operational
Internal
Human-made
Malicious
Nonmalicious
Internal
Natural
Natural
NonNonmalicious. malicious.
Design Flaws
Physical
Deterior.
External
Natural
Non-malicious
Accidental
Physical
Interference
Human-made
Non-malicious
Accid.
Delib.
Incomp.
Input Mistakes
Configuration and
Reconfiguration
Mistakes
Production Defects
Malicious
Deliberate
Intrusion
Attempts
Viruses
Worms
Overload
Development faults
Physical faults
Interaction faults
10
Faults by origin
Physical faults Development faults Interaction faults
Style
Faults by manifestation
Persistence
Activation
reproducibility
Dimension
Plurality
Correlation
Damage
Status
Omission faults
Commission faults
Permanent faults
Transient faults
Solid faults
Elusive faults
Hardware faults
Software faults
System faults
Single faults
Multiple faults
Independent faults
Related faults
Hard faults
Soft faults
Dormant faults
Active faults
11
A few milestones
12th IEEE Int. Symposium on Fault-Tolerant Computing, Santa Monica, June
1982
Session Fundamental Concepts of Fault Tolerance, papers by
A. Avizienis, H. Kopetz, J.C. Laprie and A. Costes, A.S. Robinson,
T. Anderson and P.A. Lee, P.A. Lee and D. Morgan
Session Prospective on the State-of-the-Art, position paper by
W.C. Carter
Paper by J.C. Laprie, 15th IEEE Int. Symposium on Fault-Tolerant
Computing, Ann Arbor, June 1985, Dependable Computing and Fault
Tolerance: Concepts and Terminology
Publication by J.C. Laprie of the book Dependability: Basic Concepts and
Terminology in English, French, German, Italian, Japanese, Springer
Verlag, 1992, vol. 5 of the series Dependable Computing and Fault-Tolerant
Systems, A. Avizienis, H. Kopetz, J.C. Laprie, eds.
Paper by A. Avizienis, J.C. Laprie, B. Randell, and C. Landwehr, Basic
Concepts and Taxonomy of Dependable and Secure Computing, IEEE
Transactions on Dependable and Secure Computing, 2004
12
14
Nature
Timing
Functional
Foreseen
Short term
Environmental
Foreseeable
Medium term
Unforeseen
Long term
Technological
[incl. development
and physical faults]
Threat evolution
Attacks
Mismatches in evolutionary changes
Side-effects in emerging behaviors
Increasing importance of configuration faults
Increasing proportion of hardware transient faults
15
Resilience
in other domains
Material
science
Adjective Resilient
In use for 30+ years
Recently, escalating use buzzword
Used essentially as synonym, or
substitute, to fault tolerant
Noteworthy exception: preface
of Resilient Computing Systems,
T. Anderson (Ed.), Collins, 1985
The two key attributes here are dependability
and robustness. [] A computing system can
be said to be robust if it retains its ability to
deliver service in conditions which are beyond
its normal domain of operation
change tolerance
Social
psychology
Adaptation to
changes, and
getting back
after a setback
Child
psychiatry
and
psychology
Ecology
Business
Industrial
safety
16
Changes
Trusted service
Assessability
Ambient intelligence
Usability
Complex systems
Diversity
Fault Prevention
Fault Tolerance
Fault Removal
Fault Forecasting
17
Assessability
Need for complementing
pre-deployment verification and evaluation
with
Continuous changes
Vast number of
configurations,
including
dependencies
not existing prior
to deployment
Elusive,
contextdependent,
faults
Concurrent
executions due
to prevalence of
multi-core
systems
18
Continuous testing
Perturbation injection
Uncovering
configuration faults,
elusive faults,
concurrency faults
Failure prediction
via estimation of
failure rate
increase
19
Rodney Brooks : New formalisms will let us analyze complex distributed systems,
producing new theoretical insights that lead to practical real-world payoffs. Exactly
what the basis for these formalisms will be is, of course impossible to guess. My own
bet is on resilience and adaptability
Barry Boehm : In the 21st century, software engineers face the often formidable
challenges of simultaneously dealing with rapid change, uncertainty and emergence,
dependability, diversity, and interdependence
20
Dependability of current
computing systems
Web sites availability:
one to two 9s
Well managed systems
availability: four 9s
Outage
duration/year
Availability
0,999999
32s
Five 9s
0,99999
5mn 15s
Four 9s
0,9999
52mn 34s
Three 9s
0,999
8h 46mn
Two 9s
0,99
3j 16h
One 9
0,9
36j 12h
//
Increasing
dependency
Dependability of classical
essential infrastructures
public transportation,
energy and water supply,
fixed phone: availability of
at least five 9s
Society problem
Six 9s
Mismatch
Societal reponsibility
21