Introduction
Background
Distributed Database Design
Database Integration
Semantic Data Control
Distributed Query Processing
Distributed Transaction Management
Transaction Concepts and Models
Distributed Concurrency Control
Distributed Reliability
Data Replication
Parallel Database Systems
Distributed Object DBMS
Peer-to-Peer Data Management
Web Data Management
Current Issues
Distributed DBMS
Ch.12/1
Reliability
Problem:
How to maintain the atomicity and durability properties of transactions
Fundamental Definitions
Reliability
A measure of success with which a system conforms to some authoritative specification of its behavior.
Availability
The probability that the system is operational at a given time t.
Fundamental Definitions
Failure
The deviation of a system from the behavior that is described in its
specification.
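Erroneous state
The internal state of a system such that there exist circumstances in which further processing, by the normal algorithms of the system, will lead to a failure which is not attributed to a subsequent fault.
Error
The part of the state which is incorrect.
Fault
An error in the internal states of the components of a system or in the design of a system.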
Faults to Failures
Fault --causes--> Error --results in--> Failure
Types of Faults
Hard faults
Permanent
Resulting failures are called hard failures
Soft faults
Transient or intermittent
Account for more than 90% of all failures
Resulting failures are called soft failures
Fault Classification
[Figure: fault classification. Fault sources (permanent fault, incorrect design, unstable or marginal components, unstable environment, operator mistake) produce permanent, intermittent, or transient errors, which lead to system failure.]
Failures
[Figure: failure timeline. Events along the time axis: fault occurs, error caused, detection of error, repair, then the next fault occurs and error is caused. Labeled intervals: MTTD, MTTR, MTBF.]
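If failures follow a Poisson process, the probability of k failures in time [0, t] is
$$\Pr\{k \text{ failures in time } [0, t]\} = \frac{e^{-m(t)}\,[m(t)]^k}{k!}$$
where
$$m(t) = \int_0^t z(x)\,dx$$
and z(x) is known as the hazard function, which gives the time-dependent failure rate of the component.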
Fault-Tolerance Measures
Reliability
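The mean number of failures in time [0, t] can be computed as
$$E[k] = \sum_{k=0}^{\infty} k\,\frac{e^{-m(t)}\,[m(t)]^k}{k!} = m(t)$$
The reliability of a single component is then $R(t) = e^{-m(t)}$, and that of a system consisting of $n$ non-redundant components is
$$R_{sys}(t) = \prod_{i=1}^{n} R_i(t)$$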
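The reliability formulas can be tried out numerically. The following is a minimal sketch, not part of the original slides: it assumes Poisson failures, takes m(t) as a plain number supplied by the caller, and treats the system as n non-redundant components whose reliabilities multiply. The function names are illustrative only.

```python
import math

def prob_k_failures(k: int, m_t: float) -> float:
    """Pr{k failures in [0, t]} for Poisson failures with mean m(t)."""
    return math.exp(-m_t) * m_t ** k / math.factorial(k)

def reliability(m_t: float) -> float:
    """R(t) = Pr{0 failures in [0, t]} = e^{-m(t)}."""
    return prob_k_failures(0, m_t)

def system_reliability(component_reliabilities) -> float:
    """R_sys(t) as the product of the component reliabilities R_i(t)."""
    return math.prod(component_reliabilities)

# Assumed example: constant hazard rate z(x) = 0.001 per hour, so m(t) = 0.001 * t.
m_100h = 0.001 * 100
single = reliability(m_100h)
three_in_series = system_reliability([single, single, single])
```

With a constant hazard rate the single-component reliability decays exponentially in t, and adding non-redundant components can only lower the system value.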
Fault-Tolerance Measures
Availability
A(t) = Pr{system is operational at time t}
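Assume
Poisson failures with rate λ
Repair time is exponentially distributed with mean 1/μ
Then the steady-state availability is
$$A = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu}$$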
Fault-Tolerance Measures
MTBF
Mean time between failures
MTTR
Mean time to repair
Availability
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
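As a quick worked example of the availability ratio, here is a small sketch. The MTBF and MTTR figures are made-up inputs chosen for illustration, not measurements from the source.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR); both arguments in the same time unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical site: fails every 1000 hours on average, takes 10 hours to repair.
a = availability(1000.0, 10.0)
```

Halving the repair time raises availability without changing how often the system fails, which is why MTTR is often the cheaper quantity to improve.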
Types of Failures
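Transaction failures
Transaction aborts (unilaterally or due to deadlock)
System (site) failures
Main memory contents are lost, but secondary storage contents are safe
Partial vs. total failure
Media failures
Failure of secondary storage devices such that the stored data is lost
Head crash/controller failure (?)
Communication failures
Lost/undeliverable messages
Network partitioning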
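[Figure: local recovery management architecture. In main memory, the Local Recovery Manager issues Fetch and Flush commands to the Database Buffer Manager, which reads and writes pages between the database buffers (the volatile database) and the stable database kept in stable storage.]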
M. T. Özsu & P. Valduriez
Update Strategies
In-place update
Each update causes a change in one or more data values on pages in the database buffers
Out-of-place update
Each update causes the new value(s) of data item(s) to be stored separate from the old value(s)
[Figure: in-place update. An update operation takes the old stable database state to the new stable database state and records the change in the database log.]
Logging
The log contains information used by the recovery process to
restore the consistency of a system. This information may include
transaction identifier
type of operation (action)
items accessed by the transaction to perform the action
old value (state) of item (before image)
new value (state) of item (after image)
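The fields listed above map naturally onto a small record type. The sketch below is one possible layout, with hypothetical field and function names; it assumes the database is a simple dictionary from item name to value so that the before and after images can be applied directly.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class LogRecord:
    txn_id: str         # transaction identifier
    action: str         # type of operation, e.g. "write"
    item: str           # item accessed by the transaction
    before_image: Any   # old value (state) of the item
    after_image: Any    # new value (state) of the item

def redo(db: Dict[str, Any], rec: LogRecord) -> None:
    """Reinstall the new value recorded in the log."""
    db[rec.item] = rec.after_image

def undo(db: Dict[str, Any], rec: LogRecord) -> None:
    """Restore the old value recorded in the log."""
    db[rec.item] = rec.before_image

db = {"x": 1}
rec = LogRecord("T1", "write", "x", before_image=1, after_image=5)
redo(db, rec)   # db["x"] is now the after image
```

Both operations are idempotent in this layout: applying the same record twice leaves the same value, which matters when recovery itself is interrupted and restarted.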
Why Logging?
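Upon recovery:
all of T1's effects should be reflected in the database (REDO if necessary)
none of T2's effects should be reflected in the database (UNDO if necessary)
[Figure: timeline from time 0 to t. T1 begins and ends before the system crash at time t; T2 begins but has not ended when the system crashes.]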
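The T1/T2 scenario can be replayed with a toy recovery routine. This is a sketch under simplifying assumptions that are not in the slides: the log is a list of dictionaries with hypothetical keys (`txn`, `op`, `item`, `before`, `after`), an `"end"` record marks a finished transaction, and the database is a plain dictionary.

```python
def recover(db, log):
    """Redo transactions that ended before the crash; undo the others."""
    finished = {r["txn"] for r in log if r["op"] == "end"}

    # REDO pass, forward through the log: reinstall after images.
    for r in log:
        if r["op"] == "write" and r["txn"] in finished:
            db[r["item"]] = r["after"]

    # UNDO pass, backward through the log: restore before images.
    for r in reversed(log):
        if r["op"] == "write" and r["txn"] not in finished:
            db[r["item"]] = r["before"]
    return db

log = [
    {"txn": "T1", "op": "begin"},
    {"txn": "T1", "op": "write", "item": "x", "before": 1, "after": 5},
    {"txn": "T2", "op": "begin"},
    {"txn": "T2", "op": "write", "item": "y", "before": 2, "after": 9},
    {"txn": "T1", "op": "end"},
]
# State found on disk after the crash: T1's write was not yet flushed, T2's was.
stable_db = {"x": 1, "y": 9}
recovered = recover(stable_db, log)
```

After recovery the database reflects T1's write to x and no trace of T2's write to y.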
REDO Protocol
[Figure: REDO. Using the database log, REDO takes the old stable database state to the new stable database state.]
UNDO Protocol
[Figure: UNDO. Using the database log, UNDO takes the new stable database state back to the old stable database state.]
The UNDO operation uses the log information and restores the old
value of the object.
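WAL protocol:
Before the stable database is updated, the undo portion of the log should be written to the stable log.
When a transaction commits, the redo portion of the log must be written to the stable log prior to the updating of the stable database.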
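A minimal sketch of the write-ahead ordering, assuming its usual formulation: a log record must reach the stable log before the update it describes reaches the stable database. The class and its members are hypothetical, and "stable" storage is simulated with in-memory containers.

```python
class WalStore:
    def __init__(self, initial):
        self.stable_db = dict(initial)  # simulated stable database
        self.stable_log = []            # simulated stable log
        self.log_buffer = []            # volatile log records
        self.db_buffer = {}             # volatile updated values

    def write(self, txn, item, value):
        """Update the volatile copy and buffer a (txn, item, before, after) record."""
        before = self.db_buffer.get(item, self.stable_db.get(item))
        self.log_buffer.append((txn, item, before, value))
        self.db_buffer[item] = value

    def force_log(self):
        """Move all buffered log records to the stable log."""
        self.stable_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def flush_item(self, item):
        """Write-ahead rule: force the log first, then update the stable database."""
        self.force_log()
        self.stable_db[item] = self.db_buffer[item]

    def commit(self, txn):
        """Force the transaction's log records together with its commit record."""
        self.log_buffer.append((txn, "commit", None, None))
        self.force_log()

store = WalStore({"x": 1})
store.write("T1", "x", 2)
store.flush_item("x")
```

Because `flush_item` always forces the log first, a crash at any point leaves either the old value with no trace on disk, or a stable log record from which the update can be undone.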
Logging Interface
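[Figure: logging interface. In main memory, the Local Recovery Manager issues Fetch and Flush commands to the Database Buffer Manager and works with the log buffers and the database buffers (the volatile database). On secondary storage, log records are read from and written to the stable log, and pages are read from and written to the stable database.]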
Shadowing
When an update occurs, don't change the old page; create a shadow page with the new values and write it into the stable database.
Update the access paths so that subsequent accesses go to the new shadow page.
The old page is retained for recovery.
Differential files
Execution of Commands
Commands to consider:
begin_transaction, read, write (independent of the execution strategy for the LRM)
commit
abort
recover
Execution Strategies
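Dependent upon:
Can the buffer manager decide to write some of the buffer pages being accessed by a transaction into stable storage, or does it wait for the LRM to instruct it? (fix/no-fix decision)
Does the LRM force the buffer manager to write certain buffer pages into the stable database at the end of a transaction's execution? (flush/no-flush decision)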
No-Fix/No-Flush
Abort
Buffer manager may have written some of the updated pages into
stable database
LRM performs transaction undo (or partial undo)
Commit
Recover
No-Fix/Flush
Abort
Buffer manager may have written some of the updated pages into
stable database
LRM performs transaction undo (or partial undo)
Commit
LRM issues a flush command to the buffer manager for all updated
pages
LRM writes an end_of_transaction record into the log.
Recover
Fix/No-Flush
Abort
None of the updated pages have been written into stable database
Release the fixed pages
Commit
Recover
Fix/Flush
Abort
None of the updated pages have been written into stable database
Release the fixed pages
Commit
LRM issues a flush command to the buffer manager for all updated pages
LRM sends an unfix command to the buffer manager for all pages
Recover
No need to do anything
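The four strategies differ in what recovery has to do. The sketch below encodes the usual reading of the two decisions, stated here as an assumption rather than taken verbatim from the slides: without fix, pages of unfinished transactions may already be in the stable database, so undo is needed; without flush, pages of committed transactions may still be missing from it, so redo is needed.

```python
def recovery_needs(fix: bool, flush: bool):
    """Return (needs_undo, needs_redo) for an execution strategy."""
    needs_undo = not fix     # no-fix: uncommitted updates may have reached stable storage
    needs_redo = not flush   # no-flush: committed updates may not have reached stable storage
    return needs_undo, needs_redo

strategies = {
    "no-fix/no-flush": recovery_needs(fix=False, flush=False),
    "no-fix/flush": recovery_needs(fix=False, flush=True),
    "fix/no-flush": recovery_needs(fix=True, flush=False),
    "fix/flush": recovery_needs(fix=True, flush=True),
}
```

The fix/flush entry comes out as neither undo nor redo, matching the "No need to do anything" recovery action above.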
Checkpoints
Simplifies the task of determining actions of transactions that
need to be undone or redone when a failure occurs.
[Figure: architecture for recovery from media failures. In main memory, the Local Recovery Manager and the Database Buffer Manager work with the log buffers and the database buffers (the volatile database). On secondary storage, they read and write the stable log and the stable database; in addition, the stable database is written to an archive database and the stable log to an archive log.]
Distributed Reliability Protocols
Commit protocols
Termination protocols
If a failure occurs, how can the remaining operational sites deal with it?
Non-blocking: the occurrence of failures should not force the sites to wait until the failure is repaired to terminate the transaction.
Recovery protocols
When a failure occurs, how do the sites where the failure occurred deal with it?
Centralized 2PC
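[Figure: centralized 2PC communication structure. Phase 1: the coordinator asks each participant "ready?" and each participant answers yes/no. Phase 2: the coordinator sends commit/abort to each participant and each participant answers committed/aborted.]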
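[Figure: 2PC protocol actions (flowchart), summarized below.]
Coordinator, INITIAL: write begin_commit in log, send PREPARE, enter WAIT.
Participant, INITIAL: on PREPARE, ready to commit?
No: write abort in log, send VOTE-ABORT, enter ABORT (unilateral abort).
Yes: write ready in log, send VOTE-COMMIT, enter READY.
Coordinator, WAIT: any No vote?
Yes: write abort in log, send GLOBAL-ABORT, enter ABORT.
No: write commit in log, send GLOBAL-COMMIT, enter COMMIT.
Participant, READY: type of message?
Abort: write abort in log, send ACK, enter ABORT.
Commit: write commit in log, send ACK, enter COMMIT.
Coordinator, on ACK: write end_of_transaction in log.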
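The message flow of centralized 2PC can be simulated in a few lines. This is a failure-free sketch with hypothetical names: each participant is modeled as a callable that returns its vote, messages are plain function calls, and the "log" is a single list shared by all sites so the order of log writes is easy to inspect.

```python
from typing import Callable, Dict, List, Tuple

def two_phase_commit(participants: Dict[str, Callable[[], bool]]) -> Tuple[str, List[str]]:
    log = ["coordinator: begin_commit"]

    # Phase 1: coordinator sends PREPARE; each participant logs and returns its vote.
    votes = {}
    for name, ready_to_commit in participants.items():
        vote = ready_to_commit()
        log.append(f"{name}: {'ready' if vote else 'abort'}")
        votes[name] = vote

    # The coordinator commits only if no participant voted to abort.
    decision = "commit" if all(votes.values()) else "abort"
    log.append(f"coordinator: {decision}")

    # Phase 2: the global decision goes to participants in READY state.
    # Those that voted abort have already aborted unilaterally.
    for name, voted_commit in votes.items():
        if voted_commit:
            log.append(f"{name}: {decision}")

    # All acknowledgements are assumed to arrive in this failure-free run.
    log.append("coordinator: end_of_transaction")
    return decision, log

decision, trace = two_phase_commit({"p1": lambda: True, "p2": lambda: False})
```

A single vote-abort is enough to abort globally, while committing requires every participant to have written its ready record first.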
Linear 2PC
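[Figure: linear 2PC over a chain of sites (numbered up to 5 in the figure). Phase 1: Prepare is sent to the first site and VC/VA is passed forward from site to site along the chain. Phase 2: GC/GA is passed back along the chain in the reverse direction.]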
VC: Vote-Commit, VA: Vote-Abort, GC: Global-commit, GA: Global-abort
Distributed 2PC
[Figure: distributed 2PC communication structure. Phase 1: the coordinator sends prepare to the participants. Phase 2: each participant sends vote-commit or vote-abort to the other participants, and the global-commit/global-abort decision is made independently at each site.]
[Figure: state transitions in 2PC, written as state -> state: message received / message sent.]
Coordinator
INITIAL -> WAIT: Commit command / Prepare
WAIT -> ABORT: Vote-abort / Global-abort
WAIT -> COMMIT: Vote-commit (all) / Global-commit
Participants
INITIAL -> ABORT: Prepare / Vote-abort
INITIAL -> READY: Prepare / Vote-commit
READY -> ABORT: Global-abort / Ack
READY -> COMMIT: Global-commit / Ack
COORDINATOR
Timeout in INITIAL
Who cares
Timeout in WAIT
Cannot unilaterally commit
Can unilaterally abort
Timeout in ABORT or COMMIT
Stay blocked and wait for the acks
PARTICIPANTS
Timeout in INITIAL
Coordinator must have failed in INITIAL state
Unilaterally abort
Timeout in READY
Stay blocked
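The 2PC timeout rules amount to a lookup on (role, state). The sketch below is illustrative only. The participant entries follow the slide text; the coordinator entries for WAIT, ABORT, and COMMIT are an assumption based on the usual 2PC termination rules.

```python
# Action taken when a timeout expires in 2PC, keyed by (role, state).
TIMEOUT_ACTION = {
    ("coordinator", "WAIT"): "unilaterally abort",    # assumed: it cannot commit without all votes
    ("coordinator", "ABORT"): "wait for acks",        # assumed
    ("coordinator", "COMMIT"): "wait for acks",       # assumed
    ("participant", "INITIAL"): "unilaterally abort",
    ("participant", "READY"): "stay blocked",
}

def on_timeout(role: str, state: str) -> str:
    """Look up the termination action; unlisted combinations need no action."""
    return TIMEOUT_ACTION.get((role, state), "nothing to do")

blocked = on_timeout("participant", "READY")
```

The READY entry is the weak point of 2PC: a participant that has voted to commit cannot decide on its own and must wait for the coordinator.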
Failure in INITIAL
Start the commit process upon recovery
Failure in WAIT
Restart the commit process upon recovery
Failure in ABORT or COMMIT
Nothing special if all the acks have been received
Otherwise the termination protocol is involved
PARTICIPANTS
Failure in READY
Participant site fails after writing ready record in log but before
vote-commit is sent
Participant site fails after writing abort record in log but before
vote-abort is sent
Three-Phase Commit
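3PC is non-blocking.
A commit protocol is non-blocking iff
it is synchronous within one state transition, and
its state transition diagram contains
no state which is "adjacent" to both a commit and an abort state, and
no non-committable state which is "adjacent" to a commit state.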
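The structural part of the non-blocking condition can be checked mechanically on a state transition diagram. The sketch below assumes the usual statement of the condition: no state is adjacent to both a commit and an abort state, and no non-committable state is adjacent to a commit state. "Adjacent" is taken to mean reachable in one transition, and the sets of committable states are supplied by hand as an assumption.

```python
def blocking_states(transitions, committable):
    """Return the states that violate either structural condition."""
    adjacent = {}
    for src, dst in transitions:
        adjacent.setdefault(src, set()).add(dst)

    bad = []
    for state, nexts in adjacent.items():
        next_to_both = "COMMIT" in nexts and "ABORT" in nexts
        noncommittable_next_to_commit = state not in committable and "COMMIT" in nexts
        if next_to_both or noncommittable_next_to_commit:
            bad.append(state)
    return bad

# Participant state diagrams, one (from, to) pair per transition.
TWO_PC = [("INITIAL", "READY"), ("INITIAL", "ABORT"),
          ("READY", "ABORT"), ("READY", "COMMIT")]
THREE_PC = [("INITIAL", "READY"), ("INITIAL", "ABORT"),
            ("READY", "ABORT"), ("READY", "PRECOMMIT"),
            ("PRECOMMIT", "COMMIT")]

two_pc_violations = blocking_states(TWO_PC, committable={"COMMIT"})
three_pc_violations = blocking_states(THREE_PC, committable={"PRECOMMIT", "COMMIT"})
```

In 2PC the READY state fails both conditions, while inserting PRECOMMIT between READY and COMMIT removes the violation in 3PC.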
[Figure: state transitions in 3PC, written as state -> state: message received / message sent.]
Coordinator
INITIAL -> WAIT: Commit command / Prepare
WAIT -> ABORT: Vote-abort / Global-abort
WAIT -> PRECOMMIT: Vote-commit / Prepare-to-commit
PRECOMMIT -> COMMIT: Ready-to-commit / Global commit
Participants
INITIAL -> READY: Prepare / Vote-commit
INITIAL -> ABORT: Prepare / Vote-abort
READY -> ABORT: Global-abort / Ack
READY -> PRECOMMIT: Prepare-to-commit / Ready-to-commit
PRECOMMIT -> COMMIT: Global commit / Ack
Communication Structure
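[Figure: 3PC communication structure between the coordinator and the participants (P). Phase 1: the coordinator asks "ready?" and each participant answers yes/no. Phase 2: the coordinator sends pre-commit/pre-abort and each participant answers yes/no. Phase 3: the coordinator sends commit/abort and each participant answers with ack.]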
COORDINATOR
Timeout in INITIAL
Who cares
Timeout in WAIT
Unilaterally abort
Timeout in PRECOMMIT
Move all the participants to PRECOMMIT state
Terminate by globally committing
Timeout in ABORT or COMMIT
Just ignore and treat the transaction as completed
PARTICIPANTS
Timeout in INITIAL
Coordinator must have failed in INITIAL state
Unilaterally abort
Timeout in READY
Elect a new coordinator and terminate using a special protocol
Timeout in PRECOMMIT
Handle it the same as timeout in READY state
Participants in the ABORT and COMMIT states respond with Ack but stay in their states.
Coordinator guides the participants towards termination:
If the new coordinator is in the ABORT or COMMIT state, at the end of the first phase the participants will have moved to that state as well.
Failure in INITIAL
Start the commit process upon recovery
Failure in WAIT
The participants may have elected a new coordinator and terminated the transaction
Ask around for the fate of the transaction
Failure in PRECOMMIT
Ask around for the fate of the transaction
PARTICIPANTS
Failure in INITIAL
Unilaterally abort upon recovery
Failure in READY
Failure in PRECOMMIT
Network Partitioning
Simple partitioning
Only two partitions
Multiple partitioning
More than two partitions
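Formal bounds:
There exists no non-blocking protocol that is resilient to a network partition if messages are lost when the partition occurs.
There exist non-blocking protocols which are resilient to a single network partition if all undeliverable messages are returned to the sender.
There exists no non-blocking protocol which is resilient to a multiple partition.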
Independent Recovery Protocols for Network Partitioning
Voting-based (quorum)
Quorum Protocols
The network partitioning problem is handled by the commit
protocol.
$V_a + V_c > V$ where $0 \le V_a, V_c \le V$
Before a transaction commits, it must obtain a commit quorum Vc
Before a transaction aborts, it must obtain an abort quorum Va
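The quorum condition is easy to check in code. In this sketch V is taken to be the total number of votes in the system, and the vote counts in the example are made up; the function names are illustrative.

```python
def valid_quorums(v_abort: int, v_commit: int, total_votes: int) -> bool:
    """Check 0 <= Va, Vc <= V and Va + Vc > V."""
    in_range = 0 <= v_abort <= total_votes and 0 <= v_commit <= total_votes
    return in_range and v_abort + v_commit > total_votes

def can_commit(votes_in_partition: int, v_commit: int) -> bool:
    """A partition may commit only if it holds at least a commit quorum."""
    return votes_in_partition >= v_commit

def can_abort(votes_in_partition: int, v_abort: int) -> bool:
    """A partition may abort only if it holds at least an abort quorum."""
    return votes_in_partition >= v_abort

# Hypothetical system with V = 7 votes, split by a partition into 4 and 3 votes.
V, Va, Vc = 7, 4, 4
ok = valid_quorums(Va, Vc, V)
```

Since Va + Vc > V, two disjoint partitions can never hold a commit quorum and an abort quorum at the same time, so they cannot terminate the same transaction in different ways.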
[Figure: state transitions in quorum protocols, written as state -> state: message received / message sent.]
Coordinator
INITIAL -> WAIT: Commit command / Prepare
WAIT -> PREABORT: Vote-abort / Prepare-to-abort
WAIT -> PRECOMMIT: Vote-commit / Prepare-to-commit
PREABORT -> ABORT: Ready-to-abort / Global-abort
PRECOMMIT -> COMMIT: Ready-to-commit / Global commit
Participants
INITIAL -> READY: Prepare / Vote-commit
INITIAL -> ABORT: Prepare / Vote-abort
READY -> PREABORT: Prepare-to-abort / Ready-to-abort
READY -> PRECOMMIT: Prepare-to-commit / Ready-to-commit
PREABORT -> ABORT: Global-abort / Ack
PRECOMMIT -> COMMIT: Global commit / Ack
failures that change the network's topology are detected by all sites
instantaneously
each site has a view of the network consisting of all the sites it can
communicate with